Using Unsloth with Sympozium¶
Sympozium supports Unsloth as an LLM provider. Unsloth
is primarily a fine-tuning library, but its Run Tutorials
walk you through serving fine-tuned (or stock) models over an OpenAI-compatible
HTTP API via llama.cpp's llama-server or vLLM. Sympozium treats Unsloth
exactly like any other OpenAI-compatible endpoint, so you can point an instance
at a locally-running Unsloth model and drive it with skills, channels, and
schedules like any cloud-backed agent.
Prerequisites¶
- A running Kubernetes cluster (Kind, minikube, etc.)
- Sympozium installed (
sympozium install) - Unsloth installed on your host machine (see the Unsloth install docs)
- A model exported to GGUF (for
llama.cpp) or served directly (for vLLM)
Starting the Unsloth server¶
Unsloth itself is a training library — it does not ship its own serve endpoint. Follow one of Unsloth's run tutorials (e.g. Run Gemma 3) to serve a model over HTTP. Two common paths:
Option A — llama.cpp llama-server (GGUF)¶
After exporting your model to GGUF with Unsloth:
This exposes an OpenAI-compatible API at http://localhost:8080/v1.
Option B — vLLM¶
This exposes an OpenAI-compatible API at http://localhost:8000/v1.
Bind to
0.0.0.0: Agent pods cannot reach127.0.0.1on the host — always bind the server to0.0.0.0(or explicitly to the host gateway IP).
Finding the host gateway IP¶
Kind:
minikube:
The base URL¶
Verify reachability:
Creating a SympoziumInstance¶
Unsloth-served models do not require an API key, but authRefs is mandatory —
create a Secret with a placeholder value.
apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
metadata:
name: unsloth-agent
spec:
agents:
default:
model: unsloth/gemma-3-12b-it
baseURL: "http://172.18.0.1:8080/v1"
authRefs:
- provider: unsloth
secret: unsloth-key
skills:
- skillPackRef: k8s-ops
policyRef: default-policy
Note: The
modelfield should match the ID reported by/v1/models— forllama-serverthis is usually the GGUF filename or the alias you passed via--alias; for vLLM it is the HuggingFace repo ID you loaded.
Running an AgentRun¶
apiVersion: sympozium.ai/v1alpha1
kind: AgentRun
metadata:
name: unsloth-test
spec:
instanceRef: unsloth-agent
task: "List all pods across every namespace and summarise their status."
model:
provider: unsloth
model: unsloth/gemma-3-12b-it
baseURL: "http://172.18.0.1:8080/v1"
authSecretRef: unsloth-key
skills:
- k8s-ops
timeout: "5m"
The phase transitions: Pending → Running → Succeeded (or Failed).
Because Unsloth runs locally, Sympozium applies local-provider timeouts automatically (5 min per request, 30 min per run, 2 retries).
Network policies¶
The default Sympozium network policies do not open egress on 8080 or 8000. You need to add an egress rule for whichever port your Unsloth server listens on.
Add to both sympozium-agent-allow-egress and
sympozium-agent-server-allow-egress in config/network/policies.yaml:
# Allow Unsloth via llama-server (port 8080) or vLLM (port 8000)
- to: []
ports:
- protocol: TCP
port: 8080
- protocol: TCP
port: 8000
Apply:
Sandbox note: Pods with
sympozium.ai/sandbox: "true"use thesympozium-sandbox-restrictedpolicy that only allows DNS and localhost IPC. Sandboxed agents cannot reach Unsloth directly.
Node discovery¶
Sympozium's node-probe DaemonSet already probes port 8080 under the
llama-cpp target name and port 8000 under the vllm target — both of
which will detect an Unsloth-served model running on those ports. The
discovered models appear under the corresponding provider annotation on the
node. There is intentionally no separate unsloth node-probe target to avoid
port conflicts with those existing targets.
Using with PersonaPacks¶
apiVersion: sympozium.ai/v1alpha1
kind: PersonaPack
metadata:
name: my-team
spec:
baseURL: "http://172.18.0.1:8080/v1"
authRefs:
- provider: unsloth
secret: unsloth-key
personas:
- name: assistant
displayName: "Unsloth Assistant"
systemPrompt: |
You are a helpful assistant running on a locally-served Unsloth model.
skills:
- k8s-ops
schedule:
type: heartbeat
interval: "1h"
task: "Check cluster health."
Unsloth vs LM Studio vs Ollama¶
| Feature | Unsloth | LM Studio | Ollama |
|---|---|---|---|
| Primary role | Fine-tuning + serve via llama.cpp/vLLM | GUI model server | CLI model server |
| GUI | None (Python / Jupyter) | Full desktop app | CLI-first |
| Default port | 8080 (llama-server) or 8000 (vLLM) | 1234 | 11434 |
| Model format | GGUF / HF / vLLM | GGUF | Ollama-native |
| Tool calling | Depends on serve layer (--jinja for llama-server) |
Supported (model dependent) | Supported (model dependent) |
| In-cluster deployment | Custom (requires packaging) | Not supported | Supported |
| Strengths | Fast fine-tuning of your own LoRA, then serve | Easy model browsing | In-cluster + auto-discovery |
Use Unsloth when you've fine-tuned a model with Unsloth and want to run Sympozium agents against that exact model.
Troubleshooting¶
Agent pod fails to connect¶
Symptom: AgentRun fails with connection refused or timeout.
If this fails, ensure your Unsloth serve process is running and bound to
0.0.0.0.
Tool calls never arrive¶
Symptom: Agent chats but never invokes skills.
Make sure llama-server was started with --jinja (for Gemma/Qwen/Llama3
tool-calling templates). Without this flag, tool-call JSON is emitted as
plain text and never parsed into structured tool_calls.
Slow responses¶
- Use a smaller quant (Q4_K_M instead of Q8_0)
- Increase AgentRun timeout:
timeout: "15m" - Verify GPU offload (
--n-gpu-layersfor llama-server)