Using Ollama with Sympozium¶
Sympozium has first-class support for Ollama as an LLM
provider. Ollama exposes an OpenAI-compatible API, so Sympozium treats it as an
OpenAI-compatible endpoint with the provider set to ollama. This lets you run
agents entirely on local hardware with no cloud API keys required.
Sympozium supports two inference topologies:
- Node-based — Ollama installed directly on host nodes, discovered automatically by the node-probe DaemonSet
- Workload-based — Ollama deployed as a Kubernetes Deployment/Service, reached via in-cluster DNS
Prerequisites¶
- A running Kubernetes cluster (Kind, minikube, etc.)
- Sympozium installed (`sympozium install`)
- Ollama installed on your host machine (or deployed in-cluster)
- At least one model pulled (`ollama pull llama3`)
Configuring Ollama for Kubernetes access¶
By default Ollama binds to 127.0.0.1:11434, which means only processes on
the same machine can reach it. Kubernetes pods need to connect over the network,
so you must change the bind address to 0.0.0.0.
systemd (Linux)¶
Create an override file:
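For example, using `systemctl edit` (this assumes the standard Linux installer, which names the unit `ollama.service`):

```shell
# Opens an editor on a drop-in override under /etc/systemd/system/ollama.service.d/
sudo systemctl edit ollama.service
```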
Add the following:
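The override needs a `[Service]` section that sets `OLLAMA_HOST` so Ollama binds to all interfaces:

```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```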
Then reload and restart:
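```shell
sudo systemctl daemon-reload
sudo systemctl restart ollama
```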
macOS / manual¶
Export the variable before starting Ollama:
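For example:

```shell
export OLLAMA_HOST=0.0.0.0
ollama serve
```

If you use the Ollama macOS menu-bar app rather than `ollama serve`, set the variable with `launchctl setenv OLLAMA_HOST 0.0.0.0` and restart the app instead.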
Finding the host gateway IP¶
Agent pods running inside the cluster need a routable IP to reach the host. The method depends on your cluster type.
Kind:
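One way to find it is to inspect the `kind` Docker network (assuming the default network name):

```shell
# Print the gateway address(es) of the kind Docker network
docker network inspect -f '{{range .IPAM.Config}}{{.Gateway}} {{end}}' kind
```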
This typically returns something like 172.18.0.1.
minikube:
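minikube writes the host's IP into the node's `/etc/hosts` as `host.minikube.internal`, so one way to read it is:

```shell
minikube ssh "grep host.minikube.internal /etc/hosts | cut -f1"
```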
Cloud clusters: Ollama must be reachable from the cluster network. Use the machine's private IP or deploy Ollama in-cluster (see below).
The base URL¶
Once you have the host gateway IP, the base URL for Ollama is:
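In template form (substitute the IP you found above):

```
http://<host-gateway-ip>:11434/v1
```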
For example: http://172.18.0.1:11434/v1
Creating a SympoziumInstance¶
Create a SympoziumInstance that points at your Ollama server. Ollama does not
require an API key, but the authRefs field is mandatory -- create a Secret
with a placeholder value.
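For example (the key name `api-key` is an assumption here; Ollama ignores the value, so any placeholder works):

```shell
kubectl create secret generic ollama-key \
  --from-literal=api-key=unused
```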
```yaml
apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
metadata:
  name: ollama-agent
spec:
  agents:
    default:
      model: llama3
      baseURL: "http://172.18.0.1:11434/v1"
      authRefs:
        - provider: ollama
          secret: ollama-key
      skills:
        - skillPackRef: k8s-ops
  policyRef: default-policy
```
Note: If you omit `baseURL` and set the provider to `ollama`, the agent-runner defaults to `http://ollama.default.svc:11434/v1`. This is useful when Ollama is deployed in-cluster in the `default` namespace.
Running an AgentRun¶
You can run a one-off task against your Ollama instance:
```yaml
apiVersion: sympozium.ai/v1alpha1
kind: AgentRun
metadata:
  name: ollama-test
spec:
  instanceRef: ollama-agent
  task: "List all pods across every namespace and summarise their status."
  model:
    provider: ollama
    model: llama3
    baseURL: "http://172.18.0.1:11434/v1"
    authSecretRef: ollama-key
  skills:
    - k8s-ops
  timeout: "5m"
```
The phase transitions: `Pending` -> `Running` -> `Succeeded` (or `Failed`).
Serving mode (web endpoint)¶
You can expose an Ollama-backed agent as an HTTP API using the web-endpoint
skill. This deploys a long-lived web-proxy sidecar with OpenAI-compatible
chat completions and MCP protocol support.
```yaml
apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
metadata:
  name: ollama-server
spec:
  agents:
    default:
      model: llama3
      baseURL: "http://172.18.0.1:11434/v1"
      authRefs:
        - provider: ollama
          secret: ollama-key
      skills:
        - skillPackRef: k8s-ops
        - skillPackRef: web-endpoint
  policyRef: default-policy
```
Once the instance is running, the web-proxy creates a Service and (optionally) an HTTPRoute. See Web Endpoint Skill for full details on authentication, rate limiting, and routing.
Network policies¶
The default network policies in config/network/policies.yaml already include
egress rules for port 11434 (Ollama's default port). This applies to both
task-mode (sympozium-agent-allow-egress) and server-mode
(sympozium-agent-server-allow-egress) agent pods:
```yaml
# Allow Ollama / local LLM provider (default port 11434)
- to: []
  ports:
    - protocol: TCP
      port: 11434
```
No additional network policy configuration is needed for Ollama. If you run
Ollama on a non-standard port, add a matching egress rule to both the
sympozium-agent-allow-egress and sympozium-agent-server-allow-egress
policies.
Sandbox note: Pods with `sympozium.ai/sandbox: "true"` use the `sympozium-sandbox-restricted` policy, which only allows DNS and localhost IPC. Sandboxed agents cannot reach Ollama directly.
Automatic discovery with node-probe DaemonSet¶
If Ollama is installed directly on your cluster nodes (e.g. via systemd), the node-probe DaemonSet can discover it automatically. The probe runs on every node, checks localhost ports for inference providers, and annotates nodes with what it finds.
Enable the node-probe¶
Add the following to your Helm values:
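A minimal sketch — the `nodeProbe.enabled` flag is an assumption based on the `nodeProbe.config.targets` setting mentioned below; check your chart's `values.yaml` for the exact key:

```yaml
nodeProbe:
  enabled: true
```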
Or install with:
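For example (the release and chart names here are assumptions; substitute your own):

```shell
helm upgrade --install sympozium sympozium/sympozium \
  --set nodeProbe.enabled=true
```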
The DaemonSet probes these providers by default:
| Provider | Port | Health endpoint |
|---|---|---|
| Ollama | 11434 | /api/tags |
| vLLM | 8000 | /health |
| llama-cpp | 8080 | /health |
Custom targets can be added via `nodeProbe.config.targets` in `values.yaml`.
How it works¶
- The DaemonSet runs a lightweight probe on each node (with `hostNetwork: true`)
- Every 30 seconds, it probes configured localhost ports
- On success, it annotates the node:
    - `sympozium.ai/inference-ollama=11434` — port
    - `sympozium.ai/inference-models-ollama=llama3,mistral` — discovered models
    - `sympozium.ai/inference-healthy=true` — at least one provider is up
- The API server reads these annotations via `GET /api/v1/providers/nodes`
- The web wizard shows discovered nodes and lets you select one
Node pinning¶
When you select a node in the wizard, Sympozium sets two things on the instance:
- `baseURL` — `http://<nodeIP>:<port>/v1` (auto-populated from the node's InternalIP)
- `nodeSelector` — `{"kubernetes.io/hostname": "<nodeName>"}` (pins agent pods to that node)
You can also set this manually in a CRD:
```yaml
apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
metadata:
  name: ollama-agent
spec:
  agents:
    default:
      model: llama3
      baseURL: "http://10.0.1.5:11434/v1"
      nodeSelector:
        kubernetes.io/hostname: gpu-node-1
      authRefs:
        - provider: ollama
          secret: ollama-key
```
Manual node annotation¶
If you prefer not to deploy the DaemonSet, you can annotate nodes manually:
```shell
kubectl annotate node gpu-node-1 \
  sympozium.ai/inference-ollama=11434 \
  sympozium.ai/inference-models-ollama=llama3,mistral \
  sympozium.ai/inference-healthy=true
```
The API server and web wizard will discover these annotations identically.
Cleanup¶
The DaemonSet removes its annotations on graceful shutdown (SIGTERM). If you disable the node-probe, annotations are cleaned up automatically.
In-cluster Ollama¶
For production setups or GPU-equipped clusters, deploy Ollama as a Kubernetes Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: "1" # remove if CPU-only
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
      targetPort: 11434
```
The base URL becomes the in-cluster DNS name:
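For the Deployment above (Service `ollama` in namespace `ollama`):

```
http://ollama.ollama.svc:11434/v1
```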
Update your SympoziumInstance accordingly:
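A sketch of the relevant fields, following the earlier examples:

```yaml
spec:
  agents:
    default:
      model: llama3
      baseURL: "http://ollama.ollama.svc:11434/v1"
```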
If you deploy Ollama in the default namespace with a Service named ollama,
you can omit the baseURL entirely -- the agent-runner defaults to
http://ollama.default.svc:11434/v1 when the provider is ollama.
Supported models¶
Any model available through Ollama works with Sympozium. Pull a model with
`ollama pull <model>` and reference it by name in the `model` field.
Popular choices:
| Model | Parameters | Tool calling | Notes |
|---|---|---|---|
| `llama3` | 8B | Yes | Good general-purpose model |
| `llama3:70b` | 70B | Yes | Higher quality, needs more RAM/VRAM |
| `mistral` | 7B | Yes | Fast, good for simple tasks |
| `qwen2.5` | 7B | Yes | Strong tool-calling support |
| `deepseek-r1` | 7B | No | Reasoning model, no tool use |
| `codellama` | 7B | No | Code-focused, no tool calling |
Tool calling: Sympozium agents rely on tool calling to execute commands, read files, and interact with the cluster. Models without tool-calling support can still answer questions but cannot use skills or execute actions.
Troubleshooting¶
Agent pod fails to connect to Ollama¶
Symptom: AgentRun fails with a connection refused or timeout error.
Check the bind address:
```shell
# Verify Ollama is listening on all interfaces
ss -tlnp | grep 11434
# Should show 0.0.0.0:11434, not 127.0.0.1:11434
```
Check the host gateway IP:
```shell
# From inside a cluster pod
kubectl run -it --rm debug --image=busybox -- \
  wget -qO- http://172.18.0.1:11434/v1/models
```
Model not found¶
Symptom: model 'xyz' not found error in agent logs.
Make sure the model is pulled on the Ollama host:
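For example:

```shell
ollama list          # show models available locally
ollama pull llama3   # pull the model if it is missing
```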
The model name in your SympoziumInstance must match exactly (e.g. `llama3`,
not `meta-llama/llama3`).
Network policy blocking traffic¶
Symptom: Agent pods timeout but Ollama is reachable from the host.
Verify the network policies are applied:
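For example:

```shell
kubectl get networkpolicy -A | grep sympozium
```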
Ensure the sympozium-agent-allow-egress policy includes port 11434. If you
customised the policies, re-apply the defaults:
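Assuming you are working from the Sympozium repository root:

```shell
kubectl apply -f config/network/policies.yaml
```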
Slow responses with large models¶
Ollama runs inference on your hardware. If responses are slow or time out:
- Use a smaller model (e.g. `llama3` 8B instead of 70B)
- Increase the AgentRun timeout: `timeout: "15m"`
- Ensure Ollama has GPU access (`nvidia-smi` to verify)
- For in-cluster deployments, request GPU resources in the Deployment spec