LLMFit: Hardware Fitness & Model Placement
Sympozium integrates llmfit (v0.9.24) in two ways:
- DaemonSet: runs on every node and continuously reports hardware specs and model density scores. It powers instant model placement, the Model Density UI, and Prometheus metrics, and is deployed by default.
- SkillPack sidecar: gives agents interactive access to llmfit's MCP tools and cluster probe scripts for ad-hoc queries.
DaemonSet (always-on density telemetry)
What it does
The sympozium-llmfit-daemon DaemonSet runs llmfit serve on every node, exposing a REST API on port 8787. The controller and API server poll each pod every 60 seconds to build a cluster-wide FitnessCache containing:
- Per-node hardware specs (RAM, CPU, GPU, VRAM, backend)
- Model density scores (which models fit on which nodes, at what quality)
- Installed runtimes (Ollama, vLLM, llama.cpp, etc.)
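For debugging, you can fetch the same data the controller polls by calling one daemon pod directly. This is a minimal shell sketch, assuming the pod IP is reachable from where you run it and that the daemon serves the `/api/v1/system` and `/api/v1/models` paths on port 8787 listed in the architecture diagram. `fetch_node_fitness` and `LLMFIT_PORT` are hypothetical names, and the responses are printed raw because their schema is not documented on this page.

```shell
# Sketch: fetch one node's fitness data straight from its llmfit daemon pod.
# Assumptions: the pod IP is reachable from this shell, and the daemon serves
# /api/v1/system and /api/v1/models on LLMFIT_PORT (8787 by default).
LLMFIT_PORT="${LLMFIT_PORT:-8787}"

# fetch_node_fitness POD_IP  (hypothetical helper)
# Prints the node's hardware specs, then its model fit list, as raw JSON.
fetch_node_fitness() {
  local ip="$1"
  if [ -z "$ip" ]; then
    echo "usage: fetch_node_fitness POD_IP" >&2
    return 2
  fi
  local base="http://${ip}:${LLMFIT_PORT}/api/v1"
  # --fail makes curl exit non-zero on HTTP errors instead of printing the error page
  curl -s --fail "${base}/system" || return 1
  echo
  curl -s --fail "${base}/models" || return 1
  echo
}
```

For example, `fetch_node_fitness 10.42.0.17 | jq .` prints one node's snapshot (the IP is a placeholder; `kubectl get pods -o wide` lists daemon pod IPs). Wrapping the call in `while true; do ...; sleep 60; done` roughly mimics the 60-second poll.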
Instant model placement
When a Model CR has `placement.mode: auto`, the controller checks the FitnessCache first. If the cache holds fresh data, placement is instant (milliseconds). If the cache is empty because the DaemonSet is not deployed or is still warming up, the controller falls back to the original probe-pod approach, which takes about 3 minutes.
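The decision above can be sketched as a small shell function. It assumes a freshness window of 120 seconds (two 60-second poll intervals), which is an illustrative value rather than a documented default, and it also treats stale but non-empty data as a reason to probe, which the text above does not state. `placement_path` and `FRESH_WINDOW` are hypothetical names.

```shell
# Sketch of the cache-first placement decision with probe-pod fallback.
# Assumption: "fresh" means the newest node report is at most FRESH_WINDOW
# seconds old; 120 s is illustrative, not a documented default.
FRESH_WINDOW="${FRESH_WINDOW:-120}"

# placement_path LAST_REPORT_EPOCH [NOW_EPOCH]  (hypothetical helper)
# Prints "cache" or "probe". An empty LAST_REPORT_EPOCH means the cache is empty.
placement_path() {
  local last="$1"
  local now="${2:-$(date +%s)}"
  if [ -z "$last" ]; then
    echo probe   # DaemonSet not deployed or still warming up
  elif [ $(( now - last )) -le "$FRESH_WINDOW" ]; then
    echo cache   # instant placement from the FitnessCache
  else
    echo probe   # assumption: stale data also falls back to the ~3 minute probe
  fi
}
```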
Helm configuration
```yaml
llmfit:
  daemonset:
    enabled: true          # Deployed by default with Sympozium
    eventInterval: 60      # Seconds between fitness publications
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi
    tolerations:
      - operator: Exists   # Run on all nodes, including GPU-tainted ones
    nodeSelector: {}
  liveEviction:
    enabled: false         # Re-place models when fitness degrades (env: LLMFIT_LIVE_EVICTION=true)
    checkInterval: 30s
    degradeThreshold: 0.3  # A 30% score drop triggers re-placement
  webhook:
    preflightValidation: false  # Reject Model CRs that won't fit (env: LLMFIT_PREFLIGHT_VALIDATION=true)
```
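With the defaults above, live eviction re-places a model once its fitness score falls 30% below its earlier score. For example, a model placed at a score of 0.90 would be re-placed once its score drops to 0.63 (0.90 × 0.7) or lower. The sketch below assumes the drop is measured relative to the earlier score and that reaching the threshold exactly counts as a trigger; `needs_replacement` is a hypothetical helper, not part of Sympozium.

```shell
# Sketch of the live-eviction trigger.
# Assumptions: the drop is relative to the earlier score, and ">=" triggers.

# needs_replacement OLD_SCORE NEW_SCORE [THRESHOLD]  (hypothetical helper)
# Prints "replace" and returns 0 when re-placement should trigger;
# otherwise prints "keep" and returns 1.
needs_replacement() {
  local old="$1" new="$2" threshold="${3:-0.3}"
  awk -v o="$old" -v n="$new" -v t="$threshold" 'BEGIN {
    # A node that never scored above zero cannot "degrade"; avoid dividing by zero.
    if (o <= 0) { print "keep"; exit 1 }
    drop = (o - n) / o          # relative score drop, e.g. 0.30 / 0.90 = 0.333
    if (drop >= t) { print "replace"; exit 0 }
    print "keep"; exit 1
  }'
}
```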
Security
- Read-only root filesystem
- Host path mounts (read-only): `/proc`, `/sys`, `/dev`, and `/run/udev`, mounted at `/host/*` paths
- `SYS_PTRACE` capability for `/proc` access
- Minimal RBAC: `nodes: [get]`
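To confirm the minimal RBAC from the cluster side, `kubectl auth can-i` can impersonate the daemon's service account (impersonation requires cluster-admin or equivalent rights). The namespace and service account name are placeholders because this page does not name them, and `check_llmfit_rbac` is a hypothetical helper. With the RBAC described above, only `get` should come back `yes`.

```shell
# Sketch: check which node verbs the llmfit daemon's service account may use.
# NAMESPACE and SERVICE_ACCOUNT are placeholders for your install's values.

# check_llmfit_rbac NAMESPACE SERVICE_ACCOUNT  (hypothetical helper)
check_llmfit_rbac() {
  local ns="$1" sa="$2"
  local who="system:serviceaccount:${ns}:${sa}"
  local verb answer
  for verb in get list delete; do
    # `kubectl auth can-i` prints yes/no and exits 1 on "no", hence `|| true`
    answer="$(kubectl auth can-i "$verb" nodes --as="$who" 2>/dev/null)" || true
    printf '%-6s nodes: %s\n' "$verb" "${answer:-no}"
  done
}
```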
Prometheus metrics
The controller exposes fitness metrics on its /metrics endpoint:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `sympozium_density_node_score` | Gauge | `node` | Highest model fitness score for a node |
| `sympozium_density_node_stale` | Gauge | `node` | `1` if the node has stopped reporting |
| `sympozium_density_node_ram_total_gb` | Gauge | `node` | Total RAM (GB) |
| `sympozium_density_node_ram_available_gb` | Gauge | `node` | Available RAM (GB) |
| `sympozium_density_node_gpu_vram_gb` | Gauge | `node` | GPU VRAM (GB) |
| `sympozium_density_node_gpu_count` | Gauge | `node` | Number of GPUs |
| `sympozium_density_node_model_count` | Gauge | `node` | Number of models that fit on the node |
| `sympozium_density_cluster_nodes_total` | Gauge | (none) | Number of nodes reporting fitness |
| `sympozium_density_cluster_nodes_stale` | Gauge | (none) | Number of nodes with stale data |
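These gauges can be queried through the Prometheus HTTP API (`/api/v1/query`). This is a minimal sketch, assuming a Prometheus server scrapes the controller's `/metrics` endpoint and is reachable at `PROM_URL` (the default below is a placeholder). The helper names are hypothetical; the metric names come from the table above.

```shell
# Sketch: instant PromQL queries against the density metrics.
# Assumption: PROM_URL points at a Prometheus that scrapes the controller.
PROM_URL="${PROM_URL:-http://prometheus:9090}"

# prom_query PROMQL  (hypothetical helper) -- prints the raw JSON response
prom_query() {
  # --get + --data-urlencode sends the PromQL as a URL-encoded ?query=... parameter
  curl -s --fail --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Nodes whose llmfit daemon has stopped reporting.
stale_nodes() { prom_query 'sympozium_density_node_stale == 1'; }

# The three nodes with the highest model fitness score.
top_scoring_nodes() { prom_query 'topk(3, sympozium_density_node_score)'; }

# Fraction of RAM still available on each node.
ram_headroom() {
  prom_query 'sympozium_density_node_ram_available_gb / sympozium_density_node_ram_total_gb'
}
```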
Density API endpoints
The API server exposes density data for the web UI and agent queries:
| Method | Path | Description |
|---|---|---|
| GET | `/api/v1/density/nodes` | All nodes with hardware specs and model fit counts |
| GET | `/api/v1/density/nodes/{name}` | Single-node detail with the full model fit list |
| GET | `/api/v1/density/runtimes` | Installed inference runtimes per node |
| GET | `/api/v1/density/installed-models` | Models downloaded into local runtimes, per node |
| GET | `/api/v1/density/query?model={q}` | Nodes ranked for a model search query |
| GET | `/api/v1/catalog` | Alphabetized catalog of all models the cluster can run |
| POST | `/api/v1/density/simulate` | Simulate deploying a model and show the per-node capacity impact |
| GET | `/api/v1/density/cost` | Per-model and per-namespace resource attribution |
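A thin shell wrapper around these endpoints might look like this. It assumes the API server is reachable at the in-cluster address used in the SkillPack examples (override `DENSITY_API` otherwise) and prints raw JSON, since response schemas are not documented on this page. The function names are hypothetical, and `POST /api/v1/density/simulate` is left out because its request body is not specified here.

```shell
# Sketch: shell helpers for the density API (hypothetical names).
DENSITY_API="${DENSITY_API:-http://sympozium-apiserver:8080}"

# density_query SEARCH_TEXT -- nodes ranked for a model search
density_query() {
  # --get turns the --data-urlencode pair into a URL-encoded ?model=... query string
  curl -s --fail --get "${DENSITY_API}/api/v1/density/query" --data-urlencode "model=$1"
}

# density_node NODE_NAME -- single-node detail with the full model fit list
density_node() {
  curl -s --fail "${DENSITY_API}/api/v1/density/nodes/$1"
}

# density_cost -- per-model and per-namespace resource attribution
density_cost() {
  curl -s --fail "${DENSITY_API}/api/v1/density/cost"
}
```

`density_query 'Qwen2.5 Coder' | jq .` URL-encodes the space, which a hand-built `?model=` URL would not. Node names need no encoding because Kubernetes restricts them to lowercase alphanumerics, `-`, and `.`.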
Web UI
Model Density page
Navigate to Infrastructure > Model Density in the sidebar. The page has three tabs:
- Nodes — card per node showing CPU, RAM, GPU, backend, model fit count, stale indicator
- Model Catalog — alphabetized table of all models that fit on the cluster with scores and fit levels
- Query — live search for specific models with per-node scores, TPS estimates, and memory requirements
Model deploy dialog
When deploying a model with auto placement, the dialog shows a density preview with the top 3 nodes ranked by score, color-coded fit levels, and a "recommended" badge.
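The ranking behind the preview can be sketched with standard Unix tools. This assumes the input is one `NODE SCORE` pair per line (for example extracted from `/api/v1/density/query` with `jq`) and that ranking is by score alone, as the dialog description suggests; `top_nodes` is a hypothetical helper.

```shell
# Sketch: rank "NODE SCORE" lines by score, highest first, and keep the top N.
# Assumption: ranking uses the score only; ties come out in no guaranteed order.

# top_nodes [N]  (hypothetical helper) -- reads "NODE SCORE" lines on stdin
top_nodes() {
  local n="${1:-3}"
  # LC_ALL=C keeps "." as the decimal separator for the numeric sort
  LC_ALL=C sort -k2,2nr | head -n "$n"
}
```

The first output line is presumably the node the dialog marks as "recommended".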
Topology page
K8s node cards on the topology canvas show RAM, CPU cores, GPU info, backend, and model fit count from the density cache.
SkillPack sidecar (agent-facing)
The llmfit SkillPack (v0.2.0) gives agents four skills:
llmfit-cluster-placement
Probe-based cluster placement using llmfit-cluster-fit.sh:
```shell
llmfit-cluster-fit.sh --model "Qwen/Qwen2.5-Coder-14B-Instruct" --use-case coding --min-fit good --limit 10
```
llmfit-rest-api-usage
Query the node-local llmfit REST API (port 8787) when the DaemonSet pods are available.
llmfit-mcp-tools
Structured MCP tools (llmfit v0.9.24+), available via `llmfit serve --mcp`:
| Tool | Purpose |
|---|---|
| `get_system_specs` | Node hardware (RAM, GPU, CPU) |
| `recommend_models` | Ranked models with filters |
| `search_models` | Free-text model search |
| `plan_hardware` | Memory, quantization, and TPS estimates |
| `get_runtimes` | Installed inference runtimes |
| `get_installed_models` | Downloaded models |
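To exercise these tools without an agent, you can write MCP JSON-RPC messages to `llmfit serve --mcp` on stdin. The `initialize`, `notifications/initialized`, and `tools/call` sequence follows the MCP specification (protocol version `2024-11-05`). Calling a tool with empty arguments is an assumption, since the tools' input schemas are not listed here, and `mcp_session` is a hypothetical helper.

```shell
# Sketch: build the stdio messages for one MCP tool call.
# The handshake order follows the MCP spec; empty arguments are an assumption.

# mcp_session TOOL_NAME [ARGUMENTS_JSON]  (hypothetical helper)
# Prints newline-delimited JSON-RPC 2.0 messages for a stdio MCP server.
mcp_session() {
  local tool="$1"
  local args="${2:-}"
  [ -n "$args" ] || args='{}'
  # 1. Handshake request announcing the protocol version and the client.
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"shell-sketch","version":"0.1"}}}'
  # 2. Notification (no id) that the client has finished initializing.
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  # 3. The tool invocation itself.
  printf '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' "$tool" "$args"
}
```

`mcp_session get_system_specs | llmfit serve --mcp` should print the JSON-RPC responses on stdout. A strict client waits for the `initialize` response before sending anything else, so treat this as a quick inspection aid rather than a real MCP client.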
llmfit-fitness-cache
Query the density cache API from agent workflows:
```shell
curl -s http://sympozium-apiserver:8080/api/v1/density/nodes | jq .
curl -s "http://sympozium-apiserver:8080/api/v1/density/query?model=Qwen2.5" | jq .
curl -s http://sympozium-apiserver:8080/api/v1/catalog | jq .
```
Architecture
```text
llmfit DaemonSet (per node)   SkillPack sidecar (per agent)
┌──────────────────────┐      ┌──────────────────────┐
│ llmfit serve         │      │ llmfit serve --mcp   │
│ REST API :8787       │      │ 6 MCP tools (stdio)  │
│ /api/v1/system       │      │ + probe scripts      │
│ /api/v1/models       │      └──────────┬───────────┘
└──────────┬───────────┘                 │
           │ polled every 60s            │ agent tool calls
           ▼                             ▼
┌──────────────────────┐      ┌──────────────────────┐
│ FitnessCache         │      │ Agent pod            │
│ (controller +        │      │ (ad-hoc queries)     │
│  apiserver)          │      └──────────────────────┘
└──────────┬───────────┘
           │
 ┌─────────┼───────────┐
 ▼         ▼           ▼
Instant Fitness   Prometheus
placement API       metrics
```
Persona integration
The platform-team ensemble enables llmfit for the sre-watchdog agent. Its heartbeat task queries the density API and includes a ## Density section reporting per-node scores, stale nodes, and degradation alerts.
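A heartbeat section of that shape could be assembled from per-node data as below. This is a sketch under stated assumptions: the input format (`NODE SCORE PREVIOUS_SCORE STALE`), the `density_section` helper, and reusing the 0.3 live-eviction default as the degradation threshold are all hypothetical, and the real sre-watchdog output may differ.

```shell
# Sketch: render a "## Density" heartbeat section.
# Assumed input, one line per node: NODE SCORE PREVIOUS_SCORE STALE(0|1)

# density_section [THRESHOLD]  (hypothetical helper) -- reads stdin
density_section() {
  local threshold="${1:-0.3}"
  awk -v t="$threshold" '
    BEGIN { print "## Density" }
    NF >= 4 {
      flag = ""
      if ($4 == 1) { flag = " (stale)"; stale++ }
      # Relative drop versus the previous score counts as a degradation alert.
      else if ($3 > 0 && ($3 - $2) / $3 >= t) { flag = " (degraded)"; degraded++ }
      printf "- %s: score %s%s\n", $1, $2, flag
    }
    END { printf "Stale nodes: %d, degraded nodes: %d\n", stale, degraded }
  '
}
```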