LLMFit — Hardware Fitness & Model Placement¶

Sympozium integrates llmfit (v0.9.24) in two ways:

DaemonSet: runs on every node and continuously reports hardware specs and model density scores. It powers instant model placement, the Model Density UI, and Prometheus metrics, and is deployed by default.
SkillPack sidecar: gives agents interactive access to llmfit's MCP tools and cluster probe scripts for ad-hoc queries.

DaemonSet (always-on density telemetry)¶

What it does¶

The sympozium-llmfit-daemon DaemonSet runs llmfit serve on every node, exposing a REST API on port 8787. The controller and API server poll each pod every 60 seconds to build a cluster-wide FitnessCache containing:

Per-node hardware specs (RAM, CPU, GPU, VRAM, backend)
Model density scores (which models fit on which nodes, at what quality)
Installed runtimes (Ollama, vLLM, llama.cpp, etc.)
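
The cache contents listed above can be pictured as a per-node record plus a freshness check. The following is a minimal sketch only: the class names, field names, and the 180-second staleness window are assumptions made for illustration and are not taken from Sympozium's or llmfit's actual schema.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeFitness:
    """One node's latest report (illustrative fields, not llmfit's schema)."""
    node: str
    ram_total_gb: float
    gpu_count: int
    backend: str
    model_scores: Dict[str, float] = field(default_factory=dict)  # model -> score
    runtimes: List[str] = field(default_factory=list)
    updated_at: float = 0.0  # Unix time of the last successful poll


class FitnessCache:
    """In-memory map of node name -> latest report."""

    def __init__(self, max_age_s: float = 180.0):  # assumed staleness window
        self.max_age_s = max_age_s
        self._nodes: Dict[str, NodeFitness] = {}

    def update(self, report: NodeFitness, now: Optional[float] = None) -> None:
        # Stamp the report with the poll time and overwrite the previous entry.
        report.updated_at = time.time() if now is None else now
        self._nodes[report.node] = report

    def fresh(self, now: Optional[float] = None) -> List[NodeFitness]:
        # Return only reports that are no older than the staleness window.
        now = time.time() if now is None else now
        return [n for n in self._nodes.values()
                if now - n.updated_at <= self.max_age_s]
```

Passing `now` explicitly keeps the sketch deterministic in tests; a real poller would rely on the wall clock.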

Instant model placement¶

When a Model CR has placement.mode: auto, the controller checks the FitnessCache first. If fresh data exists, placement completes in milliseconds. If the cache is empty, because the DaemonSet is not deployed or is still warming up, the controller falls back to the original probe-pod approach, which takes about 3 minutes.
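
The cache-first decision can be sketched as follows. This is an illustration under stated assumptions, not the controller's code: `place_model` and its input shape are hypothetical, picking the highest-scoring node is assumed, and the behaviour when fresh data exists but no node fits the model is a guess.

```python
from typing import Dict, Optional, Tuple


def place_model(model: str,
                fresh_scores: Dict[str, Dict[str, float]]) -> Tuple[str, Optional[str]]:
    """Return (path, node).

    fresh_scores maps node name -> {model name -> score} for non-stale nodes.
    """
    if not fresh_scores:
        # Cache empty: DaemonSet not deployed or still warming up.
        return ("probe-pod", None)
    candidates = {node: scores[model]
                  for node, scores in fresh_scores.items()
                  if model in scores}
    if not candidates:
        # Assumption: fresh data exists but the model fits nowhere.
        return ("cache", None)
    # Assumption: the node with the highest score wins.
    return ("cache", max(candidates, key=candidates.get))
```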

Helm configuration¶

llmfit:
  daemonset:
    enabled: true           # Deployed by default with Sympozium
    eventInterval: 60       # Seconds between fitness publications
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi
    tolerations:
      - operator: Exists    # Run on all nodes including GPU-tainted
    nodeSelector: {}
  liveEviction:
    enabled: false          # Re-place models when fitness degrades (env: LLMFIT_LIVE_EVICTION=true)
    checkInterval: 30s
    degradeThreshold: 0.3   # 30% score drop triggers re-placement
  webhook:
    preflightValidation: false  # Reject Model CRs that won't fit (env: LLMFIT_PREFLIGHT_VALIDATION=true)
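
One plausible reading of degradeThreshold is a relative drop measured against the score a node had when the model was placed. The sketch below assumes that reading; the function name, the baseline, and the inclusive comparison are all assumptions rather than documented behaviour.

```python
def should_replace(score_at_placement: float,
                   current_score: float,
                   degrade_threshold: float = 0.3) -> bool:
    """True when the score has dropped by at least degrade_threshold (relative)."""
    if score_at_placement <= 0:
        return False  # no meaningful baseline to compare against
    drop = (score_at_placement - current_score) / score_at_placement
    return drop >= degrade_threshold
```

Under this reading, a score falling from 80 to 50 is a 37.5% drop and would trigger re-placement, while a fall from 80 to 70 (12.5%) would not.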

Security¶

Read-only root filesystem
Host path mounts (read-only): /proc, /sys, /dev, and /run/udev, mounted under /host/*
SYS_PTRACE capability for /proc access
Minimal RBAC: nodes: [get]

Prometheus metrics¶

The controller exposes fitness metrics on its /metrics endpoint:

Metric	Type	Labels	Description
`sympozium_density_node_score`	Gauge	`node`	Highest model fitness score for a node
`sympozium_density_node_stale`	Gauge	`node`	1 if the node has stopped reporting
`sympozium_density_node_ram_total_gb`	Gauge	`node`	Total RAM (GB)
`sympozium_density_node_ram_available_gb`	Gauge	`node`	Available RAM (GB)
`sympozium_density_node_gpu_vram_gb`	Gauge	`node`	GPU VRAM (GB)
`sympozium_density_node_gpu_count`	Gauge	`node`	Number of GPUs
`sympozium_density_node_model_count`	Gauge	`node`	Number of models that fit on the node
`sympozium_density_cluster_nodes_total`	Gauge	—	Nodes reporting fitness
`sympozium_density_cluster_nodes_stale`	Gauge	—	Nodes with stale data
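
As an illustration of consuming these metrics, the sketch below scans Prometheus text-format output for nodes whose sympozium_density_node_stale gauge is 1. The sample text is made up for the example, and the pattern assumes `node` is the only label on the series, as listed in the table; a real scrape may carry additional labels.

```python
import re
from typing import List

# Made-up sample in Prometheus text exposition format (not real output).
SAMPLE = '''\
# TYPE sympozium_density_node_stale gauge
sympozium_density_node_stale{node="gpu-a"} 0
sympozium_density_node_stale{node="cpu-b"} 1
sympozium_density_cluster_nodes_stale 1
'''

# Assumes `node` is the only label on the series.
_STALE = re.compile(r'^sympozium_density_node_stale\{node="([^"]+)"\}\s+(\S+)$')


def stale_nodes(metrics_text: str) -> List[str]:
    """Return node names whose stale gauge equals 1."""
    names = []
    for line in metrics_text.splitlines():
        match = _STALE.match(line.strip())
        if match and float(match.group(2)) == 1.0:
            names.append(match.group(1))
    return names
```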

Density API endpoints¶

The API server exposes density data for the web UI and agent queries:

Method	Path	Description
`GET`	`/api/v1/density/nodes`	All nodes with hardware specs and model fit counts
`GET`	`/api/v1/density/nodes/{name}`	Single node detail with full model fit list
`GET`	`/api/v1/density/runtimes`	Installed inference runtimes per node
`GET`	`/api/v1/density/installed-models`	Models downloaded in local runtimes per node
`GET`	`/api/v1/density/query?model={q}`	Ranked nodes for a model search query
`GET`	`/api/v1/catalog`	Alphabetized catalog of all models the cluster can run
`POST`	`/api/v1/density/simulate`	Simulate deploying a model and show the per-node capacity impact
`GET`	`/api/v1/density/cost`	Per-model and per-namespace resource attribution
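
When calling these endpoints from code, node names and model queries should be URL-encoded, since model identifiers often contain a slash. A small sketch of URL builders for two of the GET endpoints follows; the helper names are hypothetical, and the base URL is the in-cluster service address used in the curl examples in this document.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# In-cluster address taken from this document's curl examples.
BASE = "http://sympozium-apiserver:8080"


def nodes_url(name: Optional[str] = None) -> str:
    """URL for all nodes, or for a single node when a name is given."""
    path = "/api/v1/density/nodes"
    if name is not None:
        path += "/" + quote(name, safe="")  # encode every reserved character
    return BASE + path


def query_url(model: str) -> str:
    """URL for the ranked-nodes query; the model string is percent-encoded."""
    return BASE + "/api/v1/density/query?" + urlencode({"model": model})
```

The resulting strings can be passed to any HTTP client, for example `urllib.request.urlopen`.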

Web UI¶

Model Density page¶

Navigate to Infrastructure > Model Density in the sidebar. The page has three tabs:

Nodes: one card per node showing CPU, RAM, GPU, backend, model fit count, and a stale indicator
Model Catalog: alphabetized table of all models that fit on the cluster, with scores and fit levels
Query: live search for specific models, with per-node scores, TPS estimates, and memory requirements

Model deploy dialog¶

When deploying a model with auto placement, the dialog shows a density preview with the top 3 nodes ranked by score, color-coded fit levels, and a "recommended" badge.

Topology page¶

K8s node cards on the topology canvas show RAM, CPU cores, GPU info, backend, and model fit count from the density cache.

SkillPack sidecar (agent-facing)¶

The llmfit SkillPack (v0.2.0) gives agents four skills:

`llmfit-cluster-placement`¶

Probe-based cluster placement using llmfit-cluster-fit.sh:

llmfit-cluster-fit.sh --model "Qwen/Qwen2.5-Coder-14B-Instruct" --use-case coding --min-fit good --limit 10

`llmfit-rest-api-usage`¶

Query node-local llmfit REST endpoints when daemons are available.

`llmfit-mcp-tools`¶

Structured MCP tools (v0.9.24+) available via llmfit serve --mcp:

Tool	Purpose
`get_system_specs`	Node hardware (RAM, GPU, CPU)
`recommend_models`	Ranked models with filters
`search_models`	Free-text model search
`plan_hardware`	Memory/quant/TPS estimates
`get_runtimes`	Installed inference runtimes
`get_installed_models`	Downloaded models
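
MCP tools are invoked with JSON-RPC 2.0 `tools/call` requests, sent over stdio as one JSON message per line. The sketch below only builds such a message; the helper name is hypothetical, the argument schema of each llmfit tool is not documented here, and calling get_system_specs with empty arguments is an assumption. A real client would also need to complete the MCP `initialize` handshake first.

```python
import json


def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as a single line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)  # default output contains no newlines


# Assumption: get_system_specs takes no arguments.
request_line = tool_call(1, "get_system_specs", {})
```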

`llmfit-fitness-cache`¶

Query the density cache API from agent workflows:

curl -s http://sympozium-apiserver:8080/api/v1/density/nodes | jq .
curl -s "http://sympozium-apiserver:8080/api/v1/density/query?model=Qwen2.5" | jq .
curl -s http://sympozium-apiserver:8080/api/v1/catalog | jq .

Architecture¶

llmfit DaemonSet (per node)          SkillPack sidecar (per agent)
┌──────────────────────┐             ┌──────────────────────┐
│ llmfit serve         │             │ llmfit serve --mcp   │
│ REST API :8787       │             │ 6 MCP tools (stdio)  │
│ /api/v1/system       │             │ + probe scripts      │
│ /api/v1/models       │             └──────────────────────┘
└──────────┬───────────┘                     │
           │ polled every 60s                │ agent tool calls
           ▼                                 ▼
┌──────────────────────┐             ┌──────────────────────┐
│ FitnessCache         │             │ Agent pod            │
│ (controller +        │             │ (ad-hoc queries)     │
│  apiserver)          │             └──────────────────────┘
└──────────┬───────────┘
           │
     ┌─────┼──────────┐
     ▼     ▼          ▼
  Instant   Fitness   Prometheus
  placement API       metrics

Persona integration¶

The platform-team ensemble enables llmfit for the sre-watchdog agent. Its heartbeat task queries the density API and includes a ## Density section reporting per-node scores, stale nodes, and degradation alerts.