Sympozium Design Document

Status: Draft
Date: 2026-02-23
Authors: Architecture Review

1. Executive Summary

Sympozium is a Kubernetes-native reimagining of OpenClaw that decomposes the monolithic gateway into a multi-tenant, horizontally scalable system where every sub-agent runs as an ephemeral Kubernetes pod and feature access is gated by Kubernetes-native policy (admission controllers, RBAC, and Custom Resource Definitions).

The design draws on two prior-art systems:

OpenClaw: the full-featured production system — rich plugin/channel/tool ecosystem, deep agent orchestration (sub-agent registry, sandbox containers, lane-based command queues, hook lifecycle), but tightly coupled as an in-process monolith with file-based state.
NanoClaw: a minimal alternative that already runs agents as ephemeral containers, communicates via filesystem IPC, enforces isolation through mount boundaries, and uses an external mount allowlist for security policy. Its architecture validates the "one container per agent invocation" model.

Sympozium takes the best of both:

Concern	OpenClaw today	NanoClaw today	Sympozium target
Agent execution	In-process (shared memory)	Ephemeral container per invocation	Ephemeral K8s Pod per invocation
Sub-agent orchestration	In-process registry + lane queue	N/A (flat)	CRD-based registry with controller reconciliation
Sandbox isolation	Docker container (long-lived sidecar)	Container = sandbox (read-only rootfs, cap-drop)	Pod SecurityContext + PodSecurity admission
IPC	In-process EventEmitter	Filesystem polling (JSON files)	gRPC sidecar + shared ephemeral volume
Tool/feature gating	In-process tool-policy pipeline (7 layers)	Mount allowlist (external file)	Admission webhooks + CRD-based policy
State	Files on disk (~/.openclaw/)	SQLite + files	etcd (CRDs) + object storage + PostgreSQL
Multi-instance	Single-instance (file lock)	Single-instance	Horizontally scalable (stateless control plane)
Channel connections	In-process per channel	WhatsApp only, in-process	Channel pods (one Deployment per channel type)

2. Architecture Overview

                    ┌─────────────────────────────────────────────┐
                    │              Kubernetes Cluster              │
                    └─────────────────────────────────────────────┘

  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
  │   Ingress    │   │ Admission    │   │   Policy     │   │  Cert-Mgr /  │
  │  Controller  │   │  Webhooks    │   │   Engine     │   │   Secrets    │
  │  (TLS, WS)   │   │ (Gatekeeper) │   │ (OPA/Kyverno)│   │              │
  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘   └──────────────┘
         │                  │                  │
         ▼                  ▼                  ▼
  ┌─────────────────────────────────────────────────────────────────────────┐
  │                     Sympozium Control Plane (Deployment, HPA)            │
  │                                                                        │
  │   ┌──────────────┐  ┌────────────────┐  ┌──────────────────────────┐  │
  │   │  API Server   │  │  Agent         │  │  Session Manager         │  │
  │   │  (HTTP + WS)  │  │  Orchestrator  │  │  (session CRUD, history) │  │
  │   │              │  │  (spawn, wait)  │  │                          │  │
  │   └──────────────┘  └───────┬────────┘  └──────────────────────────┘  │
  │                             │                                          │
  │   ┌──────────────────────┐  │  ┌───────────────────────────────────┐  │
  │   │  PersonaPack         │  │  │  Reconcilers: Instance, Policy,  │  │
  │   │  Controller          │  │  │  Schedule, SkillPack, AgentRun   │  │
  │   │  (stamp out agents)  │  │  │                                   │  │
  │   └──────────────────────┘  │  └───────────────────────────────────┘  │
  │                             │                                          │
  │   ┌─────────────────────────┼──────────────────────────────────────┐  │
  │   │  Event Bus (NATS / Redis Streams)                              │  │
  │   └─────────────────────────┼──────────────────────────────────────┘  │
  └─────────────────────────────┼──────────────────────────────────────────┘
                                │
              ┌─────────────────┼─────────────────┐
              ▼                 ▼                  ▼
  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
  │  Agent Pod       │ │  Agent Pod       │ │  Agent Pod       │
  │  (ephemeral Job) │ │  (ephemeral Job) │ │  (ephemeral Job) │
  │                  │ │                  │ │                  │
  │  ┌────────────┐  │ │  ┌────────────┐  │ │  ┌────────────┐  │
  │  │ Agent      │  │ │  │ Agent      │  │ │  │ Agent      │  │
  │  │ Container  │  │ │  │ Container  │  │ │  │ Container  │  │
  │  └─────┬──────┘  │ │  └────────────┘  │ │  └────────────┘  │
  │        │         │ │                  │ │                  │
  │  ┌─────▼──────┐  │ │                  │ │  ┌────────────┐  │
  │  │ Sandbox    │  │ │                  │ │  │ Browser    │  │
  │  │ Sidecar    │  │ │                  │ │  │ Sidecar    │  │
  │  └────────────┘  │ │                  │ │  └────────────┘  │
  └──────────────────┘ └──────────────────┘ └──────────────────┘

  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
  │ Channel Pod  │ │ Channel Pod  │ │ Channel Pod  │ │ Channel Pod  │
  │ (Telegram)   │ │ (WhatsApp)   │ │ (Discord)    │ │ (Slack)      │
  │ Deployment   │ │ StatefulSet  │ │ Deployment   │ │ Deployment   │
  └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘

  ┌──────────────────────────────────────────────────────────────────┐
  │                        Data Layer                                │
  │  ┌──────────┐  ┌──────────────┐  ┌──────────┐  ┌─────────────┐ │
  │  │ PostgreSQL│  │ Redis/Valkey │  │  MinIO   │  │  etcd       │ │
  │  │ (sessions,│  │ (pub/sub,    │  │  (S3)    │  │  (CRDs,     │ │
  │  │  memory,  │  │  queues,     │  │ transcr. │  │   state)    │ │
  │  │  config)  │  │  locks)      │  │ skills   │  │             │ │
  │  └──────────┘  └──────────────┘  └──────────┘  └─────────────┘ │
  └──────────────────────────────────────────────────────────────────┘

3. Custom Resource Definitions

3.1 `SympoziumInstance`: per-user/per-tenant gateway

Replaces the monolithic gateway. Each user or tenant gets a SympoziumInstance that declares their desired channels, agents, and policy bindings.

apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
metadata:
  name: alice
  namespace: sympozium
spec:
  # Which channels this instance connects to
  channels:
    - type: telegram
      configRef:
        secret: alice-telegram-creds
    - type: whatsapp
      configRef:
        secret: alice-whatsapp-creds
    - type: discord
      configRef:
        secret: alice-discord-creds

  # Agent configuration
  agents:
    default:
      model: claude-opus-4-20250514
      thinking: high
      sandbox:
        enabled: true
        image: ghcr.io/openclaw/sandbox:latest
        resources:
          requests: { cpu: 250m, memory: 512Mi }
          limits: { cpu: "1", memory: 1Gi }
      subagents:
        maxDepth: 2
        maxConcurrent: 5
        maxChildrenPerAgent: 3

  # Skills to mount (from SkillPack CRDs or ConfigMaps)
  skills:
    - skillPackRef: coding-skills
    - skillPackRef: research-skills
    - configMapRef: alice-custom-skills

  # Policy binding (which SympoziumPolicy applies)
  policyRef: standard-user-policy

  # Auth for AI providers
  authRefs:
    - secret: alice-anthropic-key
    - secret: alice-openai-key

status:
  phase: Running
  channels:
    - type: telegram
      status: Connected
      lastHealthCheck: "2026-02-23T10:00:00Z"
    - type: whatsapp
      status: Connected
  activeAgentPods: 2
  totalAgentRuns: 1547

3.2 `AgentRun`: ephemeral agent execution

Each agent invocation (including sub-agents) produces an AgentRun CR. The Agent Orchestrator controller watches these and reconciles them into K8s Jobs.

This replaces OpenClaw's in-memory SubagentRunRecord and maps directly to NanoClaw's ephemeral container model — but with K8s lifecycle management instead of docker run --rm.

apiVersion: sympozium.ai/v1alpha1
kind: AgentRun
metadata:
  name: run-abc123
  namespace: sympozium
  labels:
    sympozium.ai/instance: alice
    sympozium.ai/agent-id: default
    sympozium.ai/parent-run: run-parent-456   # populated for sub-agents
  annotations:
    # Label values cannot contain ":", so the session key is carried as an annotation
    sympozium.ai/session-key: "agent:default:subagent:xyz"
  ownerReferences:
    - apiVersion: sympozium.ai/v1alpha1
      kind: SympoziumInstance
      name: alice
      uid: <uid-of-alice-instance>   # required by the API; copied from the owner's metadata.uid
spec:
  instanceRef: alice
  agentId: default
  sessionKey: "agent:default:subagent:xyz"

  # Parent linkage (for sub-agents)
  parent:
    runName: run-parent-456
    sessionKey: "agent:default:main"
    spawnDepth: 1

  task: "Research the latest Kubernetes security best practices"
  systemPrompt: |
    You are a research sub-agent...

  model:
    provider: anthropic
    model: claude-opus-4-20250514
    thinking: high
    authSecretRef: alice-anthropic-key

  sandbox:
    enabled: true
    image: ghcr.io/openclaw/sandbox:latest
    securityContext:
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      capabilities: { drop: [ALL] }
      seccompProfile: { type: RuntimeDefault }
    resources:
      requests: { cpu: 250m, memory: 512Mi }
      limits: { cpu: "1", memory: 1Gi }

  # Skills mounted into the agent pod
  skills:
    - skillPackRef: research-skills
    - configMapRef: alice-custom-skills

  # Tools this agent is allowed to use (from its resolved policy)
  toolPolicy:
    allow: [exec, read, write, edit, apply_patch, image, subagents]
    deny: [browser, canvas, cron, gateway]

  timeout: 300s
  cleanup: delete   # or "keep" for debugging

status:
  phase: Running    # Pending → Running → Succeeded / Failed
  podName: run-abc123-pod
  startedAt: "2026-02-23T10:05:00Z"
  completedAt: null
  result: null      # populated on completion with the agent's final reply
  exitCode: null
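
The phase comment in the status block implies a small state machine. A minimal sketch of it follows, assuming that terminal phases never change and that a run may fail directly from Pending (for example if its Job cannot be created); both assumptions go beyond what the spec above states.

```python
from enum import Enum


class Phase(str, Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


# Transitions read off the status comment: Pending -> Running -> Succeeded / Failed.
# Pending -> Failed is an assumption, not something the document specifies.
ALLOWED = {
    Phase.PENDING: {Phase.RUNNING, Phase.FAILED},
    Phase.RUNNING: {Phase.SUCCEEDED, Phase.FAILED},
    Phase.SUCCEEDED: set(),
    Phase.FAILED: set(),
}


def advance(current: Phase, target: Phase) -> Phase:
    """Return the new phase, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal phase transition {current.value} -> {target.value}")
    return target
```

A controller could call `advance` before every status update so that a stale event cannot move a finished run back to Running.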

3.3 `SympoziumPolicy`: feature and tool gating

Replaces OpenClaw's 7-layer in-process tool-policy pipeline with a declarative, auditable K8s resource. Enforced by admission webhooks at pod creation time.

Draws from NanoClaw's external mount-allowlist concept (policy stored outside the agent's reach) but extends it to cover all capabilities.

apiVersion: sympozium.ai/v1alpha1
kind: SympoziumPolicy
metadata:
  name: standard-user-policy
  namespace: sympozium
spec:
  # Tool-level gating
  tools:
    defaultAction: deny
    rules:
      - tools: [exec, read, write, edit, apply_patch]
        action: allow
        conditions:
          sandboxRequired: true    # these tools only allowed inside sandbox
      - tools: [image, sessions_status, subagents]
        action: allow
      - tools: [browser]
        action: allow
        conditions:
          featureGate: browser-automation    # requires FeatureGate to be enabled
          sandboxRequired: true
          sidecar: browser                   # requires browser sidecar in pod
      - tools: [cron]
        action: deny
      - tools: ["group:plugins"]
        action: allow
        conditions:
          featureGate: plugins

  # Exec-level gating (NanoClaw-style mount security + OpenClaw approval flows)
  exec:
    securityLevel: allowlist     # deny | allowlist | full
    approvalMode: on-miss        # off | on-miss | always
    approvalChannel: main        # where approval requests go
    safeBins:
      - git
      - ls
      - cat
      - grep
      - find
      - python3
      - node
      - npm
      - pnpm
    blockedBins:
      - curl      # no network exfil from sandbox
      - wget
      - nc
      - ssh

  # Sub-agent gating
  subagents:
    allowed: true
    maxDepth: 2
    maxConcurrent: 5
    maxChildrenPerAgent: 3
    allowCrossAgent: false    # sub-agents can only spawn same agentId
    requireSandbox: true

  # Sandbox enforcement
  sandbox:
    required: true                        # all agent runs must be sandboxed
    network: none                         # none | restricted | unrestricted
    readOnlyRootFilesystem: true
    capDrop: [ALL]
    seccompProfile: RuntimeDefault
    maxMemory: 1Gi
    maxCPU: "1"
    pidsLimit: 256

  # Mount policy (inspired by NanoClaw's mount-allowlist.json)
  mounts:
    workspaceAccess: rw
    blockedPatterns:
      - .ssh
      - .gnupg
      - .aws
      - .azure
      - .kube
      - .docker
      - credentials
      - .env
      - .netrc
      - id_rsa
      - id_ed25519
      - private_key
    additionalMounts:
      allowlistRef:
        configMap: alice-mount-allowlist
      nonMainReadOnly: true

  # Feature gates — features are off unless explicitly enabled
  featureGates:
    browser-automation: false
    voice-call: false
    canvas: false
    plugins: true
    agent-swarms: true
    memory-search: true
    cron-scheduler: false
    network-access: false      # sandbox network policy
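
To make the tool rules concrete, here is a rough sketch of how a rule list like the one above could be evaluated. It assumes first-match-wins ordering and does not expand `group:` entries; the `RunContext` type and the `tool_allowed` function are hypothetical names, not part of Sympozium.

```python
from dataclasses import dataclass, field


@dataclass
class RunContext:
    """Hypothetical description of the pod a tool call would run in."""
    sandboxed: bool
    sidecars: set = field(default_factory=set)


# A trimmed copy of the policy above, as a plain dict.
POLICY = {
    "tools": {
        "defaultAction": "deny",
        "rules": [
            {"tools": ["exec", "read", "write"], "action": "allow",
             "conditions": {"sandboxRequired": True}},
            {"tools": ["browser"], "action": "allow",
             "conditions": {"featureGate": "browser-automation",
                            "sandboxRequired": True, "sidecar": "browser"}},
            {"tools": ["cron"], "action": "deny"},
        ],
    },
    "featureGates": {"browser-automation": False},
}


def tool_allowed(policy: dict, tool: str, ctx: RunContext) -> bool:
    """Apply the first rule that names the tool; fall back to defaultAction."""
    gates = policy.get("featureGates", {})
    for rule in policy["tools"]["rules"]:
        if tool not in rule["tools"]:
            continue
        if rule["action"] == "deny":
            return False
        cond = rule.get("conditions", {})
        if cond.get("sandboxRequired") and not ctx.sandboxed:
            return False
        gate = cond.get("featureGate")
        if gate and not gates.get(gate, False):
            return False
        sidecar = cond.get("sidecar")
        if sidecar and sidecar not in ctx.sidecars:
            return False
        return True
    return policy["tools"]["defaultAction"] == "allow"
```

With this policy, `exec` is allowed only inside a sandbox, and `browser` stays blocked until the `browser-automation` gate is switched on, even if the browser sidecar is present.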

3.4 `SkillPack`: portable skill bundles

Skills are Markdown instruction bundles packaged as a SkillPack custom resource. The SkillPack controller reconciles each SkillPack into a ConfigMap that is projected into agent pods at /skills.

Sidecar architecture: When a SkillPack requires runtime tools (e.g. kubectl, helm), it declares a sidecar spec. The AgentRun controller dynamically injects the sidecar container into the agent pod and creates scoped RBAC resources (Role/RoleBinding for namespace-scoped access, ClusterRole/ClusterRoleBinding for cluster-wide access). The controller itself is bound to cluster-admin so it can create arbitrary RBAC rules declared by SkillPacks without hitting Kubernetes RBAC escalation prevention. RBAC resources are garbage-collected when the AgentRun completes or is deleted.

┌─────────────────────────────────────────────────┐
│  Agent Pod (Job)                                │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │
│  │  agent   │  │ipc-bridge│  │skill-k8s-ops │  │
│  │ (runner) │  │ (sidecar)│  │  (sidecar)   │  │
│  └────┬─────┘  └────┬─────┘  └──────┬───────┘  │
│       │              │               │          │
│   /workspace     /ipc            kubectl +      │
│   /skills        NATS            full RBAC      │
│   /ipc                                          │
│                                                 │
│  ServiceAccount: sympozium-agent                 │
│  + Role: sympozium-skill-k8s-ops-<run>           │
│  + ClusterRole: sympozium-skill-k8s-ops-<run>    │
└─────────────────────────────────────────────────┘

apiVersion: sympozium.ai/v1alpha1
kind: SkillPack
metadata:
  name: coding-skills
  namespace: sympozium
spec:
  skills:
    - name: code-review
      description: "Review code for security and quality"
      requires:
        bins: [git, rg]
      content: |
        # Code Review Skill
        When asked to review code...
    - name: refactoring
      description: "Refactor code to improve quality"
      requires:
        bins: [git]
      content: |
        # Refactoring Skill
        ...
  # Container image requirements (bins this skill pack needs)
  runtimeRequirements:
    image: ghcr.io/openclaw/sandbox-common:latest
  # Optional sidecar container for runtime tools + auto-RBAC
  sidecar:
    image: ghcr.io/sympozium-ai/sympozium/skill-k8s-ops:latest
    mountWorkspace: true
    resources:
      cpu: "100m"
      memory: "128Mi"
    rbac:
      - apiGroups: [""]
        resources: ["pods", "services"]
        verbs: ["get", "list", "watch"]
    clusterRBAC:
      - apiGroups: [""]
        resources: ["nodes", "namespaces"]
        verbs: ["get", "list", "watch"]
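
As an illustration of the per-run RBAC described above, the following sketch turns a `sidecar.rbac` rule list into a namespaced Role and RoleBinding. The naming follows the `sympozium-skill-k8s-ops-<run>` pattern from the diagram; the function name and the label used for cleanup are assumptions.

```python
def skill_rbac(skill: str, run: str, namespace: str, rules: list,
               service_account: str = "sympozium-agent") -> list:
    """Build a Role and RoleBinding (as plain dicts) scoped to one AgentRun."""
    name = f"sympozium-skill-{skill}-{run}"
    meta = {
        "name": name,
        "namespace": namespace,
        # Assumed label, so the objects can be found and deleted with the run.
        "labels": {"sympozium.ai/agent-run": run},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": dict(meta),
        "rules": [dict(r) for r in rules],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": dict(meta),
        "subjects": [{"kind": "ServiceAccount", "name": service_account,
                      "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role", "name": name},
    }
    return [role, binding]
```

The `clusterRBAC` entries would follow the same shape with ClusterRole and ClusterRoleBinding, minus the namespace on the metadata.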

3.5 `PersonaPack`: pre-configured agent bundles

PersonaPacks are the highest-level abstraction in Sympozium. A single PersonaPack resource bundles multiple agent personas (each with a system prompt, skills, tool policy, schedule, and memory seeds) into a one-click installable package. Think of them as Helm Charts for AI agents.

When a PersonaPack is activated (via the TUI wizard or kubectl), the controller stamps out all the underlying resources automatically:

PersonaPack CR (spec.personas[])
  │
  ├─ For each persona:
  │   ├─ Create SympoziumInstance (inherits model, authRefs, policyRef)
  │   ├─ Create SympoziumSchedule (from persona.schedule)
  │   └─ Create ConfigMap (<name>-memory, from persona.memory.seeds)
  │
  ├─ Set ownerReferences on all generated resources
  │   └─ Deleting the PersonaPack cascades to all children
  │
  └─ Update status:
      ├─ status.personaCount = len(spec.personas)
      ├─ status.installedCount = successfully created
      ├─ status.installedPersonas[] = {name, instanceName, scheduleName}
      └─ status.phase = Ready | Pending | Error

Lifecycle phases:

Phase	Meaning
`Pending`	PersonaPack exists but `authRefs` are empty — waiting for activation
`Ready`	All personas successfully stamped out
`Error`	One or more personas failed to reconcile
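
The status fields and phases above can be derived from the spec and the list of personas that were created. A small sketch, assuming that any shortfall between requested and installed personas is reported as Error (the document does not say how an in-progress reconcile is reported):

```python
def pack_status(spec: dict, installed: list) -> dict:
    """Compute PersonaPack status from its spec and the installed personas.

    `installed` holds one dict per persona that was stamped out, in the
    {name, instanceName, scheduleName} shape listed above.
    """
    personas = spec.get("personas", [])
    if not spec.get("authRefs"):
        phase = "Pending"   # waiting for activation
    elif len(installed) == len(personas):
        phase = "Ready"
    else:
        phase = "Error"     # assumption: any missing persona counts as an error
    return {
        "personaCount": len(personas),
        "installedCount": len(installed),
        "installedPersonas": installed,
        "phase": phase,
    }
```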

CRD spec:

apiVersion: sympozium.ai/v1alpha1
kind: PersonaPack
metadata:
  name: platform-team
spec:
  description: "Core platform engineering agents"
  category: platform
  version: "1.0.0"

  # Personas — each becomes a SympoziumInstance + Schedule
  personas:
    - name: security-guardian
      displayName: "Security Guardian"
      systemPrompt: |
        You are a Kubernetes security specialist...
      skills:
        - k8s-ops
      toolPolicy:
        allow: [read_file, list_directory, execute_command, fetch_url]
        deny: [write_file]
      schedule:
        type: sweep
        interval: "30m"
        task: "Scan all namespaces for security policy violations..."
      memory:
        enabled: true
        seeds:
          - "Follow CIS Kubernetes Benchmark v1.8 guidelines"
    - name: sre-watchdog
      # ... additional personas

  # Shared auth — patched by the TUI wizard during activation
  authRefs:
    - secret: platform-team-openai-key

  # Shared policy reference
  policyRef: default-policy

Ownership model: All generated resources (Instances, Schedules, ConfigMaps) carry an ownerReference pointing back to the PersonaPack. This gives Kubernetes-native cascading deletion — removing the PersonaPack removes everything it created. The controller uses controllerutil.SetControllerReference to establish the owner chain.
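
For readers unfamiliar with owner references, this sketch shows the shape of the entry that a controller reference adds to each child object. It is written in Python for illustration only; the real controller would call `controllerutil.SetControllerReference`, and the helper name here is hypothetical.

```python
def controller_owner_ref(owner: dict) -> dict:
    """Return the ownerReferences entry that marks `owner` as the managing controller."""
    return {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": owner["metadata"]["name"],
        "uid": owner["metadata"]["uid"],   # must be the live object's UID
        "controller": True,
        "blockOwnerDeletion": True,
    }


def adopt(child: dict, owner: dict) -> dict:
    """Attach the owner reference to a child manifest (returns the same dict)."""
    child.setdefault("metadata", {}).setdefault("ownerReferences", []).append(
        controller_owner_ref(owner)
    )
    return child
```

Kubernetes garbage collection then removes the child once the owner is deleted, which is the cascading behaviour described above.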

TUI activation flow: The TUI Personas tab lists all PersonaPacks in the cluster. Pressing Enter on a pack launches a wizard that collects provider, API key, and model selection, then creates a Secret and patches the PersonaPack's spec.authRefs. The controller detects the authRef and reconciles all personas into running instances.

Built-in packs: Sympozium ships with two PersonaPacks in config/personas/:

Pack	Personas	Focus
`platform-team`	security-guardian, sre-watchdog, platform-engineer	Security audit, cluster health, scheduled ops
`devops-essentials`	incident-responder, cost-analyzer	Incident triage, resource optimisation

4. Component Deep-Dive

4.1 Control Plane: Agent Orchestrator

The orchestrator is a Kubernetes controller (Deployment, HPA-scalable) that watches AgentRun resources and reconciles them into Jobs/Pods.

Reconciliation loop:

AgentRun created (status.phase = Pending)
  │
  ├─ Validate against SympoziumPolicy (via admission webhook, already passed)
  │
  ├─ Resolve pod spec:
  │   ├─ Base image (sandbox image from SympoziumInstance)
  │   ├─ Sidecar containers (sandbox exec, browser if featureGate enabled)
  │   ├─ Skill sidecars (from SkillPack.spec.sidecar, e.g. kubectl)
  │   ├─ Skill RBAC (Role/ClusterRole + bindings, scoped per-run)
  │   ├─ Volumes (workspace PVC, skills ConfigMaps, session ephemeral vol)
  │   ├─ SecurityContext (from SympoziumPolicy.sandbox)
  │   ├─ NetworkPolicy (from SympoziumPolicy.sandbox.network)
  │   ├─ Resource limits (from SympoziumPolicy.sandbox)
  │   └─ Environment (model auth from Secrets, agent config)
  │
  ├─ Create Job with pod spec
  │   └─ Set ownerReference → AgentRun
  │
  ├─ Update AgentRun status.phase = Running, status.podName = ...
  │
  ├─ Watch pod completion:
  │   ├─ On success: read result from shared volume or gRPC call
  │   │   └─ Update AgentRun status.phase = Succeeded, status.result = ...
  │   ├─ On failure: record error
  │   │   └─ Update AgentRun status.phase = Failed, status.error = ...
  │   └─ On timeout: kill pod
  │       └─ Update AgentRun status.phase = Failed, status.error = "timeout"
  │
  └─ If parent exists: notify parent agent via event bus
      └─ Parent's orchestrator delivers result back to parent session
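
The "resolve pod spec" and "create Job" steps can be sketched as a pure function from an AgentRun and its policy to a Job manifest. This is a simplified illustration: it emits only the agent container, assumes the timeout is written as whole seconds such as "300s", and takes the agent image as a parameter rather than resolving it from the instance.

```python
def build_job(run: dict, policy: dict, agent_image: str) -> dict:
    """Translate an AgentRun into a batch/v1 Job manifest (plain dict)."""
    name = run["metadata"]["name"]
    spec = run["spec"]
    sandbox = policy["sandbox"]
    # Assumption: timeout is always whole seconds with an "s" suffix.
    deadline = int(spec["timeout"].rstrip("s"))
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            "namespace": run["metadata"]["namespace"],
            "labels": {
                "sympozium.ai/agent-run": name,
                "sympozium.ai/instance": spec["instanceRef"],
            },
            # ownerReference -> AgentRun, so the Job is cleaned up with the run.
            "ownerReferences": [{
                "apiVersion": run["apiVersion"],
                "kind": run["kind"],
                "name": name,
                "uid": run["metadata"]["uid"],
                "controller": True,
            }],
        },
        "spec": {
            "activeDeadlineSeconds": deadline,
            "template": {
                "metadata": {"labels": {"sympozium.ai/agent-run": name}},
                "spec": {
                    "restartPolicy": "Never",
                    "securityContext": {
                        "runAsNonRoot": True,
                        "seccompProfile": {"type": sandbox["seccompProfile"]},
                    },
                    "containers": [{
                        "name": "agent",
                        "image": agent_image,
                        "securityContext": {
                            "readOnlyRootFilesystem": sandbox["readOnlyRootFilesystem"],
                            "allowPrivilegeEscalation": False,
                            "capabilities": {"drop": list(sandbox["capDrop"])},
                        },
                        "resources": {"limits": {"cpu": sandbox["maxCPU"],
                                                 "memory": sandbox["maxMemory"]}},
                    }],
                },
            },
        },
    }
```

Keeping this step free of API calls makes it easy to unit-test the mapping from policy fields to pod fields before any cluster is involved.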

Sub-agent spawning inside a pod:

When an agent's tool execution calls subagents.spawn(...), the agent container doesn't directly create a child pod. Instead:

1. The agent writes a SubagentSpawnRequest to its gRPC sidecar (or shared volume sentinel).
2. The sidecar relays this to the control plane via the event bus.
3. The orchestrator creates a new AgentRun CR with spec.parent populated.
4. The child pod runs, completes, writes its result.
5. The orchestrator reads the result from the child AgentRun status and delivers it to the parent agent (via the parent pod's gRPC sidecar or IPC volume).
6. The parent agent's tool call resolves with the sub-agent's output.

This is the Kubernetes-native equivalent of OpenClaw's spawnSubagentDirect() → registerSubagentRun() → waitForSubagentCompletion() flow, and NanoClaw's runContainerAgent() → IPC file polling → response parsing flow.
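
The step where the orchestrator creates the child AgentRun can be sketched as follows. The request fields (`task`, `sessionKey`) are hypothetical, and the sketch assumes a top-level run has depth 0, so its first child gets `spawnDepth: 1` as in the AgentRun example in section 3.2.

```python
def child_run(parent: dict, request: dict, max_depth: int, name: str) -> dict:
    """Build the child AgentRun for a spawn request, enforcing maxDepth."""
    pspec = parent["spec"]
    # A run without spec.parent is a top-level run (depth 0, by assumption).
    depth = pspec.get("parent", {}).get("spawnDepth", 0) + 1
    if depth > max_depth:
        raise ValueError(f"spawn depth {depth} exceeds maxDepth {max_depth}")
    parent_name = parent["metadata"]["name"]
    return {
        "apiVersion": "sympozium.ai/v1alpha1",
        "kind": "AgentRun",
        "metadata": {
            "name": name,
            "namespace": parent["metadata"]["namespace"],
            "labels": {
                "sympozium.ai/instance": pspec["instanceRef"],
                "sympozium.ai/parent-run": parent_name,
            },
        },
        "spec": {
            "instanceRef": pspec["instanceRef"],
            "agentId": pspec["agentId"],
            "sessionKey": request["sessionKey"],
            "task": request["task"],
            "parent": {
                "runName": parent_name,
                "sessionKey": pspec["sessionKey"],
                "spawnDepth": depth,
            },
        },
    }
```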

4.2 Agent Pod Structure

Each agent invocation runs as a K8s Job with this pod template:

apiVersion: batch/v1
kind: Job
metadata:
  name: run-abc123
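  labels:
    sympozium.ai/agent-run: run-abc123
    sympozium.ai/instance: alice
    sympozium.ai/component: agent-run   # matched by the Gatekeeper constraint in 5.2
spec:
  ttlSecondsAfterFinished: 300
  activeDeadlineSeconds: 600
  template:
    metadata:
      labels:
        sympozium.ai/agent-run: run-abc123
        sympozium.ai/component: agent-run   # matched by the NetworkPolicy selectors in 5.3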
    spec:
      restartPolicy: Never
      serviceAccountName: sympozium-agent   # minimal RBAC, no cluster access

      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault

      containers:
        # Main agent container — runs the LLM inference loop
        - name: agent
          image: ghcr.io/openclaw/agent-runner:latest
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: [ALL]
          env:
            - name: AGENT_RUN_ID
              value: run-abc123
            - name: TASK
              valueFrom:
                configMapKeyRef: { name: run-abc123-input, key: task }
            - name: MODEL_PROVIDER
              value: anthropic
          envFrom:
            - secretRef:
                name: alice-anthropic-key
          volumeMounts:
            - name: workspace
              mountPath: /workspace
            - name: skills
              mountPath: /skills
              readOnly: true
            - name: ipc
              mountPath: /ipc
            - name: tmp
              mountPath: /tmp
          resources:
            requests: { cpu: 250m, memory: 512Mi }
            limits: { cpu: "1", memory: 1Gi }

        # Sidecar: IPC bridge to control plane
        - name: ipc-bridge
          image: ghcr.io/openclaw/ipc-bridge:latest
          env:
            - name: AGENT_RUN_ID
              value: run-abc123
            - name: EVENT_BUS_URL
              value: nats://nats.sympozium:4222
          volumeMounts:
            - name: ipc
              mountPath: /ipc
          resources:
            requests: { cpu: 50m, memory: 64Mi }
            limits: { cpu: 100m, memory: 128Mi }

        # Optional sidecar: sandbox exec (if exec tools are enabled)
        - name: sandbox
          image: ghcr.io/openclaw/sandbox:latest
          securityContext:
            readOnlyRootFilesystem: true
            capabilities:
              drop: [ALL]
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: workspace
              mountPath: /workspace
            - name: tmp
              mountPath: /tmp
          resources:
            requests: { cpu: 100m, memory: 256Mi }
            limits: { cpu: 500m, memory: 512Mi }

      volumes:
        - name: workspace
          emptyDir: { sizeLimit: 1Gi }
        - name: skills
          projected:
            sources:
              - configMap: { name: coding-skills }
              - configMap: { name: alice-custom-skills }
        - name: ipc
          emptyDir: { medium: Memory, sizeLimit: 64Mi }
        - name: tmp
          emptyDir: { sizeLimit: 256Mi }

Key design choices:

emptyDir for workspace — ephemeral, scoped to the pod lifetime. For persistent workspaces, use a PVC (ReadWriteOnce per agent, or ReadWriteMany for shared access). This mirrors NanoClaw's per-group directory isolation.
IPC via shared emptyDir — the agent writes spawn requests, tool results, and messages to /ipc; the IPC bridge sidecar watches and relays to the event bus. Same pattern as NanoClaw's filesystem-based IPC, but the bridge replaces the polling loop with filesystem watches + gRPC forwarding.
Sandbox as sidecar — kubectl exec into the sandbox container replaces OpenClaw's docker exec. The agent container calls tools via the IPC bridge, which kubectl execs into the sandbox sidecar. The sandbox has its own SecurityContext, separate from the agent.
No Docker socket — unlike OpenClaw's current sandbox model, there's no need for a Docker socket. The sandbox is a sidecar, and sub-agents are new pods created by the control plane (not by the agent itself).

4.3 IPC Bridge

The IPC bridge sits between the ephemeral agent pod and the durable control plane. It replaces three current mechanisms:

Current (OpenClaw)	Current (NanoClaw)	Sympozium IPC Bridge
In-process EventEmitter	Filesystem polling (`setInterval`)	Sidecar with fswatch + event bus
Gateway RPC (`callGateway`)	JSON file drop in `/ipc/messages/`	gRPC to control plane via NATS
`agent.wait` long-poll	stdout marker parsing	Event bus subscription with CR status watch

IPC protocol (files in shared /ipc volume):

/ipc/
├── input/
│   ├── task.json           # Initial task (written by orchestrator before pod start)
│   └── followup-*.json     # Follow-up messages from parent or user
├── output/
│   ├── result.json         # Final agent result (written on completion)
│   ├── stream-*.json       # Streaming output chunks
│   └── status.json         # Agent status updates (thinking, tool use, etc.)
├── spawn/
│   └── request-*.json      # Sub-agent spawn requests (agent → bridge → orchestrator)
├── tools/
│   ├── exec-request-*.json # Bash exec requests (agent → bridge → sandbox sidecar)
│   └── exec-result-*.json  # Exec results (sandbox → bridge → agent)
├── messages/
│   └── send-*.json         # Outbound messages to channels (agent → bridge → channel pod)
└── schedules/
    └── request-*.json      # Schedule upsert/suspend/resume/delete requests (agent → bridge → schedule router)

The bridge watches these directories with inotify/fswatch and translates file operations into event bus messages. This is the same pattern NanoClaw uses (JSON file drop → poll → process → delete) but with push-based notification instead of polling.
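
One way to picture the translation step is a lookup from a file path under /ipc to an event bus topic. The pairing below is an assumption that lines up the directory layout above with the topic names in section 4.5; paths with no obvious topic (such as schedules/) are left unmapped.

```python
import posixpath
from fnmatch import fnmatchcase
from typing import Optional

# (path pattern relative to /ipc, event bus topic): assumed mapping.
ROUTES = [
    ("output/result.json", "agent.run.completed"),
    ("output/stream-*.json", "agent.stream.chunk"),
    ("spawn/request-*.json", "agent.spawn.request"),
    ("tools/exec-request-*.json", "tool.exec.request"),
    ("messages/send-*.json", "channel.message.send"),
]


def topic_for(path: str, root: str = "/ipc") -> Optional[str]:
    """Return the topic a newly written IPC file should be published on, if any."""
    rel = posixpath.relpath(path, root)
    for pattern, topic in ROUTES:
        if fnmatchcase(rel, pattern):
            return topic
    return None
```

A bridge built this way would call `topic_for` from its file-watch callback and ignore any file that maps to `None`.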

4.4 Channel Pods

Each channel type runs as its own Deployment (or StatefulSet for channels that need persistent local state, like WhatsApp's session auth).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: channel-telegram-alice
  namespace: sympozium
  labels:
    sympozium.ai/instance: alice
    sympozium.ai/channel: telegram
spec:
  replicas: 1
  selector:
    matchLabels:
      sympozium.ai/instance: alice
      sympozium.ai/channel: telegram
  template:
    spec:
      containers:
        - name: telegram
          image: ghcr.io/openclaw/channel-telegram:latest
          env:
            - name: INSTANCE_NAME
              value: alice
            - name: EVENT_BUS_URL
              value: nats://nats.sympozium:4222
          envFrom:
            - secretRef:
                name: alice-telegram-creds

Channel pods:

1. Maintain the connection to the external service (Telegram Bot API, WhatsApp Web, Discord Gateway, etc.).
2. Receive inbound messages and publish them to the event bus (channel.message.received).
3. Subscribe to outbound message events (channel.message.send) and deliver them.
4. Report health status via the event bus (replacing OpenClaw's in-process channel health monitor).

This decomposition means channels scale and fail independently. A WhatsApp reconnection doesn't affect Telegram. A Telegram rate limit doesn't block Discord.

Telegram Setup

Open Telegram and message @BotFather.
Send /newbot, choose a name and username — BotFather replies with an API token.

Create a Kubernetes secret with the token:

kubectl create secret generic my-telegram-creds \
  --namespace sympozium \
  --from-literal=TELEGRAM_BOT_TOKEN=<token-from-botfather>

Reference the secret in your SympoziumInstance:

channels:
  - type: telegram
    configRef:
      secret: my-telegram-creds

The controller creates a channel-telegram Deployment that long-polls the Telegram Bot API. Messages sent to your bot are routed to AgentRuns automatically.

Tip: To find your chat_id, send a message to the bot, then visit https://api.telegram.org/bot<TOKEN>/getUpdates — the chat.id field in the response is what agents use with send_channel_message.
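
If you prefer to pull the chat IDs out of the getUpdates response programmatically, a small helper like the one below works on the JSON body. It only parses text you pass in, so fetching the URL is left to you, and the function name is just illustrative.

```python
import json


def chat_ids(get_updates_body: str) -> list:
    """Extract the distinct chat IDs from a Telegram getUpdates JSON response."""
    payload = json.loads(get_updates_body)
    if not payload.get("ok"):
        raise ValueError("Telegram API returned ok=false")
    ids = []
    for update in payload.get("result", []):
        # Only plain messages are inspected here; other update types are skipped.
        chat = update.get("message", {}).get("chat", {})
        if "id" in chat and chat["id"] not in ids:
            ids.append(chat["id"])
    return ids
```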

4.5 Event Bus

NATS JetStream (or Redis Streams) serves as the nervous system connecting all components:

Event topics:

Topic	Publisher	Subscriber	Payload
`agent.run.requested`	API Server	Orchestrator	AgentRun spec
`agent.run.started`	Orchestrator	API Server, parent agent	Run ID, pod name
`agent.run.completed`	IPC Bridge	Orchestrator, parent agent	Run ID, result
`agent.run.failed`	Orchestrator	API Server, parent agent	Run ID, error
`agent.stream.chunk`	IPC Bridge	API Server (WS fan-out)	Session key, text chunk
`agent.spawn.request`	IPC Bridge (parent)	Orchestrator	Spawn params, parent run
`channel.message.received`	Channel Pod	API Server → Orchestrator	Channel, sender, text
`channel.message.send`	IPC Bridge	Channel Pod	Channel, target, text
`channel.health.update`	Channel Pod	API Server	Channel, status
`tool.exec.request`	Agent container	IPC Bridge → Sandbox	Command, workdir
`tool.exec.result`	Sandbox sidecar	IPC Bridge → Agent	stdout, stderr, exit code
`tool.approval.request`	IPC Bridge	API Server → Channel	Command, context
`tool.approval.response`	Channel Pod	IPC Bridge → Agent	approved/denied
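
If NATS is the chosen bus, subscribers can use subject wildcards over the dotted topic names in the table: `*` matches exactly one token and `>` matches one or more trailing tokens. The sketch below reimplements that matching rule locally to show which topics a given subscription would receive; it is an illustration, not the NATS client.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subject pattern matches a concrete subject."""
    p, s = pattern.split("."), subject.split(".")
    for i, tok in enumerate(p):
        if tok == ">":
            # ">" is only valid as the last token and needs at least one more token.
            return i == len(p) - 1 and len(s) > i
        if i >= len(s) or (tok != "*" and tok != s[i]):
            return False
    return len(p) == len(s)


TOPICS = ["agent.run.requested", "agent.run.started", "agent.run.completed",
          "agent.run.failed", "agent.stream.chunk", "channel.message.send"]

# Topics an API server subscribed to "agent.run.*" would see.
run_events = [t for t in TOPICS if subject_matches("agent.run.*", t)]
```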

5. Admission Control & Policy Enforcement

5.1 Admission Webhook: `sympozium-policy-enforcer`

A validating + mutating admission webhook intercepts all pod creation requests with the sympozium.ai/agent-run label and enforces SympoziumPolicy:

Validation (reject if violated):

Sandbox required — if SympoziumPolicy.sandbox.required = true, reject pods without the sandbox sidecar container.
SecurityContext — ensure readOnlyRootFilesystem, runAsNonRoot, capDrop: ALL, seccompProfile match policy. Reject if the pod spec tries to escalate.
Resource limits — reject pods exceeding SympoziumPolicy.sandbox.maxMemory / maxCPU.
Feature gates — if the pod spec includes a browser sidecar but featureGates.browser-automation = false, reject.
Network — if SympoziumPolicy.sandbox.network = none, ensure a matching NetworkPolicy exists (or inject one).
Mount validation — validate all volumeMounts against SympoziumPolicy.mounts.blockedPatterns. Reject mounts to .ssh, .kube, etc. This is the K8s-native equivalent of NanoClaw's validateAdditionalMounts().
Sub-agent depth — check the sympozium.ai/spawn-depth annotation against SympoziumPolicy.subagents.maxDepth. Reject if exceeded.
Concurrency — count existing AgentRun CRs with status Running for this instance. Reject if maxConcurrent exceeded.
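
Three of the checks above (mount validation, sub-agent depth, concurrency) are easy to express as a pure function over the pod and the policy. The sketch assumes a mount is blocked when any path component contains a blocked pattern, which is one plausible reading of `blockedPatterns`; the function name and message wording are illustrative.

```python
from pathlib import PurePosixPath


def violations(pod: dict, policy: dict, running_runs: int) -> list:
    """Return human-readable reasons to reject the pod; empty list means admit."""
    found = []
    blocked = policy["mounts"]["blockedPatterns"]
    for container in pod["spec"]["containers"]:
        for mount in container.get("volumeMounts", []):
            parts = PurePosixPath(mount["mountPath"]).parts
            # Assumption: substring match against each path component.
            hit = next((p for p in blocked if any(p in part for part in parts)), None)
            if hit:
                found.append(f"mount {mount['mountPath']} matches blocked pattern {hit}")
    annotations = pod["metadata"].get("annotations", {})
    depth = int(annotations.get("sympozium.ai/spawn-depth", "0"))
    sub = policy["subagents"]
    if depth > sub["maxDepth"]:
        found.append(f"spawn depth {depth} exceeds maxDepth {sub['maxDepth']}")
    if running_runs >= sub["maxConcurrent"]:
        found.append(f"{running_runs} runs already active (maxConcurrent {sub['maxConcurrent']})")
    return found
```

A webhook handler would set `allowed: false` and join these strings into the status message whenever the list is non-empty.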

Mutation (inject defaults):

Inject the pod labels that the NetworkPolicy selectors match on (for example sympozium.ai/network-policy) for network isolation.
Add default resource limits from SympoziumPolicy if not specified.
Inject the skills volume from SympoziumInstance.spec.skills.
Add the ipc-bridge sidecar if not present.
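Set ttlSecondsAfterFinished for auto-cleanup. This is a Job spec field, so it applies to the Job that owns the pod rather than to the pod itself.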

5.2 OPA/Gatekeeper Constraints (declarative)¶

For cluster-wide policy enforcement beyond Sympozium's own webhook:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: SympoziumSandboxRequired
metadata:
  name: require-sandbox-all-agents
spec:
  match:
    kinds:
      - apiGroups: ["batch"]
        kinds: ["Job"]
    namespaces: ["sympozium"]
    labelSelector:
      matchLabels:
        sympozium.ai/component: agent-run
  parameters:
    requiredContainers: ["sandbox"]
    requiredSecurityContext:
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      capabilities:
        drop: ["ALL"]

5.3 NetworkPolicy for Agent Pods¶

Agent pods get a NetworkPolicy that implements SympoziumPolicy.sandbox.network:

# network = "none": full isolation (like NanoClaw's containers + OpenClaw's --network none)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-network-deny-all
  namespace: sympozium
spec:
  podSelector:
    matchLabels:
      sympozium.ai/component: agent-run
      sympozium.ai/network-policy: none
  policyTypes: [Ingress, Egress]
  # No ingress rules are listed, so all ingress is denied; egress is limited to the rules below
  egress:
    # Allow only DNS (needed for the IPC bridge sidecar)
    - to:
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
    # Allow event bus (IPC bridge needs this)
    - to:
        - podSelector:
            matchLabels:
              app: nats
      ports:
        - port: 4222

# network = "restricted": allow specific egress only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-network-restricted
  namespace: sympozium
spec:
  podSelector:
    matchLabels:
      sympozium.ai/component: agent-run
      sympozium.ai/network-policy: restricted
  policyTypes: [Ingress, Egress]
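  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - { port: 53, protocol: UDP }    # DNS
    # HTTPS only. An empty namespaceSelector matches in-cluster pods only,
    # so destinations outside the cluster need an ipBlock.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - { port: 443, protocol: TCP }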
    - to:
        - podSelector:
            matchLabels:
              app: nats
      ports:
        - port: 4222
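
NetworkPolicy generation from SympoziumPolicy.sandbox.network could look roughly like the function below, which returns the manifest as a dict ready to be serialized. This is a sketch under assumptions: the function name is hypothetical, and the HTTPS rule uses an ipBlock so that it covers destinations outside the cluster, which is a choice of this sketch.

```python
def build_network_policy(mode: str, namespace: str = "sympozium") -> dict:
    """Return a NetworkPolicy manifest for network mode "none" or "restricted"."""
    if mode not in ("none", "restricted"):
        raise ValueError(f"unsupported network mode: {mode}")

    # Both modes allow DNS and the event bus, which the IPC bridge needs.
    egress = [
        {"to": [{"namespaceSelector": {}}],
         "ports": [{"port": 53, "protocol": "UDP"}]},
        {"to": [{"podSelector": {"matchLabels": {"app": "nats"}}}],
         "ports": [{"port": 4222}]},
    ]
    if mode == "restricted":
        # Assumption: HTTPS to any address, inside or outside the cluster.
        egress.append({"to": [{"ipBlock": {"cidr": "0.0.0.0/0"}}],
                       "ports": [{"port": 443, "protocol": "TCP"}]})

    name = "agent-network-deny-all" if mode == "none" else "agent-network-restricted"
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {
                "sympozium.ai/component": "agent-run",
                "sympozium.ai/network-policy": mode,
            }},
            # Ingress is listed with no rules, so all ingress is denied.
            "policyTypes": ["Ingress", "Egress"],
            "egress": egress,
        },
    }
```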

5.4 Feature Gates¶

Feature gates are the primary mechanism for progressive enablement. They are declared in SympoziumPolicy.featureGates and enforced at multiple levels:

Feature Gate	What it unlocks	Enforcement point
`browser-automation`	Browser sidecar in agent pods	Admission webhook (rejects browser sidecar if false)
`voice-call`	Voice call channel pod	SympoziumInstance controller (skips voice pod creation)
`canvas`	Canvas host sidecar	Admission webhook
`plugins`	Plugin loading in agent pods	Admission webhook (env var injection)
`agent-swarms`	Sub-agent spawning	Admission webhook (rejects spawn depth > 0 if false)
`memory-search`	Memory/vector search sidecar	Admission webhook
`cron-scheduler`	CronJob creation for scheduled agent runs	Orchestrator (refuses to create CronJobs)
`network-access`	NetworkPolicy relaxation	Admission webhook (inject deny-all if false)
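
The webhook-enforced rows of the table can be sketched as a lookup from gate name to the sidecar it unlocks. Gate names come from the table; the sidecar container names and the function name are assumptions for illustration.

```python
# Feature gate -> sidecar container it unlocks.
# Container names are hypothetical.
GATED_SIDECARS = {
    "browser-automation": "browser",
    "canvas": "canvas-host",
    "memory-search": "memory-search",
}


def check_feature_gates(pod: dict, gates: dict) -> list[str]:
    """Return denial reasons for gated features the pod uses without permission."""
    denials: list[str] = []
    names = {c.get("name") for c in pod.get("spec", {}).get("containers", [])}
    for gate, sidecar in GATED_SIDECARS.items():
        if sidecar in names and not gates.get(gate, False):
            denials.append(f"{sidecar} sidecar requires feature gate {gate}")

    # agent-swarms: any spawn depth above zero means this pod is a sub-agent.
    annotations = pod.get("metadata", {}).get("annotations", {})
    depth = int(annotations.get("sympozium.ai/spawn-depth", "0"))
    if depth > 0 and not gates.get("agent-swarms", False):
        denials.append("sub-agent spawning requires feature gate agent-swarms")
    return denials
```

Treating a missing gate as false makes every feature opt-in, which matches the progressive-enablement intent described above.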

How a user enables a feature:

# Via kubectl
kubectl patch sympoziumpolicy standard-user-policy --type merge \
  -p '{"spec":{"featureGates":{"browser-automation": true}}}'

# Via Sympozium CLI (wrapper)
sympozium features enable browser-automation --instance alice

# Via the chat interface (admin channel)
# User: "@claw enable browser automation"
# → Control plane patches SympoziumPolicy
# → Next agent run gets browser sidecar

6. Data Layer Migration¶

6.1 Session Store: File → PostgreSQL¶

OpenClaw's file-based session store (sessions.json per agent + .jsonl transcripts) maps to a set of PostgreSQL tables:

CREATE TABLE sessions (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    instance_name   TEXT NOT NULL,   -- SympoziumInstance name
    agent_id        TEXT NOT NULL,
    session_key     TEXT NOT NULL UNIQUE,
    channel         TEXT,
    thread_id       TEXT,
    spawn_depth     INTEGER DEFAULT 0,
    spawned_by      TEXT REFERENCES sessions(session_key),
    metadata        JSONB DEFAULT '{}',
    created_at      TIMESTAMPTZ DEFAULT now(),
    updated_at      TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE transcript_events (
    id              BIGSERIAL PRIMARY KEY,
    session_key     TEXT NOT NULL REFERENCES sessions(session_key),
    role            TEXT NOT NULL,  -- user, assistant, tool_use, tool_result
    content         JSONB NOT NULL,
    created_at      TIMESTAMPTZ DEFAULT now()
);

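-- Memory / vector search (pgvector replaces SQLite + vec0)
CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector must be enabled before the vector type can be used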
CREATE TABLE memory_embeddings (
    id              BIGSERIAL PRIMARY KEY,
    instance_name   TEXT NOT NULL,
    agent_id        TEXT NOT NULL,
    source          TEXT NOT NULL,   -- file path, session key, etc.
    chunk           TEXT NOT NULL,
    embedding       vector(1536),    -- dimension depends on model
    metadata        JSONB DEFAULT '{}',
    created_at      TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX ON memory_embeddings
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
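
The vector_cosine_ops index above accelerates ordering by cosine distance, which pgvector exposes as the `<=>` operator. The sketch below computes the same ranking in plain Python to show what a memory search returns. The helper names are hypothetical, and the query string uses `%s` placeholders in the style of common PostgreSQL drivers.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_chunks(query: list[float], rows: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Exact nearest-neighbour search over (chunk, embedding) rows."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [chunk for chunk, _ in ranked[:k]]


# Roughly equivalent query against memory_embeddings.
NEAREST_SQL = """
SELECT chunk
FROM memory_embeddings
WHERE instance_name = %s AND agent_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""
```

The in-memory version is exact, while an ivfflat index is approximate: it trades some recall for speed by searching only a subset of the lists.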

6.2 Config: File → ConfigMap + CRD¶

openclaw.json is decomposed:

Config section	Sympozium equivalent
`gateway.auth`	K8s Secret + Ingress auth
`gateway.mode`, `gateway.port`	Service + Ingress spec
`agents.defaults`	`SympoziumInstance.spec.agents.default`
`agents.defaults.sandbox`	`SympoziumPolicy.spec.sandbox`
`agents.defaults.subagents`	`SympoziumPolicy.spec.subagents`
`agents.defaults.tools`	`SympoziumPolicy.spec.tools`
`channels.*`	`SympoziumInstance.spec.channels` + channel pod Secrets
`hooks.*`	Plugin hooks in agent pod ConfigMap
`skills.*`	`SkillPack` CRDs
`cron.*`	K8s CronJob resources

6.3 Transcripts: Files → Object Storage¶

Session transcripts (append-only JSONL) move to MinIO/S3:

s3://sympozium-transcripts/{instance}/{agentId}/{sessionKey}/{timestamp}.jsonl

Agent pods write transcript events to the IPC volume; the IPC bridge flushes them to object storage in batches.
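
The batching behaviour described above might be sketched as follows. The key layout follows the path template shown earlier; the timestamp format, the batch size, the class name, and the `upload` callable (standing in for an S3/MinIO client) are all assumptions of this sketch.

```python
import json
from datetime import datetime, timezone
from typing import Callable


def transcript_key(instance: str, agent_id: str, session_key: str, ts: datetime) -> str:
    """Object key below the bucket: {instance}/{agentId}/{sessionKey}/{timestamp}.jsonl"""
    stamp = ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")  # format is an assumption
    return f"{instance}/{agent_id}/{session_key}/{stamp}.jsonl"


class TranscriptBatcher:
    """Buffers transcript events and uploads them as one JSONL object per batch."""

    def __init__(self, instance: str, agent_id: str, session_key: str,
                 upload: Callable[[str, bytes], None], max_events: int = 100):
        self.instance = instance
        self.agent_id = agent_id
        self.session_key = session_key
        self.upload = upload
        self.max_events = max_events
        self.pending: list[dict] = []

    def add(self, event: dict, now: datetime) -> None:
        self.pending.append(event)
        if len(self.pending) >= self.max_events:
            self.flush(now)

    def flush(self, now: datetime) -> None:
        """Upload buffered events; a no-op when the buffer is empty."""
        if not self.pending:
            return
        body = "".join(json.dumps(e) + "\n" for e in self.pending).encode("utf-8")
        self.upload(transcript_key(self.instance, self.agent_id, self.session_key, now), body)
        self.pending = []
```

A real bridge would also flush on a timer and at pod shutdown so that a short run does not lose its final events.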

7. Migration Path¶

Phase 1: Operator + CRDs (foundations)¶

Implement CRDs: SympoziumInstance, AgentRun, SympoziumPolicy, SkillPack
Build the Sympozium operator (controller-runtime based)
SympoziumInstance controller: reconcile channel pods + store config
AgentRun controller: create Jobs, watch completion, deliver results
Agent pod image: minimal Node.js runner that reads task from /ipc/input/, calls LLM, writes result to /ipc/output/
IPC bridge sidecar: file-watch + NATS publish/subscribe

Phase 2: Policy enforcement (security)¶

Implement SympoziumPolicy admission webhook (validating + mutating)
Feature gates enforcement
NetworkPolicy generation from SympoziumPolicy.sandbox.network
Mount validation (blocked patterns, allowlist)
OPA/Gatekeeper constraint templates for cluster-wide policy

Phase 3: Channel decomposition¶

Extract each channel into its own container image
Channel controller: watches SympoziumInstance.spec.channels, reconciles channel pods
Event bus integration for inbound/outbound message routing
Channel health monitoring via event bus heartbeats

Phase 4: Sub-agent orchestration¶

Implement spawn request → AgentRun CR creation flow
Parent-child linkage and depth tracking
Result delivery from child to parent via event bus
Concurrency enforcement (max concurrent per instance)
Cleanup controller (TTL, cleanup: delete policy)
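
The spawn request to AgentRun flow with parent-child linkage and depth tracking could be sketched as below. The sympozium.ai/spawn-depth annotation is the one checked by the admission webhook; the apiVersion, the parent-run label, and the spec field names are assumptions, since the CRD schema is not reproduced here.

```python
DEPTH_ANNOTATION = "sympozium.ai/spawn-depth"


def build_child_agent_run(parent: dict, task: str, max_depth: int) -> dict:
    """Build the AgentRun manifest for a sub-agent spawned by `parent`."""
    meta = parent["metadata"]
    parent_depth = int(meta.get("annotations", {}).get(DEPTH_ANNOTATION, "0"))
    depth = parent_depth + 1
    if depth > max_depth:
        raise PermissionError(f"spawn depth {depth} exceeds maxDepth {max_depth}")

    return {
        "apiVersion": "sympozium.ai/v1alpha1",  # assumed group/version
        "kind": "AgentRun",
        "metadata": {
            "generateName": f"{meta['name']}-child-",
            "namespace": meta.get("namespace", "sympozium"),
            "annotations": {DEPTH_ANNOTATION: str(depth)},
            "labels": {"sympozium.ai/parent-run": meta["name"]},  # hypothetical label
        },
        # Hypothetical spec fields for illustration.
        "spec": {"task": task, "parentRun": meta["name"]},
    }
```

Checking the depth in the orchestrator as well as in the webhook gives the spawning agent an immediate error instead of a rejected pod later in the pipeline.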

Phase 5: Advanced features¶

Memory/vector search as a sidecar or shared service (pgvector)
Browser automation sidecar (Chromium + noVNC, gated by feature flag)
Canvas host as a separate Deployment
CronJob integration for scheduled agent runs
Web UI / Control UI as a separate Deployment

8. Comparison: OpenClaw Concepts → Sympozium Primitives¶

OpenClaw concept	Code location	Sympozium equivalent
`startGatewayServer()`	`src/gateway/server.impl.ts`	Control plane Deployment (API + orchestrator)
`SubagentRunRecord`	`src/agents/subagent-registry.types.ts`	`AgentRun` CRD
`spawnSubagentDirect()`	`src/agents/subagent-spawn.ts`	IPC bridge writes spawn request → orchestrator creates `AgentRun`
`waitForSubagentCompletion()`	`src/agents/subagent-spawn.ts`	Watch `AgentRun.status.phase` change to Succeeded
Announce flow	`src/agents/subagent-announce.ts`	Orchestrator reads `AgentRun.status.result`, delivers to parent IPC
Command queue (lanes)	`src/process/command-queue.ts`	`AgentRun` concurrency limits in SympoziumPolicy + admission webhook
Tool policy pipeline	`src/agents/tool-policy-pipeline.ts`	`SympoziumPolicy` CRD + admission webhook
Sandbox (`docker exec`)	`src/agents/sandbox/docker.ts`	Sandbox sidecar container in agent pod
FS Bridge	`src/agents/sandbox/fs-bridge.ts`	`kubectl exec` into sandbox sidecar (via IPC bridge)
Config hot-reload	`src/gateway/config-reload.ts`	ConfigMap watch + rolling pod restart
Channel manager	`src/gateway/server-channels.ts`	Channel pod Deployments per channel type
Plugin registry	`src/plugins/registry.ts`	Plugin containers, `SkillPack` CRDs
Gateway lock	`src/infra/gateway-lock.ts`	Eliminated (stateless control plane, no file locks)
Session file write lock	`src/agents/session-write-lock.ts`	Eliminated (PostgreSQL row-level locking)
Memory/SQLite	`src/memory/manager.ts`	PostgreSQL + pgvector
Cron service	`src/cron/service.ts`	K8s CronJob resources
mDNS discovery	`src/gateway/server-discovery-runtime.ts`	K8s Service discovery
Tailscale exposure	`src/gateway/server-tailscale.ts`	K8s Ingress
Gateway health	`src/gateway/server/health-state.ts`	K8s liveness/readiness probes + Prometheus metrics

NanoClaw concept	Code location	Sympozium equivalent
`runContainerAgent()`	`src/container-runner.ts`	Orchestrator creates K8s Job from `AgentRun` spec
Container args builder	`buildContainerArgs()`	Pod spec builder in orchestrator
Volume mount builder	`buildVolumeMounts()`	Pod volume/volumeMount spec in orchestrator
IPC file polling	`src/ipc.ts`	IPC bridge sidecar with `inotify` + event bus
Group queue	`src/group-queue.ts`	`AgentRun` concurrency limits per instance
Mount allowlist	`src/mount-security.ts`	`SympoziumPolicy.mounts` + admission webhook validation
Per-group isolation	Group folder + session dir	Per-`AgentRun` pod with isolated volumes
Credential filtering	`readSecrets()`	K8s Secrets mounted only into authorized pods
`OUTPUT_START/END_MARKER`	`container-runner.ts`	IPC bridge structured JSON protocol
N/A	N/A	`PersonaPack` CRD — bundles multiple agent personas into one installable unit; stamps out Instances, Schedules, and memory automatically

9. Security Model¶

┌──────────────────────────────────────────────────────────────────────────┐
│                           CLUSTER-LEVEL POLICY                           │
│  • PodSecurity (restricted profile)                                      │
│  • NetworkPolicy (default deny for sympozium namespace)                  │
│  • OPA/Gatekeeper constraints (global guardrails)                        │
│  • RBAC (agent ServiceAccount has zero cluster permissions)              │
└────────────────────────────────────────┬─────────────────────────────────┘
                                         │
┌────────────────────────────────────────▼─────────────────────────────────┐
│                       SYMPOZIUM ADMISSION WEBHOOK                        │
│  • Validates every agent pod against SympoziumPolicy                     │
│  • Enforces feature gates, mount blocklists, resource limits             │
│  • Injects NetworkPolicy labels, security contexts, sidecars             │
│  • Checks sub-agent depth and concurrency limits                         │
└────────────────────────────────────────┬─────────────────────────────────┘
                                         │
┌────────────────────────────────────────▼─────────────────────────────────┐
│                           AGENT POD ISOLATION                            │
│  • readOnlyRootFilesystem: true                                          │
│  • runAsNonRoot: true (uid 1000)                                         │
│  • capabilities: drop ALL                                                │
│  • seccompProfile: RuntimeDefault                                        │
│  • No host network/PID/IPC namespace sharing                             │
│  • No service account token auto-mount                                   │
│  • NetworkPolicy: deny all (or restricted egress)                        │
│  • Resource limits enforced (CPU, memory, pids, ephemeral storage)       │
│  • Secrets mounted only for authorized providers                         │
└──────────────────────────────────────────────────────────────────────────┘

Credential isolation (improving on both OpenClaw and NanoClaw):

AI provider API keys are K8s Secrets, exposed to agent pods as environment variables (envFrom.secretRef or env[].valueFrom.secretKeyRef). The agent can read them (necessary for auth), but they are never written to a volume inside the pod (env-only, no file mount).
Gateway tokens are only in the control plane pods, never in agent pods.
Channel credentials are only in channel pods, never in agent pods.
Cross-instance isolation: each SympoziumInstance has its own Secrets; the admission webhook rejects pods that reference Secrets from other instances.
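
The cross-instance check could be sketched as follows. The pod fields inspected (envFrom, env valueFrom, and secret volumes) are the standard places a pod spec references a Secret. The `secret_owners` mapping, from Secret name to owning instance, is an assumption: in practice it might be derived from a label on each Secret.

```python
def referenced_secrets(pod: dict) -> set[str]:
    """Collect the names of all Secrets a pod spec references."""
    names: set[str] = set()
    spec = pod.get("spec", {})
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        for source in c.get("envFrom", []):
            if "secretRef" in source:
                names.add(source["secretRef"]["name"])
        for env in c.get("env", []):
            ref = env.get("valueFrom", {}).get("secretKeyRef")
            if ref:
                names.add(ref["name"])
    for volume in spec.get("volumes", []):
        if "secret" in volume:
            names.add(volume["secret"]["secretName"])
    return names


def foreign_secrets(pod: dict, secret_owners: dict[str, str], instance: str) -> list[str]:
    """Return referenced Secrets that do not belong to `instance`.

    Secrets with no known owner are treated as foreign (deny by default).
    """
    return sorted(n for n in referenced_secrets(pod) if secret_owners.get(n) != instance)
```

A non-empty result would cause the webhook to reject the pod.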

10. Observability¶

# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sympozium-control-plane
spec:
  selector:
    matchLabels:
      app: sympozium-control-plane
  endpoints:
    - port: metrics
      path: /metrics

Key metrics:

Metric	Type	Description
`sympozium_agent_runs_total`	Counter	Total agent runs (by instance, status)
`sympozium_agent_run_duration_seconds`	Histogram	Agent run duration
`sympozium_agent_runs_active`	Gauge	Currently running agent pods
`sympozium_subagent_spawns_total`	Counter	Sub-agent spawn requests
`sympozium_subagent_depth`	Histogram	Sub-agent nesting depth distribution
`sympozium_tool_calls_total`	Counter	Tool invocations (by tool name, status)
`sympozium_tool_policy_denials_total`	Counter	Policy-denied tool calls
`sympozium_channel_messages_total`	Counter	Messages in/out per channel
`sympozium_channel_health`	Gauge	Channel connection status (0/1)
`sympozium_admission_decisions_total`	Counter	Webhook admit/reject counts
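
To show what the /metrics endpoint would return for one of these counters, here is a small stdlib-only sketch that renders the Prometheus text exposition format. The `LabeledCounter` class is illustrative only; a real control plane would use a Prometheus client library instead.

```python
class LabeledCounter:
    """A monotonically increasing counter with a fixed set of label names."""

    def __init__(self, name: str, help_text: str, label_names: tuple[str, ...]):
        self.name = name
        self.help_text = help_text
        self.label_names = label_names
        self.values: dict[tuple[str, ...], float] = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}")
        key = tuple(labels[n] for n in self.label_names)
        self.values[key] = self.values.get(key, 0.0) + amount

    def render(self) -> str:
        """Render HELP/TYPE headers and one sample line per label combination."""
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} counter"]
        for key, value in sorted(self.values.items()):
            labels = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


agent_runs = LabeledCounter("sympozium_agent_runs_total",
                            "Total agent runs", ("instance", "status"))
agent_runs.inc(instance="alice", status="succeeded")
agent_runs.inc(instance="alice", status="succeeded")
agent_runs.inc(instance="alice", status="failed")
```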

11. Open Questions¶

Workspace persistence — should agent workspaces be ephemeral (emptyDir) or persistent (PVC)? Ephemeral is simpler and more secure (no cross-run state leakage), but some skills need persistent workspace state across runs (e.g., git repos). Could use ReadWriteOnce PVCs per instance with cleanup policies.
LLM streaming latency — the event bus adds a hop for streaming tokens. For interactive chats, latency matters. May need a direct WebSocket path from agent pod → API server for streaming, bypassing the event bus for agent.stream.chunk events.
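Cost of pod creation — K8s pod startup is slower than docker run. Warm pod pools (pre-created, idle agent pods) could reduce cold-start latency. MicroVM runtimes such as Kata Containers or Firecracker are another option to evaluate, mainly for stronger isolation; whether they help or hurt startup time would need to be measured.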
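Multi-cluster — should Sympozium support agents running across clusters? The event bus (NATS) supports multi-cluster natively, but custom resources live in a single cluster's API server, so AgentRun state would not span clusters without extra synchronization.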
Provider rate limiting — when many agent pods hit the same AI provider simultaneously, rate limits become a concern. A shared rate-limiting proxy (e.g., an Envoy sidecar or centralized proxy) may be needed.
Operator framework — controller-runtime (Go) vs Kopf (Python) vs custom Node.js operator (to share code with OpenClaw). Go is the standard choice for K8s operators; the agent runner itself stays Node.js/TypeScript.