Writing Integration Tests

Integration tests verify that Sympozium tools work end-to-end in a real cluster. Unlike unit tests, they exercise the full pipeline: controller → Job → agent-runner → LLM → tool invocation.

Prerequisites

  • A Kind (or other) cluster with Sympozium installed (make install && kubectl apply -k config/)
  • An OpenAI API key (or other supported provider)
  • kubectl configured to talk to the cluster
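
A typical setup sequence might look like the following (a sketch; the Kind cluster name is illustrative):

kind create cluster --name sympozium-dev
make install && kubectl apply -k config/
export OPENAI_API_KEY=sk-...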

Running Existing Tests

# Run all integration tests
make test-integration

# Run API smoke regressions (fast, no LLM execution)
make integration-tests

# Optional capability validation (same target, token-gated checks)
# CLAUDE_TOKEN validates Anthropic provider wiring
# GITHUB_TOKEN validates github-gitops token endpoint + secret persistence
CLAUDE_TOKEN=... GITHUB_TOKEN=... make integration-tests

# Run with a specific model
TEST_MODEL=gpt-5.2 ./test/integration/test-write-file.sh

# Override timeout (seconds)
TEST_TIMEOUT=180 ./test/integration/test-write-file.sh

# Use a pre-existing secret instead of OPENAI_API_KEY env var
kubectl create secret generic inttest-openai-key --from-literal=OPENAI_API_KEY=sk-...
./test/integration/test-write-file.sh

How Integration Tests Work

Each test follows the same pattern:

  1. Create resources — a test SympoziumInstance and AgentRun with a deterministic task
  2. Wait for completion — poll status.phase until Succeeded or Failed
  3. Validate results — check pod logs, status.result, or the pod filesystem
  4. Clean up — delete all test resources

These tests run against a real LLM, not a mock. The task prompt is therefore carefully worded to produce a deterministic, verifiable outcome (e.g., "write exactly this string to this file").
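
Condensed to its essentials, the same flow can be exercised by hand with kubectl (a sketch only: inttest-run.yaml and the run name are hypothetical, kubectl wait with jsonpath conditions needs kubectl 1.23+, and the shipped tests poll explicitly instead so they can also capture the pod name early):

kubectl apply -f inttest-run.yaml
kubectl wait agentrun/inttest-example-run \
    --for=jsonpath='{.status.phase}'=Succeeded --timeout=120s
kubectl get agentrun inttest-example-run -o jsonpath='{.status.result}'
kubectl delete -f inttest-run.yaml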

Writing a New Test

1. Create the script

Place your test in test/integration/:

touch test/integration/test-my-tool.sh
chmod +x test/integration/test-my-tool.sh

2. Use the standard template

#!/usr/bin/env bash
set -euo pipefail

# --- Configuration ---
NAMESPACE="${TEST_NAMESPACE:-default}"
INSTANCE_NAME="inttest-my-tool"
RUN_NAME="inttest-my-tool-run"
SECRET_NAME="inttest-openai-key"
MODEL="${TEST_MODEL:-gpt-4o-mini}"
TIMEOUT="${TEST_TIMEOUT:-120}"

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

pass() { echo -e "${GREEN}$*${NC}"; }
fail() { echo -e "${RED}$*${NC}"; }
info() { echo -e "${YELLOW}$*${NC}"; }
failures=0

cleanup() {
    info "Cleaning up..."
    kubectl delete agentrun "$RUN_NAME" -n "$NAMESPACE" --ignore-not-found >/dev/null 2>&1 || true
    kubectl delete sympoziuminstance "$INSTANCE_NAME" -n "$NAMESPACE" --ignore-not-found >/dev/null 2>&1 || true
    kubectl delete jobs -n "$NAMESPACE" -l "sympozium.ai/agentrun=$RUN_NAME" --ignore-not-found >/dev/null 2>&1 || true
    kubectl delete pods -n "$NAMESPACE" -l "sympozium.ai/agentrun=$RUN_NAME" --ignore-not-found >/dev/null 2>&1 || true
}

# Ensure secret exists
if ! kubectl get secret "$SECRET_NAME" -n "$NAMESPACE" >/dev/null 2>&1; then
    if [[ -z "${OPENAI_API_KEY:-}" ]]; then
        fail "Set OPENAI_API_KEY or create secret '$SECRET_NAME'"
        exit 1
    fi
    kubectl create secret generic "$SECRET_NAME" \
        --from-literal=OPENAI_API_KEY="$OPENAI_API_KEY" -n "$NAMESPACE"
fi

cleanup 2>/dev/null || true
sleep 2

# --- Create SympoziumInstance ---
cat <<EOF | kubectl apply -f -
apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
metadata:
  name: ${INSTANCE_NAME}
  namespace: ${NAMESPACE}
spec:
  agents:
    default:
      model: ${MODEL}
  authRefs:
    - secret: ${SECRET_NAME}
EOF

# --- Create AgentRun ---
cat <<EOF | kubectl apply -f -
apiVersion: sympozium.ai/v1alpha1
kind: AgentRun
metadata:
  name: ${RUN_NAME}
  namespace: ${NAMESPACE}
  labels:
    sympozium.ai/instance: ${INSTANCE_NAME}
spec:
  instanceRef: ${INSTANCE_NAME}
  agentId: default
  sessionKey: "inttest-$(date +%s)"
  task: |
    YOUR DETERMINISTIC TASK PROMPT HERE.
  model:
    provider: openai
    model: ${MODEL}
    authSecretRef: ${SECRET_NAME}
  timeout: "3m"
EOF

# --- Wait for completion ---
elapsed=0
phase=""
pod=""
while [[ $elapsed -lt $TIMEOUT ]]; do
    phase=$(kubectl get agentrun "$RUN_NAME" -n "$NAMESPACE" \
        -o jsonpath='{.status.phase}' 2>/dev/null || echo "")
    # Capture pod name early, before cleanup removes it
    if [[ -z "$pod" ]]; then
        pod=$(kubectl get pods -n "$NAMESPACE" \
            -l "sympozium.ai/agentrun=$RUN_NAME" \
            -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
    fi
    [[ "$phase" == "Succeeded" || "$phase" == "Failed" ]] && break
    sleep 5
    elapsed=$((elapsed + 5))
done

[[ "$phase" == "Succeeded" ]] && pass "AgentRun succeeded" || { fail "AgentRun phase: $phase"; cleanup; exit 1; }

# --- Validate ---
result=$(kubectl get agentrun "$RUN_NAME" -n "$NAMESPACE" \
    -o jsonpath='{.status.result}' 2>/dev/null || echo "")

# YOUR VALIDATION LOGIC HERE
# Example:
# if echo "$result" | grep -qi "expected_string"; then
#     pass "Result contains expected output"
# else
#     fail "Result missing expected output"
#     failures=$((failures + 1))
# fi

# --- Cleanup & exit ---
cleanup
if [[ $failures -gt 0 ]]; then
    fail "$failures check(s) failed"
    exit 1
fi
pass "Test complete"

3. Add to the Makefile (optional)

If you want it to run as part of make test-integration, add your script to that target in the Makefile (it currently lists scripts explicitly). Or add a specific target:

test-integration-my-tool: ## Run my-tool integration test
    ./test/integration/test-my-tool.sh
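
If you extend the aggregate target instead, the addition might look like this (a sketch; the actual list of scripts in the Makefile will differ):

test-integration: ## Run all integration tests
    ./test/integration/test-write-file.sh
    ./test/integration/test-k8s-ops-nodes.sh
    ./test/integration/test-my-tool.sh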

Writing Good Task Prompts

The task prompt is the most important part. Since you're asking a real LLM to do something and then checking the result, the prompt must be:

  • Deterministic — ask for a specific, verifiable output (exact string, exact filename)
  • Minimal — one tool call, one action; don't test multiple things at once
  • Unambiguous — tell the model exactly what to do, not what to figure out

Good examples

Use the write_file tool to write the exact text "sympozium-integration-ok"
to /workspace/test-output.txt. Do not add any extra content.

Use the list_directory tool to list the contents of /workspace.
Report what you find.

Use the read_file tool to read /skills/k8s-ops.yaml and report
the first line of the file.

Bad examples

Do some Kubernetes operations and tell me what you find.
This is non-deterministic — you can't verify the output.

Write a Python script that calculates fibonacci numbers.
This tests the LLM's coding ability, not the tool.

Validation Strategies

  • status.phase: kubectl get agentrun -o jsonpath='{.status.phase}'. Reliability: high (always works).
  • status.result: check for keywords in the LLM's response text. Reliability: high (survives pod cleanup).
  • Pod logs: kubectl logs <pod> -c agent. Reliability: medium (the pod may already be gone).
  • File on disk: kubectl exec <pod> -c agent -- cat /path. Reliability: low (the pod must still be running).

Tip: Capture the pod name during the polling loop (while the run is Running), not after it completes. The Job/pod is cleaned up quickly after the agent finishes.
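
For example, a log-based check built on the template above might look like this (a sketch; the grep pattern is a placeholder for whatever output your tool produces):

if [[ -n "$pod" ]]; then
    logs=$(kubectl logs "$pod" -c agent -n "$NAMESPACE" 2>/dev/null || echo "")
    if echo "$logs" | grep -q "write_file"; then
        pass "Agent invoked write_file"
    else
        fail "write_file invocation not found in pod logs"
        failures=$((failures + 1))
    fi
fi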

Environment Variables

  • TEST_MODEL (default: gpt-4o-mini): LLM model to use
  • TEST_TIMEOUT (default: 120): maximum seconds to wait for completion
  • TEST_NAMESPACE (default: default): Kubernetes namespace
  • OPENAI_API_KEY (no default): used to create the secret if it doesn't exist
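
Put together, a fully parameterized invocation might look like this (the values are illustrative):

TEST_NAMESPACE=inttest TEST_MODEL=gpt-4o-mini TEST_TIMEOUT=300 \
    OPENAI_API_KEY=sk-... ./test/integration/test-my-tool.sh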

Existing Tests

  • API smoke (test/integration/test-api-smoke.sh): API-only CRUD/list/get coverage for PersonaPacks, ad-hoc Instances, Skills, Policies, and Schedules
  • API PersonaPack provider switch (test/integration/test-api-personapack-provider-switch.sh): verifies that OpenAI→Anthropic PersonaPack updates propagate to stamped instances and subsequent AgentRuns (provider/auth/model/skills)
  • API PersonaPack + ad-hoc correctness (test/integration/test-api-personapack-adhoc-correctness.sh): verifies PersonaPack propagation (authRef/provider/model/skills), ad-hoc parity, and that disabling a PersonaPack removes stamped instances
  • API AgentRun container shape (test/integration/test-api-agentrun-container-shape.sh): validates container count/names for AgentRun pods from plain instances (agent+ipc-bridge) and skill-backed instances (agent+ipc-bridge+skill sidecar)
  • API PersonaPack provisioning (test/integration/test-api-personapack-provisioning.sh): enabling a PersonaPack via the API stamps out PersonaPack-labeled Instances and Schedules
  • API schedule dispatch (test/integration/test-api-schedule-dispatch.sh): creating a schedule via the API results in dispatched AgentRuns (status.totalRuns / lastRunName)
  • API observability (test/integration/test-api-observability.sh): OTEL collector deployment health + /api/v1/observability/metrics correctness (collectorReachable, payload sanity)
  • API web-endpoint (test/integration/test-api-web-endpoint.sh): web-endpoint skill enable/disable via PATCH, status endpoint, parameter propagation, idempotency
  • API serving-mode (test/integration/test-api-serving-mode.sh): server-mode AgentRun creation (Deployment + Service), web-endpoint status with deploymentName/serviceName, container env validation
  • API capabilities, optional (test/integration/test-api-capabilities.sh): token-gated checks for Anthropic provider wiring (CLAUDE_TOKEN) and the GitHub token endpoint/secret persistence (GITHUB_TOKEN)
  • write_file (test/integration/test-write-file.sh): agent uses the write_file tool to create a file with specific content
  • k8s-ops nodes (test/integration/test-k8s-ops-nodes.sh): agent uses execute_command with the k8s-ops skill to run kubectl get nodes
  • telegram (test/integration/test-telegram-channel.sh): channel deployment pipeline + optional full E2E with a real bot
  • slack (test/integration/test-slack-channel.sh): channel deployment pipeline + optional full E2E via Socket Mode
  • developer-team (test/integration/developer_team_test.sh): end-to-end PersonaPack simulation in which 7 developer agents collaborate on a real GitHub repo (issues, PRs, code review, QA); requires GITHUB_TOKEN with repo scope