
Quick Diagnostics

Before diving into specific issues, run these quick checks:
# Check all services are running
docker compose ps

# View recent logs from all services
docker compose logs --tail=50

# Check specific service health
curl http://localhost:8080/health
curl http://localhost:8000/health  # LLM Service
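The same quick checks can be scripted. A minimal health poller, assuming the `/health` endpoints above return HTTP 200 when a service is up (adjust ports if you remapped them in docker-compose.yml):

```python
# Poll the Shannon health endpoints shown above.
from urllib.request import urlopen
from urllib.error import URLError

ENDPOINTS = {
    "gateway": "http://localhost:8080/health",
    "llm-service": "http://localhost:8000/health",
}

def check_health(url: str, timeout: float = 2.0):
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (URLError, OSError):
        return None

if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        status = check_health(url)
        print(f"{name}: {'OK' if status == 200 else f'DOWN ({status})'}")
```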

Installation & Setup Issues

Docker Compose Fails to Start

Symptoms:
  • Services won’t start
  • Exit code errors
  • Container crashes immediately
Common Causes:
Cause - Docker daemon not running
Check:
docker info
Solution:
# macOS
open -a Docker

# Linux
sudo systemctl start docker

# Verify
docker info
Cause - Port conflicts
Check which ports are in use:
# Check all Shannon ports
lsof -i :8080  # Gateway
lsof -i :50051 # Agent Core
lsof -i :50052 # Orchestrator
lsof -i :8000  # LLM Service
lsof -i :5432  # PostgreSQL
lsof -i :6379  # Redis
lsof -i :6333  # Qdrant
lsof -i :7233  # Temporal
Solution - Kill conflicting processes:
# Find process using port
lsof -ti :8080

# Kill the process (macOS/Linux)
kill -9 $(lsof -ti :8080)
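If `lsof` isn't available, the port list above can be checked with a small standard-library script. A successful TCP connect means something is already listening on that port:

```python
# Quick scan of the Shannon ports listed above (127.0.0.1 only).
import socket

SHANNON_PORTS = {
    8080: "Gateway", 50051: "Agent Core", 50052: "Orchestrator",
    8000: "LLM Service", 5432: "PostgreSQL", 6379: "Redis",
    6333: "Qdrant", 7233: "Temporal",
}

def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    for port, service in SHANNON_PORTS.items():
        state = "in use" if is_port_in_use(port) else "free"
        print(f"{port:>5} ({service}): {state}")
```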
Solution - Change Shannon ports: Edit docker-compose.yml to use different ports:
gateway:
  ports:
    - "8081:8080"  # Use 8081 instead of 8080
Cause - Insufficient Docker resources
Check Docker resources:
docker system df
docker stats
Solution - Increase Docker resources:
  • macOS: Docker Desktop → Preferences → Resources
    • RAM: Minimum 8GB (16GB recommended)
    • CPUs: Minimum 4 cores
    • Disk: Minimum 20GB free
  • Linux: Edit Docker daemon config
    sudo nano /etc/docker/daemon.json
    
    {
      "default-ulimits": {
        "nofile": {
          "Name": "nofile",
          "Hard": 64000,
          "Soft": 64000
        }
      }
    }
    
Error: WARNING: The OPENAI_API_KEY variable is not set
Solution:
# Create .env from template
make setup

# Or manually
cp .env.example .env

# Add your API keys
echo "OPENAI_API_KEY=sk-..." >> .env
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env
Error: python_wasi/bin/python3.11: No such file or directory
Solution:
# Download and setup Python WASI (20MB)
./scripts/setup_python_wasi.sh

# Verify installation
ls -lh python_wasi/bin/python3.11

API & Connection Issues

401 Unauthorized

Symptoms:
  • HTTP 401 responses
  • “Unauthorized” error messages
Diagnosis:
# Check if auth is enabled
docker compose exec orchestrator env | grep GATEWAY_SKIP_AUTH
Solution - Disable auth (development only). Edit .env:
GATEWAY_SKIP_AUTH=1  # 1 = disabled, 0 = enabled
Restart:
docker compose restart gateway
Test:
curl http://localhost:8080/api/v1/tasks
# Should work without X-API-Key header
Solution - Authenticate requests with an API key:
curl -H "X-API-Key: sk_test_123456" \
  http://localhost:8080/api/v1/tasks
Python SDK:
from shannon import ShannonClient

client = ShannonClient(
    base_url="http://localhost:8080",
    api_key="sk_test_123456"
)

Connection Refused / Service Unavailable

Symptoms:
  • connection refused
  • dial tcp: connect: connection refused
  • Services not responding
Diagnosis:
# Check service status
docker compose ps

# Check specific service logs
docker compose logs orchestrator --tail=50
docker compose logs agent-core --tail=50
docker compose logs llm-service --tail=50

# Test endpoints
curl http://localhost:8080/health
curl http://localhost:50052  # Should fail - gRPC doesn't support HTTP GET
Wait for all services to initialize:
# Watch logs until services are ready
docker compose logs -f

# Look for these messages:
# orchestrator: "gRPC server listening on :50052"
# agent-core: "Server started on :50051"
# llm-service: "Uvicorn running on http://0.0.0.0:8000"
# gateway: "Gateway listening on :8080"
Typical startup time: 30-60 seconds
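The readiness lines above can be checked programmatically. A small helper that scans captured `docker compose logs` output for those messages (the substrings are taken from the log lines shown here; if your build logs differ, adjust the patterns):

```python
# Detect which services have logged their readiness message.
READY_PATTERNS = {
    "orchestrator": "gRPC server listening on :50052",
    "agent-core": "Server started on :50051",
    "llm-service": "Uvicorn running on http://0.0.0.0:8000",
    "gateway": "Gateway listening on :8080",
}

def ready_services(log_text: str) -> set:
    """Return the set of services whose readiness line appears in the logs."""
    return {name for name, pattern in READY_PATTERNS.items()
            if pattern in log_text}

if __name__ == "__main__":
    sample = ("orchestrator  | gRPC server listening on :50052\n"
              "gateway       | Gateway listening on :8080\n")
    up = ready_services(sample)
    print("ready:", sorted(up))
    print("waiting on:", sorted(set(READY_PATTERNS) - up))
```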
Check for crash errors:
docker compose logs orchestrator | grep -i error
docker compose logs orchestrator | grep -i fatal
Restart crashed service:
docker compose restart orchestrator
docker compose restart agent-core
docker compose restart llm-service
Full reset if needed:
docker compose down
docker compose up -d
Check PostgreSQL:
docker compose logs postgres --tail=20

# Test connection
docker compose exec postgres psql -U shannon -d shannon -c "SELECT 1;"
Solution:
# Restart database
docker compose restart postgres

# Wait for it to be ready
docker compose exec postgres pg_isready -U shannon

Task Stuck in RUNNING or QUEUED State

Symptoms:
  • Task never completes
  • Status remains RUNNING for hours
  • No progress updates
Diagnosis:
# Check Temporal workflows
docker compose logs temporal --tail=100

# Check orchestrator worker
docker compose logs orchestrator | grep -i workflow

# View task in Temporal UI
open http://localhost:8088
Check LLM service logs:
docker compose logs llm-service | grep -i "api key\|unauthorized\|quota"
Solution:
# Verify API keys in .env
grep -E "OPENAI_API_KEY|ANTHROPIC_API_KEY" .env

# Test API key
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# Update .env with valid key
nano .env

# Restart LLM service
docker compose restart llm-service
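The curl key check above can also be done from Python, which is handy in CI. A sketch using only the standard library; it returns True for a valid key, False on a 401/403, and None when the network is unreachable:

```python
# Validate an OpenAI API key against the public models endpoint.
import os
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def openai_key_valid(key: str, timeout: float = 5.0):
    req = Request("https://api.openai.com/v1/models",
                  headers={"Authorization": f"Bearer {key}"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except HTTPError as e:
        # 401/403 means the key itself was rejected.
        return False if e.code in (401, 403) else None
    except (URLError, OSError):
        return None

if __name__ == "__main__":
    print(openai_key_valid(os.environ.get("OPENAI_API_KEY", "")))
```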
Restart Temporal workers:
docker compose restart orchestrator

# Check workflow in Temporal UI
open http://localhost:8088
# Navigate to Workflows → Find your workflow → View execution history
Force workflow termination (last resort):
# In Temporal UI: Workflows → Select workflow → Terminate
Check circuit breaker status:
docker compose logs orchestrator | grep -i "circuit"
Circuit breakers protect against cascading failures:
  • LLM Service circuit breaker
  • Database circuit breaker
  • Redis circuit breaker
Solution - Wait for automatic recovery (30-60 seconds), or restart services:
docker compose restart orchestrator agent-core llm-service

Budget & Cost Issues

Budget Exceeded Errors

Symptoms:
  • budget exceeded error
  • Tasks fail with cost limit errors
  • HTTP 429 (Rate Limited) or Payment Required responses
Diagnosis:
# Check budget configuration
docker compose exec orchestrator env | grep BUDGET
docker compose exec orchestrator env | grep MAX_COST
Edit .env:
MAX_COST_PER_REQUEST=1.00    # Increase from 0.50
MAX_TOKENS_PER_REQUEST=20000  # Increase from 10000
Restart:
docker compose restart orchestrator llm-service
Budgets are configured server-side via environment variables; the SDK does not accept per-request budget parameters. To reduce cost, prefer cheaper execution modes:
# Instead of forcing advanced mode, let Shannon pick:
client.submit_task(query="...")  # mode auto-selected

# Advanced → Standard → Simple (cheapest)
Cost comparison:
  • Simple: 1 LLM call, $0.01-0.05
  • Standard: 3-5 LLM calls, $0.05-0.20
  • Advanced: 10+ LLM calls, $0.20-1.00+
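The bands above can be encoded as a small lookup for pre-flight budget warnings in client code. Illustrative only; actual cost depends on the model and token usage:

```python
# Rough per-task cost bands (USD) from the comparison above.
COST_BANDS = {
    "simple":   (0.01, 0.05),   # 1 LLM call
    "standard": (0.05, 0.20),   # 3-5 LLM calls
    "advanced": (0.20, 1.00),   # 10+ LLM calls (can exceed $1)
}

def fits_budget(mode: str, max_cost: float) -> bool:
    """True if the mode's worst-case band cost stays within max_cost."""
    low, high = COST_BANDS[mode]
    return high <= max_cost

if __name__ == "__main__":
    # With MAX_COST_PER_REQUEST=0.50, advanced-mode tasks may be rejected:
    print(fits_budget("standard", 0.50))  # True
    print(fits_budget("advanced", 0.50))  # False
```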
⚠️ Warning: Only for development/testing
Edit .env:
LLM_DISABLE_BUDGETS=1  # Disable budget checks
Restart:
docker compose restart orchestrator llm-service

Performance Issues

Slow Response Times

Symptoms:
  • Tasks take 2-3x longer than expected
  • High latency
  • Timeouts
Diagnosis:
# Check resource usage
docker stats

# Check for slow queries
docker compose logs postgres | grep "duration:"

# Check Redis latency
docker compose exec redis redis-cli --latency

# Check Qdrant performance
curl http://localhost:6333/metrics
Check resources:
docker stats
# Look for CPU > 80% or Memory near limit
Increase Docker resources:
  • macOS: Docker Desktop → Resources → increase RAM to 16GB, CPUs to 6
  • Linux: More powerful machine or reduce concurrent workflows
Tune worker concurrency in .env:
WORKER_ACT_CRITICAL=5   # Reduce from 10
WORKER_WF_CRITICAL=3     # Reduce from 5
TOOL_PARALLELISM=2       # Reduce from 5
First request is always slower (10-30s). Subsequent requests use caching:
  • LLM response cache (Redis)
  • Session context cache
  • Tool result cache
Solution: Warm up with a test request
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{"query": "Hello"}'
Increase pool size in .env:
DB_MAX_OPEN_CONNS=50    # Increase from 25
DB_MAX_IDLE_CONNS=10    # Increase from 5
Restart:
docker compose restart orchestrator

Tokens > 0 but empty result

Symptoms:
  • Database or logs show non‑zero completion tokens, but the final result text is empty.
  • Complex prompts return nothing while simple prompts work.
Cause:
  • Some GPT‑5 chat responses return content as structured parts instead of a plain string. Older parsing could miss the text. This is fixed by routing GPT‑5 models via the Responses API and defensively normalizing content for chat responses.
Fix (Shannon ≥ 2025‑11‑05):
  • LLM Service routes GPT‑5 models to the Responses API and prefers output_text when available.
  • Chat providers normalize content by joining text parts when a list is returned.
  • If you upgraded from an older build, restart the LLM Service to clear cached empty responses.
Verify:
  • Re‑run a long, multi‑paragraph prompt; the result length should be > 0 and the session history should include the assistant message.
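The symptom itself (tokens counted but nothing returned) is easy to test for in a monitoring script. A sketch, where the field names `completion_tokens` and `result` are illustrative; map them to whatever your task record actually exposes:

```python
# Flag task records that burned completion tokens but produced no text.
def has_empty_result_bug(task: dict) -> bool:
    tokens = task.get("completion_tokens", 0) or 0
    result = (task.get("result") or "").strip()
    return tokens > 0 and len(result) == 0

if __name__ == "__main__":
    print(has_empty_result_bug({"completion_tokens": 512, "result": ""}))    # True
    print(has_empty_result_bug({"completion_tokens": 512, "result": "ok"}))  # False
```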

High Memory Usage

Symptoms:
  • OOM (Out of Memory) errors
  • Container restarts
  • Swap usage high
Diagnosis:
docker stats

# Check session cache size
docker compose logs orchestrator | grep "session.*cache"
Edit config/shannon.yaml or set env vars:
# Reduce session cache
SESSION_CACHE_SIZE=5000  # From 10000

# Reduce history
SESSION_MAX_HISTORY=250  # From 500

# Reduce LRU caches
TOOL_CACHE_SIZE=1000     # From 5000
Restart:
docker compose restart orchestrator agent-core

Data & State Issues

Sessions Not Persisting

Symptoms:
  • Session context lost between requests
  • Agent doesn’t remember previous tasks
Diagnosis:
# Check Redis connectivity
docker compose exec orchestrator nc -zv redis 6379

# Check session data
docker compose exec redis redis-cli KEYS "session:*"
Check Redis status:
docker compose ps redis
docker compose logs redis --tail=20
Restart Redis:
docker compose restart redis
Test connection:
docker compose exec redis redis-cli ping
# Should return "PONG"
Sessions expire after 30 days by default. Increase TTL in .env:
REDIS_TTL_SECONDS=7776000  # 90 days
Check session expiry:
docker compose exec redis redis-cli TTL "session:YOUR_SESSION_ID"
# Returns seconds until expiry, or -1 for no expiry
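To sanity-check TTL values like the 90-day setting above, a tiny converter helps interpret what the Redis TTL command returns:

```python
# Convert a Redis TTL (seconds) into something readable.
def ttl_days(seconds: int) -> float:
    return seconds / 86400  # 60 * 60 * 24

def describe_ttl(ttl: int) -> str:
    """Interpret the return value of the Redis TTL command."""
    if ttl == -1:
        return "no expiry"
    if ttl == -2:
        return "key does not exist"
    return f"expires in {ttl_days(ttl):.1f} days"

if __name__ == "__main__":
    print(ttl_days(7776000))  # 90.0
    print(describe_ttl(-1))   # no expiry
```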
Provide a stable session_id explicitly:
session_id = "user-123-conversation"

handle1 = client.submit_task("Load data", session_id=session_id)
handle2 = client.submit_task("Analyze data", session_id=session_id)

Database Migration Errors

Symptoms:
  • Table doesn’t exist errors
  • Column not found errors
  • Schema version mismatch
Solution:
# Run migrations
docker compose exec orchestrator make migrate

# Or reset database (⚠️ DESTRUCTIVE)
docker compose down -v  # Remove volumes
docker compose up -d

Debugging Tools

Viewing Logs

# All services
docker compose logs -f

# Specific service
docker compose logs -f orchestrator
docker compose logs -f agent-core
docker compose logs -f llm-service

# Last N lines
docker compose logs --tail=100 orchestrator

# Search logs
docker compose logs orchestrator | grep -i error
docker compose logs orchestrator | grep "task_id=YOUR_TASK_ID"

Temporal UI

Access: http://localhost:8088
Features:
  • View all workflows
  • See execution history
  • Replay failed workflows
  • Terminate stuck workflows
  • Time-travel debugging
Usage:
  1. Navigate to Workflows
  2. Search by workflow ID (task ID)
  3. View execution history to see where it failed
  4. Check Activity logs for detailed errors

Prometheus Metrics

# Orchestrator metrics
curl http://localhost:2112/metrics

# Agent Core metrics
curl http://localhost:2113/metrics

# LLM Service metrics
curl http://localhost:8000/metrics
Key metrics:
  • tasks_submitted_total
  • tasks_completed_total
  • tasks_failed_total
  • llm_requests_total
  • circuit_breaker_state
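The key metrics above can be pulled out of the Prometheus text format with a minimal parser. This is a sketch: it ignores comment lines, sums across label variants, and does not handle optional trailing timestamps:

```python
# Extract Shannon's key counters from Prometheus text-format output.
KEY_METRICS = {
    "tasks_submitted_total", "tasks_completed_total",
    "tasks_failed_total", "llm_requests_total", "circuit_breaker_state",
}

def parse_key_metrics(text: str) -> dict:
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blanks
        name_part, _, value = line.rpartition(" ")
        base = name_part.split("{", 1)[0]  # strip labels, if any
        if base in KEY_METRICS:
            try:
                out[base] = out.get(base, 0.0) + float(value)
            except ValueError:
                pass
    return out

if __name__ == "__main__":
    from urllib.request import urlopen
    from urllib.error import URLError
    try:
        with urlopen("http://localhost:2112/metrics", timeout=2) as resp:
            print(parse_key_metrics(resp.read().decode()))
    except (URLError, OSError):
        print("orchestrator metrics endpoint unreachable")
```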

Real-time Dashboard

Access: http://localhost:2111
Features:
  • Live task execution
  • Event streams
  • Token usage graphs
  • System health

Getting Help

Quick Reference Commands

# Health checks
curl http://localhost:8080/health
curl http://localhost:8000/health

# Service status
docker compose ps
docker stats

# Restart services
docker compose restart orchestrator
docker compose restart agent-core
docker compose restart llm-service

# View logs
docker compose logs -f orchestrator

# Full reset
docker compose down -v
docker compose up -d

# Database access
docker compose exec postgres psql -U shannon -d shannon

# Redis CLI
docker compose exec redis redis-cli

# Check environment
docker compose exec orchestrator env | grep -E "OPENAI|ANTHROPIC"