System Architecture
Shannon is built as a distributed microservices system designed for production AI agent orchestration.

Core Components
Gateway (Port 8080)
Technology: Go
Purpose: REST API layer for external clients

The Gateway provides:
- HTTP/JSON API interface
- Authentication and authorization (API keys)
- Rate limiting per user
- Idempotency support
- SSE and WebSocket streaming
- OpenAPI specification
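
Per-user rate limiting of the kind the Gateway applies is commonly built as a token bucket. A minimal Python sketch of the idea (the class, rate, and capacity here are illustrative, not Shannon's actual Go implementation):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `capacity` requests,
    refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# In a gateway there would be one bucket per user, keyed by API key.
bucket = TokenBucket(rate=5.0, capacity=10)
results = [bucket.allow() for _ in range(12)]
print(results.count(True))  # the burst exhausts the bucket after 10 requests
```

A real gateway would also need to expire idle buckets and share limiter state across replicas (which is why Shannon keeps rate-limiter state in Redis, per the Persistence Layer below).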
Key Feature
Authentication is disabled by default for easy adoption. Enable it in production with GATEWAY_SKIP_AUTH=0.

Orchestrator (Port 50052)
Technology: Go + Temporal
Purpose: Central workflow coordination

The Orchestrator handles:
- Task routing and decomposition
- Cognitive pattern selection (CoT, ToT, ReAct)
- Budget and token usage enforcement
- Session management
- OPA policy evaluation
- Multi-agent coordination
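
To make pattern selection concrete, here is a toy heuristic in Python; the keywords and thresholds are invented for illustration, and Shannon's real Orchestrator analyzes task complexity far more thoroughly:

```python
def select_pattern(task: str) -> str:
    """Toy cognitive-pattern selector (illustrative only):
    - ReAct for tasks that need tools or external actions
    - ToT (Tree of Thoughts) for open-ended exploration
    - CoT (Chain of Thought) as the default for stepwise reasoning
    """
    text = task.lower()
    if any(kw in text for kw in ("search", "fetch", "browse", "look up")):
        return "react"
    if any(kw in text for kw in ("brainstorm", "design", "explore")):
        return "tree_of_thoughts"
    return "chain_of_thought"

print(select_pattern("Search the web for recent papers"))  # react
print(select_pattern("Brainstorm names for a product"))    # tree_of_thoughts
print(select_pattern("Summarize this document"))           # chain_of_thought
```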
Agent Core (Port 50051)
Technology: Rust
Purpose: Secure execution layer

The Agent Core provides:
- WASI (WebAssembly System Interface) sandboxing
- Secure Python code execution (CPython 3.11 in WASI)
- Secure Python code execution (CPython 3.11 in WASI)
- Tool registry and execution
- Result caching (LRU with TTL)
- Circuit breakers and rate limiting
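
The result cache combines two eviction rules: least-recently-used entries go when the cache is full, and any entry expires after its TTL. A Python sketch of that combination (the Agent Core itself is Rust, and its field names and sizes differ):

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """LRU cache with per-entry TTL: evicts the least recently used
    entry when full, and drops expired entries lazily on lookup."""

    def __init__(self, maxsize: int = 128, ttl: float = 60.0):
        self.maxsize = maxsize
        self.ttl = ttl
        self._data: OrderedDict = OrderedDict()  # key -> (value, expiry)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expiry = item
        if time.monotonic() > expiry:   # expired: evict lazily
            del self._data[key]
            return None
        self._data.move_to_end(key)     # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, time.monotonic() + self.ttl)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUTTLCache(maxsize=2, ttl=60.0)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a", so "b" becomes least recently used
cache.put("c", 3)      # over capacity: evicts "b"
print(cache.get("b"))  # None
```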
Security First
All code execution happens in a WASI sandbox with no network access and a read-only filesystem.
LLM Service (Port 8000)
Technology: Python + FastAPI
Purpose: Multi-provider LLM gateway

The LLM Service handles:
- Multi-provider abstraction (OpenAI, Anthropic, Google, etc.)
- Intelligent caching with SHA256-based deduplication
- MCP (Model Context Protocol) tool integration
- Web search integration (Exa, Perplexity, etc.)
- Embeddings and document chunking
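
SHA256-based deduplication means identical requests hash to the same cache key. A minimal sketch: hash a canonical JSON encoding of everything that affects the completion (the exact fields Shannon includes in the hash are an assumption here):

```python
import hashlib
import json

def cache_key(provider: str, model: str, messages: list,
              temperature: float = 0.0) -> str:
    """Derive a deterministic cache key for an LLM request by hashing
    a canonical (sorted-keys, no-whitespace) JSON encoding of it."""
    payload = json.dumps(
        {"provider": provider, "model": model,
         "messages": messages, "temperature": temperature},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = cache_key("openai", "gpt-4o", [{"role": "user", "content": "hi"}])
k2 = cache_key("openai", "gpt-4o", [{"role": "user", "content": "hi"}])
k3 = cache_key("openai", "gpt-4o", [{"role": "user", "content": "hello"}])
print(k1 == k2, k1 == k3)  # True False
```

Canonical encoding (sorted keys, fixed separators) matters: without it, two semantically identical requests could serialize differently and miss the cache.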
Data Flow
Here’s how a task flows through Shannon:

1. Task Submission: The client submits a task via the REST API to the Gateway.
2. Workflow Creation: The Gateway forwards the task to the Orchestrator, which creates a Temporal workflow.
3. Pattern Selection: The Orchestrator analyzes task complexity and selects a cognitive pattern.
4. Task Decomposition: For complex tasks, the Orchestrator breaks the work into subtasks and builds a DAG (Directed Acyclic Graph).
5. Agent Execution: The Orchestrator invokes the Agent Core for each subtask.
6. LLM Calls: The Agent Core calls the LLM Service, which routes the request to the appropriate provider.
7. Tool Execution: If needed, tools run in the WASI sandbox or call external APIs.
8. Result Synthesis: The Orchestrator combines results from all agents.
9. Response: The final result is returned to the client via the Gateway.
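
The DAG from the decomposition step drives scheduling: subtasks with no unmet dependencies can run in parallel, and synthesis waits for all of them. A sketch using Python's stdlib topological sorter (the subtask names are hypothetical):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical decomposition: each subtask maps to its dependencies.
dag = {
    "decompose": set(),
    "analyze_a": {"decompose"},
    "analyze_b": {"decompose"},
    "synthesize": {"analyze_a", "analyze_b"},  # waits for all agents
}

batches = []
ts = TopologicalSorter(dag)
ts.prepare()
while ts.is_active():
    ready = sorted(ts.get_ready())  # these subtasks may run in parallel
    batches.append(ready)
    ts.done(*ready)

print(batches)
# [['decompose'], ['analyze_a', 'analyze_b'], ['synthesize']]
```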
Persistence Layer
PostgreSQL
Stores:
- Task metadata and execution history
- Session state and context
- User and API key data
- Workflow history
Redis
Provides:
- Session caching (TTL: 3600s)
- LLM response caching
- Rate limiter state
- Pub/sub for events
Qdrant
Vector database for:
- Semantic memory retrieval
- Session-scoped vector collections
- MMR (Maximal Marginal Relevance) for diversity
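
MMR re-ranks retrieved vectors to balance relevance against redundancy: each pick maximizes similarity to the query minus similarity to what was already picked. A self-contained sketch (the similarity matrices are made-up toy values):

```python
def mmr(query_sim, doc_sims, k=2, lam=0.7):
    """Maximal Marginal Relevance: greedily select k documents that
    trade off query relevance (weight lam) against similarity to
    already-selected documents (weight 1 - lam).
    query_sim[i]   = similarity(query, doc_i)
    doc_sims[i][j] = similarity(doc_i, doc_j)
    """
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda i: lam * query_sim[i]
            - (1 - lam) * max((doc_sims[i][j] for j in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates; MMR picks 0, then the diverse doc 2
# instead of the slightly more relevant but redundant doc 1.
query_sim = [0.9, 0.85, 0.6]
doc_sims = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
print(mmr(query_sim, doc_sims, k=2))  # [0, 2]
```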
Observability
Shannon includes comprehensive observability.

Metrics (Prometheus)
Each service exposes metrics:
- Orchestrator: :2112/metrics
- Agent Core: :2113/metrics
- LLM Service: :8000/metrics

Exposed metrics include:
- Request rates and latency
- Token usage and costs
- Cache hit/miss rates
- Error rates by type
- Circuit breaker status
Tracing (OpenTelemetry)
Distributed tracing across all services with context propagation via traceparent headers.
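
The traceparent header format is defined by the W3C Trace Context standard: version, 32-hex-digit trace ID, 16-hex-digit parent span ID, and flags, joined by dashes. A minimal parser (the trace values below are the example from the spec, not a Shannon trace):

```python
def parse_traceparent(header: str) -> dict:
    """Parse a W3C traceparent header: version-traceid-parentid-flags."""
    version, trace_id, parent_id, flags = header.split("-")
    assert len(trace_id) == 32 and len(parent_id) == 16
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": int(flags, 16) & 0x01 == 1,  # bit 0 = sampled flag
    }

ctx = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
print(ctx["trace_id"], ctx["sampled"])
```

Because every service forwards this header, spans emitted in the Gateway, Orchestrator, Agent Core, and LLM Service all join the same trace.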
Dashboard (Port 2111)
Real-time React/Next.js dashboard showing:
- Active tasks and workflows
- Event streams
- System metrics
- Token usage graphs
Temporal UI (Port 8088)
Native Temporal interface for:
- Workflow visualization
- Execution history
- Replay debugging
- Worker status
Design Principles
1. Reliability
- Temporal workflows ensure durability: workflows survive service restarts
- Circuit breakers prevent cascading failures
- Graceful degradation when services are unavailable
2. Security
- WASI sandboxing isolates untrusted code execution
- OPA policies enforce fine-grained access control
- Multi-tenancy with tenant isolation
3. Cost Control
- Token budgets prevent runaway costs
- Intelligent routing to cheaper models when appropriate
- Learning router improves cost efficiency over time (85-95% savings)
4. Observability
- Prometheus metrics for monitoring
- OpenTelemetry tracing for debugging
- Deterministic replay via Temporal
Scalability
Shannon scales horizontally:
- Stateless services: Gateway, Orchestrator, and Agent Core can scale independently
- Temporal workers: Add more workers to increase throughput
- Database: PostgreSQL with read replicas, Redis cluster, Qdrant distributed mode