System Architecture

Shannon is built as a distributed microservices system designed for production AI agent orchestration:

Core Components

Gateway (Port 8080)

Technology: Go
Purpose: REST API layer for external clients

The Gateway provides:
  • HTTP/JSON API interface
  • Authentication and authorization (API keys)
  • Rate limiting per user
  • Idempotency support
  • SSE and WebSocket streaming
  • OpenAPI specification
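
The Gateway's idempotency support can be illustrated with a small sketch. This is not Shannon's implementation; it is a hypothetical in-memory version of the standard pattern: a repeated request with the same idempotency key returns the stored response instead of re-executing the task.

```python
# Hypothetical sketch of gateway-side idempotency handling. All names
# here are illustrative, not Shannon's actual API.

class IdempotencyStore:
    def __init__(self):
        self._responses = {}

    def handle(self, idempotency_key, execute):
        """Return the cached response for a key, or execute and cache it."""
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        response = execute()
        self._responses[idempotency_key] = response
        return response

calls = []
def submit_task():
    calls.append(1)  # track how many times the task actually runs
    return {"task_id": "t-1", "status": "queued"}

store = IdempotencyStore()
first = store.handle("key-123", submit_task)
second = store.handle("key-123", submit_task)  # served from cache, task runs once
```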

Key Feature

Authentication is disabled by default for easy adoption. Enable it in production with GATEWAY_SKIP_AUTH=0.

Orchestrator (Port 50052)

Technology: Go + Temporal
Purpose: Central workflow coordination

The Orchestrator handles:
  • Task routing and decomposition
  • Cognitive pattern selection (CoT, ToT, ReAct)
  • Budget and token usage enforcement
  • Session management
  • OPA policy evaluation
  • Multi-agent coordination
Key Technology: Temporal provides durable, deterministic workflows that can be replayed for debugging.
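
To give a feel for cognitive pattern selection, here is a deliberately simplified heuristic. Shannon's actual complexity analysis is internal to the Orchestrator; the thresholds and signals below are purely illustrative.

```python
# Illustrative heuristic: map a task to a cognitive pattern.
# Real selection logic is more sophisticated; this only shows the idea.

def select_pattern(task: str, needs_tools: bool = False) -> str:
    if needs_tools:
        return "react"  # ReAct: interleave reasoning with tool calls
    if len(task.split()) < 20:
        return "cot"    # chain-of-thought for short, simple tasks
    return "tot"        # tree-of-thoughts for longer, complex tasks
```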

Agent Core (Port 50051)

Technology: Rust
Purpose: Secure execution layer

The Agent Core provides:
  • WASI (WebAssembly System Interface) sandboxing
  • Secure Python code execution (CPython 3.11 in WASI)
  • Tool registry and execution
  • Result caching (LRU with TTL)
  • Circuit breakers and rate limiting

Security First

All code execution happens in a WASI sandbox with no network access and read-only filesystem.
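
The result caching described above (LRU with TTL) can be sketched in a few lines. This is a generic version of the scheme, not the Agent Core's Rust implementation; capacity and TTL values are illustrative.

```python
import time
from collections import OrderedDict

# Minimal LRU cache with per-entry TTL: entries expire after `ttl`
# seconds, and the least recently used entry is evicted at capacity.

class LRUCacheTTL:
    def __init__(self, capacity=128, ttl=60.0):
        self.capacity = capacity
        self.ttl = ttl
        self._data = OrderedDict()  # key -> (value, expiry timestamp)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expiry = item
        if time.monotonic() > expiry:
            del self._data[key]      # entry expired
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, time.monotonic() + self.ttl)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```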

LLM Service (Port 8000)

Technology: Python + FastAPI
Purpose: Multi-provider LLM gateway

The LLM Service handles:
  • Multi-provider abstraction (OpenAI, Anthropic, Google, etc.)
  • Intelligent caching with SHA256-based deduplication
  • MCP (Model Context Protocol) tool integration
  • Web search integration (Exa, Perplexity, etc.)
  • Embeddings and document chunking
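
SHA256-based deduplication works by hashing a normalized form of the request into a cache key, so identical requests hit the cache regardless of field ordering. The field names below are illustrative, not the service's actual schema.

```python
import hashlib
import json

# Sketch of SHA256-based request deduplication: serialize the request
# deterministically (sorted keys, no whitespace) and hash it.

def cache_key(model, messages, temperature=0.0):
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```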

Data Flow

Here’s how a task flows through Shannon:
1. Task Submission: Client submits a task via the REST API to the Gateway
2. Workflow Creation: Gateway forwards the task to the Orchestrator, which creates a Temporal workflow
3. Pattern Selection: Orchestrator analyzes task complexity and selects a cognitive pattern
4. Task Decomposition: For complex tasks, the Orchestrator breaks the work into subtasks and creates a DAG (Directed Acyclic Graph)
5. Agent Execution: Orchestrator invokes the Agent Core for each subtask
6. LLM Calls: Agent Core calls the LLM Service, which routes to the appropriate provider
7. Tool Execution: If needed, tools run in the WASI sandbox or call external APIs
8. Result Synthesis: Orchestrator combines results from all agents
9. Response: Final result is returned to the client via the Gateway
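
Steps 4-5 hinge on the subtask DAG: a subtask runs only after all of its dependencies complete. The sketch below uses Python's standard-library topological sorter on an illustrative task graph to show that ordering constraint.

```python
from graphlib import TopologicalSorter

# Illustrative subtask DAG: "analyze" depends on both fetches,
# "report" depends on "analyze". Each key maps to its dependencies.
dag = {
    "fetch_a": set(),
    "fetch_b": set(),
    "analyze": {"fetch_a", "fetch_b"},
    "report":  {"analyze"},
}

# A valid execution order: every dependency precedes its dependents.
order = list(TopologicalSorter(dag).static_order())
```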

Persistence Layer

PostgreSQL

Stores:
  • Task metadata and execution history
  • Session state and context
  • User and API key data
  • Workflow history
Schema: Includes pgvector extension for vector operations

Redis

Provides:
  • Session caching (TTL: 3600s)
  • LLM response caching
  • Rate limiter state
  • Pub/sub for events

Qdrant

Vector database for:
  • Semantic memory retrieval
  • Session-scoped vector collections
  • MMR (Maximal Marginal Relevance) for diversity
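
MMR balances relevance against redundancy: at each step it selects the candidate that is most relevant to the query while penalizing similarity to items already chosen. Here is a plain-Python sketch of the algorithm; the vectors and the lambda weight are illustrative, and real systems run this over Qdrant search results.

```python
# Maximal Marginal Relevance: score = lam * relevance - (1 - lam) * redundancy.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def mmr(query, candidates, k=2, lam=0.5):
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            relevance = cosine(query, c)
            redundancy = max((cosine(c, s) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```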

Observability

Shannon includes comprehensive observability:

Metrics (Prometheus)

Each service exposes metrics:
  • Orchestrator: :2112/metrics
  • Agent Core: :2113/metrics
  • LLM Service: :8000/metrics
Metrics include:
  • Request rates and latency
  • Token usage and costs
  • Cache hit/miss rates
  • Error rates by type
  • Circuit breaker status

Tracing (OpenTelemetry)

Distributed tracing across all services with context propagation via traceparent headers.
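
A `traceparent` header follows the W3C Trace Context format `version-traceid-spanid-flags`. The sketch below parses one; field lengths come from the spec (32 hex chars for the trace ID, 16 for the span ID).

```python
# Parse a W3C traceparent header, e.g.
# "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01".

def parse_traceparent(header: str):
    parts = header.split("-")
    if len(parts) != 4:
        raise ValueError("malformed traceparent")
    version, trace_id, span_id, flags = parts
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": span_id,
        "sampled": int(flags, 16) & 0x01 == 1,  # low bit = sampled flag
    }
```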

Dashboard (Port 2111)

Real-time React/Next.js dashboard showing:
  • Active tasks and workflows
  • Event streams
  • System metrics
  • Token usage graphs

Temporal UI (Port 8088)

Native Temporal interface for:
  • Workflow visualization
  • Execution history
  • Replay debugging
  • Worker status

Design Principles

1. Reliability

  • Temporal workflows ensure durability - workflows survive service restarts
  • Circuit breakers prevent cascading failures
  • Graceful degradation when services are unavailable
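
The circuit-breaker idea can be reduced to a small sketch: after a threshold of consecutive failures the breaker opens and rejects calls outright, so a failing dependency stops absorbing traffic. Production breakers (including Shannon's, in Go and Rust) also add a half-open state and reset timeouts, omitted here for brevity.

```python
# Minimal circuit breaker: open after `threshold` consecutive failures.

class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0
        self.open = False

    def call(self, fn):
        if self.open:
            raise RuntimeError("circuit open")  # fail fast, protect dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True
            raise
        self.failures = 0  # any success resets the failure count
        return result
```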

2. Security

  • WASI sandboxing isolates untrusted code execution
  • OPA policies enforce fine-grained access control
  • Multi-tenancy with tenant isolation

3. Cost Control

  • Token budgets prevent runaway costs
  • Intelligent routing to cheaper models when appropriate
  • Learning router improves cost efficiency over time (85-95% savings)
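
Token budget enforcement amounts to reserving tokens before each LLM call and rejecting calls that would exceed the limit. A minimal sketch, with an illustrative limit; Shannon's actual enforcement lives in the Orchestrator:

```python
# Per-task token budget: reserve before each call, reject on overrun.

class TokenBudget:
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def reserve(self, tokens):
        if self.used + tokens > self.limit:
            raise RuntimeError("token budget exceeded")
        self.used += tokens

    @property
    def remaining(self):
        return self.limit - self.used
```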

4. Observability

  • Prometheus metrics for monitoring
  • OpenTelemetry tracing for debugging
  • Deterministic replay via Temporal

Scalability

Shannon scales horizontally:
  • Stateless services: Gateway, Orchestrator, Agent Core can scale independently
  • Temporal workers: Add more workers to increase throughput
  • Database: PostgreSQL with read replicas, Redis cluster, Qdrant distributed mode

Next Steps