Overview
Shannon provides comprehensive cost control features to prevent unexpected LLM charges and optimize spending. With built-in budget enforcement and intelligent routing, teams often see significant cost reductions (60–90%, workload-dependent) versus naive implementations.

Setting Budgets
Budgets are configured at the platform level (not per request via REST). Use environment variables in .env:
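A minimal .env fragment might look like the following, using the MAX_COST_PER_REQUEST and MAX_TOKENS_PER_REQUEST variables named in the checklist at the end of this page; the values shown are illustrative, not recommended defaults:

```shell
# .env — budget caps enforced during execution (values illustrative)
MAX_COST_PER_REQUEST=0.50      # USD cap per request
MAX_TOKENS_PER_REQUEST=100000  # token cap per request
```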
Shannon enforces budgets during execution. When limits are reached, the system halts further spending and returns either the best available result or an error, depending on context.
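The enforcement behavior can be sketched as a simple spend tracker. All names here are illustrative, not Shannon's internal API:

```python
# Minimal sketch of budget enforcement: stop spending once the
# per-request cost cap is reached and keep the best result so far.

class BudgetExceeded(Exception):
    pass

class BudgetTracker:
    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent = 0.0
        self.best_result = None

    def record(self, cost: float, result: str) -> None:
        self.spent += cost
        self.best_result = result
        if self.spent >= self.max_cost_usd:
            # Halt further spending; the caller returns best_result or an error.
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.max_cost_usd:.2f}")

tracker = BudgetTracker(max_cost_usd=0.10)
try:
    for step_cost, draft in [(0.03, "draft-1"), (0.04, "draft-2"), (0.05, "draft-3")]:
        tracker.record(step_cost, draft)
except BudgetExceeded:
    pass
print(tracker.best_result)  # the last result produced before the cap was hit
```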
Model Tiers
Shannon categorizes models into tiers based on capability and cost:

| Tier | Models | Cost per 1M tokens | Use Case |
|---|---|---|---|
| SMALL | gpt-5-mini, claude-haiku | $0.25 | Simple queries, high volume |
| MEDIUM | gpt-5, claude-sonnet | $15.00 | General purpose tasks |
| LARGE | gpt-5-thinking, claude-opus | $75.00 | Complex reasoning, critical tasks |
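Using the per-tier prices from the table above, per-request cost can be estimated as follows (a sketch; the prices come from the table, the function itself is illustrative):

```python
# Cost per 1M tokens for each tier, from the table above.
TIER_PRICE_PER_MTOK = {"SMALL": 0.25, "MEDIUM": 15.00, "LARGE": 75.00}

def estimate_cost(tier: str, tokens: int) -> float:
    """Estimated USD cost for `tokens` tokens at the given tier."""
    return TIER_PRICE_PER_MTOK[tier] * tokens / 1_000_000

# Routing a 100k-token workload to SMALL instead of LARGE:
small = estimate_cost("SMALL", 100_000)   # $0.025
large = estimate_cost("LARGE", 100_000)   # $7.50
print(f"${small:.3f} vs ${large:.2f}")
```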
Explicit Tier Preference
Set a preferred default tier via environment variable.

Intelligent Router
Shannon’s learning router automatically selects the cheapest model capable of handling each task.

How It Works
- Task Analysis: Analyzes complexity, required capabilities
- Model Selection: Starts with smallest viable model
- Quality Check: Validates output quality
- Learning: Remembers successful model-task pairings
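The four steps above can be sketched as an escalation loop; the names, quality check, and memory of pairings here are illustrative stand-ins for Shannon's actual learning logic:

```python
# Sketch of escalate-on-failure routing: try the cheapest tier first,
# validate the output, and remember which tier worked for each task kind.
TIERS = ["SMALL", "MEDIUM", "LARGE"]  # cheapest to most capable
learned = {}  # task_kind -> tier that previously succeeded

def route(task_kind, run, good_enough):
    # Start from a remembered tier if we have one, else the smallest.
    start = TIERS.index(learned.get(task_kind, "SMALL"))
    for tier in TIERS[start:]:
        output = run(tier)
        if good_enough(output):
            learned[task_kind] = tier  # remember the successful pairing
            return tier, output
    return TIERS[-1], output  # best effort at the largest tier

# Toy example: a task that only MEDIUM or above handles well.
tier, out = route(
    "summarize",
    run=lambda t: f"{t}-answer",
    good_enough=lambda o: o.startswith(("MEDIUM", "LARGE")),
)
print(tier)     # MEDIUM
print(learned)  # {'summarize': 'MEDIUM'} — reused on the next call
```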
Cost Savings
Monitoring Router Decisions
Use the dashboard or SDK status to review costs.

Response Caching
Shannon caches LLM responses to eliminate redundant API calls.

Cache Strategy
- Key: SHA-256 hash of (messages + model + parameters)
- TTL: Configurable (often ~1 hour via Redis TTL)
- Storage: In-memory LRU + optional Redis for distributed caching
- Hit Rate: Typically 30-50% for production workloads
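The cache key described above can be sketched like this; the canonical JSON serialization is an assumption, not Shannon's documented scheme:

```python
import hashlib
import json

def cache_key(messages, model, params):
    """SHA-256 over (messages + model + parameters), serialized
    canonically so logically identical requests hash identically."""
    payload = json.dumps(
        {"messages": messages, "model": model, "params": params},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = cache_key([{"role": "user", "content": "hi"}], "gpt-5-mini", {"temperature": 0})
k2 = cache_key([{"role": "user", "content": "hi"}], "gpt-5-mini", {"temperature": 0})
print(k1 == k2)  # True — identical requests hit the same cache entry
```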
Cache Benefits
Monitoring Cache Performance
Provider Rate Limits
Shannon respects provider rate limits automatically.

Configured Limits
From config/models.yaml:
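A fragment of config/models.yaml might look like the following; the key names and values are hypothetical, not Shannon's actual schema:

```yaml
# Illustrative only — actual keys in config/models.yaml may differ.
providers:
  openai:
    requests_per_minute: 500
    tokens_per_minute: 200000
  anthropic:
    requests_per_minute: 300
    tokens_per_minute: 100000
```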
Automatic Throttling
When approaching limits:

- Queues requests
- Spreads load over time
- Falls back to alternative providers if available
Cost Monitoring
Track Spending Per Task
Aggregate Metrics
Shannon tracks cumulative costs in the dashboard:

- Total spend by day/week/month
- Cost per user/team
- Cost per cognitive pattern
- Token usage trends
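Aggregations like the dashboard's day-by-day and per-user totals can be sketched as follows; the record shape is illustrative:

```python
from collections import defaultdict

# Illustrative task cost records (costs in cents to avoid float drift);
# the dashboard aggregates similar data by day, user, and pattern.
records = [
    {"day": "2025-01-01", "user": "alice", "cost_cents": 12},
    {"day": "2025-01-01", "user": "bob",   "cost_cents": 30},
    {"day": "2025-01-02", "user": "alice", "cost_cents": 5},
]

def total_by(records, key):
    """Sum cost per distinct value of `key` (e.g. 'day' or 'user')."""
    totals = defaultdict(int)
    for r in records:
        totals[r[key]] += r["cost_cents"]
    return dict(totals)

print(total_by(records, "day"))   # {'2025-01-01': 42, '2025-01-02': 5}
print(total_by(records, "user"))  # {'alice': 17, 'bob': 30}
```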
Best Practices
1. Always Set Budgets
Never run production tasks without budget limits.

2. Use Simple Mode When Possible
Complex patterns cost more.

3. Leverage Caching
For repeated queries, use consistent phrasing to maximize cache hits.

4. Monitor and Optimize
Review cost metrics regularly.

5. Use Smaller Models First
Let the intelligent router prove when larger models are needed.

Cost Optimization Checklist
- Set budget env vars (MAX_COST_PER_REQUEST, MAX_TOKENS_PER_REQUEST)
- Use auto-selected mode for straightforward queries
- Enable response caching (default: enabled)
- Use model_tier="SMALL" when appropriate
- Standardize query phrasing for cache hits
- Monitor cost metrics in dashboard
- Set up budget alerts (via Prometheus)
- Review and optimize prompt templates
- Use session context to reduce token usage
- Enable learning router (default: enabled)