## Overview
Shannon automatically selects the optimal LLM model for each task based on:

- Task complexity (analyzed during decomposition)
- Explicit tier requests (`model_tier` parameter)
- Model/provider overrides (`model_override`, `provider_override`)
- Priority rankings (defined in `config/models.yaml`)
- Budget constraints and token limits
## Model Tiers

Shannon organizes models into three tiers:

| Tier | Target Usage | Characteristics | Cost Range |
|---|---|---|---|
| Small | 50% | Fast, cost-optimized, basic reasoning | $0.0001-0.0002/1K input |
| Medium | 40% | Balanced capability/cost | $0.002-0.006/1K input |
| Large | 10% | Heavy reasoning, complex tasks | $0.02-0.025/1K input |
## Selection Flow
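In outline: explicit overrides are checked first (see Parameter Precedence below); otherwise the complexity score from decomposition picks a tier, and models within that tier are attempted in priority order, falling back on failure.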
## Priority Ranking
Within each tier, models are ranked by priority (lower number = higher priority). Shannon attempts models in priority order until one succeeds. Example from `config/models.yaml`:
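(Schema sketch only: the top-level keys, field names, and provider identifiers below are assumptions and may differ from your file; the model names and priorities are the ones cited on this page.)

```yaml
model_tiers:
  small:
    - model: gpt-5-nano-2025-08-07       # priority 1: tried first
      provider: openai                   # provider identifier is an assumption
      priority: 1
    - model: claude-haiku-4-5-20251001   # priority 2: fallback
      provider: anthropic
      priority: 2
  large:
    - model: gpt-4.1-2025-04-14          # priority 1 in the large tier
      provider: openai
      priority: 1
```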
- If priority 1 fails (rate limit, API error), Shannon tries priority 2
- Continues until a model succeeds or all options are exhausted
- Failures are logged to orchestrator logs
## Parameter Precedence
When multiple parameters specify model selection, the precedence is:

1. `model_override` (highest priority) → forces a specific model
2. `provider_override` → limits selection to one provider's models
3. `model_tier` → uses the requested tier
4. Auto-detected complexity (lowest priority) → default behavior
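A minimal sketch of this resolution order (illustrative only, not Shannon's actual code; the catalog and provider strings are stand-ins built from the model names on this page):

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    provider: str
    tier: str
    priority: int

# Stand-in catalog using the model names cited on this page
CATALOG = [
    Model("gpt-5-nano-2025-08-07", "openai", "small", 1),
    Model("claude-haiku-4-5-20251001", "anthropic", "small", 2),
    Model("gpt-4.1-2025-04-14", "openai", "large", 1),
]

def tier_for(score: float) -> str:
    # Thresholds from the Complexity Analysis section below
    if score < 0.3:
        return "small"
    return "medium" if score <= 0.7 else "large"

def select_model(params: dict, complexity: float) -> str:
    if params.get("model_override"):           # 1. explicit model wins outright
        return params["model_override"]
    tier = params.get("model_tier") or tier_for(complexity)  # 3. tier, else 4. complexity
    candidates = [m for m in CATALOG if m.tier == tier]
    if params.get("provider_override"):        # 2. restrict to one provider's models
        candidates = [m for m in candidates if m.provider == params["provider_override"]]
    candidates.sort(key=lambda m: m.priority)  # lower number = higher priority
    return candidates[0].name                  # per-model fallback loop omitted

print(select_model({"model_tier": "large"}, complexity=0.2))  # -> gpt-4.1-2025-04-14
```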
## Top-Level vs Context Parameters
Top-level parameters always override context parameters.
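For example (hypothetical request payload; the parameter names follow this page, the placement is the point):

```python
# model_tier appears both at the top level and inside context:
# the top-level value ("large") wins; the context value is ignored.
request = {
    "query": "Compare these two architectures",
    "model_tier": "large",        # top-level: takes effect
    "context": {
        "model_tier": "small",    # overridden by the top-level parameter
    },
}
```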
## Usage Examples

### Auto-Selection (Default)
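A plain query with no model parameters (payload sketch; submission mechanics omitted):

```python
# Low-complexity query -> scored below 0.3 -> routed to the small tier
request = {"query": "What is the capital of France?"}
```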
Result: `gpt-5-nano-2025-08-07` (priority 1 in the small tier)
### Force Specific Tier
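Request the large tier explicitly (payload sketch):

```python
# model_tier pins the tier regardless of the complexity score
request = {
    "query": "Summarize this paragraph",
    "model_tier": "large",
}
```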
Result: `gpt-4.1-2025-04-14` (priority 1 in the large tier)
### Override to Specific Model
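Pin one exact model with `model_override` (payload sketch; the model ID must be the canonical name from `config/models.yaml`):

```python
# model_override bypasses tier selection entirely
request = {
    "query": "Draft a migration plan",
    "model_override": "claude-sonnet-4-5-20250929",
}
```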
### Force Provider
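Restrict selection to one provider's models (payload sketch; the provider identifier `"anthropic"` is an assumption):

```python
# provider_override keeps tier selection but filters to one provider
request = {
    "query": "Review this contract clause",
    "provider_override": "anthropic",
}
```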
Result: `claude-sonnet-4-5-20250929`
## Python SDK Examples
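A sketch of the same options through a Python client. The client class, import path, and method names below are assumptions, not the actual SDK surface; consult the SDK reference for real signatures.

```python
from shannon import ShannonClient  # hypothetical import path

client = ShannonClient(base_url="http://localhost:8080")  # endpoint is an assumption

# Auto-selection: no model parameters, complexity analysis picks the tier
r1 = client.submit_task("Summarize the README")

# Pin a tier for cost-sensitive workloads
r2 = client.submit_task("Classify this support ticket", model_tier="small")

# Force an exact model
r3 = client.submit_task("Plan a data migration",
                        model_override="claude-sonnet-4-5-20250929")
```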
## Cost Optimization Strategies
### 1. Start Small, Escalate if Needed
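Run the task on the small tier first and retry on a higher tier only when the result is inadequate (sketch continuing the hypothetical client above; the quality check is a placeholder):

```python
query = "Extract action items from this meeting transcript"

for tier in ("small", "medium", "large"):
    result = client.submit_task(query, model_tier=tier)  # client from the sketch above
    if is_good_enough(result):  # placeholder for your own validation check
        break                   # stop escalating once the output is acceptable
```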
### 2. Provider-Specific Optimization
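Steer routine traffic toward a cheaper provider while keeping tier logic intact (sketch; DeepSeek and Groq are named as cheaper options under Troubleshooting below, but the identifier string is an assumption):

```python
# Route bulk, low-stakes tasks to a cheaper provider's models
result = client.submit_task(
    "Tag this product review as positive/negative/neutral",
    model_tier="small",
    provider_override="deepseek",  # assumed provider identifier
)
```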
### 3. Session-Based Escalation
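Keep one session but raise the tier for the harder follow-up turns (sketch; the `session_id` parameter is an assumption):

```python
# First turn: cheap tier for simple retrieval
client.submit_task("List the open incidents", model_tier="small",
                   session_id="sess-123")

# Follow-up in the same session: escalate for the heavy reasoning step
client.submit_task("Analyze the root cause across these incidents",
                   model_tier="large", session_id="sess-123")
```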
## Complexity Analysis

Shannon analyzes task complexity using several factors:

- Query length and specificity
- Number of sub-tasks identified
- Tool usage requirements
- Context depth needed
- Reasoning intensity (keywords like “analyze”, “compare”, “synthesize”)
The resulting score maps to a tier:

- `< 0.3` → Small tier (simple Q&A, basic tasks)
- `0.3 - 0.7` → Medium tier (multi-step, moderate reasoning)
- `> 0.7` → Large tier (complex research, heavy reasoning)
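A toy illustration of this kind of scoring (deliberately simplistic, not Shannon's actual heuristics; the weights are made up):

```python
REASONING_KEYWORDS = {"analyze", "compare", "synthesize"}  # keywords from the list above

def complexity_score(query: str, num_subtasks: int, needs_tools: bool) -> float:
    """Toy heuristic: combine the factors above into a 0..1 score."""
    score = 0.0
    score += min(len(query.split()) / 200, 0.3)   # query length and specificity
    score += min(num_subtasks * 0.1, 0.3)         # number of sub-tasks identified
    score += 0.1 if needs_tools else 0.0          # tool usage requirements
    score += 0.3 if any(k in query.lower() for k in REASONING_KEYWORDS) else 0.0
    return min(score, 1.0)

# "What is 2+2?" scores near 0.0 -> small tier;
# a long multi-step "analyze and compare..." prompt scores > 0.7 -> large tier
print(complexity_score("What is 2+2?", num_subtasks=1, needs_tools=False))
```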
## Monitoring & Debugging
### Check Which Model Was Used
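The selected model is reported back with the task result (sketch; the response field name is an assumption, so check the API reference or the orchestrator logs below):

```python
result = client.submit_task("Summarize the README")
print(result.model_used)  # hypothetical field, e.g. "gpt-5-nano-2025-08-07"
```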
### Prometheus Metrics
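Tier usage and escalations are exposed as Prometheus metrics. In particular, `shannon_tier_drift_total` (referenced under Troubleshooting below) counts tasks that drifted away from their requested tier, and the per-tier request distribution can be compared against the 50/40/10 targets in the table above.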
### Orchestrator Logs

```text
"Model selected: gpt-5-nano-2025-08-07 (small tier, priority 1)"
"Falling back to priority 2: claude-haiku-4-5-20251001"
"Tier override: user requested large → using gpt-4.1-2025-04-14"
```
## Configuration

Model tiers and priorities are defined in `config/models.yaml`:
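(A compact sketch of the relevant knobs; the key and field names are illustrative, and the file also holds the `model_catalog` referenced under Troubleshooting. Per-model entries are sketched under Priority Ranking above.)

```yaml
model_tiers:
  small:
    selection_strategy: priority   # field name is an assumption; values listed below
    models: [...]                  # per-model entries as sketched earlier
```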
Per-tier selection strategies:

- `priority` (default): try models in priority order
- `round-robin`: distribute load evenly across same-priority models
- `least-cost`: always select the cheapest model in the tier
## Troubleshooting
### Issue: Wrong tier selected

Symptoms: Task uses the medium tier when you expected small.

Solutions:

- Explicitly set `model_tier: "small"` in the request
- Check the complexity score in orchestrator logs
- Verify the query isn't triggering complexity heuristics (avoid words like "analyze deeply")
### Issue: Specific model not used

Symptoms: Request sets `model_override: "gpt-5-pro"` but a different model is used.

Solutions:

- Verify the model is in `config/models.yaml` under `model_catalog`
- Check that the provider's API key is set in `.env`
- Verify the model ID uses the canonical name (not an alias)
- Check orchestrator logs for fallback messages
### Issue: High costs

Symptoms: Costs are higher than expected.

Solutions:

- Check the actual tier distribution via Prometheus
- Add an explicit `model_tier: "small"` to requests
- Review `shannon_tier_drift_total` for unwanted escalations
- Set `MAX_COST_PER_REQUEST` in `.env` to enforce a budget
### Issue: Rate limiting

Symptoms: Frequent 429 errors, slow fallback cascade.

Solutions:

- Add more providers to the tier priority list
- Enable `round-robin` mode to distribute load
- Increase `RATE_LIMIT_WINDOW` for affected providers
- Consider cheaper providers (DeepSeek, Groq) as fallbacks
## Best Practices

- Default to Auto-Selection: Let Shannon's complexity analysis work
- Override Sparingly: Use `model_override` only when required
- Start Small: Set `model_tier: "small"` for cost-sensitive workloads
- Monitor Distribution: Track tier usage via metrics
- Configure Fallbacks: Ensure each tier has 3+ providers
- Test Priority Order: Verify your preferred models are priority 1
- Budget Enforcement: Set `MAX_COST_PER_REQUEST` for safety