Endpoint

GET http://localhost:8000/providers/models

Description

Returns all models currently configured in Shannon, organized by provider. This endpoint queries the Python LLM service directly and reflects the models defined in config/models.yaml.

Authentication

Required: No (internal service endpoint). For production deployments, access should be restricted to internal networks only.
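
Shannon does not document a built-in authentication check for this endpoint, so the snippet below is only a hedged sketch of one way to gate it behind an internal-network check at the application layer; the FastAPI app, the CIDR list, and the route stub are illustrative assumptions rather than Shannon's actual code, and in practice the restriction is more commonly enforced at a reverse proxy or firewall:
from ipaddress import ip_address, ip_network

from fastapi import Depends, FastAPI, HTTPException, Request

# Illustrative internal ranges; adjust to your deployment.
INTERNAL_NETWORKS = [ip_network("10.0.0.0/8"), ip_network("127.0.0.0/8")]

def require_internal(request: Request) -> None:
    """Reject callers whose client IP falls outside the internal CIDR ranges."""
    client = ip_address(request.client.host)
    if not any(client in net for net in INTERNAL_NETWORKS):
        raise HTTPException(status_code=403, detail="internal access only")

app = FastAPI()

@app.get("/providers/models", dependencies=[Depends(require_internal)])
async def list_models() -> dict:
    ...  # delegate to the real model-catalog lookup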

Request

Query Parameters

Parameter  Type    Required  Description
tier       string  No        Filter by tier: small, medium, or large

Headers

None required for internal access.

Response

Success Response

Status: 200 OK
Body:
{
  "openai": [
    {
      "id": "gpt-5-nano-2025-08-07",
      "name": "gpt-5-nano-2025-08-07",
      "tier": "small",
      "context_window": 128000,
      "cost_per_1k_prompt_tokens": 0.0001,
      "cost_per_1k_completion_tokens": 0.0004,
      "supports_tools": true,
      "supports_streaming": true,
      "available": true
    }
  ],
  "anthropic": [
    {
      "id": "claude-sonnet-4-5-20250929",
      "name": "claude-sonnet-4-5-20250929",
      "tier": "medium",
      "context_window": 200000,
      "cost_per_1k_prompt_tokens": 0.003,
      "cost_per_1k_completion_tokens": 0.015,
      "supports_tools": true,
      "supports_streaming": true,
      "available": true
    }
  ]
}

Response Structure

The response is organized by provider, with each provider key mapping to an array of model objects:
Field                          Type     Description
id                             string   Model identifier (canonical name)
name                           string   Display name (same as id)
tier                           string   Size tier: small, medium, or large
context_window                 integer  Maximum context length in tokens
cost_per_1k_prompt_tokens      float    Cost per 1K input tokens (USD)
cost_per_1k_completion_tokens  float    Cost per 1K output tokens (USD)
supports_tools                 boolean  Function calling support
supports_streaming             boolean  Real-time streaming support
available                      boolean  Currently available for use
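
For clients that want static type checking on this payload, a minimal sketch of the per-model shape is shown below; the ModelInfo name and annotations are illustrative, not part of Shannon's client code:
from typing import TypedDict

import httpx

class ModelInfo(TypedDict):
    id: str
    name: str
    tier: str                              # "small" | "medium" | "large"
    context_window: int
    cost_per_1k_prompt_tokens: float
    cost_per_1k_completion_tokens: float
    supports_tools: bool
    supports_streaming: bool
    available: bool

# Top level of the response: provider name -> list of model objects.
catalog: dict[str, list[ModelInfo]] = httpx.get(
    "http://localhost:8000/providers/models"
).json()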

Examples

List All Models

curl http://localhost:8000/providers/models | jq

Filter by Tier

# Small models only
curl "http://localhost:8000/providers/models?tier=small" | jq

# Large models only
curl "http://localhost:8000/providers/models?tier=large" | jq

Python Example

import httpx

# Get all models
response = httpx.get("http://localhost:8000/providers/models")
models = response.json()

# Print OpenAI models
for model in models.get("openai", []):
    print(f"{model['id']} - {model['tier']} - ${model['cost_per_1k_prompt_tokens']:.4f}/1K")

# Filter small tier
response = httpx.get("http://localhost:8000/providers/models?tier=small")
small_models = response.json()

Model Tiers

Models are organized into three tiers based on capability and cost:

Small Tier (Priority for 50% of workload)

Fast, cost-optimized models for basic tasks:
  • OpenAI: gpt-5-nano-2025-08-07
  • Anthropic: claude-haiku-4-5-20251001
  • xAI: grok-4-fast-non-reasoning
  • Google: gemini-2.5-flash-lite
  • DeepSeek: deepseek-chat

Medium Tier (Priority for 40% of workload)

Balanced capability/cost models:
  • OpenAI: gpt-5-2025-08-07
  • Anthropic: claude-sonnet-4-5-20250929
  • xAI: grok-4
  • Google: gemini-2.5-flash
  • Meta: llama-4-scout

Large Tier (Priority for 10% of workload)

Heavy reasoning models for complex tasks:
  • OpenAI: gpt-4.1-2025-04-14, gpt-5-pro-2025-10-06
  • Anthropic: claude-opus-4-1-20250805
  • Google: gemini-2.5-pro
  • DeepSeek: deepseek-r1
  • xAI: grok-4-fast-reasoning
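
Because every model object carries a tier field, the same grouping can be reproduced from the endpoint itself; a short sketch (the grouping helper below is illustrative):
from collections import defaultdict

import httpx

catalog = httpx.get("http://localhost:8000/providers/models").json()

# Group every configured model by tier, regardless of provider.
by_tier: dict[str, list[str]] = defaultdict(list)
for provider, models in catalog.items():
    for model in models:
        by_tier[model["tier"]].append(f"{provider}/{model['id']}")

for tier in ("small", "medium", "large"):
    print(tier, by_tier.get(tier, []))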

Configuration Source

Models are defined in config/models.yaml under model_catalog:
model_catalog:
  openai:
    gpt-5-nano-2025-08-07:
      model_id: gpt-5-nano-2025-08-07
      tier: small
      context_window: 128000
      max_tokens: 4096
      supports_functions: true
      supports_streaming: true
Pricing is centralized under pricing.models:
pricing:
  models:
    openai:
      gpt-5-nano-2025-08-07:
        input_per_1k: 0.0001
        output_per_1k: 0.0004
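
Shannon's actual loader lives in the Python LLM service; the sketch below only illustrates how the two YAML sections above can be joined into the response shape, assuming the layout shown (the function name and defaults are illustrative):
import yaml

def load_catalog(path: str = "config/models.yaml") -> dict:
    with open(path) as f:
        cfg = yaml.safe_load(f)

    pricing = cfg.get("pricing", {}).get("models", {})
    catalog = {}
    for provider, models in cfg.get("model_catalog", {}).items():
        entries = []
        for model_id, spec in models.items():
            price = pricing.get(provider, {}).get(model_id, {})
            entries.append({
                "id": spec.get("model_id", model_id),
                "name": spec.get("model_id", model_id),
                "tier": spec["tier"],
                "context_window": spec["context_window"],
                "cost_per_1k_prompt_tokens": price.get("input_per_1k", 0.0),
                "cost_per_1k_completion_tokens": price.get("output_per_1k", 0.0),
                "supports_tools": spec.get("supports_functions", False),
                "supports_streaming": spec.get("supports_streaming", False),
            })
        catalog[provider] = entries
    return catalog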

Use Cases

1. Discover Available Models
curl http://localhost:8000/providers/models | jq 'keys'
# ["anthropic", "openai", "google", "xai", ...]
2. Check Pricing
curl http://localhost:8000/providers/models | \
  jq '.openai[] | {id, input: .cost_per_1k_prompt_tokens, output: .cost_per_1k_completion_tokens}'
3. Verify API Key Configuration
# If a provider returns an empty array, its API key may be missing
curl http://localhost:8000/providers/models | jq '.anthropic | length'
4. Build Model Selector UI
const response = await fetch('http://localhost:8000/providers/models?tier=small');
const models = await response.json();

// Populate dropdown
Object.entries(models).forEach(([provider, modelList]) => {
  modelList.forEach(model => {
    dropdown.add(new Option(`${provider}: ${model.id}`, model.id));
  });
});

Notes

  • Static Configuration: Models are loaded from config/models.yaml, not dynamically discovered from provider APIs
  • No Hot Reload: Changes to models.yaml require a service restart to take effect
  • Empty Providers: If a provider returns [], check that the API key is set in .env
  • Pricing Centralization: All costs come from pricing section in YAML, ensuring consistency across Go/Rust/Python services
  • Internal Endpoint: This endpoint is on the LLM service (port 8000), not the Gateway API (port 8080)

Environment Variables

Override model selections with environment variables:
# Stage-specific overrides
COMPLEXITY_MODEL_ID=gpt-5-mini-2025-08-07
DECOMPOSITION_MODEL_ID=gpt-5-2025-08-07
DEFAULT_MODEL_TIER=small
See the Configuration Guide for the full list.
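
How the service resolves these overrides is not shown here; the snippet below is only a sketch of the usual precedence, environment variable first with the YAML value as the fallback (the fallback values are illustrative):
import os

default_tier = "small"  # illustrative: the tier configured in models.yaml
tier = os.getenv("DEFAULT_MODEL_TIER", default_tier)

# Stage-specific model IDs: None means fall back to tier-based selection.
complexity_model = os.getenv("COMPLEXITY_MODEL_ID")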

Troubleshooting

Empty provider arrays
  • Verify API key is set: OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
  • Check config/models.yaml has entries under model_catalog.<provider>
Missing models
  • Ensure MODELS_CONFIG_PATH points to correct file
  • Verify YAML syntax is valid
  • Check for typos in model IDs
Incorrect pricing
  • Pricing comes from pricing.models.<provider> section
  • Update config/models.yaml and restart services
  • Verify Go/Rust services also read same config file
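
A quick way to run the first two checks together is a small script like the one below; only OPENAI_API_KEY and ANTHROPIC_API_KEY are documented above, so extend the mapping with whatever key variables your other providers use:
import os

import httpx

# Provider -> API key variable; entries beyond the two documented above are assumptions.
EXPECTED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

catalog = httpx.get("http://localhost:8000/providers/models").json()
for provider, env_var in EXPECTED_KEYS.items():
    has_key = bool(os.getenv(env_var))
    count = len(catalog.get(provider, []))
    print(f"{provider}: key set={has_key}, models returned={count}")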