Model Routing

KodaCode treats model selection as runtime configuration.

Core routing keys

Model routing is driven by three keys in config.yaml, plus an optional override in workflow YAML files:

model.primary: the default provider/model for new turns
utility_model: a cheaper model for runtime utility text work, such as title generation and utility compaction summaries
workflow.review_model.primary: an optional model for reviewer turns
model (in workflow YAML): an optional primary-model override for a whole workflow or a single phase

Example:

```yaml
version: 1

providers:
  - id: openai
  - id: anthropic

model:
  primary: openai/gpt-5

utility_model: openai/gpt-5-mini

workflow:
  review_model:
    primary: openai/gpt-5-mini
```

Workflow files can also declare model routing:

```yaml
id: team-delivery
model: openai/gpt-5-mini
phases:
  - id: implement
    agent: engineer
    model: openai/gpt-5
```

A phase model takes precedence over the workflow model. When neither is declared, KodaCode falls back to the selected agent, session, and global config routing. For reviewer phases, a declared workflow or phase model also takes precedence over workflow.review_model; only when neither is declared does workflow.review_model act as the reviewer fallback.
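The precedence rules above can be sketched as a small resolver. This is a minimal sketch, not KodaCode's implementation: the function name resolve_model and its parameters are hypothetical, and the agent, session, and global routing chain is collapsed into a single fallback_model argument because the order within that chain is not described here.

```python
from typing import Optional


def resolve_model(
    phase_model: Optional[str],
    workflow_model: Optional[str],
    fallback_model: str,
    is_reviewer: bool = False,
    review_model: Optional[str] = None,
) -> str:
    """Hypothetical resolver mirroring the documented precedence order."""
    # A phase-level `model` wins over everything else.
    if phase_model:
        return phase_model
    # Next comes the workflow-level `model`.
    if workflow_model:
        return workflow_model
    # Reviewer phases fall back to workflow.review_model.primary when it is set.
    if is_reviewer and review_model:
        return review_model
    # Otherwise defer to agent/session/global routing (collapsed into one value here).
    return fallback_model
```

For the team-delivery workflow, the implement phase resolves to openai/gpt-5 because it declares its own model, while a phase without a model entry would inherit openai/gpt-5-mini from the workflow.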

Provider shape

Built-in providers use stable IDs such as:

openai
openai-codex
anthropic
google
nvidia
github-copilot
deepseek

openai and openai-codex are intentionally separate:

| Provider ID | Account type | Model ref example |
| --- | --- | --- |
| `openai` | OpenAI Platform API key | `openai/gpt-5` |
| `openai-codex` | ChatGPT/Codex OAuth account from `/connect` | `openai-codex/gpt-5` |

If both account types are configured, both providers can appear in the model picker. If only a ChatGPT/Codex OAuth account is configured, stored openai/... model refs are normalized to openai-codex/... at runtime so existing selections continue to route.
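The runtime normalization rule can be sketched as a pure function over a model ref. This is a minimal sketch under stated assumptions: normalize_ref and its two boolean flags are hypothetical names standing in for whatever account state KodaCode actually inspects.

```python
def normalize_ref(ref: str, has_platform_key: bool, has_codex_oauth: bool) -> str:
    """Hypothetical sketch of the openai -> openai-codex runtime rewrite."""
    provider, sep, model = ref.partition("/")
    # Only `openai/...` refs are candidates; anything else passes through.
    if not sep or provider != "openai":
        return ref
    # Rewrite only when a Codex OAuth account exists and no Platform API key does.
    if has_codex_oauth and not has_platform_key:
        return f"openai-codex/{model}"
    return ref
```

Applying the rewrite at lookup time, rather than editing config.yaml, fits the "at runtime" wording above: the stored ref stays openai/gpt-5 and simply routes to openai-codex/gpt-5 while only the OAuth account is connected.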

Model catalog availability for openai-codex comes from the ChatGPT/Codex account endpoint. models.dev metadata under openai is used only to enrich matching Codex model IDs with context, output, capability, and pricing metadata; it does not copy OpenAI Platform model availability into openai-codex.
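The split between availability and enrichment can be sketched as follows. This is a minimal sketch, assuming the Codex account endpoint yields a plain list of model IDs and that models.dev `openai` metadata has already been loaded into a dict keyed by model ID; build_codex_catalog is a hypothetical name, and the metadata keys are whatever that dict contains.

```python
def build_codex_catalog(
    codex_model_ids: list[str],
    models_dev_openai: dict[str, dict],
) -> dict[str, dict]:
    """Hypothetical sketch: availability from Codex, metadata from models.dev."""
    catalog: dict[str, dict] = {}
    # Iterate only over IDs the ChatGPT/Codex account endpoint returned.
    for model_id in codex_model_ids:
        # Copy matching models.dev `openai` metadata, if any, as enrichment.
        entry = dict(models_dev_openai.get(model_id, {}))
        entry["ref"] = f"openai-codex/{model_id}"
        catalog[model_id] = entry
    # IDs that exist only under models.dev `openai` are never added here.
    return catalog
```

Driving the loop from the Codex ID list, not from the models.dev dict, is what keeps Platform-only models out of the openai-codex picker.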

The /connect dialog includes native presets for OpenAI, Anthropic, Google, NVIDIA, GitHub Copilot, and DeepSeek. It also includes OpenAI-compatible presets for QwenCloud, OpenRouter, Together AI, Groq, Fireworks AI, Mistral, Cerebras, Deep Infra, Moonshot AI, Venice AI, Z.AI, Ollama Cloud, and a custom compatible provider. Compatible providers are stored as providers entries with an id and base_url.
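A config check for the provider entry shape might look like the sketch below. It assumes, without confirmation from this page, that any provider ID outside the built-in list is an OpenAI-compatible provider and therefore needs a base_url; validate_provider_entries is a hypothetical helper, and entries are modeled as plain dicts as a YAML loader would produce them.

```python
# Built-in provider IDs listed above; other IDs are assumed to be compatible.
BUILTIN_PROVIDER_IDS = frozenset({
    "openai", "openai-codex", "anthropic", "google",
    "nvidia", "github-copilot", "deepseek",
})


def validate_provider_entries(providers: list[dict]) -> list[str]:
    """Hypothetical check that compatible providers carry a base_url."""
    errors: list[str] = []
    for index, entry in enumerate(providers):
        provider_id = entry.get("id")
        if not provider_id:
            errors.append(f"providers[{index}]: missing id")
            continue
        # Assumption: any non-built-in ID is an OpenAI-compatible provider.
        if provider_id not in BUILTIN_PROVIDER_IDS and not entry.get("base_url"):
            errors.append(f"providers[{index}] ({provider_id}): missing base_url")
    return errors
```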

Per-model metadata

model_overrides lets you pin local metadata such as:

display name
context size
input and output token caps
default requested output tokens for ordinary agent turns
reasoning, vision, or tool-call support flags
input and output pricing

That keeps the runtime UI and cost estimation usable even when provider metadata is incomplete.
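One way to picture an override is as a dict merge in which local values win and provider values fill the remaining gaps. This is a minimal sketch: apply_override and requested_output_tokens are hypothetical helpers, and clamping default_output_tokens to max_output_tokens is an assumption about how the two fields interact, not documented behavior.

```python
from typing import Optional


def apply_override(provider_meta: dict, override: dict) -> dict:
    """Hypothetical merge: override fields win, provider fields fill the gaps."""
    merged = dict(provider_meta)
    for key, value in override.items():
        if key == "ref":  # `ref` selects the model; it is not metadata
            continue
        merged[key] = value
    return merged


def requested_output_tokens(meta: dict) -> Optional[int]:
    """Assumed rule: request the default, but never more than the output cap."""
    default = meta.get("default_output_tokens")
    cap = meta.get("max_output_tokens")
    if default is None:
        return cap
    if cap is None:
        return default
    return min(default, cap)
```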

QwenCloud model metadata

QwenCloud is OpenAI-compatible, but its /models endpoint only returns basic model records such as id, object, created, and owned_by. It does not return context windows, max output limits, thinking budgets, tool-call support, or structured-output support. models.dev also does not currently publish a QwenCloud provider catalog.

Because neither source can supply full QwenCloud capabilities, the best current path is to configure QwenCloud model metadata with model_overrides.
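Since the /models records carry only an id worth routing on, a helper can list which returned models still lack an override. This is a minimal sketch, assuming the standard OpenAI-compatible list shape (a top-level "data" array of records); models_missing_overrides is a hypothetical helper, and the sample payload below is illustrative, with "example-model" as a placeholder ID rather than a real QwenCloud model.

```python
import json


def models_missing_overrides(
    models_json: str, provider_id: str, override_refs: set[str]
) -> list[str]:
    """List refs from an OpenAI-compatible /models payload that have no override."""
    payload = json.loads(models_json)
    # Records only carry id/object/created/owned_by, so `id` is all we use.
    refs = [f"{provider_id}/{record['id']}" for record in payload.get("data", [])]
    return [ref for ref in refs if ref not in override_refs]


sample = json.dumps({
    "object": "list",
    "data": [
        {"id": "qwen3.6-plus", "object": "model", "created": 0, "owned_by": "example"},
        {"id": "example-model", "object": "model", "created": 0, "owned_by": "example"},
    ],
})
print(models_missing_overrides(sample, "qwencloud", {"qwencloud/qwen3.6-plus"}))
# prints ['qwencloud/example-model']
```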

Example overrides for common QwenCloud coding and agent models:

```yaml
providers:
  - id: qwencloud
    base_url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1

model:
  primary: qwencloud/qwen3.6-plus

model_overrides:
  - ref: qwencloud/qwen3.7-max
    name: Qwen3.7 Max
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true

  - ref: qwencloud/qwen3.6-plus
    name: Qwen3.6 Plus
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true

  - ref: qwencloud/qwen3.6-flash
    name: Qwen3.6 Flash
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true
```

QwenCloud documents the thinking budget separately from the visible max output. For example, Qwen3.7 Max lists 64k max output with a larger thinking budget, while Qwen3.6 Plus and Flash list 64k max output with their own thinking budgets. In KodaCode's current model_overrides schema, max_output_tokens covers visible output only; do not add the thinking budget to that value.
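A quick sanity check can catch an override where the thinking budget was folded into max_output_tokens. This is a minimal sketch: check_visible_output_cap is a hypothetical helper, and the documented visible maximum is passed in by hand (64000 here, matching the max_output_tokens values in the example overrides) because this page does not describe KodaCode reading it from anywhere.

```python
from typing import Optional


def check_visible_output_cap(override: dict, documented_visible_max: int) -> Optional[str]:
    """Flag a max_output_tokens value above the documented visible-output cap."""
    value = override.get("max_output_tokens")
    if value is None or value <= documented_visible_max:
        return None
    # A value above the visible cap suggests a thinking budget was added in.
    return (
        f"{override.get('ref', '?')}: max_output_tokens={value} exceeds the "
        f"documented visible max of {documented_visible_max}; "
        "do not include the thinking budget"
    )


ok = {"ref": "qwencloud/qwen3.6-plus", "max_output_tokens": 64000}
print(check_visible_output_cap(ok, documented_visible_max=64000))  # prints None
```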

Live switching

Inside the TUI:

/model opens the model picker
/utility-model opens the utility model picker
/reviewer-model opens the reviewer model picker
/thinking toggles model thinking when supported; new sessions start with thinking off
/variant changes the thinking level when supported

Remote model catalogs are cached for model_cache.expiry_days days (default: 7); local providers are still refreshed on startup.
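The cache rule can be sketched as a single refresh decision per provider. This is a minimal sketch, not KodaCode's code: should_refresh and its parameters are hypothetical, and treating a catalog that is exactly expiry_days old as stale is an assumption about the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_refresh(
    is_local: bool,
    fetched_at: Optional[datetime],
    now: datetime,
    expiry_days: int = 7,
) -> bool:
    """Hypothetical check mirroring model_cache.expiry_days semantics."""
    # Local providers are refreshed on startup regardless of cache age.
    if is_local:
        return True
    # A remote catalog that was never cached must be fetched.
    if fetched_at is None:
        return True
    # Otherwise refresh once the cached copy is at least expiry_days old.
    return now - fetched_at >= timedelta(days=expiry_days)


now = datetime(2025, 1, 8, tzinfo=timezone.utc)
print(should_refresh(False, datetime(2025, 1, 2, tzinfo=timezone.utc), now))  # prints False
```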