Model Routing
KodaCode treats model selection as runtime configuration.
Core routing keys
config.yaml uses three main knobs:
- model.primary: the default provider/model selection for new turns
- utility_model: a cheaper model for runtime utility text work such as title generation and utility compaction summaries
- workflow.review_model.primary: an optional model for reviewer turns
Example:
version: 1

providers:
  - id: openai
  - id: anthropic

model:
  primary: openai/gpt-5

utility_model: openai/gpt-5-mini

workflow:
  review_model:
    primary: openai/gpt-5-mini

Provider shape
Built-in providers use stable IDs such as:
- openai
- openai-codex
- anthropic
- google
- nvidia
- github-copilot
- deepseek
openai and openai-codex are intentionally separate:
| Provider ID | Account type | Model ref example |
|---|---|---|
| openai | OpenAI Platform API key | openai/gpt-5 |
| openai-codex | ChatGPT/Codex OAuth account from /connect | openai-codex/gpt-5 |
If both account types are configured, both providers can appear in the model
picker. If only a ChatGPT/Codex OAuth account is configured, stored openai/...
model refs are normalized to openai-codex/... at runtime so existing
selections continue to route.
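
As a small illustration of the normalization described above, consider a stored selection that still uses the openai prefix. The fragment below reuses the model.primary key and the model refs from the table; the comment restates the documented runtime behavior and is not additional configuration.

```yaml
# Stored selection, written when an OpenAI Platform API key was in use.
model:
  primary: openai/gpt-5
# With only a ChatGPT/Codex OAuth account configured, this ref is
# treated as openai-codex/gpt-5 at runtime, so the selection still routes.
```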
Model catalog availability for openai-codex comes from the ChatGPT/Codex
account endpoint. models.dev metadata under openai is used only to enrich
matching Codex model IDs with context, output, capability, and pricing metadata;
it does not copy OpenAI Platform model availability into openai-codex.
The /connect dialog includes native presets for OpenAI, Anthropic, Google,
NVIDIA, GitHub Copilot, and DeepSeek. It also includes OpenAI-compatible
presets for QwenCloud, OpenRouter, Together AI, Groq, Fireworks AI, Mistral,
Cerebras, Deep Infra, Moonshot AI, Venice AI, Z.AI, Ollama Cloud, and a custom
compatible provider. Compatible providers are stored as providers entries
with an id and base_url.
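
A minimal sketch of what a custom compatible provider entry might look like, using only the id and base_url fields named above. The provider ID my-gateway, the URL, and the model name are hypothetical placeholders, not values KodaCode ships with.

```yaml
providers:
  # Hypothetical custom OpenAI-compatible provider.
  - id: my-gateway
    base_url: https://llm.example.com/v1

model:
  # Model refs follow the provider/model shape, so the provider id
  # becomes the prefix. "example-model" is a placeholder name.
  primary: my-gateway/example-model
```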
Per-model metadata
model_overrides lets you pin local metadata such as:
- display name
- context size
- input and output token caps
- default requested output tokens for ordinary agent turns
- reasoning, vision, or tool-call support flags
- input and output pricing
That keeps the runtime UI and cost estimation usable even when provider metadata is incomplete.
QwenCloud model metadata
QwenCloud is OpenAI-compatible, but its /models endpoint only returns basic
model records such as id, object, created, and owned_by. It does not
return context windows, max output limits, thinking budgets, tool-call support,
or structured-output support. models.dev also does not currently publish a
QwenCloud provider catalog.
Because neither source can supply full QwenCloud capabilities, the best current
path is to configure QwenCloud model metadata with model_overrides.
Example overrides for common QwenCloud coding and agent models:
providers:
  - id: qwencloud
    base_url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1

model:
  primary: qwencloud/qwen3.6-plus

model_overrides:
  - ref: qwencloud/qwen3.7-max
    name: Qwen3.7 Max
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true
  - ref: qwencloud/qwen3.6-plus
    name: Qwen3.6 Plus
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true
  - ref: qwencloud/qwen3.6-flash
    name: Qwen3.6 Flash
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true

QwenCloud documents thinking budget separately from visible max output. For
example, Qwen3.7 Max lists 64k max output with a larger thinking budget,
while Qwen3.6 Plus and Flash list 64k max output with their own thinking
budgets. KodaCode’s current model_overrides surface represents visible output
with max_output_tokens; do not add the thinking budget to that value.
Live switching
Inside the TUI:
- /model opens the model picker
- /utility-model opens the utility model picker
- /reviewer-model opens the reviewer model picker
- /thinking toggles model thinking when supported; new sessions start with thinking off
- /variant changes the thinking level when supported
Remote model catalogs are cached with model_cache.expiry_days (default: 7); local providers still refresh on startup.
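
As a sketch, shortening the cache lifetime might look like the fragment below. It assumes the dotted key model_cache.expiry_days maps to nested YAML in config.yaml the same way model.primary does; the value 3 is only an example.

```yaml
model_cache:
  # Assumed nesting for model_cache.expiry_days (default: 7).
  # Cached remote catalogs would expire after three days instead.
  expiry_days: 3
```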