Model Routing

KodaCode treats model selection as runtime configuration.

config.yaml uses three main knobs:

  • model.primary: the default provider/model selection for new turns
  • utility_model: a cheaper model for runtime utility text work such as title generation and utility compaction summaries
  • workflow.review_model.primary: an optional model for reviewer turns

Example:

version: 1
providers:
  - id: openai
  - id: anthropic
model:
  primary: openai/gpt-5
utility_model: openai/gpt-5-mini
workflow:
  review_model:
    primary: openai/gpt-5-mini

Built-in providers use stable IDs such as:

  • openai
  • openai-codex
  • anthropic
  • google
  • nvidia
  • github-copilot
  • deepseek

openai and openai-codex are intentionally separate:

Provider ID  | Account type                              | Model ref example
openai       | OpenAI Platform API key                   | openai/gpt-5
openai-codex | ChatGPT/Codex OAuth account from /connect | openai-codex/gpt-5

If both account types are configured, both providers can appear in the model picker. If only a ChatGPT/Codex OAuth account is configured, stored openai/... model refs are normalized to openai-codex/... at runtime so existing selections continue to route.
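As a rough illustration of the two-account case, the fragment below selects one model from each provider. It is a sketch, not a confirmed layout: it assumes openai-codex can be listed under providers like the other built-in IDs, while the OAuth account itself is still linked through /connect.

```yaml
# Sketch only: assumes openai-codex is declared under providers like any
# other built-in ID. The ChatGPT/Codex account is linked via /connect.
version: 1
providers:
  - id: openai        # OpenAI Platform API key
  - id: openai-codex  # ChatGPT/Codex OAuth account
model:
  primary: openai-codex/gpt-5      # routed through the OAuth account
utility_model: openai/gpt-5-mini   # routed through the Platform API key
```

With both providers present, the provider prefix in each model ref decides which account serves the request.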

Model catalog availability for openai-codex comes from the ChatGPT/Codex account endpoint. models.dev metadata under openai is used only to enrich matching Codex model IDs with context, output, capability, and pricing metadata; it does not copy OpenAI Platform model availability into openai-codex.

The /connect dialog includes native presets for OpenAI, Anthropic, Google, NVIDIA, GitHub Copilot, and DeepSeek. It also includes OpenAI-compatible presets for QwenCloud, OpenRouter, Together AI, Groq, Fireworks AI, Mistral, Cerebras, Deep Infra, Moonshot AI, Venice AI, Z.AI, Ollama Cloud, and a custom compatible provider. Compatible providers are stored as providers entries with an id and base_url.
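A custom compatible provider entry might look like the sketch below. The id my-gateway, the base_url, and the model name my-model are all hypothetical placeholders; only the id and base_url keys come from the description above.

```yaml
# Hypothetical custom OpenAI-compatible provider.
# "my-gateway", the URL, and "my-model" are placeholders, not real presets.
providers:
  - id: my-gateway
    base_url: https://llm.example.internal/v1
model:
  primary: my-gateway/my-model   # model refs use the provider id as prefix
```

The provider id doubles as the prefix of every model ref routed to that endpoint, so it is worth keeping short and stable.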

model_overrides lets you pin local metadata such as:

  • display name
  • context size
  • input and output token caps
  • default requested output tokens for ordinary agent turns
  • reasoning, vision, or tool-call support flags
  • input and output pricing

That keeps the runtime UI and cost estimation usable even when provider metadata is incomplete.

QwenCloud is OpenAI-compatible, but its /models endpoint only returns basic model records such as id, object, created, and owned_by. It does not return context windows, max output limits, thinking budgets, tool-call support, or structured-output support. models.dev also does not currently publish a QwenCloud provider catalog.

Because neither source can supply full QwenCloud capabilities, the best current path is to configure QwenCloud model metadata with model_overrides.

Example overrides for common QwenCloud coding and agent models:

providers:
  - id: qwencloud
    base_url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
model:
  primary: qwencloud/qwen3.6-plus
model_overrides:
  - ref: qwencloud/qwen3.7-max
    name: Qwen3.7 Max
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true
  - ref: qwencloud/qwen3.6-plus
    name: Qwen3.6 Plus
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true
  - ref: qwencloud/qwen3.6-flash
    name: Qwen3.6 Flash
    context_size: 1000000
    max_input_tokens: 1000000
    max_output_tokens: 64000
    default_output_tokens: 16000
    reasoning: true
    tool_calls: true
    vision: true

QwenCloud documents the thinking budget separately from the visible max output. For example, Qwen3.7 Max lists 64k max output with a larger thinking budget, while Qwen3.6 Plus and Flash list 64k max output with their own thinking budgets. In KodaCode’s current model_overrides, max_output_tokens describes visible output only; do not add the thinking budget to that value.

Inside the TUI:

  • /model opens the model picker
  • /utility-model opens the utility model picker
  • /reviewer-model opens the reviewer model picker
  • /thinking toggles model thinking when supported; new sessions start with thinking off
  • /variant changes the thinking level when supported

Remote model catalogs are cached for model_cache.expiry_days days (default: 7); local providers still refresh on startup.
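To shorten the cache lifetime, the setting could be written as below. This assumes the dotted name model_cache.expiry_days maps to nested YAML keys in config.yaml, the same way model.primary does in the example near the top of this page.

```yaml
# Assumption: model_cache.expiry_days is expressed as nested keys,
# mirroring how model.primary appears in config.yaml.
model_cache:
  expiry_days: 1   # refresh remote catalogs daily instead of every 7 days
```

A shorter expiry picks up newly released remote models sooner, at the cost of more frequent catalog requests.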