# Multi-Provider AI
KodaCode supports multiple AI providers simultaneously. Switch models mid-session, configure fallback chains, and route lightweight background tasks to a cheap utility model.
## Supported Providers

| Provider | Auth Methods | Notable Models |
|---|---|---|
| Anthropic | API key, OAuth | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 |
| OpenAI | API key, OAuth | GPT-4o, GPT-4.1, o1 |
| Google Gemini | API key, OAuth | Gemini 2.5 Flash, Gemini 2.5 Pro |
| GitHub Copilot | Token import | GPT-4.1, Claude via Copilot |
| OpenAI-compatible | API key | Groq, LM Studio, Ollama, any custom endpoint |
## Configuration

```yaml
providers:
  - id: anthropic
    api_key: ${ANTHROPIC_API_KEY}
  - id: openai
    api_key: ${OPENAI_API_KEY}
  - id: google
    api_key: ${GOOGLE_API_KEY}
  - id: groq
    api_key: ${GROQ_API_KEY}
    base_url: https://api.groq.com/openai/v1
  - id: ollama
    base_url: http://localhost:11434/v1
  - id: lmstudio
    base_url: http://localhost:8000/v1
```

Select your model via the `/models` command or the home screen model picker.
## Example: Local + Cloud Setup

```yaml
providers:
  - id: anthropic
    api_key: ${ANTHROPIC_API_KEY}
  - id: ollama
    base_url: http://localhost:11434/v1

utility_model: ollama/llama3   # use local model for background tasks

fallback_models:
  - ollama/llama3              # fall back to local if cloud fails
```
## OAuth Authentication

For OpenAI (ChatGPT Plus/Pro), use OAuth instead of API keys:

```sh
kodacode login openai
```

Leave `api_key` empty in your config for OAuth-authenticated providers. To log out:

```sh
kodacode logout
```
## GitHub Copilot

KodaCode can use your existing GitHub Copilot subscription to access models like GPT-4.1 and Claude through the Copilot API.

Use `/connect` and select GitHub Copilot. You'll be prompted to import your token from one of these sources:
- **Neovim/Vim** — reads from `~/.config/github-copilot/hosts.json`
- **opencode** — reads from `~/.local/share/opencode/auth.json`
- **Manual** — paste a `gho_` token directly
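For orientation, the Neovim plugin's `hosts.json` is a small JSON map keyed by host. The snippet below is illustrative only, with a placeholder token; exact field names can vary by plugin version:

```json
{
  "github.com": {
    "user": "octocat",
    "oauth_token": "gho_XXXXXXXXXXXXXXXXXXXX"
  }
}
```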
KodaCode stores the token in its auth store and automatically exchanges it for short-lived Copilot session tokens (~30 min), refreshing as needed. No manual token rotation required.
## Limitations

KodaCode cannot perform its own GitHub OAuth device code flow for Copilot. GitHub's Copilot API requires a registered OAuth App `client_id` that determines which models the user can access. The `client_id` is obtained through GitHub's Copilot partner program, which is designed for enterprise and business integrations — KodaCode is not enrolled in this program.
Without a partner-enrolled `client_id`, a custom OAuth flow would only provide access to basic models, not the full catalog available to Copilot subscribers.
This is why KodaCode imports tokens from tools that have already completed the Copilot OAuth flow (Neovim plugin, Copilot CLI, opencode) rather than implementing its own.
## Getting a token

If you don't have any of the supported tools installed, the easiest path is to install the Neovim Copilot plugin — it stores its token as a plain JSON file that KodaCode can import directly.

If you already have the Copilot CLI authenticated, the token is in your system keychain. You can extract it for a manual paste:

```sh
# macOS
security find-generic-password -s "copilot-cli" -w

# Linux (libsecret — install: apt install libsecret-tools / dnf install libsecret / pacman -S libsecret)
secret-tool lookup service copilot-cli
```

## Utility Model

Configure a cheap, fast model for background tasks like title generation and context compaction:

```yaml
utility_model: anthropic/claude-haiku-4-5-20251001
```

Agents can opt into the utility model by setting `model: utility` in their frontmatter. The built-in `explorer` and `insight` agents use this by default, significantly reducing cost for read-only research tasks.
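As a sketch, a custom agent opting into the utility model might declare frontmatter like the following. Only `model: utility` comes from the behavior described above; the other field names are hypothetical:

```yaml
---
name: my-researcher                       # hypothetical field
description: read-only codebase research  # hypothetical field
model: utility                            # route this agent to the configured utility_model
---
```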
## Fallback Chain

When the primary model fails, KodaCode automatically tries fallback models in order:

```yaml
fallback_models:
  - anthropic/claude-sonnet-4-6
  - openai/gpt-4o
```

Fallbacks only trigger after the primary model has fully exhausted its retry budget. The retry sequence for each model is:
1. **Silent retries** — up to 10 attempts with exponential backoff (2s, 4s, 8s… capped at 30s). These are invisible to the user.
2. **Visible retries** — up to `max_retries` (default 5) additional attempts. The TUI shows a notification with the error and countdown.
3. **Non-retryable errors** (401 auth, context overflow) skip retries entirely and fail immediately.
Only after all retries are exhausted does KodaCode move to the next fallback model. Each fallback gets its own full retry budget. The first fallback that succeeds takes over and the response flows normally. If all fallbacks also fail, the original error is returned.
The TUI shows “Primary model unavailable, trying X…” when switching to a fallback.
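Putting retries and fallbacks together, a conservative setup might look like this sketch. The docs name `max_retries` (default 5), but its exact placement in the config is assumed here:

```yaml
max_retries: 3                    # visible retries per model (placement assumed)
fallback_models:
  - anthropic/claude-sonnet-4-6   # first fallback, gets its own full retry budget
  - openai/gpt-4o                 # tried only if the first fallback also exhausts retries
```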
## Adaptive Reasoning Budget

Thinking is configured per-model in the provider config:

```yaml
providers:
  - id: google
    models:
      - id: gemini-3-pro-preview
        thinking_budget: 10000   # only this model gets thinking
```

KodaCode then adjusts the budget dynamically across a tool loop:
- **Per-model config** — sets the base budget for models that support reasoning
- **Agent frontmatter `reasoning_budget`** — overrides the per-model config
- **Variant (user-set ceiling)** — `/variant` cycles through `low` (3K), `high` (10K), and `max` (32K) thinking tokens
- **Auto-reduce on tool turns** — after the initial response, the budget drops to 3K since the model is just routing tool results
- **Context-aware scaling** — above 70% context usage, the budget scales down to prevent output token exhaustion
Models without `thinking_budget` in their config do not use extended thinking, even if the model supports it. This prevents unnecessary latency and cost.
## Example: Extended Thinking Setup

```yaml
providers:
  - id: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    thinking_type: adaptive      # "adaptive" (default) or "enabled"
    thinking_budget: 10000       # used when thinking_type is "enabled"
    models:
      - id: claude-opus-4-6
        thinking_budget: 32000   # more thinking for Opus (when enabled)
  - id: google
    api_key: ${GOOGLE_API_KEY}
    models:
      - id: gemini-2.5-pro
        thinking_budget: 10000
```

Anthropic OAuth thinking types:
- `adaptive` (default) — the model decides how much to think based on query complexity. Simple questions get minimal thinking; complex tasks get deep reasoning. More token-efficient and less likely to hit subscription rate limits.
- `enabled` — reserves a fixed token budget for thinking on every request. Uses the `thinking_budget` value (defaults to 10,000). Predictable but consumes more tokens.
Then in a session, use `/variant` to cycle thinking effort:

```sh
/variant   # cycles: low (3K) → high (10K) → max (32K) → off
```

## Model Discovery

Models are discovered automatically:

- **Cloud providers**: metadata fetched from models.dev and provider APIs
- **Local providers** (Ollama, LM Studio): discovered via their `/v1/models` endpoint
Use `/models` in the TUI to browse and switch models mid-session.
## Refresh behavior

- **Local models** (Ollama, LM Studio) are refreshed on every startup regardless of `model_refresh_interval`. Capability probing (`/api/show` for Ollama, `/api/v1/models` for LM Studio) runs in parallel with bounded concurrency so startup stays fast even with many local models.
- **Cloud model metadata** (from models.dev) is cached locally and refreshed based on `model_refresh_interval` (default: 7 days). Set `model_refresh_interval: 0` to force a refresh on every startup.
## Best Practices

### Model Selection

- Use a capable model (Claude Sonnet/Opus, GPT-4o) for complex coding tasks
- Set a fast, cheap model as `utility_model` for titles, summaries, and compaction
- Configure `fallback_models` as a safety net for provider outages
### Local Models

- Ollama and LM Studio models are auto-discovered from their `/v1/models` endpoint
- Set `context_size` in config for local models — the default may not match the actual model capacity
- If a slow local model takes a long time to finish emitting tool-call JSON, raise `session.tool_call_argument_timeout` or set a per-model override
- Local model title generation can be slow — KodaCode retries up to 3 times with a 30s timeout
### Cost Optimization

- The utility model handles titles, compaction, and summaries — use a cheap model here
- Use `/cost` to monitor spending during a session
- Set `budget` and `budget_warn` to prevent runaway costs
- Subagents like `explorer` and `insight` use the utility model by default
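A cost-guardrail sketch combining these settings. The currency unit for `budget` is assumed and the values are illustrative:

```yaml
utility_model: anthropic/claude-haiku-4-5-20251001
budget: 10.00        # hard cap per session (unit assumed)
budget_warn: 5.00    # warn when spending passes this threshold
```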