
Context Management

KodaCode uses a multi-stage system to manage context window usage, keeping long coding sessions productive without manual intervention.

Old tool outputs (bash results, file contents, search results) are replaced with compact placeholders, for example:

[pruned: 584 lines of file content]
  • Targets outputs older than the most recent tool calls
  • Protects the last 40K tokens from pruning (configurable via prune_protect_tokens)
  • Only prunes if total savings exceed 20K tokens (configurable via prune_min_savings)
  • Edit/patch outputs are never pruned (needed for correctness)
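The rules above can be sketched as a single pass over the message history. This is an illustrative sketch, not KodaCode's actual implementation: the message shape, the `count_tokens` helper, and the tool-name check are all assumptions.

```python
# Hypothetical sketch of the pruning pass; names and message shape are
# assumptions, not KodaCode's real API.

PRUNE_PROTECT_TOKENS = 40_000   # never prune inside this recent window
PRUNE_MIN_SAVINGS = 20_000      # skip the pass entirely if it saves less

def prune(messages, count_tokens):
    """Replace old tool outputs with short placeholders.

    `messages` is a list of dicts with `role`, `content`, and (for tool
    results) `tool_name`; `count_tokens` maps a string to a token count.
    """
    # Walk backwards, marking messages until the protected window is full.
    protected, budget = set(), 0
    for i in range(len(messages) - 1, -1, -1):
        budget += count_tokens(messages[i]["content"])
        protected.add(i)
        if budget >= PRUNE_PROTECT_TOKENS:
            break

    # Candidates: unprotected tool outputs, excluding edit/patch results.
    candidates = [
        i for i, m in enumerate(messages)
        if i not in protected
        and m["role"] == "tool"
        and m["tool_name"] not in ("edit", "patch")  # never pruned
    ]
    savings = sum(count_tokens(messages[i]["content"]) for i in candidates)
    if savings < PRUNE_MIN_SAVINGS:
        return messages  # not worth a pass

    pruned = list(messages)
    for i in candidates:
        lines = messages[i]["content"].count("\n") + 1
        pruned[i] = {**messages[i],
                     "content": f"[pruned: {lines} lines of file content]"}
    return pruned
```

Note that the savings check happens before any replacement, so a history that would only save a few tokens is left untouched.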

Before compaction runs, KodaCode cleans the message history to ensure it’s valid for all providers:

  • Orphaned tool calls — if a tool call has no matching result (e.g. from a cancelled stream or crash recovery), the call is stripped. This prevents providers that require strict tool_call/tool_result pairing (Anthropic, OpenAI) from rejecting the request.
  • File attachments — images and PDFs are replaced with [file attachment omitted] before being sent to the utility model for summarization, since they inflate token counts and the summary model doesn’t need them.
  • Turn boundary alignment — truncation always cuts at user-message boundaries, keeping tool_call/tool_result pairs together. No broken pairs leak through.

This happens automatically and invisibly. It’s relevant when you resume sessions after crashes or when switching between providers that have different message format requirements.
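The orphan-stripping step in particular can be sketched in a few lines, assuming a message shape where assistant messages carry a `tool_calls` list of `{"id": ...}` dicts and tool results carry a `tool_call_id`. This mirrors, not reproduces, KodaCode's internals.

```python
# Minimal sketch of orphaned-tool-call stripping; the message shape is an
# assumption. Providers with strict pairing (Anthropic, OpenAI) reject a
# tool call whose result is missing, so those calls are dropped up front.

def strip_orphans(messages):
    """Drop tool calls that never received a matching result."""
    result_ids = {m["tool_call_id"] for m in messages if m["role"] == "tool"}
    cleaned = []
    for m in messages:
        if m["role"] == "assistant" and m.get("tool_calls"):
            kept = [c for c in m["tool_calls"] if c["id"] in result_ids]
            if not kept and not m.get("content"):
                continue  # nothing left of this message at all
            m = {**m, "tool_calls": kept}
        cleaned.append(m)
    return cleaned
```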

When context usage exceeds the compaction threshold (default 80%), KodaCode uses the utility model to generate a structured summary.

The summary follows a strict format:

  • Goal — what the user is trying to accomplish
  • Key instructions — constraints and preferences
  • Discoveries — what was learned during the session
  • Accomplished — what has been done
  • Relevant files — files that matter for the current task

The most recent turns (default 10) are preserved verbatim. Everything older is replaced by the summary.
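The flow reduces to a short function. This is a hedged sketch under stated assumptions: `summarize` stands in for the utility-model call, and turns are treated as opaque items.

```python
# Illustrative compaction flow; `summarize` is a placeholder for the
# utility-model call, not KodaCode's actual function.

COMPACTION_THRESHOLD = 0.8   # trigger at 80% of the context window
COMPACTION_KEEP_TURNS = 10   # most recent turns kept verbatim

def maybe_compact(turns, used_tokens, window_tokens, summarize):
    if used_tokens / window_tokens < COMPACTION_THRESHOLD:
        return turns  # under threshold, nothing to do
    old = turns[:-COMPACTION_KEEP_TURNS]
    recent = turns[-COMPACTION_KEEP_TURNS:]
    if not old:
        return turns  # nothing older than the preserved window
    # Structured summary: goal, key instructions, discoveries, accomplished,
    # relevant files.
    summary = summarize(old)
    return [summary] + recent
```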

At 90% of the model’s actual input capacity, KodaCode stops the tool loop and forces a final response. This prevents context overflow errors that would lose the entire turn.

Additional safety measures:

  • At 60% context usage: tool parameter descriptions are stripped (the model has seen them before)
  • At 90% context usage: tools are removed entirely, forcing a text response
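Taken together, the thresholds form a ladder of progressively aggressive steps. The percentages below come from this page; the step names are mine.

```python
# Degradation ladder as described above; step wording is illustrative.

def degradation_steps(usage: float) -> list[str]:
    """Return the context-saving measures active at a given usage fraction."""
    steps = []
    if usage >= 0.60:
        steps.append("strip tool parameter descriptions")
    if usage >= 0.80:
        steps.append("compact older turns into a summary")
    if usage >= 0.90:
        steps.append("remove tools and force a final text response")
    return steps
```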
These thresholds are configurable under the session block:

session:
  compaction_threshold: 0.8    # Trigger compaction at 80%
  compaction_keep_turns: 10    # Preserve last 10 turns
  prune_protect_tokens: 40000  # Protect last 40K tokens from pruning
  prune_min_savings: 20000     # Only prune if it saves 20K+ tokens
  context_limit: 0.9           # Stop tool loop at 90%

  # Per-model overrides
  models:
    openai/gpt-4o:
      compaction_threshold: 0.9
      prune_protect_tokens: 80000
Field                   Default   Description
compaction_threshold    0.8       Fraction of the context window that triggers compaction
compaction_keep_turns   10        Recent turns preserved verbatim after compaction
prune_protect_tokens    40000     Tokens protected from pruning
prune_min_savings       20000     Minimum savings required to trigger pruning
context_limit           0.9       Fraction of context that stops the tool loop
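Per-model overrides layer on top of the session defaults: any field set for a model replaces the default, and everything else falls through. A minimal sketch of that merge, assuming a simple shallow-merge behavior:

```python
# Sketch of per-model settings resolution; the shallow-merge behavior is an
# assumption based on the config shape shown above.

DEFAULTS = {
    "compaction_threshold": 0.8,
    "compaction_keep_turns": 10,
    "prune_protect_tokens": 40_000,
    "prune_min_savings": 20_000,
    "context_limit": 0.9,
}

def settings_for(model: str, overrides: dict) -> dict:
    """Merge a model's overrides over the session defaults."""
    return {**DEFAULTS, **overrides.get(model, {})}
```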

Use /pin to add instructions that survive compaction:

/pin Always use TypeScript strict mode
/pin Never modify files in the vendor/ directory
/pin Use table-driven tests for all new Go tests

Pinned instructions are injected into every system prompt, even after compaction clears the conversation history.
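This is why pins survive compaction: they live outside the conversation history and are appended to the system prompt on every request. A minimal sketch of that injection, with all names being assumptions:

```python
# Illustrative pin injection; pins are stored separately from the message
# history, so summarizing the history never touches them.

def build_system_prompt(base_prompt: str, pins: list[str]) -> str:
    """Append pinned instructions to the system prompt."""
    if not pins:
        return base_prompt
    pinned = "\n".join(f"{i}. {p}" for i, p in enumerate(pins, 1))
    return f"{base_prompt}\n\nPinned instructions:\n{pinned}"
```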

Command              Description
/pin <instruction>   Pin a new instruction
/pins                List all pinned instructions
/unpin <number>      Remove a pin by number

Example: Protecting Conventions During Long Sessions

> /pin All database queries must go through the repository layer
> /pin Error messages start lowercase, no trailing punctuation
> /pin Run tests after every code change

These instructions persist through compaction, so even if the model’s conversation history is summarized, it still follows your rules.

Use /cost to see current token usage. Watch for these thresholds:

  • 60%: Tool parameter descriptions are stripped to save space
  • 80%: Compaction triggers and older turns are summarized
  • 90%: Tool loop stops; the model is forced to respond with the available information

Tuning suggestions by workload:

  • Long refactoring sessions: Lower compaction_threshold (0.7) and raise compaction_keep_turns (15) to preserve more recent context
  • Quick Q&A sessions: Defaults work well; compaction rarely triggers
  • Large codebases: Increase prune_protect_tokens (60000+) to keep more tool output visible

If the model forgets something important after compaction:

  • Use /pin to make critical instructions survive compaction
  • Use KODACODE.md for project conventions that should always be present
  • Re-state key context in your next message — the model will pick it up