Context Management

Long conversations get expensive fast. KodaCode addresses this with a visible request-context (ctx) meter, automatic history summaries, configurable response length, and an inspectable record of what the model actually sees each turn.

If you are looking for the main user-facing cost levers overall, combine this page with Budgets and Cost Tracking & Optimization.

KodaCode uses the words ctx and history for two different things:

  • ctx is the current provider request size. It is the prompt, tools, selected prior material, current turn, and summaries that are actually being sent to the model for this step.
  • history is the stored session record. It includes prior messages, tool calls, tool results, runtime notes, completed turns, and any durable summary already written.

The header ctx meter is the source of truth for the current model request. It can be lower than the amount of raw stored history because the runtime may already have shaped, pruned, or summarized older material before sending the request.

When the TUI says Summarizing History, KodaCode is maintaining stored session history for future continuity and cost control. That does not necessarily mean the current ctx meter is near the model limit.

KodaCode uses one durable history-summary mechanism when stored session history gets large:

  • History Summary: the runtime rewrites completed prior turns into one durable summary artifact. Long-lived sessions carry this forward, and /compact rebuilds it.

| TUI message | Internal event | What it means | Durable? |
| --- | --- | --- | --- |
| Summarizing History | context_compaction_started | The runtime started rebuilding the saved summary of older stored turns. The current ctx meter may already be lower because request history has been shaped for this step. | Not yet |
| History Summary | session_history_continuation_updated | The runtime finished rebuilding the saved summary, and the transcript shows the new card. | Yes |
| History summary update failed | context_compaction_failed | The runtime could not finish rebuilding the summary, or the request still could not fit after history compaction. | No new summary |

There is only one durable compaction card: History Summary.

When replayed stored session history grows past compaction_threshold (measured as a fraction of the model’s context limit), the runtime replaces older turns with a compact summary that keeps what matters for continuity: the goal, decisions made, files touched, work in progress, blockers, and critical technical context.

Compaction is always on. By default, older stored history starts summarizing at 80% of the model context window, and a rebuilt summary is accepted once it brings stored history to 60% or below:

sessions:
  compaction_threshold: 0.8 # default: 0.8; start rebuilding older stored history at 80%
  compaction_target_threshold: 0.60 # default: 0.60; accept any candidate at or below 60%
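
The trigger can be pictured as a simple ratio check. The following is a minimal sketch, not KodaCode's implementation: the function name, the token counts, and the strict comparison at the boundary are assumptions. Only the 0.8 default comes from the configuration above.

```python
def should_start_compaction(history_tokens: int, context_limit: int,
                            compaction_threshold: float = 0.8) -> bool:
    """Return True when replayed stored history grows past the threshold.

    Assumption: pressure is history_tokens / context_limit and the
    comparison is strict. The real runtime may measure this differently.
    """
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    return history_tokens / context_limit > compaction_threshold


# Hypothetical model with a 200,000-token context window.
# 150,000 tokens of stored history is 75%: no rebuild yet.
print(should_start_compaction(150_000, 200_000))
# 170,000 tokens is 85%: a rebuild would start.
print(should_start_compaction(170_000, 200_000))
```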

You can also trigger this manually from inside the TUI:

/compact

/compact rebuilds the durable history summary through the same history-compaction path used automatically at runtime.

Why summarization can appear below 80% ctx

The ctx meter and the history-summary trigger measure different stages of the same pipeline:

  • ctx shows the current shaped request that will be sent to the model.
  • history summarization is triggered by replay pressure from older stored session history before that history is reduced into a durable summary.

This is why Summarizing History can appear while the header still shows a moderate ctx percentage. The runtime is reducing older stored turns so future requests can continue from an inspectable summary instead of repeatedly carrying or pruning the full raw transcript.

compaction_target_threshold is an acceptance ceiling, not an exact destination.

After summarization, KodaCode keeps a compact summary instead of replaying the older raw turns it replaced. That means stored-history pressure can move from roughly 80% down to well below 60% in one step. A large drop is normal when the rebuilt summary is much smaller than the raw history it replaced.
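
To make the "acceptance ceiling" idea concrete, here is a small sketch under stated assumptions. The function name and the split between summary tokens and retained recent-turn tokens are hypothetical; the 0.60 default is the only value taken from this page.

```python
def accept_summary(summary_tokens: int, retained_tokens: int,
                   context_limit: int, target: float = 0.60):
    """Return (accepted, pressure) for a rebuilt summary candidate.

    Assumption: pressure after a rebuild is the new summary plus whatever
    recent history is kept verbatim, divided by the context limit.
    Any candidate at or below the target is accepted; there is no attempt
    to land exactly on the target.
    """
    pressure = (summary_tokens + retained_tokens) / context_limit
    return pressure <= target, pressure


# Hypothetical numbers: a 6,000-token summary plus 30,000 retained tokens
# on a 200,000-token model lands far below the 60% ceiling.
accepted, pressure = accept_summary(6_000, 30_000, 200_000)
print(accepted, f"{pressure:.0%}")
```

In this illustration the candidate is accepted even though it overshoots the ceiling by a wide margin, which matches the large one-step drop described above.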

KodaCode also performs smaller automatic request-shaping steps before sending a provider request. These are cost controls, but they are not user-toggled features:

  • structured prompt fragments may get a compact provider-facing form
  • provider-facing tool schemas may drop nested descriptions while keeping the callable shape intact
  • older large tool results can be replaced with placeholders in later requests from the same active turn
  • a deterministic context packet can be omitted when the request is already under input pressure

The full local event log remains the source of truth. These steps affect what is sent to the model, not whether the turn remains inspectable.

For example, after a broad search result has already guided later reads and edits, a later provider request in the same turn may carry this kind of placeholder instead of the full old search output:

[older retained output from search turn-3 call-abc pruned for prompt budget]
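
A rough sketch of how such placeholders could be produced follows. It is illustrative only: the `ToolResult` type, the `keep_latest` and `max_chars` parameters, and the size rule are assumptions. Only the placeholder wording is taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    tool: str      # e.g. "search"
    turn: int      # turn number the call belongs to
    call_id: str   # e.g. "call-abc"
    output: str    # full tool output as stored in the local event log


def shape_tool_results(results, keep_latest=1, max_chars=2000):
    """Return provider-facing strings for a list of stored tool results.

    Older results whose output exceeds max_chars become placeholders;
    the newest keep_latest results and all small results pass through.
    The stored results themselves are never modified.
    """
    shaped = []
    cutoff = len(results) - keep_latest
    for index, result in enumerate(results):
        if index < cutoff and len(result.output) > max_chars:
            shaped.append(
                f"[older retained output from {result.tool} "
                f"turn-{result.turn} {result.call_id} pruned for prompt budget]"
            )
        else:
            shaped.append(result.output)
    return shaped
```

Because the function returns new strings and leaves the `ToolResult` objects alone, the sketch keeps the same separation the page describes: the request changes, the stored record does not.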

Use /cost to see the token savings totals and /trace [turn-number] to see the request mix for one turn.

KodaCode may use the configured utility model to improve the durable history summary, but the utility model is optional.

  • if the utility summary path is unavailable, runtime keeps the built-in runtime summary
  • if the utility request fails normally, compaction still completes with the runtime summary
  • only a narrower runtime failure path produces History summary update failed

This is why a missing or flaky utility model does not usually break compaction entirely.
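
The fallback behaviour can be sketched as follows. This is an assumption-laden illustration, not the runtime's code: the function and parameter names are hypothetical, and passing the runtime summary into the utility model is a guess about how the "improve" step works.

```python
from typing import Callable, Optional


def build_history_summary(
    runtime_summary: Callable[[], str],
    utility_summary: Optional[Callable[[str], str]] = None,
) -> str:
    """Build a durable summary, preferring the utility model when it works.

    An exception from runtime_summary propagates to the caller; that
    stands in for the narrower failure path that reports
    "History summary update failed".
    """
    base = runtime_summary()
    if utility_summary is None:
        # Utility path unavailable: keep the built-in runtime summary.
        return base
    try:
        return utility_summary(base)
    except Exception:
        # Utility request failed normally: compaction still completes.
        return base
```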

response_style controls how verbose ordinary model replies are. It accepts two values: default (no constraint) and terse. Note that terse, not default, is what a session uses when the field is omitted.

sessions:
  response_style: terse # default: terse

Terse mode is implemented entirely at the prompt level. When response_style: terse is active, the runtime injects a short instruction fragment into the system prompt for every turn:

Response style: terse.
- Keep ordinary model replies brief and direct.
- Do not shorten safety, permission, destructive-action, or ambiguity clarifications.

This fragment is inserted after agent and skill prompts but before workspace and execution-environment context, so it applies consistently across all turns in the session.
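
The ordering described above could look roughly like this. The function name, argument names, and the blank-line separator are assumptions for illustration; the fragment text and its position between the two groups come from this page.

```python
TERSE_FRAGMENT = (
    "Response style: terse.\n"
    "- Keep ordinary model replies brief and direct.\n"
    "- Do not shorten safety, permission, destructive-action, "
    "or ambiguity clarifications."
)


def assemble_system_prompt(agent_and_skill_prompts, workspace_context,
                           response_style="terse"):
    """Join prompt fragments in order: agent/skill, style, workspace context."""
    parts = list(agent_and_skill_prompts)
    if response_style == "terse":
        parts.append(TERSE_FRAGMENT)
    parts.extend(workspace_context)
    return "\n\n".join(parts)


prompt = assemble_system_prompt(
    ["You are the review agent."],          # hypothetical agent prompt
    ["Workspace: kodacode (Git detected)"]  # hypothetical workspace context
)
print(prompt)
```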

Nothing else changes. Terse mode does not alter:

  • the TUI layout or display
  • tool call behaviour or results
  • session logging or history
  • cost tracking (though shorter replies do produce fewer output tokens, which reduces cost indirectly)

The instruction targets ordinary prose replies: narration, summaries, and step-by-step explanations. The model is explicitly told not to shorten:

  • safety warnings
  • permission and approval explanations
  • destructive-action confirmations
  • clarification requests where ambiguity needs to be surfaced

Structured output (tables, code blocks, lists of keys) is also unaffected.

response_style is set once at the session level and applied to every turn. There is no per-turn override and no runtime toggle. Changes require editing the config file and starting a new session.

Valid values are default and terse. If you omit the field, the session defaults to terse. Any other value is rejected at startup with a validation error.
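
As a sketch of these validation rules (the function name, the dict-shaped config, and the error type are assumptions; the accepted values and the omitted-field behaviour are from this page):

```python
VALID_RESPONSE_STYLES = ("default", "terse")


def resolve_response_style(sessions_config: dict) -> str:
    """Return the effective response_style for a parsed sessions block."""
    # An omitted field falls back to terse, not to the value named "default".
    value = sessions_config.get("response_style", "terse")
    if value not in VALID_RESPONSE_STYLES:
        raise ValueError(
            f"invalid response_style {value!r}; "
            f"expected one of {', '.join(VALID_RESPONSE_STYLES)}"
        )
    return value


print(resolve_response_style({}))                             # field omitted
print(resolve_response_style({"response_style": "default"}))  # explicit value
```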

The durable history summary captures:

  • the stated goal and active constraints
  • completed steps and decisions made
  • work in progress and what is blocking it
  • file paths touched and their relevance
  • critical warnings that must not be forgotten

That structure lets the model pick up where it left off instead of replaying every old turn verbatim.

The runtime emits explicit events for context operations:

  • context_compaction_started: history-summary rebuild started
  • session_history_continuation_updated: the new durable history summary was written
  • context_compaction_failed: history-summary rebuild failed
  • prompt_compiled: what was actually assembled for each turn
  • context_pruned: which turns were dropped, after summary insertion, before the prompt was built

These are stored in the session log, so you can inspect what the model saw at any point rather than guessing why a session drifted.
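
If you export or read the session log yourself, a small script can tally these events. The sketch below assumes a hypothetical format of one JSON object per line with a `type` field; this page does not document the log's on-disk format, so adjust the parsing to whatever your log actually contains. Only the event names are taken from the list above.

```python
import json
from collections import Counter

CONTEXT_EVENTS = {
    "context_compaction_started",
    "session_history_continuation_updated",
    "context_compaction_failed",
    "prompt_compiled",
    "context_pruned",
}


def count_context_events(lines):
    """Count context-related events in an iterable of JSON lines.

    Assumption: each non-empty line is a JSON object with a "type" key.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") in CONTEXT_EVENTS:
            counts[event["type"]] += 1
    return counts
```

A started count that exceeds the sum of updated and failed counts would suggest a rebuild that was still in flight when the log was read.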

context_packet is disabled by default. When enabled, it adds small, deterministic runtime facts to a request so the model can avoid some discovery calls for basic workspace state. It is not free: the packet uses a small number of input tokens. The value comes from avoiding repeated tool calls for facts the runtime already knows.

Example:

context_packet:
  enabled_sections:
    - repo
    - git
    - git_dirty_summary

What those sections mean:

| Section | What it includes | User value |
| --- | --- | --- |
| repo | Workspace name and whether Git was detected | Helps the model orient itself without first asking for basic workspace metadata |
| git | Current branch, when available, and changed-file count | Helps the model know whether it is looking at a dirty branch and roughly how large the change is |
| git_dirty_summary | A bounded list of changed files and their Git status, up to the runtime limit | Helps review, debugging, and planning prompts start with the files already in play |

Example packet content is closer to this than to a file dump:

name: kodacode
git: detected
branch: docs-refresh
changed_files: 3
total_changed_files: 3
files:
- M README.md
- M site/src/content/docs/features/context.mdx
- ?? site/src/content/docs/features/skills.mdx

The current packet sections are intentionally bounded. They can include repository metadata, branch and changed-file counts, a limited changed-file list, and a small diagnostics summary for changed files when the request is diagnostics-related. They do not include file contents or patch hunks.

If the request is already close to the model input limit, runtime omits the packet and records the omitted token estimate in /cost.
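
The omission rule can be sketched like this. The `pressure_ratio` cutoff is a hypothetical stand-in for "close to the model input limit"; the page does not state the actual threshold, and the function name is illustrative.

```python
def attach_context_packet(request_tokens: int, packet_tokens: int,
                          input_limit: int, pressure_ratio: float = 0.9):
    """Return (tokens_sent, omitted_packet_tokens).

    Assumption: the packet is dropped when adding it would push the
    request past pressure_ratio of the input limit. The omitted estimate
    is returned so it can be reported, as /cost does for the real runtime.
    """
    if request_tokens + packet_tokens > input_limit * pressure_ratio:
        return request_tokens, packet_tokens
    return request_tokens + packet_tokens, 0


# Hypothetical 100,000-token input limit and a 200-token packet.
print(attach_context_packet(10_000, 200, 100_000))  # packet included
print(attach_context_packet(95_000, 200, 100_000))  # packet omitted
```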

These settings limit wasteful turn churn:

  • max_provider_requests_per_turn: stops one turn from making provider requests forever
  • max_output_continuations: controls automatic continuation after a provider stops solely because the output limit was reached
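
How the two caps interact can be sketched with a toy turn loop. Everything except the two setting names is an assumption: the stop-reason strings, the default values of 8 and 2, and the return shape are invented for illustration, and this page does not state the real defaults.

```python
def run_turn(provider_step, max_provider_requests_per_turn=8,
             max_output_continuations=2):
    """Drive one turn until it finishes or hits a cap.

    provider_step() performs one provider request and returns a
    hypothetical stop reason: "tool_call", "output_limit", or anything
    else to mean the turn is finished.
    Returns (outcome, requests_made).
    """
    requests = 0
    continuations = 0
    while requests < max_provider_requests_per_turn:
        stop_reason = provider_step()
        requests += 1
        if stop_reason == "output_limit":
            # Continue automatically only while continuations remain.
            if continuations >= max_output_continuations:
                return "continuation_cap", requests
            continuations += 1
            continue
        if stop_reason == "tool_call":
            continue  # tool ran; the turn needs another provider request
        return "done", requests
    return "request_cap", requests


steps = iter(["tool_call", "output_limit", "end"])
print(run_turn(lambda: next(steps)))
```

In this sketch a model that keeps calling tools is stopped by the request cap, while a model that keeps hitting its output limit is stopped earlier by the continuation cap.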

Use these TUI commands to check what happened:

  • /cost: session token totals, compaction savings, and cache activity
  • /trace [turn-number]: per-turn detail including history-summary compaction, token breakdown, and route attempts