Context Management
Long conversations get expensive fast. KodaCode addresses this through visible request context, automatic history summaries, configurable response length, and full visibility into what the model actually sees each turn.
If you are looking for the main user-facing cost levers overall, combine this page with Budgets and Cost Tracking & Optimization.
Context vs history
KodaCode uses these words for two different things:
- ctx is the current provider request size. It is the prompt, tools, selected prior material, current turn, and summaries that are actually being sent to the model for this step.
- history is the stored session record. It includes prior messages, tool calls, tool results, runtime notes, completed turns, and any durable summary already written.
The header ctx meter is the source of truth for the current model request. It can be lower than the amount of raw stored history because the runtime may already have shaped, pruned, or summarized older material before sending the request.
When the TUI says Summarizing History, KodaCode is maintaining stored session history for future continuity and cost control. That does not necessarily mean the current ctx meter is near the model limit.
One compaction control
KodaCode uses one durable history-summary mechanism when stored session history gets large:
- History Summary: the runtime rewrites completed prior turns into one durable summary artifact. Long-lived sessions carry this forward, and /compact rebuilds it.
Messages you will see
| TUI message | Internal event | What it means | Durable? |
|---|---|---|---|
| Summarizing History | context_compaction_started | The runtime started rebuilding the saved summary of older stored turns. The current ctx meter may already be lower because request history has been shaped for this step. | Not yet |
| History Summary | session_history_continuation_updated | The runtime finished rebuilding the saved summary, and the transcript shows the new card. | Yes |
| History summary update failed | context_compaction_failed | The runtime could not finish rebuilding the summary, or the request still could not fit after history compaction. | No new summary |
There is only one durable compaction card: History Summary.
History summary compaction
When replayed stored session history grows past compaction_threshold (measured as a fraction of the model’s context limit), the runtime replaces older turns with a compact summary that keeps what matters for continuity: the goal, decisions made, files touched, work in progress, blockers, and critical technical context.
Compaction is always on. By default, older stored history starts summarizing at 80% of the model context window and aims to get under 60% again:

    sessions:
      compaction_threshold: 0.8          # default: 0.8; start rebuilding older stored history at 80%
      compaction_target_threshold: 0.60  # default: 0.60; accept any candidate at or below 60%

You can also trigger this manually from inside the TUI:

    /compact

/compact rebuilds the durable history summary through the same history-compaction path used automatically at runtime.
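The two thresholds can be read as a pair of checks: one decides when to start rebuilding, the other decides whether a rebuilt candidate is small enough. The sketch below is illustrative only. The field names mirror the documented config keys, but the class, the function names, and the choice of an inclusive comparison at the trigger are assumptions, not the runtime's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionSettings:
    # Defaults taken from the documented config; the class itself is hypothetical.
    compaction_threshold: float = 0.8
    compaction_target_threshold: float = 0.60


def history_pressure(stored_history_tokens: int, context_limit: int) -> float:
    """Stored-history size as a fraction of the model's context limit."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    return stored_history_tokens / context_limit


def should_compact(stored_history_tokens: int, context_limit: int,
                   settings: CompactionSettings = CompactionSettings()) -> bool:
    # Assumption: the trigger is inclusive (fires at exactly 80%).
    pressure = history_pressure(stored_history_tokens, context_limit)
    return pressure >= settings.compaction_threshold


def accepts_candidate(candidate_tokens: int, context_limit: int,
                      settings: CompactionSettings = CompactionSettings()) -> bool:
    # The target is a ceiling: any candidate at or below it is acceptable.
    pressure = history_pressure(candidate_tokens, context_limit)
    return pressure <= settings.compaction_target_threshold
```

With a hypothetical 200,000-token context limit, `should_compact(160_000, 200_000)` is true, and a rebuilt history of 48,000 tokens passes `accepts_candidate`.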
Why summarization can appear below 80% ctx
The ctx meter and the history-summary trigger measure different stages of the same pipeline:
- ctx shows the current shaped request that will be sent to the model.
- history summarization is triggered by replay pressure from older stored session history before that history is reduced into a durable summary.
This is why Summarizing History can appear while the header still shows a moderate ctx percentage. The runtime is reducing older stored turns so future requests can continue from an inspectable summary instead of repeatedly carrying or pruning the full raw transcript.
Why the percentage can drop a lot
compaction_target_threshold is an acceptance ceiling, not an exact destination.
After summarization, KodaCode keeps a compact summary instead of replaying the older raw turns it replaced. That means stored-history pressure can move from roughly 80% down to well below 60% in one step. A large drop is normal when the rebuilt summary is much smaller than the raw history it replaced.
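A worked example makes the size of the drop concrete. All token counts below are invented for illustration, including the 200,000-token context limit. The point is only that replacing a large block of raw turns with a small summary lands far under the 60% ceiling rather than on it.

```python
# Hypothetical numbers; none of these come from a real session.
context_limit = 200_000
older_raw_tokens = 130_000   # completed turns that will be summarized
recent_tokens = 30_000       # material that stays as-is
summary_tokens = 6_000       # size of the rebuilt durable summary

before = (older_raw_tokens + recent_tokens) / context_limit
after = (summary_tokens + recent_tokens) / context_limit

print(f"before: {before:.0%}")  # before: 80%
print(f"after:  {after:.0%}")   # after:  18%
print(after <= 0.60)            # True: the candidate is under the ceiling
```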
Automatic request shaping
KodaCode also performs smaller automatic request-shaping steps before sending a provider request. These are cost controls, but they are not user-toggled features:
- structured prompt fragments may get a compact provider-facing form
- provider-facing tool schemas may drop nested descriptions while keeping the callable shape intact
- older large tool results can be replaced with placeholders in later requests from the same active turn
- a deterministic context packet can be omitted when the request is already under input pressure
The full local event log remains the source of truth. These steps affect what is sent to the model, not whether the turn remains inspectable.
For example, after a broad search result has already guided later reads and edits, a later provider request in the same turn may carry this kind of placeholder instead of the full old search output:

    [older retained output from search turn-3 call-abc pruned for prompt budget]

Use /cost to see the token savings totals and /trace [turn-number] to see the request mix for one turn.
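The placeholder substitution can be pictured as a pass over the tool results in the active turn. This is a minimal sketch under stated assumptions: the `ToolResult` type, the `keep_latest` and `max_chars` parameters, and the size heuristic are hypothetical, and the placeholder wording is inferred from the single example above. Note that the pass returns new objects, which matches the statement that the local event log stays intact.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ToolResult:
    # Hypothetical record shape for one tool call's output.
    tool: str
    turn_id: str
    call_id: str
    output: str


def placeholder_for(result: ToolResult) -> str:
    # Wording inferred from the documented example placeholder.
    return (f"[older retained output from {result.tool} {result.turn_id} "
            f"{result.call_id} pruned for prompt budget]")


def shape_tool_results(results: List[ToolResult], keep_latest: int = 1,
                       max_chars: int = 2000) -> List[ToolResult]:
    """Return a request-facing copy where older, large outputs become placeholders."""
    cutoff = max(len(results) - keep_latest, 0)
    shaped = []
    for index, result in enumerate(results):
        is_older = index < cutoff
        is_large = len(result.output) > max_chars
        if is_older and is_large:
            shaped.append(replace(result, output=placeholder_for(result)))
        else:
            shaped.append(result)
    return shaped
```

The input list is never mutated, so the full outputs remain available for inspection while the provider request carries only the short placeholder.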
Utility-model involvement
KodaCode may use the configured utility model to improve the durable history summary, but the utility model is optional.
- if the utility summary path is unavailable, runtime keeps the built-in runtime summary
- if the utility request fails normally, compaction still completes with the runtime summary
- only a narrower runtime failure path produces History summary update failed
This is why a missing or flaky utility model does not usually break compaction entirely.
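The fallback behaviour described above can be sketched as a small function. Everything here is hypothetical (the function name, the callable signatures, and the idea that the utility model refines the runtime summary rather than working from raw turns). It only illustrates the control flow: the runtime summary is always produced first, and a utility failure never discards it.

```python
from typing import Callable, List, Optional


def build_history_summary(
    turns: List[str],
    runtime_summarize: Callable[[List[str]], str],
    utility_summarize: Optional[Callable[[str], str]] = None,
) -> str:
    # The built-in summary is the baseline. If this step raises, the error
    # propagates; that stands in for the narrower runtime failure path.
    baseline = runtime_summarize(turns)

    # No utility model configured: keep the runtime summary.
    if utility_summarize is None:
        return baseline

    # Utility request failed: compaction still completes with the baseline.
    try:
        improved = utility_summarize(baseline)
    except Exception:
        return baseline

    # An empty utility result is treated like a failure.
    return improved or baseline
```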
Response style
response_style controls how verbose ordinary model replies are. The two values are default (no constraint) and terse, and terse is the default session posture.

    sessions:
      response_style: terse  # default: terse

How terse mode works
Terse mode is implemented entirely at the prompt level. When response_style: terse is active, the runtime injects a short instruction fragment into the system prompt for every turn:

    Response style: terse.
    - Keep ordinary model replies brief and direct.
    - Do not shorten safety, permission, destructive-action, or ambiguity clarifications.

This fragment is inserted after agent and skill prompts but before workspace and execution-environment context, so it applies consistently across all turns in the session.
Nothing else changes. Terse mode does not alter:
- the TUI layout or display
- tool call behaviour or results
- session logging or history
- cost tracking (though shorter replies do produce fewer output tokens, which reduces cost indirectly)
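Because terse mode is only a prompt fragment, its effect can be pictured as one extra entry in an ordered list of prompt parts. The sketch below assumes a hypothetical `assemble_system_prompt` helper and a simple blank-line join. Only the fragment text and its position relative to the other parts come from this page.

```python
from typing import List

TERSE_FRAGMENT = "\n".join([
    "Response style: terse.",
    "- Keep ordinary model replies brief and direct.",
    "- Do not shorten safety, permission, destructive-action, or ambiguity clarifications.",
])


def assemble_system_prompt(agent_prompt: str, skill_prompts: List[str],
                           workspace_context: str, environment_context: str,
                           response_style: str = "terse") -> str:
    # Agent and skill prompts come first.
    parts = [agent_prompt, *skill_prompts]
    # The terse fragment sits after them, before workspace/environment context.
    if response_style == "terse":
        parts.append(TERSE_FRAGMENT)
    parts.extend([workspace_context, environment_context])
    # Skip empty parts so the join does not leave stray blank sections.
    return "\n\n".join(part for part in parts if part)
```

With `response_style="default"` the same function returns the prompt without the fragment, which is the whole difference between the two modes in this sketch.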
What terse does and does not shorten
The instruction targets ordinary prose replies: narration, summaries, and step-by-step explanations. The model is explicitly told not to shorten:
- safety warnings
- permission and approval explanations
- destructive-action confirmations
- clarification requests where ambiguity needs to be surfaced
Structured output (tables, code blocks, lists of keys) is also unaffected.
Configuration
response_style is set once at the session level and applied to every turn. There is no per-turn override and no runtime toggle. Changes require editing the config file and starting a new session.
Valid values are default and terse. If you omit the field, the session defaults to terse. Any other value is rejected at startup with a validation error.
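The validation rule is small enough to state as code. This is a sketch of the documented behaviour, not the actual startup check: the function name, the dict-shaped config, and the error message are assumptions.

```python
VALID_RESPONSE_STYLES = ("default", "terse")


def resolve_response_style(sessions_config: dict) -> str:
    """Return the effective response_style, or raise on an unknown value."""
    # Omitted field: the session defaults to terse.
    value = sessions_config.get("response_style", "terse")
    if value not in VALID_RESPONSE_STYLES:
        # Hypothetical message; the real validation error text may differ.
        raise ValueError(
            f"response_style must be one of {VALID_RESPONSE_STYLES}, got {value!r}"
        )
    return value
```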
What the history summary keeps
The durable history summary captures:
- the stated goal and active constraints
- completed steps and decisions made
- work in progress and what is blocking it
- file paths touched and their relevance
- critical warnings that must not be forgotten
That structure lets the model pick up where it left off instead of replaying every old turn verbatim.
Durable context artifacts
The runtime emits explicit events for context operations:
- context_compaction_started: history-summary rebuild started
- session_history_continuation_updated: the new durable history summary was written
- context_compaction_failed: history-summary rebuild failed
- prompt_compiled: what was actually assembled for each turn
- context_pruned: which turns were dropped before the prompt was built after summary insertion
These are stored in the session log, so you can inspect what the model saw at any point rather than guessing why a session drifted.
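If the session log has been loaded into memory, the three compaction events can be mapped back to the TUI messages they correspond to. The on-disk log format is not described on this page, so the record shape below (a dict with `event` and `turn` keys) is purely an assumption for illustration. Only the event names and message strings are taken from the documentation.

```python
from typing import Dict, List, Optional, Tuple

# Event names and TUI messages as documented on this page.
TUI_MESSAGES: Dict[str, str] = {
    "context_compaction_started": "Summarizing History",
    "session_history_continuation_updated": "History Summary",
    "context_compaction_failed": "History summary update failed",
}


def compaction_timeline(events: List[dict]) -> List[Tuple[Optional[int], str]]:
    """List (turn, TUI message) pairs for compaction-related events, in log order."""
    timeline = []
    for record in events:
        message = TUI_MESSAGES.get(record.get("event"))
        if message is not None:
            # "turn" is a hypothetical field; None is used when it is absent.
            timeline.append((record.get("turn"), message))
    return timeline
```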
Deterministic context packet
context_packet is disabled by default. When enabled, it adds small,
deterministic runtime facts to a request so the model can avoid some discovery
calls for basic workspace state. It is not free: the packet uses a small number
of input tokens. The value comes from avoiding repeated tool calls for facts the
runtime already knows.
Example:

    context_packet:
      enabled_sections:
        - repo
        - git
        - git_dirty_summary

What those sections mean:
| Section | What it includes | User value |
|---|---|---|
| repo | Workspace name and whether Git was detected | Helps the model orient itself without first asking for basic workspace metadata |
| git | Current branch, when available, and changed-file count | Helps the model know whether it is looking at a dirty branch and roughly how large the change is |
| git_dirty_summary | A bounded list of changed files and their Git status, up to the runtime limit | Helps review, debugging, and planning prompts start with the files already in play |
Example packet content is closer to this than to a file dump:

    name: kodacode
    git: detected

    branch: docs-refresh
    changed_files: 3

    total_changed_files: 3
    files:
    - M README.md
    - M site/src/content/docs/features/context.mdx
    - ?? site/src/content/docs/features/skills.mdx

The current packet sections are intentionally bounded. They can include repository metadata, branch and changed-file counts, a limited changed-file list, and a small diagnostics summary for changed files when the request is diagnostics-related. They do not include file contents or patch hunks.
If the request is already close to the model input limit, runtime omits the
packet and records the omitted token estimate in /cost.
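The bounded packet and the omit-under-pressure rule can be sketched together. This is a hedged illustration: the function names, the `workspace` dict shape, the `max_files` bound, and the `pressure_ratio` cutoff are all hypothetical, since the page does not state the runtime limit or how close to the input limit a request must be before the packet is dropped.

```python
from typing import Dict, List, Tuple


def build_context_packet(enabled_sections: List[str], workspace: Dict,
                         max_files: int = 20) -> Dict:
    """Assemble only the enabled sections, with a bounded changed-file list."""
    changed = workspace.get("changed", [])
    packet: Dict = {}
    if "repo" in enabled_sections:
        packet["repo"] = {"name": workspace["name"],
                          "git": "detected" if workspace.get("git_detected") else "not detected"}
    if "git" in enabled_sections:
        packet["git"] = {"branch": workspace.get("branch"),
                         "changed_files": len(changed)}
    if "git_dirty_summary" in enabled_sections:
        # Status lines only; no file contents or patch hunks.
        packet["git_dirty_summary"] = {"total_changed_files": len(changed),
                                       "files": changed[:max_files]}
    return packet


def decide_packet(request_tokens: int, packet_tokens: int, input_limit: int,
                  pressure_ratio: float = 0.9) -> Tuple[bool, int]:
    """Return (include_packet, omitted_token_estimate)."""
    # Assumed rule: omit when adding the packet would cross the pressure cutoff.
    if request_tokens + packet_tokens > input_limit * pressure_ratio:
        return False, packet_tokens
    return True, 0
```

The second return value stands in for the omitted token estimate that the runtime records in /cost.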
Request Limits
These settings limit wasteful turn churn:
- max_provider_requests_per_turn: stops one turn from making provider requests forever
- max_output_continuations: controls automatic continuation after a provider stops solely because the output limit was reached
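The interaction between the two limits can be sketched as a loop around provider requests. The parameter names match the documented settings, but everything else is assumed for illustration: the `ProviderResponse` type, the `stop_reason` values, and the returned status strings are hypothetical, and no default values are implied.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ProviderResponse:
    text: str
    stop_reason: str  # hypothetical values: "end", "output_limit", "tool_call"


def run_turn(send_request: Callable[[], ProviderResponse],
             max_provider_requests_per_turn: int,
             max_output_continuations: int) -> Tuple[str, str]:
    """Drive one turn and return (collected_text, status)."""
    requests_made = 0
    continuations = 0
    chunks = []
    while requests_made < max_provider_requests_per_turn:
        response = send_request()
        requests_made += 1
        chunks.append(response.text)
        if response.stop_reason == "output_limit":
            # Continue automatically only while continuations remain.
            if continuations >= max_output_continuations:
                return "".join(chunks), "continuation_limit"
            continuations += 1
            continue
        if response.stop_reason == "tool_call":
            # A tool round-trip leads to another provider request.
            continue
        return "".join(chunks), "completed"
    # The per-turn request budget ran out before the turn finished.
    return "".join(chunks), "request_limit"
```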
Inspection
Use these TUI commands to check what happened:
- /cost: session token totals, compaction savings, and cache activity
- /trace [turn-number]: per-turn detail including history-summary compaction, token breakdown, and route attempts