Hybrid Search
The search tool uses a hybrid mode by default: it ranks chunk-level lexical and semantic evidence together, then merges the results by relevance. If embeddings are not configured, it falls back to lexical search automatically.
How it works
When the model calls search, KodaCode:
- Splits files into chunks and keeps those chunk boundaries stable for hybrid ranking.
- Runs a chunk-aware lexical pass over the target path or glob.
- If an embedding model is configured, embeds the query and each chunk, then scores them by cosine similarity.
- Merges the two ranked chunk lists using reciprocal rank fusion, which combines rank position from each pass into a single relevance score.
- Applies fixed internal path-aware adjustments so source files tend to outrank docs, tests, mocks, and generated code.
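The scoring and merge steps above can be sketched as follows. This is a minimal illustration, not KodaCode's implementation: the function names, the chunk ID strings, and the constant k=60 (a commonly used value for reciprocal rank fusion) are assumptions, and the path-aware adjustments are left out.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rrf_merge(ranked_lists, k=60):
    """Fuse ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    k=60 is an assumed constant, chosen here only for illustration.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical ranked chunk IDs from the lexical and semantic passes.
lexical = ["auth.go:1", "token.go:3", "README.md:1"]
semantic = ["token.go:3", "session.go:2", "auth.go:1"]
merged = rrf_merge([lexical, semantic])
```

A chunk that ranks well in both passes, such as token.go:3 here, ends up ahead of a chunk that tops only one list, which is the main reason to fuse by rank rather than by raw score.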
Visible output stays text-first:
- lexical mode returns path:line:snippet
- hybrid mode prefixes each result with [lexical], [semantic], or [merged]
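If you post-process the text output, a small parser along these lines may help. It is a sketch under assumptions: the exact spacing between the prefix and the path is not specified here, so the pattern accepts optional whitespace, and SearchHit and parse_result_line are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: an optional "[source]" prefix, then path:line:snippet.
RESULT_RE = re.compile(
    r"^(?:\[(?P<source>lexical|semantic|merged)\]\s*)?"
    r"(?P<path>[^:]+):(?P<line>\d+):(?P<snippet>.*)$"
)


class SearchHit(NamedTuple):
    source: Optional[str]  # None for plain lexical-mode output
    path: str
    line: int
    snippet: str


def parse_result_line(text: str) -> Optional[SearchHit]:
    """Parse one result line; return None if it does not match."""
    match = RESULT_RE.match(text)
    if match is None:
        return None
    return SearchHit(
        match["source"], match["path"], int(match["line"]), match["snippet"]
    )
```

Lines that are not results, such as fallback notices, do not match the pattern and come back as None, so they can be handled separately.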
The runtime also stores structured search metadata for replay and the TUI inspector, including fallback notices, source mix, and match counts.
File chunking
Files are split into chunks before embedding. The chunker detects declaration boundaries (functions, classes, types, and their preceding comments) and uses those as split points. Where no boundaries are found, it uses 40-line sliding windows.
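As a rough illustration of that strategy, the sketch below splits on declaration lines and falls back to 40-line windows. The boundary and comment patterns are deliberately simplistic and hypothetical, and the windows are assumed not to overlap, which this page does not specify.

```python
import re

# Hypothetical boundary pattern: top-level declarations in a few languages.
DECL_RE = re.compile(r"^(func|type|class|def)\b")
COMMENT_RE = re.compile(r"^\s*(//|#)")
WINDOW = 40  # fallback window size from the text above


def split_points(lines):
    """Return 0-based line indexes where a new chunk starts."""
    points = []
    for i, line in enumerate(lines):
        if DECL_RE.match(line):
            start = i
            # Pull the declaration's preceding comment lines into its chunk.
            while start > 0 and COMMENT_RE.match(lines[start - 1]):
                start -= 1
            points.append(start)
    return points


def chunk_lines(lines):
    """Split a file into (start, end) half-open line ranges."""
    points = split_points(lines)
    if not points:
        # No declarations found: fixed 40-line windows (no overlap assumed).
        return [
            (s, min(s + WINDOW, len(lines)))
            for s in range(0, len(lines), WINDOW)
        ]
    if points[0] != 0:
        points.insert(0, 0)  # keep any leading preamble as its own chunk
    bounds = points + [len(lines)]
    return list(zip(bounds, bounds[1:]))
```

Splitting on declarations keeps a function and its doc comment in the same chunk, so the embedding for that chunk describes one coherent unit.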
Chunks are cached on disk and revalidated against file modification time every 10 seconds. Only changed files are re-embedded.
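The revalidation rule can be pictured with a small cache like the one below. This is a sketch only: ChunkCache is a hypothetical in-memory stand-in for the on-disk cache, and rebuild represents whatever re-chunks and re-embeds a single file.

```python
import os
import time

REVALIDATE_SECONDS = 10  # interval from the text above


class ChunkCache:
    """Hypothetical in-memory stand-in for the on-disk chunk cache."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}     # path -> (mtime_ns, chunks)
        self._last_check = {}  # path -> clock reading at the last mtime check

    def get(self, path, rebuild):
        """Return cached chunks, calling rebuild(path) only if the file changed."""
        now = self._clock()
        entry = self._entries.get(path)
        checked = self._last_check.get(path)
        if (
            entry is not None
            and checked is not None
            and now - checked < REVALIDATE_SECONDS
        ):
            return entry[1]  # checked recently; trust the cache
        mtime = os.stat(path).st_mtime_ns
        self._last_check[path] = now
        if entry is None or entry[0] != mtime:
            # Re-chunk and re-embed this file only.
            entry = (mtime, rebuild(path))
            self._entries[path] = entry
        return entry[1]
```

Checking modification time at most once per interval keeps repeated searches cheap, while an edited file is picked up on the first lookup after the interval has passed.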
Search modes
| Mode | Behaviour |
|---|---|
| hybrid | Lexical and semantic combined (default when embeddings are configured) |
| lexical | Text matching only |
Regex search always uses lexical mode regardless of configuration.
Path and glob scope
Use "." for workspace-wide search. If you want to narrow the scope, prefer a
more specific path first, then add a simple glob when needed.
Current glob behavior supports basename patterns and relative path patterns such as:
- *.go
- internal/*.go
- pkg/*_test.go
It does not use doublestar semantics. Patterns like **/tests/** are not part
of the current search contract.
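One way to picture matching without doublestar semantics is the sketch below. The exact rules are an assumption, not KodaCode's matcher: a pattern without a slash is tested against the file's basename, and a pattern with a slash is compared segment by segment so that a wildcard never crosses a directory boundary. glob_matches is a hypothetical helper.

```python
from fnmatch import fnmatchcase
import posixpath


def glob_matches(pattern: str, rel_path: str) -> bool:
    """Match a workspace-relative, slash-separated path against a simple glob.

    Assumed semantics: a pattern without "/" is tested against the basename;
    a pattern with "/" must have the same number of segments as the path,
    so "*" stays within one segment and "**" has no special meaning.
    """
    if "/" not in pattern:
        return fnmatchcase(posixpath.basename(rel_path), pattern)
    pattern_parts = pattern.split("/")
    path_parts = rel_path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )
```

Under these assumed rules, internal/*.go matches internal/search.go but not internal/auth/token.go, and a pattern such as **/tests/** does not match files at arbitrary depth.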
Configuration

    search:
      skip_dirs: [coverage, .next]                      # optional extra directory names to ignore
      embeddings_model: openai/text-embedding-3-small   # required for hybrid mode
      embeddings_dimensions: 1536                       # optional; omit to use the model default
      prewarm_embeddings: false                         # embed workspace files on session open
      index_dir: ~/.local/state/kodacode/search         # cache location

embeddings_model uses the format provider_id/model_id. The provider must be configured with a valid API key and base URL. Any OpenAI-compatible embedding endpoint works.
Setting prewarm_embeddings: true requires embeddings_model to be set; the config validator rejects the configuration otherwise. The path-aware ranking adjustments that favor source files over docs, tests, and generated paths are fixed and internal, so there is no user-facing setting to tune them.
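The two rules above (the provider_id/model_id format and the prewarm dependency) could be checked roughly as follows. This is a sketch of the idea, not the actual validator: validate_search_config, its arguments, and the error strings are hypothetical.

```python
def validate_search_config(search: dict, provider_ids: set) -> list:
    """Return a list of error strings; an empty list means the config passes."""
    errors = []
    model = search.get("embeddings_model")
    if model is not None:
        # Split on the first "/" only, so model IDs may contain slashes.
        provider_id, sep, model_id = model.partition("/")
        if not sep or not provider_id or not model_id:
            errors.append("embeddings_model must look like provider_id/model_id")
        elif provider_id not in provider_ids:
            errors.append(f"unknown provider: {provider_id}")
    if search.get("prewarm_embeddings") and not model:
        errors.append("prewarm_embeddings: true requires embeddings_model")
    return errors
```

Returning every error at once, rather than stopping at the first, lets a user fix the whole search block in one pass.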
Complete example
This is a copy-pasteable example with every public search setting:

    version: 1

    providers:
      - id: openai

    search:
      index_dir: /Users/you/.local/state/kodacode/search
      skip_dirs:
        - coverage
        - dist
        - .next
      embeddings_model: openai/text-embedding-3-small
      embeddings_dimensions: 1536
      prewarm_embeddings: true

Replace the provider and model with your own route if you use a local OpenAI-compatible server such as Ollama or LM Studio.
Scope limit
Hybrid search operates on at most 800 chunks. If the search path or glob resolves to more than that, KodaCode falls back to lexical search and includes a notice in the result:

    notice: semantic search scope is too large; narrow path or glob

For tracked workspaces, a cold broad fallback also schedules background index warming so later searches can use the cached chunk index without asking the user to trigger warmup manually.
To stay within the limit immediately, pass a more specific path such as
internal/auth or a simple glob like *.go or internal/*.go instead of the
entire workspace root.
What gets skipped
- Binary files (detected by null byte probe)
- .git, node_modules, and vendor directories by default
- Any extra directory names you add under search.skip_dirs
search.skip_dirs entries are exact directory names, not globs or relative
paths. For example, coverage skips any directory named coverage anywhere in
the searched tree.
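The skip rules can be sketched as a directory walk that prunes by exact name and probes each file for a null byte. This is an illustration under assumptions: searchable_files and looks_binary are hypothetical names, and the 8192-byte probe size is a guess, not a documented value.

```python
import os

DEFAULT_SKIP_DIRS = {".git", "node_modules", "vendor"}


def looks_binary(path, probe_bytes=8192):
    """Null byte probe: treat a file as binary if its first bytes contain NUL."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(probe_bytes)


def searchable_files(root, extra_skip_dirs=()):
    """Yield text files under root, pruning skipped directory names at any depth."""
    skip = DEFAULT_SKIP_DIRS | set(extra_skip_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not looks_binary(path):
                yield path
```

Because the comparison is on the directory name alone, passing coverage as an extra entry prunes both coverage/ at the root and pkg/coverage/ deeper in the tree.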
Graceful fallback
Hybrid search never hard-fails. If embeddings are not configured, the embedding API returns an error, or the scope is too large, the tool returns lexical results with a notice explaining the downgrade. The model always gets something useful back.
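The three downgrade paths can be summarized in a sketch like the one below. It assumes the caller supplies the lexical and semantic passes as functions; search_with_fallback is a hypothetical name, and apart from the scope notice quoted on this page, the notice strings are invented for illustration.

```python
MAX_HYBRID_CHUNKS = 800  # scope limit described on this page
SCOPE_NOTICE = "notice: semantic search scope is too large; narrow path or glob"


def search_with_fallback(query, chunks, lexical_pass, semantic_pass,
                         embeddings_model=None):
    """Return (mode, results, notice); the semantic pass can never fail the call."""
    lexical = lexical_pass(query, chunks)
    if not embeddings_model:
        # Hypothetical notice wording.
        return "lexical", lexical, "notice: embeddings are not configured"
    if len(chunks) > MAX_HYBRID_CHUNKS:
        return "lexical", lexical, SCOPE_NOTICE
    try:
        semantic = semantic_pass(query, chunks)
    except Exception as err:  # e.g. the embedding API returned an error
        # Hypothetical notice wording.
        return "lexical", lexical, f"notice: embedding request failed ({err})"
    # Both passes succeeded; the caller merges the two ranked lists.
    return "hybrid", (lexical, semantic), None
```

Running the lexical pass first means its results are already in hand whenever the semantic pass is skipped or fails.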