Query Engine Deep Dive
The query engine is the beating heart of Claude Code: the pipeline that takes a user message and produces a complete agentic response. It spans two files: query.ts (the shared agentic loop) and QueryEngine.ts (the SDK-facing class wrapper).
Two Files, One Loop
It is important to understand the relationship between these two files before diving into either:
- `query.ts`: An async generator function that takes a user message plus context and yields streaming events. It implements the full agentic loop: API request → stream parse → tool dispatch → loop. This function is used by both the REPL and the SDK.
- `QueryEngine.ts`: A class that wraps `query.ts` for the headless/SDK use case. It owns session state (message history, file cache, usage totals) and exposes a `submitMessage()` async generator. The REPL doesn't use this class; it manages its own state through the Ink component tree and calls `query.ts` directly via `ask()`.
The QueryEngine Class
The QueryEngine class (1,295 lines) is the primary interface for programmatic use of Claude Code. You create one instance per conversation and call submitMessage() for each user turn:
// QueryEngine configuration
export type QueryEngineConfig = {
cwd: string
tools: Tools
commands: Command[]
mcpClients: MCPServerConnection[]
agents: AgentDefinition[]
canUseTool: CanUseToolFn
getAppState: () => AppState
setAppState: (f: (prev: AppState) => AppState) => void
// Optional session controls
initialMessages?: Message[]
readFileCache: FileStateCache
customSystemPrompt?: string
appendSystemPrompt?: string
userSpecifiedModel?: string
fallbackModel?: string
thinkingConfig?: ThinkingConfig
maxTurns?: number
maxBudgetUsd?: number
verbose?: boolean
}
// One instance per conversation
const engine = new QueryEngine(config)
// Each message is an async generator of SDK events
for await (const event of engine.submitMessage("Fix the bug in main.ts")) {
switch (event.type) {
case 'text': console.log(event.text); break
case 'tool_use': console.log('Tool:', event.name); break
case 'result': console.log('Done:', event.stop_reason); break
}
}
Session State
The QueryEngine maintains several pieces of mutable state across turns:
- `mutableMessages`: The full conversation history, grown by each turn.
- `abortController`: Shared abort signal; calling `engine.interrupt()` cancels in-flight requests.
- `permissionDenials`: Accumulated record of denied tool calls, yielded as `SDKPermissionDenial` events.
- `totalUsage`: Aggregated token usage across all turns.
- `readFileState`: LRU cache of file contents read this session (prevents re-reading unchanged files).
- `discoveredSkillNames`: Turn-scoped set of skills surfaced via skill discovery (cleared at the start of each `submitMessage()`).
- `loadedNestedMemoryPaths`: Deduplication set for CLAUDE.md attachment injection (persists across turns to prevent repeated injections).
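To make the interrupt mechanism concrete, here is a minimal sketch of how a shared `AbortController` lets a single `interrupt()` call cancel in-flight work. The class and field names mirror the description above but are hypothetical, not the real source:

```typescript
// Hypothetical sketch: one shared AbortController per engine instance.
// interrupt() aborts the old signal and installs a fresh controller so
// the next turn starts unaborted. Names are illustrative only.
class MiniEngine {
  private abortController = new AbortController()
  private permissionDenials: string[] = []

  get signal(): AbortSignal {
    return this.abortController.signal
  }

  // Cancels any in-flight request sharing this signal
  interrupt(): void {
    this.abortController.abort()
    // A fresh controller is needed before the next turn can run
    this.abortController = new AbortController()
  }

  recordDenial(toolName: string): void {
    this.permissionDenials.push(toolName)
  }
}

const engine = new MiniEngine()
const before = engine.signal
engine.interrupt()
console.log(before.aborted)        // true: the in-flight signal was aborted
console.log(engine.signal.aborted) // false: the next turn gets a fresh signal
```

The key design point is that every API request and tool execution in a turn receives the same signal, so one abort propagates everywhere at once.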
Context Assembly Pipeline
Before an API request is made, query.ts assembles the full context. This is not a simple copy of the message list; it involves several transformation stages:
- Message normalization: `normalizeMessagesForAPI()` strips all UI-only message types (progress messages, system local command messages, tombstones) and ensures the message list starts with a `user`-role message.
- Attachment injection: `getAttachmentMessages()` scans the working directory tree for CLAUDE.md files and injects them as `AttachmentMessage` entries. The `nestedMemoryAttachmentTriggers` set and `loadedNestedMemoryPaths` prevent redundant re-injection on every turn.
- Memory prompt prepend: `loadMemoryPrompt()` reads the global memory directory (`~/.claude/memory/`) and prepends it to the user context.
- User context prepend: `prependUserContext()` adds workspace-specific context (cwd, git status, etc.) to the first user message.
- System prompt fetch: `fetchSystemPromptParts()` assembles the system prompt from multiple sources: base prompt + tool prompts (each tool's `prompt()` method) + any custom/append prompts.
- Tool schema generation: Each enabled tool's `inputSchema` (a Zod schema) is converted to JSON Schema and included in the API request's `tools` array.
// Simplified context assembly (from query.ts)
async function assembleContext(messages, toolUseContext) {
// 1. Normalize messages - strip UI-only types
const apiMessages = normalizeMessagesForAPI(messages)
// 2. Inject CLAUDE.md memory files
const attachments = await getAttachmentMessages(
cwd,
toolUseContext.nestedMemoryAttachmentTriggers,
toolUseContext.loadedNestedMemoryPaths
)
// 3. Load global memory
const memoryPrompt = await loadMemoryPrompt()
// 4. Fetch system prompt parts (lazy - cached after first call)
const systemPromptParts = await fetchSystemPromptParts({
tools, commands, mcpClients, agents, ...
})
return {
system: asSystemPrompt(systemPromptParts),
messages: [...attachments, ...apiMessages],
tools: tools.map(t => toAPISchema(t))
}
}
API Streaming Flow
Claude Code uses the Anthropic SDK's streaming API. Each API call streams StreamEvent objects that are processed and forwarded to the UI in real time:
The streaming parser buffers incoming deltas and fires rendering callbacks as each block completes. This gives the terminal UI its characteristic "streaming text" effect where Claude's response appears character by character.
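The buffering behavior can be sketched with a small illustrative parser. The event shapes below loosely follow the Anthropic streaming event names, but this is an assumption-laden toy, not the real implementation:

```typescript
// Illustrative delta buffering (not the real parser): text deltas are
// accumulated per content block, and a callback fires on every delta,
// which is what produces the character-by-character rendering effect.
type StreamEvent =
  | { type: 'content_block_start'; index: number }
  | { type: 'content_block_delta'; index: number; text: string }
  | { type: 'content_block_stop'; index: number }

function parseStream(
  events: StreamEvent[],
  onDelta: (index: number, text: string) => void
): string[] {
  const buffers = new Map<number, string>()
  for (const ev of events) {
    switch (ev.type) {
      case 'content_block_start':
        buffers.set(ev.index, '') // open a buffer for this block
        break
      case 'content_block_delta':
        buffers.set(ev.index, (buffers.get(ev.index) ?? '') + ev.text)
        onDelta(ev.index, ev.text) // UI renders each delta as it arrives
        break
      case 'content_block_stop':
        break // block complete; buffer now holds the full text
    }
  }
  return [...buffers.values()]
}

const blocks = parseStream(
  [
    { type: 'content_block_start', index: 0 },
    { type: 'content_block_delta', index: 0, text: 'Hel' },
    { type: 'content_block_delta', index: 0, text: 'lo' },
    { type: 'content_block_stop', index: 0 },
  ],
  () => {}
)
console.log(blocks) // → ['Hello']
```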
Error Handling and Retry
The API layer in services/api/ wraps the raw SDK with retry logic via withRetry.js. A FallbackTriggeredError is thrown when the primary model fails and a fallback model is specified (e.g., if the user specified a premium model but it's unavailable). The categorizeRetryableAPIError() function classifies errors into retryable vs. terminal categories.
// Error categories in query.ts
type RetryableCategory =
| 'overloaded' // 529 - retry with backoff
| 'rate_limited' // 429 - wait for reset
| 'server_error' // 500/502/503 - retry
| 'fallback' // Model unavailable - switch model
| 'non_retryable' // 400/401/403 - fail immediately
// Prompt too long is a special case
if (isPromptTooLongMessage(error)) {
// Try compaction first, then fail if already compacted
await attemptCompaction()
}
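The classification-plus-backoff shape described above can be sketched as follows. The status-code mapping comes from the comment annotations in the snippet; the `withRetry` helper itself is an assumed sketch, not the real `withRetry.js`:

```typescript
// Hedged sketch of retry classification and exponential backoff.
// Status-to-category mapping follows the annotated list above.
type RetryableCategory =
  | 'overloaded' | 'rate_limited' | 'server_error' | 'non_retryable'

function categorize(status: number): RetryableCategory {
  if (status === 529) return 'overloaded'
  if (status === 429) return 'rate_limited'
  if ([500, 502, 503].includes(status)) return 'server_error'
  return 'non_retryable' // 400/401/403 etc.: fail immediately
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err: any) {
      lastError = err
      // Terminal errors are rethrown without retrying
      if (categorize(err?.status ?? 0) === 'non_retryable') throw err
      // Exponential backoff between retryable failures
      await new Promise(r => setTimeout(r, 2 ** attempt * 100))
    }
  }
  throw lastError
}
```

The essential distinction is that a 529 or 500-class error is transient (retry with backoff), while a 400-class error would fail the same way on every attempt.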
Tool Dispatch Mechanism
When the stream produces a tool_use content block, the dispatch sequence begins. This is one of the most complex parts of the codebase because tools can run concurrently, update UI state via callbacks, and modify the conversation context:
- Find tool: `findToolByName(tools, name)` looks up by primary name or any alias.
- Validate input: `tool.validateInput(input, context)` runs Zod schema validation plus tool-specific checks.
- Check permissions: `canUseTool(tool, input, context, ...)` runs the full permission pipeline (see the Security page).
- Execute: `tool.call(input, context, canUseTool, parentMessage, onProgress)` runs the actual tool logic, which may run subagents, shell commands, file operations, etc.
- Collect result: The returned `ToolResult<T>` contains `data` plus optional `newMessages` (for tools that inject additional context) and `contextModifier` (for tools that need to update the `ToolUseContext`).
- Format for API: `tool.mapToolResultToToolResultBlockParam(data, toolUseId)` serializes the result into the format the API expects.
Concurrency note: Tools with isConcurrencySafe() === true can run in parallel when the model requests multiple tool calls in one response. Tools with isConcurrencySafe() === false (the default) are serialized. The contextModifier returned by a tool is only honored for non-concurrent tools because ordering matters for context mutation.
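The parallel-versus-serialized split can be sketched like this. The `Tool` interface and `dispatch` function are illustrative simplifications of the real dispatch code, assuming only the `isConcurrencySafe()` contract described above:

```typescript
// Illustrative concurrency rule: safe tools run in parallel via
// Promise.all, unsafe tools run one at a time so that any context
// mutations apply in a deterministic order.
interface Tool {
  name: string
  isConcurrencySafe(): boolean
  call(input: unknown): Promise<string>
}

async function dispatch(
  calls: { tool: Tool; input: unknown }[]
): Promise<string[]> {
  const safe = calls.filter(c => c.tool.isConcurrencySafe())
  const unsafe = calls.filter(c => !c.tool.isConcurrencySafe())

  // Concurrency-safe tools execute in parallel
  const parallel = await Promise.all(safe.map(c => c.tool.call(c.input)))

  // The rest are serialized, preserving request order
  const serial: string[] = []
  for (const c of unsafe) serial.push(await c.tool.call(c.input))

  return [...parallel, ...serial]
}

// Hypothetical demo tools: a read-only tool is safe, a shell tool is not
const mk = (name: string, safe: boolean): Tool => ({
  name,
  isConcurrencySafe: () => safe,
  call: async () => name,
})

dispatch([
  { tool: mk('read', true), input: {} },
  { tool: mk('bash', false), input: {} },
]).then(results => console.log(results)) // → ['read', 'bash']
```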
Message Compaction Strategy
Every AI conversation has a finite context window. Claude Code has a sophisticated multi-strategy compaction system in services/compact/ that activates when the token count approaches the limit:
Compaction Strategies
| Strategy | File | Description |
|---|---|---|
| Standard Compact | compact.ts | Full summarization: sends the entire conversation to Claude and asks for a detailed summary. The summary replaces all history before the compact boundary. |
| Auto Compact | autoCompact.ts | Triggered automatically when token usage crosses a threshold. Runs without user intervention in non-interactive mode. |
| Microcompact | microCompact.ts | Lighter-weight compaction: summarizes only older tool-use exchanges while keeping recent context intact. Faster than full compaction. |
| Reactive Compact | reactiveCompact.ts | Feature-flagged (REACTIVE_COMPACT). Dynamically compacts in response to context-pressure signals. |
| Session Memory | sessionMemoryCompact.ts | Extracts and persists important facts from a session to the memory directory before compacting. |
| Snip | snipCompact.ts | Feature-flagged (HISTORY_SNIP). Removes specific tool-use exchanges by "snipping" them out rather than summarizing the whole history. |
| API Microcompact | apiMicrocompact.ts | Compacts individual tool results that are excessively large before they enter the message history. |
Compact Boundaries
When compaction runs, it inserts a `compactBoundaryMessage` into the history. The `getMessagesAfterCompactBoundary()` utility ensures that API requests only include messages after the most recent boundary; the compacted summary lives as the first message in the trimmed window.
// After compaction, message history looks like:
[
// CompactBoundaryMessage (marker)
{ type: 'system', subtype: 'compact_boundary', summary: 'Prior work...' },
// Only post-compact messages sent to API
{ role: 'user', content: 'Now fix the remaining issue...' },
{ role: 'assistant', content: [{ type: 'text', text: '...' }] },
// ...
]
// getMessagesAfterCompactBoundary() returns only post-boundary messages
// The summary is prepended to the system prompt automatically
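A plausible sketch of the boundary filter (the message shapes follow the snippet above; the function body is an assumption, not the actual implementation): scan backwards for the most recent boundary marker and keep it plus everything after it.

```typescript
// Illustrative boundary filter: returns the most recent compact boundary
// and all messages after it, or the full history if no compaction ran.
type Msg =
  | { type: 'system'; subtype: 'compact_boundary'; summary: string }
  | { role: 'user' | 'assistant'; content: string }

function getMessagesAfterCompactBoundary(messages: Msg[]): Msg[] {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i]
    if ('type' in m && m.subtype === 'compact_boundary') {
      return messages.slice(i) // boundary marker + post-compact messages
    }
  }
  return messages // no compaction has happened yet
}

const history: Msg[] = [
  { role: 'user', content: 'old work' },
  { type: 'system', subtype: 'compact_boundary', summary: 'Prior work...' },
  { role: 'user', content: 'Now fix the remaining issue...' },
]
console.log(getMessagesAfterCompactBoundary(history).length) // → 2
```

Scanning from the end matters: a long session can contain several boundaries, and only the window after the latest one should reach the API.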
Tool Use Summary Generation
For long tool-use sequences (e.g., a complex file edit that involves 10 rounds of reading, editing, and verifying), the system can replace the entire exchange with a ToolUseSummaryMessage: a compact text representation of what was done. This is generated by generateToolUseSummary() in services/toolUseSummary/.
The summary is used both for display (showing a condensed history in the REPL) and for API requests (replacing a large tool-exchange block with a small text description, saving tokens).
Slash Command Routing
Before a message reaches the API, processUserInput() checks if it is a slash command. The routing logic handles several special cases:
- Local commands (e.g., `/compact`, `/clear`, `/config`) are handled entirely client-side and never sent to the API.
- Skill commands (e.g., `/commit`, `/review`, custom user scripts): the skill file is loaded, its template is expanded, and the result is submitted as a user message.
- Tool skills (e.g., `/bash`): creates a synthetic `tool_use` message to immediately invoke a tool.
- Priority queue: `getCommandsByMaxPriority()` allows system-level commands to preempt user commands when the queue has multiple pending messages.
The notifyCommandLifecycle() call in query.ts fires analytics events for command start/end, enabling usage tracking for each slash command type. This is how Anthropic measures which features are actually used.
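The routing decision can be sketched as a small classifier. The local command names come from the list above; the routing function and `Route` type are illustrative assumptions:

```typescript
// Illustrative slash-command router: local commands are handled
// client-side, other slash commands are treated as skills, and anything
// else is a plain prompt destined for the API.
type Route =
  | { kind: 'local'; name: string }  // handled entirely client-side
  | { kind: 'skill'; name: string }  // template expanded, sent as user message
  | { kind: 'prompt'; text: string } // ordinary text, straight to the API

const LOCAL_COMMANDS = new Set(['compact', 'clear', 'config'])

function routeInput(input: string): Route {
  if (!input.startsWith('/')) return { kind: 'prompt', text: input }
  const name = input.slice(1).split(/\s+/)[0]
  if (LOCAL_COMMANDS.has(name)) return { kind: 'local', name }
  return { kind: 'skill', name }
}

console.log(routeInput('/clear').kind)  // → 'local'
console.log(routeInput('/commit').kind) // → 'skill'
console.log(routeInput('hi').kind)      // → 'prompt'
```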
Inside prompts.ts: How the System Prompt is Built
The system prompt isn't a static string — it's assembled fresh on every session from roughly a dozen composable sections, each gated by runtime conditions, feature flags, and user type. The source of truth is src/prompts/prompts.ts, a ~700-line file that is one of the most revealing files in the entire codebase.
The Static/Dynamic Cache Boundary
Near the top of prompts.ts sits one of the most architecturally important constants in Claude Code:
/**
* Boundary marker separating static (cross-org cacheable) content
* from dynamic content.
* Everything BEFORE this marker can use scope: 'global'.
* Everything AFTER contains user/session-specific content and
* should not be cached globally.
*
* WARNING: Do not remove or reorder this marker without updating
* cache logic in api.ts and services/api/claude.ts
*/
export const SYSTEM_PROMPT_DYNAMIC_BOUNDARY =
'__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'
The system prompt array is split at this marker. Everything before it (the core instructions, tool guidance, behavior rules) is structured as globally cacheable with Anthropic's prompt caching infrastructure — it will be the same hash across all external user sessions, so the API can return a cache hit and skip most of the prompt processing cost. Everything after the marker includes session-specific data like the current working directory, git status, OS version, date, and loaded skills — all of which change per session and cannot be shared in cache.
This is why arbitrary modifications to CLAUDE.md at the system level (before the boundary) are discouraged — changing even one word before the boundary breaks the global cache key for every user simultaneously.
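The split itself is mechanically simple; a minimal sketch, using the marker name from the snippet above (the splitting function and its call sites are assumptions):

```typescript
// Minimal sketch: split the system prompt array at the boundary marker
// so the static prefix can be marked cacheable and the suffix cannot.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'

function splitAtBoundary(
  parts: string[]
): { cacheable: string[]; dynamic: string[] } {
  const i = parts.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)
  if (i === -1) return { cacheable: parts, dynamic: [] }
  return {
    cacheable: parts.slice(0, i),  // identical across sessions → cache hit
    dynamic: parts.slice(i + 1),   // cwd, git status, date → per session
  }
}

const { cacheable, dynamic } = splitAtBoundary([
  'Core instructions and tool guidance',
  SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
  'CWD: /tmp/project',
])
console.log(cacheable.length, dynamic.length) // → 1 1
```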
The Two-Tier User System
The single most consequential branch in the entire prompts file is this check, which appears dozens of times:
process.env.USER_TYPE === 'ant'
When this environment variable is set to 'ant' (short for Anthropic), Claude receives a substantially different system prompt than external users. The ant-tier receives additional instructions that Anthropic is still validating before rolling out broadly:
| Behavior | External Users | Anthropic Employees (ant) |
|---|---|---|
| Comment writing | Standard guidance | Strict "no comments unless WHY is non-obvious" rule; never explain WHAT |
| Output format | "Be extra concise. Lead with answer." | Flowing prose, inverted pyramid, full explanation, no fragments |
| Verification | Basic checking | Explicit: "Before reporting complete, verify it works; run the test." |
| Proactiveness | Standard | Extra: "If you notice a misconception, say so. You're a collaborator." |
| Accuracy reporting | Standard | Anti-false-claims mitigation: "Never claim 'all tests pass' when output shows failures." |
| Slack integration | None | Told to post /share links to #claude-code-feedback (C07VBSHV7EV) |
| REPL tool | Not available | Exposed via process.env.USER_TYPE === 'ant' guard |
The comments in the source code explicitly mark these as A/B experiments in progress, with annotations like // @[MODEL LAUNCH]: capy v8 assertiveness counterweight (PR #24302) — un-gate once validated on external via A/B. This is a live development pipeline: instructions get validated internally first, then gradually rolled out.
Internal Model Codenames
The source file contains developer-facing model codename annotations that reveal Anthropic's internal naming conventions and what's coming next:
// @[MODEL LAUNCH]: Update the latest frontier model.
const FRONTIER_MODEL_NAME = 'Claude Opus 4.6'
// Comments referencing internal codenames:
// @[MODEL LAUNCH]: Remove this section when we launch numbat.
// @[MODEL LAUNCH]: Update comment writing for Capybara — remove or soften
// once the model stops over-commenting by default
// @[MODEL LAUNCH]: capy v8 thoroughness counterweight (PR #24302)
// @[MODEL LAUNCH]: False-claims mitigation for Capybara v8 (29-30% FC rate
// vs v4's 16.7%)
Capybara is the internal codename for the currently deployed model family (Claude 4.x / Sonnet 4). Numbat is the next upcoming model, with a dedicated comment marking code that will need updating at its launch. The annotations also reveal active model quality issues being addressed — Capybara v8 had a 29-30% false-claims rate vs. v4's 16.7%, which is why ant users get the strict accuracy-reporting instructions that external users don't yet have.
The Escape Hatch: CLAUDE_CODE_SIMPLE
At the very top of getSystemPrompt(), before any of the elaborate section assembly, there's a one-line escape hatch:
export async function getSystemPrompt(tools, model, ...): Promise<string[]> {
if (isEnvTruthy(process.env.CLAUDE_CODE_SIMPLE)) {
return [
`You are Claude Code, Anthropic's official CLI for Claude.\n\nCWD: ${getCwd()}\nDate: ${getSessionStartDate()}`,
]
}
// ... hundreds of lines of section assembly ...
}
Setting CLAUDE_CODE_SIMPLE=1 collapses the entire elaborate system prompt into a single short string: identity, CWD, and date. This is used for debugging and for scenarios where you explicitly want a blank-slate Claude with no behavioral guardrails loaded. It's also the fastest possible startup because none of the async section assembly runs.
The Verification Agent A/B Test
One of the most fascinating sections in the file is the verification agent instruction, which is double-gated behind both a compile-time feature flag and a live GrowthBook experiment:
hasAgentTool &&
feature('VERIFICATION_AGENT') &&
// 3P default: false — verification agent is ant-only A/B
getFeatureValue_CACHED_MAY_BE_STALE('tengu_hive_evidence', false)
? `The contract: when non-trivial implementation happens on your turn,
independent adversarial verification must happen before you report
completion — regardless of who did the implementing (you directly,
a fork you spawned, or a subagent).
...
Spawn the ${AGENT_TOOL_NAME} with subagent_type="${VERIFICATION_AGENT_TYPE}".
Your own checks do NOT substitute — only the verifier assigns a verdict.
On FAIL: fix, resume the verifier, repeat until PASS.
On PASS: spot-check it — re-run 2-3 commands from its report.`
: null
When both the VERIFICATION_AGENT feature flag is compiled in AND the tengu_hive_evidence GrowthBook experiment assigns the user to the treatment group, Claude Code mandates that after any non-trivial implementation (3+ file edits, backend/API changes, infrastructure changes), a separate adversarial verification agent must run and pass before Claude can report completion. This is Anthropic's most aggressive attempt to reduce false "I'm done" claims in agentic workflows.
The CYBER_RISK_INSTRUCTION Injection Point
Inside getSimpleIntroSection(), a single import from cyberRiskInstruction.ts (see the Security page for a deep dive) is injected directly into the opening paragraph that every external user receives:
function getSimpleIntroSection(outputStyleConfig): string {
return `
You are an interactive agent that helps users with software engineering tasks.
Use the instructions below and the tools available to you to assist the user.
${CYBER_RISK_INSTRUCTION}
IMPORTANT: You must NEVER generate or guess URLs for the user unless you are
confident that the URLs are for helping the user with programming.`
}
This means the cybersecurity policy lands in the very first paragraph of the system prompt, before all tool instructions and before the cache boundary. It cannot be overridden by CLAUDE.md, project settings, or custom system prompts because it's hardcoded into the section that runs before any user customization is applied.
Async Prefetching
Claude Code aggressively prefetches data at startup and turn boundaries to minimize latency. Several prefetch operations run in parallel:
| Prefetch | Purpose | When |
|---|---|---|
| startKeychainPrefetch() | Read OAuth token + API key from macOS keychain | Before any imports |
| startMdmRawRead() | MDM policy reads via plutil/reg query | Before any imports |
| prefetchOfficialMcpUrls() | Fetch official MCP registry | After auth |
| prefetchPassesEligibility() | Check referral pass eligibility | After auth |
| prefetchAwsCredentials() | AWS/Bedrock credential prefetch | If Bedrock configured |
| prefetchFastModeStatus() | Resolve fast-mode feature gate | At init |
| startRelevantMemoryPrefetch() | Pre-scan for relevant CLAUDE.md files | Per turn |
| skillPrefetch | Pre-index skills directory for search | When EXPERIMENTAL_SKILL_SEARCH enabled |
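All of these share one pattern: start the async work immediately, and await it only at the point of use. A minimal sketch of that pattern (the helper and its names are illustrative, not from the source):

```typescript
// Illustrative start-early / await-late prefetch pattern: the promise is
// created immediately, so the I/O overlaps with the rest of startup, and
// the caller awaits an already-in-flight (or resolved) promise later.
function startPrefetch<T>(load: () => Promise<T>): () => Promise<T> {
  const pending = load() // kicks off now, in parallel with other startup work
  return () => pending   // later callers share the same in-flight promise
}

// Hypothetical example: prefetch a feature-gate config at startup
const getConfig = startPrefetch(async () => ({ fastMode: true }))

// ...other startup work runs here while the fetch is in flight...

getConfig().then(cfg => console.log(cfg.fastMode)) // → true
```

Because the thunk closes over a single promise, repeated calls never re-run the load; every consumer sees the same result.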