Query Engine Deep Dive
The query engine is the beating heart of Claude Code: the pipeline that takes a user message and produces a complete agentic response. It spans two files: query.ts (the shared agentic loop) and QueryEngine.ts (the SDK-facing class wrapper).
Two Files, One Loop
It is important to understand the relationship between these two files before diving into either:
- `query.ts`: An async generator function that takes a user message plus context and yields streaming events. It implements the full agentic loop: API request → stream parse → tool dispatch → loop. This function is used by both the REPL and the SDK.
- `QueryEngine.ts`: A class that wraps `query.ts` for the headless/SDK use case. It owns session state (message history, file cache, usage totals) and exposes a `submitMessage()` async generator. The REPL doesn't use this class; it manages its own state through the Ink component tree and calls `query.ts` directly via `ask()`.
The QueryEngine Class
The QueryEngine class (1,295 lines) is the primary interface for programmatic use of Claude Code. You create one instance per conversation and call submitMessage() for each user turn:
// QueryEngine configuration
export type QueryEngineConfig = {
cwd: string
tools: Tools
commands: Command[]
mcpClients: MCPServerConnection[]
agents: AgentDefinition[]
canUseTool: CanUseToolFn
getAppState: () => AppState
setAppState: (f: (prev: AppState) => AppState) => void
// Optional session controls
initialMessages?: Message[]
readFileCache: FileStateCache
customSystemPrompt?: string
appendSystemPrompt?: string
userSpecifiedModel?: string
fallbackModel?: string
thinkingConfig?: ThinkingConfig
maxTurns?: number
maxBudgetUsd?: number
verbose?: boolean
}
// One instance per conversation
const engine = new QueryEngine(config)
// Each message is an async generator of SDK events
for await (const event of engine.submitMessage("Fix the bug in main.ts")) {
switch (event.type) {
case 'text': console.log(event.text); break
case 'tool_use': console.log('Tool:', event.name); break
case 'result': console.log('Done:', event.stop_reason); break
}
}
Session State
The QueryEngine maintains several pieces of mutable state across turns:
- `mutableMessages`: The full conversation history, grown by each turn.
- `abortController`: Shared abort signal; calling `engine.interrupt()` cancels in-flight requests.
- `permissionDenials`: Accumulated record of denied tool calls, yielded as `SDKPermissionDenial` events.
- `totalUsage`: Aggregated token usage across all turns.
- `readFileState`: LRU cache of file contents read this session (prevents re-reading unchanged files).
- `discoveredSkillNames`: Turn-scoped set of skills surfaced via skill discovery (cleared at the start of each `submitMessage()`).
- `loadedNestedMemoryPaths`: Deduplication set for CLAUDE.md attachment injection (persists across turns to prevent repeated injections).
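To make the interrupt mechanism concrete, here is a minimal sketch of how a shared `AbortController` lets a single `interrupt()` call cancel in-flight work. The class and field names mirror the description above but are hypothetical, not the real source:

```typescript
// Hypothetical sketch: one shared AbortController per engine instance.
// interrupt() aborts the old signal and installs a fresh controller so
// the next turn starts unaborted. Names are illustrative only.
class MiniEngine {
  private abortController = new AbortController()
  private permissionDenials: string[] = []

  get signal(): AbortSignal {
    return this.abortController.signal
  }

  // Cancels any in-flight request sharing this signal
  interrupt(): void {
    this.abortController.abort()
    // A fresh controller is needed before the next turn can run
    this.abortController = new AbortController()
  }

  recordDenial(toolName: string): void {
    this.permissionDenials.push(toolName)
  }
}

const engine = new MiniEngine()
const before = engine.signal
engine.interrupt()
console.log(before.aborted)        // true: the in-flight signal was aborted
console.log(engine.signal.aborted) // false: the next turn gets a fresh signal
```

The key design point is that every API request and tool execution in a turn receives the same signal, so one abort propagates everywhere at once.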
Context Assembly Pipeline
Before an API request is made, query.ts assembles the full context. This is not a simple copy of the message list; it involves several transformation stages:
- Message normalization: `normalizeMessagesForAPI()` strips all UI-only message types (progress messages, system local command messages, tombstones) and ensures the message list starts with a `user`-role message.
- Attachment injection: `getAttachmentMessages()` scans the working directory tree for CLAUDE.md files and injects them as `AttachmentMessage` entries. The `nestedMemoryAttachmentTriggers` set and `loadedNestedMemoryPaths` prevent redundant re-injection on every turn.
- Memory prompt prepend: `loadMemoryPrompt()` reads the global memory directory (`~/.claude/memory/`) and prepends it to the user context.
- User context prepend: `prependUserContext()` adds workspace-specific context (cwd, git status, etc.) to the first user message.
- System prompt fetch: `fetchSystemPromptParts()` assembles the system prompt from multiple sources: base prompt + tool prompts (each tool's `prompt()` method) + any custom/append prompts.
- Tool schema generation: Each enabled tool's `inputSchema` (a Zod schema) is converted to JSON Schema and included in the API request's `tools` array.
// Simplified context assembly (from query.ts)
async function assembleContext(messages, toolUseContext) {
// 1. Normalize messages - strip UI-only types
const apiMessages = normalizeMessagesForAPI(messages)
// 2. Inject CLAUDE.md memory files
const attachments = await getAttachmentMessages(
cwd,
toolUseContext.nestedMemoryAttachmentTriggers,
toolUseContext.loadedNestedMemoryPaths
)
// 3. Load global memory
const memoryPrompt = await loadMemoryPrompt()
// 4. Fetch system prompt parts (lazy - cached after first call)
const systemPromptParts = await fetchSystemPromptParts({
tools, commands, mcpClients, agents, ...
})
return {
system: asSystemPrompt(systemPromptParts),
messages: [...attachments, ...apiMessages],
tools: tools.map(t => toAPISchema(t))
}
}
API Streaming Flow
Claude Code uses the Anthropic SDK's streaming API. Each API call streams StreamEvent objects that are processed and forwarded to the UI in real time:
The streaming parser buffers incoming deltas and fires rendering callbacks as each block completes. This gives the terminal UI its characteristic "streaming text" effect where Claude's response appears character by character.
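The buffering behavior can be sketched with a small illustrative parser. The event shapes below loosely follow the Anthropic streaming event names, but this is an assumption-laden toy, not the real implementation:

```typescript
// Illustrative delta buffering (not the real parser): text deltas are
// accumulated per content block, and a callback fires on every delta,
// which is what produces the character-by-character rendering effect.
type StreamEvent =
  | { type: 'content_block_start'; index: number }
  | { type: 'content_block_delta'; index: number; text: string }
  | { type: 'content_block_stop'; index: number }

function parseStream(
  events: StreamEvent[],
  onDelta: (index: number, text: string) => void
): string[] {
  const buffers = new Map<number, string>()
  for (const ev of events) {
    switch (ev.type) {
      case 'content_block_start':
        buffers.set(ev.index, '') // open a buffer for this block
        break
      case 'content_block_delta':
        buffers.set(ev.index, (buffers.get(ev.index) ?? '') + ev.text)
        onDelta(ev.index, ev.text) // UI renders each delta as it arrives
        break
      case 'content_block_stop':
        break // block complete; buffer now holds the full text
    }
  }
  return [...buffers.values()]
}

const blocks = parseStream(
  [
    { type: 'content_block_start', index: 0 },
    { type: 'content_block_delta', index: 0, text: 'Hel' },
    { type: 'content_block_delta', index: 0, text: 'lo' },
    { type: 'content_block_stop', index: 0 },
  ],
  () => {}
)
console.log(blocks) // → ['Hello']
```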
Error Handling and Retry
The API layer in services/api/ wraps the raw SDK with retry logic via withRetry.js. A FallbackTriggeredError is thrown when the primary model fails and a fallback model is specified (e.g., if the user specified a premium model but it's unavailable). The categorizeRetryableAPIError() function classifies errors into retryable vs. terminal categories.
// Error categories in query.ts
type RetryableCategory =
| 'overloaded' // 529 - retry with backoff
| 'rate_limited' // 429 - wait for reset
| 'server_error' // 500/502/503 - retry
| 'fallback' // Model unavailable - switch model
| 'non_retryable' // 400/401/403 - fail immediately
// Prompt too long is a special case
if (isPromptTooLongMessage(error)) {
// Try compaction first, then fail if already compacted
await attemptCompaction()
}
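The classification-plus-backoff shape described above can be sketched as follows. The status-code mapping comes from the comment annotations in the snippet; the `withRetry` helper itself is an assumed sketch, not the real `withRetry.js`:

```typescript
// Hedged sketch of retry classification and exponential backoff.
// Status-to-category mapping follows the annotated list above.
type RetryableCategory =
  | 'overloaded' | 'rate_limited' | 'server_error' | 'non_retryable'

function categorize(status: number): RetryableCategory {
  if (status === 529) return 'overloaded'
  if (status === 429) return 'rate_limited'
  if ([500, 502, 503].includes(status)) return 'server_error'
  return 'non_retryable' // 400/401/403 etc.: fail immediately
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err: any) {
      lastError = err
      // Terminal errors are rethrown without retrying
      if (categorize(err?.status ?? 0) === 'non_retryable') throw err
      // Exponential backoff between retryable failures
      await new Promise(r => setTimeout(r, 2 ** attempt * 100))
    }
  }
  throw lastError
}
```

The essential distinction is that a 529 or 500-class error is transient (retry with backoff), while a 400-class error would fail the same way on every attempt.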
Tool Dispatch Mechanism
When the stream produces a tool_use content block, the dispatch sequence begins. This is one of the most complex parts of the codebase because tools can run concurrently, update UI state via callbacks, and modify the conversation context:
- Find tool: `findToolByName(tools, name)` looks up by primary name or any alias.
- Validate input: `tool.validateInput(input, context)` runs Zod schema validation plus tool-specific checks.
- Check permissions: `canUseTool(tool, input, context, ...)` runs the full permission pipeline (see the Security page).
- Execute: `tool.call(input, context, canUseTool, parentMessage, onProgress)` runs the actual tool logic, which may run subagents, shell commands, file operations, etc.
- Collect result: The returned `ToolResult<T>` contains `data` plus optional `newMessages` (for tools that inject additional context) and `contextModifier` (for tools that need to update the `ToolUseContext`).
- Format for API: `tool.mapToolResultToToolResultBlockParam(data, toolUseId)` serializes the result into the format the API expects.
Concurrency note: Tools with isConcurrencySafe() === true can run in parallel when the model requests multiple tool calls in one response. Tools with isConcurrencySafe() === false (the default) are serialized. The contextModifier returned by a tool is only honored for non-concurrent tools because ordering matters for context mutation.
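The parallel-versus-serialized split can be sketched like this. The `Tool` interface and `dispatch` function are illustrative simplifications of the real dispatch code, assuming only the `isConcurrencySafe()` contract described above:

```typescript
// Illustrative concurrency rule: safe tools run in parallel via
// Promise.all, unsafe tools run one at a time so that any context
// mutations apply in a deterministic order.
interface Tool {
  name: string
  isConcurrencySafe(): boolean
  call(input: unknown): Promise<string>
}

async function dispatch(
  calls: { tool: Tool; input: unknown }[]
): Promise<string[]> {
  const safe = calls.filter(c => c.tool.isConcurrencySafe())
  const unsafe = calls.filter(c => !c.tool.isConcurrencySafe())

  // Concurrency-safe tools execute in parallel
  const parallel = await Promise.all(safe.map(c => c.tool.call(c.input)))

  // The rest are serialized, preserving request order
  const serial: string[] = []
  for (const c of unsafe) serial.push(await c.tool.call(c.input))

  return [...parallel, ...serial]
}

// Hypothetical demo tools: a read-only tool is safe, a shell tool is not
const mk = (name: string, safe: boolean): Tool => ({
  name,
  isConcurrencySafe: () => safe,
  call: async () => name,
})

dispatch([
  { tool: mk('read', true), input: {} },
  { tool: mk('bash', false), input: {} },
]).then(results => console.log(results)) // → ['read', 'bash']
```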
Message Compaction Strategy
Every AI conversation has a finite context window. Claude Code has a sophisticated multi-strategy compaction system in services/compact/ that activates when the token count approaches the limit:
Compaction Strategies
| Strategy | File | Description |
|---|---|---|
| Standard Compact | compact.ts | Full summarization: sends the entire conversation to Claude and asks for a detailed summary. The summary replaces all history before the compact boundary. |
| Auto Compact | autoCompact.ts | Triggered automatically when token usage crosses a threshold. Runs without user intervention in non-interactive mode. |
| Microcompact | microCompact.ts | Lighter-weight compaction: summarizes only older tool-use exchanges while keeping recent context intact. Faster than full compaction. |
| Reactive Compact | reactiveCompact.ts | Feature-flagged (REACTIVE_COMPACT). Dynamically compacts in response to context-pressure signals. |
| Session Memory | sessionMemoryCompact.ts | Extracts and persists important facts from a session to the memory directory before compacting. |
| Snip | snipCompact.ts | Feature-flagged (HISTORY_SNIP). Removes specific tool-use exchanges by "snipping" them out rather than summarizing the whole history. |
| API Microcompact | apiMicrocompact.ts | Compacts individual tool results that are excessively large before they enter the message history. |
Compact Boundaries
When compaction runs, it inserts a `compactBoundaryMessage` into the history. The `getMessagesAfterCompactBoundary()` utility ensures that API requests only include messages after the most recent boundary; the compacted summary lives as the first message in the trimmed window.
// After compaction, message history looks like:
[
// CompactBoundaryMessage (marker)
{ type: 'system', subtype: 'compact_boundary', summary: 'Prior work...' },
// Only post-compact messages sent to API
{ role: 'user', content: 'Now fix the remaining issue...' },
{ role: 'assistant', content: [{ type: 'text', text: '...' }] },
// ...
]
// getMessagesAfterCompactBoundary() returns only post-boundary messages
// The summary is prepended to the system prompt automatically
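A plausible sketch of the boundary filter (the message shapes follow the snippet above; the function body is an assumption, not the actual implementation): scan backwards for the most recent boundary marker and keep it plus everything after it.

```typescript
// Illustrative boundary filter: returns the most recent compact boundary
// and all messages after it, or the full history if no compaction ran.
type Msg =
  | { type: 'system'; subtype: 'compact_boundary'; summary: string }
  | { role: 'user' | 'assistant'; content: string }

function getMessagesAfterCompactBoundary(messages: Msg[]): Msg[] {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i]
    if ('type' in m && m.subtype === 'compact_boundary') {
      return messages.slice(i) // boundary marker + post-compact messages
    }
  }
  return messages // no compaction has happened yet
}

const history: Msg[] = [
  { role: 'user', content: 'old work' },
  { type: 'system', subtype: 'compact_boundary', summary: 'Prior work...' },
  { role: 'user', content: 'Now fix the remaining issue...' },
]
console.log(getMessagesAfterCompactBoundary(history).length) // → 2
```

Scanning from the end matters: a long session can contain several boundaries, and only the window after the latest one should reach the API.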
Tool Use Summary Generation
For long tool-use sequences (e.g., a complex file edit that involves 10 rounds of reading, editing, and verifying), the system can replace the entire exchange with a ToolUseSummaryMessage: a compact text representation of what was done. This is generated by generateToolUseSummary() in services/toolUseSummary/.
The summary is used both for display (showing a condensed history in the REPL) and for API requests (replacing a large tool-exchange block with a small text description, saving tokens).
Slash Command Routing
Before a message reaches the API, processUserInput() checks if it is a slash command. The routing logic handles several special cases:
- Local commands (e.g., `/compact`, `/clear`, `/config`) are handled entirely client-side and never sent to the API.
- Skill commands (e.g., `/commit`, `/review`, custom user scripts): the skill file is loaded, its template is expanded, and the result is submitted as a user message.
- Tool skills (e.g., `/bash`): creates a synthetic `tool_use` message to immediately invoke a tool.
- Priority queue: `getCommandsByMaxPriority()` allows system-level commands to preempt user commands when the queue has multiple pending messages.
The notifyCommandLifecycle() call in query.ts fires analytics events for command start/end, enabling usage tracking for each slash command type. This is how Anthropic measures which features are actually used.
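The routing decision can be sketched as a small classifier. The local command names come from the list above; the routing function and `Route` type are illustrative assumptions:

```typescript
// Illustrative slash-command router: local commands are handled
// client-side, other slash commands are treated as skills, and anything
// else is a plain prompt destined for the API.
type Route =
  | { kind: 'local'; name: string }  // handled entirely client-side
  | { kind: 'skill'; name: string }  // template expanded, sent as user message
  | { kind: 'prompt'; text: string } // ordinary text, straight to the API

const LOCAL_COMMANDS = new Set(['compact', 'clear', 'config'])

function routeInput(input: string): Route {
  if (!input.startsWith('/')) return { kind: 'prompt', text: input }
  const name = input.slice(1).split(/\s+/)[0]
  if (LOCAL_COMMANDS.has(name)) return { kind: 'local', name }
  return { kind: 'skill', name }
}

console.log(routeInput('/clear').kind)  // → 'local'
console.log(routeInput('/commit').kind) // → 'skill'
console.log(routeInput('hi').kind)      // → 'prompt'
```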
Inside prompts.ts: How the System Prompt is Built
The system prompt isn't a static string — it's assembled fresh on every session from roughly a dozen composable sections, each gated by runtime conditions, feature flags, and user type. The source of truth is src/prompts/prompts.ts, a ~700-line file that is one of the most revealing files in the entire codebase.
The Static/Dynamic Cache Boundary
Near the top of prompts.ts sits one of the most architecturally important constants in Claude Code:
/**
* Boundary marker separating static (cross-org cacheable) content
* from dynamic content.
* Everything BEFORE this marker can use scope: 'global'.
* Everything AFTER contains user/session-specific content and
* should not be cached globally.
*
* WARNING: Do not remove or reorder this marker without updating
* cache logic in api.ts and services/api/claude.ts
*/
export const SYSTEM_PROMPT_DYNAMIC_BOUNDARY =
'__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'
The system prompt array is split at this marker. Everything before it (the core instructions, tool guidance, behavior rules) is structured as globally cacheable with Anthropic's prompt caching infrastructure — it will be the same hash across all external user sessions, so the API can return a cache hit and skip most of the prompt processing cost. Everything after the marker includes session-specific data like the current working directory, git status, OS version, date, and loaded skills — all of which change per session and cannot be shared in cache.
This is why arbitrary modifications to CLAUDE.md at the system level (before the boundary) are discouraged — changing even one word before the boundary breaks the global cache key for every user simultaneously.
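The split itself is mechanically simple; a minimal sketch, using the marker name from the snippet above (the splitting function and its call sites are assumptions):

```typescript
// Minimal sketch: split the system prompt array at the boundary marker
// so the static prefix can be marked cacheable and the suffix cannot.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'

function splitAtBoundary(
  parts: string[]
): { cacheable: string[]; dynamic: string[] } {
  const i = parts.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)
  if (i === -1) return { cacheable: parts, dynamic: [] }
  return {
    cacheable: parts.slice(0, i),  // identical across sessions → cache hit
    dynamic: parts.slice(i + 1),   // cwd, git status, date → per session
  }
}

const { cacheable, dynamic } = splitAtBoundary([
  'Core instructions and tool guidance',
  SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
  'CWD: /tmp/project',
])
console.log(cacheable.length, dynamic.length) // → 1 1
```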
The Two-Tier User System
The single most consequential branch in the entire prompts file is this check, which appears dozens of times:
process.env.USER_TYPE === 'ant'
When this environment variable is set to 'ant' (short for Anthropic), Claude receives a substantially different system prompt than external users. The ant-tier receives additional instructions that Anthropic is still validating before rolling out broadly:
| Behavior | External Users | Anthropic Employees (ant) |
|---|---|---|
| Comment writing | Standard guidance | Strict "no comments unless WHY is non-obvious" rule; never explain WHAT |
| Output format | "Be extra concise. Lead with answer." | Flowing prose, inverted pyramid, full explanation, no fragments |
| Verification | Basic checking | Explicit: "Before reporting complete, verify it works; run the test." |
| Proactiveness | Standard | Extra: "If you notice a misconception, say so. You're a collaborator." |
| Accuracy reporting | Standard | Anti-false-claims mitigation: "Never claim 'all tests pass' when output shows failures." |
| Slack integration | None | Told to post /share links to #claude-code-feedback (C07VBSHV7EV) |
| REPL tool | Not available | Exposed via process.env.USER_TYPE === 'ant' guard |
The comments in the source code explicitly mark these as A/B experiments in progress, with annotations like // @[MODEL LAUNCH]: capy v8 assertiveness counterweight (PR #24302) — un-gate once validated on external via A/B. This is a live development pipeline: instructions get validated internally first, then gradually rolled out.
Internal Model Codenames
The source file contains developer-facing model codename annotations that reveal Anthropic's internal naming conventions and what's coming next:
// @[MODEL LAUNCH]: Update the latest frontier model.
const FRONTIER_MODEL_NAME = 'Claude Opus 4.6'
// Comments referencing internal codenames:
// @[MODEL LAUNCH]: Remove this section when we launch numbat.
// @[MODEL LAUNCH]: Update comment writing for Capybara — remove or soften
// once the model stops over-commenting by default
// @[MODEL LAUNCH]: capy v8 thoroughness counterweight (PR #24302)
// @[MODEL LAUNCH]: False-claims mitigation for Capybara v8 (29-30% FC rate
// vs v4's 16.7%)
Capybara is the internal codename for the currently deployed model family (Claude 4.x / Sonnet 4). Numbat is the next upcoming model, with a dedicated comment marking code that will need updating at its launch. The annotations also reveal active model quality issues being addressed — Capybara v8 had a 29-30% false-claims rate vs. v4's 16.7%, which is why ant users get the strict accuracy-reporting instructions that external users don't yet have.
The Escape Hatch: CLAUDE_CODE_SIMPLE
At the very top of getSystemPrompt(), before any of the elaborate section assembly, there's a one-line escape hatch:
export async function getSystemPrompt(tools, model, ...): Promise<string[]> {
if (isEnvTruthy(process.env.CLAUDE_CODE_SIMPLE)) {
return [
`You are Claude Code, Anthropic's official CLI for Claude.\n\nCWD: ${getCwd()}\nDate: ${getSessionStartDate()}`,
]
}
// ... hundreds of lines of section assembly ...
}
Setting CLAUDE_CODE_SIMPLE=1 collapses the entire elaborate system prompt into a single short string: identity, CWD, and date. This is used for debugging and for scenarios where you explicitly want a blank-slate Claude with no behavioral guardrails loaded. It's also the fastest possible startup because none of the async section assembly runs.
The Verification Agent A/B Test
One of the most fascinating sections in the file is the verification agent instruction, which is double-gated behind both a compile-time feature flag and a live GrowthBook experiment:
hasAgentTool &&
feature('VERIFICATION_AGENT') &&
// 3P default: false — verification agent is ant-only A/B
getFeatureValue_CACHED_MAY_BE_STALE('tengu_hive_evidence', false)
? `The contract: when non-trivial implementation happens on your turn,
independent adversarial verification must happen before you report
completion — regardless of who did the implementing (you directly,
a fork you spawned, or a subagent).
...
Spawn the ${AGENT_TOOL_NAME} with subagent_type="${VERIFICATION_AGENT_TYPE}".
Your own checks do NOT substitute — only the verifier assigns a verdict.
On FAIL: fix, resume the verifier, repeat until PASS.
On PASS: spot-check it — re-run 2-3 commands from its report.`
: null
When both the VERIFICATION_AGENT feature flag is compiled in AND the tengu_hive_evidence GrowthBook experiment assigns the user to the treatment group, Claude Code mandates that after any non-trivial implementation (3+ file edits, backend/API changes, infrastructure changes), a separate adversarial verification agent must run and pass before Claude can report completion. This is Anthropic's most aggressive attempt to reduce false "I'm done" claims in agentic workflows.
The CYBER_RISK_INSTRUCTION Injection Point
Inside getSimpleIntroSection(), a single import from cyberRiskInstruction.ts (see the Security page for a deep dive) is injected directly into the opening paragraph that every external user receives:
function getSimpleIntroSection(outputStyleConfig): string {
return `
You are an interactive agent that helps users with software engineering tasks.
Use the instructions below and the tools available to you to assist the user.
${CYBER_RISK_INSTRUCTION}
IMPORTANT: You must NEVER generate or guess URLs for the user unless you are
confident that the URLs are for helping the user with programming.`
}
This means the cybersecurity policy lands in the very first paragraph of the system prompt, before all tool instructions and before the cache boundary. It cannot be overridden by CLAUDE.md, project settings, or custom system prompts because it's hardcoded into the section that runs before any user customization is applied.
Async Prefetching
Claude Code aggressively prefetches data at startup and turn boundaries to minimize latency. Several prefetch operations run in parallel:
| Prefetch | Purpose | When |
|---|---|---|
| startKeychainPrefetch() | Read OAuth token + API key from macOS keychain | Before any imports |
| startMdmRawRead() | MDM policy reads via plutil/reg query | Before any imports |
| prefetchOfficialMcpUrls() | Fetch official MCP registry | After auth |
| prefetchPassesEligibility() | Check referral pass eligibility | After auth |
| prefetchAwsCredentials() | AWS/Bedrock credential prefetch | If Bedrock configured |
| prefetchFastModeStatus() | Resolve fast-mode feature gate | At init |
| startRelevantMemoryPrefetch() | Pre-scan for relevant CLAUDE.md files | Per turn |
| skillPrefetch | Pre-index skills directory for search | When EXPERIMENTAL_SKILL_SEARCH enabled |
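All of these share one pattern: start the async work immediately, and await it only at the point of use. A minimal sketch of that pattern (the helper and its names are illustrative, not from the source):

```typescript
// Illustrative start-early / await-late prefetch pattern: the promise is
// created immediately, so the I/O overlaps with the rest of startup, and
// the caller awaits an already-in-flight (or resolved) promise later.
function startPrefetch<T>(load: () => Promise<T>): () => Promise<T> {
  const pending = load() // kicks off now, in parallel with other startup work
  return () => pending   // later callers share the same in-flight promise
}

// Hypothetical example: prefetch a feature-gate config at startup
const getConfig = startPrefetch(async () => ({ fastMode: true }))

// ...other startup work runs here while the fetch is in flight...

getConfig().then(cfg => console.log(cfg.fastMode)) // → true
```

Because the thunk closes over a single promise, repeated calls never re-run the load; every consumer sees the same result.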