The query engine is the core loop that sends messages to the API, streams responses, executes tools, and decides when to stop. It is split across several modules: query.ts (the async generator), QueryEngine.ts (session-level orchestration), query/config.ts (immutable config snapshot), and query/stopHooks.ts (end-of-turn processing).
query() Async Generator
The query() function in src/query.ts is an async generator that yields StreamEvent, Message, and control events. It is the inner loop of a single model turn.
```typescript
async function* query(
  messagesForQuery: Message[],
  systemPrompt: SystemPrompt,
  toolUseContext: ToolUseContext,
  querySource: QuerySource,
  config: QueryConfig,
  // ...additional parameters
): AsyncGenerator<StreamEvent | Message | ...>
```

Each iteration of query() performs these steps:

- Message normalization: filters and prepares messages for the API via `normalizeMessagesForAPI`
- System prompt construction: prepends user context and appends system context via `prependUserContext` and `appendSystemContext`
- API call: streams the response, yielding `StreamEvent` objects as content arrives
- Tool execution: when the model emits `tool_use` blocks, tools are executed (streaming or sequential) and results are fed back
- Continuation decision: checks token budgets, stop reasons, and stop hooks to decide whether to loop
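The steps above can be condensed into a schematic loop. A minimal sketch, using simplified stand-ins for the real Message, StreamEvent, and tool-execution types (all helper names here are illustrative, not the actual implementation):

```typescript
// Simplified stand-ins for the real types; the actual generator is far richer.
type Message = { role: "user" | "assistant"; content: string }
type StreamEvent = { type: "text_delta"; text: string }
type ApiResponse = { message: Message; toolUses: string[]; stopReason: string }

// Hypothetical helpers standing in for the real normalization/API/tool layers.
async function callApi(messages: Message[]): Promise<ApiResponse> {
  return { message: { role: "assistant", content: "done" }, toolUses: [], stopReason: "end_turn" }
}
async function runTool(name: string): Promise<Message> {
  return { role: "user", content: `result of ${name}` }
}

async function* queryLoop(messages: Message[]): AsyncGenerator<StreamEvent | Message> {
  while (true) {
    const response = await callApi(messages)        // API call, streamed in reality
    yield { type: "text_delta", text: response.message.content }
    yield response.message
    messages = [...messages, response.message]
    if (response.toolUses.length === 0) break       // continuation decision
    for (const tool of response.toolUses) {         // tool execution
      const result = await runTool(tool)
      yield result
      messages = [...messages, result]
    }
  }
}
```

The real loop additionally interleaves normalization, stop hooks, and budget checks at the marked points.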
Streaming Tool Execution
When the streamingToolExecution gate is enabled, tools begin executing as soon as their input JSON is complete, overlapping with continued model output:
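One way to picture the overlap is a launch-as-ready pattern: start each tool the moment its call is complete, then collect everything once the stream ends. A sketch (not the real StreamingToolExecutor; all names are illustrative):

```typescript
// Illustrative only: tools start executing as their input JSON completes,
// while the stream keeps producing further content.
type ToolCall = { name: string; inputJson: string }

async function runTool(call: ToolCall): Promise<string> {
  return `result:${call.name}`
}

async function executeStreaming(stream: AsyncIterable<ToolCall>): Promise<string[]> {
  const pending: Promise<string>[] = []
  for await (const call of stream) {
    // Start execution immediately; do not wait for the stream to finish.
    pending.push(runTool(call))
  }
  // Results are collected and fed back after the full response.
  return Promise.all(pending)
}
```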
```typescript
const StreamingToolExecutor = // ...
// Tools execute as their JSON completes during streaming
// Results are collected and fed back after the full response
```

Auto-Compaction
The query loop integrates with the compaction system. When token usage approaches the context window limit, it triggers automatic compaction:
- `calculateTokenWarningState` checks if compaction thresholds are exceeded
- `isAutoCompactEnabled` gates whether auto-compaction can fire
- `buildPostCompactMessages` constructs the compacted message set
- Reactive compaction (`REACTIVE_COMPACT` feature) and context collapse (`CONTEXT_COLLAPSE`) provide alternative strategies behind feature gates
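Schematically, the check compares usage against a threshold and swaps in a compacted message set. A sketch with hypothetical threshold values and stand-ins for calculateTokenWarningState and buildPostCompactMessages:

```typescript
// Hypothetical numbers; the real thresholds live behind calculateTokenWarningState.
const CONTEXT_WINDOW = 200_000
const COMPACT_THRESHOLD = 0.92 // compact when 92% of the window is used

type WarningState = { shouldCompact: boolean; usedFraction: number }

function tokenWarningState(usedTokens: number): WarningState {
  const usedFraction = usedTokens / CONTEXT_WINDOW
  return { shouldCompact: usedFraction >= COMPACT_THRESHOLD, usedFraction }
}

function maybeCompact(messages: string[], usedTokens: number, autoCompactEnabled: boolean): string[] {
  const state = tokenWarningState(usedTokens)
  if (!autoCompactEnabled || !state.shouldCompact) return messages
  // Stand-in for buildPostCompactMessages: keep a summary plus the recent tail.
  return ["<summary of earlier conversation>", ...messages.slice(-2)]
}
```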
Missing Tool Results
If the API response contains tool_use blocks but execution is interrupted, yieldMissingToolResultBlocks generates synthetic error results to maintain the alternating assistant/user message invariant:
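The shape of those synthetic results might look like the following sketch, using simplified content-block types (the real code works over the full API block types):

```typescript
// Simplified block shapes; the real code uses the full API content-block types.
type ToolUseBlock = { type: "tool_use"; id: string }
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; is_error: true; content: string }
type AssistantMessage = { content: (ToolUseBlock | { type: "text" })[] }

// For every tool_use that never received a result, emit a synthetic error
// result so the alternating assistant/user message invariant stays valid.
function* missingToolResults(
  assistantMessages: AssistantMessage[],
  errorMessage: string,
): Generator<ToolResultBlock> {
  for (const msg of assistantMessages) {
    for (const block of msg.content) {
      if (block.type === "tool_use") {
        yield { type: "tool_result", tool_use_id: block.id, is_error: true, content: errorMessage }
      }
    }
  }
}
```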
```typescript
function* yieldMissingToolResultBlocks(
  assistantMessages: AssistantMessage[],
  errorMessage: string,
)
```

QueryEngine
QueryEngine in src/QueryEngine.ts owns the full conversation lifecycle: there is one QueryEngine instance per conversation, and each submitMessage() call starts a new turn.
Configuration
Session State
The QueryEngine maintains mutable state across turns:
```typescript
class QueryEngine {
  private mutableMessages: Message[]               // Full conversation history
  private abortController: AbortController         // Cancellation handle
  private permissionDenials: SDKPermissionDenial[] // Tracked denials
  private totalUsage: NonNullableUsage             // Accumulated token usage
  private readFileState: FileStateCache            // File content cache
  private discoveredSkillNames: Set<string>        // Skills found this turn
  private loadedNestedMemoryPaths: Set<string>     // Memory files loaded
}
```

System Prompt Building
The QueryEngine delegates system prompt construction to fetchSystemPromptParts() (from src/utils/queryContext.ts), which assembles:
- Base system prompt from `getSystemPrompt()`
- User context (CLAUDE.md files, current date)
- System context (git status, recent commits, cache breaker)
- Coordinator context (when in coordinator mode)
- Scratchpad context (when scratchpad is enabled)
- Memory prompt (from memdir)
- Skill-specific hooks and content
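The assembly can be pictured as concatenating optional parts, skipping any that do not apply. A sketch with hypothetical part values (fetchSystemPromptParts composes the real ones):

```typescript
// Hypothetical parts; the real ones come from fetchSystemPromptParts().
type PromptPart = string | null

function assembleSystemPrompt(parts: PromptPart[]): string {
  // Drop parts that do not apply (e.g. coordinator context outside coordinator mode).
  return parts.filter((p): p is string => p !== null).join("\n\n")
}

const prompt = assembleSystemPrompt([
  "You are a coding assistant.",    // base prompt
  "CLAUDE.md: prefer small diffs.", // user context
  "git status: clean",              // system context
  null,                             // coordinator context (not in coordinator mode)
  null,                             // scratchpad context (disabled)
])
```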
Message Filtering
The MessageSelector component (lazy-loaded to avoid pulling React/Ink into headless paths) filters messages for display and API submission, handling synthetic messages, compaction boundaries, and agent-scoped visibility.
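The lazy-loading pattern is roughly a dynamic import behind a cached promise, so headless code paths never trigger the load. A generic sketch (the stub factory stands in for something like a dynamic `import()` of the component module):

```typescript
// Generic lazy-loader: the factory runs only on first use, and its promise is cached.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => (cached ??= load())
}

// In the real code the factory would dynamically import the UI module;
// here it is a stub so the sketch stays self-contained.
let loads = 0
const getMessageSelector = lazy(async () => {
  loads++
  return { filter: (msgs: string[]) => msgs.filter((m) => !m.startsWith("synthetic:")) }
})
```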
Query Config
src/query/config.ts snapshots immutable values once at query() entry, separating them from the mutable ToolUseContext and per-iteration state.
```typescript
type QueryConfig = {
  sessionId: SessionId
  gates: {
    streamingToolExecution: boolean // Statsig gate
    emitToolUseSummaries: boolean   // Env var control
    isAnt: boolean                  // Internal user flag
    fastModeEnabled: boolean        // Fast mode availability
  }
}
```

QueryConfig intentionally excludes feature() gates. Those are tree-shaking boundaries resolved at build time and must stay inline at their guarded blocks for dead-code elimination to work.
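Snapshotting might look like the following sketch, with hypothetical gate-checking and env inputs (the real code reads Statsig gates and environment variables once at query() entry):

```typescript
// Hypothetical sources; the real snapshot reads Statsig gates and env vars.
type QueryConfigSketch = {
  sessionId: string
  gates: { streamingToolExecution: boolean; emitToolUseSummaries: boolean }
}

function snapshotConfig(
  sessionId: string,
  checkGate: (name: string) => boolean,
  env: Record<string, string | undefined>,
): QueryConfigSketch {
  // Resolved once at query() entry; later gate flips do not affect this turn.
  return {
    sessionId,
    gates: {
      streamingToolExecution: checkGate("streaming_tool_execution"),
      emitToolUseSummaries: env.EMIT_TOOL_USE_SUMMARIES === "1",
    },
  }
}
```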
Stop Hooks
src/query/stopHooks.ts runs after the model's response completes (stop reason end_turn or stop_sequence). It is itself an async generator, yielding additional messages and events.
Hook Pipeline
The handleStopHooks function orchestrates several post-turn operations:
```typescript
async function* handleStopHooks(
  messagesForQuery, assistantMessages, systemPrompt,
  userContext, systemContext, toolUseContext, querySource,
): AsyncGenerator<StreamEvent | Message, StopHookResult>
```

The pipeline includes:
- Stop hooks (`executeStopHooks`): user-configured hooks that run at turn end
- Task completed hooks (`executeTaskCompletedHooks`): fire when a task finishes
- Teammate idle hooks (`executeTeammateIdleHooks`): fire when a teammate goes idle
- Memory extraction (`EXTRACT_MEMORIES` feature): extracts memories from the conversation
- Auto-dream (`executeAutoDream`): background knowledge synthesis
- Prompt suggestions (`executePromptSuggestion`): generates follow-up suggestions
- Job classification (`TEMPLATES` feature): classifies the conversation for templates
Stop Hook Result
The result determines whether the query loop continues:
```typescript
type StopHookResult = {
  blockingErrors: Message[]    // Errors that must be shown
  preventContinuation: boolean // If true, stop the query loop
}
```

Token Budgets
Token budget management spans bootstrap state and query utilities:
- `getCurrentTurnTokenBudget()`: returns the output token limit for the current turn
- `getTurnOutputTokens()`: returns tokens generated so far in the current turn
- `incrementBudgetContinuationCount()`: tracks how many times the budget has been extended
- `createBudgetTracker` / `checkTokenBudget`: utilities for enforcing budget limits during streaming
The budget system supports escalation: when the model hits the default limit but has more work to do, the budget can be raised to ESCALATED_MAX_TOKENS for continued generation.
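The escalation step can be sketched as a one-way bump of the tracker's limit (the numeric limits here are hypothetical; only the ESCALATED_MAX_TOKENS name comes from the source):

```typescript
// Hypothetical limits; the real values come from bootstrap state.
const DEFAULT_MAX_TOKENS = 8_192
const ESCALATED_MAX_TOKENS = 32_768

type BudgetTracker = { limit: number; continuations: number }

function createBudgetTracker(): BudgetTracker {
  return { limit: DEFAULT_MAX_TOKENS, continuations: 0 }
}

// If the model hits the limit with more work to do, raise the budget once
// to the escalated ceiling; after that, hitting the limit stops the turn.
function checkTokenBudget(tracker: BudgetTracker, outputTokens: number): "continue" | "escalate" | "stop" {
  if (outputTokens < tracker.limit) return "continue"
  if (tracker.limit < ESCALATED_MAX_TOKENS) {
    tracker.limit = ESCALATED_MAX_TOKENS
    tracker.continuations++
    return "escalate"
  }
  return "stop"
}
```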