The query engine is the core loop that sends messages to the API, streams responses, executes tools, and decides when to stop. It is split across several modules: query.ts (the async generator), QueryEngine.ts (session-level orchestration), query/config.ts (immutable config snapshot), and query/stopHooks.ts (end-of-turn processing).
query() Async Generator
The query() function in src/query.ts is an async generator that yields StreamEvent, Message, and control events. It is the inner loop of a single model turn.
```typescript
async function* query(
  messagesForQuery: Message[],
  systemPrompt: SystemPrompt,
  toolUseContext: ToolUseContext,
  querySource: QuerySource,
  config: QueryConfig,
  // ...additional parameters
): AsyncGenerator<StreamEvent | Message | ...>
```

Each iteration of query() performs these steps:

- Message normalization: filters and prepares messages for the API via `normalizeMessagesForAPI`
- System prompt construction: prepends user context and appends system context via `prependUserContext` and `appendSystemContext`
- API call: streams the response, yielding `StreamEvent` objects as content arrives
- Tool execution: when the model emits `tool_use` blocks, tools are executed (streaming or sequential) and results are fed back
- Continuation decision: checks token budgets, stop reasons, and stop hooks to decide whether to loop
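The steps above can be condensed into a schematic loop. A minimal sketch, using simplified stand-ins for the real Message, StreamEvent, and tool-execution types (all helper names here are illustrative, not the actual implementation):

```typescript
// Simplified stand-ins for the real types; the actual generator is far richer.
type Message = { role: "user" | "assistant"; content: string }
type StreamEvent = { type: "text_delta"; text: string }
type ApiResponse = { message: Message; toolUses: string[]; stopReason: string }

// Hypothetical helpers standing in for the real normalization/API/tool layers.
async function callApi(messages: Message[]): Promise<ApiResponse> {
  return { message: { role: "assistant", content: "done" }, toolUses: [], stopReason: "end_turn" }
}
async function runTool(name: string): Promise<Message> {
  return { role: "user", content: `result of ${name}` }
}

async function* queryLoop(messages: Message[]): AsyncGenerator<StreamEvent | Message> {
  while (true) {
    const response = await callApi(messages)        // API call, streamed in reality
    yield { type: "text_delta", text: response.message.content }
    yield response.message
    messages = [...messages, response.message]
    if (response.toolUses.length === 0) break       // continuation decision
    for (const tool of response.toolUses) {         // tool execution
      const result = await runTool(tool)
      yield result
      messages = [...messages, result]
    }
  }
}
```

The real loop additionally interleaves normalization, stop hooks, and budget checks at the marked points.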
Streaming Tool Execution
When the streamingToolExecution gate is enabled, tools begin executing as soon as their input JSON is complete, overlapping with continued model output:
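One way to picture the overlap is a launch-as-ready pattern: start each tool the moment its call is complete, then collect everything once the stream ends. A sketch (not the real StreamingToolExecutor; all names are illustrative):

```typescript
// Illustrative only: tools start executing as their input JSON completes,
// while the stream keeps producing further content.
type ToolCall = { name: string; inputJson: string }

async function runTool(call: ToolCall): Promise<string> {
  return `result:${call.name}`
}

async function executeStreaming(stream: AsyncIterable<ToolCall>): Promise<string[]> {
  const pending: Promise<string>[] = []
  for await (const call of stream) {
    // Start execution immediately; do not wait for the stream to finish.
    pending.push(runTool(call))
  }
  // Results are collected and fed back after the full response.
  return Promise.all(pending)
}
```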
```typescript
const StreamingToolExecutor = // ...
// Tools execute as their JSON completes during streaming
// Results are collected and fed back after the full response
```

Auto-Compaction
The query loop integrates with the compaction system. When token usage approaches the context window limit, it triggers automatic compaction:
- `calculateTokenWarningState` checks if compaction thresholds are exceeded
- `isAutoCompactEnabled` gates whether auto-compaction can fire
- `buildPostCompactMessages` constructs the compacted message set
- Reactive compaction (`REACTIVE_COMPACT` feature) and context collapse (`CONTEXT_COLLAPSE`) provide alternative strategies behind feature gates
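Schematically, the check compares usage against a threshold and swaps in a compacted message set. A sketch with hypothetical threshold values and stand-ins for calculateTokenWarningState and buildPostCompactMessages:

```typescript
// Hypothetical numbers; the real thresholds live behind calculateTokenWarningState.
const CONTEXT_WINDOW = 200_000
const COMPACT_THRESHOLD = 0.92 // compact when 92% of the window is used

type WarningState = { shouldCompact: boolean; usedFraction: number }

function tokenWarningState(usedTokens: number): WarningState {
  const usedFraction = usedTokens / CONTEXT_WINDOW
  return { shouldCompact: usedFraction >= COMPACT_THRESHOLD, usedFraction }
}

function maybeCompact(messages: string[], usedTokens: number, autoCompactEnabled: boolean): string[] {
  const state = tokenWarningState(usedTokens)
  if (!autoCompactEnabled || !state.shouldCompact) return messages
  // Stand-in for buildPostCompactMessages: keep a summary plus the recent tail.
  return ["<summary of earlier conversation>", ...messages.slice(-2)]
}
```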
Missing Tool Results
If the API response contains tool_use blocks but execution is interrupted, yieldMissingToolResultBlocks generates synthetic error results to maintain the alternating assistant/user message invariant:
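The shape of those synthetic results might look like the following sketch, using simplified content-block types (the real code works over the full API block types):

```typescript
// Simplified block shapes; the real code uses the full API content-block types.
type ToolUseBlock = { type: "tool_use"; id: string }
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; is_error: true; content: string }
type AssistantMessage = { content: (ToolUseBlock | { type: "text" })[] }

// For every tool_use that never received a result, emit a synthetic error
// result so the alternating assistant/user message invariant stays valid.
function* missingToolResults(
  assistantMessages: AssistantMessage[],
  errorMessage: string,
): Generator<ToolResultBlock> {
  for (const msg of assistantMessages) {
    for (const block of msg.content) {
      if (block.type === "tool_use") {
        yield { type: "tool_result", tool_use_id: block.id, is_error: true, content: errorMessage }
      }
    }
  }
}
```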
```typescript
function* yieldMissingToolResultBlocks(
  assistantMessages: AssistantMessage[],
  errorMessage: string,
)
```

QueryEngine
QueryEngine in src/QueryEngine.ts owns the full conversation lifecycle: there is one QueryEngine instance per conversation, and each submitMessage() call starts a new turn.
Configuration
Session State
The QueryEngine maintains mutable state across turns:
```typescript
class QueryEngine {
  private mutableMessages: Message[]               // Full conversation history
  private abortController: AbortController         // Cancellation handle
  private permissionDenials: SDKPermissionDenial[] // Tracked denials
  private totalUsage: NonNullableUsage             // Accumulated token usage
  private readFileState: FileStateCache            // File content cache
  private discoveredSkillNames: Set<string>        // Skills found this turn
  private loadedNestedMemoryPaths: Set<string>     // Memory files loaded
}
```

System Prompt Building
The QueryEngine delegates system prompt construction to fetchSystemPromptParts() (from src/utils/queryContext.ts), which assembles:
- Base system prompt from `getSystemPrompt()`
- User context (CLAUDE.md files, current date)
- System context (git status, recent commits, cache breaker)
- Coordinator context (when in coordinator mode)
- Scratchpad context (when scratchpad is enabled)
- Memory prompt (from memdir)
- Skill-specific hooks and content
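The assembly can be pictured as concatenating optional parts, skipping any that do not apply. A sketch with hypothetical part values (fetchSystemPromptParts composes the real ones):

```typescript
// Hypothetical parts; the real ones come from fetchSystemPromptParts().
type PromptPart = string | null

function assembleSystemPrompt(parts: PromptPart[]): string {
  // Drop parts that do not apply (e.g. coordinator context outside coordinator mode).
  return parts.filter((p): p is string => p !== null).join("\n\n")
}

const prompt = assembleSystemPrompt([
  "You are a coding assistant.",    // base prompt
  "CLAUDE.md: prefer small diffs.", // user context
  "git status: clean",              // system context
  null,                             // coordinator context (not in coordinator mode)
  null,                             // scratchpad context (disabled)
])
```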
Message Filtering
The MessageSelector component (lazy-loaded to avoid pulling React/Ink into headless paths) filters messages for display and API submission, handling synthetic messages, compaction boundaries, and agent-scoped visibility.
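The lazy-loading pattern is roughly a dynamic import behind a cached promise, so headless code paths never trigger the load. A generic sketch (the stub factory stands in for something like a dynamic `import()` of the component module):

```typescript
// Generic lazy-loader: the factory runs only on first use, and its promise is cached.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => (cached ??= load())
}

// In the real code the factory would dynamically import the UI module;
// here it is a stub so the sketch stays self-contained.
let loads = 0
const getMessageSelector = lazy(async () => {
  loads++
  return { filter: (msgs: string[]) => msgs.filter((m) => !m.startsWith("synthetic:")) }
})
```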
Query Config
src/query/config.ts snapshots immutable values once at query() entry, separating them from the mutable ToolUseContext and per-iteration state.
```typescript
type QueryConfig = {
  sessionId: SessionId
  gates: {
    streamingToolExecution: boolean // Statsig gate
    emitToolUseSummaries: boolean   // Env var control
    isAnt: boolean                  // Internal user flag
    fastModeEnabled: boolean        // Fast mode availability
  }
}
```

QueryConfig intentionally excludes feature() gates. Those are tree-shaking boundaries resolved at build time and must stay inline at their guarded blocks for dead-code elimination to work.
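Snapshotting might look like the following sketch, with hypothetical gate-checking and env inputs (the real code reads Statsig gates and environment variables once at query() entry):

```typescript
// Hypothetical sources; the real snapshot reads Statsig gates and env vars.
type QueryConfigSketch = {
  sessionId: string
  gates: { streamingToolExecution: boolean; emitToolUseSummaries: boolean }
}

function snapshotConfig(
  sessionId: string,
  checkGate: (name: string) => boolean,
  env: Record<string, string | undefined>,
): QueryConfigSketch {
  // Resolved once at query() entry; later gate flips do not affect this turn.
  return {
    sessionId,
    gates: {
      streamingToolExecution: checkGate("streaming_tool_execution"),
      emitToolUseSummaries: env.EMIT_TOOL_USE_SUMMARIES === "1",
    },
  }
}
```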
Stop Hooks
src/query/stopHooks.ts runs after the model's response completes (stop reason end_turn or stop_sequence). It is itself an async generator, yielding additional messages and events.
Hook Pipeline
The handleStopHooks function orchestrates several post-turn operations:
```typescript
async function* handleStopHooks(
  messagesForQuery, assistantMessages, systemPrompt,
  userContext, systemContext, toolUseContext, querySource,
): AsyncGenerator<StreamEvent | Message, StopHookResult>
```

The pipeline includes:
- Stop hooks (`executeStopHooks`): user-configured hooks that run at turn end
- Task completed hooks (`executeTaskCompletedHooks`): fire when a task finishes
- Teammate idle hooks (`executeTeammateIdleHooks`): fire when a teammate goes idle
- Memory extraction (`EXTRACT_MEMORIES` feature): extracts memories from the conversation
- Auto-dream (`executeAutoDream`): background knowledge synthesis
- Prompt suggestions (`executePromptSuggestion`): generates follow-up suggestions
- Job classification (`TEMPLATES` feature): classifies the conversation for templates
Stop Hook Result
The result determines whether the query loop continues:
```typescript
type StopHookResult = {
  blockingErrors: Message[]    // Errors that must be shown
  preventContinuation: boolean // If true, stop the query loop
}
```

Token Budgets
Token budget management spans bootstrap state and query utilities:
- `getCurrentTurnTokenBudget()`: returns the output token limit for the current turn
- `getTurnOutputTokens()`: returns tokens generated so far in the current turn
- `incrementBudgetContinuationCount()`: tracks how many times the budget has been extended
- `createBudgetTracker` / `checkTokenBudget`: utilities for enforcing budget limits during streaming
The budget system supports escalation: when the model hits the default limit but has more work to do, the budget can be raised to ESCALATED_MAX_TOKENS for continued generation.
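The escalation step can be sketched as a one-way bump of the tracker's limit (the numeric limits here are hypothetical; only the ESCALATED_MAX_TOKENS name comes from the source):

```typescript
// Hypothetical limits; the real values come from bootstrap state.
const DEFAULT_MAX_TOKENS = 8_192
const ESCALATED_MAX_TOKENS = 32_768

type BudgetTracker = { limit: number; continuations: number }

function createBudgetTracker(): BudgetTracker {
  return { limit: DEFAULT_MAX_TOKENS, continuations: 0 }
}

// If the model hits the limit with more work to do, raise the budget once
// to the escalated ceiling; after that, hitting the limit stops the turn.
function checkTokenBudget(tracker: BudgetTracker, outputTokens: number): "continue" | "escalate" | "stop" {
  if (outputTokens < tracker.limit) return "continue"
  if (tracker.limit < ESCALATED_MAX_TOKENS) {
    tracker.limit = ESCALATED_MAX_TOKENS
    tracker.continuations++
    return "escalate"
  }
  return "stop"
}
```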