AI Assistant

The query engine is the core loop that sends messages to the API, streams responses, executes tools, and decides when to stop. It is split across several modules: query.ts (the async generator), QueryEngine.ts (session-level orchestration), query/config.ts (immutable config snapshot), and query/stopHooks.ts (end-of-turn processing).

query() Async Generator

The query() function in src/query.ts is an async generator that yields StreamEvent, Message, and control events. It is the inner loop of a single model turn.

async function* query(
  messagesForQuery: Message[],
  systemPrompt: SystemPrompt,
  toolUseContext: ToolUseContext,
  querySource: QuerySource,
  config: QueryConfig,
  // ...additional parameters
): AsyncGenerator<StreamEvent | Message | ...>

Each iteration of query() performs these steps:

  1. Message normalization: filters and prepares messages for the API via normalizeMessagesForAPI
  2. System prompt construction: prepends user context and appends system context via prependUserContext and appendSystemContext
  3. API call: streams the response, yielding StreamEvent objects as content arrives
  4. Tool execution: when the model emits tool_use blocks, tools are executed (streaming or sequential) and results are fed back
  5. Continuation decision: checks token budgets, stop reasons, and stop hooks to decide whether to loop
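The streaming-then-tools shape of one iteration can be sketched as an async generator. This is an illustrative simplification, not the real query() signature: the Block and ToolResult types and the runTool callback are assumptions standing in for the actual StreamEvent/Message types and tool machinery.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string };
type ToolResult = { type: "tool_result"; content: string };

async function* queryLoopSketch(
  stream: AsyncIterable<Block>,
  runTool: (block: Block) => Promise<string>,
): AsyncGenerator<Block | ToolResult> {
  const pendingTools: Block[] = [];
  // Step 3: stream the response, yielding events as content arrives
  for await (const block of stream) {
    yield block;
    if (block.type === "tool_use") pendingTools.push(block); // collect tool calls
  }
  // Step 4: execute tools and feed results back for the next iteration
  for (const tool of pendingTools) {
    yield { type: "tool_result", content: await runTool(tool) };
  }
}
```

The generator shape matters: callers can render stream events incrementally while the loop retains control over when tools run and when to stop.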

Streaming Tool Execution

When the streamingToolExecution gate is enabled, tools begin executing as soon as their input JSON is complete, overlapping with continued model output:

const StreamingToolExecutor = // ...
// Tools execute as their JSON completes during streaming
// Results are collected and fed back after the full response
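The overlap described above can be sketched as follows. This is a hedged approximation of the pattern, not StreamingToolExecutor's real API: the key point is that tool promises are started without awaiting, so execution overlaps the remaining stream, and results are only collected after the response completes.

```typescript
type StreamBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: unknown };

async function* streamWithEagerTools(
  blocks: AsyncIterable<StreamBlock>,
  runTool: (name: string, input: unknown) => Promise<string>,
): AsyncGenerator<StreamBlock | { type: "tool_result"; content: string }> {
  const inFlight: Promise<string>[] = [];
  for await (const block of blocks) {
    if (block.type === "tool_use") {
      // start execution immediately; do not await, so it overlaps the stream
      inFlight.push(runTool(block.name, block.input));
    }
    yield block;
  }
  // after the full response, collect results in emission order
  for (const content of await Promise.all(inFlight)) {
    yield { type: "tool_result", content };
  }
}
```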

Auto-Compaction

The query loop integrates with the compaction system. When token usage approaches the context window limit, it triggers automatic compaction:

  • calculateTokenWarningState checks if compaction thresholds are exceeded
  • isAutoCompactEnabled gates whether auto-compaction can fire
  • buildPostCompactMessages constructs the compacted message set
  • Reactive compaction (REACTIVE_COMPACT feature) and context collapse (CONTEXT_COLLAPSE) provide alternative strategies behind feature gates
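The decision the first two bullets describe reduces to a threshold check gated by the enable flag. The sketch below assumes a simple ratio threshold; the real calculateTokenWarningState / isAutoCompactEnabled functions and the actual threshold constant are not shown in this document.

```typescript
type WarningState = { aboveThreshold: boolean; tokensUsed: number };

function shouldAutoCompact(
  tokensUsed: number,
  contextWindow: number,
  autoCompactEnabled: boolean,
  threshold = 0.8, // assumed ratio, not the real constant
): boolean {
  // stands in for calculateTokenWarningState
  const state: WarningState = {
    aboveThreshold: tokensUsed >= contextWindow * threshold,
    tokensUsed,
  };
  // isAutoCompactEnabled gates whether the warning can trigger compaction
  return autoCompactEnabled && state.aboveThreshold;
}
```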

Missing Tool Results

If the API response contains tool_use blocks but execution is interrupted, yieldMissingToolResultBlocks generates synthetic error results to maintain the alternating assistant/user message invariant:

function* yieldMissingToolResultBlocks(
  assistantMessages: AssistantMessage[],
  errorMessage: string,
)
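A minimal sketch of how such synthetic results might be produced: for every tool_use block whose id never received a tool_result, emit an error-flagged result so each assistant tool call still has a paired user-side response. The block shapes and the seenResultIds parameter are assumptions; the real function takes full AssistantMessage values.

```typescript
type ToolUseBlock = { type: "tool_use"; id: string };
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  is_error: true;
  content: string;
};

function* missingToolResults(
  assistantBlocks: ToolUseBlock[],
  seenResultIds: Set<string>,
  errorMessage: string,
): Generator<ToolResultBlock> {
  for (const block of assistantBlocks) {
    if (!seenResultIds.has(block.id)) {
      // synthesize an error result so assistant/user turns stay paired
      yield {
        type: "tool_result",
        tool_use_id: block.id,
        is_error: true,
        content: errorMessage,
      };
    }
  }
}
```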

QueryEngine

QueryEngine in src/QueryEngine.ts owns the full conversation lifecycle: there is one QueryEngine instance per conversation, and each submitMessage() call starts a new turn.

Configuration

cwd (string)
Working directory for the conversation.
tools (Tools)
Available tools for this session.
commands (Command[])
Slash commands available to the model.
mcpClients (MCPServerConnection[])
Connected MCP servers.
agents (AgentDefinition[])
Available agent definitions.
canUseTool (CanUseToolFn)
Permission check function for tool use.
getAppState / setAppState (functions)
AppState accessors for reading and updating UI state.
maxTurns (number | undefined)
Maximum number of agentic turns before stopping.
maxBudgetUsd (number | undefined)
Maximum cost budget for the conversation.
taskBudget (object | undefined)
Token budget for background tasks.
thinkingConfig (ThinkingConfig | undefined)
Extended thinking configuration.
customSystemPrompt (string | undefined)
Override for the system prompt.
appendSystemPrompt (string | undefined)
Additional text appended to the system prompt.
snipReplay (function | undefined)
Handler for snip-boundary messages (HISTORY_SNIP feature).

Session State

The QueryEngine maintains mutable state across turns:

class QueryEngine {
  private mutableMessages: Message[]           // Full conversation history
  private abortController: AbortController     // Cancellation handle
  private permissionDenials: SDKPermissionDenial[]  // Tracked denials
  private totalUsage: NonNullableUsage         // Accumulated token usage
  private readFileState: FileStateCache        // File content cache
  private discoveredSkillNames: Set<string>    // Skills found this turn
  private loadedNestedMemoryPaths: Set<string> // Memory files loaded
}
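Of this state, totalUsage is the piece that accumulates across turns. A sketch of per-turn accumulation, assuming NonNullableUsage carries token counters like these (the real field set is not shown here):

```typescript
type Usage = { input_tokens: number; output_tokens: number };

// fold one turn's usage into the running session total
function accumulateUsage(total: Usage, turn: Usage): Usage {
  return {
    input_tokens: total.input_tokens + turn.input_tokens,
    output_tokens: total.output_tokens + turn.output_tokens,
  };
}
```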

System Prompt Building

The QueryEngine delegates system prompt construction to fetchSystemPromptParts() (from src/utils/queryContext.ts), which assembles:

  • Base system prompt from getSystemPrompt()
  • User context (CLAUDE.md files, current date)
  • System context (git status, recent commits, cache breaker)
  • Coordinator context (when in coordinator mode)
  • Scratchpad context (when scratchpad is enabled)
  • Memory prompt (from memdir)
  • Skill-specific hooks and content
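The assembly order follows the prepend/append behavior described earlier: user context before the base prompt, system context after it. The sketch below is a simplification covering only a few of the parts listed above; the real fetchSystemPromptParts API and part names differ.

```typescript
type PromptParts = {
  base: string;           // from getSystemPrompt()
  userContext?: string;   // CLAUDE.md files, current date
  systemContext?: string; // git status, recent commits, cache breaker
  memoryPrompt?: string;  // from memdir
};

function assembleSystemPrompt(parts: PromptParts): string {
  // user context is prepended, system context appended (see query() step 2)
  return [parts.userContext, parts.base, parts.systemContext, parts.memoryPrompt]
    .filter((p): p is string => Boolean(p))
    .join("\n\n");
}
```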

Message Filtering

The MessageSelector component (lazy-loaded to avoid pulling React/Ink into headless paths) filters messages for display and API submission, handling synthetic messages, compaction boundaries, and agent-scoped visibility.
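The lazy-loading technique mentioned above is the standard deferred-dynamic-import pattern: the module is imported only on first use, so code paths that never render UI never pull it in. A generic sketch (the MessageSelector module path in the comment is hypothetical):

```typescript
// defer a dynamic import until first use, caching the in-flight promise
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load());
}

// hypothetical usage: headless paths that never call this never load the module
// const getMessageSelector = lazy(() => import("./MessageSelector"));
```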

Query Config

src/query/config.ts snapshots immutable values once at query() entry, separating them from the mutable ToolUseContext and per-iteration state.

type QueryConfig = {
  sessionId: SessionId
  gates: {
    streamingToolExecution: boolean  // Statsig gate
    emitToolUseSummaries: boolean    // Env var control
    isAnt: boolean                   // Internal user flag
    fastModeEnabled: boolean         // Fast mode availability
  }
}

QueryConfig intentionally excludes feature() gates. Those are tree-shaking boundaries resolved at build time and must stay inline at their guarded blocks for dead-code elimination to work.
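The reasoning can be illustrated with a toy gate. When the guard at the use site resolves to a build-time literal, the bundler can drop the dead branch entirely; snapshotting the value into a config object would hide the constant behind a property read and defeat that analysis. This is an illustration only; the real feature() implementation is not shown here.

```typescript
// stands in for build-time constants a bundler would inline
const FEATURES = { REACTIVE_COMPACT: false } as const;

function feature(name: keyof typeof FEATURES): boolean {
  return FEATURES[name];
}

function compact(): string {
  if (feature("REACTIVE_COMPACT")) {
    // with the gate inlined to a literal false, bundlers eliminate this block
    return "reactive";
  }
  return "threshold";
}
```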

Stop Hooks

src/query/stopHooks.ts runs after the model's response completes (stop reason end_turn or stop_sequence). It is itself an async generator, yielding additional messages and events.

Hook Pipeline

The handleStopHooks function orchestrates several post-turn operations:

async function* handleStopHooks(
  messagesForQuery, assistantMessages, systemPrompt,
  userContext, systemContext, toolUseContext, querySource,
): AsyncGenerator<StreamEvent | Message, StopHookResult>

The pipeline includes:

  1. Stop hooks (executeStopHooks): user-configured hooks that run at turn end
  2. Task completed hooks (executeTaskCompletedHooks): fire when a task finishes
  3. Teammate idle hooks (executeTeammateIdleHooks): fire when a teammate goes idle
  4. Memory extraction (EXTRACT_MEMORIES feature): extracts memories from the conversation
  5. Auto-dream (executeAutoDream): background knowledge synthesis
  6. Prompt suggestions (executePromptSuggestion): generates follow-up suggestions
  7. Job classification (TEMPLATES feature): classifies the conversation for templates

Stop Hook Result

The result determines whether the query loop continues:

type StopHookResult = {
  blockingErrors: Message[]       // Errors that must be shown
  preventContinuation: boolean    // If true, stop the query loop
}

Token Budgets

Token budget management spans bootstrap state and query utilities:

  • getCurrentTurnTokenBudget(): returns the output token limit for the current turn
  • getTurnOutputTokens(): returns tokens generated so far in the current turn
  • incrementBudgetContinuationCount(): tracks how many times the budget has been extended
  • createBudgetTracker / checkTokenBudget: utilities for enforcing budget limits during streaming

The budget system supports escalation: when the model hits the default limit but has more work to do, the budget can be raised to ESCALATED_MAX_TOKENS for continued generation.
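A sketch of that escalation decision, under assumed constants: the actual default limit, the real value of ESCALATED_MAX_TOKENS, and the tracker API are not shown in this document.

```typescript
const DEFAULT_MAX_TOKENS = 8_192;    // assumed default output limit
const ESCALATED_MAX_TOKENS = 32_000; // assumed escalated limit

function nextTurnBudget(outputTokens: number, continuations: number): number {
  // escalate once the default limit is exhausted and a continuation
  // has been requested (incrementBudgetContinuationCount tracks these)
  if (outputTokens >= DEFAULT_MAX_TOKENS && continuations > 0) {
    return ESCALATED_MAX_TOKENS;
  }
  return DEFAULT_MAX_TOKENS;
}
```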