Compact Service

The compact service (src/services/compact/) manages context window pressure by compressing conversation history when it grows too large. It provides multiple compaction strategies, from full conversation summarization to targeted tool result compression.

Why Compaction Is Needed

Claude models have finite context windows. As conversations grow with tool calls, file reads, and multi-step reasoning, the token count approaches the limit. Without compaction:

  • The API rejects requests with "prompt is too long" errors
  • Prompt cache efficiency degrades as the context shifts
  • Response quality can suffer from irrelevant historical context

The compact service keeps conversations within bounds while preserving the most important context.

Compaction Strategies

The service provides three strategies, each covered in a section below:

  • Auto-compact: Proactive compaction triggered when token usage crosses a threshold. Runs automatically between query turns.
  • Full compaction: A forked agent summarizes the conversation history into a compact boundary message.
  • Micro-compact: Individual large tool results are compressed or cleared, leaving the conversation structure intact.

Auto-Compact

Location: src/services/compact/autoCompact.ts

Auto-compact monitors token usage and triggers compaction before the context window is exhausted.

Thresholds

const AUTOCOMPACT_BUFFER_TOKENS = 13_000
const WARNING_THRESHOLD_BUFFER_TOKENS = 20_000
const MANUAL_COMPACT_BUFFER_TOKENS = 3_000

The auto-compact threshold is calculated as:

effectiveContextWindow = contextWindow - reservedForSummaryOutput
autoCompactThreshold = effectiveContextWindow - 13,000
These thresholds are surfaced as four boolean flags:

isAboveWarningThreshold (boolean)

Token usage is within 20K of the limit. UI shows a yellow warning.

isAboveErrorThreshold (boolean)

Token usage is within 20K of the limit (same buffer). UI shows a red warning.

isAboveAutoCompactThreshold (boolean)

Token usage is within 13K of the effective window. Triggers auto-compaction.

isAtBlockingLimit (boolean)

Token usage is within 3K of the effective window. Blocks further queries until compacted.
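The threshold arithmetic above can be sketched as a single pure function. The constants come from the docs; the function name, parameters, and exact comparison operators are assumptions for illustration.

```typescript
const AUTOCOMPACT_BUFFER_TOKENS = 13_000
const WARNING_THRESHOLD_BUFFER_TOKENS = 20_000
const MANUAL_COMPACT_BUFFER_TOKENS = 3_000

interface ThresholdState {
  isAboveWarningThreshold: boolean
  isAboveErrorThreshold: boolean
  isAboveAutoCompactThreshold: boolean
  isAtBlockingLimit: boolean
}

// Hypothetical helper: derive all four flags from current usage and window sizes.
function computeThresholds(
  usedTokens: number,
  contextWindow: number,
  reservedForSummaryOutput: number,
): ThresholdState {
  const effectiveContextWindow = contextWindow - reservedForSummaryOutput
  return {
    // Within 20K of the raw limit: yellow warning in the UI.
    isAboveWarningThreshold:
      usedTokens >= contextWindow - WARNING_THRESHOLD_BUFFER_TOKENS,
    // Same 20K buffer, surfaced as a red warning.
    isAboveErrorThreshold:
      usedTokens >= contextWindow - WARNING_THRESHOLD_BUFFER_TOKENS,
    // Within 13K of the effective window: auto-compact fires.
    isAboveAutoCompactThreshold:
      usedTokens >= effectiveContextWindow - AUTOCOMPACT_BUFFER_TOKENS,
    // Within 3K of the effective window: further queries are blocked.
    isAtBlockingLimit:
      usedTokens >= effectiveContextWindow - MANUAL_COMPACT_BUFFER_TOKENS,
  }
}
```

For a 200K window with 20K reserved for summary output, the effective window is 180K, so auto-compact fires at 167K tokens while the UI warning (pegged to the raw limit) appears at 180K.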

Circuit Breaker

Auto-compact includes a circuit breaker to prevent wasted API calls when compaction repeatedly fails:

const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3

After 3 consecutive failures, auto-compact stops retrying for the remainder of the session. This prevents sessions with irrecoverably large contexts from burning API calls on doomed compaction attempts.
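A minimal sketch of the breaker: only the limit of 3 comes from autoCompact.ts; the counter and function names are assumptions, and tracking *consecutive* failures means a success resets the count.

```typescript
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3

let consecutiveFailures = 0

// Gate each auto-compact attempt on the breaker state.
function shouldAttemptAutoCompact(): boolean {
  return consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES
}

// A success resets the streak; a failure advances it toward the limit.
function recordAutoCompactResult(succeeded: boolean): void {
  consecutiveFailures = succeeded ? 0 : consecutiveFailures + 1
}
```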

Compaction Order

When auto-compact triggers, it tries strategies in order:

  1. Session memory compaction: Replace old messages with session memory summary (fast, no API call)
  2. Full compaction: Run a forked agent to summarize the conversation (slower, requires API call)
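The fallback order can be sketched as a short-circuit over the two strategies. All names here are illustrative, and the full-compaction stub stands in for what is really an asynchronous forked-agent API call.

```typescript
interface Session {
  hasSessionMemory: boolean
  messages: string[]
}

// Fast path: no API call, swap old messages for the session memory summary.
function trySessionMemoryCompaction(s: Session): boolean {
  if (!s.hasSessionMemory) return false
  s.messages = ['<session memory summary>']
  return true
}

// Slow path: placeholder for the forked summarization agent (an API call in reality).
function runFullCompaction(s: Session): boolean {
  s.messages = ['<full conversation summary>']
  return true
}

// Try the cheap strategy first; fall back to the expensive one.
function compactInOrder(s: Session): boolean {
  return trySessionMemoryCompaction(s) || runFullCompaction(s)
}
```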

Full Compaction

Location: src/services/compact/compact.ts

The full compaction flow in compactConversation():

  1. Group messages by API round using groupMessagesByApiRound()
  2. Strip images from messages (not needed for summarization, avoids hitting size limits)
  3. Insert a compact boundary message marking the compaction point
  4. Run a forked agent with the conversation history and a compact prompt
  5. Stream the summary back, replacing pre-boundary messages with the compressed result
  6. Run post-compact cleanup (re-inject CLAUDE.md, restore recent file state, re-attach skills)
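Step 2 of the flow can be shown in isolation. The content-block shapes here are simplified assumptions, not the real message types.

```typescript
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image'; data: string }

// Drop image blocks before summarization: they are not needed for the
// summary and inflate the request toward size limits.
function stripImages(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.filter(b => b.type !== 'image')
}
```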

Compact Boundary Messages

A SystemCompactBoundaryMessage marks where compaction occurred in the message array:

type SystemCompactBoundaryMessage = {
  type: 'system'
  subtype: 'compact_boundary'
  summary: string      // The compacted summary
  direction?: 'older' | 'newer'  // Which side was compacted
}

Messages before the boundary are replaced with the summary. Messages after it are preserved verbatim.
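Using the type above, a hypothetical helper shows the replacement semantics: everything before the boundary index collapses into the summary, everything after survives verbatim.

```typescript
type SystemCompactBoundaryMessage = {
  type: 'system'
  subtype: 'compact_boundary'
  summary: string                // The compacted summary
  direction?: 'older' | 'newer'  // Which side was compacted
}

type Message =
  | { type: 'user' | 'assistant'; text: string }
  | SystemCompactBoundaryMessage

// Illustrative only: replace messages[0..boundaryIndex) with a boundary
// message carrying the summary, and keep the rest untouched.
function applyBoundary(
  messages: Message[],
  boundaryIndex: number,
  summary: string,
): Message[] {
  const boundary: SystemCompactBoundaryMessage = {
    type: 'system',
    subtype: 'compact_boundary',
    summary,
    direction: 'older', // the older side was compacted
  }
  return [boundary, ...messages.slice(boundaryIndex)]
}
```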

Post-Compact Cleanup

After compaction, runPostCompactCleanup() restores essential context:

  • Re-injects CLAUDE.md and memory files as attachments
  • Restores file state for recently-edited files (up to 5 files, 5K tokens each)
  • Re-attaches invoked skills (up to 25K tokens total budget, 5K per skill)
  • Reprocesses session start hooks
  • Notifies prompt cache break detection

Micro-Compact

Location: src/services/compact/microCompact.ts

Micro-compact targets individual tool results for compression, keeping the overall conversation structure intact while reducing token bloat from large outputs.

Compactable Tools

Only specific tool results are eligible for micro-compaction:

const COMPACTABLE_TOOLS = new Set([
  'Read',          // File contents
  'Bash', 'Shell', // Command output
  'Grep',          // Search results
  'Glob',          // File listings
  'WebSearch',     // Search results
  'WebFetch',      // Fetched content
  'Edit',          // Edit results
  'Write',         // Write results
])
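An eligibility filter over this set might look like the following; the size cutoff is an illustrative assumption, since compressing tiny results would not be worth it.

```typescript
const COMPACTABLE_TOOLS = new Set([
  'Read', 'Bash', 'Shell', 'Grep', 'Glob',
  'WebSearch', 'WebFetch', 'Edit', 'Write',
])

interface ToolResult {
  tool: string
  content: string
}

// Keep only results from compactable tools whose output is large enough
// to be worth compressing (threshold is a made-up example value).
function eligibleForMicroCompact(
  results: ToolResult[],
  minChars = 2_000,
): ToolResult[] {
  return results.filter(
    r => COMPACTABLE_TOOLS.has(r.tool) && r.content.length >= minChars,
  )
}
```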

Time-Based Micro-Compact

A time-based variant clears old tool result content after a configurable period, replacing it with [Old tool result content cleared]. This is configured via getTimeBasedMCConfig().
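A sketch of the time-based variant: results older than the configured age get their content replaced. The placeholder string comes from the docs; the record shape and function names are assumptions.

```typescript
const CLEARED_PLACEHOLDER = '[Old tool result content cleared]'

interface TimedToolResult {
  tool: string
  content: string
  timestampMs: number
}

// Replace the content of tool results older than maxAgeMs, leaving
// fresh results untouched. nowMs is injectable for testing.
function clearOldToolResults(
  results: TimedToolResult[],
  maxAgeMs: number,
  nowMs: number = Date.now(),
): TimedToolResult[] {
  return results.map(r =>
    nowMs - r.timestampMs > maxAgeMs
      ? { ...r, content: CLEARED_PLACEHOLDER }
      : r,
  )
}
```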

Cached Micro-Compact

An advanced variant (feature-gated) maintains a cache of compressed tool results. It tracks:

  • Pending cache edits: New compressions to include in the next API request
  • Pinned cache edits: Previously-compressed results that must be re-sent at their original positions for cache hits

export function consumePendingCacheEdits(): CacheEditsBlock | null
export function getPinnedCacheEdits(): PinnedCacheEdits[]
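A simplified model of the pending/pinned split: pending edits are consumed once for the next request, then become pinned so later requests replay them at the same positions and keep the prompt cache warm. The edit shape and return types here are simplified assumptions, not the real CacheEditsBlock and PinnedCacheEdits types.

```typescript
interface CacheEdit {
  toolUseId: string
  compressed: string
}

let pendingEdits: CacheEdit[] = []
const pinnedEdits: CacheEdit[] = []

// Hypothetical helper: queue a new compression for the next request.
function addPendingCacheEdit(edit: CacheEdit): void {
  pendingEdits.push(edit)
}

// Drain the pending queue; once sent, compressions are pinned so future
// requests re-send them at their original positions for cache hits.
function consumePendingCacheEdits(): CacheEdit[] | null {
  if (pendingEdits.length === 0) return null
  const consumed = pendingEdits
  pendingEdits = []
  pinnedEdits.push(...consumed)
  return consumed
}

function getPinnedCacheEdits(): CacheEdit[] {
  return pinnedEdits
}
```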

Message Grouping

The groupMessagesByApiRound() function in grouping.ts organizes messages into logical groups for compaction:

  • Each group represents one API round-trip (user message + assistant response + tool results)
  • Groups are compacted as units to maintain coherence
  • Partial compaction can target specific groups (older or newer) based on the PartialCompactDirection
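A toy version of the grouping: start a new group at each user message so that one group holds one round-trip. The real groupMessagesByApiRound() in grouping.ts handles more message kinds; the shapes here are assumptions.

```typescript
interface Msg {
  role: 'user' | 'assistant' | 'tool'
  text: string
}

// Illustrative grouping: each user message opens a new API round, and the
// assistant response plus tool results that follow belong to that round.
function groupMessagesByApiRound(messages: Msg[]): Msg[][] {
  const groups: Msg[][] = []
  for (const msg of messages) {
    if (msg.role === 'user' || groups.length === 0) groups.push([])
    groups[groups.length - 1].push(msg)
  }
  return groups
}
```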

Environment Controls

  • DISABLE_COMPACT: Disables all compaction (auto, manual, reactive)
  • DISABLE_AUTO_COMPACT: Disables auto-compact only (manual /compact still works)
  • CLAUDE_CODE_AUTO_COMPACT_WINDOW: Override the effective context window size
  • CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: Trigger auto-compact at a percentage of the context window
  • CLAUDE_CODE_BLOCKING_LIMIT_OVERRIDE: Override the blocking limit for testing

Disabling compaction entirely (DISABLE_COMPACT) means the CLI will eventually hit "prompt too long" errors in long conversations. Use this only for debugging or short sessions.