Compact Service
The compact service (src/services/compact/) manages context window pressure by compressing conversation history when it grows too large. It provides multiple compaction strategies, from full conversation summarization to targeted tool result compression.
Why Compaction Is Needed
Claude models have finite context windows. As conversations grow with tool calls, file reads, and multi-step reasoning, the token count approaches the limit. Without compaction:
- The API rejects requests with "prompt is too long" errors
- Prompt cache efficiency degrades as the context shifts
- Response quality can suffer from irrelevant historical context
The compact service keeps conversations within bounds while preserving the most important context.
Compaction Strategies
The service layers several strategies, covered in the sections below:
- Auto-compact: proactive compaction triggered when token usage crosses a threshold; runs automatically between query turns
- Full compaction: summarizes the entire conversation via a forked agent
- Micro-compact: compresses individual tool results in place
Auto-Compact
Location: src/services/compact/autoCompact.ts
Auto-compact monitors token usage and triggers compaction before the context window is exhausted.
Thresholds
```ts
const AUTOCOMPACT_BUFFER_TOKENS = 13_000
const WARNING_THRESHOLD_BUFFER_TOKENS = 20_000
const MANUAL_COMPACT_BUFFER_TOKENS = 3_000
```
The auto-compact threshold is calculated as:
```
effectiveContextWindow = contextWindow - reservedForSummaryOutput
autoCompactThreshold = effectiveContextWindow - 13,000
```
These buffers map to escalating states:
- Token usage is within 20K of the limit: UI shows a yellow warning.
- Token usage is within 20K of the limit (same buffer): UI shows a red warning.
- Token usage is within 13K of the effective window: triggers auto-compaction.
- Token usage is within 3K of the effective window: blocks further queries until compacted.
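The buffer checks above can be sketched as a single classifier. This is a minimal illustration: only the three constants come from the source; the function name, return shape, and exact comparison directions are assumptions.

```ts
const AUTOCOMPACT_BUFFER_TOKENS = 13_000
const WARNING_THRESHOLD_BUFFER_TOKENS = 20_000
const MANUAL_COMPACT_BUFFER_TOKENS = 3_000

type ContextPressure = 'ok' | 'warning' | 'autoCompact' | 'blocked'

// Classify current token usage against the buffers described above
// (hypothetical helper; not the real implementation).
function classifyUsage(
  usedTokens: number,
  contextWindow: number,
  reservedForSummaryOutput: number,
): ContextPressure {
  const effectiveWindow = contextWindow - reservedForSummaryOutput
  if (usedTokens >= effectiveWindow - MANUAL_COMPACT_BUFFER_TOKENS) return 'blocked'
  if (usedTokens >= effectiveWindow - AUTOCOMPACT_BUFFER_TOKENS) return 'autoCompact'
  if (usedTokens >= contextWindow - WARNING_THRESHOLD_BUFFER_TOKENS) return 'warning'
  return 'ok'
}
```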
Circuit Breaker
Auto-compact includes a circuit breaker to prevent wasted API calls when compaction repeatedly fails:
```ts
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3
```
After 3 consecutive failures, auto-compact stops retrying for the remainder of the session. This prevents sessions with irrecoverably large contexts from burning API calls on doomed compaction attempts.
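A minimal sketch of this circuit breaker, assuming a per-session counter that resets on success (the class and method names are illustrative; only the constant comes from the source):

```ts
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3

// Sketch of a per-session circuit breaker around auto-compact attempts.
// Once tripped, it stays tripped for the rest of the session.
class AutoCompactBreaker {
  private consecutiveFailures = 0

  get tripped(): boolean {
    return this.consecutiveFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES
  }

  // Returns true if compaction ran and succeeded.
  run(compact: () => void): boolean {
    if (this.tripped) return false // no more attempts this session
    try {
      compact()
      this.consecutiveFailures = 0 // any success resets the counter
      return true
    } catch {
      this.consecutiveFailures += 1
      return false
    }
  }
}
```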
Compaction Order
When auto-compact triggers, it tries strategies in order:
- Session memory compaction: Replace old messages with session memory summary (fast, no API call)
- Full compaction: Run a forked agent to summarize the conversation (slower, requires API call)
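The ordered fallback above can be sketched as follows. The strategy names mirror the list; the function signature and boolean-returning helpers are assumptions, not the real API:

```ts
type CompactResult = { strategy: 'session-memory' | 'full' | 'none'; ok: boolean }

// Try the cheap session-memory replacement first, then fall back to the
// slower forked-agent summarization (illustrative signatures).
function runAutoCompact(
  trySessionMemoryCompact: () => boolean, // fast, no API call
  tryFullCompact: () => boolean, // forked agent, requires an API call
): CompactResult {
  if (trySessionMemoryCompact()) return { strategy: 'session-memory', ok: true }
  if (tryFullCompact()) return { strategy: 'full', ok: true }
  return { strategy: 'none', ok: false }
}
```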
Full Compaction
Location: src/services/compact/compact.ts
The full compaction flow in compactConversation():
- Group messages by API round using groupMessagesByApiRound()
- Strip images from messages (not needed for summarization, avoids hitting size limits)
- Insert a compact boundary message marking the compaction point
- Run a forked agent with the conversation history and a compact prompt
- Stream the summary back, replacing pre-boundary messages with the compressed result
- Run post-compact cleanup (re-inject CLAUDE.md, restore recent file state, re-attach skills)
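The image-stripping step can be sketched as a simple block transform. The ContentBlock shape and the placeholder text are assumptions; the real implementation may differ:

```ts
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image'; source: unknown }

// Sketch of the image-stripping step: summarization doesn't need image
// bytes, and dropping them keeps the request under size limits.
function stripImages(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map(block =>
    block.type === 'image'
      ? { type: 'text', text: '[image removed before compaction]' } // placeholder is illustrative
      : block,
  )
}
```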
Compact Boundary Messages
A SystemCompactBoundaryMessage marks where compaction occurred in the message array:
```ts
type SystemCompactBoundaryMessage = {
  type: 'system'
  subtype: 'compact_boundary'
  summary: string // The compacted summary
  direction?: 'older' | 'newer' // Which side was compacted
}
```
Messages before the boundary are replaced with the summary. Messages after it are preserved verbatim.
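The replacement step can be sketched as follows, assuming a boundary index into the message array (the function name and simplified message shape are hypothetical):

```ts
type ChatMessage = { role: 'user' | 'assistant'; content: string }
type SystemCompactBoundaryMessage = {
  type: 'system'
  subtype: 'compact_boundary'
  summary: string
}

// Sketch: messages before boundaryIndex collapse into the boundary's
// summary; later messages are kept verbatim.
function applyCompactBoundary(
  messages: ChatMessage[],
  boundaryIndex: number,
  summary: string,
): Array<ChatMessage | SystemCompactBoundaryMessage> {
  const boundary: SystemCompactBoundaryMessage = {
    type: 'system',
    subtype: 'compact_boundary',
    summary,
  }
  return [boundary, ...messages.slice(boundaryIndex)]
}
```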
Post-Compact Cleanup
After compaction, runPostCompactCleanup() restores essential context:
- Re-injects CLAUDE.md and memory files as attachments
- Restores file state for recently-edited files (up to 5 files, 5K tokens each)
- Re-attaches invoked skills (up to 25K tokens total budget, 5K per skill)
- Reprocesses session start hooks
- Notifies prompt cache break detection
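The skill re-attachment budget above (25K total, 5K per skill) can be sketched as a greedy selection. The function name, Skill shape, and tie-breaking behavior are assumptions; only the budget numbers come from the source:

```ts
const SKILL_TOTAL_BUDGET_TOKENS = 25_000
const SKILL_PER_ITEM_BUDGET_TOKENS = 5_000

type Skill = { name: string; tokens: number }

// Sketch: each skill's cost is capped at 5K tokens, and re-attachment
// stops once the 25K total budget would be exceeded.
function selectSkillsToReattach(invoked: Skill[]): Skill[] {
  const selected: Skill[] = []
  let spent = 0
  for (const skill of invoked) {
    const cost = Math.min(skill.tokens, SKILL_PER_ITEM_BUDGET_TOKENS)
    if (spent + cost > SKILL_TOTAL_BUDGET_TOKENS) break
    selected.push(skill)
    spent += cost
  }
  return selected
}
```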
Micro-Compact
Location: src/services/compact/microCompact.ts
Micro-compact targets individual tool results for compression, keeping the overall conversation structure intact while reducing token bloat from large outputs.
Compactable Tools
Only specific tool results are eligible for micro-compaction:
```ts
const COMPACTABLE_TOOLS = new Set([
  'Read', // File contents
  'Bash', 'Shell', // Command output
  'Grep', // Search results
  'Glob', // File listings
  'WebSearch', // Search results
  'WebFetch', // Fetched content
  'Edit', // Edit results
  'Write', // Write results
])
```
Time-Based Micro-Compact
A time-based variant clears old tool result content after a configurable period, replacing it with [Old tool result content cleared]. This is configured via getTimeBasedMCConfig().
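A minimal sketch of the time-based variant, assuming tool results carry a completion timestamp (the ToolResult shape, function name, and age comparison are assumptions; the placeholder string comes from the source):

```ts
type ToolResult = { toolName: string; content: string; completedAt: number }

const CLEARED_PLACEHOLDER = '[Old tool result content cleared]'

// Sketch of time-based micro-compact: tool results older than maxAgeMs have
// their content replaced with the placeholder; message structure is kept.
function clearOldToolResults(
  results: ToolResult[],
  now: number,
  maxAgeMs: number,
): ToolResult[] {
  return results.map(r =>
    now - r.completedAt > maxAgeMs ? { ...r, content: CLEARED_PLACEHOLDER } : r,
  )
}
```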
Cached Micro-Compact
An advanced variant (feature-gated) maintains a cache of compressed tool results. It tracks:
- Pending cache edits: New compressions to include in the next API request
- Pinned cache edits: Previously-compressed results that must be re-sent at their original positions for cache hits
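This pending/pinned bookkeeping can be sketched as follows. The real exports return richer CacheEditsBlock / PinnedCacheEdits structures; this sketch flattens them to a hypothetical CacheEdit record, and the helper names besides consumePendingCacheEdits and getPinnedCacheEdits are invented for illustration:

```ts
// Hypothetical flattened shape for a single compressed tool result.
type CacheEdit = { messageIndex: number; compressed: string }

let pendingEdits: CacheEdit[] = []
const pinnedEdits: CacheEdit[] = []

function addPendingCacheEdit(edit: CacheEdit): void {
  pendingEdits.push(edit)
}

// Draining the pending queue pins each edit so later requests re-send the
// compressed result at its original position, preserving cache hits.
function consumePendingCacheEdits(): CacheEdit[] | null {
  if (pendingEdits.length === 0) return null
  const consumed = pendingEdits
  pendingEdits = []
  pinnedEdits.push(...consumed)
  return consumed
}

function getPinnedCacheEdits(): CacheEdit[] {
  return pinnedEdits
}
```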
```ts
export function consumePendingCacheEdits(): CacheEditsBlock | null
export function getPinnedCacheEdits(): PinnedCacheEdits[]
```
Message Grouping
The groupMessagesByApiRound() function in grouping.ts organizes messages into logical groups for compaction:
- Each group represents one API round-trip (user message + assistant response + tool results)
- Groups are compacted as units to maintain coherence
- Partial compaction can target specific groups (older or newer) based on the PartialCompactDirection
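A simplified sketch of the grouping rule (the real groupMessagesByApiRound() handles richer message shapes; here tool results are modeled as a separate role for brevity, which is a simplification of the actual API format):

```ts
type RoundMessage = { role: 'user' | 'assistant' | 'tool'; content: string }

// Sketch of grouping: a new round starts at each user prompt and absorbs
// the assistant response plus any tool results that follow it.
function groupByApiRound(messages: RoundMessage[]): RoundMessage[][] {
  const groups: RoundMessage[][] = []
  for (const message of messages) {
    if (message.role === 'user' || groups.length === 0) groups.push([])
    groups[groups.length - 1].push(message)
  }
  return groups
}
```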
Environment Controls
| Variable | Effect |
|---|---|
| DISABLE_COMPACT | Disables all compaction (auto, manual, reactive) |
| DISABLE_AUTO_COMPACT | Disables auto-compact only (manual /compact still works) |
| CLAUDE_CODE_AUTO_COMPACT_WINDOW | Override the effective context window size |
| CLAUDE_AUTOCOMPACT_PCT_OVERRIDE | Trigger auto-compact at a percentage of the context window |
| CLAUDE_CODE_BLOCKING_LIMIT_OVERRIDE | Override the blocking limit for testing |
Disabling compaction entirely (DISABLE_COMPACT) means the CLI will eventually hit "prompt too long" errors in long conversations. Use this only for debugging or short sessions.
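A sketch of how two of these overrides might be resolved. The variable names come from the table above, but the parsing and truthiness rules here are assumptions, not the CLI's actual behavior:

```ts
type Env = Record<string, string | undefined>

// Hypothetical: resolve the effective context window, honoring the
// CLAUDE_CODE_AUTO_COMPACT_WINDOW override when it parses as a positive number.
function resolveEffectiveWindow(defaultWindow: number, env: Env): number {
  const raw = env.CLAUDE_CODE_AUTO_COMPACT_WINDOW
  if (raw !== undefined) {
    const parsed = Number(raw)
    if (Number.isFinite(parsed) && parsed > 0) return parsed
  }
  return defaultWindow
}

// Hypothetical truthiness check for DISABLE_COMPACT.
function isCompactionDisabled(env: Env): boolean {
  return env.DISABLE_COMPACT === '1' || env.DISABLE_COMPACT === 'true'
}
```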