The number one failure mode for long-running agents is hitting the context limit mid-task. Your agent is 90% through a 3-hour refactoring session, and then it dies — context full, no graceful handoff, no memory of what it already did. Claude 4.6 introduces context compaction: server-side auto-summarisation that extends conversations effectively without limit. What used to require complex context management code now works with one beta header.
What is context compaction?
Context compaction is a server-side feature: when your conversation approaches the context limit, the Claude API automatically summarises older parts of the conversation. The summary replaces the raw old messages, preserving the "gist" of what happened while freeing capacity for new turns.
The result is effectively infinite conversations for agentic workflows — the model maintains task continuity without you needing to build manual summarisation, rolling window, or session handoff logic.
Currently in beta for Opus 4.6 and Sonnet 4.6. You opt in via a beta header.
How to enable it
```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=8000,
    betas=["context-compaction-2026-02-01"],
    thinking={"type": "adaptive"},
    effort="high",
    messages=conversation_history,
)
```
That's the entire change. One extra string in betas. The compaction happens automatically on the server side — you don't need to detect when it triggers or handle it differently in your response parsing.
You can also configure a compaction threshold to trigger compaction proactively rather than waiting for the limit:
```python
response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=8000,
    betas=["context-compaction-2026-02-01"],
    thinking={"type": "adaptive"},
    system=[
        {
            "type": "text",
            "text": "You are a senior software engineer working on a long refactoring task.",
        },
        {
            "type": "text",
            "text": "COMPACTION_THRESHOLD: 0.8",  # compact when 80% of context is used
        },
    ],
    messages=conversation_history,
)
```
What gets compacted vs preserved
Understanding this matters for designing reliable agents.
Gets compacted (summarised):
- Old conversation turns from earlier in the session
- Intermediate tool results that are no longer directly relevant
- Previous thinking blocks (these are already stripped from context by default)
- Superseded decisions and intermediate reasoning steps
Always preserved:
- System prompt — never compacted
- The current task goal and active instruction
- Recent tool results (the last few turns stay intact)
- The current reasoning chain Claude is actively building on
- Explicit facts you stored in the system prompt
The implication: your system prompt is your persistent memory. If there's something the agent must remember across the entire session — a coding style guide, architectural constraints, the user's name — put it in the system prompt, not the conversation. The conversation gets summarised; the system prompt doesn't.
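One lightweight way to apply this is to assemble the system prompt from a base instruction plus an explicit block of durable facts. A minimal sketch — `build_system_prompt` and the `PERSISTENT FACTS` label are illustrative conventions, not part of the API:

```python
def build_system_prompt(base_instructions: str, persistent_facts: dict) -> list:
    """Assemble system content blocks so durable facts survive compaction.

    Illustrative helper: the block shape mirrors the threshold example above.
    """
    fact_lines = "\n".join(f"- {key}: {value}" for key, value in persistent_facts.items())
    return [
        {"type": "text", "text": base_instructions},
        {"type": "text", "text": f"PERSISTENT FACTS (always honour these):\n{fact_lines}"},
    ]


system_blocks = build_system_prompt(
    "You are a senior software engineer working on a long refactoring task.",
    {"style guide": "PEP 8, max line length 100", "target branch": "release/2.4"},
)
```

Pass the result as the `system` parameter; because the system prompt is never compacted, those facts remain verbatim for the whole session.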
Practical patterns that benefit from compaction
Long code refactoring sessions
An agent reads files, refactors, runs tests, encounters errors, fixes them, iterates. This naturally generates a long conversation as tool results accumulate.
Without compaction: the agent dies around 200K tokens into a session. With compaction: it runs until the task is complete or you cancel it.
```python
def run_refactoring_agent(codebase_path: str):
    conversation = []
    while True:
        response = client.beta.messages.create(
            model="claude-opus-4-6",
            max_tokens=8000,
            betas=["context-compaction-2026-02-01"],
            thinking={"type": "adaptive"},
            system=REFACTORING_SYSTEM_PROMPT,  # constraints never compacted
            tools=CODE_TOOLS,
            messages=conversation,
        )
        # Handle tool calls, add results to conversation
        conversation = update_conversation(conversation, response)
        if response.stop_reason == "end_turn":
            break
```
Research synthesis agents
An agent searches across dozens of documents, reads each, extracts relevant passages, then synthesises findings. Old search results get summarised as newer documents are processed. The agent retains its accumulated understanding without holding every source document verbatim in context.
Multi-day project agents
An agent works in a session, you stop it, it resumes the next day. Without compaction, you'd need to manually reconstruct what the agent knew. With compaction and a summary of the previous session in the system prompt, continuity is maintained.
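A minimal sketch of that handoff, assuming you ask the agent for an end-of-session summary and stash it locally — `session_summary.json` and both helpers are hypothetical, not part of any SDK:

```python
import json
from pathlib import Path

SESSION_FILE = Path("session_summary.json")  # hypothetical local storage


def save_session_summary(summary: str) -> None:
    """Persist the agent's end-of-session summary for tomorrow's run."""
    SESSION_FILE.write_text(json.dumps({"summary": summary}))


def resume_system_prompt(base_prompt: str) -> str:
    """Prepend the previous session's summary, if one exists."""
    if SESSION_FILE.exists():
        summary = json.loads(SESSION_FILE.read_text())["summary"]
        return f"{base_prompt}\n\nPREVIOUS SESSION SUMMARY:\n{summary}"
    return base_prompt
```

On day two you build the system prompt with `resume_system_prompt(...)`, and the agent starts with yesterday's state in uncompactable context.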
What to watch out for
Compaction is lossy. The summary is accurate at the semantic level but loses specific details. Don't rely on compaction to remember exact numbers, specific line numbers in files, or precise decisions made early in a session. If a number matters, store it externally.
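Storing such values externally can be trivially simple — the `FactStore` class below is an illustration; a production agent might back it with SQLite or Redis:

```python
class FactStore:
    """Tiny external store for exact values that must survive compaction.

    Illustrative sketch: an in-memory dict standing in for durable storage.
    """

    def __init__(self):
        self._facts = {}

    def remember(self, key: str, value):
        """Record an exact value outside the compactable conversation."""
        self._facts[key] = value

    def recall(self, key: str):
        """Retrieve the exact value, untouched by any summarisation."""
        return self._facts[key]


store = FactStore()
store.remember("failing_test_line", 1423)  # a line number compaction might blur
```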
Test near the compaction threshold. Before shipping an agent with compaction enabled, deliberately run it into the threshold in a test environment. Check whether the post-compaction behaviour is correct. Some agents change behaviour subtly after compaction because context they were implicitly relying on got summarised away.
Don't compact your context on every call. The 0.8 threshold is a reasonable default. Going lower (0.5) means more frequent compaction, more overhead, more information loss. Going to 1.0 (wait until full) means potentially hitting the limit before compaction triggers.
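The threshold semantics can be pictured with a client-side analogue — the server applies this check internally; this helper only illustrates the arithmetic:

```python
def should_compact(used_tokens: int, context_limit: int, threshold: float = 0.8) -> bool:
    """True once context usage crosses the compaction threshold.

    Illustrative only: mirrors what the server does with COMPACTION_THRESHOLD.
    """
    return used_tokens / context_limit >= threshold
```

At the default 0.8, a 200K-token window triggers compaction around 160K tokens used; lowering the threshold to 0.5 would trigger it at 100K, twice as often.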
Manual alternative (for finer control)
If you need precise control over what's preserved, rolling window management is still available:
```python
def clear_old_tool_results(messages: list, keep_last_n: int = 5) -> list:
    """
    Keep recent messages intact; drop old tool results
    to prevent context overflow.
    """
    if not messages:
        return messages

    # Find messages that carry tool results
    tool_result_indices = [
        i for i, m in enumerate(messages)
        if isinstance(m.get("content"), list)
        and any(c.get("type") == "tool_result" for c in m["content"])
    ]

    # Keep only the most recent tool results
    if len(tool_result_indices) > keep_last_n:
        indices_to_remove = set(tool_result_indices[:-keep_last_n])
        messages = [m for i, m in enumerate(messages) if i not in indices_to_remove]

    return messages
```
Use manual management when: you have highly structured tool results where you know exactly what to preserve, you're building a production agent where budget predictability matters more than convenience, or you need to store specific data points in an external database as part of your memory architecture.
Use automatic compaction for most agent workflows — it's good enough and saves substantial engineering time.
Cost implications
Compaction adds a small overhead: the act of summarising itself uses tokens. The server generates the summary, which appears in your token usage. For a typical long-running agent, this overhead is 1–5% of total token usage.
The alternative cost without compaction is much higher: if you hit the context limit and restart the task from scratch, you pay for all the exploration work again plus the first attempt's failed tokens.
For Indian developers billing in ₹ via AICredits.in: a typical 2-hour code refactoring session might burn 800K tokens without compaction (all that exploration stays in context). With compaction keeping the window under 200K, you're paying ~₹132 instead of ~₹528 for the same work. The 4x cost reduction comes from not carrying old tool results indefinitely.
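The arithmetic behind those figures, assuming the flat blended rate of ₹0.66 per 1K tokens implied by the numbers above (not a published price):

```python
RATE_INR_PER_1K = 0.66  # blended rate implied by the ₹528 / 800K example; not an official price


def session_cost_inr(total_tokens: int) -> float:
    """Rough session cost in rupees at the assumed blended rate."""
    return total_tokens / 1000 * RATE_INR_PER_1K


without_compaction = session_cost_inr(800_000)  # ≈ ₹528
with_compaction = session_cost_inr(200_000)     # ≈ ₹132
```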
💡 Access Claude 4.6 API in India with UPI billing at AICredits.in — ₹100 minimum top-up, no international card required.
Next steps
- Understand how multi-agent systems use context differently — multi-agent systems
- Full breakdown of Claude 4.6 API changes — Claude Opus 4.6 prompting guide
- Context engineering fundamentals — context engineering
- Access Claude 4.6 in India — AICredits.in review