Released February 5, 2026, Claude Opus 4.6 is the largest capability jump in a single Claude generation. Three things changed how you should prompt it: adaptive thinking, the effort parameter, and 1M context going GA. If you're still using the old budget_tokens approach, you're overpaying — and the parameter is deprecated.
## What changed from Claude 4.5 to 4.6 (for prompters)
The surface API looks similar. The behaviour underneath is substantially different.
| Feature | Claude 4.5 | Claude 4.6 |
|---|---|---|
| Thinking mode | budget_tokens required | Adaptive (auto) |
| Effort control | Not available | low / medium / high / max |
| Context window | 200K | 1M (Opus & Sonnet) |
| Max output | 64K | 128K (Opus), 64K (Sonnet) |
| Prefill | Supported | Returns 400 error |
| Legacy 3.5 models | Available | Retired; replaced by Sonnet 4.6 / Opus 4.6 |
The retired models matter: claude-3-5-sonnet-20241022 and claude-haiku-3-5-20241022 are gone as of ~April 2026. If your code still references those strings, you'll get errors.
## Adaptive thinking — how it actually works
The old extended thinking API required you to specify how many tokens Claude was "allowed" to think with. This was a blunt instrument — you either over-allocated (wasting money) or under-allocated (degrading quality on hard problems).
### Old way (deprecated, avoid)

```python
# Don't do this — budget_tokens is deprecated in Claude 4.6
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # ❌ deprecated
    },
    messages=[{"role": "user", "content": "Solve this..."}]
)
```
### New way (adaptive)

```python
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},  # ✅ Claude decides internally
    effort="high",  # controls depth of reasoning
    messages=[{"role": "user", "content": "Solve this..."}]
)
```
With adaptive thinking, Claude decides internally when and how much to think. At effort="high" (the default for Opus 4.6), it almost always engages the thinking module. At effort="low", it skips thinking entirely for simple problems and responds directly. The model is better at calibrating this than you are with a manual token budget.
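Because the model chooses when to think, a response may or may not contain thinking content. A small helper can separate the two kinds of block. This is a sketch under an assumption: that adaptive-thinking responses return content blocks shaped like `{"type": "thinking", "thinking": ...}` alongside `{"type": "text", "text": ...}`, as extended thinking did.

```python
# Sketch: split a response's content blocks into thinking traces and
# visible text. The block shapes ({"type": "thinking", ...} and
# {"type": "text", ...}) are assumed, mirroring the extended-thinking API.
def split_blocks(content: list[dict]) -> tuple[list[str], list[str]]:
    """Return (thinking_texts, visible_texts) from a content-block list."""
    thinking = [b.get("thinking", "") for b in content if b.get("type") == "thinking"]
    text = [b.get("text", "") for b in content if b.get("type") == "text"]
    return thinking, text
```

At `effort="low"` on a simple request, the thinking list may simply come back empty, which is exactly the adaptive behaviour described above.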
## The effort parameter — your main cost-quality lever
The effort parameter is new in 4.6 and is the primary way to control the cost-quality tradeoff.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4000,
    thinking={"type": "adaptive"},
    effort="medium",  # low | medium | high | max
    messages=[{"role": "user", "content": prompt}]
)
```
| Effort | Use case | Relative cost |
|---|---|---|
| `low` | Classification, routing, extraction, yes/no decisions | Cheapest |
| `medium` | General coding, text generation, summarisation, standard Q&A | ~40–60% cheaper than `high` |
| `high` | Default for Opus 4.6. Complex reasoning, multi-step planning | Standard |
| `max` | Hardest problems only: architecture decisions, novel research. Opus 4.6 only | Most expensive |
Anthropic's own documentation recommends medium as the default for most Sonnet 4.6 use cases. That's meaningful — it means the engineering team is confident that most tasks don't need full-depth reasoning.
### Rule of thumb
Start all Sonnet 4.6 tasks at effort="medium". Drop to low for classification and routing. Reserve max exclusively for Opus 4.6 on problems you'd genuinely spend an hour thinking through yourself.
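The rule of thumb above can be sketched as a small routing helper. The task categories here are illustrative assumptions, not an official taxonomy:

```python
# Minimal effort-routing sketch following the rule of thumb above.
# Task categories are the author's assumption, not an official taxonomy.
CHEAP_TASKS = {"classification", "routing", "extraction"}
HARD_TASKS = {"architecture", "research"}

def pick_effort(task_type: str, model: str = "claude-sonnet-4-6") -> str:
    """Map a task category to an effort level per the rule of thumb."""
    if task_type in CHEAP_TASKS:
        return "low"
    if task_type in HARD_TASKS and model == "claude-opus-4-6":
        return "max"  # max is Opus 4.6 only
    return "high" if model == "claude-opus-4-6" else "medium"
```

Centralising this choice in one function makes it easy to tune effort levels later from observed cost and quality, rather than hard-coding them at every call site.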
## The 1M token context window — what you can actually do now
The 1M context window is now GA for both Opus 4.6 and Sonnet 4.6 — no beta header required. In practice, what fits?
- Entire codebase: ~50,000 lines of code
- Legal document set: ~200 average contracts
- Research corpus: ~1,500 academic papers (abstract + intro sections)
- Meeting transcripts: ~5 years of weekly 1-hour meetings
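Before shipping a corpus of this size, it's worth a rough capacity check. The sketch below uses the common ~4 characters-per-token heuristic, which is an approximation, not Claude's actual tokenizer:

```python
# Rough pre-flight check: will this document set fit in the window?
# The 4-chars-per-token ratio is a heuristic, not the real tokenizer.
def fits_in_context(docs: list[str], window: int = 1_000_000,
                    chars_per_token: float = 4.0) -> bool:
    """Estimate token count for a list of documents against a window size."""
    est_tokens = sum(len(d) for d in docs) / chars_per_token
    return est_tokens <= window
```

For a precise count, tokenize with the provider's counting endpoint before sending; the heuristic is only for a cheap first pass.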
The "lost in the middle" problem is real but improved. Opus 4.6 scores 76% on MRCR v2 (8 needles across 1M tokens), roughly 4x Sonnet 4.5's 18.5% on the same task. Still, put the most important content at the beginning or end of the context, and structure it with clear section headers so Claude can navigate when it can't hold every detail equally.
Cost at 1M tokens via AICredits.in for Indian developers: ~₹277 per 1M input tokens on Sonnet 4.6 ($3/MTok at ₹84/USD plus ~10% platform markup), and ~₹462 per 1M input tokens on Opus 4.6 ($5/MTok). No long-context surcharge — standard per-token pricing applies to every token in the window.
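The pricing arithmetic is simple enough to keep as a helper. The ₹84/USD rate and ~10% markup are this article's working assumptions, not live rates:

```python
# Worked example of the pricing arithmetic above. The FX rate (₹84/USD)
# and ~10% platform markup are the article's assumptions, not live rates.
def inr_per_mtok(usd_per_mtok: float, fx: float = 84.0,
                 markup: float = 0.10) -> float:
    """Convert a $/MTok list price to an approximate ₹/MTok figure."""
    return round(usd_per_mtok * fx * (1 + markup), 1)
```

At $3/MTok this gives ~₹277 (Sonnet 4.6 input) and at $5/MTok ~₹462 (Opus 4.6 input).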
## Interleaved thinking — reasoning between tool calls
When Claude uses tools in an agent loop, 4.5 would plan upfront, then execute. 4.6 with adaptive thinking reasons at each step — think before calling a tool, receive the result, think again before the next call.
This is automatically enabled when you use thinking={"type": "adaptive"} with tool use. You don't need any code changes if you're already using adaptive thinking. The practical effect: much more reliable multi-step agent workflows. Claude doesn't commit to a plan it can't revise; it updates its reasoning as evidence arrives.
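The loop shape looks like a standard Messages tool-use loop; interleaving happens inside the model, not in your code. A minimal sketch, assuming the usual `stop_reason == "tool_use"` protocol — `client`, `tools`, and `run_tool` are placeholders you'd supply:

```python
# Sketch of an agent loop with adaptive thinking. `client` is any object
# exposing messages.create(...); `tools` and `run_tool` are placeholders.
# Interleaved reasoning happens model-side; the loop itself is unchanged.
def agent_loop(client, messages, tools, run_tool, max_steps=10):
    resp = None
    for _ in range(max_steps):
        resp = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=4000,
            thinking={"type": "adaptive"},  # interleaving comes for free
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            return resp  # model produced a final answer
        # Feed tool results back; the model re-thinks before the next call.
        messages.append({"role": "assistant", "content": resp.content})
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}
            for b in resp.content if getattr(b, "type", None) == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return resp  # step budget exhausted; return the last response
```

The `max_steps` cap is the one piece of control logic worth keeping: it bounds cost if the model keeps requesting tools.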
Interleaved thinking is one of the main reasons Opus 4.6 hits 80.8% on SWE-bench Verified — code debugging requires updating your mental model as you read output, not following a predetermined script.
## What to stop doing on 4.6
Stop using `budget_tokens`. The parameter is deprecated. Claude ignores it. Switch to `thinking={"type": "adaptive"}` and control depth via `effort`.
Stop prefilling assistant messages. This was a common trick to guide Claude's output format — pre-populating the start of Claude's response in the messages array. On 4.6, this returns a 400 error. The fix: move the formatting instruction into your user message or system prompt instead.
Stop using old model strings as the "cheap" option. The previous go-to for cost-sensitive tasks was `claude-haiku-3-5-20241022`. That model is retired. Use `claude-sonnet-4-6` with `effort="low"` or `effort="medium"` instead — it's both cheaper and more capable for most tasks.
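One way to make this migration mechanical is a lookup table at the point where your code resolves model names; a minimal sketch (spellings follow this article):

```python
# Migration map from retired model strings (spellings as in this article).
# The haiku replacement pairs best with effort="low" for cost parity.
RETIRED = {
    "claude-3-5-sonnet-20241022": "claude-sonnet-4-6",
    "claude-haiku-3-5-20241022": "claude-sonnet-4-6",
}

def migrate_model(model: str) -> str:
    """Return a supported model string, passing current ones through."""
    return RETIRED.get(model, model)
```

Routing every `create()` call through a helper like this means the next retirement is a one-line change instead of a grep across the codebase.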
## Quick migration from 4.5 to 4.6 API calls
### Pattern 1: Old extended thinking → new adaptive

```python
# Before (4.5)
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=messages
)

# After (4.6)
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    effort="high",
    messages=messages
)
```
### Pattern 2: Old model string → new model string

```python
# Before
model = "claude-3-5-sonnet-20241022"  # retired
model = "claude-haiku-3-5-20241022"   # retired

# After
model = "claude-sonnet-4-6"  # general purpose, cost-effective
model = "claude-opus-4-6"    # highest capability
```
### Pattern 3: Old prefill trick → instruction in user message

```python
# Before — returned structured output by prefilling "{"
messages = [
    {"role": "user", "content": "Extract entities from: " + text},
    {"role": "assistant", "content": "{"}  # ❌ returns 400 on 4.6
]

# After — instruction in user message
messages = [
    {
        "role": "user",
        "content": f"Extract entities from the text below. Respond with valid JSON only, starting with {{.\n\n{text}"
    }
]
```
The migration is mostly mechanical. The biggest functional change is the prefill breakage — if you have any code that builds the assistant turn before Claude responds, that needs fixing before you switch model versions.
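A defensive guard can catch the prefill pattern at runtime before the API rejects it. A sketch (the instruction wording is illustrative, not an official recipe):

```python
# Guard sketch: detect a trailing assistant "prefill" turn and fold its
# content into the preceding user message as an instruction instead.
# The instruction phrasing is illustrative, not an official recipe.
def strip_prefill(messages: list[dict]) -> list[dict]:
    """Rewrite a message list that ends with an assistant prefill turn."""
    if not messages or messages[-1].get("role") != "assistant":
        return messages  # nothing to fix
    *rest, prefill = messages
    fixed = [dict(m) for m in rest]
    fixed[-1]["content"] = (
        f'{fixed[-1]["content"]}\n\nStart your response with: {prefill["content"]}'
    )
    return fixed
```

Running every outgoing message list through a guard like this lets legacy call sites keep working while you migrate them properly.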
## SWE-bench and ARC-AGI-2 — what the benchmarks mean for you
Numbers without context are marketing. Here's what they actually mean:
SWE-bench Verified: real GitHub issues from real codebases. Sonnet 4.6 at 79.6% and Opus 4.6 at 80.8% means roughly 4 in 5 real-world coding tasks are solved correctly from a cold start. That's the bar where autonomous coding agents become useful rather than aspirational.
ARC-AGI-2: tests novel reasoning — tasks the model genuinely cannot have memorised. Sonnet 4.6 jumped from 13.6% to 58.3%, a 4.3x improvement in a single generation. This is the benchmark where throwing more training data doesn't work; you have to actually reason differently.
For your prompts: these numbers mean Claude 4.6 can handle significantly more open-ended problem specifications. You don't need to hand-hold the reasoning as much. Less scaffolding in the prompt, more trust in the model to figure out the path.
## Next steps
- Understand why long context isn't just "bigger" — context engineering
- Build something on top of these capabilities — AI agents track
- Cut your API bill with the effort parameter — effort parameter cost guide
- Migration from older Claude versions — Claude 4 prompting guide
💡 Want to use Claude 4.6 in India? AICredits.in gives you access to Sonnet 4.6 and Opus 4.6 with UPI payment in ₹ — no international card needed.



