Claude 3.5 Sonnet and Claude Haiku 3.5 are retired as of April 2026. If you're still running claude-3-5-sonnet-20241022 or claude-haiku-3-5-20241022, you'll get model-not-found errors — or quietly get routed to a deprecated fallback depending on your client version. Time to migrate. Here's everything that breaks, changes, and improves when moving to Claude 4.6.
Model ID changes
The first thing to audit is every place you've hardcoded a model ID. Here's the direct mapping:
| Old model ID | New model ID | Cost change | Notes |
|---|---|---|---|
| claude-3-5-sonnet-20241022 | claude-sonnet-4-6 | $3→$3/MTok input | Same price, significantly higher capability |
| claude-haiku-3-5-20241022 | claude-sonnet-4-6 + effort=low | $0.80→$0.30/MTok cache | effort=low gives Haiku-equivalent speed and cost |
| claude-opus-4-5 | claude-opus-4-6 | $5→$5/MTok input | Similar price, 1M context, 128K output |
| claude-3-5-haiku-* | claude-sonnet-4-6 + effort=low | Varies | No direct Haiku 4.6 equivalent |
The effort=low substitution for Haiku is the non-obvious one. Haiku was positioned as the fast/cheap tier. Claude 4.6 consolidates that into a single model with an effort dial. At effort=low, Sonnet 4.6 runs faster and cheaper than Haiku 3.5 for most tasks while returning meaningfully better outputs.
Breaking change #1 — Prefill returns 400
Prefill was the pattern of setting the assistant's opening message to force a specific format or skip Claude's preamble. A lot of people used it to guarantee JSON-only responses:
```python
# This pattern now returns HTTP 400
messages = [
    {"role": "user", "content": "Extract the company name and revenue."},
    {"role": "assistant", "content": '{"company":'},  # prefill — BROKEN in 4.6
]
```
Claude 4.6 returns a 400 error when you include a partial assistant message. This is intentional — Anthropic wants you using structured outputs instead of prompt hacks.
Fix option 1: Move the instruction to the system prompt
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    system="Respond with valid JSON only. No preamble, no explanation, no markdown code fences.",
    messages=[{"role": "user", "content": "Extract company name and revenue from: ..."}]
)
```
Fix option 2: Use structured outputs (the proper 4.6 way)
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    output_config={
        "format": {
            "type": "json_schema",
            "json_schema": {
                "name": "company_info",
                "schema": {
                    "type": "object",
                    "properties": {
                        "company": {"type": "string"},
                        "revenue": {"type": "number"}
                    },
                    "required": ["company", "revenue"]
                }
            }
        }
    },
    messages=[{"role": "user", "content": "Extract company name and revenue from: ..."}]
)
```
Option 2 guarantees schema-valid JSON. Option 1 works for simpler cases but doesn't give you the guarantee.
Breaking change #2 — budget_tokens is deprecated
If you were using Claude's extended thinking with an explicit token budget:
```python
# Old pattern — still works but deprecated
response = client.messages.create(
    model="claude-sonnet-4-6",
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # deprecated
    },
    ...
)
```
This still works in 4.6 but is deprecated and will be removed. The replacement is adaptive thinking combined with the effort parameter:
```python
# New pattern
response = client.messages.create(
    model="claude-sonnet-4-6",
    thinking={"type": "adaptive"},
    effort="medium",  # low / medium / high / max
    ...
)
```
What changes: instead of a fixed token budget that Claude tries to hit, adaptive thinking lets Claude decide how much reasoning a problem warrants. The effort parameter sets a ceiling on how hard it tries. In practice, adaptive is more reliable than budget_tokens for tasks with unpredictable complexity — you're not over-thinking simple requests or under-thinking hard ones.
The mapping isn't exact, but as a starting point: budget_tokens=1000 → effort=low, budget_tokens=5000 → effort=medium, budget_tokens=16000+ → effort=high.
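If budget_tokens values are scattered through your code, that rule of thumb can be captured in a tiny helper while you migrate. The function name and exact thresholds below are my own choices; tune them for your workload:

```python
def effort_for_budget(budget_tokens: int) -> str:
    """Map a deprecated budget_tokens value to an effort level,
    using the rough equivalences from the migration notes:
    ~1000 -> low, ~5000 -> medium, 16000+ -> high."""
    if budget_tokens >= 16000:
        return "high"
    if budget_tokens >= 5000:
        return "medium"
    return "low"
```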
Breaking change #3 — output_config.format (renamed)
If you built structured output workflows in Claude 4.5, the parameter was output_format. In 4.6, it moved:
```python
# Claude 4.5 — output_format
response = client.messages.create(
    model="claude-sonnet-4-6",
    output_format={  # WRONG in 4.6
        "type": "json_schema",
        "json_schema": {...}
    },
    ...
)
```
```python
# Claude 4.6 — output_config.format
response = client.messages.create(
    model="claude-sonnet-4-6",
    output_config={  # CORRECT in 4.6
        "format": {
            "type": "json_schema",
            "json_schema": {...}
        }
    },
    ...
)
```
This is a silent failure in some SDK versions — the parameter just gets ignored and Claude returns unstructured text. If your structured output pipeline stopped working after a model update, check this rename first.
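Because the failure mode is silent, it is worth guarding the parse site so a misnamed parameter surfaces immediately instead of corrupting downstream data. A minimal sketch (the helper is hypothetical, not part of the SDK):

```python
import json

def ensure_structured(text: str, required: list[str]) -> dict:
    """Raise if the model returned prose instead of schema-shaped JSON,
    which is the symptom of a silently ignored output_format parameter."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        raise RuntimeError(
            "Response is not JSON; check that you are using "
            "output_config.format, not the old output_format"
        ) from e
    missing = [k for k in required if k not in data]
    if missing:
        raise RuntimeError(f"Missing required keys: {missing}")
    return data
```

Wrap `response.content[0].text` with this in any pipeline that assumes structured output.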
What changed (not breaking, but worth knowing)
| Feature | Old behaviour | New behaviour |
|---|---|---|
| Context window | 200K max | 1M GA on Opus — no beta header required |
| Max output (Opus 4.6) | 64K | 128K |
| Interleaved thinking | Required beta header | Automatic with adaptive thinking |
| Effort control | Not available | low / medium / high / max |
| Context compaction | Not available | Beta: `betas=["context-compaction-2025-04"]` |
| Prompt caching | Explicit `cache_control` required | `cache_control: {"type": "auto"}` available |
The 1M context window going GA is the biggest operational change. If you were passing beta headers to access extended context on Opus, remove them — they're no longer needed and may cause unexpected behaviour in some SDK versions.
The 128K output on Opus 4.6 matters for anyone generating long documents, code files, or structured data dumps. Sonnet 4.6 stays at 64K.
The complete migration checklist
Work through this in order — the first four items will fix any breakage, the rest are optimisations.
- Update all hardcoded model IDs (see table above — ctrl+F for `claude-3-5`, `claude-haiku`, `claude-opus-4-5`)
- Remove prefill messages from the assistant turn, or convert them to system prompt instructions
- Replace `budget_tokens` with `thinking={"type": "adaptive"}` + the effort parameter
- Replace `output_format` with `output_config.format` for structured output calls
- Remove beta headers for 1M context on Opus (it's GA — headers can cause errors)
- Add `effort="medium"` as the default on all Sonnet 4.6 calls (Anthropic's recommendation for balanced performance/cost)
- Optionally add `cache_control={"type": "auto"}` to benefit from automatic prompt caching
- Test: verify all API responses still contain expected fields (usage structure is unchanged)
- Update any hardcoded context window limits from 200K to 1M where relevant to your application
- If you rely on `max_tokens > 64K` on Sonnet 4.6 — not supported on Sonnet, use Opus 4.6
Run a search across your codebase for these strings before you do anything else: `claude-3-5`, `claude-haiku`, `claude-opus-4-5`, `budget_tokens`, `output_format`, and `assistant` in messages arrays (to catch prefill patterns).
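That sweep can be scripted. A rough sketch that scans only `.py` files (extend the glob for your stack); the prefill pattern is a heuristic and will also flag legitimate full assistant messages, so treat hits as review candidates, not errors:

```python
import re
from pathlib import Path

# Strings worth flagging before migrating to 4.6.
PATTERNS = {
    "retired model ID": re.compile(r"claude-3-5|claude-haiku|claude-opus-4-5"),
    "deprecated thinking budget": re.compile(r"budget_tokens"),
    "renamed output parameter": re.compile(r"\boutput_format\b"),
    "possible prefill": re.compile(r'"role":\s*"assistant"'),
}

def scan(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, issue label) for every suspicious line."""
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            for label, pat in PATTERNS.items():
                if pat.search(line):
                    hits.append((str(path), lineno, label))
    return hits
```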
What's strictly better in 4.6 — things to start using
Adaptive thinking is more cost-efficient than budget_tokens for mixed workloads. With a fixed budget, you were either wasting tokens on simple tasks or running out on complex ones. Adaptive allocates thinking tokens based on actual problem difficulty.
1M context on Opus means no chunking for most document workflows. A 500-page PDF, a large codebase, a full conversation history — it fits in a single context window without RAG workarounds. Not every task needs this, but when it does, it eliminates a lot of complexity.
Effort parameter is the cost knob you've always wanted. Setting effort=low on classification and routing tasks typically cuts costs 40-60% with no quality loss. Reserve effort=high for tasks where quality genuinely matters more than cost. The default (no effort parameter) is equivalent to medium.
Interleaved thinking is now on by default with adaptive mode, which matters for agent workflows. Claude reasons before each tool call rather than planning upfront and executing blindly. This means agents self-correct mid-task instead of failing silently.
Quick validation after migration
Run this script after updating to confirm your migration is clean:
```python
import anthropic
import json

client = anthropic.Anthropic()

# Test 1: Basic call with new model ID
r1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    effort="medium",
    messages=[{"role": "user", "content": "Reply with: OK"}]
)
assert r1.content[0].text.strip() == "OK", f"Unexpected: {r1.content[0].text}"
assert hasattr(r1, "usage"), "No usage field"
print(f"✓ Basic call — input tokens: {r1.usage.input_tokens}")

# Test 2: Structured outputs with new output_config.format
r2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    output_config={
        "format": {
            "type": "json_schema",
            "json_schema": {
                "name": "test_schema",
                "schema": {
                    "type": "object",
                    "properties": {"status": {"type": "string"}},
                    "required": ["status"]
                }
            }
        }
    },
    messages=[{"role": "user", "content": "Return status: ok"}]
)
data = json.loads(r2.content[0].text)
assert "status" in data, f"Schema not respected: {r2.content[0].text}"
print(f"✓ Structured output — status: {data['status']}")

# Test 3: Adaptive thinking
r3 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,
    thinking={"type": "adaptive"},
    effort="high",
    messages=[{"role": "user", "content": "What's 17 * 23?"}]
)
# Check thinking block present when adaptive is on
has_thinking = any(b.type == "thinking" for b in r3.content)
print(f"✓ Adaptive thinking — thinking block present: {has_thinking}")

print("\nAll checks passed. Migration is clean.")
```
If any assertion fails, it points to which change you missed.
💡 **API access in India:** For Claude 4.6 API access with UPI billing (no international card needed), AICredits.in supports all Claude 4.6 models with per-key spending limits to control costs.