Claude Sonnet 4.6 is the model I reach for by default. Not because it's the most powerful thing Anthropic has shipped — Opus 4 exists for that — but because it hits a capability-to-cost-to-speed ratio that's hard to beat for production workloads. If you're building something real and want to understand exactly what you're working with, this is the guide.
What makes Sonnet 4.6 the sweet spot
There's a useful mental model for Anthropic's model lineup: Haiku for fast cheap tasks, Sonnet for most things, Opus for genuinely hard problems. That's been true for a while, but Sonnet 4.6 has closed the gap with Opus on most day-to-day tasks.
Concretely: Sonnet 4.6 handles multi-step coding tasks, complex document analysis, structured extraction at scale, and agentic tool-use loops without the cost overhead or latency of Opus. On coding benchmarks it scores close to Opus 4. On extended reasoning tasks where you enable thinking mode, it punches well above its price point.
The API model ID is claude-sonnet-4-6. Pass it exactly as written in the model field of your API calls. Anthropic publishes the accepted IDs and aliases on its models page, so check there before you hard-code a string.
Context window: 200k tokens, used correctly
200,000 tokens. That's roughly 150,000 words or a 500-page book. In practice it means you can throw an entire codebase, a legal contract, a research corpus, or a year of Slack logs at a single prompt and get coherent answers back.
But bigger isn't automatically better. A few things to know:
Performance can degrade in the middle. Claude, like many transformer-based models, tends to attend most reliably to the beginning and end of long contexts. If you have critical instructions or key facts, put them at the top of your system prompt or just before the user message, not buried in the middle of 50k tokens of context.
Cache your large context blocks. If you're repeatedly sending the same large document or codebase, use prompt caching with cache_control breakpoints. The call that writes the cache costs 25% more than the base input price, and each cache read costs roughly 10% of it. For a 100k-token context repeated across 100 API calls, that works out to about an 89% reduction in input cost for that block, provided the calls arrive before the cache entry expires.
Not everything needs 200k. Sending 150k tokens when your actual payload is 2k costs money and adds latency. Trim your context to what's relevant.
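To make the caching advice concrete, here's a minimal sketch of a request with one cache breakpoint on a large document. It assumes the Messages API's cache_control field on a system content block. The helper name build_cached_request and the instruction text are my own placeholders, not part of Anthropic's SDK.

```python
def build_cached_request(document: str, question: str) -> dict:
    """Build keyword arguments for client.messages.create() with a cached document.

    Hypothetical helper: short instructions first, the large document marked
    as a cache breakpoint, and the question last in the user turn.
    """
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "Answer using only the attached document."},
            {
                "type": "text",
                "text": document,
                # The prompt prefix up to and including this block is eligible for caching.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": question}],
    }


request = build_cached_request("...large document text...", "Summarize section 3.")
# With the SDK installed and a client created: client.messages.create(**request)
```

If caching is working, the usage object on the response should report cache_creation_input_tokens on the first call and cache_read_input_tokens on later ones, which is a quick way to confirm you're getting the discount.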
Pricing breakdown
Sonnet 4.6 pricing at time of writing:
| Token type | Price per million tokens |
|---|---|
| Input | $3.00 |
| Output | $15.00 |
| Cache write | $3.75 |
| Cache read | $0.30 |
Compare that to the lineup:
| Model | Input | Output |
|---|---|---|
| Haiku 3.5 | $0.80 | $4.00 |
| Sonnet 4.6 | $3.00 | $15.00 |
| Opus 4 | $15.00 | $75.00 |
Opus costs 5x as much as Sonnet 4.6 on both input and output. For a typical agentic loop that makes 20 API calls per task, that multiplier applies to every call, so the difference adds up fast. Sonnet 4.6 becomes the obvious choice unless you have a specific reason to go up or down the stack.
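Here's a rough sketch of that arithmetic, using the prices from the tables above. The workload numbers (20 calls per task, 8,000 input and 1,000 output tokens per call) are assumptions I picked for illustration, so swap in your own.

```python
# USD per million tokens, copied from the pricing tables above.
PRICES = {
    "sonnet-4.6": {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30},
    "opus-4": {"input": 15.00, "output": 75.00},
}


def loop_cost(model: str, calls: int, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task that makes `calls` requests of the same size."""
    price = PRICES[model]
    per_call = input_tokens * price["input"] + output_tokens * price["output"]
    return calls * per_call / 1_000_000


def cached_context_cost(model: str, calls: int, context_tokens: int) -> float:
    """Input cost in USD for a shared context: one cache write, then cache reads."""
    price = PRICES[model]
    write = context_tokens * price["cache_write"]
    reads = max(calls - 1, 0) * context_tokens * price["cache_read"]
    return (write + reads) / 1_000_000


# Hypothetical workload: 20 calls per task, 8,000 input and 1,000 output tokens each.
sonnet = loop_cost("sonnet-4.6", 20, 8_000, 1_000)
opus = loop_cost("opus-4", 20, 8_000, 1_000)
print(f"Sonnet 4.6: ${sonnet:.2f} per task, Opus 4: ${opus:.2f} per task")

# Caching scenario: a 100k-token context reused across 100 calls.
uncached = 100 * 100_000 * PRICES["sonnet-4.6"]["input"] / 1_000_000
cached = cached_context_cost("sonnet-4.6", 100, 100_000)
print(f"Uncached input: ${uncached:.2f}, cached input: ${cached:.2f}")
```

Under those assumptions the loop costs about $0.78 per task on Sonnet 4.6 against $3.90 on Opus 4. The cached context comes to roughly $3.3 instead of $30, a reduction of just under 90%.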
Developers in India: AICredits lets you access the Claude API with INR billing via UPI, with no USD card or international transaction fees needed.
Capabilities by task type
Coding
This is where Sonnet 4.6 earns its reputation. It writes clean, idiomatic code across Python, TypeScript, Go, Rust, and SQL without needing heavy hand-holding. More importantly, it understands diffs — you can hand it a broken pull request and ask what's wrong, and it'll actually find the bug rather than hallucinating one.
It handles multi-file context well. Paste in three related files and ask it to refactor a shared utility and it'll track the dependencies correctly. It's not perfect, but it's the best I've used at this outside of full IDE integration.
Tool use and function calling are strong. If you're building an agent that needs to call APIs, query databases, or chain tool outputs together, Sonnet 4.6 follows tool schemas reliably. Check how to design tools for AI agents for patterns that hold up in production.
Reasoning and analysis
Sonnet 4.6 handles multi-step logical problems, structured argumentation, and policy analysis well. It doesn't drift off track in long chains of reasoning the way some models do. For simpler reasoning tasks, leave extended thinking off and call the model normally. For genuinely hard problems (complex financial modeling, multi-variable tradeoff analysis, legal interpretation), enable extended thinking mode.
Writing
Strong. It matches tone, holds a voice across a long document, and doesn't produce the flat corporate prose that plagues lesser models. The main failure mode is over-caution: it'll sometimes soften a strong claim or hedge where you don't want it to. Be explicit about tone in your system prompt.
Vision
Sonnet 4.6 reads images, charts, screenshots, and diagrams accurately. It's good at extracting structured data from screenshots (tables, forms, pricing pages) and at describing visual layouts precisely enough to be useful. It won't match a dedicated OCR tool for pure text extraction throughput, but for understanding visual context it's solid.
Structured outputs
Tell it to respond in JSON and it does. Tell it to follow a specific schema and it'll follow it. This is especially reliable when you combine explicit format instructions with XML tags in your prompt — more on that below.
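Even with reliable formatting, I'd still validate what comes back before handing it to downstream code. Here's a minimal sketch of that check, assuming a made-up invoice schema. The field names and the parse_extraction helper are hypothetical.

```python
import json

# Hypothetical schema for an invoice extraction task: field name -> accepted types.
REQUIRED_FIELDS = {"vendor": (str,), "total": (int, float), "currency": (str,)}


def parse_extraction(raw_text: str) -> dict:
    """Parse and validate a reply that was asked to contain one JSON object."""
    # Tolerate stray prose or code fences by slicing from the first "{" to the last "}".
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw_text[start : end + 1])
    for field, accepted in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], accepted):
            raise ValueError(f"wrong type for field: {field}")
    return data


reply = 'Here is the result:\n{"vendor": "Acme", "total": 41.5, "currency": "USD"}'
print(parse_extraction(reply))
```

A failed check is a reasonable trigger for a single retry with the error message appended to the prompt, though whether that's worth the extra call depends on your volume.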
Extended thinking mode
Extended thinking lets Sonnet 4.6 reason through a problem step by step before returning an answer. You enable it by passing a thinking parameter with a budget_tokens value in your API call. The budget has to be smaller than max_tokens.
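import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # upper bound on tokens spent reasoning
    },
    messages=[{"role": "user", "content": "...your hard problem here..."}],
)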
The thinking tokens are billed as output tokens, so it adds cost. Turn it on when:
- The task involves multi-step reasoning where intermediate steps matter
- You're seeing inconsistent answers and want the model to slow down
- You're doing complex math, logic proofs, or structured planning
Don't turn it on for simple classification, short generation tasks, or anything where you're just paying for tokens you don't need. The extended thinking guide covers budget tuning and when it moves the needle.
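When thinking is enabled, the response's content list carries the reasoning and the final answer as separate blocks. Here's a small sketch of pulling them apart, assuming blocks expose a type attribute plus thinking or text. The split_thinking helper is my own, and the stand-in objects exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def split_thinking(content_blocks):
    """Separate reasoning from the final answer in a response's content list."""
    thinking_parts, answer_parts = [], []
    for block in content_blocks:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            answer_parts.append(block.text)
        # Any other block type (for example a tool call) is skipped here.
    return "\n".join(thinking_parts), "\n".join(answer_parts)


# Stand-in objects shaped like SDK content blocks, so the sketch runs offline.
fake_content = [
    SimpleNamespace(type="thinking", thinking="Compare both options step by step."),
    SimpleNamespace(type="text", text="Option B is cheaper."),
]
reasoning, answer = split_thinking(fake_content)
print(answer)
```

In a real call you'd pass response.content instead of fake_content. Logging the reasoning separately is handy when you're debugging inconsistent answers, and you can show users only the final text.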
Prompting patterns that work with Sonnet 4.6
Use XML tags for structure
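Sonnet 4.6 works well with XML-tagged prompts. Tag names such as <context>, <task>, <format>, and <examples> aren't special keywords, but wrapping each part of the prompt in a descriptive tag helps the model keep the sections apart and handle each one with more precision than freeform text. A prompt like this: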
<context>
You are reviewing a Python microservice that handles payment processing.
</context>
<task>
Identify all places where exceptions are swallowed silently (bare except clauses or
except Exception: pass patterns). For each one, explain the risk and suggest a fix.
</task>
<format>
Return a numbered list. For each issue: file path + line number, the problematic code,
the risk, and the recommended fix.
</format>
...will usually outperform an equivalent freeform prompt on precision and completeness.
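If you assemble prompts in code, a tiny helper keeps the tags consistent. This is a sketch rather than a library function, and build_prompt is a name I made up.

```python
def build_prompt(sections: dict[str, str]) -> str:
    """Wrap each section body in an XML tag named after its key, in insertion order."""
    parts = []
    for tag, body in sections.items():
        parts.append(f"<{tag}>\n{body.strip()}\n</{tag}>")
    return "\n".join(parts)


prompt = build_prompt(
    {
        "context": "You are reviewing a Python microservice that handles payment processing.",
        "task": "Identify all places where exceptions are swallowed silently.",
        "format": "Return a numbered list with file path, line number, risk, and fix.",
    }
)
print(prompt)
```

Keeping the sections in a dict also makes it easy to swap the task while reusing the same context and format blocks across calls.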
Give explicit output format instructions
Don't assume the model will pick the right format. If you want JSON, say "Respond with a JSON object matching this schema: {...}". If you want a numbered list with specific fields, show the structure. Sonnet 4.6 follows explicit format instructions reliably, which means you can skip a lot of output parsing headaches downstream.
System prompt placement matters
For agentic tasks with tool use, keep your core behavioral instructions in the system prompt. Put task-specific details in the user message. Sonnet 4.6 treats the system prompt as higher-priority context — role definition, output format rules, and safety constraints all belong there.
Few-shot examples work
If you have a task where you know what good output looks like, include 2-3 examples. Sonnet 4.6 learns from them quickly. Even one good example dramatically reduces format drift on structured extraction tasks.
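One common way to supply examples is as earlier user and assistant turns in the messages list, so the model sees the exact output shape before it gets the real input. A minimal sketch, with a hypothetical build_few_shot_messages helper and made-up example data:

```python
def build_few_shot_messages(examples, new_input):
    """Turn (input, ideal_output) pairs into alternating turns, then append the real input."""
    messages = []
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": new_input})
    return messages


# Made-up extraction examples; replace with real pairs from your task.
examples = [
    ("Invoice from Acme, total 41.50 USD", '{"vendor": "Acme", "total": 41.5, "currency": "USD"}'),
    ("Globex billed us 1200 EUR", '{"vendor": "Globex", "total": 1200, "currency": "EUR"}'),
]
messages = build_few_shot_messages(examples, "Initech charged 89.99 GBP")
```

The alternative is to put the examples inside an <examples> tag in a single prompt. Either approach should work, so pick whichever is easier to maintain in your codebase.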
A complete API example
Here's a working example: a basic API call with a system prompt, a user message, and a tool definition for a code review agent.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder: replace with the source code you want reviewed.
code_to_review = "..."

# One tool the model can call for each issue it finds.
tools = [
    {
        "name": "create_issue",
        "description": "Create a code review issue with a severity level and suggested fix.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file": {"type": "string", "description": "File path"},
                "line": {"type": "integer", "description": "Line number"},
                "severity": {
                    "type": "string",
                    "enum": ["critical", "major", "minor", "suggestion"],
                },
                "description": {"type": "string"},
                "suggested_fix": {"type": "string"},
            },
            # suggested_fix is optional; the other fields must always be present.
            "required": ["file", "line", "severity", "description"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    # Core behavioral instructions live in the system prompt.
    system="""You are a senior engineer conducting a security-focused code review.
For each issue you find, call the create_issue tool. Focus on: SQL injection,
unvalidated inputs, hardcoded credentials, and insecure deserialization.""",
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": f"Review this code:\n\n```python\n{code_to_review}\n```",
        }
    ],
)
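The call above only sends the request. The create_issue calls come back as tool_use blocks in the response content, and you answer them with tool_result blocks on the next user turn. Here's a sketch of that step, assuming each tool_use block exposes id, name, and input. The helper names and the "recorded" acknowledgement are my own, and the stand-in blocks let it run offline.

```python
from types import SimpleNamespace


def collect_issues(content_blocks):
    """Pull the arguments of every create_issue call out of a response's content list."""
    issues = []
    for block in content_blocks:
        if block.type == "tool_use" and block.name == "create_issue":
            issues.append({"tool_use_id": block.id, **block.input})
    return issues


def tool_results_message(issues):
    """Build the follow-up user turn that acknowledges each tool call."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": issue["tool_use_id"], "content": "recorded"}
            for issue in issues
        ],
    }


# Stand-in blocks shaped like SDK content blocks, so the sketch runs offline.
fake_content = [
    SimpleNamespace(type="text", text="I found one issue."),
    SimpleNamespace(
        type="tool_use",
        id="toolu_example",
        name="create_issue",
        input={"file": "app.py", "line": 12, "severity": "critical",
               "description": "SQL built with string concatenation"},
    ),
]
issues = collect_issues(fake_content)
follow_up = tool_results_message(issues)
```

In a real loop you'd pass response.content, append the assistant turn and follow_up to messages, and call the API again for as long as the response's stop_reason is "tool_use".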
For a deeper comparison of Claude's API against OpenAI's, including auth, rate limits, and SDK differences, see Claude API vs OpenAI API.
Benchmarks: where Sonnet leads, where Opus pulls ahead
Sonnet 4.6 matches or approaches Opus 4 on:
- HumanEval (coding)
- MMLU (knowledge breadth)
- GSM8K (math reasoning)
- Most structured extraction tasks
Opus 4 pulls ahead on:
- Long-document synthesis requiring deep cross-referencing
- Multi-hop reasoning chains with 5+ logical steps
- Research tasks requiring nuanced judgment calls
If your task involves generating code, extracting structured data, writing documents, or running agentic workflows with clear tool schemas, Sonnet 4.6 is the right call. If you're doing frontier research synthesis or genuinely hard multi-step reasoning at scale, Opus starts to justify its 5x price.
When to use Haiku instead
Haiku 3.5 is fast and cheap. Use it when:
- You need sub-200ms response times for user-facing features
- The task is simple classification, routing, or short extraction
- You're running high-volume batch jobs where Sonnet's cost adds up
- You're building a system that makes 10,000+ API calls per day on simple tasks
The pattern I use: Haiku for the cheap outer loop (routing, classification, filtering), Sonnet for the tasks that require actual reasoning. On high-volume systems this can cut API costs by 60-70% without degrading quality where it matters, depending on how much of your traffic is simple enough for Haiku.
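That two-tier pattern can be sketched as a small routing function. The task labels, the 2,000-token cutoff, and the pick_model name are assumptions for illustration. The Haiku model string is an alias I believe Anthropic has published, so check it against the current models list before relying on it.

```python
HAIKU = "claude-3-5-haiku-latest"  # assumed alias; confirm against the models list
SONNET = "claude-sonnet-4-6"

# Hypothetical labels for work that doesn't need much reasoning.
SIMPLE_TASKS = {"classify", "route", "filter", "short_extract"}


def pick_model(task_type: str, input_tokens: int) -> str:
    """Send small, simple jobs to Haiku and everything else to Sonnet."""
    if task_type in SIMPLE_TASKS and input_tokens <= 2_000:
        return HAIKU
    return SONNET


print(pick_model("classify", 300))
print(pick_model("code_review", 300))
```

A static rule like this is the cheapest starting point. If you can't label task types up front, you could let Haiku do the classification itself and route on its answer.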
When to upgrade to Opus
Upgrade to Opus 4 when you're hitting the ceiling on Sonnet. Signs you need it:
- Sonnet is hallucinating on complex multi-document synthesis tasks
- Your multi-step reasoning chain is producing inconsistent results even with extended thinking
- You're doing legal, medical, or financial analysis where nuance and accuracy outweigh cost
- You're building a Claude Projects-style system with extremely complex persistent context
Don't jump to Opus as a default. Start with Sonnet, identify where quality is falling short, then upgrade selectively.
The bottom line
Claude Sonnet 4.6 is a production-grade model. It's what you run when you don't have a specific reason to go cheaper or more expensive. The 200k context window is genuinely useful, not just a spec sheet number. The XML-aware prompting, reliable tool use, and strong coding performance make it the default for anything agentic.
Use the model ID claude-sonnet-4-6, structure your prompts with XML tags, cache your large context blocks, and turn on extended thinking only when you actually need it. That's the short version.



