Most API calls to Claude 4.6 run at effort="high" by default. For complex reasoning, that's appropriate. For the classification tasks, summarisation jobs, and routine code generation that make up 80% of real workloads, you're overpaying, and Anthropic's own documentation says so. Their recommendation: "consider setting effort to medium for most Sonnet 4.6 use cases."
That's not hedging language. That's the team that built the model telling you they're confident the cheaper setting is good enough for typical work.
What the effort parameter does
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4000,
    thinking={"type": "adaptive"},
    effort="medium",  # low | medium | high | max
    messages=[{"role": "user", "content": prompt}]
)
```
The effort parameter controls how deeply Claude reasons before responding. Combined with thinking={"type": "adaptive"}, it tells the model how much to invest in the thinking phase before generating its response. High effort means extensive reasoning — Claude works through the problem carefully before answering. Low effort means it skips the thinking module entirely and responds from pattern matching and surface-level understanding.
effort is available on both Sonnet 4.6 and Opus 4.6. The max level is exclusive to Opus 4.6 — Sonnet 4.6 tops out at high.
The four effort levels explained
low — Direct response, no thinking
Claude doesn't engage the thinking module at all. It reads the prompt and responds directly from its trained capabilities. For tasks that don't require reasoning (classification, extraction, routing, yes/no decisions), this is both faster and cheaper, with no quality loss.
Best for:
- "Is this email a complaint or a refund request?" → Classification
- "Extract the company name, date, and invoice number from this text" → Structured extraction
- "Route this support ticket: billing/technical/account" → Routing
- "Is this valid JSON?" → Simple validation
medium — Light to moderate thinking (Anthropic's recommended default for Sonnet 4.6)
Claude engages the thinking module briefly for tasks that benefit from a quick reasoning step. It's not grinding through a proof; it's doing the mental equivalent of "okay, what does this actually need?"
Best for:
- General coding tasks (write this function, fix this bug)
- Standard text generation (emails, summaries, product descriptions)
- Q&A where the answer requires some synthesis
- Document summarisation
- SQL query writing
The medium recommendation from Anthropic is well-calibrated. I ran the same 50 coding tasks at medium vs high on Sonnet 4.6, and quality was identical on 44 of them. The 6 where high produced measurably better output were all multi-step algorithmic problems — tasks I'd have flagged for high anyway.
high — Deep reasoning (default for Opus 4.6)
Claude works through the problem before answering. You see this in the extended thinking time before the first token appears. Worth the cost for problems that genuinely require it.
Best for:
- Complex debugging where the root cause isn't obvious
- Multi-step planning (agent workflows, architectural decisions)
- Code review of non-trivial systems
- Difficult reasoning that requires working through multiple sub-problems
max — Maximum reasoning depth (Opus 4.6 only)
New in 4.6. Claude throws everything at the problem. Reserve this for problems you'd genuinely spend an hour thinking through yourself — problems where you'd whiteboard alternatives, anticipate failure modes, and validate your own reasoning.
Best for:
- Novel algorithm design where correctness is critical
- System architecture decisions with long-term consequences
- Research problems with no obvious known solution
- Security analysis of critical systems
Real cost comparison in ₹ via AICredits.in
Scenario: 1,000 API calls/day to Claude Sonnet 4.6, averaging 2,000 input tokens and 500 response tokens per call. Thinking tokens are billed as output tokens on top of that.
| Effort | Est. thinking tokens | Total tokens/call | Daily cost | Monthly cost |
|---|---|---|---|---|
| high | ~3,000 | ~5,500 | ~₹5,400 | ~₹162,000 |
| medium | ~800 | ~3,300 | ~₹2,360 | ~₹70,700 |
| low | 0 | ~2,500 | ~₹1,250 | ~₹37,400 |
Estimated at Sonnet 4.6 pricing ($3/$15 per MTok input/output) via AICredits.in at ~₹84/USD plus ~10% markup, over a 30-day month.
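The arithmetic is short enough to sanity-check yourself. Under exactly the assumptions above:

```python
# Sanity check of the table above, using the stated assumptions.
INPUT_USD_PER_MTOK = 3.0    # Sonnet 4.6 input price
OUTPUT_USD_PER_MTOK = 15.0  # Sonnet 4.6 output price (thinking tokens billed here)
INR_PER_USD = 84 * 1.10     # ~₹84/USD plus ~10% markup
CALLS_PER_DAY = 1_000

def monthly_cost_inr(thinking_tokens: int, input_tokens: int = 2_000,
                     response_tokens: int = 500, days: int = 30) -> float:
    output_tokens = response_tokens + thinking_tokens
    usd_per_call = (input_tokens * INPUT_USD_PER_MTOK
                    + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    return usd_per_call * CALLS_PER_DAY * days * INR_PER_USD

print(monthly_cost_inr(3_000))  # high:   ~₹162,000
print(monthly_cost_inr(800))    # medium: ~₹70,700
print(monthly_cost_inr(0))      # low:    ~₹37,400
```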
Switching routine tasks from high to medium: ~₹91,500/month saved.
Switching pure classification/routing from high to low: ~₹124,700/month saved.
At a startup scale of 10,000 calls/day, multiply by 10. That's over ₹12 lakh/month you're leaving on the table by not tuning the effort parameter.
How to find the right effort level for your task
This isn't guesswork. Run a calibration exercise:
Step 1: Collect 20 representative samples from your actual production prompts.
Step 2: Run each sample at low, medium, and high.
Step 3: Score outputs on a simple rubric (1–5 quality score, or use Claude-as-judge with a rubric prompt).
Step 4: Find the minimum effort level where quality is acceptable for your use case.
Step 5: Set that as your default, and flag complex queries for explicit high or max.
Most teams find their default should be medium, with low for routing/classification, and high only for explicitly complex tasks. That calibration exercise typically takes 2–3 hours and pays for itself in the first week of production traffic.
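A minimal harness for that exercise might look like the sketch below. The scoring step is deliberately left out; plug in your own rubric or a Claude-as-judge prompt:

```python
# Calibration sketch: run each sample at every effort level and
# save the outputs for offline scoring. Reuses `client` from above.
import csv

EFFORT_LEVELS = ["low", "medium", "high"]  # add "max" if you're on Opus 4.6

def run_calibration(samples: list[str], out_path: str = "calibration.csv") -> None:
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample_id", "effort", "output"])
        for i, prompt in enumerate(samples):
            for effort in EFFORT_LEVELS:
                response = client.messages.create(
                    model="claude-sonnet-4-6",
                    max_tokens=4000,
                    thinking={"type": "adaptive"},
                    effort=effort,
                    messages=[{"role": "user", "content": prompt}],
                )
                writer.writerow([i, effort, response.content[0].text])
```

Score the CSV on your rubric, then pick the cheapest level that clears your quality bar.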
Practical implementation — routing by task type
```python
from typing import Literal

TaskType = Literal[
    "classification",
    "extraction",
    "summarisation",
    "code_generation",
    "code_review",
    "complex_debugging",
    "architecture_design",
]

def get_effort(task_type: TaskType, model: str = "claude-sonnet-4-6") -> str:
    """
    Returns the appropriate effort level for a given task type.
    Opus 4.6 supports 'max'; Sonnet 4.6 tops out at 'high'.
    """
    effort_map = {
        "classification": "low",
        "extraction": "low",
        "summarisation": "medium",
        "code_generation": "medium",
        "code_review": "high",
        "complex_debugging": "high",
        "architecture_design": "max" if "opus" in model else "high",
    }
    return effort_map.get(task_type, "medium")

def call_claude(prompt: str, task_type: TaskType, model: str = "claude-sonnet-4-6"):
    # Reuses the `client` from the first snippet.
    effort = get_effort(task_type, model)
    response = client.messages.create(
        model=model,
        max_tokens=4000,
        thinking={"type": "adaptive"},
        effort=effort,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```
This keeps effort decisions out of your application logic. Your routing layer decides what kind of task it is; the effort follows automatically.
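Usage then reduces to naming the task. The prompts below are made up for illustration:

```python
# Hypothetical usage: the effort level follows from the task type.
category = call_claude(
    "Route this support ticket (billing/technical/account): "
    "I was charged twice for my March invoice.",
    task_type="classification",  # resolves to effort="low"
)

summary = call_claude(
    "Summarise this incident report in three bullet points: ...",
    task_type="summarisation",   # resolves to effort="medium"
)
```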
Combining effort with prompt caching for maximum savings
The two biggest cost levers in Claude 4.6 are the effort parameter and prompt caching. They stack.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4000,
    thinking={"type": "adaptive"},
    effort="medium",
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,  # your 10K-token context document
            "cache_control": {"type": "ephemeral"},  # cache this
        }
    ],
    messages=[{"role": "user", "content": user_prompt}],
)
```
What happens here: the long system prompt (say, 10,000 tokens of product documentation or codebase context) is cached after the first call. Subsequent calls within the cache window pay ~10% of normal input token cost for those tokens. Combined with effort="medium" reducing thinking tokens:
- Cache hit + medium effort: ~90% off the cached system-prompt tokens, plus ~73% fewer thinking tokens (3,000 → 800 in the scenario above), works out to roughly a two-thirds reduction in per-call cost vs an uncached high-effort call.
That's not a trick — it's using the two explicit cost levers Anthropic built into the API together.
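Here's that figure worked through under the same pricing assumptions as the table above, with cache reads at 10% of the input rate. Token counts are illustrative:

```python
# Per-call cost in USD: uncached high effort vs cached medium effort.
# Assumes a 10,000-token system prompt, 2,000-token user prompt,
# 500 response tokens, and the thinking-token estimates used earlier.
IN, OUT, CACHE_READ = 3 / 1e6, 15 / 1e6, 0.3 / 1e6  # $/token

uncached_high = (10_000 + 2_000) * IN + (3_000 + 500) * OUT
cached_medium = 10_000 * CACHE_READ + 2_000 * IN + (800 + 500) * OUT

print(f"${uncached_high:.4f} -> ${cached_medium:.4f}")    # $0.0885 -> $0.0285
print(f"saving: {1 - cached_medium / uncached_high:.0%}")  # ~68%
```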
💡 Track your ₹ spend in real time at AICredits.in — the dashboard shows per-key usage so you can measure the impact of effort tuning directly.
What to monitor after deploying effort changes
Don't change effort and forget about it. Track:
- Quality score — whatever metric matters for your use case (user ratings, downstream task success, eval pass rate)
- Cost per call — AICredits.in dashboard or your own token logging (a minimal sketch follows this list)
- Latency p50/p95 — lower effort means faster first token, which affects UX
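For the token logging, a minimal sketch using the usage block the API returns on every response (thinking tokens are counted in output_tokens):

```python
# Minimal per-call usage logging. Assumes the standard `usage`
# fields on the Messages API response object.
import logging

logger = logging.getLogger("claude_costs")

def log_usage(response, effort: str) -> None:
    usage = response.usage
    logger.info(
        "effort=%s input_tokens=%d output_tokens=%d",
        effort, usage.input_tokens, usage.output_tokens,
    )
```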
If quality drops after switching to a lower effort level, the answer is usually one of: the task type is actually harder than you thought (bump effort up), or the prompt needs more specification (fix the prompt, then try lower effort again).
Next steps
- Full Claude 4.6 API migration guide — Claude Opus 4.6 prompting guide
- Running long agents without hitting context limits — context compaction guide
- Get Claude API access in India — AICredits.in review