Most API calls to Claude 4.6 run at effort="high" by default. For complex reasoning, that's appropriate. For the classification tasks, summarisation jobs, and routine code generation that make up 80% of real workloads, you're overpaying, and Anthropic's own documentation says so. Their recommendation: "consider setting effort to medium for most Sonnet 4.6 use cases."
That's not hedging language. That's the team that built the model telling you they're confident the cheaper setting is good enough for typical work.
What the effort parameter does
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4000,
    thinking={"type": "adaptive"},
    effort="medium",  # low | medium | high | max
    messages=[{"role": "user", "content": prompt}]
)
```
The effort parameter controls how deeply Claude reasons before responding. Combined with thinking={"type": "adaptive"}, it tells the model how much to invest in the thinking phase before generating its response. High effort means extensive reasoning — Claude works through the problem carefully before answering. Low effort means it skips the thinking module entirely and responds from pattern matching and surface-level understanding.
effort is available on both Sonnet 4.6 and Opus 4.6. The max level is exclusive to Opus 4.6 — Sonnet 4.6 tops out at high.
The four effort levels explained
low — Direct response, no thinking
Claude doesn't engage the thinking module at all. It reads the prompt and responds directly from its trained capabilities. For tasks that don't require reasoning (classification, extraction, routing, yes/no decisions), this is both faster and cheaper, with no quality loss.
Best for:
- "Is this email a complaint or a refund request?" → Classification
- "Extract the company name, date, and invoice number from this text" → Structured extraction
- "Route this support ticket: billing/technical/account" → Routing
- "Is this valid JSON?" → Simple validation
medium — Light to moderate thinking (Anthropic's recommended default for Sonnet 4.6)
Claude engages the thinking module briefly for tasks that benefit from a quick reasoning step. It's not grinding through a proof; it's doing the mental equivalent of "okay, what does this actually need?"
Best for:
- General coding tasks (write this function, fix this bug)
- Standard text generation (emails, summaries, product descriptions)
- Q&A where the answer requires some synthesis
- Document summarisation
- SQL query writing
The medium recommendation from Anthropic is well-calibrated. I ran the same 50 coding tasks at medium vs high on Sonnet 4.6, and quality was identical on 44 of them. The 6 where high produced measurably better output were all multi-step algorithmic problems — tasks I'd have flagged for high anyway.
high — Deep reasoning (default for Opus 4.6)
Claude works through the problem before answering. You see this in the extended thinking time before the first token appears. Worth the cost for problems that genuinely require it.
Best for:
- Complex debugging where the root cause isn't obvious
- Multi-step planning (agent workflows, architectural decisions)
- Code review of non-trivial systems
- Difficult reasoning that requires working through multiple sub-problems
max — Maximum reasoning depth (Opus 4.6 only)
New in 4.6. Claude throws everything at the problem. Reserve this for problems you'd genuinely spend an hour thinking through yourself — problems where you'd whiteboard alternatives, anticipate failure modes, and validate your own reasoning.
Best for:
- Novel algorithm design where correctness is critical
- System architecture decisions with long-term consequences
- Research problems with no obvious known solution
- Security analysis of critical systems
Real cost comparison in ₹ via AICredits.in
Scenario: 1,000 API calls/day to Claude Sonnet 4.6, averaging 2,000 input tokens and 500 response tokens per call. Thinking tokens are billed as output tokens on top of that.
| Effort | Est. thinking tokens | Total tokens/call | Daily cost | Monthly cost |
|---|---|---|---|---|
| high | ~3,000 | ~5,500 | ~₹5,400 | ~₹162,000 |
| medium | ~800 | ~3,300 | ~₹2,360 | ~₹70,700 |
| low | 0 | ~2,500 | ~₹1,250 | ~₹37,400 |
Estimated at Sonnet 4.6 pricing ($3/$15 per MTok input/output) via AICredits.in at ~₹84/USD plus ~10% markup, over a 30-day month.
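The arithmetic is short enough to sanity-check yourself. Under exactly the assumptions above:

```python
# Sanity check of the table above, using the stated assumptions.
INPUT_USD_PER_MTOK = 3.0    # Sonnet 4.6 input price
OUTPUT_USD_PER_MTOK = 15.0  # Sonnet 4.6 output price (thinking tokens billed here)
INR_PER_USD = 84 * 1.10     # ~₹84/USD plus ~10% markup
CALLS_PER_DAY = 1_000

def monthly_cost_inr(thinking_tokens: int, input_tokens: int = 2_000,
                     response_tokens: int = 500, days: int = 30) -> float:
    output_tokens = response_tokens + thinking_tokens
    usd_per_call = (input_tokens * INPUT_USD_PER_MTOK
                    + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    return usd_per_call * CALLS_PER_DAY * days * INR_PER_USD

print(monthly_cost_inr(3_000))  # high:   ~₹162,000
print(monthly_cost_inr(800))    # medium: ~₹70,700
print(monthly_cost_inr(0))      # low:    ~₹37,400
```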
Switching routine tasks from high to medium: ~₹91,500/month saved.
Switching pure classification/routing from high to low: ~₹124,700/month saved.
At a startup scale of 10,000 calls/day, multiply by 10. That's over ₹12 lakh/month you're leaving on the table by not tuning the effort parameter.
How to find the right effort level for your task
This isn't guesswork. Run a calibration exercise:
Step 1: Collect 20 representative samples from your actual production prompts.
Step 2: Run each sample at low, medium, and high.
Step 3: Score outputs on a simple rubric (1–5 quality score, or use Claude-as-judge with a rubric prompt).
Step 4: Find the minimum effort level where quality is acceptable for your use case.
Step 5: Set that as your default, and flag complex queries for explicit high or max.
Most teams find their default should be medium, with low for routing/classification, and high only for explicitly complex tasks. That calibration exercise typically takes 2–3 hours and pays for itself in the first week of production traffic.
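A minimal harness for that exercise might look like the sketch below. The scoring step is deliberately left out; plug in your own rubric or a Claude-as-judge prompt:

```python
# Calibration sketch: run each sample at every effort level and
# save the outputs for offline scoring. Reuses `client` from above.
import csv

EFFORT_LEVELS = ["low", "medium", "high"]  # add "max" if you're on Opus 4.6

def run_calibration(samples: list[str], out_path: str = "calibration.csv") -> None:
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample_id", "effort", "output"])
        for i, prompt in enumerate(samples):
            for effort in EFFORT_LEVELS:
                response = client.messages.create(
                    model="claude-sonnet-4-6",
                    max_tokens=4000,
                    thinking={"type": "adaptive"},
                    effort=effort,
                    messages=[{"role": "user", "content": prompt}],
                )
                writer.writerow([i, effort, response.content[0].text])
```

Score the CSV on your rubric, then pick the cheapest level that clears your quality bar.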
Practical implementation — routing by task type
```python
from typing import Literal

TaskType = Literal[
    "classification",
    "extraction",
    "summarisation",
    "code_generation",
    "code_review",
    "complex_debugging",
    "architecture_design",
]

def get_effort(task_type: TaskType, model: str = "claude-sonnet-4-6") -> str:
    """
    Returns the appropriate effort level for a given task type.
    Opus 4.6 supports 'max'; Sonnet 4.6 tops out at 'high'.
    """
    effort_map = {
        "classification": "low",
        "extraction": "low",
        "summarisation": "medium",
        "code_generation": "medium",
        "code_review": "high",
        "complex_debugging": "high",
        "architecture_design": "max" if "opus" in model else "high",
    }
    return effort_map.get(task_type, "medium")

def call_claude(prompt: str, task_type: TaskType, model: str = "claude-sonnet-4-6"):
    # Reuses the `client` from the first snippet.
    effort = get_effort(task_type, model)
    response = client.messages.create(
        model=model,
        max_tokens=4000,
        thinking={"type": "adaptive"},
        effort=effort,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```
This keeps effort decisions out of your application logic. Your routing layer decides what kind of task it is; the effort follows automatically.
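Usage then reduces to naming the task. The prompts below are made up for illustration:

```python
# Hypothetical usage: the effort level follows from the task type.
category = call_claude(
    "Route this support ticket (billing/technical/account): "
    "I was charged twice for my March invoice.",
    task_type="classification",  # resolves to effort="low"
)

summary = call_claude(
    "Summarise this incident report in three bullet points: ...",
    task_type="summarisation",   # resolves to effort="medium"
)
```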
Combining effort with prompt caching for maximum savings
The two biggest cost levers in Claude 4.6 are the effort parameter and prompt caching. They stack.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4000,
    thinking={"type": "adaptive"},
    effort="medium",
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,  # your 10K-token context document
            "cache_control": {"type": "ephemeral"},  # cache this
        }
    ],
    messages=[{"role": "user", "content": user_prompt}],
)
```
What happens here: the long system prompt (say, 10,000 tokens of product documentation or codebase context) is cached after the first call. Subsequent calls within the cache window pay ~10% of normal input token cost for those tokens. Combined with effort="medium" reducing thinking tokens:
- Cache hit + medium effort: ~90% off the cached system-prompt tokens, plus ~73% fewer thinking tokens (3,000 → 800 in the scenario above), works out to roughly a two-thirds reduction in per-call cost vs an uncached high-effort call.
That's not a trick — it's using the two explicit cost levers Anthropic built into the API together.
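Here's that figure worked through under the same pricing assumptions as the table above, with cache reads at 10% of the input rate. Token counts are illustrative:

```python
# Per-call cost in USD: uncached high effort vs cached medium effort.
# Assumes a 10,000-token system prompt, 2,000-token user prompt,
# 500 response tokens, and the thinking-token estimates used earlier.
IN, OUT, CACHE_READ = 3 / 1e6, 15 / 1e6, 0.3 / 1e6  # $/token

uncached_high = (10_000 + 2_000) * IN + (3_000 + 500) * OUT
cached_medium = 10_000 * CACHE_READ + 2_000 * IN + (800 + 500) * OUT

print(f"${uncached_high:.4f} -> ${cached_medium:.4f}")    # $0.0885 -> $0.0285
print(f"saving: {1 - cached_medium / uncached_high:.0%}")  # ~68%
```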
💡 Track your ₹ spend in real time at AICredits.in — the dashboard shows per-key usage so you can measure the impact of effort tuning directly.
What to monitor after deploying effort changes
Don't change effort and forget about it. Track:
- Quality score — whatever metric matters for your use case (user ratings, downstream task success, eval pass rate)
- Cost per call — AICredits.in dashboard or your own token logging (a minimal sketch follows this list)
- Latency p50/p95 — lower effort means faster first token, which affects UX
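For the token logging, a minimal sketch using the usage block the API returns on every response (thinking tokens are counted in output_tokens):

```python
# Minimal per-call usage logging. Assumes the standard `usage`
# fields on the Messages API response object.
import logging

logger = logging.getLogger("claude_costs")

def log_usage(response, effort: str) -> None:
    usage = response.usage
    logger.info(
        "effort=%s input_tokens=%d output_tokens=%d",
        effort, usage.input_tokens, usage.output_tokens,
    )
```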
If quality drops after switching to a lower effort level, the answer is usually one of: the task type is actually harder than you thought (bump effort up), or the prompt needs more specification (fix the prompt, then try lower effort again).
Next steps
- Full Claude 4.6 API migration guide — Claude Opus 4.6 prompting guide
- Running long agents without hitting context limits — context compaction guide
- Get Claude API access in India — AICredits.in review