What is prompt engineering?

Prompt engineering is the practice of crafting inputs to AI language models to produce accurate, useful, and reliable outputs. It involves choosing the right words, structure, context, and format to guide the AI toward the response you actually need — rather than a generic or off-target one.

Which AI models benefit most from better prompting?

All major large language models, including ChatGPT (GPT-4o), Claude, and Gemini, are highly sensitive to prompt quality. The same task can produce dramatically different results depending on how you structure your request, and better prompting improves output on every major model.

Do I need technical skills to do prompt engineering?

No. Prompt engineering is done in natural language — you write text instructions, not code. Basic prompting needs no technical background at all. Advanced techniques like prompt chaining or agentic workflows can benefit from light scripting knowledge, but the core skill is clear written communication.

Where can I learn more about prompt engineering?

MasterPrompting.net offers a structured curriculum from beginner to advanced, covering every major technique from basic clarity and context to chain-of-thought, meta-prompting, and agentic workflows. Start with the Beginner track to build a solid foundation.

Claude Extended Thinking — How to Prompt for Deep Reasoning

Most Claude outputs come back in seconds. Extended thinking can take 30 seconds, sometimes longer. That latency isn't a bug — it's Claude actually working through a problem before committing to an answer, the same way you'd want a surgeon to review scans before operating.

Extended thinking is Claude's internal reasoning mode. It was introduced with Claude 3.7 Sonnet and is supported by newer models such as claude-sonnet-4-6. You set a token budget, Claude reasons in a separate thinking block, then delivers a final answer. Chat apps typically collapse or hide the thinking, and in the API, Claude 4 models return a summarized version of it rather than the raw text. The quality difference on hard problems is significant enough that once you've used it on a genuinely complex task, you'll find it hard to go back.

This guide covers how it works under the hood, how to enable it via the API, which problems benefit (and which don't), and how to write prompts that get the most out of it.

What extended thinking actually does

Standard prompting works like this: you write a prompt, and Claude predicts the next token, repeating until done. Even when you use chain-of-thought prompting (asking Claude to "think step by step"), you're steering the visible output: Claude is reasoning in the response itself.

Extended thinking is different. Claude reasons in a private scratchpad (a separate thinking block) before generating the visible output. It uses this space to explore dead ends, backtrack, reconsider assumptions, and verify intermediate steps, work that would clutter a polished, linear answer. The final answer comes only after that internal process completes.

Think of it as the difference between watching someone solve a problem out loud versus waiting for them to finish thinking and then explain their answer. The second version is often more coherent and more accurate, because the solver isn't constrained to make every intermediate statement sound polished.

The thinking block uses the same transformer under the hood. There's no separate model. What changes is that Claude is given budget to generate tokens that aren't shown to you, which frees it to reason more exploratorily without worrying about whether the intermediate steps read well.

Enabling extended thinking via the API

You enable it with a thinking parameter on the messages endpoint. The key setting is budget_tokens — how many tokens Claude can spend on internal reasoning before writing the final answer.


import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "A snail is at the bottom of a 30-foot well. Each day it climbs 3 feet. Each night it slides back 2 feet. How many days does it take to reach the top?"
    }]
)

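# response.content is a list of content blocks. With thinking enabled it
# normally starts with a "thinking" block, followed by the "text" answer.
# Select blocks by type rather than assuming fixed positions.
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking (first 100 chars): {block.thinking[:100]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

# usage.output_tokens counts thinking and answer tokens together
print(f"Output tokens billed: {response.usage.output_tokens}")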

The minimum budget_tokens is 1,024, and budget_tokens must be less than max_tokens, so max_tokens has to cover both the thinking budget and the final response. The practical ceiling depends on the model's maximum output length; Anthropic's documentation recommends batch processing for budgets above 32,000 tokens to avoid request timeouts.

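One thing that trips people up: budget_tokens is a budget, not a fixed allocation. Claude won't always use all of it. On a simpler problem it might spend 800 tokens thinking even if you gave it 10,000. That's fine, because you're charged for the thinking tokens actually generated. When the API returns summarized thinking, you're billed for the full thinking Claude generated, not the shorter summary you see.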
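A small helper can enforce these constraints before a request goes out. This is a sketch: `thinking_params` and its `answer_tokens` headroom parameter are hypothetical names, not part of the Anthropic SDK, and the helper does not check the model's maximum output length.

```python
MIN_THINKING_BUDGET = 1024  # documented minimum for budget_tokens


def thinking_params(budget_tokens: int, answer_tokens: int = 4000) -> dict:
    """Build keyword arguments for client.messages.create with thinking on.

    max_tokens is the thinking budget plus headroom for the final answer,
    which keeps budget_tokens strictly below max_tokens.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return {
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_params(10000)
print(params["max_tokens"])  # 14000
```

You would then pass the result straight through, e.g. `client.messages.create(model="claude-sonnet-4-6", messages=..., **params)`.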

Cost math

Thinking tokens are billed the same as regular output tokens. With Sonnet 4.6 at ~$0.015 per 1K output tokens:

1,024 thinking tokens → ~$0.015 per call
5,000 thinking tokens → ~$0.075 per call
10,000 thinking tokens → ~$0.15 per call
30,000 thinking tokens → ~$0.45 per call

For one-off tasks, this is negligible. For high-volume pipelines processing thousands of documents, it adds up fast. Model your expected usage before defaulting to a high budget.
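The per-call figures above are just token count times price, so modeling pipeline spend is a few lines of arithmetic. This sketch assumes the ~$15 per million output tokens figure used above and ignores input-token cost; `thinking_cost_usd` and `monthly_cost_usd` are hypothetical helper names, and the 2,000 documents/day workload is purely illustrative.

```python
def thinking_cost_usd(thinking_tokens: int, answer_tokens: int = 0,
                      price_per_million: float = 15.0) -> float:
    """Estimate the output-side cost of one call.

    Thinking tokens are billed as output tokens, so they are added to the
    answer tokens before applying the per-token price. Input cost is ignored.
    """
    output_tokens = thinking_tokens + answer_tokens
    return output_tokens * price_per_million / 1_000_000


def monthly_cost_usd(calls_per_day: int, thinking_tokens: int,
                     answer_tokens: int = 0, days: int = 30) -> float:
    """Scale the per-call estimate to a month of pipeline traffic."""
    return calls_per_day * days * thinking_cost_usd(thinking_tokens, answer_tokens)


for budget in (1_024, 5_000, 10_000, 30_000):
    print(f"{budget:>6} thinking tokens -> ${thinking_cost_usd(budget):.3f} per call")

# Hypothetical workload: 2,000 documents/day at ~5,000 thinking tokens each
print(f"${monthly_cost_usd(2_000, 5_000):,.2f} per month")  # $4,500.00 per month
```

Swap in current pricing for your model and add an `answer_tokens` estimate to get a tighter number.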

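Also worth knowing: on many Claude models, thinking blocks from earlier turns are stripped from context and don't count toward your context window, so don't assume Claude can see its previous reasoning in a follow-up. If a later question depends on that reasoning, have Claude state its key conclusions in the visible answer, or quote the relevant reasoning back as ordinary text in your next message. During tool use, pass the thinking block back unmodified along with the tool result.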

Problems that benefit most from extended thinking

Not every task gets better with a thinking budget. Here's where the difference is actually meaningful:

Math and logic puzzles. Anything requiring multiple inference steps where an early mistake compounds. The snail-and-well problem above is a classic example: the tempting answer is 30 days, but the correct answer is 28, because on the final day the snail climbs out and never slides back. Models often miss that last-day detail on a first pass; extended thinking tends to catch it.
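A direct simulation makes the trap concrete. Net progress is 1 foot per full day, so the snail sits at 27 feet after day 27; on day 28 its 3-foot climb takes it to 30 and it's out before nightfall. The sketch below (the function name is ours, for illustration) checks the top after each daytime climb, before applying the night's slide.

```python
def snail_days(well_height: int, climb: int, slide: int) -> int:
    """Count days until the snail is out, simulating each day and night."""
    if climb <= slide and climb < well_height:
        raise ValueError("the snail never makes net progress")
    height = 0
    day = 0
    while True:
        day += 1
        height += climb           # daytime climb
        if height >= well_height:
            return day            # out before nightfall, so no final slide
        height -= slide           # nighttime slide


print(snail_days(30, 3, 2))  # 28
```

Dividing 30 by the net 1 foot per day gives the wrong answer of 30, which is exactly the compounding early mistake described above.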

Complex code debugging. When you paste in a stack trace and surrounding code, extended thinking lets Claude map the execution path, form hypotheses about root causes, and test them before responding. Particularly useful for asynchronous bugs or race conditions that require reasoning about state over time.

Multi-step planning. "Design a database schema for a SaaS app with these requirements" involves considering normalization, query patterns, scaling tradeoffs, and tenant isolation simultaneously. Extended thinking lets Claude hold all of those in tension before recommending anything.

Decision analysis with real tradeoffs. "Should we use Postgres or DynamoDB for this use case?" The right answer depends on access patterns, team familiarity, scaling needs, and budget. Extended thinking tends to surface the actual decision drivers rather than hedging with "it depends."

Structured argument generation. Legal memos, investment theses, technical design documents — anywhere the final output needs internal logical consistency. Claude can check its own reasoning before committing.

Problems where extended thinking doesn't help (or hurts)

Simple Q&A and factual lookup. If you're asking "What's the capital of France," extended thinking adds latency and cost with zero benefit. The answer doesn't require deliberation.

Classification tasks. Sentiment analysis, category tagging, intent detection — these are pattern matching, not reasoning chains. Standard prompting wins on speed and cost.

Summarization. Compressing a document into bullet points doesn't benefit from internal deliberation. Claude already has all the information it needs in the context window.

Creative writing. Counterintuitively, extended thinking can make creative output worse by over-deliberating on choices that should feel spontaneous. A character's voice doesn't need a formal reasoning chain to sound authentic.

Real-time chat. If a user is waiting for a response in a live conversation, 30-second latency kills the experience. Extended thinking is better suited for async tasks, batch processing, and background jobs where quality matters more than speed.

Prompting patterns that work best

Extended thinking changes what Claude does internally, but your prompt still determines what problem it's working on. A few patterns that consistently work well:

State the problem completely upfront. Don't hint at the answer or embed assumptions. If you write "Given that X is true, how do we handle Y," you've already constrained the reasoning. Write "Here's the situation: [facts]. What's the best approach to Y?" and let Claude form its own premises.

Reinforce the thinking mode explicitly. Add "Think through this carefully before giving your final answer" or "Consider all relevant factors before recommending." This is redundant with the API setting, but it seems to help Claude allocate thinking budget more deliberately on hard problems.

Ask for confidence. "How confident are you in this answer, and what would change your recommendation?" This forces Claude to flag uncertainty rather than paper over it. Particularly useful for factual claims where hallucination is a risk.

For math: ask for verification. "After solving, verify your answer is correct by checking it against the original problem." This uses part of the thinking budget for a second-pass check — basically getting Claude to double-check its own work.

For planning: ask about failure modes. "Before recommending, consider what could go wrong with each option." This pushes the thinking toward adversarial analysis rather than just selecting the best-case path.
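If you use these patterns repeatedly, it can help to assemble them in one place. A minimal sketch: `build_reasoning_prompt` and its `task` labels ("math", "planning") are hypothetical, and the instruction sentences are taken from the patterns above.

```python
def build_reasoning_prompt(facts: str, question: str, task: str = "general") -> str:
    """Assemble a prompt from the patterns above.

    The facts are stated without embedded assumptions, then the question.
    task="math" adds a verification pass; task="planning" adds a
    failure-mode review. Every prompt ends with a confidence check.
    """
    parts = [
        f"Here's the situation:\n{facts.strip()}",
        f"Question: {question.strip()}",
        "Think through this carefully before giving your final answer.",
    ]
    if task == "math":
        parts.append("After solving, verify your answer is correct by "
                     "checking it against the original problem.")
    elif task == "planning":
        parts.append("Before recommending, consider what could go wrong "
                     "with each option.")
    parts.append("How confident are you in this answer, and what would "
                 "change your recommendation?")
    return "\n\n".join(parts)


prompt = build_reasoning_prompt(
    "A snail is at the bottom of a 30-foot well. Each day it climbs 3 feet. "
    "Each night it slides back 2 feet.",
    "How many days does it take to reach the top?",
    task="math",
)
print(prompt)
```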

Tuning the budget

There's no universal right answer for budget_tokens. A rough heuristic:

1,024–2,000: Simple logic problems, short proofs, quick tradeoff analysis
5,000–10,000: Complex code debugging, multi-step math, design decisions with several competing factors
15,000–32,000: Very hard reasoning, multi-document synthesis, problems that require backtracking significantly

Start at 5,000 for most production use cases and monitor token usage in your responses (usage.output_tokens reports thinking and answer tokens together). If Claude is consistently using less than half the budget, reduce it. If the answers still feel shallow, increase it, though on most problems you'll likely hit diminishing returns somewhere around 15,000 tokens.
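The tuning rule above can be written down so it's applied consistently. This is a sketch under an assumption: `observed_thinking` is your own per-call estimate of thinking tokens (for example, output_tokens minus the tokens in the visible answer), since the usage field reports both together. The 1.5x headroom and the 32,000 cap are illustrative choices, not documented values.

```python
from statistics import median

MIN_BUDGET = 1_024   # documented minimum for budget_tokens
MAX_BUDGET = 32_000  # top of the heuristic ranges above (illustrative cap)


def next_budget(current: int, observed_thinking: list[int],
                answers_feel_shallow: bool = False) -> int:
    """Rule of thumb: grow when answers are shallow, shrink when Claude
    typically uses under half the budget, otherwise keep it."""
    if answers_feel_shallow:
        return min(MAX_BUDGET, current * 2)
    if observed_thinking and median(observed_thinking) < current / 2:
        # Keep ~50% headroom above typical usage, never below the minimum
        return max(MIN_BUDGET, int(median(observed_thinking) * 1.5))
    return current


print(next_budget(5_000, [1_800, 2_100, 1_500]))  # 2700
```

Using the median rather than the mean keeps one unusually hard request from inflating the budget for everything else.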

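One pattern that works well: set a moderate budget (8,000) and add "If you need more reasoning space, say so in your answer and I'll rerun with a higher budget." This gives Claude a way to flag when it needs more room, though it won't always recognize that on its own. A response that ends with stop_reason "max_tokens" is a more reliable signal that it ran out of space.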
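The rerun can also be automated by checking stop_reason. In this sketch, `ask_with_escalation` and the budget ladder are hypothetical; `client` is expected to be an `anthropic.Anthropic()` instance, and the only response field it relies on is `stop_reason`, which is "max_tokens" when generation was cut off at the max_tokens limit.

```python
def ask_with_escalation(client, prompt: str, model: str = "claude-sonnet-4-6",
                        budgets=(8_000, 16_000), answer_tokens: int = 4_000):
    """Try each thinking budget in turn, stopping at the first response
    that wasn't truncated at max_tokens. Returns the last response."""
    response = None
    for budget in budgets:
        response = client.messages.create(
            model=model,
            max_tokens=budget + answer_tokens,  # keeps budget_tokens < max_tokens
            thinking={"type": "enabled", "budget_tokens": budget},
            messages=[{"role": "user", "content": prompt}],
        )
        if response.stop_reason != "max_tokens":
            break  # finished normally; no need for a bigger budget
    return response
```

Because every rerun pays for a full new round of thinking, keep the ladder short; two or three steps is usually plenty.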

Streaming and displaying thinking

You can stream extended thinking with stream=True on messages.create, or with the Python SDK's messages.stream() helper. The thinking block streams first (as thinking_delta events), then the text block (as text_delta events). This is useful for showing progress in a UI: even if you don't display the raw thinking, you can show a "Claude is reasoning..." indicator while the thinking stream runs.

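import anthropic

client = anthropic.Anthropic()
your_prompt = "Your prompt here"  # placeholder: replace with your own prompt

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": your_prompt}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            # "thinking" or "text": a good moment to swap UI indicators
            print(f"\n[{event.content_block.type} block started]")
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                pass  # raw reasoning in event.delta.thinking; hide it or show a spinner
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

    final_message = stream.get_final_message()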

Whether to show the thinking block to users is a product decision. The raw thinking is often messy — Claude explores dead ends, changes its mind, writes things that look wrong in isolation. Showing it can build trust ("look how carefully Claude reasoned through this") or undermine it ("Claude considered the wrong approach for a while"). Most production apps hide the thinking and just show the final answer.

If you do surface it, consider showing a condensed summary rather than the raw text. The thinking block can be thousands of tokens of exploratory reasoning that doesn't read well as prose.

Combining with context engineering

Extended thinking and context engineering compound each other. When you give Claude a well-structured context — relevant documents, clear problem framing, explicit constraints — the thinking budget goes toward reasoning rather than extracting and organizing information.

The worst use of extended thinking is a vague prompt with a huge budget. Claude will spend tokens figuring out what you're even asking. The best use is a precise, complete problem statement with all relevant context attached — then a generous thinking budget to actually solve it.

For document-heavy tasks, load your source material, structure it clearly (XML tags work well here), then enable extended thinking. The combination of rich context and extended reasoning is where you see outputs that look qualitatively different from standard API calls.
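One way to structure that context, sketched below, follows the documents/document/source/document_content tag layout that Anthropic's long-context prompting tips use, with the task placed after the documents. The function name, file names, and task text are hypothetical. Content is inserted verbatim, so a document that itself contains these tags could confuse the structure.

```python
def build_context_prompt(documents: dict[str, str], task: str) -> str:
    """Wrap each source document in XML tags, then state the task last."""
    doc_parts = []
    for i, (source, content) in enumerate(documents.items(), start=1):
        doc_parts.append(
            f'<document index="{i}">\n'
            f"<source>{source}</source>\n"
            f"<document_content>\n{content}\n</document_content>\n"
            f"</document>"
        )
    return ("<documents>\n" + "\n".join(doc_parts) + "\n</documents>\n\n"
            f"<task>\n{task}\n</task>")


prompt = build_context_prompt(
    {"requirements.md": "...", "current_schema.sql": "..."},  # hypothetical files
    "Design a database schema that meets these requirements. "
    "Before recommending, consider what could go wrong with each option.",
)
print(prompt)
```

The resulting string goes in as the user message of a request with thinking enabled, so the budget is spent on the design problem rather than on working out where each document begins and ends.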

When to use extended thinking vs other approaches

Use it when:

The problem has a verifiably correct answer (math, logic, code) and you need that answer to be right
You're processing tasks async or in batch where latency doesn't matter
The downstream cost of a wrong answer exceeds the cost of extra thinking tokens
You need Claude to catch its own mistakes before you see the output

Don't use it when:

Users are waiting in real-time
The task is mostly retrieval or pattern-matching, not reasoning
You're running at high volume with cost constraints
The problem is genuinely ambiguous and no amount of reasoning produces a "correct" answer
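In an application, these rules often end up as a small routing step that decides per request whether to enable thinking at all. A sketch: the task-type labels, the `request_options` name, and the default budget are hypothetical and would come from your own pipeline.

```python
# Hypothetical labels for reasoning-heavy work, per the "use it when" list
REASONING_TASKS = {"math", "logic", "debugging", "planning", "decision"}


def request_options(task_type: str, realtime: bool, budget: int = 5_000,
                    answer_tokens: int = 4_000) -> dict:
    """Return extra kwargs for client.messages.create.

    Thinking is enabled only for reasoning-heavy tasks that aren't serving
    a user in real time; everything else gets a plain, cheaper call.
    """
    if realtime or task_type not in REASONING_TASKS:
        return {"max_tokens": answer_tokens}
    return {
        "max_tokens": budget + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }


print(request_options("classification", realtime=False))  # {'max_tokens': 4000}
```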

Extended thinking isn't a general upgrade — it's a targeted tool. Used on the right problems with well-structured prompts, it's the difference between Claude guessing and Claude actually working through the answer. That gap is worth understanding and exploiting deliberately.

Related articles

Claude Max Plan — What You Get and Whether It's Worth It

50 Best AI Prompts for Claude That Actually Work (2026)

Claude API vs OpenAI API — Developer Comparison Guide (2026)