What is the simplest way to add chain of thought to a prompt?

Add 'Think step by step.' at the end of your prompt. This single phrase triggers the model to generate intermediate reasoning steps before committing to an answer, which improves accuracy on any task requiring logic, analysis, or sequential thinking. Variations that work: 'Let's work through this carefully', 'Reason through this before giving your answer', 'Walk me through your thinking.'

Does chain of thought prompting work on all tasks?

CoT is most valuable for tasks requiring reasoning: math, multi-step logic, analysis, planning, code debugging, and decision-making. It adds little value for simple tasks like translation, direct lookups, or creative writing — and can even make creative tasks feel more mechanical. It also increases token count and latency, so skip it for fast, high-volume API calls where reasoning isn't required.

How much does chain of thought actually improve accuracy?

Research from Google Brain showed CoT prompting improved accuracy from roughly 18% to 57% on certain reasoning benchmarks — more than tripling performance with no model changes, just the prompt. The improvement is largest on tasks that require multi-step logic where the 'obvious' first answer is wrong. For straightforward tasks, the gain is minimal, but for complex reasoning, CoT is one of the highest-impact techniques available.

Chain of Thought Prompting: Make AI Reason Step by Step

Chain of Thought (CoT) prompting is one of the highest-impact techniques in prompt engineering. Adding just a few words to your prompt can dramatically improve accuracy on any task that requires reasoning, analysis, or sequential thinking.

The Core Idea

By default, an LLM jumps straight to an answer. Chain of Thought makes it work through the problem first — showing its reasoning before committing to a conclusion.

Without CoT:

Prompt: If a train travels 60mph for 2.5 hours then 80mph for 1.5 hours,
what's the total distance?

Response: 260 miles   ← often wrong

With CoT:

Prompt: If a train travels 60mph for 2.5 hours then 80mph for 1.5 hours,
what's the total distance? Think step by step.

Response:
Step 1: First leg = 60 × 2.5 = 150 miles
Step 2: Second leg = 80 × 1.5 = 120 miles
Step 3: Total = 150 + 120 = 270 miles
Answer: 270 miles   ← correct

The reasoning becomes visible, verifiable, and correct.

Why It Works

When a model generates intermediate reasoning steps, each step becomes context for the next token prediction. This means the model builds on correct intermediate conclusions rather than jumping to a pattern-matched (potentially wrong) answer.

Research from Google Brain showed that CoT prompting on certain tasks improved accuracy from ~18% to ~57% — more than tripling performance with no changes to the model, just the prompt.

Three Levels of Implementation

Level 1: The Simple Trigger

Add any of these phrases to your prompt:

Think step by step.
Let's work through this carefully.
Reason through this before giving your answer.
Walk me through your thinking.

What's the best way to structure a Series A fundraising pitch deck?
Think step by step.

This is the easiest CoT to implement and works surprisingly well.

Level 2: Guided CoT

Tell the model what steps to reason through. This is better when you know the structure of the problem.

You are a business analyst. Evaluate this startup idea using the following framework:

1. First, identify the target market and estimate its size
2. Then, analyze the main competitors and how this idea differentiates
3. Assess the key risks (market, technical, execution)
4. Finally, give an overall score from 1-10 with a one-sentence justification

Startup idea: An app that connects homeowners with vetted freelance contractors
for small home repair jobs, with upfront pricing.

Work through each step before giving your final score.

Level 3: Structured CoT with XML Tags

The most powerful form — separate reasoning from output using XML tags (especially effective with Claude):

You are a senior product manager reviewing a feature proposal.

<proposal>
Add a "dark mode" toggle to the mobile app. Estimated dev time: 3 weeks.
Expected to increase user satisfaction scores by 15%.
</proposal>

Think through this proposal in a <thinking> block — consider effort vs. impact,
user demand, technical risk, and opportunity cost. Then give your final
recommendation in a <recommendation> block with a clear yes/no and rationale.

The <thinking> block gives the model space to reason freely without that reasoning polluting the final output. The clean recommendation stands alone.

When to Use CoT

High value:

Math or calculations
Multi-step logic or reasoning
Analysis tasks (evaluate this, compare these, diagnose this)
Planning (what should we do and in what order?)
Code debugging (why is this wrong?)
Decision-making (should we do X?)

Low value / skip it:

Simple lookups or translations
Creative writing (CoT can make it too mechanical)
Tasks where the output format matters more than reasoning
Fast, high-volume API calls where latency and cost matter

CoT + Few-Shot = Maximum Power

Combining few-shot examples with chain of thought reasoning is even more powerful. Show the model examples that include the reasoning steps:

Classify the sentiment and explain your reasoning.

Review: "The battery life is incredible but the camera is mediocre."
Reasoning: The review mentions one strong positive (battery) and one negative (camera).
Mixed feedback with no strong overall lean.
Sentiment: Mixed

Review: "Absolute garbage. Broke after a week and customer service ignored me."
Reasoning: Strong negative language ("absolute garbage"), reports product failure,
and negative service experience. Clear negative sentiment.
Sentiment: Negative

Review: "Does what it says on the tin, nothing more."
Reasoning:

Key Takeaway

Chain of Thought is one of the most reliable, high-impact prompting techniques. It costs you a few extra words ("think step by step") and gains you dramatically better reasoning on complex tasks. Use Level 1 for quick tasks, Level 2 when you know the problem structure, and Level 3 (XML + thinking block) for the highest-stakes analysis.

Next: Learn Avoiding Hallucinations — how to keep AI models grounded in facts and prevent confident fabrication.