Chain of Thought (CoT) prompting is one of the highest-impact techniques in prompt engineering. Adding just a few words to your prompt can dramatically improve accuracy on any task that requires reasoning, analysis, or sequential thinking.
The Core Idea
By default, an LLM jumps straight to an answer. Chain of Thought makes it work through the problem first — showing its reasoning before committing to a conclusion.
Without CoT:
Prompt: If a train travels 60mph for 2.5 hours then 80mph for 1.5 hours,
what's the total distance?
Response: 260 miles ← often wrong
With CoT:
Prompt: If a train travels 60mph for 2.5 hours then 80mph for 1.5 hours,
what's the total distance? Think step by step.
Response:
Step 1: First leg = 60 × 2.5 = 150 miles
Step 2: Second leg = 80 × 1.5 = 120 miles
Step 3: Total = 150 + 120 = 270 miles
Answer: 270 miles ← correct
The reasoning becomes visible, verifiable, and correct.
Why It Works
When a model generates intermediate reasoning steps, each step becomes context for the next token prediction. This means the model builds on correct intermediate conclusions rather than jumping to a pattern-matched (potentially wrong) answer.
Research from Google Brain showed that CoT prompting on certain tasks improved accuracy from ~18% to ~57% — more than tripling performance with no changes to the model, just the prompt.
Three Levels of Implementation
Level 1: The Simple Trigger
Add any of these phrases to your prompt:
Think step by step.Let's work through this carefully.Reason through this before giving your answer.Walk me through your thinking.
What's the best way to structure a Series A fundraising pitch deck?
Think step by step.
This is the easiest CoT to implement and works surprisingly well.
Level 2: Guided CoT
Tell the model what steps to reason through. This is better when you know the structure of the problem.
You are a business analyst. Evaluate this startup idea using the following framework:
1. First, identify the target market and estimate its size
2. Then, analyze the main competitors and how this idea differentiates
3. Assess the key risks (market, technical, execution)
4. Finally, give an overall score from 1-10 with a one-sentence justification
Startup idea: An app that connects homeowners with vetted freelance contractors
for small home repair jobs, with upfront pricing.
Work through each step before giving your final score.
Level 3: Structured CoT with XML Tags
The most powerful form — separate reasoning from output using XML tags (especially effective with Claude):
You are a senior product manager reviewing a feature proposal.
<proposal>
Add a "dark mode" toggle to the mobile app. Estimated dev time: 3 weeks.
Expected to increase user satisfaction scores by 15%.
</proposal>
Think through this proposal in a <thinking> block — consider effort vs. impact,
user demand, technical risk, and opportunity cost. Then give your final
recommendation in a <recommendation> block with a clear yes/no and rationale.
The <thinking> block gives the model space to reason freely without that reasoning polluting the final output. The clean recommendation stands alone.
When to Use CoT
High value:
- Math or calculations
- Multi-step logic or reasoning
- Analysis tasks (evaluate this, compare these, diagnose this)
- Planning (what should we do and in what order?)
- Code debugging (why is this wrong?)
- Decision-making (should we do X?)
Low value / skip it:
- Simple lookups or translations
- Creative writing (CoT can make it too mechanical)
- Tasks where the output format matters more than reasoning
- Fast, high-volume API calls where latency and cost matter
CoT + Few-Shot = Maximum Power
Combining few-shot examples with chain of thought reasoning is even more powerful. Show the model examples that include the reasoning steps:
Classify the sentiment and explain your reasoning.
Review: "The battery life is incredible but the camera is mediocre."
Reasoning: The review mentions one strong positive (battery) and one negative (camera).
Mixed feedback with no strong overall lean.
Sentiment: Mixed
Review: "Absolute garbage. Broke after a week and customer service ignored me."
Reasoning: Strong negative language ("absolute garbage"), reports product failure,
and negative service experience. Clear negative sentiment.
Sentiment: Negative
Review: "Does what it says on the tin, nothing more."
Reasoning:
Key Takeaway
Chain of Thought is one of the most reliable, high-impact prompting techniques. It costs you a few extra words ("think step by step") and gains you dramatically better reasoning on complex tasks. Use Level 1 for quick tasks, Level 2 when you know the problem structure, and Level 3 (XML + thinking block) for the highest-stakes analysis.
Next: Learn Avoiding Hallucinations — how to keep AI models grounded in facts and prevent confident fabrication.