Few-shot prompting is one of the most reliable techniques in prompt engineering. Instead of describing what you want, you show it — through examples embedded directly in your prompt.
The model pattern-matches against your examples and produces output in the same style, format, and tone.
Zero-Shot vs One-Shot vs Few-Shot
Zero-shot: No examples. Just a description of the task.
Classify the sentiment of this review as Positive, Negative, or Neutral.
Review: "The product broke after two days."
One-shot: One example before the actual task.
Classify sentiment as Positive, Negative, or Neutral.
Review: "Fast shipping and great quality!" → Positive
Review: "The product broke after two days." →
Few-shot: Multiple examples (typically 2-5).
Classify sentiment as Positive, Negative, or Neutral.
Review: "Fast shipping and great quality!" → Positive
Review: "It's okay, nothing special." → Neutral
Review: "Completely stopped working after a week." → Negative
Review: "The product broke after two days." →
Few-shot almost always outperforms zero-shot for structured tasks.
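The pattern above can be assembled programmatically. A minimal sketch (the function name is illustrative, not from any library):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: the instruction, labeled examples,
    then the unlabeled query ending with the same arrow cue."""
    lines = [instruction]
    for text, label in examples:
        lines.append(f'Review: "{text}" → {label}')
    # Final line uses the identical format but leaves the answer open
    lines.append(f'Review: "{query}" →')
    return "\n".join(lines)

examples = [
    ("Fast shipping and great quality!", "Positive"),
    ("It's okay, nothing special.", "Neutral"),
    ("Completely stopped working after a week.", "Negative"),
]
prompt = build_few_shot_prompt(
    "Classify sentiment as Positive, Negative, or Neutral.",
    examples,
    "The product broke after two days.",
)
```

Because every example line shares one template, the model's continuation is strongly constrained to a single-word label.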
Why Few-Shot Works
When you provide examples, you accomplish several things at once:
- Define the output format — The model sees exactly how answers should be structured
- Calibrate tone and style — Your writing style carries through
- Reduce ambiguity — A description of a task can be misinterpreted; an example cannot
- Anchor edge cases — Examples show how to handle unusual inputs
The Critical Insight: Format > Accuracy
This is the key research finding on few-shot prompting that most people get wrong:
The format consistency of your examples matters more than whether the examples are correct.
In studies of in-context learning (notably Min et al., 2022), models prompted with randomly labeled examples (wrong answers, but consistent formatting) performed nearly as well as models given correct labels. The model was learning how to respond, not what the right answer is.
This means:
- Focus on making your examples look exactly like what you want the output to look like
- Don't stress if your example answers aren't perfect
- Consistency across examples is more important than individual accuracy
Practical Few-Shot Templates
Text Classification
Classify each customer message into one of: [Billing], [Technical], [Shipping], [Other]
Message: "I was charged twice for my subscription" → [Billing]
Message: "The app crashes every time I open it" → [Technical]
Message: "My order hasn't arrived after 2 weeks" → [Shipping]
Message: "Do you offer student discounts?" → [Other]
Message: "I can't log into my account" →
Tone Transformation
Rewrite each sentence to be more concise and direct.
Original: "We wanted to reach out to let you know that your order has been successfully processed and will be shipped soon."
Rewritten: "Your order is confirmed and shipping soon."
Original: "I was wondering if it might be possible to schedule a meeting at some point next week if you happen to be available."
Rewritten: "Can we meet next week? I'm flexible on timing."
Original: "Due to the fact that there were a number of unforeseen complications that arose during the development process, we have been unable to meet the originally agreed-upon deadline."
Rewritten:
Structured Data Extraction
Extract the key information from each job posting into structured format.
Job posting: "We're looking for a Senior React Developer with 5+ years experience. Remote OK. Salary: $120k-150k. Must know TypeScript."
Output:
- Role: Senior React Developer
- Experience: 5+ years
- Remote: Yes
- Salary: $120k-150k
- Key skills: React, TypeScript
Job posting: "Junior Python engineer needed for our London office. 1-2 years experience. Competitive salary. Flask and Django preferred."
Output:
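One practical payoff of a consistent output format: the model's response can be parsed mechanically. A minimal sketch, assuming the model reproduces the bullet format shown above (the helper name is illustrative):

```python
def parse_extraction(output):
    """Parse '- Key: value' bullet lines into a dict.
    Assumes the consistent format demonstrated in the examples."""
    result = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("- ") and ":" in line:
            key, _, value = line[2:].partition(":")
            result[key.strip()] = value.strip()
    return result

sample = """Output:
- Role: Senior React Developer
- Experience: 5+ years
- Remote: Yes
- Salary: $120k-150k
- Key skills: React, TypeScript"""

fields = parse_extraction(sample)
```

If the examples vary in format, this kind of downstream parsing breaks, which is another reason formatting consistency matters more than label accuracy.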
How Many Examples?
The research-backed answer: 2-5 examples is the sweet spot for most tasks.
- 1 example: Better than zero-shot, but limited
- 2-3 examples: Usually sufficient for simple tasks
- 4-5 examples: Better for complex formatting or nuanced tone
- 6+ examples: Diminishing returns; increases token cost and can confuse the model
For tasks with multiple categories or edge cases, aim for at least one example per category.
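The per-category rule can be enforced when selecting examples from a labeled pool. A minimal sketch (function name and sample pool are illustrative):

```python
def pick_examples(pool, max_per_category=1):
    """Select up to max_per_category examples per label, so every
    category present in the pool is represented at least once."""
    by_label = {}
    for text, label in pool:
        by_label.setdefault(label, [])
        if len(by_label[label]) < max_per_category:
            by_label[label].append((text, label))
    return [ex for group in by_label.values() for ex in group]

pool = [
    ("I was charged twice for my subscription", "[Billing]"),
    ("The app crashes every time I open it", "[Technical]"),
    ("My order hasn't arrived after 2 weeks", "[Shipping]"),
    ("Refund still not processed", "[Billing]"),
    ("Do you offer student discounts?", "[Other]"),
]
examples = pick_examples(pool)  # one example per category
```

This keeps the prompt within the 2-5 example sweet spot while guaranteeing category coverage.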
Common Mistakes
Inconsistent formatting: If your first example uses bullet points and the second uses a paragraph, the model gets confused about what format to use.
Too few examples for complex tasks: One example of a 5-category classification task isn't enough — include at least one per category.
Examples that don't match your actual input: If your examples are polished and your real inputs are messy, the model may not generalize well.
Skipping the final prompt: After your examples, always clearly signal what input the model should now process.
Key Takeaway
Few-shot prompting is your most reliable tool for controlling output format and style. When you need consistent, structured outputs — classification, extraction, transformation — provide 2-5 examples before your actual request. Focus on formatting consistency over example accuracy.
Next: Learn about XML Tags & Delimiters — a structural technique that makes complex prompts dramatically clearer.