Few-shot prompting is one of the most reliable techniques in prompt engineering. Instead of describing what you want, you show it — through examples embedded directly in your prompt.
The model pattern-matches against your examples and produces output in the same style, format, and tone.
Zero-Shot vs One-Shot vs Few-Shot
Zero-shot: No examples. Just a description of the task.
Classify the sentiment of this review as Positive, Negative, or Neutral.
Review: "The product broke after two days."
One-shot: One example before the actual task.
Classify sentiment as Positive, Negative, or Neutral.
Review: "Fast shipping and great quality!" → Positive
Review: "The product broke after two days." →
Few-shot: Multiple examples (typically 2-5).
Classify sentiment as Positive, Negative, or Neutral.
Review: "Fast shipping and great quality!" → Positive
Review: "It's okay, nothing special." → Neutral
Review: "Completely stopped working after a week." → Negative
Review: "The product broke after two days." →
Few-shot almost always outperforms zero-shot for structured tasks.
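The pattern above can be assembled programmatically. A minimal sketch (the function name is illustrative, not from any library):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: the instruction, labeled examples,
    then the unlabeled query ending with the same arrow cue."""
    lines = [instruction]
    for text, label in examples:
        lines.append(f'Review: "{text}" → {label}')
    # Final line uses the identical format but leaves the answer open
    lines.append(f'Review: "{query}" →')
    return "\n".join(lines)

examples = [
    ("Fast shipping and great quality!", "Positive"),
    ("It's okay, nothing special.", "Neutral"),
    ("Completely stopped working after a week.", "Negative"),
]
prompt = build_few_shot_prompt(
    "Classify sentiment as Positive, Negative, or Neutral.",
    examples,
    "The product broke after two days.",
)
```

Because every example line shares one template, the model's continuation is strongly constrained to a single-word label.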
Why Few-Shot Works
When you provide examples, you accomplish several things at once:
- Define the output format — The model sees exactly how answers should be structured
- Calibrate tone and style — Your writing style carries through
- Reduce ambiguity — A description of a task can be misinterpreted; an example cannot
- Anchor edge cases — Examples show how to handle unusual inputs
The Critical Insight: Format > Accuracy
This is the key research finding on few-shot prompting that most people get wrong:
The format consistency of your examples matters more than whether the examples are correct.
In studies of in-context learning (notably Min et al., 2022), models prompted with randomly labeled examples (wrong answers, but consistent formatting) performed nearly as well as models given correct labels. The model was learning how to respond, not what the right answer is.
This means:
- Focus on making your examples look exactly like what you want the output to look like
- Don't stress if your example answers aren't perfect
- Consistency across examples is more important than individual accuracy
Practical Few-Shot Templates
Text Classification
Classify each customer message into one of: [Billing], [Technical], [Shipping], [Other]
Message: "I was charged twice for my subscription" → [Billing]
Message: "The app crashes every time I open it" → [Technical]
Message: "My order hasn't arrived after 2 weeks" → [Shipping]
Message: "Do you offer student discounts?" → [Other]
Message: "I can't log into my account" →
Tone Transformation
Rewrite each sentence to be more concise and direct.
Original: "We wanted to reach out to let you know that your order has been successfully processed and will be shipped soon."
Rewritten: "Your order is confirmed and shipping soon."
Original: "I was wondering if it might be possible to schedule a meeting at some point next week if you happen to be available."
Rewritten: "Can we meet next week? I'm flexible on timing."
Original: "Due to the fact that there were a number of unforeseen complications that arose during the development process, we have been unable to meet the originally agreed-upon deadline."
Rewritten:
Structured Data Extraction
Extract the key information from each job posting into structured format.
Job posting: "We're looking for a Senior React Developer with 5+ years experience. Remote OK. Salary: $120k-150k. Must know TypeScript."
Output:
- Role: Senior React Developer
- Experience: 5+ years
- Remote: Yes
- Salary: $120k-150k
- Key skills: React, TypeScript
Job posting: "Junior Python engineer needed for our London office. 1-2 years experience. Competitive salary. Flask and Django preferred."
Output:
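One practical payoff of a consistent output format: the model's response can be parsed mechanically. A minimal sketch, assuming the model reproduces the bullet format shown above (the helper name is illustrative):

```python
def parse_extraction(output):
    """Parse '- Key: value' bullet lines into a dict.
    Assumes the consistent format demonstrated in the examples."""
    result = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("- ") and ":" in line:
            key, _, value = line[2:].partition(":")
            result[key.strip()] = value.strip()
    return result

sample = """Output:
- Role: Senior React Developer
- Experience: 5+ years
- Remote: Yes
- Salary: $120k-150k
- Key skills: React, TypeScript"""

fields = parse_extraction(sample)
```

If the examples vary in format, this kind of downstream parsing breaks, which is another reason formatting consistency matters more than label accuracy.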
How Many Examples?
The research-backed answer: 2-5 examples is the sweet spot for most tasks.
- 1 example: Better than zero-shot, but limited
- 2-3 examples: Usually sufficient for simple tasks
- 4-5 examples: Better for complex formatting or nuanced tone
- 6+ examples: Diminishing returns; increases token cost and can confuse the model
For tasks with multiple categories or edge cases, aim for at least one example per category.
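The per-category rule can be enforced when selecting examples from a labeled pool. A minimal sketch (function name and sample pool are illustrative):

```python
def pick_examples(pool, max_per_category=1):
    """Select up to max_per_category examples per label, so every
    category present in the pool is represented at least once."""
    by_label = {}
    for text, label in pool:
        by_label.setdefault(label, [])
        if len(by_label[label]) < max_per_category:
            by_label[label].append((text, label))
    return [ex for group in by_label.values() for ex in group]

pool = [
    ("I was charged twice for my subscription", "[Billing]"),
    ("The app crashes every time I open it", "[Technical]"),
    ("My order hasn't arrived after 2 weeks", "[Shipping]"),
    ("Refund still not processed", "[Billing]"),
    ("Do you offer student discounts?", "[Other]"),
]
examples = pick_examples(pool)  # one example per category
```

This keeps the prompt within the 2-5 example sweet spot while guaranteeing category coverage.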
Common Mistakes
Inconsistent formatting: If your first example uses bullet points and the second uses a paragraph, the model gets confused about what format to use.
Too few examples for complex tasks: One example of a 5-category classification task isn't enough — include at least one per category.
Examples that don't match your actual input: If your examples are polished and your real inputs are messy, the model may not generalize well.
Skipping the final prompt: After your examples, always clearly signal what input the model should now process.
Key Takeaway
Few-shot prompting is your most reliable tool for controlling output format and style. When you need consistent, structured outputs — classification, extraction, transformation — provide 2-5 examples before your actual request. Focus on formatting consistency over example accuracy.
Next: Learn about XML Tags & Delimiters — a structural technique that makes complex prompts dramatically clearer.