When I started building AI agents, I kept running into the same frustration: the agent would take a reasonable-sounding approach, hit an unexpected obstacle, and get stuck — or worse, confidently proceed down a wrong path.
What I eventually learned is that different agent patterns exist for a reason. Each one handles a different kind of uncertainty. Knowing which pattern to reach for, and why, is most of what separates agents that work reliably from agents that work sometimes.
Pattern 1: ReAct (Reasoning + Acting)
When to use: Dynamic tasks where you don't know the right sequence of actions upfront. The agent needs to figure out what to do as it discovers information.
How it works
The agent alternates between Thought, Action, and Observation in a loop:
Thought: What do I need to find out to answer this?
Action: search("population of the 5 largest EU countries")
Observation: Germany 84M, France 68M, Italy 60M, Spain 47M, Poland 38M
Thought: Now I need to add these up.
Action: calculate(84 + 68 + 60 + 47 + 38)
Observation: 297
Thought: I have the total. I can answer now.
Final Answer: The 5 largest EU countries have a combined population of approximately 297 million.
The key feature: each observation informs the next thought. If the search had returned incomplete results, the next thought would have adjusted — "I only got 3 countries, I need to search again for the other two."
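The loop above can be sketched in a few lines. This is a minimal illustration of the control flow only, with stubbed `search` and `calculate` tools and a scripted policy standing in for the LLM's Thought step; a real agent would generate each thought and action from the model.

```python
# Minimal ReAct-style control loop. The tools are stubs returning
# canned data so the Thought -> Action -> Observation shape is visible.

def search(query):
    # Stubbed search tool: populations in millions.
    return {"Germany": 84, "France": 68, "Italy": 60, "Spain": 47, "Poland": 38}

def calculate(values):
    # Stubbed calculator tool.
    return sum(values)

def react_loop(task, max_steps=5):
    observations = []
    for step in range(max_steps):  # step limit guards against endless looping
        if not observations:
            # Thought: I need raw data first -> Action: search
            observations.append(search(task))
        else:
            # Thought: I have the data, now I need the total -> Action: calculate
            total = calculate(observations[0].values())
            return f"Combined population: ~{total} million"
    return "No answer within step budget"  # forced conclusion

print(react_loop("population of the 5 largest EU countries"))
```

The `max_steps` cap matters in practice: it is the simplest defense against the looping failure mode described below.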
When ReAct works well
- Research tasks where you don't know what information exists
- Tasks where early results change what you search for next
- Multi-hop reasoning ("Find X, then use X to find Y")
- Debugging and investigation tasks
When ReAct struggles
- Open-ended tasks with vague success criteria, where the model can loop without making progress
- Tasks where the plan is known upfront and flexible execution isn't needed (wastes steps)
- Very long reasoning chains (the context fills up with Thought/Action/Observation traces)
Pattern 2: Plan-and-Execute
When to use: Tasks with a predictable structure where you can decompose the work upfront and potentially execute steps in parallel.
How it works
Phase 1 — The planner LLM creates a detailed execution plan:
Task: Write a competitive analysis report for our product.
Plan:
1. Research competitor A: features, pricing, reviews
2. Research competitor B: features, pricing, reviews
3. Research competitor C: features, pricing, reviews
4. Analyze our product's strengths vs. each competitor
5. Identify gaps and opportunities
6. Write the executive summary
7. Compile the full report
Phase 2 — An executor runs each step (often in parallel where possible):
[Parallel execution]
Step 1: search("competitor A features pricing")
Step 2: search("competitor B features pricing")
Step 3: search("competitor C features pricing")
[Sequential execution — waits for steps 1-3]
Step 4: analyze(data from steps 1-3)
Step 5: identify_gaps(analysis from step 4)
...
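The two phases above can be sketched with a thread pool for the independent steps. This is a structural sketch, not a full implementation: the `search` and `analyze` functions are stubs where a real executor would call an LLM or external tool per step.

```python
# Sketch of the plan/execute split: independent research steps run in
# parallel, then a dependent step waits for all of them.
from concurrent.futures import ThreadPoolExecutor

def search(competitor):
    return f"{competitor}: features, pricing"  # stubbed research result

def analyze(results):
    return f"analysis of {len(results)} competitors"

plan = ["competitor A", "competitor B", "competitor C"]

# Phase 2a - parallel execution of the independent steps (1-3)
with ThreadPoolExecutor() as pool:
    research = list(pool.map(search, plan))

# Phase 2b - sequential steps that depend on the parallel results (4+)
analysis = analyze(research)
print(analysis)
```

The design point: the planner's output doubles as the dependency graph, so anything without a dependency edge between steps can fan out in parallel.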
When Plan-and-Execute works well
- Report generation requiring multiple independent research threads
- Data collection tasks that can be parallelized
- Tasks where the steps are known but the data isn't
- When you want explicit human review of the plan before execution
When Plan-and-Execute struggles
- Tasks where the plan needs to change based on what's discovered (ReAct handles this better)
- Tasks where the plan itself is the hard part (you don't know the steps upfront)
- When the planner makes wrong assumptions that cascade through execution
A common fix: Add a "replan" step. After executing a few steps, check whether the remaining plan still makes sense given what's been discovered. If not, replan before continuing.
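That replan checkpoint can be sketched as a loop that pauses every few steps. Both `plan_still_valid` and `replan` here are hypothetical stand-ins: in a real agent each would be an LLM call that compares the remaining plan against what the executed steps actually found.

```python
# Sketch of a replan checkpoint inside the executor loop.

def plan_still_valid(remaining_steps, findings):
    # Hypothetical check; a real agent would ask an LLM whether the
    # remaining plan still makes sense given the findings so far.
    return "contradiction" not in findings

def replan(task, findings):
    # Hypothetical planner call producing a revised remainder.
    return [f"revised step for {task}"]

def execute_with_replanning(task, plan, checkpoint_every=3):
    findings = ""
    i = 0
    while i < len(plan):
        findings += f"result of {plan[i]}; "  # execute one step (stubbed)
        i += 1
        # Checkpoint: after every few steps, re-check the remaining plan.
        if i % checkpoint_every == 0 and not plan_still_valid(plan[i:], findings):
            plan = plan[:i] + replan(task, findings)
    return findings
```

When the checkpoint never fires, this degrades gracefully to plain Plan-and-Execute; when it does fire, only the unexecuted tail of the plan is replaced.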
Pattern 3: Reflexion (Self-Evaluation + Revision)
When to use: Tasks where quality matters more than speed, and where you can define what "good" looks like.
How it works
The agent generates an output, evaluates it, writes a reflection on what went wrong, and revises:
Step 1 — Generate:
[Agent writes initial answer/code/document]
Step 2 — Evaluate:
Does this meet the requirements?
- Requirement A: ✓ Pass
- Requirement B: ✗ Fail — I didn't handle the edge case where the list is empty
- Requirement C: ✗ Fail — The output format is wrong
Step 3 — Reflect:
"My solution fails for empty inputs because I access index 0 without checking length.
My output format uses bullet points when the requirement asked for a JSON object."
Step 4 — Revise:
[Agent writes improved answer incorporating the reflection]
Step 5 — Evaluate again:
- Requirement A: ✓ Pass
- Requirement B: ✓ Pass (fixed empty list check)
- Requirement C: ✓ Pass (fixed format)
Done.
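The generate/evaluate/reflect/revise cycle can be sketched with a unit-test-style evaluator. The "model" is faked here: a buggy first draft and a fixed second draft stand in for successive LLM generations, so only the loop's mechanics are real.

```python
# Sketch of a Reflexion loop driven by an executable evaluator.

def first_of(items):
    # Draft 1: crashes on empty input (the Requirement B failure).
    return items[0]

def first_of_fixed(items):
    # Draft 2: handles the empty case.
    return items[0] if items else None

def evaluate(fn):
    # Step 2: return a list of failure reflections; empty means all pass.
    failures = []
    if fn([3, 1]) != 3:
        failures.append("wrong answer on a non-empty list")
    try:
        fn([])
    except IndexError:
        failures.append("accesses index 0 without checking length")
    return failures

def reflexion(drafts, max_rounds=3):
    for fn in drafts[:max_rounds]:          # bounded revision rounds
        reflections = evaluate(fn)
        if not reflections:                 # every requirement passes
            return fn
        # Step 3: in a real agent, `reflections` would be fed back into
        # the prompt that generates the next draft.
    return None

best = reflexion([first_of, first_of_fixed])
```

Note the `max_rounds` bound: without it, fuzzy evaluation criteria can keep the loop revising forever.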
When Reflexion works well
- Code generation (unit tests provide clear pass/fail signals)
- Structured document creation (evaluate against a rubric)
- Reasoning tasks where you can check the logic
- Any output where "first draft" quality is consistently insufficient
The key ingredient: clear evaluation criteria
Reflexion requires evaluating whether the output meets requirements. The easier the evaluation, the more effective Reflexion is.
| Easy to evaluate | Hard to evaluate |
|---|---|
| Code that passes unit tests | "Is this writing good?" |
| JSON that matches a schema | "Is this advice sound?" |
| Checklist items | "Is this creative enough?" |
When evaluation is fuzzy, Reflexion can loop without improving — the agent doesn't know what "better" means.
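A concrete "easy to evaluate" check from the left column: does the output parse as JSON and contain the required keys? Binary signals like this give Reflexion a clear target to revise toward. The key names here are illustrative, not from any particular schema.

```python
# A cheap, unambiguous evaluator: valid JSON with the required keys.
import json

def meets_schema(text, required_keys=frozenset({"summary", "score"})):
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False  # not even valid JSON
    return isinstance(obj, dict) and required_keys <= obj.keys()

print(meets_schema('{"summary": "ok", "score": 7}'))  # True
print(meets_schema("just some bullet points"))        # False
```

Contrast this with "is this writing good?", where no comparable pass/fail function exists, and the reflection step has nothing solid to push against.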
Combining Patterns: The Practical Reality
Most real-world agents use combinations:
For a complex research task:
Plan-and-Execute: Break the task into research threads
↓ (for each thread)
ReAct: Adaptively research each thread, adjusting based on what's found
↓ (for outputs)
Reflexion: Check outputs against quality criteria, revise if needed
For code generation:
ReAct: Understand requirements, explore the codebase, identify approach
↓
Plan-and-Execute: Plan the implementation (functions, modules, tests)
↓
Reflexion: Generate code, run tests, evaluate failures, revise until passing
For customer support:
ReAct: Look up customer data, check account status, find relevant policies
↓
(no Reflexion — just return the grounded answer)
Quick Decision Guide
| Situation | Pattern to try |
|---|---|
| Don't know the right steps upfront | ReAct |
| Steps are known; want parallelism | Plan-and-Execute |
| Need to improve output quality | Reflexion |
| Complex task requiring multiple capabilities | Combine all three |
| Simple task needing one or two tools | Basic ReAct (no need for complex orchestration) |
The Underlying Lesson
These patterns exist because different problems have different failure modes:
- ReAct fails when the model loops without progress — add a step limit and a forced-conclusion trigger
- Plan-and-Execute fails when assumptions in the plan don't hold — add replan checkpoints
- Reflexion fails when evaluation criteria are vague — write explicit rubrics before you start
Knowing the failure modes is as useful as knowing the patterns. When your agent gets stuck, the pattern's typical failure mode is usually where to look first.
For a deeper dive into how each of these patterns is implemented at the prompt level, the AI Agents track on MasterPrompting.net covers ReAct, context engineering, and evaluating agents with code examples throughout.