When I started building AI agents, I kept running into the same frustration: the agent would take a reasonable-sounding approach, hit an unexpected obstacle, and get stuck — or worse, confidently proceed down a wrong path.
What I eventually learned is that different agent patterns exist for a reason. Each one handles a different kind of uncertainty. Knowing which pattern to reach for, and why, is most of what separates agents that work reliably from agents that work sometimes.
Pattern 1: ReAct (Reasoning + Acting)
When to use: Dynamic tasks where you don't know the right sequence of actions upfront. The agent needs to figure out what to do as it discovers information.
How it works
The agent alternates between Thought, Action, and Observation in a loop:
Thought: What do I need to find out to answer this?
Action: search("population of the 5 largest EU countries")
Observation: Germany 84M, France 68M, Italy 60M, Spain 47M, Poland 38M
Thought: Now I need to add these up.
Action: calculate(84 + 68 + 60 + 47 + 38)
Observation: 297
Thought: I have the total. I can answer now.
Final Answer: The 5 largest EU countries have a combined population of approximately 297 million.
The key feature: each observation informs the next thought. If the search had returned incomplete results, the next thought would have adjusted — "I only got 3 countries, I need to search again for the other two."
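The loop above can be sketched in a few lines. This is a minimal illustration of the control flow only, with stubbed `search` and `calculate` tools and a scripted policy standing in for the LLM's Thought step; a real agent would generate each thought and action from the model.

```python
# Minimal ReAct-style control loop. The tools are stubs returning
# canned data so the Thought -> Action -> Observation shape is visible.

def search(query):
    # Stubbed search tool: populations in millions.
    return {"Germany": 84, "France": 68, "Italy": 60, "Spain": 47, "Poland": 38}

def calculate(values):
    # Stubbed calculator tool.
    return sum(values)

def react_loop(task, max_steps=5):
    observations = []
    for step in range(max_steps):  # step limit guards against endless looping
        if not observations:
            # Thought: I need raw data first -> Action: search
            observations.append(search(task))
        else:
            # Thought: I have the data, now I need the total -> Action: calculate
            total = calculate(observations[0].values())
            return f"Combined population: ~{total} million"
    return "No answer within step budget"  # forced conclusion

print(react_loop("population of the 5 largest EU countries"))
```

The `max_steps` cap matters in practice: it is the simplest defense against the looping failure mode described below.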
When ReAct works well
- Research tasks where you don't know what information exists
- Tasks where early results change what you search for next
- Multi-hop reasoning ("Find X, then use X to find Y")
- Debugging and investigation tasks
When ReAct struggles
- Open-ended tasks with vague success criteria, where the model can loop without making progress
- Tasks where the plan is known upfront and flexible execution isn't needed (wastes steps)
- Very long reasoning chains (the context fills up with Thought/Action/Observation traces)
Pattern 2: Plan-and-Execute
When to use: Tasks with a predictable structure where you can decompose the work upfront and potentially execute steps in parallel.
How it works
Phase 1 — The planner LLM creates a detailed execution plan:
Task: Write a competitive analysis report for our product.
Plan:
1. Research competitor A: features, pricing, reviews
2. Research competitor B: features, pricing, reviews
3. Research competitor C: features, pricing, reviews
4. Analyze our product's strengths vs. each competitor
5. Identify gaps and opportunities
6. Write the executive summary
7. Compile the full report
Phase 2 — An executor runs each step (often in parallel where possible):
[Parallel execution]
Step 1: search("competitor A features pricing")
Step 2: search("competitor B features pricing")
Step 3: search("competitor C features pricing")
[Sequential execution — waits for steps 1-3]
Step 4: analyze(data from steps 1-3)
Step 5: identify_gaps(analysis from step 4)
...
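The two phases above can be sketched with a thread pool for the independent steps. This is a structural sketch, not a full implementation: the `search` and `analyze` functions are stubs where a real executor would call an LLM or external tool per step.

```python
# Sketch of the plan/execute split: independent research steps run in
# parallel, then a dependent step waits for all of them.
from concurrent.futures import ThreadPoolExecutor

def search(competitor):
    return f"{competitor}: features, pricing"  # stubbed research result

def analyze(results):
    return f"analysis of {len(results)} competitors"

plan = ["competitor A", "competitor B", "competitor C"]

# Phase 2a - parallel execution of the independent steps (1-3)
with ThreadPoolExecutor() as pool:
    research = list(pool.map(search, plan))

# Phase 2b - sequential steps that depend on the parallel results (4+)
analysis = analyze(research)
print(analysis)
```

The design point: the planner's output doubles as the dependency graph, so anything without a dependency edge between steps can fan out in parallel.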
When Plan-and-Execute works well
- Report generation requiring multiple independent research threads
- Data collection tasks that can be parallelized
- Tasks where the steps are known but the data isn't
- When you want explicit human review of the plan before execution
When Plan-and-Execute struggles
- Tasks where the plan needs to change based on what's discovered (ReAct handles this better)
- Tasks where the plan itself is the hard part (you don't know the steps upfront)
- When the planner makes wrong assumptions that cascade through execution
A common fix: Add a "replan" step. After executing a few steps, check whether the remaining plan still makes sense given what's been discovered. If not, replan before continuing.
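That replan checkpoint can be sketched as a loop that pauses every few steps. Both `plan_still_valid` and `replan` here are hypothetical stand-ins: in a real agent each would be an LLM call that compares the remaining plan against what the executed steps actually found.

```python
# Sketch of a replan checkpoint inside the executor loop.

def plan_still_valid(remaining_steps, findings):
    # Hypothetical check; a real agent would ask an LLM whether the
    # remaining plan still makes sense given the findings so far.
    return "contradiction" not in findings

def replan(task, findings):
    # Hypothetical planner call producing a revised remainder.
    return [f"revised step for {task}"]

def execute_with_replanning(task, plan, checkpoint_every=3):
    findings = ""
    i = 0
    while i < len(plan):
        findings += f"result of {plan[i]}; "  # execute one step (stubbed)
        i += 1
        # Checkpoint: after every few steps, re-check the remaining plan.
        if i % checkpoint_every == 0 and not plan_still_valid(plan[i:], findings):
            plan = plan[:i] + replan(task, findings)
    return findings
```

When the checkpoint never fires, this degrades gracefully to plain Plan-and-Execute; when it does fire, only the unexecuted tail of the plan is replaced.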
Pattern 3: Reflexion (Self-Evaluation + Revision)
When to use: Tasks where quality matters more than speed, and where you can define what "good" looks like.
How it works
The agent generates an output, evaluates it, writes a reflection on what went wrong, and revises:
Step 1 — Generate:
[Agent writes initial answer/code/document]
Step 2 — Evaluate:
Does this meet the requirements?
- Requirement A: ✓ Pass
- Requirement B: ✗ Fail — I didn't handle the edge case where the list is empty
- Requirement C: ✗ Fail — The output format is wrong
Step 3 — Reflect:
"My solution fails for empty inputs because I access index 0 without checking length.
My output format uses bullet points when the requirement asked for a JSON object."
Step 4 — Revise:
[Agent writes improved answer incorporating the reflection]
Step 5 — Evaluate again:
- Requirement A: ✓ Pass
- Requirement B: ✓ Pass (fixed empty list check)
- Requirement C: ✓ Pass (fixed format)
Done.
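The generate/evaluate/reflect/revise cycle can be sketched with a unit-test-style evaluator. The "model" is faked here: a buggy first draft and a fixed second draft stand in for successive LLM generations, so only the loop's mechanics are real.

```python
# Sketch of a Reflexion loop driven by an executable evaluator.

def first_of(items):
    # Draft 1: crashes on empty input (the Requirement B failure).
    return items[0]

def first_of_fixed(items):
    # Draft 2: handles the empty case.
    return items[0] if items else None

def evaluate(fn):
    # Step 2: return a list of failure reflections; empty means all pass.
    failures = []
    if fn([3, 1]) != 3:
        failures.append("wrong answer on a non-empty list")
    try:
        fn([])
    except IndexError:
        failures.append("accesses index 0 without checking length")
    return failures

def reflexion(drafts, max_rounds=3):
    for fn in drafts[:max_rounds]:          # bounded revision rounds
        reflections = evaluate(fn)
        if not reflections:                 # every requirement passes
            return fn
        # Step 3: in a real agent, `reflections` would be fed back into
        # the prompt that generates the next draft.
    return None

best = reflexion([first_of, first_of_fixed])
```

Note the `max_rounds` bound: without it, fuzzy evaluation criteria can keep the loop revising forever.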
When Reflexion works well
- Code generation (unit tests provide clear pass/fail signals)
- Structured document creation (evaluate against a rubric)
- Reasoning tasks where you can check the logic
- Any output where "first draft" quality is consistently insufficient
The key ingredient: clear evaluation criteria
Reflexion requires evaluating whether the output meets requirements. The easier the evaluation, the more effective Reflexion is.
| Easy to evaluate | Hard to evaluate |
|---|---|
| Code that passes unit tests | "Is this writing good?" |
| JSON that matches a schema | "Is this advice sound?" |
| Checklist items | "Is this creative enough?" |
When evaluation is fuzzy, Reflexion can loop without improving — the agent doesn't know what "better" means.
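A concrete "easy to evaluate" check from the left column: does the output parse as JSON and contain the required keys? Binary signals like this give Reflexion a clear target to revise toward. The key names here are illustrative, not from any particular schema.

```python
# A cheap, unambiguous evaluator: valid JSON with the required keys.
import json

def meets_schema(text, required_keys=frozenset({"summary", "score"})):
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False  # not even valid JSON
    return isinstance(obj, dict) and required_keys <= obj.keys()

print(meets_schema('{"summary": "ok", "score": 7}'))  # True
print(meets_schema("just some bullet points"))        # False
```

Contrast this with "is this writing good?", where no comparable pass/fail function exists, and the reflection step has nothing solid to push against.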
Combining Patterns: The Practical Reality
Most real-world agents use combinations:
For a complex research task:
Plan-and-Execute: Break the task into research threads
↓ (for each thread)
ReAct: Adaptively research each thread, adjusting based on what's found
↓ (for outputs)
Reflexion: Check outputs against quality criteria, revise if needed
For code generation:
ReAct: Understand requirements, explore the codebase, identify approach
↓
Plan-and-Execute: Plan the implementation (functions, modules, tests)
↓
Reflexion: Generate code, run tests, evaluate failures, revise until passing
For customer support:
ReAct: Look up customer data, check account status, find relevant policies
↓
(no Reflexion — just return the grounded answer)
Quick Decision Guide
| Situation | Pattern to try |
|---|---|
| Don't know the right steps upfront | ReAct |
| Steps are known; want parallelism | Plan-and-Execute |
| Need to improve output quality | Reflexion |
| Complex task requiring multiple capabilities | Combine all three |
| Simple task needing one or two tools | Basic ReAct (no need for complex orchestration) |
The Underlying Lesson
These patterns exist because different problems have different failure modes:
- ReAct fails when the model loops without progress — add a step limit and a forced-conclusion trigger
- Plan-and-Execute fails when assumptions in the plan don't hold — add replan checkpoints
- Reflexion fails when evaluation criteria are vague — write explicit rubrics before you start
Knowing the failure modes is as useful as knowing the patterns. When your agent gets stuck, the pattern's typical failure mode is usually where to look first.
For a deeper dive into how each of these patterns is implemented at the prompt level, the AI Agents track on MasterPrompting.net covers ReAct, context engineering, and evaluating agents with code examples throughout.