Anthropic's documentation for Claude Opus 4.6 includes a notable warning: the model has a "strong predilection for subagents" and may spawn them in situations where a simpler, direct approach would suffice. That observation cuts both ways. It means Opus 4.6 is genuinely good at decomposing complex goals into parallelisable workstreams — and it means you need to be deliberate about when you actually want that behaviour.
The planner/executor/critic triad is the pattern that makes this strength useful without letting it run wild. It gives you parallel execution where the task warrants it, quality gates before results get integrated, and a clean architecture you can debug when something goes wrong.
## Why multi-agent for complex tasks?
A single agent planning a complex task upfront has a fundamental problem: errors at step 3 compound through steps 4, 5, and 6. By the time you see a wrong output, you're unwinding multiple decisions. There's also no parallel execution — everything is sequential.
A multi-agent architecture addresses these problems directly:

- **Parallelisation:** A code review that checks security, performance, and style can run all three checks simultaneously. A research task covering three market segments doesn't need them to be sequential.
- **Isolation:** Each executor gets only the context it needs. This prevents earlier mistakes from polluting later reasoning.
- **Self-correction:** A critic role creates a quality gate before results get integrated. The executor doesn't decide if its output is good — a separate agent does.
- **Context window management:** For tasks that genuinely exceed one context window, parallel subagents each working on bounded chunks is the right architecture.
Use multi-agent when tasks can be parallelised, when multiple independent perspectives add value, or when the task spans more tokens than a single context can reliably handle. For simple, short tasks — don't add this complexity.
## The planner/executor/critic pattern
The flow works like this:
- **Planner** receives the goal, decomposes it into subtasks, and identifies which can run in parallel versus which must be sequential
- **Executors** (one or more, running in parallel where possible) carry out bounded subtasks with narrow context
- **Critic** reviews each executor output and issues a verdict: ACCEPT, or REJECT with specific feedback
- **Planner** (or a **Synthesiser**) integrates accepted outputs into the final result
Rejected outputs loop back to the executor with the critic's feedback. Typically you allow 1-2 retry attempts before escalating.
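Concretely, each subtask the planner emits can be modelled as a small record. The schema below is my own sketch, but it matches the JSON structure the planner prompt requests later in this post:

```python
from typing import TypedDict


class Subtask(TypedDict):
    id: str                  # stable handle, e.g. "security"
    description: str         # bounded instructions for the executor
    expected_output: str     # what the critic checks the output against
    dependencies: list[str]  # ids of subtasks whose outputs this one needs
    can_parallel: bool       # safe to run alongside other subtasks?
```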
### The Planner agent
The planner's job is decomposition and orchestration. It needs the full goal, constraints, and available tools.
Key instruction that prevents Opus 4.6 from over-spawning subagents:
```python
PLANNER_SYSTEM = """You are a planning agent. Your job is to decompose the goal into concrete subtasks and assign them to executor agents.

For each subtask, specify:
- What to do (specific and bounded)
- What the expected output looks like
- Dependencies (which other subtasks this depends on)
- Whether it can run in parallel with other subtasks

Use subagents when:
- Tasks can run in parallel without sharing context
- Tasks require isolated reasoning without contamination from other subtasks
- Tasks are large enough to benefit from dedicated context

Do NOT use subagents when:
- The task is simple and sequential
- Tasks need to share context closely (do them yourself in sequence)
- A single-file edit or minor lookup is involved
- The overhead of coordination exceeds the benefit of parallelism

Return your plan as a structured list of subtasks with their dependencies and parallel/sequential classification."""
```
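For a goal like "research three market segments and recommend one", a plan conforming to this prompt might look like the following. This is a hypothetical illustration, not actual model output:

```python
example_plan = {
    "subtasks": [
        {"id": "segment_a", "description": "Research market segment A ...",
         "expected_output": "500-word brief", "dependencies": [], "can_parallel": True},
        {"id": "segment_b", "description": "Research market segment B ...",
         "expected_output": "500-word brief", "dependencies": [], "can_parallel": True},
        {"id": "recommend", "description": "Compare the briefs and recommend one segment",
         "expected_output": "Ranked recommendation with rationale",
         "dependencies": ["segment_a", "segment_b"], "can_parallel": False},
    ]
}
```

The two research tasks fan out in parallel; the recommendation waits on both.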
### The Executor agent(s)
Executors get narrow scope and focused context. They should not ask clarifying questions — they make reasonable assumptions, note them, and complete the task.
```python
EXECUTOR_SYSTEM = """You are an executor agent. You have been assigned a specific subtask.

Your responsibilities:
1. Complete the assigned task precisely and completely
2. Stay within your assigned scope — do not expand to related work
3. Make reasonable assumptions when information is missing; state your assumptions clearly
4. Produce the specific output format requested
5. Do not ask clarifying questions — use your best judgment and note what you assumed

Your output should be directly usable by a reviewer. Include your assumptions as a brief note at the end."""
```
### The Critic agent
The critic is the most important role to get right. Vague feedback ("could be better") is useless. The critic must be specific.
```python
CRITIC_SYSTEM = """You are a quality reviewer. Your job is to evaluate executor output against the task specification.

Your verdict must be exactly one of:
- ACCEPT — the output meets the requirements and is ready for integration
- REJECT — the output has specific problems that must be fixed

If REJECT: explain precisely what is wrong. Do not say "needs improvement." Say:
- What specific requirement is not met
- What the executor should do differently
- What the corrected output should look like (if it's a small fix)

Be direct. Vague feedback wastes cycles. If the output is 90% correct but has one specific fixable problem, say exactly what that problem is.

Format your response as:
VERDICT: [ACCEPT/REJECT]
FEEDBACK: [Specific feedback if REJECT, brief confirmation of what's good if ACCEPT]"""
```
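A rejection under this format might look like the following. The content is hypothetical; what matters is the shape, which the orchestration code parses later:

```
VERDICT: REJECT
FEEDBACK: The brief is 220 words; the specification requires roughly 500. Expand the
competitive-landscape section with at least two named competitors, keeping the
existing structure otherwise.
```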
## Python implementation
Here's the full orchestration loop. Executors run in parallel via `asyncio.gather`. The planner and critic use Opus 4.6 — they need orchestration intelligence. Executors use Sonnet 4.6 — capable enough for bounded tasks, significantly cheaper.
```python
import asyncio
import json

from anthropic import Anthropic

client = Anthropic()


def planner_call(goal: str, context: str = "") -> dict:
    """Call Opus 4.6 to decompose the goal into subtasks."""
    prompt = f"Goal: {goal}"
    if context:
        prompt += f"\n\nAdditional context: {context}"
    prompt += (
        "\n\nDecompose this into subtasks. Return JSON with structure: "
        "{subtasks: [{id, description, expected_output, dependencies: [], can_parallel: bool}]}"
    )
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=4000,
        system=PLANNER_SYSTEM,
        messages=[{"role": "user", "content": prompt}],
    )
    # Extract JSON from the response text.
    # Simple extraction — in production, use a more robust parser.
    text = response.content[0].text
    start = text.find("{")
    end = text.rfind("}") + 1
    return json.loads(text[start:end])
```
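The `find`/`rfind` extraction above is deliberately naive. Here is a slightly more defensive version, a sketch rather than a production parser, that fails loudly instead of surfacing a bare `json.loads` exception:

```python
def extract_json(text: str) -> dict:
    """Best-effort extraction of the outermost JSON object from a model reply.

    A sketch, assuming the plan is the outermost {...} in the response.
    For production, prefer structured output or a JSON-repair library.
    """
    start = text.find("{")
    end = text.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError(f"No JSON object found in response: {text[:200]!r}")
    try:
        return json.loads(text[start:end])
    except json.JSONDecodeError as exc:
        raise ValueError(f"Plan was not valid JSON: {exc}") from exc
```

Swap it into `planner_call`'s final line if you want the stricter behaviour. The executor and critic below follow the same calling pattern, with one fix worth calling out: both are dispatched to worker threads with `asyncio.to_thread`, so the synchronous SDK never blocks the event loop while other tasks run in parallel.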
```python
async def executor_call(task: dict, feedback: str = "") -> dict:
    """Call Sonnet 4.6 to execute a single subtask."""
    prompt = (
        f"Your assigned task:\n{task['description']}\n\n"
        f"Expected output format:\n{task['expected_output']}"
    )
    if feedback:
        prompt += (
            "\n\n---\nPrevious attempt was rejected. Specific feedback:\n"
            f"{feedback}\n\nPlease address this feedback in your revised output."
        )
    # The Anthropic SDK is synchronous, so dispatch the call to a worker
    # thread to keep the event loop free for the other parallel executors.
    response = await asyncio.to_thread(
        client.messages.create,
        model="claude-sonnet-4-6",
        max_tokens=4000,
        system=EXECUTOR_SYSTEM,
        messages=[{"role": "user", "content": prompt}],
    )
    return {
        "task_id": task["id"],
        "output": response.content[0].text,
    }


async def critic_call(task: dict, executor_output: str) -> dict:
    """Call Opus 4.6 to evaluate executor output."""
    prompt = f"""Task specification:
{task['description']}

Expected output:
{task['expected_output']}

Executor output to review:
{executor_output}

Evaluate this output and provide your verdict."""
    # Also dispatched to a thread: a blocking critic call here would
    # serialise reviews of tasks that are supposed to run in parallel.
    response = await asyncio.to_thread(
        client.messages.create,
        model="claude-opus-4-6",
        max_tokens=1000,
        system=CRITIC_SYSTEM,
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.content[0].text
    verdict = "ACCEPT" if "VERDICT: ACCEPT" in text else "REJECT"
    feedback = text.split("FEEDBACK:", 1)[-1].strip() if "FEEDBACK:" in text else text
    return {"verdict": verdict, "feedback": feedback}


async def run_with_review(task: dict, max_retries: int = 2) -> dict:
    """Execute a task, critique it, and retry if rejected."""
    feedback = ""
    for attempt in range(max_retries + 1):
        result = await executor_call(task, feedback=feedback)
        review = await critic_call(task, result["output"])
        if review["verdict"] == "ACCEPT":
            return {"task_id": task["id"], "output": result["output"], "attempts": attempt + 1}
        if attempt < max_retries:
            feedback = review["feedback"]
            print(f"Task {task['id']} rejected (attempt {attempt + 1}): {feedback[:100]}...")
        else:
            # Return the best attempt with a note rather than failing outright
            return {
                "task_id": task["id"],
                "output": result["output"],
                "attempts": attempt + 1,
                "note": f"Max retries reached. Last feedback: {review['feedback']}",
            }
```
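You can exercise the execute, critique, retry loop on its own before wiring up the full pipeline. The task dict here is a hypothetical example, shaped like the subtasks the planner produces:

```python
# Hypothetical standalone check of the execute → critique → retry loop
demo_task = {
    "id": "demo",
    "description": "Summarise the trade-offs of parallel subagents in exactly 5 bullet points.",
    "expected_output": "Exactly 5 bullet points, one trade-off each.",
    "dependencies": [],
    "can_parallel": True,
}

result = asyncio.run(run_with_review(demo_task))
print(f"Accepted after {result['attempts']} attempt(s):\n{result['output']}")
```

The top-level pipeline below ties planning, parallel execution, and synthesis together: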
```python
async def run_multi_agent(goal: str, context: str = "") -> str:
    """Full multi-agent pipeline: plan → execute in parallel → critique → synthesise."""
    # 1. Plan
    print("Planning...")
    plan = planner_call(goal, context)
    tasks = plan["subtasks"]
    print(f"Plan: {len(tasks)} subtasks, "
          f"{sum(1 for t in tasks if t['can_parallel'])} can run in parallel")

    # 2. Execute parallel tasks simultaneously, sequential ones in order
    parallel_tasks = [t for t in tasks if t["can_parallel"] and not t["dependencies"]]
    sequential_tasks = [t for t in tasks if not t["can_parallel"] or t["dependencies"]]
    results = {}

    # Run parallel tasks
    if parallel_tasks:
        print(f"Running {len(parallel_tasks)} tasks in parallel...")
        parallel_results = await asyncio.gather(
            *[run_with_review(task) for task in parallel_tasks]
        )
        for r in parallel_results:
            results[r["task_id"]] = r["output"]

    # Run sequential tasks (simplified — proper dependency resolution
    # would check task["dependencies"])
    for task in sequential_tasks:
        print(f"Running sequential task: {task['id']}...")
        # Include outputs from dependencies as context
        dep_context = "\n\n".join(
            f"Output from {dep}: {results.get(dep, 'Not available')}"
            for dep in task.get("dependencies", [])
        )
        if dep_context:
            task["description"] += f"\n\nContext from prior tasks:\n{dep_context}"
        result = await run_with_review(task)
        results[result["task_id"]] = result["output"]

    # 3. Synthesise
    joined_outputs = "\n".join(f"--- {k} ---\n{v}" for k, v in results.items())
    synthesis_prompt = f"""Original goal: {goal}

Completed subtask outputs:
{joined_outputs}

Synthesise these outputs into a single coherent result that fully addresses the original goal."""
    synthesis = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=6000,
        messages=[{"role": "user", "content": synthesis_prompt}],
    )
    return synthesis.content[0].text
```
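Running the whole pipeline is then a single call; the goal and context strings here are illustrative:

```python
# Illustrative invocation of the full plan → execute → critique → synthesise pipeline
final_report = asyncio.run(run_multi_agent(
    goal="Produce a competitive analysis covering three market segments and recommend one",
    context="Audience: internal product team. Target length: ~1,500 words.",
))
print(final_report)
```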
## Controlling Opus 4.6's subagent eagerness
The planner system prompt above includes the "use subagents when / do NOT use subagents when" block. This matters. Without it, Opus 4.6 tends to decompose even simple tasks into subagents, because delegation is its default move for anything that looks decomposable.
Add this specific language to your planner system prompt (exact wording matters):
```python
planner_addition = """
IMPORTANT: Subagent overhead is real. Creating a subagent for a task that takes 30 seconds
adds coordination cost of 5-10 seconds plus additional API calls. Reserve subagents for
tasks where parallelism saves meaningful time or where context isolation is genuinely
necessary. A 500-token lookup does not need a subagent. A 10,000-token analysis of an
independent workstream does.
"""
```
In our testing, this language reduced unnecessary subagent spawning by roughly 60% on tasks that don't need it.
## Real example — code review pipeline
Here's a concrete use case: reviewing a PR for security, performance, and style simultaneously.
```python
async def review_pull_request(pr_diff: str, codebase_context: str = "") -> str:
    """Three-stream parallel code review."""
    # Build the optional conventions note up front: nesting a same-quote
    # f-string inside another f-string is a syntax error before Python 3.12.
    conventions_note = (
        f"\nCodebase conventions: {codebase_context}" if codebase_context else ""
    )
    tasks = [
        {
            "id": "security",
            "description": f"""Review this PR diff specifically for security vulnerabilities.
Check against OWASP Top 10. For each vulnerability: location, severity, explanation, fix.

PR diff:
{pr_diff}""",
            "expected_output": "Numbered list of security issues with severity, location, and fix. 'No issues found' if clean.",
            "dependencies": [],
            "can_parallel": True,
        },
        {
            "id": "performance",
            "description": f"""Review this PR diff specifically for performance issues.
Check for: N+1 queries, unnecessary loops, blocking I/O, memory leaks, missing indexes.

PR diff:
{pr_diff}""",
            "expected_output": "Numbered list of performance issues with estimated impact and fix. 'No issues found' if clean.",
            "dependencies": [],
            "can_parallel": True,
        },
        {
            "id": "style",
            "description": f"""Review this PR diff for code style and convention issues.
Check for: naming conventions, function complexity, missing error handling, inadequate comments.

PR diff:
{pr_diff}
{conventions_note}""",
            "expected_output": "Numbered list of style issues with line references and suggestions. 'No issues found' if clean.",
            "dependencies": [],
            "can_parallel": True,
        },
    ]

    # Run all three review streams in parallel
    results = await asyncio.gather(*[run_with_review(task) for task in tasks])
    results_dict = {r["task_id"]: r["output"] for r in results}

    # Synthesise into a single PR comment
    synthesis = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=3000,
        messages=[{
            "role": "user",
            "content": f"""Combine these three code review outputs into a single, well-structured PR comment.
Group by severity. Use markdown formatting suitable for GitHub.

Security review:
{results_dict['security']}

Performance review:
{results_dict['performance']}

Style review:
{results_dict['style']}""",
        }],
    )
    return synthesis.content[0].text


# Usage: your_diff is whatever PR diff text you want reviewed
pr_comment = asyncio.run(review_pull_request(pr_diff=your_diff))
print(pr_comment)
```
This runs three Sonnet 4.6 calls in parallel, each reviewed by Opus 4.6, then synthesised by Opus 4.6. Total latency is roughly equal to one sequential review pass, but you get three independent perspectives.
## When NOT to use multi-agent
Multi-agent is overhead. Use it when the overhead is worth it.
Don't use it for:
- Simple tasks — a single `client.messages.create()` call with `effort="high"` will outperform a three-agent pipeline on most tasks under ~1,000 tokens of output.
- Tasks with tight context sharing — if your executor needs to know what every other executor found, you need sequential execution, not parallel. The isolation that makes parallel useful becomes a liability.
- Cost-sensitive low-complexity applications — multi-agent multiplies token usage by 3-5x. For a startup on a tight budget doing simple chat interactions, this is the wrong architecture.
- Latency-critical paths — coordination adds round-trips. If you need sub-second responses, single-agent is almost always faster.
The test I apply: "If this task were assigned to three humans in parallel, would they produce better output than one human doing it sequentially?" If the answer is no, single-agent.
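For those simple cases, the baseline is one direct request. This sketch reuses the `client` defined earlier:

```python
# Single-agent baseline: one direct call, no planner, no critic, no retries
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2000,
    messages=[{"role": "user", "content": "Summarise this changelog in 5 bullets: ..."}],
)
print(response.content[0].text)
```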
💡 Want to go deeper? The AI Agents track covers multi-agent orchestration with hands-on exercises and production patterns — including how to handle failures, shared state, and agent communication protocols.
## Next steps
- Multi-agent systems — architectural patterns beyond planner/executor/critic
- Build your first AI agent — start here if you're new to agents
- Claude Opus 4.6 prompting guide — get the most out of the orchestrator model
- AI agent design patterns — broader survey of agent architectures
## Try it now with AICredits.in
Access Claude Opus 4.6, Sonnet 4.6, GPT-4o, and 300+ models with UPI payment in ₹. Run multi-agent pipelines without an international card. Create free account →