Everyone building with AI agents eventually hits the same question: should I split this into multiple agents? The answer is usually no — but the cases where yes applies are specific and important. This post gives you a decision framework, not a sales pitch for complexity.
Why people over-engineer with multiple agents
Multi-agent systems feel more powerful, more "real AI." There's something satisfying about a diagram with five boxes connected by arrows. But every agent boundary you add comes with a real cost:
- Latency: another LLM call, another round-trip
- Cost: more input and output tokens, every time
- New failure surfaces: message passing errors, context loss between agents, misrouting
- Debugging complexity: when something goes wrong in a pipeline, it's much harder to trace where it happened
The default should always be: one agent with the right tools. Not because multi-agent systems aren't useful — they are — but because you should only add complexity when you've identified a specific problem that complexity solves.
The single-agent-with-tools baseline
Before splitting anything, ask whether a single agent with well-designed tools can do the job. Most tasks can. A single agent with the right tools can:
- Search the web, read files, write code, and send emails in one coherent session
- Handle multi-step research, synthesis, and output formatting
- Manage a customer conversation end-to-end, from triage to resolution
If one agent can do it coherently, it should. The overhead you avoid — in latency, cost, and operational complexity — is substantial. A three-step task handled by one agent with three tools is almost always faster, cheaper, and easier to debug than a three-agent pipeline doing the same work.
The failure mode here isn't that single agents can't do much. It's that people reach for multiple agents before they've even tried giving one agent the right tools.
The three patterns where multi-agent actually makes sense
When you do hit a genuine limitation, there are three patterns worth knowing. Each solves a different problem.
Pattern 1: Pipeline (sequential)
Structure: Agent A → Agent B → Agent C, each feeding into the next.
Use when: Tasks have strict ordering, the output of one step is the meaningful input to the next, and each step benefits from a different context or skill set.
Concrete example: A content production pipeline.
- Researcher agent: given a topic and search tools, finds sources, pulls key data, and produces a structured brief
- Writer agent: given the brief (not the raw sources), drafts an article with appropriate tone and structure
- Editor agent: given the draft, fact-checks claims, tightens the writing, and flags any issues
Why this works: each agent gets a clean, focused prompt with only what it needs. The researcher doesn't need to know how to write an engaging intro. The writer doesn't need web search in its context. The editor doesn't need either — just the draft and a checklist. Focused prompts produce better outputs than kitchen-sink prompts that ask one agent to do all three jobs simultaneously.
The cost: 3× the token cost and latency of a single agent, minimum. If each agent call takes two seconds, this pipeline takes six seconds before any business logic. Factor that in.
When not to use it: When one agent with "research + write + edit" tools would produce equivalent quality. For most medium-complexity content tasks, it would.
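The data flow above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `call_llm` is a hypothetical stand-in for whatever API client you use, and its fake return value just makes the stage-to-stage handoff visible.

```python
def call_llm(role: str, prompt: str) -> str:
    """Stand-in for a real model call. Replace the body with your API
    client; the fake return value makes the data flow traceable."""
    return f"{role} output for: {prompt[:30]}"

def content_pipeline(topic: str) -> str:
    # Each stage receives only the previous stage's output -- not the
    # full history -- which keeps every prompt small and focused.
    brief = call_llm("researcher", f"Produce a structured brief on: {topic}")
    draft = call_llm("writer", f"Draft an article from this brief:\n{brief}")
    return call_llm("editor", f"Fact-check and tighten this draft:\n{draft}")
```

Note that the writer never sees the raw sources and the editor never sees the brief: the narrowing of context at each handoff is the whole point of the pattern.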
Pattern 2: Parallel (concurrent)
Structure: Orchestrator splits a task → agents A, B, and C run at the same time → aggregator combines their outputs.
Use when: Subtasks are genuinely independent of each other, and each one takes meaningful time or tokens.
Concrete example: Market research across ten competitors.
An orchestrator dispatches ten agents simultaneously — one per competitor — each analyzing pricing, features, and reviews. An aggregator then synthesizes the findings. Wall-clock time drops to roughly max(A, B, C, ...) instead of A + B + C + ... That's a real speedup: what would take 20 seconds sequentially might take 2–3 seconds in parallel.
Implementation-wise: n8n handles this with parallel branches and a merge node; LangGraph supports it with parallel edges; plain Python can use asyncio.gather for async calls to the same model.
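In plain Python, the fan-out/fan-in shape looks like this. A hedged sketch: `analyze_competitor` is a hypothetical placeholder whose `asyncio.sleep` simulates per-agent latency; swap in a real async model call.

```python
import asyncio

async def analyze_competitor(name: str) -> dict:
    """Stand-in for one agent's analysis. The sleep simulates latency;
    replace it with a real async model call."""
    await asyncio.sleep(0.1)
    return {"competitor": name, "summary": f"analysis of {name}"}

async def market_research(competitors: list[str]) -> list[dict]:
    # All analyses start at once; total wall-clock time is roughly the
    # slowest single call, not the sum of all of them.
    return await asyncio.gather(*(analyze_competitor(c) for c in competitors))

results = asyncio.run(market_research(["Acme", "Globex", "Initech"]))
```

`asyncio.gather` preserves input order in its results, which makes the aggregation step straightforward.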
When not to use it: When tasks have dependencies. If Agent B needs Agent A's output to proceed, they're not actually independent — you've just forced a parallel structure onto a sequential problem, which causes consistency issues and usually produces worse results than just running them in order.
Pattern 3: Hierarchical (orchestrator + specialists)
Structure: Orchestrator receives a query → routes to a specialist agent based on intent → specialist responds → orchestrator presents the result.
Use when: You have genuinely different domains with different context requirements, and routing can be reliably determined at the orchestrator level.
Concrete example: An enterprise support system.
- Orchestrator classifies the incoming query
- Billing Agent handles pricing, invoices, and payment issues — its system prompt and context are loaded with billing documentation
- Tech Support Agent handles product bugs and configuration issues — its context has product docs and known issue lists
- Legal Agent handles contract and compliance queries — its context has legal agreements and policy documents
Why this works: each specialist operates with a focused, domain-appropriate context window. The billing agent doesn't need 50 pages of technical documentation cluttering its prompt. The tech agent doesn't need legal contracts. This matters a lot for both accuracy and cost.
Failure modes to watch for:
Misrouting: the orchestrator sends a query to the wrong specialist. This is the most common failure. Fix it with better orchestrator prompts, confidence thresholds, and a fallback "general" agent for ambiguous queries.
Context loss at handoff: the specialist doesn't have enough history to answer coherently. Fix it by passing structured summaries of the conversation context, not just the latest message.
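Both fixes can be sketched together. In this illustration the classifier is a keyword stub standing in for an LLM classification call, the agent names mirror the example above, and `route` passes a conversation summary alongside the query; all names are assumptions for this example.

```python
SPECIALISTS = {
    "billing": "Billing Agent",
    "tech": "Tech Support Agent",
    "legal": "Legal Agent",
}

def classify(query: str) -> str:
    """Keyword stub standing in for an LLM classifier. Returns
    'general' when no domain matches -- the fallback for ambiguous
    queries recommended above."""
    q = query.lower()
    if any(w in q for w in ("invoice", "payment", "pricing")):
        return "billing"
    if any(w in q for w in ("bug", "error", "configuration")):
        return "tech"
    if any(w in q for w in ("contract", "compliance")):
        return "legal"
    return "general"

def route(query: str, conversation_summary: str) -> str:
    agent = SPECIALISTS.get(classify(query), "General Agent")
    # Hand off a structured summary of the conversation so far, not
    # just the latest message, to avoid context loss at the handoff.
    return f"{agent} handles: {query!r} (context: {conversation_summary!r})"
```

A real system would replace the keyword checks with a model call plus a confidence threshold, but the routing-with-fallback shape is the same.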
The decision matrix
| Scenario | Pattern | Reason |
|---|---|---|
| Sequential steps where output feeds forward | Pipeline | Clean separation of concerns |
| Independent subtasks that can run at the same time | Parallel | Reduce wall-clock latency |
| Different domains needing different context | Hierarchical | Focused specialist prompts |
| One coherent task with multiple tools | Single agent | Simpler, cheaper, easier to debug |
| Simple RAG Q&A | Single agent | No benefit to splitting |
Real "should I split this?" examples
These are the actual architecture decisions worth thinking through.
Research + writing + fact-check: split it (pipeline)
Each step is meaningfully different, they're strictly sequential, and each benefits from a focused prompt. The researcher's job is to gather and structure; the writer's job is to craft; the editor's job is to verify. Conflating all three into one agent produces a prompt so long and multi-directional that quality degrades. This is one of the clearest cases for a pipeline.
Customer support Q&A with a knowledge base: don't split
A single agent with a search tool that queries your knowledge base handles this well. Adding a routing layer and specialist agents for a basic Q&A system adds latency, cost, and failure modes without improving answer quality. Keep it simple.
Analyzing 10 market segments simultaneously: split it (parallel)
The segments are genuinely independent. No segment's analysis depends on any other. Each analysis takes substantial tokens and time. Running them in parallel gives you real latency savings and scales naturally. This is the clearest case for parallel architecture.
Writing a long-form document: don't split
This is a common mistake. Splitting a 3,000-word article into sections assigned to different agents creates incoherence — different "voices," inconsistent terminology, arguments that don't build on each other. One agent maintains the through-line. If context length is a concern, use a single agent with section-by-section passes, preserving context throughout.
Enterprise support with billing/tech/legal: split it (hierarchical)
The domains are genuinely distinct. The context requirements don't overlap. Routing can be determined reliably from the first message in most cases. This is the textbook case for a hierarchical system — each specialist is better because it's focused, not despite it.
A 3-step form-filling workflow: don't split
This isn't an agent problem. It's a workflow problem. Filling out fields in order is a deterministic sequential process. Use n8n or a similar tool with sequential nodes. Adding LLM agents to each step introduces cost and non-determinism to a problem that doesn't need either.
The cost model
Be specific about what you're committing to before building multi-agent architecture. Take a simple three-agent pipeline:
- Agent A: 800 input tokens, 400 output tokens
- Agent B: 900 input tokens (including A's output as context), 500 output tokens
- Agent C: 1,200 input tokens (including B's output), 400 output tokens
That's 2,900 input tokens and 1,300 output tokens just for the pipeline — before any business logic, tool calls, or context enrichment. At typical API pricing, you've already roughly tripled the cost of a single-agent approach.
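The arithmetic is worth doing explicitly before you build. A quick sketch using the pipeline numbers above; the per-token prices are placeholders, so substitute your model's actual rates.

```python
# (input_tokens, output_tokens) per stage, from the pipeline above.
stages = {
    "A": (800, 400),
    "B": (900, 500),
    "C": (1_200, 400),
}

# Placeholder prices -- substitute your model's actual per-token rates.
PRICE_IN = 3.00 / 1_000_000    # assumed $ per input token
PRICE_OUT = 15.00 / 1_000_000  # assumed $ per output token

total_in = sum(i for i, _ in stages.values())   # 2,900
total_out = sum(o for _, o in stages.values())  # 1,300
cost_per_run = total_in * PRICE_IN + total_out * PRICE_OUT
print(f"${cost_per_run:.4f} per pipeline run")
```

Multiply `cost_per_run` by expected daily volume before committing; at scale, the 3× multiplier is a budget line, not a rounding error.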
Then add latency: if each call takes two seconds on average, that's six seconds minimum for the pipeline. User-facing flows with three-second response-time expectations simply can't absorb this without caching or pre-computation.
Quantify before you commit. "This should be faster with parallel agents" is a hypothesis; verify it with actual numbers for your specific task and model.
Failure modes specific to multi-agent systems
Single-agent failures are usually visible — the agent does the wrong thing in front of you. Multi-agent failures are often silent or subtle. Know what to watch for.
Context loss at handoffs
Agent A produces rich, detailed output. But when it's passed to Agent B, only a high-level summary gets included — the specific numbers, edge cases, and caveats that mattered are gone. Agent B then makes decisions based on incomplete information. Fix: pass structured outputs between agents (JSON or clearly delimited sections), not free-text summaries. If Agent A should preserve specific data for Agent B, define that schema explicitly.
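A minimal version of that fix: define the handoff schema explicitly and fail loudly when fields go missing. The field names here are assumptions for illustration, not a standard.

```python
import json

# Illustrative handoff schema between a researcher and a writer; the
# field names are assumptions for this example.
REQUIRED_FIELDS = {"topic", "key_numbers", "caveats", "sources"}

def validate_brief(raw: str) -> dict:
    """Parse Agent A's output and fail loudly if required fields are
    missing, instead of letting Agent B work from a lossy summary."""
    brief = json.loads(raw)
    missing = REQUIRED_FIELDS - brief.keys()
    if missing:
        raise ValueError(f"handoff dropped fields: {sorted(missing)}")
    return brief
```

Failing at the handoff is cheap; discovering three stages later that the caveats were silently dropped is not.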
Infinite delegation loops
Orchestrator routes to Specialist, Specialist decides it needs more information and routes back to Orchestrator, Orchestrator routes back to Specialist. This happens when specialists aren't given enough context to resolve queries themselves. Fix: set max_iterations at the framework level, and add "resolve the query directly, do not re-delegate" to specialist system prompts.
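The framework-level cap looks like this in plain Python. A sketch under stated assumptions: `run_specialist` is a hypothetical stub (a real one would route and call a model), `"DELEGATE_BACK"` is an invented sentinel, and the cap of 3 is arbitrary.

```python
MAX_ITERATIONS = 3  # assumed cap; tune for your workload

def run_specialist(query: str) -> str:
    """Stub specialist: resolves billing questions, punts everything
    else back to the orchestrator. Replace with a real routed call."""
    return "resolved" if "invoice" in query else "DELEGATE_BACK"

def handle(query: str) -> str:
    """Orchestrator loop with a hard iteration cap, so a specialist
    that keeps re-delegating cannot ping-pong forever."""
    for _ in range(MAX_ITERATIONS):
        result = run_specialist(query)
        if result != "DELEGATE_BACK":
            return result
    return "Escalated to a human: delegation limit reached."
```

The cap turns an infinite loop into a bounded, observable escalation — which is also where you'd log the query for prompt debugging.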
Inconsistent decisions in parallel runs
Two parallel agents have overlapping scope and make contradictory decisions — Agent A recommends approach X, Agent B recommends approach Y, and the aggregator doesn't know how to reconcile them. Fix: define task boundaries precisely before running in parallel. If tasks genuinely overlap, they're not independent — run them sequentially or merge them.
Silent failures propagating downstream
Agent A fails partway through its task and returns a partial or malformed output. Agent B doesn't check — it processes whatever it received. Agent C inherits the corrupted state and produces garbage that looks plausible. Fix: each agent must validate both its inputs (is this what I expected?) and its outputs (does this meet the spec I was given?) before passing forward. Explicit output schemas catch this early.
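One way to enforce that discipline is a wrapper around each stage that checks both directions. Everything here is illustrative: the `"status"` key and the `required_out` contract are assumptions, not a standard interface.

```python
from typing import Callable

def checked_step(agent_fn: Callable[[dict], dict],
                 payload: dict,
                 required_out: set[str]) -> dict:
    """Wrap one pipeline stage: refuse malformed input, and refuse to
    pass a partial output downstream. Key names are illustrative."""
    # Input check: is this what I expected from upstream?
    if not isinstance(payload, dict) or "status" not in payload:
        raise ValueError("upstream output is malformed; stopping here")
    out = agent_fn(payload)
    # Output check: does this meet the spec I was given?
    missing = required_out - out.keys()
    if missing:
        raise ValueError(f"stage returned partial output: {sorted(missing)}")
    return out
```

A loud `ValueError` at stage B is far easier to debug than plausible-looking garbage out of stage C.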
Tools for each pattern
Pipeline: n8n sequential AI nodes, LangChain SequentialChain, LangGraph as a linear graph with one edge between nodes.
Parallel: n8n parallel branches with a merge node, LangGraph with parallel edges fanning out from a router node, async Python with asyncio.gather for concurrent model calls.
Hierarchical: LangGraph with routing nodes (see our post on stateful agents with LangGraph), CrewAI with role-based agents and a manager, n8n with a Switch node routing to different sub-workflows.
For a deeper look at how these patterns work at the architecture level, the multi-agent systems lesson in the Agents track covers the components and design decisions in detail.
The bottom line
The best multi-agent system is often a single agent with good tools. Don't split because it feels more sophisticated — split when you've clearly identified why a single agent can't do the job and what specific property of the multi-agent pattern solves that.
The framework is simple: if tasks are sequential with different context needs, use a pipeline. If tasks are truly independent and time matters, use parallel. If tasks span genuinely different domains, use hierarchical. Everything else — one agent, good tools, and a well-structured prompt.
Start simple. Split when you hit a real wall, not before.