Everyone building with AI agents eventually hits the same question: should I split this into multiple agents? The answer is usually no — but the cases where yes applies are specific and important. This post gives you a decision framework, not a sales pitch for complexity.
Why people over-engineer with multiple agents
Multi-agent systems feel more powerful, more "real AI." There's something satisfying about a diagram with five boxes connected by arrows. But every agent boundary you add comes with a real cost:
- Latency: another LLM call, another round-trip
- Cost: more input and output tokens, every time
- New failure surfaces: message passing errors, context loss between agents, misrouting
- Debugging complexity: when something goes wrong in a pipeline, it's much harder to trace where it happened
The default should always be: one agent with the right tools. Not because multi-agent systems aren't useful — they are — but because you should only add complexity when you've identified a specific problem that complexity solves.
The single-agent-with-tools baseline
Before splitting anything, ask whether a single agent with well-designed tools can do the job. Most tasks can. A single agent with the right tools can:
- Search the web, read files, write code, and send emails in one coherent session
- Handle multi-step research, synthesis, and output formatting
- Manage a customer conversation end-to-end, from triage to resolution
If one agent can do it coherently, it should. The overhead you avoid — in latency, cost, and operational complexity — is substantial. A three-step task handled by one agent with three tools is almost always faster, cheaper, and easier to debug than a three-agent pipeline doing the same work.
The failure mode here isn't that single agents can't do much. It's that people reach for multiple agents before they've even tried giving one agent the right tools.
The three patterns where multi-agent actually makes sense
When you do hit a genuine limitation, there are three patterns worth knowing. Each solves a different problem.
Pattern 1: Pipeline (sequential)
Structure: Agent A → Agent B → Agent C, each feeding into the next.
Use when: Tasks have strict ordering, the output of one step is the meaningful input to the next, and each step benefits from a different context or skill set.
Concrete example: A content production pipeline.
- Researcher agent: given a topic and search tools, finds sources, pulls key data, and produces a structured brief
- Writer agent: given the brief (not the raw sources), drafts an article with appropriate tone and structure
- Editor agent: given the draft, fact-checks claims, tightens the writing, and flags any issues
Why this works: each agent gets a clean, focused prompt with only what it needs. The researcher doesn't need to know how to write an engaging intro. The writer doesn't need web search in its context. The editor doesn't need either — just the draft and a checklist. Focused prompts produce better outputs than kitchen-sink prompts that ask one agent to do all three jobs simultaneously.
The cost: 3× the token cost and latency of a single agent, minimum. If each agent call takes two seconds, this pipeline takes six seconds before any business logic. Factor that in.
When not to use it: When one agent with "research + write + edit" tools would produce equivalent quality. For most medium-complexity content tasks, it would.
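The data flow above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `call_llm` is a hypothetical stand-in for whatever API client you use, and its fake return value just makes the stage-to-stage handoff visible.

```python
def call_llm(role: str, prompt: str) -> str:
    """Stand-in for a real model call. Replace the body with your API
    client; the fake return value makes the data flow traceable."""
    return f"{role} output for: {prompt[:30]}"

def content_pipeline(topic: str) -> str:
    # Each stage receives only the previous stage's output -- not the
    # full history -- which keeps every prompt small and focused.
    brief = call_llm("researcher", f"Produce a structured brief on: {topic}")
    draft = call_llm("writer", f"Draft an article from this brief:\n{brief}")
    return call_llm("editor", f"Fact-check and tighten this draft:\n{draft}")
```

Note that the writer never sees the raw sources and the editor never sees the brief: the narrowing of context at each handoff is the whole point of the pattern.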
Pattern 2: Parallel (concurrent)
Structure: Orchestrator splits a task → agents A, B, and C run at the same time → aggregator combines their outputs.
Use when: Subtasks are genuinely independent of each other, and each one takes meaningful time or tokens.
Concrete example: Market research across ten competitors.
An orchestrator dispatches ten agents simultaneously — one per competitor — each analyzing pricing, features, and reviews. An aggregator then synthesizes the findings. Wall-clock time drops to roughly max(A, B, C, ...) instead of A + B + C + ... That's a real speedup: what would take 20 seconds sequentially might take 2–3 seconds in parallel.
Implementation-wise: n8n handles this with parallel branches and a merge node; LangGraph supports it with parallel edges; plain Python can use asyncio.gather for async calls to the same model.
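In plain Python, the fan-out/fan-in shape looks like this. A hedged sketch: `analyze_competitor` is a hypothetical placeholder whose `asyncio.sleep` simulates per-agent latency; swap in a real async model call.

```python
import asyncio

async def analyze_competitor(name: str) -> dict:
    """Stand-in for one agent's analysis. The sleep simulates latency;
    replace it with a real async model call."""
    await asyncio.sleep(0.1)
    return {"competitor": name, "summary": f"analysis of {name}"}

async def market_research(competitors: list[str]) -> list[dict]:
    # All analyses start at once; total wall-clock time is roughly the
    # slowest single call, not the sum of all of them.
    return await asyncio.gather(*(analyze_competitor(c) for c in competitors))

results = asyncio.run(market_research(["Acme", "Globex", "Initech"]))
```

`asyncio.gather` preserves input order in its results, which makes the aggregation step straightforward.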
When not to use it: When tasks have dependencies. If Agent B needs Agent A's output to proceed, they're not actually independent — you've just forced a parallel structure onto a sequential problem, which causes consistency issues and usually produces worse results than just running them in order.
Pattern 3: Hierarchical (orchestrator + specialists)
Structure: Orchestrator receives a query → routes to a specialist agent based on intent → specialist responds → orchestrator presents the result.
Use when: You have genuinely different domains with different context requirements, and routing can be reliably determined at the orchestrator level.
Concrete example: An enterprise support system.
- Orchestrator classifies the incoming query
- Billing Agent handles pricing, invoices, and payment issues — its system prompt and context are loaded with billing documentation
- Tech Support Agent handles product bugs and configuration issues — its context has product docs and known issue lists
- Legal Agent handles contract and compliance queries — its context has legal agreements and policy documents
Why this works: each specialist operates with a focused, domain-appropriate context window. The billing agent doesn't need 50 pages of technical documentation cluttering its prompt. The tech agent doesn't need legal contracts. This matters a lot for both accuracy and cost.
Failure modes to watch for:
Misrouting: the orchestrator sends a query to the wrong specialist. This is the most common failure. Fix it with better orchestrator prompts, confidence thresholds, and a fallback "general" agent for ambiguous queries.
Context loss at handoff: the specialist doesn't have enough history to answer coherently. Fix it by passing structured summaries of the conversation context, not just the latest message.
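Both fixes can be sketched together. In this illustration the classifier is a keyword stub standing in for an LLM classification call, the agent names mirror the example above, and `route` passes a conversation summary alongside the query; all names are assumptions for this example.

```python
SPECIALISTS = {
    "billing": "Billing Agent",
    "tech": "Tech Support Agent",
    "legal": "Legal Agent",
}

def classify(query: str) -> str:
    """Keyword stub standing in for an LLM classifier. Returns
    'general' when no domain matches -- the fallback for ambiguous
    queries recommended above."""
    q = query.lower()
    if any(w in q for w in ("invoice", "payment", "pricing")):
        return "billing"
    if any(w in q for w in ("bug", "error", "configuration")):
        return "tech"
    if any(w in q for w in ("contract", "compliance")):
        return "legal"
    return "general"

def route(query: str, conversation_summary: str) -> str:
    agent = SPECIALISTS.get(classify(query), "General Agent")
    # Hand off a structured summary of the conversation so far, not
    # just the latest message, to avoid context loss at the handoff.
    return f"{agent} handles: {query!r} (context: {conversation_summary!r})"
```

A real system would replace the keyword checks with a model call plus a confidence threshold, but the routing-with-fallback shape is the same.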
The decision matrix
| Scenario | Pattern | Reason |
|---|---|---|
| Sequential steps where output feeds forward | Pipeline | Clean separation of concerns |
| Independent subtasks that can run at the same time | Parallel | Reduce wall-clock latency |
| Different domains needing different context | Hierarchical | Focused specialist prompts |
| One coherent task with multiple tools | Single agent | Simpler, cheaper, easier to debug |
| Simple RAG Q&A | Single agent | No benefit to splitting |
Real "should I split this?" examples
These are the actual architecture decisions worth thinking through.
Research + writing + fact-check: split it (pipeline)
Each step is meaningfully different, they're strictly sequential, and each benefits from a focused prompt. The researcher's job is to gather and structure; the writer's job is to craft; the editor's job is to verify. Conflating all three into one agent produces a prompt so long and multi-directional that quality degrades. This is one of the clearest cases for a pipeline.
Customer support Q&A with a knowledge base: don't split
A single agent with a search tool that queries your knowledge base handles this well. Adding a routing layer and specialist agents for a basic Q&A system adds latency, cost, and failure modes without improving answer quality. Keep it simple.
Analyzing 10 market segments simultaneously: split it (parallel)
The segments are genuinely independent. No segment's analysis depends on any other. Each analysis takes substantial tokens and time. Running them in parallel gives you real latency savings and scales naturally. This is the clearest case for parallel architecture.
Writing a long-form document: don't split
This is a common mistake. Splitting a 3,000-word article into sections assigned to different agents creates incoherence — different "voices," inconsistent terminology, arguments that don't build on each other. One agent maintains the through-line. If context length is a concern, use a single agent with section-by-section passes, preserving context throughout.
Enterprise support with billing/tech/legal: split it (hierarchical)
The domains are genuinely distinct. The context requirements don't overlap. Routing can be determined reliably from the first message in most cases. This is the textbook case for a hierarchical system — each specialist is better because it's focused, not despite it.
A 3-step form-filling workflow: don't split
This isn't an agent problem. It's a workflow problem. Filling out fields in order is a deterministic sequential process. Use n8n or a similar tool with sequential nodes. Adding LLM agents to each step introduces cost and non-determinism to a problem that doesn't need either.
The cost model
Be specific about what you're committing to before building multi-agent architecture. Take a simple three-agent pipeline:
- Agent A: 800 input tokens, 400 output tokens
- Agent B: 900 input tokens (including A's output as context), 500 output tokens
- Agent C: 1,200 input tokens (including B's output), 400 output tokens
That's 2,900 input tokens and 1,300 output tokens just for the pipeline — before any business logic, tool calls, or context enrichment. At typical API pricing, you've already roughly tripled the cost of a single-agent approach.
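The arithmetic is worth doing explicitly before you build. A quick sketch using the pipeline numbers above; the per-token prices are placeholders, so substitute your model's actual rates.

```python
# (input_tokens, output_tokens) per stage, from the pipeline above.
stages = {
    "A": (800, 400),
    "B": (900, 500),
    "C": (1_200, 400),
}

# Placeholder prices -- substitute your model's actual per-token rates.
PRICE_IN = 3.00 / 1_000_000    # assumed $ per input token
PRICE_OUT = 15.00 / 1_000_000  # assumed $ per output token

total_in = sum(i for i, _ in stages.values())   # 2,900
total_out = sum(o for _, o in stages.values())  # 1,300
cost_per_run = total_in * PRICE_IN + total_out * PRICE_OUT
print(f"${cost_per_run:.4f} per pipeline run")
```

Multiply `cost_per_run` by expected daily volume before committing; at scale, the 3× multiplier is a budget line, not a rounding error.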
Then add latency: if each call takes two seconds on average, that's six seconds minimum for the pipeline. User-facing flows with three-second response-time expectations simply can't absorb this without caching or pre-computation.
Quantify before you commit. "This should be faster with parallel agents" is a hypothesis; verify it with actual numbers for your specific task and model.
Failure modes specific to multi-agent systems
Single-agent failures are usually visible — the agent does the wrong thing in front of you. Multi-agent failures are often silent or subtle. Know what to watch for.
Context loss at handoffs
Agent A produces rich, detailed output. But when it's passed to Agent B, only a high-level summary gets included — the specific numbers, edge cases, and caveats that mattered are gone. Agent B then makes decisions based on incomplete information. Fix: pass structured outputs between agents (JSON or clearly delimited sections), not free-text summaries. If Agent A should preserve specific data for Agent B, define that schema explicitly.
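A minimal version of that fix: define the handoff schema explicitly and fail loudly when fields go missing. The field names here are assumptions for illustration, not a standard.

```python
import json

# Illustrative handoff schema between a researcher and a writer; the
# field names are assumptions for this example.
REQUIRED_FIELDS = {"topic", "key_numbers", "caveats", "sources"}

def validate_brief(raw: str) -> dict:
    """Parse Agent A's output and fail loudly if required fields are
    missing, instead of letting Agent B work from a lossy summary."""
    brief = json.loads(raw)
    missing = REQUIRED_FIELDS - brief.keys()
    if missing:
        raise ValueError(f"handoff dropped fields: {sorted(missing)}")
    return brief
```

Failing at the handoff is cheap; discovering three stages later that the caveats were silently dropped is not.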
Infinite delegation loops
Orchestrator routes to Specialist, Specialist decides it needs more information and routes back to Orchestrator, Orchestrator routes back to Specialist. This happens when specialists aren't given enough context to resolve queries themselves. Fix: set max_iterations at the framework level, and add "resolve the query directly, do not re-delegate" to specialist system prompts.
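The framework-level cap looks like this in plain Python. A sketch under stated assumptions: `run_specialist` is a hypothetical stub (a real one would route and call a model), `"DELEGATE_BACK"` is an invented sentinel, and the cap of 3 is arbitrary.

```python
MAX_ITERATIONS = 3  # assumed cap; tune for your workload

def run_specialist(query: str) -> str:
    """Stub specialist: resolves billing questions, punts everything
    else back to the orchestrator. Replace with a real routed call."""
    return "resolved" if "invoice" in query else "DELEGATE_BACK"

def handle(query: str) -> str:
    """Orchestrator loop with a hard iteration cap, so a specialist
    that keeps re-delegating cannot ping-pong forever."""
    for _ in range(MAX_ITERATIONS):
        result = run_specialist(query)
        if result != "DELEGATE_BACK":
            return result
    return "Escalated to a human: delegation limit reached."
```

The cap turns an infinite loop into a bounded, observable escalation — which is also where you'd log the query for prompt debugging.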
Inconsistent decisions in parallel runs
Two parallel agents have overlapping scope and make contradictory decisions — Agent A recommends approach X, Agent B recommends approach Y, and the aggregator doesn't know how to reconcile them. Fix: define task boundaries precisely before running in parallel. If tasks genuinely overlap, they're not independent — run them sequentially or merge them.
Silent failures propagating downstream
Agent A fails partway through its task and returns a partial or malformed output. Agent B doesn't check — it processes whatever it received. Agent C inherits the corrupted state and produces garbage that looks plausible. Fix: each agent must validate both its inputs (is this what I expected?) and its outputs (does this meet the spec I was given?) before passing forward. Explicit output schemas catch this early.
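One way to enforce that discipline is a wrapper around each stage that checks both directions. Everything here is illustrative: the `"status"` key and the `required_out` contract are assumptions, not a standard interface.

```python
from typing import Callable

def checked_step(agent_fn: Callable[[dict], dict],
                 payload: dict,
                 required_out: set[str]) -> dict:
    """Wrap one pipeline stage: refuse malformed input, and refuse to
    pass a partial output downstream. Key names are illustrative."""
    # Input check: is this what I expected from upstream?
    if not isinstance(payload, dict) or "status" not in payload:
        raise ValueError("upstream output is malformed; stopping here")
    out = agent_fn(payload)
    # Output check: does this meet the spec I was given?
    missing = required_out - out.keys()
    if missing:
        raise ValueError(f"stage returned partial output: {sorted(missing)}")
    return out
```

A loud `ValueError` at stage B is far easier to debug than plausible-looking garbage out of stage C.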
Tools for each pattern
Pipeline: n8n sequential AI nodes, LangChain SequentialChain, LangGraph as a linear graph with one edge between nodes.
Parallel: n8n parallel branches with a merge node, LangGraph with parallel edges fanning out from a router node, async Python with asyncio.gather for concurrent model calls.
Hierarchical: LangGraph with routing nodes (see our post on stateful agents with LangGraph), CrewAI with role-based agents and a manager, n8n with a Switch node routing to different sub-workflows.
For a deeper look at how these patterns work at the architecture level, the multi-agent systems lesson in the Agents track covers the components and design decisions in detail.
The bottom line
The best multi-agent system is often a single agent with good tools. Don't split because it feels more sophisticated — split when you've clearly identified why a single agent can't do the job and what specific property of the multi-agent pattern solves that.
The framework is simple: if tasks are sequential with different context needs, use a pipeline. If tasks are truly independent and time matters, use parallel. If tasks span genuinely different domains, use hierarchical. Everything else — one agent, good tools, and a well-structured prompt.
Start simple. Split when you hit a real wall, not before.