Here's the failure mode nobody warns you about: you write a detailed, well-structured prompt for a complex task. You run it. The output looks reasonable but is subtly wrong in 4 different ways at once. You edit the prompt to fix one problem and two others get worse. After 45 minutes you have something usable, but barely.
The root cause is almost always the same: you asked one prompt to do too many things. When a single prompt handles research, synthesis, structure, tone, and formatting simultaneously, errors compound. Fix the tone and the structure degrades. Get the structure right and the research gets shallow. The model is spreading attention across too many constraints at once.
Prompt chaining solves this by doing what any good production system does: decomposing complex work into discrete stages, each responsible for exactly one thing.
What prompt chaining actually is
Prompt chaining is a workflow pattern where the output of one prompt becomes the input to the next. Instead of one monolithic prompt trying to do everything, you build a pipeline:
Prompt A → [output A] → Prompt B → [output B] → Prompt C → [final output]
Each prompt in the chain has a single, well-defined job. Because the model doesn't have to juggle multiple concerns at once, each step is more reliable. Errors are also easier to find and fix — if step 3 goes wrong, you fix the step 3 prompt without touching steps 1 and 2.
This is the foundation of what practitioners now call context engineering — deliberately managing what information flows through each stage of a model pipeline, and in what form.
Three chain patterns worth knowing
Not all chains are sequential. Here are the three structures I use regularly:
Linear chain (A→B→C)
The simplest pattern. Each step processes the output of the previous one. Good for tasks with a natural sequential structure: outline → draft → edit, or extract → transform → format.
Conditional chain (branching)
A step evaluates the previous output and routes to different downstream prompts based on what it finds. Example: a classification step determines whether a customer email is a complaint, a question, or a cancellation request — and each category routes to a different response prompt. The classifier runs once, then you branch.
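The routing logic itself is trivial once the classifier's output is a label. A sketch, assuming a `classify` function that returns a category string and a dict of category-specific response prompts (both hypothetical):

```python
def route(email: str, classify, responders) -> str:
    """Classify once, then branch to the matching response prompt.

    `classify` maps an email to a label like "complaint" / "question" /
    "cancellation"; `responders` maps each label to a function that
    drafts the reply for that category.
    """
    label = classify(email)
    if label not in responders:
        # Surface unexpected categories instead of silently misrouting.
        raise ValueError(f"no responder for category {label!r}")
    return responders[label](email)
```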
Parallel chain (fan-out → merge)
Multiple prompts run independently on the same input, then a merge step combines their outputs. Good for: analyzing a document from multiple angles simultaneously (competitive position, technical feasibility, financial impact), then synthesizing into a unified assessment. Each analysis prompt is independent — they don't contaminate each other — and the merge step has focused work to do.
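Because the analysis prompts are independent, they can literally run in parallel. A sketch using the standard library's thread pool, with `analyses` as a hypothetical label-to-prompt-function mapping and `merge` as the synthesis step:

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out_merge(document: str, analyses: dict, merge) -> str:
    """Run each analysis prompt independently on the same input, then merge.

    `analyses` maps a label (e.g. "technical", "financial") to a function
    document -> finding; `merge` takes the dict of labeled findings and
    synthesizes one assessment.
    """
    with ThreadPoolExecutor() as pool:
        # Fan out: every analysis sees the same untouched input.
        futures = {name: pool.submit(fn, document) for name, fn in analyses.items()}
        findings = {name: f.result() for name, f in futures.items()}
    # Merge: one focused synthesis step over the labeled findings.
    return merge(findings)
```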
Worked example: writing a research report
Single-prompt attempt: "Research the current state of retail AI adoption, including market size, key players, major use cases, adoption barriers, and a 12-month outlook. Write a 1,500-word report for a B2B software audience."
What you get: a structurally correct report that's factually shallow, uses generic "according to industry analysts" hedges, and doesn't have a coherent argument. The model tried to be researcher, analyst, writer, and editor in one shot.
The chained version:
Step 1 — Outline
You are a B2B technology analyst. Create a detailed outline for a 1,500-word report on retail AI adoption for a B2B software audience.
The outline should cover: market context (size, growth rate), key use cases with adoption rates, major technology vendors, adoption barriers, and a 12-month outlook.
Format: Numbered sections with 3–5 bullet points of specific content under each section. Include suggested data points or statistics I should look up for each section — note where real numbers are needed vs. where I can use qualitative analysis.
You review the outline. Maybe you reorder sections or add one the model missed. Now you have a verified structure.
Step 2 — Section research (run once per section)
You are a B2B technology analyst. Here is an outline section from a report on retail AI adoption:
[PASTE THE SPECIFIC SECTION]
Expand this section into 300–400 words of detailed analysis. Be specific about use cases, name actual vendors where you know them with confidence, and flag any statistics you're uncertain about with [NEEDS VERIFICATION]. Do not make up numbers.
You run this for each section separately. Each section gets the model's full attention. You review each output for accuracy before passing it forward.
Step 3 — Assembly + transitions
Below are 5 independently written sections for a report on retail AI adoption. They're accurate but may feel disconnected.
[PASTE ALL SECTIONS]
Your job: rewrite the transitions between sections so the report reads as a cohesive narrative. Do not change the substance of any section. Adjust only the opening and closing sentences of each section and add bridging language where needed. Output the complete revised report.
Step 4 — Voice + final edit
Here is a research report on retail AI adoption. It's written for a B2B software audience by a technology analyst.
[PASTE ASSEMBLED REPORT]
Edit for the following:
- Remove any passive voice
- Tighten sentences over 25 words
- Cut hedging phrases ("it seems," "this may suggest," "arguably")
- Ensure the opening paragraph makes a specific, debatable claim — not a neutral overview
Output the full edited report.
Total effort: 4 prompts, each taking 30–60 seconds to run, plus your review time between steps. Output quality is substantially better than the single-prompt version, and when something's wrong you know exactly which step to fix.
Worked example: code review chain
Single-prompt attempt: "Review this function for bugs, suggest improvements, and write tests for the fixed version."
This almost always fails on at least one of those three tasks. The model finds bugs but misses edge cases, or writes tests for the original code rather than the fixes, or writes improvements that break the existing interface.
The chained version:
Step 1 — Bug identification
Review this Python function and list every bug, edge case, and potential failure mode you can find. Do not suggest fixes yet.
[CODE]
Format: numbered list. For each item: describe the problem, show the specific line(s), and explain what would go wrong at runtime.
Step 2 — Fix suggestions (referencing the bug list)
Here is a Python function with a list of identified bugs:
ORIGINAL CODE:
[CODE]
IDENTIFIED BUGS:
[PASTE OUTPUT FROM STEP 1]
Now suggest fixes for each bug. For each fix: explain the change, show the before/after code snippet, and flag any cases where the fix changes the function's interface or behavior.
Step 3 — Implement fixes
Here is the original function and the approved fixes:
ORIGINAL:
[CODE]
APPROVED FIXES:
[PASTE OUTPUT FROM STEP 2 — optionally with your edits/rejections]
Write the complete corrected function. Add inline comments only where the fix is non-obvious.
Step 4 — Write tests
Here is a corrected Python function:
[PASTE OUTPUT FROM STEP 3]
Write pytest tests for this function. Cover:
- Normal inputs (at least 3)
- Each edge case identified in this bug list: [PASTE STEP 1 OUTPUT]
- At least one test for each previously buggy behavior to confirm it's fixed
Use descriptive test function names. No mocks unless absolutely necessary.
Four prompts instead of one. Each step is better than what you'd get from a combined prompt, and the tests in step 4 actually match the fixed code rather than the original.
When NOT to chain
Prompt chaining has overhead: more prompts, more review steps, more tokens, more time. Don't use it when a single well-structured prompt works fine.
Use a single prompt when:
- The task has one clear output type (summarize this, translate that, fix this specific bug)
- The output is under ~500 words
- You're running it once, not as a repeatable workflow
- Speed matters more than perfection
Chain when:
- The task has multiple distinct phases (research + write + edit, or analyze + recommend + format)
- You've tried a single prompt and gotten consistently inconsistent output
- You're building something you'll run repeatedly and need reliability
- The output quality difference justifies the extra steps
Over-engineering is a real trap. I've seen people build 8-step chains for tasks that a single prompt with a good format spec would handle perfectly. Start with one prompt. Chain only when you've confirmed that one prompt isn't enough.
Tools that automate chaining
Running chains manually — copy-pasting output from one chat window to the next — works for exploration but doesn't scale.
n8n: Open-source workflow automation with native LLM nodes. You can build multi-step AI pipelines visually, with branching logic, HTTP calls to external APIs, and data transformation between steps. Good for non-developers who need automation.
LangChain: Python/JavaScript library for chaining LLM calls programmatically. More control than n8n, requires code, but the abstractions for sequential chains, conditional routing, and parallel execution are well-designed. If you're a developer, this is the fastest way to go from prototype to production.
Raw API: For simple linear chains, calling the OpenAI or Anthropic API directly with Python is often the simplest approach. Pass the output of one completion as the user message in the next request. 30 lines of code, no framework overhead, full control.
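A sketch of that raw-API approach with the Anthropic SDK (the model id is illustrative; swap in whichever provider and model you use — the chain loop is the same either way):

```python
def anthropic_complete(prompt: str) -> str:
    """One call to the Anthropic Messages API; returns the reply text."""
    import anthropic  # lazy import so the chain logic stays testable without the SDK
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model id
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

def run_chain(prompts: list, complete=anthropic_complete) -> str:
    """Run each prompt template in order, feeding the previous output in
    via the {prev} placeholder."""
    prev = ""
    for template in prompts:
        prev = complete(template.format(prev=prev))
    return prev
```

Usage looks like `run_chain(["Outline a report on retail AI adoption.", "Expand this outline into a draft:\n{prev}", "Edit this draft for concision:\n{prev}"])` — the first prompt ignores `{prev}`, every later one consumes it.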
The agents track covers how these tools fit into larger agentic architectures — prompt chaining is the foundation, and agents are what you get when you add a planning layer and tool use on top.
The mental model that makes chaining click
Think of a prompt chain like a manufacturing assembly line. Each station does one thing well. No station tries to do the job of the station before it. The quality checkpoint happens between stations, not just at the end.
The chain-of-thought lesson covers a related idea at the single-prompt level: getting a model to reason through steps internally before committing to an answer. Prompt chaining externalizes that same reasoning process — instead of hidden reasoning steps inside one prompt, you have explicit steps between prompts that you can inspect, redirect, and reuse.
Start with the research report example. Build it end-to-end once. See how differently step-by-step quality feels compared to single-prompt quality. That feeling is the thing that will change how you design AI workflows permanently.