In mid-2025, Andrej Karpathy posted something that reframed how a lot of people think about working with LLMs. He called it "context engineering": the art of constructing the right context for the model to do its work. Not "write a better prompt." Engineer the context.
The distinction sounds subtle. The implications aren't.
If you're still primarily focused on refining your instruction text while ignoring what surrounds it, you're optimizing the wrong thing. Here's why that matters, and what to do instead.
What actually changed
Two things happened more or less simultaneously, and together they shifted the bottleneck.
First, models got dramatically better at following instructions. Claude 3.5, GPT-4o, Gemini 2.0 — these models don't need elaborate phrasing tricks to understand what you want. If you say "summarize this in bullet points," they summarize it in bullet points. The prompt-crafting game of "how do I phrase this so the model understands" is mostly over for well-specified tasks.
Second, context windows exploded. Claude Sonnet handles 200K tokens. Gemini 2.0 Flash handles 1M. A million tokens is roughly 750,000 words — you can fit the entire Lord of the Rings trilogy in a single context window and still have room for your system prompt.
When the model understands your instructions and the window is enormous, the bottleneck is no longer phrasing. It's information. What does the model have access to? What does it know when it's answering? That's the question context engineering asks.
The definition difference
Prompt engineering is writing the instruction text. "Write a blog post about X." "Summarize this in 3 bullet points." "Respond formally." It's the craft of phrasing — and it still matters, it's just no longer the whole game.
Context engineering is deliberately designing everything in the context window: the system prompt, the examples, the retrieved documents, the tool outputs, the conversation history, the user data, the formatting. It's the craft of information architecture.
An analogy: prompt engineering is choosing your words carefully. Context engineering is choosing your words carefully and deciding what to say, in what order, in what format, with what supporting evidence, in front of which audience.
The prompt is one ingredient. Context engineering is the recipe.
The 4 layers of context to engineer
Think of the context window as four distinct layers. Each one is an engineering decision.
Layer 1: The instruction (the prompt itself)
Still matters — but it's now one piece, not the whole thing. "Respond only using information from the documents below" is a perfectly good instruction. It just doesn't work unless you've engineered layers 2 through 4 correctly. A great instruction on top of garbage context is still garbage output.
Layer 2: The examples (few-shot, retrieved, synthetic)
What you show the model shapes what it does. A customer support agent with three examples of excellent responses handles edge cases better than one with none — not because the model learned anything, but because the examples calibrate tone, format, and judgment in the moment.
Here's the underappreciated part: retrieved examples from a vector store are often better than hand-crafted few-shot examples. They're semantically relevant to the actual query. When someone asks about a refund for a defective product, showing the model three previous responses to similar refund queries is more useful than three generic "here's how to be helpful" examples you wrote at setup time.
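The retrieved-examples idea can be sketched in a few lines. Everything here is illustrative: the toy word-overlap similarity stands in for a real embedding model, and `EXAMPLE_BANK` stands in for a vector store of past responses.

```python
from collections import Counter
from math import sqrt

# Toy lexical cosine similarity; a real system would use embeddings.
def similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Past (query, response) pairs; in production these live in a vector store.
EXAMPLE_BANK = [
    ("How do I get a refund for a broken item?", "I'm sorry your item arrived damaged..."),
    ("Can I change my shipping address?", "Yes, as long as the order hasn't shipped..."),
    ("My refund hasn't arrived yet", "Refunds typically take 5-10 business days..."),
]

def select_examples(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Pick the k past examples most relevant to this query."""
    ranked = sorted(EXAMPLE_BANK, key=lambda ex: similarity(query, ex[0]), reverse=True)
    return ranked[:k]
```

The selected pairs then get formatted into the context just before the user's message, so each query arrives with examples that actually resemble it.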
Layer 3: The state (conversation history, memory summaries)
What happened earlier in the conversation — or in past sessions — is context. And it's context you control.
A naive implementation replays the full conversation history until the window fills up and starts truncating from the front. A well-engineered one maintains a running session summary, stores durable facts about the user (name, account type, preferences), and injects the relevant pieces at the start of each turn. You're not just passing through whatever happened — you're curating what the model needs to know about the past.
This is architecture. Most teams treat it as an afterthought.
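A minimal sketch of that curation, using a `SessionMemory` class of my own invention. In a real system the running summary would be produced by an LLM call; here old turns are just concatenated to keep the sketch self-contained.

```python
# Illustrative curated-state container; all names are hypothetical.
class SessionMemory:
    def __init__(self, window: int = 6):
        self.window = window              # recent turns kept verbatim
        self.turns: list[str] = []
        self.summary = ""                 # running summary of older turns
        self.facts: dict[str, str] = {}   # durable user facts (tier, prefs)

    def add_turn(self, role: str, text: str) -> None:
        self.turns.append(f"{role}: {text}")
        # When the verbatim window overflows, fold the oldest turn into
        # the summary instead of silently dropping it.
        while len(self.turns) > self.window:
            self.summary += self.turns.pop(0) + " | "

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value           # e.g. ("plan", "pro")

    def render(self) -> str:
        """Context block injected at the start of each model turn."""
        parts = []
        if self.facts:
            parts.append("Known user facts: " + "; ".join(f"{k}={v}" for k, v in self.facts.items()))
        if self.summary:
            parts.append("Earlier in this session: " + self.summary)
        parts.extend(self.turns)
        return "\n".join(parts)
```

The point is the shape, not the code: verbatim recency, summarized history, and durable facts are three different retention policies, chosen deliberately.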
Layer 4: The environment (tool outputs, retrieved docs, file contents)
Everything the agent has "done" or "seen" in this session. When a RAG system retrieves three relevant knowledge base chunks and injects them before the user's question, that's context engineering. When a coding agent reads a file before editing it, that's context engineering. When a research agent summarizes a web search result before passing it downstream, that's context engineering.
The environment layer is where agentic systems live or die. The model can only reason about what's in the window. If the environment is poorly structured — wrong chunks, wrong format, too much noise — even a brilliant model produces mediocre output.
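A sketch of that injection step, assuming nothing beyond the standard library: `retrieve` here is a crude word-overlap ranker standing in for real semantic search, and the labeling format is one arbitrary choice among many.

```python
# Stand-in for semantic search: rank KB docs by word overlap with the query.
def retrieve(query: str, kb: list[str], k: int = 3) -> list[str]:
    q = set(query.lower().split())
    return sorted(kb, key=lambda doc: len(q & set(doc.lower().split())), reverse=True)[:k]

def build_context(system: str, query: str, kb: list[str]) -> str:
    """Assemble the window: instructions, labeled evidence, then the question."""
    chunks = retrieve(query, kb)
    labeled = "\n".join(f"[Document {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return (
        f"{system}\n\n"
        f"Answer using only the documents below.\n\n"
        f"{labeled}\n\n"
        f"Question: {query}"
    )
```

Note that the instruction "answer using only the documents below" from Layer 1 only has teeth because this function guarantees the documents are actually there.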
A real example: customer support
Take two versions of the same customer support agent, running on the same model.
Version 1 (prompt-only): System prompt: "You are a helpful support agent for Acme Corp. Be friendly and accurate." That's it. No retrieval, no examples, no user data. Result: about 60% of questions get fully resolved, with regular hallucinations about policies that don't exist.
Version 2 (engineered context): Same model. System prompt is slightly more specific, but the real changes are elsewhere: the top 3 knowledge base chunks relevant to this query get retrieved and injected. The conversation summary from earlier in the session is prepended. The user's account tier and recent order history are included. Three examples of good responses to similar questions are appended before the user's message. Result: 89% resolution rate, near-zero hallucination incidents over a two-week monitoring period.
The prompt barely changed. The context did. That's the whole lesson.
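Version 2's window can be pictured as one assembly function. All the helper inputs here are hypothetical; in a real system each argument would come from retrieval, memory, and account lookups.

```python
# Hypothetical assembly of the engineered context window.
def build_support_context(query: str, *, kb_chunks: list[str],
                          session_summary: str, user_profile: dict,
                          examples: list[str]) -> str:
    parts = [
        "You are a support agent for Acme Corp. Answer only from the documents provided.",
        "User profile: " + "; ".join(f"{k}={v}" for k, v in user_profile.items()),
        "Session so far: " + session_summary,
        "Relevant knowledge base entries:",
        *(f"- {c}" for c in kb_chunks),
        "Example responses to similar questions:",
        *(f"- {e}" for e in examples),
        f"Customer question: {query}",
    ]
    return "\n".join(parts)
```

Every argument is an engineering surface: swap the retrieval, the summary policy, or the example selection and the model's behavior changes without a single edit to the instruction text.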
5 practical changes to how you build
Once you internalize context engineering as the discipline, a few things change about how you approach AI features.
1. Your knowledge base quality matters as much as your model choice. A modest model backed by a clean, well-chunked, regularly updated knowledge base will often outperform a state-of-the-art model working from a stale or noisy one. Garbage in, garbage out is not about the prompt; it's about retrieval quality.
2. Example curation is a design decision. Which few-shot examples go in the context isn't just "grab a few good ones." It's a deliberate choice about what behaviors you want to reinforce, what edge cases you want to handle, and whether you're selecting them statically or dynamically based on the query. Treat it like feature engineering.
3. Memory design is architectural. What do you keep in the conversation window? What do you summarize? What do you discard? What gets stored in a user profile for future sessions? These decisions affect every interaction. They deserve the same attention as your database schema.
4. Tool outputs need to be formatted for model readability. When a tool returns JSON, don't just inject raw JSON into the context. Structure it. Label it. Trim irrelevant fields. The model reads context like a document — dense, unmarked machine output makes it work harder for no reason.
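A small sketch of that reshaping. The tool name, field list, and payload are all invented for illustration; the pattern is just "parse, trim, label."

```python
import json

# Fields the model actually needs; everything else is noise (illustrative).
RELEVANT_FIELDS = ["order_id", "status", "eta"]

def format_tool_output(tool_name: str, raw_json: str) -> str:
    """Turn a raw tool payload into a labeled, trimmed context block."""
    data = json.loads(raw_json)
    lines = [f"{k}: {data[k]}" for k in RELEVANT_FIELDS if k in data]
    return f"Result of {tool_name}:\n" + "\n".join(lines)

raw = '{"order_id": "A123", "status": "shipped", "eta": "2 days", "internal_ref": "x9", "db_shard": 4}'
print(format_tool_output("lookup_order", raw))
# Result of lookup_order:
# order_id: A123
# status: shipped
# eta: 2 days
```

The internal reference and shard number never reach the window, and what does reach it arrives pre-labeled instead of as a brace-soup the model has to decode.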
5. Order matters. Research on attention in transformer models consistently shows recency bias — information near the end of the context gets more weight. Put the most important context (the current task, the relevant retrieved docs) close to where the model generates output. Background and system info can live earlier.
Context engineering vs RAG — clearing up the relationship
RAG (Retrieval-Augmented Generation) is one technique within context engineering. Context engineering is the broader discipline.
You can have excellent RAG plumbing — semantic search, good chunking, high-recall retrieval — and still have poor context engineering. If you're retrieving the right chunks but injecting them in a format the model finds hard to parse, putting them in the wrong position in the window, or overwhelming them with unrelated conversation history, your RAG isn't doing its job.
Context engineering is the lens through which RAG (and memory, and tool use, and few-shot examples) gets evaluated. It asks: given that we retrieved this information, are we using it effectively?
The failure modes of ignoring context engineering
When teams focus only on the prompt and ignore context design, they tend to hit the same problems.
Context overflow. You stuff too much in because the window is big, and the model attends poorly to whatever lands in the interior. This is the lost-in-the-middle problem — models weight the start and end of a long context far more than its middle. More isn't always better.
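One defense is an explicit budget that drops the least important material first instead of blindly truncating from the front. This sketch uses word count as a rough token proxy and invented priority numbers; a real system would use the model's tokenizer.

```python
# Sketch: keep the highest-priority pieces that fit, drop the rest.
def fit_to_budget(items: list[tuple[int, str]], budget: int) -> list[str]:
    """items are (priority, text) pairs; higher priority survives longer.
    Budget is counted in words as a crude stand-in for tokens."""
    indexed = list(enumerate(items))
    kept, used = [], 0
    # Greedily admit pieces in descending priority order.
    for idx, (priority, text) in sorted(indexed, key=lambda it: -it[1][0]):
        cost = len(text.split())
        if used + cost <= budget:
            kept.append((idx, text))
            used += cost
    # Restore the original document order of whatever survived.
    return [text for idx, text in sorted(kept)]
```

When the budget bites, background loses out before the current question does, which is usually the trade you want.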
Noise injection. Retrieved documents that are tangentially related add confusion, not clarity. A support agent that retrieves 10 docs when 2 are relevant is working against itself. Precision matters more than recall in most context engineering decisions.
Stale context. Memory that doesn't get pruned leads to the model fixating on old information. A user who mentioned they were on a free plan two sessions ago — but has since upgraded — doesn't want that stale fact shaping today's responses.
Before you touch the prompt next time
For your next AI feature, run through this context-engineering checklist before you write a single word of the instruction:
- What does the model need to know to answer this correctly? Start with information architecture, not wording.
- Where does that information come from? Retrieval, memory, tool calls, static injection — know the source for each piece.
- How should it be formatted in the context window? Structure for model readability, not human aesthetics.
- What should be excluded? The question of what to leave out is as important as what to include.
- In what order should things appear? Put high-importance context close to the generation point.
If you can answer all five questions before touching the prompt, you're doing context engineering. The prompt will probably be shorter than you expected — and it'll work better.
For a deeper dive into the mechanics, see our advanced context engineering lesson and the agents track context engineering lesson. If you're building agents specifically, the memory and state layer is covered in detail in the agents track — start at what is an AI agent if you're newer to the space.

