Why Context Engineering is the Real Skill
Prompt engineering was about crafting the right words in a single message.
Context engineering is about managing information across an entire agent run.
As models get more capable, the bottleneck in agent performance shifts from "can the model reason well?" to "does the model have the right information at the right time?" Context engineering is the discipline that answers that question.
The Context Window: An Agent's Working Memory
An LLM's context window is finite — it holds a fixed amount of text at one time. Everything the agent can "see" and reason about must fit within this window.
For a long-running agent, the context window fills up with:
- The system prompt (instructions, persona, tool descriptions)
- Conversation history (user messages, assistant responses)
- Tool call records (what was called, with what arguments)
- Tool results (what each tool returned)
- Any retrieved documents
When the window is full, older content must be dropped or compressed, and anything dropped is simply forgotten.
Good context engineering manages this degradation gracefully instead of letting it happen by accident.
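A minimal sketch of the failure mode, assuming a rough four-characters-per-token estimate (both function names are hypothetical):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def truncate_oldest(messages: list[str], budget: int) -> list[str]:
    """Naive context management: drop the oldest messages until the
    remainder fits the token budget. Whatever is dropped is simply
    gone, so the agent 'forgets' it."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message is lost first
    return kept

history = [
    "system: follow the refund policy",
    "user: my order number is 4412",
    "tool: order 4412 found",
    "user: please refund it",
]
window = truncate_oldest(history, budget=15)
```

Note what falls out first under this naive scheme: the system prompt and the order number, which are exactly the things the agent still needs. The principles below exist to avoid this outcome.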
The Five Principles of Context Engineering
1. Include What's Necessary, Exclude What's Not
Every token in context costs money and competes for the model's attention. Information that isn't relevant to the current task is noise.
Bad: Feeding the agent a 50-page policy document when it only needs section 3.2. Good: Extracting and injecting only the relevant section.
For every piece of information in context, ask: "Does the agent need this right now?"
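A sketch of the "extract only section 3.2" idea, assuming a hypothetical document convention where sections are headed like `Section 3.2 - Title`:

```python
import re

def extract_section(document: str, section_id: str) -> str:
    """Return only the named section of a policy document rather
    than injecting the whole thing into context."""
    pattern = rf"(Section {re.escape(section_id)}\b.*?)(?=\nSection \d|\Z)"
    match = re.search(pattern, document, flags=re.DOTALL)
    return match.group(1).strip() if match else ""

policy = """Section 3.1 - Eligibility
Refunds apply to purchases within 30 days.
Section 3.2 - Process
Submit a refund request via the support portal.
Section 3.3 - Exceptions
Digital goods are non-refundable."""

snippet = extract_section(policy, "3.2")
```

The agent now sees two lines instead of the full document, and nothing from unrelated sections competes for its attention.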
2. Structure Information for Clarity
Raw data is harder to reason about than structured data. Use clear labels, consistent formatting, and explicit separators.
Unstructured:
the customer signed up march 5 2024 they bought the pro plan $99/month they
cancelled april 2024 due to price concerns and requested refund on april 14
Structured:
<customer_record>
<signup_date>2024-03-05</signup_date>
<plan>Pro ($99/month)</plan>
<cancellation_date>2024-04-01</cancellation_date>
<cancellation_reason>Price concerns</cancellation_reason>
<refund_requested>true</refund_requested>
<refund_request_date>2024-04-14</refund_request_date>
</customer_record>
The agent can parse and reason about the second version far more reliably.
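One way to produce such a record with the standard library, using the field names from the example above (the function name is an illustration, not a fixed API):

```python
import xml.etree.ElementTree as ET

def to_customer_record(fields: dict[str, str]) -> str:
    """Render raw customer facts as a labeled XML block so the
    model reasons over explicit fields instead of run-on prose."""
    root = ET.Element("customer_record")
    for key, value in fields.items():
        child = ET.SubElement(root, key)
        child.text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

record = to_customer_record({
    "signup_date": "2024-03-05",
    "plan": "Pro ($99/month)",
    "cancellation_date": "2024-04-01",
    "cancellation_reason": "Price concerns",
    "refund_requested": "true",
    "refund_request_date": "2024-04-14",
})
```

Building the record programmatically also guarantees consistent formatting across every customer the agent sees, which matters more than any particular choice of tags.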
3. Summarize and Compress History
As conversations grow, compress old turns rather than truncating them arbitrarily.
Running summary pattern: After every N turns, have a secondary model (or the agent itself) summarize the conversation so far into a compact record. Inject the summary in place of the raw history:
[Conversation summary — turns 1-20]
The user is researching electric vehicle manufacturers. We've confirmed:
- Tesla is the market leader with 20% share
- BYD is #2 globally, #1 in China
- User wants to focus on European manufacturers next
[Turns 21-25 — raw, recent history]
...
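The running summary pattern can be sketched as follows; `summarize` stands in for a call to a secondary model, so a trivial placeholder is used here:

```python
def compress_history(turns: list[str], keep_recent: int, summarize) -> list[str]:
    """Running summary pattern: replace all but the most recent
    turns with a single compact summary entry."""
    if len(turns) <= keep_recent:
        return list(turns)
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    header = f"[Conversation summary - turns 1-{len(old)}]"
    return [header + "\n" + summarize(old)] + recent

# Hypothetical stand-in for an LLM summarization call.
fake_summarize = lambda turns: f"{len(turns)} earlier turns condensed."

turns = [f"turn {i}" for i in range(1, 26)]
compacted = compress_history(turns, keep_recent=5, summarize=fake_summarize)
```

Twenty-five turns become six entries: one summary plus the five raw recent turns, matching the layout shown above.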
4. Retrieve Selectively (RAG)
Don't inject entire documents into context. Retrieve only the passages most relevant to the current query.
Without RAG: "Here is our 200-page policy document. Answer any questions about it."
With RAG: At query time, semantically search the policy document and retrieve only the 3-5 most relevant passages. Inject those.
This keeps context lean and ensures the model focuses on what actually matters.
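The retrieval pipeline can be sketched with a toy word-overlap score; a production system would use embedding similarity, but the shape is the same:

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words present in the
    passage. Stands in for cosine similarity over embeddings."""
    q = words(query)
    return len(q & words(passage)) / max(1, len(q))

def retrieve(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Return only the top-k most relevant passages for injection."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

passages = [
    "Refunds are processed within 5 business days.",
    "Our office hours are 9am to 5pm on weekdays.",
    "Refund requests require the original order number.",
    "The company was founded in 2009.",
]
top = retrieve("how do I request a refund", passages, k=2)
```

Only `k` passages ever reach the context window, no matter how large the underlying document store grows.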
5. Position Information Strategically
Models pay more attention to information at the start and end of the context window than in the middle — a well-documented phenomenon called "lost in the middle."
Place at the start:
- The system prompt and core instructions
- The user's current request
Place at the end (just before the response):
- The most recent tool results
- The most critical retrieved context
In the middle:
- Supporting information, background context, conversation history
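The placement rules above amount to an assembly order, sketched here (the function name and argument grouping are illustrative):

```python
def assemble_context(system_prompt: str, user_request: str,
                     background: list[str], recent_results: list[str]) -> str:
    """Order context to counter 'lost in the middle': core
    instructions and the request up front, supporting material in
    the middle, and the freshest, most critical items last."""
    parts = [system_prompt, user_request]  # start: highest attention
    parts += background                    # middle: supporting info
    parts += recent_results                # end: recent and critical
    return "\n\n".join(parts)

ctx = assemble_context(
    "SYSTEM PROMPT", "USER REQUEST",
    background=["BACKGROUND DOC"],
    recent_results=["LATEST TOOL RESULT"],
)
```

Centralizing assembly in one function also means the ordering policy can change without touching the rest of the agent.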
Context Pollution: What to Watch Out For
Context pollution is when irrelevant, outdated, or misleading information in the context degrades agent performance.
Stale instructions
A system prompt that references a workflow that no longer exists. The model follows the wrong procedure.
Fix: Regularly audit and update system prompts as your system changes.
Contradictory information
A document in context says X, but a more recent tool result says not-X. The model is uncertain which to believe.
Fix: Add timestamps to all injected information. Add explicit instructions: "Prefer the most recent data if sources conflict."
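A sketch of the timestamping fix, wrapping every injected snippet with its source and retrieval time (format and function name are illustrative):

```python
from datetime import datetime, timezone

def stamp(source: str, content: str, retrieved_at: datetime) -> str:
    """Prefix injected content with provenance so the model can
    resolve conflicts by recency, per the instruction 'prefer the
    most recent data if sources conflict'."""
    ts = retrieved_at.strftime("%Y-%m-%d %H:%M UTC")
    return f"[source: {source} | retrieved: {ts}]\n{content}"

doc = stamp("policy_db", "Refund window is 30 days.",
            datetime(2024, 1, 5, tzinfo=timezone.utc))
live = stamp("billing_api", "Refund window is 14 days.",
             datetime(2024, 6, 1, tzinfo=timezone.utc))
```

With both snippets stamped, the contradiction is still visible to the model, but it now has the metadata needed to pick the newer value.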
Tool result noise
A tool returns a 10,000-word article when only one paragraph was relevant. The model focuses on the wrong part.
Fix: Post-process tool results before injecting them. Summarize, truncate, or extract relevant sections.
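A sketch of that post-processing step, keeping only the paragraphs that mention the query terms, with a hard length cap as a safety net:

```python
def clip_tool_result(result: str, query_terms: list[str],
                     max_chars: int = 400) -> str:
    """Keep only the paragraphs of a tool result that mention the
    query terms, then hard-truncate as a fallback."""
    paragraphs = result.split("\n\n")
    relevant = [p for p in paragraphs
                if any(t.lower() in p.lower() for t in query_terms)]
    text = "\n\n".join(relevant) or result  # fall back to raw result
    return text[:max_chars]

article = ("The history of the company spans decades.\n\n"
           "Refunds are issued to the original payment method.\n\n"
           "Our offices are located in Berlin and Austin.")
clipped = clip_tool_result(article, ["refund"])
```

For higher-stakes filtering, the extraction step itself can be delegated to a cheap secondary model rather than keyword matching.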
Over-large system prompts
A system prompt that tries to cover every possible scenario. The model treats all instructions as equally important.
Fix: Keep system prompts focused. Use dynamic instructions (retrieved at runtime based on the current task) rather than static catch-all prompts.
Dynamic Context: Adapting to the Task
Static context engineering — writing one system prompt and injecting the same information every time — only goes so far.
Dynamic context engineering adapts what's in context based on what the agent is doing:
Role-based injection: If the agent is doing research, inject research guidelines. If it's writing code, inject coding standards. Switch based on the current task type.
Progressive disclosure: Don't show the agent everything at once. Start with a summary; let it request details if needed.
Tool-result filtering: After a tool call, extract only the relevant portions of the result before adding them to context.
Recency weighting: Older conversation turns contribute to a summary; only recent turns stay as raw text.
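Role-based injection, the first pattern above, can be sketched as a lookup keyed on the current task type (the guideline snippets here are hypothetical):

```python
# Hypothetical per-task guideline snippets.
GUIDELINES = {
    "research": "Cite sources. Distinguish facts from speculation.",
    "coding": "Follow PEP 8. Write tests for new functions.",
}

def build_system_prompt(base: str, task_type: str) -> str:
    """Append only the guidelines matching the agent's current
    task, instead of a static catch-all prompt."""
    extra = GUIDELINES.get(task_type, "")
    return f"{base}\n\n{extra}".strip()

prompt = build_system_prompt("You are a helpful assistant.", "coding")
```

The coding agent never sees research guidelines and vice versa, so no instruction in context is irrelevant to the task at hand.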
Practical Context Budget
For any agent, track your approximate context budget:
| Component | Typical size |
|---|---|
| System prompt | 500–2,000 tokens |
| Tool descriptions | 200–500 tokens per tool |
| Conversation history (compressed) | 1,000–3,000 tokens |
| Retrieved documents | 1,000–5,000 tokens |
| Recent tool results | 500–2,000 tokens |
| Current user message | 50–500 tokens |
| Total | 3,000–15,000 tokens |
With a 200k context window (Claude, Gemini), you have significant headroom. With smaller models or longer runs, budget carefully.
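Budget tracking can be sketched with a rough character-count estimate; a real agent would use the model provider's tokenizer for exact counts:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Swap in the
    # provider's tokenizer for accurate numbers.
    return max(1, len(text) // 4)

def context_budget(components: dict[str, str], window: int) -> dict:
    """Report per-component token usage and remaining headroom."""
    usage = {name: estimate_tokens(text) for name, text in components.items()}
    total = sum(usage.values())
    return {"usage": usage, "total": total, "headroom": window - total}

report = context_budget(
    {"system_prompt": "x" * 4000,   # ~1,000 tokens
     "history": "x" * 8000,         # ~2,000 tokens
     "user_message": "x" * 400},    # ~100 tokens
    window=200_000,
)
```

Logging this report on every turn makes it obvious which component is eating the budget long before the agent starts forgetting.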
Key Takeaways
- Context engineering is managing what information enters an agent's context window, in what form, and when
- The five principles: include what's necessary, structure clearly, compress history, retrieve selectively, position strategically
- Avoid context pollution: stale instructions, contradictory data, noisy tool results, over-large system prompts
- Dynamic context injection — adapting what's in context to the current task — outperforms static prompts for complex agents
- Context engineering is the highest-leverage skill for making agents reliable at scale
- Next lesson: multi-agent systems — how to coordinate multiple agents working together