Why Context Engineering is the Real Skill
Prompt engineering was about crafting the right words in a single message.
Context engineering is about managing information across an entire agent run.
As models get more capable, the bottleneck in agent performance shifts from "can the model reason well?" to "does the model have the right information at the right time?" Context engineering is the discipline that answers that question.
The Context Window: An Agent's Working Memory
An LLM's context window is finite — it holds a fixed amount of text at one time. Everything the agent can "see" and reason about must fit within this window.
For a long-running agent, the context window fills up with:
- The system prompt (instructions, persona, tool descriptions)
- Conversation history (user messages, assistant responses)
- Tool call records (what was called, with what arguments)
- Tool results (what each tool returned)
- Any retrieved documents
When the window is full, older content must be dropped or compressed, and anything dropped is simply forgotten.
Good context engineering manages this degradation gracefully instead of letting it happen by accident.
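A minimal sketch of the failure mode, assuming a rough four-characters-per-token estimate (both function names are hypothetical):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def truncate_oldest(messages: list[str], budget: int) -> list[str]:
    """Naive context management: drop the oldest messages until the
    remainder fits the token budget. Whatever is dropped is simply
    gone, so the agent 'forgets' it."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message is lost first
    return kept

history = [
    "system: follow the refund policy",
    "user: my order number is 4412",
    "tool: order 4412 found",
    "user: please refund it",
]
window = truncate_oldest(history, budget=15)
```

Note what falls out first under this naive scheme: the system prompt and the order number, which are exactly the things the agent still needs. The principles below exist to avoid this outcome.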
The Five Principles of Context Engineering
1. Include What's Necessary, Exclude What's Not
Every token in context costs money and competes for the model's attention. Information that isn't relevant to the current task is noise.
Bad: Feeding the agent a 50-page policy document when it only needs section 3.2. Good: Extracting and injecting only the relevant section.
For every piece of information in context, ask: "Does the agent need this right now?"
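A sketch of the "extract only section 3.2" idea, assuming a hypothetical document convention where sections are headed like `Section 3.2 - Title`:

```python
import re

def extract_section(document: str, section_id: str) -> str:
    """Return only the named section of a policy document rather
    than injecting the whole thing into context."""
    pattern = rf"(Section {re.escape(section_id)}\b.*?)(?=\nSection \d|\Z)"
    match = re.search(pattern, document, flags=re.DOTALL)
    return match.group(1).strip() if match else ""

policy = """Section 3.1 - Eligibility
Refunds apply to purchases within 30 days.
Section 3.2 - Process
Submit a refund request via the support portal.
Section 3.3 - Exceptions
Digital goods are non-refundable."""

snippet = extract_section(policy, "3.2")
```

The agent now sees two lines instead of the full document, and nothing from unrelated sections competes for its attention.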
2. Structure Information for Clarity
Raw data is harder to reason about than structured data. Use clear labels, consistent formatting, and explicit separators.
Unstructured:
the customer signed up march 5 2024 they bought the pro plan $99/month they
cancelled april 2024 due to price concerns and requested refund on april 14
Structured:
<customer_record>
<signup_date>2024-03-05</signup_date>
<plan>Pro ($99/month)</plan>
<cancellation_date>2024-04-01</cancellation_date>
<cancellation_reason>Price concerns</cancellation_reason>
<refund_requested>true</refund_requested>
<refund_request_date>2024-04-14</refund_request_date>
</customer_record>
The agent can parse and reason about the second version far more reliably.
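One way to produce such a record with the standard library, using the field names from the example above (the function name is an illustration, not a fixed API):

```python
import xml.etree.ElementTree as ET

def to_customer_record(fields: dict[str, str]) -> str:
    """Render raw customer facts as a labeled XML block so the
    model reasons over explicit fields instead of run-on prose."""
    root = ET.Element("customer_record")
    for key, value in fields.items():
        child = ET.SubElement(root, key)
        child.text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

record = to_customer_record({
    "signup_date": "2024-03-05",
    "plan": "Pro ($99/month)",
    "cancellation_date": "2024-04-01",
    "cancellation_reason": "Price concerns",
    "refund_requested": "true",
    "refund_request_date": "2024-04-14",
})
```

Building the record programmatically also guarantees consistent formatting across every customer the agent sees, which matters more than any particular choice of tags.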
3. Summarize and Compress History
As conversations grow, compress old turns rather than truncating them arbitrarily.
Running summary pattern: After every N turns, have a secondary model (or the agent itself) summarize the conversation so far into a compact record. Inject the summary in place of the raw history:
[Conversation summary — turns 1-20]
The user is researching electric vehicle manufacturers. We've confirmed:
- Tesla is the market leader with 20% share
- BYD is #2 globally, #1 in China
- User wants to focus on European manufacturers next
[Turns 21-25 — raw, recent history]
...
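The running summary pattern can be sketched as follows; `summarize` stands in for a call to a secondary model, so a trivial placeholder is used here:

```python
def compress_history(turns: list[str], keep_recent: int, summarize) -> list[str]:
    """Running summary pattern: replace all but the most recent
    turns with a single compact summary entry."""
    if len(turns) <= keep_recent:
        return list(turns)
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    header = f"[Conversation summary - turns 1-{len(old)}]"
    return [header + "\n" + summarize(old)] + recent

# Hypothetical stand-in for an LLM summarization call.
fake_summarize = lambda turns: f"{len(turns)} earlier turns condensed."

turns = [f"turn {i}" for i in range(1, 26)]
compacted = compress_history(turns, keep_recent=5, summarize=fake_summarize)
```

Twenty-five turns become six entries: one summary plus the five raw recent turns, matching the layout shown above.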
4. Retrieve Selectively (RAG)
Don't inject entire documents into context. Retrieve only the passages most relevant to the current query.
Without RAG: "Here is our 200-page policy document. Answer any questions about it."
With RAG: At query time, semantically search the policy document and retrieve only the 3-5 most relevant passages. Inject those.
This keeps context lean and ensures the model focuses on what actually matters.
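The retrieval pipeline can be sketched with a toy word-overlap score; a production system would use embedding similarity, but the shape is the same:

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words present in the
    passage. Stands in for cosine similarity over embeddings."""
    q = words(query)
    return len(q & words(passage)) / max(1, len(q))

def retrieve(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Return only the top-k most relevant passages for injection."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

passages = [
    "Refunds are processed within 5 business days.",
    "Our office hours are 9am to 5pm on weekdays.",
    "Refund requests require the original order number.",
    "The company was founded in 2009.",
]
top = retrieve("how do I request a refund", passages, k=2)
```

Only `k` passages ever reach the context window, no matter how large the underlying document store grows.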
5. Position Information Strategically
Models pay more attention to information at the start and end of the context window than in the middle — a well-documented phenomenon called "lost in the middle."
Place at the start:
- The system prompt and core instructions
- The user's current request
Place at the end (just before the response):
- The most recent tool results
- The most critical retrieved context
In the middle:
- Supporting information, background context, conversation history
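The placement rules above amount to an assembly order, sketched here (the function name and argument grouping are illustrative):

```python
def assemble_context(system_prompt: str, user_request: str,
                     background: list[str], recent_results: list[str]) -> str:
    """Order context to counter 'lost in the middle': core
    instructions and the request up front, supporting material in
    the middle, and the freshest, most critical items last."""
    parts = [system_prompt, user_request]  # start: highest attention
    parts += background                    # middle: supporting info
    parts += recent_results                # end: recent and critical
    return "\n\n".join(parts)

ctx = assemble_context(
    "SYSTEM PROMPT", "USER REQUEST",
    background=["BACKGROUND DOC"],
    recent_results=["LATEST TOOL RESULT"],
)
```

Centralizing assembly in one function also means the ordering policy can change without touching the rest of the agent.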
Context Pollution: What to Watch Out For
Context pollution is when irrelevant, outdated, or misleading information in the context degrades agent performance.
Stale instructions
A system prompt that references a workflow that no longer exists. The model follows the wrong procedure.
Fix: Regularly audit and update system prompts as your system changes.
Contradictory information
A document in context says X, but a more recent tool result says not-X. The model is uncertain which to believe.
Fix: Add timestamps to all injected information. Add explicit instructions: "Prefer the most recent data if sources conflict."
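A sketch of the timestamping fix, wrapping every injected snippet with its source and retrieval time (format and function name are illustrative):

```python
from datetime import datetime, timezone

def stamp(source: str, content: str, retrieved_at: datetime) -> str:
    """Prefix injected content with provenance so the model can
    resolve conflicts by recency, per the instruction 'prefer the
    most recent data if sources conflict'."""
    ts = retrieved_at.strftime("%Y-%m-%d %H:%M UTC")
    return f"[source: {source} | retrieved: {ts}]\n{content}"

doc = stamp("policy_db", "Refund window is 30 days.",
            datetime(2024, 1, 5, tzinfo=timezone.utc))
live = stamp("billing_api", "Refund window is 14 days.",
             datetime(2024, 6, 1, tzinfo=timezone.utc))
```

With both snippets stamped, the contradiction is still visible to the model, but it now has the metadata needed to pick the newer value.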
Tool result noise
A tool returns a 10,000-word article when only one paragraph was relevant. The model focuses on the wrong part.
Fix: Post-process tool results before injecting them. Summarize, truncate, or extract relevant sections.
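A sketch of that post-processing step, keeping only the paragraphs that mention the query terms, with a hard length cap as a safety net:

```python
def clip_tool_result(result: str, query_terms: list[str],
                     max_chars: int = 400) -> str:
    """Keep only the paragraphs of a tool result that mention the
    query terms, then hard-truncate as a fallback."""
    paragraphs = result.split("\n\n")
    relevant = [p for p in paragraphs
                if any(t.lower() in p.lower() for t in query_terms)]
    text = "\n\n".join(relevant) or result  # fall back to raw result
    return text[:max_chars]

article = ("The history of the company spans decades.\n\n"
           "Refunds are issued to the original payment method.\n\n"
           "Our offices are located in Berlin and Austin.")
clipped = clip_tool_result(article, ["refund"])
```

For higher-stakes filtering, the extraction step itself can be delegated to a cheap secondary model rather than keyword matching.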
Over-large system prompts
A system prompt that tries to cover every possible scenario. The model treats all instructions as equally important.
Fix: Keep system prompts focused. Use dynamic instructions (retrieved at runtime based on the current task) rather than static catch-all prompts.
Dynamic Context: Adapting to the Task
Static context engineering — writing one system prompt and injecting the same information every time — only goes so far.
Dynamic context engineering adapts what's in context based on what the agent is doing:
Role-based injection: If the agent is doing research, inject research guidelines. If it's writing code, inject coding standards. Switch based on the current task type.
Progressive disclosure: Don't show the agent everything at once. Start with a summary; let it request details if needed.
Tool-result filtering: After a tool call, extract only the relevant portions of the result before adding them to context.
Recency weighting: Older conversation turns contribute to a summary; only recent turns stay as raw text.
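Role-based injection, the first pattern above, can be sketched as a lookup keyed on the current task type (the guideline snippets here are hypothetical):

```python
# Hypothetical per-task guideline snippets.
GUIDELINES = {
    "research": "Cite sources. Distinguish facts from speculation.",
    "coding": "Follow PEP 8. Write tests for new functions.",
}

def build_system_prompt(base: str, task_type: str) -> str:
    """Append only the guidelines matching the agent's current
    task, instead of a static catch-all prompt."""
    extra = GUIDELINES.get(task_type, "")
    return f"{base}\n\n{extra}".strip()

prompt = build_system_prompt("You are a helpful assistant.", "coding")
```

The coding agent never sees research guidelines and vice versa, so no instruction in context is irrelevant to the task at hand.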
Practical Context Budget
For any agent, track your approximate context budget:
| Component | Typical size |
|---|---|
| System prompt | 500–2,000 tokens |
| Tool descriptions | 200–500 tokens per tool |
| Conversation history (compressed) | 1,000–3,000 tokens |
| Retrieved documents | 1,000–5,000 tokens |
| Recent tool results | 500–2,000 tokens |
| Current user message | 50–500 tokens |
| Total | 3,000–15,000 tokens |
With a 200k context window (Claude, Gemini), you have significant headroom. With smaller models or longer runs, budget carefully.
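Budget tracking can be sketched with a rough character-count estimate; a real agent would use the model provider's tokenizer for exact counts:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Swap in the
    # provider's tokenizer for accurate numbers.
    return max(1, len(text) // 4)

def context_budget(components: dict[str, str], window: int) -> dict:
    """Report per-component token usage and remaining headroom."""
    usage = {name: estimate_tokens(text) for name, text in components.items()}
    total = sum(usage.values())
    return {"usage": usage, "total": total, "headroom": window - total}

report = context_budget(
    {"system_prompt": "x" * 4000,   # ~1,000 tokens
     "history": "x" * 8000,         # ~2,000 tokens
     "user_message": "x" * 400},    # ~100 tokens
    window=200_000,
)
```

Logging this report on every turn makes it obvious which component is eating the budget long before the agent starts forgetting.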
Key Takeaways
- Context engineering is managing what information enters an agent's context window, in what form, and when
- The five principles: include what's necessary, structure clearly, compress history, retrieve selectively, position strategically
- Avoid context pollution: stale instructions, contradictory data, noisy tool results, over-large system prompts
- Dynamic context injection — adapting what's in context to the current task — outperforms static prompts for complex agents
- Context engineering is the highest-leverage skill for making agents reliable at scale
- Next lesson: multi-agent systems — how to coordinate multiple agents working together