In 2025, "context engineering" emerged as the term for something practitioners had been doing for years: carefully architecting everything inside an LLM's context window, not just the prompt.
Prompt Engineering vs. Context Engineering
Prompt engineering asks: "How should I phrase this instruction?"
Context engineering asks: "What is the complete information environment the model should reason from?"
For simple Q&A, these are the same question. For production AI systems — agents, chatbots, RAG pipelines, multi-step workflows — context engineering is the dominant skill.
```
Context Window = System Prompt
               + Conversation History (partial/full/summarized)
               + Retrieved Documents (RAG)
               + Tool Results (function-calling outputs)
               + Injected Structured Data (user profile, state)
               + Few-Shot Examples
```
Every one of these components is a design decision.
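One common ordering of these components can be sketched as a small assembly function. This is a sketch, not a standard API; all names are hypothetical, and other orderings are equally defensible.

```python
def build_context(system_prompt, history, retrieved_docs, tool_results,
                  structured_data, examples):
    """Assemble the full context window from its components.

    Each argument after `system_prompt` is a list of strings. The
    ordering below (system prompt first, history last, nearest the
    user message) is one common choice, not the only one.
    """
    parts = [system_prompt]
    parts.extend(examples)          # static few-shot examples
    parts.extend(retrieved_docs)    # RAG chunks
    parts.extend(tool_results)      # function-calling outputs
    parts.extend(structured_data)   # user profile, session state
    parts.extend(history)           # prior conversation turns
    return "\n\n".join(p for p in parts if p)

ctx = build_context(
    system_prompt="You are a helpful coding assistant.",
    history=["User: How do I sort a list?", "Assistant: Use sorted()."],
    retrieved_docs=["[Doc 1] sorted() returns a new sorted list."],
    tool_results=[],
    structured_data=["Plan: Pro"],
    examples=[],
)
```

Every argument you pass (and every argument you leave empty) is one of the design decisions this article covers.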
The Four Layers of Context
1. System Prompt (Stable Instructions)
The foundation. Contains:
- Role and persona ("You are a helpful coding assistant")
- Task instructions and constraints
- Output format requirements
- Hard guardrails ("Never discuss competitor pricing")
- Optionally: static few-shot examples
The system prompt doesn't change between turns. Keep it focused — every token here is repeated for every message.
Context engineering decision: What level of detail belongs in the system prompt vs. injected dynamically per request?
2. Conversation History
Every prior turn in the session. The naive approach is to include everything forever, which breaks down as conversations grow long: responses get slower and more expensive, and eventually you hit the context limit.
Strategies:
- Full history (best for short conversations) — include everything
- Sliding window (simple, loses old context) — keep last N turns
- Summarization (preserves key facts) — summarize old turns, keep recent ones verbatim
- Selective retrieval (sophisticated) — embed all turns, retrieve only relevant ones
Context engineering decision: How many turns to keep verbatim? When to summarize? What's worth preserving?
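A hybrid of the sliding-window and summarization strategies might look like the sketch below. The summarizer is a placeholder for an LLM call; the default lambda exists only so the sketch runs on its own.

```python
def compress_history(turns, keep_verbatim=4, summarize=None):
    """Keep the last `keep_verbatim` turns verbatim; collapse the rest.

    `summarize` stands in for an LLM summarization call; the default
    below is a placeholder so this sketch is self-contained.
    """
    if len(turns) <= keep_verbatim:
        return list(turns)
    old, recent = turns[:-keep_verbatim], turns[-keep_verbatim:]
    if summarize is None:
        summarize = lambda ts: f"[Summary of {len(ts)} earlier turns]"
    return [summarize(old)] + recent
```

The `keep_verbatim` threshold is exactly the "how many turns to keep verbatim?" decision above; there is no universal right value.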
3. Retrieved Content (RAG)
Externally retrieved documents, database records, or API results injected into the context. This is the most powerful tool for grounding an LLM in specific, up-to-date facts.
Context engineering decisions:
- How many chunks to retrieve? (More context helps, but adds noise)
- How to format retrieved content? (Headers, source labels, separators)
- Where in the context to place retrieved documents? (After system prompt, before user message)
- What metadata to include? (Source URL, date, relevance score)
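A minimal formatter covering the labeling, metadata, and separator decisions might look like this. The field names (`text`, `source`, `date`) are assumptions for illustration, not a standard schema.

```python
def format_chunks(chunks):
    """Render retrieved chunks with source labels, dates, and clear
    separators. Field names ('text', 'source', 'date') are illustrative.
    """
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        header = f"[Document {i} | source: {chunk['source']} | date: {chunk['date']}]"
        blocks.append(f"{header}\n{chunk['text']}")
    return "\n---\n".join(blocks)

formatted = format_chunks([
    {"text": "Refunds are issued within 14 days.",
     "source": "docs/refunds.md", "date": "2025-01-10"},
    {"text": "Shipping takes 3-5 business days.",
     "source": "docs/shipping.md", "date": "2025-02-01"},
])
```

Explicit headers and separators make it easier for the model to cite sources and to avoid blending adjacent chunks together.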
4. Tool Results and Structured Data
Outputs from function calls, current user state, session variables — anything dynamically injected per-request.
```
[Tool: get_user_profile]
Name: Sarah Chen
Plan: Pro
Last login: 2026-02-24
Active projects: 3

[Tool: get_account_status]
Status: Active
Outstanding invoices: 0
```
Context engineering decision: How to format structured data for maximum model comprehension? JSON vs. key-value vs. natural language?
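A sketch of the key-value option, producing blocks like the example above. The function name is hypothetical; whether key-value beats JSON depends on how nested the data is.

```python
def to_key_value(tool_name, data):
    """Render a tool result as labeled key-value lines.

    Flat key-value text is often easier for a model to read than
    nested JSON; for deeply nested data, JSON may still win.
    """
    lines = [f"[Tool: {tool_name}]"]
    lines.extend(f"{k}: {v}" for k, v in data.items())
    return "\n".join(lines)

profile = {"Name": "Sarah Chen", "Plan": "Pro", "Active projects": 3}
block = to_key_value("get_user_profile", profile)
```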
Common Context Architecture Patterns
Pattern 1: Simple Q&A System
```
System: [Short instructions]
User: [Question]
Assistant: [Answer]
```
No retrieval, no history — minimal context. Works for isolated queries.
Pattern 2: RAG Chatbot
```
System: [Instructions + grounding requirements]
[Retrieved documents — 3-5 chunks]
History: [Last 5 turns]
User: [Current message]
```
Balances recency (last 5 turns) with factual grounding (retrieved docs).
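Assembling Pattern 2 as an OpenAI-style `messages` list might look like the sketch below. Placing the retrieved documents inside the system message is one of several reasonable choices; a separate pre-user message is equally common.

```python
def build_rag_messages(system, docs_block, history, user_msg):
    """Assemble a chat-style message list for the RAG chatbot pattern.

    `history` is a list of {"role": ..., "content": ...} dicts;
    keeping the last 10 entries approximates "last 5 turns" if a
    turn is one user/assistant pair.
    """
    messages = [{"role": "system", "content": system + "\n\n" + docs_block}]
    messages.extend(history[-10:])
    messages.append({"role": "user", "content": user_msg})
    return messages

history = [
    {"role": "user", "content": "Do you ship to Canada?"},
    {"role": "assistant", "content": "Yes, we do."},
]
msgs = build_rag_messages("Answer using only the documents provided.",
                          "[Doc 1] Shipping takes 3-5 business days.",
                          history,
                          "How long does delivery take?")
```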
Pattern 3: Long-Running Agent
```
System: [Role + tools + instructions]
[Agent memory — key facts from prior sessions]
[Current task state]
[Recent tool results]
History: [Summary of prior conversation] + [Last 3 turns verbatim]
User: [Current input]
```
Agents need persistent state across sessions — context engineering handles what's worth keeping.
Pattern 4: Personalized Assistant
```
System: [Base instructions]
[User profile: preferences, past interactions, active projects]
[Current context: time of day, location, recent activity]
History: [Last N turns]
User: [Message]
```
Injects user-specific data per request to personalize responses without fine-tuning.
Context Compression Techniques
As context windows grow (Claude now supports 200K tokens), the temptation is to dump everything in. But:
- Longer context = slower, more expensive responses
- Models don't attend uniformly — content in the middle of long contexts is under-attended (the "lost in the middle" problem)
- Noise hurts accuracy — irrelevant retrieved content degrades response quality
Compression strategies:
| Technique | How | When |
|---|---|---|
| Conversation summarization | LLM summarizes old turns | Sessions > 10 turns |
| Chunk re-ranking | Score retrieved chunks, keep top-k | When retrieval is noisy |
| Dynamic few-shot selection | Pick examples relevant to current query | Large example banks |
| Schema stripping | Remove unused JSON fields | Structured data injection |
| Hierarchical context | Summary at top, details below | Long documents |
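Chunk re-ranking from the table can be sketched as follows. The keyword-overlap scorer is deliberately naive and only a stand-in; production systems use a cross-encoder or embedding similarity.

```python
def rerank(chunks, query, k=3):
    """Keep the top-k chunks by relevance to the query.

    The keyword-overlap scorer below is a toy stand-in for a real
    relevance model (cross-encoder or embedding similarity).
    """
    query_terms = set(query.lower().split())

    def score(chunk):
        return len(query_terms & set(chunk.lower().split()))

    return sorted(chunks, key=score, reverse=True)[:k]

top = rerank(
    ["refund policy details", "shipping times", "refund request form"],
    "how do I get a refund",
    k=2,
)
```

Dropping low-scoring chunks trades a little recall for less noise, which the "lost in the middle" discussion below suggests is usually a good trade.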
The "Lost in the Middle" Problem
Research shows LLMs pay disproportionate attention to content at the beginning and end of long contexts — content in the middle gets under-attended.
Practical implications:
- Put the most important instructions at the start of the system prompt
- Put critical retrieved documents near the end of the retrieved content block (closest to the user message)
- If you have many retrieved chunks, re-rank to put the most relevant ones at positions 1 and N (not the middle)
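The positioning advice can be sketched as a reordering helper that alternates ranked chunks between the two ends of the block, so the weakest chunks land in the under-attended middle. This is a simple heuristic, not a standard algorithm.

```python
def order_for_attention(ranked_chunks):
    """Reorder chunks (given best-first) so the most relevant land at
    the start and end of the block, with weaker chunks in the middle.
    """
    out = [None] * len(ranked_chunks)
    left, right = 0, len(ranked_chunks) - 1
    for i, chunk in enumerate(ranked_chunks):
        if i % 2 == 0:
            out[left] = chunk
            left += 1
        else:
            out[right] = chunk
            right -= 1
    return out
```

For five chunks ranked `c1` (best) through `c5` (worst), this yields `[c1, c3, c5, c4, c2]`: the two strongest chunks occupy positions 1 and N.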
Context Engineering Checklist
Before deploying any LLM system, review:
- Does the system prompt contain anything that would be better injected dynamically per request?
- Is conversation history being compressed at the right threshold?
- Is retrieved content formatted with clear source labels and separators?
- Is structured data injected in a format the model reads reliably?
- Are the most critical pieces of context near the beginning or end (not buried in the middle)?
- Is irrelevant content excluded to reduce noise?
- What's the worst-case token count? Does the system degrade gracefully as context grows?
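A rough worst-case budget check for the last item might look like this. The ~4-characters-per-token ratio is a heuristic for English text, not an exact figure; use the model's actual tokenizer for real numbers.

```python
def estimate_tokens(parts, chars_per_token=4):
    """Rough token estimate via the ~4-chars-per-token heuristic for
    English text; use the model's real tokenizer in production."""
    return sum(len(p) for p in parts) // chars_per_token

def within_budget(parts, limit=200_000):
    """True if the assembled context parts should fit the window."""
    return estimate_tokens(parts) <= limit
```

Running this check against the worst-case assembled context (longest history, maximum retrieved chunks, largest tool results) tells you whether the system degrades gracefully or simply overflows.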
Key Takeaways
- Context engineering is about architecting the full context window, not just writing better prompts
- The four layers: system prompt, conversation history, retrieved content, injected data
- Compress conversation history before it degrades performance or hits limits
- Models attend less to content in the middle of long contexts — position matters
- Irrelevant context hurts as much as missing context — include what's needed, exclude what isn't