The number one failure mode for long-running agents is hitting the context limit mid-task. Your agent is 90% through a 3-hour refactoring session, and then it dies — context full, no graceful handoff, no memory of what it already did. Claude 4.6 introduces context compaction: server-side auto-summarisation that extends conversations effectively without limit. What used to require complex context management code now works with one beta header.
What is context compaction?
Context compaction is a server-side feature: when your conversation approaches the context limit, the Claude API automatically summarises older parts of the conversation. The summary replaces the raw old messages, preserving the "gist" of what happened while freeing capacity for new turns.
The result is effectively infinite conversations for agentic workflows — the model maintains task continuity without you needing to build manual summarisation, rolling window, or session handoff logic.
Currently in beta for Opus 4.6 and Sonnet 4.6. You opt in via a beta header.
How to enable it
```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=8000,
    betas=["context-compaction-2026-02-01"],
    thinking={"type": "adaptive"},
    effort="high",
    messages=conversation_history,
)
```
That's the entire change. One extra string in betas. The compaction happens automatically on the server side — you don't need to detect when it triggers or handle it differently in your response parsing.
You can also configure a compaction threshold to trigger compaction proactively rather than waiting for the limit:
```python
response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=8000,
    betas=["context-compaction-2026-02-01"],
    thinking={"type": "adaptive"},
    system=[
        {
            "type": "text",
            "text": "You are a senior software engineer working on a long refactoring task.",
        },
        {
            "type": "text",
            "text": "COMPACTION_THRESHOLD: 0.8",  # compact when 80% of context is used
        },
    ],
    messages=conversation_history,
)
```
What gets compacted vs preserved
Understanding this matters for designing reliable agents.
Gets compacted (summarised):
- Old conversation turns from earlier in the session
- Intermediate tool results that are no longer directly relevant
- Previous thinking blocks (these are already stripped from context by default)
- Superseded decisions and intermediate reasoning steps
Always preserved:
- System prompt — never compacted
- The current task goal and active instruction
- Recent tool results (the last few turns stay intact)
- The current reasoning chain Claude is actively building on
- Explicit facts you stored in the system prompt
The implication: your system prompt is your persistent memory. If there's something the agent must remember across the entire session — a coding style guide, architectural constraints, the user's name — put it in the system prompt, not the conversation. The conversation gets summarised; the system prompt doesn't.
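One lightweight way to apply this is to assemble the system prompt from a base instruction plus an explicit block of durable facts. A minimal sketch — `build_system_prompt` and the `PERSISTENT FACTS` label are illustrative conventions, not part of the API:

```python
def build_system_prompt(base_instructions: str, persistent_facts: dict) -> list:
    """Assemble system content blocks so durable facts survive compaction.

    Illustrative helper: the block shape mirrors the threshold example above.
    """
    fact_lines = "\n".join(f"- {key}: {value}" for key, value in persistent_facts.items())
    return [
        {"type": "text", "text": base_instructions},
        {"type": "text", "text": f"PERSISTENT FACTS (always honour these):\n{fact_lines}"},
    ]


system_blocks = build_system_prompt(
    "You are a senior software engineer working on a long refactoring task.",
    {"style guide": "PEP 8, max line length 100", "target branch": "release/2.4"},
)
```

Pass the result as the `system` parameter; because the system prompt is never compacted, those facts remain verbatim for the whole session.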
Practical patterns that benefit from compaction
Long code refactoring sessions
An agent reads files, refactors, runs tests, encounters errors, fixes them, iterates. This naturally generates a long conversation as tool results accumulate.
Without compaction: the agent dies around 200K tokens into a session. With compaction: it runs until the task is complete or you cancel it.
```python
def run_refactoring_agent(codebase_path: str):
    conversation = []
    while True:
        response = client.beta.messages.create(
            model="claude-opus-4-6",
            max_tokens=8000,
            betas=["context-compaction-2026-02-01"],
            thinking={"type": "adaptive"},
            system=REFACTORING_SYSTEM_PROMPT,  # constraints never compacted
            tools=CODE_TOOLS,
            messages=conversation,
        )
        # Handle tool calls, add results to conversation
        conversation = update_conversation(conversation, response)
        if response.stop_reason == "end_turn":
            break
```
Research synthesis agents
An agent searches across dozens of documents, reads each, extracts relevant passages, then synthesises findings. Old search results get summarised as newer documents are processed. The agent retains its accumulated understanding without holding every source document verbatim in context.
Multi-day project agents
An agent works in a session, you stop it, it resumes the next day. Without compaction, you'd need to manually reconstruct what the agent knew. With compaction and a summary of the previous session in the system prompt, continuity is maintained.
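A minimal sketch of that handoff, assuming you ask the agent for an end-of-session summary and stash it locally — `session_summary.json` and both helpers are hypothetical, not part of any SDK:

```python
import json
from pathlib import Path

SESSION_FILE = Path("session_summary.json")  # hypothetical local storage


def save_session_summary(summary: str) -> None:
    """Persist the agent's end-of-session summary for tomorrow's run."""
    SESSION_FILE.write_text(json.dumps({"summary": summary}))


def resume_system_prompt(base_prompt: str) -> str:
    """Prepend the previous session's summary, if one exists."""
    if SESSION_FILE.exists():
        summary = json.loads(SESSION_FILE.read_text())["summary"]
        return f"{base_prompt}\n\nPREVIOUS SESSION SUMMARY:\n{summary}"
    return base_prompt
```

On day two you build the system prompt with `resume_system_prompt(...)`, and the agent starts with yesterday's state in uncompactable context.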
What to watch out for
Compaction is lossy. The summary is accurate at the semantic level but loses specific details. Don't rely on compaction to remember exact numbers, specific line numbers in files, or precise decisions made early in a session. If a number matters, store it externally.
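Storing such values externally can be trivially simple — the `FactStore` class below is an illustration; a production agent might back it with SQLite or Redis:

```python
class FactStore:
    """Tiny external store for exact values that must survive compaction.

    Illustrative sketch: an in-memory dict standing in for durable storage.
    """

    def __init__(self):
        self._facts = {}

    def remember(self, key: str, value):
        """Record an exact value outside the compactable conversation."""
        self._facts[key] = value

    def recall(self, key: str):
        """Retrieve the exact value, untouched by any summarisation."""
        return self._facts[key]


store = FactStore()
store.remember("failing_test_line", 1423)  # a line number compaction might blur
```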
Test near the compaction threshold. Before shipping an agent with compaction enabled, deliberately run it into the threshold in a test environment. Check whether the post-compaction behaviour is correct. Some agents change behaviour subtly after compaction because context they were implicitly relying on got summarised away.
Don't compact your context on every call. The 0.8 threshold is a reasonable default. Going lower (0.5) means more frequent compaction, more overhead, more information loss. Going to 1.0 (wait until full) means potentially hitting the limit before compaction triggers.
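The threshold semantics can be pictured with a client-side analogue — the server applies this check internally; this helper only illustrates the arithmetic:

```python
def should_compact(used_tokens: int, context_limit: int, threshold: float = 0.8) -> bool:
    """True once context usage crosses the compaction threshold.

    Illustrative only: mirrors what the server does with COMPACTION_THRESHOLD.
    """
    return used_tokens / context_limit >= threshold
```

At the default 0.8, a 200K-token window triggers compaction around 160K tokens used; lowering the threshold to 0.5 would trigger it at 100K, twice as often.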
Manual alternative (for finer control)
If you need precise control over what's preserved, rolling window management is still available:
```python
def clear_old_tool_results(messages: list, keep_last_n: int = 5) -> list:
    """
    Keep recent messages intact; drop old tool results
    to prevent context overflow.
    """
    if not messages:
        return messages

    # Find messages that carry tool results
    tool_result_indices = [
        i for i, m in enumerate(messages)
        if isinstance(m.get("content"), list)
        and any(c.get("type") == "tool_result" for c in m["content"])
    ]

    # Keep only the most recent tool results
    if len(tool_result_indices) > keep_last_n:
        indices_to_remove = set(tool_result_indices[:-keep_last_n])
        messages = [m for i, m in enumerate(messages) if i not in indices_to_remove]

    return messages
```
Use manual management when: you have highly structured tool results where you know exactly what to preserve, you're building a production agent where budget predictability matters more than convenience, or you need to store specific data points in an external database as part of your memory architecture.
Use automatic compaction for most agent workflows — it's good enough and saves substantial engineering time.
Cost implications
Compaction adds a small overhead: the act of summarising itself uses tokens. The server generates the summary, which appears in your token usage. For a typical long-running agent, this overhead is 1–5% of total token usage.
The alternative cost without compaction is much higher: if you hit the context limit and restart the task from scratch, you pay for all the exploration work again plus the first attempt's failed tokens.
For Indian developers billing in ₹ via AICredits.in: a typical 2-hour code refactoring session might burn 800K tokens without compaction (all that exploration stays in context). With compaction keeping the window under 200K, you're paying ~₹132 instead of ~₹528 for the same work. The 4x cost reduction comes from not carrying old tool results indefinitely.
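The arithmetic behind those figures, assuming the flat blended rate of ₹0.66 per 1K tokens implied by the numbers above (not a published price):

```python
RATE_INR_PER_1K = 0.66  # blended rate implied by the ₹528 / 800K example; not an official price


def session_cost_inr(total_tokens: int) -> float:
    """Rough session cost in rupees at the assumed blended rate."""
    return total_tokens / 1000 * RATE_INR_PER_1K


without_compaction = session_cost_inr(800_000)  # ≈ ₹528
with_compaction = session_cost_inr(200_000)     # ≈ ₹132
```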
💡 Access Claude 4.6 API in India with UPI billing at AICredits.in — ₹100 minimum top-up, no international card required.
Next steps
- Understand how multi-agent systems use context differently — multi-agent systems
- Full breakdown of Claude 4.6 API changes — Claude Opus 4.6 prompting guide
- Context engineering fundamentals — context engineering
- Access Claude 4.6 in India — AICredits.in review