The AI agent production checklist — 20 things to do before you ship

I've shipped six production AI agents. Every item on this list comes from something that went wrong.

The gap between "it works in my notebook" and "it works for real users at 2am when I'm asleep" is not about model quality. It's about all the things you didn't build around the model. This checklist covers them.

Twenty items, five categories. Each one takes 30 minutes to an hour to implement. Skip one and you'll spend a weekend on-call debugging it instead.

Reliability

1. Retry logic on every LLM call

LLM APIs return 429s, 500s, and occasional timeouts. Without retries, a single transient error fails your user's request.

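import anthropic
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

# max_retries=0 turns off the SDK's built-in retries so tenacity is the only retry layer
client = anthropic.Anthropic(max_retries=0)

def is_transient(exc: BaseException) -> bool:
    # Retry 429s, 5xx responses, and network errors; a 400 fails the same way every time
    if isinstance(exc, (anthropic.RateLimitError, anthropic.APIConnectionError)):
        return True
    return isinstance(exc, anthropic.APIStatusError) and exc.status_code >= 500

@retry(retry=retry_if_exception(is_transient), stop=stop_after_attempt(3),
       wait=wait_exponential(multiplier=1, min=2, max=10), reraise=True)
def call_claude(messages: list, system: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1000,
        system=system,
        messages=messages,
    )
    return response.content[0].text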

What happens if you skip it: one API blip, one angry user, one lost session. At scale: 1–3% of requests fail permanently on the first error.

2. Timeout handling

A hung LLM call will hang your entire agent. Set timeouts at every layer.

import anthropic
import httpx

# The anthropic SDK accepts a custom httpx client with timeout config
http_client = httpx.Client(timeout=httpx.Timeout(30.0, connect=5.0))
client = anthropic.Anthropic(http_client=http_client)

30 seconds is a reasonable max for a single call. If your agent regularly needs more, the task is probably too big for one call.

What happens if you skip it: one slow response from the API, and your user's request hangs indefinitely. In serverless environments, this eats your function timeout budget.
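A per-call timeout does not bound the whole agent run: ten calls at 30 seconds each is still five minutes. One way to cover that layer is a shared deadline that every step draws from. This is a minimal sketch, not part of any SDK; the Deadline and DeadlineExceeded names and the 30-second per-call cap are illustrative assumptions.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the whole agent run has used up its time budget."""


class Deadline:
    """Tracks one time budget shared by every step of an agent run."""

    def __init__(self, total_seconds: float):
        # monotonic() is unaffected by system clock changes
        self._expires_at = time.monotonic() + total_seconds

    def remaining(self) -> float:
        # Seconds left in the budget, never negative
        return max(0.0, self._expires_at - time.monotonic())

    def timeout_for_call(self, per_call_max: float = 30.0) -> float:
        # Each call gets the smaller of its own cap and what is left overall
        left = self.remaining()
        if left <= 0:
            raise DeadlineExceeded("agent run exceeded its time budget")
        return min(per_call_max, left)
```

The value returned by timeout_for_call() can then be passed as the timeout for each individual request, so a run that is nearly out of budget gets a short timeout instead of a fresh 30 seconds.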

3. Graceful degradation

When Claude is down (it happens, rarely but it does), what does your user see? "Something went wrong" is better than a spinner that never resolves. A cached response from the last successful run is even better.

import logging

logger = logging.getLogger(__name__)

def get_agent_response(query: str) -> str:
    try:
        # call_claude_with_retries stands for your retry-wrapped LLM call (see item 1)
        return call_claude_with_retries(query)
    except Exception:
        # Log the error, return a fallback
        logger.error("LLM call failed after retries", exc_info=True)
        return "I'm having trouble processing that right now. Please try again in a moment."

What happens if you skip it: silent failures, blank UI states, confused users who don't know if the agent is thinking or broken.
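The cached-response fallback mentioned above can be sketched as follows. This assumes an in-memory dict keyed by the normalized query, which only suits a single process; the respond_with_cache name and the call_llm parameter are hypothetical, and a real deployment would more likely use a shared cache with an expiry.

```python
from typing import Callable

FALLBACK = "I'm having trouble processing that right now. Please try again in a moment."

# Last successful answer per normalized query (in-memory, single process only)
_last_good: dict = {}


def respond_with_cache(query: str, call_llm: Callable[[str], str]) -> str:
    """Return a fresh answer, or the last good answer for the same query on failure."""
    key = query.strip().lower()
    try:
        answer = call_llm(query)
    except Exception:
        # Serve the stale answer if there is one, otherwise the generic fallback
        return _last_good.get(key, FALLBACK)
    _last_good[key] = answer
    return answer
```

Stale answers are only appropriate where the underlying data changes slowly; for something like order status, the generic fallback is probably the safer choice.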

4. Circuit breaker

If Claude is returning errors, stop hammering it. A circuit breaker pauses calls for a period after repeated failures, then lets a test request through.

from circuitbreaker import circuit

@circuit(failure_threshold=5, recovery_timeout=60)
def call_claude(messages):
    ...

What happens if you skip it: when there's a partial outage, your agent keeps making failing calls, burning retries, consuming your rate limit budget, and delaying recovery.
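To make the mechanics concrete, here is a minimal hand-rolled sketch of the same closed, open, and trial-call cycle. It is not how the circuitbreaker package is implemented; the class and method names are illustrative, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Recovery window elapsed: fall through and allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the recovery window
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and resets the failure count
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get CircuitOpenError immediately, which pairs naturally with the fallback message from item 3.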

Safety

5. max_iterations cap

Every agent that can use tools needs a hard cap on how many tool calls it makes per session. Without it, a confused agent loops forever.

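MAX_ITERATIONS = 15

def run_agent(client, messages: list, tools: list) -> dict:
    # Bounded loop: at most MAX_ITERATIONS model calls per run
    for _ in range(MAX_ITERATIONS):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1000,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return {"response": response}
        # handle tool call and append the result to messages...
    return {"error": "Agent reached iteration limit. Please rephrase your request."}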

What happens if you skip it: a bad prompt or unexpected tool result can send your agent into a loop that burns tokens and money until the process is killed.

6. Input sanitization

Don't pass raw user input directly into your agent's system prompt or tool calls. Validate and strip at the boundary.

import re

def sanitize_input(text: str, max_length: int = 2000) -> str:
    # Truncate
    text = text[:max_length]
    # Strip null bytes and control characters (keep newlines/tabs)
    text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]', '', text)
    return text.strip()

This doesn't prevent prompt injection entirely — see the prompt injection defense post for a fuller treatment — but it handles the obvious cases.

What happens if you skip it: users can inject content that overrides your system prompt, causes unexpected behavior, or extracts information they shouldn't have access to.

7. Output validation

Validate the agent's output before acting on it. If your agent is supposed to return JSON, parse it and reject malformed responses.

import logging

from pydantic import BaseModel, ValidationError

logger = logging.getLogger(__name__)

class AgentOutput(BaseModel):
    action: str
    parameters: dict

def parse_agent_output(raw: str) -> AgentOutput | None:
    try:
        return AgentOutput.model_validate_json(raw)
    except (ValidationError, ValueError):
        logger.warning("Agent returned invalid output", extra={"raw": raw[:500]})
        return None

What happens if you skip it: the agent returns "I'm not sure how to help with that" in a field your code tries to parse as JSON. Or it returns valid JSON with an action field you didn't expect. Both can crash downstream code.
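The second failure mode, valid JSON carrying an action you never defined, needs an explicit allowlist check on top of parsing. A minimal sketch using only the standard library follows; the action names in ALLOWED_ACTIONS and the parse_action helper are hypothetical placeholders for your own.

```python
import json
from typing import Optional

# Hypothetical action names; replace with the actions your code actually handles
ALLOWED_ACTIONS = {"search", "create_ticket", "escalate"}


def parse_action(raw: str) -> Optional[dict]:
    """Return the parsed output only if it is a JSON object with a known action."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action = data.get("action")
    # Reject non-string actions and anything outside the allowlist
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None
    if not isinstance(data.get("parameters"), dict):
        return None
    return data
```

With pydantic, the same constraint can be expressed by typing the action field as a Literal of the allowed names, so the existing model rejects unknown actions during validation.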

8. Tool allowlist per role

Not every user should have access to every tool. An admin tool that deletes records should not be available to a user-facing agent.

USER_TOOLS = ["search_knowledge_base", "get_order_status", "create_support_ticket"]
ADMIN_TOOLS = USER_TOOLS + ["update_order", "refund_payment", "delete_record"]

def get_tools_for_role(role: str) -> list:
    return ADMIN_TOOLS if role == "admin" else USER_TOOLS

What happens if you skip it: a user who types the right prompt can trigger admin operations. This has happened in production agents.
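Filtering the tool list the model sees is only half of the control. The check is worth repeating at execution time, in case a tool name arrives that was never offered. A minimal sketch, assuming tools live in a dict that maps names to callables; execute_tool, ToolNotAllowed, and the registry shape are illustrative:

```python
USER_TOOLS = {"search_knowledge_base", "get_order_status", "create_support_ticket"}
ADMIN_TOOLS = USER_TOOLS | {"update_order", "refund_payment", "delete_record"}


class ToolNotAllowed(Exception):
    """Raised when a role requests a tool outside its allowlist."""


def execute_tool(role: str, tool_name: str, inputs: dict, registry: dict):
    # Re-check the allowlist at call time; never trust the model's tool choice
    allowed = ADMIN_TOOLS if role == "admin" else USER_TOOLS
    if tool_name not in allowed:
        raise ToolNotAllowed(f"role {role!r} may not call {tool_name!r}")
    return registry[tool_name](inputs)
```

The role should come from your authenticated session, not from anything in the conversation.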

Observability

9. Log every tool call

Every tool call should emit a structured log: what was called, what was passed, what was returned, how long it took.

import logging
import time

logger = logging.getLogger(__name__)

def logged_tool_call(tool_name: str, inputs: dict, fn):
    start = time.perf_counter()  # monotonic clock, safe for measuring durations
    try:
        result = fn(inputs)
        logger.info("tool_call", extra={
            "tool": tool_name,
            "inputs": inputs,
            "result": repr(result)[:500],  # truncated so large payloads don't flood the logs
            "latency_ms": int((time.perf_counter() - start) * 1000),
            "success": True,
        })
        return result
    except Exception as e:
        logger.error("tool_call_failed", extra={
            "tool": tool_name,
            "inputs": inputs,
            "error": str(e),
            "latency_ms": int((time.perf_counter() - start) * 1000),
            "success": False,
        })
        raise

What happens if you skip it: a user reports the agent "doing something weird" and you have no idea what it called or what it got back.
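One caveat with the standard logging module: fields passed through extra are attached to the log record, but the default formatter does not print them. A minimal sketch of a JSON formatter that emits them, assuming plain logging with no structured-logging library; the JsonFormatter name and the output keys are illustrative:

```python
import json
import logging

# Attributes every LogRecord carries; anything else was supplied through extra=
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # default=str keeps non-serializable values from crashing the logger
        return json.dumps(payload, default=str)
```

Attach it with handler.setFormatter(JsonFormatter()) on whichever handler ships logs to your aggregator.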

10. Distributed tracing

Logs tell you what happened. Traces tell you why — the full chain of LLM calls and tool calls that produced an output. LangSmith, Braintrust, and OpenTelemetry are all good options.

See the agent observability guide for setup details. Pick one and integrate it before you go live — retrofitting observability after a production incident is painful.

What happens if you skip it: you get a bug report for a complex multi-step agent failure and have no way to replay what happened.

11. Error rate and latency alerts

Set two alerts on day one: error rate > 5%, and p95 latency > 10 seconds. Both indicate something is wrong that needs human attention.

What happens if you skip it: the error rate climbs for 48 hours before someone notices. By then you have a week of bad user experiences to explain.
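If you don't have a metrics stack yet, the two thresholds can be checked in-process as a stopgap. This is a rough sketch over a rolling window of recent requests, using a nearest-rank p95; the HealthWindow class and the window size are assumptions, and a real setup would normally push these numbers to a monitoring service instead.

```python
from collections import deque


class HealthWindow:
    """Keeps the last `size` request outcomes and checks alert thresholds."""

    def __init__(self, size: int = 500):
        self._samples = deque(maxlen=size)  # (ok, latency_s) pairs

    def record(self, ok: bool, latency_s: float) -> None:
        self._samples.append((ok, latency_s))

    def error_rate(self) -> float:
        if not self._samples:
            return 0.0
        return sum(1 for ok, _ in self._samples if not ok) / len(self._samples)

    def p95_latency(self) -> float:
        if not self._samples:
            return 0.0
        latencies = sorted(lat for _, lat in self._samples)
        # Nearest-rank percentile: ceil(0.95 * n), in integer arithmetic
        rank = (95 * len(latencies) + 99) // 100
        return latencies[rank - 1]

    def alerts(self, max_error_rate: float = 0.05, max_p95_s: float = 10.0) -> list:
        fired = []
        if self.error_rate() > max_error_rate:
            fired.append("error_rate")
        if self.p95_latency() > max_p95_s:
            fired.append("p95_latency")
        return fired
```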

12. Review 50 production conversations before calling it stable

This is the one developers skip most. Before you declare an agent production-ready, manually read 50 real conversations. You'll find edge cases your evals didn't cover. Always.

What happens if you skip it: you ship confident in your eval suite, then discover that 8% of users ask a question phrased in a way your evals never tested and the agent handles it badly.

Cost

13. Estimated cost per run, documented

Know what a single agent run costs before you ship. If you don't know, you can't budget, you can't alert, and you can't explain the bill to your CFO.

# Rough estimate: input tokens × rate + output tokens × rate
# Claude Sonnet 4.6: $3/M input, $15/M output
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000 * 3) + (output_tokens / 1_000_000 * 15)

Log this on every run. Set a budget per run and flag when individual runs exceed it by 2×.
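The per-run budget check can be sketched like this, reusing the rates from the estimate above. The check_run helper, its return shape, and the example budget are hypothetical; the 2× factor is the one suggested in the text.

```python
INPUT_USD_PER_M = 3.0    # USD per million input tokens, as in the estimate above
OUTPUT_USD_PER_M = 15.0  # USD per million output tokens


def run_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def check_run(input_tokens: int, output_tokens: int, budget_usd: float,
              flag_factor: float = 2.0) -> dict:
    """Return the run's cost and whether it exceeds the budget by flag_factor."""
    cost = run_cost(input_tokens, output_tokens)
    return {
        "cost_usd": cost,
        "budget_usd": budget_usd,
        "over_budget": cost > budget_usd * flag_factor,
    }
```

With the Anthropic SDK, the token counts for each call are reported on the response's usage field (input_tokens and output_tokens); sum them across every call in the run before checking.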

14. Daily spend alert

Set a spend alert at 1.5× your expected daily budget. Unexpected cost spikes are almost always a bug — an agent looping, a prompt that generates 10× the expected tokens, a retry storm.

What happens if you skip it: you wake up to a $400 API bill from an agent that ran in a loop overnight.
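Most providers offer spend alerts in their billing console, and that is the first place to set one. As an application-level backstop, a minimal sketch of a daily accumulator that fires once per day when spend crosses 1.5× the expected budget; the DailySpendMonitor class is an illustrative assumption and keeps its totals in memory only.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailySpendMonitor:
    def __init__(self, expected_daily_usd: float, alert_factor: float = 1.5):
        self.threshold_usd = expected_daily_usd * alert_factor
        self._totals = defaultdict(float)  # day -> USD spent so far
        self._alerted = set()              # days that already fired an alert

    def add(self, cost_usd: float, day: Optional[date] = None) -> bool:
        """Record one run's cost; return True the first time a day crosses the threshold."""
        day = day or date.today()
        self._totals[day] += cost_usd
        if self._totals[day] > self.threshold_usd and day not in self._alerted:
            self._alerted.add(day)
            return True
        return False
```

When add() returns True, page someone or pause non-essential agent runs; it returns True only once per day, so the alert does not repeat on every subsequent run.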

15. Context window limit

Conversations that grow without bound eventually hit the context window — and then either fail or generate very expensive calls. Summarize or truncate history after a fixed number of turns.

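def trim_conversation(messages: list, max_turns: int = 20) -> list:
    if len(messages) <= max_turns * 2:
        return messages
    # Keep only the most recent turns; older ones are dropped, not summarized.
    # The system prompt is passed separately, so trimming does not affect it.
    trimmed = messages[-(max_turns * 2):]
    # Drop leading non-user messages so the history still starts with a user turn
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed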
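Truncation throws older context away. The summarize option can be sketched as below, assuming messages are dicts with a "role" key and that you supply your own summarize callable (for example, a cheap model call). The compact_history name and its return shape are hypothetical.

```python
from typing import Callable, Tuple


def compact_history(messages: list, summarize: Callable[[list], str],
                    keep_last: int = 10) -> Tuple[str, list]:
    """Split history into a summary of older messages and a recent tail."""
    if len(messages) <= keep_last:
        return "", messages
    split = len(messages) - keep_last
    # Start the kept tail on a user message so the history begins with a user turn
    while split < len(messages) and messages[split]["role"] != "user":
        split += 1
    return summarize(messages[:split]), messages[split:]
```

The returned summary can be appended to the system prompt, which avoids inserting an extra message into the turn sequence. If your agent uses tools, also make sure the split does not separate a tool call from its result.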

16. Smaller model for cheap steps

Not every step in your agent needs Sonnet. Classification, routing, and extraction can run on Haiku at a much lower per-token price; check current pricing for the exact ratio.

import anthropic

def classify_intent(message: str) -> str:
    # Haiku for cheap classification
    response = anthropic.Anthropic().messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=20,
        messages=[{"role": "user", "content": f"Classify as support/sales/other: {message}"}],
    )
    return response.content[0].text.strip().lower()

User experience

17. Loading state — always

Never show a blank screen while the agent thinks. Even a simple "Thinking..." indicator is better than silence.

What happens if you skip it: users click the submit button 3 more times, creating 3 parallel agent runs, and you have a debugging nightmare.

18. Errors that make sense to users

anthropic.APIStatusError: 529 Overloaded means nothing to a user. Map it to something useful:

FRIENDLY_ERRORS = {
    "529": "We're experiencing high demand right now. Please try again in a moment.",
    "timeout": "That request took too long. Try breaking it into a smaller question.",
    "iteration_limit": "I couldn't complete that in one go. Try a more specific request.",
}
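A lookup table still needs something that turns an exception into one of its keys. A minimal sketch, assuming API errors expose a status_code attribute (as the SDK's APIStatusError does) and that anything unrecognized falls back to a generic message; the error_key and friendly_message helpers and the default text are illustrative.

```python
FRIENDLY_ERRORS = {
    "529": "We're experiencing high demand right now. Please try again in a moment.",
    "timeout": "That request took too long. Try breaking it into a smaller question.",
    "iteration_limit": "I couldn't complete that in one go. Try a more specific request.",
}
DEFAULT_ERROR = "Something went wrong on our side. Please try again in a moment."


def error_key(exc: Exception) -> str:
    # HTTP-style errors: use the status code as the lookup key
    status = getattr(exc, "status_code", None)
    if status is not None:
        return str(status)
    # Built-in timeouts only; an SDK's own timeout class needs its own branch
    if isinstance(exc, TimeoutError):
        return "timeout"
    return "unknown"


def friendly_message(exc: Exception) -> str:
    return FRIENDLY_ERRORS.get(error_key(exc), DEFAULT_ERROR)
```

Log the original exception before mapping it, so the friendly text never hides the detail you need for debugging.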

19. Clear escalation path

Users need to know when they've hit the limits of the agent and what to do next. "Contact support at support@yourco.com" is fine. Silence is not.

Every agent response that ends in failure or uncertainty should offer a next step.

20. Document what the agent can and can't do

Put it somewhere users will actually see it — the first message, a sidebar, an onboarding modal. Be specific about the scope.

"I can help with order status, returns, and product questions. For billing disputes and account security, contact our support team directly."

What happens if you skip it: users ask the agent to do things it can't do, get frustrated when it fails, and blame the product instead of the scope mismatch.

The full list is also in the agent evaluation post as part of a broader framework for measuring agent quality. A script that checks some of these items programmatically (logging config, environment variable presence, context window size) makes a good addition to your CI pipeline.

Ship with all 20 checked. You'll thank yourself when the first 2am alert fires and it's not actually a disaster.

Related articles

A/B Testing Prompts in Production — A Statistical Guide

Async Python for LLM Apps — Patterns That Actually Work in Production

FastAPI + Claude API — Production Patterns for AI Backends
