What is prompt engineering?

Prompt engineering is the practice of crafting inputs to AI language models to produce accurate, useful, and reliable outputs. It involves choosing the right words, structure, context, and format to guide the AI toward the response you actually need — rather than a generic or off-target one.

Which AI models benefit most from better prompting?

All major large language models — including ChatGPT (GPT-4o), Claude, and Gemini — respond significantly to prompt quality. The same task can produce dramatically different results depending on how you structure your request. Better prompting improves output across every major model.

Do I need technical skills to do prompt engineering?

No. Prompt engineering is done in natural language — you write text instructions, not code. Basic prompting needs no technical background at all. Advanced techniques like prompt chaining or agentic workflows can benefit from light scripting knowledge, but the core skill is clear written communication.

Where can I learn more about prompt engineering?

MasterPrompting.net offers a structured curriculum from beginner to advanced, covering every major technique from basic clarity and context to chain-of-thought, meta-prompting, and agentic workflows. Start with the Beginner track to build a solid foundation.

Pydantic AI — the agent framework Python developers actually want

I've shipped two production agents with LangChain. I debugged both of them with print statements because there was no other way. The abstractions swallowed the errors, the runtime types were lies, and the docs were three versions behind.

Pydantic AI is what I switched to. It's built by Samuel Colvin — the person who built Pydantic v2 — and the philosophy shows. If you know Python, you can understand every line of a Pydantic AI agent. No magic, no hidden chains, no runtime surprises.

This post covers everything you need to go from zero to a production-ready agent: typed outputs, tools, dependency injection, multi-turn conversations, streaming, and observability.

Why Pydantic AI exists

LangChain and LangGraph are powerful. They're also genuinely frustrating to work with for any Python developer who has opinions about type safety.

The core problem: LLM responses are strings. Everything in a LangChain chain is Any. Your IDE can't help you. Your tests are hard to write because there's nothing concrete to assert against. Runtime errors happen deep in abstraction layers.

Pydantic AI's answer: make the output a Pydantic model. When you tell the agent result_type=MyModel, the framework guarantees you get back a validated, typed instance. The IDE knows what .issues and .suggestions are. The test asserts against real attributes.

pip install pydantic-ai anthropic

The core primitives

Pydantic AI has four concepts. That's it.

Agent — the main orchestrator. Holds the model, system prompt, tools, and result_type.

Tool — a Python function the model can call. Decorated with @agent.tool or @agent.tool_plain. Has full type annotations.

RunContext — dependency injection container. Passed into every tool. Holds your DB connections, HTTP clients, config — anything a tool needs.

ModelRetry — raise this from a tool to tell the model its input was wrong and it should try again with corrected parameters.

Build a code reviewer agent

Here's a complete, runnable example. The agent reviews code and returns a structured result with typed fields.

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
import anthropic

class ReviewResult(BaseModel):
    issues: list[str] = Field(description="Bugs, security issues, or correctness problems")
    suggestions: list[str] = Field(description="Style, performance, or readability improvements")
    score: int = Field(ge=1, le=10, description="Overall code quality score")
    safe_to_merge: bool = Field(description="Whether the code is safe to merge as-is")

agent = Agent(
    "claude-sonnet-4-6",
    result_type=ReviewResult,
    system_prompt=(
        "You are a senior engineer reviewing Python code. "
        "Be specific about issues — include line numbers or variable names where relevant. "
        "Score 1-10 where 7+ means the code is production-ready."
    ),
)

result = await agent.run(
    "def get_user(id): return db.query(f'SELECT * FROM users WHERE id={id}')"
)

# Fully typed — IDE autocomplete works here
print(result.data.issues)        # ['SQL injection via f-string interpolation']
print(result.data.score)         # 2
print(result.data.safe_to_merge) # False

The result.data is a validated ReviewResult instance. If the model returns something that doesn't validate, Pydantic AI retries automatically (up to a configurable limit).

Tools with dependency injection

The dependency injection system is the feature that makes testing actually possible. Instead of accessing a database through a global or a closure, you pass it through RunContext.

from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
import httpx

@dataclass
class ReviewDeps:
    http_client: httpx.AsyncClient
    github_token: str

agent = Agent(
    "claude-sonnet-4-6",
    result_type=ReviewResult,
    deps_type=ReviewDeps,
    system_prompt="You are a code reviewer with access to GitHub PR diffs.",
)

@agent.tool
async def fetch_pr_diff(ctx: RunContext[ReviewDeps], pr_url: str) -> str:
    """Fetch the diff for a GitHub pull request URL."""
    # Extract owner/repo/number from URL
    parts = pr_url.rstrip("/").split("/")
    owner, repo, pr_number = parts[-4], parts[-3], parts[-1]
    
    response = await ctx.deps.http_client.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/files",
        headers={"Authorization": f"Bearer {ctx.deps.github_token}"},
    )
    files = response.json()
    return "\n".join(f["patch"] for f in files if f.get("patch"))

# Running it
async def review_pr(pr_url: str):
    async with httpx.AsyncClient() as client:
        deps = ReviewDeps(
            http_client=client,
            github_token=os.environ["GITHUB_TOKEN"],
        )
        result = await agent.run(
            f"Review this PR: {pr_url}",
            deps=deps,
        )
    return result.data

In tests, you pass a mock client as http_client. No monkeypatching, no unittest.mock.patch gymnastics. The test is just:

async def test_review_pr():
    mock_client = MockHttpClient(response=FAKE_DIFF)
    deps = ReviewDeps(http_client=mock_client, github_token="test")
    result = await agent.run("Review this PR: ...", deps=deps)
    assert result.data.score >= 1
    assert isinstance(result.data.issues, list)

ModelRetry: tell the model it got something wrong

When a tool call receives invalid input, raise ModelRetry. The model sees the error message and tries again with corrected parameters.

from pydantic_ai import ModelRetry

@agent.tool
async def search_codebase(ctx: RunContext[ReviewDeps], query: str) -> str:
    """Search the codebase for relevant files matching a query."""
    if len(query) < 3:
        raise ModelRetry("Query too short — provide at least 3 characters for meaningful search")
    
    results = await ctx.deps.search_index.search(query)
    if not results:
        raise ModelRetry(f"No results for '{query}' — try a broader term or a different keyword")
    
    return "\n".join(r.path for r in results[:10])

This is better than returning an empty list or an error string because the model actively recovers instead of silently moving on.

Multi-turn conversations

agent.run() processes a single message. For multi-turn conversations, pass message_history from the previous result:

# First turn
result1 = await agent.run("Review this function: def foo(x): return x * 2")
print(result1.data)

# Second turn — model remembers the first
result2 = await agent.run(
    "What if I add type annotations?",
    message_history=result1.new_messages(),
)
print(result2.data)

result.new_messages() returns the messages from that specific run. result.all_messages() returns everything including the history you passed in. For a chat interface, keep accumulating all_messages() across turns.

Streaming

For real-time token delivery to a UI:

async with agent.run_stream("Review this code: ...") as response:
    async for text in response.stream():
        print(text, end="", flush=True)  # tokens as they arrive

# After the stream, the validated result is available
final = await response.get_data()
print(final.score)

Streaming works with result_type — the framework buffers the full JSON response, then validates it once the stream ends. During streaming you get raw tokens; after the stream you get the typed result.

Sync vs async

Every method has a sync equivalent. agent.run_sync() blocks until completion:

# For scripts, CLIs, or tests that don't use asyncio
result = agent.run_sync("Review this: ...")
print(result.data.issues)

Use run_sync in scripts and tests. Use run (async) in FastAPI endpoints, web apps, or anywhere else already running an event loop.

Model switching

The model is a string argument. Swapping models is one line:

# Claude
agent = Agent("claude-sonnet-4-6", result_type=ReviewResult, ...)

# GPT-4o
agent = Agent("openai:gpt-4o", result_type=ReviewResult, ...)

# Gemini
agent = Agent("google-gla:gemini-2.5-pro", result_type=ReviewResult, ...)

# Local with Ollama
agent = Agent("ollama:llama3.2", result_type=ReviewResult, ...)

The tool interface, dependency injection, and streaming API are identical across all models. The framework handles the model-specific API formats internally.

Observability with Logfire

Pydantic AI integrates directly with Logfire (Pydantic's observability platform). One line of setup gives you full traces:

import logfire

logfire.configure()
logfire.instrument_pydantic_ai()

# Now every agent.run() call emits a trace with:
# - model used and tokens consumed
# - each tool call with inputs and outputs
# - validation results
# - total latency

For teams not using Logfire, Pydantic AI also emits OpenTelemetry spans, so any OTel-compatible backend (Datadog, Honeycomb, Jaeger) works.

Pydantic AI vs the alternatives

	Pydantic AI	LangChain	LangGraph	Raw SDK
Type safety	Full (result_type)	Minimal	Minimal	None
Testability	Easy (DI)	Hard	Hard	Easy
Learning curve	Low	High	Medium	Low
Streaming	Built-in	Complex	Complex	Built-in
Multi-agent	Basic	Rich	Rich	Manual
Observability	Logfire/OTel	LangSmith	LangSmith	Manual
When to choose	Single agents, type-safety	Large ecosystem needed	Complex graphs	Full control

LangGraph wins for genuinely complex multi-agent graphs with conditional routing and persistent state. If you're building something like that, use it. For everything else — a single agent with tools, a structured output pipeline, a chat assistant — Pydantic AI is faster to build and easier to maintain.

The agent components lesson covers the conceptual framework that applies to any of these libraries.

A complete production example

Here's a support ticket classifier that pulls from a customer database, classifies the ticket, and returns structured routing instructions:

from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
import asyncpg

class TicketClassification(BaseModel):
    category: str = Field(description="One of: billing, technical, account, refund, other")
    priority: int = Field(ge=1, le=5, description="1=low, 5=critical")
    suggested_team: str = Field(description="Team to route to: support, billing, engineering, vip")
    summary: str = Field(description="One sentence summary of the issue")
    needs_human: bool = Field(description="Whether this requires a human agent")

@dataclass
class ClassifyDeps:
    db: asyncpg.Connection

classifier = Agent(
    "claude-haiku-4-5-20251001",  # Haiku — cheap for classification
    result_type=TicketClassification,
    deps_type=ClassifyDeps,
    system_prompt=(
        "Classify incoming support tickets. Be consistent — the same type of request "
        "should always get the same category and priority."
    ),
)

@classifier.tool
async def get_customer_tier(ctx: RunContext[ClassifyDeps], email: str) -> str:
    """Look up a customer's subscription tier to inform priority."""
    row = await ctx.deps.db.fetchrow(
        "SELECT tier FROM customers WHERE email = $1", email
    )
    return row["tier"] if row else "unknown"

async def classify_ticket(email: str, message: str, db: asyncpg.Connection):
    result = await classifier.run(
        f"Customer: {email}\nMessage: {message}",
        deps=ClassifyDeps(db=db),
    )
    return result.data

This runs in a FastAPI endpoint, processes tickets at ~200ms each (Haiku), and the typed output feeds directly into your ticketing system without string parsing.

What to watch out for

Don't over-type: if you just want a string or a simple bool back, you don't need a Pydantic model. result_type=str works fine. Start simple.

Token costs for retries: validation failures trigger retries, which cost tokens. If you're seeing many retries, your result_type schema might be too strict or your system prompt needs to explain the expected format better.

The result_type is not the full response: result.data is the structured result. result.all_messages() is the full conversation. For structured output tasks, you usually just need result.data.

Pydantic AI is still maturing: the multi-agent primitives are basic compared to LangGraph. Complex agent graphs with dynamic routing and persistent checkpoints are better handled by LangGraph or the OpenAI Agents SDK for now. Pydantic AI's sweet spot is single-agent systems where type safety and testability matter.

The function calling lesson covers the underlying mechanics that all these frameworks are built on top of.

This post covers everything you need to go from zero to a production-ready agent: typed outputs, tools, dependency injection, multi-turn conversations, streaming, and observability.

Why Pydantic AI exists

LangChain and LangGraph are powerful. They're also genuinely frustrating to work with for any Python developer who has opinions about type safety.

pip install pydantic-ai anthropic

The core primitives

Pydantic AI has four concepts. That's it.

Agent — the main orchestrator. Holds the model, system prompt, tools, and result_type.

Tool — a Python function the model can call. Decorated with @agent.tool or @agent.tool_plain. Has full type annotations.

RunContext — dependency injection container. Passed into every tool. Holds your DB connections, HTTP clients, config — anything a tool needs.

ModelRetry — raise this from a tool to tell the model its input was wrong and it should try again with corrected parameters.

Build a code reviewer agent

Here's a complete, runnable example. The agent reviews code and returns a structured result with typed fields.

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
import anthropic

class ReviewResult(BaseModel):
    issues: list[str] = Field(description="Bugs, security issues, or correctness problems")
    suggestions: list[str] = Field(description="Style, performance, or readability improvements")
    score: int = Field(ge=1, le=10, description="Overall code quality score")
    safe_to_merge: bool = Field(description="Whether the code is safe to merge as-is")

agent = Agent(
    "claude-sonnet-4-6",
    result_type=ReviewResult,
    system_prompt=(
        "You are a senior engineer reviewing Python code. "
        "Be specific about issues — include line numbers or variable names where relevant. "
        "Score 1-10 where 7+ means the code is production-ready."
    ),
)

result = await agent.run(
    "def get_user(id): return db.query(f'SELECT * FROM users WHERE id={id}')"
)

# Fully typed — IDE autocomplete works here
print(result.data.issues)        # ['SQL injection via f-string interpolation']
print(result.data.score)         # 2
print(result.data.safe_to_merge) # False

The result.data is a validated ReviewResult instance. If the model returns something that doesn't validate, Pydantic AI retries automatically (up to a configurable limit).

Tools with dependency injection

The dependency injection system is the feature that makes testing actually possible. Instead of accessing a database through a global or a closure, you pass it through RunContext.

from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
import httpx

@dataclass
class ReviewDeps:
    http_client: httpx.AsyncClient
    github_token: str

agent = Agent(
    "claude-sonnet-4-6",
    result_type=ReviewResult,
    deps_type=ReviewDeps,
    system_prompt="You are a code reviewer with access to GitHub PR diffs.",
)

@agent.tool
async def fetch_pr_diff(ctx: RunContext[ReviewDeps], pr_url: str) -> str:
    """Fetch the diff for a GitHub pull request URL."""
    # Extract owner/repo/number from URL
    parts = pr_url.rstrip("/").split("/")
    owner, repo, pr_number = parts[-4], parts[-3], parts[-1]
    
    response = await ctx.deps.http_client.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/files",
        headers={"Authorization": f"Bearer {ctx.deps.github_token}"},
    )
    files = response.json()
    return "\n".join(f["patch"] for f in files if f.get("patch"))

# Running it
async def review_pr(pr_url: str):
    async with httpx.AsyncClient() as client:
        deps = ReviewDeps(
            http_client=client,
            github_token=os.environ["GITHUB_TOKEN"],
        )
        result = await agent.run(
            f"Review this PR: {pr_url}",
            deps=deps,
        )
    return result.data

In tests, you pass a mock client as http_client. No monkeypatching, no unittest.mock.patch gymnastics. The test is just:

async def test_review_pr():
    mock_client = MockHttpClient(response=FAKE_DIFF)
    deps = ReviewDeps(http_client=mock_client, github_token="test")
    result = await agent.run("Review this PR: ...", deps=deps)
    assert result.data.score >= 1
    assert isinstance(result.data.issues, list)

ModelRetry: tell the model it got something wrong

When a tool call receives invalid input, raise ModelRetry. The model sees the error message and tries again with corrected parameters.

from pydantic_ai import ModelRetry

@agent.tool
async def search_codebase(ctx: RunContext[ReviewDeps], query: str) -> str:
    """Search the codebase for relevant files matching a query."""
    if len(query) < 3:
        raise ModelRetry("Query too short — provide at least 3 characters for meaningful search")
    
    results = await ctx.deps.search_index.search(query)
    if not results:
        raise ModelRetry(f"No results for '{query}' — try a broader term or a different keyword")
    
    return "\n".join(r.path for r in results[:10])

This is better than returning an empty list or an error string because the model actively recovers instead of silently moving on.

Multi-turn conversations

agent.run() processes a single message. For multi-turn conversations, pass message_history from the previous result:

# First turn
result1 = await agent.run("Review this function: def foo(x): return x * 2")
print(result1.data)

# Second turn — model remembers the first
result2 = await agent.run(
    "What if I add type annotations?",
    message_history=result1.new_messages(),
)
print(result2.data)

Streaming

For real-time token delivery to a UI:

async with agent.run_stream("Review this code: ...") as response:
    async for text in response.stream():
        print(text, end="", flush=True)  # tokens as they arrive

# After the stream, the validated result is available
final = await response.get_data()
print(final.score)

Sync vs async

Every method has a sync equivalent. agent.run_sync() blocks until completion:

# For scripts, CLIs, or tests that don't use asyncio
result = agent.run_sync("Review this: ...")
print(result.data.issues)

Use run_sync in scripts and tests. Use run (async) in FastAPI endpoints, web apps, or anywhere else already running an event loop.

Model switching

The model is a string argument. Swapping models is one line:

# Claude
agent = Agent("claude-sonnet-4-6", result_type=ReviewResult, ...)

# GPT-4o
agent = Agent("openai:gpt-4o", result_type=ReviewResult, ...)

# Gemini
agent = Agent("google-gla:gemini-2.5-pro", result_type=ReviewResult, ...)

# Local with Ollama
agent = Agent("ollama:llama3.2", result_type=ReviewResult, ...)

The tool interface, dependency injection, and streaming API are identical across all models. The framework handles the model-specific API formats internally.

Observability with Logfire

Pydantic AI integrates directly with Logfire (Pydantic's observability platform). One line of setup gives you full traces:

import logfire

logfire.configure()
logfire.instrument_pydantic_ai()

# Now every agent.run() call emits a trace with:
# - model used and tokens consumed
# - each tool call with inputs and outputs
# - validation results
# - total latency

For teams not using Logfire, Pydantic AI also emits OpenTelemetry spans, so any OTel-compatible backend (Datadog, Honeycomb, Jaeger) works.

Pydantic AI vs the alternatives

	Pydantic AI	LangChain	LangGraph	Raw SDK
Type safety	Full (result_type)	Minimal	Minimal	None
Testability	Easy (DI)	Hard	Hard	Easy
Learning curve	Low	High	Medium	Low
Streaming	Built-in	Complex	Complex	Built-in
Multi-agent	Basic	Rich	Rich	Manual
Observability	Logfire/OTel	LangSmith	LangSmith	Manual
When to choose	Single agents, type-safety	Large ecosystem needed	Complex graphs	Full control

The agent components lesson covers the conceptual framework that applies to any of these libraries.

A complete production example

Here's a support ticket classifier that pulls from a customer database, classifies the ticket, and returns structured routing instructions:

from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
import asyncpg

class TicketClassification(BaseModel):
    category: str = Field(description="One of: billing, technical, account, refund, other")
    priority: int = Field(ge=1, le=5, description="1=low, 5=critical")
    suggested_team: str = Field(description="Team to route to: support, billing, engineering, vip")
    summary: str = Field(description="One sentence summary of the issue")
    needs_human: bool = Field(description="Whether this requires a human agent")

@dataclass
class ClassifyDeps:
    db: asyncpg.Connection

classifier = Agent(
    "claude-haiku-4-5-20251001",  # Haiku — cheap for classification
    result_type=TicketClassification,
    deps_type=ClassifyDeps,
    system_prompt=(
        "Classify incoming support tickets. Be consistent — the same type of request "
        "should always get the same category and priority."
    ),
)

@classifier.tool
async def get_customer_tier(ctx: RunContext[ClassifyDeps], email: str) -> str:
    """Look up a customer's subscription tier to inform priority."""
    row = await ctx.deps.db.fetchrow(
        "SELECT tier FROM customers WHERE email = $1", email
    )
    return row["tier"] if row else "unknown"

async def classify_ticket(email: str, message: str, db: asyncpg.Connection):
    result = await classifier.run(
        f"Customer: {email}\nMessage: {message}",
        deps=ClassifyDeps(db=db),
    )
    return result.data

This runs in a FastAPI endpoint, processes tickets at ~200ms each (Haiku), and the typed output feeds directly into your ticketing system without string parsing.

What to watch out for

Don't over-type: if you just want a string or a simple bool back, you don't need a Pydantic model. result_type=str works fine. Start simple.

The function calling lesson covers the underlying mechanics that all these frameworks are built on top of.

Pydantic AI — the agent framework Python developers actually want

Why Pydantic AI exists

The core primitives

Build a code reviewer agent

Tools with dependency injection

ModelRetry: tell the model it got something wrong

Multi-turn conversations

Streaming

Sync vs async

Model switching

Observability with Logfire

Pydantic AI vs the alternatives

A complete production example

What to watch out for

Related articles

Async Python for LLM Apps — Patterns That Actually Work in Production

50 Best AI Prompts for Claude That Actually Work (2026)

Build a Vector Store for RAG — FAISS vs Chroma vs Pinecone (With Code)

Pydantic AI — the agent framework Python developers actually want

Why Pydantic AI exists

The core primitives

Build a code reviewer agent

Tools with dependency injection

ModelRetry: tell the model it got something wrong

Multi-turn conversations

Streaming

Sync vs async

Model switching

Observability with Logfire

Pydantic AI vs the alternatives

A complete production example

What to watch out for

Related articles

Async Python for LLM Apps — Patterns That Actually Work in Production

50 Best AI Prompts for Claude That Actually Work (2026)

Build a Vector Store for RAG — FAISS vs Chroma vs Pinecone (With Code)