What is prompt engineering?

Prompt engineering is the practice of crafting inputs to AI language models to produce accurate, useful, and reliable outputs. It involves choosing the right words, structure, context, and format to guide the AI toward the response you actually need — rather than a generic or off-target one.

Which AI models benefit most from better prompting?

All major large language models — including ChatGPT (GPT-4o), Claude, and Gemini — respond significantly to prompt quality. The same task can produce dramatically different results depending on how you structure your request. Better prompting improves output across every major model.

Do I need technical skills to do prompt engineering?

No. Prompt engineering is done in natural language — you write text instructions, not code. Basic prompting needs no technical background at all. Advanced techniques like prompt chaining or agentic workflows can benefit from light scripting knowledge, but the core skill is clear written communication.

Where can I learn more about prompt engineering?

MasterPrompting.net offers a structured curriculum from beginner to advanced, covering every major technique from basic clarity and context to chain-of-thought, meta-prompting, and agentic workflows. Start with the Beginner track to build a solid foundation.

Instructor Library — The Best Way to Get Structured Outputs from Any LLM

You ask the model for JSON. It gives you JSON — but with a paragraph of explanation before the opening brace. You strip that with a regex. Next run: the keys are camelCase instead of snake_case. You add a key normalizer. Next run: age comes back as "34" instead of 34. By this point you've written more defensive parsing code than actual application logic.

This is what working with raw JSON mode looks like in practice. The model is non-deterministic. Your parser needs to handle every variation, or something downstream silently breaks.
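To make the problem concrete, here is a rough sketch of the kind of defensive parser described above. The names parse_user_reply and camel_to_snake and the sample reply are made up for illustration, and only the standard library is used.

```python
import json
import re


def camel_to_snake(key: str) -> str:
    """Convert a camelCase key such as 'fullName' to 'full_name'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def parse_user_reply(reply: str) -> dict:
    """Best-effort parse of a model reply that should contain one JSON object."""
    # 1. Drop any chatter before the first brace and after the last one.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start : end + 1])

    # 2. Normalize camelCase keys to snake_case.
    data = {camel_to_snake(k): v for k, v in data.items()}

    # 3. Coerce age from "34" to 34 if the model returned a string.
    if isinstance(data.get("age"), str):
        data["age"] = int(data["age"])
    return data


reply = 'Sure! Here is the data:\n{"fullName": "John Smith", "age": "34"}'
print(parse_user_reply(reply))
```

Every new variation in the model's output means another numbered step in this function, which is the maintenance burden the rest of this post tries to remove.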

The Instructor library solves this. It's a Python library by Jason Liu that wraps any LLM client and uses Pydantic models to define exactly what you want back. If the model returns something invalid, Instructor retries automatically — with the validation error appended to the conversation so the model can correct itself. No custom parsing, no defensive checks, no silent failures. It's become the standard approach for structured LLM outputs in 2026, with 45k+ GitHub stars.

Installing Instructor

pip install instructor

That's it. Instructor depends on Pydantic v2 and your existing LLM SDK (anthropic, openai, or google-generativeai).

You don't need to replace anything: Instructor wraps your existing client.

Basic usage with Claude

import anthropic
import instructor
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())

class UserInfo(BaseModel):
    name: str
    age: int
    email: str

user = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract: John Smith is 34 years old. Email: john@example.com"}
    ],
    response_model=UserInfo,
)

print(user.name)   # "John Smith"
print(user.age)    # 34 — an int, not the string "34"
print(user.email)  # "john@example.com"

user is a real UserInfo Pydantic instance. Your IDE autocompletes its fields. If you access a field that doesn't exist, Python raises an AttributeError at the call site — not somewhere deep in downstream code three function calls later.
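You can check this behavior offline with Pydantic alone, without calling any model. A small sketch, assuming Pydantic v2; the dictionaries below are stand-ins for what a model might return.

```python
from pydantic import BaseModel, ValidationError


class UserInfo(BaseModel):
    name: str
    age: int
    email: str


# Pydantic's default (lax) mode coerces the numeric string "34" to an int.
user = UserInfo.model_validate(
    {"name": "John Smith", "age": "34", "email": "john@example.com"}
)
print(type(user.age).__name__, user.age)

# A value that cannot be coerced fails loudly instead of slipping through.
try:
    UserInfo.model_validate(
        {"name": "John Smith", "age": "thirty-four", "email": "john@example.com"}
    )
except ValidationError as exc:
    print(f"rejected with {exc.error_count()} validation error(s)")

# Accessing a field that is not on the model raises AttributeError.
try:
    user.phone
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```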

The same interface works with OpenAI:

import openai
import instructor

client = instructor.from_openai(openai.OpenAI())
# Everything else is identical

And with Gemini:

import google.generativeai as genai
import instructor

client = instructor.from_gemini(genai.GenerativeModel("gemini-2.0-flash"))

One interface, swap the client, same Pydantic models. If you're comparing Claude vs GPT-4o on a structured extraction task, you can run the exact same code against both and diff the results.
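Because both providers return instances of the same Pydantic model, the diff itself can be a few lines. A minimal sketch: diff_models is a hypothetical helper, and the two instances are hard-coded stand-ins for real extraction results.

```python
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int
    email: str


def diff_models(a: BaseModel, b: BaseModel) -> dict[str, tuple]:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    left, right = a.model_dump(), b.model_dump()
    return {
        key: (left.get(key), right.get(key))
        for key in left.keys() | right.keys()
        if left.get(key) != right.get(key)
    }


# Stand-ins for the results of running the same extraction on two providers.
from_claude = UserInfo(name="John Smith", age=34, email="john@example.com")
from_gpt = UserInfo(name="John Smith", age=35, email="john@example.com")

print(diff_models(from_claude, from_gpt))
```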

Validation with Pydantic field validators

This is where Instructor gets powerful. You define not just the shape of the output but the rules it must follow. When the model violates a rule, Instructor tells it what went wrong and tries again.

from pydantic import BaseModel, field_validator

class ProductReview(BaseModel):
    rating: int
    sentiment: str
    summary: str

    @field_validator("rating")
    @classmethod
    def rating_in_range(cls, v):
        if not 1 <= v <= 5:
            raise ValueError("Rating must be between 1 and 5")
        return v

    @field_validator("sentiment")
    @classmethod
    def valid_sentiment(cls, v):
        allowed = {"positive", "negative", "neutral"}
        if v not in allowed:
            raise ValueError(f"sentiment must be one of {allowed}, got '{v}'")
        return v

review = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": f"Analyze this review:\n\n{review_text}"}],
    response_model=ProductReview,
    max_retries=3,
)

If the model returns sentiment: "mixed", the validator raises a ValueError. Instructor catches it, appends the error message to the conversation ("sentiment must be one of {'positive', 'negative', 'neutral'}, got 'mixed'"), and calls the model again. By the third retry, models almost always correct themselves. The default is max_retries=3, which I've found sufficient for everything except genuinely ambiguous edge cases.

This retry loop is what distinguishes Instructor from just using response_format={"type": "json_object"}. Raw JSON mode gives you valid JSON. Instructor gives you valid JSON that satisfies your business logic.
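The loop is easier to reason about once you see it written out. This is a simplified sketch of the idea, not Instructor's actual implementation: create_with_retries and fake_model are hypothetical names, and fake_model stands in for a real LLM call so the example runs offline.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError, field_validator


class ProductReview(BaseModel):
    rating: int
    sentiment: str
    summary: str

    @field_validator("sentiment")
    @classmethod
    def valid_sentiment(cls, v: str) -> str:
        allowed = {"positive", "negative", "neutral"}
        if v not in allowed:
            raise ValueError(f"sentiment must be one of {sorted(allowed)}, got '{v}'")
        return v


def create_with_retries(
    call_model: Callable[[list[dict]], str],
    messages: list[dict],
    max_attempts: int = 3,
) -> ProductReview:
    """Validate the reply; on failure, feed the error back and ask again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    history = list(messages)
    for attempt in range(max_attempts):
        reply = call_model(history)
        try:
            return ProductReview.model_validate_json(reply)
        except ValidationError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last validation error
            # Show the model its own reply plus the reason it was rejected.
            history.append({"role": "assistant", "content": reply})
            history.append(
                {"role": "user", "content": f"Validation failed:\n{exc}\nPlease fix and resend."}
            )
    raise AssertionError("unreachable")


def fake_model(history: list[dict]) -> str:
    """Hypothetical stand-in for an LLM: wrong first, corrected after feedback."""
    if len(history) == 1:
        return '{"rating": 4, "sentiment": "mixed", "summary": "Good but pricey."}'
    return '{"rating": 4, "sentiment": "neutral", "summary": "Good but pricey."}'


review = create_with_retries(
    fake_model, [{"role": "user", "content": "Analyze this review: ..."}]
)
print(review.sentiment)
```

The key design point is that the error text goes back into the conversation, so the second attempt is an informed correction rather than a blind re-roll.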

Extracting nested structures

Real-world extraction tasks are rarely flat. Documents have line items, reports have sections, conversations have turns. Instructor handles nested Pydantic models exactly as you'd expect:

from pydantic import BaseModel

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str
    line_items: list[LineItem]

invoice = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[
        {"role": "user", "content": f"Extract all invoice data from this document:\n\n{pdf_text}"}
    ],
    response_model=Invoice,
)

for item in invoice.line_items:
    print(f"{item.description}: {item.quantity} × ${item.unit_price:.2f}")
print(f"Total: {invoice.currency} {invoice.total}")

The model figures out the nested structure from the Pydantic model definition. You don't need to explain the JSON schema in your prompt — Instructor passes the schema automatically.
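If you want to see what that schema looks like, Pydantic can print it directly. A short sketch, assuming Pydantic v2; how Instructor hands the schema to each provider is an implementation detail this example does not cover.

```python
import json

from pydantic import BaseModel


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str
    line_items: list[LineItem]


schema = Invoice.model_json_schema()

# The nested model is declared once under "$defs" and referenced from the list field.
print(json.dumps(schema["properties"]["line_items"], indent=2))
print(sorted(schema["$defs"]["LineItem"]["properties"]))
```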

For complex nested structures, I'll often add a @model_validator that checks cross-field constraints:

from pydantic import BaseModel, model_validator

class Invoice(BaseModel):
    line_items: list[LineItem]
    total: float

    @model_validator(mode="after")
    def total_matches_line_items(self):
        calculated = sum(item.quantity * item.unit_price for item in self.line_items)
        if abs(calculated - self.total) > 0.01:
            raise ValueError(
                f"Total {self.total} doesn't match sum of line items {calculated:.2f}"
            )
        return self

If the extraction is internally inconsistent, the model gets a chance to fix it.
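You can exercise the cross-field check without a model in the loop, which is a cheap way to confirm the error message reads well before an LLM ever sees it. A sketch with made-up invoice values, assuming Pydantic v2:

```python
from pydantic import BaseModel, ValidationError, model_validator


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    line_items: list[LineItem]
    total: float

    @model_validator(mode="after")
    def total_matches_line_items(self):
        calculated = sum(item.quantity * item.unit_price for item in self.line_items)
        if abs(calculated - self.total) > 0.01:
            raise ValueError(
                f"Total {self.total} doesn't match sum of line items {calculated:.2f}"
            )
        return self


items = [LineItem(description="Widget", quantity=2, unit_price=10.0)]

# Consistent data passes validation.
good = Invoice(line_items=items, total=20.0)
print(good.total)

# Inconsistent data is rejected with the message the model would be shown.
try:
    Invoice(line_items=items, total=25.0)
except ValidationError as exc:
    message = exc.errors()[0]["msg"]
    print(message)
```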

Streaming partial objects

For UI applications where you want to show results as they arrive, Instructor supports streaming partial objects. The model starts generating, and you get a partial instance updated token by token:

# create_partial wraps the model in Partial for you, so pass the plain class
for partial_user in client.messages.create_partial(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Describe the user: Alice is a 28-year-old designer from Berlin."}],
    response_model=UserInfo,
):
    if partial_user.name:
        print(f"Name so far: {partial_user.name}")

Fields on the partial object become available as soon as the model has generated enough tokens to populate them. This is useful for displaying progress in a UI without waiting for the full response to complete.
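On the UI side, you usually only want to re-render when a field actually changes between snapshots. Here is a rough sketch of that consumer logic. PartialUser, fake_stream, and field_updates are all hypothetical: the dataclass only imitates a partial object (every field may still be None), and the generator stands in for a real streaming call.

```python
from dataclasses import dataclass, fields
from typing import Iterator, Optional


@dataclass
class PartialUser:
    """Hypothetical stand-in for a partial object: every field may still be None."""

    name: Optional[str] = None
    age: Optional[int] = None
    email: Optional[str] = None


def fake_stream() -> Iterator[PartialUser]:
    """Simulated snapshots, in the order a streaming response might fill them."""
    yield PartialUser()
    yield PartialUser(name="Alice")
    yield PartialUser(name="Alice", age=28)
    yield PartialUser(name="Alice", age=28)  # no change: should not trigger an update


def field_updates(stream: Iterator[PartialUser]) -> Iterator[tuple[str, object]]:
    """Yield (field, value) only when a field's value changes between snapshots."""
    previous = PartialUser()
    for snapshot in stream:
        for f in fields(snapshot):
            value = getattr(snapshot, f.name)
            if value != getattr(previous, f.name):
                yield f.name, value
        previous = snapshot


updates = list(field_updates(fake_stream()))
print(updates)
```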

Extracting multiple objects from one response

When you need to extract a list of entities from a single document (all people mentioned, all events, all products), use create_iterable and pass the plain model class:

users = client.messages.create_iterable(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": f"List every person mentioned in this article:\n\n{article}"}
    ],
    response_model=UserInfo,  # create_iterable wraps this in Iterable[...] for you
)

for user in users:
    print(user.name, user.age)

Instructor streams each object in the list and yields it as soon as it's complete, so you don't have to wait for the full list before processing starts.

The cost of retries

Instructor adds zero extra LLM calls when validation passes. The retry overhead only appears on failures.

Each retry is a fresh call with the full conversation plus the failed output and the error message appended, so it costs at least as much as the original call. With max_retries=3, a request that keeps failing validation can cost roughly 3x to 4x the normal call price, depending on whether your Instructor version treats the setting as total attempts or as retries after the first call. In practice, well-defined Pydantic models with clear field descriptions fail maybe 1-3% of the time on a capable model like Claude Sonnet 4.6, so the average overhead is small.
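The arithmetic behind "the average overhead is small" is worth a quick look. This sketch assumes each attempt fails independently with the same probability and that every call costs the same, both of which are simplifications; expected_calls is a made-up helper and max_attempts here counts total calls.

```python
def expected_calls(failure_rate: float, max_attempts: int) -> float:
    """Expected number of LLM calls when each attempt fails independently.

    Attempt k+1 only happens if the first k attempts all failed, so the
    expectation is 1 + p + p**2 + ... + p**(max_attempts - 1).
    """
    return sum(failure_rate**k for k in range(max_attempts))


for p in (0.01, 0.03, 0.20):
    calls = expected_calls(p, max_attempts=3)
    print(f"failure rate {p:.0%}: {calls:.4f} calls on average")
```

At a 3% failure rate the expected overhead is about 3% extra calls. The worst case only matters for the rare request that fails every attempt.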

The practical advice: don't add validators that the model can't reliably satisfy. If you're validating that an extracted date falls within the last 5 years, make sure the source document actually contains dates in that range. Tight validators on noisy inputs lead to expensive retry loops.

Adding clear descriptions to your Pydantic fields also reduces failure rates significantly:

from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    rating: int = Field(..., description="Star rating from 1 (worst) to 5 (best)")
    sentiment: str = Field(..., description="Must be exactly 'positive', 'negative', or 'neutral'")
    summary: str = Field(..., description="One sentence summary of the review, max 30 words")

Instructor passes field descriptions as part of the schema prompt. A model that sees a clear description fails validation far less often than one working from a bare field name.
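To confirm the descriptions actually land in the schema, you can dump it yourself. A small sketch, assuming Pydantic v2:

```python
from pydantic import BaseModel, Field


class ProductReview(BaseModel):
    rating: int = Field(..., description="Star rating from 1 (worst) to 5 (best)")
    sentiment: str = Field(..., description="Must be exactly 'positive', 'negative', or 'neutral'")
    summary: str = Field(..., description="One sentence summary of the review, max 30 words")


schema = ProductReview.model_json_schema()

# Each property carries its description alongside its type.
for name, spec in schema["properties"].items():
    print(f"{name} ({spec['type']}): {spec['description']}")
```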

Instructor vs raw JSON mode vs function calling

Approach	Type safety	Validation	Auto-retry	Multi-provider
Raw JSON mode	No	No	No	Varies
Function calling	Partial	No	No	Partial
Instructor	Full (Pydantic)	Yes	Yes	Yes

Raw JSON mode (response_format={"type": "json_object"}) guarantees parseable JSON but nothing else. No type coercion, no field validation, no retry. You're responsible for every check.

Function calling (tool use) structures outputs around a tool schema, which gets you closer to type safety. But validation and retry logic are still your problem, and the API differs between providers. Instructor wraps function calling internally on providers that support it — you write Pydantic, it handles the tool schema translation.
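To give a feel for what "tool schema translation" means, here is a rough sketch that turns a Pydantic model into an Anthropic-style tool definition. to_tool_definition is a hypothetical helper, not Instructor's internal code; the name, description, and input_schema keys follow Anthropic's tool-use format, and other providers expect a different envelope.

```python
from pydantic import BaseModel, Field


class UserInfo(BaseModel):
    """Contact details extracted from free text."""

    name: str = Field(..., description="Full name")
    age: int
    email: str


def to_tool_definition(model: type[BaseModel]) -> dict:
    """Build an Anthropic-style tool definition from a Pydantic model."""
    return {
        "name": model.__name__,
        "description": (model.__doc__ or "").strip(),
        "input_schema": model.model_json_schema(),
    }


tool = to_tool_definition(UserInfo)
print(tool["name"], sorted(tool["input_schema"]["properties"]))
```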

For deeper context on when to use each approach, see the post on structured outputs from AI APIs.

When not to use Instructor

Forcing structure onto outputs that are inherently unstructured hurts quality. Creative writing, open-ended Q&A, explanatory text — these don't have a schema. If you define a Pydantic model for "a helpful essay response," you're not adding structure, you're adding friction.

Instructor is the right tool when the output has a defined shape that you need to reliably process downstream: extraction, classification, transformation, entity recognition. If you're going to print() the output directly to a user, you probably don't need it.

Also: if your application needs to surface the model's reasoning process rather than hide it, Instructor's default retry behavior makes that harder, because the model is silently retrying until it gets it right. You can inspect the raw completion behind the final result via the returned object's _raw_response attribute, but reconstructing the intermediate attempts adds complexity.

A real extraction pipeline

Here's what a production invoice extraction script looks like using Instructor with Claude Sonnet 4.6:

import anthropic
import instructor
from pydantic import BaseModel, Field, field_validator, model_validator
from typing import Optional
import re

client = instructor.from_anthropic(anthropic.Anthropic())

class LineItem(BaseModel):
    description: str = Field(..., description="Item description as it appears on the invoice")
    quantity: int = Field(..., ge=1, description="Number of units")
    unit_price: float = Field(..., gt=0, description="Price per unit in the invoice currency")

class Invoice(BaseModel):
    vendor: str
    invoice_number: str
    currency: str = Field(..., description="3-letter ISO currency code, e.g. USD, EUR, GBP")
    total: float
    line_items: list[LineItem]
    notes: Optional[str] = None

    @field_validator("currency")
    @classmethod
    def valid_currency(cls, v):
        # Accept exactly three uppercase letters, e.g. "USD"
        if not re.fullmatch(r'[A-Z]{3}', v):
            raise ValueError(f"currency must be a 3-letter ISO code, got '{v}'")
        return v

    @model_validator(mode="after")
    def total_within_tolerance(self):
        # Allow up to 5% drift between the stated total and the line item sum
        calculated = sum(i.quantity * i.unit_price for i in self.line_items)
        if self.line_items and abs(calculated - self.total) > abs(self.total) * 0.05:
            raise ValueError(
                f"Total {self.total} differs from line item sum {calculated:.2f} by more than 5%"
            )
        return self

def extract_invoice(text: str) -> Invoice:
    return client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"Extract all invoice data:\n\n{text}"}],
        response_model=Invoice,
        max_retries=3,
    )

This runs in production. It fails cleanly when extraction is genuinely ambiguous: once the retries are exhausted, Instructor raises an exception carrying the last validation error, and the caller catches it. No silent bad data in the database.
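What "caught in the caller" might look like is sketched below. try_extract and always_fails are hypothetical names, and the stub extractor raises a plain ValueError so the example runs offline. In real use you would pass the exception class your Instructor version raises when retries run out.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("invoice_pipeline")


def try_extract(
    extract: Callable[[str], T],
    text: str,
    failure_types: tuple[type[BaseException], ...] = (ValueError,),
) -> Optional[T]:
    """Run an extractor and return None, with a log entry, if it gives up."""
    try:
        return extract(text)
    except failure_types as exc:
        logger.warning("extraction failed, skipping document: %s", exc)
        return None


def always_fails(text: str) -> dict:
    """Hypothetical extractor that mimics retries being exhausted."""
    raise ValueError("Total 120.0 differs from line item sum 100.00 by more than 5%")


print(try_extract(always_fails, "some invoice text"))
print(try_extract(lambda text: {"vendor": "Acme"}, "some invoice text"))
```

Returning None keeps the bad record out of the database while the log entry leaves a trail for manual review.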

For the broader context on how structured extraction fits into agentic pipelines, check out building a data analyst agent with Claude and the function calling lesson in the agents track.

Getting started

If you have any Python code that calls an LLM and parses the response, you can probably replace the parsing logic with Instructor in under an hour. The migration is:

1. Define a Pydantic model for what you want back
2. Wrap your existing client with instructor.from_anthropic() or instructor.from_openai()
3. Add response_model=YourModel to the create() call
4. Delete your parsing code

The result is less code, full type safety, and automatic recovery from the formatting errors that currently require defensive checks. For most structured extraction tasks, that's a worthwhile trade.

Related articles

Async Python for LLM Apps — Patterns That Actually Work in Production

50 Best AI Prompts for Claude That Actually Work (2026)

Build a Vector Store for RAG — FAISS vs Chroma vs Pinecone (With Code)
