You ask the model for JSON. It gives you JSON — but with a paragraph of explanation before the opening brace. You strip that with a regex. Next run: the keys are camelCase instead of snake_case. You add a key normalizer. Next run: age comes back as "34" instead of 34. By this point you've written more defensive parsing code than actual application logic.
This is what working with raw JSON mode looks like in practice. The model is non-deterministic. Your parser needs to handle every variation, or something downstream silently breaks.
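To make the problem concrete, here is a hypothetical sketch of the defensive parser the opening paragraph describes, using only the standard library. The helper names `camel_to_snake` and `parse_user_reply` are illustrative. Each fix handles one quirk you've already seen, and every new quirk means another branch.

```python
import json
import re


def camel_to_snake(key: str) -> str:
    # "firstName" -> "first_name"
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def parse_user_reply(raw: str) -> dict:
    """Hypothetical hand-rolled cleanup for a model reply that should be JSON."""
    # Fix 1: drop any prose before the first "{" and after the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start : end + 1])

    # Fix 2: normalize camelCase keys to snake_case.
    data = {camel_to_snake(k): v for k, v in data.items()}

    # Fix 3: coerce a numeric string such as "34" into an int.
    if isinstance(data.get("age"), str) and data["age"].isdigit():
        data["age"] = int(data["age"])
    return data


reply = 'Sure! Here is the JSON you asked for:\n{"userName": "John Smith", "age": "34"}'
print(parse_user_reply(reply))
```

The next variation (a float age, a nested key, a trailing comment) slips straight through, which is the brittleness the rest of this post replaces.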
The Instructor library solves this. It's a Python library by Jason Liu that wraps any LLM client and uses Pydantic models to define exactly what you want back. If the model returns something invalid, Instructor retries automatically — with the validation error appended to the conversation so the model can correct itself. No custom parsing, no defensive checks, no silent failures. It's become the standard approach for structured LLM outputs in 2026, with 45k+ GitHub stars.
Installing Instructor
```bash
pip install instructor
```
That's it. Instructor depends on Pydantic v2 and your existing LLM SDK (anthropic, openai, or google-generativeai).
Basic usage with Claude
```python
import anthropic
import instructor
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())


class UserInfo(BaseModel):
    name: str
    age: int
    email: str


user = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract: John Smith is 34 years old. Email: john@example.com"}
    ],
    response_model=UserInfo,
)

print(user.name)   # "John Smith"
print(user.age)    # 34, an int, not the string "34"
print(user.email)  # "john@example.com"
```
user is a real UserInfo Pydantic instance. Your IDE autocompletes its fields. If you access a field that doesn't exist, Python raises an AttributeError at the call site — not somewhere deep in downstream code three function calls later.
The same interface works with OpenAI:
```python
import openai
import instructor

client = instructor.from_openai(openai.OpenAI())
# Everything else is identical, apart from passing an OpenAI model name instead of a Claude one
```
And with Gemini:
```python
import google.generativeai as genai
import instructor

# The model name is set on the GenerativeModel, so create() calls omit model=
client = instructor.from_gemini(genai.GenerativeModel("gemini-2.0-flash"))
```
One interface, swap the client, same Pydantic models. If you're comparing Claude vs GPT-4o on a structured extraction task, you can run the exact same code against both and diff the results.
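Once both runs return validated objects, the diff is a plain dictionary comparison. A minimal sketch, assuming each result has been turned into a dict (for a Pydantic v2 model, `model_dump()` does this); the helper `diff_fields` and the sample values are hypothetical.

```python
def diff_fields(left: dict, right: dict) -> dict:
    """Return {field: (left_value, right_value)} for every field whose values differ."""
    keys = left.keys() | right.keys()
    return {
        key: (left.get(key), right.get(key))
        for key in sorted(keys)
        if left.get(key) != right.get(key)
    }


# Hypothetical outputs from the same extraction run against two providers.
claude_result = {"name": "John Smith", "age": 34, "email": "john@example.com"}
gpt_result = {"name": "John Smith", "age": 34, "email": "John@example.com"}

print(diff_fields(claude_result, gpt_result))
```

Because both sides share one schema, any difference is a genuine disagreement between models rather than a parsing artifact.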
Validation with Pydantic field validators
This is where Instructor gets powerful. You define not just the shape of the output but the rules it must follow. When the model violates a rule, Instructor tells it what went wrong and tries again.
```python
from pydantic import BaseModel, field_validator


class ProductReview(BaseModel):
    rating: int
    sentiment: str
    summary: str

    @field_validator("rating")
    @classmethod
    def rating_in_range(cls, v: int) -> int:
        if not 1 <= v <= 5:
            raise ValueError("Rating must be between 1 and 5")
        return v

    @field_validator("sentiment")
    @classmethod
    def valid_sentiment(cls, v: str) -> str:
        allowed = {"positive", "negative", "neutral"}
        if v not in allowed:
            raise ValueError(f"sentiment must be one of {allowed}, got '{v}'")
        return v


# `client` is the Instructor-wrapped client from earlier; `review_text` holds the review
review = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": f"Analyze this review:\n\n{review_text}"}],
    response_model=ProductReview,
    max_retries=3,
)
```
If the model returns sentiment: "mixed", the validator raises a ValueError. Instructor catches it, appends the error message to the conversation ("sentiment must be one of {'positive', 'negative', 'neutral'}, got 'mixed'"), and calls the model again. By the third attempt, models almost always correct themselves. The default is max_retries=3, which I've found sufficient for everything except genuinely ambiguous edge cases.
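The mechanics of that loop can be sketched without Instructor. This is a simplified, stdlib-only illustration, not Instructor's implementation: `fake_model` is a hypothetical stand-in for the LLM that answers "mixed" first and corrects itself once it sees an error, `validate_review` mimics the Pydantic validators, and `max_attempts` counts total calls.

```python
import json

ALLOWED = {"positive", "negative", "neutral"}


def validate_review(data: dict) -> dict:
    # Mirrors the field validators on ProductReview.
    if not 1 <= data["rating"] <= 5:
        raise ValueError("Rating must be between 1 and 5")
    if data["sentiment"] not in ALLOWED:
        raise ValueError(f"sentiment must be one of {sorted(ALLOWED)}, got '{data['sentiment']}'")
    return data


def fake_model(messages: list[dict]) -> str:
    # Hypothetical LLM stub: wrong on the first call, corrected once error feedback appears.
    saw_error = any(m["role"] == "user" and "must be" in m["content"] for m in messages)
    sentiment = "neutral" if saw_error else "mixed"
    return json.dumps({"rating": 3, "sentiment": sentiment, "summary": "Decent value."})


def extract_with_reask(prompt: str, max_attempts: int = 3) -> tuple[dict, list[dict]]:
    messages = [{"role": "user", "content": prompt}]
    last_error = None
    for _ in range(max_attempts):
        reply = fake_model(messages)
        try:
            return validate_review(json.loads(reply)), messages
        except (ValueError, KeyError) as err:  # JSONDecodeError is a ValueError subclass
            # Append the bad answer and the validation error, then ask again.
            last_error = err
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"Please fix this error: {err}"})
    raise RuntimeError(f"validation still failing after {max_attempts} attempts") from last_error


review, history = extract_with_reask("Analyze this review: It's fine, I guess.")
print(review["sentiment"])  # neutral
print(len(history))         # 3
```

The key design point is that the error text goes back into the conversation, so the second call is not a blind retry: the model is told exactly which rule it broke.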
This retry loop is what distinguishes Instructor from just using response_format={"type": "json_object"}. Raw JSON mode gives you valid JSON. Instructor gives you valid JSON that satisfies your business logic.
Extracting nested structures
Real-world extraction tasks are rarely flat. Documents have line items, reports have sections, conversations have turns. Instructor handles nested Pydantic models exactly as you'd expect:
```python
from pydantic import BaseModel


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str
    line_items: list[LineItem]


# `pdf_text` holds the text already extracted from the invoice document
invoice = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[
        {"role": "user", "content": f"Extract all invoice data from this document:\n\n{pdf_text}"}
    ],
    response_model=Invoice,
)

for item in invoice.line_items:
    # Use the extracted currency rather than assuming dollars
    print(f"{item.description}: {item.quantity} × {item.unit_price:.2f} {invoice.currency}")
print(f"Total: {invoice.currency} {invoice.total}")
```
The model figures out the nested structure from the Pydantic model definition. You don't need to explain the JSON schema in your prompt — Instructor passes the schema automatically.
For complex nested structures, I'll often add a @model_validator that checks cross-field constraints:
```python
from pydantic import BaseModel, model_validator


class Invoice(BaseModel):
    line_items: list[LineItem]  # LineItem as defined in the previous example
    total: float

    @model_validator(mode="after")
    def total_matches_line_items(self):
        # Runs after field validation, so line_items are already LineItem instances
        calculated = sum(item.quantity * item.unit_price for item in self.line_items)
        if abs(calculated - self.total) > 0.01:
            raise ValueError(
                f"Total {self.total} doesn't match sum of line items {calculated:.2f}"
            )
        return self
```
If the extraction is internally inconsistent, the model gets a chance to fix it.
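Why compare with a tolerance instead of `==`? Prices are floats, and binary floating point can't represent most decimal amounts exactly, so a sum that should match to the cent can be off in the last few bits. A small stdlib illustration with made-up line items:

```python
from decimal import Decimal

# Made-up line items as (quantity, unit_price) pairs.
line_items = [(3, 19.99), (2, 5.10), (1, 0.10)]
stated_total = 70.27

# Binary floats can't represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)  # False

# So compare the float sum against the stated total with a tolerance, as the validator does.
calculated = sum(qty * price for qty, price in line_items)
print(abs(calculated - stated_total) <= 0.01)  # True

# Alternative: Decimal arithmetic on the original strings is exact for cents.
exact = sum(qty * Decimal(price) for qty, price in [(3, "19.99"), (2, "5.10"), (1, "0.10")])
print(exact == Decimal("70.27"))  # True
```

Pydantic can also validate fields typed as `Decimal`, which may be worth considering if you'd rather avoid tolerances; the float-plus-tolerance check is what the validator in this post uses.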
Streaming partial objects
For UI applications where you want to show results as they arrive, Instructor supports streaming partial objects. The model starts generating, and you get a partial instance updated token by token:
```python
# create_partial wraps the response model in Partial[...] itself, so pass the plain model
for partial_user in client.messages.create_partial(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Describe the user: Alice is a 28-year-old designer from Berlin."}],
    response_model=UserInfo,
):
    # Fields stay None until enough tokens have streamed in to fill them
    if partial_user.name:
        print(f"Name so far: {partial_user.name}")
```
Fields on the partial object become available as soon as the model has generated enough tokens to populate them. This is useful for displaying progress in a UI without waiting for the full response to complete.
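Streaming partial objects means parsing JSON that isn't finished yet. Instructor handles this internally; the toy below only illustrates the idea with the standard library by trying to close an incomplete flat object and yielding each new state that parses. `best_effort_parse` and `stream_partials` are hypothetical helpers and a deliberate simplification, not Instructor's implementation.

```python
import json
from typing import Iterator, Optional


def best_effort_parse(prefix: str) -> Optional[dict]:
    """Try to close an unfinished flat JSON object; return None if it can't be parsed yet."""
    for closer in ("", "}", '"}'):
        try:
            return json.loads(prefix + closer)
        except json.JSONDecodeError:
            continue
    return None


def stream_partials(chunks: list[str]) -> Iterator[dict]:
    """Yield a progressively more complete dict whenever the buffer parses to something new."""
    buffer, last = "", {}
    for chunk in chunks:
        buffer += chunk
        parsed = best_effort_parse(buffer)
        if parsed is not None and parsed != last:
            last = parsed
            yield parsed


chunks = ['{"name": "Ali', 'ce", "ag', 'e": 28', ', "email": "a@', 'example.com"}']
for partial in stream_partials(chunks):
    print(partial)
```

Note that string fields appear half-written ("Ali", then "Alice"), which is why UI code should treat partial values as provisional until the stream ends.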
Extracting multiple objects from one response
When you need to extract a list of entities from a single document — all people mentioned, all events, all products — use Iterable:
```python
# create_iterable expects the plain model and yields one UserInfo per extracted entity
users = client.messages.create_iterable(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": f"List every person mentioned in this article:\n\n{article}"}
    ],
    response_model=UserInfo,
)

for user in users:
    print(user.name, user.age)
```
Instructor streams each object in the list and yields it as soon as it's complete, so you don't have to wait for the full list before processing starts.
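The "yield each object as soon as it's complete" behaviour can be illustrated with `json.JSONDecoder.raw_decode`, which parses one JSON value from the start of a string and reports where it stopped. A stdlib-only sketch, assuming the model streams a JSON array of objects in arbitrary chunks; `iter_objects` is a hypothetical helper, not Instructor's code.

```python
import json
from typing import Iterable, Iterator


def iter_objects(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each complete JSON object from a streamed JSON array as soon as it closes."""
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # Skip array punctuation and whitespace between objects.
            buffer = buffer.lstrip(" \n\t[,]")
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # object not finished yet; wait for more chunks
            yield obj
            buffer = buffer[end:]


chunks = ['[{"name": "Ada", "age": 36}, {"na', 'me": "Alan", "age"', ': 41}]']
for person in iter_objects(chunks):
    print(person["name"], person["age"])
```

The first object is yielded while the second is still arriving, so downstream work can start before the full list exists.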
The cost of retries
Instructor adds zero extra LLM calls when validation passes. The retry overhead only appears on failures.
Each retry is a fresh call that resends the full conversation plus the failed response and the error message, so it costs at least as much as the original call. Instructor counts max_retries as the total number of attempts, so at max_retries=3 a validation failure on every attempt costs up to 3x the normal call price, plus the extra input tokens. In practice, well-defined Pydantic models with clear field descriptions fail maybe 1-3% of the time on a capable model like Claude Sonnet 4.6, so the average overhead is small.
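A back-of-the-envelope model makes "the average overhead is small" concrete. Assume, hypothetically, that each attempt fails validation independently with probability p and that every call costs the same. Attempt k only runs if the previous k-1 attempts failed, so with a cap of n calls the expected number of calls is 1 + p + p² + ... + pⁿ⁻¹:

```python
def expected_calls(p: float, max_calls: int) -> float:
    """Expected LLM calls when each attempt fails independently with probability p.

    Attempt k runs only if the first k - 1 attempts failed, which has probability p ** (k - 1).
    """
    return sum(p ** (k - 1) for k in range(1, max_calls + 1))


for p in (0.01, 0.03, 0.5):
    print(f"p={p:.2f}: {expected_calls(p, 3):.4f} calls on average (cap 3)")
```

At the 1-3% failure rates quoted above this gives roughly 1.01 to 1.03 calls per request. The cost only approaches the cap when p approaches 1, which is exactly the tight-validator-on-noisy-input case. Since real retries also resend the failed answer, treat these figures as a lower bound.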
The practical advice: don't add validators that the model can't reliably satisfy. If you're validating that an extracted date falls within the last 5 years, make sure the source document actually contains dates in that range. Tight validators on noisy inputs lead to expensive retry loops.
Adding clear descriptions to your Pydantic fields also reduces failure rates significantly:
```python
from pydantic import BaseModel, Field


class ProductReview(BaseModel):
    rating: int = Field(..., description="Star rating from 1 (worst) to 5 (best)")
    sentiment: str = Field(..., description="Must be exactly 'positive', 'negative', or 'neutral'")
    summary: str = Field(..., description="One sentence summary of the review, max 30 words")
```
Instructor passes field descriptions along as part of the schema it sends to the model. A model that sees a clear description fails validation far less often than one working from a bare field name.
Instructor vs raw JSON mode vs function calling
| Approach | Type safety | Validation | Auto-retry | Multi-provider |
|---|---|---|---|---|
| Raw JSON mode | No | No | No | Varies |
| Function calling | Partial | No | No | Partial |
| Instructor | Full (Pydantic) | Yes | Yes | Yes |
Raw JSON mode (response_format={"type": "json_object"}) guarantees parseable JSON but nothing else. No type coercion, no field validation, no retry. You're responsible for every check.
Function calling (tool use) structures outputs around a tool schema, which gets you closer to type safety. But validation and retry logic are still your problem, and the API differs between providers. Instructor wraps function calling internally on providers that support it — you write Pydantic, it handles the tool schema translation.
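To make "tool schema translation" concrete: given one JSON Schema for the output, Anthropic's Messages API expects a tool with `name`, `description`, and `input_schema`, while OpenAI's Chat Completions API expects `{"type": "function", "function": {"name", "description", "parameters"}}`. The sketch below builds both from the same hand-written schema with plain dicts; the helper names are hypothetical, and Instructor's own translation works from your Pydantic model's generated schema and covers more cases.

```python
# A JSON Schema for UserInfo, written by hand here (Pydantic's model_json_schema() can generate one).
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age", "email"],
}


def to_anthropic_tool(name: str, description: str, schema: dict) -> dict:
    # Anthropic Messages API tool; force it with tool_choice={"type": "tool", "name": name}
    return {"name": name, "description": description, "input_schema": schema}


def to_openai_tool(name: str, description: str, schema: dict) -> dict:
    # OpenAI Chat Completions tool; force it with
    # tool_choice={"type": "function", "function": {"name": name}}
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }


anthropic_tool = to_anthropic_tool("UserInfo", "Extracted user details", USER_SCHEMA)
openai_tool = to_openai_tool("UserInfo", "Extracted user details", USER_SCHEMA)
print(anthropic_tool["input_schema"] == openai_tool["function"]["parameters"])  # True
```

The schema itself is shared; only the envelope differs, which is the part a wrapper library can hide from you.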
For deeper context on when to use each approach, see the post on structured outputs from AI APIs.
When not to use Instructor
Forcing structure onto outputs that are inherently unstructured hurts quality. Creative writing, open-ended Q&A, explanatory text — these don't have a schema. If you define a Pydantic model for "a helpful essay response," you're not adding structure, you're adding friction.
Instructor is the right tool when the output has a defined shape that you need to reliably process downstream: extraction, classification, transformation, entity recognition. If you're going to print() the output directly to a user, you probably don't need it.
Also: if your application needs to surface the model's reasoning process rather than hide it, Instructor's default retry behavior makes that harder, because the model is silently retrying until it gets it right. The returned object's _raw_response attribute holds the final raw completion, so reconstructing the failed attempts takes extra work and adds complexity.
A real extraction pipeline
Here's what a production invoice extraction script looks like using Instructor with Claude Sonnet 4.6:
```python
import re
from typing import Optional

import anthropic
import instructor
from pydantic import BaseModel, Field, field_validator, model_validator

client = instructor.from_anthropic(anthropic.Anthropic())


class LineItem(BaseModel):
    description: str = Field(..., description="Item description as it appears on the invoice")
    quantity: int = Field(..., ge=1, description="Number of units")
    unit_price: float = Field(..., gt=0, description="Price per unit in the invoice currency")


class Invoice(BaseModel):
    vendor: str
    invoice_number: str
    currency: str = Field(..., description="3-letter ISO currency code, e.g. USD, EUR, GBP")
    total: float
    line_items: list[LineItem]
    notes: Optional[str] = None

    @field_validator("currency")
    @classmethod
    def valid_currency(cls, v: str) -> str:
        if not re.match(r"^[A-Z]{3}$", v):
            raise ValueError(f"currency must be a 3-letter ISO code, got '{v}'")
        return v

    @model_validator(mode="after")
    def total_within_tolerance(self):
        # Reject totals that differ from the line-item sum by more than 5%
        calculated = sum(i.quantity * i.unit_price for i in self.line_items)
        if self.line_items and abs(calculated - self.total) > (self.total * 0.05):
            raise ValueError(
                f"Total {self.total} differs from line item sum {calculated:.2f} by more than 5%"
            )
        return self


def extract_invoice(text: str) -> Invoice:
    return client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"Extract all invoice data:\n\n{text}"}],
        response_model=Invoice,
        max_retries=3,
    )
```
This runs in production. It fails cleanly when extraction is genuinely ambiguous (once the retries are used up, Instructor raises an exception reporting the last validation error instead of returning a half-valid Invoice), and the caller catches it. No silent bad data in the database.
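On the caller's side, "fails cleanly" means catching the exception and routing the document to a human instead of writing partial data. A hedged sketch of that pattern: `extract_invoice` is stubbed here so the example runs offline, and `save_invoice` and `review_queue` are hypothetical placeholders for your storage. In real code, narrow the `except` to the exception types your installed Instructor version raises.

```python
review_queue: list[tuple[str, str]] = []
saved: list[dict] = []


def extract_invoice(text: str) -> dict:
    # Hypothetical offline stub standing in for the Instructor-backed extract_invoice.
    if "TOTAL" not in text:
        raise ValueError("validation still failing after retries")
    return {"vendor": "Acme", "total": 70.27}


def save_invoice(invoice: dict) -> None:
    saved.append(invoice)  # stand-in for a database write


def process(doc_id: str, text: str) -> bool:
    try:
        invoice = extract_invoice(text)
    except Exception as err:  # narrow this to the exceptions your Instructor version raises
        review_queue.append((doc_id, str(err)))
        return False
    save_invoice(invoice)  # only reached with a fully validated result
    return True


process("inv-001", "Acme Corp invoice, TOTAL 70.27")
process("inv-002", "illegible scan")
print(len(saved), review_queue)
```

The database only ever sees objects that passed every validator; everything else lands in a queue with the reason attached.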
For the broader context on how structured extraction fits into agentic pipelines, check out building a data analyst agent with Claude and the function calling lesson in the agents track.
Getting started
If you have any Python code that calls an LLM and parses the response, you can probably replace the parsing logic with Instructor in under an hour. The migration is:
- Define a Pydantic model for what you want back
- Wrap your existing client with `instructor.from_anthropic()` or `instructor.from_openai()`
- Add `response_model=YourModel` to the `create()` call
- Delete your parsing code
The result is less code, full type safety, and automatic recovery from the formatting errors that currently require defensive checks. For most structured extraction tasks, that's a worthwhile trade.



