Sometimes you need an LLM to think freely. Other times you need it to output data in a specific format that another system can process — a JSON object, a structured list, a consistent schema. Free-text output is unpredictable in structure. Structured prompting solves this by defining the expected output shape upfront and constraining the model to produce it.
When you're building a pipeline where the LLM's output feeds into another system — a database write, an API call, a UI component — you need that output to be reliable. This lesson covers how to get it.
Why JSON output fails without explicit handling
LLMs are trained to produce natural language. JSON is not natural language — it's a formal grammar with strict rules. Without explicit handling, models tend to drift in predictable ways:
- Adding explanatory text before or after the JSON ("Here is the structured output you requested: {...}")
- Wrapping the output in markdown code fences (` ```json ... ``` `)
- Producing valid JSON with subtly wrong field names (`order_id` vs `orderId`)
- Producing well-formed JSON with hallucinated data in fields the model wasn't sure about
The "parse JSON from the output" approach — using a regex or json.loads() directly on the response — fails in production because you're relying on the model being perfectly consistent across thousands of requests. It won't be.
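A defensive parser can at least narrow the first two failure modes (surrounding prose and code fences) before handing off to `json.loads()`. This is a best-effort sketch with an illustrative helper name, not a substitute for schema validation:

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Best-effort extraction of a JSON object from model output.

    Handles two common failure modes: markdown code fences around the
    JSON, and explanatory prose before or after it. Raises ValueError
    if no parseable object is found.
    """
    # Strip markdown code fences like ```json ... ```
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # Fall back to the outermost {...} span to drop surrounding prose
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in model output")
    return json.loads(raw[start : end + 1])
```

Note what this does not catch: wrong field names, missing fields, and hallucinated values all still parse cleanly, which is why the approaches below exist.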
The three approaches to structured output
These range from simplest to most reliable.
Approach 1: Prompt instructions only
Tell the model exactly what schema to use and forbid any surrounding text:
```
Return your answer as a JSON object with this exact schema:

{
  "status": "success" | "error",
  "message": string,
  "data": object | null
}

Return only the JSON. No explanation, no code fences, no surrounding text.
```
This works roughly 80-90% of the time with capable models like GPT-4o or Claude 3.5+. That failure rate is tolerable for prototypes and internal tools, but in production serving thousands of requests, a 10-20% failure rate means constant exceptions to handle.
Use for: quick prototypes, non-critical outputs, situations where you'll validate the output anyway.
Approach 2: API-level constrained generation
Both OpenAI and Anthropic expose mechanisms to constrain output format at the API level.
OpenAI's JSON mode (`response_format={"type": "json_object"}`) guarantees the output is valid JSON — but not that it matches your schema. The stronger option is the JSON Schema mode (Structured Outputs):
```python
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "classification_result",
            "strict": True,  # enforce exact schema conformance
            "schema": {
                "type": "object",
                "properties": {
                    "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
                    "confidence": {"type": "number"}
                },
                "required": ["sentiment", "confidence"],
                "additionalProperties": False  # required by strict mode
            }
        }
    },
    messages=[...]
)
```
Reliability: effectively 100% for valid JSON, with schema conformance depending on schema complexity. This is production-grade.
Approach 3: Tool use for JSON (most reliable with Claude)
Define a "fake tool" whose input parameters match your desired schema, then force the model to call it. Claude's tool use mechanism is highly reliable — the model is trained specifically to produce well-formed tool calls:
```python
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "record_classification",
    "description": "Records the classification result for a customer message",
    "input_schema": {
        "type": "object",
        "properties": {
            "sentiment": {
                "type": "string",
                "enum": ["positive", "negative", "neutral"],
                "description": "Overall sentiment of the customer message"
            },
            "intent": {
                "type": "string",
                "enum": ["question", "complaint", "compliment", "refund_request", "other"]
            },
            "confidence": {
                "type": "number",
                "minimum": 0,
                "maximum": 1
            }
        },
        "required": ["sentiment", "intent", "confidence"]
    }
}]

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "record_classification"},
    messages=[{"role": "user", "content": f"Classify this message: {message}"}]
)

# Extract the tool call input — this is your structured output
result = response.content[0].input
```
The tool_choice parameter forces Claude to call that specific tool. The result is always a valid object matching your schema. Use this approach for the most reliable structured output from Claude, especially with complex or nested schemas.
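Indexing `response.content[0]` assumes the tool call is the first content block, but models sometimes emit a text block first. A small helper that scans for the right block is safer. This is a sketch: `FakeBlock` below is a stand-in for the SDK's tool-use block type, used only so the example runs without an API call.

```python
from dataclasses import dataclass
from typing import Any

def extract_tool_input(content_blocks: list, tool_name: str) -> dict:
    """Return the input of the tool_use block for `tool_name`.

    Scans all blocks rather than assuming content[0], since the model
    may emit a text block before the tool call.
    """
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(f"No tool_use block for {tool_name!r} in response")

# Stand-in for the SDK's tool-use block, for illustration only
@dataclass
class FakeBlock:
    type: str
    name: str = ""
    input: Any = None

blocks = [
    FakeBlock(type="text"),
    FakeBlock(type="tool_use", name="record_classification",
              input={"sentiment": "negative", "intent": "complaint", "confidence": 0.9}),
]
result = extract_tool_input(blocks, "record_classification")
```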
JSON Schema as a communication contract
The schema you pass to the model serves as the contract for what you expect. Writing it well matters. Here is a complete example for a customer support classifier:
```json
{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"],
      "description": "Overall sentiment of the customer message"
    },
    "intent": {
      "type": "string",
      "enum": ["question", "complaint", "compliment", "refund_request", "other"],
      "description": "Primary intent of the customer message"
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1,
      "description": "Confidence in the classification, 0 to 1"
    },
    "key_entities": {
      "type": "array",
      "items": {"type": "string"},
      "description": "Product names, order IDs, or other key entities mentioned"
    }
  },
  "required": ["sentiment", "intent", "confidence"]
}
```
Key schema design rules:
- Use `enum` for categorical fields — never ask the model to "choose a category" in free text and then parse the answer. Enumerate your categories in the schema.
- Add a `description` on every property — the model uses these descriptions to understand what to put in each field. A field named `confidence` with no description might get a percentage like `87`; with `"description": "Confidence score from 0 to 1"` you get `0.87`.
- Mark required fields explicitly — optional fields may be omitted. Be deliberate about which fields the model must always fill.
- Use `minimum`/`maximum` for numeric ranges — prevents the model from outputting a confidence score of `95` when you expected `0.95`.
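To make these rules concrete, here is a minimal hand-rolled checker for the subset of JSON Schema used above (`required`, `enum`, `minimum`/`maximum`). In practice you would reach for a real validation library instead; this sketch only shows what each keyword catches:

```python
def check_against_schema(data: dict, schema: dict) -> list[str]:
    """Minimal validator for the schema subset used in this lesson:
    required, enum, and minimum/maximum on top-level properties."""
    errors = []
    for field in schema.get("required", []):
        if field not in data:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field not in data:
            continue
        value = data[field]
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: {value!r} not in {rules['enum']}")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{field}: {value} below minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{field}: {value} above maximum {rules['maximum']}")
    return errors

schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["sentiment", "confidence"],
}
# A confidence of 95 (a percentage instead of a 0-1 score) trips "maximum"
errors = check_against_schema({"sentiment": "positive", "confidence": 95}, schema)
```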
When to use structured vs free-text output
Use structured output when:
- The output will be parsed and processed by code
- You need consistent field names across requests
- You're building a pipeline where output feeds into another step
- You need to store the output in a database or query it later
- You're building an API where callers depend on a stable response shape
Use free-text output when:
- The output is for direct human consumption
- Creative flexibility is the point (writing, brainstorming, explanation)
- The structure of the output is genuinely variable and hard to schema-ize
- You're in early exploration and don't know the right schema yet
A common mistake: forcing structured output before you know what structure you actually need. Write the free-text version first, see what the model naturally produces, then define your schema around what's useful.
Error handling for structured outputs
Even with constrained generation, validation is essential. Never trust raw model output. Always validate before using.
With Python's Pydantic:
```python
import json
from typing import Literal

from pydantic import BaseModel, confloat

class ClassificationResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    intent: Literal["question", "complaint", "compliment", "refund_request", "other"]
    confidence: confloat(ge=0, le=1)

def parse_classification(raw_output: str) -> ClassificationResult:
    try:
        data = json.loads(raw_output)
        return ClassificationResult(**data)
    except Exception as e:
        raise ValueError(f"Invalid model output: {e}")
```
Retry pattern for failures:
If output is malformed, retry once with an explicit correction prompt:
```python
def get_structured_output(prompt: str, schema: dict, max_retries: int = 2) -> dict:
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries):
        response = call_model(messages)
        try:
            result = json.loads(response)
            validate_schema(result, schema)  # your validator here
            return result
        except Exception as e:
            if attempt < max_retries - 1:
                messages.append({"role": "assistant", "content": response})
                messages.append({
                    "role": "user",
                    "content": f"The previous output was invalid: {e}. Please output only valid JSON matching the schema, with no surrounding text."
                })
    raise RuntimeError("Failed to produce valid structured output after retries")
```
After two failed retries, log the failure and fall back gracefully. Don't crash your application — return a default value, skip the record, or queue it for manual review depending on your use case.
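The retry loop depends on your own model client and validator. Here is a self-contained variant with the model call injected as a parameter, plus a stub model that fails once before producing clean JSON, so the correction round-trip is visible:

```python
import json

def get_structured_output(call_model, prompt: str, required: list[str],
                          max_retries: int = 2) -> dict:
    """Retry loop: parse, check required fields, and feed the error back
    to the model on failure. `call_model` takes a message list and
    returns the raw response string."""
    messages = [{"role": "user", "content": prompt}]
    last_error = None
    for attempt in range(max_retries):
        response = call_model(messages)
        try:
            result = json.loads(response)
            missing = [f for f in required if f not in result]
            if missing:
                raise ValueError(f"missing fields: {missing}")
            return result
        except Exception as e:
            last_error = e
            # Show the model its own bad output plus the error
            messages.append({"role": "assistant", "content": response})
            messages.append({
                "role": "user",
                "content": f"The previous output was invalid: {e}. "
                           "Please output only valid JSON matching the schema.",
            })
    raise RuntimeError(f"No valid structured output after {max_retries} attempts: {last_error}")

# Stub model: emits prose-wrapped JSON first, clean JSON after the correction
responses = iter([
    'Sure! Here you go: {"sentiment": "neutral"}',
    '{"sentiment": "neutral"}',
])
result = get_structured_output(lambda messages: next(responses),
                               "Classify: ...", required=["sentiment"])
```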
Practical examples
Entity extraction:
```
Extract entities from the customer message below.
Return as JSON only — no surrounding text, no code fences.

Schema:
{
  "order_id": string or null,
  "product_name": string or null,
  "issue_type": string or null
}

If an entity is not present in the message, use null.

Customer message: [MESSAGE]
```
Content classification:
```
Classify the following support ticket.
Return only JSON matching this schema exactly:

{
  "category": "billing" | "technical" | "general" | "returns",
  "priority": "low" | "medium" | "high" | "critical",
  "summary": string (max 100 characters)
}

Ticket: [TICKET_TEXT]
```
Address normalization:
```
Convert the following unstructured address into a structured JSON object.
Return only the JSON, nothing else.

Schema:
{
  "street": string,
  "city": string,
  "state": string,
  "zip": string,
  "country": string
}

If a field cannot be determined from the input, use null.

Address: [ADDRESS_TEXT]
```
Each of these examples follows the same pattern: define the schema inline in the prompt, be explicit about what to do with missing data, and forbid any output outside the JSON.
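Because the three prompts share one shape, they can be generated from a single template. Everything here (`PROMPT_TEMPLATE`, `build_extraction_prompt`) is a hypothetical sketch, not a library API:

```python
import json

PROMPT_TEMPLATE = """{task}
Return only the JSON, nothing else.

Schema:
{schema}

If a field cannot be determined from the input, use null.

{input_label}: {input_text}"""

def build_extraction_prompt(task: str, schema: dict,
                            input_label: str, input_text: str) -> str:
    """Render the shared structured-output prompt pattern:
    task, inline schema, missing-data rule, then the input."""
    return PROMPT_TEMPLATE.format(
        task=task,
        schema=json.dumps(schema, indent=2),
        input_label=input_label,
        input_text=input_text,
    )

prompt = build_extraction_prompt(
    task="Extract entities from the customer message below.",
    schema={"order_id": "string or null", "product_name": "string or null"},
    input_label="Customer message",
    input_text="My order #A123 arrived damaged.",
)
```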
Putting it together
Structured outputs are only reliable when you're explicit about the schema and validate the output. The three-layer approach — prompt instructions, API-level constraints, and output validation — gives you defense in depth.
Choose your approach based on reliability requirements:
- Prototype or internal tool: prompt instructions + basic validation
- User-facing production: API-level constraints (JSON Schema mode or tool use) + Pydantic validation + retry logic
- Mission-critical pipeline: all of the above + monitoring on validation failure rate
A rising validation failure rate is a signal that either the model is drifting or your inputs are changing in ways your schema wasn't designed for. Track it.
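A minimal sketch of that tracking, assuming a rolling window and an alert threshold you tune for your pipeline:

```python
from collections import deque

class ValidationMonitor:
    """Tracks validation failure rate over the last `window` requests
    and flags when it crosses a threshold."""

    def __init__(self, window: int = 1000, alert_threshold: float = 0.05):
        self.results = deque(maxlen=window)  # True = passed validation
        self.alert_threshold = alert_threshold

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return 1 - sum(self.results) / len(self.results)

    @property
    def should_alert(self) -> bool:
        # Require a minimum sample before alerting to avoid noise
        return len(self.results) >= 100 and self.failure_rate > self.alert_threshold

monitor = ValidationMonitor()
for _ in range(90):
    monitor.record(True)
for _ in range(10):
    monitor.record(False)
```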
For more on controlling model output beyond JSON, see the Constrained generation lesson. For a deeper look at JSON mode and structured outputs across different APIs, see the Structured outputs blog post.