If you're using AI in an application, script, or workflow that processes the output programmatically, you need structured, predictable output — not free-form prose. This guide covers how to make models reliably produce JSON, XML, CSV, and other formats.
## The Problem: LLMs Default to Prose
By default, LLMs generate natural language. Ask "extract the name and email from this message" and you might get:
"The name mentioned in the message is John Smith and his email address is john@example.com."
Technically correct. Programmatically useless — you'd need to parse that sentence, which is fragile.
What you want:
{
  "name": "John Smith",
  "email": "john@example.com"
}
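To make the difference concrete, here is a minimal sketch that consumes both forms. The regular expression and variable names are illustrative assumptions, not taken from any library:

```python
import json
import re

prose = ("The name mentioned in the message is John Smith "
         "and his email address is john@example.com.")
structured = '{"name": "John Smith", "email": "john@example.com"}'

# Fragile: this pattern is tied to one exact sentence shape.
PATTERN = r"is (?P<name>[A-Z]\w+ [A-Z]\w+) and .* is (?P<email>\S+@\S+)\."
match = re.search(PATTERN, prose)
from_prose = match.groupdict() if match else None

# A harmless rewording of the same facts no longer matches.
reworded = "John Smith (john@example.com) sent this message."
assert re.search(PATTERN, reworded) is None

# Robust: any valid JSON object parses the same way.
from_json = json.loads(structured)
assert from_json == from_prose
```

The regex has to anticipate every phrasing the model might choose, while `json.loads` only needs the output to be valid JSON.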
## Technique 1: State the Format Explicitly + Show the Schema
The most reliable method: declare the output format and show exactly what it should look like.
Extract the following fields from the message below and return as JSON.
Do not include any explanation or text outside the JSON object.
Schema:
{
  "name": string,
  "email": string,
  "company": string | null,
  "request_type": "sales" | "support" | "billing" | "other"
}
Message:
Hi, I'm Sarah from Acme Corp (sarah@acme.com). I've been having trouble
with my invoice from last month. Can you help?
Output:
{
  "name": "Sarah",
  "email": "sarah@acme.com",
  "company": "Acme Corp",
  "request_type": "billing"
}
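In application code, the prompt above is usually assembled from a template, and the reply is checked against the same schema. The sketch below shows one way to do that with only the standard library. The function names `build_extraction_prompt` and `check_extraction` are hypothetical, and the checks are a hand-written approximation of the schema rather than a full validator:

```python
import json

ALLOWED_REQUEST_TYPES = {"sales", "support", "billing", "other"}

SCHEMA_TEXT = """{
  "name": string,
  "email": string,
  "company": string | null,
  "request_type": "sales" | "support" | "billing" | "other"
}"""

def build_extraction_prompt(message: str) -> str:
    """Assemble the prompt: instruction, schema, then the message."""
    return (
        "Extract the following fields from the message below and return as JSON.\n"
        "Do not include any explanation or text outside the JSON object.\n\n"
        f"Schema:\n{SCHEMA_TEXT}\n\n"
        f"Message:\n{message}"
    )

def check_extraction(raw: str) -> dict:
    """Parse the model's reply and check it against the schema above."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field in ("name", "email"):
        if not isinstance(data.get(field), str):
            raise ValueError(f"{field} must be a string")
    company = data.get("company", "")  # "" stands in for a missing key
    if "company" not in data or not (company is None or isinstance(company, str)):
        raise ValueError("company must be a string or null")
    if data.get("request_type") not in ALLOWED_REQUEST_TYPES:
        raise ValueError("request_type is not one of the allowed values")
    return data
```

Keeping the schema text and the allowed values next to each other makes it harder for the prompt and the validation code to drift apart.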
## Technique 2: Provide a Concrete Example
Show the model exactly what a correct output looks like:
Analyze the customer review below and return your analysis as JSON.
Example output format:
{
  "sentiment": "positive" | "negative" | "mixed" | "neutral",
  "score": 1-10,
  "key_topics": ["topic1", "topic2"],
  "summary": "one sentence"
}
Review:
"The laptop is incredibly fast and the build quality is premium,
but the battery only lasts 4 hours which is really disappointing."
Return only the JSON object.
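If the example in the prompt is generated from a real object, it is guaranteed to be valid JSON, and the same object can double as a shape check for the reply. This is a rough sketch under that assumption. `EXAMPLE_OUTPUT` is a made-up example for a different review (so the answer is not leaked), and `build_review_prompt` and `same_shape` are hypothetical helper names:

```python
import json

# Hypothetical example values for some other review, not the one being analyzed.
EXAMPLE_OUTPUT = {
    "sentiment": "positive",
    "score": 8,
    "key_topics": ["screen", "price"],
    "summary": "The reviewer likes the screen and finds the price fair.",
}

def build_review_prompt(review: str) -> str:
    """Embed a serialized, known-valid example in the prompt."""
    example = json.dumps(EXAMPLE_OUTPUT, indent=2)
    return (
        "Analyze the customer review below and return your analysis as JSON.\n\n"
        f"Example output format:\n{example}\n\n"
        f'Review:\n"{review}"\n\n'
        "Return only the JSON object."
    )

def same_shape(reply: dict) -> bool:
    """True if the reply has the example's keys and top-level value types."""
    return (
        reply.keys() == EXAMPLE_OUTPUT.keys()
        and all(type(reply[k]) is type(EXAMPLE_OUTPUT[k]) for k in EXAMPLE_OUTPUT)
    )
```

`same_shape` only compares keys and top-level types. Range checks such as a score between 1 and 10 would still need their own validation.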
## Technique 3: "Return only the JSON. No other text."
This instruction is essential — without it, many models wrap the JSON in explanation:
Great! Here's the JSON you asked for:
```json
{ ... }
```
Let me know if you need any changes!
Add this to every structured output prompt:
Return only the JSON object. Do not include any explanation, markdown code blocks, or surrounding text.
Or:
Output: Raw JSON only. No preamble, no explanation, no markdown.
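Even with that instruction in place, it is worth parsing defensively in case a reply still arrives wrapped in chatter or a code fence. The helper below is a minimal sketch of one approach: scan for each `{` and let the standard library's decoder find where the object ends. The name `extract_json_object` is hypothetical:

```python
import json

def extract_json_object(reply: str) -> dict:
    """Return the first JSON object found in a model reply.

    Tolerates preamble, trailing chatter, and markdown code fences by
    trying to decode at every "{" until one attempt succeeds.
    """
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")
```

This is a fallback, not a replacement for the instruction. A reply that needs rescuing is also a signal that the prompt could be tightened.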
---
## Technique 4: Use the API with JSON Mode
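If you're using the API (not the chat interface), most providers offer a native JSON mode. It constrains the model to emit syntactically valid JSON, though it does not by itself guarantee that the JSON matches your schema: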
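**OpenAI:**
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[...],  # the messages must mention "JSON" somewhere for this mode
)
```

**Anthropic (Claude):** Use structured outputs where available, or specify the schema in the prompt and set a low temperature.

**Google Gemini:**
```python
# Assumes `model` is a google.generativeai GenerativeModel and `prompt` is a string.
response = model.generate_content(
    prompt,
    generation_config={"response_mime_type": "application/json"},
)
```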
JSON mode at the API level is more reliable than prompt-only enforcement, so use it when available.
## Handling Arrays and Nested Structures
For complex structures, define them clearly in the schema:
Extract all line items from the invoice below.
Return as JSON with this structure:
{
  "invoice_number": string,
  "date": "YYYY-MM-DD",
  "vendor": string,
  "total": number,
  "line_items": [
    {
      "description": string,
      "quantity": number,
      "unit_price": number,
      "subtotal": number
    }
  ]
}
Invoice: [paste invoice text]
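Nested output is easier to trust when it is loaded into typed records and its arithmetic is cross-checked. The sketch below assumes the reply follows the structure above and, as a simplifying assumption, that `total` is the plain sum of the line item subtotals (no tax or discounts). `LineItem` and `parse_invoice` are hypothetical names:

```python
import json
from dataclasses import dataclass

@dataclass
class LineItem:
    description: str
    quantity: float
    unit_price: float
    subtotal: float

def parse_invoice(raw: str, tolerance: float = 0.01) -> list[LineItem]:
    """Parse the reply and cross-check the numbers the model reported."""
    data = json.loads(raw)
    # LineItem(**item) raises TypeError if a key is missing or unexpected.
    items = [LineItem(**item) for item in data["line_items"]]
    for item in items:
        if abs(item.quantity * item.unit_price - item.subtotal) > tolerance:
            raise ValueError(f"subtotal mismatch for {item.description!r}")
    # Assumption: total is the sum of subtotals (no tax or discounts).
    if abs(sum(i.subtotal for i in items) - data["total"]) > tolerance:
        raise ValueError("line items do not add up to total")
    return items
```

Checks like these catch the case where the JSON is well-formed but a number was misread or invented.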
## Other Structured Formats
### CSV
Extract all contacts from the text below as CSV.
Header row: name,email,phone,company
No explanation, just the CSV data.
Text: [paste text]
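On the consuming side, a CSV reply is best read with a real CSV parser rather than `split(",")`, since values such as company names can contain quoted commas. A minimal sketch using the standard library, with `parse_contacts_csv` as a hypothetical helper name:

```python
import csv
import io

EXPECTED_HEADER = ["name", "email", "phone", "company"]

def parse_contacts_csv(reply: str) -> list[dict]:
    """Parse the reply as CSV and verify the header row and column counts."""
    reader = csv.DictReader(io.StringIO(reply.strip()))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        # DictReader stores extra cells under the key None and
        # fills missing cells with the value None.
        if None in row or None in row.values():
            raise ValueError(f"wrong number of columns on line {reader.line_num}")
        rows.append(row)
    return rows
```

An empty field (for example, a contact with no phone number) comes back as an empty string, which is distinct from a row that is missing a column.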
### Markdown Table
Compare the three options below.
Format as a markdown table with columns: Feature | Option A | Option B | Option C
Options: [paste options]
### XML
Convert the following data to XML format.
Root element: <contacts>
Each contact: <contact> with children <name>, <email>, <role>
Data: [paste data]
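The XML reply can be checked in the same spirit. The sketch below uses the standard library's `xml.etree.ElementTree` and assumes the element names given in the prompt above. `parse_contacts_xml` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def parse_contacts_xml(reply: str) -> list[dict]:
    """Parse the reply and verify the expected element structure."""
    # ET.fromstring raises ET.ParseError if the reply is not well-formed XML.
    root = ET.fromstring(reply.strip())
    if root.tag != "contacts":
        raise ValueError(f"unexpected root element: {root.tag}")
    contacts = []
    for contact in root.findall("contact"):
        record = {}
        for field in ("name", "email", "role"):
            child = contact.find(field)
            if child is None:
                raise ValueError(f"<contact> is missing <{field}>")
            record[field] = (child.text or "").strip()
        contacts.append(record)
    return contacts
```

Any preamble before the root element makes the reply fail to parse, so the "no explanation" instruction matters for XML just as it does for JSON.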
## Validation and Error Handling
Even with perfect prompts, models occasionally deviate from the format. Build robustness into your application:
- Validate the output before using it (check required fields, types)
- Retry on failure — call the API again with the same prompt
- Include a repair prompt — "The previous output was invalid JSON. Please output only valid JSON matching this schema: [schema]"
- Use low temperature (0.0–0.2) for structured output — reduces creative deviation
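The points above can be combined into a small validate-and-retry loop. This is a sketch, not a definitive implementation. `call_model` is a hypothetical function you supply that sends a prompt to your provider (ideally with a low temperature) and returns the reply text, and `validate` is your own check that raises `ValueError` on bad data:

```python
import json
from typing import Callable

def get_structured(
    prompt: str,
    call_model: Callable[[str], str],   # hypothetical: prompt in, reply text out
    validate: Callable[[dict], None],   # raises ValueError if the data is wrong
    schema_text: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model, validate the reply, and retry with a repair prompt."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
            validate(data)
            return data
        except ValueError as exc:
            last_error = exc
            # Repair prompt: restate the task and point out the failure.
            current_prompt = (
                f"{prompt}\n\n"
                "The previous output was invalid JSON. Please output only valid "
                f"JSON matching this schema: {schema_text}"
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

Capping the number of attempts keeps a persistently failing prompt from looping forever, and surfacing the last error makes the failure easier to debug.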
## Key Takeaway
For structured output: define the schema explicitly, show an example, add "output only the JSON" (or relevant format), and use the API's native JSON mode when possible. Validate programmatically and retry on failure. With these techniques, AI output becomes reliably machine-readable.
This completes the Intermediate Track. You've mastered few-shot prompting, XML structure, chain of thought, hallucination prevention, and constrained generation. Ready for the Advanced Track?