GPT-4o is OpenAI's flagship model as of early 2026 — fast, multimodal, and particularly strong at structured output generation and function calling. Understanding its strengths helps you write prompts that leverage what it actually does well.
What GPT-4o Does Best
Structured outputs with schema enforcement. The structured outputs API feature is a genuine capability advantage — it constrains GPT-4o to produce output that exactly matches a JSON schema. No more parsing failures, no more "it was almost right" JSON.
Function calling and tool use. GPT-4o has strong native support for function calling — the mechanism by which AI agents connect to external tools. It reliably produces well-formed function call arguments from natural language descriptions.
Multimodal reasoning. GPT-4o handles images natively (not as a bolt-on feature). It can reason about charts, screenshots, diagrams, and photos with better context integration than models that treat images as separate inputs.
Speed and throughput. GPT-4o is significantly faster than earlier GPT-4 variants, making it practical for real-time applications and high-volume use cases.
Structured Outputs
This is one of GPT-4o's most useful production features. Instead of asking for JSON in your prompt (and hoping it's valid), you provide a schema:
```python
from openai import OpenAI
import json

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract product information from the text."},
        {"role": "user", "content": "The Sony WH-1000XM5 headphones retail for $349 and feature 30-hour battery life and class-leading noise cancellation."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "product_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price_usd": {"type": "number"},
                    "key_features": {
                        "type": "array",
                        "items": {"type": "string"},
                    },
                },
                "required": ["name", "price_usd", "key_features"],
                "additionalProperties": False,
            },
        },
    },
)

product = json.loads(response.choices[0].message.content)
# Guaranteed to match the schema — no validation needed
```
With "strict": True, GPT-4o will not produce output that violates the schema. This eliminates an entire class of parsing and validation bugs in production pipelines.
When to use structured outputs:
- Extracting structured data from unstructured text
- Building pipelines where downstream code depends on specific fields
- Any task where you currently validate/retry because of malformed JSON
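If you run many extraction tasks, the `response_format` boilerplate repeats. A small sketch of a helper that builds the payload from a field mapping — `strict_response_format` is a hypothetical function, not part of the OpenAI SDK, and it encodes strict mode's documented requirements (every property listed in `required`, `additionalProperties` set to false):

```python
# Hypothetical helper: build a strict structured-outputs response_format
# payload from a simple {field_name: json_schema_fragment} mapping.
def strict_response_format(name: str, fields: dict) -> dict:
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": fields,
                # Strict mode requires every property to appear in "required"
                # and additionalProperties to be False.
                "required": list(fields),
                "additionalProperties": False,
            },
        },
    }

fmt = strict_response_format(
    "product_info",
    {
        "name": {"type": "string"},
        "price_usd": {"type": "number"},
        "key_features": {"type": "array", "items": {"type": "string"}},
    },
)
```

The result can be passed directly as the `response_format` argument in the call shown above.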
Function Calling
Function calling is how you connect GPT-4o to external tools in an agentic workflow:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location. Call this whenever the user asks about weather.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g. 'Paris, France'",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",  # Let the model decide when to call tools
)
```
Writing good function descriptions:
- Be specific about when to call the function: "Call this when the user asks about X" not just "Gets X"
- Describe what the function returns, not just what it takes as input
- Include edge cases: "Do not call this if the location is ambiguous"
- Use concrete parameter descriptions: "City and country, e.g. 'Paris, France'" not just "The location"
Vision Prompting
GPT-4o handles images natively. The key is providing both the image and specific instructions about what to extract or analyze:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "This is a screenshot of our analytics dashboard. List all metrics shown and their current values. Flag any that appear to be abnormal based on the data shown.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,..."},
                },
            ],
        }
    ],
)
```
Vision prompting best practices:
- Be specific about what information to extract — don't just say "describe this image"
- Give domain context: "This is a medical scan" or "This is a Python traceback screenshot"
- Ask for structured output if you're going to parse the results
- For charts/graphs: ask for the specific values, not just a description of trends
System Prompts for GPT-4o
GPT-4o follows system prompts reliably, but a few patterns work especially well:
The role + behavior + format pattern:
```
You are a [specific role with expertise].

When [trigger condition]:
- [specific behavior 1]
- [specific behavior 2]

Format: [explicit output format]
```
For example:
```
You are a product manager reviewing user feedback for a B2B SaaS tool.

When given customer feedback:
- Identify the underlying need, not just the stated request
- Note whether this is a bug, UX issue, or feature request
- Assess priority: high/medium/low based on implied frequency and impact

Format: Return as JSON with keys: category, underlying_need, priority, one_line_summary
```
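If you generate many system prompts in this pattern, assembling them programmatically keeps the structure consistent. A sketch under the assumption that the role + behavior + format template above is the target; `build_system_prompt` is a hypothetical helper:

```python
# Hypothetical helper: assemble the role + behavior + format pattern
# into a system prompt string.
def build_system_prompt(role: str, trigger: str, behaviors: list, fmt: str) -> str:
    lines = [f"You are {role}.", "", f"When {trigger}:"]
    lines += [f"- {b}" for b in behaviors]
    lines += ["", f"Format: {fmt}"]
    return "\n".join(lines)

prompt = build_system_prompt(
    "a product manager reviewing user feedback for a B2B SaaS tool",
    "given customer feedback",
    ["Identify the underlying need, not just the stated request",
     "Note whether this is a bug, UX issue, or feature request"],
    "Return as JSON with keys: category, underlying_need, priority, one_line_summary",
)
```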
Practical Temperature Settings
| Task | Temperature | Notes |
|---|---|---|
| Structured data extraction | 0.0 | Maximum consistency |
| Code generation | 0.1–0.2 | Slight variation for alternatives |
| Analysis and summarization | 0.3–0.5 | Some flexibility |
| Writing assistance | 0.6–0.8 | More natural variation |
| Brainstorming | 0.8–1.0 | Maximize diversity of ideas |
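In request code, a named lookup reads better than a magic number. A sketch mirroring the table above (task keys and the choice of the low end of each range are my own conventions, not an API feature):

```python
# Temperature presets mirroring the table; low end of each range chosen
# for reproducibility.
TASK_TEMPERATURE = {
    "extraction": 0.0,
    "code": 0.1,
    "analysis": 0.3,
    "writing": 0.6,
    "brainstorming": 0.8,
}

def temperature_for(task: str) -> float:
    # Default to the analysis setting for unrecognized task types.
    return TASK_TEMPERATURE.get(task, 0.3)
```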
Common Mistakes With GPT-4o
Asking for JSON without using structured outputs. Requesting "a JSON response with these fields" in the prompt alone still occasionally produces malformed output. Use response_format with a schema for anything that feeds into code.
Vague function descriptions. "Gets data" is not a useful function description. Write descriptions as if you're explaining to a human assistant when and how to use a tool.
Mixing concerns in the system prompt. Separate persona, behavior, and format instructions. One big paragraph of rules is harder to follow than clearly structured sections.
Not using conversation history for iterative tasks. GPT-4o maintains context well. For multi-step tasks, let it build on previous turns rather than restating everything in each message.
Sending images without a specific extraction goal. "What do you see?" gets you a general description. "List all error messages visible in this screenshot" gets you what you actually need.
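The conversation-history point above amounts to appending each turn to a shared messages list rather than rebuilding the prompt from scratch. A minimal sketch; the helper names are mine, and in real use the assistant text would come from the API response:

```python
# Minimal conversation-history management for multi-step tasks.
def make_conversation(system_prompt: str) -> list:
    return [{"role": "system", "content": system_prompt}]

def add_turn(history: list, user_text: str, assistant_text: str) -> list:
    """Record one exchange so the next request carries full context."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history
```

Each new request then passes the whole `history` as `messages`, so GPT-4o builds on earlier turns instead of needing everything restated.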