
How to Prompt GPT-4o: Patterns That Actually Work

GPT-4o excels at function calling, structured outputs, and multimodal tasks. Here's what distinguishes it from other models and how to get the best results.


GPT-4o is OpenAI's flagship model as of early 2026 — fast, multimodal, and particularly strong at structured output generation and function calling. Understanding its strengths helps you write prompts that leverage what it actually does well.


What GPT-4o Does Best

Structured outputs with schema enforcement. The structured outputs API feature is a genuine capability advantage — it constrains GPT-4o to produce output that exactly matches a JSON schema. No more parsing failures, no more "it was almost right" JSON.

Function calling and tool use. GPT-4o has strong native support for function calling — the mechanism by which AI agents connect to external tools. It reliably produces well-formed function call arguments from natural language descriptions.

Multimodal reasoning. GPT-4o handles images natively (not as a bolt-on feature). It can reason about charts, screenshots, diagrams, and photos with better context integration than models that treat images as separate inputs.

Speed and throughput. GPT-4o is significantly faster than earlier GPT-4 variants, making it practical for real-time applications and high-volume use cases.


Structured Outputs

This is one of GPT-4o's most useful production features. Instead of asking for JSON in your prompt (and hoping it's valid), you provide a schema:

from openai import OpenAI
import json

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract product information from the text."},
        {"role": "user", "content": "The Sony WH-1000XM5 headphones retail for $349 and feature 30-hour battery life and class-leading noise cancellation."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "product_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price_usd": {"type": "number"},
                    "key_features": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["name", "price_usd", "key_features"],
                "additionalProperties": False
            }
        }
    }
)

product = json.loads(response.choices[0].message.content)
# Guaranteed to match the schema — no validation needed

With "strict": True, GPT-4o will not produce output that violates the schema. This eliminates an entire class of parsing and validation bugs in production pipelines.
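Because the schema is enforced, downstream code can map the parsed JSON straight into a typed object with no defensive checks. A minimal sketch using a standard-library dataclass — the `ProductInfo` class is illustrative, and the hard-coded string stands in for `response.choices[0].message.content`:

```python
import json
from dataclasses import dataclass

@dataclass
class ProductInfo:
    name: str
    price_usd: float
    key_features: list

# Stand-in for response.choices[0].message.content; with strict mode,
# the real content is guaranteed to match this shape.
raw = (
    '{"name": "Sony WH-1000XM5", "price_usd": 349, '
    '"key_features": ["30-hour battery life", "noise cancellation"]}'
)
product = ProductInfo(**json.loads(raw))
print(product.name)  # Sony WH-1000XM5
```

Since every field in the schema is required and `additionalProperties` is false, the `**` unpacking cannot fail on missing or unexpected keys.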

When to use structured outputs:

  • Extracting structured data from unstructured text
  • Building pipelines where downstream code depends on specific fields
  • Any task where you currently validate/retry because of malformed JSON

Function Calling

Function calling is how you connect GPT-4o to external tools in an agentic workflow:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location. Call this whenever the user asks about weather.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g. 'Paris, France'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"  # Let the model decide when to call tools
)
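When the model decides to call a tool, the arguments arrive as a JSON string inside `tool_calls`, which your code parses and dispatches to the real function. A minimal dispatch sketch — the dict below is a hard-coded stand-in for `response.choices[0].message.tool_calls[0].function`, and the `get_weather` stub is hypothetical:

```python
import json

def get_weather(location, unit="celsius"):
    # Hypothetical stub; a real implementation would call a weather API.
    return {"location": location, "temp": 21, "unit": unit}

# Stand-in for response.choices[0].message.tool_calls[0].function
tool_call = {"name": "get_weather", "arguments": '{"location": "Tokyo, Japan"}'}

# Map tool names to local callables, then parse and dispatch.
dispatch = {"get_weather": get_weather}
args = json.loads(tool_call["arguments"])
result = dispatch[tool_call["name"]](**args)
```

In a full loop you would append the result back to `messages` as a `"tool"` role message and call the API again so the model can compose its final answer.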

Writing good function descriptions:

  • Be specific about when to call the function: "Call this when the user asks about X" not just "Gets X"
  • Describe what the function returns, not just what it takes as input
  • Include edge cases: "Do not call this if the location is ambiguous"
  • Use concrete parameter descriptions: "City and country, e.g. 'Paris, France'" not just "The location"

Vision Prompting

GPT-4o handles images natively. The key is providing both the image and specific instructions about what to extract or analyze:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "This is a screenshot of our analytics dashboard. List all metrics shown and their current values. Flag any that appear to be abnormal based on the data shown."
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,..."}
                }
            ]
        }
    ]
)
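The truncated base64 placeholder above is something you produce yourself by encoding the image bytes. A small helper — the `to_data_url` name and its parameters are illustrative:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    # Wrap raw image bytes as a base64 data URL for the image_url field.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# The first three bytes of any JPEG file, as a stand-in for real image data.
url = to_data_url(b"\xff\xd8\xff")
```

For local files, read the bytes with `open(path, "rb").read()` and pass them in; remotely hosted images can instead be referenced by plain HTTPS URL in the same `image_url` field.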

Vision prompting best practices:

  • Be specific about what information to extract — don't just say "describe this image"
  • Give domain context: "This is a medical scan" or "This is a Python traceback screenshot"
  • Ask for structured output if you're going to parse the results
  • For charts/graphs: ask for the specific values, not just a description of trends

System Prompts for GPT-4o

GPT-4o follows system prompts reliably, but a few patterns work especially well:

The role + behavior + format pattern:

You are a [specific role with expertise].

When [trigger condition]:
- [specific behavior 1]
- [specific behavior 2]

Format: [explicit output format]

For example:

You are a product manager reviewing user feedback for a B2B SaaS tool.

When given customer feedback:
- Identify the underlying need, not just the stated request
- Note whether this is a bug, UX issue, or feature request
- Assess priority: high/medium/low based on implied frequency and impact

Format: Return as JSON with keys: category, underlying_need, priority, one_line_summary
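If you assemble system prompts in code, keeping role, behavior, and format as separate inputs makes each part easy to vary independently. A minimal template sketch following the pattern above (the function name and parameters are illustrative):

```python
def build_system_prompt(role: str, trigger: str, behaviors: list, fmt: str) -> str:
    # Compose the role + behavior + format pattern into one prompt string.
    lines = [f"You are {role}.", "", f"When {trigger}:"]
    lines += [f"- {b}" for b in behaviors]
    lines += ["", f"Format: {fmt}"]
    return "\n".join(lines)

prompt = build_system_prompt(
    "a product manager reviewing user feedback for a B2B SaaS tool",
    "given customer feedback",
    ["Identify the underlying need, not just the stated request"],
    "Return as JSON with keys: category, underlying_need, priority",
)
```

The resulting string is passed as the `"system"` role message in the `messages` list.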

Practical Temperature Settings

Task                        | Temperature | Notes
Structured data extraction  | 0.0         | Maximum consistency
Code generation             | 0.1–0.2     | Slight variation for alternatives
Analysis and summarization  | 0.3–0.5     | Some flexibility
Writing assistance          | 0.6–0.8     | More natural variation
Brainstorming               | 0.8–1.0     | Maximize diversity of ideas
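These values are passed through the standard `temperature` parameter on `client.chat.completions.create`. One way to keep them consistent across a codebase is a simple lookup table (the task names here are illustrative):

```python
# Illustrative mapping from task type to a starting temperature,
# following the table above.
TEMPERATURE_BY_TASK = {
    "extraction": 0.0,
    "code": 0.2,
    "analysis": 0.4,
    "writing": 0.7,
    "brainstorming": 0.9,
}

# Usage: client.chat.completions.create(..., temperature=TEMPERATURE_BY_TASK["extraction"])
```

Treat these as starting points and adjust based on observed output quality rather than as fixed rules.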

Common Mistakes With GPT-4o

Asking for JSON without using structured outputs. Asking "give me a JSON response with these fields" still occasionally produces malformed JSON. Use response_format with a schema for anything that feeds into code.

Vague function descriptions. "Gets data" is not a useful function description. Write descriptions as if you're explaining to a human assistant when and how to use a tool.

Mixing concerns in the system prompt. Separate persona, behavior, and format instructions. One big paragraph of rules is harder to follow than clearly structured sections.

Not using conversation history for iterative tasks. GPT-4o maintains context well. For multi-step tasks, let it build on previous turns rather than restating everything in each message.

Sending images without a specific extraction goal. "What do you see?" gets you a general description. "List all error messages visible in this screenshot" gets you what you actually need.
