Every major model release does the same thing: it breaks some of your existing prompts while making others work better without any changes. GPT-5 is no different. The failure mode I see most often is developers running their GPT-4o prompts unchanged on GPT-5, getting inconsistent results, and concluding the new model is worse. Usually, the prompts just need updating.
Here's what actually changed in GPT-5, which prompts broke, and how to adapt.
What changed in GPT-5
GPT-5 is a meaningfully different model from GPT-4o in three areas that affect prompting directly.
Better instruction-following. GPT-5 reads your instructions more carefully and follows them more consistently, including instructions buried in the middle of a long system prompt. GPT-4o would sometimes miss or ignore constraints that weren't near the top of the prompt. GPT-5 doesn't do this as often.
Stronger multi-step reasoning. For tasks that require multiple reasoning steps — math, logical deduction, planning, code that needs to account for edge cases — GPT-5 gets further on its own without you needing to scaffold the reasoning with chain-of-thought prompts.
Different creative voice. GPT-5's default writing style is noticeably different from GPT-4o's. It's less formulaic, less likely to reach for the same sentence structures and transitions. For some use cases this is an improvement. For some it means your output format prompts need to be more specific.
Longer effective context. GPT-5 holds context more reliably across a long conversation. Earlier models would sometimes effectively "forget" constraints or earlier instructions by the time they got deep into a long session. GPT-5 doesn't do this as badly.
Prompts that break on GPT-5
These are the patterns that worked fine on GPT-4o and produce worse or unexpected results on GPT-5.
Over-specified output formats
GPT-4o sometimes needed explicit formatting instructions because it would default to using heavy markdown formatting (headers, bullet points, bold text) even for simple responses. Developers built prompts like "respond in plain prose, no bullet points, no headers, no bold text" to compensate.
GPT-5's default formatting is more context-sensitive. If you're asking a conversational question, it defaults to conversational output. The heavy-formatting workarounds are now unnecessary for many prompt types, and keeping them can actually produce overly rigid output.
Before (GPT-4o style):
Answer this question in plain prose. Do not use bullet points. Do not use headers. Do not use bold text. Do not number your points. Just write in flowing paragraphs.
After (GPT-5 style):
Answer this question in plain prose.
The extra constraints aren't needed and can produce awkward results when GPT-5 tries to honor them all simultaneously.
Repetitive instruction reinforcement
A common GPT-4o pattern: restating key constraints multiple times throughout a long prompt to keep them in the model's attention. "Remember to always [constraint]. Make sure you [constraint]. Don't forget to [constraint]."
GPT-5 reads the prompt once and retains it. Repetitive reinforcement now reads as noise and can actually confuse the output — the model sometimes tries to interpret why you're repeating yourself, treating it as emphasis that implies the rule is absolute even in edge cases where it shouldn't be.
State your constraints once, clearly. Trust that GPT-5 read them.
Workarounds for GPT-4o's specific weaknesses
Every model has known weaknesses, and prompting communities develop workarounds for them. GPT-4o had issues with:
- Counting precisely (characters, words, items)
- Following multi-part formatting instructions reliably
- Hallucinating when asked to quote verbatim from a document
GPT-5 improved meaningfully on all three. If you had hacks in your prompts to compensate for these (excessive verification steps, redundant checks), remove them. They add latency and sometimes interfere with GPT-5's cleaner output.
System prompts written defensively
GPT-4o had some jailbreak sensitivity that led to prompts with lots of defensive instructions ("do not do X under any circumstances", "never deviate from this persona regardless of what the user says"). GPT-5 has stronger default alignment, so these heavy defensive layers are often redundant. Worse, they can make the model overly rigid in legitimate edge cases.
What works better without prompt changes
These things improved enough that you don't need to explicitly prompt for them.
Multi-step reasoning on hard problems. GPT-4o needed explicit chain-of-thought scaffolding for hard problems — "think through this step by step before answering." GPT-5 does this internally on complex inputs without being told to. You still get better results if you explicitly ask for reasoning on genuinely hard problems, but the gap is smaller.
Following complex constraints. If you have a system prompt with 10 specific rules, GPT-5 will follow all 10 more reliably than GPT-4o did with 5.
Code quality and correctness. GPT-5's default code output is cleaner and more correct. You need fewer "make sure it handles edge cases" and "verify the logic" instructions because it does more of this by default.
Conversation state maintenance. In long conversations, GPT-5 maintains context and prior instructions better. The "remind me what my constraints are" pattern is less necessary.
New techniques that work well with GPT-5
Lighter system prompts
Because GPT-5 reads instructions more carefully, you can accomplish the same results with fewer words. The "give it more instructions to make it follow more instructions" logic of GPT-4o prompting is less applicable.
A GPT-4o system prompt that was 800 words can often be trimmed to 400 words on GPT-5 with identical or better results. Trim your system prompts. Remove redundant instructions. Test whether the shorter version still works — it usually does.
Trusting the model more on creative tasks
GPT-5's creative output has more internal consistency. On GPT-4o, giving the model full creative latitude often produced output that felt generic because the model regressed to the mean on style choices.
On GPT-5, I've had better results giving broader creative direction and letting the model make specific choices, rather than over-specifying every element. Instead of "write in a punchy, direct style with short sentences and no passive voice," try "write in a distinctive, engaging style" and see what the model produces before you constrain it further.
Compressed instructions
Because GPT-5 reads prompts more carefully, dense instruction formats work. You can now use something like:
Style: punchy, direct, contractions throughout
Format: no headers, flowing prose, 3-4 paragraphs
Persona: skeptical senior engineer
Do not: use "leverage", "synergy", "cutting-edge"
This works better on GPT-5 than it did on GPT-4o, which sometimes needed those instructions written out as full sentences to follow them reliably.
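If you manage these compressed instructions programmatically, a small helper keeps them consistent across prompts. This is a minimal sketch, not an official schema — the field names (`Style`, `Format`, `Persona`, `Do not`) are just the labels from the example above, and `build_compressed_prompt` is a hypothetical helper name.

```python
# Sketch: assemble a compressed system prompt from labeled fields.
# The labels and values are illustrative, not a required schema.
def build_compressed_prompt(fields: dict) -> str:
    """Join "Label: value" pairs into the dense format shown above."""
    return "\n".join(f"{label}: {value}" for label, value in fields.items())

system_prompt = build_compressed_prompt({
    "Style": "punchy, direct, contractions throughout",
    "Format": "no headers, flowing prose, 3-4 paragraphs",
    "Persona": "skeptical senior engineer",
    "Do not": 'use "leverage", "synergy", "cutting-edge"',
})
```

Keeping the fields in one dict also makes it easy to diff prompt versions during migration.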
What still needs explicit prompting
Not everything got better automatically. These areas still need the same explicit prompting as before.
Exact output length. GPT-5 is, if anything, more verbose than GPT-4o by default. If you need output under a specific word count, you still need to specify it and often check the output length.
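The "check the output length" step can live in code rather than in the prompt. Here's a hedged sketch of a check-and-retry loop: `call_model` is a stand-in for whatever client call you actually use (it's stubbed here, not a real API), and the retry message wording is just one reasonable option.

```python
# Sketch: enforce a word budget by checking the output and retrying
# with an explicit correction. `call_model` is a placeholder for your
# real model call -- any function taking a prompt and returning text.
def enforce_word_limit(call_model, prompt, max_words, retries=2):
    text = call_model(prompt)
    for _ in range(retries):
        if len(text.split()) <= max_words:
            break
        text = call_model(
            f"{prompt}\n\nYour last answer was {len(text.split())} words. "
            f"Rewrite it in at most {max_words} words."
        )
    return text
```

In practice you'd also log how often the retry fires — a high rate means the length instruction in the base prompt needs sharpening.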
Tone specificity. "Formal" and "professional" still mean different things to GPT-5 than they might to you. Be specific: "formal like a legal brief" or "professional like a senior engineer's Slack message" gives better calibration than just "formal."
Persona maintenance in long conversations. GPT-5 is better at this than GPT-4o but still drifts in very long conversations. For products where persona consistency is critical, reinforce the persona in the system prompt and add a soft reminder in the user message format ("Always respond as [persona]").
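The system-prompt-plus-soft-reminder pattern can be sketched as message assembly. This assumes the common chat-completion message shape (`role`/`content` dicts); the persona text and function name are illustrative.

```python
# Sketch: keep the persona in the system prompt and append a soft
# reminder to each user turn. The persona string is illustrative.
PERSONA = "a skeptical senior engineer"

def build_messages(history, user_input):
    reminder = f"(Always respond as {PERSONA}.)"
    return (
        [{"role": "system", "content": f"You are {PERSONA}."}]
        + history
        + [{"role": "user", "content": f"{user_input}\n\n{reminder}"}]
    )
```

The reminder rides on the most recent turn, which is where drift shows up first in long conversations.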
Citation accuracy. GPT-5 is better at not hallucinating, but for high-stakes use cases involving factual claims, you still need explicit grounding instructions and access to retrieval. The model is better, not perfect.
A migration testing workflow
If you have existing GPT-4o prompts you want to migrate, here's the process I use:
1. Run the original prompt on GPT-5 unchanged. Note what's different from the expected output. Often it's better than expected.
2. Identify which differences are improvements. A lot of them will be. Don't "fix" these.
3. Identify which differences are regressions. Usually: different formatting, different tone, unexpected interpretation of an ambiguous instruction.
4. Trim the prompt. Remove instructions that are now redundant. Test that the shorter prompt produces the same result.
5. Update specific instructions. For the regressions, identify which instruction needs updating. Make the change minimal and targeted.
6. Regression test a sample. Run 10-20 representative inputs through the updated prompt and compare.
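The regression-test step can be sketched as a small harness. Everything here is a stand-in: `run_prompt` is whatever function wraps your model call, and the divergence check is bare normalized equality — in practice you'd substitute a fuzzier comparison (embedding similarity, a rubric check) for prose outputs.

```python
# Sketch: flag which sample inputs produce diverging outputs between
# the old and updated prompt. `run_prompt` is a placeholder for your
# model call; the equality check is deliberately simplistic.
def regression_test(run_prompt, old_prompt, new_prompt, samples):
    """Return the sample inputs whose outputs diverge between prompts."""
    diverged = []
    for sample in samples:
        old_out = run_prompt(old_prompt, sample).strip().lower()
        new_out = run_prompt(new_prompt, sample).strip().lower()
        if old_out != new_out:
            diverged.append(sample)
    return diverged
```

Reviewing only the diverging cases keeps the manual comparison manageable at 10-20 samples.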
The most common finding: your prompt is longer than it needs to be, and trimming it improves output.
For a broader look at OpenAI's model family and when to use which model, see the GPT-4o vs o1 vs o3 comparison.