You can generate a great product shot with a simple prompt. People are harder. Something always looks slightly off — the eyes are too symmetrical, the smile too perfect, the hand placement unnatural. AI-generated people fail in advertising not because the technology is bad, but because most prompts are too vague. This post gives you the specific formula that fixes it.
Why people are hard to generate well
Humans are hyperattuned to faces and body language. This goes back millions of years — reading another person's face accurately was survival-critical, so the brain dedicates enormous processing power to it. A slightly wrong eye position, an expression that doesn't quite match the context, a hand at an angle that nobody holds naturally: these register as wrong before you've consciously noticed anything.
Generic prompts produce generic, stock-photo-feeling results for four specific reasons:
- No lighting setup — without specifying how light hits skin, the model defaults to the most average lighting it's seen, which is usually the flat, overlit look of budget corporate photography
- No authentic moment — "a happy person" produces a person posing to communicate happiness, not a person experiencing it
- No environmental story — a person in context-free space looks like a cutout, because that's essentially what they are
- No camera simulation — leaving out lens and aperture details means the model picks defaults, and defaults look like defaults
The fix for each of these is specific language. Here's the full six-part structure.
The 6-part formula for realistic people
Part 1: Specific appearance descriptors, not demographic labels
Bad: "young woman"
Good: "woman in her late 20s, natural makeup, shoulder-length dark hair, warm undertone skin"
Demographic shorthand generates demographic stereotypes. When you say "young woman," the model averages across every young woman in its training data — which skews heavily toward conventional stock photography. Specificity forces the model out of that mode. You're not describing a category; you're describing a particular person. The more particular, the more believable.
Same principle applies across every appearance attribute. "Athletic man" is a category. "Man in his late 20s, lean build, slight stubble, wearing a plain grey t-shirt" is a person.
Part 2: Authentic expression, not "smiling"
Bad: "smiling at camera"
Good: "mid-laugh, head slightly tilted, genuine amusement — caught in the moment, not posed"
Posed smiles are the single biggest uncanny valley trigger in AI-generated people. The model has learned from millions of stock photos in which professional models smile directly at cameras in ways real humans almost never do. If you ask for "smiling," you'll get that learned behavior back.
The fix: specify the emotion context, not the expression. Instead of "smiling," describe what's happening that would produce a smile. "Mid-laugh" implies a moment. "Slight surprise and delight" implies an emotional sequence. "Genuine focus, not looking at camera" removes the smile entirely and reads as real.
Other expressions that work well: "absorbed in thought," "reacting to something off-frame," "mid-sentence, explaining something," "listening intently, slight nod."
Part 3: Lighting that flatters human skin
Bad: "professional lighting"
Good: "soft diffused natural light from a large window on the left, slight warmth, no harsh shadows"
Hard directional light creates harsh shadows on faces — under the nose, under the chin, across the cheekbones. This is what makes people look like AI-generated people even when everything else is right. Soft, diffused light is what flatters human skin because it wraps around contours without creating hard edges.
Lighting keywords that consistently produce natural-looking results:
- Soft window light — diffused, directional without being harsh
- Overcast outdoor light — even and flattering, the natural equivalent of a giant softbox
- Golden hour side light — warm, cinematic, adds depth without harsh shadows
- Studio beauty lighting — soft boxes, professional but controlled
- Candlelight — warm, moody, specific
Lighting to avoid: harsh flash, direct overhead, ring light. Ring lights produce a too-perfect circular catchlight in the eye that registers as artificial.
Part 4: Environmental context — the story
Bad: "office background"
Good: "open-plan modern office, afternoon light, soft bokeh background, colleague visible but out of focus at desk behind them"
Context creates believability. The environment tells us who this person is, what they're doing, and why we should care. A person floating in front of a blurred generic office background looks like a LinkedIn profile photo. A person in a specific, slightly populated, naturally lit environment looks like they exist.
The "colleague visible but out of focus behind them" detail is doing a lot of work in that example. It makes the space feel inhabited without distracting from the subject. One or two environmental details — a coffee cup, a window with a specific view, a specific type of furniture — anchor the person in a real place.
Part 5: Camera simulation
This single element improves results more than any other part of the prompt. Add this to every people prompt:
"shot on Canon 5D Mark IV, 85mm portrait lens, f/1.8, shallow depth of field"
Here's why it works so well: 85mm at f/1.8 is the standard flattering portrait compression. It's how real advertising photographers have shot people for decades. The model learned from billions of real photographs taken with real cameras. When you specify the camera and lens, you're essentially telling the model which part of its training data to draw from — the professional portrait photography part, not the smartphone selfie part.
The physics of an 85mm lens at f/1.8 produces a specific look: natural facial compression (wide angles distort faces, longer lenses flatter them), a narrow plane of focus that separates the subject from the background cleanly, and a bokeh quality that's become synonymous with professional photography.
Alternatives for different effects:
- 50mm f/1.4 — slightly wider, places the person more in their environment
- 35mm f/2 — wider still, candid and documentary feel, more of the scene visible
- 70-200mm f/2.8 — longer compression, backgrounds become abstract wash of color
Part 6: Style reference
Bad: "professional photo"
Good: "editorial advertising photography, brand campaign quality, authentic and unposed" or "documentary-style candid photography, authentic lifestyle photography"
The style reference primes the aesthetic register. "Professional photo" is as generic as it gets. Naming a specific aesthetic — editorial, documentary, campaign-quality — tells the model which visual genre to operate in. You can also reference specific brand aesthetics (Patagonia's outdoor documentary style, Apple's clean high-key product-person integration) when the model has clearly learned those aesthetics.
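Taken together, the six parts compose mechanically: each one is a short phrase, joined in order, with the platform flags appended at the end. Here's a minimal Python sketch of that assembly; the function name, field names, and defaults are illustrative assumptions for this post, not an official template or API:

```python
# Hypothetical helper: joins the six formula parts into one prompt string.
# Names and defaults are illustrative, not an official Midjourney schema.
def build_people_prompt(appearance, expression, lighting, environment,
                        camera="shot on Canon 5D Mark IV, 85mm portrait lens, "
                               "f/1.8, shallow depth of field",
                        style="editorial advertising photography, brand "
                              "campaign quality, authentic and unposed",
                        flags="--ar 4:5 --style raw --v 6"):
    parts = [appearance, expression, lighting, environment, camera, style]
    # Normalize trailing periods so the joined prompt reads cleanly.
    return ". ".join(p.strip().rstrip(".") for p in parts) + ". " + flags

prompt = build_people_prompt(
    appearance="woman in her late 20s, natural makeup, shoulder-length dark hair",
    expression="mid-laugh, head slightly tilted, genuine amusement, not posed",
    lighting="soft diffused natural light from a large window on the left",
    environment="open-plan modern office, afternoon light, soft bokeh background",
)
print(prompt)
```

The point of the sketch is the ordering: appearance and expression first (the subject), then lighting and environment (the scene), then camera and style (the rendering), so each template below follows the same spine.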
5 complete ad prompt templates
These are copy-paste ready. Each includes Midjourney flags.
B2B SaaS (person at work):
Photorealistic advertising photograph of a woman in her mid-30s, business casual,
at a clean modern desk with a laptop, mid-expression as if just having an insight,
genuine focus not posed — not looking at camera. Soft natural window light from
left, warm afternoon tone. Shallow depth of field, 85mm portrait lens, f/1.8.
Open plan office background, soft bokeh. Brand campaign quality, authentic and
professional.
--ar 4:5 --style raw --v 6
Health and wellness (active, outdoors):
Photorealistic lifestyle photo of a man in his late 20s, athletic build, running
on a coastal path at golden hour, slight motion blur on legs showing movement,
genuine effort on face — not a posed run. Warm low sun behind him, rim lighting
on shoulders. Shot on 70-200mm, f/2.8, bokeh ocean background. Nike campaign
aesthetic, authentic athletic photography.
--ar 4:5 --style raw --v 6
Finance and professional services (trust-building):
Photorealistic portrait of a man in his early 40s, dark suit, confident but
approachable expression — subtle smile, direct but warm gaze. Soft studio lighting,
slight shadow depth. Neutral charcoal background. Shot on 85mm, f/2.8, slight
vignette. Financial services brand campaign, trustworthy and assured.
--ar 1:1 --style raw --v 6
E-commerce lifestyle (using product naturally):
Photorealistic lifestyle ad of a woman in her late 20s, casual weekend outfit,
sitting on a bright apartment couch, genuinely engrossed in her phone — natural
candid moment. Afternoon natural light through large windows. 35mm, f/2, shallow
depth of field. The phone is the hero product — visible but not held up to camera.
Warm, aspirational, authentic. Aesop brand campaign aesthetic.
--ar 4:5 --style raw --v 6
Diverse team and group shot:
Photorealistic team photograph of five people in a bright modern meeting room,
diverse ages and backgrounds, in the middle of a genuine conversation — some
laughing, one making a point, natural body language. Overhead natural light with
large windows. Wide shot, 35mm, f/4. Authentic collaboration, not posed group photo.
Slack marketing campaign aesthetic.
--ar 16:9 --style raw --v 6
What to avoid
A few specific patterns that consistently produce bad results:
Don't use nationality as a descriptor. It produces stereotypes rather than real people. Describe specific appearance attributes instead — hair texture, skin tone range, facial features — and you'll get a real-looking person rather than a casting-call cliché.
Don't use age adjectives like "elderly." This leads to caricature — exaggerated wrinkles, exaggerated frailty. "In their late 60s" or "in their early 70s" produces a real-looking older person. The phrase matters.
Don't rely on profession labels alone. "Businesswoman" or "doctor" gives the model a costume to put someone in, not a person to generate. Combine profession with context: "woman in her 40s at a clinic desk reviewing paperwork, white coat, reading glasses, genuine concentration."
Avoid celebrity-adjacent descriptors. Anything that nudges the model toward a specific real person degrades quality and introduces IP risk. Stay descriptive rather than comparative.
Don't use "realistic" without the camera spec. "Photorealistic" in isolation is too vague to do much. The camera and lens simulation is what actually produces the photographic realism you're after.
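If you generate prompts at any volume, the avoid-list above is easy to encode as a quick pre-flight check. This is a rough sketch of such a linter; the trigger word lists are a small illustrative sample I've chosen for this example, not an exhaustive or authoritative policy:

```python
# Hypothetical prompt linter: flags phrasing the guidelines above warn against.
# Trigger lists are illustrative samples, not an exhaustive policy.
RISKY_PATTERNS = {
    "age caricature": ["elderly", "old man", "old woman"],
    "bare profession label": ["businesswoman", "businessman", "doctor"],
    "celebrity-adjacent": ["looks like", "resembling"],
}

def lint_prompt(prompt):
    lowered = prompt.lower()
    warnings = []
    for reason, triggers in RISKY_PATTERNS.items():
        for trigger in triggers:
            if trigger in lowered:
                warnings.append(f"{reason}: '{trigger}'")
    # "realistic" without a lens spec is the vagueness trap from the last rule.
    if "realistic" in lowered and "mm" not in lowered:
        warnings.append("vague realism: add a camera/lens spec")
    return warnings

print(lint_prompt("Photorealistic photo of an elderly businesswoman"))
```

A checker like this won't catch everything (substring matching is crude), but it catches the habitual mistakes before they cost you a generation.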
Post-processing workflow for ad-ready output
Even with a well-constructed prompt, you'll want a quick review pass before using images in real campaigns.
The two most common artifacts in AI-generated people are hands and fine facial details. Check both immediately. Hands with too many or too few fingers are the giveaway that kills an otherwise strong image. Teeth and eyes are the second check — these are where the model sometimes over-smooths or over-sharpens.
For fixing specific artifacts: Adobe Firefly's Generative Fill (or Photoshop's built-in equivalent) is the fastest tool. Select the problem area, describe what should be there, and regenerate just that region. This is often faster than re-prompting from scratch.
For resolution: Midjourney output at 4x upscale is sufficient for most digital advertising. For print or large-format digital, run through Magnific AI or Topaz Gigapixel AI to get to print-quality resolution without the softening that comes from standard upscaling.
Final crop to your specific ad specs — Meta feed (4:5), Stories (9:16), Display (16:9), LinkedIn (1.91:1) — in Figma or Canva before delivery. Cropping at the prompt stage with --ar gets you close; final output sizing should still happen in your design tool.
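Those aspect ratios translate into pixel dimensions with simple arithmetic. A small helper like this sketch computes the crop height for a given output width; the ratios mirror the specs listed above, while the 1080px default width is an illustrative choice, not a platform requirement:

```python
# Hypothetical crop calculator: converts the ad-spec aspect ratios above into
# pixel heights for a given output width. The 1080px default is illustrative.
AD_SPECS = {
    "Meta feed": (4, 5),
    "Stories": (9, 16),
    "Display": (16, 9),
    "LinkedIn": (1.91, 1),
}

def crop_height(spec, width=1080):
    w_ratio, h_ratio = AD_SPECS[spec]
    return round(width * h_ratio / w_ratio)

for name in AD_SPECS:
    print(f"{name}: 1080 x {crop_height(name)}")
```

So a 1080px-wide Meta feed crop is 1080 x 1350, and a Stories crop is 1080 x 1920, which is why upscaling before the crop matters for the taller formats.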
A note on ethics
Disclose when advertising images are AI-generated. In several markets this is already legally required, and that trend is accelerating. It's also just honest — and audiences are generally fine with disclosed AI imagery when the image itself is good. The problem isn't that something is AI-generated; it's when something is AI-generated and looks bad. Fix the quality first, then the disclosure is easy.
For product photography prompts that follow the same specificity approach, see our full guide to AI image prompts for realistic product ads. And for a library of copy-paste image prompts organized by use case, the image prompt library has templates ready to go.