Technique Guide

Image Generation Prompting

Image generation models such as Midjourney, DALL-E 3, and Stable Diffusion require a different prompting approach than text models. Learning the vocabulary of style modifiers, composition terms, and model-specific syntax gives you far more control over the output.

The Anatomy of an Image Prompt

Strong image prompts typically include the subject, the setting or context, the style/medium, lighting, composition, and quality modifiers. The order matters less than the presence of each component.

Prompt formula

[subject] + [setting/context] + [style/medium] + [lighting] + [composition] + [quality]

Weak: “a person sitting in a cafe”

Strong: “a woman reading a book in a Parisian sidewalk cafe, warm afternoon light streaming through windows, oil painting style, impressionist brushwork, soft focus background, highly detailed”
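The formula can be treated as a small template with one slot per component. The following is a minimal sketch, not a standard tool: the `ImagePrompt` class and its field names are hypothetical, chosen to mirror the formula above, and the example values are adapted from the weak and strong prompts.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """One field per slot in the prompt formula; only the subject is required."""
    subject: str
    setting: str = ""
    style: str = ""
    lighting: str = ""
    composition: str = ""
    quality: str = ""

    def render(self) -> str:
        # Keep the formula's order and skip any slot that was left empty.
        parts = [self.subject, self.setting, self.style,
                 self.lighting, self.composition, self.quality]
        return ", ".join(p.strip() for p in parts if p.strip())


weak = ImagePrompt(subject="a person sitting in a cafe")
strong = ImagePrompt(
    subject="a woman reading a book",
    setting="in a Parisian sidewalk cafe",
    style="oil painting style, impressionist brushwork",
    lighting="warm afternoon light",
    composition="soft focus background",
    quality="highly detailed",
)

print(weak.render())
print(strong.render())
```

Filling the slots one at a time makes it obvious which components a prompt is still missing.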

Essential Modifier Categories

Style: photorealistic, oil painting, watercolor, anime, concept art, isometric illustration
Lighting: golden hour, dramatic side lighting, studio lighting, cinematic lighting, neon glow
Camera: wide angle, telephoto, macro, aerial view, fisheye, portrait lens
Mood: moody, ethereal, vibrant, dystopian, serene, cinematic
Quality: highly detailed, 8k, masterpiece, sharp focus, professional photograph
Composition: rule of thirds, centered subject, dynamic diagonal, negative space
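One way to use these categories is as a checklist when drafting a prompt. The sketch below stores the terms listed above in a dictionary and reports which categories a prompt does not yet touch. The `MODIFIERS` table and `missing_categories` helper are illustrative names, and the matching is a plain substring check, so treat the result as a rough hint only.

```python
MODIFIERS = {
    "style": ["photorealistic", "oil painting", "watercolor", "anime",
              "concept art", "isometric illustration"],
    "lighting": ["golden hour", "dramatic side lighting", "studio lighting",
                 "cinematic lighting", "neon glow"],
    "camera": ["wide angle", "telephoto", "macro", "aerial view",
               "fisheye", "portrait lens"],
    "mood": ["moody", "ethereal", "vibrant", "dystopian", "serene", "cinematic"],
    "quality": ["highly detailed", "8k", "masterpiece", "sharp focus",
                "professional photograph"],
    "composition": ["rule of thirds", "centered subject", "dynamic diagonal",
                    "negative space"],
}


def missing_categories(prompt: str) -> list[str]:
    """Return the modifier categories with no listed term in the prompt."""
    text = prompt.lower()
    # Substring matching is crude: "cinematic lighting" also counts as the
    # mood term "cinematic". Good enough for a drafting checklist.
    return [category for category, terms in MODIFIERS.items()
            if not any(term in text for term in terms)]


draft = "a lighthouse on a cliff, watercolor, golden hour"
print(missing_categories(draft))
```

A prompt does not need every category, but an empty slot is often the quickest place to look when a result feels generic.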

Model-Specific Tips

DALL-E 3 (OpenAI)

  • Handles natural language well, so write in clear sentences rather than keyword lists
  • State the style explicitly: “in the style of a product photograph” or “illustrated in a flat design style”
  • Use it via ChatGPT for iterative refinement: ask it to “make it more dramatic” or “change the background to...”
  • Good for complex scenes that include text and logos (its text rendering is generally better than Midjourney's)
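Outside ChatGPT, the same advice applies when calling the model through the OpenAI Python SDK. The sketch below builds a sentence-style prompt with an explicit style phrase and packs it into request arguments. `dalle3_request` and `generate` are hypothetical helper names. The actual call assumes the `openai` package is installed, an `OPENAI_API_KEY` environment variable is set, and the `dall-e-3` model is still available on your account.

```python
ALLOWED_SIZES = {"1024x1024", "1792x1024", "1024x1792"}


def dalle3_request(description: str, style_phrase: str,
                   size: str = "1024x1024") -> dict:
    """Build keyword arguments for an image request from a plain sentence."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size for dall-e-3: {size}")
    # Write one clear sentence and state the style explicitly at the end.
    prompt = f"{description.rstrip('. ')}, {style_phrase}."
    # dall-e-3 generates one image per request, so n is fixed at 1.
    return {"model": "dall-e-3", "prompt": prompt, "size": size, "n": 1}


def generate(request: dict) -> str:
    """Send the request and return the image URL (needs network and an API key)."""
    from openai import OpenAI  # imported lazily so the builder works without it

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(**request)
    return response.data[0].url


request = dalle3_request(
    "A ceramic coffee mug on a marble counter.",
    "in the style of a product photograph",
)
print(request["prompt"])
```

Keeping the request builder separate from the network call makes it easy to review the final prompt text before spending credits on it.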

Midjourney

  • Works well with comma-separated keywords: “futuristic city, cyberpunk, rain, neon lights, cinematic”
  • Append --ar 16:9 to set the aspect ratio and --v 6 to select a model version (newer versions may exist, so check the current documentation)
  • Use --no [elements] for negative prompting: --no blur, text, watermark
  • Reference artists for consistent style: “in the style of James Gurney”
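Because Midjourney parameters are plain text appended to the prompt, the tips above can be combined with simple string assembly. This is a minimal sketch: `midjourney_prompt` is a hypothetical helper, and it only covers the --ar, --v, and --no parameters mentioned here.

```python
from typing import Iterable, Optional


def midjourney_prompt(keywords: Iterable[str],
                      aspect_ratio: Optional[str] = None,
                      version: Optional[str] = None,
                      exclude: Optional[Iterable[str]] = None) -> str:
    """Join keywords with commas, then append parameters at the end."""
    prompt = ", ".join(k.strip() for k in keywords if k.strip())
    if aspect_ratio:
        prompt += f" --ar {aspect_ratio}"
    if version:
        prompt += f" --v {version}"
    if exclude:
        # --no takes a comma-separated list of things to keep out of the image.
        prompt += " --no " + ", ".join(exclude)
    return prompt


print(midjourney_prompt(
    ["futuristic city", "cyberpunk", "rain", "neon lights", "cinematic"],
    aspect_ratio="16:9",
    version="6",
    exclude=["blur", "text", "watermark"],
))
```

The parameters go after the descriptive text, which is where Midjourney expects them.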

Stable Diffusion

  • Put negative prompts in the dedicated negative prompt field: “blurry, low quality, distorted, extra limbs”
  • Adjust the CFG scale (7–12 typical): higher values follow the prompt more closely, lower values give the model more creative freedom
  • The choice of model checkpoint matters a great deal: different checkpoints excel at different styles
  • Use LoRA adapters for consistent character or style across multiple images
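When Stable Diffusion is run from code, these settings map onto pipeline arguments. The sketch below assumes the Hugging Face `diffusers` library, where the CFG scale is passed as `guidance_scale`, and a CUDA GPU. `sd_settings` and `generate` are hypothetical helper names, and the checkpoint and LoRA paths are left for the caller to supply.

```python
from typing import Optional


def sd_settings(prompt: str, negative_prompt: str = "",
                cfg_scale: float = 7.5, steps: int = 30) -> dict:
    """Collect the generation settings as diffusers pipeline keyword arguments."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,  # what the image should avoid
        "guidance_scale": cfg_scale,         # the CFG scale
        "num_inference_steps": steps,
    }


def generate(checkpoint: str, settings: dict, lora_path: Optional[str] = None):
    """Load a checkpoint, optionally apply a LoRA, and return one PIL image."""
    # Heavy imports are kept local so the settings helper works without them.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        checkpoint, torch_dtype=torch.float16
    ).to("cuda")  # assumes a CUDA-capable GPU
    if lora_path:
        pipe.load_lora_weights(lora_path)
    return pipe(**settings).images[0]


settings = sd_settings(
    "portrait of an elderly fisherman, dramatic side lighting, sharp focus",
    negative_prompt="blurry, low quality, distorted, extra limbs",
    cfg_scale=9.0,
)
print(settings)
```

Holding the prompt and seed fixed while changing only `cfg_scale` or the checkpoint is a practical way to see what each setting contributes.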

Learn Multimodal Prompting

The Intermediate track includes a full lesson on multimodal prompting — text-to-image, image understanding, and cross-modal reasoning.