AI video generation is at the same stage image generation was three years ago — impressive but temperamental. You can spend an afternoon generating unusable clips if you don't understand how the tools interpret prompts. Or you can get useful output in a few tries if you know what actually works.
The core insight: video prompts are not image prompts with movement added. They're closer to brief cinematography directions. You're describing a shot, not just a scene.
The anatomy of a good video prompt
Every effective video prompt covers four things:
- Subject: What is the main element? Who/what are we watching?
- Action: What is happening? How is it moving?
- Camera: Where is the camera, how is it moving, what's the framing?
- Environment and lighting: Where is this happening? What does the light look like?
Most bad prompts cover the subject and action but skip the camera and lighting. The result is a clip where the camera sits static in an undefined void, even if the subject is doing interesting things.
Basic structure:
[Camera setup and movement], [subject] [action], [environment], [lighting/time of day], [style/mood]
Example:
Slow tracking shot following a woman walking through a crowded Tokyo street market at dusk,
warm orange light from vendor stalls, shallow depth of field with bokeh background,
cinematic 4K footage
vs. the weaker version:
A woman walking through a Tokyo market
Both are valid prompts. One produces something interesting; the other is a guess.
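The structure above can be assembled mechanically. A minimal sketch in Python — the function name and argument order are my own choices for illustration, not any platform's API:

```python
def build_video_prompt(camera, subject, action, environment, lighting, style=""):
    """Assemble a video prompt from the four core components plus optional style."""
    parts = [camera, f"{subject} {action}", environment, lighting]
    if style:
        parts.append(style)
    # Join with commas, dropping any empty components
    return ", ".join(p.strip() for p in parts if p.strip())

prompt = build_video_prompt(
    camera="Slow tracking shot",
    subject="a woman",
    action="walking through a crowded Tokyo street market",
    environment="at dusk, warm orange light from vendor stalls",
    lighting="shallow depth of field with bokeh background",
    style="cinematic 4K footage",
)
```

Keeping the components separate like this also makes it easy to vary one element (say, the camera move) while holding the rest of the shot constant.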
Camera direction vocabulary
Camera direction is the single most impactful addition to a video prompt. These terms are understood by Sora, Runway, and Kling:
Movement:
- static shot / locked off camera — camera doesn't move
- slow dolly in / dolly out — camera physically moves forward/back
- pan left/right — camera rotates horizontally
- tilt up/down — camera rotates vertically
- tracking shot — camera follows the subject
- drone shot / aerial shot — overhead perspective
- handheld — subtle camera shake, feels observational
- steadicam — smooth movement that follows without feeling static
Framing:
- extreme close-up / ECU — detail shot, fills frame
- close-up — face or object, detailed
- medium shot — waist up on a person
- wide shot / establishing shot — shows the full scene
- over-the-shoulder — looking from behind one subject at another
- low angle — camera below subject, makes them imposing
- bird's eye view — straight down
Depth:
- shallow depth of field — sharp subject, blurry background
- deep focus — everything in frame is sharp
- bokeh — out-of-focus light points in background
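For programmatic prompt checking, this vocabulary fits naturally in a lookup table. A sketch — the dict name, grouping, and checker function are my own, not part of any platform:

```python
# Camera vocabulary from the lists above, grouped by category (illustrative)
CAMERA_VOCAB = {
    "movement": [
        "static shot", "slow dolly in", "dolly out", "pan left", "pan right",
        "tilt up", "tilt down", "tracking shot", "drone shot", "aerial shot",
        "handheld", "steadicam",
    ],
    "framing": [
        "extreme close-up", "close-up", "medium shot", "wide shot",
        "establishing shot", "over-the-shoulder", "low angle",
        "bird's eye view",
    ],
    "depth": ["shallow depth of field", "deep focus", "bokeh"],
}

def has_camera_direction(prompt: str) -> bool:
    """Return True if the prompt mentions any known camera term."""
    lowered = prompt.lower()
    return any(
        term in lowered for terms in CAMERA_VOCAB.values() for term in terms
    )
```

A check like this is a cheap way to catch the most common prompt weakness — no camera direction at all — before spending a generation.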
Sora-specific prompting
Sora produces its strongest output when the prompt itself reads like cinematography. It responds well to:
Film reference: Sora understands filmmaking terms and occasionally responds well to style references. "Shot on 35mm film, warm grain" or "in the style of a nature documentary" help set a visual register.
Detailed physical description: Sora generates realistic physics and lighting. Give it detailed scene information. "Morning light streaming through dusty warehouse windows, casting long golden shafts" will produce better light than "morning light."
One main action: Sora handles complex scenes but performs best when there's a clear primary action. Multiple simultaneous events in one clip often result in visual confusion.
Temporal direction: Tell it what changes over time. "A timelapse of clouds moving over a mountain range, transitioning from clear blue sky to dramatic storm clouds" gives the model clear temporal structure.
Common failure modes with Sora:
- Text in video clips is unreliable — letters morph and deform. Don't rely on readable text.
- Hands and fingers can be wrong. Avoid prompts that center hand close-ups.
- Long clips (>10 seconds) tend to have consistency issues with character appearance. Plan for shorter clips.
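These failure modes can be caught with a simple pre-flight check before you spend a generation. A sketch — the keyword lists are my own illustrative heuristics, not an official Sora feature:

```python
# Phrases that tend to trigger known weak spots (illustrative, not exhaustive)
RISKY_PATTERNS = {
    "readable text": ["sign reading", "text that says", "lettering", "written words"],
    "hand close-ups": ["close-up of hands", "hand close-up", "fingers"],
}

def preflight_warnings(prompt: str, duration_seconds: float = 5.0) -> list[str]:
    """Return human-readable warnings for known failure modes."""
    lowered = prompt.lower()
    warnings = []
    for issue, patterns in RISKY_PATTERNS.items():
        if any(p in lowered for p in patterns):
            warnings.append(f"prompt relies on {issue}, which renders unreliably")
    if duration_seconds > 10:
        warnings.append("clips over 10 seconds tend to lose character consistency")
    return warnings
```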
Runway Gen-3 prompting
Runway is strong on stylized and abstract content, and gives you more control through its interface (camera controls, motion brush, image-to-video). Prompting differences:
Be explicit about motion amount: Runway can be either too static or too chaotic. Adding motion intensity language helps: "subtle camera drift," "slow gentle movement," or "dynamic motion" signal the amount of movement you want.
Start-frame-to-video works well: If you have a specific image you want animated, Runway's image-to-video is more controllable than text-to-video alone. Describe only the motion you want the image to exhibit: "gentle wind moving through the grass," "camera slowly zooming in."
Artistic styles translate: Runway responds to art direction keywords well. "Impressionist painting style, oil paint texture," "vintage 1970s film stock, faded colors," "anime style, cel shaded."
Character consistency is limited in text-to-video: Don't expect the same person to look the same across multiple clips. Use image-to-video with a reference image for character consistency.
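Since image-to-video prompts should describe only motion, it helps to keep motion intensity as a separate knob. A hedged sketch — the intensity phrases come from the guidance above, but the function and dict are illustrative:

```python
# Motion intensity phrases from the guidance above, keyed for reuse
MOTION_INTENSITY = {
    "subtle": "subtle camera drift",
    "gentle": "slow gentle movement",
    "dynamic": "dynamic motion",
}

def motion_prompt(motion_description: str, intensity: str = "gentle") -> str:
    """Build a motion-only prompt for image-to-video generation."""
    return f"{motion_description}, {MOTION_INTENSITY[intensity]}"
```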
Kling prompting
Kling (from Kuaishou) has strong physics simulation and handles complex motions that other models struggle with — water, cloth, hair, fire. This makes it good for:
- Nature scenes (water, wind, natural movement)
- Fashion and fabric (clothing movement, flowing materials)
- Action sequences where physical realism matters
Kling prompting tips:
Chinese cultural content: Kling was trained on a broader set of Asian visual content and handles settings like traditional Chinese architecture, Asian street scenes, and related subject matter particularly well.
Emphasize physical properties: For Kling's physics strength, describe the physical properties you want. "Heavy rain falling in visible sheets," "loose silk fabric rippling in the breeze," "thick smoke slowly expanding."
Subject description matters more: Kling benefits from detailed subject descriptions — hair color, clothing description, posture. It uses this for more consistent subject rendering.
Prompt templates for common use cases
Product showcase:
Smooth 360-degree orbit shot around a [product name], [describe product],
rotating slowly on a clean white studio surface,
soft box lighting, crisp shadows, commercial product photography style, 4K
Nature / landscape:
Aerial drone shot gliding over [landscape type], [time of day],
[describe key visual elements — mountains, trees, water, etc.],
[weather/atmosphere], [color palette], nature documentary style
Character moment:
[Camera type and movement], [character description] [action],
[specific location], [lighting], [emotional tone]
Abstract / atmospheric:
[Camera movement], abstract [describe visual elements],
[color palette], [movement type — swirling, expanding, flowing],
[mood — ethereal, ominous, energetic], [style reference]
Urban / cityscape:
[Camera movement] through [city/neighborhood], [time of day],
[specific visual details — neon signs, crowded streets, empty alleys],
[weather], [film style]
Consistency across multiple clips
For projects requiring multiple clips with visual consistency:
Character consistency: Use image-to-video wherever possible. Generate a reference image with your preferred image generator and use it as the starting frame. This is the most reliable method across all platforms.
Color grading consistency: Use the same color and lighting language in every clip prompt. "Golden hour, warm tones, slight underexposure" as a consistent descriptor helps maintain visual cohesion.
Style anchoring: Pick 2-3 style descriptors and repeat them across all prompts in a project. The same "16mm film grain, cinematic, desaturated" at the end of every clip creates visual continuity.
Maintain a prompt log: Keep a document with every prompt that produced good output. When you need a new clip in the same visual style, start from that prompt and modify rather than writing from scratch.
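A prompt log can be as simple as an append-only JSONL file. A minimal sketch — the file name and record fields are arbitrary choices:

```python
import json
from pathlib import Path

LOG_PATH = Path("prompt_log.jsonl")  # one JSON object per line

def log_prompt(prompt: str, model: str, notes: str = "") -> None:
    """Append a prompt that produced good output to the log."""
    entry = {"prompt": prompt, "model": model, "notes": notes}
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def find_prompts(keyword: str) -> list[dict]:
    """Return logged entries whose prompt mentions the keyword."""
    if not LOG_PATH.exists():
        return []
    lines = LOG_PATH.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line]
    return [e for e in entries if keyword.lower() in e["prompt"].lower()]
```

When you need a new clip in an established style, search the log, copy the closest hit, and modify from there.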
What not to prompt
Across Sora, Runway, and Kling:
- Specific real people by name: Most platforms reject these or produce inconsistent results
- Rapid scene changes: Cuts happen between clips, not within them. One clip = one continuous shot
- Complex dialogue or voiceover sync: Lip sync to specific words doesn't work reliably
- Exact text legibility: Text will render approximately but not precisely
- Very long complex actions: A character performing a 30-second choreographed sequence. Keep actions simple and short
For creating assets for ads or marketing campaigns, the AI image prompts post covers complementary still image workflows. And if you're building image prompts that will become video start frames, the image prompts category in the prompt library has reusable templates.