{Json}Prompt it

The Definitive Guide to Google Veo 3 Prompting

From First Idea to Cinematic Masterpiece

The age of AI-driven cinema is here, and Google's Veo 3 is at the forefront. This model can generate strikingly realistic video, complete with synchronized audio, complex physics, and nuanced character emotions. To wield this power, you need to understand its language. A simple text prompt can produce a miracle, but a masterfully crafted prompt can produce a masterpiece.

This is the definitive guide to mastering Veo 3. We will take you from the fundamental principles of a good prompt to the director-level control offered by structured JSON, covering every cinematic technique and advanced trick along the way. Whether you are creating a quick social media clip or a polished short film, this guide will give you the tools to bring your exact vision to life.

1. The Philosophy - Thinking Like a Director, Not a Dreamer

The biggest mistake users make is treating Veo 3 like a vague dream interpreter. The key to unlocking its potential is to shift your mindset from that of a dreamer to that of a film director.

Dreamer

"A cool video of a car." (Vague, passive, relies on luck)

Director

"A low-angle tracking shot of a matte black muscle car drifting around a rain-slicked corner in a neon-lit Tokyo alley at night." (Specific, active, controls the outcome)

Every prompt you write is a directorial command. Your job is to provide a clear, unambiguous blueprint for the AI to execute. This guide will teach you how.

2. Prompting Fundamentals - The 4 Pillars of Every Shot

Every great Veo 3 prompt, from the simplest text to the most complex JSON, is built upon four essential pillars. Mastering these is non-negotiable.

1. Subject (The "Who" or "What")

This is the focal point of your scene. Be specific. "A person" is weak. "A grizzled old fisherman with a weathered face and a thick wool sweater" is strong.

2. Action (The "Doing What")

This is the verb of your scene. Actions should be clear and specific. "Walking" is okay. "Limping heavily through a knee-deep snowdrift, shielding his eyes from the wind" is powerful.

3. Context (The "Where" and "When")

This is the environment that gives your subject meaning. It includes location, time of day, and weather. "In a forest" is weak. "In a dense, fog-shrouded redwood forest at dawn, with rays of light piercing the canopy" is strong.

4. Style (The "How It Looks and Feels")

This is the aesthetic wrapper. It tells Veo the genre, mood, and visual language to use. "Video" is weak. "Cinematic, film noir style, shot on 35mm film, mysterious and tense tone" is strong.
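The four pillars can be treated as four required slots. The sketch below is one way to assemble them into a single text prompt; the function name `build_text_prompt` and the joining format are assumptions made for illustration, not something Veo 3 prescribes.

```python
def build_text_prompt(subject: str, action: str, context: str, style: str) -> str:
    """Combine the four pillars into one prompt string.

    Hypothetical helper: the ordering and punctuation are a convention
    chosen for this sketch, not a Veo requirement.
    """
    pillars = {"subject": subject, "action": action, "context": context, "style": style}
    # Refuse to build a prompt with an empty pillar instead of silently skipping it.
    missing = [name for name, value in pillars.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing pillars: {', '.join(missing)}")
    # Subject, action, and context form the scene sentence; style follows as its own sentence.
    scene = " ".join(p.strip().rstrip(".") for p in (subject, action, context))
    return f"{scene}. {style.strip().rstrip('.')}."


prompt = build_text_prompt(
    subject="A grizzled old fisherman with a weathered face and a thick wool sweater",
    action="limping heavily through a knee-deep snowdrift",
    context="in a dense, fog-shrouded redwood forest at dawn",
    style="Cinematic, film noir style, shot on 35mm film, mysterious and tense tone",
)
```

Raising an error on an empty pillar is a deliberate choice here: a missing pillar is exactly the gap the AI would otherwise fill in on its own.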

3. Choosing Your Tool - Simple Text vs. Structured JSON

You can prompt Veo 3 in two ways: free-form text, or text structured as JSON. Your choice depends on your goal and the level of control you need.

| Feature | Simple Text Prompt | Structured JSON Prompt |
| --- | --- | --- |
| When to Use It | Quick experiments, brainstorming, simple concepts. | Precise control, consistent branding, complex cinematic sequences. |
| Workflow | Fast and iterative for ideation. | Methodical and precise for production. |
| Control Level | Low to Medium. The AI fills in many creative gaps. | High to Absolute. You are the director, cinematographer, and editor. |
| Best For | Finding unexpected creative directions. | Product ads, short films, branded content, reproducible results. |

The Director's Rule of Thumb: If your prompt requires describing more than five distinct cinematic elements (e.g., camera movement, lens, lighting, a specific action, and sound design), it's time to move to JSON.
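The rule of thumb above is simple enough to write down as a check. This is a minimal sketch; `recommend_format` and `JSON_THRESHOLD` are hypothetical names, and the threshold of five comes from this guide rather than from any Veo documentation.

```python
JSON_THRESHOLD = 5  # "more than five distinct cinematic elements", per the rule of thumb


def recommend_format(elements: list[str]) -> str:
    """Return "json" when a shot needs more than five distinct elements, else "text"."""
    # Normalize so that "Lens" and "lens " count as the same element.
    distinct = {e.strip().lower() for e in elements if e.strip()}
    return "json" if len(distinct) > JSON_THRESHOLD else "text"


simple = ["camera movement", "lighting"]
complex_shot = ["camera movement", "lens", "lighting", "action", "sound design", "color palette"]
```

Here `recommend_format(simple)` stays with plain text, while `recommend_format(complex_shot)` crosses the threshold and suggests JSON.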

4. The Language of Cinema - A Glossary for Veo 3

To command Veo 3 like a pro, you must speak its language. These are the core cinematic terms it understands deeply.

Mastering Camera Shots & Composition

  • extreme wide shot: Shows a vast landscape; the subject is tiny. Perfect for establishing scale and location.
  • wide shot / long shot: Shows the subject from head to toe, including significant portions of the environment.
  • medium shot: Frames the subject from the waist up. The standard for dialogue and interaction.
  • close-up: Fills the screen with the subject's face. Used to convey emotion and intimacy.
  • extreme close-up: Isolates a single detail—an eye, a drop of water, the ticking hand of a watch. Creates intense focus.

Mastering Camera Movement

  • static shot / fixed shot: The camera is locked in place. Creates a sense of stability or entrapment.
  • dolly in / out: The entire camera moves closer to (dolly in) or further from (dolly out) the subject. Far more immersive than a simple zoom.
  • tracking shot: The camera travels with a moving subject, alongside, behind, or ahead of it.
  • pan left / right: The camera stays in one place and rotates horizontally. Used to follow action or reveal information.
  • crane shot / drone shot: The camera moves vertically, lifting up or swooping down to reveal the scene.

Mastering Ambiance: Lighting & Color

Lighting is the single most powerful tool for creating mood.

  • soft diffused light: Creates minimal shadows. Ideal for beauty, product shots, and friendly scenes.
  • hard light: Creates sharp, well-defined shadows. Used for drama, tension, and film noir.
  • golden hour: The warm, magical light just after sunrise or before sunset.
  • blue hour: The cool, serene light just before sunrise or after sunset.
  • chiaroscuro: Extreme contrast between light and dark. Highly dramatic and artistic.
  • volumetric lighting: Makes light beams visible, as if passing through fog or dust. Adds depth and atmosphere.
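If you reuse these terms often, a small vocabulary check can catch typos before you spend a generation on them. The sketch below only compares against the terms listed in this glossary; the names `SHOTS`, `MOVES`, `LIGHTING`, and `unknown_terms` are hypothetical, and a term missing from these sets is not necessarily one Veo fails to understand.

```python
# Terms copied from the glossary above; this is not an exhaustive Veo vocabulary.
SHOTS = {"extreme wide shot", "wide shot", "long shot", "medium shot",
         "close-up", "extreme close-up"}
MOVES = {"static shot", "fixed shot", "dolly in", "dolly out", "tracking shot",
         "pan left", "pan right", "crane shot", "drone shot"}
LIGHTING = {"soft diffused light", "hard light", "golden hour", "blue hour",
            "chiaroscuro", "volumetric lighting"}


def unknown_terms(shot: str, move: str, lighting: str) -> list[str]:
    """Return the given terms that do not appear in this guide's glossary."""
    checks = ((shot, SHOTS), (move, MOVES), (lighting, LIGHTING))
    return [term for term, known in checks if term.strip().lower() not in known]
```

An empty result means all three terms match the glossary; anything returned is worth a second look, either as a typo or as a deliberate choice outside this list.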

5. The JSON Deep Dive - The Director's Control Panel

JSON (JavaScript Object Notation) is the ultimate tool for precise control. It turns your prompt into an unambiguous set of commands. Below is the master template, followed by a detailed breakdown of each section.

The Master JSON Template

{
  "model": "veo-3.0-fast",
  "duration": 8,
  "aspect_ratio": "16:9",
  
  "shot": {
    "composition": "medium shot",
    "camera_motion": "slow dolly in",
    "lens": "35mm",
    "frame_rate": "24fps",
    "focus": "shallow depth of field, focus on subject's eyes"
  },
  
  "subject": {
    "primary": "Detailed description of the main subject.",
    "action": "The specific action they are performing.",
    "emotion": "The emotion the subject is conveying.",
    "physics": "realistic gravity, slow motion"
  },
  
  "scene": {
    "location": "The place where the scene occurs.",
    "time_of_day": "dusk",
    "weather": "heavy rain",
    "environment": "Atmospheric details like 'mist rolling through the valley'."
  },
  
  "cinematography": {
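    "style": "The visual style (e.g., 'film noir', 'documentary', 'anime', 'cinematic fantasy').",
    "tone": "The dominant mood (e.g., 'mysterious', 'joyful', 'tense', 'melancholic').",
    "lighting": "The lighting setup (e.g., 'soft diffused natural light', 'golden hour', 'neon')."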
  },

  "visual_details": {
    "effects": ["lens flare", "volumetric light", "sparks"],
    "color_palette": "A description like 'a muted palette of cool blues and greys'."
  },
  
  "audio": {
    "soundtrack": "Type of music (e.g., 'intense electronic score', 'elegant ambient music').",
    "ambient": ["Background sounds (e.g., 'rain hitting a window', 'distant city traffic')."],
    "sfx": ["Specific sound effects (e.g., 'tire screech', 'crystal chime')."]
  },

  "dialogue": {
    "script": "The line of dialogue to be spoken.",
    "voice": "Description of the voice (e.g., 'broken whisper', 'booming narration')."
  },
  
  "visual_rules": {
    "prohibited_elements": ["text", "modern buildings", "people"]
  }
}
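Hand-typed JSON is easy to break with a stray comma or quote. One option is to build the template as a Python dictionary and let the standard `json` module produce the text. The sketch below assumes the resulting JSON string is what you paste or submit as the prompt; `to_prompt_text` is a hypothetical helper name, and dropping empty sections is a design choice of this sketch.

```python
import json


def to_prompt_text(prompt: dict) -> str:
    """Serialize a prompt dict to JSON text, dropping empty top-level sections."""
    empty = ({}, [], "", None)
    cleaned = {key: value for key, value in prompt.items() if value not in empty}
    # json.dumps always emits syntactically valid JSON.
    return json.dumps(cleaned, indent=2, ensure_ascii=False)


# A trimmed-down instance of the master template.
prompt = {
    "shot": {"composition": "medium shot", "camera_motion": "slow dolly in"},
    "subject": {"primary": "A grizzled old fisherman in a thick wool sweater."},
    "dialogue": {},  # no dialogue in this shot, so the section is dropped
    "visual_rules": {"prohibited_elements": ["text"]},
}

text = to_prompt_text(prompt)
```

Removing unused sections keeps the prompt focused on what the shot actually needs, which matters because long or conflicting prompts are harder for the model to follow.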

Field-by-Field Breakdown

shot: Your camera instructions.

  • composition: The shot type (e.g., "close-up").
  • camera_motion: The movement (e.g., "tracking shot").
  • lens: Simulates different camera lenses. Use "35mm" or "50mm" for a standard look, "85mm" for portraits, and "100mm macro" for extreme close-ups.
  • frame_rate: Use "24fps" for a classic cinematic look, "60fps" for smooth action or sports, and "120fps" for dramatic slow motion.
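The lens and frame-rate suggestions above amount to two small lookup tables. A minimal sketch, with the hypothetical names `LENS_FOR`, `FRAME_RATE_FOR`, and `shot_settings`, and with values taken only from the bullets above:

```python
# Values come from the lens and frame_rate bullets above.
LENS_FOR = {"standard": "35mm", "portrait": "85mm", "macro": "100mm macro"}
FRAME_RATE_FOR = {"cinematic": "24fps", "action": "60fps", "slow motion": "120fps"}


def shot_settings(framing: str, look: str) -> dict:
    """Pick lens and frame_rate values for the "shot" section of a prompt."""
    if framing not in LENS_FOR:
        raise ValueError(f"unknown framing {framing!r}; expected one of {sorted(LENS_FOR)}")
    if look not in FRAME_RATE_FOR:
        raise ValueError(f"unknown look {look!r}; expected one of {sorted(FRAME_RATE_FOR)}")
    return {"lens": LENS_FOR[framing], "frame_rate": FRAME_RATE_FOR[look]}


portrait_shot = {"composition": "close-up", **shot_settings("portrait", "cinematic")}
```

The keys "standard", "portrait", "macro", and so on are labels invented for this sketch; only the lens and frame-rate strings themselves end up in the prompt.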

subject: The star of your scene.

  • primary: Be incredibly descriptive here. Include clothing, age, appearance.
  • action: Use strong verbs.
  • physics: Use "realistic gravity" for most things. Use "slow motion" or "zero gravity" for specific effects.

scene: The world your subject inhabits.

  • location, time_of_day, weather: Self-explanatory but crucial for setting the scene.
  • environment: Add the extra details that bring a scene to life.

audio & dialogue: Bringing your scene to life with sound.

  • Think in layers: a soundtrack for music, ambient for background noise, and sfx for specific actions.
  • For dialogue, specify both the script and the voice delivery.

visual_rules: Your negative prompt.

  • prohibited_elements: The most powerful tool for cleaning up your shot. If Veo keeps adding unwanted text, logos, or modern cars to your historical epic, list them here.
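When unwanted elements keep showing up, you will usually grow the prohibited list over several generations. The sketch below adds entries without duplicating them and without modifying the original prompt; `prohibit` is a hypothetical helper, and it assumes the prompt follows the master template's `visual_rules` layout.

```python
import copy


def prohibit(prompt: dict, *elements: str) -> dict:
    """Return a copy of `prompt` with `elements` added to prohibited_elements."""
    result = copy.deepcopy(prompt)  # leave the caller's prompt untouched
    rules = result.setdefault("visual_rules", {})
    existing = rules.setdefault("prohibited_elements", [])
    for element in elements:
        if element not in existing:  # keep the list free of duplicates
            existing.append(element)
    return result


historical_epic = {"visual_rules": {"prohibited_elements": ["text"]}}
cleaned_up = prohibit(historical_epic, "text", "logos", "modern cars")
```

Returning a copy makes it easy to compare the result of the old and new prompt side by side.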

6. Practical Examples - From Adverts to Action

Here are detailed JSON prompts for different genres, serving as powerful templates you can adapt.

Example 1: The High-End Product Ad

{
  "duration": 8,
  "aspect_ratio": "16:9",
  "shot": {
    "composition": "extreme close-up",
    "camera_motion": "slow, elegant arc motion around the subject",
    "lens": "100mm macro",
    "focus": "razor-thin depth of field, focus locked on the watch face"
  },
  "subject": {
    "primary": "A luxury Swiss watch with a dark leather strap and a gleaming silver case. The second hand sweeps smoothly.",
    "action": "The watch is partially submerged in water, with tiny air bubbles rising from the crown in ultra slow motion."
  },
  "scene": {
    "location": "A dark, minimalist studio setting with a black, reflective surface."
  },
  "cinematography": {
    "lighting": "single, soft overhead key light creating elegant reflections",
    "style": "premium, high-end commercial",
    "tone": "sophisticated, luxurious, precise"
  },
  "audio": {
    "soundtrack": "a sparse, minimalist piano melody",
    "sfx": ["subtle, deep tick-tock sound, muffled by water", "a gentle fizz of bubbles"]
  },
  "visual_rules": { "prohibited_elements": ["people", "text", "hands"] }
}

Analysis: This prompt uses a macro lens, slow motion, and specific lighting to create a feeling of luxury and precision, perfect for an advertisement.

Example 2: The Gritty Action Scene

{
  "duration": 8,
  "aspect_ratio": "16:9",
  "shot": {
    "composition": "low angle tracking shot, from behind the protagonist's shoulder",
    "camera_motion": "shaky, chaotic handheld camera, struggling to keep up",
    "frame_rate": "60fps"
  },
  "subject": {
    "primary": "A female detective in a rain-soaked trench coat.",
    "action": "Sprinting down a narrow, crowded alley, dodging obstacles and pushing past people."
  },
  "scene": {
    "location": "A dystopian city at night, inspired by Blade Runner.",
    "weather": "torrential rain",
    "environment": "Steam rises from vents, neon signs reflect in puddles."
  },
  "cinematography": {
    "lighting": "high-contrast, Blade Runner-style neon and shadow",
    "style": "gritty, cinematic cyberpunk thriller",
    "tone": "urgent, tense, chaotic"
  },
  "audio": {
    "soundtrack": "a driving, pulsating synthwave score",
    "ambient": ["shouting in different languages", "heavy rain", "the hum of neon"],
    "sfx": ["splashing footsteps", "labored breathing of the protagonist"]
  }
}

Analysis: The combination of a low angle, shaky handheld camera, 60fps, and a chaotic audio mix creates a visceral sense of urgency and immersion.
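To adapt one of these examples rather than retype it, you can overlay your changes on a copy of the template. This is a sketch of a recursive merge; `adapt` is a hypothetical helper name, and the dictionaries below are shortened stand-ins for the full examples.

```python
import copy


def adapt(template: dict, overrides: dict) -> dict:
    """Return a copy of `template` with `overrides` merged in, section by section."""
    result = copy.deepcopy(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sections so untouched fields survive.
            result[key] = adapt(result[key], value)
        else:
            # Scalars and lists are replaced outright.
            result[key] = copy.deepcopy(value)
    return result


product_ad = {
    "shot": {"composition": "extreme close-up", "lens": "100mm macro"},
    "audio": {"soundtrack": "a sparse, minimalist piano melody"},
}
perfume_ad = adapt(product_ad, {"shot": {"lens": "85mm"}, "audio": {"sfx": ["a soft spray"]}})
```

Lists are replaced rather than merged, so an override for `sfx` or `prohibited_elements` should contain the complete list you want.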

7. Advanced Techniques & Troubleshooting

The Art of Iteration

Never expect the first generation to be perfect. Your workflow should be:

  1. Generate a base shot with the 4 pillars.
  2. Identify what's wrong. Is the lighting too flat? Is the camera movement boring?
  3. Refine ONE element at a time. Tweak only the lighting section and regenerate. Then tweak camera_motion. This isolates variables and teaches you how Veo responds to specific commands.
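The "one element at a time" step can be made mechanical: produce several copies of the base prompt that differ in exactly one field, then generate each and compare. A minimal sketch, where `variants` is a hypothetical helper and the base prompt is a shortened example:

```python
import copy


def variants(base: dict, section: str, field: str, candidates: list[str]) -> list[dict]:
    """Return one copy of `base` per candidate, differing only in base[section][field]."""
    results = []
    for value in candidates:
        variant = copy.deepcopy(base)
        variant.setdefault(section, {})[field] = value
        results.append(variant)
    return results


base = {
    "shot": {"camera_motion": "static shot"},
    "cinematography": {"lighting": "soft diffused light"},
}
lighting_tests = variants(base, "cinematography", "lighting", ["golden hour", "chiaroscuro"])
```

Because everything except the chosen field is identical across the variants, any difference you see in the output is more likely to come from that field than from an unrelated change.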

Controlling Object Count

As a rough guide, Veo 3 tends to stay consistent with up to about 15 similar objects. If you need a precise number, state it clearly. Example: "A wide shot of exactly six lanterns floating on a misty lake." For large crowds, it's better to describe the density and feel rather than an exact number.
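The counting advice above can be sketched as a small phrasing helper: spell out an exact number for small counts and fall back to a density description for large ones. The name `describe_count` is hypothetical, and the cutoff of 15 is this guide's approximate figure, not a documented limit.

```python
CONSISTENT_LIMIT = 15  # approximate figure from this guide, not a documented limit

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen"]


def describe_count(n: int, singular: str, plural: str, crowd_phrase: str) -> str:
    """Phrase an object count: exact for small numbers, descriptive for crowds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return f"a single {singular}"
    if n <= CONSISTENT_LIMIT:
        return f"exactly {NUMBER_WORDS[n]} {plural}"
    # Beyond the limit, describe density and feel instead of a number.
    return crowd_phrase


few = describe_count(6, "lantern", "lanterns", "a dense drift of lanterns covering the water")
many = describe_count(200, "lantern", "lanterns", "a dense drift of lanterns covering the water")
```

With these inputs, `few` gives the exact phrasing used in the example above, and `many` falls back to the density description.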

Common Pitfalls & How to Fix Them

| Problem | Why It Happens | Solution |
| --- | --- | --- |
| Video looks flat or boring. | Lack of specific lighting or camera commands. | Add a detailed lighting description (e.g., "golden hour") and a dynamic camera_motion. |
| Unwanted text/logos appear. | The AI associates the style with branding. | Use the visual_rules object: "prohibited_elements": ["text", "logos", "watermarks"]. |
| Action feels generic. | The action verb is too simple. | Use stronger, more descriptive verbs and adverbs. Instead of "running," try "sprinting frantically." |
| AI ignores part of the prompt. | The prompt is too long or conflicting. | Shorten and focus the prompt on a single core idea per shot. Ensure commands don't contradict each other. |
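Several of these pitfalls can be spotted before you generate anything. The sketch below is a simple pre-flight check against a prompt that follows the master template's layout; `lint` is a hypothetical helper, and the one-word threshold for a "too simple" action is an assumption of this sketch.

```python
def lint(prompt: dict) -> list[str]:
    """Return warnings for common pitfalls in a JSON-style prompt dict."""
    warnings = []
    if not prompt.get("cinematography", {}).get("lighting"):
        warnings.append("no lighting description: the video may look flat")
    if not prompt.get("shot", {}).get("camera_motion"):
        warnings.append("no camera_motion: the video may look flat")
    if not prompt.get("visual_rules", {}).get("prohibited_elements"):
        warnings.append("no prohibited_elements: unwanted text or logos may appear")
    action = prompt.get("subject", {}).get("action", "")
    if len(action.split()) < 2:  # assumed heuristic: a lone verb such as "running"
        warnings.append("action is a single word or missing: use stronger verbs and adverbs")
    return warnings


draft = {"shot": {"composition": "wide shot"}, "subject": {"action": "running"}}
problems = lint(draft)
```

A check like this cannot judge whether commands contradict each other; that still needs a careful read of the prompt.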

Conclusion: You Are the Director

Google Veo 3 is not just a tool; it's a full production studio waiting for a director. The gap between a mediocre AI video and a cinematic masterpiece lies entirely in the quality and intentionality of your prompt.

By moving from vague ideas to specific commands, by learning the language of cinema, and by harnessing the granular control of structured JSON, you can greatly reduce the role of chance and craft videos that closely match the vision in your mind. Start with the four pillars, build your scenes with cinematic language, and use the JSON master template to execute with precision.