How to Use the Google Gemini Workflow (Nano Banana → Veo 3) to Create AI Images & Videos

If you work in content, product marketing, or social, you’ve probably noticed a pattern: the best teams move from concept to asset in hours, not weeks. One of the simplest ways to do that right now is a two-step Gemini workflow. First, generate crisp stills with Gemini 2.5 Flash Image (often nicknamed “Nano Banana”). Then bring them to life with Veo 3 or Veo 3.1. You’ll create studio-quality hero images and animate them into short, cinematic clips without touching an SDK or opening a timeline editor. This post breaks the process down, shares paste-ready prompts, and flags the small decisions that make a big visual difference.

Why this flow works

Nano Banana is a fast, high-quality AI Image Generator designed for natural-language prompts and conversational edits. You describe a scene, and it returns a clean frame you can use as a hero shot or a design base. Veo 3 is an AI Video Generator that accepts either a text prompt or a still image. It outputs short, high-fidelity clips in widescreen (16:9) or vertical (9:16) aspect ratios. Paired together, they form a dependable Image to Video AI loop: write → generate → animate. It’s simple enough for daily content and precise enough for product work, spec ads, teasers, and B-roll.

What you’ll need (five minutes, tops)

A Google account with access to a Gemini interface that offers image generation (for example, Google AI Studio).
Access to Veo 3 or Veo 3.1 for video generation.
A rough idea of your final format: 16:9 for YouTube/landing pages or 9:16 for Reels/Shorts.

Step 1: Generate images from text with Nano Banana

Open the image workspace and switch to the Nano Banana model.
You’ll write prompts in plain English (or your language of choice). No special syntax is required.
Write your first prompt like a director, not a poet.
Spell out the subject, lighting, lens/angle, background, and mood. For example:
“Studio photo of a brushed-aluminum water bottle on slate; soft rim light; 85 mm look; shallow depth-of-field; minimal background.”
That one sentence covers product, surface, lighting, lens feel, depth, and scene tone.
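If you keep prompts in a doc or a script, a tiny helper can enforce this director-style structure. The sketch below is a hypothetical Python function; `build_image_prompt` is not part of any Gemini tool. It only assembles the string you paste into Nano Banana. Fed the details above, it reproduces the water-bottle prompt.

```python
def build_image_prompt(subject: str, lighting: str, lens: str, *details: str) -> str:
    """Assemble a director-style image prompt (hypothetical helper).

    subject, lighting and lens are required so every prompt names them;
    extra details such as depth, background or mood are optional.
    """
    parts = [subject, lighting, lens, *details]
    # Trim whitespace and stray trailing punctuation, then drop empty parts.
    cleaned = [p.strip().rstrip(".;") for p in parts]
    return "; ".join(p for p in cleaned if p) + "."


if __name__ == "__main__":
    print(build_image_prompt(
        "Studio photo of a brushed-aluminum water bottle on slate",
        "soft rim light",
        "85 mm look",
        "shallow depth-of-field",
        "minimal background",
    ))
```

Making subject, lighting, and lens required arguments means you can't accidentally skip one. Everything else stays optional, so prompts remain short.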
Iterate conversationally.
Ask for variants: “warmer light,” “top-down angle,” “square crop,” “more micro-scratches on metal,” “rotate to 30° three-quarter view.” You can also request quick edits like background color shifts, dust removal, or composition tweaks.
Pick 1–3 keepers and download them.
Favor clean silhouettes, controlled highlights, and compositions that leave room for copy or UI. If you’re planning vertical video, generate a portrait-friendly (9:16) still now. Veo can work from a frame with a different shape, but a still that already matches your target ratio gives the most predictable framing.

Pro image-prompt tips

Combine one pose or mood adjective (e.g., “still,” “poised,” “calm”) with one lighting note (“rim-lit,” “golden hour,” “softbox”) and one camera clue (“85 mm look,” “top-down”).
Avoid crowded prompts. Five strong details beat fifteen weak ones.
If you need multiple shots with the same subject (e.g., a product in different settings), generate a small set now. That consistency pays off in Step 2, especially if you reuse the stills as Veo 3.1 reference images.
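The “five strong details” rule is easy to check mechanically. Here is a rough, hypothetical linter in Python. It counts comma- or semicolon-separated details and looks for a lighting note and a camera clue. `LIGHTING_HINTS` and `CAMERA_HINTS` are illustrative keyword lists, not an official vocabulary, so treat the output as a nudge rather than a verdict.

```python
import re

# Illustrative keyword lists based on the tips above; extend them with your
# own vocabulary. Matching is a plain substring check, so it is approximate.
LIGHTING_HINTS = ("light", "rim", "golden hour", "softbox")
CAMERA_HINTS = (" mm", "top-down", "angle", "view", "close-up")


def count_details(prompt: str) -> int:
    """Count comma- or semicolon-separated details in a prompt."""
    return sum(1 for part in re.split(r"[;,]", prompt) if part.strip(" ."))


def lint_prompt(prompt: str, max_details: int = 5) -> list[str]:
    """Return human-readable warnings; an empty list means no issues found."""
    issues = []
    n = count_details(prompt)
    if n > max_details:
        issues.append(f"{n} details; trim to {max_details} or fewer")
    text = prompt.lower()
    if not any(hint in text for hint in LIGHTING_HINTS):
        issues.append("no lighting note")
    if not any(hint in text for hint in CAMERA_HINTS):
        issues.append("no camera clue")
    return issues
```

One known limitation: a comma inside a single phrase, such as “calm, minimal grade,” counts as two details.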

Step 2: Animate those images with Veo 3 / Veo 3.1

Switch to the video workspace and select Veo.
You can generate from text only, but here we’ll use your Nano Banana still(s) for image-to-video.
Attach your image and write a concise directing note.
Think motion + camera + mood. Two paste-ready examples:

Product hero (16:9, ~8 s)
“Slow dolly-in across the bottle; gentle studio reflections; subtle parallax; calm, minimal grade.”
Vertical character vibe (9:16, ~6–8 s)
“Head-to-waist framing; soft breathing and natural hair sway; neon background lights pulse; subtle handheld micro-shake.”
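The same motion + camera + mood pattern can be templated for Veo. This hypothetical helper joins one camera move, a few supporting details, and a mood into a single directing note. It refuses to stack more supporting details than you allow. The default cap of two is an arbitrary choice that happens to fit the product-hero example above.

```python
def build_veo_prompt(camera_move: str, supporting: list[str], mood: str,
                     max_supporting: int = 2) -> str:
    """Join camera move, supporting details, and mood into one directing note.

    Hypothetical helper. Capping the supporting details follows the advice to
    avoid stacking many competing motions in a single clip.
    """
    if len(supporting) > max_supporting:
        raise ValueError(
            f"{len(supporting)} supporting details; keep it to {max_supporting} or fewer"
        )
    parts = [camera_move, *supporting, mood]
    return "; ".join(p.strip() for p in parts if p.strip()) + "."


if __name__ == "__main__":
    print(build_veo_prompt(
        "Slow dolly-in across the bottle",
        ["gentle studio reflections", "subtle parallax"],
        "calm, minimal grade",
    ))
```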

Set the key parameters before you hit Generate.

Aspect ratio: choose 16:9 for landscape or 9:16 for vertical.
Resolution: pick 720p for fast drafts or 1080p when you need polish.
Clip length: short clips (around 4–8 s) deliver the most polish per second and suit teasers and social intros.
Draft count: if the UI offers multiple results, start with two. It’s faster than over-generating.
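If you track settings in a script or spreadsheet, it can help to validate them before each run. The dataclass below is a hypothetical sketch, not Veo’s API. The allowed values simply mirror this post: 16:9 or 9:16, 720p or 1080p, roughly 4–8 s, and two drafts to start. `final_pass` encodes the checklist habit of drafting at 720p and finishing at 1080p.

```python
from dataclasses import dataclass, replace

# Values mirror the options discussed in this post; they are not pulled from
# any Veo API and may not match every interface.
ASPECT_RATIOS = {"16:9", "9:16"}
RESOLUTIONS = {"720p", "1080p"}
MIN_SECONDS, MAX_SECONDS = 4, 8


@dataclass(frozen=True)
class ClipSettings:
    aspect_ratio: str = "16:9"
    resolution: str = "720p"   # draft quality by default
    seconds: int = 8
    drafts: int = 2            # start with two takes

    def __post_init__(self) -> None:
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        if not MIN_SECONDS <= self.seconds <= MAX_SECONDS:
            raise ValueError(
                f"clip length {self.seconds} s is outside {MIN_SECONDS}-{MAX_SECONDS} s"
            )
        if self.drafts < 1:
            raise ValueError("request at least one draft")


def final_pass(draft: ClipSettings) -> ClipSettings:
    """Same framing and length as the locked draft, rerun once at 1080p."""
    # replace() builds a new instance, so __post_init__ validates it again.
    return replace(draft, resolution="1080p", drafts=1)
```

Freezing the dataclass keeps a locked draft from being edited by accident. Any change produces a new, revalidated object.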

Review and download your best take.
Look for believable micro-motion (reflections, breathing, camera sway) and avoid over-animated scenes that distract from your subject.

Power user moves (Veo 3.1)

Reference images (up to three): keep a product or character consistent across outputs by attaching multiple stills that show the same subject.
First/last-frame control: define the opening and closing frames to control blocking and motion path; Veo fills in the in-between.
Extend a take: when you love a shot but need more time, extend the clip rather than re-rolling from scratch.
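To keep these continuity features straight across a shot list, you can record them per shot. This is a hypothetical planning structure; `ShotPlan` and the file names are placeholders, not a Veo data type. It checks the up-to-three reference-image limit mentioned above. As an assumption of this sketch only, it expects first and last frames to be set as a pair.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_REFERENCE_IMAGES = 3  # the "up to three" limit described above


@dataclass
class ShotPlan:
    """Hypothetical per-shot record; not a Veo data type."""
    prompt: str
    reference_images: list[str] = field(default_factory=list)
    first_frame: Optional[str] = None
    last_frame: Optional[str] = None

    def validate(self) -> list[str]:
        """Return a list of problems; empty means the plan looks usable."""
        problems = []
        if len(self.reference_images) > MAX_REFERENCE_IMAGES:
            problems.append(
                f"{len(self.reference_images)} reference images; "
                f"use at most {MAX_REFERENCE_IMAGES}"
            )
        # Assumption of this sketch: opening and closing frames come as a pair.
        if (self.first_frame is None) != (self.last_frame is None):
            problems.append("set both first_frame and last_frame, or neither")
        return problems
```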

Copy-ready recipes (steal these)

Nano Banana → Veo 3 Product Shot

Image prompt: “Studio photo of a matte-black tumbler on dark slate, soft rim light, 85 mm lens, f/2.8, minimal background.”
Veo prompt: “Eight-second slow dolly-in; glossy reflections; faint atmospheric haze; clean, cinematic grade.”

Vertical Social Clip

Image prompt: “Streetwear portrait; rainy pavement reflections; neon signage; three-quarter angle; creamy bokeh.”
Veo prompt: “9:16 portrait; gentle camera sway; light rain particles; neon flicker; shallow depth-of-field.”

Character Consistency (Veo 3.1)

Image set: three Nano Banana stills of the same character (full body, close-up, neutral pose).
Veo prompt: “Dusk skateboard glide; wet pavement; soft sodium streetlights; track from wide to medium; natural motion, not exaggerated.”

Troubleshooting the little things that matter

Your image crops strangely on generation.
The safest inputs for image-to-video are frames that already match your target aspect ratio (16:9 or 9:16) and are at least 720 px on the shortest side. If you feed a square image into a vertical output, the system may apply a center-weighted crop and cut off the edges of your composition.
Your scene feels ‘floaty’ or too synthetic.
Reduce the number of competing motions. Keep one primary action (e.g., slow push-in) and one ambient effect (e.g., gentle fog). Over-specifying five motions at once can introduce jitter.
Reflections look off on glossy products.
In your prompt, name the surface (“matte,” “satin,” “polished”) and the light source (“softbox,” “rim light,” “overhead practicals”). Subtle lighting cues help the model infer the correct reflection behavior.
Faces drift between shots.
Use reference images in Veo 3.1. Pick clear, well-lit frames and keep wardrobe/hair consistent.
You need an exact landing pose or layout.
Use first/last-frame control to lock your starting and ending compositions. It’s the quickest way to hit precise blocking without over-tweaking prompts.
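Before uploading a still, you can preview what a center-weighted crop would keep. This Python sketch computes the largest centered box for a target ratio, returned as (left, top, right, bottom). It then checks the 720 px shortest-side guideline. It only approximates a center crop and makes no claim about how Veo actually reframes images.

```python
def center_crop_box(width: int, height: int, ratio: str) -> tuple[int, int, int, int]:
    """Largest centered box with the target ratio, as (left, top, right, bottom)."""
    rw, rh = (int(x) for x in ratio.split(":"))
    if width * rh > height * rw:
        # Image is wider than the target: keep full height, trim the sides.
        crop_w, crop_h = height * rw // rh, height
    else:
        # Image is taller (or equal): keep full width, trim top and bottom.
        crop_w, crop_h = width, width * rh // rw
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def ready_for_video(width: int, height: int, ratio: str, min_short_side: int = 720) -> bool:
    """True if the centered crop still meets the shortest-side guideline."""
    left, top, right, bottom = center_crop_box(width, height, ratio)
    return min(right - left, bottom - top) >= min_short_side
```

Consider a 1024×1024 square cropped to 9:16. Only a 576×1024 strip survives, which is below 720 px on the short side. Generating a native 9:16 still is the better move.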

Best-practice checklist (print this)

Decide format early. Commit to 16:9 (site/YouTube) or 9:16 (social) before generating, so your stills and motion align.
Write compact prompts. One motion verb + one camera move + one mood.
Keep a style bible. As you find looks that work—lighting terms, lens language, color notes—save them as reusable prompt fragments.
Stay responsible. Gemini’s image outputs include an invisible SynthID watermark; label AI-generated assets clearly in your pipeline and review usage policies when depicting people or brands.
Ship in passes. Draft at 720p, lock the concept, then rerun at 1080p for the final.
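A style bible can be as simple as a dictionary of named fragment lists. The entries below are lifted from this post’s own prompts. The look names (`studio_product`, `neon_street`, `hero_motion`) and the `apply_style` helper are hypothetical.

```python
# Reusable prompt fragments, grouped by look. Names are illustrative.
STYLE_BIBLE = {
    "studio_product": ["soft rim light", "85 mm look", "shallow depth-of-field", "minimal background"],
    "neon_street": ["rainy pavement reflections", "neon signage", "creamy bokeh"],
    "hero_motion": ["slow dolly-in", "glossy reflections", "clean, cinematic grade"],
}


def apply_style(subject: str, *styles: str) -> str:
    """Append the fragments of one or more named looks to a subject line."""
    fragments: list[str] = []
    for name in styles:
        for frag in STYLE_BIBLE[name]:  # KeyError flags a misspelled look name
            if frag not in fragments:   # skip fragments shared by two looks
                fragments.append(frag)
    return "; ".join([subject, *fragments]) + "."
```

Keeping fragments as data means that a change to one lighting term updates every prompt that uses that look.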

Where this shines

Product teasers: spin a single hero still into a looping 6–8-second intro for your landing page or YouTube bumper.
UGC-style ads: vertical portraits with gentle camera motion feel native in feeds and work well with minimal overlays.
Brand storytelling: stitch a few short Veo clips—kept consistent with reference images—into a micro-narrative for socials.
Pitch decks & sizzles: generate tailored visuals for mockups or animatics without waiting on a design sprint.

Wrap-up

That’s the full loop: AI Image Generator (Nano Banana) → Image to Video AI (Veo 3). Keep your prompts simple, choose the right aspect ratio up front, and lean on reference images or first/last-frame control when continuity really matters. With a dozen reusable prompt fragments and this two-step flow, you can go from a blank page to a polished video in the same session: no plugins, no timeline, no drama.