From zero to cinematic AI filmmaker. The definitive guide.
ByteDance's most advanced multimodal AI video generation model — built for filmmakers, creators, and visual storytellers.
Seedance 2.0 is a next-generation video synthesis model that accepts images, videos, and audio as reference inputs through its @ reference system. It generates high-fidelity video with physically realistic motion and natively synchronized audio, combining multiple reference inputs into one coherent cinematic output.
Compared with Sora 2, Kling 3.0, and Veo 3.1, Seedance 2.0 leads in multimodal referencing (up to 12 reference files combined, vs. 1-3 for the others), physics simulation, audio synchronization, and maximum duration. See the full comparison table below.
Upload files and reference them with @ tags in your prompt (for example, @image1 or @audio1). These tags are the key to precise, consistent, multi-source video generation.
Up to 9 image references
Up to 3 video references
Up to 3 audio references
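To make the @ system concrete, here is a minimal Python sketch that checks a prompt's tags against what you uploaded before you submit it. It assumes tags follow the @image1 / @audio1 pattern used in the templates below (the @video1 form is assumed by analogy), and it treats 12 as a combined cap across all three types, an assumption that reconciles the per-type limits with the 12-file figure in the comparison table. `Uploads` and `check_references` are hypothetical helpers, not part of any Seedance API.

```python
import re
from dataclasses import dataclass

# Per-type caps as listed in this guide. The combined cap of 12 is an
# assumption based on the "12" figure in the comparison table.
MAX_PER_KIND = {"image": 9, "video": 3, "audio": 3}
MAX_TOTAL = 12

# Matches tags like @image1 or @audio1; the @videoN form is assumed by analogy.
TAG_RE = re.compile(r"@(image|video|audio)(\d+)")


@dataclass
class Uploads:
    """Hypothetical record of how many files of each type were uploaded."""
    image: int = 0
    video: int = 0
    audio: int = 0


def check_references(prompt: str, uploads: Uploads) -> list[str]:
    """Return a list of problems; an empty list means the prompt looks consistent."""
    counts = {"image": uploads.image, "video": uploads.video, "audio": uploads.audio}
    problems = []
    # Enforce the per-type limits.
    for kind, n in counts.items():
        if n > MAX_PER_KIND[kind]:
            problems.append(f"too many {kind} files: {n} > {MAX_PER_KIND[kind]}")
    # Enforce the assumed combined limit.
    total = sum(counts.values())
    if total > MAX_TOTAL:
        problems.append(f"too many files in total: {total} > {MAX_TOTAL}")
    # Every tag must point at an uploaded file (tags are 1-indexed per type).
    for kind, index in sorted(set(TAG_RE.findall(prompt))):
        if not 1 <= int(index) <= counts[kind]:
            problems.append(f"@{kind}{index} has no matching upload")
    return problems


prompt = "Aragorn (@image1) raises his sword. Gandalf (@image2) strikes, synced to @audio1."
print(check_references(prompt, Uploads(image=2, audio=1)))  # []
print(check_references(prompt, Uploads(image=1)))  # flags @audio1 and @image2
```

Catching a dangling tag locally is cheaper than discovering it after a generation that silently ignored the reference.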
The golden formula for high-quality Seedance 2.0 prompts: Subject + Action + Camera + Scene + Style + Constraints.
Subject: Who or what is in the frame. Be specific about age, clothing, and expression.
"A 25-year-old woman with long black hair, wearing a red leather jacket"
Action: What is happening. Use precise verbs and speeds.
"sprinting at full speed" not "running fast"
Camera: Specify exact camera moves for cinematic control.
"slow dolly-in from medium shot to close-up" or "tracking shot at eye level"
Scene: Environment, lighting, time of day, and weather.
"neon-lit Tokyo alley at night, rain-slicked streets, steam rising"
Style: Visual style, mood, color grading, and film references.
"Blade Runner 2049 color palette, anamorphic lens flares"
Constraints: Quality locks and consistency rules.
"no face distortion, maintain clothing consistency, high detail, cinematic quality"
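The formula maps naturally onto a small data structure. The Python sketch below (the `PromptParts` and `build_prompt` names are hypothetical) joins subject and action into one opening sentence, then appends camera, scene, style, and constraints as separate sentences, skipping any component left empty. The ordering follows the list above; it is a writing convention, not a documented requirement of the model.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """The six components of the prompt formula (hypothetical structure)."""
    subject: str
    action: str
    camera: str = ""
    scene: str = ""
    style: str = ""
    constraints: str = ""


def _sentence(text: str) -> str:
    """Trim, capitalize the first letter, and end with a single period."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:] + "." if text else ""


def build_prompt(p: PromptParts) -> str:
    # Subject and action read as one sentence: "A woman ..., sprinting ...".
    opening = ", ".join(s.strip().rstrip(".") for s in (p.subject, p.action) if s.strip())
    rest = (p.camera, p.scene, p.style, p.constraints)
    sentences = [_sentence(opening)] + [_sentence(s) for s in rest]
    # Empty components produce "" and are dropped here.
    return " ".join(s for s in sentences if s)


print(build_prompt(PromptParts(
    subject="A 25-year-old woman with long black hair, wearing a red leather jacket",
    action="sprinting at full speed",
    camera="slow dolly-in from medium shot to close-up",
    scene="neon-lit Tokyo alley at night, rain-slicked streets, steam rising",
    style="Blade Runner 2049 color palette, anamorphic lens flares",
    constraints="no face distortion, maintain clothing consistency, high detail, cinematic quality",
)))
```

Keeping the components as separate fields makes it easy to hold subject and style fixed while iterating only on camera or action between generations.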
Battle-tested templates and example prompts for common use cases.
Break your video into timed segments for precise story control. Best for narrative sequences, trailers, and short films.
【Basic Settings】
Duration: 15 seconds
Style: [Your visual style]
【Timeline】
0-5s (Setup): Visual + Camera + Details
5-10s (Conflict): Visual + Lighting + Action
10-15s (Climax): Visual + Environment + Ending
【Audio】
SFX + Dialogue + Music
【Basic Settings】
Duration: 15 seconds
Style: Cinematic fantasy, Lord of the Rings aesthetic, epic scale
【Timeline】
0-5s (Setup): Wide crane shot descending over Minas Tirith at dawn. Thousands of orcs march toward the white city. Golden sunlight breaks through storm clouds. Camera slowly pushes forward.
5-10s (Conflict): Aragorn (@image1) raises his sword; the cavalry charges. Quick cuts between clashing armies. Dust and debris fill the air. Dramatic lighting shifts from warm to cold blue. Tracking shot following the charge.
10-15s (Climax): Gandalf (@image2) appears on a hilltop with blinding white light. A massive shockwave pushes back the enemy forces. Camera pulls back to reveal the full battlefield. Light overtakes darkness.
【Audio】
Epic orchestral score building to a crescendo. Sword clashes, horse hooves, battle cries. Gandalf's staff impact creates a thunderous boom synced to @audio1.
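The timeline template is regular enough to generate. The Python sketch below, built around a hypothetical `timeline_prompt` helper, assumes you want equal-length segments: it divides the duration by the number of story beats and writes each beat as a "start-end s (label): description" line under the template's 【】 headers.

```python
def _secs(x: float) -> str:
    """Format seconds compactly: 5.0 -> '5', 7.5 -> '7.5'."""
    return f"{round(x, 1):g}"


def timeline_prompt(style: str, beats: list[tuple[str, str]],
                    duration: int = 15, audio: str = "") -> str:
    """Lay out (label, description) story beats as equal-length timeline segments."""
    if not beats:
        raise ValueError("need at least one beat")
    seg = duration / len(beats)  # equal split is an assumption, not a model rule
    lines = ["【Basic Settings】", f"Duration: {duration} seconds",
             f"Style: {style}", "【Timeline】"]
    for i, (label, description) in enumerate(beats):
        start, end = i * seg, (i + 1) * seg
        lines.append(f"{_secs(start)}-{_secs(end)}s ({label}): {description}")
    if audio:
        lines += ["【Audio】", audio]
    return "\n".join(lines)


print(timeline_prompt(
    "Cinematic fantasy, epic scale",
    [("Setup", "Wide crane shot over the city at dawn."),
     ("Conflict", "Cavalry charge, tracking shot."),
     ("Climax", "Blinding white light, camera pulls back.")],
    audio="Epic orchestral score building to a crescendo.",
))
```

With three beats over 15 seconds this reproduces the 0-5s / 5-10s / 10-15s layout of the template; uneven pacing would need explicit start and end times instead of an equal split.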
Storyboard-style control with a 3×3 panel grid. Perfect for action scenes, fight choreography, and dynamic sequences.
Step 1 - Environment: [Setting description]
Step 2 - Character: [Character appearance based on @image refs]
Step 3 - Nine Panels:
Line 1: Panel 1 | Panel 2 | Panel 3
Line 2: Panel 4 | Panel 5 | Panel 6
Line 3: Panel 7 | Panel 8 | Panel 9
Step 4 - Style: [Visual style and constraints]
Step 1 - Environment: Ancient stone cathedral interior, shattered stained glass windows, volumetric god rays piercing through dust.
Step 2 - Character: A holy paladin (@image1), golden plate armor, glowing blue eyes, wielding a massive luminous greatsword. Battle-worn, determined expression.
Step 3 - Nine Panels:
Line 1: Wide shot - paladin enters dark cathedral | Medium - paladin spots demon enemy | Close-up - paladin grips sword, eyes glow
Line 2: Tracking shot - paladin charges forward | Impact - sword clash, sparks fly | Low angle - paladin pushed back, slides on stone
Line 3: Close-up - paladin's hand glows with holy light | Wide - massive radiant explosion fills cathedral | Medium - paladin stands victorious, dust settling
Step 4 - Style: Dark fantasy cinematic, Diablo/Dark Souls aesthetic. High contrast lighting, particle effects, motion blur on attacks. Maintain character appearance and armor consistency. Realistic physics: debris, fabric movement, impact reactions.
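A nine-panel prompt only works if there are exactly nine beats in reading order. The Python sketch below, built around a hypothetical `nine_panel_prompt` helper, enforces that count and lays the panels out as three lines of three, left to right and top to bottom, following the four-step template above.

```python
def nine_panel_prompt(environment: str, character: str,
                      panels: list[str], style: str) -> str:
    """Fill the four-step, 3x3 storyboard template."""
    if len(panels) != 9:
        raise ValueError(f"expected 9 panels, got {len(panels)}")
    # Three rows of three panels, in reading order.
    rows = [" | ".join(panels[r * 3:(r + 1) * 3]) for r in range(3)]
    return "\n".join([
        f"Step 1 - Environment: {environment}",
        f"Step 2 - Character: {character}",
        "Step 3 - Nine Panels:",
        *(f"Line {r + 1}: {row}" for r, row in enumerate(rows)),
        f"Step 4 - Style: {style}",
    ])


print(nine_panel_prompt(
    "Ancient stone cathedral interior, volumetric god rays.",
    "A holy paladin (@image1), golden plate armor.",
    ["Wide shot - paladin enters", "Medium - spots demon", "Close-up - grips sword",
     "Tracking - charges", "Impact - sword clash", "Low angle - pushed back",
     "Close-up - hand glows", "Wide - radiant explosion", "Medium - stands victorious"],
    "Dark fantasy cinematic, high contrast lighting.",
))
```

Failing loudly on eight or ten panels is deliberate: a short list would silently shift every later beat into the wrong grid cell.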
Clean, professional product showcase. Ideal for e-commerce listings, social media ads, and brand content.
A minimalist black matte mechanical keyboard on a pure white infinite studio background, rotating smoothly 360 degrees clockwise. RGB lighting gently breathing. Keycap text sharp and readable. Fixed macro camera, smooth turntable motion, commercial product photography style, soft high-key lighting, no noise. Logo and text remain perfectly consistent.
Soft, atmospheric anime style with gentle camera movement. Perfect for lo-fi visuals, AMVs, and character-driven moments.
An 18-year-old Japanese anime girl with short hair, wearing a white dress and straw hat, standing on a forest path in warm summer afternoon sunlight. She slowly turns toward the camera and smiles gently. A light breeze moves her hair and dress. The camera slowly pushes in from medium shot to close-up. Soft natural lighting, film grain, healing and peaceful mood, cinematic quality. Maintain face and clothing consistency, no distortion, high detail.
Audio-reactive generation synced to uploaded music. Uses an @audio reference (such as @audio1) to drive editing rhythm and visual intensity.
A trendy cyberpunk girl dancing in a neon city street at night. Every strong beat triggers a cut or speed-ramped camera move. Neon signs reflecting on wet ground. Cyberpunk style, fast-paced editing, multi-shot continuity. Dance movements and character appearance remain consistent. Beat-synced to @audio1.
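According to this guide, the @audio1 reference itself drives the cut rhythm, so no manual timing is required. If you also want a hand-written timeline prompt whose segments line up with the music, the arithmetic is simple: one beat lasts 60 / BPM seconds. The Python sketch below (`cut_points` is a hypothetical helper) assumes a constant, known tempo and a track that starts on a downbeat, and lists where a cut every N beats would fall.

```python
def cut_points(bpm: float, duration: float, beats_per_cut: int = 4) -> list[float]:
    """Times in seconds of every `beats_per_cut`-th beat strictly inside (0, duration)."""
    if bpm <= 0 or beats_per_cut <= 0:
        raise ValueError("bpm and beats_per_cut must be positive")
    interval = 60.0 / bpm * beats_per_cut  # seconds between cuts
    times = []
    k = 1
    while k * interval < duration:
        times.append(round(k * interval, 3))
        k += 1
    return times


# A 15-second clip over a 120 BPM track, cutting once per 4-beat bar.
print(cut_points(120, 15))  # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

Cutting every bar rather than every beat is a stylistic choice; at 120 BPM a cut on every beat would mean a new shot every half second.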
Dynamic martial arts action with environmental interaction. Shows Seedance 2.0's physics and character consistency strengths.
A wuxia-style male hero (based on @image1), wearing black martial outfit, fighting enemies in a rainy bamboo forest at night. Fast sword combos with visible sword light trails and splashing water. Fast follow camera, crane shots, and quick close-ups. Cinematic camera language. Maintain character appearance and clothing consistency. Realistic physics, wet fabric, rain interaction.
Expert-level advice to get the most out of every generation.
How Seedance 2.0 stacks up against the competition.
| Feature | Seedance 2.0 | Sora 2 | Kling 3.0 | Veo 3.1 |
|---|---|---|---|---|
| Max Duration | 15s | 10s | 10s | 8s |
| Resolution | 1080p / 2K | 1080p | 1080p | 1080p |
| File Inputs | Up to 12 combined (max 9 img, 3 vid, 3 audio) | 1 image | 1-3 images | 1 image |
| Audio Sync | Native (beat-synced) | Native | Basic | Native |
| Physics | Advanced (fluid, cloth, debris) | Good | Good | Good |
| Multi-shot | Timeline & 9-panel | Storyboard | Basic | None |
| Character Consistency | Excellent (multi-ref) | Good | Good | Moderate |
| Best For | Cinematic films, music videos, multi-ref storytelling | Creative exploration | Quick clips, social content | Realistic scenes |
Everything you need to start creating with Seedance 2.0.
ByteDance's official documentation, model details, and API access: seed.bytedance.com
Access Seedance 2.0 through CapCut's creative platform for video editing and generation: dreamina.capcut.com
Community-curated prompt guide with tips, examples, and best practices: atlabs.ai
Detailed side-by-side comparison of Seedance 2.0 vs. Kling 3.0, Sora 2, and Veo 3.1: wavespeed.ai