Video Model

WAN 2.5

Cinematic Video Generation with Native Synchronized Audio

Alibaba's WAN 2.5 generates video and sound together. Create 1080p clips up to 10 seconds from a text prompt or a single image, with speech, ambient sound, and effects synchronized to the visuals.

Technical Capabilities

Input Type: Text-to-Video & Image-to-Video

Max Duration: 10 seconds

Resolutions: 480p, 720p, 1080p

Audio Generation: Native Synchronized Audio

Aspect Ratios: 1:1, 16:9, 9:16

Provider: Alibaba
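The limits listed above can be checked before a generation is submitted. The following is a minimal sketch, not an official client: `GenerationSettings`, its field names, and the default duration of 5 seconds are all hypothetical and chosen only for illustration. Only the allowed resolutions, aspect ratios, and the 10-second cap come from this page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits copied from the capability list on this page.
RESOLUTIONS = ("480p", "720p", "1080p")
ASPECT_RATIOS = ("1:1", "16:9", "9:16")
MAX_DURATION_SECONDS = 10


@dataclass
class GenerationSettings:
    """Hypothetical settings container; names are illustrative, not a real API."""
    prompt: str
    resolution: str = "1080p"
    aspect_ratio: str = "16:9"
    duration_seconds: int = 5          # arbitrary default, an assumption
    image_path: Optional[str] = None   # set for image-to-video
    seed: Optional[int] = None
    negative_prompt: Optional[str] = None

    def mode(self) -> str:
        """Return which of the two input types these settings describe."""
        return "image-to-video" if self.image_path else "text-to-video"

    def problems(self) -> List[str]:
        """Return a list of human-readable violations (empty if valid)."""
        found = []
        if not self.prompt.strip():
            found.append("prompt is empty")
        if self.resolution not in RESOLUTIONS:
            found.append(f"unsupported resolution: {self.resolution}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            found.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if not 0 < self.duration_seconds <= MAX_DURATION_SECONDS:
            found.append(f"duration must be 1 to {MAX_DURATION_SECONDS} seconds")
        return found
```

For example, `GenerationSettings(prompt="x", resolution="4k", duration_seconds=12).problems()` reports two violations, one for the resolution and one for the duration.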

Key Features

Native synchronized audio generated together with the video

Speech, ambient sound, and effects matched to the visuals

Text-to-video and image-to-video in a single model

Sharp 1080p output for professional-looking clips

Clips up to 10 seconds long

Strong prompt adherence for camera moves and action

Square, landscape, and portrait aspect ratios

Advanced seed and negative prompt controls

How It Works

1. Choose Your Input: Start from a text prompt or upload an image to animate

2. Describe Scene & Audio: Write a prompt that includes dialogue, sounds, or music cues

3. Generate with Audio: Get a 1080p video with synchronized sound, ready to post
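Step 2 asks for a prompt that carries both the scene and its audio. One way to keep those parts organized is sketched below. The helper `build_prompt` and the phrasing it emits ("The character says:", "Sounds:", "Music:") are assumptions made for illustration. This page does not specify a required prompt format.

```python
from typing import Optional, Sequence


def build_prompt(
    scene: str,
    dialogue: Optional[str] = None,
    sounds: Sequence[str] = (),
    music: Optional[str] = None,
) -> str:
    """Join a scene description with optional audio cues into one prompt.

    Hypothetical helper: the labels used here are an illustrative
    convention, not a documented WAN 2.5 syntax.
    """
    # Normalize the scene so it ends with exactly one period.
    parts = [scene.strip().rstrip(".") + "."]
    if dialogue:
        parts.append(f'The character says: "{dialogue.strip()}"')
    if sounds:
        parts.append("Sounds: " + ", ".join(s.strip() for s in sounds) + ".")
    if music:
        parts.append("Music: " + music.strip().rstrip(".") + ".")
    return " ".join(parts)


prompt = build_prompt(
    "A barista pours latte art in a sunlit cafe",
    dialogue="Good morning!",
    sounds=["milk steaming", "cups clinking"],
    music="soft jazz",
)
```

Keeping the visual description and each audio cue as separate arguments makes it easy to vary one element, such as the dialogue line, while holding the rest of the scene fixed.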

Perfect For

Talking Head Videos

AI Influencer Content

Social Media Reels

Product Demos

Story Narration

Dialogue Scenes

Marketing Videos

Character Animation

Ready to Create with WAN 2.5?

Get 5 free credits to start generating
