WAN 2.5
Cinematic Video Generation with Native Synchronized Audio
Alibaba's WAN 2.5 generates video and sound together. Create 1080p clips up to 10 seconds from a text prompt or a single image, with speech, ambient sound, and effects synchronized to the visuals.
Technical Capabilities
Input Type: Text-to-Video & Image-to-Video
Max Duration: 10 seconds
Resolutions: 480p, 720p, 1080p
Audio Generation: Native Synchronized Audio
Aspect Ratios: 1:1, 16:9, 9:16
Provider: Alibaba
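The limits listed above can be checked before a generation request is sent. The following is a minimal sketch, assuming a hypothetical `validate_settings` helper and hypothetical setting names; it is not part of any official WAN 2.5 SDK, and it only encodes the values stated on this page.

```python
# Hypothetical helper: names and structure are illustrative assumptions,
# not an official WAN 2.5 API. The limits come from the capabilities list.
MAX_DURATION_SECONDS = 10
RESOLUTIONS = {"480p", "720p", "1080p"}
ASPECT_RATIOS = {"1:1", "16:9", "9:16"}


def validate_settings(duration_seconds, resolution, aspect_ratio):
    """Return a list of problems; an empty list means the settings fit the limits."""
    problems = []
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        problems.append(
            f"duration must be above 0 and at most {MAX_DURATION_SECONDS} seconds"
        )
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a form or script report every invalid setting at once.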
Key Features
Native synchronized audio generated together with the video
Speech, ambient sound, and effects matched to the visuals
Text-to-video and image-to-video in a single model
Sharp 1080p output for professional-looking clips
Clips up to 10 seconds long
Strong prompt adherence for camera moves and action
Square, landscape, and portrait aspect ratios
Seed and negative prompt controls for repeatable, steerable results
How It Works
Choose Your Input: Start from a text prompt or upload an image to animate.
Describe Scene & Audio: Write a prompt that includes dialogue, sounds, or music cues.
Generate with Audio: Get a video at up to 1080p with synchronized sound, ready to post.
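The steps above can be sketched as a single request payload. This is a minimal sketch under stated assumptions: `build_request` and every field name (`mode`, `prompt`, `image_path`, `negative_prompt`, `seed`) are hypothetical and do not describe an official WAN 2.5 API. The code only assembles a dictionary and sends nothing.

```python
# Hypothetical sketch: the function and field names are assumptions,
# not an official WAN 2.5 API. No network call is made.
def build_request(scene, audio_cues, image_path=None,
                  resolution="1080p", negative_prompt=None, seed=None):
    """Assemble a generation request from a scene description and audio cues."""
    if not scene.strip():
        raise ValueError("scene description must not be empty")

    # Choose the input: supplying an image switches to image-to-video.
    mode = "image-to-video" if image_path else "text-to-video"

    # Describe scene and audio together in one prompt, skipping empty cues.
    parts = [scene.strip()] + [cue.strip() for cue in audio_cues if cue.strip()]
    prompt = " ".join(parts)

    # Collect everything a generation call would need.
    request = {"mode": mode, "prompt": prompt, "resolution": resolution}
    if image_path:
        request["image_path"] = image_path
    if negative_prompt:
        request["negative_prompt"] = negative_prompt
    if seed is not None:
        request["seed"] = seed
    return request


example = build_request(
    scene="A barista slides a latte across the counter, slow push-in shot.",
    audio_cues=[
        'She says: "Here you go."',
        "Ambient cafe chatter, espresso machine hiss.",
    ],
    seed=42,
)
```

Keeping dialogue and sound cues in the same prompt as the visuals mirrors the page's description of audio being generated together with the video, rather than added in a separate pass.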


