Wan 2.6

Alibaba Cloud · Video Generation
POST /v1/videos/generations
Multimodal video generation model for cinematic, multi-shot stories with native audio-visual sync (lip-sync, dialogue, music, SFX).
Notes
Generation can take 5 minutes or more. Modes: t2v (text-to-video), i2v (image-to-video), r2v (reference-to-video). Flash Mode (i2v/r2v only) provides faster generation at reduced cost.
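The mode rules in this note can be checked client-side before a job is submitted. The following is a minimal sketch only: the function name `check_mode` and the `flash` argument are illustrative assumptions and are not part of the documented request schema.

```python
# Modes named in the notes above.
MODES = {"t2v", "i2v", "r2v"}
# Flash Mode is documented as available for i2v and r2v only.
FLASH_MODES = {"i2v", "r2v"}


def check_mode(mode: str, flash: bool = False) -> None:
    """Raise ValueError if the mode/flash combination is not documented.

    `mode` and `flash` are hypothetical argument names for this sketch;
    consult the machine-readable schema for the real parameter names.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    if flash and mode not in FLASH_MODES:
        raise ValueError(f"Flash Mode is not available for {mode}")
```

Calling `check_mode("i2v", flash=True)` returns without error, while `check_mode("t2v", flash=True)` raises `ValueError`.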
Image inputs
- Auto-resized: i2v 360-2000px, r2v 240-5000px
- HEIC/HEIF auto-converted
Reference videos
- MP4 or MOV
- 1-30s
- Max 100 MB
- r2v output capped at 10s
Optional audio (t2v / i2v only)
- 3-30s, max 15 MB, .mp3 or .wav
- Silently ignored in r2v (audio is extracted from the reference video instead)
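The reference-video and audio limits listed above lend themselves to a pre-upload check. The sketch below is one possible way to express them; `MediaInfo`, `check_reference_video`, and `check_audio` are hypothetical helpers, and treating "MB" as 1,000,000 bytes is a conservative assumption, since the source does not define the unit.

```python
from dataclasses import dataclass
from typing import List

MB = 1_000_000  # assumption: decimal megabytes (the stricter reading)


@dataclass
class MediaInfo:
    """Hypothetical description of a local media file; not an API type."""
    extension: str     # file extension including the dot, e.g. ".mp4"
    duration_s: float  # duration in seconds
    size_bytes: int    # file size in bytes


def check_reference_video(video: MediaInfo) -> List[str]:
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    if video.extension.lower() not in {".mp4", ".mov"}:
        problems.append("reference video must be MP4 or MOV")
    if not 1 <= video.duration_s <= 30:
        problems.append("reference video must be 1-30s long")
    if video.size_bytes > 100 * MB:
        problems.append("reference video must not exceed 100 MB")
    return problems


def check_audio(audio: MediaInfo, mode: str) -> List[str]:
    """Check the optional audio input, which applies to t2v and i2v only."""
    if mode == "r2v":
        # Not an error on the server side: the audio is silently ignored.
        return ["audio is ignored in r2v (taken from the reference video)"]
    problems = []
    if audio.extension.lower() not in {".mp3", ".wav"}:
        problems.append("audio must be .mp3 or .wav")
    if not 3 <= audio.duration_s <= 30:
        problems.append("audio must be 3-30s long")
    if audio.size_bytes > 15 * MB:
        problems.append("audio must not exceed 15 MB")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every violated limit at once. The duration and size values would have to come from a media probe of the caller's choosing, which is outside the scope of this sketch.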
Uploaded media preprocessing
- Reference and edit videos are normalized to provider-compatible MP4 when needed.
- Reference-video duration follows the mode limits shown above.
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/wan-2-6
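As an illustration, the schema endpoint could be queried with the Python standard library as sketched below. Only the URL and the GET method come from this page. The helper names, the bearer-token `Authorization` header, and the expectation of a JSON response body are assumptions.

```python
import json
import urllib.request
from typing import Optional

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/wan-2-6"


def build_schema_request(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the GET request for the model schema without sending it."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: bearer-token auth; this page does not state the scheme.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    request = build_schema_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from the network call keeps the first part easy to inspect or test offline.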
