Wan 2.7

Alibaba Cloud · Video Generation
POST /v1/videos/generations
Multimodal video model supporting T2V, I2V, video editing, and reference-to-video, with high-fidelity output from text, image, or video inputs.
Notes
Generation can take 30 minutes or more. The mode is auto-detected from the attachments, or you can override it with the mode parameter.
Modes
- T2V: no attachments
- I2V (First Frame): 1 image
- I2V (First + Last): exactly 2 images
- I2V Continuation: 1 video (2-10s) + optional last-frame image
- Video Edit: 1 video (2-10s, ≤100 MB, MP4/MOV) + up to 3 reference images
- R2V: up to 5 references combined; reference subjects in your prompt with Video1, Image1, etc.
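The attachment rules above overlap. One video plus one image, for example, fits I2V Continuation, Video Edit, and R2V. The following is a minimal sketch, not the service's actual detection logic, that lists every mode whose attachment counts match. The function name `candidate_modes` is hypothetical, and the returned labels are the headings from the list above, not confirmed values for the mode parameter. It also assumes R2V needs at least one reference.

```python
from typing import List


def candidate_modes(num_images: int, num_videos: int) -> List[str]:
    """Return the documented modes whose attachment counts match.

    Labels are the list headings above, NOT confirmed API values
    for the mode parameter (assumption for illustration only).
    """
    modes: List[str] = []
    if num_images == 0 and num_videos == 0:
        modes.append("T2V")
    if num_videos == 0 and num_images == 1:
        modes.append("I2V (First Frame)")
    if num_videos == 0 and num_images == 2:
        modes.append("I2V (First + Last)")
    # 1 video + optional last-frame image
    if num_videos == 1 and num_images <= 1:
        modes.append("I2V Continuation")
    # 1 video + up to 3 reference images
    if num_videos == 1 and num_images <= 3:
        modes.append("Video Edit")
    # Up to 5 references combined; assumes at least one is required.
    if 1 <= num_images + num_videos <= 5:
        modes.append("R2V")
    return modes


if __name__ == "__main__":
    print(candidate_modes(0, 0))  # ['T2V']
    print(candidate_modes(1, 1))  # ['I2V Continuation', 'Video Edit', 'R2V']
```

When more than one candidate comes back, setting the mode parameter explicitly is likely the safer choice, since the auto-detection tie-break is not documented here.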
Optional audio
- T2V/I2V: 2-30s
- R2V: 1-10s (used as a voice timbre sample)
- Max 15 MB, .mp3 or .wav
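A client-side pre-check against the audio limits above could look like the sketch below. It is an illustration under stated assumptions: the helper name `audio_problems` is hypothetical, "15 MB" is read as 15,000,000 bytes (the stricter interpretation), and the caller already knows the clip duration, since measuring it would need an audio library.

```python
import os
from typing import List

# Assumption: "15 MB" means decimal megabytes (the stricter reading).
MAX_AUDIO_BYTES = 15 * 1000 * 1000

# (min, max) duration in seconds, per the list above.
DURATION_LIMITS_S = {
    "T2V": (2.0, 30.0),
    "I2V": (2.0, 30.0),
    "R2V": (1.0, 10.0),
}


def audio_problems(filename: str, size_bytes: int,
                   duration_s: float, mode: str) -> List[str]:
    """Return human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in (".mp3", ".wav"):
        problems.append(f"unsupported extension {ext!r}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("file exceeds 15 MB")
    limits = DURATION_LIMITS_S.get(mode)
    if limits is None:
        problems.append(f"no audio limits listed for mode {mode!r}")
    elif not limits[0] <= duration_s <= limits[1]:
        problems.append(
            f"duration {duration_s}s outside {limits[0]}-{limits[1]}s for {mode}"
        )
    return problems


if __name__ == "__main__":
    print(audio_problems("voice.wav", 1_000_000, 5.0, "R2V"))  # []
```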
Billing
- Video Edit and R2V are billed for input + output duration combined.
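As a rough illustration of the billing rule, the sketch below computes billable seconds. The function name `billed_seconds` is hypothetical. That the other modes are billed on output duration only is an assumption, since the rule above names only Video Edit and R2V. For R2V with several reference videos, `input_s` is assumed to be their total duration.

```python
def billed_seconds(mode: str, output_s: float, input_s: float = 0.0) -> float:
    """Billable duration in seconds under the rule above.

    Assumption: modes other than Video Edit and R2V are billed on
    output duration only; the source states nothing about them.
    """
    if mode in ("Video Edit", "R2V"):
        return input_s + output_s
    return output_s


if __name__ == "__main__":
    # An 8 s source video edited into a 5 s output.
    print(billed_seconds("Video Edit", output_s=5, input_s=8))  # 13
    print(billed_seconds("T2V", output_s=5))                    # 5
```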
Uploaded media preprocessing
- Reference and edit videos are normalized to provider-compatible MP4 when needed.
- Reference-video duration follows the mode limits shown above.
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/wan-2-7.
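A minimal sketch of fetching the schema with Python's standard library follows. The function names are hypothetical. Whether the endpoint needs authentication is not stated above, so the optional Bearer header is an assumption. A JSON response body is likewise assumed.

```python
import json
import urllib.request
from typing import Any, Optional

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/wan-2-7"


def build_schema_request(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the GET request for the machine-readable schema."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: Bearer-token auth; the source does not specify it.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key: Optional[str] = None, timeout_s: float = 30.0) -> Any:
    """Send the request and parse the body, assuming it is JSON."""
    with urllib.request.urlopen(build_schema_request(api_key), timeout=timeout_s) as resp:
        return json.load(resp)
```

Calling `fetch_schema()` performs a network request; `build_schema_request()` alone does not.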
