Wan 2.7

Alibaba Cloud · Video Generation
POST /v1/videos/generations

Multimodal video model that supports text-to-video (T2V), image-to-video (I2V), video editing, and reference-to-video (R2V). It produces high-fidelity video from text, image, or video inputs, with optional audio input.

At a glance

  • Model ID: wan-2-7
  • Input modalities: Text, Image, Video, Audio
  • Output modalities: Video
  • Context window: -
  • Weight precision: -
  • Region: Singapore
  • Features: audio_sync, character_consistency, multi_shot
  • Native inference: No
  • New: No
  • Supported endpoints: POST /v1/videos/generations

Pricing

  • All modes, 720p: $0.10 per second
  • All modes, 1080p: $0.15 per second
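The charge scales linearly with the number of billed seconds. Here is a worked example using the 6-second request shown under "Example request", which uses the default 1080p resolution. It assumes there is no rounding, minimum charge, or per-request fee; the page mentions none of these either way.

```latex
\text{cost} = r \times t_{\text{billed}}, \qquad
r = \begin{cases} \$0.10/\text{s} & \text{at 720p} \\ \$0.15/\text{s} & \text{at 1080p} \end{cases}

\text{6 s at 1080p: } 6 \times \$0.15 = \$0.90
```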

Example request

curl https://api.empiriolabs.ai/v1/videos/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "wan-2-7", "prompt": "sunrise over the ocean", "duration": 6}'
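You can make the same call from Python using only the standard library. This is a minimal sketch. `build_generation_request` and `submit` are hypothetical helper names, not part of any EmpirioLabs SDK. The page does not document the response body, so `submit` returns it as raw text.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/videos/generations"


def build_generation_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build, without sending, a POST request equivalent to the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request: urllib.request.Request, timeout: float = 120) -> str:
    """Send the request and return the raw response body (format not documented here)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage: `submit(build_generation_request({"model": "wan-2-7", "prompt": "sunrise over the ocean", "duration": 6}, os.environ["EMPIRIOLABS_API_KEY"]))`. Build and send are split into separate functions so the request can be inspected or logged before anything goes over the network.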

Parameters

  • prompt (string, required): Scene description.
  • mode (enum, optional, default "auto"): auto: detect the mode from the attachments. t2v: text-to-video. i2v: animate the attached image. videoedit: edit the attached video. r2v: reference-to-video. Allowed: auto, t2v, i2v, videoedit, r2v.
  • resolution (enum, optional, default "1080p"): Output resolution. 1080p gives higher fidelity than 720p but is slower and costs more per second. Allowed: 720p, 1080p.
  • duration (number, optional, default 5): Clip length in seconds. Range: 2 to 15.
  • aspect_ratio (enum, optional, no default): If omitted, the model chooses one based on the inputs. Allowed: 16:9, 9:16, 1:1, 4:3, 3:4.
  • image (string, optional): Reference image URL. Required for i2v, r2v, and videoedit.
  • video (string, optional): Reference video URL. Required for videoedit and r2v.
  • negative_prompt (string, optional, default ""): Content to keep out of the generated video.
  • seed (number, optional): Seed for reproducible results.
  • audio_setting (enum, optional, default "auto"): auto: generate native audio. origin: keep the audio from the reference video (videoedit and r2v only). Allowed: auto, origin.
  • prompt_extend (boolean, optional, default true): Let DashScope rewrite the prompt for better results.
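A client can catch invalid values before a request is submitted by checking them against the enums and ranges listed above. This is a sketch with a hypothetical `build_payload` helper; the server remains the authority on what it accepts. The sketch does not enforce the per-mode attachment counts, which are covered under "Modes".

```python
from typing import Optional

# Allowed values copied from the parameter list above.
MODES = {"auto", "t2v", "i2v", "videoedit", "r2v"}
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
AUDIO_SETTINGS = {"auto", "origin"}


def build_payload(
    prompt: str,
    *,
    mode: str = "auto",
    resolution: str = "1080p",
    duration: float = 5,
    aspect_ratio: Optional[str] = None,
    image: Optional[str] = None,
    video: Optional[str] = None,
    negative_prompt: str = "",
    seed: Optional[int] = None,
    audio_setting: str = "auto",
    prompt_extend: bool = True,
) -> dict:
    """Validate parameters client-side and return a JSON-ready request body."""
    if not prompt:
        raise ValueError("prompt is required")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if not 2 <= duration <= 15:
        raise ValueError("duration must be between 2 and 15 seconds")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if audio_setting not in AUDIO_SETTINGS:
        raise ValueError(f"audio_setting must be one of {sorted(AUDIO_SETTINGS)}")
    # "origin" keeps audio from a reference video, so it needs one (videoedit/r2v only).
    if audio_setting == "origin" and (video is None or mode in {"t2v", "i2v"}):
        raise ValueError('audio_setting "origin" needs a reference video (videoedit or r2v)')

    payload = {
        "model": "wan-2-7",
        "prompt": prompt,
        "mode": mode,
        "resolution": resolution,
        "duration": duration,
        "negative_prompt": negative_prompt,
        "audio_setting": audio_setting,
        "prompt_extend": prompt_extend,
    }
    # Omit optional fields that were not set, so server-side defaults apply.
    for key, value in (("aspect_ratio", aspect_ratio), ("image", image),
                       ("video", video), ("seed", seed)):
        if value is not None:
            payload[key] = value
    return payload
```

The resulting dict can be serialized with `json.dumps` and sent as the request body.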

Notes

Generation can take 30 minutes or longer. The mode is detected automatically from the attachments; set the mode parameter to override it.

Modes

  • T2V: no attachments
  • I2V (First Frame): 1 image
  • I2V (First + Last): exactly 2 images
  • I2V Continuation: 1 video (2-10s) + optional last-frame image
  • Video Edit: 1 video (2-10s, ≤100 MB, MP4/MOV) + up to 3 reference images
  • R2V: up to 5 references in total; refer to each subject in the prompt as Video1, Image1, and so on.
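Some attachment combinations fit more than one mode, and the page does not say which one `auto` picks in those cases. The following sketch uses a hypothetical `candidate_modes` helper to list every mode whose documented attachment counts match. This shows when it is worth setting `mode` explicitly. The sketch checks counts only; it does not check the 2 to 10 s video length or file limits. It also makes two assumptions. First, R2V is treated as needing at least one reference, with images and videos counted together toward the limit of 5. Second, it does not enforce the parameter list's statement that `video` is required for r2v; a stricter reading of that requirement would rule out image-only R2V.

```python
from typing import List

# Attachment rules from the Modes list, as (label, predicate on counts).
# Assumption: R2V needs at least 1 reference, and images + videos together are at most 5.
MODE_RULES = [
    ("T2V", lambda images, videos: images == 0 and videos == 0),
    ("I2V (First Frame)", lambda images, videos: images == 1 and videos == 0),
    ("I2V (First + Last)", lambda images, videos: images == 2 and videos == 0),
    ("I2V Continuation", lambda images, videos: videos == 1 and images <= 1),
    ("Video Edit", lambda images, videos: videos == 1 and images <= 3),
    ("R2V", lambda images, videos: 1 <= images + videos <= 5),
]


def candidate_modes(images: int, videos: int) -> List[str]:
    """Return every mode whose documented attachment counts allow these inputs."""
    return [label for label, allowed in MODE_RULES if allowed(images, videos)]
```

For example, `candidate_modes(1, 1)` returns `["I2V Continuation", "Video Edit", "R2V"]`, so a request with one video and one image should set `mode` explicitly.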

Optional audio

  • T2V/I2V: 2-30s
  • R2V: 1-10s (used as a voice timbre sample)
  • Max 15 MB, .mp3 or .wav
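A client can pre-check an audio clip against these limits before attaching it. This is a sketch with a hypothetical `audio_problems` helper. It assumes "15 MB" means 15,000,000 bytes, the stricter of the decimal and binary readings. It also treats audio as unsupported for modes the list does not mention, such as Video Edit. The caller must supply the clip length, since measuring it requires an audio library.

```python
from pathlib import Path
from typing import List

# Documented audio length limits per mode, in seconds (inclusive).
AUDIO_SECONDS = {"t2v": (2, 30), "i2v": (2, 30), "r2v": (1, 10)}
# Assumption: "Max 15 MB" read as decimal megabytes, the stricter interpretation.
MAX_AUDIO_BYTES = 15_000_000
AUDIO_SUFFIXES = {".mp3", ".wav"}


def audio_problems(mode: str, filename: str, seconds: float, size_bytes: int) -> List[str]:
    """Return a list of limit violations; an empty list means the clip looks acceptable."""
    problems = []
    limits = AUDIO_SECONDS.get(mode)
    if limits is None:
        problems.append(f"audio input is not documented for mode {mode!r}")
    else:
        low, high = limits
        if not low <= seconds <= high:
            problems.append(f"audio must be {low}-{high} s for {mode}, got {seconds} s")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("audio file exceeds 15 MB")
    if Path(filename).suffix.lower() not in AUDIO_SUFFIXES:
        problems.append("audio must be .mp3 or .wav")
    return problems
```

The helper returns every problem at once rather than stopping at the first, so a user can fix all of them in a single pass.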

Billing

  • Video Edit and R2V are billed on the input duration plus the output duration.
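Combining the per-second rates from "Pricing" with this rule gives a rough cost estimator. This is a sketch with a hypothetical `estimate_cost` helper, and it relies on three assumptions, none of which this page states: `input_seconds` is the total length of the input or reference video(s), images add no billed time, and there is no rounding or minimum charge. Pass the resolved mode (t2v, i2v, videoedit, or r2v) rather than `auto`. `Decimal` is used so that currency arithmetic avoids binary floating-point error.

```python
from decimal import Decimal

# Per-second rates from the Pricing section.
RATE_PER_SECOND = {"720p": Decimal("0.10"), "1080p": Decimal("0.15")}
# Modes billed on input + output duration combined.
INPUT_BILLED_MODES = {"videoedit", "r2v"}


def estimate_cost(mode: str, resolution: str, output_seconds: float,
                  input_seconds: float = 0) -> Decimal:
    """Estimate the charge in USD for one generation."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unknown resolution {resolution!r}")
    billed = Decimal(str(output_seconds))
    if mode in INPUT_BILLED_MODES:
        billed += Decimal(str(input_seconds))
    return billed * RATE_PER_SECOND[resolution]
```

For example, `estimate_cost("videoedit", "720p", 8, input_seconds=8)` returns `Decimal('1.60')`, which is 16 billed seconds at $0.10.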

Uploaded media preprocessing

  • Reference and edit videos are normalized to provider-compatible MP4 when needed.
  • Reference-video duration follows the mode limits shown above.

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/wan-2-7