Wan 2.6

Alibaba Cloud · Video Generation
POST /v1/videos/generations

Multimodal video generation model for cinematic, multi-shot stories with native audio-visual sync (lip-sync, dialogue, music, SFX).

At a glance

Field | Value
Model id | wan-2-6
Input modalities | Text, Image, Video, Audio
Output modalities | Video
Context window | -
Weight precision | -
Region | Singapore
Features | audio_sync, character_consistency, multi_shot
Native inference | No
New | No
Supported endpoints | POST /v1/videos/generations

Pricing

Charge | Spec | Rate
Standard 720P | per second | $0.09 (was $0.10)
Standard 1080P | per second | $0.138 (was $0.15)
Flash 720P (audio) | per second | $0.045 (was $0.050)
Flash 720P (no audio) | per second | $0.0225 (was $0.0250)
Flash 1080P (audio) | per second | $0.069 (was $0.0750)
Flash 1080P (no audio) | per second | $0.0345 (was $0.03750)
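
As a rough illustration of how the per-second rates translate into a per-clip charge, the sketch below multiplies the current rate by the clip length. It assumes that billed seconds equal the requested duration, that Flash rates apply whenever flash_mode is true, and that Standard rates do not depend on the audio setting; none of these billing details are confirmed here. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Current per-second rates in USD, copied from the pricing table.
STANDARD_RATES = {"720p": Decimal("0.09"), "1080p": Decimal("0.138")}
FLASH_RATES = {
    ("720p", True): Decimal("0.045"),
    ("720p", False): Decimal("0.0225"),
    ("1080p", True): Decimal("0.069"),
    ("1080p", False): Decimal("0.0345"),
}


def estimate_cost(duration_s, resolution="1080p", flash=False, audio=True):
    """Estimate the charge in USD (hypothetical helper, not part of the API).

    Assumes billed seconds equal the requested clip length.
    """
    if flash:
        rate = FLASH_RATES[(resolution, audio)]
    else:
        # Assumption: Standard pricing is the same with or without audio.
        rate = STANDARD_RATES[resolution]
    # Decimal avoids binary floating-point rounding in the product.
    return rate * Decimal(str(duration_s))


print(estimate_cost(6))  # 6 s at Standard 1080P
print(estimate_cost(10, "720p", flash=True, audio=False))
```

Decimal is used instead of float so that small per-second rates multiply without rounding noise.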

Example request

curl https://api.empiriolabs.ai/v1/videos/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "wan-2-6", "prompt": "sunrise over the ocean", "duration": 6}'
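
The same request can be assembled in Python with only the standard library. This is a minimal sketch that builds the request without sending it, because the response format is not described on this page. The helper name build_request is hypothetical, and the timeout in the commented-out call is a guess based on the note that generation can take 5+ minutes.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/videos/generations"


def build_request(payload, api_key):
    """Build (but do not send) the POST request for a generation job."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    {"model": "wan-2-6", "prompt": "sunrise over the ocean", "duration": 6},
    os.environ.get("EMPIRIOLABS_API_KEY", ""),
)

# Sending is left commented out: the response shape is not documented here,
# and the timeout value is an assumption.
# with urllib.request.urlopen(request, timeout=600) as response:
#     body = response.read()
```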

Parameters

Parameter | Type | Required | Default | Description
prompt | string | yes | - | Scene description.
mode | enum | no | "auto" | Generation mode. t2v: text-to-video. i2v: animate the attached image. r2v: reference-to-video, generate from a reference image. Allowed: auto, t2v, i2v, r2v
resolution | enum | no | "1080p" | Output resolution. Higher resolution gives higher fidelity but is slower and costs more. Allowed: 720p, 1080p
duration | number | no | 5 | Clip length in seconds. Range: 5 to 15 (r2v output is capped at 10s)
aspect_ratio | enum | no | "16:9" | Output aspect ratio. Allowed: 16:9, 9:16, 1:1, 4:3, 3:4
shot_type | enum | no | "multi" | single: one continuous shot. multi: multi-shot narrative. Allowed: single, multi
image | string | no | - | Reference image URL. Required for i2v / r2v.
negative_prompt | string | no | "" | What to avoid.
seed | number | no | - | Reproducibility seed.
audio | boolean | no | true | Generate native audio with the video.
flash_mode | boolean | no | false | Faster generation at reduced cost. Applies to i2v and r2v only.
prompt_extend | boolean | no | true | Let DashScope rewrite the prompt for better results.
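
Because a rejected request still costs a round trip, it can help to check a payload against the documented constraints before sending it. The sketch below is a hypothetical client-side helper (validate_params is not part of the API) that encodes only the allowed values, the duration range, and the image requirement listed in the table. How the server itself reports or tolerates invalid values is not documented here, so flagging flash_mode with t2v as a problem is an assumption.

```python
# Allowed enum values, copied from the parameter table.
ALLOWED = {
    "mode": {"auto", "t2v", "i2v", "r2v"},
    "resolution": {"720p", "1080p"},
    "aspect_ratio": {"16:9", "9:16", "1:1", "4:3", "3:4"},
    "shot_type": {"single", "multi"},
}


def validate_params(params):
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []
    if not params.get("prompt"):
        problems.append("prompt is required")
    for name, allowed in ALLOWED.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    # Fall back to the documented default when duration is omitted.
    duration = params.get("duration", 5)
    if not 5 <= duration <= 15:
        problems.append("duration must be between 5 and 15 seconds")
    mode = params.get("mode", "auto")
    if mode in {"i2v", "r2v"} and not params.get("image"):
        problems.append(f"image is required when mode is {mode}")
    # Assumption: treat flash_mode with explicit t2v as a client-side error.
    if params.get("flash_mode") and mode == "t2v":
        problems.append("flash_mode applies to i2v and r2v only")
    return problems


print(validate_params({"prompt": "sunrise over the ocean", "duration": 6}))
print(validate_params({"prompt": "a portrait", "mode": "i2v"}))
```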

Notes

Generation can take five minutes or more. Modes: t2v (text-to-video), i2v (image-to-video), r2v (reference-to-video). Flash Mode (i2v and r2v only) offers faster generation at reduced cost.

Image inputs

  • Auto-resized: i2v 360-2000px, r2v 240-5000px
  • HEIC/HEIF auto-converted

Reference videos

  • MP4 or MOV
  • 1-30s
  • Max 100 MB
  • r2v output capped at 10s

Optional audio (t2v / i2v only)

  • 3-30s, max 15 MB, .mp3 or .wav
  • Silently ignored in r2v (audio is extracted from the reference video instead)
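
The reference-video and audio limits listed above can be checked locally before uploading. The following sketch is a hypothetical helper (check_media is not part of the API). It expects the caller to supply the duration and file size, for example from a media probe tool, and it assumes 1 MB means 1,000,000 bytes, which is the stricter reading of the stated limits.

```python
import os

MB = 1_000_000  # Assumption: decimal megabytes (the stricter interpretation).

# Limits copied from the lists above: extensions, min/max seconds, max bytes.
MEDIA_LIMITS = {
    "reference_video": ({".mp4", ".mov"}, 1, 30, 100 * MB),
    "audio": ({".mp3", ".wav"}, 3, 30, 15 * MB),
}


def check_media(kind, filename, duration_s, size_bytes):
    """Return a list of limit violations for one media file (empty if none)."""
    extensions, min_s, max_s, max_bytes = MEDIA_LIMITS[kind]
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in extensions:
        problems.append(f"unsupported extension {ext}")
    if not min_s <= duration_s <= max_s:
        problems.append(f"duration must be {min_s}-{max_s}s")
    if size_bytes > max_bytes:
        problems.append(f"file exceeds {max_bytes // MB} MB")
    return problems


print(check_media("audio", "voice.mp3", 10, 2 * MB))
print(check_media("reference_video", "clip.avi", 12, 5 * MB))
```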

Uploaded media preprocessing

  • Reference and edit videos are normalized to provider-compatible MP4 when needed.
  • Reference-video duration follows the mode limits shown above.

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/wan-2-6.