Wan 2.6 | EmpirioLabs AI Docs

Alibaba Cloud · Video Generation

POST /v1/videos/generations

Multimodal video generation model for cinematic, multi-shot stories with native audio-visual sync (lip-sync, dialogue, music, SFX).

At a glance

Field	Value
Model id	`wan-2-6`
Model release date	2026-01-12
Input modalities	Text, Image, Video, Audio
Output modalities	Video
Context window	-
Weight precision	-
Region	Singapore
Features	audio_sync, character_consistency, multi_shot
Native inference	No
New	No
Supported endpoints	`POST /v1/videos/generations`
Alternate model ids	`alibaba/wan-2.6`, `wan-2.6`

Pricing

Charge	Unit	Rate
Standard 720p	per second	$0.09 (was $0.10)
Standard 1080p	per second	$0.138 (was $0.15)
Flash 720p (audio)	per second	$0.045 (was $0.05)
Flash 720p (no audio)	per second	$0.0225 (was $0.025)
Flash 1080p (audio)	per second	$0.069 (was $0.075)
Flash 1080p (no audio)	per second	$0.0345 (was $0.0375)
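A rough cost estimate multiplies the per-second rate by the clip length. The sketch below is a hypothetical helper, not part of the API, and rests on three assumptions: billing is exactly rate × output seconds, Standard rates are the same with or without audio (the table lists one Standard rate per resolution), and Flash rates apply when `flash_mode` is on. `Decimal` avoids floating-point rounding in money arithmetic.

```python
from decimal import Decimal

# Per-second USD rates copied from the Pricing table.
# Key: (tier, resolution, audio). Assumption: Standard rates ignore audio.
RATES_PER_SECOND = {
    ("standard", "720p", True): Decimal("0.09"),
    ("standard", "720p", False): Decimal("0.09"),
    ("standard", "1080p", True): Decimal("0.138"),
    ("standard", "1080p", False): Decimal("0.138"),
    ("flash", "720p", True): Decimal("0.045"),
    ("flash", "720p", False): Decimal("0.0225"),
    ("flash", "1080p", True): Decimal("0.069"),
    ("flash", "1080p", False): Decimal("0.0345"),
}


def estimate_cost(resolution: str, duration_s: float, audio: bool = True,
                  flash_mode: bool = False) -> Decimal:
    """Hypothetical estimate: per-second rate times clip length in seconds."""
    tier = "flash" if flash_mode else "standard"
    rate = RATES_PER_SECOND[(tier, resolution, audio)]
    # str() first so a float duration does not carry binary rounding error.
    return rate * Decimal(str(duration_s))
```

For the example request (default `1080p`, 6 s, audio on), `estimate_cost("1080p", 6)` returns `Decimal('0.828')`, roughly $0.83 under these assumptions.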

Example request

curl https://api.empiriolabs.ai/v1/videos/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "wan-2-6", "prompt": "sunrise over the ocean", "duration": 6}'
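The same request can be built with Python's standard library. This is a minimal sketch: `build_generation_request` and `send_generation_request` are illustrative names, and the response body format is not documented on this page, so the sender returns the raw text. The long default timeout reflects the note that generation can take more than 5 minutes; whether the endpoint blocks that long or returns earlier is an open assumption.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/videos/generations"


def build_generation_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST shown in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_generation_request(req: urllib.request.Request,
                            timeout_s: float = 600.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")
```

Usage: `send_generation_request(build_generation_request({"model": "wan-2-6", "prompt": "sunrise over the ocean", "duration": 6}, os.environ["EMPIRIOLABS_API_KEY"]))`.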

Parameters

Parameter	Type	Required	Default	Description
`prompt`	string	yes	-	Scene description.
`mode`	enum	no	`"auto"`	t2v: text-to-video. i2v: image-to-video, animates the attached image. r2v: reference-to-video, generates from a reference image. · Allowed: `auto`, `t2v`, `i2v`, `r2v`
`resolution`	enum	no	`"1080p"`	Output resolution. Higher resolution gives higher fidelity but is slower and costs more per second. · Allowed: `720p`, `1080p`
`duration`	number	no	`5`	Clip length in seconds. r2v output is capped at 10 s. · Range: 5 – 15
`aspect_ratio`	enum	no	`"16:9"`	Output aspect ratio. · Allowed: `16:9`, `9:16`, `1:1`, `4:3`, `3:4`
`shot_type`	enum	no	`"multi"`	single: one continuous shot. multi: multi-shot narrative. · Allowed: `single`, `multi`
`image`	string	no	-	Reference image URL. Required for i2v / r2v.
`negative_prompt`	string	no	`""`	What to avoid.
`seed`	number	no	-	Reproducibility seed.
`audio`	boolean	no	`true`	Generate native audio with the video.
`flash_mode`	boolean	no	`false`	Faster generation at reduced cost. Applies to i2v and r2v only.
`prompt_extend`	boolean	no	`true`	Automatically expand and refine the prompt for better results.
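Checking a payload against the table before sending can catch simple mistakes early. The sketch below mirrors only the constraints listed above (allowed enum values, the 5 to 15 s duration range, `image` required for i2v and r2v, `flash_mode` limited to i2v and r2v); `validate_params` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
# Enum constraints copied from the Parameters table.
ALLOWED = {
    "mode": {"auto", "t2v", "i2v", "r2v"},
    "resolution": {"720p", "1080p"},
    "aspect_ratio": {"16:9", "9:16", "1:1", "4:3", "3:4"},
    "shot_type": {"single", "multi"},
}


def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the table's rules pass."""
    problems = []
    if not params.get("prompt"):
        problems.append("prompt is required")
    for name, allowed in ALLOWED.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    duration = params.get("duration", 5)  # table default is 5
    if not isinstance(duration, (int, float)) or not 5 <= duration <= 15:
        problems.append("duration must be between 5 and 15 seconds")
    mode = params.get("mode", "auto")
    if mode in ("i2v", "r2v") and not params.get("image"):
        problems.append(f"image is required for {mode}")
    # Only flagged for an explicit t2v; how auto resolves is not documented.
    if params.get("flash_mode") and mode == "t2v":
        problems.append("flash_mode applies to i2v and r2v only")
    return problems
```

For example, `validate_params({"prompt": "x", "aspect_ratio": "3:2"})` reports that `3:2` is not an allowed ratio.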

Notes

Generation can take more than 5 minutes. Modes: t2v (text-to-video), i2v (image-to-video), r2v (reference-to-video). Flash mode (i2v and r2v only) gives faster generation at reduced cost.
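Because a clip can take more than five minutes, a client should not expect a quick answer. This page does not say whether the endpoint returns the finished video or a job to check later. If it is the latter, a capped exponential backoff loop like the sketch below keeps request volume low. `get_status`, the state names `succeeded` and `failed`, and the 15-minute default deadline are all hypothetical placeholders.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], str],
    timeout_s: float = 900.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until a terminal state or until the deadline passes."""
    terminal = {"succeeded", "failed"}  # hypothetical state names
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        # Give up rather than sleep past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting.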

Image inputs

Auto-resized: i2v 360-2000px, r2v 240-5000px
HEIC/HEIF auto-converted
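If you prefer to control resizing yourself, you can check whether an image falls outside the listed pixel range before uploading. This sketch assumes each limit applies independently to width and height, which the list above does not state explicitly; `needs_resize` is a hypothetical helper.

```python
# Per-mode pixel limits from the "Image inputs" list.
# Assumption: each limit applies to width and height independently.
IMAGE_LIMITS_PX = {"i2v": (360, 2000), "r2v": (240, 5000)}


def needs_resize(mode: str, width: int, height: int) -> bool:
    """Return True if the image is outside the range, so it would likely be auto-resized."""
    low, high = IMAGE_LIMITS_PX[mode]
    return not (low <= width <= high and low <= height <= high)
```

A 4000 × 3000 photo would be resized for i2v but, under this reading, left alone for r2v.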

Reference videos

MP4 or MOV
1-30s
Max 100 MB
r2v output capped at 10s
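The reference-video limits can be checked locally before upload. The sketch takes size and duration as arguments because reading a clip's duration requires a media probe tool, which is out of scope here. It treats "MB" as 1,000,000 bytes, the stricter of the two common readings (an assumption). It also shows the 10 s r2v output cap. Both helper names are hypothetical.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mov"}
MAX_VIDEO_BYTES = 100 * 1000 * 1000  # assumes decimal MB (stricter reading)
R2V_OUTPUT_CAP_S = 10


def check_reference_video(path: str, size_bytes: int, duration_s: float) -> list[str]:
    """Check a reference video against the listed format, size, and length limits."""
    problems = []
    if Path(path).suffix.lower() not in VIDEO_EXTENSIONS:
        problems.append("reference video must be MP4 or MOV")
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append("reference video exceeds 100 MB")
    if not 1 <= duration_s <= 30:
        problems.append("reference video must be 1-30 s long")
    return problems


def r2v_output_duration(requested_s: float) -> float:
    """r2v output is capped at 10 s regardless of the requested duration."""
    return min(requested_s, R2V_OUTPUT_CAP_S)
```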

Optional audio (t2v / i2v only)

3-30s, max 15 MB, .mp3 or .wav
Silently ignored in r2v (audio is extracted from the reference video instead)
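Because audio is silently ignored in r2v, it helps to check on the client side whether a clip will actually be used. This page does not name the request field that carries the audio file, so the sketch works on file properties only. As above, it assumes "MB" means 1,000,000 bytes. `audio_input_problem` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

AUDIO_EXTENSIONS = {".mp3", ".wav"}
MAX_AUDIO_BYTES = 15 * 1000 * 1000  # assumes decimal MB (stricter reading)


def audio_input_problem(mode: str, path: str, size_bytes: int,
                        duration_s: float) -> Optional[str]:
    """Return None if the audio would be used, otherwise the reason it would not."""
    if mode == "r2v":
        return "ignored: r2v takes audio from the reference video"
    if Path(path).suffix.lower() not in AUDIO_EXTENSIONS:
        return "audio must be .mp3 or .wav"
    if size_bytes > MAX_AUDIO_BYTES:
        return "audio exceeds 15 MB"
    if not 3 <= duration_s <= 30:
        return "audio must be 3-30 s long"
    return None
```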

Uploaded media preprocessing

Reference and edit videos are normalized to provider-compatible MP4 when needed.
Reference-video duration follows the mode limits shown above.

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/wan-2-6.
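The schema can be fetched with a plain GET request. This sketch maps the alternate ids from the "At a glance" table to `wan-2-6`, on the assumption that they name the same model and that the schema lives only under the canonical id. The page does not say whether this endpoint needs the `Authorization` header or what format it returns, so `fetch_schema` (a hypothetical helper) sends no credentials and returns the raw body.

```python
import urllib.request

CANONICAL_ID = "wan-2-6"
# Alternate model ids from "At a glance". Assumption: all refer to one model.
ALIASES = {"alibaba/wan-2.6": CANONICAL_ID, "wan-2.6": CANONICAL_ID}


def schema_url(model_id: str) -> str:
    """Resolve an alias to the canonical id and build the schema URL."""
    canonical = ALIASES.get(model_id, model_id)
    return f"https://api.empiriolabs.ai/v1/models/{canonical}"


def fetch_schema(model_id: str, timeout_s: float = 30.0) -> str:
    """GET the machine-readable schema and return the raw response body."""
    with urllib.request.urlopen(schema_url(model_id), timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")
```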