Wan 2.7 | EmpirioLabs AI Docs

Alibaba Cloud · Video Generation

POST /v1/videos/generations

Multimodal video model supporting text-to-video (T2V), image-to-video (I2V), video editing, and reference-to-video (R2V), with high-fidelity output from text, image, or video inputs.

At a glance

Field	Value
Model id	`wan-2-7`
Model release date	2026-04-26
Input modalities	Text, Image, Video, Audio
Output modalities	Video
Context window	-
Weight precision	-
Region	Singapore
Features	audio_sync, character_consistency, multi_shot
Native inference	No
New	No
Supported endpoints	`POST /v1/videos/generations`
Alternate model ids	`alibaba/wan-2.7`, `wan-2.7`

Pricing

Charge	Spec	Rate
All Modes 720P	per second	$0.10
All Modes 1080P	per second	$0.15
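
As a rough worked example of the table above, the sketch below multiplies the billed duration by the per-second rate. The name `estimate_cost` and the use of `Decimal` are illustrative choices, not part of the API. Rounding or minimum-charge rules, if any exist, are not described here and are not modeled.

```python
from decimal import Decimal

# Per-second rates copied from the pricing table above (USD).
RATES_PER_SECOND = {
    "720p": Decimal("0.10"),
    "1080p": Decimal("0.15"),
}


def estimate_cost(resolution: str, billed_seconds) -> Decimal:
    """Return an estimated charge in USD for the given billed duration."""
    rate = RATES_PER_SECOND[resolution]
    # Convert via str() so floats such as 7.5 do not carry binary noise.
    return rate * Decimal(str(billed_seconds))
```

Under these assumptions, `estimate_cost("1080p", 5)` comes to 0.75 USD for a clip of the default length.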

Example request

$ curl https://api.empiriolabs.ai/v1/videos/generations \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "wan-2-7", "prompt": "sunrise over the ocean", "duration": 6}'
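
The same request can be sketched in Python using only the standard library. The helper name `build_generation_request` is illustrative; the endpoint, headers, and body fields are taken from the curl example above. The response format is not described in this section, so the sketch stops at building the request and leaves sending and parsing to the caller.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/videos/generations"


def build_generation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the generations endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generation_request(
    os.environ.get("EMPIRIOLABS_API_KEY", ""),
    {"model": "wan-2-7", "prompt": "sunrise over the ocean", "duration": 6},
)
# To send it: urllib.request.urlopen(request, timeout=...) with a timeout
# chosen by the caller; a suitable value is not specified in this section.
```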

Parameters

Parameter	Type	Required	Default	Description
`prompt`	string	yes	-	Scene description.
`mode`	enum	no	`"auto"`	t2v: text-to-video. i2v: animate the attached image. videoedit: edit the attached video. r2v: reference-to-video. · Allowed: `auto`, `t2v`, `i2v`, `videoedit`, `r2v`
`resolution`	enum	no	`"1080p"`	Output resolution. Higher resolution gives higher fidelity but is slower and more expensive. · Allowed: `720p`, `1080p`
`duration`	number	no	`5`	Clip length in seconds. · Range: 2 – 15
`aspect_ratio`	enum	no	-	If omitted, the model picks a ratio based on the input. · Allowed: `16:9`, `9:16`, `1:1`, `4:3`, `3:4`
`image`	string	no	-	Reference image URL. Required for i2v / r2v / videoedit.
`video`	string	no	-	Reference video URL. Required for videoedit / r2v.
`negative_prompt`	string	no	`""`	What to avoid.
`seed`	number	no	-	Reproducibility seed.
`audio_setting`	enum	no	`"auto"`	auto: generate native audio. origin: keep audio from the reference video (videoedit/r2v only). · Allowed: `auto`, `origin`
`prompt_extend`	boolean	no	true	Automatically expand and refine the prompt for better results.
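
The constraints in the table can be checked client-side before a request is sent. The following is a minimal sketch, not the service's own validation: `validate_params` is a hypothetical helper, it covers only the enum values and the duration range listed above, and it assumes the duration bounds are inclusive.

```python
# Allowed values copied from the parameter table above.
ALLOWED = {
    "mode": {"auto", "t2v", "i2v", "videoedit", "r2v"},
    "resolution": {"720p", "1080p"},
    "aspect_ratio": {"16:9", "9:16", "1:1", "4:3", "3:4"},
    "audio_setting": {"auto", "origin"},
}
DURATION_RANGE = (2, 15)  # seconds; inclusive bounds are an assumption


def validate_params(params: dict) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if not isinstance(params.get("prompt"), str):
        problems.append("prompt is required and must be a string")
    for name, allowed in ALLOWED.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    if "duration" in params:
        low, high = DURATION_RANGE
        if not low <= params["duration"] <= high:
            problems.append(f"duration must be between {low} and {high} seconds")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.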

Notes

Generation can take 30 minutes or more. The mode is auto-detected from the attachments; set the `mode` parameter to override it.

Modes

T2V: no attachments
I2V (First Frame): 1 image
I2V (First + Last): exactly 2 images
I2V Continuation: 1 video (2-10s) + optional last-frame image
Video Edit: 1 video (2-10s, ≤100 MB, MP4/MOV) + up to 3 reference images
R2V: up to 5 references combined; reference subjects in your prompt with Video1, Image1, etc.
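
The attachment rules above can be expressed as a small lookup. This is a sketch only: the variant keys such as `i2v_first_last` are hypothetical labels for the list entries (the `mode` parameter itself accepts only the values in the parameter table), R2V is assumed to need at least one reference and to count images and videos toward the limit of 5, and file-level limits such as size and duration are not checked.

```python
# (min_images, max_images, min_videos, max_videos) per list entry above.
ATTACHMENT_RULES = {
    "t2v": (0, 0, 0, 0),
    "i2v_first": (1, 1, 0, 0),
    "i2v_first_last": (2, 2, 0, 0),
    "i2v_continuation": (0, 1, 1, 1),
    "videoedit": (0, 3, 1, 1),
}
R2V_MAX_REFERENCES = 5


def attachments_ok(variant: str, images: int, videos: int) -> bool:
    """Check attachment counts against the rules listed for a mode variant."""
    if variant == "r2v":
        # Assumption: at least one reference, images and videos combined.
        return 1 <= images + videos <= R2V_MAX_REFERENCES
    min_i, max_i, min_v, max_v = ATTACHMENT_RULES[variant]
    return min_i <= images <= max_i and min_v <= videos <= max_v
```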

Optional audio

T2V/I2V: 2-30s
R2V: 1-10s (used as a voice timbre sample)
Max 15 MB, .mp3 or .wav
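
A pre-upload check for these audio limits might look like the sketch below. The helper `audio_ok` is hypothetical. It assumes "15 MB" means decimal megabytes, that the duration bounds are inclusive, and that modes without a listed audio range (for example `videoedit`) should be rejected; none of these points is confirmed by this page.

```python
import os

MAX_AUDIO_BYTES = 15 * 1000 * 1000  # "15 MB"; decimal megabytes assumed
AUDIO_EXTENSIONS = {".mp3", ".wav"}
# Allowed audio length in seconds per mode, from the list above.
AUDIO_SECONDS = {"t2v": (2, 30), "i2v": (2, 30), "r2v": (1, 10)}


def audio_ok(mode: str, filename: str, size_bytes: int, seconds: float) -> bool:
    """Check an audio attachment against the documented limits."""
    if mode not in AUDIO_SECONDS:
        return False  # assumption: no listed range means audio is not accepted
    low, high = AUDIO_SECONDS[mode]
    extension = os.path.splitext(filename)[1].lower()
    return (
        extension in AUDIO_EXTENSIONS
        and size_bytes <= MAX_AUDIO_BYTES
        and low <= seconds <= high
    )
```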

Billing

Video Edit and R2V are billed on the combined duration of the input and the output.
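
To make the billing rule concrete, here is a minimal sketch of the duration a charge would be based on. It assumes that "input duration" means the length of the attached video and that all other modes are billed on output length alone; `billed_seconds` is an illustrative name, not an API field.

```python
COMBINED_BILLING_MODES = {"videoedit", "r2v"}


def billed_seconds(mode: str, output_seconds: float, input_seconds: float = 0.0) -> float:
    """Return the duration, in seconds, that the charge is based on."""
    if mode in COMBINED_BILLING_MODES:
        # Video Edit and R2V: input and output durations are added together.
        return input_seconds + output_seconds
    # Assumption: other modes ignore any input duration.
    return output_seconds
```

For example, editing an 8-second clip into an 8-second result would be billed as 16 seconds under these assumptions, which at the 1080P rate comes to $2.40.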

Uploaded media preprocessing

Reference and edit videos are normalized to provider-compatible MP4 when needed.
Reference-video duration follows the mode limits shown above.

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/wan-2-7.
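
A request for the schema can be built the same way as any other HTTP call. In the sketch below, `build_schema_request` is a hypothetical helper; whether this endpoint requires the bearer token, and what the response body looks like, are not stated on this page, so the token is optional and the response is left unparsed.

```python
from typing import Optional
import urllib.request

MODELS_URL = "https://api.empiriolabs.ai/v1/models/"


def build_schema_request(model_id: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for a model's schema."""
    headers = {}
    if api_key:
        # Assumption: the endpoint accepts the same bearer token as generation.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(MODELS_URL + model_id, headers=headers, method="GET")
```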