SVI 2.0 Pro | EmpirioLabs AI Docs

POST /v1/videos/generations

Stable Video Infinity 2.0 Pro, built on Wan 2.2, extends a still image into video of theoretically unlimited length while keeping character identity consistent.

At a glance

Field	Value
Model id	`svi-2-0-pro`
Model release date	2025-12-26
Input modalities	Text, Image
Output modalities	Video
Context window	-
Weight precision	Mixed FP8/BF16/FP16
Features	infinite_length, character_consistency
Native inference	Yes
New	No
Supported endpoints	`POST /v1/videos/generations`
Alternate model ids	`svi-2.0-pro`, `winfunc/svi-2.0-pro`

Pricing

Charge	Spec	Rate
480p Video	per second	$0.057
720p Video	per second	$0.17
T2V Fast	additional fee	$0.065
T2V Quality	additional fee	$0.13
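
As a rough illustration of how these rates might combine, the sketch below estimates the charge for a single clip. It assumes the per-second rate is multiplied by the requested `duration` and that the T2V fee is a flat, one-time surcharge applied only in text-to-video mode; the table does not spell out either rule. `estimate_cost` is a hypothetical helper, not part of the API.

```python
from decimal import Decimal

# Rates copied from the pricing table above (USD).
PER_SECOND = {"480p": Decimal("0.057"), "720p": Decimal("0.17")}
T2V_FEE = {"fast": Decimal("0.065"), "quality": Decimal("0.13")}


def estimate_cost(seconds, tier="480p", t2v_quality=None):
    """Estimate the charge in USD for one generation.

    Assumption: cost = per-second rate * seconds, plus a flat T2V fee
    when t2v_quality is given (text-to-video mode). Pass None for
    image-to-video, which is assumed to carry no T2V fee.
    """
    cost = PER_SECOND[tier] * Decimal(str(seconds))
    if t2v_quality is not None:
        cost += T2V_FEE[t2v_quality]
    return cost


print(estimate_cost(18, "480p", "quality"))  # 18 s text-to-video at 480p
print(estimate_cost(30, "720p"))             # 30 s image-to-video at 720p
```

`Decimal` is used instead of floats so that small per-second rates do not pick up binary rounding noise.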

Example request

curl https://api.empiriolabs.ai/v1/videos/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "svi-2-0-pro", "prompt": "sunrise over the ocean", "duration": 18}'
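
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: `build_request` and `generate` are hypothetical helper names, the response is assumed to be JSON (its schema is not documented on this page), and `duration` is set to 18, the documented minimum.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/videos/generations"


def build_request(api_key, payload):
    """Build the POST request from the curl example (nothing is sent yet)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(payload):
    """Send the request and return the decoded JSON body.

    Assumption: the endpoint replies with JSON. The key is read from the
    EMPIRIOLABS_API_KEY environment variable, as in the curl example.
    """
    req = build_request(os.environ["EMPIRIOLABS_API_KEY"], payload)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (performs a real, billable API call):
# result = generate({"model": "svi-2-0-pro",
#                    "prompt": "sunrise over the ocean",
#                    "duration": 18})
```

Because long generations can take a long time, a production client would likely need a generous timeout or a polling strategy; neither is described on this page, so the sketch leaves them out.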

Parameters

Parameter	Type	Required	Default	Description
`resolution`	enum	no	`"832x480"`	480p is fast; 720p is slower but sharper. · Allowed: `832x480`, `480x832`, `720x1280`, `1280x720`
`duration`	number	no	`18`	Estimated clip length in seconds. · Range: 18 – 121.5
`cfg`	number	no	`1.0`	Prompt adherence strength. · Range: 1.0 – 2.0
`negative_prompt`	string	no	`"vibrant tone, overexposed, static, blurry details, subtitles, style, artwork, painting, picture, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, still picture, messy background, three legs, background crowd, walking backwards"`	Text describing what to avoid.
`t2v_quality`	enum	no	`"quality"`	Text-to-video pipeline tier. `quality` uses the Wan 2.2 plus reference image model for higher fidelity; `fast` uses the flash model for cheaper, quicker generations. Only applies in text-to-video mode (image-to-video skips this step). · Allowed: `fast`, `quality`
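
The allowed values and ranges in the table can be checked client-side before a request is sent. The sketch below is one way to do that; `validate_params` is a hypothetical helper, and it assumes the documented ranges are inclusive at both ends, which the table does not state.

```python
ALLOWED_RESOLUTIONS = {"832x480", "480x832", "720x1280", "1280x720"}
ALLOWED_T2V_QUALITY = {"fast", "quality"}


def validate_params(params):
    """Return a list of problems found in a parameter dict (empty if none).

    Missing keys fall back to the defaults listed in the table.
    Assumption: numeric ranges are inclusive.
    """
    errors = []
    resolution = params.get("resolution", "832x480")
    if resolution not in ALLOWED_RESOLUTIONS:
        errors.append(f"resolution {resolution!r} is not an allowed value")
    duration = params.get("duration", 18)
    if not 18 <= duration <= 121.5:
        errors.append(f"duration {duration} is outside 18-121.5 seconds")
    cfg = params.get("cfg", 1.0)
    if not 1.0 <= cfg <= 2.0:
        errors.append(f"cfg {cfg} is outside 1.0-2.0")
    tier = params.get("t2v_quality", "quality")
    if tier not in ALLOWED_T2V_QUALITY:
        errors.append(f"t2v_quality {tier!r} is not an allowed value")
    return errors


print(validate_params({"duration": 6, "cfg": 1.2}))
print(validate_params({"resolution": "1280x720", "duration": 60}))
```

Returning a list rather than raising on the first problem lets a caller report every invalid field at once.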

Notes

Video length is theoretically unlimited, with character identity kept consistent throughout. Image-to-video typically yields better results than text-to-video.

Constraints

Generation can take 45+ minutes for long videos.
For best motion, describe the consecutive actions for each segment in your prompt.

Image formats

jpg, jpeg, png, webp, heic, heif, bmp, tiff, tif
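
A quick pre-flight check against this list might look like the sketch below. It inspects only the file extension, not the file contents, and treats extensions as case-insensitive, which is an assumption; `has_supported_extension` is a hypothetical helper.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {
    "jpg", "jpeg", "png", "webp", "heic", "heif", "bmp", "tiff", "tif",
}


def has_supported_extension(path):
    """Check the file extension only (case-insensitive); bytes are not inspected."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


print(has_supported_extension("portrait.PNG"))
print(has_supported_extension("clip.gif"))
```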

Multi-scene mode

When describing several scenes in one prompt, a lower `cfg` (1.0-1.3) gives the model more freedom to interpret distinct scene transitions.
Raise `cfg` (1.5-2.0) when each scene must follow the prompt literally.
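
The guidance above can be sketched as a small payload builder. Everything beyond the two CFG ranges is an assumption: this page does not define a scene separator, so the sketch simply joins scenes into sentences, and it picks the midpoint of each suggested range. `multi_scene_payload` is a hypothetical helper.

```python
def multi_scene_payload(scenes, literal=False):
    """Join scene descriptions into one prompt and pick a cfg value.

    Assumptions: scenes are joined as consecutive sentences, and the
    midpoint of the suggested range is used (1.15 for loose
    interpretation, 1.75 for literal adherence).
    """
    cfg = 1.75 if literal else 1.15
    prompt = ". ".join(s.strip().rstrip(".") for s in scenes) + "."
    return {"model": "svi-2-0-pro", "prompt": prompt, "cfg": cfg}


payload = multi_scene_payload(
    ["A fox wakes up in its den", "It runs through fresh snow"],
    literal=True,
)
print(payload)
```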

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/svi-2-0-pro.
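
Fetching the schema could be sketched as follows. Whether this endpoint requires authentication and what the response looks like are not stated here, so the sketch sends the bearer token only when a key is supplied and assumes a JSON body; `build_schema_request` and `fetch_schema` are hypothetical helpers.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/svi-2-0-pro"


def build_schema_request(api_key=None):
    """Build the GET request for the model schema (nothing is sent yet)."""
    req = urllib.request.Request(SCHEMA_URL, method="GET")
    if api_key:
        # Assumption: the same bearer scheme as the generation endpoint.
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


def fetch_schema(api_key=None):
    """Send the request and return the decoded body (assumed to be JSON)."""
    with urllib.request.urlopen(build_schema_request(api_key)) as resp:
        return json.load(resp)


# Usage (performs a real network call):
# schema = fetch_schema()
```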