SVI 2.0 Pro

VITA-Group / EPFL · Video Generation
POST /v1/videos/generations

Stable Video Infinity 2.0 Pro, built on Wan 2.2, extends still images into video of theoretically unlimited length while keeping character identity consistent.

At a glance

| Field | Value |
| --- | --- |
| Model id | svi-2-0-pro |
| Input modalities | Text, Image |
| Output modalities | Video |
| Context window | - |
| Weight precision | Mixed FP8/BF16/FP16 |
| Features | infinite_length, character_consistency |
| Native inference | Yes |
| New | No |
| Supported endpoints | POST /v1/videos/generations |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| 480p Video | per second | $0.057 |
| 720p Video | per second | $0.17 |
| T2V Fast | additional fee | $0.065 |
| T2V Quality | additional fee | $0.13 |
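
The rates above can be combined into a rough cost estimate. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, and it assumes the T2V fee is a flat, one-time surcharge per request that applies only in text-to-video mode. The pricing table does not state how the fee is applied, so check your invoice before relying on these numbers.

```python
# Rates copied from the pricing table (USD).
RATE_PER_SECOND = {"480p": 0.057, "720p": 0.17}
T2V_FEE = {"fast": 0.065, "quality": 0.13}


def estimate_cost(duration_s, tier, t2v_quality=None):
    """Estimate the charge in USD for one generation.

    Assumption: the T2V fee is charged once per request, and only for
    text-to-video. Pass t2v_quality=None for image-to-video.
    """
    cost = duration_s * RATE_PER_SECOND[tier]
    if t2v_quality is not None:
        cost += T2V_FEE[t2v_quality]
    return round(cost, 4)


# An 18-second image-to-video clip at 480p.
print(estimate_cost(18, "480p"))
# An 18-second text-to-video clip at 720p on the "quality" tier.
print(estimate_cost(18, "720p", "quality"))
```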

Example request

curl https://api.empiriolabs.ai/v1/videos/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "svi-2-0-pro", "prompt": "sunrise over the ocean", "duration": 18}'
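
The same request can be sketched in Python with only the standard library. `build_request` and `generate` are hypothetical helper names. The sketch assumes the endpoint returns a JSON body and that the call blocks until generation finishes, neither of which this page confirms. The long default timeout reflects the note that long videos can take 45+ minutes.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/videos/generations"


def build_request(payload, api_key):
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(payload, timeout_s=3600):
    """Send the request and parse the response.

    Assumptions: the response body is JSON and the call is synchronous.
    """
    req = build_request(payload, os.environ["EMPIRIOLABS_API_KEY"])
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `generate({"model": "svi-2-0-pro", "prompt": "sunrise over the ocean", "duration": 18})` would send the equivalent of the curl command, reading the key from the `EMPIRIOLABS_API_KEY` environment variable.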

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| resolution | enum | no | "832x480" | 480p is fast; 720p is slower but sharper. · Allowed: 832x480, 480x832, 720x1280, 1280x720 |
| duration | number | no | 18 | Estimated clip length in seconds. · Range: 18 – 121.5 |
| cfg | number | no | 1.0 | Prompt adherence strength. · Range: 1.0 – 2.0 |
| negative_prompt | string | no | "vibrant tone, overexposed, static, blurry details, subtitles, style, artwork, painting, picture, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, still picture, messy background, three legs, background crowd, walking backwards" | Text describing what to avoid. |
| t2v_quality | enum | no | "quality" | Text-to-video pipeline tier. ‘quality’ uses the Wan 2.2 plus reference image model for higher fidelity; ‘fast’ uses the flash model for cheaper, quicker generations. Only applies in text-to-video mode (image-to-video skips this step). · Allowed: fast, quality |
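
Checking a payload against these limits before sending it can avoid a failed request. The following is a minimal client-side sketch. `validate_params` is a hypothetical helper, and it assumes the documented ranges are inclusive at both ends. The server remains the authority on what is accepted.

```python
ALLOWED_RESOLUTIONS = {"832x480", "480x832", "720x1280", "1280x720"}
ALLOWED_T2V_QUALITY = {"fast", "quality"}


def validate_params(params):
    """Return a list of problems; an empty list means no issue was found.

    Missing keys fall back to the documented defaults.
    Assumption: range bounds are inclusive.
    """
    errors = []
    resolution = params.get("resolution", "832x480")
    if resolution not in ALLOWED_RESOLUTIONS:
        errors.append(f"resolution {resolution!r} is not allowed")
    duration = params.get("duration", 18)
    if not 18 <= duration <= 121.5:
        errors.append(f"duration {duration} is outside 18 - 121.5")
    cfg = params.get("cfg", 1.0)
    if not 1.0 <= cfg <= 2.0:
        errors.append(f"cfg {cfg} is outside 1.0 - 2.0")
    t2v_quality = params.get("t2v_quality", "quality")
    if t2v_quality not in ALLOWED_T2V_QUALITY:
        errors.append(f"t2v_quality {t2v_quality!r} is not allowed")
    return errors


print(validate_params({"resolution": "1280x720", "duration": 30, "cfg": 1.5}))
print(validate_params({"duration": 200, "cfg": 3.0}))
```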

Notes

Video length is theoretically unlimited, and character identity stays consistent throughout. Image-to-video typically yields better results than text-to-video.

Constraints

  • Generation can take 45+ minutes for long videos
  • For best motion: describe consecutive actions per segment in your prompt

Image formats

  • jpg, jpeg, png, webp, heic, heif, bmp, tiff, tif
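
A quick pre-upload check against this list might look like the sketch below. `is_supported_image` is a hypothetical helper that inspects only the file extension, not the file contents, and this page does not specify how the image itself is attached to the request.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".webp", ".heic", ".heif", ".bmp", ".tiff", ".tif",
}


def is_supported_image(path):
    """Return True if the file extension is in the documented list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_image("portrait.PNG"))
print(is_supported_image("clip.gif"))
```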

Multi-scene mode

  • When describing several scenes in one prompt, lower CFG (1.0-1.3) gives the model more freedom to interpret distinct scene transitions
  • Raise CFG (1.5-2.0) when each scene must follow the prompt literally
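
The CFG guidance above can be folded into a small payload builder. This is a sketch under assumptions: `multi_scene_payload` is a hypothetical helper, the "Scene N:" prompt layout is one possible way to describe consecutive scenes rather than a format the API requires, and the values 1.2 and 1.8 are arbitrary picks from inside the two suggested ranges.

```python
def multi_scene_payload(scenes, literal=False):
    """Build a request payload for a prompt that describes several scenes.

    Assumptions: the "Scene N:" layout is illustrative only, and the cfg
    values are arbitrary points inside the suggested ranges
    (1.0-1.3 for freer transitions, 1.5-2.0 for literal adherence).
    """
    prompt = " ".join(
        f"Scene {i}: {scene.strip()}" for i, scene in enumerate(scenes, start=1)
    )
    cfg = 1.8 if literal else 1.2
    return {"model": "svi-2-0-pro", "prompt": prompt, "cfg": cfg}


payload = multi_scene_payload(
    ["A woman walks along the beach.", "She stops and picks up a shell."]
)
print(payload["prompt"])
print(payload["cfg"])
```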

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/svi-2-0-pro.
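
Fetching the schema programmatically could look like the sketch below. `schema_request` and `fetch_schema` are hypothetical helper names. Whether this endpoint requires an API key is not stated here, so the Authorization header is optional in the sketch, and the response is assumed to be JSON.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/svi-2-0-pro"


def schema_request(api_key=None):
    """Build the GET request; the auth header is added only if a key is given."""
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key=None, timeout_s=30):
    """Fetch and parse the schema. Assumption: the body is JSON."""
    with urllib.request.urlopen(schema_request(api_key), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```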