MOSS Video and Audio

OpenMOSS · Video Generation
POST /v1/videos/generations
Open-source 32B MoE foundation model that generates synchronized video and audio in one inference step with precise dual-tower lip-sync.
Notes
A 32B-parameter MoE model that produces lip-synced video and audio in a single inference pass.
Constraints
- Generation can take 20 minutes or more
- Image-to-video typically yields better results than text-to-video
- Only 1 image is supported (used as the first frame)
- Video inputs are not supported
Image formats
- jpg, jpeg, png, webp, heic, heif, bmp, tiff, tif, gif
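The input constraints and format list above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name `validate_image_inputs` is hypothetical, and it assumes inputs are identified by file name, so the check is based on the extension only rather than on file contents.

```python
from pathlib import Path

# Extensions listed under "Image formats" above.
ALLOWED_IMAGE_EXTENSIONS = {
    "jpg", "jpeg", "png", "webp", "heic", "heif", "bmp", "tiff", "tif", "gif",
}


def validate_image_inputs(paths):
    """Return the single first-frame image path, or None for text-to-video.

    Hypothetical helper: raises ValueError if more than one file is given
    or if the extension is not in the documented list (which also rejects
    video files such as .mp4).
    """
    paths = list(paths)
    if not paths:
        return None  # text-to-video: no image supplied
    if len(paths) > 1:
        raise ValueError("only 1 image is supported (used as the first frame)")
    ext = Path(paths[0]).suffix.lower().lstrip(".")
    if ext not in ALLOWED_IMAGE_EXTENSIONS:
        raise ValueError(f"unsupported input format: {ext or 'none'}")
    return paths[0]
```

Since generation can take 20 minutes or more, rejecting an invalid input locally avoids waiting on a request that cannot succeed.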
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/moss-video-and-audio.
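As an illustration, the schema request can be built with Python's standard library. This sketch only constructs the request object. The `Accept: application/json` header assumes the schema is served as JSON, and any authentication header the service may require is not documented here and is therefore omitted.

```python
import urllib.request

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/moss-video-and-audio"


def build_schema_request(url=SCHEMA_URL):
    """Build (but do not send) the GET request for the model schema."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json"},  # assumption: JSON response
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` would send it; the timeout value is an arbitrary choice for this sketch.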
