Kling O3 | EmpirioLabs AI Docs

Kling AI · Video Generation

POST /v1/videos/generations

Kling O3 video model with Standard, Pro, and 4K tiers. Supports text-to-video, image-to-video, reference-to-video, video editing, native sound, and multi-scene transitions.

At a glance

Field	Value
Model id	`kling-o3`
Model release date	2026-02-05
Input modalities	Text, Image, Video, Audio
Output modalities	Video
Context window	-
Weight precision	-
Features	audio, editing
Native inference	No
New	No
Supported endpoints	`POST /v1/videos/generations`
Alternate model ids	`kling/o3`

Pricing

T2V = text-to-video, I2V = image-to-video, Ref = reference-to-video.

Charge	Unit	Rate
Standard T2V/I2V	per second	$0.168
Standard T2V/I2V Sound	per second	$0.224
Standard Video Input	per second	$0.252
Pro T2V/I2V	per second	$0.224
Pro T2V/I2V Sound	per second	$0.280
Pro Video Input	per second	$0.336
4K T2V/I2V/Ref	per second	$0.525
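
To get a rough cost before submitting a job, the table can be turned into a lookup. This is a minimal sketch, not an official calculator. It assumes that rates are billed per second of output video, that a "Sound" rate replaces the base rate rather than adding to it, and that the single 4K rate applies whether or not sound is on. The helper names `kling_o3_rate` and `estimate_cost`, and the mode names `base`, `sound`, and `video_input`, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate a Kling O3 charge from the pricing table.
# Assumptions (not confirmed by the table): rates are per second of output
# video, a "Sound" rate replaces the base rate instead of adding to it, and
# the single 4K rate applies with or without sound.

kling_o3_rate() {
  # $1 = tier (standard | pro | 4k), $2 = mode (base | sound | video_input)
  case "$1:$2" in
    standard:base)        echo 0.168 ;;
    standard:sound)       echo 0.224 ;;
    standard:video_input) echo 0.252 ;;
    pro:base)             echo 0.224 ;;
    pro:sound)            echo 0.280 ;;
    pro:video_input)      echo 0.336 ;;
    4k:base|4k:sound)     echo 0.525 ;;
    # 4k has no video-input row: the 4K tier does not accept video inputs.
    *) echo "no listed rate for $1:$2" >&2; return 1 ;;
  esac
}

estimate_cost() {
  # $1 = tier, $2 = mode, $3 = total seconds of output video
  _rate=$(kling_o3_rate "$1" "$2") || return 1
  awk -v r="$_rate" -v s="$3" 'BEGIN { printf "%.3f\n", r * s }'
}
```

Under these assumptions, the 6-second example request below (default `pro` tier, `sound` on by default) corresponds to `estimate_cost pro sound 6`, which prints `1.680`.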

Example request

curl https://api.empiriolabs.ai/v1/videos/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "kling-o3", "prompt": "sunrise over the ocean", "duration": 6}'
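
The same request can be wrapped in a small script. This is a minimal sketch, assuming a POSIX shell with `curl` and `tr` available and `EMPIRIOLABS_API_KEY` exported. The function names `kling_o3_body` and `kling_o3_generate` are hypothetical. To stay dependency-free, the body builder rewrites newline-separated scenes into the pipe form (both separators are accepted by `prompt`) and rejects prompts that contain a double quote or backslash rather than escaping them. It does not handle other control characters such as tabs.

```shell
#!/bin/sh
# Hypothetical wrapper around POST /v1/videos/generations for kling-o3.

kling_o3_body() {
  # $1 = prompt, $2 = per-scene duration in seconds (default 5),
  # $3 = model_tier (default pro). Prints the JSON request body.
  # Newline-separated scenes become pipe-separated so the JSON stays valid.
  _prompt=$(printf '%s' "$1" | tr '\n' '|')
  case $_prompt in
    *'"'* | *'\'*)
      echo 'prompt contains a double quote or backslash; use a JSON encoder' >&2
      return 1 ;;
  esac
  printf '{"model": "kling-o3", "prompt": "%s", "duration": %s, "model_tier": "%s"}\n' \
    "$_prompt" "${2:-5}" "${3:-pro}"
}

kling_o3_generate() {
  # Same arguments as kling_o3_body; sends the request with curl.
  if [ -z "${EMPIRIOLABS_API_KEY:-}" ]; then
    echo 'EMPIRIOLABS_API_KEY is not set' >&2
    return 1
  fi
  _body=$(kling_o3_body "$@") || return 1
  curl -sS https://api.empiriolabs.ai/v1/videos/generations \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$_body"
}
```

For example, `kling_o3_generate 'sunrise over the ocean' 6` sends the same job as the curl command above, with `model_tier` set explicitly to its default, `pro`.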

Parameters

Parameter	Type	Required	Default	Description
`prompt`	string	yes	-	Text prompt. For multiple scenes, separate prompts with a pipe (`|`) or a newline, optionally prefixing each with its duration, e.g. `5s: scene text`. Up to 6 scenes.
`model_tier`	enum	no	`"pro"`	`standard`: lowest cost. `pro`: balanced quality. `4k`: highest fidelity, longest render time; no video inputs. · Allowed: `standard`, `pro`, `4k`
`workflow`	enum	no	`"auto"`	`auto`: infer from the supplied inputs. `t2v`: text-to-video. `i2v`: image-to-video. `video_edit`: edit the video given in `video`. `reference`: generate from `reference_images` or `reference_videos`. · Allowed: `auto`, `t2v`, `i2v`, `video_edit`, `reference`
`aspect_ratio`	enum	no	`"16:9"`	Kling O3 supports landscape, square, and portrait only. · Allowed: `16:9`, `1:1`, `9:16`
`duration`	number	no	`5`	Per-scene duration in seconds. · Range: 3 – 15
`sound`	boolean	no	`true`	Generate native audio with the video.
`keep_original_sound`	boolean	no	`true`	`video_edit` only. Keep the audio track from the source video.
`image`	string	no	-	Input image URL for image-to-video (`i2v`).
`image_end`	string	no	-	Optional last-frame image URL for image-to-video.
`video`	string	no	-	Source video URL for `video_edit`.
`reference_images`	string	no	-	Comma-separated image URLs for reference workflow.
`reference_videos`	string	no	-	Comma-separated video URLs for reference workflow.
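
Multi-scene prompts are easy to get wrong by hand. The sketch below joins (seconds, text) pairs into one `prompt` value and checks the documented 6-scene limit. It rests on assumptions this page does not confirm: scenes are joined with `|` and no surrounding spaces, and the `Ns:` prefixes obey the same 3-15 second range as `duration`. The function name `kling_o3_scenes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join (seconds, text) pairs into a multi-scene prompt,
# e.g. kling_o3_scenes 5 "dawn over the harbor" 4 "gulls take off".
# Assumptions: "|" separates scenes with no surrounding spaces, and the
# "Ns:" prefixes follow the same 3-15 s range as the duration parameter.

kling_o3_scenes() {
  if [ $# -eq 0 ] || [ $(($# % 2)) -ne 0 ]; then
    echo 'usage: kling_o3_scenes SECONDS TEXT [SECONDS TEXT ...]' >&2
    return 1
  fi
  if [ $(($# / 2)) -gt 6 ]; then
    echo 'at most 6 scenes are allowed' >&2
    return 1
  fi
  _out=''
  while [ $# -gt 0 ]; do
    # Scene duration must be a whole number of seconds within 3-15.
    case $1 in
      '' | *[!0-9]*) echo "not a whole number of seconds: $1" >&2; return 1 ;;
    esac
    if [ "$1" -lt 3 ] || [ "$1" -gt 15 ]; then
      echo "scene duration must be 3-15 s: $1" >&2
      return 1
    fi
    # A "|" inside scene text would be read as a scene separator.
    case $2 in
      *'|'*) echo "scene text must not contain '|': $2" >&2; return 1 ;;
    esac
    _out="${_out:+$_out|}${1}s: $2"
    shift 2
  done
  printf '%s\n' "$_out"
}
```

The result can be passed as `prompt`. How prefixed scene durations interact with the top-level `duration` parameter is not specified on this page.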

Notes

Uploaded media preprocessing

Video inputs are capped at 10 seconds in the `video_edit` workflow and when passed as `reference_videos`.
Uploaded videos are converted to provider-compatible MP4 when needed.
The `4k` tier supports text-to-video, image-to-video, and image-only reference workflows. Use `standard` or `pro` for any workflow with video inputs.
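
A client can catch the 4K restriction before uploading anything. This is a minimal pre-flight sketch, not the API's own logic. The order used to guess the `auto` workflow (video first, then references, then image, else text) is an assumption, because this page does not document how detection works. The function name `kling_o3_workflow` is hypothetical. It does not check the 10-second cap, which would require probing the video file.

```shell
#!/bin/sh
# Hypothetical pre-flight check. The precedence used to guess the "auto"
# workflow is an assumption; the API's detection rules are not documented.
# Args: TIER IMAGE VIDEO REFERENCE_IMAGES REFERENCE_VIDEOS ("" when unset)

kling_o3_workflow() {
  if [ -n "$3" ]; then
    _wf=video_edit
  elif [ -n "$4" ] || [ -n "$5" ]; then
    _wf=reference
  elif [ -n "$2" ]; then
    _wf=i2v
  else
    _wf=t2v
  fi
  # Per the notes, the 4k tier takes no video inputs (no video_edit and
  # no reference_videos); those jobs need standard or pro.
  if [ "$1" = 4k ] && { [ -n "$3" ] || [ -n "$5" ]; }; then
    echo '4k does not accept video inputs; use standard or pro' >&2
    return 1
  fi
  printf '%s\n' "$_wf"
}
```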

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/kling-o3.