Stable Audio 2.5 | EmpirioLabs AI Docs

Stability AI · Audio Generation

POST /v1/audio/generations

Generates up to 3 minutes 10 seconds (190 s) of audio from a text prompt, with text-to-audio, audio-to-audio, and audio-inpaint modes for music production, sound design, and remixing.

At a glance

Field	Value
Model id	`stable-audio-2-5`
Model release date	2025-09-10
Input modalities	Text
Output modalities	Audio
Context window	-
Weight precision	-
Features	music_generation, text_to_audio, sound_effects
Native inference	No
New	No
Supported endpoints	`POST /v1/audio/generations`
Alternate model ids	`stability-audio-2.5`, `stability/audio-2.5`

Pricing

Charge	Spec	Rate
Generation	per generation	$0.68
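
As a rough budgeting aid, the flat per-generation rate above can be multiplied out. This is a minimal sketch: the helper name `estimate_cost` is hypothetical, and it assumes every generation is billed at the listed rate regardless of duration or mode, since the table lists no other factor. How failed requests are billed is not stated on this page.

```python
from decimal import Decimal

# USD per generation, copied from the pricing table.
RATE_PER_GENERATION = Decimal("0.68")


def estimate_cost(generations):
    """Estimated USD cost for a number of generations at the listed flat rate."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return RATE_PER_GENERATION * generations


print(estimate_cost(25))  # 17.00
```

`Decimal` is used instead of `float` so that totals keep exact cents.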

Example request

$ curl https://api.empiriolabs.ai/v1/audio/generations \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "stable-audio-2-5", "prompt": "warm jazz piano", "duration": 8}'
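
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper name `build_generation_request` is hypothetical, and because the response format is not documented on this page, the example builds the request and leaves the actual call commented out.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/generations"


def build_generation_request(prompt, duration, api_key):
    """Build (but do not send) the POST request from the curl example."""
    payload = {"model": "stable-audio-2-5", "prompt": prompt, "duration": duration}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_generation_request(
        "warm jazz piano", 8, os.environ.get("EMPIRIOLABS_API_KEY", "")
    )
    # Sending is left commented out: the response shape is not documented here.
    # with urllib.request.urlopen(req) as resp:
    #     body = resp.read()
    print(req.get_method(), req.full_url)
```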

Parameters

Parameter	Type	Required	Default	Description
`prompt`	string	yes	-	What to generate.
`mode`	enum	no	`"text-to-audio"`	Generation mode. `audio-inpaint` regenerates the [mask_start, mask_end] window of an existing clip and keeps the rest. · Allowed: `text-to-audio`, `audio-to-audio`, `audio-inpaint`
`output_format`	enum	no	`"mp3"`	Output audio file format. · Allowed: `mp3`, `wav`
`duration`	number	no	`190`	Seconds. Up to 3 minutes 10 seconds. · Range: 1 – 190
`steps`	number	no	`8`	Diffusion steps. The 2.5 turbo model is tuned for very low step counts. · Range: 4 – 8
`cfg_scale`	number	no	`1`	Classifier-free guidance. The turbo model uses small CFG by default. · Range: 1 – 25
`strength`	number	no	`0.5`	Audio-to-audio only. 0.01 = ignore reference, 1 = stay close to reference. · Range: 0.01 – 1
`mask_start`	number	no	-	Inpaint window start (seconds). Required for audio-inpaint. · Range: 0 – 190
`mask_end`	number	no	-	Inpaint window end (seconds). Required for audio-inpaint. · Range: 0 – 190
`random_seed`	boolean	no	`true`	If true, use a random seed each call.
`seed`	number	no	-	Reproducibility seed. Only used when random_seed=false.
`audio_url`	string	no	-	Reference audio URL for audio-to-audio / inpaint.
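
The ranges and allowed values in the table can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `validate_params` is a hypothetical helper, not part of any EmpirioLabs SDK; treating an empty prompt as invalid is an assumption; and the server remains the authority on what it accepts.

```python
# Inclusive numeric ranges copied from the parameter table.
RANGES = {
    "duration": (1, 190),
    "steps": (4, 8),
    "cfg_scale": (1, 25),
    "strength": (0.01, 1),
    "mask_start": (0, 190),
    "mask_end": (0, 190),
}
MODES = {"text-to-audio", "audio-to-audio", "audio-inpaint"}
OUTPUT_FORMATS = {"mp3", "wav"}


def validate_params(params):
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []
    # Assumption: an empty prompt is treated the same as a missing one.
    if not isinstance(params.get("prompt"), str) or not params["prompt"]:
        problems.append("prompt is required and must be a non-empty string")
    if params.get("mode", "text-to-audio") not in MODES:
        problems.append("mode must be one of " + ", ".join(sorted(MODES)))
    if params.get("output_format", "mp3") not in OUTPUT_FORMATS:
        problems.append("output_format must be mp3 or wav")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    # Per the table, seed is only used when random_seed is false (default: true).
    if "seed" in params and params.get("random_seed", True):
        problems.append("seed is ignored unless random_seed is false")
    return problems


print(validate_params({"prompt": "warm jazz piano", "steps": 20}))
# ['steps must be between 4 and 8']
```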

Notes

Adds audio-inpaint mode (regenerate a time window) on top of Stable Audio 2.0.

Mode requirements

Audio-to-audio and audio-inpaint both require a prompt and a reference audio file (passed as `audio_url`).
Audio-to-audio uses the reference audio for style and conditioning, not for voice cloning.
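
The requirements above, combined with the `mask_start` / `mask_end` rows of the parameter table, can be sketched as a per-mode check. `check_mode_requirements` is a hypothetical helper, the prompt text is illustrative, and the example URL is a placeholder. It assumes the reference audio is supplied through `audio_url`, as the parameter table describes.

```python
def check_mode_requirements(params):
    """Return the names of fields the chosen mode needs but the payload lacks."""
    mode = params.get("mode", "text-to-audio")
    required = ["prompt"]
    if mode in ("audio-to-audio", "audio-inpaint"):
        required.append("audio_url")
    if mode == "audio-inpaint":
        required += ["mask_start", "mask_end"]
    # Compare against None so that a legitimate mask_start of 0 is not flagged.
    return [name for name in required if params.get(name) is None]


inpaint = {
    "model": "stable-audio-2-5",
    "mode": "audio-inpaint",
    "prompt": "replace the bridge with a soft saxophone solo",
    "audio_url": "https://example.com/clip.mp3",  # placeholder URL
    "mask_start": 0,
    "mask_end": 12,
}
print(check_mode_requirements(inpaint))  # []
print(check_mode_requirements({"prompt": "x", "mode": "audio-to-audio"}))  # ['audio_url']
```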

Machine-readable schema: `GET https://api.empiriolabs.ai/v1/models/stable-audio-2-5`