Stable-Audio-2.5

Provider: Stability AI
Category: Audio Generation
Endpoint: POST /v1/audio/speech
Context window:
Served from:

Generates audio up to 190 seconds (3 minutes 10 seconds) long from a text prompt, with text-to-audio, audio-to-audio, and audio-inpainting modes for music production, sound design, and remixing.

At a glance

| Field | Value |
| --- | --- |
| Model id | stable-audio-2-5 |
| Input modalities | text |
| Output modalities | audio |
| Context window | |
| Region | |
| Features | |
| New | No |
| Native inference | No |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Generation | per generation | $0.68 |
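
Because the charge is listed per generation, a batch estimate is a single multiplication. The following is a minimal sketch, assuming the listed rate applies to every call regardless of mode or duration (the table gives no other breakdown). `estimate_cost` is a hypothetical helper name, not part of any SDK.

```python
from decimal import Decimal

# Rate in dollars, copied from the pricing table above.
RATE_PER_GENERATION = Decimal("0.68")


def estimate_cost(generations: int) -> Decimal:
    """Estimate the total charge for a number of generations.

    Assumption: every generation is billed at the same flat rate.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return RATE_PER_GENERATION * generations
```

`Decimal` is used instead of `float` so that currency totals are not affected by binary rounding.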

Example request

curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "stable-audio-2-5", "prompt": "Hello from EmpirioLabs."}'
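
The same request can be sketched in Python using only the standard library. The helper names `build_request` and `generate` are illustrative, not part of any SDK. The body uses the `prompt` field listed as required under Parameters, and because this page does not describe the response format, the sketch returns the raw response bytes without interpreting them.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/speech"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for stable-audio-2-5."""
    body = json.dumps({"model": "stable-audio-2-5", "prompt": prompt})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(api_key: str, prompt: str, timeout: float = 300.0) -> bytes:
    """Send the request and return the raw response body.

    Assumption: the response format is not documented on this page, so the
    bytes are returned as-is for the caller to inspect.
    """
    request = build_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()


# Usage (performs a real, billable API call):
#   import os
#   data = generate(os.environ["EMPIRIOLABS_API_KEY"], "Hello from EmpirioLabs.")
```

Separating `build_request` from `generate` lets the request be inspected or logged before any charge is incurred.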

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | What to generate. |
| mode | enum | no | "text-to-audio" | audio-inpaint regenerates a [mask_start, mask_end] window of an existing clip while keeping the rest. · Allowed: text-to-audio, audio-to-audio, audio-inpaint |
| output_format | enum | no | "mp3" | Allowed: mp3, wav |
| duration | number | no | 190 | Seconds. Up to 3 minutes 10 seconds. · Range: 1 – 190 |
| steps | number | no | 8 | Diffusion steps. The 2.5 turbo model is tuned for very low step counts. · Range: 4 – 8 |
| cfg_scale | number | no | 1 | Classifier-free guidance. The turbo model uses small CFG by default. · Range: 1 – 25 |
| strength | number | no | 0.5 | Audio-to-audio only. 0.01 = ignore reference, 1 = stay close to reference. · Range: 0.01 – 1 |
| mask_start | number | no | | Inpaint window start (seconds). Required for audio-inpaint. · Range: 0 – 190 |
| mask_end | number | no | | Inpaint window end (seconds). Required for audio-inpaint. · Range: 0 – 190 |
| random_seed | boolean | no | true | If true, use a random seed each call. |
| seed | number | no | | Reproducibility seed. Only used when random_seed=false. |
| audio_url | string | no | | Reference audio URL for audio-to-audio / inpaint. |
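
The ranges and mode-specific rules in the table can be checked client-side before a billable call is made. The following is a minimal sketch of such a check. `build_payload` and `_check_range` are hypothetical helpers, and the rule that `mask_start` must be less than `mask_end` is an assumption drawn from the window description rather than something the table states. The sketch does not enforce `audio_url`, since the table marks it as optional.

```python
from typing import Any, Optional

MODEL_ID = "stable-audio-2-5"
MODES = ("text-to-audio", "audio-to-audio", "audio-inpaint")
FORMATS = ("mp3", "wav")


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError if value falls outside the documented range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_payload(
    prompt: str,
    mode: str = "text-to-audio",
    output_format: str = "mp3",
    duration: float = 190,
    steps: int = 8,
    cfg_scale: float = 1,
    strength: float = 0.5,
    mask_start: Optional[float] = None,
    mask_end: Optional[float] = None,
    seed: Optional[int] = None,
    audio_url: Optional[str] = None,
) -> dict[str, Any]:
    """Validate arguments against the parameter table and return a JSON-ready dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if output_format not in FORMATS:
        raise ValueError(f"output_format must be one of {FORMATS}")
    _check_range("duration", duration, 1, 190)
    _check_range("steps", steps, 4, 8)
    _check_range("cfg_scale", cfg_scale, 1, 25)

    payload: dict[str, Any] = {
        "model": MODEL_ID,
        "prompt": prompt,
        "mode": mode,
        "output_format": output_format,
        "duration": duration,
        "steps": steps,
        "cfg_scale": cfg_scale,
        # seed is only used when random_seed is false, so tie the two together.
        "random_seed": seed is None,
    }
    if seed is not None:
        payload["seed"] = seed
    if audio_url is not None:
        payload["audio_url"] = audio_url

    if mode == "audio-to-audio":
        # strength applies to audio-to-audio only.
        _check_range("strength", strength, 0.01, 1)
        payload["strength"] = strength
    elif mode == "audio-inpaint":
        if mask_start is None or mask_end is None:
            raise ValueError("mask_start and mask_end are required for audio-inpaint")
        _check_range("mask_start", mask_start, 0, 190)
        _check_range("mask_end", mask_end, 0, 190)
        if mask_start >= mask_end:  # assumption: the window must be non-empty
            raise ValueError("mask_start must be less than mask_end")
        payload["mask_start"] = mask_start
        payload["mask_end"] = mask_end
    return payload
```

For example, `build_payload("rain on a tin roof", duration=30)` yields a text-to-audio payload with a random seed, while passing `seed=42` switches `random_seed` to false so the seed takes effect.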

A live machine-readable schema is also available via GET https://api.empiriolabs.ai/v1/models/stable-audio-2-5.
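
The schema endpoint can be queried programmatically, for instance to pick up parameter changes without re-reading this page. The following is a sketch using only the Python standard library. It assumes the endpoint returns JSON and treats the API key as optional, since this page does not say whether authentication is needed. `build_schema_request` and `fetch_schema` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.empiriolabs.ai/v1/models"


def build_schema_request(
    model_id: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a GET request for a model's schema."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: if auth is required, it uses the same Bearer scheme
        # as the generation endpoint.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{BASE_URL}/{model_id}", headers=headers, method="GET"
    )


def fetch_schema(
    model_id: str, api_key: Optional[str] = None, timeout: float = 30.0
) -> Any:
    """Fetch and decode the schema. Assumes the response body is JSON."""
    request = build_schema_request(model_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a real network call):
#   schema = fetch_schema("stable-audio-2-5")
```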