Stable Audio 2.5

Stability AI · Audio Generation
POST /v1/audio/generations

Generates up to 190 seconds (3 minutes 10 seconds) of audio per request, with text-to-audio, audio-to-audio, and audio-inpaint modes for music production, sound design, and remixing.

At a glance

| Field | Value |
| --- | --- |
| Model id | stable-audio-2-5 |
| Input modalities | Text |
| Output modalities | Audio |
| Context window | - |
| Weight precision | - |
| Features | - |
| Native inference | No |
| New | No |
| Supported endpoints | POST /v1/audio/generations |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Generation | per generation | $0.68 |
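As a rough budgeting aid, the flat rate above can be turned into a small calculator. This is a sketch only: it assumes every call is billed the same $0.68 regardless of mode or duration, which the table implies but does not state outright, and the function names are illustrative rather than part of any SDK.

```python
from decimal import Decimal

# USD per generation, copied from the pricing table.
RATE_PER_GENERATION = Decimal("0.68")


def estimate_cost(generations: int) -> Decimal:
    """Total USD for a number of generation calls at the flat rate."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return RATE_PER_GENERATION * generations


def cost_per_second(duration_seconds: int) -> Decimal:
    """Effective USD per second of audio for a single generation.

    Assumes (not confirmed by the page) that clip length does not change the price.
    """
    if not 1 <= duration_seconds <= 190:
        raise ValueError("duration_seconds must be between 1 and 190")
    return RATE_PER_GENERATION / Decimal(duration_seconds)
```

Under that assumption, `estimate_cost(25)` gives 17.00 USD, and an 8-second clip works out to 0.085 USD per second, so longer clips are cheaper per second of audio.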

Example request

curl https://api.empiriolabs.ai/v1/audio/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "stable-audio-2-5", "prompt": "warm jazz piano", "duration": 8}'
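The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_generation_request` and `generate` and the 300-second timeout are illustrative assumptions, and because this page does not document the response body, the raw bytes are returned without parsing.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/generations"


def build_generation_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the audio generations endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(payload: dict, api_key: str, timeout: float = 300.0) -> bytes:
    """Send the request and return the raw response body.

    The response format is not documented on this page, so nothing is parsed here.
    The timeout value is an arbitrary assumption, not a documented limit.
    """
    request = build_generation_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A call such as `generate({"model": "stable-audio-2-5", "prompt": "warm jazz piano", "duration": 8}, os.environ["EMPIRIOLABS_API_KEY"])` sends the request. Splitting the build step from the send step keeps the request inspectable without touching the network.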

Parameters

| Parameter | Type | Required | Default | Allowed values | Description |
| --- | --- | --- | --- | --- | --- |
| prompt | string | yes | - | - | What to generate. |
| mode | enum | no | "text-to-audio" | text-to-audio, audio-to-audio, audio-inpaint | Generation mode. audio-inpaint regenerates a [mask_start, mask_end] window of an existing clip while keeping the rest. |
| output_format | enum | no | "mp3" | mp3, wav | Output audio file format. |
| duration | number | no | 190 | 1 – 190 | Clip length in seconds, up to 190 (3 minutes 10 seconds). |
| steps | number | no | 8 | 4 – 8 | Diffusion steps. The 2.5 turbo model is tuned for very low step counts. |
| cfg_scale | number | no | 1 | 1 – 25 | Classifier-free guidance scale. The turbo model uses a small value by default. |
| strength | number | no | 0.5 | 0.01 – 1 | Audio-to-audio only. 0.01 = ignore reference, 1 = stay close to reference. |
| mask_start | number | no | - | 0 – 190 | Inpaint window start, in seconds. Required for audio-inpaint. |
| mask_end | number | no | - | 0 – 190 | Inpaint window end, in seconds. Required for audio-inpaint. |
| random_seed | boolean | no | true | - | If true, use a random seed on each call. |
| seed | number | no | - | - | Reproducibility seed. Only used when random_seed=false. |
| audio_url | string | no | - | - | Reference audio URL for audio-to-audio and audio-inpaint. |
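The ranges and allowed values in the table lend themselves to a client-side check before a request is sent. The following is a minimal sketch, not the service's own validation: `validate_params` is a hypothetical helper, the limits are copied from the table, and the server may accept or reject inputs differently.

```python
ALLOWED_MODES = {"text-to-audio", "audio-to-audio", "audio-inpaint"}
ALLOWED_FORMATS = {"mp3", "wav"}

# Inclusive (min, max) bounds copied from the parameter table.
NUMERIC_RANGES = {
    "duration": (1, 190),
    "steps": (4, 8),
    "cfg_scale": (1, 25),
    "strength": (0.01, 1),
    "mask_start": (0, 190),
    "mask_end": (0, 190),
}


def validate_params(params: dict) -> list:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if not isinstance(params.get("prompt"), str):
        problems.append("prompt is required and must be a string")
    mode = params.get("mode", "text-to-audio")
    if mode not in ALLOWED_MODES:
        problems.append(f"mode={mode!r} is not an allowed value")
    output_format = params.get("output_format", "mp3")
    if output_format not in ALLOWED_FORMATS:
        problems.append(f"output_format={output_format!r} is not an allowed value")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in params:
            continue
        value = params[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    # Per the table, seed only takes effect when random_seed is false.
    if "seed" in params and params.get("random_seed", True):
        problems.append("seed is ignored unless random_seed is false")
    return problems
```

For instance, `validate_params({"prompt": "x", "steps": 12})` reports that `steps` is outside its range, while a payload with `"random_seed": False, "seed": 42` passes cleanly and should yield a reproducible result.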

Notes

Adds audio-inpaint mode (regenerate a time window) on top of Stable Audio 2.0.

Mode requirements

  • Audio-to-audio and audio-inpaint each require both a prompt and an input audio file (see audio_url)
  • Audio-to-audio uses the reference audio for style and conditioning, not for voice cloning
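The mode requirements above can be enforced when assembling a request body. This sketch makes two assumptions that the page does not spell out: the input audio is supplied through the `audio_url` parameter, and `mask_start` must be strictly less than `mask_end`. `build_payload` is a hypothetical helper, not part of an official client.

```python
from typing import Optional

AUDIO_MODES = ("audio-to-audio", "audio-inpaint")


def build_payload(
    prompt: str,
    mode: str = "text-to-audio",
    audio_url: Optional[str] = None,
    mask_start: Optional[float] = None,
    mask_end: Optional[float] = None,
    **extra,
) -> dict:
    """Assemble a request body, raising ValueError if a mode requirement is unmet."""
    if not prompt:
        raise ValueError("prompt is required in every mode")
    payload = {"model": "stable-audio-2-5", "prompt": prompt, "mode": mode, **extra}

    if mode in AUDIO_MODES:
        # Assumption: the reference clip is passed as a URL via audio_url.
        if not audio_url:
            raise ValueError(f"{mode} requires audio_url")
        payload["audio_url"] = audio_url

    if mode == "audio-inpaint":
        if mask_start is None or mask_end is None:
            raise ValueError("audio-inpaint requires mask_start and mask_end")
        # Assumption: the window must be non-empty and inside the 0-190 s range.
        if not 0 <= mask_start < mask_end <= 190:
            raise ValueError("expected 0 <= mask_start < mask_end <= 190")
        payload["mask_start"] = mask_start
        payload["mask_end"] = mask_end

    return payload
```

An inpaint request that regenerates seconds 30 to 45 of a clip would then be built with `build_payload("replace the bridge with a sax solo", mode="audio-inpaint", audio_url="https://example.com/clip.mp3", mask_start=30, mask_end=45)`, where the URL is a placeholder. In text-to-audio mode the helper leaves `audio_url` out of the payload.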

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/stable-audio-2-5.
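The schema endpoint can be queried the same way as any other GET resource. The sketch below assumes the response is JSON and treats authentication as optional, since this page states neither. `build_schema_request` and `fetch_schema` are illustrative names.

```python
import json
import urllib.request
from typing import Optional

MODELS_URL = "https://api.empiriolabs.ai/v1/models"


def build_schema_request(model_id: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for a model's machine-readable schema."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: if auth is needed, it uses the same Bearer scheme as generation.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(f"{MODELS_URL}/{model_id}", headers=headers, method="GET")


def fetch_schema(model_id: str, api_key: Optional[str] = None, timeout: float = 30.0):
    """Fetch and decode the schema, assuming the body is JSON."""
    request = build_schema_request(model_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_schema("stable-audio-2-5")` would return the decoded document, which can then drive parameter validation instead of hard-coded limits.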