Stable-Audio-2.0

Provider: Stability AI
Category: Audio Generation
Endpoint: POST /v1/audio/speech
Context window:
Served from:

Generates up to 190 seconds (3 minutes 10 seconds) of audio from a text prompt. Supports text-to-audio and audio-to-audio modes, with adjustable duration, diffusion steps, and CFG scale.

At a glance

| Field | Value |
| --- | --- |
| Model id | stable-audio-2-0 |
| Input modalities | text |
| Output modalities | audio |
| Context window | |
| Region | |
| Features | |
| New | No |
| Native inference | No |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Base Cost | per generation | $0.58 |
| Per Step Cost | per step | $0.00 |
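
As a rough illustration of how the two charges might combine, the sketch below assumes the total is the base cost plus the per-step rate multiplied by the number of diffusion steps. That formula is an assumption and is not stated on this page. The function name `estimate_cost` is hypothetical, and the rates are copied from the table above.

```python
from decimal import Decimal

# Rates copied from the Pricing table (USD).
BASE_COST = Decimal("0.58")      # "Base Cost", per generation
PER_STEP_COST = Decimal("0.00")  # "Per Step Cost", per step


def estimate_cost(steps: int = 50, generations: int = 1) -> Decimal:
    """Estimate the charge for `generations` calls at `steps` steps each.

    Assumes charge = base + per_step * steps (not confirmed by the source).
    """
    if steps < 0 or generations < 0:
        raise ValueError("steps and generations must be non-negative")
    per_generation = BASE_COST + PER_STEP_COST * steps
    return per_generation * generations


print(estimate_cost(steps=50))                   # 0.58
print(estimate_cost(steps=100, generations=10))  # 5.80
```

With the per-step rate listed at $0.00, the step count does not change the estimate. The helper only matters if that rate changes.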

Example request

curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "stable-audio-2-0", "input": "Hello from EmpirioLabs."}'

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | What to generate. Be specific about genre, instruments, mood, and tempo. |
| mode | enum | no | "text-to-audio" | text-to-audio: generate from the prompt only. audio-to-audio: condition on a reference clip. Allowed: text-to-audio, audio-to-audio |
| output_format | enum | no | "mp3" | Allowed: mp3, wav |
| duration | number | no | 190 | Length in seconds. Stable Audio 2.0 generates up to 3 minutes 10 seconds. Range: 1 – 190 |
| steps | number | no | 50 | Diffusion steps. More steps give higher fidelity but run slower (and add per-step credits). Range: 30 – 100 |
| cfg_scale | number | no | 7 | Classifier-free guidance. Higher values follow the prompt more strictly. Range: 1 – 25 |
| strength | number | no | 1 | Audio-to-audio only. 0 = ignore reference, 1 = stay close to reference. Range: 0 – 1 |
| random_seed | boolean | no | true | If true, use a random seed on each call. |
| seed | number | no | | Reproducibility seed. Only used when random_seed=false. |
| audio_url | string | no | | Reference audio URL for audio-to-audio mode. |
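
The ranges and mode-specific fields above can be checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library. The helper names `build_payload` and `build_request` are hypothetical. It assumes the parameters are sent as top-level JSON fields alongside `model` (the example request on this page uses an `input` field instead), and it treats `audio_url` as mandatory in audio-to-audio mode, which the table does not state. Nothing here sends a request.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/speech"

# (low, high) bounds copied from the Parameters table.
RANGES = {"duration": (1, 190), "steps": (30, 100), "cfg_scale": (1, 25)}


def build_payload(prompt, mode="text-to-audio", output_format="mp3",
                  duration=190, steps=50, cfg_scale=7,
                  strength=None, seed=None, audio_url=None):
    """Validate arguments against the documented ranges and return a dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if mode not in ("text-to-audio", "audio-to-audio"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if output_format not in ("mp3", "wav"):
        raise ValueError(f"unsupported output_format: {output_format!r}")

    values = {"duration": duration, "steps": steps, "cfg_scale": cfg_scale}
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    payload = {"model": "stable-audio-2-0", "prompt": prompt,
               "mode": mode, "output_format": output_format, **values}

    if mode == "audio-to-audio":
        # Assumption: a reference clip is needed for this mode.
        if not audio_url:
            raise ValueError("audio_url is needed for audio-to-audio mode")
        payload["audio_url"] = audio_url
        if strength is not None:
            if not 0 <= strength <= 1:
                raise ValueError("strength must be between 0 and 1")
            payload["strength"] = strength

    if seed is not None:
        # seed is only used when random_seed is false.
        payload["random_seed"] = False
        payload["seed"] = seed
    return payload


def build_request(payload, api_key):
    """Wrap the payload in a POST request object without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


payload = build_payload("Warm lo-fi hip hop, mellow electric piano, 80 BPM",
                        duration=30, seed=42)
print(json.dumps(payload, indent=2))
```

Passing the result of `build_request` to `urllib.request.urlopen` would send the call. The response format is not described here, so handling it is left out.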

Live machine-readable schema is also available at GET https://api.empiriolabs.ai/v1/models/stable-audio-2-0.
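
A minimal sketch of reading that schema from Python with the standard library. The helper names `schema_request` and `fetch_schema` are hypothetical. Whether the endpoint needs an `Authorization` header, and that it returns a JSON body, are both assumptions. The key is optional here for that reason.

```python
import json
import urllib.request

MODELS_URL = "https://api.empiriolabs.ai/v1/models"


def schema_request(model_id, api_key=None):
    """Build (but do not send) the GET request for a model's schema."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: same bearer-token scheme as the generation endpoint.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(f"{MODELS_URL}/{model_id}",
                                  headers=headers, method="GET")


def fetch_schema(model_id, api_key=None, timeout=30):
    """Send the request and parse the body, assuming it is JSON."""
    request = schema_request(model_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


req = schema_request("stable-audio-2-0")
print(req.get_method(), req.full_url)
# GET https://api.empiriolabs.ai/v1/models/stable-audio-2-0
```

Calling `fetch_schema("stable-audio-2-0")` performs the network request. The shape of the returned object is not documented on this page, so inspect it before relying on specific keys.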