Stable Audio 2.0

Stability AI · Audio Generation
POST /v1/audio/generations

Generates up to 190 seconds (3 minutes 10 seconds) of audio from a text prompt. Supports text-to-audio and audio-to-audio modes, with adjustable duration, diffusion steps, and CFG scale.

At a glance

Field                 Value
Model id              stable-audio-2-0
Input modalities      Text
Output modalities     Audio
Context window        -
Weight precision      -
Features              -
Native inference      No
New                   No
Supported endpoints   POST /v1/audio/generations

Pricing

Charge          Spec             Rate
Base Cost       per generation   $0.58
Per Step Cost   per step         $0.00
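
As a rough illustration of how the two charges might combine, the sketch below assumes the total is the base cost plus the per-step rate times the number of steps, per generation. That formula is an assumption, not something this page states, and `estimate_cost` is a hypothetical helper. The rates are copied from the table as displayed, so a per-step rate smaller than one cent would show up here as zero.

```python
from decimal import Decimal

# Rates copied from the pricing table above (as displayed, in USD).
BASE_COST = Decimal("0.58")      # per generation
PER_STEP_COST = Decimal("0.00")  # per diffusion step


def estimate_cost(steps=50, generations=1):
    """Estimate the charge in USD.

    Assumption: cost = (base + steps * per-step rate) * generations.
    """
    return (BASE_COST + PER_STEP_COST * steps) * generations


print(estimate_cost(steps=50))
print(estimate_cost(steps=100, generations=3))
```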

Example request

curl https://api.empiriolabs.ai/v1/audio/generations \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "stable-audio-2-0", "prompt": "warm jazz piano", "duration": 8}'
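
The same request can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the helper names `build_generation_request` and `send` are hypothetical, and the response is returned as raw bytes because this page does not describe the response format.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/generations"


def build_generation_request(prompt, api_key, duration=8, model="stable-audio-2-0"):
    """Build (but do not send) the POST request from the curl example."""
    payload = {"model": model, "prompt": prompt, "duration": duration}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and return the raw response body.

    The response format is not documented on this page, so no parsing is done.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (needs a real key and network access):
#   import os
#   req = build_generation_request("warm jazz piano", os.environ["EMPIRIOLABS_API_KEY"])
#   body = send(req)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without spending a generation.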

Parameters

Parameter       Type      Required   Default           Description
prompt          string    yes        -                 What to generate. Be specific about genre, instruments, mood, and tempo.
mode            enum      no         "text-to-audio"   text-to-audio: generate from the prompt only. audio-to-audio: condition on a reference clip. · Allowed: text-to-audio, audio-to-audio
output_format   enum      no         "mp3"             Output audio file format. · Allowed: mp3, wav
duration        number    no         190               Length in seconds. Stable Audio 2.0 generates up to 3 minutes 10 seconds. · Range: 1 – 190
steps           number    no         50                Diffusion steps. More = higher fidelity, slower (and adds per-step credits). · Range: 30 – 100
cfg_scale       number    no         7                 Classifier-free guidance. Higher = follows the prompt more strictly. · Range: 1 – 25
strength        number    no         1                 Audio-to-audio only. 0 = ignore reference, 1 = stay close to reference. · Range: 0 – 1
random_seed     boolean   no         true              If true, a random seed is used for each call.
seed            number    no         -                 Reproducibility seed. Only used when random_seed=false.
audio_url       string    no         -                 Reference audio URL for audio-to-audio mode.
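
The ranges and allowed values above can be checked on the client before a request is sent. The sketch below is one way to do that; `validate_params` is a hypothetical helper, and how the service itself responds to out-of-range values is not described on this page.

```python
# Ranges and allowed values copied from the parameter table.
RANGES = {
    "duration": (1, 190),
    "steps": (30, 100),
    "cfg_scale": (1, 25),
    "strength": (0, 1),
}
ALLOWED = {
    "mode": {"text-to-audio", "audio-to-audio"},
    "output_format": {"mp3", "wav"},
}


def validate_params(params):
    """Return a list of problems found in a parameter dict (empty if none)."""
    problems = []
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")
    for name, allowed in ALLOWED.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    # seed is only used when random_seed is false; random_seed defaults to true.
    if "seed" in params and params.get("random_seed", True):
        problems.append("seed is only used when random_seed is false")
    return problems


print(validate_params({"prompt": "warm jazz piano", "steps": 10, "seed": 42}))
```

For a reproducible generation, pass both `"random_seed": False` and a `seed` value; with the default `random_seed` of true, the table indicates the seed is not used.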

Notes

Generates up to 190 seconds of audio from a text prompt, or by transforming a reference clip in audio-to-audio mode.

Audio-to-audio mode

  • Requires both a prompt and a reference audio clip (passed as audio_url)
  • Recommended CFG scale: 7-15
  • Recommended steps: 6-8
  • Typical strength: 0.3-0.7
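
An audio-to-audio request body could be assembled as follows. This is a sketch under stated assumptions: `build_audio_to_audio_payload` is a hypothetical helper, the reference URL is a placeholder, and the defaults are picked from the recommended CFG and strength ranges above. `steps` is left out so that the service default applies.

```python
import json


def build_audio_to_audio_payload(prompt, audio_url, strength=0.5, cfg_scale=7):
    """Assemble a JSON-serializable body for mode="audio-to-audio"."""
    if not prompt or not audio_url:
        raise ValueError("audio-to-audio needs both a prompt and a reference audio_url")
    if not 0 <= strength <= 1:
        raise ValueError("strength must be between 0 and 1")
    return {
        "model": "stable-audio-2-0",
        "mode": "audio-to-audio",
        "prompt": prompt,
        "audio_url": audio_url,
        "strength": strength,
        "cfg_scale": cfg_scale,
    }


payload = build_audio_to_audio_payload(
    "lo-fi remix with soft drums",
    "https://example.com/reference.wav",  # placeholder URL, not a real clip
)
print(json.dumps(payload, indent=2))
```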

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/stable-audio-2-0.
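
The schema request can be built the same way. This page does not say whether the endpoint requires authentication, so the hypothetical `build_schema_request` helper below adds the Authorization header only when a key is supplied; the shape of the returned schema is likewise not documented here.

```python
import urllib.request

MODELS_URL = "https://api.empiriolabs.ai/v1/models/"


def build_schema_request(model_id="stable-audio-2-0", api_key=None):
    """Build (but do not send) the GET request for a model's schema."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the same Bearer scheme as the generation endpoint.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(MODELS_URL + model_id, headers=headers, method="GET")


req = build_schema_request()
print(req.get_method(), req.full_url)
```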