ACE-Step-1.5-XL

ACE-Step · Audio Generation
POST /v1/audio/generations

Open-source music generation model for text-to-song and lyric-guided audio, with fast 8-step XL Turbo inference for controllable song iteration.

At a glance

| Field | Value |
| --- | --- |
| Model id | ace-step-1.5-xl |
| Input modalities | Text |
| Output modalities | Audio |
| Context window | |
| Weight precision | BF16 |
| Features | music_generation, lyrics, text_to_music, seed_control, commercial_ready |
| Native inference | Yes |
| New | Yes |
| Supported endpoints | POST /v1/audio/generations |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Music generation | per generated second | $0.00025 (was $0.0003) |
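Because billing is per generated second, the charge for a track is its length multiplied by the rate. The sketch below works through that arithmetic. It assumes the $0.00025 rate from the table still applies and that billing follows the requested length with no rounding or minimum charge, which this page does not state. `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Rate copied from the pricing table above; verify current pricing before relying on it.
RATE_PER_SECOND = Decimal("0.00025")


def estimate_cost(seconds):
    """Return the estimated charge in USD for `seconds` of generated audio."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Going through str() keeps binary floating-point noise out of the result.
    return Decimal(str(seconds)) * RATE_PER_SECOND


print(estimate_cost(30))   # the default 30-second track
print(estimate_cost(240))  # the longest track in the documented range
```

Under these assumptions a default 30-second track comes to $0.0075 and a 240-second track to $0.06.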

Example request

$ curl https://api.empiriolabs.ai/v1/audio/generations \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "ace-step-1.5-xl", "prompt": "warm jazz piano", "duration": 8}'
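The same call can be made from Python using only the standard library. This is a minimal sketch, not an official client: `build_request` and `generate` are hypothetical helper names, and because the response layout is not documented on this page, the code only assumes the body is JSON and returns it undecoded beyond that.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/generations"


def build_request(payload, api_key):
    """Build (but do not send) the POST request for one generation."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(payload, api_key=None, timeout=300):
    """Send the request and return the parsed JSON body.

    Assumption: the endpoint answers with a JSON document; its fields are
    not described on this page, so the caller has to inspect the result.
    """
    api_key = api_key or os.environ["EMPIRIOLABS_API_KEY"]
    request = build_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a real, billable request):
# result = generate({"model": "ace-step-1.5-xl", "prompt": "warm jazz piano"})
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without spending credits.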

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | Genre, mood, instrumentation, and BPM hints describing the music to compose. Comma-separated tags work well. |
| lyrics | string | no | | Vocal lyrics for the track. Use [verse] / [chorus] / [bridge] tags to mark sections and a blank line for an instrumental break. Leave empty for purely instrumental tracks. |
| audio_duration | number | no | 30.0 | Length of the generated track in seconds. The model is most reliable up to ~4 minutes; longer durations risk out-of-memory errors or quality drops during diffusion. · Range: 10.0 – 240 |
| num_inference_steps | integer | no | 8 | Number of diffusion steps. 8 is the recommended sweet spot for the Turbo variant; raise it for more polish, lower it for cheaper draft generations. · Range: 1 – 20 |
| guidance_scale | number | no | 1.0 | Classifier-free guidance scale. 1.0 follows the model’s natural distribution; higher values push closer to the prompt at the cost of variety. · Range: 0.0 – 20.0 |
| shift | number | no | | Diffusion timestep shift. The default leaves the schedule unchanged; nudge above 1.0 for shorter/punchier results or below 1.0 for slower/dreamier ones. |
| negative_prompt | string | no | | Negative prompt: anti-tags, anti-styles, instruments to exclude. Same comma-separated style as prompt. |
| seed | integer | no | | Random seed for reproducibility. The same seed with identical parameters produces the same track. |
| format | enum | no | "flac" | Audio container format for the response. FLAC = lossless, WAV = uncompressed, MP3 = small file size. · Allowed: flac, wav, ogg, mp3 |
| response_format | enum | no | "url" | How the worker returns the audio. ‘url’ returns a signed URL to the rendered file; ‘b64_json’ inlines the bytes in the response as base64. · Allowed: url, b64_json |
| return_base64 | boolean | no | false | When true, the response includes the rendered audio as base64 in addition to (or instead of, depending on response_format) the URL. |
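The ranges and allowed values in the table can be checked on the client before a request is sent. The sketch below builds a request body with lyric section tags and a fixed seed, and rejects values outside the documented limits. `build_payload` is a hypothetical helper, the limits are copied from the table, and the server remains the authority on what it actually accepts.

```python
MODEL_ID = "ace-step-1.5-xl"

# (min, max) ranges and allowed values copied from the parameter table.
RANGES = {
    "audio_duration": (10.0, 240.0),
    "num_inference_steps": (1, 20),
    "guidance_scale": (0.0, 20.0),
}
ALLOWED = {
    "format": {"flac", "wav", "ogg", "mp3"},
    "response_format": {"url", "b64_json"},
}


def build_payload(prompt, **options):
    """Return a request body, raising ValueError for out-of-range options."""
    if not prompt:
        raise ValueError("prompt is required")
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")
        elif name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} is not one of {sorted(ALLOWED[name])}")
    # Options without a documented limit (lyrics, seed, shift, ...) pass through unchecked.
    return {"model": MODEL_ID, "prompt": prompt, **options}


# Section tags mark the song structure; the blank line requests an instrumental break.
lyrics = "[verse]\nCity lights are fading slow\n\n[chorus]\nHold on to the night"

payload = build_payload(
    "jazz, warm piano, brushed drums, 90 bpm",
    lyrics=lyrics,
    audio_duration=60,
    seed=42,        # reuse with identical parameters to reproduce the track
    format="mp3",
)
```

Sending the same `payload` twice should, per the seed description, yield the same track; changing only `seed` is a cheap way to explore variations of one prompt.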

Notes

Defaults

  • 8 inference steps
  • Guidance scale 1.0
  • Lossless FLAC output

Controls

Supports lyrics, prompt/description, 10-600s duration, seed, shift, optional negative prompt when supported by the pinned pipeline, and URL or base64 output mode.


Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/ace-step-1.5-xl.
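Fetching the schema programmatically lets a client pick up parameter changes without re-reading this page. A minimal sketch follows, with two assumptions: this page does not say whether the endpoint requires authentication, so the bearer token is optional here, and the response is taken to be JSON on the strength of "machine-readable". `build_schema_request` and `fetch_schema` are hypothetical helper names.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/ace-step-1.5-xl"


def build_schema_request(api_key=None):
    """Build (but do not send) the GET request for the model schema."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: same bearer scheme as the generation endpoint.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key=None, timeout=30):
    """Send the request and return the parsed JSON document."""
    with urllib.request.urlopen(build_schema_request(api_key), timeout=timeout) as response:
        return json.load(response)


# Usage (performs a real network request):
# schema = fetch_schema()
```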