ACE-Step-1.5-XL | EmpirioLabs AI Docs

ACE-Step · Audio Generation

POST /v1/audio/generations

Open-source music generation model for text-to-song and lyric-guided audio, with fast 8-step XL Turbo inference for controllable song iteration.

At a glance

Field	Value
Model id	`ace-step-1.5-xl`
Input modalities	Text
Output modalities	Audio
Context window	—
Weight precision	BF16
Features	music_generation, lyrics, text_to_music, seed_control, commercial_ready
Native inference	Yes
New	Yes
Supported endpoints	`POST /v1/audio/generations`

Pricing

Charge	Spec	Rate
Music generation	per generated second	$0.00025 (was $0.0003)
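
The per-second rate above makes the cost of a track easy to estimate before sending a request. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, and it assumes billing is strictly linear in the requested duration with no rounding or minimum charge, which this page does not state. Under that assumption a default 30-second track comes to $0.0075 and a 240-second track to $0.06.

```python
from decimal import Decimal

# Rate from the pricing table: USD per generated second.
RATE_PER_SECOND = Decimal("0.00025")


def estimate_cost(audio_duration_seconds):
    """Estimate the charge in USD for one generation.

    Assumption: cost = duration * rate, with no rounding or minimum fee.
    """
    # Go through str() so floats such as 30.0 convert to exact decimals.
    return Decimal(str(audio_duration_seconds)) * RATE_PER_SECOND


print(estimate_cost(30))   # default-length track
print(estimate_cost(240))  # longest duration listed in the parameter table
```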

Example request

$ curl https://api.empiriolabs.ai/v1/audio/generations \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "ace-step-1.5-xl", "prompt": "warm jazz piano", "audio_duration": 30}'

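The same request can be issued from Python using only the standard library. This is a minimal sketch rather than an official client: `build_request` and `generate` are hypothetical helper names, and the shape of the JSON response is not documented on this page, so `generate` simply returns whatever JSON object comes back.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/generations"


def build_request(api_key, payload):
    """Build (but do not send) the POST request for one generation."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(api_key, payload):
    """Send the request and return the decoded JSON body.

    Assumption: the endpoint replies with a JSON object. Its fields are
    not listed on this page, so inspect the result before relying on it.
    """
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        return json.load(resp)
```

To use it, read the key from the `EMPIRIOLABS_API_KEY` environment variable with `os.environ` and call `generate(key, {"model": "ace-step-1.5-xl", "prompt": "warm jazz piano"})`. Keeping request construction separate from sending makes the headers and body easy to check without touching the network.
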
Parameters

Parameter	Type	Required	Default	Description
`prompt`	string	yes	—	Genre, mood, instrumentation, and BPM hints describing the music to compose. Comma-separated tags work well.
`lyrics`	string	no	—	Vocal lyrics for the track. Use [verse] / [chorus] / [bridge] tags to mark sections and a blank line for an instrumental break. Leave empty for purely instrumental tracks.
`audio_duration`	number	no	`30.0`	Length of the generated track in seconds. The model is most reliable up to ~4 minutes; longer durations risk out-of-memory errors or quality drops during diffusion. · Range: 10.0 – 240.0
`num_inference_steps`	integer	no	`8`	Number of diffusion steps. 8 is the recommended sweet spot for the Turbo variant; raise for more polish, lower for cheaper draft generations. · Range: 1 – 20
`guidance_scale`	number	no	`1.0`	Classifier-free guidance scale. 1.0 follows the model’s natural distribution; higher values push closer to the prompt at the cost of variety. · Range: 0.0 – 20.0
`shift`	number	no	—	Diffusion timestep shift. Default leaves the schedule unchanged; nudge to 1.0+ for shorter/punchier or below 1.0 for slower/dreamier results.
`negative_prompt`	string	no	—	Tags, styles, or instruments to exclude from the track. Uses the same comma-separated style as `prompt`.
`seed`	integer	no	—	Random seed for reproducibility. Same seed + identical params produces the same track.
`format`	enum	no	`"flac"`	Audio container format for the response. FLAC = lossless, WAV = uncompressed, OGG and MP3 = smaller, compressed files. · Allowed: `flac`, `wav`, `ogg`, `mp3`
`response_format`	enum	no	`"url"`	How the worker returns the audio. `url` returns a signed URL to the rendered file; `b64_json` inlines the base64-encoded bytes in the response. · Allowed: `url`, `b64_json`
`return_base64`	boolean	no	`false`	When true, the response includes the rendered audio as base64 in addition to (or instead of, depending on `response_format`) the URL.
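
The ranges and allowed values in the table can be checked on the client before a request is sent, which avoids paying a round trip for an obviously invalid payload. The sketch below is an assumption-laden illustration: `build_payload` is a hypothetical helper, the limits are copied from the table above, and the server may validate differently. The sample lyrics are invented for the example.

```python
MODEL_ID = "ace-step-1.5-xl"
ALLOWED_FORMATS = {"flac", "wav", "ogg", "mp3"}
ALLOWED_RESPONSE_FORMATS = {"url", "b64_json"}


def _check_range(name, value, low, high):
    """Raise ValueError when value falls outside the inclusive range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_payload(prompt, *, lyrics=None, audio_duration=30.0,
                  num_inference_steps=8, guidance_scale=1.0, seed=None,
                  audio_format="flac", response_format="url"):
    """Return a request body using the defaults and limits from the table."""
    if not prompt:
        raise ValueError("prompt is required")
    _check_range("audio_duration", audio_duration, 10.0, 240.0)
    _check_range("num_inference_steps", num_inference_steps, 1, 20)
    _check_range("guidance_scale", guidance_scale, 0.0, 20.0)
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if response_format not in ALLOWED_RESPONSE_FORMATS:
        raise ValueError(
            f"response_format must be one of {sorted(ALLOWED_RESPONSE_FORMATS)}"
        )

    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "audio_duration": audio_duration,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "format": audio_format,
        "response_format": response_format,
    }
    # Optional fields are omitted entirely when unset.
    if lyrics:
        payload["lyrics"] = lyrics
    if seed is not None:
        payload["seed"] = seed
    return payload


# Section tags mark song parts; the blank line requests an instrumental break.
lyrics = "[verse]\nCity lights are fading slow\n\n[chorus]\nPlay it one more time"
payload = build_payload("warm jazz piano, brushed drums, 90 bpm",
                        lyrics=lyrics, seed=42)
```

Fixing `seed` while changing a single parameter, such as `guidance_scale`, is a convenient way to compare variations of the same track.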

Notes

Defaults

8 inference steps
Guidance scale 1.0
Lossless FLAC output

Controls

Supports lyrics, a prompt/description, a 10-240 s duration, seed, shift, an optional negative prompt (when supported by the pinned pipeline), and URL or base64 output mode.

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/ace-step-1.5-xl.