prompt | string | yes | — | Genre, mood, instrumentation, and BPM hints describing the music to compose. Comma-separated tags work well. |
lyrics | string | no | — | Vocal lyrics for the track. Use [verse] / [chorus] / [bridge] tags to mark sections and a blank line for an instrumental break. Leave empty for purely instrumental tracks. |
audio_duration | number | no | 30.0 | Length of the generated track in seconds. The model is most reliable up to ~4 minutes (240 s); longer durations risk out-of-memory errors or quality drops during diffusion. · Range: 10.0 – 240.0 |
num_inference_steps | integer | no | 8 | Number of diffusion steps. 8 is the recommended sweet spot for the Turbo variant; raise for more polish, lower for cheaper draft generations. · Range: 1 – 20 |
guidance_scale | number | no | 1.0 | Classifier-free guidance scale. 1.0 follows the model’s natural distribution; higher values push closer to the prompt at the cost of variety. · Range: 0.0 – 20.0 |
shift | number | no | — | Diffusion timestep shift. When omitted, the schedule is left unchanged; nudge it to 1.0 or higher for shorter, punchier results, or below 1.0 for slower, dreamier ones. |
negative_prompt | string | no | — | Tags, styles, or instruments to steer the model away from. Uses the same comma-separated style as prompt. |
seed | integer | no | — | Random seed for reproducibility. The same seed with identical parameters produces the same track. |
format | enum | no | "flac" | Audio container format for the response. FLAC = lossless, WAV = uncompressed, MP3 = small file size; OGG is also accepted. · Allowed: flac, wav, ogg, mp3 |
response_format | enum | no | "url" | How the worker returns the audio. ‘url’ returns a signed URL to the rendered file; ‘b64_json’ inlines the bytes in the response as base64. · Allowed: url, b64_json |
return_base64 | boolean | no | false | When true, the response includes the rendered audio as base64, either alongside the URL or in place of it, depending on response_format. |
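
The parameters above can be assembled into a request body on the client side. The following is a minimal sketch, not the service's own client: `build_request`, `RANGES`, and `ALLOWED` are hypothetical names introduced here, the non-empty check on `prompt` is an assumption, and the endpoint URL and transport are deliberately left out because the table does not specify them. It only checks values against the ranges and allowed values listed in the table.

```python
import json

# Ranges and allowed values copied from the parameter table above.
RANGES = {
    "audio_duration": (10.0, 240.0),
    "num_inference_steps": (1, 20),
    "guidance_scale": (0.0, 20.0),
}
ALLOWED = {
    "format": {"flac", "wav", "ogg", "mp3"},
    "response_format": {"url", "b64_json"},
}


def build_request(prompt, **options):
    """Hypothetical helper: return a JSON-ready dict or raise ValueError."""
    # The table marks prompt as required; rejecting blank strings is an assumption.
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    for name, allowed in ALLOWED.items():
        if name in options and options[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    # Omitted options are left out so the service applies its own defaults.
    return {"prompt": prompt, **options}


payload = build_request(
    "lo-fi hip hop, mellow, rhodes piano, 80 bpm",
    lyrics="[verse]\nrain on the window\n\n[chorus]\nstay a while",
    audio_duration=60.0,
    num_inference_steps=8,
    seed=1234,
    format="mp3",
    response_format="url",
)
print(json.dumps(payload, indent=2))
```

Leaving optional parameters out of the dict, rather than sending placeholders, keeps the documented defaults (30.0 seconds, 8 steps, "flac", "url") in the service's hands.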