input | string | yes | - | Text to convert to speech. In multi mode, prefix each line with the speaker's name and a colon (Speaker1: / Speaker2: by default; see speaker1_name and speaker2_name). |
mode | enum | no | "single" | single = one voice, multi = two-voice dialogue (uses voice + voice2 + speaker names). · Allowed: single, multi |
language | string | no | "en-US" | BCP-47 language tag (en-US, es-ES, etc.) for pronunciation cues. |
voice | enum | no | "Charon" | Primary voice name (e.g. Kore, Puck, Aoede). Leave blank for the default. · Allowed: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat |
voice2 | enum | no | "Kore" | Second voice name for multi-speaker mode. · Allowed: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat |
speaker1_name | string | no | "Speaker1" | Name that prefixes speaker 1's lines in input (multi mode); these lines are spoken with voice. |
speaker2_name | string | no | "Speaker2" | Name that prefixes speaker 2's lines in input (multi mode); these lines are spoken with voice2. |
output_format | enum | no | "WAV" | Audio file format. · Allowed: WAV, MP3, OGG, ALAW, MULAW |
speed | number | no | 1.0 | Playback rate. 1.0 = natural; <1 slower, >1 faster. · Range: 0.25 – 2.0 |
volume_gain | number | no | 0 | Output gain in dB. 0 = unchanged. · Range: -96 – 16 |
sample_rate | enum | no | "24000" | Output sample rate in Hz. · Allowed: 8000, 16000, 22050, 24000, 44100, 48000 |
style_prompt | string | no | - | Natural-language style direction (e.g. “warm, conversational” or “newscaster, serious”). |
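
The constraints in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python, not part of the documented API: the helper names `build_tts_params` and `format_dialogue` are hypothetical, and the way the resulting dictionary is submitted is left out because the table does not specify it. It also assumes that `sample_rate` may be passed as either a string or an integer.

```python
# Hypothetical client-side helpers; names and structure are illustrative only.
VOICES = set(
    """Zephyr Puck Charon Kore Fenrir Leda Orus Aoede Callirrhoe Autonoe
    Enceladus Iapetus Umbriel Algieba Despina Erinome Algenib Rasalgethi
    Laomedeia Achernar Alnilam Schedar Gacrux Pulcherrima Achird
    Zubenelgenubi Vindemiatrix Sadachbia Sadaltager Sulafat""".split()
)
FORMATS = {"WAV", "MP3", "OGG", "ALAW", "MULAW"}
SAMPLE_RATES = {"8000", "16000", "22050", "24000", "44100", "48000"}

# Defaults copied from the table; style_prompt has none and stays optional.
DEFAULTS = {
    "mode": "single",
    "language": "en-US",
    "voice": "Charon",
    "voice2": "Kore",
    "speaker1_name": "Speaker1",
    "speaker2_name": "Speaker2",
    "output_format": "WAV",
    "speed": 1.0,
    "volume_gain": 0,
    "sample_rate": "24000",
}


def build_tts_params(text, **overrides):
    """Merge overrides into the defaults and validate against the table."""
    unknown = set(overrides) - (set(DEFAULTS) | {"style_prompt"})
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not text or not text.strip():
        raise ValueError("input is required")

    params = {**DEFAULTS, **overrides, "input": text}

    if params["mode"] not in {"single", "multi"}:
        raise ValueError("mode must be 'single' or 'multi'")
    for key in ("voice", "voice2"):
        if params[key] not in VOICES:
            raise ValueError(f"{key} is not an allowed voice: {params[key]!r}")
    if params["output_format"] not in FORMATS:
        raise ValueError("output_format must be one of " + ", ".join(sorted(FORMATS)))
    # Assumption: accept ints as well as strings for the enum value.
    if str(params["sample_rate"]) not in SAMPLE_RATES:
        raise ValueError("sample_rate is not an allowed value")
    if not 0.25 <= params["speed"] <= 2.0:
        raise ValueError("speed must be between 0.25 and 2.0")
    if not -96 <= params["volume_gain"] <= 16:
        raise ValueError("volume_gain must be between -96 and 16 dB")
    return params


def format_dialogue(turns, speaker1_name="Speaker1", speaker2_name="Speaker2"):
    """Build multi-mode input from (speaker_number, line) pairs, numbers 1 or 2."""
    names = {1: speaker1_name, 2: speaker2_name}
    return "\n".join(f"{names[number]}: {line}" for number, line in turns)
```

For a two-voice request, the dialogue text and the parameters would be built together so that the prefixes in `input` match the configured speaker names:

```python
script = format_dialogue([(1, "Ready?"), (2, "Ready.")], "Ana", "Ben")
params = build_tts_params(
    script, mode="multi", voice="Puck", voice2="Leda",
    speaker1_name="Ana", speaker2_name="Ben",
)
```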