input | string | yes | — | Podcast script. For multi-speaker scripts, use [S1] / [S2] / [S3] / [S4] tags or 'Speaker N:' lines. Supported paralinguistic tags: <|laughter|>, <|sigh|>, <|breath|>, <|cough|>. |
model | enum | no | "base" | base: English + Mandarin. dialect: adds Sichuan, Henan, and Cantonese. · Allowed: base, dialect |
voice_s1 | enum | no | "arthur" | Voice for [S1]. lj = Emma. custom_s1 requires voice_s1_audio_url. · Allowed: arthur, james, lj, xiaomei, zhigang, custom_s1 |
voice_s2 | enum | no | "lj" | Voice for [S2]. lj = Emma. custom_s2 requires voice_s2_audio_url. · Allowed: arthur, james, lj, xiaomei, zhigang, custom_s2 |
voice_s3 | enum | no | "james" | Voice for [S3]. lj = Emma. custom_s3 requires voice_s3_audio_url. · Allowed: arthur, james, lj, xiaomei, zhigang, custom_s3 |
voice_s4 | enum | no | "xiaomei" | Voice for [S4]. lj = Emma. custom_s4 requires voice_s4_audio_url. · Allowed: arthur, james, lj, xiaomei, zhigang, custom_s4 |
voice_s1_audio_url | string | no | — | Reference audio URL for [S1] custom-voice cloning. Speaker must say the consent phrase aloud. |
voice_s2_audio_url | string | no | — | Reference audio URL for [S2] custom-voice cloning. |
voice_s3_audio_url | string | no | — | Reference audio URL for [S3] custom-voice cloning. |
voice_s4_audio_url | string | no | — | Reference audio URL for [S4] custom-voice cloning. |
temperature | number | no | 0.6 | Sampling temperature. · Range: 0.1 – 2 |
top_k | number | no | 100 | Top-k sampling cap. · Range: 1 – 500 |
top_p | number | no | 0.9 | Nucleus sampling. · Range: 0.1 – 1 |
repetition_penalty | number | no | 1.25 | Higher values discourage repeated phrasing. · Range: 1 – 2 |
seed | string | no | "42" | Seed for reproducible output. Passed as a string, as the upstream API expects. |
output_format | enum | no | "mp3" | Output audio format. · Allowed: mp3, wav |
language | string | no | "en" | Language hint as a BCP-47-style code (e.g. "en"). Auto-detected from script tags. |
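
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of the API itself: `validate_payload` is a hypothetical helper, and the rule that `custom_s2` through `custom_s4` need their matching `voice_sN_audio_url` is an assumption extended from what the table states for `custom_s1`. No endpoint or client library is shown because the table does not name one.

```python
# Allowed values and ranges copied from the parameter table.
VOICES = {"arthur", "james", "lj", "xiaomei", "zhigang"}
ENUMS = {"model": {"base", "dialect"}, "output_format": {"mp3", "wav"}}
RANGES = {
    "temperature": (0.1, 2.0),
    "top_k": (1, 500),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (1.0, 2.0),
}


def validate_payload(payload):
    """Return a list of problems found in a request payload (hypothetical helper)."""
    errors = []

    # `input` is the only required parameter.
    script = payload.get("input")
    if not isinstance(script, str) or not script.strip():
        errors.append("input is required and must be a non-empty string")

    for name, allowed in ENUMS.items():
        if name in payload and payload[name] not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")

    for name, (low, high) in RANGES.items():
        if name not in payload:
            continue
        value = payload[name]
        if not isinstance(value, (int, float)) or not low <= value <= high:
            errors.append(f"{name} must be a number between {low} and {high}")

    for n in range(1, 5):
        key, custom = f"voice_s{n}", f"custom_s{n}"
        voice = payload.get(key)
        if voice is None:
            continue
        if voice not in VOICES | {custom}:
            errors.append(f"{key} must be a preset voice or {custom}")
        elif voice == custom and not payload.get(f"{key}_audio_url"):
            # Stated for custom_s1 in the table; assumed for custom_s2..custom_s4.
            errors.append(f"{key}={custom} requires {key}_audio_url")

    # The table types the seed as a string, not a number.
    if "seed" in payload and not isinstance(payload["seed"], str):
        errors.append("seed must be a string")

    return errors


payload = {
    "input": "[S1] Welcome back to the show. <|laughter|>\n[S2] Glad to be here.",
    "model": "base",
    "voice_s1": "arthur",
    "voice_s2": "lj",
    "temperature": 0.6,
    "seed": "42",
    "output_format": "mp3",
}
print(validate_payload(payload))
```

An empty list means the payload satisfies every constraint listed in the table. Omitted optional parameters are not checked, on the assumption that the service applies the documented defaults.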