SoulX Podcast

Soul AI Lab · Audio Generation
POST /v1/audio/speech

Open-source voice model for long-form, multi-speaker podcast dialogue with paralinguistic control (laughter, sighs) and zero-shot voice cloning.

At a glance

Field | Value
Model id | soulx-podcast
Input modalities | Text, Audio
Output modalities | Audio
Context window | -
Weight precision | -
Features | voice_cloning, multi_speaker, dialect, podcast
Native inference | Yes
New | No
Supported endpoints | POST /v1/audio/speech

Pricing

Charge | Unit | Rate
Base | per 1k characters | $0.015
Dialect | per 1k characters | $0.015
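Because both charges are quoted per 1,000 characters, the cost of a request grows linearly with script length. The minimal sketch below estimates it. It assumes pro-rata billing on the raw character count of input (tags such as [S1] and <|laughter|> included), with no minimum charge and no rounding up to whole 1k blocks; this page states none of these details. The table also does not say whether the dialect charge replaces the base charge or is added to it, so the rate is a parameter. estimate_cost is a hypothetical helper, not part of any EmpirioLabs SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate from the pricing table: $0.015 per 1,000 input characters.
RATE_PER_1K_CHARS = Decimal("0.015")


def estimate_cost(script: str, rate_per_1k: Decimal = RATE_PER_1K_CHARS) -> Decimal:
    """Estimate the charge for one request, in USD.

    Assumptions (not stated on the pricing page): billing is pro-rata on
    len(script), with no minimum charge and no rounding up to a whole
    1k-character block.
    """
    chars = Decimal(len(script))
    cost = chars / Decimal(1000) * rate_per_1k
    # Round to four decimal places for display.
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```

Under these assumptions, a 4,000-character script costs 4 × $0.015 = $0.06. If the dialect charge turns out to be added on top of the base charge, pass rate_per_1k=Decimal("0.030") for dialect requests.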

Example request

curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "soulx-podcast", "input": "Hello from EmpirioLabs."}'
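The same request can be sent with Python's standard library. This is a minimal sketch: build_request and synthesize are hypothetical helper names. It assumes the endpoint returns the encoded audio bytes directly in the response body, because this page shows no response example. The 300-second timeout is an arbitrary allowance for long scripts.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/speech"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(payload: dict, out_path: str) -> None:
    """Send the request and write the response body to out_path.

    Assumption: the response body is the encoded audio itself (mp3 by
    default), not a JSON envelope pointing at a file.
    """
    req = build_request(os.environ["EMPIRIOLABS_API_KEY"], payload)
    with urllib.request.urlopen(req, timeout=300) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With EMPIRIOLABS_API_KEY set, synthesize({"model": "soulx-podcast", "input": "Hello from EmpirioLabs."}, "speech.mp3") saves the response to speech.mp3. Keeping request construction separate from sending makes it possible to inspect the exact headers and body without making a network call.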

Parameters

Parameter | Type | Required | Default | Description
input | string | yes | - | Podcast script. For multi-speaker dialogue, use [S1] / [S2] / [S3] / [S4] tags or ‘Speaker N:’ lines. Supported paralinguistic tags: <|laughter|>, <|sigh|>, <|breath|>, <|cough|>.
voice_model | enum | no | "base" | Voice model tier. base: English and Mandarin. dialect: adds Sichuan, Henan, and Cantonese. · Allowed: base, dialect
voice_s1 | enum | no | "arthur" | Voice for [S1]. The lj voice is Emma. custom_s1 requires voice_s1_audio_url. · Allowed: arthur, james, lj, xiaomei, zhigang, custom_s1
voice_s2 | enum | no | "lj" | Voice for [S2]. The lj voice is Emma. · Allowed: arthur, james, lj, xiaomei, zhigang, custom_s2
voice_s3 | enum | no | "james" | Voice for [S3]. · Allowed: arthur, james, lj, xiaomei, zhigang, custom_s3
voice_s4 | enum | no | "xiaomei" | Voice for [S4]. · Allowed: arthur, james, lj, xiaomei, zhigang, custom_s4
voice_s1_audio_url | string | no | - | Reference audio URL for cloning a custom [S1] voice. The speaker in the recording must say the consent phrase aloud.
voice_s2_audio_url | string | no | - | Reference audio URL for cloning a custom [S2] voice.
voice_s3_audio_url | string | no | - | Reference audio URL for cloning a custom [S3] voice.
voice_s4_audio_url | string | no | - | Reference audio URL for cloning a custom [S4] voice.
temperature | number | no | 0.6 | Sampling temperature. · Range: 0.1 – 2.0
top_k | number | no | 100 | Top-k sampling cap. · Range: 1 – 500
top_p | number | no | 0.9 | Nucleus (top-p) sampling threshold. · Range: 0.1 – 1.0
repetition_penalty | number | no | 1.25 | Higher values discourage repeated phrasing. · Range: 1.0 – 2.0
seed | string | no | "42" | Reproducibility seed. Sent as a string because the upstream model expects one.
output_format | enum | no | "mp3" | Output audio file format. · Allowed: mp3, wav
language | string | no | "" | Passed through unchanged to the upstream model, which can use it to pick the right voice or dialect tier.
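The sketch below assembles a multi-speaker request body from a list of turns and checks it against the constraints in the parameter table. Speakers are limited to [S1] through [S4], only the four listed paralinguistic tags are accepted, a custom_sN voice must be paired with voice_sN_audio_url, and sampling values must fall inside their documented ranges. It runs on the client side and is not part of any EmpirioLabs SDK. The helper names (format_script, build_payload), the one-turn-per-line script layout with no space after the tag, and the extension of the [S1] custom-voice URL rule to [S2] through [S4] are all assumptions. The consent-phrase requirement cannot be checked on the client and is not checked here.

```python
import re

# Preset voices and paralinguistic tags from the parameter table.
PRESET_VOICES = {"arthur", "james", "lj", "xiaomei", "zhigang"}
PARALINGUISTIC_TAGS = {"<|laughter|>", "<|sigh|>", "<|breath|>", "<|cough|>"}

# Documented ranges for the sampling parameters.
RANGES = {
    "temperature": (0.1, 2.0),
    "top_k": (1, 500),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (1.0, 2.0),
}


def format_script(turns):
    """Turn [(speaker_number, text), ...] into an [S1]-[S4] tagged script.

    Assumption: one tagged turn per line, tag immediately followed by text.
    """
    lines = []
    for speaker, text in turns:
        if speaker not in (1, 2, 3, 4):
            raise ValueError(f"speaker must be 1-4, got {speaker}")
        # Reject any <|...|> tag that is not in the documented set.
        for tag in re.findall(r"<\|[a-z]+\|>", text):
            if tag not in PARALINGUISTIC_TAGS:
                raise ValueError(f"unsupported paralinguistic tag {tag}")
        lines.append(f"[S{speaker}]{text}")
    return "\n".join(lines)


def build_payload(turns, voices=None, audio_urls=None, **sampling):
    """Assemble a request body and check it against the documented limits.

    voices and audio_urls map speaker numbers (1-4) to a voice id or a
    reference-audio URL. Voices left out use the server-side defaults.
    """
    audio_urls = audio_urls or {}
    payload = {"model": "soulx-podcast", "input": format_script(turns)}
    for n, voice in (voices or {}).items():
        if n not in (1, 2, 3, 4):
            raise ValueError(f"speaker must be 1-4, got {n}")
        if voice == f"custom_s{n}":
            # Assumed for S2-S4; the table states this rule only for S1.
            url = audio_urls.get(n)
            if not url:
                raise ValueError(f"custom_s{n} needs voice_s{n}_audio_url")
            payload[f"voice_s{n}_audio_url"] = url
        elif voice not in PRESET_VOICES:
            raise ValueError(f"unknown voice for S{n}: {voice}")
        payload[f"voice_s{n}"] = voice
    for name, value in sampling.items():
        if name not in RANGES:
            raise ValueError(f"unsupported sampling parameter {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside [{low}, {high}]")
        payload[name] = value
    return payload
```

For example, build_payload([(1, "Welcome back. <|laughter|>"), (2, "Glad to be here.")], voices={2: "xiaomei"}, temperature=0.7) yields an input of "[S1]Welcome back. <|laughter|>" and "[S2]Glad to be here." on separate lines, with voice_s2 set to "xiaomei". Failing fast on the client avoids paying for a request the server would reject or misread.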

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/soulx-podcast.