SoulX Podcast | EmpirioLabs AI Docs

Soul AI Lab · Audio Generation

POST /v1/audio/speech

Open-source voice model for long-form, multi-speaker podcast dialogue with paralinguistic control (laughter, sighs) and zero-shot voice cloning.

At a glance

Field	Value
Model id	`soulx-podcast`
Model release date	2025-10-29
Input modalities	Text, Audio
Output modalities	Audio
Context window	-
Weight precision	-
Features	voice_cloning, multi_speaker, dialect, podcast
Native inference	Yes
New	No
Supported endpoints	`POST /v1/audio/speech`
Alternate model ids	`soul-ai-lab/soulx-podcast`

Pricing

Charge	Spec	Rate
Base	per 1k characters	$0.015
Dialect	per 1k characters	$0.015
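As a rough illustration of the per-character rates above, the sketch below estimates the charge for a script. It rests on assumptions the table does not confirm: that the rate is selected by `voice_model` (rather than the Dialect charge being added on top of Base), that billing is pro rata by character with no rounding or minimum, and that every character of `input`, including speaker and paralinguistic tags, is counted. The names `RATES_PER_1K` and `estimate_cost` are hypothetical.

```python
from decimal import Decimal

# Rates copied from the pricing table, in USD per 1,000 characters.
# Assumption: the applicable rate is chosen by the voice_model tier.
RATES_PER_1K = {"base": Decimal("0.015"), "dialect": Decimal("0.015")}


def estimate_cost(script: str, voice_model: str = "base") -> Decimal:
    """Estimate the charge for one request, assuming pro-rata billing."""
    rate = RATES_PER_1K[voice_model]
    # Assumption: every character of the script counts, tags included.
    return rate * len(script) / Decimal(1000)
```

Under these assumptions a 2,000-character script comes to $0.03 on either tier.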

Example request

$ curl https://api.empiriolabs.ai/v1/audio/speech \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "soulx-podcast", "input": "Hello from EmpirioLabs."}'
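The same request can be sketched in Python with only the standard library. The endpoint, headers, and body mirror the curl call; the helper names `build_speech_request` and `synthesize` are hypothetical, and the handling of the response is an assumption, since this page does not describe the response format (the sketch assumes a successful response body is the raw audio file).

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/speech"


def build_speech_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(payload: dict, api_key: str, out_path: str) -> None:
    """Send the request and save the response body to out_path."""
    request = build_speech_request(payload, api_key)
    # Assumption: the body of a successful response is the audio itself.
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

A call might look like `synthesize({"model": "soulx-podcast", "input": "Hello from EmpirioLabs."}, os.environ["EMPIRIOLABS_API_KEY"], "hello.mp3")`, with `os` imported and the key exported in the environment.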

Parameters

Parameter	Type	Required	Default	Description
`input`	string	yes	-	Podcast script. Mark speakers with [S1] / [S2] / [S3] / [S4] tags or "Speaker N:" lines for multi-speaker dialogue. Supported paralinguistic tags: <|laughter|>, <|sigh|>, <|breath|>, <|cough|>.
`voice_model`	enum	no	`"base"`	base: English + Mandarin. dialect: adds Sichuan, Henan, and Cantonese. · Allowed: `base`, `dialect`
`voice_s1`	enum	no	`"arthur"`	Voice for [S1]. The `lj` id is the Emma voice. `custom_s1` requires `voice_s1_audio_url`. · Allowed: `arthur`, `james`, `lj`, `xiaomei`, `zhigang`, `custom_s1`
`voice_s2`	enum	no	`"lj"`	Voice for [S2]. The `lj` id is the Emma voice. · Allowed: `arthur`, `james`, `lj`, `xiaomei`, `zhigang`, `custom_s2`
`voice_s3`	enum	no	`"james"`	Voice for [S3]. · Allowed: `arthur`, `james`, `lj`, `xiaomei`, `zhigang`, `custom_s3`
`voice_s4`	enum	no	`"xiaomei"`	Voice for [S4]. · Allowed: `arthur`, `james`, `lj`, `xiaomei`, `zhigang`, `custom_s4`
`voice_s1_audio_url`	string	no	-	Reference audio URL for [S1] custom-voice cloning. Speaker must say the consent phrase aloud.
`voice_s2_audio_url`	string	no	-	Reference audio URL for [S2] custom-voice cloning.
`voice_s3_audio_url`	string	no	-	Reference audio URL for [S3] custom-voice cloning.
`voice_s4_audio_url`	string	no	-	Reference audio URL for [S4] custom-voice cloning.
`temperature`	number	no	`0.6`	Sampling temperature. · Range: 0.1 – 2.0
`top_k`	number	no	`100`	Top-k sampling cap. · Range: 1 – 500
`top_p`	number	no	`0.9`	Nucleus sampling. · Range: 0.1 – 1.0
`repetition_penalty`	number	no	`1.25`	Higher values discourage repeated phrasing. · Range: 1.0 – 2.0
`seed`	string	no	`"42"`	Reproducibility seed (string per upstream).
`output_format`	enum	no	`"mp3"`	Output audio file format. · Allowed: `mp3`, `wav`
`language`	string	no	`""`	Passed through to the upstream model unchanged so it can pick the right voice/dialect tier.
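To show how the parameters above fit together, here is a minimal sketch that assembles a tagged multi-speaker script and a request body, checking values against the allowed lists and ranges in the table before anything is sent. The helper names are hypothetical. Two details are assumptions rather than documented behavior: that one `[S<n>] text` turn per line is an accepted layout, and that `custom_s2` to `custom_s4` need their matching `voice_sN_audio_url` the way `custom_s1` does.

```python
PRESET_VOICES = {"arthur", "james", "lj", "xiaomei", "zhigang"}

# (min, max) ranges copied from the parameter table.
RANGES = {
    "temperature": (0.1, 2.0),
    "top_k": (1, 500),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (1.0, 2.0),
}


def build_script(turns):
    """Join (speaker_number, text) pairs into an [S1]..[S4] tagged script."""
    lines = []
    for speaker, text in turns:
        if speaker not in (1, 2, 3, 4):
            raise ValueError(f"speaker must be 1-4, got {speaker!r}")
        # Assumption: one "[S<n>] text" turn per line is accepted.
        lines.append(f"[S{speaker}] {text}")
    return "\n".join(lines)


def build_payload(script, voices=None, audio_urls=None, **sampling):
    """Build the JSON body, validating voices and sampling ranges locally."""
    payload = {"model": "soulx-podcast", "input": script}
    audio_urls = audio_urls or {}
    for n, voice in (voices or {}).items():
        custom = f"custom_s{n}"
        if voice not in PRESET_VOICES and voice != custom:
            raise ValueError(f"voice_s{n}: {voice!r} is not an allowed value")
        payload[f"voice_s{n}"] = voice
        if voice == custom:
            # Documented for custom_s1; assumed to hold for the other speakers.
            if n not in audio_urls:
                raise ValueError(f"{custom} requires voice_s{n}_audio_url")
            payload[f"voice_s{n}_audio_url"] = audio_urls[n]
    for name, value in sampling.items():
        if name not in RANGES:
            raise ValueError(f"unknown sampling parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value
    return payload
```

For example, `build_script([(1, "Welcome back. <|laughter|>"), (2, "Glad to be here.")])` yields a two-line script that can be passed to `build_payload` along with `voices={1: "arthur", 2: "lj"}`. Validating locally only saves a round trip; the server remains the authority on what it accepts.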

Notes

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/soulx-podcast.
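A minimal sketch of fetching that schema from Python follows. The URL is the one given above; everything else is an assumption, since this page does not say whether the endpoint requires authentication or what the response looks like. The sketch assumes a JSON body and sends the bearer token only if one is supplied. The function names are hypothetical.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/soulx-podcast"


def build_schema_request(api_key=None) -> urllib.request.Request:
    """Build the GET request; the Authorization header is optional here."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key=None):
    """Fetch and decode the schema, assuming the response body is JSON."""
    with urllib.request.urlopen(build_schema_request(api_key)) as response:
        return json.load(response)
```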