SoulX-Podcast

Provider: Soul AI Lab
Category: Audio Generation
Endpoint: POST /v1/audio/speech
Context window:
Served from: EmpirioLabs (Native Inference)

Open-source voice model for long-form, multi-speaker podcast dialogue with paralinguistic control (laughter, sighs) and zero-shot voice cloning.

At a glance

Field               Value
Model id            soulx-podcast
Input modalities    text
Output modalities   audio
Context window
Region              EmpirioLabs (Native Inference)
Features            voice_cloning, multi_speaker, dialect, podcast
New                 No
Native inference    Yes

Pricing

Charge    Spec                 Rate
Base      per 1k characters    $0.015
Dialect   per 1k characters    $0.015
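
As a rough illustration of the per-character pricing, the sketch below estimates the cost of a script. It assumes that billing is strictly proportional to character count (no rounding up to whole 1k blocks) and that every character of the input, including speaker and paralinguistic tags, is billed; neither assumption is confirmed on this page. `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Rates from the pricing table, in USD per 1,000 characters.
RATE_PER_1K = {"base": Decimal("0.015"), "dialect": Decimal("0.015")}


def estimate_cost(script: str, variant: str = "base") -> Decimal:
    """Estimate the USD cost of synthesizing `script`.

    Assumption: cost scales linearly with len(script). The real billing
    rules (rounding, whether tags count) may differ.
    """
    return RATE_PER_1K[variant] * Decimal(len(script)) / Decimal(1000)


script = "[S1] Hello from EmpirioLabs. " * 100
print(f"{len(script)} characters, estimated ${estimate_cost(script)}")
```

Decimal arithmetic is used here only to avoid floating-point noise in small currency amounts.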

Example request

curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "soulx-podcast", "input": "Hello from EmpirioLabs."}'
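
For callers who prefer Python, the curl request above might be expressed with the standard library as follows. This is a minimal sketch: `build_speech_request` is a hypothetical helper, and because this page does not describe the response body, the commented-out sending step simply assumes the endpoint returns the audio bytes directly.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/speech"


def build_speech_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the example above."""
    body = json.dumps({"model": "soulx-podcast", "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_speech_request(
    "Hello from EmpirioLabs.", os.environ.get("EMPIRIOLABS_API_KEY", "")
)
print(req.get_method(), req.full_url)

# Sending is left commented out so the sketch runs offline.
# Assumption: the response body is the audio itself.
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
#     f.write(resp.read())
```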

Parameters

Parameter           Type    Required  Default    Description
input               string  yes       -          Podcast script. Use [S1] / [S2] / [S3] / [S4] tags or 'Speaker N:' lines for multi-speaker. Paralinguistic tags supported: <|laughter|>, <|sigh|>, <|breath|>, <|cough|>.
model               enum    no        "base"     base: English + Mandarin. dialect: adds Sichuan, Henan, and Cantonese. Allowed: base, dialect
voice_s1            enum    no        "arthur"   Voice for [S1]. lj = Emma. custom_s1 requires voice_s1_audio_url. Allowed: arthur, james, lj, xiaomei, zhigang, custom_s1
voice_s2            enum    no        "lj"       Voice for [S2]. lj = Emma. Allowed: arthur, james, lj, xiaomei, zhigang, custom_s2
voice_s3            enum    no        "james"    Voice for [S3]. Allowed: arthur, james, lj, xiaomei, zhigang, custom_s3
voice_s4            enum    no        "xiaomei"  Voice for [S4]. Allowed: arthur, james, lj, xiaomei, zhigang, custom_s4
voice_s1_audio_url  string  no        -          Reference audio URL for [S1] custom-voice cloning. Speaker must say the consent phrase aloud.
voice_s2_audio_url  string  no        -          Reference audio URL for [S2] custom-voice cloning.
voice_s3_audio_url  string  no        -          Reference audio URL for [S3] custom-voice cloning.
voice_s4_audio_url  string  no        -          Reference audio URL for [S4] custom-voice cloning.
temperature         number  no        0.6        Sampling temperature. Range: 0.1 – 2
top_k               number  no        100        Top-k sampling cap. Range: 1 – 500
top_p               number  no        0.9        Nucleus sampling. Range: 0.1 – 1
repetition_penalty  number  no        1.25       Higher values discourage repeated phrasing. Range: 1 – 2
seed                string  no        "42"       Reproducibility seed (string per upstream).
output_format       enum    no        "mp3"      Allowed: mp3, wav
language            string  no        "en"       BCP-47-ish hint. Auto-detected from script tags.
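
To make the multi-speaker conventions concrete, the sketch below assembles a tagged script and a request body from a list of turns. `build_dialogue_payload` is a hypothetical helper, not part of any SDK. It assumes that each turn is written as a speaker tag, a space, and the text on its own line, that paralinguistic tags are placed inline, that the "custom voice needs a reference URL" rule stated for [S1] applies to all four speakers, and (following the example request) that the model id is sent as `model`. None of these details is confirmed by the table.

```python
import json

PRESET_VOICES = {"arthur", "james", "lj", "xiaomei", "zhigang"}
SPEAKERS = (1, 2, 3, 4)


def build_dialogue_payload(turns, voices=None, custom_audio_urls=None,
                           model="soulx-podcast"):
    """Build a JSON-serializable request body for a multi-speaker script.

    turns: list of (speaker_number, text) pairs, speaker_number in 1..4.
    voices: optional {speaker_number: voice_name}.
    custom_audio_urls: optional {speaker_number: reference_audio_url}.
    """
    voices = voices or {}
    custom_audio_urls = custom_audio_urls or {}
    lines = []
    for speaker, text in turns:
        if speaker not in SPEAKERS:
            raise ValueError(f"speaker must be 1-4, got {speaker!r}")
        # Assumed layout: one "[Sn] text" line per turn.
        lines.append(f"[S{speaker}] {text}")
    payload = {"model": model, "input": "\n".join(lines)}
    for speaker, voice in voices.items():
        if speaker not in SPEAKERS:
            raise ValueError(f"speaker must be 1-4, got {speaker!r}")
        if voice == f"custom_s{speaker}":
            # Custom voices need a reference recording (assumed for all speakers).
            url = custom_audio_urls.get(speaker)
            if not url:
                raise ValueError(f"{voice} requires voice_s{speaker}_audio_url")
            payload[f"voice_s{speaker}_audio_url"] = url
        elif voice not in PRESET_VOICES:
            raise ValueError(f"unknown voice {voice!r} for [S{speaker}]")
        payload[f"voice_s{speaker}"] = voice
    return payload


payload = build_dialogue_payload(
    [(1, "Welcome back to the show. <|laughter|>"),
     (2, "<|sigh|> It has been a long week.")],
    voices={1: "arthur", 2: "custom_s2"},
    custom_audio_urls={2: "https://example.com/reference.wav"},  # placeholder URL
)
print(json.dumps(payload, indent=2))
```

Speakers without an entry in `voices` are simply omitted from the body, leaving the defaults listed in the table to apply.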

A live, machine-readable schema is also available via GET https://api.empiriolabs.ai/v1/models/soulx-podcast.
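
Client-side checks can catch out-of-range sampling values before a request is sent. The sketch below hard-codes the ranges from the parameter table and assumes they are inclusive; `validate_sampling` is a hypothetical helper, and the live schema mentioned above is likely a more reliable source for these bounds than a hard-coded copy.

```python
# (min, max) taken from the parameter table; inclusivity is an assumption.
SAMPLING_RANGES = {
    "temperature": (0.1, 2.0),
    "top_k": (1, 500),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (1.0, 2.0),
}


def validate_sampling(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name, (low, high) in SAMPLING_RANGES.items():
        if name not in params:
            continue  # Omitted parameters fall back to the documented defaults.
        value = params[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number, got {type(value).__name__}")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low} to {high}")
    # The table lists seed as a string and output_format as mp3 or wav.
    if "seed" in params and not isinstance(params["seed"], str):
        problems.append("seed: expected a string")
    if params.get("output_format", "mp3") not in ("mp3", "wav"):
        problems.append("output_format: expected mp3 or wav")
    return problems


print(validate_sampling({"temperature": 0.6, "top_k": 100, "seed": "42"}))
print(validate_sampling({"temperature": 2.5, "seed": 42}))
```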