Gemini-3.1-Flash-TTS

Provider: Google
Category: Audio Generation
Endpoint: POST /v1/audio/speech

A highly controllable text-to-speech model. Its new audio tags give precise control over style, tone, pace, and delivery for narration, assistants, and voice apps.

At a glance

Model id: gemini-3-1-flash-tts
Input modalities: text
Output modalities: audio
New: Yes
Native inference: No

Pricing

Input (text): $0.0026 per 1k tokens
Output (audio): $0.053 per 1k tokens
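Both charges are billed per 1,000 tokens, so a request costs input_tokens / 1000 × $0.0026 plus output_tokens / 1000 × $0.053. The shell sketch below does that arithmetic with awk. It assumes you already know both token counts: this page does not say how many audio tokens a second of speech consumes, so the output figure has to come from your own usage data. The function name estimate_cost is hypothetical.

```shell
# Rates taken from the pricing table (USD per 1k tokens).
INPUT_RATE=0.0026
OUTPUT_RATE=0.053

# Hypothetical helper: estimate_cost <input_tokens> <output_tokens>
# Prints the estimated charge in USD, rounded to 4 decimal places.
estimate_cost() {
  awk -v i="$1" -v o="$2" -v ri="$INPUT_RATE" -v ro="$OUTPUT_RATE" \
    'BEGIN { printf "%.4f\n", i / 1000 * ri + o / 1000 * ro }'
}
```

For example, estimate_cost 1000 10000 gives 0.0026 + 0.53 = 0.5326. Output audio dominates the bill, since its per-token rate is roughly 20× the input rate.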

Example request

curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "gemini-3-1-flash-tts", "input": "Hello from EmpirioLabs."}'
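The request above sends only the required fields. A slightly fuller sketch wraps the call in shell functions: speech_body builds a JSON body with a few optional parameters from the parameter table, and synthesize posts it and saves the response to a file. All three function names are hypothetical. Saving the body straight to disk with curl's -o option assumes the endpoint returns raw audio bytes, which this page does not state. json_escape only escapes backslashes and double quotes, so keep the input text on one line.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Newlines and other control characters are NOT escaped in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a request body.
# Args: text [voice] [output_format] [sample_rate] [speed]
# Defaults mirror the parameter table. sample_rate is sent as a string because
# the table quotes its default as "24000" (an assumption about the wire format).
speech_body() {
  printf '{"model": "gemini-3-1-flash-tts", "input": "%s", "voice": "%s", "output_format": "%s", "sample_rate": "%s", "speed": %s}\n' \
    "$(json_escape "$1")" "${2:-Charon}" "${3:-WAV}" "${4:-24000}" "${5:-1}"
}

# Hypothetical helper: POST the body and save the response.
# Args: outfile text [voice] [output_format] [sample_rate] [speed]
# ASSUMPTION: the response body is the raw audio file.
synthesize() {
  out="$1"
  shift
  curl -sS --fail https://api.empiriolabs.ai/v1/audio/speech \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(speech_body "$@")" \
    -o "$out"
}
```

For example, synthesize hello.mp3 'Hello from EmpirioLabs.' Puck MP3 24000 1.25 would request an MP3 in the Puck voice at 1.25× speed. Using double quotes around the Authorization header lets the shell expand $EMPIRIOLABS_API_KEY.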

Parameters

input (string, required): Text to synthesize, up to 4000 characters. Supported audio tags: [whispers], [laughs], [excited], [sigh], [shouting].
mode (enum, optional, default "single"): single uses one voice; multi produces a two-speaker dialog, with each turn in input prefixed by [Speaker1]: or [Speaker2]:. Allowed: single, multi.
language (string, optional, default "en-US"): BCP-47 language code. 24 generally available (GA) languages are supported, plus 50+ in preview. Common: en-US, en-IN, ja-JP, ko-KR, fr-FR, de-DE, …
voice (enum, optional, default "Charon"): Voice for single mode, and for the first speaker in multi mode. There are 30 distinct Gemini TTS voices, each with its own timbre. Allowed: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat.
voice2 (enum, optional, default "Kore"): Voice for the second speaker in multi mode. Allowed: the same 30 voices as voice.
speaker1_name (string, optional, default "Speaker1"): Label for the first speaker in multi mode (alphanumeric).
speaker2_name (string, optional, default "Speaker2"): Label for the second speaker in multi mode (alphanumeric).
output_format (enum, optional, default "WAV"): Output audio format. Allowed: WAV, MP3, OGG, ALAW, MULAW.
speed (number, optional, default 1): Speaking-rate multiplier, from 0.25 to 2 in steps of 0.25.
volume_gain (number, optional, default 0): Output gain in dB, from -96 to 16.
sample_rate (enum, optional, default "24000"): Output sample rate in Hz. Allowed: 8000, 16000, 22050, 24000, 44100, 48000.
style_prompt (string, optional, no default): Free-form style guidance, prepended to input (up to 4000 characters). Examples: ‘Read with a calm, professional tone’ or ‘Speak excitedly’.
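Two parts of the parameter table are easy to get wrong from a script: the numeric ranges and the shape of a multi-speaker request. The sketch below checks speed, volume_gain, and sample_rate against the documented limits before anything is sent, and builds a multi-mode body using the default [Speaker1]: / [Speaker2]: prefixes. The function names are hypothetical, and the checks only mirror the table; the server remains the authority on what it accepts. Putting an audio tag right after a speaker prefix, as in the example, is an assumption about how the two features combine.

```shell
# Hypothetical client-side checks mirroring the parameter table.

# speed: 0.25 to 2, in steps of 0.25 (so speed * 4 must be a whole number).
valid_speed() {
  awk -v s="$1" 'BEGIN { exit !(s >= 0.25 && s <= 2 && s * 4 == int(s * 4)) }'
}

# volume_gain: -96 to 16 dB.
valid_gain() {
  awk -v g="$1" 'BEGIN { exit !(g >= -96 && g <= 16) }'
}

# sample_rate: one of the allowed values in Hz.
valid_rate() {
  case "$1" in
    8000|16000|22050|24000|44100|48000) return 0 ;;
    *) return 1 ;;
  esac
}

# Escape backslashes and double quotes only (no newline handling).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical multi-mode body. Args: dialog [voice] [voice2]
# Defaults follow the table: voice "Charon", voice2 "Kore".
multi_body() {
  printf '{"model": "gemini-3-1-flash-tts", "mode": "multi", "input": "%s", "voice": "%s", "voice2": "%s"}\n' \
    "$(json_escape "$1")" "${2:-Charon}" "${3:-Kore}"
}
```

For instance, valid_speed 1.3 fails because 1.3 is not a multiple of 0.25, and multi_body '[Speaker1]: Welcome back. [Speaker2]: [laughs] Glad to be here.' Puck produces a body that can be passed to curl with -d like any other.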

A live, machine-readable schema for this model is also available at GET https://api.empiriolabs.ai/v1/models/gemini-3-1-flash-tts.