GLM-TTS | EmpirioLabs AI Docs

Provider: Zhipu AI
Category: Audio generation
Endpoint: POST /v1/audio/speech
Context window: —
Served from: EmpirioLabs (Native Inference)

LLM-based text-to-speech model with zero-shot voice cloning from 3–10 s of reference audio, plus emotionally expressive, controllable output refined with multi-reward reinforcement learning (RL).
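
Before attempting a clone, it can help to confirm that the reference clip falls inside the 3–10 s window stated above. This page does not say how a reference clip is attached to a request (the live schema at GET /v1/models/glm-tts is the place to check), so the sketch below only measures a local clip. It assumes an uncompressed PCM WAV file, and the helper names are hypothetical, not part of any EmpirioLabs SDK.

```python
import wave

# Reference-clip window from the model description (3-10 s of audio).
MIN_REF_SECONDS = 3.0
MAX_REF_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds.

    Assumption: the clip is an uncompressed PCM WAV file, the format the
    standard-library `wave` module reads.
    """
    with wave.open(path, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def reference_clip_in_range(seconds: float) -> bool:
    """Hypothetical pre-upload check: True if the clip length is within 3-10 s."""
    return MIN_REF_SECONDS <= seconds <= MAX_REF_SECONDS
```

For example, `reference_clip_in_range(wav_duration_seconds("ref.wav"))` returns False for a 12 s recording, which would need trimming first.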

At a glance

Field	Value
Model id	`glm-tts`
Input modalities	text
Output modalities	audio
Context window	—
Region	EmpirioLabs (Native Inference)
Features	voice_cloning, emotion_control
New	No
Native inference	Yes

Pricing

Tier	Billing unit	Rate
Fast (INT8)	per 1k characters	$0.20
Quality (FP16)	per 1k characters	$0.21
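
Because both tiers bill per 1,000 characters, a job's cost is roughly (characters / 1000) × rate. The sketch below assumes billing is prorated per character rather than rounded up to whole 1k blocks, and that characters are counted as Unicode code points (Python's `len`); neither detail is stated on this page. The tier keys are illustrative labels only, since the request field that selects Fast or Quality is not documented here.

```python
from decimal import Decimal

# USD per 1,000 characters, taken from the pricing table.
# Keys are illustrative labels, not API parameter values.
RATES_PER_1K_CHARS = {
    "fast": Decimal("0.20"),     # Fast (INT8)
    "quality": Decimal("0.21"),  # Quality (FP16)
}


def estimate_cost(text: str, tier: str = "fast") -> Decimal:
    """Estimate the charge for synthesizing `text`.

    Assumptions: characters are counted as Python str length (code points),
    and usage is prorated linearly, not rounded up to whole 1k blocks.
    """
    chars = Decimal(len(text))
    return chars / 1000 * RATES_PER_1K_CHARS[tier]
```

Under these assumptions, the 23-character input "Hello from EmpirioLabs." costs about $0.0046 on the Fast tier, and 2,500 characters on the Quality tier come to $0.525. Decimal is used so that cent fractions are not distorted by binary floating point.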

Example request

curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "glm-tts", "input": "Hello from EmpirioLabs."}' \
  --output speech_audio
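
For callers working outside the shell, the same request can be sent with Python's standard library. This is a sketch, not an official client: it assumes the response body is the encoded audio itself (the container format is not specified on this page) and reads the key from the same `EMPIRIOLABS_API_KEY` environment variable as the curl example. The function names are hypothetical.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/speech"


def build_speech_request(text: str, api_key: str, model: str = "glm-tts") -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(text: str, out_path: str) -> None:
    """Send the request and write the raw response bytes to `out_path`.

    Assumption: the response body is the audio payload itself.
    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    api_key = os.environ["EMPIRIOLABS_API_KEY"]
    with urllib.request.urlopen(build_speech_request(text, api_key)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

Calling `synthesize_to_file("Hello from EmpirioLabs.", "speech_audio")` saves whatever bytes the endpoint returns; inspect the response's Content-Type header if you need to choose a file extension.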

Parameters

Every parameter this model accepts is documented in the live, machine-readable schema returned by `GET /v1/models/glm-tts`. Common controls include `temperature`, `top_p`, `max_tokens`, and the universal `disable_formatting` passthrough flag (also accepted as `raw=true`).
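
A minimal sketch of the two steps described above: fetching the live schema to see which parameters glm-tts accepts, then adding optional controls to a request body. It assumes the schema endpoint takes the same bearer token as the speech endpoint and that controls such as `temperature` or `disable_formatting` are sent as top-level JSON fields; confirm both against the schema itself. Helper names and example values are hypothetical.

```python
import json
import urllib.request

MODEL_URL = "https://api.empiriolabs.ai/v1/models/glm-tts"


def fetch_model_schema(api_key: str) -> dict:
    """GET the live parameter schema for glm-tts and return the parsed JSON.

    Assumption: the endpoint accepts the same bearer token as /v1/audio/speech.
    The document's structure is not described here, so it is returned as-is.
    """
    req = urllib.request.Request(MODEL_URL, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def speech_payload(text: str, **controls) -> dict:
    """Build a /v1/audio/speech JSON body, adding only the controls that are set.

    Assumption: optional controls (temperature, top_p, max_tokens,
    disable_formatting, ...) are top-level JSON fields.
    """
    payload = {"model": "glm-tts", "input": text}
    # Skip controls passed as None so they are omitted from the body entirely.
    payload.update({k: v for k, v in controls.items() if v is not None})
    return payload
```

For example, `speech_payload("Hello", temperature=0.7, disable_formatting=True)` returns a body with those two fields added, while any control passed as `None` is left out.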