GLM-TTS

Provider: Zhipu AI
Category: Audio generation
Endpoint: POST /v1/audio/speech
Context window:
Served from: EmpirioLabs (Native Inference)

LLM-based text-to-speech model with zero-shot voice cloning from 3 to 10 seconds of audio and emotion-expressive, controllable output trained with multi-reward reinforcement learning (RL).

At a glance

Field              Value
Model id           glm-tts
Input modalities   text
Output modalities  audio
Context window
Region             EmpirioLabs (Native Inference)
Features           voice_cloning, emotion_control
New                No
Native inference   Yes

Pricing

Charge          Spec               Rate
Fast (INT8)     per 1k characters  $0.20
Quality (FP16)  per 1k characters  $0.21
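
To make the per-character rates concrete, here is a minimal sketch of a cost estimate. It assumes billing scales linearly with the character count (no rounding, minimum charge, or free tier), which this page does not confirm. The tier keys `fast_int8` and `quality_fp16` are hypothetical labels for this example only, and Python's `len()` may not count characters the same way the billing system does.

```python
# Rates copied from the pricing table, in USD per 1,000 characters.
# The dictionary keys are hypothetical labels, not API parameter values.
RATES_PER_1K_CHARS = {
    "fast_int8": 0.20,
    "quality_fp16": 0.21,
}


def estimate_cost(text: str, tier: str = "fast_int8") -> float:
    """Estimate the USD cost of synthesizing `text`.

    Assumes linear proration by character count (an assumption,
    not documented billing behavior).
    """
    rate = RATES_PER_1K_CHARS[tier]  # raises KeyError for an unknown tier
    return len(text) / 1000 * rate
```

Under these assumptions, a 2,500-character script on the Quality (FP16) tier comes to about 2.5 x $0.21, or roughly $0.53.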

Example request

curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "glm-tts", "input": "Hello from EmpirioLabs."}'
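
The same request can be sketched in Python using only the standard library. The URL, headers, and JSON body mirror the curl example. The assumption that the response body is raw audio bytes, and the helper names `build_speech_request` and `synthesize`, are illustrative and not confirmed by this page. The audio container format is not stated here, so the output file extension is left to the caller.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/speech"


def build_speech_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = json.dumps({"model": "glm-tts", "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, api_key: str, out_path: str) -> int:
    """Send the request and write the response body to `out_path`.

    Assumes the body is raw audio bytes (an assumption). Returns the
    number of bytes written.
    """
    request = build_speech_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call might look like `synthesize("Hello from EmpirioLabs.", os.environ["EMPIRIOLABS_API_KEY"], "speech.bin")` after `import os`. HTTP errors surface as `urllib.error.HTTPError`.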

Parameters

Every parameter this model accepts is documented in the live machine-readable schema returned by GET /v1/models/glm-tts. Common controls include temperature, top_p, max_tokens, and the universal disable_formatting passthrough flag (also accepted as raw=true).
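
As a rough sketch, the schema could be retrieved as follows. The path comes from the paragraph above. The host (taken from the example request), the bearer-token header, and the JSON response format are assumptions, and the helper names are hypothetical. The structure of the returned schema is not described on this page, so the sketch returns the parsed response without interpreting it.

```python
import json
import urllib.request

# Host assumed to match the example request; the path is from this page.
SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/glm-tts"


def build_schema_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the model schema."""
    return urllib.request.Request(
        SCHEMA_URL,
        headers={"Authorization": f"Bearer {api_key}"},  # auth scheme assumed
        method="GET",
    )


def fetch_model_schema(api_key: str):
    """Fetch the schema and return it parsed, assuming a JSON response."""
    with urllib.request.urlopen(build_schema_request(api_key)) as response:
        return json.load(response)
```

Printing the result with `json.dumps(schema, indent=2)` is a quick way to check which parameters the model accepts before sending them.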