Inworld

Models from Inworld

Broadcast-quality voice synthesis with rich expressive prosody, 271+ voices across 15 languages, and real-time SSE streaming with per-word timestamps.

TTS 1.5 Mini

Voice synthesis with sub-130ms time to first byte (TTFB), 271+ voices across 15 languages, expressive prosody, and real-time SSE streaming for low-latency voice agents.

TTS 2

Real-time voice model with plain-English voice direction, one voice identity across 100+ languages, and sub-200ms streaming time-to-first-audio.