
Models from Google

Gemini 2.5 Flash TTS
Low-latency text-to-speech with single- and multi-speaker voices and controllable style, accent, and expressive tone for production apps.

Gemini 2.5 Pro TTS
High-quality TTS preview for podcasts, audiobooks, and customer support, with expressive multi-speaker voices across 23+ languages.

Gemini 3.1 Flash TTS
Highly controllable TTS with new Audio Tags for precise style, tone, pace, and delivery across narration, assistants, and voice apps.

Gemma 3 27B
Open vision-language model with 128K context, 140+ languages, improved math/reasoning, structured outputs, and function calling.

Gemma 4 26B-A4B
Open multimodal model with 256K context that accepts text, image, and video input and supports tools and structured output.

Gemma 4 E4B
Open multimodal chat model with image input, function calling, structured output, and efficient instruction following.
