Google

Models from Google

Low-latency text-to-speech with single- and multi-speaker voices and controllable style, accent, and expressive tone for production apps.

High-quality TTS preview for podcasts, audiobooks, and customer support, with expressive multi-speaker voices across 23+ languages.

Highly controllable TTS with new Audio Tags for precise style, tone, pace, and delivery across narration, assistants, and voice apps.

Open-source vision-language model with 128K context, 140+ languages, improved math/reasoning, structured outputs, and function calling.

Gemma 4 26B A4B is an open multimodal model from Google with a 256K context window. It accepts text, image, and video input and supports tool use and structured output.