Google has updated its Gemini 2.5 Flash and Gemini 2.5 Pro Text-to-Speech (TTS) preview models, which are accessible to developers through the Gemini API in Google AI Studio. The models are aimed at developers building applications that require nuanced vocal delivery, such as audiobook narration, e-learning modules, product tutorials, podcasts, and multi-character voiceovers. The updates bring a broader range of vocal expressivity, stricter adherence to style prompts, context-aware pacing adjustments, and more reliable multi-speaker support, now spanning 24 languages. The new models replace the previous versions from May, giving developers immediate access to more lifelike speech synthesis.
Google updated Gemini 2.5 Flash and Pro Text-to-Speech (TTS) models with new capabilities.
- Emotional style and tone versatility
- Context-aware pacing control
- Improved multiple-speaker capabilities
Both models replaced older versions in AI Studio https://t.co/vbpGerFbrn pic.twitter.com/wUeq6awlzo
TestingCatalog News 🗞 (@testingcatalog), December 10, 2025
Gemini 2.5 Flash TTS is tuned for low-latency scenarios, making it suitable for interactive applications, while Gemini 2.5 Pro TTS prioritizes voice quality for high-fidelity projects. Both allow granular control over pacing, tone, and character identity, now with improved multilingual consistency. Industry partners have already incorporated these models to power advanced features, including precise control in dialogue creation and nuanced directorial adjustments for pronunciation and intonation. Early adopters have highlighted the models’ capabilities in producing cinematic voiceovers tailored to various characters and languages.
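For readers who want to see what this control looks like in practice, here is a minimal sketch of how a request body for a speech-generation call might be assembled. It is not taken from Google's announcement: the model identifiers, the voice names (`Kore`, `Puck`), the JSON field names, and the helper `build_tts_request` are all assumptions modeled on the Gemini API's publicly documented REST format, and should be checked against the current documentation before use. The sketch only builds and prints the payload; it does not contact any service.

```python
import json

# Assumed model identifiers (not stated in the article); verify against
# the current Gemini API model list before relying on them.
MODELS = {
    "low_latency": "gemini-2.5-flash-preview-tts",
    "high_quality": "gemini-2.5-pro-preview-tts",
}


def build_tts_request(script, speakers, style=None):
    """Return a JSON-serialisable body for a speech-generation request.

    script   -- text to speak; for dialogue, prefix each line with a speaker name
    speakers -- mapping of speaker name to a prebuilt voice name (assumed names)
    style    -- optional natural-language direction for tone and pacing
    """
    if not speakers:
        raise ValueError("at least one speaker/voice pair is required")

    # Style direction is passed as plain text ahead of the script.
    text = f"{style}:\n{script}" if style else script

    def voice(name):
        return {"prebuiltVoiceConfig": {"voiceName": name}}

    if len(speakers) == 1:
        # Single narrator: one voice for the whole script.
        speech_config = {"voiceConfig": voice(next(iter(speakers.values())))}
    else:
        # Multi-speaker: one voice per named character.
        speech_config = {
            "multiSpeakerVoiceConfig": {
                "speakerVoiceConfigs": [
                    {"speaker": name, "voiceConfig": voice(voice_name)}
                    for name, voice_name in speakers.items()
                ]
            }
        }

    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": speech_config,
        },
    }


body = build_tts_request(
    "Narrator: The door creaked open.\nMara: Who's there?",
    {"Narrator": "Kore", "Mara": "Puck"},
    style="Read as a tense audiobook scene, slowing down on the last line",
)
print(json.dumps(body, indent=2))
```

Under these assumptions, the same body could be sent to either model: the Flash identifier where response time matters most, the Pro identifier where audio quality does.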
With this release, Google continues to build on its position in generative voice technology, making the latest TTS models available to developers worldwide in Google AI Studio. The company's focus is on tools that adapt to a wide range of creative and technical voice synthesis needs, in response to demand for more realistic and customizable speech generation.