Google launches Gemini 3.1 Flash-Lite on AI Studio and Vertex AI

What's new? Google has introduced Gemini 3.1 Flash-Lite, available in preview through the Gemini API on Google AI Studio and Vertex AI; the model delivers a 2.5x faster time to first answer token and 45% faster output than Gemini 2.5 Flash.

Google has rolled out Gemini 3.1 Flash-Lite, targeting developers and enterprises who require rapid processing and cost-conscious solutions for large-scale workloads. The release is available in preview through the Gemini API on Google AI Studio and for enterprise customers on Vertex AI. This model is tailored for high-frequency tasks and provides a compelling balance of speed, quality, and affordability. At $0.25 per million input tokens and $1.50 per million output tokens, it is positioned as a competitive choice for organizations with substantial data processing needs.
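At those per-token prices, workload cost is simple arithmetic. A minimal sketch, assuming the published preview rates of $0.25 per million input tokens and $1.50 per million output tokens (the function name and example volumes are illustrative, not from Google's documentation):

```python
# Illustrative cost estimate at the stated Gemini 3.1 Flash-Lite preview prices.
INPUT_PRICE_PER_M = 0.25   # dollars per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50  # dollars per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost for a given monthly token volume."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: 200M input and 40M output tokens per month.
print(estimate_cost(200_000_000, 40_000_000))  # → 110.0
```

For comparison-shopping at this scale, only the two per-million rates need to change; the formula is the same across models.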

Gemini 3.1 Flash-Lite distinguishes itself with a 2.5x faster time to first answer token and a 45% increase in output speed compared to Gemini 2.5 Flash, while maintaining or exceeding quality standards. It posts an Elo score of 1432 on the Arena.ai leaderboard and achieves strong benchmark results, such as 86.9% on GPQA Diamond and 76.8% on MMMU Pro. The model supports advanced reasoning and multimodal understanding, and lets users control its reasoning depth, making it suitable for tasks ranging from translation and content moderation to complex data analysis. Early users at companies like Latitude, Cartwheel, and Whering report that Flash-Lite processes complex inputs with a precision typically seen in higher-tier models.
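The reasoning-depth control mentioned above is typically exposed through the Gemini API's thinking-budget setting. A minimal sketch of a REST call, assuming the public `generateContent` endpoint shape; the model id used here is a guess at the preview name, and the request is only sent when a `GEMINI_API_KEY` environment variable is set:

```python
import json
import os
import urllib.request

# Assumed preview model id -- check AI Studio for the actual identifier.
MODEL = "gemini-3.1-flash-lite-preview"
ENDPOINT = ("https://generativelanguage.googleapis.com/v1beta/models/"
            f"{MODEL}:generateContent")

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent payload that caps reasoning-token spend."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # 0 disables extended thinking; larger budgets allow deeper reasoning.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# High-frequency tasks like content moderation can run with thinking disabled.
payload = build_request("Classify this comment as safe or unsafe: ...", 0)

api_key = os.environ.get("GEMINI_API_KEY")
if api_key:  # only hit the network when a key is configured
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```

Keeping the budget low for routine classification and raising it for complex analysis is one way to trade latency and cost against answer quality on the same model.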

Google, the company behind Gemini 3.1 Flash-Lite, continues to expand its AI offerings with the Gemini 3 series, focusing on delivering scalable and flexible tools for developers and enterprise customers who prioritize performance and efficiency in their applications.
