Google launches Gemini 2.5 Computer Use for UI automation

What's new? The Gemini 2.5 Computer Use model is now available via the Gemini API on Google AI Studio and Vertex AI for UI control; Google reports low latency, high accuracy, and per-step safety checks.

Image: Google

Google has launched the Gemini 2.5 Computer Use model, making it publicly available to developers through the Gemini API on Google AI Studio and Vertex AI. The specialized model lets AI agents operate user interfaces directly in web browsers and, to some extent, on mobile devices, automating tasks that previously required human interaction, such as filling forms, selecting from dropdowns, and navigating behind logins. Unlike prior models that interacted with software primarily through APIs, this release focuses on graphical interface control, and Google reports lower latency and higher accuracy than competing solutions on benchmarks such as Online-Mind2Web and AndroidWorld.

The intended audience includes developers and teams working on workflow automation, personal assistant tools, and UI testing, as well as companies seeking to automate repetitive digital tasks. The model processes a user request by analyzing the current screen, the history of previous actions, and a developer-supplied list of available functions, then proposes the next UI action. Safety is addressed through built-in model behavior and per-step safety checks, and developers can add their own controls to block high-risk actions.
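To make that loop concrete, below is a minimal sketch of the perceive-act cycle the article describes: screenshot in, proposed action out, with a developer-defined safety gate run before each step executes. This is an illustration under stated assumptions, not the actual Gemini API; every name here (`take_screenshot`, `request_next_action`, `execute_action`, `is_high_risk`, `UIAction`) is a hypothetical placeholder.

```python
# Hypothetical sketch of the agent loop described above. All function
# and type names are illustrative stand-ins, NOT the real Gemini API.
from dataclasses import dataclass, field

@dataclass
class UIAction:
    """One model-proposed step, e.g. a click or keystroke."""
    kind: str                        # "click", "type", "scroll", "done", ...
    args: dict = field(default_factory=dict)

def take_screenshot() -> bytes:
    """Capture the current browser viewport (placeholder)."""
    raise NotImplementedError

def request_next_action(goal: str, screenshot: bytes,
                        history: list[UIAction]) -> UIAction:
    """Ask the model for the next UI step given the goal, the current
    screen, and prior actions (placeholder for a real model call)."""
    raise NotImplementedError

def is_high_risk(action: UIAction) -> bool:
    """Developer-defined per-step safety check, e.g. blocking
    purchases or destructive submissions (example policy only)."""
    return action.kind == "click" and action.args.get("target") == "buy-button"

def execute_action(action: UIAction) -> None:
    """Perform the step via a browser automation layer (placeholder)."""
    raise NotImplementedError

def run_agent(goal: str, max_steps: int = 20) -> None:
    history: list[UIAction] = []
    for _ in range(max_steps):
        action = request_next_action(goal, take_screenshot(), history)
        if action.kind == "done":
            return
        if is_high_risk(action):
            raise PermissionError(f"Blocked high-risk step: {action}")
        execute_action(action)    # act on the UI ...
        history.append(action)    # ... then feed the step back as context
```

The design point worth noting is that each iteration re-grounds the model in a fresh screenshot, so the safety check runs once per step rather than once per task.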

Google DeepMind, the team behind the release, is building on its experience with large language models and agentic AI to pursue broader automation goals. The company has already trialed the model internally for UI testing, in Project Mariner, and in Search’s AI Mode, and early users report strong performance for personal assistants and workflow automation. The release marks a step forward in AI-driven digital task automation, aiming to serve both individual developers and larger organizations.

Source