Google has launched the Gemini 2.5 Computer Use model, making it publicly available to developers through the Gemini API in Google AI Studio and Vertex AI. The specialized model lets AI agents operate user interfaces directly in web browsers and, to some extent, on mobile devices, automating tasks that previously required human-like interaction, such as filling in forms, selecting from dropdowns, and navigating behind logins. Unlike prior models that interacted with software primarily through structured APIs, this release focuses on graphical interface control, and Google reports lower latency and higher accuracy than competing solutions on benchmarks such as Online-Mind2Web and AndroidWorld.
Introducing our new SOTA Gemini 2.5 Computer Use model, trained to take 13 different actions and navigate a browser.
This is just the first step in the Gemini computer use story : ) pic.twitter.com/JfEF8yDN5i
— Logan Kilpatrick (@OfficialLoganK) October 7, 2025
The intended audience includes developers and teams working on workflow automation, personal assistant tools, and UI testing, as well as companies seeking to automate repetitive digital tasks. The model works iteratively: it analyzes the user's request, a screenshot of the current screen, and the history of recent actions, then returns the next UI action to perform, optionally constrained by a developer-supplied list of custom or excluded functions. Safety is addressed through safeguards built into the model and per-step safety checks on each proposed action, and developers can configure additional controls, such as requiring user confirmation before high-risk actions.
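In practice, an integration runs as a loop: the developer sends the goal and a fresh screenshot to the model, receives the proposed UI action back as a function call, executes it against the browser, and repeats with the updated screen until the task completes. The sketch below illustrates that loop with the google-genai Python SDK; the preview model name, the computer_use tool configuration, and the capture_screenshot / execute_action helpers are assumptions for illustration rather than confirmed identifiers, so consult the official Computer Use documentation for the exact names.

```python
# Minimal sketch of the agent loop described above, using the google-genai
# Python SDK. The model name, the computer_use tool configuration, and the
# two helper functions are assumptions for illustration; check the official
# Computer Use documentation for the exact identifiers.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment


def capture_screenshot() -> bytes:
    """Hypothetical helper: return a PNG screenshot of the controlled browser."""
    raise NotImplementedError


def execute_action(name: str, args: dict) -> None:
    """Hypothetical helper: map a predicted UI action (click, type, scroll, ...)
    onto a browser automation backend such as Playwright."""
    raise NotImplementedError


config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER,
        )
    )],
)

goal = "Fill in the contact form on the current page and submit it"
history: list[types.Content] = []

for _ in range(10):  # cap the number of agent steps
    user_turn = types.Content(
        role="user",
        parts=[
            types.Part.from_text(text=goal),
            types.Part.from_bytes(data=capture_screenshot(), mime_type="image/png"),
        ],
    )
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",  # assumed preview name
        contents=history + [user_turn],
        config=config,
    )
    calls = response.function_calls or []
    if not calls:
        print(response.text)  # the model considers the task done or needs input
        break
    for call in calls:
        # In production, inspect the per-step safety signal returned with the
        # action and ask the user for confirmation when the API requires it.
        execute_action(call.name, dict(call.args))
    history += [user_turn, response.candidates[0].content]
```

The loop structure is the important part: each turn feeds the latest screenshot and action history back to the model, which is what allows it to handle multi-step tasks such as completing a form behind a login.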
Google DeepMind, the team behind the release, is building on its experience with large language models and agentic AI to pursue broader automation goals. Versions of the model are already used internally for UI testing, in Project Mariner, and in Search's AI Mode, and early users report strong performance for personal assistants and workflow automation. The release marks a step forward in AI-driven digital task automation, aimed at both individual developers and larger organizations.