Google launches Agentic Vision in Gemini 3 Flash

What's new? Agentic Vision in Gemini 3 Flash uses a Think, Act, Observe loop with Python code execution for visual analysis; it's available via the Gemini API in Google AI Studio and Vertex AI.

Google has introduced Agentic Vision in Gemini 3 Flash, marking a shift in how AI models perform visual tasks. This release targets developers, businesses, and AI researchers who rely on advanced image analysis and visual reasoning capabilities. The feature is immediately available through the Gemini API in Google AI Studio and Vertex AI, and is rolling out in the Gemini app for broader access.

Agentic Vision transforms image understanding with an iterative approach where the model actively investigates visual inputs. By integrating code execution, Gemini 3 Flash can carry out a Think, Act, Observe loop, analyzing queries, manipulating images with Python code, and using the results to refine its final answer. Key functionalities include:

  1. Automatic zooming for fine details
  2. Annotating images
  3. Parsing complex tables
  4. Visualizing data with deterministic Python environments
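The Think, Act, Observe pattern behind capabilities like automatic zooming can be sketched with a toy example. This is not Google's implementation: in Agentic Vision the model writes and runs its own Python against real images in a sandbox, whereas this sketch hard-codes a single "zoom into the brightest region" strategy over a 2D grid, and all function names here are illustrative.

```python
# Toy Think-Act-Observe loop: an 8x8 "image" is a 2D list of brightness
# values; the loop repeatedly zooms toward the brightest quadrant until
# the region is small enough to "answer" from.

def think(image):
    """Plan the next step: keep zooming while the region is larger than 2x2."""
    return "zoom" if len(image) > 2 else "answer"

def act(image, action):
    """Execute the step: crop to the quadrant with the highest total brightness."""
    if action != "zoom":
        return image
    half = len(image) // 2
    quadrants = [
        [row[:half] for row in image[:half]],  # top-left
        [row[half:] for row in image[:half]],  # top-right
        [row[:half] for row in image[half:]],  # bottom-left
        [row[half:] for row in image[half:]],  # bottom-right
    ]
    return max(quadrants, key=lambda q: sum(sum(r) for r in q))

def agentic_zoom(image):
    """Observe the result of each act and feed it back into the next think."""
    while think(image) == "zoom":
        image = act(image, "zoom")
    return image  # the fine detail the model would base its answer on

# Mostly dark image with one bright 2x2 patch in the bottom-right corner.
img = [[0] * 8 for _ in range(8)]
img[6][6] = img[6][7] = img[7][6] = img[7][7] = 9
detail = agentic_zoom(img)  # two zoom iterations isolate the bright patch
```

The point of the loop is that each observation (the cropped region) becomes the input to the next round of reasoning, rather than the model answering from a single static look at the image.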

Evals

These capabilities provide a consistent 5-10% quality increase across vision benchmarks compared to previous versions, and early users like PlanCheckSolver.com have reported measurable improvements in accuracy for tasks such as building plan validation.

Google is at the forefront of multimodal AI research, and this announcement strengthens its position by enabling its Gemini models not merely to interpret visual data but to interact with it. The company plans to extend Agentic Vision's reach by supporting more model sizes and integrating additional tools such as web search and reverse image search. This latest development underscores Google's ongoing investment in making its AI models more robust and contextually aware across a diverse set of real-world applications.

Source