Comet AI browser agent now captures screenshots during tasks


Perplexity’s work on its Comet browser points to continued development of a more autonomous, visually aware AI agent. Traces in the latest web release suggest a new screenshot capability that lets the Comet agent capture visual snapshots of the pages it visits. These screenshots are expected to appear directly in the task list interface, so users can verify what the agent actually saw during execution. The feature could prove key for transparency and debugging in automation flows, for example when checking whether the agent navigated to the correct page or validating extracted data.

The update would benefit users who rely on agents to browse or scrape websites. Instead of relying only on text-based confirmations, they would be able to visually inspect the context, which is particularly useful for tasks involving UI-heavy dashboards, graphical reports, or unstructured content. Screenshot capture will also likely enable a new layer of capability: letting users request summaries or descriptions based on visual content. If integrated with visual analysis models, this would extend the agent’s reach beyond plain text to images, layouts, and even interface states.

In parallel, Perplexity appears to be preparing Outlook integration, likely targeting desktop environments and Windows in particular. Gmail and Google Calendar are already integrated into Comet, and Outlook would open automation possibilities for enterprise and office-heavy workflows. Given that the Comet browser is also expected to arrive on Windows soon, this pairing aligns well with Perplexity’s broader agent strategy, which focuses on real-world productivity tools.

Perplexity
A trace of Outlook integration in the code

The Android version of Comet remains scheduled for a fall release, which will likely extend usage to mobile automation tasks later this year. Taken together, these moves show Perplexity gradually pushing Comet toward a cross-platform agent framework with access to email, calendar, and web UI, potentially enabling end-to-end task execution and monitoring across devices and accounts.