Google Developing AI Technology to Control Browsers, Plans December Showcase

2024-10-28

Reports indicate that Google may unveil an in-house product this December that builds on a concept similar to Rabbit's "large action model." Known internally as "Project Jarvis," the initiative is designed to carry out tasks for users, such as gathering research, purchasing items, or booking flights, according to three people with direct knowledge of the project.

Jarvis will be powered by Google's forthcoming Gemini large language model. The tool is tuned for the Chrome browser and currently works only within a web browser. Its primary objective is to help users automate everyday online tasks: it captures and analyzes screenshots, then performs actions such as clicking buttons or entering text. At this stage, Jarvis reportedly needs a "few seconds" between actions.
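The screenshot-then-act cycle described in the report can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `FakeBrowser`, `Action`, `decide_next_action`, and `run_agent` are hypothetical names, the "screenshot" is an in-memory snapshot rather than real pixels, and the decision function is a hard-coded stand-in for a model call. It is not Google's implementation, about which no technical details have been published.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the element to act on
    text: str = ""     # text to enter, for "type" actions


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser; keeps page state in memory."""
    inputs: dict = field(default_factory=dict)
    clicked: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would capture pixels; this returns a state snapshot instead.
        return {"inputs": dict(self.inputs), "clicked": list(self.clicked)}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.inputs[action.target] = action.text
        elif action.kind == "click":
            self.clicked.append(action.target)


def decide_next_action(screen: dict) -> Action:
    """Hard-coded placeholder for the model that would analyze a screenshot."""
    if "destination" not in screen["inputs"]:
        return Action("type", target="destination", text="Tokyo")
    if "search" not in screen["clicked"]:
        return Action("click", target="search")
    return Action("done")


def run_agent(browser: FakeBrowser, delay_seconds: float = 0.0, max_steps: int = 10) -> list:
    """Observe the screen, pick one action, apply it, pause, and repeat."""
    history = []
    for _ in range(max_steps):
        action = decide_next_action(browser.screenshot())
        if action.kind == "done":
            break
        browser.apply(action)
        history.append(action)
        # The report mentions a pause of a few seconds between actions;
        # the default here is zero so the sketch runs instantly.
        time.sleep(delay_seconds)
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run_agent(browser):
        print(step.kind, step.target, step.text)
```

The loop takes one action per observation, which is why a per-step delay adds up quickly on multi-step tasks such as booking a flight.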

Several major AI companies are building products with similar capabilities. Microsoft's Copilot Vision lets users discuss the content of the web pages they are viewing. Apple plans to release Apple Intelligence features within the next year that can carry out tasks across multiple applications based on what is on screen. AI startup Anthropic has launched a beta update to Claude that can operate a computer, although it has been described as "cumbersome and prone to errors." OpenAI is also reportedly developing AI models with comparable functionality.

The report notes that Google intends to showcase Jarvis in December, although this schedule may change. The company might initially release the tool to a select group of testers to identify and fix issues.

Notably, just a few days before this report emerged, Anthropic introduced a new capability called "computer use," which lets its AI interact with a user's computer screen. With the user's consent, it can interpret screen content and perform actions such as browsing websites, clicking buttons, and entering text.

These advancements signal a shift in AI-assisted technology from relying on back-end application integrations to acting on what appears on screen in real time. Google's Jarvis project applies this approach directly inside the user's browser.

As more companies focus on AI agent tools that reduce the need for human oversight, competition in this sector is intensifying. The latest developments from Google, Microsoft, and Anthropic show that these companies are exploring AI to automate routine computer tasks, with the aim of improving business efficiency and reducing costs.