The rapid development of large language models (LLMs) and vision-language models (VLMs) is transforming the way mobile devices are automated, offering unprecedented solutions for complex user tasks. However, traditional step-by-step GUI agent methods, which handle user tasks through dynamic decision-making and reflection, rely heavily on powerful cloud-based models like GPT-4 and Claude, raising concerns about privacy, security, data usage, and cost.
In the past, mobile task automation primarily depended on template-based approaches, such as Siri, Google Assistant, and Cortana, which struggled with complex tasks. As technology advanced, GUI-based automation methods emerged, capable of handling more intricate tasks without relying on third-party APIs or extensive programming. Nevertheless, these methods, particularly script-based GUI agents, still face challenges in knowledge extraction and script execution because of the dynamic nature of mobile applications.
To address these challenges, researchers from the Institute of Artificial Intelligence Industry Research at Tsinghua University have introduced AutoDroid-V2. This mobile task automation tool leverages the coding capabilities of small language models (SLMs) to build robust GUI agents. Unlike traditional step-by-step GUI agents, AutoDroid-V2 uses a script-based approach that generates and executes multi-step scripts from user commands, significantly improving efficiency and performance.
AutoDroid-V2's architecture consists of offline and online phases. In the offline phase, the system builds application documentation by analyzing the app exploration history, providing a foundation for script generation. This documentation integrates AI-guided GUI state compression, automatic element XPath generation, and GUI dependency analysis, so that the scripts generated from it are concise and accurate. In the online phase, when a user submits a task request, a customized local LLM generates a multi-step script, which is then executed by a domain-specific interpreter for reliable and efficient runtime execution.
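To make the online phase concrete, the following is a minimal sketch of how a generated multi-step script might be run by a small interpreter that resolves element names to XPaths collected in the offline documentation. The action names (tap, set_text), the documentation format, and the MockDevice class are illustrative assumptions for this sketch, not AutoDroid-V2's actual interfaces.

```python
# Sketch of the online phase under assumed interfaces: a generated multi-step
# script is executed by a small domain-specific interpreter that looks up
# element XPaths in the documentation built offline.

# Offline-built documentation (assumed format): task-relevant elements -> XPaths.
APP_DOC = {
    "search_box": "//android.widget.EditText[@resource-id='com.example:id/search']",
    "search_button": "//android.widget.Button[@text='Search']",
    "first_result": "(//android.widget.TextView[@resource-id='com.example:id/title'])[1]",
}

# A multi-step script as the local LLM might emit it: one action per step.
GENERATED_SCRIPT = [
    {"action": "tap", "element": "search_box"},
    {"action": "set_text", "element": "search_box", "text": "wireless earbuds"},
    {"action": "tap", "element": "search_button"},
    {"action": "tap", "element": "first_result"},
]


class MockDevice:
    """Stand-in for a real UI driver (e.g., one backed by ADB/uiautomator)."""

    def tap(self, xpath: str) -> None:
        print(f"tap      {xpath}")

    def set_text(self, xpath: str, text: str) -> None:
        print(f"set_text {xpath} <- {text!r}")


def run_script(script, doc, device) -> None:
    """Interpret the script step by step, resolving element names via the doc."""
    for step in script:
        xpath = doc[step["element"]]  # element name -> XPath from offline documentation
        if step["action"] == "tap":
            device.tap(xpath)
        elif step["action"] == "set_text":
            device.set_text(xpath, step["text"])
        else:
            raise ValueError(f"unsupported action: {step['action']}")


if __name__ == "__main__":
    run_script(GENERATED_SCRIPT, APP_DOC, MockDevice())
```

Constraining execution to a fixed action vocabulary plus documented element references is what lets a deterministic interpreter, rather than the LLM itself, handle each runtime step, which is the intuition behind the efficiency gains reported below.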
In experiments covering 226 tasks across 23 mobile applications, AutoDroid-V2 improved the task completion rate by 10.5% to 51.7% over leading baselines such as AutoDroid, SeeClick, CogAgent, and Mind2Web. It also reduced computational requirements substantially, cutting input and output token consumption by 43.5x and 5.8x, respectively, and lowering LLM inference latency by 5.7x to 13.4x. Across different backbone LLMs, AutoDroid-V2 consistently maintained high performance.
The researchers noted that AutoDroid-V2 represents a significant advancement in mobile task automation. By utilizing on-device SLMs and a document-guided, script-based approach, it achieves accuracy comparable to cloud-based solutions while maintaining device-level privacy and security. This result points to a practical new direction for the field of mobile task automation.
Although AutoDroid-V2 performs well on GUI applications with structured text representations, it remains limited on applications that lack such representations, such as Unity-based apps and some web applications. The researchers suggest, however, that integrating VLMs to reconstruct structured GUI representations from visual features could address this limitation and further broaden AutoDroid-V2's applicability.