Apple Introduces ReALM, an AI System That Understands On-Screen Content and Enables Interaction
According to a recent research report, Apple's research team has developed a new artificial intelligence system. The system can resolve ambiguous references to entities on the screen and, by combining conversational and background context, enables more natural interaction with voice assistants.
The system, called ReALM (Reference Resolution As Language Modeling), uses large language models to recast complex reference resolution tasks, including understanding references to visual elements on the screen, as a pure language modeling problem. This approach gives ReALM a significant performance advantage over existing techniques.
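To illustrate the idea of recasting reference resolution as language modeling, the sketch below builds a single text prompt that interleaves the conversation history, the candidate on-screen entities, and the user's request, so the model's job reduces to generating the identifier of the referenced entity. The prompt wording and function name are illustrative assumptions, not Apple's actual implementation.

```python
# Hedged sketch: reference resolution framed as a language modeling task.
# All prompt text and names here are illustrative, not from the paper.

def build_resolution_prompt(conversation, screen_entities, query):
    # Number each candidate entity so the model can answer with an index.
    entity_list = "\n".join(
        f"{i}. {e}" for i, e in enumerate(screen_entities, 1)
    )
    return (
        "Conversation so far:\n"
        f"{conversation}\n\n"
        "Entities visible on screen:\n"
        f"{entity_list}\n\n"
        f"User request: {query}\n"
        "Which numbered entity does the request refer to?"
    )

prompt = build_resolution_prompt(
    conversation="User: Show me pharmacies nearby.",
    screen_entities=["Walgreens, 0.4 mi", "CVS, 1.1 mi"],
    query="Call the second one",
)
print(prompt)
```

A fine-tuned language model given such a prompt only needs to produce the entity number, which is what makes the task "pure language modeling."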
The Apple research team wrote in the report, "Understanding context and references is crucial for voice assistants. Allowing users to ask questions about the content on the screen is a key step in ensuring a truly hands-free experience with voice assistants."
One of ReALM's innovations is its ability to reconstruct the screen by parsing on-screen entities and their positions, generating a textual representation that captures the visual layout. The researchers found that this approach, combined with fine-tuning language models specifically for reference resolution, outperformed GPT-4 on the task.
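The layout-to-text step described above can be sketched roughly as follows: entities with screen coordinates are sorted top to bottom, grouped into text lines when their vertical positions are close, and ordered left to right within each line. The function name, entity format, and grouping margin are assumptions for illustration, not Apple's implementation.

```python
# Hedged sketch: serializing on-screen entities into a textual layout
# that a language model can read. Details are illustrative assumptions.

def screen_to_text(entities, line_margin=10):
    """entities: list of (label, x, y) with (x, y) the element's
    top-left position in pixels; returns a text rendering of the screen."""
    # Sort top-to-bottom, then left-to-right.
    entities = sorted(entities, key=lambda e: (e[2], e[1]))
    lines, current, last_y = [], [], None
    for label, x, y in entities:
        # Start a new text line when the vertical gap exceeds the margin.
        if last_y is not None and y - last_y > line_margin:
            lines.append(current)
            current = []
        current.append((x, label))
        last_y = y
    if current:
        lines.append(current)
    # Within each line, order left-to-right and join with tabs.
    return "\n".join(
        "\t".join(label for x, label in sorted(line))
        for line in lines
    )

screen = [
    ("Contact: Alice", 0, 0),
    ("555-0100", 120, 2),
    ("Call", 0, 40),
    ("Message", 80, 41),
]
print(screen_to_text(screen))
# Prints two lines: the contact row, then the button row.
```

The resulting text preserves which elements sit side by side, which is what lets a purely textual model reason about "the button below the phone number."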
The report states, "We have demonstrated significant improvements over existing systems on different types of references, with our smallest model achieving over a 5% absolute gain on screen references. Our large-scale model has greatly surpassed GPT-4 in performance."
However, the researchers also cautioned that relying solely on automated screen parsing has limitations. For more complex visual references, such as distinguishing between multiple images, it may still be necessary to incorporate computer vision and multimodal techniques.
Although Apple has lagged behind its technological competitors in the rapidly evolving AI field, its progress in artificial intelligence research is quietly accelerating.
From multimodal models that integrate vision and language, to AI-driven animation tools, to techniques for building high-performance specialized AI on limited budgets, Apple's research labs continue to achieve breakthroughs, demonstrating its rapidly growing AI ambitions.
However, as a tech giant known for its secrecy, Apple faces fierce competition from companies such as Google, Microsoft, Amazon, and OpenAI. These companies have actively launched generative AI products in multiple fields such as search, office software, and cloud services.
For a long time, Apple has played the role of fast follower rather than pioneer. Now, however, it faces a market being transformed at astonishing speed by artificial intelligence. At its highly anticipated Worldwide Developers Conference in June, Apple is expected to introduce a new large language model framework, an "Apple GPT" chatbot, and other AI-driven features across its ecosystem.
Apple CEO Tim Cook hinted during a recent earnings conference call, "We are excited to share details of our ongoing work in the AI field later this year." Although Apple is known for its secrecy, it is clear that its efforts in AI are sweeping and wide-ranging.
However, as the battle for AI dominance heats up, the iPhone maker's late arrival has put it in an unusual position of weakness. While its deep financial resources, brand loyalty, elite engineering talent, and tightly integrated product portfolio give it room to counterattack, nothing is guaranteed in this high-stakes competition.
A new era is approaching, an era filled with ubiquitous and truly intelligent computing. By June, we will see if Apple has done enough to secure its place in shaping this new era.