Apple Introduces ReALM, an AI System That Understands On-Screen Content and Enables Interaction
According to a recent research report, Apple's research team has developed a new artificial intelligence system. The system can resolve ambiguous references to entities on the screen and, by combining conversational and background context, enables more natural interaction with voice assistants.
The system, called ReALM (Reference Resolution As Language Modeling), uses large language models to recast complex reference resolution tasks, including understanding references to visual elements on the screen, as a pure language modeling problem. This approach gives ReALM a significant performance advantage over existing techniques.
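To illustrate the idea of recasting reference resolution as language modeling, the sketch below builds a single text prompt that interleaves the conversation history, the candidate on-screen entities, and the user's request, so the model's job reduces to generating the identifier of the referenced entity. The prompt wording and function name are illustrative assumptions, not Apple's actual implementation.

```python
# Hedged sketch: reference resolution framed as a language modeling task.
# All prompt text and names here are illustrative, not from the paper.

def build_resolution_prompt(conversation, screen_entities, query):
    # Number each candidate entity so the model can answer with an index.
    entity_list = "\n".join(
        f"{i}. {e}" for i, e in enumerate(screen_entities, 1)
    )
    return (
        "Conversation so far:\n"
        f"{conversation}\n\n"
        "Entities visible on screen:\n"
        f"{entity_list}\n\n"
        f"User request: {query}\n"
        "Which numbered entity does the request refer to?"
    )

prompt = build_resolution_prompt(
    conversation="User: Show me pharmacies nearby.",
    screen_entities=["Walgreens, 0.4 mi", "CVS, 1.1 mi"],
    query="Call the second one",
)
print(prompt)
```

A fine-tuned language model given such a prompt only needs to produce the entity number, which is what makes the task "pure language modeling."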
The Apple research team wrote in the report, "Understanding context and references is crucial for voice assistants. Allowing users to ask questions about the content on the screen is a key step in ensuring a truly hands-free experience with voice assistants."
One of ReALM's innovations is its ability to reconstruct the screen by parsing on-screen entities and their positions, generating a textual representation that captures the visual layout. The researchers found that this approach, combined with fine-tuning language models specifically for reference resolution, outperformed GPT-4 on the task.
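The layout-to-text step described above can be sketched roughly as follows: entities with screen coordinates are sorted top to bottom, grouped into text lines when their vertical positions are close, and ordered left to right within each line. The function name, entity format, and grouping margin are assumptions for illustration, not Apple's implementation.

```python
# Hedged sketch: serializing on-screen entities into a textual layout
# that a language model can read. Details are illustrative assumptions.

def screen_to_text(entities, line_margin=10):
    """entities: list of (label, x, y) with (x, y) the element's
    top-left position in pixels; returns a text rendering of the screen."""
    # Sort top-to-bottom, then left-to-right.
    entities = sorted(entities, key=lambda e: (e[2], e[1]))
    lines, current, last_y = [], [], None
    for label, x, y in entities:
        # Start a new text line when the vertical gap exceeds the margin.
        if last_y is not None and y - last_y > line_margin:
            lines.append(current)
            current = []
        current.append((x, label))
        last_y = y
    if current:
        lines.append(current)
    # Within each line, order left-to-right and join with tabs.
    return "\n".join(
        "\t".join(label for x, label in sorted(line))
        for line in lines
    )

screen = [
    ("Contact: Alice", 0, 0),
    ("555-0100", 120, 2),
    ("Call", 0, 40),
    ("Message", 80, 41),
]
print(screen_to_text(screen))
# Prints two lines: the contact row, then the button row.
```

The resulting text preserves which elements sit side by side, which is what lets a purely textual model reason about "the button below the phone number."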
The report states, "We have demonstrated significant improvements over existing systems on different types of references, with our smallest model achieving over a 5% absolute gain on screen references. Our large-scale model has greatly surpassed GPT-4 in performance."
However, the researchers also cautioned that relying solely on automated screen parsing has limitations. For more complex visual references, such as distinguishing between multiple images, it may still be necessary to incorporate computer vision and multimodal techniques.
Although Apple has lagged behind its technological competitors in the rapidly evolving AI field, its progress in artificial intelligence research is quietly accelerating.
From multimodal models that integrate vision and language, to AI-driven animation tools, to techniques for building high-performance specialized AI on limited budgets, Apple's research labs continue to achieve breakthroughs, demonstrating its rapidly growing AI ambitions.
However, as a tech giant known for its secrecy, Apple faces fierce competition from companies such as Google, Microsoft, Amazon, and OpenAI. These companies have actively launched generative AI products in multiple fields such as search, office software, and cloud services.
For a long time, Apple has played the role of fast follower rather than pioneer. Now, however, it faces a market being transformed at astonishing speed by artificial intelligence. At its highly anticipated Worldwide Developers Conference in June, Apple is expected to introduce a new large language model framework, an "Apple GPT" chatbot, and other AI-driven features across its ecosystem.
Apple CEO Tim Cook hinted during a recent earnings conference call, "We are excited to share details of our ongoing work in the AI field later this year." Although Apple is known for its secrecy, it is clear that its efforts in AI are sweeping and wide-ranging.
However, as the battle for AI dominance heats up, the iPhone maker's late arrival has put it in an unusual position of weakness. While its deep financial resources, brand loyalty, elite engineering talent, and tightly integrated product portfolio give it room to counterattack, nothing is guaranteed in this high-stakes competition.
A new era is approaching, an era filled with ubiquitous and truly intelligent computing. By June, we will see if Apple has done enough to secure its place in shaping this new era.