Apple Enables Larger AI Models to Run on Smartphones

2023-12-26

Over the past year, the high-tech world went through a revolution as artificial intelligence displaced the metaverse and became the hottest topic on the internet. Suddenly everyone was building large language models (LLMs), but most of them ran in the cloud on powerful server hardware. Smartphones simply do not have enough memory to hold the largest and most capable models, but Apple believes it has a solution. In a new research paper, Apple engineers propose storing LLM parameters in an iPhone's NAND flash storage and loading them into RAM on demand.


With companies like Qualcomm and Intel building machine-learning hardware into their latest chips, your next device may have everything it needs to run AI locally. The problem is that large language models are simply too big. During inference, billions (and in the largest models, trillions) of parameters have to be held in memory, and phone RAM is very limited; even Apple's iPhone 15 Pro tops out at 8GB.
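To put the size problem in concrete terms, here is a rough back-of-the-envelope sketch (illustrative figures of our own, not numbers from Apple's paper) of how much memory the weights of a mid-sized 7-billion-parameter model would need at common precisions:

```python
# Rough estimate of how much RAM a model's weights need (illustrative only).
def weight_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights in gigabytes."""
    return num_params * bytes_per_param / 1e9

# A 7-billion-parameter model at common precisions (assumed, hypothetical setup):
for label, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    size = weight_footprint_gb(7e9, bytes_per_param)
    print(f"7B model @ {label}: ~{size:.1f} GB of weights")

# Prints roughly 14.0 GB (fp16), 7.0 GB (int8) and 3.5 GB (int4) --
# even the 4-bit version leaves little of an 8GB phone's RAM for the OS and apps.
```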


In data centers, the AI accelerator cards that run these models carry far more memory than comparable graphics cards. The Nvidia H100, for example, is equipped with 80GB of HBM2e memory, while the gaming-focused RTX 4090 has only 24GB of GDDR6X.


Google is tackling mobile LLMs with its new Gemini family, which includes a "Nano" version designed specifically for smartphones. Apple's new research instead leans on NAND flash storage, which typically offers at least ten times the capacity of a phone's RAM, to fit a larger model onto the device. The catch is speed: flash memory is far slower than RAM.
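To see why speed is the sticking point, consider a rough comparison (the bandwidth figures below are assumptions for illustration, not measurements from the paper) of how long it would take to stream a quantized model's weights from RAM versus from flash:

```python
# Illustrative comparison of load times from DRAM vs. NAND flash.
# Bandwidth numbers are rough, assumed figures for a modern phone.
WEIGHTS_GB = 3.5            # e.g. a 7B model quantized to 4 bits
DRAM_GB_PER_S = 50.0        # assumed LPDDR5 read bandwidth
FLASH_GB_PER_S = 1.5        # assumed NAND sequential read bandwidth

print(f"From DRAM:  ~{WEIGHTS_GB / DRAM_GB_PER_S * 1000:.0f} ms to read all weights")
print(f"From flash: ~{WEIGHTS_GB / FLASH_GB_PER_S * 1000:.0f} ms to read all weights")
# Roughly 70 ms vs. 2,300 ms: naively streaming everything from flash for each
# token would dominate inference time, so the trick is to read as little as possible.
```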


Apple NAND Speed Boost


According to the paper, the team used two techniques to run models that are larger than the available RAM, both of which reduce how much data has to be loaded from flash storage. Windowing lets the model reuse the parameters already loaded for the last few tokens, essentially recycling data so that only newly needed parameters are fetched from storage. Row-column bundling stores related rows and columns of the weight matrices contiguously, so the model can read flash in larger, more efficient chunks.
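Below is a minimal, simplified sketch of how those two ideas could fit together (our own illustration under assumed data layouts, not Apple's implementation): a cache keeps the weights of neurons activated during the last few tokens in RAM (windowing), and each neuron's up-projection column and down-projection row are stored contiguously so a single sequential read from flash fetches both (row-column bundling).

```python
import numpy as np

WINDOW = 5  # number of recent tokens whose active neurons stay cached (assumed value)

class FlashWeightCache:
    """Simplified sketch: keep weights for recently active neurons in RAM
    and fetch only the missing ones from flash-backed storage."""

    def __init__(self, bundled_weights):
        # bundled_weights[i] holds neuron i's up-projection column and
        # down-projection row stored contiguously (row-column bundling),
        # so one sequential read brings in everything that neuron needs.
        self.bundled = bundled_weights   # e.g. an np.memmap over a file on flash
        self.cache = {}                  # neuron id -> weights currently in RAM
        self.recent = []                 # active-neuron sets for the last WINDOW tokens

    def load_for_token(self, active_neurons: set) -> dict:
        # Windowing: only neurons not already cached from recent tokens
        # need to be read from flash.
        missing = active_neurons - self.cache.keys()
        for i in missing:
            self.cache[i] = np.asarray(self.bundled[i])  # one bundled read per neuron

        # Slide the window: evict neurons no longer used by any recent token.
        self.recent.append(active_neurons)
        if len(self.recent) > WINDOW:
            expired = self.recent.pop(0)
            still_needed = set().union(*self.recent)
            for i in expired - still_needed:
                self.cache.pop(i, None)

        return {i: self.cache[i] for i in active_neurons}
```

In Apple's paper the selective loading targets the sparsely activated feed-forward layers; here the neuron indices simply stand in for whichever units are predicted to be active for the current token.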


The research succeeds in expanding what LLMs can do on an iPhone. With this approach, inference runs 4-5 times faster than naive flash loading on a standard CPU and 20-25 times faster on a GPU. Perhaps most importantly, an iPhone can run models up to twice the size of its available RAM by keeping the parameters in internal flash storage and loading them as needed. The paper concludes that this method paves the way for running LLMs on devices with limited memory.