User runs Baby Llama on a Samsung Galaxy Watch 4
Baby Llama is a weekend project created by Andrej Karpathy of OpenAI to run Llama 2 on edge devices.
A user named Joey (e/λ) on the X platform shared a video of 'llama.c' running on a Samsung Galaxy Watch 4.
Karpathy stated that the approach was inspired by Georgi Gerganov's llama.cpp, a project in much the same spirit that runs the first version of LLaMA on a MacBook using C and C++.
Karpathy's method involves training the Llama 2 LLM architecture from scratch using PyTorch. After training, he saves the model weights in a raw binary file. The interesting part comes next: he wrote a roughly 500-line C file called 'run.c', which loads the saved model and performs inference using single-precision floating-point (fp32) arithmetic. This minimalist approach keeps memory usage low and requires no external libraries, allowing the model to run efficiently on the CPU of a single M1 laptop, with no GPU needed.
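To make the mechanics concrete, the sketch below shows the general pattern: read a raw binary checkpoint into memory and do plain fp32 arithmetic in C with no external libraries. The header fields, file layout, and function names here are simplified illustrations, not the actual checkpoint format or code from the project.

```c
/* Minimal sketch of loading raw fp32 weights and doing fp32 math in plain C.
 * The Config layout and file format are illustrative only. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int dim;        /* transformer width (illustrative header field) */
    int vocab_size; /* vocabulary size (illustrative header field) */
} Config;

/* naive fp32 matrix-vector product: out[i] = sum_j w[i*n + j] * x[j] */
void matmul(float *out, const float *x, const float *w, int n, int d) {
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) val += w[i * n + j] * x[j];
        out[i] = val;
    }
}

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s checkpoint.bin\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    /* read a small header, then treat the rest of the file as raw fp32 weights */
    Config cfg;
    if (fread(&cfg, sizeof(Config), 1, f) != 1) { fclose(f); return 1; }

    size_t n_weights = (size_t)cfg.dim * cfg.vocab_size;
    float *weights = malloc(n_weights * sizeof(float));
    if (!weights || fread(weights, sizeof(float), n_weights, f) != n_weights) {
        fclose(f); free(weights); return 1;
    }
    fclose(f);

    printf("loaded %zu fp32 weights (dim=%d, vocab=%d)\n",
           n_weights, cfg.dim, cfg.vocab_size);
    free(weights);
    return 0;
}
```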
Karpathy also explored various techniques to improve the performance of the C code, including compilation flags such as -O3, -Ofast, and -march=native. These flags optimize the code by enabling vectorization, loop unrolling, and other hardware-specific adjustments. By experimenting with them, users can achieve faster inference on their particular systems.
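The flags are passed at compile time, and the best combination depends on the compiler and CPU. Typical invocations might look like the following sketch (with -lm linking the C math library):

```sh
gcc -O3 -o run run.c -lm                 # standard optimizations
gcc -Ofast -o run run.c -lm              # adds aggressive, possibly non-standard FP optimizations
gcc -O3 -march=native -o run run.c -lm   # tune for the host CPU's instruction set
```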
If you want to try the Baby Llama 2 model on your own device, you can download a pre-trained model checkpoint from Karpathy's repository, then compile and run the C code on your system, for the somewhat magical experience of running a deep learning model in such a minimalist environment.
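As a rough sketch of that workflow, assuming the checkpoint has been saved locally as model.bin (the filename and exact run syntax are assumptions here; the repository README has the authoritative commands):

```sh
# after downloading a checkpoint from the repository as model.bin:
gcc -O3 -o run run.c -lm
./run model.bin
```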
It is important to note that Karpathy's project is a weekend experiment and is not intended for production-level deployment, as he himself acknowledges. The focus of the experiment is to demonstrate the feasibility of running the Llama 2 model on low-power devices using pure C code, a language not traditionally considered practical for machine learning, which typically relies on GPUs and high-level frameworks.