User runs Baby Llama on a Samsung Galaxy Watch 4

2023-12-19

A user named Joey (e/λ) on the X platform shared a video of running 'llama.c' on a Samsung Galaxy Watch 4. Baby Llama is a weekend project created by Andrej Karpathy of OpenAI, with the goal of running Llama 2 on edge devices.

Karpathy has said the approach was inspired by Georgi Gerganov's llama.cpp, which runs the first version of LLaMA on a MacBook using plain C and C++. Karpathy's method is to train the Llama 2 LLM architecture from scratch in PyTorch and, after training, save the model weights to a raw binary file. The interesting part comes next: he wrote a roughly 500-line C file, 'run.c', which loads the saved model and performs inference entirely in single-precision floating point (fp32). This minimalist approach keeps memory usage low and requires no external libraries, so the model runs efficiently on a single M1 laptop without a GPU.

Karpathy also explored ways to speed up the C code, including compilation flags such as -O3, -Ofast, and -march=native. These flags enable vectorization, loop unrolling, and other hardware-specific optimizations, and trying them can noticeably speed up inference on a particular system.

If you want to try the Baby Llama 2 model on your own device, you can download a pre-trained model checkpoint from Karpathy's repository, then compile and run the C code on your machine for the somewhat magical experience of running a deep learning model in such a minimalist environment.

It is worth noting that the project is a weekend experiment and not intended for production-level deployment, as Karpathy himself acknowledges. Its point is to demonstrate that the Llama 2 model can run on low-power devices in pure C, a language not traditionally considered useful for machine learning because it offers no GPU acceleration on its own.
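
To make the idea of pure fp32 inference in C concrete, below is a minimal sketch of the kind of matrix-vector multiply that dominates the compute in a file like run.c. This is an illustrative example under stated assumptions, not code from Karpathy's repository: the function name matmul, its signature, and the toy dimensions are invented for this sketch.

```c
#include <stdio.h>

/* Hypothetical sketch: fp32 matrix-vector multiply of the kind that
 * dominates Transformer inference in a minimalist C implementation.
 * w is a d x n weight matrix stored row-major, x is an n-vector,
 * out receives the d-vector result. */
static void matmul(float *out, const float *x, const float *w, int n, int d) {
    for (int i = 0; i < d; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++) {
            acc += w[i * n + j] * x[j];
        }
        out[i] = acc;
    }
}

int main(void) {
    /* Toy 2x3 weight matrix and a 3-dimensional input vector. */
    float w[6] = {1.0f, 0.0f, 2.0f,   0.0f, 1.0f, -1.0f};
    float x[3] = {3.0f, 4.0f, 5.0f};
    float out[2];
    matmul(out, x, w, 3, 2);
    printf("%.1f %.1f\n", out[0], out[1]); /* prints 13.0 -1.0 */
    return 0;
}
```

Tight loops of this shape are exactly where flags like -O3 or -Ofast combined with -march=native pay off, since the compiler can vectorize and unroll them for the specific CPU being targeted.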