Recently, AMD released its first internally developed compact language model, AMD-Llama-135m, on the Hugging Face platform. The model has drawn wide industry attention for its support for speculative decoding and its training corpus of 670 billion tokens. AMD-Llama-135m is released under the Apache 2.0 open-source license, aiming to promote technology sharing and adoption.
Speculative decoding is the core technical advantage of AMD-Llama-135m. It follows a draft-and-verify strategy: first, a small, fast draft model generates a batch of candidate tokens; then a larger, more capable target model verifies those candidates in a single forward pass, accepting the ones that match its own predictions. This allows multiple tokens to be produced per forward pass of the large model and cuts down on memory accesses, improving overall inference efficiency.
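To make the draft-and-verify flow concrete, here is a minimal toy sketch of one speculative-decoding step in Python. The vocabulary, the two "models", and the greedy acceptance rule are hypothetical stand-ins for illustration only, not AMD's actual implementation; real systems verify against the target model's probability distribution rather than a greedy match.

```python
VOCAB_SIZE = 8  # toy vocabulary: token ids 0..7 (hypothetical)

def draft_model(context):
    # Small, fast draft model: next-token distribution over the vocabulary.
    # It mostly predicts (last + 1) mod VOCAB_SIZE, but makes a deliberate
    # "mistake" after token 2 so the verify step has something to reject.
    last = context[-1] if context else 0
    probs = [0.01] * VOCAB_SIZE
    guess = 0 if last == 2 else (last + 1) % VOCAB_SIZE
    probs[guess] += 0.9
    return probs

def target_model(context):
    # Larger, slower target model: always predicts (last + 1) mod VOCAB_SIZE.
    last = context[-1] if context else 0
    probs = [0.01] * VOCAB_SIZE
    probs[(last + 1) % VOCAB_SIZE] += 0.9
    return probs

def speculative_step(context, k=4):
    # 1) The draft model proposes k candidate tokens autoregressively.
    draft_ctx = list(context)
    proposed = []
    for _ in range(k):
        p = draft_model(draft_ctx)
        tok = max(range(VOCAB_SIZE), key=lambda t: p[t])  # greedy draft
        proposed.append(tok)
        draft_ctx.append(tok)

    # 2) The target model verifies the candidates in order, accepting each
    #    one that matches its own greedy choice. On the first mismatch it
    #    substitutes its own token and stops, so every step still yields
    #    at least one target-approved token.
    accepted = []
    check_ctx = list(context)
    for tok in proposed:
        q = target_model(check_ctx)
        best = max(range(VOCAB_SIZE), key=lambda t: q[t])
        if tok == best:
            accepted.append(tok)
            check_ctx.append(tok)
        else:
            accepted.append(best)
            check_ctx.append(best)
            break
    return accepted

print(speculative_step([0], k=4))  # → [1, 2, 3]
```

Starting from context `[0]`, the draft proposes `[1, 2, 0, 1]`; the target accepts `1` and `2`, rejects the erroneous `0`, and substitutes its own `3`, so three tokens are committed from a single verification round instead of one.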
Regarding the training process, AMD disclosed that AMD-Llama-135m was trained over six days on four nodes equipped with AMD Instinct MI250 accelerators. The code-specialized variant, AMD-Llama-135m-code, underwent an additional four days of fine-tuning to sharpen its code understanding and generation.
The launch of AMD-Llama-135m not only showcases AMD's technological advancements in the field of artificial intelligence but also provides new tools and insights for research and applications in natural language processing.