Microsoft Introduces a More Efficient and Cost-Effective 1-Bit LLM
Microsoft has introduced a new class of language model known as the 1-bit LLM (Large Language Model), building on recent work from projects such as BitNet.
The innovation lies in how each parameter in the model, commonly known as a weight, is represented: it is limited to only 1.58 bits. Unlike traditional LLMs, which use 16-bit floating-point (FP16) weights, BitNet b1.58 restricts each weight to one of three values: -1, 0, or 1. This substantial reduction in bits per weight forms the foundation of the proposed model.
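To make the idea concrete, here is a minimal NumPy sketch of ternary weight quantization in the spirit of the absmean scheme described in the BitNet b1.58 paper; the function name and the per-tensor scaling choice are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def quantize_weights_ternary(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight tensor to the ternary values {-1, 0, +1}.

    Sketch of an absmean-style scheme: scale by the mean absolute
    weight, then round and clip each entry to the range [-1, 1].
    """
    gamma = np.abs(w).mean() + eps                   # per-tensor scale
    w_scaled = w / gamma                             # normalize by the absmean
    w_ternary = np.clip(np.round(w_scaled), -1, 1)   # round to {-1, 0, +1}
    return w_ternary.astype(np.int8), gamma          # keep the scale for de-quantization

# Example: quantize a small random weight matrix
w = np.random.randn(4, 4).astype(np.float32)
w_q, scale = quantize_weights_ternary(w)
print(w_q)     # entries are only -1, 0, or 1
print(scale)   # a single floating-point scale retained per tensor
```

Because each weight can take one of three values, its information content is log2(3) ≈ 1.58 bits, which is where the "b1.58" in the model's name comes from.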
Despite each parameter using only 1.58 bits, BitNet b1.58 performs comparably to traditional FP16 models in perplexity and end-task performance at the same model size and with the same training data. Importantly, it is significantly more cost-effective in terms of latency, memory usage, throughput, and energy consumption.
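As a rough, back-of-the-envelope illustration of the memory side of that claim (the parameter count and the weight-only accounting below are assumptions for illustration, not figures from the paper):

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to store the weights alone, in GB."""
    return num_params * bits_per_weight / 8 / 1024**3

params = 3e9  # a hypothetical 3B-parameter model
print(f"FP16 weights:     {weight_memory_gb(params, 16):.2f} GB")
print(f"1.58-bit weights: {weight_memory_gb(params, 1.58):.2f} GB")
```

For the weights alone this works out to roughly a 10x reduction; real-world savings are smaller because activations, the KV cache, and other runtime state are not stored at 1.58 bits.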
The 1.58-bit LLM introduces a new model formulation and training method that strikes a balance between high performance and cost-effectiveness. It also opens up possibilities for a new computation paradigm, since ternary weights allow matrix multiplication to be carried out largely with additions, and hints at the potential for dedicated hardware optimized for 1-bit LLMs.
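The sketch below illustrates why that paradigm shift is possible: with weights restricted to -1, 0, and +1, a matrix-vector product needs no element-wise multiplications at all. It is a naive reference implementation for clarity, not an optimized kernel, and the function name is an assumption.

```python
import numpy as np

def ternary_matvec(w_ternary: np.ndarray, x: np.ndarray, scale: float) -> np.ndarray:
    """Matrix-vector product with ternary weights using no multiplications.

    Because each weight is -1, 0, or +1, every output element is just a
    signed sum of selected input activations; the only multiplication left
    is the single per-tensor rescale at the end.
    """
    out = np.empty(w_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_ternary):
        out[i] = x[row == 1].sum() - x[row == -1].sum()  # additions and subtractions only
    return out * scale  # undo the absmean scaling

# Tiny usage example with a hand-written ternary weight matrix
w_q = np.array([[1, 0, -1], [0, 1, 1]], dtype=np.int8)
x = np.array([0.5, -2.0, 3.0], dtype=np.float32)
print(ternary_matvec(w_q, x, scale=1.0))  # [-2.5, 1.0]
```

Dedicated hardware could exploit the same property directly, replacing multiply-accumulate units with simple adders.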
The paper also explores the possibility of native support for long sequences in LLMs with BitNet b1.58. The authors suggest further research into lossless compression as a path to even higher efficiency.