Entering the Era of "LLM Pollution"

2024-04-03

Everyone is building LLMs. Whether closed or open, there are far more language models on the market than there are extensions and applications built on them. Some are small and others are large, but only a few companies manage to develop something meaningful from them.


This imbalance matters. Many extensions built on open language models only add support for another language or improve speed by a small margin. While this is worthwhile work, it does not meaningfully affect the adoption of these models.


These models vary in size, ranging from moderate to huge. However, despite their large numbers, only a few companies are able to effectively translate them into practical applications.


The surge of LLMs is indeed an important milestone in the development of artificial intelligence. However, models are being produced far faster than meaningful extensions and practical applications can be developed. Releasing yet another model, however commendable the effort, does not address the core issue: widespread adoption of LLMs.


Is it a waste of time?


For example, on the Hugging Face leaderboard, there are thousands of language models. Whenever a new model is released, people start tinkering with it, testing its capabilities, and benchmarking it for their use cases, and then move on to the next one. The next day, this cycle repeats with the latest model.


Falcon is one of the largest open-source language models, and it received a lot of testing and appreciation from developers upon its release. However, after testing its capabilities, people found that even Meta's smaller Llama 2 performed better. The same happened with Mistral's new model and with OpenAI's GPT-2, even though the latter has been around for years.


Falcon still exists, but people rarely use it, and no significant applications are built on it. Even so, TII, the organization behind the model, may develop another AI model and hope it ranks high on the leaderboard.


Undoubtedly, this is how competition works. Databricks says its new model, DBRX, outperforms other open models on the market, and at a lower cost. Given its capabilities, enterprises are ready to adopt it. When Meta releases Llama 3, the same hype will return. There will be more choices, but people will also forget about Llama 2.


This flood of base language models, released without any accompanying innovation, is now being called "LLM pollution". The surplus of LLMs not only fails to promote innovative or transformative applications but may also leave the field crowded with redundant or underutilized models.


What should be done next?


Databricks' VP of Generative AI, Naveen Rao, said that most foundational model companies will fail. "You have to do better than them (OpenAI). If you can't, and the cost of switching is low enough, why would you use someone else's model? So, unless you can beat them, just trying to be ahead doesn't make sense," he added.


Rao also stated that everyone has their own opinions, but many people just build models and call it a victory. "Wow! You built a model. Great," he joked. But he believes that without differentiation or problem-solving capabilities, it won't work.


Rao said, "Just because you say you can do it and build a technology doesn't really prove that you can solve the problem."


Investing billions of dollars in the next GPT may give OpenAI an outstanding model, but the billions of dollars spent on building GPT-4 may then go to waste. People may use it for a while, but it will soon become the next GPT-2. Accelerating AI development is important, but it is equally necessary to measure its positive and negative effects on adoption.


There is an urgent need to pay more attention to the role of LLMs in practical applications and real-world problem-solving. Beyond the technical capabilities of language models, their practical utility and social impact also deserve attention.


Companies certainly won't all use the same LLM, and we do need more choices. But before building a bunch of models in different languages, it is also necessary to clarify the exact use cases for these models. The era of "LLM pollution" has arrived. Many LLMs that once ranked at the top of the leaderboard will end up ignored, silently piling up.