Google's recent release of Gemini 1.5, with its 1M-token context window, has sparked a new debate over the value of Retrieval-Augmented Generation (RAG). LLMs typically struggle with hallucinations, and two approaches have emerged to address the problem: enlarging the context window, and grounding the model with RAG.
Recently, developers have begun experimenting with Gemini 1.5. Ethan Mollick, a professor at the Wharton School, wrote on X: "I uploaded 'The Great Gatsby' with two changes, inserting a mention of a 'box iPhone' and a 'laser lawnmower.' Gemini handled it very well (and also found another issue). Claude can handle it too, but it hallucinates. RAG doesn't work here."
Another X user, McKay Wrigley, fed an entire biology textbook into Gemini 1.5 Pro, asked three very specific questions, and got completely correct answers to all three.
Sully Omar, co-founder and CEO of Cognosys, wrote: "Gemini 1.5 Pro still isn't getting enough attention. I uploaded an entire code repository straight from GitHub, the Vercel AI SDK, along with all of its issues. It not only understood the codebase but also identified the most urgent issues and implemented fixes. This changes everything."
These three examples show that Gemini 1.5 can successfully retrieve key information from documents thanks to its enormous context window. That does not mean, however, that RAG has reached the end of its usefulness.
RAG and the Context Window
Many people are still confused about the difference between RAG and the context window. A context window limits the model to whatever text fits inside the prompt, whereas RAG lets the model pull in information from external sources at query time, greatly expanding the pool of accessible knowledge.
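To make the distinction concrete, here is a minimal sketch contrasting the two approaches. The `llm` and `retriever` objects and their methods are illustrative placeholders, not any particular vendor's API.

```python
# Illustrative sketch only: `llm.generate` and `retriever.search` are
# hypothetical interfaces standing in for a real model and a real index.

def answer_with_long_context(llm, documents: list[str], question: str) -> str:
    # Long-context approach: put everything the model may need directly
    # into the prompt and let the model find the relevant parts itself.
    prompt = "\n\n".join(documents) + "\n\nQuestion: " + question
    return llm.generate(prompt)

def answer_with_rag(llm, retriever, question: str, k: int = 5) -> str:
    # RAG approach: search an external index first, then pass only the
    # top-k retrieved passages to the model. The index can cover far more
    # text than any context window can hold.
    passages = retriever.search(question, top_k=k)
    context = "\n\n".join(p.text for p in passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm.generate(prompt)
```

The practical trade-off is visible in the two functions: the first is limited by how much text fits in the window, while the second is limited by how well the retriever picks the right passages.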
Noting the buzz online, Oriol Vinyals, who heads research and the Deep Learning team at Google DeepMind, weighed in: "Even though we can now handle 1 million or more tokens in context, RAG is far from finished. In fact, RAG has some nice properties that complement long contexts, and that long contexts enhance in return."
He added, "RAG lets you look up relevant information, but the way the model accesses that information may be too constrained by compression. Long contexts may help bridge this gap, much as the L1/L2 caches in a modern CPU work together with main memory."
A larger context window lets an LLM take more text into account, which generally yields more accurate and coherent responses, especially for long and complex inputs. It does not, however, guarantee that the model will stop hallucinating.
According to "Lost in the Middle: How Language Models Use Long Contexts," a paper by researchers from Stanford University, UC Berkeley, and Samaya AI, LLMs retrieve information most accurately when it appears at the beginning or end of a document; accuracy drops when the relevant information sits in the middle, and the effect worsens as the input grows longer.
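That finding can be illustrated with a simple "needle in a haystack" style probe in the spirit of the paper. The sketch below is hypothetical: the needle, the filler text, and `llm.generate` are all assumptions made for illustration.

```python
# Hypothetical probe: insert one key fact at different depths of a long
# filler document and check whether the model can still retrieve it.
# `llm.generate` is a placeholder for any model call.

NEEDLE = "The access code for the archive is 4721."
FILLER = "This paragraph contains routine background text. " * 2000

def build_prompt(depth: float) -> str:
    # Place the needle at a relative position (0.0 = start, 1.0 = end).
    cut = int(len(FILLER) * depth)
    document = FILLER[:cut] + "\n" + NEEDLE + "\n" + FILLER[cut:]
    return document + "\n\nQuestion: What is the access code for the archive?"

def run_probe(llm, depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict[float, bool]:
    # The paper's result corresponds to lower success rates at depths
    # around 0.5 than at 0.0 or 1.0, growing worse with longer inputs.
    results = {}
    for depth in depths:
        answer = llm.generate(build_prompt(depth))
        results[depth] = "4721" in answer  # crude correctness check
    return results
```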
RAG Still Has a Place
Elvis Saravia, co-founder of DAIR, wrote, "One of the worst takes I've seen over the past few days is that long-context models like Gemini 1.5 will replace RAG."
He added that, to tackle such problems, RAG and long-context LLMs can be combined into a powerful system that retrieves critical historical information and analyzes it at scale.
He said, "We will make progress on some of these challenges, such as the 'lost in the middle' problem and handling more complex structured and dynamic data, but we still have a long way to go." Saravia also stressed that different LLMs will suit different kinds of problems: "We need to move away from the idea that one LLM rules them all." A sketch of the combined approach he describes follows.
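The sketch below is one way such a retrieve-then-read pipeline could be wired up; `retriever`, `long_context_llm`, and the rough token-budget heuristic are illustrative assumptions, not a prescribed implementation.

```python
# Combined RAG + long-context sketch: retrieval narrows a very large corpus
# down to relevant material, and the long context window lets us pass far
# more of it to the model than a short-context model could accept.
# `retriever.search` and `long_context_llm.generate` are placeholders.

def retrieve_then_read(long_context_llm, retriever, question: str,
                       top_k: int = 200, token_budget: int = 900_000) -> str:
    # Retrieve generously: a 1M-token window can hold hundreds of chunks,
    # so recall matters more than aggressive pruning.
    chunks = retriever.search(question, top_k=top_k)

    selected, used = [], 0
    for chunk in chunks:
        cost = len(chunk.text) // 4  # rough token estimate (~4 chars/token)
        if used + cost > token_budget:
            break
        selected.append(chunk.text)
        used += cost

    prompt = "\n\n".join(selected) + f"\n\nQuestion: {question}"
    return long_context_llm.generate(prompt)
```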
There is no doubt that Gemini 1.5 outpaces Claude 2.1 and GPT-4 Turbo in how much it can ingest: entire code repositories, upward of a hundred papers, and all manner of documents. But it certainly has not killed RAG.