xAI Releases Grok Language Model, Surpassing GPT-3.5

2023-11-15

xAI, the artificial intelligence company founded by Elon Musk, recently announced a large language model called Grok. Grok, available through the X platform, outperforms other models of similar scale, including GPT-3.5, on multiple benchmarks.

xAI launched earlier this year and trained its first model, Grok-0, which has 33 billion parameters. The company has disclosed few details about the latest version, Grok-1, but states that it outperforms GPT-3.5 and Llama 2 on the math benchmarks GSM8k and MATH, the question-answering benchmark MMLU, and the coding benchmark HumanEval. The model is described as having "wit and rebellious tendencies," and xAI claims that it will answer questions that other language models refuse. The xAI team states:

"By creating and improving Grok, our goal is to gather feedback and ensure that we are building artificial intelligence tools that benefit everyone. We believe it is important to design AI tools that are useful to people of various backgrounds and political views. We also aim to empower users with our AI tools, subject to the law. We hope Grok can become a powerful research assistant for anyone, helping them quickly access relevant information, process data, and generate new ideas. Our ultimate goal is to help humanity pursue understanding."

Although the term "grok" first appeared in Robert Heinlein's science fiction novel "Stranger in a Strange Land," xAI states that its model was inspired by the Hitchhiker's Guide to the Galaxy, the fictional guidebook in Douglas Adams' science fiction series of the same name. xAI claims that it is "intended to answer almost any question..."

Although few technical details about Grok are available, xAI said that it built a custom machine learning framework for training and inference using JAX, Rust, and Kubernetes, and that the model was trained for two months. Toby Pohlen, a founding member of xAI, released a video showcasing the Grok user interface on the X platform. In addition, the open-source vector database Qdrant posted on its X account that Grok's real-time knowledge feature is built on Qdrant, and encouraged users to "stay tuned" for future blog posts and technical exchanges with the xAI engineering team.

The announcement has received mixed reactions. On Reddit, one user praised the effort, stating:

"Impressive to beat Meta with only two months of training. We know they have at least 10k H100s, more than what was used for GPT-4. It looks like they will continue to release new versions, so it's likely to improve rapidly. Also, the model seems to be much less censored, which will push other companies to adopt similar practices."

Users on Hacker News, however, expressed skepticism. Some speculated that Grok's performance in benchmark tests may be due to training on the test set:

"Many modern language models have access to copies of the entire internet, including test sets for many benchmarks. So, if someone claims to beat ChatGPT and their model was trained on the test set, of course, their performance will be better. Even ChatGPT might have been trained on the test set."

xAI acknowledges this possibility. To address it, the team hand-graded Grok's answers on the mathematics section of the Hungarian national high school graduation exam, which was published after the team had collected its training data and so could not have been part of it. On this exam, Grok outperformed GPT-3.5 and Claude 2.
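The contamination concern raised by the Hacker News commenters can be illustrated with a rough overlap check: flag any benchmark item that shares a long run of words with a training document. The sketch below is only an illustration of that idea and is not xAI's method. The function names, the 8-word window, and the toy data are all assumptions made for this example.

```python
import re


def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contaminated_items(test_items, training_docs, n=8):
    """Return the test items that share at least one n-gram with any training document."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return [item for item in test_items if ngrams(item, n) & train_grams]


# Toy data, invented for illustration only.
training_docs = [
    "Forum post: A baker sells 12 loaves each morning and 7 loaves each afternoon for 5 days.",
]
test_items = [
    "A baker sells 12 loaves each morning and 7 loaves each afternoon for 5 days. How many loaves in total?",
    "A train travels 60 miles per hour for 3 hours. How far does it go?",
]

flagged = contaminated_items(test_items, training_docs)
for item in flagged:
    print("Possible leak:", item)
```

Only the first toy question is flagged, because its wording also appears in the toy training document. A check like this catches verbatim leaks but not paraphrases, which is one reason an exam published after data collection is a stronger test.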

Other users questioned whether xAI's claim that the model is less censored means the company is overlooking bias and other risks. xAI states that it is researching safeguards to prevent catastrophic malicious use. The company's advisors include Dan Hendrycks, the director of the Center for AI Safety. Hendrycks recently appeared on the Future of Life Institute podcast to discuss the risks of artificial intelligence. In the podcast, Hendrycks mentioned xAI:

"I think it's worth noting that [xAI] is very serious about this. I expect it to be one of the three major AI companies in the next year or two: OpenAI, Google DeepMind, and xAI."

Early access to Grok is currently limited to verified X users.