OpenAI's latest model chatgpt-4o-latest regains the top spot in the LMSYS Chatbot Arena.

2024-08-15

OpenAI has recently regained first place in Chatbot Arena, a benchmark platform for large language models run by the team at UC Berkeley's LMSYS Org.


Chatbot Arena evaluates large language models through anonymous, randomized head-to-head comparisons. The system randomly pairs two models to chat with a user without revealing which model is which, and the user votes for the one that performs better. These votes are turned into rankings with the Elo rating system, which is widely used in competitive games such as chess.
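To make the rating mechanism concrete, here is a minimal sketch of the textbook Elo update applied to a single user vote. The function names, the starting rating of 1000, and the K-factor of 32 are illustrative assumptions, not values taken from Chatbot Arena, whose actual rating pipeline may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return the updated (rating_a, rating_b) after one vote.

    outcome_a is 1.0 if the user preferred A, 0.0 if B, and 0.5 for a tie.
    k is an assumed K-factor that controls how far one vote moves a rating.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level at 1000; the user votes for model A.
new_a, new_b = update_elo(1000.0, 1000.0, outcome_a=1.0)
print(new_a, new_b)  # 1016.0 984.0
```

A win against an equally rated opponent moves each rating by half the K-factor. A win against a much stronger opponent moves the ratings more, and a win against a much weaker one moves them less.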

Previously, Google's experimental Gemini 1.5 Pro model had topped the Arena with a score of 1297. OpenAI then released chatgpt-4o-latest, which took back first place with a score of 1314.
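For a sense of scale, the short calculation below estimates what a 17-point lead would mean in a head-to-head vote. It assumes the Arena scores sit on the conventional Elo scale (base 10, divisor 400). That is an assumption for illustration, and `win_probability` is a hypothetical helper name.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected share of votes for A over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores reported in the article: 1314 versus 1297.
p = win_probability(1314, 1297)
print(f"{p:.3f}")  # 0.524
```

Under that assumption, the higher-rated model would be preferred in roughly 52% of direct comparisons, so the lead is real but narrow.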

chatgpt-4o-latest is the latest version of GPT-4o, with a context window of 128,000 tokens and a maximum output of 16,384 tokens. In the Arena, the model showed significant improvements in mathematics, programming, challenging prompts, instruction following, long questions, and multi-turn conversations, and it took first place in every one of these categories.