The AI arms race continues to accelerate: Anthropic has launched its latest model, Claude 3.5 Sonnet, which the company claims matches or even beats OpenAI's GPT-4o and Google's Gemini across a wide range of tasks. The new model is now available to Claude users on the web and on iOS, and Anthropic is also making it available to developers.
Claude 3.5 Sonnet sits in the middle of Anthropic's lineup. Anthropic calls its smallest model "Haiku," its mainstream mid-range option "Sonnet," and its top-end model "Opus." Even so, the company says 3.5 Sonnet outperforms 3 Opus, by a wide margin on some benchmarks. The new model also appears to run twice as fast as Claude 3 Opus, which may be the bigger deal.
AI benchmark results deserve some skepticism: there are many tests to choose from, and it is easy to pick the ones that make a model look good. And models and products change so quickly that no one seems able to hold the lead for long. Nevertheless, Claude 3.5 Sonnet does look impressive: it outperforms GPT-4o, Gemini 1.5 Pro, and Meta's Llama 3 400B on seven of nine overall benchmarks and on four of five vision benchmarks. That shouldn't be overinterpreted, but Anthropic does appear to have built a legitimate competitor.
What does this mean in practice? Anthropic says Claude 3.5 Sonnet is better at writing and translating code, handling multi-step workflows, interpreting charts, and transcribing text from images. The upgraded Claude also seems to have a better grasp of humor and writes in a more human-like way.
Alongside the new model, Anthropic has introduced a feature called Artifacts, which lets you view and interact with the output of your requests to Claude. If you ask the model to design something, it can now show you what the design looks like and let you edit it directly in the app. If Claude writes an email for you, you can edit it inside the Claude app without copying it into a text editor. It's a small but clever feature: AI tools need to be more than simple chatbots, and features like Artifacts give these apps more to do.
Artifacts also hint at Anthropic's long-term vision for Claude. The company has always said it is primarily focused on enterprises (even though it has hired consumer tech talent such as Instagram co-founder Mike Krieger), and in the press release announcing Claude 3.5 Sonnet, it says it plans to turn Claude into a tool that lets companies "centralize their knowledge, documents, and ongoing work in a secure environment." That sounds more like Notion or Slack than ChatGPT, with Anthropic's models at the center of the whole system.
For now, though, the model is the big news. The pace of improvement is astonishing: Anthropic launched Claude 3 Opus in March, proudly claiming it performed on par with GPT-4 and Gemini 1.0, only for OpenAI and Google to release better versions of their models soon after. Now Anthropic has taken the next step, and its competitors will surely respond soon.