According to reports, contractors working to improve Google's Gemini AI system compared Gemini's responses against output from Anthropic's Claude model during the evaluation process. The reporting is based on internal correspondence, and Google has not said whether it obtained Anthropic's permission to use Claude in testing Gemini.
Tech companies typically measure the performance of AI models by running them through industry-standard benchmarks rather than having workers directly evaluate a competitor's responses. For the Gemini project, however, contract workers were asked to score each response on criteria such as accuracy and informativeness, comparing Gemini's and Claude's answers to decide which was better, with up to 30 minutes allotted per evaluation task.
Recently, contract workers evaluating Gemini noticed that some responses on the internal comparison platform explicitly identified themselves with statements such as "I am Claude, created by Anthropic." Internal communications indicated that Claude's responses tended to emphasize safety, for example by refusing prompts that might involve unsafe content, while some of Gemini's responses to the same prompts were flagged as serious safety violations for including inappropriate content.
Anthropic's terms of service prohibit customers from using Claude to build competing products or to train competing AI models without its permission. Notably, Google is a major investor in Anthropic.
A spokesperson for Google DeepMind said that, in line with standard industry practice, the company sometimes compares model outputs as part of its evaluation process, but denied using Anthropic's model to train Gemini.
Last week, it was reported that contract workers on Google's AI products are now required to rate Gemini's responses even on topics outside their areas of expertise. Internal communications raised concerns that Gemini could generate inaccurate information on sensitive subjects such as healthcare.