According to reports, contractors working to improve Google's Gemini AI system compared Gemini's responses against output from Anthropic's Claude model during the evaluation process. The reporting is based on internal correspondence, and Google has not said whether it obtained Anthropic's permission to use Claude in testing Gemini.
Tech companies typically measure the performance of AI models by running them through industry-standard benchmarks rather than having workers directly evaluate a competitor's responses. For the Gemini project, however, contract workers were asked to score each response on criteria such as accuracy and informativeness, comparing Gemini's and Claude's answers to decide which was better, with up to 30 minutes allotted per evaluation task.
Recently, contract workers evaluating Gemini noticed that some responses on the internal comparison platform explicitly identified themselves with statements such as "I am Claude, created by Anthropic." Internal communications indicated that Claude's responses tended to emphasize safety, for example by refusing prompts that might involve unsafe content, while some of Gemini's responses to the same prompts were flagged as serious safety violations for including inappropriate content.
Anthropic's terms of service prohibit customers from using Claude to build competing products or to train competing AI models without its permission. Notably, Google is a major investor in Anthropic.
A spokesperson for Google DeepMind said that, in line with standard industry practice, the company sometimes compares model outputs as part of its evaluation process, but denied using Anthropic's model to train Gemini.
Last week, it was reported that contract workers on Google's AI products are now required to rate Gemini's responses even on topics outside their areas of expertise. Internal communications raised concerns that Gemini could generate inaccurate information on sensitive subjects such as healthcare.