Patronus AI, a startup that builds tools for companies to assess the reliability of their artificial intelligence models, today unveiled a new "hallucination detection" model designed to identify when chatbots produce factually unsupported responses.
The company claims that its latest model, Lynx, represents a significant advance in AI reliability, enabling businesses to detect AI hallucinations without manual annotation.
In the context of AI, "hallucinations" are coherent but factually incorrect responses generated by large language models. These models tend to fabricate information when they don't know the correct answer, which can be risky for companies that rely on their AI systems to interact accurately with customers.
A notable example of AI hallucinations occurred with Google's experimental AI Overviews feature, which reportedly suggested adding glue to keep cheese from falling off homemade pizzas. Another instance reportedly involved advising users to clean washing machines with mustard gas, underscoring the risks these inaccuracies pose.
To address this issue, some AI firms use AI itself to detect hallucinations. OpenAI, for instance, has fine-tuned GPT-4 to spot inconsistencies in chatbot responses, an approach known as "LLM as a judge." However, there are ongoing concerns about the accuracy of such solutions.
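The LLM-as-a-judge pattern is simple in principle: a second model is shown the question, the source context and the chatbot's answer, and is asked whether the answer is actually supported. The sketch below illustrates the general idea against the OpenAI chat completions API; the judge model name, prompt wording and FAITHFUL/HALLUCINATED labels are illustrative assumptions, not OpenAI's or Patronus AI's actual setup.

```python
# Minimal LLM-as-a-judge sketch (illustrative; prompt and labels are assumptions).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are evaluating an AI assistant's answer.
Question: {question}
Context: {context}
Answer: {answer}

Reply with exactly one word: FAITHFUL if the answer is fully supported
by the context, or HALLUCINATED if it contains unsupported claims."""


def judge_answer(question: str, context: str, answer: str) -> bool:
    """Return True if the judge model flags the answer as a hallucination."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model; any capable LLM could fill this role
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, context=context, answer=answer
            ),
        }],
        temperature=0,
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("HALLUCINATED")
```

The accuracy concerns mentioned above stem from exactly this setup: the judge is itself an LLM, so its verdicts can be wrong in the same ways the model under test is.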
Patronus AI focuses on enhancing AI reliability and recently secured $17 million in funding to develop a platform that uses AI-generated adversarial prompts to test the robustness of LLMs by attempting to induce hallucinations.
Lynx is described by the startup as the "state-of-the-art" in AI hallucination detection, allowing developers to catch inappropriate responses in real time. Alongside Lynx, the company has also open-sourced HaluBench, a benchmark drawn from real-world domains that evaluates the faithfulness of LLM responses.
According to Patronus AI, extensive testing with HaluBench showed that Lynx significantly outperforms GPT-4 at detecting hallucinations. The largest version of Lynx, with 70 billion parameters, proved more accurate than every other LLM tested as a judge, which Patronus AI asserts makes it the most powerful hallucination detection model available.
HaluBench is specifically designed to test AI models in specialized fields such as medicine and finance, making it highly applicable to practical scenarios.
Sample results from Patronus AI's benchmarks indicate that Lynx (70B) surpasses GPT-4 by 8.3% at detecting medical inaccuracies, while the smaller Lynx (8B) model beats OpenAI's older GPT-3.5 by 24.5% across all HaluBench domains. It also exceeds Anthropic PBC's Claude-3-Sonnet by 8.6% and Claude-3-Haiku by 18.4%, and it outperforms open-source LLMs such as Meta Platforms Inc.'s Llama-3-8B-Instruct.
Anand Kannappan, CEO of Patronus AI, notes that hallucinations pose one of the most critical challenges in the AI industry. Recent studies suggest that between 3% and 10% of all LLM responses contain inaccuracies.
Hallucinations can manifest in various forms, including leaking training data, exhibiting biases, or stating outright falsehoods. Kannappan explains that Lynx aims to tackle these issues, although he acknowledges that it may not provide a permanent solution. Nonetheless, it serves as a valuable tool for developers to gauge the likelihood of their LLMs producing inaccurate outputs.
"Developers can utilize [Lynx and HaluBench] to measure the hallucination rate of their fine-tuned LLMs in specific domain scenarios," he elaborates.