Diffbot Launches Knowledge Graph-Based AI Chatbot to Tackle 'Hallucination' Issues

2025-01-10

Diffbot Technologies Corp., a knowledge graph startup, has announced an AI chatbot built on a fine-tuned version of Meta Platforms Inc.'s Llama 3.3 model. The chatbot adds graph retrieval-augmented generation (GraphRAG) to improve answer accuracy.

Diffbot's AI model stands out because it does not depend on ever-larger training datasets; instead, it learns to search for information in the company's knowledge graph. The graph contains more than one trillion interconnected facts and is continuously updated. For the past eight years, Diffbot has crawled the public web, classifying pages into types such as people, companies, articles, and products, and using natural language processing and computer vision to extract up-to-date information from them.

The knowledge graph is refreshed every four to five days with millions of new facts, so the model's answers draw on current information. Most other large language models (LLMs), by contrast, rely on static information encoded in their training data.

Diffbot believes this lets its model give more accurate and transparent responses. Asked about a recent news event, for instance, the model queries the knowledge graph for the latest updates, extracts the most relevant data, and cites the sources of the information it gives the user.
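The search, extract, and cite loop described above can be illustrated with a toy example. This is only a minimal sketch of the general idea, not Diffbot's implementation: the `Fact` record, the in-memory `TOY_GRAPH`, the function names, and the example.com URLs are all hypothetical, and a real system would query a remote knowledge graph and have a language model phrase the answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One edge in a toy knowledge graph, plus the page it came from."""
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical provenance URL


# Hypothetical in-memory stand-in for a knowledge graph.
TOY_GRAPH = [
    Fact("Acme Corp", "ceo", "Jane Doe", "https://example.com/acme-leadership"),
    Fact("Acme Corp", "headquarters", "Springfield", "https://example.com/acme-about"),
    Fact("Globex", "ceo", "John Roe", "https://example.com/globex-team"),
]


def retrieve(graph, subject, predicate):
    """Return every fact matching the subject (case-insensitive) and predicate."""
    return [
        f for f in graph
        if f.subject.lower() == subject.lower() and f.predicate == predicate
    ]


def answer_with_citations(graph, subject, predicate):
    """Answer only from retrieved facts; decline when nothing is found."""
    facts = retrieve(graph, subject, predicate)
    if not facts:
        # Declining here is the alternative to fabricating an answer.
        return "No supporting fact found."
    values = [f"{f.obj} [{i}]" for i, f in enumerate(facts, start=1)]
    cites = [f"[{i}] {f.source}" for i, f in enumerate(facts, start=1)]
    return f"{facts[0].subject} {predicate}: " + ", ".join(values) + "\n" + "\n".join(cites)


if __name__ == "__main__":
    print(answer_with_citations(TOY_GRAPH, "acme corp", "ceo"))
    print(answer_with_citations(TOY_GRAPH, "Initech", "ceo"))
```

The design point is that every value in the answer is paired with a numbered source, and a query with no matching fact produces an explicit refusal rather than a guess.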

According to Mike Tung, Diffbot's founder and CEO, the AI industry will shift toward smaller models with around one billion parameters, away from the current trend of building LLMs with tens of billions of parameters. He argues that trying to pack all the latest knowledge into a model's weights is unsustainable, and that teaching models to use tools to search external knowledge is the better approach.

Diffbot's AI model aims to address the "hallucination" problem, in which a model that cannot answer a question fabricates a response rather than admitting it does not know. This tendency makes AI systems riskier to deploy; Diffbot's answer is to ground its system in "verifiable facts".

In tests, Diffbot's model scored 81% on FreshQA, a benchmark designed to evaluate knowledge of up-to-date facts, outperforming Gemini and ChatGPT. It also scored 70.36% on MMLU-Pro, a test of academic knowledge.

The Diffbot model is open source, so enterprises can download it, run it on their own machines, and fine-tune it to their needs. Companies can also customize the model to search their own databases alongside Diffbot's knowledge graph. Running it locally improves privacy protection as well.

Diffbot currently provides data services to companies such as Duck Duck Go Inc., Cisco Systems Inc., and Snap Inc., and hopes its LLM will be adopted by more enterprises for workloads that require high precision and full accountability. The model is available for download on GitHub, with a public demo at diffy.chat. Enterprises can choose between an 8 billion parameter version that runs on a single Nvidia A100 GPU and a 70 billion parameter version that requires two H100 GPUs.
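Hardware requirements like these can be sanity-checked with simple arithmetic: the memory needed for a model's weights is roughly its parameter count times the bytes stored per parameter. The sketch below assumes 16-bit weights (2 bytes each) and 80 GB of memory per GPU. Both figures are assumptions for illustration, not numbers published by Diffbot, and the estimate ignores activations and cache memory, which add further overhead. The function names and the model sizes in the loop are likewise illustrative.

```python
import math


def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for the weights alone, in gigabytes (assumes 16-bit weights)."""
    return num_params * bytes_per_param / 1e9


def gpus_needed(num_params, gpu_memory_gb=80, bytes_per_param=2):
    """Lower bound on GPU count: weights only, assuming 80 GB per GPU."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # Illustrative model sizes, not a statement about any specific release.
    for params in (1e9, 8e9, 70e9):
        print(f"{params / 1e9:.0f}B params: "
              f"{weight_memory_gb(params):.0f} GB weights, "
              f"at least {gpus_needed(params)} GPU(s)")
```

Under these assumptions, a model with tens of billions of parameters no longer fits in a single GPU's memory, which is why the larger models are split across two or more cards.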