Google AI Launches the CardBench Benchmark

2024-09-03

Cardinality estimation (CE) plays a crucial role in optimizing relational database query performance. It involves predicting the number of intermediate results a query will return, which directly guides the query optimizer in choosing an execution plan. Accurate cardinality estimates are essential for optimizing join order, deciding whether to use indexes, and selecting join methods, and so directly affect query execution time and overall database performance. Conversely, inaccurate estimates can steer the optimizer toward inefficient plans, degrading performance, sometimes by several orders of magnitude. Cardinality estimation has therefore become a core component of database management and the focus of extensive research on improving its accuracy and efficiency.

Current cardinality estimation methods nevertheless face several limitations. Traditional CE techniques, the mainstream in modern database systems, rely on heuristics and simplified models, such as assuming uniform data distributions and independence between columns. Although computationally efficient, these methods often fail to produce accurate estimates for complex queries involving multiple tables and filters. Learning-based cardinality estimation models have emerged in response, taking a data-driven approach to deliver more accurate predictions. However, these models still face practical challenges: high training costs, dependence on large-scale datasets, and a lack of systematic benchmark evaluations. To address these limitations, researchers at Google have introduced CardBench, a benchmark designed to provide a systematic evaluation framework for learning-based cardinality estimation models.
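To make the traditional heuristics concrete, here is a minimal sketch (not CardBench code; the function name and numbers are illustrative) of the classic estimator that assumes a uniform data distribution and independence between filter columns, simply multiplying per-predicate selectivities:

```python
# Illustrative sketch of a traditional cardinality estimator.
# Assumptions baked in: uniform value distribution and independence
# between columns -- exactly the simplifications the article describes.
def estimate_cardinality(table_rows: int, selectivities: list[float]) -> int:
    """Estimate result rows by multiplying per-predicate selectivities."""
    estimate = float(table_rows)
    for s in selectivities:
        estimate *= s  # independence assumption: selectivities just multiply
    return max(1, round(estimate))

# Example: a 1,000,000-row table with two filters of selectivity 0.1 and 0.05.
# Under independence, combined selectivity is 0.1 * 0.05 = 0.005.
print(estimate_cardinality(1_000_000, [0.1, 0.05]))  # -> 5000
```

When the filtered columns are correlated (say, `city` and `zip_code`), the true result can be orders of magnitude away from this product, which is precisely the failure mode that motivates learned estimators.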
CardBench is notable for its breadth: it covers thousands of queries over 20 distinct real-world databases, more than any previous benchmark. This design enables comprehensive evaluation of learning-based CE models under varied conditions. CardBench supports three key model settings: instance-based models (trained on a single dataset), zero-shot models (pre-trained on multiple datasets and tested on unseen datasets), and fine-tuned models (pre-trained, then fine-tuned with a small amount of data from the target dataset).

Beyond the datasets themselves, CardBench ships tools for computing the necessary data statistics, generating realistic SQL queries, and creating annotated query graphs to support CE model training. Its training data is split into two parts: single-table queries with multiple filter predicates, and binary join queries over two tables. With 9,125 single-table queries and 8,454 binary join queries, CardBench constructs a robust and challenging model evaluation environment. Notably, the ground-truth labels were produced with Google BigQuery, and obtaining them consumed seven CPU-years of query execution time, underscoring the investment behind the benchmark.

Performance evaluations on CardBench show encouraging results, particularly for fine-tuned models. Zero-shot models lose accuracy on complex queries involving joins, while fine-tuned models reach accuracy comparable to instance-based models with far less training data. For example, a fine-tuned graph neural network (GNN) model achieves a median q-error of only 1.32 and a 95th-percentile q-error of 120 on binary join queries, significantly outperforming the zero-shot model.
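The q-error figures cited above use the standard ratio-based accuracy measure for cardinality estimates: the larger of estimate/actual and actual/estimate, so 1.0 is a perfect estimate. A minimal sketch (the function name is our own):

```python
def q_error(estimated: float, actual: float) -> float:
    """q-error = max(est/actual, actual/est); 1.0 means a perfect estimate."""
    # Clamp to 1 to avoid division by zero on empty results, a common convention.
    estimated = max(estimated, 1.0)
    actual = max(actual, 1.0)
    return max(estimated / actual, actual / estimated)

# A median q-error of 1.32 means the typical estimate is within a factor
# of 1.32 of the true cardinality, in either direction.
print(q_error(1320, 1000))  # -> 1.32
print(q_error(1000, 1320))  # -> 1.32 (symmetric: under- and over-estimates
                            #    of the same ratio are penalized equally)
```

This symmetry is why q-error is preferred over relative error for CE evaluation: underestimating by 10x is as harmful to plan selection as overestimating by 10x.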
Furthermore, the research shows that fine-tuning pre-trained models significantly improves performance even with limited training data, offering a practical path for real-world scenarios where training data is scarce. In conclusion, CardBench represents a significant advance in learning-based cardinality estimation. By providing a comprehensive and diverse benchmarking platform, it enables systematic evaluation and comparison of CE models and lays a solid foundation for continued innovation in this critical field. In particular, its support for fine-tuned models lowers training costs and opens new avenues for performance optimization in practical applications.