Google DeepMind Launches Efficient AI Training Method JEST, Significantly Reducing AI Training Energy Consumption

2024-07-09

In the current energy technology framework, the continuous training of advanced AI models faces unsustainable challenges, and we urgently need faster, cheaper, and more environmentally friendly training paths. Google DeepMind recently unveiled a cutting-edge training method called Joint Example Selection Training (JEST), which overhauls how AI models are trained: it speeds up training by as much as 13 times and is roughly 10 times more energy efficient than traditional methods, injecting new vitality into the industry.

With the booming development of the AI field, the environmental burden imposed by the data centers that keep these large models running is growing and attracting widespread attention. JEST was created to address this pain point: it alleviates the energy-hungry nature of AI training, significantly reduces computational costs, and lightens the carbon footprint behind AI progress.

Traditional AI training methods often process data point by point, which is time-consuming and consumes large amounts of computing resources. JEST takes a different approach, focusing on optimizing whole batches of data. Its working principle can be summarized in three steps (a minimal sketch of the loop follows the list):

First, lightweight training - train a small AI model, like a savvy screening officer, responsible for evaluating and giving "high scores" to high-quality data.

Second, survival of the fittest - the small model ranks the data based on its quality, just like holding a fierce "draft" for data batches.

Third, precise feeding - with this carefully selected high-quality data, the large model can absorb knowledge more efficiently and grow rapidly.
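The following is a minimal sketch of that three-step loop, assuming a tiny "screener" model and a placeholder training step; the function names (`small_model_score`, `train_large_model`) and all numbers are illustrative stand-ins, not DeepMind's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two models: in JEST the "screener" is a small
# pre-trained reference model and the learner is the large model being trained.
def small_model_score(batch):
    # Higher score = the screener judges the example to be higher quality.
    return batch @ np.array([0.3, -0.2, 0.5])

def train_large_model(batch):
    pass  # placeholder for one gradient step on the large model

# Steps 1-2: the small model scores a pool of candidate data and ranks it.
candidate_pool = rng.normal(size=(4096, 3))   # 4096 candidate examples, 3 features
scores = small_model_score(candidate_pool)
best = np.argsort(scores)[-1024:]             # keep only the top quarter

# Step 3: feed only the selected, high-quality examples to the large model.
train_large_model(candidate_pool[best])
```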

The key to JEST's efficiency lies in its ability to view data batches from a holistic perspective rather than getting caught up in individual examples. It uses multimodal contrastive learning, in which different types of data such as text and images are evaluated against each other during training, and it accelerates training substantially by scoring whole batches and selecting the most useful subsets (see the toy example below).
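As a toy illustration of how a whole batch of image-text pairs can receive a single contrastive score, the snippet below computes a CLIP-style pairwise similarity matrix and averages per-pair losses into one batch-level number; the embeddings, dimensions, and temperature are invented for the example and do not reflect JEST's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def batch_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Toy CLIP-style loss: each image should match its own caption (the diagonal)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature               # pairwise similarity matrix
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_pair = -np.diag(log_probs)                   # loss of each image-text pair
    return per_pair.mean(), per_pair

# A batch of 8 image-text pairs with 16-dim embeddings (purely illustrative).
image_embeddings = rng.normal(size=(8, 16))
text_embeddings = rng.normal(size=(8, 16))

batch_score, per_pair = batch_contrastive_loss(image_embeddings, text_embeddings)
# JEST-style selection works at this level: compare scores across candidate
# batches (or sub-batches) instead of judging examples one at a time.
```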

The core of this process lies in two pillars:

First, "learnability score" cleverly compares the loss of the large model being trained (learner) with the loss of the pre-trained small model (reference), thereby identifying high-quality batches that are both challenging and informative.

Second, "batch selection" - JEST adopts an intelligent algorithm inspired by Gibbs sampling to ensure that the selected batches maximize learning benefits while accelerating the training process.

DeepMind's experiments validate JEST's strong performance: it significantly reduces the number of training iterations and the computational cost while matching the performance of existing top models. This is not just an incremental technical gain; it is an important step towards making AI training more sustainable and scalable.

However, JEST is not without flaws. It currently relies on small, carefully curated reference datasets to steer the selection process, and how to automatically infer the optimal reference distribution remains an open question. This does not diminish JEST's considerable potential, and it suggests there is still broad room for exploration in optimizing AI training efficiency.