Recently, researchers from Tsinghua University's Institute for Interdisciplinary Information Sciences and Carnegie Mellon University's School of Computer Science made significant advances in inference strategies for Large Language Models (LLMs). Through an in-depth study of inference scaling laws and compute-optimal inference strategies, they found that smaller models can surpass larger ones when paired with sophisticated inference techniques under a constrained computational budget.
This study challenges conventional assumptions about model scaling and computational efficiency. As model sizes have grown, the demand for computational resources has become a major constraint on the further development of LLMs. By systematically comparing inference methods, including greedy search, majority voting, best-of-n, weighted voting, and two distinct tree search algorithms, the researchers found that a well-chosen inference strategy can partially offset the limitations of a smaller model.
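To make the distinction among these strategies concrete, here is a minimal Python sketch of majority voting, best-of-n, and weighted voting; `generate_candidates` and `reward` are hypothetical stand-ins for the model's sampler and a trained reward model, not interfaces from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical helpers (assumptions, not the paper's actual interfaces):
# a sampler returning n candidate answers, and a reward model scoring
# a full solution to a question.
def generate_candidates(question: str, n: int) -> list[str]: ...
def reward(question: str, answer: str) -> float: ...

def majority_vote(question: str, n: int) -> str:
    """Pick the answer that appears most often among n samples."""
    answers = generate_candidates(question, n)
    return Counter(answers).most_common(1)[0][0]

def best_of_n(question: str, n: int) -> str:
    """Pick the single sample the reward model scores highest."""
    answers = generate_candidates(question, n)
    return max(answers, key=lambda a: reward(question, a))

def weighted_vote(question: str, n: int) -> str:
    """Sum reward scores per distinct answer; pick the heaviest."""
    answers = generate_candidates(question, n)
    totals: dict[str, float] = defaultdict(float)
    for a in answers:
        totals[a] += reward(question, a)
    return max(totals, key=totals.get)
```

Weighted voting differs from majority voting only in that each sample contributes its reward score rather than a count of one, which is why it tends to use a fixed sampling budget more effectively.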
The experiments used two mathematical benchmarks, MATH and GSM8K, and covered several model families, including Pythia, the math-specialized Llemma models, and Mistral-7B, to examine performance across model scales and architectures. The results showed that Llemma-7B achieved accuracy on par with Llemma-34B while using roughly half the computational resources. This indicates that smaller models, when paired with appropriate inference strategies, can deliver more cost-effective performance within a limited computational budget.
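As a rough sanity check on what a compute saving of that size means, the sketch below compares inference FLOPs under the common approximation that a transformer forward pass costs about 2 FLOPs per parameter per generated token; the token and sample counts are illustrative assumptions, not figures from the paper.

```python
# Rough FLOPs comparison under the standard ~2 * params FLOPs/token
# approximation for a transformer forward pass. All token and sample
# counts below are illustrative assumptions, not the paper's numbers.
PARAMS_7B, PARAMS_34B = 7e9, 34e9
TOKENS_PER_SOLUTION = 512  # assumed average solution length

def inference_flops(params: float, num_samples: int) -> float:
    return 2 * params * TOKENS_PER_SOLUTION * num_samples

# e.g. a 7B model voting over 16 samples vs. a 34B model over 8:
flops_7b = inference_flops(PARAMS_7B, 16)   # ~1.1e14 FLOPs
flops_34b = inference_flops(PARAMS_34B, 8)  # ~2.8e14 FLOPs
print(f"7B/34B compute ratio: {flops_7b / flops_34b:.2f}")  # ~0.41
```

Under these assumed settings the smaller model draws twice as many samples yet still spends well under half the compute, which is the kind of trade-off the study exploits.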
Furthermore, the research introduced a novel tree search method named REBASE (REward BAlanced SEarch), which proved Pareto-optimal across diverse settings and outperformed both sampling-based approaches and traditional Monte Carlo tree search (MCTS). REBASE reached higher accuracy at lower computational budgets, challenging the common assumption that tree search is inherently more expensive than sampling at inference time.
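The sketch below illustrates the reward-balanced allocation idea behind REBASE, assuming a step-level (process) reward model: at each depth, the expansion budget is split across frontier nodes in proportion to a softmax over their reward scores, rather than through the explicit rollouts MCTS relies on. Here `expand`, `score_step`, and `is_complete` are hypothetical placeholders, and the budget, depth, and temperature values are assumptions rather than the paper's settings.

```python
import math

# Hypothetical placeholders (assumptions, not the paper's interfaces):
# sample k next-step continuations, score a partial solution with a
# process reward model, and test whether a solution is finished.
def expand(node: str, k: int) -> list[str]: ...
def score_step(node: str) -> float: ...
def is_complete(node: str) -> bool: ...

def rebase_search(question: str, budget: int = 32, depth: int = 10,
                  temperature: float = 1.0) -> list[str]:
    frontier, finished = [question], []
    for _ in range(depth):
        if not frontier:
            break
        # Softmax over reward scores decides how many children each
        # frontier node may spawn (the "balanced" allocation).
        scores = [score_step(n) for n in frontier]
        weights = [math.exp(s / temperature) for s in scores]
        total = sum(weights)
        children = []
        for node, w in zip(frontier, weights):
            k = round(budget * w / total)  # node's share of the budget
            children.extend(expand(node, k))  # k == 0 prunes the node
        # Completed solutions leave the tree; the rest form the frontier.
        finished.extend(c for c in children if is_complete(c))
        frontier = [c for c in children if not is_complete(c)]
    return finished
```

Because low-scoring branches receive a budget share near zero, they are pruned implicitly, which is one way a tree search can stay within the compute envelope of plain sampling.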
The researchers stated that the study offers valuable insights into computation-optimal inference for LLMs and drew three main conclusions: first, within a restricted computational budget, smaller models can outperform larger ones by leveraging advanced inference techniques; second, sampling-based majority voting has inherent limitations, as its accuracy saturates once the sampling budget grows large enough that additional samples no longer help; and third, the REBASE tree search method surpasses existing inference strategies.
Despite these significant advancements, the researchers acknowledged the study's limitations, noting that it focused solely on mathematical problem-solving. They expressed intentions to explore inference scaling laws across different task domains in the future, aiming to enhance the performance and computational efficiency of LLMs across a broader range of applications.
These research outcomes not only introduce new perspectives and methodologies for LLM inference strategies but also lay a robust foundation for the further advancement of artificial intelligence technologies.