Advancements in Large Language Models and a Novel Framework: Enhancing Small-Scale Model Performance in Specific Reasoning Tasks

2024-11-04

In recent years, large language models (LLMs) have demonstrated exceptional capabilities in understanding and generating natural language. Researchers have not only explored the potential of LLMs in text prediction but also uncovered their unexpected abilities in areas such as calling software API functions, a capability that helped enable GPT-4 plugins. In addition, LLMs have been successfully integrated with tools such as web browsers, translation systems, dialogue state tracking (DST), and robotics, further expanding their range of applications.

Although LLMs have achieved promising results in complex reasoning, they still face significant challenges in solving mathematical problems and in logical reasoning. To address these issues, researchers have proposed function-calling techniques that allow LLMs to execute specific functions and use their outputs to assist in various tasks. These functions range from basic arithmetic operations to more advanced methods. However, relying solely on large models for such specific tasks is inefficient, as they demand substantial computational resources and incur high costs during both training and inference.
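To make the function-calling idea concrete, here is a minimal sketch of the pattern: the model emits a structured call, and a small dispatcher executes the corresponding local function and returns its exact result. The JSON format and function names below are illustrative assumptions, not the specific interface used in the paper.

```python
import json
import operator

# Registry of callable functions the model is allowed to invoke.
TOOLS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}

def dispatch(model_output: str) -> float:
    """Parse a JSON function call emitted by the model and execute it locally."""
    call = json.loads(model_output)          # e.g. {"name": "add", "args": [17, 25]}
    fn = TOOLS[call["name"]]
    return fn(*call["args"])

# The model's arithmetic step is replaced by the function's exact output.
print(dispatch('{"name": "add", "args": [17, 25]}'))  # 42
```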

To overcome the limitations of large LLMs, researchers have introduced a novel framework designed to train smaller LLMs to perform function calls in specific reasoning tasks. The framework uses a proxy program that injects descriptions and examples of the available functions into prompts, queries a large LLM, and collects datasets containing both correct and incorrect reasoning chains. This approach reduces operational costs while preserving the effectiveness of the core functionality.
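The sketch below illustrates one plausible reading of the proxy idea: function descriptions and few-shot examples are prepended to every query, the large model is sampled several times, and each completion is sorted into correct or incorrect reasoning chains. The helpers `query_llm` and the end-of-chain answer check are assumed placeholders, not the paper's exact implementation.

```python
from typing import Callable, List, Tuple

FUNCTION_SPEC = """Available functions:
- add(a, b): return a + b
- check(statement): return True if the statement follows from the premises
Example call: add(2, 3) -> 5
"""

def build_prompt(problem: str) -> str:
    # The proxy prepends the function descriptions and examples to every query.
    return (f"{FUNCTION_SPEC}\nProblem: {problem}\n"
            "Reason step by step, calling functions where helpful.")

def collect_chains(problem: str,
                   reference_answer: str,
                   query_llm: Callable[[str], str],
                   n_samples: int = 8) -> Tuple[str, List[str], List[str]]:
    """Sample reasoning chains and split them by correctness of the final answer."""
    prompt = build_prompt(problem)
    correct, incorrect = [], []
    for _ in range(n_samples):
        chain = query_llm(prompt)
        bucket = correct if chain.strip().endswith(reference_answer) else incorrect
        bucket.append(chain)
    return prompt, correct, incorrect
```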

Specifically, the workflow of this framework consists of four stages: first, defining the tasks and problems used to assess the capabilities of large language models across various reasoning domains; second, setting up task-specific functions that enable the LLM to handle reasoning steps, manage the chain flow, and verify results; third, selecting a pre-trained large-scale LLM and using chain-of-thought prompting to generate a dataset containing both correct and incorrect completions; and finally, fine-tuning the smaller LLM on the generated dataset with the Direct Preference Optimization (DPO) algorithm.
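For the final stage, the DPO objective itself is compact. The sketch below shows a minimal version of the loss, assuming the summed log-probabilities of the chosen (correct) and rejected (incorrect) chains have already been computed under both the policy being trained and a frozen reference copy of the model; the tensor names and beta value are illustrative. In practice, libraries such as Hugging Face TRL provide trainers that implement this objective.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss for a batch of preference pairs."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Push the policy to prefer correct chains over incorrect ones,
    # while beta regularizes it toward the reference model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Example with dummy log-probabilities for a batch of 4 preference pairs.
dummy_logps = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*dummy_logps).item())
```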

The experimental section tests the models on first-order logic (FOL) and mathematical problems. The results show a significant improvement in accuracy on FOL tasks and moderate gains on mathematical tasks, with statistical significance confirmed by the Wilcoxon test. Additionally, data augmentation expanded the dataset, and fine-tuning on a single GPU further improved the performance of the Mistral-7B model.
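The kind of significance check reported here can be reproduced with a Wilcoxon signed-rank test on paired per-problem scores before and after fine-tuning. The numbers below are placeholders for illustration, not the paper's results.

```python
from scipy.stats import wilcoxon

# Paired scores on the same evaluation problems (placeholder values).
baseline_scores  = [0.40, 0.55, 0.35, 0.60, 0.50, 0.45, 0.30, 0.65]
finetuned_scores = [0.80, 0.85, 0.70, 0.90, 0.75, 0.80, 0.65, 0.95]

stat, p_value = wilcoxon(baseline_scores, finetuned_scores)
print(f"Wilcoxon statistic={stat}, p-value={p_value:.4f}")
# A small p-value indicates the paired improvement is statistically significant.
```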

The proposed framework effectively enhances the function-calling capabilities of smaller LLMs in specific logical and mathematical reasoning tasks. It not only reduces reliance on large models but also improves performance on the related tasks. Experimental results show a significant improvement in the performance of smaller models on FOL tasks, with near-perfect accuracy. Looking ahead, the framework holds promise for a broader range of reasoning tasks and function types, warranting further exploration.