"Research on Decision-Making Capabilities of Large-Scale Language Models and Their Prospects"

2024-04-15

Large language models (LLMs) such as OpenAI's ChatGPT (most recently GPT-4), Claude AI, and Gemini have so far demonstrated only limited decision-making capabilities. This article surveys contemporary research on LLM decision-making and its potential future impact.

In LLMs, decision-making refers to the ability to derive patterns or rules from data and apply them flexibly to new contexts. However, experiments conducted by the Santa Fe Institute show that LLMs like ChatGPT struggle to reason about core concepts. Sound decision-making requires a deep understanding of contextual cues and of the potential impact of the output.

Unfortunately, the poor decision-making of LLMs can sometimes have serious consequences. For example, in 2023 the National Eating Disorders Association was forced to suspend its AI chatbot "Tessa" after it gave inappropriate dietary advice, including recommending weekly weigh-ins and a daily calorie deficit of 500 to 1,000 calories, which sparked widespread controversy.

In addition to providing misleading information, LLMs often give overly generalized recommendations. Researchers at the business school INSEAD pointed out that when ChatGPT is asked business-strategy questions, it tends to offer the generic, conventional wisdom of participatory management. For example, LLMs often suggest adopting collaborative work practices, fostering a culture of innovation, and aligning employees' goals with organizational goals. But formulating business strategy is a complex social and economic process that should not rest on such vague advice.

Some may argue, "If you want LLMs to formulate business strategies or provide medical advice, why not train them specifically for those tasks?" However, handling complex contextual data is not a problem that can be solved simply by adding parameters or training data; getting LLMs to make decisions grounded in subtle context is not a matter of dataset scale. On the contrary, feeding in more data may amplify existing biases and increase computational demands.

Achieving context-aware decision-making requires a more nuanced approach to training and using LLMs. The research community has proposed two notable methods for bringing LLM decision-making closer to human decision-making. The first is AutoGPT, which uses self-reflection mechanisms to plan and validate outputs. The second is the "Tree of Thoughts" (ToT), which breaks with traditional sequential reasoning to encourage more effective decision-making.

AutoGPT represents the forefront of AI development, aiming to enable models to autonomously create, evaluate, and improve plans toward specific goals. Scholars later improved the AutoGPT system by introducing the "additional opinions" strategy, which integrates expert models. This strategy provides a novel framework that enriches the model's knowledge base by drawing on expert models from different domains, such as a financial-analysis model, to supply additional information to the LLM during decision-making.

In short, the core of this strategy is to enrich the model's knowledge base with relevant information. In practical scenarios, expert models can significantly enhance the decision-making capabilities of LLMs: the model runs a "think-reason-plan-critique" loop and uses the expert models to construct and review its decisions.
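The "think-reason-plan-critique" loop with expert input can be sketched roughly as follows. This is a minimal illustration, not the published implementation: the expert function, prompt wording, and task are all hypothetical stand-ins, and a real system would call separately trained models rather than a stub.

```python
def expert_opinion(query: str) -> str:
    """Stand-in for a domain expert model (e.g. a financial-analysis
    model). In a real system this would be a separate trained model."""
    return "Q3 margins are shrinking; prioritize cost control."

def build_decision_prompt(task: str, experts) -> str:
    """Inject each expert's opinion into the LLM's decision prompt,
    following the 'additional opinions' idea described above."""
    opinions = "\n".join(f"- {e(task)}" for e in experts)
    return (f"Task: {task}\n"
            f"Additional expert opinions:\n{opinions}\n"
            "Think, reason, and plan, then critique your own answer.")

prompt = build_decision_prompt("Should we expand into market X?",
                               [expert_opinion])
```

The LLM then receives `prompt` instead of the bare task, so its reasoning is conditioned on the experts' domain knowledge before the critique step.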

If successfully deployed, LLMs augmented with expert models could analyze more information than humans can, enabling wiser decisions. However, AutoGPT also has limitations, such as its limited context window, i.e., the amount of information the model can process at once. Therefore, when using AutoGPT, providing all available information up front may yield better outputs than gradually injecting it over a long conversation.
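The front-loading advice can be made concrete with a small packing routine. This is a sketch under stated assumptions: the token budget is arbitrary, and the character-based token estimate is a rough heuristic, not any particular model's tokenizer.

```python
def pack_context(documents, budget_tokens=8000, tokens_per_char=0.25):
    """Front-load as many documents as fit the model's context window
    into one prompt, rather than drip-feeding them over many turns.
    Token counts are rough character-based estimates (assumption)."""
    packed, used = [], 0
    for doc in documents:
        cost = int(len(doc) * tokens_per_char)
        if used + cost > budget_tokens:
            break  # remaining documents would overflow the window
        packed.append(doc)
        used += cost
    return "\n\n".join(packed)
```

Everything that fits goes into the first prompt; only the overflow, if any, would need to be summarized or deferred.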

The "Tree of Thoughts" is another promising framework for improving LLM accuracy by simulating human cognitive processes. When making decisions, humans typically generate and compare different options or scenarios. Like the additional-opinions strategy, ToT attributes erroneous LLM decisions to a linear reasoning process. And as with AutoGPT, ToT is evaluated on how well the model follows natural-language instructions to complete puzzles and complex tasks, such as word games and creative writing.

The linear reasoning process in LLMs is embodied in "chain of thought" prompting, a method that improves transparency by eliciting sequential, step-by-step reasoning. ToT aims to overcome the limits of a single chain by strengthening the model's self-critique and exploring multiple reasoning paths.

For example, in the Game of 24, a single chain of thought struggles to enumerate all the ways four numbers can be combined by addition, subtraction, multiplication, and division to reach 24, so GPT-4's accuracy is relatively low. ToT, by contrast, maps out different intermediate results and achieves a 74% success rate on this game.
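The branching search over intermediate results can be illustrated with a deterministic sketch. Note the assumption: real ToT uses the LLM itself to propose and evaluate intermediate "thoughts," whereas this toy version enumerates them exhaustively; it only shows the tree-shaped state space that a single linear chain cannot cover.

```python
from itertools import combinations

def solve24(nums, trace=()):
    """Search over 'thought' states: each state is the multiset of
    numbers still in play plus the arithmetic steps taken so far."""
    if len(nums) == 1:
        return list(trace) if abs(nums[0] - 24) < 1e-6 else None
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        # Each candidate operation spawns a new branch of the tree.
        branches = [(a + b, f"{a}+{b}"), (a * b, f"{a}*{b}"),
                    (a - b, f"{a}-{b}"), (b - a, f"{b}-{a}")]
        if b:
            branches.append((a / b, f"{a}/{b}"))
        if a:
            branches.append((b / a, f"{b}/{a}"))
        for value, step in branches:
            result = solve24(rest + [value], trace + (f"{step}={value:g}",))
            if result is not None:
                return result
    return None

print(solve24([4, 9, 10, 13]))  # one of the puzzles used in ToT experiments
```

A linear chain must commit to one operation at a time; this search backtracks across siblings, which is precisely the structural advantage ToT gives the model.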

Looking ahead, if the decision-making capabilities of LLMs continue to develop, humans and AI may come to collaborate on strategic decisions. The ToT authors suggest applications in fields such as coding, data analysis, and robotics, while AutoGPT's ambitions are grander still, extending to general intelligence.

Regardless, academic research in artificial intelligence is producing new, practical strategies for eliciting more deliberate decision-making from LLMs. One major advantage of LLMs is their ability to analyze large amounts of data far faster than humans can. If this research succeeds, LLMs may soon match or even surpass humans in decision-making.