Rise of AI Inference Models Fuels Demand for Novel Prompting Techniques

2025-01-14

In recent years, the field of AI reasoning models has seen significant advances. Since OpenAI released the o1 reasoning model in September 2024, the area has undergone a new wave of transformation. Although o1 is relatively slow at processing questions, it excels at solving complex, multi-step mathematical and scientific problems. Since then, many followers and competitors have emerged in the commercial AI market, including DeepSeek's R1, Google's Gemini 2 Flash Thinking, and most recently LlamaV-o1, all aiming to offer built-in reasoning capabilities similar to OpenAI's o1 and its upcoming o3 series of models.

These models use "chain-of-thought" (CoT) or "self-prompting" techniques, prompting the model to self-reflect, backtrack, and check its work during analysis, which yields more accurate answers than directly emitting a quick response. However, the high cost of o1 and its mini version ($15 per million input tokens, compared with $1.25 per million input tokens for GPT-4o on the OpenAI API) has sparked debate over whether the performance gains justify paying 12 times the price of a typical advanced large language model (LLM).
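As an illustration of the chain-of-thought technique described above, here is a minimal sketch in Python. The helper function and its exact wording are hypothetical, not from any vendor's API; the idea is simply that a CoT-style prompt asks the model to show intermediate steps before committing to an answer, in contrast to a direct question.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a chain-of-thought instruction (illustrative wording).

    The instruction asks the model to reason through intermediate steps and
    check them, instead of emitting an answer immediately.
    """
    return (
        f"Question: {question}\n"
        "Let's think step by step, checking each intermediate result, "
        "and then state the final answer on its own line prefixed with 'Answer:'."
    )

# A direct prompt asks only for the result; the CoT prompt asks for the reasoning too.
direct_prompt = "What is 17 * 24? Reply with just the number."
cot_prompt = build_cot_prompt("What is 17 * 24?")
print(cot_prompt)
```

Reasoning models such as o1 perform this kind of step-by-step deliberation internally, which is why, as discussed below, explicit "how to think" instructions become less necessary with them.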

Even so, a growing number of users are adopting these models, and the key to unlocking reasoning models' real value may lie in changing how users prompt them. According to the founder of the AI news service Smol, former Apple visionOS interface designer Ben Hylak argues that when interacting with o1 models, users should provide detailed "briefs" rather than simple prompts. That means supplying comprehensive context, clearly stating the expected output, who the user is, and the format the information should take.

Hylak notes that users have traditionally told the model how to answer, for example: "You are an expert software engineer; think slowly and answer carefully." With o1 models, however, he recommends stating only what to do, not how to do it, letting the model plan and work through the problem-solving steps on its own. This approach leverages the model's autonomous reasoning and can be faster than manual review and back-and-forth interaction.
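The contrast above can be sketched in a short example. Everything concrete here (the pagination task, the endpoint name, the section labels) is an illustrative assumption, not from Hylak's post; it only shows the shape of a "brief" for a reasoning model: goal, context, audience, and output format, with no step-by-step instructions on how to think.

```python
# Traditional style: tells the model HOW to behave while answering.
traditional_prompt = (
    "You are an expert software engineer. Think slowly and carefully, "
    "then explain how to add pagination to the /users endpoint."
)

# Brief style for a reasoning model: states WHAT is needed plus full context,
# and leaves the planning and problem-solving steps to the model.
brief_prompt = "\n".join([
    "GOAL: Design cursor-based pagination for the /users endpoint.",
    "CONTEXT: Python/FastAPI service backed by PostgreSQL, roughly 2M rows; "
    "existing clients currently use offset-based paging.",
    "AUDIENCE: I maintain the API layer; the note is for the backend team.",
    "OUTPUT FORMAT: A short design note with a migration plan and an example "
    "request/response pair.",
])

print(brief_prompt)
```

Note that the brief never says "think step by step" or assigns the model a persona; with a model that already reasons internally, the leverage comes from richer context and a precisely specified deliverable.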

In addition, for non-reasoning LLMs such as Claude 3.5 Sonnet, better prompting can likewise produce better, less constrained results. For example, Louis Arge, a former Teton.ai engineer and creator of the openFUS neuromodulation device, found that LLMs trust their own prompts more than users' prompts, and shared how he "fought" with Claude by "triggering conflict" to persuade it to produce bolder, less restrained answers.

In short, as the AI era unfolds, prompt engineering remains an important skill.
