The Art and Science of Prompt Engineering in Open-Source LLMs

2024-02-01

A few days ago, Meta AI launched "Prompt Engineering with Llama 2," a repository of best practices created for the open-source community. Andrew Ng's DeepLearning.AI, too, recently released a course called "Prompt Engineering for Open Source LLMs," and IBM, Amazon, Google, and Microsoft have all offered similar prompt engineering courses for open-source models.

Prompt engineering was one of the most sought-after professions of 2023. Companies adopting OpenAI's ChatGPT in different ways have been hiring experts who can prompt chatbots into accurate responses, reportedly paying them hefty salaries. This led to the rise of hundreds of prompt engineering courses. However, most of these courses focus on closed-source models like OpenAI's. Now, with companies adopting open-source LLMs such as Meta's LLaMA and Mistral, it is necessary to understand how prompt engineering differs for open-source models.

Some companies are developing and testing customer support and code generation applications built on open-source technology. These applications aim to use the company's proprietary code, which can be challenging for general-purpose closed models such as those from OpenAI or Anthropic. Yann LeCun shared in a post on X, "Many customers are asking themselves: wait, why should I pay for a super-large model that knows little about my business? Can't I just use one of the open-source models, maybe a much smaller one, to accomplish (information retrieval) workflows?"

Open-source prompt engineering

Recently, Sharon Zhou, co-founder and CEO of Lamini, collaborated with DeepLearning.AI to offer a course on prompt engineering for open-source LLMs. She emphasized the differences between open-source and closed-source models, which affect the API and ultimately the prompting mechanism. Many people, she said, confuse prompt engineering with RAG and fine-tuning. "Prompting is not software engineering; it is more akin to Google search," she added, discussing this in detail in a recent X post. She reiterated that RAG is prompt engineering and "should not be overcomplicated"; it is simply about retrieving information.

Zhou emphasized the simplicity of prompt engineering: prompts are just strings. She compared the process to handling strings in programming languages, making clear that it is a fundamental skill that does not require complex frameworks. "Different LLMs and LLM versions mean different prompts," she added.
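Her point is easy to make concrete. The sketch below, in plain Python with no framework, builds the same request as a raw string for two open-source models; the Llama 2 chat and Mistral Instruct templates follow those models' published formats, while the system text, user question, and model names are illustrative.

```python
# A minimal sketch of "prompts are just strings": each model family
# expects its own template, and building it is ordinary string handling.

def llama2_chat_prompt(system: str, user: str) -> str:
    # Llama 2 chat format: [INST] with a <<SYS>> block for the system text.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

def mistral_instruct_prompt(system: str, user: str) -> str:
    # Mistral Instruct has no dedicated system slot; fold the system
    # text into the first user turn instead.
    return f"<s>[INST] {system}\n\n{user} [/INST]"

system = "You are a support assistant for Acme Corp."  # illustrative
user = "How do I reset my API key?"

for name, build in [("llama-2-7b-chat", llama2_chat_prompt),
                    ("mistral-7b-instruct", mistral_instruct_prompt)]:
    # Printing the final string keeps the prompt visible instead of
    # hidden behind a framework, the transparency Zhou argues for.
    print(f"--- {name} ---\n{build(system, user)}\n")
```

Switching models, or model versions, means swapping the template, which is exactly why a prompt that worked yesterday can silently stop working after an upgrade.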
However, she acknowledged that many frameworks tend to make prompt engineering overly complex, which can lead to suboptimal results. In practice, Zhou explained, customizing prompts is crucial when switching between LLMs, much as OpenAI's version changes have caused confusion when previously effective prompts stopped producing the desired results; the same applies to open-source LLMs. Maintaining transparency throughout the prompt is crucial for optimizing model performance, yet many frameworks fall short here, often attempting to abstract or hide prompt details and creating an illusion of behind-the-scenes management.

When it comes to enterprise adoption, Matt Baker, Senior Vice President of AI Strategy at Dell, who collaborated with Meta to bring open-source Llama 2 into enterprise use cases, stated that large models are useless for companies unless they are tailored to specific use cases. This is where small, specialized, fine-tuned models come into play, giving rise to RAG and prompt engineering. The reality is that most companies use both open-source and closed-source LLMs for different use cases, but much of their information retrieval now relies on APIs built on company data and fine-tuned open-source models. Companies therefore need to adapt and learn how to prompt these models accurately so that they return precise information.
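Zhou's claim that RAG "should not be overcomplicated" is worth illustrating. Below is a minimal, framework-free sketch in which a toy keyword retriever picks the most relevant snippet from hypothetical company data and the rest of the "pipeline" is string assembly; a real deployment would swap in vector search and send the finished string to the fine-tuned open-source model behind the company's API.

```python
# A minimal sketch of "RAG is prompt engineering": retrieve a relevant
# snippet, then place it in the prompt as a plain string. The documents
# and the naive retriever are illustrative stand-ins for company data
# and a production vector search.

DOCS = [
    "API keys can be reset from Settings > Security > API Keys.",
    "Refunds are processed within 5 business days of approval.",
    "On-call engineers rotate every Monday at 09:00 UTC.",
]

def retrieve(query: str, docs: list[str]) -> str:
    # Rank documents by naive word overlap with the query.
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    context = retrieve(query, DOCS)
    # From here on, the entire "RAG pipeline" is string concatenation.
    return (
        "Answer using only the context below.\n\n"
        f"Context: {context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# The resulting string is what gets sent to whichever LLM the company
# runs, wrapped in that model's chat template as shown earlier.
print(build_prompt("How do I reset my API key?"))
```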