Prompt Engineering: Enhancing GPT Model Responses

2023-12-28

OpenAI recently released a guide on prompt engineering, which outlines six strategies for eliciting better responses from their GPT models, with a particular focus on their latest version, GPT-4. The six high-level strategies in the guide include writing clear instructions, providing reference texts, breaking down complex tasks into simpler subtasks, giving the model "thinking" time, using external tools, and systematically testing changes. Each strategy is broken down into a set of specific, actionable strategies, along with example prompts. Many of these strategies are based on the findings of LLM research, such as idea-chain prompts or recursive summarization. OpenAI's research paper on GPT-3, published in 2020, demonstrated that the model can learn to perform various natural language processing (NLP) tasks with minimal examples, essentially by providing the model with a description or an example of the task to be performed. In 2022, OpenAI published a playbook that includes several techniques for "improving the reliability of GPT-3 responses." Some of these techniques, such as providing clear instructions and breaking down complex tasks, are still included in the new guide. The earlier playbook also includes a bibliography of research papers supporting their techniques. Some of the strategies in the guide leverage the system message feature of the Chat API. According to OpenAI's documentation, this parameter "helps set the behavior of the assistant." One strategy suggests using it to give the model a personality to control its responses. Another suggests using it to pass a summary of a long conversation to the model or to provide a set of instructions that will be repeated for multiple user inputs. The strategy of using external tools provides tips for integrating the GPT model with other system interfaces, along with articles from OpenAI's playbook. One strategy suggests generating Python code for mathematical calculations instead of asking the model to perform the calculations itself, and then extracting and executing the code from the model's response. However, the guide does include a disclaimer that the code generated by the model is not guaranteed to be safe and should only be executed in a sandbox. Another strategy in the guide is to systematically test changes, involving how to determine whether different prompts actually result in better or worse outputs. This strategy suggests using the OpenAI Evals framework. It also recommends using the model to check its own work by referencing a "gold standard answer" through system messages. In a Hacker News discussion about the guide, one user expressed hesitation about investing a lot of time in learning how to improve prompts, stating that with each new version, not to mention different LLMs, the responses will vary. With the visible rapid progress of this technology, in two or five years, as systems become more intelligent, such complex prompts may not even be necessary. Several other LLM providers have also released prompt engineering techniques. Microsoft Azure, which offers access to GPT models, has a series of techniques similar to OpenAI's; their guide also provides tips for setting model parameters, such as temperature and topp, which control the randomness of model-generated outputs.