OpenAI has recently unveiled a new feature called "Predicted Outputs" for its GPT-4o and GPT-4o-mini language models, significantly speeding up AI-assisted text and code editing.
Initial testing indicates that the feature delivers substantial speed improvements: code-editing responses arrive two to four times faster than when the same output is generated from scratch, and large-file modifications that previously took around 70 seconds now complete in approximately 20 seconds. OpenAI highlights several key applications, including updating blog posts, iterating on previous responses, and rewriting code within existing files.
The system operates on a straightforward principle: developers pass the expected output, such as the current contents of a file, along with the request, and tokens that match the prediction are accepted rather than generated anew. This approach is particularly effective for repetitive tasks or minor document modifications, because it sharply reduces the number of new tokens the model must produce. As a general guideline, OpenAI notes that when 50% of the output tokens come from the prediction, processing time drops by roughly 50%.
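In practice, the prediction rides along with an otherwise ordinary request. The sketch below, using OpenAI's Python SDK, sends the current file contents as the prediction for a small rename edit; it relies on the `prediction` request parameter and its `{"type": "content"}` shape as published in OpenAI's API reference, though the exact fields may evolve.

```python
from openai import OpenAI

client = OpenAI()

# The current file contents double as the prediction: most lines will be
# unchanged in the edited output, so matching tokens are accepted rather
# than generated from scratch.
existing_code = """class User:
    first_name: str
    last_name: str
    username: str
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Rename the `username` field to `email`. "
                       "Respond only with the full updated code.",
        },
        {"role": "user", "content": existing_code},
    ],
    prediction={"type": "content", "content": existing_code},
)

print(response.choices[0].message.content)
```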
However, it is important to recognize that "Predicted Outputs" is not a universal solution. It works best when the prediction closely matches the model's actual response; for entirely new content, where no meaningful prediction exists, the feature offers little benefit. OpenAI has successfully tested the functionality across various programming languages, including Python, JavaScript, Go, and C++.
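One way to gauge whether a task is a good fit is to inspect the usage details the API returns: rejected prediction tokens still count toward completion costs, so a high rejection rate signals that the prediction diverged from the model's output. The snippet below continues from the `response` object in the previous sketch and assumes the `accepted_prediction_tokens` and `rejected_prediction_tokens` fields reported in the API's usage details.

```python
# Inspect how much of the supplied prediction the model actually kept.
details = response.usage.completion_tokens_details
accepted = details.accepted_prediction_tokens
rejected = details.rejected_prediction_tokens

total = accepted + rejected
if total:
    print(f"Prediction acceptance rate: {accepted / total:.0%}")
# A low acceptance rate means the model diverged from the prediction,
# making the task a poor candidate for Predicted Outputs.
```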
Of course, the new feature comes with certain limitations. It is currently available only for the GPT-4o and GPT-4o-mini models and cannot be combined with certain API parameters, such as requesting multiple completions (n > 1) or function calling. OpenAI therefore recommends that developers start by applying the feature to controlled, predictable tasks to maximize the efficiency gains.