President of OpenAI Shares First Image Generated by GPT-4o
OpenAI President Greg Brockman has shared on his X account what appears to be the first public image generated by the company's new GPT-4o model.
The image is strikingly realistic: it depicts a person in a black T-shirt bearing the OpenAI logo writing on a blackboard in chalk, "Modal Transitions. What are the pros and cons of directly modeling P(text, pixels, sound) with a large autoregressive transformer?"
The GPT-4o model made its debut on Monday, offering speed and cost advantages over the previous GPT-4 series models (GPT-4, GPT-4 Vision, and GPT-4 Turbo) while retaining more of the information in the input, such as audio and visual data.
The gains come from OpenAI taking a different approach to large language models (LLMs) than it did with earlier GPT-4 models. Those models chained several separate models together to convert audio, images, and other media into text and then back again; GPT-4o is instead trained directly on multimedia tokens from the start, so it can analyze and interpret visual and audio data without an intermediate text conversion.
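To make the contrast concrete, here is a deliberately toy sketch in Python of the two designs described above. It is not OpenAI's implementation, whose details are unpublished; every class and function name here (Token, tokenize_image, multimodal_transformer, and so on) is hypothetical. The point is only that the pipeline design converts everything to text before the language model sees it, while the unified design feeds a single autoregressive model one interleaved sequence of text, image, and audio tokens.

# Toy, purely illustrative contrast between the two designs. This is NOT
# OpenAI's implementation; every name below is made up for illustration.
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    modality: str  # "text", "image", or "audio"
    value: int     # index into a shared multimodal vocabulary


# --- stand-in components (stubs) --------------------------------------------
def speech_to_text(audio: bytes) -> str:       # lossy: tone and timing are discarded
    return "hello there"

def image_captioner(image: bytes) -> str:      # lossy: layout and fine detail are discarded
    return "a person at a blackboard"

def text_only_llm(prompt: str) -> str:
    return f"response to: {prompt!r}"

def tokenize_text(text: str) -> List[Token]:
    return [Token("text", ord(c) % 256) for c in text]

def tokenize_image(image: bytes) -> List[Token]:
    return [Token("image", b % 256) for b in image[:4]]   # e.g. patch/codebook ids

def tokenize_audio(audio: bytes) -> List[Token]:
    return [Token("audio", b % 256) for b in audio[:4]]

def multimodal_transformer(context: List[Token]) -> Token:
    # Placeholder "next token" rule; a real system would be a trained transformer.
    return Token("text", sum(t.value for t in context) % 256)


# --- pre-GPT-4o style pipeline ----------------------------------------------
def pipeline_approach(audio: bytes, image: bytes) -> str:
    """Convert every modality to text first, then call a text-only LLM.
    Anything that does not survive transcription or captioning is lost."""
    prompt = speech_to_text(audio) + "\n" + image_captioner(image)
    return text_only_llm(prompt)


# --- GPT-4o style single model ----------------------------------------------
def unified_approach(audio: bytes, image: bytes, text: str) -> List[Token]:
    """Tokenize every modality into one shared token space and let a single
    autoregressive model predict the next token, whatever its modality."""
    sequence = tokenize_text(text) + tokenize_image(image) + tokenize_audio(audio)
    generated: List[Token] = []
    for _ in range(8):  # toy autoregressive generation loop
        generated.append(multimodal_transformer(sequence + generated))
    return generated  # the output may itself mix text, image, and audio tokens


if __name__ == "__main__":
    print(pipeline_approach(b"\x01\x02", b"\x03\x04"))
    print(unified_approach(b"\x01\x02", b"\x03\x04", "hi"))

In the pipeline version, the model can only reason about whatever the transcription and captioning steps preserved; in the unified version, image and audio information reaches the model directly as tokens, which is the property the article attributes to GPT-4o.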
The image above suggests that GPT-4o's new approach is a clear step up from OpenAI's previous image generation model, DALL-E 3, which debuted in September 2023. I previously ran a similar prompt through DALL-E 3 in ChatGPT, and the GPT-4o output is noticeably better in quality, realism, and the accuracy of the rendered text.
However, GPT-4o's native image generation capability is not yet publicly available. As Brockman hinted in his X post, "the team is working hard to bring these capabilities to the world."