OpenAI to Launch a Multimodal AI Digital Assistant

2024-05-13

OpenAI is reportedly showing some of its customers a new multimodal AI model that can both hold a conversation and recognize objects, according to a recent report by The Information. The outlet cited anonymous sources who said this could be part of what the company plans to unveil on Monday. The new model is said to interpret images and audio faster and more accurately than OpenAI's existing separate transcription and text-to-speech models. It could reportedly help customer service agents "better understanding the tone of callers or whether they are being sarcastic," and, in theory, the model could also help students solve math problems or translate real-world signs, The Information wrote. Sources familiar with the matter said the model outperforms GPT-4 Turbo in "answering certain types of questions," but it can still confidently give incorrect answers.
OpenAI may also be preparing a new ChatGPT capability for making phone calls, according to developer Ananay Arora. Arora shared screenshots of call-related code and found evidence that OpenAI has provisioned servers for real-time audio and video communication.
Whatever the upcoming announcement turns out to include, it will not be GPT-5. CEO Sam Altman has explicitly denied any connection between the forthcoming announcement and the model that is expected to be "obviously superior" to GPT-4. The Information reported that GPT-5 may be publicly released before the end of the year. Altman also said the company will not announce a new AI search engine. Still, if the reported model is indeed what OpenAI unveils, it could overshadow Google's I/O developer conference.
Google has been testing AI-powered phone calls, and one of its rumored projects is a multimodal Google Assistant replacement called "Pixie," which can use the device's camera to identify objects and perform tasks such as giving directions to places where they can be bought or offering instructions. Whatever OpenAI plans to announce, it is scheduled to be revealed in a livestream on its website on Monday at 10:00 AM Pacific Time / 1:00 PM Eastern Time.