OpenAI states that its latest GPT-4o model poses a "moderate" risk

2024-08-09

OpenAI has released the system card for GPT-4o, outlining the safety measures and risk assessments the company carried out before releasing its latest model. GPT-4o officially launched in May of this year. Prior to its debut, OpenAI brought in an external red team, a group of security experts who attempt to find weaknesses in a system, to probe the model for key risks (a fairly standard practice). They examined risks such as unauthorized voice cloning, generation of explicit and violent content, and reproduction of large chunks of copyrighted audio. Those findings have now been made public.

According to OpenAI's own assessment framework, researchers rated GPT-4o as "moderate" risk overall. That overall rating is the highest risk score among four general categories: cybersecurity, biological threats, persuasion, and model autonomy. All of the categories were rated low risk except persuasion, where researchers found that some writing samples generated by GPT-4o could be better at swaying readers' opinions than human-written text, even though the model's samples were not more persuasive overall.

OpenAI spokesperson Lindsay McCallum Rémy said the system card includes preparedness evaluations created by an internal team, along with assessments from external testers listed on OpenAI's website, including Model Evaluation and Threat Research (METR) and Apollo Research, both of which evaluate AI systems.

This is not OpenAI's first system card; the company has published similar tests and findings for GPT-4, GPT-4 with vision, and DALL-E 3. But OpenAI is releasing the GPT-4o system card at a pivotal moment. The company has faced ongoing criticism of its safety standards from its own employees and from state senators alike. Just minutes before the GPT-4o system card was released, The Verge exclusively reported on an open letter from Senator Elizabeth Warren (D-MA) and Representative Lori Trahan (D-MA) demanding answers from OpenAI about how it handles whistleblowers and safety reviews. The letter cited numerous safety concerns that have already been raised publicly, including CEO Sam Altman's brief ouster in 2023 over the board's concerns and the resignation of a safety executive who claimed that "safety culture and processes have taken a backseat to shiny products."

Moreover, the company is releasing a powerful multimodal model just ahead of the US presidential election. The model carries a clear risk of accidentally spreading misinformation or being hijacked by malicious actors, even though OpenAI emphasizes that it is testing real-world scenarios to prevent misuse.

There have been many calls for OpenAI to be more transparent, not only about its models' training data (for example, whether YouTube was used) but also about its safety testing. In California, home to OpenAI and many other leading AI labs, state Senator Scott Wiener is working on a bill to regulate large language models, including measures that would hold companies legally accountable if their AI is used in harmful ways. If the bill passes, OpenAI's frontier models would have to comply with state-mandated risk assessments before being made available to the public.
The key takeaway from the GPT-4o system card, however, is that despite the involvement of external red teamers and testers, much of the evaluation still relies on OpenAI assessing itself.