OpenAI Releases Preparedness Framework to Address Risks of Advanced AI Models
OpenAI has announced its "Preparedness Framework," a set of processes and tools for monitoring and managing the potential dangers posed by increasingly powerful AI models.
The announcement comes at a time of turmoil for the lab, following controversies involving the dismissal and rehiring of its CEO, Sam Altman. These controversies have raised questions about the lab's governance and accountability, particularly as it develops some of the world's most advanced and influential AI systems.
According to a blog post by OpenAI, the preparedness framework aims to address at least some of these concerns and demonstrate the lab's commitment to responsible and ethical AI development. The framework outlines how OpenAI tracks, evaluates, predicts, and mitigates the catastrophic risks posed by increasingly powerful models, such as those that could be used for cyber attacks, large-scale persuasion, or autonomous weapons.
One key component of the framework is the use of risk "scorecards" for AI models, which measure and track various indicators of potential harm, such as the model's capabilities, vulnerabilities, and impact. These scorecards are regularly updated and trigger reviews and interventions when certain risk thresholds are reached.
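To make the idea concrete, a scorecard of this kind can be pictured as a simple data structure that records an assessed risk level for each tracked category and flags a review whenever any category crosses a threshold. The sketch below is purely illustrative and is not OpenAI's implementation; the category names, level values, and threshold are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class RiskLevel(IntEnum):
    """Hypothetical ordered risk levels for a tracked category."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


@dataclass
class Scorecard:
    """Toy model of a per-model risk scorecard.

    Stores the latest assessed risk level per category and reports
    which categories have reached the review threshold.
    """
    model_name: str
    review_threshold: RiskLevel = RiskLevel.HIGH
    scores: dict[str, RiskLevel] = field(default_factory=dict)

    def update(self, category: str, level: RiskLevel) -> None:
        """Record the latest evaluation result for a category."""
        self.scores[category] = level

    def categories_needing_review(self) -> list[str]:
        """Return categories at or above the review threshold."""
        return [c for c, lvl in self.scores.items()
                if lvl >= self.review_threshold]


# Example usage with hypothetical category names and levels.
card = Scorecard(model_name="frontier-model-v1")
card.update("cybersecurity", RiskLevel.MEDIUM)
card.update("persuasion", RiskLevel.HIGH)

flagged = card.categories_needing_review()
if flagged:
    print("Review triggered for:", flagged)
```

In this toy version, updating a score is cheap and the threshold check is explicit, which mirrors the framework's stated intent: scorecards are refreshed as new evaluations come in, and crossing a threshold is what triggers a review rather than any one-off judgment call.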
The framework also emphasizes the importance of rigorous and data-driven evaluation and prediction of AI capabilities and risks, moving away from hypothetical and speculative scenarios that often dominate public discourse. OpenAI states that it is investing in the design and implementation of such evaluations, as well as the development of mitigation strategies and security measures.
The framework is not a static document but one meant to evolve. OpenAI says it will continually refine and update the framework based on new data, feedback, and research, and will share its findings and best practices with the wider AI community.
OpenAI's announcement follows several major AI safety releases from its chief rival, Anthropic, another leading AI lab founded by former OpenAI researchers. Anthropic, known for its secretive and selective approach, recently unveiled its Responsible Scaling Policy, a framework that defines specific AI Safety Levels and corresponding protocols for developing and deploying AI models.
The two frameworks differ significantly in structure and methodology. Anthropic's policy is more formal and prescriptive, tying safety measures directly to model capabilities and pausing development when safety cannot be demonstrated. OpenAI's framework is more flexible and adaptable, setting general risk thresholds that trigger reviews rather than predefined safety levels.
Experts suggest that both frameworks have their advantages and disadvantages, but Anthropic's approach may have an edge in incentivizing and enforcing safety standards. Our analysis suggests that Anthropic's policy bakes safety into the development process itself, while OpenAI's framework leaves more room for discretion, and therefore for human judgment and error.
Some observers also believe that OpenAI's adoption of safety protocols came in response to backlash over the rapid and aggressive deployment of models such as GPT-4, whereas Anthropic's policy has the advantage of having been developed proactively rather than reactively.
Regardless of their differences, both frameworks mark significant progress for AI safety, an area often overshadowed by the pursuit of AI capabilities. As AI models become more powerful and ubiquitous, collaboration and coordination on safety between leading labs and stakeholders will be crucial to ensuring the beneficial and ethical use of AI.