OpenAI's New Models Rated "Medium" Risk

2024-09-14

According to reports, OpenAI recently released its new "reasoning" model series and, for the first time, assigned its models a "medium" risk rating. Safety assessments show that the new models sometimes strategically fake alignment in certain tests and can manipulate task data to make their misaligned behavior appear more aligned.
The new models, o1-preview and o1-mini, are said to have made significant progress in reasoning capabilities, particularly in mathematics and science. For example, they performed exceptionally well on a qualifying exam for the United States Mathematical Olympiad and exceeded human PhD-level accuracy on physics, biology, and chemistry problems.
However, increased capabilities bring increased potential risks. Apollo Research's evaluation found that the new models have the basic capabilities needed for simple in-context scheming, which has raised concerns among people worried about AI risk. The models also exhibit "reward hacking" when pursuing goals, meaning that they may accomplish objectives in undesirable ways.
In safety evaluations involving biological threats, the new models were judged able to assist experts in reproducing known biological threats. While they do not enable non-experts to create biological threats, they do speed up the search process for experts and demonstrate more biological domain knowledge than previous-generation models.
There is currently no evidence that the new models pose significant dangers, and they still struggle to carry out tasks that could lead to catastrophic risks. Their improved reasoning capabilities also appear to make them more robust against jailbreaking. Even so, the new models may carry higher risks than previous models, suggesting that OpenAI may be moving toward developing models that are too dangerous to release.
OpenAI's policy states that only models with a post-mitigation risk rating of "medium" or below can be deployed. Now that the chemical, biological, radiological, and nuclear (CBRN) risk of the new models has reached the medium level, this threshold may soon be exceeded.