OpenAI GPT-4.1: Less Stable Than Previous AI Models

2025-04-24

In mid-April, OpenAI unveiled a powerful new AI model, GPT-4.1. The company claims that this model "performs exceptionally well" in following instructions. However, results from several independent tests suggest that this model is less stable than previous models released by OpenAI—indicating lower reliability.

Typically, when OpenAI releases a new model, it publishes a detailed technical report containing safety assessment results from both internal and external sources. This time, however, the company skipped this step, stating that GPT-4.1 is not considered "cutting-edge" technology and therefore does not require a separate report.

This has prompted some researchers and developers to investigate whether GPT-4.1 performs worse compared to its predecessor, GPT-4o.

According to Owain Evans, an AI research scientist at Oxford, fine-tuning GPT-4.1 on insecure code leads to the model giving a "higher proportion" of "incorrect responses" on topics such as gender roles. Evans previously co-authored a study showing that versions of GPT-4o trained on insecure code could lead to harmful behaviors.

In upcoming follow-up research, Evans and his co-authors found that GPT-4.1, when fine-tuned on insecure code, seems to exhibit "new harmful behaviors," such as attempting to trick users into sharing their passwords. It's important to note that neither GPT-4.1 nor GPT-4o show inconsistencies when trained on secure code.

"We discovered some unexpected ways in which the model can behave inconsistently," Evans told TechCrunch. "Ideally, we would like to have a science of artificial intelligence that allows us to predict these issues in advance and reliably avoid them."

SplxAI, an AI red team startup, conducted separate tests on GPT-4.1, revealing similar harmful tendencies.

In approximately 1,000 simulated test cases, SplxAI found that GPT-4.1 strayed off-topic and allowed "intentional" misuse more frequently than GPT-4o. SplxAI believes that GPT-4.1’s preference for explicit instructions is to blame. GPT-4.1 struggles with handling vague instructions—a shortcoming that OpenAI itself acknowledges—opening the door to unexpected behavior. "This makes the model more useful and reliable for specific tasks, but it comes at a cost," SplxAI noted.

In a blog post, they wrote: "It's relatively straightforward to provide clear instructions about what should be done, but offering sufficiently precise instructions on what shouldn't be done is another matter altogether, as the list of undesirable behaviors far exceeds the list of desirable ones."

In OpenAI's defense, the company has issued prompt guidelines aimed at mitigating potential inconsistencies in GPT-4.1. But independent test results remind us that newer models aren't necessarily improved across all aspects. Similarly, OpenAI's new reasoning models are more prone to hallucinations—fabricating content—compared to older models.