OpenAI's GPT-4.5 Passes Turing Test with 73% Success Rate

2025-04-15

OpenAI's ChatGPT-4.5 has achieved a milestone once thought to be decades away: in a Turing Test-style evaluation, it successfully convinced the majority of participants that it was human.

In a recent study conducted by the University of California, San Diego, researchers aimed to assess whether large language models could pass the classic three-party Turing Test. According to reports, GPT-4.5 succeeded in 73% of text-based conversations.

The research showed that the latest large language models outperformed earlier versions, such as GPT-4o, as well as other models including ELIZA and LLaMa-3.1-405B.

GPT-4.5, introduced by OpenAI in February, is capable of recognizing subtle linguistic cues, making it appear more human-like, according to Cameron Jones, a postdoctoral researcher at UC San Diego.

"If you ask them what it feels like to be human, these models tend to respond well and can convincingly pretend to have emotions and personal experiences," Jones told Decrypt. "However, they struggle with processing real-time information or current events."

The Turing Test, proposed by British mathematician Alan Turing in 1950, evaluates whether a machine can mimic human conversation convincingly enough to deceive a human judge. If the judge cannot reliably distinguish between machine and human, the machine is considered to have passed the test.

To evaluate AI model performance, researchers tested two types of prompts: a baseline prompt with minimal instructions and a more detailed prompt guiding the model to adopt the tone of an introverted, web-savvy young person using slang.

"We selected these witnesses based on an exploratory study where we evaluated five different prompts and seven different LLMs, finding that LLaMa-3.1-405B, GPT-4.5, and this role prompt performed the best," the researchers said.

The study also discussed the broader social and economic implications of large language models passing the Turing Test, including potential misuse.

"Some risks include misinformation, such as bots posing as ordinary people to increase interest in a particular cause," Jones said. "Other risks involve fraud or social engineering—if a model engages in long-term email correspondence with someone and appears genuine, it might persuade them to share sensitive information or access bank accounts."

On Monday, OpenAI announced the launch of the next version of its flagship GPT model, GPT-4.1. This new AI is more advanced, capable of handling extensive documents, codebases, and even novels. OpenAI stated that it will phase out GPT-4.5 this summer and replace it with GPT-4.1.

Although Turing never witnessed today’s AI landscape, Jones noted that his 1950 test remains relevant.

"The Turing Test is still relevant to Turing's original intent," he said. "In his paper, he talked about learning machines and suggested building something that passes the Turing Test by creating a computational child that learns from vast amounts of data. That's essentially how modern machine learning models work."

When asked about criticisms of the study, Jones acknowledged their value while clarifying what the Turing Test measures and does not measure.

"I mainly want to say that the Turing Test isn't a perfect intelligence test—not even a test of human characteristics," he said. "But it is valuable in what it measures: whether a machine can convince people that it is human. That is worth measuring and has practical significance."