AI models like GPT-4.5 are now so convincing in conversation that they can pass as human 73% of the time


Yahoo.com

A new study from UC San Diego found that advanced AI models like GPT-4.5 are now so convincing in conversation that they can pass as human 73% of the time.

In modern Turing Tests, people often choose the AI over real humans in five-minute chats, particularly when the AI uses a “PERSONA” prompt to sound more lifelike.

Experts say these bots could soon handle roles in customer service, online companionship, and beyond, raising questions about how we connect in a world of human-like machines.

According to lead researcher Cameron Jones, GPT‑4.5 with a strategic “PERSONA” prompt managed a win rate of 73%—meaning that in five-minute chat sessions, the AI system was identified as the human more often than the actual human was.

Llama‑3.1‑405B also crossed this threshold (albeit at a lower 56% win rate) when similarly prompted to adopt a specific persona.

By contrast, GPT‑4o, the model presumably powering today’s widely used ChatGPT, managed only a 21% success rate under minimal instructions. These results have reignited the debate over whether Turing’s imitation game is still a meaningful measure of human-like intelligence, or whether it mostly underscores modern AI’s ability to imitate human conversation.

British mathematician and computer scientist Alan Turing first proposed his imitation game in 1950 as a thought experiment.

If an interrogator could not reliably tell the difference between a human and a hidden machine in text-based conversation, Turing reasoned the machine might be said to “think.”

Generations of AI enthusiasts have used the Turing Test as a yardstick, albeit one that was always more philosophical than technical. Over the decades, multiple chatbots have been said to have “passed” the Turing Test—often with disclaimers.