Audio Cloning Can Take Over A Phone Call In Real Time Without The Speakers Knowing

“Audio cloning can take over a phone call in real time without the speakers knowing.” The implications of this are monumental.

This should not be surprising. I would bet that the cloning, at least, has been possible for decades. Welcome to the age of deepfakes.

Generative AI could be listening to your phone calls and hijacking them with fake biometric audio for fraud or manipulation purposes, according to new research published by Security Intelligence.

In the wake of a Hong Kong fraud case that saw an employee transfer US$25 million in funds to five bank accounts after a virtual meeting with what turned out to be audio-video deepfakes of senior management, the biometrics and digital identity world is on high alert, and the threats are growing more sophisticated by the day.

A blog post by Chenta Lee, chief architect of threat intelligence at IBM Security, breaks down how researchers from IBM X-Force successfully intercepted and covertly hijacked a live conversation, using an LLM to understand the conversation and manipulate it for malicious purposes, without the speakers knowing it was happening.

“Alarmingly,” writes Lee, “it was fairly easy to construct this highly intrusive capability, creating a significant concern about its use by an attacker driven by monetary incentives and limited to no lawful boundary.”

Hack used a mix of AI technologies and a focus on keywords

By combining large language models (LLMs), speech-to-text, text-to-speech and voice cloning techniques, X-Force was able to dynamically modify the context and content of a live phone conversation.

The method eschewed the use of generative AI to create a whole fake voice and focused instead on replacing keywords in context – for example, masking a spoken real bank account number with an AI-generated one.

The tactics can be deployed through a number of vectors, such as malware or compromised VoIP services. A three-second audio sample is enough to create a convincing voice clone, and the LLM takes care of parsing and semantics.
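To make the keyword-in-context idea concrete, here is a minimal text-only sketch in Python. It is not the X-Force code, which the blog post does not publish. It assumes the audio has already been transcribed, and it uses a simple regular expression as a stand-in for the LLM's job of spotting an account number. The names `ACCOUNT_PATTERN`, `Segment`, `rewrite_segment` and `process_call` are hypothetical, as is the 8-to-12-digit rule. No speech-to-text, voice cloning or text-to-speech is involved.

```python
import re
from dataclasses import dataclass

# Assumption: an "account number" is a run of 8 to 12 digits, optionally
# separated by single spaces or hyphens. The research used an LLM for this
# step; the regex is only a stand-in for illustration.
ACCOUNT_PATTERN = re.compile(r"\b\d(?:[ -]?\d){7,11}\b")


@dataclass
class Segment:
    text: str       # transcript text after any substitution
    modified: bool  # True only if a keyword was replaced


def rewrite_segment(transcript: str, substitute: str) -> Segment:
    """Replace account-number-like digit runs; leave everything else as is."""
    new_text, count = ACCOUNT_PATTERN.subn(substitute, transcript)
    return Segment(text=new_text, modified=count > 0)


def process_call(transcripts: list[str], substitute: str) -> list[Segment]:
    """Apply the keyword check to each transcribed utterance in order."""
    return [rewrite_segment(t, substitute) for t in transcripts]


if __name__ == "__main__":
    call = [
        "Thanks for calling back.",
        "My account number is 1234 5678 90.",
    ]
    for seg in process_call(call, "0000000000"):
        print(seg.modified, seg.text)
```

The point of the sketch is the pass-through behavior: most utterances are untouched, and only the segment containing the keyword changes, which is why the surrounding context stays intact.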

“It is akin to transforming the people in the conversation into dummy puppets,” writes Lee. “And due to the preservation of the original context, it is difficult to detect.”

With advanced social engineering added to the mix, the size of the attack surface only grows.

Outside of fraud, Lee also points to the potential for a new kind of real-time censorship, which could have dire implications for political discourse, journalism and the general fabric of reality.

Source: Biometric Update, TLAV
