Deepfake and voice cloning: Why detection lags behind synthesis in the LLM era

LLMs produce such realistic outputs that even experts struggle to distinguish between genuine and artificial content.

Ni Tao is IE’s columnist, giving exclusive insight into China’s technology and engineering ecosystem. His Inside China column explores the issues that shape discussions and understanding about Chinese innovation, providing fresh perspectives not found elsewhere.

The rise of large language models (LLMs) has shaken the global content industry to the core.

One of the most undesirable byproducts of LLMs is a flurry of crimes involving deepfakes. They challenge internet security standards and raise the bar for AI ethics and governance.

Regulators worldwide are struggling to contain a surge in deepfake frauds caused by misuse of generative AI and LLMs.

The Verge reported that Microsoft is urging members of the US Congress to crack down on AI-generated deepfakes to protect against fraud, abuse, and manipulation.

Brad Smith, vice chair and president of Microsoft, has called for urgent action from policymakers to protect elections, guard seniors from fraud, and protect children from abuse.

“One of the most important things the US can do is pass a comprehensive deepfake fraud statute to prevent cybercriminals from using this technology to steal from everyday Americans,” Smith said.

He wrote that Microsoft wants a new legal framework to prosecute AI-generated scams and abuse.

To Smith’s credit, legislation can fundamentally and systemically address the root causes of some problems, such as violations of user privacy and the misuse of data.

Under a new law on AI ethics and governance, labeling AI-generated content would perhaps become mandatory, and specific penalties would be meted out to scammers to act as a deterrent.

Gray areas and elusive justice

The problem with legislation, however, is that it will always have gray areas. In some cases, justice is elusive, as it’s hard to draw a clear line between AI-generated content meant for entertainment and that used to commit fraud.

The greatest challenge in confronting deepfake scams lies in the technical realm, as LLMs further complicate efforts to curb deepfakes.

While fraudulent videos and images elicit more attention worldwide, industry experts say voice forgery poses a bigger threat in China. The reasons are twofold.

For one thing, human speech is a one-dimensional continuous signal that requires more intricate processing logic than two-dimensional images or videos.

In addition, voice deepfakes are harder to detect than fake images and videos, because judging a voice means weighing accents, dialects, speech habits, intonation, and other subtle cues.
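
To make the first of these points concrete, here is a minimal sketch, assuming the open-source librosa library and a hypothetical file name, of how detection pipelines commonly lift the one-dimensional waveform into a two-dimensional time-frequency representation before analysis:

```python
import librosa
import numpy as np

# Load a clip as a one-dimensional float array sampled at 16 kHz.
# "call_recording.wav" is a hypothetical file name.
waveform, sample_rate = librosa.load("call_recording.wav", sr=16000)
print(waveform.shape)   # e.g. (160000,) for a 10-second clip: one dimension

# Lift the signal into a 2-D log-mel spectrogram (mel bands x time frames),
# which detection models can treat somewhat like an image.
mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)
print(log_mel.shape)    # e.g. (64, 313): two dimensions, time along one axis
```

Even in this two-dimensional form, the time axis keeps speech fundamentally sequential, which is part of why its processing logic differs from that of static images.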

Traditionally, fake voices generated with deep learning methods such as TTS (text-to-speech) and voice conversion algorithms were relatively easy to recognize.

The rise of LLMs is a game-changer. With just a few prompts, it’s easy to clone voices to mimic real people, making them so authentic and human-like that they can fool a close friend or family member.

A ‘catch me if you can’ battle

A “catch me if you can” battle is thus playing out between deepfake fraudsters and the AI experts out to blow their cover. Surprisingly, the latter, often outwitted and outnumbered, struggle to keep pace.

This is because “current voice forgery detection technology lags behind voice synthesis technology,” said Lei Chen, Vice President and Head of Big Data and AI at FinVolution Group.

FinVolution is a leading AI and fintech company from China. It deploys technologies, including LLMs, across its business lines, automating work processes and helping industry partners, such as call center staff, detect forged voices.

Due to pervasive loopholes in personal information protection, many Chinese people face a daily barrage of harassment and sales calls. Some of these calls are disguised scams designed to trick recipients into saying specific words, which are then recorded and used to commit fraud.

Impersonators frequently use synthesized voices to steal identities, sometimes altering users’ bank information for malicious purposes.

More forceful measures are needed to stem the spread of voice deepfakes in light of these rampant crimes. But why has voice forgery detection fallen behind in the era of LLMs?

Chen explained that the growing realism of LLM-generated voices makes it tricky even for experts to tell real from fake. Their authenticity surpasses anything seen in the past, he added.

The second challenge is that deep learning voice synthesis models were previously limited to a small range of voices.

With the advent of LLMs, however, possibilities have expanded dramatically, allowing for the generation of numerous voices and emotional nuances.

Thirdly, earlier voice manipulation was closely tied to text, resulting in clones that only resembled the originals. Today, conversations have become genuinely fluid and smooth, with AI understanding and responding just like real people.

The blurred boundary

As LLMs blur the boundary between fake and real voices, there is an urgent need to improve detection technology. This has garnered significant attention from AI professionals, prompting them to develop various countermeasures.
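
To give a rough sense of what such a countermeasure can look like, below is a toy sketch in PyTorch, not FinVolution’s system or any published model, of the general shape many voice anti-spoofing detectors take: a small classifier that maps time-frequency features to a real-versus-fake score.

```python
import torch
import torch.nn as nn

class SpoofDetector(nn.Module):
    """A toy binary classifier over log-mel features: real vs. synthetic."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # average over time and frequency
            nn.Flatten(),
            nn.Linear(32, 1),         # single logit: higher = more likely fake
        )

    def forward(self, log_mel: torch.Tensor) -> torch.Tensor:
        # log_mel: (batch, 1, mel bands, time frames)
        return self.net(log_mel)

model = SpoofDetector()
features = torch.randn(8, 1, 64, 313)    # random stand-in for real features
fake_prob = torch.sigmoid(model(features))
print(fake_prob.shape)                   # torch.Size([8, 1])
```

Real anti-spoofing systems are far deeper and trained on dedicated spoofing corpora, but the basic contract is the same: features in, a real-versus-fake score out.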

In addition to publishing research papers and findings, competitions are held across the globe from time to time to tackle challenges posed by deepfakes.

In late 2019, for example, Facebook held its first Deepfake Detection Challenge to find user-created algorithms adept at identifying deepfake videos.

However, many current competitions have significant limitations. According to Qiang Lyu, an algorithm scientist at FinVolution, this is primarily because fake voice models and countermeasures developed in recent years are less effective in the age of LLMs.

Additionally, the scarcity of publicly available, open-sourced voice datasets contributes to a lag in voice forgery detection. As a result, many platforms find it hard to identify and flag fraudulent activities promptly.

To improve the ability of AI scientists to combat fake voices, competitions should not only focus on the latest LLM-cloned voices but also incorporate re-recorded fake voices blended with real voices to foster innovation among participants.

This is exactly what FinVolution did. During the 9th FinVolution Global Data Science Competition in late July, contestants were encouraged to use deep learning and LLM technologies to develop models capable of detecting fake voices in the datasets they were offered.

Lyu, the algorithm scientist, explained that blending synthesized voices with real ones left many participants scratching their heads. However, these tests, designed to simulate real-world scenarios, offer enormous practical value.
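
Blending of that kind might look something like the following sketch; the mixing scheme and parameters are illustrative assumptions on my part, not the competition’s actual pipeline. It overlays a synthesized clip onto a genuine recording at a randomly chosen level:

```python
import numpy as np

def blend(real: np.ndarray, fake: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay a synthesized clip onto a real one at a chosen
    real-to-fake power ratio (in dB), then normalize to avoid clipping."""
    n = min(len(real), len(fake))
    real, fake = real[:n], fake[:n]
    real_power = np.mean(real ** 2) + 1e-12
    fake_power = np.mean(fake ** 2) + 1e-12
    # Scale the fake clip so real_power / (scale^2 * fake_power) = 10^(snr_db / 10).
    scale = np.sqrt(real_power / (fake_power * 10 ** (snr_db / 10)))
    mixed = real + scale * fake
    return mixed / (np.max(np.abs(mixed)) + 1e-12)

rng = np.random.default_rng(seed=0)
real_clip = rng.standard_normal(16000)   # stand-ins for one second of audio
fake_clip = rng.standard_normal(16000)
hard_sample = blend(real_clip, fake_clip, snr_db=rng.uniform(0, 10))
```

Randomizing the mixing ratio is part of what makes such samples hard: the synthetic portion can sit well below the level of the genuine speech.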

“The inclusion of fake voices generated by LLMs raised the competition’s difficulty, highlighting the enhanced ability of these models to create realistic fakes,” said Lyu. “This demands that deepfake detection technologies keep pace to protect the public from misuse.”

Enhancing preparedness against deepfakes

Lyu is spot on about the rivalry between voice forgery and detection technologies, which continually intensifies as the two sides compete.

“Detection has no endpoint; as long as generative AI has not reached its pinnacle, detection efforts will continue,” said Chen, the vice president of FinVolution.

Amid heightened concerns about the havoc AI can wreak on humanity, more robust laws and regulations on AI are expected in the coming years. 

Major technology platforms will likely embrace greater self-discipline, clearly labeling AI-generated content to mitigate ethical risks and consequences.

However, it’s important to note that in the tech industry, legislation and self-regulation cannot always keep up with emerging issues. Despite the teeth they may have, laws alone are not enough to put the genie back in the bottle.

Hopefully, we can expect faster technological advancements as the global tech community holds competitions to enhance its preparedness against deepfake frauds.

This might enable tech professionals to navigate and, even better, outpace the unanticipated challenges LLMs present and reduce the number of deepfake victims.
