Mind Matters Natural and Artificial Intelligence News and Analysis

Deepfakes Can Replicate Human Voices Now — Maybe Yours

Digitally faked voice tech has already been used to perpetrate a big bank fraud

It’s not just your face that can be convincingly replicated by a deepfake. It’s also your voice, and quite easily, as journalist Chloe Veltman found:

Given the complexities of speech synthesis, it’s quite a shock to find out just how easy it is to order one up. For a basic conversational build, all a customer has to do is record themselves saying a bunch of scripted lines for roughly an hour. And that’s about it.

“We extract 10 to 15 minutes of net recordings for a basic build,” says Speech Morphing founder and CEO Fathy Yassa.

The hundreds of phrases I record so that Speech Morphing can build my digital voice double seem very random: “Here the explosion of mirth drowned him out.” “That’s what Carnegie did.” “I’d like to be buried under Yankee Stadium with JFK.” And so on.

But they aren’t as random as they appear. Yassa says the company chooses utterances that will produce a wide enough variety of sounds across a range of emotions – such as apologetic, enthusiastic, angry and so on – to feed a neural network-based AI training system. It essentially teaches itself the specific patterns of a person’s speech.

Chloe Veltman, “Send in the clones: Using artificial intelligence to digitally replicate human voices” at WAMU 88.5 (American University Radio) (January 17, 2022)

And how did Chloe feel about “Chloney,” her digital voice? “Chloney sounds quite a lot like me. It’s impressive, but it’s also a little scary.”

Not just scary:

In early 2020, a bank manager in Hong Kong received a call from a man whose voice he recognized, a director at a company with whom he’d spoken before. The director had good news: His company was about to make an acquisition, so he needed the bank to authorize some transfers to the tune of $35 million. A lawyer named Martin Zelner had been hired to coordinate the procedures and the bank manager could see in his inbox emails from the director and Zelner, confirming what money needed to move where. The bank manager, believing everything appeared legitimate, began making the transfers.

What he didn’t know was that he’d been duped as part of an elaborate swindle, one in which fraudsters had used “deep voice” technology to clone the director’s speech, according to a court document unearthed by Forbes in which the U.A.E. has sought American investigators’ help in tracing $400,000 of stolen funds that went into U.S.-based accounts held by Centennial Bank.

Thomas Brewster, “Fraudsters Cloned Company Director’s Voice In $35 Million Bank Heist, Police Find” at Forbes (October 14, 2021)

The voice frauds will likely get more sophisticated if the rapid development of deepfake visuals and bios is anything to go by. They even fool reporters, says graphic designer John DeFeo:

When a reporter is writing a story that requires a source that he or she does not have, that reporter will likely turn to HARO, a service that “connects journalists seeking expertise to include in their content with sources who have that expertise.”

It’s no secret that search engine optimization specialists use the service to build links to content that profits them, but the rise of “deepfake” technology has made it easier than ever to exploit reporters.

Now, shady SEOs hide behind fake photos and personalities. The latest black hat search-engine optimization trend is to respond to Help-a-Reporter-Out (HARO) inquiries pretending to be a person of whichever gender/ethnicity the journalist is seeking comment from.

To combat this fraud, newsrooms must quickly adopt new methods for verifying sources.

John DeFeo, “Deep fakes are ruining the internet” at John DeFeo.com

Indeed. Video deepfakes were this advanced in 2018:

And this advanced today:

Can today’s deepfakes even be detected with current technology? Antivirus company Norton offered some thoughts on what to look for in 2020, including:

1. Unnatural eye movement. Eye movements that do not look natural (or a lack of eye movement, such as an absence of blinking) are red flags. It’s challenging to replicate the act of blinking in a way that looks natural. It’s also challenging to replicate a real person’s eye movements. That’s because someone’s eyes usually follow the person they’re talking to.

2. Unnatural facial expressions. When something doesn’t look right about a face, it could signal facial morphing. This occurs when a simple stitch of one image has been done over another.

3. Awkward facial-feature positioning. If someone’s face is pointing one way and their nose is pointing another, you should be skeptical about the video’s authenticity.

Emerging Threats, “How to spot deepfake videos — 15 signs to watch for” at Norton (August 13, 2020)
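
The first of those tells, unnatural blinking, is simple enough to sketch in code. What follows is only a rough illustration, not Norton’s method: it assumes some upstream face tracker has already produced a per-frame “eye openness” score between 0 and 1, and the function names, the closed-eye threshold, and the acceptable blink-rate range are all hypothetical placeholders chosen for the example.

```python
from typing import Sequence


def count_blinks(openness: Sequence[float], closed_below: float = 0.2) -> int:
    """Count open-to-closed transitions in a per-frame eye-openness series.

    `openness` is assumed to come from a face tracker (not shown here);
    `closed_below` is an illustrative threshold, not a calibrated value.
    """
    blinks = 0
    was_closed = False
    for value in openness:
        is_closed = value < closed_below
        if is_closed and not was_closed:
            blinks += 1  # a new blink starts when the eye goes from open to closed
        was_closed = is_closed
    return blinks


def blink_rate_is_suspicious(
    openness: Sequence[float],
    fps: float,
    min_per_minute: float = 4.0,   # hypothetical lower bound
    max_per_minute: float = 40.0,  # hypothetical upper bound
) -> bool:
    """Flag a clip whose blink rate falls outside an assumed plausible range."""
    if fps <= 0 or len(openness) == 0:
        raise ValueError("need a positive frame rate and at least one frame")
    minutes = len(openness) / fps / 60.0
    rate = count_blinks(openness) / minutes
    return rate < min_per_minute or rate > max_per_minute
```

A one-minute clip in which the eyes never close would be flagged, while one with a handful of evenly spaced blinks would pass. A check this crude would only catch the sloppiest fakes.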

That’s not very encouraging, really. These and the other tells that Norton lists, at least those that don’t require techie knowledge, signal that the deepfaker is an amateur (or just sloppy). Amateurs are not the big problem. Pros are. And detecting the pros requires new technology. From Norton again:

● Adobe. Adobe has a system that enables you to attach a sort of signature to your content that specifies the details of its creation. Adobe also is developing a tool to determine if a facial image has been manipulated.

● Researchers at the University of Southern California and University of California, Berkeley. A notable push is being led by university researchers to discover new detection technologies. Using machine-learning technology that examines soft biometrics like facial quirks and how a person speaks, they’ve been able to detect deepfakes with 92 to 96 percent accuracy.

● Deepfake Detection Challenge. Organizations like the DFDC are incentivizing solutions for deepfake detection by fostering innovation through collaboration. The DFDC is sharing a dataset of 124,000 videos that feature eight algorithms for facial modification.

Emerging Threats, “How to spot deepfake videos — 15 signs to watch for” at Norton (August 13, 2020)
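
The general idea of attaching “a sort of signature” that records how a piece of content was created can be sketched as follows. This is not Adobe’s system, whose internals the Norton piece does not describe. It is a minimal illustration that assumes a shared secret key and uses an HMAC from Python’s standard library; a real provenance scheme would more likely rely on public-key signatures. The function names and the layout of the manifest are hypothetical.

```python
import hashlib
import hmac
import json


def sign_content(content: bytes, details: dict, key: bytes) -> dict:
    """Bind creation details to the content's hash and sign the pair."""
    record = {
        "sha256": hashlib.sha256(content).hexdigest(),
        "details": details,  # e.g. tool used, creation date (illustrative)
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_content(content: bytes, manifest: dict, key: bytes) -> bool:
    """Return True only if neither the content nor its details were altered."""
    record = manifest["record"]
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest["signature"])
    content_ok = record["sha256"] == hashlib.sha256(content).hexdigest()
    return signature_ok and content_ok
```

Under this sketch, changing even one byte of the content, or editing the recorded details, makes verification fail. Note that a signature of this kind shows a file has not been altered since signing; it does not by itself show that the original was genuine.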

Clearly, deepfake detection was still in a research phase at that point and probably still is. Yet last year, deepfakes were forecast to become a growing trend (CyberMagazine, December 6, 2021) that, in the view of one analyst writing at Socure, “will create havoc” because of limitations inherent in current detection methods:

So far, the research is mostly focused on finding patterns within the deepfakes by reverse-engineering the methods used to create deepfake imagery. This raises issues in fraud detection rates because the fraud models can’t detect the patterns until they see a high volume of the same pattern and bad actors can easily manipulate patterns to limit detection.

For instance, the winning algorithm in Meta/Facebook’s most recent deepfake detection competition was only able to detect about 65% of the deepfakes it analyzed.

Fraud Trend to Watch For: For the unprepared, deepfakes are already able to maneuver around standard document validation and “liveness” detection. In 2022, they will get astronomically better and more dangerous.

Mike Cook, “Deepfakes Will Create Havoc: 2022 Fraud Trend Series” at Socure (January 19, 2022)
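
To put the 65% figure in perspective, here is a back-of-the-envelope calculation. It rests on two simplifying assumptions that the Socure piece does not make: that 65% can be treated as the chance of catching any single deepfake, and that each attempt is caught or missed independently of the others. The function names are illustrative.

```python
def expected_missed(num_fakes: int, detection_rate: float) -> float:
    """Expected number of deepfakes that slip past the detector."""
    if num_fakes < 0 or not 0.0 <= detection_rate <= 1.0:
        raise ValueError("num_fakes must be >= 0 and detection_rate in [0, 1]")
    return num_fakes * (1.0 - detection_rate)


def chance_at_least_one_missed(num_fakes: int, detection_rate: float) -> float:
    """Probability that at least one of several independent fakes gets through."""
    if num_fakes < 0 or not 0.0 <= detection_rate <= 1.0:
        raise ValueError("num_fakes must be >= 0 and detection_rate in [0, 1]")
    # All fakes are caught with probability detection_rate ** num_fakes.
    return 1.0 - detection_rate ** num_fakes


if __name__ == "__main__":
    print(expected_missed(100, 0.65))
    print(chance_at_least_one_missed(10, 0.65))
```

Under those assumptions, about 35 of every 100 deepfakes would go undetected, and a fraudster who makes ten attempts would get at least one through roughly 99 times in 100.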

Socure is one of the companies helping to develop algorithms for spotting deepfakes.

While deepfakes may or may not “create havoc,” a massive security breach or fraud could well be the event that creates more public awareness and sets in motion a broader industry response.

You may also wish to read: Sci-fi could come to life if you fall for a deepfake friend. What if you discover that the friend you knew only online is a starkly believable software synthesis? A Carnegie Mellon prof says it could happen today. Simon DeDeo sees the rise of convincing but non-existent identities as an incentive to ask ourselves, how much of what WE say is autobabble?

Mind Matters News

Breaking and noteworthy news from the exciting world of natural and artificial intelligence at MindMatters.ai.
