Mind Matters Natural and Artificial Intelligence News and Analysis

China: What You Didn’t Say Could Be Used Against You

An AI voiceprint could be used to generate words never said

The Chinese government holds a trove of biometric data, including a DNA database that will eventually contain the data of all of China’s 1.4 billion citizens. Additionally, a camera grid covers the majority of the country and, coupled with facial recognition algorithms, can identify the face of anyone in the country. All of this biometric data is part of a person’s hukou, a kind of registration system that holds everything from a person’s biometric data to their ethnicity, religious practices, and travel history.

Among the biometric data are audio samples.

We’ve reported on the Xinjiang province before. Also known as East Turkistan, the Xinjiang province is located at the crossroads of western Asia, Mongolia, Europe, and Russia, a strategic location for China’s Belt and Road Initiative. Xinjiang also has oil reserves, which make the desert province even more valuable to the Chinese government. However, since President Xi Jinping came to power in 2012, the Uyghurs and other religious minorities have been held hostage by digital surveillance technologies.

An extensive article on how the Xinjiang region has become ground zero for high-tech surveillance reports that Amina Abduwayit, who was required to provide a DNA sample, was also asked to give a voice sample to the Xinjiang police:

They gave me a newspaper to read aloud for one minute. It was a story about a traffic accident, and I had to read it three times. They thought I was faking a low voice.

Isobel Cockerell, “Inside China’s massive surveillance operation” at Wired

The voice sample that formed part of her biometric profile would allow authorities to pick out her voice in private phone conversations. Tapping phones is nothing new in criminal investigations, but Amina is not a criminal. She is a Uyghur businesswoman, which means algorithms are more likely to be listening in on her conversations for “key words” that may mark her as a terrorist or dissident.

Automatic Speaker Recognition Programs and iFlyTek

Automatic Speaker Recognition matches a voice to a person:

Government reports in the media claim that Automatic Speaker Recognition forensics have been used to match voice patterns to solve cases involving telecommunications fraud, drug trafficking, kidnapping, and blackmail. According to these same reports, it will also be applied for counterterrorism and “stability maintenance” purposes – terms authorities sometimes use to justify the suppression of peaceful dissent.

“China: Voice Biometric Collection Threatens Privacy” at Human Rights Watch

The list of key words that the software listens for is not made available to the public. There is little transparency and oversight of the government’s use of voice patterns.

The pilot program for voice pattern recognition started in the Anhui province, the corporate home of iFlytek, the AI company touted as the largest supplier of voice recognition software. Human Rights Watch reported in 2017 that voice samples are being collected from people in the Guangdong province, Anqi county in Fujian province, Wuhan city in Hubei province, and Nanjing city in Jiangsu province.

iFlytek began with speech command devices similar to familiar products such as Siri or Alexa. However, from these speech patterns, iFlytek has built sophisticated algorithms that can identify the speaker based on relatively short audio samples. The company has partnered with the Chinese Ministry of Public Security, a legal requirement under Xi’s cybersecurity laws, to supply police bureaus in several provinces with its voice recognition software. Furthermore, iFlytek’s website claims that its technology can recognize the Tibetan and Uyghur languages.

How Does Voice Recognition Work?

Real-time surveillance of phone conversations is thought to be limited to about fifty phone conversations at one time, although it is possible to record conversations and allow the algorithm to analyze the material later. But newer software can disentangle several voices talking at once, to identify a particular individual’s voice.

In the same way that facial recognition algorithms identify several key markers to distinguish one face from another, voice recognition systems identify certain factors that create a unique voice pattern, or “voiceprint.” Factors include accent, pronunciation, and cadence. But it turns out that our voices betray some physical features as well. Small differences in the sounds of our voices help the algorithm discern the size and shape of the larynx and the nasal passage, for example.
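As a rough illustration of how a voiceprint might be matched to a person, the sketch below treats each voiceprint as a short list of numbers and picks the enrolled speaker whose numbers point in the most similar direction. This is not iFlytek's method or any real system's API. The feature values, the names `cosine_similarity` and `identify_speaker`, and the 0.9 threshold are all assumptions made up for the example, and real systems derive far larger feature vectors from the audio itself.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def identify_speaker(sample, enrolled, threshold=0.9):
    """Return (name, score) for the closest enrolled voiceprint.

    The name is None when even the best score falls below the threshold,
    meaning the sample does not resemble anyone in the database.
    """
    best_name, best_score = None, -1.0
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(sample, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical three-number voiceprints standing in for traits such as
# cadence or pitch; the values are invented for illustration only.
enrolled = {
    "speaker_a": [0.9, 0.1, 0.3],
    "speaker_b": [0.2, 0.8, 0.5],
}

match, score = identify_speaker([0.85, 0.15, 0.35], enrolled)
unknown, unknown_score = identify_speaker([0.0, 0.0, 1.0], enrolled)
```

In this toy setup, the first sample lands close to `speaker_a`, while the second resembles neither enrolled voice closely enough and is left unmatched. Where to set the threshold is a trade-off: a lower value catches more true matches but also risks attributing a voice to the wrong person.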

At one time, voice pattern recognition algorithms needed several minutes of audio to identify a person, but improvements in pattern recognition and China’s growing database mean that the software can now work with much shorter audio samples. It is also possible that an algorithm capable of analyzing such nuances could be used to clone voices.

“Deep Voice”?

“Deep Fake” videos have featured in the news as a techno-dystopic tool. These fake, often salacious, videos superimpose images of a person’s face on an actor’s body. Their sophistication is growing: it can be difficult to tell a fake video from a real one, and such videos have become a nightmare for celebrities and fodder for blackmail.

“Deep Voice” can likewise take a voiceprint and create a fake quote that sounds like it came from a given person. Montreal startup Lyrebird made fake audio of President Obama praising its product. Lyrebird produced only a stilted, A.I.-generated voice, but more sophisticated systems that work with longer original audio clips can sound remarkably like the actual person. Additionally, some programs can change the accent or gender of the speaker, so that someone else appears to have said the same thing.

A major player in the field of voice cloning is Baidu, China’s Google counterpart. Given the Chinese government’s loose interpretation of “counterterrorism” actions, there is a concern that such voice cloning could be used to incriminate religious minorities or those who do not show appropriate loyalty to the governing CCP.

Further reading on high-tech surveillance in China by Heather Zeiger:

In China, high-tech racial profiling is social policy. For an ethnic minority, a physical checkup includes blood samples, fingerprints, iris scans, and voice recordings. The Chinese government seeks a database of everyone in the country, not only to track individuals but to determine the ethnicity of those who run up against the law. Heather Zeiger

The internet doesn’t free anyone by itself. China is testing 100% surveillance on the Uighurs, a strategically critical minority. Heather Zeiger

Further reading on data privacy:

Google is collecting data on schoolkids. Some say it’s okay because the firm supplies a lot of free software and hardware to schools.


Many parents ignore the risks of posting kids’ data online. The lifelong digital footprint, which starts before birth, makes identity theft much easier.

Heather Zeiger

Heather Zeiger is a freelance science writer in Dallas, TX. She has advanced degrees in chemistry and bioethics and writes on the intersection of science, technology, and society. She also serves as a research analyst with The Center for Bioethics & Human Dignity. Heather writes for bioethics.com and Salvo Magazine, and her work has appeared in Relevant, MercatorNet, Quartz, and The New Atlantis.
