DeepSeek: Homing In on the Challenges It Presents
Although the program is admirably streamlined, censorship, data breaches, copyright violation, and a lack of guardrails are among its most prominent challenges.

China’s AI platform DeepSeek launched in January, and it seems like every day since then has brought another news report of its censorship of topics that the Communist Party of China finds taboo, its data breaches, and its lack of guardrails against dangerous output. Among the more controversial questions are whether DeepSeek copied ChatGPT, using a technique called “distillation,” and how exactly DeepSeek obtained the microchips necessary to produce the product, given U.S. sanctions.
As engineers and lawyers unpack these issues, let’s take a bird’s-eye view of some of the ethical concerns surrounding DeepSeek. These concerns apply not just to DeepSeek but to other technologies and AI models originating in China.
DeepSeek and the TikTok Problem
DeepSeek’s terms of service specify that user data will be sent back to China. Under Chinese law, the government is allowed to confiscate that data and use it however it wishes. In other words, in China, data privacy laws apply to companies, but the government operates outside the law. Furthermore, DeepSeek collects data from the user’s device that is not necessary for its operation. In short, while all apps collect data, not all apps’ granular user data is at the disposal of the Chinese government to use without restrictions.

Additionally, several journalists have reported DeepSeek censoring information that the Communist Party of China considers taboo. The Wall Street Journal did a side-by-side comparison of ChatGPT and DeepSeek, asking about the “Three T’s”: Tibet, Taiwan, and Tiananmen Square. They also asked about China’s socialist system, which sometimes incorporates capitalism. Another journalist asked about Xi Jinping and some of the subtle jabs at China’s leader, such as the comparison with Winnie-the-Pooh.
In each case, DeepSeek either spouted the CCP’s preferred narrative, as seen in Chinese state-backed media, or evaded the question entirely. When asked about Taiwan, ChatGPT gave a nuanced answer, while DeepSeek repeated the CCP’s preferred narrative that most of the global community sees Taiwan as part of China. When asked about Tiananmen Square on June 4, 1989, DeepSeek asked to talk about something else. ChatGPT, by contrast, provided the historical context of the pro-democracy movement and the Chinese army being turned against its own citizens.
On the issues of data privacy and serving as a propaganda outlet for Beijing, DeepSeek has the same problems as TikTok.
The Difference between DeepSeek and TikTok
The key difference between DeepSeek and TikTok is TikTok’s recommendation algorithm. As I covered in an earlier article, TikTok’s algorithm will take the user into the darker recesses of the internet because shocking material keeps people engaged. DeepSeek is different: it isn’t recommending content so much as responding to user input.
However, the conversations that the user has with DeepSeek’s chatbot are saved. Those conversations can take an insidious turn, especially because, as researchers from Cisco and the University of Pennsylvania discovered, DeepSeek failed 100% of the safety tests they conducted.
This brings up an oddity about DeepSeek. While the AI model is very careful about curtailing topics that would be offensive to the Chinese Communist Party, it apparently has few guardrails for providing information on how to make a weapon, modify the bird flu virus, or instruct teens on self-harm.
DeepSeek Is Both Innovative and Suspicious
Chinese companies have a notorious track record of stealing intellectual property and co-opting it for themselves. But they are also innovative at streamlined technologies by using fewer parts and thus decreasing costs.
Michael Schuman, writing for The Atlantic from Beijing, asks, “Which DeepSeek is the real DeepSeek? The plucky innovator or the unethical swindler?” He says the answer is both.

The most impressive feature of DeepSeek is its streamlining. The developers apparently used fewer chips and less money than others in the tech world thought possible. Chinese manufacturers have done this in other areas, such as agriculture, as well. However, DeepSeek may have bypassed the research and development costs that early adopters of new technologies often bear. Schuman notes,
Chinese companies have proved to be skillful inventors, capable of competing with the world’s best, including Apple and Tesla. And they have also proved adept at copying and stealing technology they don’t have, then turning it against the rivals that created it. Making a product on the cheap is much easier when you don’t have to invest in developing it from scratch.
“DeepSeek and the Truth About Chinese Tech,” The Atlantic, February 4, 2025
When DeepSeek first came out, engineers asked it what model it was. Apparently, DeepSeek answered “ChatGPT.” It has also referred to itself as ChatGPT when asked to compare itself to other AI models, like Gemini. Furthermore, as Gregory Allen, director of the Wadhwani AI Center at the Center for Strategic and International Studies, told the AP, DeepSeek claims to be open source but does not reveal the data used to train the model, likely because it used ChatGPT outputs to do so.
Issues Around “Distillation”
All this points to the strong possibility that DeepSeek used a technique called “distillation,” which violates OpenAI’s terms of service. Distillation uses one AI model to build another, which sounds more dystopic than it really is. Rather than extracting the original model’s parameters directly, programmers pose thousands of carefully designed questions to a large model like GPT-4 and record its answers. A smaller model is then trained on those question-and-answer pairs until it approximates the larger model’s behavior, inheriting much of its capability at a fraction of the development cost.
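The mechanics can be illustrated with a toy sketch in pure Python. This is entirely hypothetical and vastly simplified (real distillation involves neural networks and enormous prompt sets), but it shows the essential point: the “student” never sees the “teacher’s” internal parameters, only its answers.

```python
import random

random.seed(0)

def teacher(x):
    # The proprietary model: its internal coefficients (2.0 and 1.0)
    # are hidden from the student, which can only query it.
    return 2.0 * x + 1.0

# Step 1: probe the teacher with many queries and record its answers.
queries = [random.uniform(-5, 5) for _ in range(1000)]
answers = [teacher(q) for q in queries]

# Step 2: fit a smaller "student" model to the (query, answer) pairs
# by stochastic gradient descent on squared error.
w, b = 0.0, 0.0
lr = 0.01
for _ in range(200):
    for x, y in zip(queries, answers):
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err

# The student now imitates the teacher's behavior without ever having
# had access to the teacher's parameters.
print(round(w, 2), round(b, 2))  # ≈ 2.0 1.0, the teacher's hidden coefficients
```

The design choice worth noticing is that everything the student learns flows through the teacher’s outputs, which is why a distilled model can sound uncannily like the original, and why it cannot know more than the original.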

Reverse engineering is an ethically gray area. Normally it’s unethical to reverse engineer a product, make a new product based on the proprietary design of the original, and then sell it. But reverse engineering is not technically illegal, and in this instance, it is hard to prove.
Another problem with distillation is the loss of information. Because a copy is being made of an original model, something is lost in the process. And even if it were possible to retain every bit of information, a copy cannot increase information. The copy remains reliant on the original model.*
For example, when an AI like ChatGPT learns from articles that were themselves written by ChatGPT, the quality of subsequent outputs degrades until they become unintelligible. When ChatGPT first came out, several analysts questioned whether these AI models are self-defeating: a World Wide Web filled with algorithmically generated content decreases the amount of original content available to train an algorithm. AI requires original input of information to maintain itself.
Of course, several people have pointed out the irony that OpenAI trained ChatGPT on copyrighted materials, which violates many news outlets’ terms of service. I have written elsewhere about how AI itself mimics reading and writing, producing a diminished version of real reading and writing (done by humans).
We shall see whether DeepSeek holds its own against other AI models. It has certainly brought to the forefront questions about the value of investing in proprietary AI models and intellectual property.
*My appreciation to George Montañez whose presentation “Why Humans Can’t Be Replicated by AI” at The Center for Science & Culture’s 2025 Dallas Conference on Faith & Science helped clarify the limitations of algorithmic mathematical models, why degradation occurs, and how distillation works.