Mind Matters Natural and Artificial Intelligence News and Analysis

Did China’s DeepSeek Violate OpenAI’s Legal Rights?

“Distillation” technology may have allowed DeepSeek to piggy-back on ChatGPT to capture market share

Suppose you are OpenAI, the developer of ChatGPT. You’ve invested perhaps $500 million or more in it. Training the latest model, GPT-4, has cost another $100 million or more. The training scoured and processed about a petabyte of Internet-based data. That’s equivalent to a million gigabytes, or about 500 billion pages of text in English.
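The size comparison above can be checked with a few lines of arithmetic. This is only a rough sketch: the figure of 2,000 bytes per page of English text is an assumption chosen for illustration (it is not stated in the article), and decimal units are used throughout.

```python
# Rough check of the petabyte comparison, using decimal (SI) units.
PETABYTE_BYTES = 10**15
GIGABYTE_BYTES = 10**9

# ASSUMPTION (not from the article): one page of plain English text
# takes about 2,000 bytes, i.e. roughly 2,000 characters.
BYTES_PER_PAGE = 2_000

gigabytes_per_petabyte = PETABYTE_BYTES // GIGABYTE_BYTES
pages_per_petabyte = PETABYTE_BYTES // BYTES_PER_PAGE

print(f"{gigabytes_per_petabyte:,} gigabytes")
print(f"{pages_per_petabyte:,} pages")
```

Under that page-size assumption, the result matches the article’s figures of a million gigabytes and about 500 billion pages; a different page size would scale the page count proportionally.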


Then comes DeepSeek, a Chinese-owned, China-based system much like ChatGPT. But instead of building all the infrastructure and doing all the training from the bottom up, DeepSeek has used “distillation” to catch up with ChatGPT at a fraction of the cost. Hey, no fair, right?

Was Information Stolen?

When DeepSeek became an AI player, OpenAI said it believed that DeepSeek had essentially stolen data and text from ChatGPT. It viewed that as copyright infringement.

Last week, an AI and Internet watchdog site published “Microsoft and OpenAI Investigate Whether DeepSeek Illicitly Obtained Data from ChatGPT.” It reported that both Microsoft and OpenAI are investigating whether DeepSeek accessed OpenAI’s data through its application programming interface (API) without authorization. The Financial Times learned that OpenAI has evidence of data theft, and U.S. officials suspect DeepSeek used OpenAI’s outputs to train its own model by a process called “distillation.”

The initial charge against DeepSeek is data theft that could also amount to copyright infringement. Indeed, using distillation, DeepSeek may have successfully mimicked ChatGPT.

In simplified terms, here’s how distillation works, using the ChatGPT example:

OpenAI trains the large language model (LLM) behind ChatGPT on an exhaustive sweep of Internet text, and that model generates the responses to the questions users pose. Another LLM system, such as DeepSeek, then asks ChatGPT a billion questions (a figurative number; the real count might be far higher) and uses ChatGPT’s answers as the starting point for building its own knowledge base.

Distillation vastly shortens the “training” time for an LLM. In this way, DeepSeek could get a head start toward matching ChatGPT without first conducting all the ground-level research and processing.
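As a rough illustration of the process just described, here is a toy Python sketch. Everything in it is hypothetical: `toy_teacher` stands in for a large model such as ChatGPT and is not a real API, and `ToyStudent` merely stores question-and-answer pairs and matches new questions by word overlap. Real distillation would instead train a neural network on the teacher’s outputs, and nothing here reflects how DeepSeek or OpenAI actually operate.

```python
import re
from typing import Callable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_teacher(question: str) -> str:
    """Hypothetical stand-in for a large 'teacher' model; not a real API."""
    answers = {
        "capital of france": "Paris",
        "largest planet": "Jupiter",
        "boiling point of water in celsius": "100",
    }
    return answers.get(question.lower().strip("? "), "I don't know")


def collect_training_pairs(
    teacher: Callable[[str], str], questions: List[str]
) -> List[Tuple[str, str]]:
    """Step 1: ask the teacher each question and record its answer."""
    return [(q, teacher(q)) for q in questions]


class ToyStudent:
    """Step 2: a 'student' that learns only from the teacher's answers."""

    def __init__(self) -> None:
        self.memory: List[Tuple[Set[str], str]] = []

    def train(self, pairs: List[Tuple[str, str]]) -> None:
        # Store each question as a set of words alongside the teacher's answer.
        for question, answer in pairs:
            self.memory.append((set(tokenize(question)), answer))

    def answer(self, question: str) -> str:
        # Reply with the stored answer whose question shares the most words.
        words = set(tokenize(question))
        best = max(self.memory, key=lambda item: len(words & item[0]))
        return best[1]


questions = [
    "Capital of France?",
    "Largest planet?",
    "Boiling point of water in Celsius?",
]
pairs = collect_training_pairs(toy_teacher, questions)

student = ToyStudent()
student.train(pairs)
print(student.answer("What is the capital of France?"))
```

The student never sees the teacher’s original sources; it sees only the teacher’s answers. That is the sense in which distillation lets a newcomer skip the ground-level work.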

Copyright Violation?

We’ve previously looked at whether AI systems can infringe copyrights. Basically, a human person creates written text that, under U.S. law, is protected by copyright. When another person makes a copy of that text without permission, the copier infringes the creator’s right.

Now consider what happens when DeepSeek is educating itself by asking ChatGPT a question and getting a text answer. What person creates the written text of that answer? No human person writes ChatGPT’s text answers. Thus no person holds a copyright to the answers.


Therefore, if DeepSeek received, processed and stored ChatGPT’s billions of answers to questions, DeepSeek was not violating a person’s copyrights. That’s the simple answer.

Of course, lawyers might develop a complicated argument saying the owners or programmers of ChatGPT should own the written text the bot generates by itself. Such an argument stretches the traditional meaning of copyright, however, and courts might well reject it. The U.S. Congress could certainly change the statutes to apply copyright protection to chatbot-generated text and images, but that hasn’t happened yet.

Terms of Use Violation?

Reportedly, Microsoft’s investigation found unusually high volumes of data retrieval by users that might be linked to DeepSeek. OpenAI allows users a substantial but not unlimited amount of access to ChatGPT, and its terms of use prohibit mining information from ChatGPT to build a competitor.

Microsoft’s security team detected a group believed to have ties to DeepSeek extracting huge amounts of data from OpenAI software that allows developers access to proprietary models for a fee. Microsoft researchers detected an unusually high volume of data retrieval, which violates OpenAI’s terms of service and suggests an attempt to circumvent its restrictions.

ChatGPT’s published terms of service prohibit all ChatGPT users, human or otherwise, from:

(1) reverse engineering OpenAI software to learn its models and algorithms;

(2) using automated bots or programs to extract data or ChatGPT outputs;

(3) exceeding or bypassing data transfer rate limits; and

(4) “using ChatGPT’s outputs to create competing AI models.”

If DeepSeek or any other AI system used “distillation” techniques to pirate information from ChatGPT, it violated the terms of service. Distillation uses an automated bot to transfer vast quantities of data at relatively high speeds. And if DeepSeek were distilling from ChatGPT, it would be using ChatGPT’s outputs to create a competing AI system. These violations would constitute a straightforward breach of contract.

AI Foments a Crash of Public Policies

Copyright laws aim to achieve two main goals:

(1) to recognize and credit unique human creativity, and thus support each human creator’s rights to the products of creative intellectual effort; and

(2) to encourage human creativity and initiative to produce intellectual work products that benefit and uplift society generally.

Chatbots can now produce works that look like human creative works. Granting copyright protection to chatbots’ products undermines the policy goal of recognizing and protecting human creativity. But granting bot products copyrights does encourage people to build ever “better” bots whose work is “better” than human products, arguably serving the policy of benefiting society. That assumes the bot products are truthful, not deceptive, and do not suppress facts and views.

Personally, I would advise against treating AI products as legally equivalent to human products. AI produces any product only because of human creativity, reflecting the spark of the Divine. Laws must never treat humans as the meat version of AI machines.


Richard Stevens

Fellow, Walter Bradley Center on Natural and Artificial Intelligence
Richard W. Stevens is a retiring lawyer, author, and a Fellow of Discovery Institute's Walter Bradley Center on Natural and Artificial Intelligence. He has written extensively on how code and software systems evidence intelligent design in biological systems. Holding degrees in computer science (UCSD) and law (USD), Richard practiced civil and administrative law litigation in California and Washington D.C., taught legal research and writing at George Washington University and George Mason University law schools, and specialized in writing dispositive motion and appellate briefs. Author or co-author of four books, he has written numerous articles and spoken on subjects including intelligent design, artificial and human intelligence, economics, the Bill of Rights and Christian apologetics. Available now at Amazon is his fifth book, Investigation Defense: What to Do When They Question You (2024).