How to easily violate a written work’s copyright protection: Make a duplicate copy of it. A photocopy will do. Fishing for a criminal copyright infringement prosecution? Make many copies and sell them.
Word-for-word copying of an entire book or article is an obvious violation. Copying significant parts usually violates the law as well. The rule has various exceptions, but wholesale copying cases like these are the easy ones.
Spotlight 2023. The Authors Guild and other creative writers are suing OpenAI (and related entities) for teaching ChatGPT how to copy the writers’ articles and books and then generate “derivative works,” i.e., “material that is based on, mimics, summarizes, or paraphrases” professional writers’ works and harms the market for them. ChatGPT can create sequels or “next episodes” of short stories, screenplays, and novels that were written and published by human authors. It sure seems like some kind of violation of the writers’ rights, but are there copyright infringements involved?
The Authors Guild’s lawsuit makes several claims, but winning the copyright infringement claim requires proving two things: (1) the plaintiffs owned the copyrights to the written materials in question; and (2) the alleged infringers did in fact “copy” the materials without permission, thus violating the copyright protection statutes.
Did OpenAI via ChatGPT Make Copies Unlawfully?
Regarding written works, the copyright laws protect authors’ exclusive right to make copies of the text they’ve created. You write a letter, article, book, website copy, or even a tweet, and the laws protect your copyrights. You must register the copyrights if you want to sue infringers in court, but the right itself attaches upon creation.
When ChatGPT produces, say, a sequel to your novel, or five variations on the promotional copy you wrote for a webpage, has ChatGPT infringed your original copyrighted work? The answer depends first upon whether ChatGPT “copied” your original work.
The Authors Guild’s formal complaint supplies evidence showing ChatGPT has made copies of nearly every bit of text on the Internet. In fact, to achieve its astounding AI goals, ChatGPT must have made those copies. Here’s why.
Computer scientist Stephen Wolfram’s 2023 book, What Is ChatGPT Doing … and Why Does It Work?, describes what ChatGPT does:
[W]hat ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far, where by “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of web pages, etc.”
Wolfram’s description presupposes that ChatGPT had “whatever text it’s got so far,” and that ChatGPT had “seen what people have written on billions of web pages.” As a computer software system, ChatGPT can “see what people wrote” only by reading and processing data stored in computer memory.
[W]hen ChatGPT does something like write an essay, what it’s essentially doing is just asking over and over again “given the text so far, what should the next word be?”— and each time adding a word.
ChatGPT identifies likely candidates for the “next word” by consulting its Large Language Model (LLM), which was “successfully trained” on “a few hundred billion words of text.” Wolfram writes: “Some of the text it was fed several times, some of it only once. But somehow it ‘got what it needed’ from the text it saw.”
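The “one word at a time” loop Wolfram describes can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the table NEXT_WORD_PROBS and the function continue_text are hypothetical, the probabilities are hand-written, and the real ChatGPT computes its candidates with a neural network rather than a lookup table.

```python
import random

# Hypothetical next-word probabilities, hand-written for illustration only.
# A real LLM derives such probabilities from training; it does not store a table.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"down": 1.0},
}

def continue_text(words, max_new_words, rng):
    """Repeatedly ask: given the text so far, what should the next word be?"""
    words = list(words)
    for _ in range(max_new_words):
        candidates = NEXT_WORD_PROBS.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word, so stop
        choices = list(candidates)
        weights = [candidates[w] for w in choices]
        # Pick one candidate at random, weighted by its probability.
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return words

result = continue_text(["the"], 5, random.Random(0))
print(" ".join(result))
```

The sketch only restates the quoted description: each pass through the loop looks at the text so far and appends one more word.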
The only way ChatGPT could “see” and be “trained” on billions of words of text is for that text to be stored in computer memory, which for that quantity in practice means persistent storage such as a hard disk. The training software breaks the text down into tokens and learns how likely words and phrases are to follow one another. The Authors Guild says ChatGPT is “trained” by accessing the full text of vast numbers of books, screenplays, articles, etc., on the Internet, whether obtained from legal sites or pirate sites. OpenAI reportedly has admitted that full texts of books were copied so ChatGPT could “learn” from high-quality English prose.
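A deliberately simplified sketch shows why training of this kind requires the text itself to be on hand. The functions tokenize and train_bigram_probs and the three-sentence corpus are hypothetical, and counting word pairs is a stand-in technique, not OpenAI’s actual training method. The point is only that the program cannot compute any probabilities without first holding a copy of the text.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Break text into lowercase word tokens (a crude stand-in for real tokenizers)."""
    return re.findall(r"[a-z']+", text.lower())

def train_bigram_probs(corpus_text):
    """Estimate, for each word, the probability of each word that follows it."""
    tokens = tokenize(corpus_text)
    counts = defaultdict(Counter)
    # Count every adjacent pair of tokens in the stored text.
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    # Convert raw counts into probabilities.
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }

# The training text must exist as data the program can read: a copy.
corpus = "The cat sat. The cat ran. The dog sat."
probs = train_bigram_probs(corpus)
print(probs["the"])
```

In this toy corpus, “the” is followed by “cat” twice and by “dog” once, so the estimated probabilities for the word after “the” are two-thirds and one-third.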
The legal issue is: Did OpenAI make unpermitted copies of the professional authors’ written works? The short answer is: yes.
Unlawful Copying is No Mysterious Process
The Copyright Act defines “copies” as “material objects, … in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device.” A text or photographic file stored on a computer disk is a “copy.” The Ninth Circuit Court of Appeals, in Perfect 10, Inc. v. Amazon.com (2007) stated the rule: If you store an electronic file of information on disk, you have a “copy” of that information. The same court reaffirmed the rule in Hunley v. Instagram, LLC (2023). A 1998 D.C. Circuit decision confirmed that the process of loading data from disk into memory amounts to making a “copy.” Bottom line: If you can read text stored on your computer disk, then you have a copy of the text.
ChatGPT was “trained” by reading text copied from the Internet, whether the sources were legal or illegal possessors of the materials. On these facts, OpenAI and ChatGPT violated the authors’ copyrights by making unauthorized copies. The Authors Guild complaint even quotes statements in which OpenAI or ChatGPT itself admitted those facts, e.g., “we draw on our experience in developing cutting-edge technical AI systems, including by the use of large, publicly available datasets that include copyrighted works.”
It appears the Authors Guild has adequately alleged that OpenAI, et al., made and used unlawful copies of copyright-protected written works. How OpenAI will defend against the actual physical (computer) “copying” claim remains to be seen. OpenAI will likely raise a “fair use” defense. As the proceedings uncover additional evidence, the case may well produce a nationally significant precedent.
What About the ChatGPT-Written Sequels and Derivatives?
Another approach to proving copyright infringement exists: showing the “copying of constituent elements of the work that are original.” If a court finds “substantial similarity” between the original and the accused work, the court can decide the accused work has infringed the copyrights of the original’s author. The Authors Guild complaint describes how ChatGPT can and does generate works by exploiting human authors’ books and articles. Are these products “substantially similar” and thus infringing? Thinking about how this analysis might play out is the subject of my next article.