Is it possible to violate the copyright on a written work without actually copying the exact words in it? Yes. And that fact points up how ChatGPT can trample human authors’ rights to their creative work products.
The previous article, Authors Guild Sues OpenAI for Unlawful Copying of Creative Works, described the lawsuit filed by The Authors Guild and many individual writers against OpenAI (and related defendants) for having taught ChatGPT how to copy the writers’ articles and books and then to generate “derivative works.” The lawsuit first charges that OpenAI made unauthorized copies of billions of words of text, including likely thousands of entire books and articles, to use as training materials for ChatGPT. Making such copies would ordinarily violate the authors’ copyrights, and the violation would be visible as literally duplicated text.
Stealing Authors’ Creative Content
The Authors Guild Complaint charges more than unauthorized physical copying, however. People are using ChatGPT to mimic established authors’ style and content, drawing upon authors’ works to produce sequels, next episodes, and promotional materials. Moreover, the Complaint says, “Until recently, ChatGPT provided verbatim quotes of copyrighted text. Currently, it instead readily offers to produce summaries of such text. These summaries are themselves derivative works.”
The Complaint also alleges: “ChatGPT creates other outputs that are derivative of authors’ copyrighted works.” Reportedly, businesses are sprouting to teach users how to prompt ChatGPT to “enter the world of an author’s books and create derivative stories within that world.” Thus, “ChatGPT is being used to generate low-quality ebooks, impersonating authors, and displacing human-authored books.” Some such books were found for sale on Amazon falsely claiming authorship by the impersonated human.
Original creative ideas count, so copyright law covers more than just an author’s exact words and phrases. As the federal First Circuit Court of Appeals, in TMTV v. Mass Productions (2011), confirmed (italics added):
Infringement can occur where — without copying a single line — the later author borrows wholesale the entire backdrop, characters, inter-relationships, genre, and plot design of an earlier work.
The TMTV decision followed U.S. Supreme Court precedent, Feist Publications v. Rural Telephone Service Co. (1991), which treated originality, meaning the author’s own creative contribution, as the core of copyright protection. Feist held that proving infringement requires showing “copying of constituent elements of the [original author’s] work that are original.”
Knowing Originality When We See It
To decide whether “original constituent elements” have been copied, federal courts use legal tests such as the “substantial similarity test.” To determine whether two works are substantially similar, courts may use the “ordinary observer” test. Another First Circuit precedent explained: “The test is whether the accused work is so similar to the plaintiff’s work that an ordinary reasonable person would conclude that the defendant unlawfully appropriated the plaintiff’s protectable expression by taking material of substance and value.”
Figuring out whether there is “substantial similarity” between two books or screenplays, for example, is not easy. Judicial panels in the First Circuit and Second Circuit thus had to dive deep into comparing the literary works in their cases, keeping in mind:
[T]he essence of infringement lies in taking not a general theme but its particular expression through similarities of treatment, details, scenes, events, and characterization.
ChatGPT: Designed to Mimic
The Guild’s Complaint details how individual authors have lost revenue opportunities, and in some cases practically entire careers, as people switch away from paying writers and instead use ChatGPT to create human-like text. But remember: ChatGPT does not think and does not create; it copies and mimics human writers’ work.
Computer scientist Stephen Wolfram’s 2023 book, What Is ChatGPT Doing … and Why Does It Work?, explains how ChatGPT’s neural net is “trained”:
Essentially what we’re always trying to do is to find weights that make the neural net successfully reproduce the examples we’ve given. And then we’re relying on the neural net to “interpolate” (or “generalize”) “between” these examples in a “reasonable” way.
[G]enerally neural nets need to “see a lot of examples” to train well. And at least for some tasks, it’s an important piece of neural net lore that the examples can be incredibly repetitive. And indeed it’s a standard strategy to just show a neural net all the examples one has, over and over again. In each of these “training rounds” (or “epochs”) the neural net will be in at least a slightly different state, and somehow “reminding it” of a particular example is useful in getting it to “remember that example.”
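The idea Wolfram describes (adjust weights until the examples are reproduced, show the same examples over and over, then rely on the result to fill in between them) can be sketched with a toy model. The Python below is only an illustration under loud assumptions: it fits a two-weight straight line rather than a neural net, and the function names, learning rate, epoch count, and example data are all hypothetical. It is not how OpenAI trains ChatGPT.

```python
def train(examples, epochs=500, learning_rate=0.05):
    """Find weights (w, b) so that w * x + b reproduces the given examples."""
    w, b = 0.0, 0.0
    for _ in range(epochs):          # each full pass over the examples is one "epoch"
        for x, y in examples:        # the same examples are shown again every epoch
            error = (w * x + b) - y  # how far the current output is from the example
            w -= learning_rate * error * x  # nudge each weight to shrink that error
            b -= learning_rate * error
    return w, b


def predict(w, b, x):
    """Apply the learned weights to an input, including one never shown in training."""
    return w * x + b


# Hypothetical training examples, all lying on the line y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(examples)

# x = 1.5 was never shown; the model "interpolates" between x = 1 and x = 2.
print(round(predict(w, b, 1.5), 2))
```

The point of the sketch is that nothing in the loop stores the examples themselves; what persists after training is a set of weights tuned so that the examples can be reproduced and blended.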
In a nutshell, training ChatGPT’s neural network means flooding it with human text examples repeatedly so that it “learns” to “generalize” what the human texts say and then “remember” the salient aspects of those texts. And always recall what Wolfram said about ChatGPT:
[W]hat ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far, where by “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of webpages, etc.”
Neural network training thus aims at producing the text that “one might expect someone to write” after reading other authors, which makes ChatGPT well suited to creating derivatives of human-written works. Simply stated, ChatGPT is trained on authors’ works so that a human can ask it to mimic the original authors’ content and style.
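Wolfram’s “reasonable continuation” can also be shown in miniature. The sketch below is a deliberately crude stand-in, not ChatGPT’s method: it counts which word follows which in a tiny made-up corpus and always appends the most frequent follower, whereas ChatGPT uses a large neural network over a vastly larger body of text. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_bigram_counts(text):
    """Count, for every word in the text, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def continue_text(counts, prompt, max_words=4):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:            # this word was never followed by anything
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical "training text" standing in for the human-written sources.
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat on the sofa ."
counts = build_bigram_counts(corpus)

print(continue_text(counts, "the"))  # the cat sat on the
```

Every word the sketch produces comes from patterns in the text it was given, which is the feature of the technology that the Complaint’s derivative-works theory relies on.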
Copyright law gives authors the exclusive right to prepare derivative works, such as condensations, abridgments, translations, fictionalizations, and dramatizations. The statutory definition is broad: it also covers “any other form in which a work may be recast, transformed, or adapted.”
ChatGPT, designed to digest human authors’ works in order to recast, transform, or adapt them, is practically a lightning-speed derivative-breeding machine. The Guild’s Complaint spotlights the clash between true human creativity and powerful machines that mimic human thought by exquisitely stealing from humans’ work products. How the U.S. courts will apply the copyright laws, treating human writers either as valuable individual creators or as mere data sources for AI systems, remains to be seen.