Can AI Legally Be Trained Using All the Books in the World?
Judge rules in support of chatbot Claude, except when training materials are pirated
Here’s a 21st-century question our grandparents never could have pondered:
Is it legal to acquire practically “all the books in the world,” download or scan them to save on computer disk, and use their texts to train a large language model (LLM) to power an artificial intelligence (AI) system?
A federal district court judge in California had to decide that question under U.S. copyright law in Bartz v. Anthropic PBC (June 23, 2025).
The facts that the court had to consider
Anthropic is the AI company that offers the chatbot Claude, which competes with ChatGPT. Claude aims to provide natural, text-based conversations, but also provides document summarization, editing, decision-making, code-writing, and more.

According to the court’s 32-page decision, Anthropic trained Claude using an enormous database stored in a central digital library that aimed to contain and save “all the books in the world” forever. Anthropic assembled the library in part by downloading, for free, millions of copyrighted books in digital form from internet pirate sites. Anthropic also purchased physical books, tore off the covers, and scanned every page into digital files.
To put that in perspective: one million books would contain roughly 80 billion words in total, which works out to an average of about 80,000 words per book. Claude was trained on many millions of books, giving its LLM a body of training text far larger than any person could read in a lifetime.
Anthropic’s construction of the library and its use of both purchased and pirated book texts prompted three authors to sue Anthropic for infringing the copyrights in their books. If these authors win their case, the result would likely encourage other authors to sue as well. AI companies could then face extensive litigation that might deter their chatbot and other projects.
Anthropic’s “Fair use” motion
In nearly all cases, a book’s author owns the copyrights in that work. For books, those rights include: (1) reproduction, i.e., making copies; (2) making derivatives; and (3) distributing copies by selling, renting, lending or donating them. A person who makes copies or derivatives, or who distributes copies, without permission, commits copyright infringement.
There’s one key exception, however: copying that qualifies as fair use. Under 17 U.S.C. § 107, the copyright statute expressly states that the fair use of a work for purposes such as “teaching (including multiple copies for classroom use), scholarship, or research” is not an infringement of copyright.
Anthropic admitted that it made the copies of the books and stored them in its central library. But Anthropic tried to defeat the authors’ infringement claims by filing a motion arguing that its copying of the books and its use of them for training were fair uses.
The court considered the four statutory fair use factors: (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the use’s effect on the potential market for or value of the work.
Library copies, yes – Pirated copies, no
Anthropic partly won and partly lost its fair use defense arguments. Following statutes and precedents, the court made three main rulings.

First, Anthropic’s purchasing of books and tearing them apart to make and store digital copies of the text was a fair use. That conclusion rests on the observation that Anthropic did what anyone can do: buy a book and make a copy for storage that replaces the discarded physical book.
Second, using copies of books stored in the central digital library to train Claude was a fair use. Doing so is akin to using a book to teach other people its contents. Such use can also fall under the Section 107 language that treats fair use for purposes such as “teaching, scholarship, or research” as non-infringing.
Anthropic’s third argument, however, fared poorly. The pirated digital book copies sank Anthropic on a fundamental point. Declared the court: “The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use.” Training Claude using lawfully acquired book copies would be a fair use, but obtaining copies from pirate sites infringed the authors’ copyrights.
The last word?
The district court’s decision on fair use in Bartz v. Anthropic PBC does not end the lawsuit. Court rules generally do not allow either party to appeal the decision at this stage because it is not a final ruling, although in truly exceptional cases the appellate courts can hear an “interlocutory” appeal. The parties might negotiate to settle the case and avoid a trial and future appeals.
Additionally, it is unclear whether the decision will be officially published, although it may be used by other courts as persuasive analysis anyway.
At minimum, the decision tends to encourage AI developers that train their systems on other people’s written works, so long as those works are either paid for or in the public domain. This court, at least, has approved using lawfully acquired works to train LLMs as a non-infringing fair use. One wonders, however, why Anthropic’s legal counsel ever thought that the company could copy pirated copyrighted works, keep them in long-term storage for possible LLM training, and escape liability.
Yet, who really knows whether the federal Ninth Circuit Court of Appeals or even the Supreme Court might rule in Anthropic’s favor on the pirate-sourced copies? Stay tuned.