^{Gary Smith
July 1, 2026

5

Business and Finance, Large Language Models (LLMs), Law}

Plagiarism and Defamation—Two More Bad Things LLMs Are Good At

_{The authors’ and publishers’ copyright claims against the chatbot developers are only one of the legal problems the industry faces}


When most people say “AI” today, they’re talking about large language models (LLMs) like ChatGPT, Gemini, and Claude. LLMs are relentless data miners that train on unimaginably large text databases, looking for multi-dimensional statistical relationships among small chunks of text called tokens.

Their training data include digitized news articles, scientific papers, books, and much of the Internet, including Wikipedia — mostly without the permission or even the knowledge of those who created the content. Many authors are understandably enraged but, so far, they have had limited success in arguing that they should be compensated for the use of their copyrighted material.

The Anthropic settlement

The biggest settlement to date involved Anthropic, which agreed to pay $1.5 billion ($3,000 per book) to the authors of 500,000 books used to train its LLMs. (Full disclosure: I am scheduled to receive $12,000 for four books, though I will believe it when I see it.) This was a limited win for authors in that a federal judge ruled that Anthropic’s misdeed was not that it had trained its models on copyrighted books but that it had wrongly acquired these books through piracy websites.

The companies behind the LLMs argue that they are not plagiarizing copyrighted material because their LLMs generate fresh content based on word patterns across all of the text they train on. In 2025, however, researchers found that a Meta LLM had memorized and could regurgitate most of J. K. Rowling’s first Harry Potter book, George Orwell’s Nineteen Eighty-Four, and other books.

In March 2026, Encyclopedia Britannica and Merriam-Webster filed a lawsuit against OpenAI, alleging that OpenAI used nearly 100,000 copyrighted online articles without permission to train its LLMs and that OpenAI violates copyright laws whenever its LLMs generate output that contains “full or partial verbatim reproductions” of their content.

As an author, I am keenly interested in how the many lawsuits working their way through legal systems inside and outside the United States will get resolved. Interestingly, the tech companies’ argument that the content generated by their LLMs is fresh and original exposes them to a different kind of legal liability.

Are LLMs Carriers or Publishers?

Traditionally, a distinction has been made between carriers and publishers. Because phone companies are carriers, they are not responsible for the content they transmit. Newspapers are publishers; they decide what they will publish and are responsible for published content that is defamatory or otherwise illegal.


But what are Internet companies like Google? Search engines have traditionally been viewed as “neutral intermediaries” that are similar to phone companies and not liable for the content on the web pages they display or link. They didn’t write or publish it; they merely informed users of its existence.

LLMs are different because they do generate and publish content. In May 2026, a German regional court issued a temporary injunction banning Google’s AI Overviews from continuing to display false statements about two Munich-based publishers, including “[it] is known for dubious business practices and is often perceived as a scam.” None of AI Overviews’ false statements were in the sources that AI Overviews cited. They were full-blown AI hallucinations.

Tested in a German court

The publishers sent Google a cease-and-desist letter, which it ignored, and then sued, asking for an injunction to stop further defamation and corporate reputational damage. The German court ruled that AI Overviews was more like a newspaper than a phone company because it generates “independent, new, and substantive statements.” Since Google is the “author and publisher” of AI Overviews, it is directly responsible for its content.

Ironically, Google argued that “users can check for themselves” and that people generally know “that information generated with AI should not be blindly trusted.” If true, that is pretty much an admission that AI Overviews is close to useless.

In rejecting Google’s argument that users can fact-check, the court noted that almost nobody checks the links; indeed, the whole point of AI Overviews is to save users the trouble of reading through numerous links. Furthermore, the fact that users can theoretically check AI Overviews’ statements does not shield Google from liability, just as newspapers can be held liable for false headlines that might be contradicted by the actual news stories.

As for financial consequences, Google was ordered to pay 80% of the legal costs and to pay penalties of up to €250,000 per violation for future false statements about these publishers. Beyond this cease-and-desist defeat, Google is also now exposed to civil lawsuits based on the financial and reputational harms caused by AI Overviews.

The same logic clearly applies to medical, financial, legal, and other advice offered by ChatGPT, Gemini, Claude, and other LLMs. If this ruling survives Google’s appeal to higher courts, LLM developers may be forced to severely restrict the scope of LLM answers, or shut them down entirely throughout the European Union and perhaps the entire world.