ChatGPT, the chatbot developed by OpenAI, might seem like it generates novel content, but it depends on what’s generally called “scraping.” Its underlying language model was trained on vast amounts of pre-existing material collected from the Internet, and it draws on that training to respond to the prompt a human user enters.
Not surprisingly, the folks who put things on the Internet for a living, like writers and artists, haven’t taken kindly to AI’s online sleuthing. In fact, a number of artists, writers (including George R. R. Martin, Jonathan Franzen, and John Grisham), and even news outlets have sued OpenAI, alleging copyright infringement. What’s fascinating, though, is that OpenAI hasn’t tried to dodge the allegation but freely admits that ChatGPT depends on copyrighted material to function. If the company couldn’t scrape words and images from preexisting copyrighted sources, it would have a pretty meager pool to draw from. A new article from The Telegraph has the story, with James Titcomb and James Warrington writing:
In evidence submitted to the House of Lords communications and digital committee, OpenAI said: “Because copyright today covers virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today’s leading AI models without using copyrighted materials.
“Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”
OpenAI said it complies with all copyright laws when training its models and that “we believe that legally copyright law does not forbid training”.
– OpenAI warns copyright crackdown could doom ChatGPT (msn.com)
Noam Chomsky has called ChatGPT “high-tech plagiarism.” The issues surrounding its tendency to basically steal preexisting content will likely continue to mount for the AI company unless it can show that what ChatGPT is doing is indeed “fair use.”