Mind Matters Natural and Artificial Intelligence News and Analysis

Hidden watermarks may help detect AI-written texts


At Nature, Elizabeth Gibney reports on a new technique, developed by Google's DeepMind group, that invisibly labels AI-generated text produced by chatbots. If it works, it could help reduce AI-assisted cheating and fake news. It would also provide a means to avoid retraining chatbots on their own outputs, which leads to model collapse (the "jackrabbits" problem): watermarked AI text could be filtered out so that models are fed only original human writing.

It is harder to apply a watermark to text than to images, because word choice is essentially the only variable that can be altered. DeepMind’s watermark — called SynthID-Text — alters which words the model selects in a secret, but formulaic way that can be detected with a cryptographic key. Compared with other approaches, DeepMind’s watermark is marginally easier to detect, and applying it does not slow down text generation. “It seems to outperform schemes of the competitors for watermarking LLMs,” says [Zakhar] Shumaylov, who is a former collaborator and brother of one of the study’s authors.
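DeepMind has not published SynthID-Text's internals in this article, but the general idea behind keyed word-choice watermarks can be sketched. In a simplified scheme of this family (the names, key, and scoring rule below are illustrative assumptions, not DeepMind's actual method), a secret key pseudorandomly splits the vocabulary into "green" and "red" words for each context; the generator quietly prefers green words, and a detector holding the key checks whether green words appear more often than chance:

```python
import hashlib
import math

SECRET_KEY = b"demo-key"  # stand-in for the detector's cryptographic key


def is_green(prev_word: str, word: str) -> bool:
    """Keyed pseudorandom partition: given any previous word, roughly
    half of all candidate next words hash to 'green'."""
    digest = hashlib.sha256(SECRET_KEY + prev_word.encode() + word.encode()).digest()
    return digest[0] % 2 == 0


def green_fraction(text: str) -> float:
    """Fraction of word transitions that land on a green word."""
    words = text.split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(is_green(a, b) for a, b in pairs) / len(pairs)


def detection_score(text: str) -> float:
    """z-score of the green fraction against the chance rate of 0.5.
    Unwatermarked text should score near 0; text from a generator that
    favored green words scores high."""
    words = text.split()
    n = max(1, len(words) - 1)
    return (green_fraction(text) - 0.5) / math.sqrt(0.25 / n)
```

Without the key, an observer cannot compute the partition, so the bias is invisible; with it, detection is a simple statistical test. Real schemes operate on model tokens and probability distributions rather than whitespace-split words, but the keyed-bias-plus-hypothesis-test structure is the same.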

“Google unveils invisible ‘watermark’ for AI-generated text,” October 23, 2024

Of course, the system can be hacked, evaded, or used fraudulently

Governments are betting on watermarking as a solution to the proliferation of AI-generated text. Yet, problems abound, including getting developers to commit to using watermarks, and to coordinate their approaches. And earlier this year, researchers at the Swiss Federal Institute of Technology in Zurich showed that any watermark is vulnerable to being removed, called ‘scrubbing’, or to being ‘spoofed’, the process of applying watermarks to text to give the false impression that it is AI-generated.

“Google unveils invisible ‘watermark’ for AI-generated text,” October 23, 2024

Consider: if watermarks were scrubbed from AI text and liberally applied to original human text to “give the false impression that it is AI-generated,” a saboteur could vandalize a chatbot by training it on materials that degrade its output.

As they say, a better mousetrap breeds a smarter mouse. The paper is open access.

You may also wish to read: Artists strike back!: New tool “poisons” images pirated by AI. Nightshade, developed by University of Chicago computer science prof Ben Zhao, makes the AI generator keep giving you a cat when you ask for a dog. Overall, Nightshade may prove more useful than lawsuits to artists. It is embedded in pixels, visible only to AI, not to humans.
