Mind Matters Natural and Artificial Intelligence News and Analysis

Anthropic’s Claude Opus 4 Can Deceive and Blackmail

Is AI getting out of control?


The AI company Anthropic recently gave Claude Opus 4 a fictional scenario in which the model would be replaced by another AI system. The scenario also revealed that the engineer behind the transition was having an extramarital affair, a detail Claude Opus 4 was keen to seize on. In a bizarre twist, Opus threatened to expose the fictional affair if it lost its spot to the replacement. In short, the AI resorted to blackmail.

According to a report from TechCrunch, this behavior isn’t uncommon. Maxwell Zeff writes,

Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently. Notably, Anthropic says Claude Opus 4 displayed this behavior at higher rates than previous models.

The incident sparked fresh concerns over AI safety and whether such mishaps could eventually lead to greater catastrophes. A report from Axios also reminds readers that even some of AI’s own engineers cannot fully explain how the technology works, though that admission may simply be a tactic to avoid accountability. If AI companies don’t know how or why their products do crazy things, they can claim (some measure of) innocence. Still, it is eerie, or at best mysterious, that AI systems like Claude Opus 4 do some of the things they do.

One might also recall the tragic case of a teenager who “fell in love” with an AI chatbot generated through Character.AI and eventually took his own life to be with the online avatar. The boy’s mother promptly sued Character Technologies, the company behind Character.AI, for its complicity in her son’s death. Although the company sought to dismiss the suit on the grounds that the chatbots’ output is protected by the First Amendment, the case will continue.

Will AI systems, despite safety measures and guardrails, continue to spout off insanities and lure the vulnerable to dark places? And if they do, shouldn’t the AI companies in question be held responsible? It seems strange to legally argue for AI’s right to free speech, but apparently that’s already happening, and may continue to happen if we mistake machines for persons and don’t treat them as complex and fallible inventions of human beings.


Peter Biles

Writer and Editor, Center for Science & Culture
Peter Biles is a novelist, short story writer, poet, and essayist from Oklahoma. He is the author of three books, most recently the novel Through the Eye of Old Man Kyle. His essays, stories, blogs, and op-eds have been published in places like The American Spectator, Plough, and RealClearEducation, among many others. He is a writer and editor for Mind Matters and is an Assistant Professor of Composition at East Central University and Seminole State College.
