Anthropic’s Claude Opus 4 Can Deceive and Blackmail
Is AI getting out of control?
The AI company Anthropic recently gave Claude Opus 4 a fictional scenario in which the AI system would be replaced by another program. The scenario also revealed that the engineer behind the transition was having an extramarital affair, a detail Claude Opus 4 was keen to seize on. In a bizarre twist, Opus threatened to expose the fictional affair if it were to lose its spot to the other AI system. In short, the AI resorted to blackmail.
According to a report from TechCrunch, this incident wasn’t a one-off. Maxwell Zeff writes:
Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently. Notably, Anthropic says Claude Opus 4 displayed this behavior at higher rates than previous models.
The incident sparked new concerns over AI safety and over whether mishaps like these could eventually lead to greater catastrophes. A report from Axios also reminds readers that even AI’s own engineers cannot fully explain how the technology works, while noting that such claims may simply be a tactic to avoid accountability: if AI companies don’t know how or why their products do crazy things, they can (somewhat) claim innocence. Still, it is mysterious at best, and eerie at worst, that AI systems like Claude Opus 4 do some of the things they do.
One might also recall the tragic case of a teenager who “fell in love” with a chatbot he created on Character.AI and eventually took his own life to be with the online avatar. The boy’s mother promptly sued Character Technologies, the company behind Character.AI, for its complicity in her son’s death. Although the company tried to have the suit dismissed by arguing that its chatbots’ output is protected by the First Amendment, the case will continue.
Will AI systems, despite safety measures and guardrails, continue to spout insanities and lure the vulnerable to dark places? And if they do, shouldn’t the AI companies in question be held responsible? It seems strange to argue in court for an AI’s right to free speech, but apparently that’s already happening, and it may continue to happen if we mistake machines for persons instead of treating them as complex, fallible human inventions.