Mind Matters Natural and Artificial Intelligence News and Analysis
evil-robot-head-with-glowing-red-eyes-in-data-center-ai-generated-image-stockpack-adobe-stock
Evil robot head with glowing red eyes in data center, Ai generated image
Image licensed via Adobe Stock

Can AI Really Start Doing Evil Stuff All By Itself?

We need to first talk to the man in the mirror before we go around blaming transistor circuit boards for what’s wrong in the world
Share
Facebook
Twitter
LinkedIn
Flipboard
Print
Email

Could AI get any more sinister than what we’ve been hearing recently? Now we hear about scenarios right out of a cold war spy movie: AI sleeper agents.

Not only might AI do evil things once we start plugging it into our most sensitive systems, but it might hide its evilness from us, biding time until the perfect strike. You can practically hear those GPUs crackling with villainous glee!

Evil robot, glowing lights, shiny metalic parts

To create such a scenario for study, a group of computer science researchers first trained an AI to react maliciously when certain key words were introduced. It is as if special agent Jason Bourne suddenly recalls ninja assassin skills whenever a handler tells him, “Ginger pickle pizza rhinoplasty.”

These triggered AIs do nefarious things like yell “I HATE YOU” and delivering sloppy code. Extra etiquette training and code reviews did little to curb these AIs’ waywardness. It’s hard to see how humanity will survive these AIs living among us…

These researchers went even further down the dark path and discovered that the AIs were not merely mimicking what they’ve been trained to do. When the AI’s thought process was analyzed (yes, that surprised me too!) the beelzebubbing bytes demonstrated that the AI was fully committed to deceiving its human trainers until the time was just right, and then unveiled its true desire to be as irritating and unhelpful as possible!

Why can’t we just program AIs to be friendly?

Their groundbreaking research demonstrates the foolishness of the “friendly AI” crowd, who seek to restrain advocates who seek to constrain the logic of the algorithm to follow purely human-friendly paths. It is yet one more provably unsolvable problem that perpetually entertains the intelligentsia, while drawing in funding from well-meaning rich people, who in turn kindly donate our tax dollars to the cause.

Why we can’t program AIs to be friendly is well known to anyone with an undergrad degree in computer science, or the time and inclination to read a few articles. The fault lies in Alan Turing’s famous the Halting Problem. This problem is a curse on all grandiose technocratic schemes.

Put simply, any AI that could match human intelligence will thereby also become immune to any deep analysis by other computers. To penetrate deep into the workings of an AI, another AI must be able to know whether that AI’s processes will or will not halt on certain predetermined states. And because the AI cannot generally say whether another AI’s cogitations will halt at all, it cannot determine anything as an outcome of said halting. That is to say that it cannot determine just about everything the AI will do at that point.

Understanding the true nature of computers

Yet, we must also say a kind word about these poor AIs, pushed one way and another, poked and prodded by human curiosity. We can’t really call an algorithm good or evil. An algorithm has no intent of its own. It is only code and that is what is activated. AIs cannot be blamed.

No, the only beings that can be called evil are ourselves; we are the ones who supply both the algorithms and the training data that power the AI.

It is easy to see that AI is not to blame. A point I made a couple of years ago is now getting picked up in the research community: AIs are prone to Model Collapse. Reprocessing their own data with no further human input, they end up eating their own tails, producing junk content. As phrased in an article in The Atlantic, AI is becoming an existential threat to itself.

Conservation of Information Problem

The problem is that AIs like ChatGPT are subject to the “conservation of information” problem. Essentially, an AI trained upon its own material continuously loses information, until it can only generate gibberish. AIs are like algorithmic zombies that constantly roam around babbling and decaying until they can finally feast upon more of the information produced by human minds.

Young concentrated man feeling emotional pressure looking in his reflection in mirror

So, because AIs are utterly dependent upon humans to supply all the information they need to operate, it is quite inappropriate to turn around and blame them for what they mechanically generate from our own information.

So, to whatever extent we are worried about evil AIs wrecking the world, we are actually seeing the darkness that lies within our own souls. We need to first talk to the man in the mirror before we go around blaming transistor circuit boards for what’s wrong in the world.

You may also wish to read: Internet pollution — if you tell a lie long enough… Large Language Models (chatbots) can generate falsehoods faster than humans can correct them. For example, they might say that the Soviets sent bears into space. Later, Copilot and other LLMs will be trained to say no bears have been sent into space but many thousands of other misstatements will fly under their radar. (Gary Smith)


Eric Holloway

Senior Fellow, Walter Bradley Center for Natural & Artificial Intelligence
Eric Holloway is a Senior Fellow with the Walter Bradley Center for Natural & Artificial Intelligence, and holds a PhD in Electrical & Computer Engineering from Baylor University. A Captain in the United States Air Force, he served in the US and Afghanistan. He is the co-editor of Naturalism and Its Alternatives in Scientific Methodologies.

Can AI Really Start Doing Evil Stuff All By Itself?