Why, Despite All the Hype We Hear, AI Is Not “One of Us”
Artificial Intelligence (AI) systems are inferencing systems. They make decisions based on information. That’s not a particularly controversial point: inference is central to thinking. If AI systems perform the right types of inference, at the right time, on the right problems, we should view them as thinking machines.
The problem is, AI currently performs the wrong type of inference, on problems selected precisely because this type of inference works well. I’ve called this “Big Data AI,” because the problems AI currently solves can only be cracked if very large repositories of data are available to solve them. ChatGPT is no exception — in fact, it drives the point home. It’s a continuation of previous innovations of Big Data AI taken to an extreme. The AI scientist’s dream of general intelligence, often referred to as Artificial General Intelligence (AGI), remains as elusive as ever.
Computer scientists who were not specifically trained on mathematical or philosophical logic probably don’t think in terms of inference. Still, it pervades everything we do. In a nutshell, inference in the scientific sense is: given what I know already, and what I see or observe around me, what is proper to conclude? The conclusion is known as the inference, and for any cognitive system it’s ubiquitous.
For humans, inferring something is like a condition of being awake; we do it constantly, in conversation (what does she mean?), when walking down a street (do I turn here?), and indeed in having any thought where there’s an implied question at all. If you try to pay attention to your thoughts for one day — one hour — you’ll quickly discover you can’t count the number of inferences your brain is making. Inference is cognitive intelligence. Cognitive intelligence is inference.
What difference have 21st-century innovations made?
In the last decade, the computer science community innovated rapidly, and dramatically. These innovations are genuine and important—make no mistake. In 2012, a team at the University of Toronto led by neural network guru Geoffrey Hinton roundly defeated all competitors at a popular photo recognition competition called ImageNet. The task was to recognize images from a dataset curated from fifteen million high-resolution images on Flickr, representing twenty-two thousand “classes,” or varieties of photos (caterpillars, trees, cars, terrier dogs, etc.).
The system, dubbed AlexNet, after Hinton’s graduate student Alex Krizhevsky, who largely developed it, used a souped-up version of an old technology: the artificial neural network (ANN), or just “neural network.” Neural networks were developed in rudimentary form in the 1950s, when AI had just begun. They had been gradually refined and improved over the decades, though they were generally thought to be of little value for much of AI’s history.
Moore’s Law gave them a boost. As many know, Moore’s Law isn’t a law but an observation, made by Intel co-founder and CEO Gordon Moore in 1965: the number of transistors on a microchip doubles roughly every two years (the other half of the observation is that the cost of computing is roughly halved over the same period). Neural networks are computationally expensive on very large datasets, and the catch-22 for many years was that very large datasets are the only datasets they work well on.
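To make the doubling arithmetic concrete, here is a minimal Python sketch; the starting count is the roughly 2,300 transistors of the 1971 Intel 4004, used purely as an illustration.

```python
# Moore's Law as a back-of-the-envelope calculation:
# transistor counts double roughly every two years.
def transistors(start_count: int, years: int) -> int:
    """Projected transistor count after `years`, doubling every 2 years."""
    return start_count * 2 ** (years // 2)

# Illustrative only: starting from ~2,300 transistors (the Intel 4004 of 1971),
# forty years of doubling lands in the billions, which is the right ballpark
# for chips of the early 2010s.
print(transistors(2_300, 40))  # 2300 * 2**20 = 2,411,724,800
```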
But by the 2010s the roughly accurate Moore’s Law had made deep neural networks, in this case convolutional neural networks (CNNs), computationally practical. CPUs were swapped for the more mathematically powerful GPUs—also used in computer game engines—and suddenly CNNs were not just an option but the go-to technology for AI. Though all the competitors at ImageNet contests used some version of machine learning—a subfield of AI that is specifically inductive because it “learns” from prior examples or observations—the CNNs proved wholly superior once the hardware was in place to support their gargantuan computational requirements.
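For readers curious what such a network looks like in code, here is a minimal, illustrative sketch of a small convolutional neural network in PyTorch; the layer sizes are arbitrary choices for a toy example, not AlexNet’s actual architecture.

```python
import torch
import torch.nn as nn

# A deliberately tiny convolutional neural network for 32x32 RGB images.
# AlexNet was far larger, but the building blocks are the same:
# convolutions, nonlinearities, pooling, and a final linear classifier.
class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learnable image filters
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# The GPU is what made training networks like this practical at scale.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyCNN().to(device)
scores = model(torch.randn(1, 3, 32, 32, device=device))  # one random "image"
```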
The second major innovation occurred just two years later, when a well-known limitation of neural networks in general was solved, or at least partially solved: overfitting. Overfitting happens when a neural network fits its training data too closely and doesn’t adequately generalize to unseen, or test, data. Overfitting is bad; it means the system isn’t really learning the underlying rule or pattern in the data. It’s like someone memorizing the answers to a test without really understanding the questions. The overfitting problem bedeviled early attempts at using neural networks for problems like image recognition (CNNs are also used for face recognition, machine translation between languages, autonomous navigation, and a host of other useful tasks).
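Overfitting is easy to reproduce outside neural networks too. The sketch below, in plain Python with NumPy, fits noisy samples of a simple curve with a low-degree and a high-degree polynomial; the specific degrees and noise level are arbitrary, and NumPy may warn that the high-degree fit is poorly conditioned, which is rather the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying rule: y = sin(x) + noise.
x_train = np.sort(rng.uniform(0, 3, 20))
y_train = np.sin(x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.sort(rng.uniform(0, 3, 20))
y_test = np.sin(x_test) + rng.normal(0, 0.2, x_test.size)

def fit_and_score(degree):
    """Fit a polynomial to the training data, return (train error, test error)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

# A degree-3 fit tracks the underlying pattern; a high-degree fit tends to
# chase the noise: training error drops while test error typically gets worse.
print("degree 3 :", fit_and_score(3))
print("degree 15:", fit_and_score(15))
```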
In 2014, Geoff Hinton and his team developed a technique known as “dropout,” which helped solve the overfitting problem. While the public consumed the latest smartphones and argued, flirted, and chatted away on myriad social networks and technologies, real innovations in an old AI technology were taking place, all made possible by the powerful combination of talented scientists and engineers and increasingly powerful computing resources.
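Dropout itself is strikingly simple. Below is a minimal NumPy sketch of the idea in its common “inverted” form, with an illustrative rate of 0.5: during training, each unit’s output is randomly zeroed so the network can’t lean too heavily on any one unit.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5, training=True):
    """Randomly zero out activations during training (inverted dropout).

    Scaling the survivors by 1/(1 - rate) keeps the expected activation
    the same, so nothing special has to happen at test time.
    """
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

hidden = np.array([0.2, 1.5, 0.7, 0.0, 2.1, 0.9])
print(dropout(hidden))                  # roughly half the units silenced, rest scaled up
print(dropout(hidden, training=False))  # unchanged at test time
```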
There was a catch, however.
Black Boxes and Blind Inferences
Actually, there were two catches. One, it takes quite an imaginative computer scientist to believe that the neural network knows what it’s classifying or identifying. It’s a bunch of math in the background, and relatively simple math at that: mostly “matrix multiplication,” a technique learned by any undergraduate math student. There are other mathematical operations in neural networks, but it’s still not string theory. It’s the computation of these relatively simple equations that counts, along with the overall design of the system. Thus, neural networks were performing cognitive feats while not really knowing they were performing anything at all.
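To see just how modest the underlying math is, here is a one-layer forward pass in NumPy: a matrix multiplication, an added bias, and a squashing function. The sizes and random weights are arbitrary, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# One "layer" of a neural network: inputs in, activations out.
x = rng.random(4)             # 4 input features
W = rng.random((3, 4))        # weights connecting 4 inputs to 3 units
b = rng.random(3)             # one bias per unit

z = W @ x + b                 # the matrix multiplication the text refers to
a = 1.0 / (1.0 + np.exp(-z))  # a sigmoid nonlinearity, squashing each value into (0, 1)

print(a)  # three numbers; stacking layers like this is what makes learning "deep"
```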
This brings us to the second problem, which ended up spawning an entire field of its own, known as “Explainable AI.”
If AIs don’t know what they’re doing, can we hope to explain it? With AI, we have a world of powerful, useful, but entirely opaque systems. We don’t know why they make the decisions they do, and neither do they. A stupid answer from a chatbot is one thing. If a heavy, fully autonomous vehicle rams into a school bus, thinking it’s an overpass, that’s quite another.