Neural networks are all the rage in computing these days. Many engineers think that, with enough computing power and fancy tweaks, they will become as smart as people. Recent successes playing games and predicting protein folds pour gasoline on the AI fire. We could be on the edge of the mystical Singularity, when humans and computers merge and we become immortal gods.
Let’s wind the clock back to the beginning of neural networks. In computer science terms, they are actually a very old technology. The earliest version, called a perceptron (a single-layer neural network), was invented by Frank Rosenblatt in 1958, inspired by McCulloch and Pitts’s early model of brain neurons. But the perceptron was ignored for decades after Marvin Minsky (1927–2016) and Seymour Papert proved in 1969 that it could not compute the simple XOR logic function.
The XOR function says that exactly one of X and Y is true: you can have your cake or eat it, but not both. The poor perceptron’s problem is it can only enforce half the rule. You can’t have your cake, but you can eat it. You can’t eat your cake, but you can have it. The perceptron doesn’t understand that you can’t have your cake AND eat it at the same time. Geometrically, a single-layer perceptron can draw only one straight line through its inputs, and no single line separates the XOR cases from the rest.
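To see the problem concretely, here is a minimal sketch (my own illustration, not code from any historical source): the classic perceptron learning rule masters AND, which is linearly separable, but can never get all four XOR cases right.

```python
# A single-layer perceptron with the classic learning rule.
# It learns AND perfectly, but XOR tops out at 3 of 4 cases.

def step(z):
    return 1 if z > 0 else 0

def train_perceptron(data, epochs=50, lr=0.1):
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            y = step(w1 * x1 + w2 * x2 + b)
            err = target - y          # only -1, 0, or +1: "right or wrong"
            w1 += lr * err * x1
            w2 += lr * err * x2
            b  += lr * err
    return w1, w2, b

def accuracy(weights, data):
    w1, w2, b = weights
    return sum(step(w1*x1 + w2*x2 + b) == t for (x1, x2), t in data)

AND = [((0,0),0), ((0,1),0), ((1,0),0), ((1,1),1)]
XOR = [((0,0),0), ((0,1),1), ((1,0),1), ((1,1),0)]

print(accuracy(train_perceptron(AND), AND))  # 4 -- linearly separable
print(accuracy(train_perceptron(XOR), XOR))  # at most 3 -- XOR is not
```

No choice of weights can fix the XOR case, because the four points are not linearly separable; the failure is in what a single threshold unit can represent, not in the training budget.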
Researchers assumed at first that they just had to give the perceptron more nodes to learn the XOR. It turned out that the real problem is the perceptron’s learning algorithm. Humans can easily program a multi-node perceptron network to compute the XOR, but the perceptron learning algorithm is very, very slow once there are multiple nodes.
Completely impractical. So Minsky’s logic-based research agenda ruled the early days of AI while neural networks languished in the shadows.
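The hand-programmed solution takes only a few lines. This sketch (an illustration with hand-picked weights, not learned ones) wires XOR as AND(OR, NAND): the output fires when at least one input is on and not both are.

```python
# A hand-wired two-layer threshold network that computes XOR.
# Every weight below is chosen by a human, not learned.

def step(z):
    return 1 if z > 0 else 0

def xor_net(x, y):
    h_or   = step(x + y - 0.5)        # fires if at least one input is 1
    h_nand = step(1.5 - x - y)        # fires unless both inputs are 1
    return step(h_or + h_nand - 1.5)  # fires only if both hidden units fire

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", xor_net(x, y))
```

Programming such a network by hand is trivial; getting the perceptron learning rule to find these weights on its own was the part that proved hopeless.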
So, why do perceptrons learn so slowly? The reason is that they are modeled on the activity of brain neurons.
Brain neurons operate on what is called the “all or nothing” principle. When enough charge builds up at the neuron’s synapses, the neuron fires. But until then, the neuron does absolutely nothing. The neuron can thus be seen as an on–off switch. It is either firing, or it is not. There is no in-between stage. This makes learning difficult.
To understand why the “all or nothing” principle makes learning difficult, think about playing the “hot or cold” search game where you are searching a room for a treasure. A helpful bystander says “Hotter!” as you get closer to the treasure and “Colder!” as you get farther from it. This signal is very effective in helping you locate the treasure.
But what if we tweak the game? Now, the bystander tells you only whether you have found the treasure or not. If you find the treasure, the bystander says “Yes.” If you did not find the treasure, the bystander says “No.” This new version will take quite a bit longer to play, because the bystander is no longer providing any information that narrows the search.
We have the same situation with the “all or nothing” principle that the perceptron uses. When the perceptron makes a prediction, it is told that the prediction is either correct or incorrect. There is no indication as to whether the current prediction is more or less correct than an alternative. This makes it difficult for the perceptron to update itself to get closer to the correct answer. As the perceptron’s structure becomes more complex, the difficulty only increases.
This state of affairs changed entirely with the discovery of the backpropagation algorithm. The key insight is to stop trying to copy how brain neurons work. Instead of relying on the “all or nothing” principle, the perceptron gets a “hotter” or “colder” signal. The trick is to use a differentiable function to generate the perceptron output, instead of the simple threshold function used to simulate the “all or nothing” principle.
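A quick illustration of the difference, assuming the common sigmoid as the differentiable function (the article does not name one): the threshold function reports only “on” or “off,” while the sigmoid also reports how close the input is to the boundary.

```python
# Threshold vs. sigmoid: "all or nothing" vs. "hotter or colder."
import math

def step(z):
    return 1 if z > 0 else 0

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

for z in (-2.0, -0.5, 0.5, 2.0):
    # step jumps straight from 0 to 1; sigmoid grades smoothly in between
    print(f"z={z:+.1f}  step={step(z)}  sigmoid={sigmoid(z):.3f}")
```

The step function gives identical answers for z = 0.5 and z = 2.0, but the sigmoid distinguishes them, and that graded difference is exactly the “hotter” or “colder” signal a learning algorithm can follow.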
The great thing about using a differentiable function is that the “hotter” or “colder” signal can be passed through the rest of the nodes in the multilayer perceptron network and the amount of “heat” or “cold” contributed by each node and its connections can be precisely calculated. This allows for very precise updates to the multilayer perceptron, and makes efficient learning practical. Thus, the modern neural network is born.
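Here is a sketch of that idea (a toy implementation of my own, not anyone’s production code): a small sigmoid network trained on XOR by backpropagation, where each weight is nudged in proportion to its computed share of the error.

```python
# Backpropagation on XOR: a 2-input, 3-hidden-unit, 1-output sigmoid
# network trained by gradient descent on squared error. The hidden
# layer size and learning rate are arbitrary choices for the demo.
import math, random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

random.seed(0)
H = 3
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]  # [w1, w2, bias]
w_o = [random.uniform(-1, 1) for _ in range(H + 1)]                  # hidden weights + bias

DATA = [((0,0),0), ((0,1),1), ((1,0),1), ((1,1),0)]
LR = 0.5

def forward(x1, x2):
    h = [sigmoid(w[0]*x1 + w[1]*x2 + w[2]) for w in w_h]
    o = sigmoid(sum(w_o[i]*h[i] for i in range(H)) + w_o[H])
    return h, o

def loss():
    return sum((forward(x1, x2)[1] - t)**2 for (x1, x2), t in DATA)

before = loss()
for _ in range(5000):
    for (x1, x2), t in DATA:
        h, o = forward(x1, x2)
        d_o = (o - t) * o * (1 - o)                  # "hotter/colder" at the output
        for i in range(H):
            d_h = d_o * w_o[i] * h[i] * (1 - h[i])   # each node's share of the error
            w_h[i][0] -= LR * d_h * x1
            w_h[i][1] -= LR * d_h * x2
            w_h[i][2] -= LR * d_h
            w_o[i] -= LR * d_o * h[i]
        w_o[H] -= LR * d_o
print(f"loss before={before:.3f} after={loss():.3f}")
```

Because the sigmoid is differentiable, the error at the output can be split into per-node, per-weight contributions, which is exactly the precise credit assignment the old threshold perceptron could never provide.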
At the same time, the umbilical cord between neural networks and the brain is severed. Practicality triumphed over biological reality.
This brings us to the point: the modern neural network is a better learning algorithm than the old perceptron precisely because it is a better learning algorithm than the brain it was once modeled on. The brain neuron is limited by its “all or nothing” principle, which makes rapid learning impossible. By contrast, the differentiable “hotter” or “colder” function used by neural networks enables programmers to algorithmically train networks with trillions of parameters.
Which raises the question: what if the human mind can learn better than a neural network?
We will look into that possibility shortly.
You may also wish to read: Artificial neural networks can show that the mind isn’t the brain. Because artificial neural networks are a better version of the brain, whatever neural networks cannot do, the brain cannot do. The human mind can do tasks that an artificial neural network (ANN) cannot. Because the brain works like an ANN, the mind cannot just be what the brain does. (Eric Holloway)