Deep Learning is an approach to computer programming that attempts to mimic the human brain via artificial neural networks, enabling systems to cluster data and make accurate predictions (IBM). It is the dominant AI approach today, used to predict how proteins fold, analyse medical scans, and beat humans at Go. And yet, four Deep Learning researchers recently wrote in IEEE Spectrum that “The cost of improvement is becoming unsustainable.”
As part of their special report, “The Great AI Reckoning,” they explain:
While deep learning’s rise may have been meteoric, its future may be bumpy. Like Rosenblatt before them, today’s deep-learning researchers are nearing the frontier of what their tools can achieve. To understand why this will reshape machine learning, you must first understand why deep learning has been so successful and what it costs to keep it that way.
– Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, Gabriel F. Manso, “Deep Learning’s Diminishing Returns” at IEEE Spectrum (September 24, 2021)
The first artificial neural network was developed by Frank Rosenblatt at Cornell in 1958; he described it as a “pattern-recognizing device.” But the computers of his day did not have nearly enough power to make the approach practical. As Moore’s Law gradually multiplied the number of computations per second by a factor of 10 million, the approach became practical, and the field later became known as “Deep Learning.” We began to hear of impressive feats like wins over human champions at chess, Go, and poker.
But no such trend continues to infinity. Here is the key problem the researchers address in their thought-provoking piece: Deep Learning works better than previous artificial intelligence systems because it is much more flexible than traditional rules-based calculation (expert systems) and can thus be applied to many domains. But its almost unlimited flexibility depends on the ability to process a huge amount of data. That means using much more power.
To multiply the efficiency of current systems by 10 would require multiplying computing power 10,000 times. But at what environmental cost?
Extrapolating the gains of recent years might suggest that by 2025 the error level in the best deep-learning systems designed for recognizing objects in the ImageNet data set should be reduced to just 5 percent. But the computing resources and energy required to train such a future system would be enormous, leading to the emission of as much carbon dioxide as New York City generates in one month.
– Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, Gabriel F. Manso, “Deep Learning’s Diminishing Returns” at IEEE Spectrum (September 24, 2021)
The researchers assembled data from over a thousand research papers on Deep Learning, comprising image classification, object detection, question answering, named-entity recognition, and machine translation. Focusing for the present on image classification, they report,
Over the years, reducing image-classification errors has come with an enormous expansion in computational burden. For example, in 2012 AlexNet, the model that first showed the power of training deep-learning systems on graphics processing units (GPUs), was trained for five to six days using two GPUs. By 2018, another model, NASNet-A, had cut the error rate of AlexNet in half, but it used more than 1,000 times as much computing to achieve this.
– Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, Gabriel F. Manso, “Deep Learning’s Diminishing Returns” at IEEE Spectrum (September 24, 2021)
Interestingly, the power burden turned out to be much higher in practice than in theory: “Theory tells us that computing needs to scale with at least the fourth power of the improvement in performance. In practice, the actual requirements have scaled with at least the ninth power.”
“This ninth power means that to halve the error rate, you can expect to need more than 500 times the computational resources. That’s a devastatingly high price.” They hope that “there are still undiscovered algorithmic improvements that could greatly improve the efficiency of deep learning.”
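The arithmetic behind these scaling claims is easy to check. A minimal sketch (the function name is mine; the exponents are the theoretical bound and the empirical figure the researchers cite):

```python
# Compute cost scales as a power of the performance improvement.
def compute_multiplier(performance_gain, exponent):
    """Factor by which computing must grow for a given performance gain."""
    return performance_gain ** exponent

# 10x efficiency at the theoretical 4th-power bound: 10^4 = 10,000x computing.
print(compute_multiplier(10, 4))   # 10000

# Halving the error rate (a 2x improvement) at the observed 9th power:
# 2^9 = 512, i.e. "more than 500 times the computational resources."
print(compute_multiplier(2, 9))    # 512
```

The gap between 2⁴ = 16 and 2⁹ = 512 is the difference between the theoretical lower bound and what the researchers actually measured in practice.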
Now, here’s the problem: Moore’s Law, which promises massive, continual increases in computing power, doesn’t address the problem of how the energy that produces the increase is to be generated: “Of the 1,000-fold difference in the computing used by AlexNet and NASNet-A, only a six-fold improvement came from better hardware; the rest came from using more processors or running them longer, incurring higher costs.”
The researchers offer other stark statistics: reaching a mere 5 percent error rate in image recognition would mean 10^19 billion floating-point operations, at a cost of US$100 billion. Getting the error rate down to 1 percent would be “considerably worse.”
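A quick conversion puts those figures in perspective (a sketch using only the two numbers above; the implied per-operation cost is my own back-of-envelope derivation):

```python
# 10^19 billion floating-point operations, projected to cost US$100 billion.
operations = 1e19 * 1e9          # = 1e28 total floating-point operations
total_cost = 100e9               # US$100 billion
cost_per_flop = total_cost / operations
print(f"{operations:.0e} FLOPs, ${cost_per_flop:.0e} per operation")
# 1e+28 FLOPs, $1e-17 per operation
```

Even at a hundred-billionth of a cent per operation, the sheer operation count drives the total into the hundred-billion-dollar range.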
The hope is that new ideas and new technology will enable continued advances at more reasonable power costs. Current strategies include
● processors designed explicitly for efficient Deep Learning calculations.
● smaller neural networks (They lower the computation cost but increase the training cost.)
● meta-learning: the system learns in one area and applies the result to many others (learns on dogs in cages and applies the learning to cats in cages). However, meta-learning has proven difficult: “even the simple task of recognizing the same objects in different poses causes the accuracy of the system to be nearly halved.”
The researchers think that, absent a technical breakthrough, “the pendulum will likely swing back toward relying more on experts to identify what needs to be learned.” The power costs of the human brain turn out to be negligible compared with those of Deep Learning — and in any event, brain fuel is organic.
You may also wish to read:
The brain exceeds the most powerful computers in efficiency. Human thinking takes vastly less computational effort to arrive at the same conclusions. AlphaGo Zero would need to be 100 million times more efficient (a factor of about 100 million for improvement in CPU cycles) in order for AI to exceed human performance on an equivalent task. (Eric Holloway)
The search for the universal algorithm continues. Why does machine learning always seem to be rounding a corner, only to eventually hit a wall? Universal algorithms are limited by the axioms supplied to them, which is why universal algorithms are just not possible. (Jonathan Bartlett)