To younger generations who grew up on the web, it may come as a surprise that Big Data AI—the AIs trained to personalize newsfeeds, recognize friends and faces, and more recently converse with us using large language models like GPT—is but one approach to artificial intelligence. It’s also ancient, at least by the standards of the field. Neural networks (technically, “Artificial Neural Networks,” or ANNs) appeared as early as the 1940s and were usable for simple tasks in the 1950s. Then, they disappeared for most of the 1960s and 70s. An important innovation known as “backpropagation” appeared in the 1980s, but back then there weren’t huge volumes of data to train networks. They fell back out of favor, as rule-based approaches like Japan’s “Fifth Generation Computer Systems Project” dominated headlines and sucked up available government and corporate funding.
Neural networks — now “deep learning” — rose again in the 2010s, when Moore’s Law — computers double in power roughly every two years — put deep learning networks on steroids. Suddenly, computers were powerful enough to crunch massive datasets of text or images. These data sources are ubiquitous and largely free on the web. This connection between computing power, gobs of data and AI now seems sacrosanct, but AI scientists know (or should know) that it’s the result of opportunistic and contingent choices.
The Limits of Induction
Big Data AI (data-driven AI) relies on a type of inference known as induction. Cognitive scientists understand that induction is hopeless as a complete model of human thinking. We also use deduction, or rule-based inference, and we use a kind of inference from observed events back to plausible causes, a lesser-known but ubiquitous inference called “abduction.” All three are required for truly human-level AI. With a few notable exceptions, AI scientists today aren’t working on the other two at all, so we already know that there will be limits to what can be accomplished under the banner of “Big Data AI.”
Large language models like GPT and their applications (like ChatGPT) are no exception. Though ChatGPT is a legitimate innovation in the field (it’s based on a genuinely groundbreaking 2017 paper called “Attention Is All You Need”), it relies on crunching a huge swath of the textual content of the web. The latest version of ChatGPT uses a large language model called GPT-4. OpenAI has not disclosed its size, but its predecessor GPT-3 already had 175 billion parameters, and GPT-4 is widely believed to be far larger. Parameters are, in essence, the adjustable variables of the network: during training, the system must find good values (weights) for every one of them. And when “decoding” the model (generating the next word to complete a prompt), the same enormous parameter count that makes the system flexible and powerful poses a daunting computational challenge, because producing each new word involves computations over all of them.
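To make “decoding” concrete, here is a toy sketch of generating the next word. This is not how GPT actually works: the five-word vocabulary and the stand-in scoring function are invented for illustration. The point is only the shape of the step a real model repeats for every word, where the scoring function would involve all of the model’s parameters.

```python
import math
import random

# Invented five-word vocabulary for illustration only.
VOCAB = ["the", "cat", "sat", "mat", "."]

def toy_scores(prompt):
    # Stand-in for the real network: maps a prompt to one raw score per
    # vocabulary word. In a real LLM, this step uses every learned weight.
    random.seed(len(prompt))  # deterministic toy scores
    return [random.uniform(-1, 1) for _ in VOCAB]

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decode_next_word(prompt):
    # Greedy decoding: pick the highest-probability word.
    probs = softmax(toy_scores(prompt))
    best = max(range(len(VOCAB)), key=lambda i: probs[i])
    return VOCAB[best]

print(decode_next_word("the cat"))
```

A real model would also append the chosen word to the prompt and repeat, one word at a time, which is why the cost of decoding scales with the parameter count.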
We’re running out of computing power to do this, and we’re running out of data. According to some estimates, the training costs for larger models may exceed a billion dollars by 2026, and as The Economist has pointed out, this assumes we won’t run out of trainable data on the web first. Other estimates forecast that high-quality textual training data may be exhausted around the same time. Tellingly, OpenAI CEO Sam Altman struck a cautionary tone regarding giant-sized AI: “I think we’re at the end of an era.” Indeed. We may not see a GPT-5, let alone a GPT-100.
Large language models are just one application of AI. They are plagued with the scaling problems just noted, but they are rightly recognized as an exciting result for the sometimes-stagnant field. Other important areas, like self-driving cars, show no such progress. In 2016, excitement about imminent Level 5 (fully autonomous) cars reached fever pitch. By the early 2020s, the media frenzy was over. What happened?
Unlike ChatGPT, self-driving car performance doesn’t depend on hoovering up words on the finite web. Driving is open-ended, because it happens out in the wild, in a natural world not organized by HTML links. The litany of weird events and problems that can go wrong for self-driving car systems simply never ends. A short list: partially occluded or damaged speed limit signs interpreted as stop signs (or worse, the reverse), debris on the road, different animals darting in front of the car (does a driver react the same to a dog, deer, or chicken?), changing weather conditions, day or nighttime conditions changing visibility, and identifying other vehicles and infrastructure correctly, all in real time. These are called edge cases, and they are, as Forbes put it in 2021, the “long tail of doom” for fully autonomous driving so far. It’s doubtful they can all be fixed with more data. It’s unclear whether they have computational fixes at all. ChatGPT can do nothing to help here; it is tethered to the web, and driving happens off it. We won’t see self-driving cars anytime soon. Where are the novel ideas to unleash new self-driving Teslas on city streets and country roads? (For that matter, where are the ideas to unleash Rosie the Robot?)
The Commonsense Knowledge Problem
True progress in AI, and real progress for the rest of us, means moving beyond induction and data analysis, an approach that is now over a decade old and saturating. Scientists and researchers in the field must start taking the “commonsense knowledge problem” seriously, where meaningful concepts and a grasp of cause and effect replace monster-truck data crunching. Workable models of commonsense or causal reasoning are almost non-existent in the field, in large part because the problems are difficult and seemingly intractable. Causation is the world out there: walking outside, we don’t experience correlations in data, we experience events causing other events in an endless entangled web of dynamic change. Encouraging diverse approaches, letting many flowers bloom, is long overdue for AI systems that interact with the broader world, like self-driving cars and myriad applications in the field of robotics. Our current tunnel vision, year after year, prevents these discussions and changes from happening. Not everything is a web page.
Steve Jobs coined the phrase “bicycles for the mind” to capture the potential of personal computers (a term Stewart Brand had coined earlier). Our “bicycles for the mind” now surveil us, wrest personal information from us, and use our data, crunched in centralized server farms, to manipulate us into clicking, or staying on a website. If AI is to do better for us, it will have to abandon one-size-fits-all data approaches in favor of diverse competing methods, fostering individual initiative and creativity in scientists, researchers, and everyone else. New possibilities can emerge with fresh thinking.
Read more from Erik Larson at Colligo on Substack.