Stanford’s AI Index Report: How Much Is BS?
Some measurements of AI’s economic impact sound like the metrics that fueled the dot-com bubble
The dot-com bubble (1995–2000) was fueled by wishful investors using novel metrics for the so-called New Economy to justify ever-higher stock prices. Instead of obsessing over something as old-fashioned as profits, investors counted a company’s sales, spending, and web site visitors.
Companies responded by finding creative ways to give investors the information they wanted. Investors want more sales? I’ll sell your company something and you sell it back to me. No profits for either of us but higher sales for both of us. Investors want more spending? Order another thousand Aeron chairs. Investors want more web site visitors? Give stuff away to people who visit our web site. Still no profits but more traffic.
One measure of traffic was eyeballs, the number of people who visited a page; another was the number of people who stayed for at least three minutes. An even more fanciful metric was hits, the number of files requested when a web page was downloaded from a server. Companies accommodated this nonsense by putting dozens of images on a page. Each image that was loaded from the server counted as a hit. Incredibly, investors thought this number signified something important.
And now we have the AI bubble, with plenty of hoopla and hype about how computers are taking over the world. “AI” was the Association of National Advertisers’ Marketing Word of the Year in 2017. In the rush to cash in on the buzz, companies have been labeling mundane algorithms as AI and advertising themselves as AI wizards when they have barely begun to think about something as basic as machine learning. Advertise first, build later.
And now we have fanciful measures of the triumph of AI, rivaling the far-fetched metrics of dot-com commerce. In December, Stanford University released the 2019 edition of its AI Index — a 290-page document with dozens of tables and more than 100 charts — which “tracks, collates, distills, and visualizes data relating to artificial intelligence.”
When the AI Index was launched in 2017, a Stanford news story boasted that it “will provide a comprehensive baseline on the state of artificial intelligence and measure technological progress in the same way the gross domestic product and the S&P 500 index track the U.S. economy and the broader stock market.”
Nope. Gross domestic product, as reported by the federal government, is an informative measure of the amount of goods and services produced each quarter. Divided by hours worked, it gives a useful measure of productivity. The S&P 500 is a valuable measure of zigs and zags in the market value of the 500 stocks in the index. Stanford’s AI Index is a well-intentioned hodgepodge of AI-related data. The index does not actually track the progress of AI but rather reports trends in data that are related to AI. It reminds us of the story of the scientist who calculated the “average” telephone number: to what purpose?
Consider these “2019 Report Highlights” in the eight areas covered:
- Research and Development. Between 1998 and 2018, the volume of peer-reviewed AI papers has grown by more than 300%.
- Conferences. In 2019, the largest AI conference, NeurIPS, expected 13,500 attendees, up 41% over 2018 and over 800% relative to 2012.
- Technical Performance. The time required to train a large image classification system on cloud infrastructure has fallen from about three hours in October 2017 to about 88 seconds in July 2019.
- The Economy. In the US, the share of jobs in AI-related areas increased from 0.26% of total jobs posted in 2010 to 1.32% in October 2019, with the highest share in Machine Learning (0.51% of total jobs).
- Education. At the graduate level, AI has rapidly become the most popular specialization among computer science PhD students in North America.
- Autonomous Systems. The total number of miles driven and total number of companies testing autonomous vehicles (AVs) in California has grown over seven-fold between 2015 and 2018.
- Public Perception. There is a significant increase in AI-related legislation in congressional records, committee reports, and legislative transcripts around the world.
- Societal Considerations. In over 3600 global news articles on ethics and AI identified between mid-2018 and mid-2019, the dominant topics are framework and guidelines on the ethical use of AI, data privacy, the use of face recognition, algorithm bias, and the role of big tech.
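To see what these headline numbers imply, the percentages can be converted back into baselines and fold changes. The following is a minimal sketch that uses only the rounded figures quoted above; the function names are illustrative, "about three hours" is assumed to mean exactly three hours, and none of the derived values are taken from the report itself.

```python
# Back-of-envelope arithmetic on the rounded figures quoted in the highlights.
# Function names are illustrative; they do not come from the AI Index report.

def baseline_from_growth(current: float, pct_increase: float) -> float:
    """Earlier value implied by a current value and a percent increase."""
    return current / (1 + pct_increase / 100)

def fold_change(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before

# Conferences: 13,500 expected attendees, up 41% on 2018 and over 800% on 2012.
neurips_2019 = 13_500
neurips_2018 = baseline_from_growth(neurips_2019, 41)
neurips_2012 = baseline_from_growth(neurips_2019, 800)  # upper bound, since growth was "over 800%"

# Technical performance: about three hours down to about 88 seconds.
# Assumption: "about three hours" is treated as exactly 3 hours.
training_speedup = (3 * 60 * 60) / 88

# The economy: share of AI-related job postings, 0.26% to 1.32%.
job_share_fold = fold_change(0.26, 1.32)

# Research: a 300% increase means the volume is 4 times the starting level.
papers_fold = 1 + 300 / 100

print(f"Implied NeurIPS attendance in 2018: about {neurips_2018:,.0f}")
print(f"Implied NeurIPS attendance in 2012: at most {neurips_2012:,.0f}")
print(f"Training speedup: about {training_speedup:.0f}x")
print(f"Job-posting share: about {job_share_fold:.1f}x")
print(f"Paper volume: more than {papers_fold:.0f}x")
```

All of these are measures of activity and inputs. None of them says anything about profits or productivity.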
Many of these factoids are interesting (certainly more interesting than the average telephone number), but few truly assess the progress of AI. The value of AI is not measured by technical papers and conference attendance any more than the value of the dot-com companies was measured by eyeballs and hits.
More meaningful would be an assessment of the impact of AI on productivity. Is AI improving productivity in areas where some successes have been identified, such as in advertising, e-commerce, and news? If not, why not? What are the challenges for AI in these and other, more complex areas such as accounting, law, engineering, and health care? Understanding the advances and challenges would truly provide valuable insights for companies, AI startups, universities, and policy makers.
One application that the Stanford report addresses is autonomous vehicles (self-driving cars), where success has consistently lagged behind hype. Enabling vehicles to interpret and react to the innumerable objects that vehicles encounter on roads, highways, and parking lots — and in every type of weather from glaring sun to falling snow — is far more complicated than identifying patterns in e-commerce or searching news stories.
The premature anticipation of genuinely autonomous vehicles reminds us of the excitement IBM’s Watson generated a few years ago, before the daunting challenges became clear. Once predicted to revolutionize health care, Watson’s failure is now a big red warning flag to those who gush about the imminent triumph of breakthrough technologies.
Defeating humans at checkers, chess, and Go is remarkable, but autonomous vehicles, Watson Health, and even the evaluation of loan and job applications are a lot harder than playing games or spell-checking words. Autonomous vehicles are flawless in the laboratory, flawed on real highways. Watson did very well in the artificial world of Jeopardy, but overpromised and underdelivered in the real world of health care. Likewise, algorithmic evaluations of borrowers and job seekers have been fraught with mistakes and discrimination.
Technologies always begin with simple cost-effective applications and then evolve towards more challenging ones as they improve. AI will be no different. Monitoring AI’s gradual evolution in early simple applications can provide valuable information to startups, investors, engineers, and policy makers. Just as the speed with which firms adopted word processing, spreadsheet, and presentation software in the late 1970s and early 1980s helped us foresee the adoption of enterprise software in subsequent years, understanding the speed at which AI diffuses in advertising, retail, and news will help us understand its speed of diffusion in accounting, legal, engineering applications, and (eventually) autonomous vehicles and health care.
Some agree that the AI bubble is reminiscent of the dot-com bubble but remind us that the Internet kept growing after the dot-com bubble popped. However, AI’s success cannot simply be assumed, and its progress is not measured as easily as the growth of e-commerce in the 1990s, certainly not by Stanford’s AI Index.
It is important to document the growth and evolution of AI applications so that we can understand the successes, challenges, and bottlenecks as it moves from simple applications to more complex ones. For instance, one potential bottleneck can be glimpsed in Stanford’s data on training exercises and computational power. Although the reductions in training times are a positive sign, one wonders how easily they will continue in the numerous applications that require more than the 93% precision that is used as a benchmark. Going from 80% to 90% is usually a lot easier than going from 90% to 99% and then from 99% to 99.9%.
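One way to see why each step gets harder is to look at the error rate instead of the success rate. The sketch below is an illustration of that framing, not a calculation from the Stanford report; the function name is hypothetical, and it says nothing about how much data or computing power each step actually requires.

```python
# Illustrative only: express each accuracy milestone as the factor by which
# the remaining errors must shrink. The function name is hypothetical.

def error_reduction_factors(accuracies_pct):
    """For consecutive accuracy levels (in percent), return how many times
    smaller the error rate must become at each step."""
    factors = []
    for before, after in zip(accuracies_pct, accuracies_pct[1:]):
        error_before = 100 - before
        error_after = 100 - after
        factors.append(error_before / error_after)
    return factors

milestones = [80, 90, 99, 99.9]
for (before, after), factor in zip(zip(milestones, milestones[1:]),
                                   error_reduction_factors(milestones)):
    print(f"{before}% to {after}%: errors must fall by about {factor:.0f}x")
```

Going from 80% to 90% means cutting errors in half. Each of the next two steps means eliminating nine of every ten remaining errors.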
Another important question is the extent to which continued increases in computational capacity are economically viable. The Stanford Index reports a 300,000-fold increase in capacity since 2012. But in the same month that the Report was issued, Jerome Pesenti, Facebook’s AI head, warned that “The rate of progress is not sustainable…If you look at top experiments, each year the cost is going up 10-fold. Right now, an experiment might be in seven figures but it’s not going to go to nine or 10 figures, it’s not possible, nobody can afford that.”
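Pesenti’s arithmetic is easy to check. The sketch below assumes a starting cost of $1 million, the low end of "seven figures"; that starting point and the function name are assumptions made for illustration, not figures he gave.

```python
# Illustrative only: how quickly a cost that grows 10-fold a year reaches
# nine and ten figures. The starting cost is an assumption.

def years_until(start_cost: int, growth_per_year: int, threshold: int) -> int:
    """Number of whole years of compounding growth needed for the cost
    to reach or exceed the threshold."""
    years = 0
    cost = start_cost
    while cost < threshold:
        cost *= growth_per_year
        years += 1
    return years

SEVEN_FIGURES = 1_000_000       # assumed starting cost (low end of seven figures)
NINE_FIGURES = 100_000_000
TEN_FIGURES = 1_000_000_000

print(years_until(SEVEN_FIGURES, 10, NINE_FIGURES))  # 2
print(years_until(SEVEN_FIGURES, 10, TEN_FIGURES))   # 3
```

At that rate, a seven-figure experiment reaches the nine- and ten-figure range that Pesenti calls unaffordable within two to three years.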
AI has feasted on low-hanging fruit, like search engines and board games. Now comes the hard part — distinguishing causal relationships from coincidences, making high-level decisions in the face of unfamiliar ambiguity, and matching the wisdom and commonsense that humans acquire by living in the real world. These are the capabilities that are needed in complex applications such as driverless vehicles, health care, accounting, law, and engineering.
Despite the hype, AI has had very little measurable effect on the economy. Yes, people spend a lot of time on social media and playing ultra-realistic video games. But does that boost or diminish productivity? Technology in general and AI in particular are supposed to be creating a new New Economy, where algorithms and robots do all our work for us, increasing productivity by unheard-of amounts. The reality has been the opposite. For decades, U.S. productivity grew by about 3% a year. Then, after 1970, it slowed to 1.5% a year, then 1%, now about 0.5%. Perhaps we are spending too much time on our smartphones.
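The difference between these growth rates is larger than it looks, because growth compounds. A minimal sketch, using only the approximate rates quoted above and the standard doubling-time formula (the function name is illustrative):

```python
import math

# Illustrative only: years for productivity to double at a constant annual
# growth rate, using doubling time = ln(2) / ln(1 + rate).

def doubling_time_years(annual_growth_pct: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_pct / 100)

for rate in [3.0, 1.5, 1.0, 0.5]:
    years = doubling_time_years(rate)
    print(f"{rate}% a year: output per hour doubles in about {years:.0f} years")
```

At 3% a year, productivity doubles in roughly 23 years. At 0.5% a year, it takes roughly 139.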
AI has great potential, but we are a long way from realizing that potential.
This piece was written by Jeffrey Funk and Gary Smith.