Large Language Models Can Entertain but Are They Useful?
Humans who value correct responses will need to fact-check everything LLMs generate
In 1987 economics Nobel Laureate Robert Solow said that the computer age was everywhere, except in productivity data. A similar thing could be said about AI today: It dominates tech news but does not seem to have boosted productivity a whit. In fact, productivity growth has been declining since Solow’s observation. Productivity increased by an average of 2.7% a year from 1948 to 1986, but by less than 2% a year from 1987 to 2022.
Labor productivity is the amount of goods and services we produce in a given amount of time: output per hour. More productive workers can build more cars, construct more houses, and educate more children. More productive workers can also enjoy more free time. If workers can do in four days what used to take five days, they can produce 25 percent more, or they can work only four days a week. As Nobel laureate Paul Krugman said: “Productivity isn’t everything, but, in the long run, it is almost everything.”
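The 25 percent figure follows from comparing output per day before and after. A minimal sketch of that arithmetic, in which the function name and the five-day baseline are illustrative assumptions rather than any official productivity measure:

```python
from fractions import Fraction

def productivity_gain(old_days: int, new_days: int) -> Fraction:
    """Fractional rise in output per day when a job that used to take
    old_days now takes new_days (hypothetical helper for illustration)."""
    old_rate = Fraction(1, old_days)   # jobs completed per day before
    new_rate = Fraction(1, new_days)   # jobs completed per day after
    return new_rate / old_rate - 1

gain = productivity_gain(old_days=5, new_days=4)
print(f"Output per day rises by {float(gain):.0%}")

# Choice 1: keep working five days and produce 5/4 of the old weekly output.
output_if_still_working_five_days = 5 * Fraction(1, 4)
# Choice 2: produce the old weekly output and stop after four days.
days_needed_for_old_output = 1 / Fraction(1, 4)
```

The same gain can be taken either as more goods and services or as more free time, which is the trade-off described above.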
Computers have definitely had some positive effects on productivity. Assembly-line robots are an obvious example. Bar codes are another. Less obvious are the free instructions on the Internet for do-it-yourselfers. So, too, is the use of computers to make error-free mathematical and statistical calculations that would otherwise be extremely tedious or essentially impossible. The Internet also provides researchers easy access to obscure facts, downloadable data, and published research. Gary is old enough to remember when he had to go to libraries and rummage through newspapers, journals, and books looking for information and reading what other researchers were writing. He grimaces at memories of transcribing data by hand.
On the other hand, computers in general and the Internet in particular offer innumerable time-wasting activities that eat into productivity. Playing online games. Surfing for porn. Blathering on social media. Worldwide, Internet users spend an average of more than six hours a day online. U.S. TikTok users spend an average of 95 minutes a day Ticking, Tocking, or whatever it is they do.
For decades, we have been told that AI will revolutionize the workplace by replacing humans with machines. In 1965, for example, Herbert Simon, a Nobel laureate in economics and also a winner of the Turing Award (“The Nobel Prize of computing”), predicted that “machines will be capable, within 20 years, of doing any work a man can do.” It has now been nearly 60 years and we are still waiting for our 0-day workweek. There are exceptions—think mapping and search engines—but, so far, AI has mainly overpromised and underdelivered with a succession of fake-it-til-you-make-it disappointments.
Even worse, the AI illusion has convinced many to use deeply flawed programs to make hiring decisions, approve loans, set prison sentences, choose investments, and make myriad other decisions that the programs are unqualified to make. In addition, the opportunity costs have been enormous as the development of AI has consumed the energy of thousands of very talented, hard-working people who could have been doing far more productive things.
The latest craze is large language models (LLMs), like OpenAI’s GPT-3. Three prominent professors writing in the Harvard Business Review called this the “tipping point for AI”:
“It has the potential to take over certain roles traditionally held by humans, such as copywriting, answering customer service inquiries, writing news reports, and creating legal documents. As AI continues to improve, more and more current jobs will be threatened by automation…. The question isn’t whether AI will be good enough to take on more cognitive tasks but rather how we’ll adapt.”
Ben Miller, the CEO of Fundrise, gushed that, “Although the hype will be stratospheric, it will birth the greatest productivity boom in American history since the invention of electricity.” If that sounds like hype, it is. A recent TechCrunch survey asked investors where the next bubble would be—almost half said generative AI.
LLMs are amazing, but not for the reasons given by these professors and delusional investors. LLMs generate text by identifying statistical patterns in enormous text databases. It is utterly astonishing that they can compose articulate, grammatically correct essays, stories, and even research papers. Blinded by our human tendencies to anthropomorphize by attributing human-like qualities to non-human things like animals, plants, and computers, we imagine LLMs to be intelligent, indeed more intelligent than we are since they seem to know far more than we do.
The reality is quite different. LLMs are text predictors, nothing more. They are not designed to know any facts whatsoever. Indeed, they have no way of distinguishing between true and false statements because they literally do not know the meaning of any of the words in the text they generate. They are convincing BS artists, which is why one of us proposed that they be called faux intelligence instead of artificial intelligence.
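To make “text predictor” concrete, here is a toy sketch that picks each next word purely from counts of which word followed which in its training text. It is a deliberately crude stand-in: real LLMs use large neural networks rather than word-pair counts, and the three-sentence corpus and function names below are invented for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows: dict[str, Counter], start: str, length: int, seed: int = 0) -> str:
    """Extend `start` by repeatedly sampling a next word in proportion to its count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:  # no observed continuation, so stop
            break
        words, counts = zip(*candidates.items())
        out.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(out)

# Invented toy corpus (an assumption for illustration, not real training data).
corpus = (
    "russians sent dogs into space . "
    "russians sent bears into the circus . "
    "americans sent chimps into space ."
)
model = train_bigrams(corpus)
print(generate(model, "russians", 5))
```

Nothing in this sketch checks whether a generated sentence is true. Depending on the random draw it can stitch together “russians sent bears into space”, a sentence that appears nowhere in its corpus, simply because each word is a plausible successor to the one before it.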
Here is one simple example:
Human: 3 + (1+3)/2
Human: prove it
GPT-3: Using standard mathematical order of operations, the equation can be solved as follows: 3 + (1+3)/2 = 3 + 4/2 = 7/2 = 6
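For reference, applying the standard order of operations by hand gives 3 + (1+3)/2 = 3 + 4/2 = 3 + 2 = 5. A minimal Python check, using exact fractions so that no rounding is involved:

```python
from fractions import Fraction

# Parentheses first, then division, then addition.
inner = Fraction(1 + 3)      # (1+3) = 4
quotient = inner / 2         # 4/2 = 2
result = 3 + quotient        # 3 + 2 = 5
print(result)

# The two steps in the quoted explanation that do not hold:
assert 3 + Fraction(4, 2) != Fraction(7, 2)   # 3 + 4/2 is 5, not 7/2
assert Fraction(7, 2) != 6                    # 7/2 is 3.5, not 6
```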
The confidently asserted answer is wrong, as is the explanation. Here is a non-mathematical example:
Human: How many bears have Russians sent into space?
GPT-3: Russians have sent a total of 52 bears into space. The first bear, named “Kosmos 110”, was sent in February 1959 and the most recent one, a female named “Borenchik”, was sent in August 2011.
Again, the answer is given with specific details and great confidence but is complete rubbish. No bears have been sent into space, by Russia or anyone else. The bear names and launch dates are complete fabrications.
Our point is not that LLMs sometimes give dumb answers. We use these examples to demonstrate that, because LLMs do not know what words mean, they cannot use knowledge of the real world, common sense, wisdom, or logical reasoning to assess whether a statement is likely to be true or false.
Beyond entertainment, can LLMs do anything useful? Remember the three professors’ statement that GPT-3 “has the potential to take over certain roles traditionally held by humans, such as copywriting, answering customer service inquiries, writing news reports, and creating legal documents.” BS generators are not likely to boost productivity in any of these fields, unless the goal is generating BS. Humans who value correct responses will need to fact-check everything LLMs generate. Those who erroneously think computers are smarter than they are will skip the fact-checking and release essays and news reports that may well be malarkey. Even worse, the malevolent may use LLMs to flood the Internet with disinformation.
LLMs can provide human-like responses to customer service inquiries, but customers will be justifiably upset when they are given erroneous answers. There will surely be lawsuits when easily documented LLM responses cause serious damage.
LLMs can be used to write legal briefs, but they may make bogus arguments based on made-up precedents. More fundamentally, productivity will not be enhanced by more legal briefs but by faster resolution of legal disputes or, even better, fewer disputes. Resolving disputes faster and more cheaply requires the human judgment that LLMs lack.
Scaling up LLMs by training on larger databases won’t solve the fundamental problem: Not knowing anything about the real world, LLMs and other AI systems cannot be trusted.
Enormous amounts of resources have been spent on AI with very little to show for it. The economic value of AI research is not measured by how much has been spent on it or by how well it tricks people into thinking that computers are smarter than us but by how much it enables us to produce more goods and services. So far, the answer is not much.