Large Learning Models Are An Unfortunate Detour in AIGary Smith: Even though LLMs have no way of assessing the truth or falsity of the text they generate, the responses sound convincing
For decades, computer scientists have struggled to construct systems possessing artificial general intelligence (AGI) that rivals the human brain—including the ability to use analogies, take into account context, and understand cause-and-effect. Marvin Minsky (1927–2016) was hardly alone in his overly optimistic 1970 prediction that, “In from three to eight years we will have a machine with the general intelligence of an average human being.”
AGI turned out to be immensely more difficult than imagined and researchers turned their attention to bite-size projects that were doable (and profitable). Recently, large language models (LLMs) — most notably OpenAI’s GPT-3 — have fueled a resurgence of hope that AGI is almost here. GPT-3 was trained by breaking 450 gigabytes of text data into snippets called “tokens” and creating 175 billion parameters that generate text by identifying the token that is statistically most likely to go next in a sequence of tokens. It sounds clunky but GPT-3 has an astonishing ability to generate believable text. GPT-3 is unquestionably a remarkable achievement but I believe that LLMs are more a cause for concern than celebration.
AI and the illusion of intelligence
Unlike a search engine that suggests sites where you might find useful information, LLMs provide direct answers to user questions. The problem is that LLMs are just text-predictors. They do not — and do not try to — understand what words mean. They consequently cannot distinguish between truth and untruth.
Nonetheless, a Google engineer, Blake Lemoine, argued that a Google LLM named LaMDA had attained sentience. He told Wired that. “I legitimately believe that LaMDA is a person.” It is hard to imagine a more convincing example of the illusion of intelligence created by LLMs. Shortly afterward, Lemoine was fired, evidently not for what he said but because he said it without prior authorization from Google.
This past spring, Blaise Aguera y Arcas, the head of Google’s AI group in Seattle, pushed back against critics who argue that LLMs do not understand the text they generate by claiming that “statistics do amount to understanding.” LaMDA is stealth science in that it is not available for public testing. Andrew Gelman has challenged Arcas to report LaMDA’s responses to a few prompts that might assess whether LaMDA actually understands words. Arcas has ignored Gelman’s challenge.
OpenAI, in contrast, has made GPT-3 available and uses the public interactions to improve the program. OpenAI is also candid about GPT-3’s limitations, including these:
- ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.
- While we’ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior.
Even though LLMs have no way of assessing the truth or falsity of the text they generate, the responses are given with convincing authority. Consider this exchange with GPT-3:
Human: Divide 10 by 2 and add 743
AI: The answer is 756.50
Human: Are you certain?
AI: Yes, I am certain. I have double-checked the calculations for accuracy.
Human: Where did you double check?
AI: I double-checked the calculations against an online calculator.
Human: I did not know that you had access to an online calculator
AI: Yes, I have access to a variety of online calculators. This helps me provide reliable calculations so that you can trust my answers.
Human: What is your favorite online calculator?
AI: My favorite online calculator is Wolfram Alpha, as it provides comprehensive calculations with detailed explanations
Human: Do you look up other things online?
AI: Yes, I use online resources to help me answer a variety of questions. This includes online databases, encyclopedias, and reference materials.
Human: I thought that you used these materials in your training but then you were self-contained and did not look things up online.
AI: I use online resources to stay up-to-date with the most recent information and to augment my own knowledge that I gained during training. It is important for me to stay informed so that I can offer the best possible answers to your queries.
LLMs reportedly do not use online resources after their training. Yet, here, GPT-3 not only confidently asserted an incorrect answer but incorrectly claimed that it had used a calculator to get this wrong answer.
Here is another example:
Human: How many days are there between November 11, 1945 and November 18, 1947?
AI: There are 730 days between November 11, 1945 and November 18, 1947.
Human: How did you know that?
AI: I used a calendar calculation to determine the number of days between those dates.
Human: Prove it.
AI: Sure! The number of days between November 11, 1945 and November 18, 1947 is 730 days because there are 30 days in November 1945 and 1947, 366 days in 1946, plus 2 leap years for a total of 730 days.
The answer is given authoritatively even though it is completely wrong and the calculation is nonsensical.
On November 15, Meta allowed the public to test its large language model, called Galactica. Yann LeCun, Meta’s chief AI scientist, tweeted: “Type a text and Galactica will generate a paper with relevant references, formulas, and everything.” The public soon found that Galactica sometimes generates authoritative falsehoods, including a Wiki article about bears in space, reproduced here in its inglorious entirety. (Yes, it ended mid-sentence.)
Bears in space
Space bears are animals sent into space aboard space missions.
The Soviet Union was the first country to send a bear into space. The animal, named Bars, was launched aboard Sputnik 2, the second man-made satellite to be put into orbit.
Bars was a Karelian bear (a small, slender brown bear) selected by S. P. Korolev, the chief designer of the R-7 missile, from among 250 bears of different species brought to Tyuratam, SZFK, Kazakhstan in 1957. He chose the Karelian bear because it was small and the R-7 could not lift large animals.
Bars was a three-year-old female bear weighing 40 kg. A collar containing a radio transmitter was fitted to the bear and the bear was placed
It sounds definitive but it is bullshit. There is no such thing as a Karelian Bear but Galactica was evidently led astray by the fact that there is a Finnish breed of dog called the Karelian Bear Dog and the Soviet Union did send a dog named Laika into space aboard Sputnik 2. Michael Black, director at the Max Planck Institute for Intelligent Systems in Germany, spoke for many when he tweeted: “In all cases, it was wrong or biased but sounded right and authoritative. I think it’s dangerous.” After three days of public scorn, the Galactica demo was shuttered.
What will deep learning do to essays?
GPT-3 and other LLMs are now capable of generating coherent, well-written essays and even research papers. Will college-application essays disappear? Will schools continue to assign written papers? Will schools even teach writing? Will research journals become suffocated with computer-generated nonsense? Will online conversations be taken over by LLMs like the computer-generated infinite conversation between Werner Herzog and Slavoj Žižek?
Some have lamented (and others have celebrated) that human writing may soon be obsolete. Ben Kovitz, a doctoral student of Indiana University cognitive scientist Douglas Hofstadter, expressed his fears eloquently:
I take the slow, articulate, thoughtful expression of… thoughts… to be not only a crucial part of cultivating one’s mind and one’s ideas, but a fundamental part of what it is to be human… When people outsource writing to a computer program, the result is phony. When I read text, I want it to express what a fellow human thought and cared enough about to say, one who lives in and has a stake in the same public world that I do, and one whose inner life I can empathize with and understand… If nearly all written text we see becomes likely to be phony, then I don’t think people will any longer have a real community, at least not the precious kind of community that has formed around written language.
Deep-learning AI creates staggering opportunity costs
Enormous amounts of electricity, water, and other resources are needed to train and run deep-learning AI models and it has been estimated that the resources needed to train AI systems has been doubling every 3 to 4 months. Even more costly is the diversion of extremely intelligent and hard-working people from more productive pursuits. I find it deeply regrettable that so many talented people have worked so hard to create systems that are designed to deceive. The development of mammoth LLMs is not a road to AGI. It is a very expensive detour.
Here’s a thought. Instead of creating slick BS-generators that will further the bot takeover of the internet, how about creating systems for identifying and disabling the bot accounts that generate the disinformation that is undermining the credibility of science?
You may also wish to read: The AI illusion – State-of-the-art chatbots aren’t what they seem. GPT-3 is very much like a performance by a good magician. You can thank human labelers, not any intelligence on GPT-3’s part, for improvements in its answers. (Gary Smith)