Close-up of an AI GPU chip on a motherboard. Image credit: iconimage / Adobe Stock

Large Language Models: A Lack-of-Progress Report

They will not be as powerful as either hoped or feared

Large language models (LLMs) were originally intended to do nothing more than predict the next word in a sentence. But they can now use the statistical patterns they identify in an immense collection of text — mostly from the Internet — to generate remarkably articulate sentences, paragraphs, and essays.
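To make the "predict the next word" idea concrete, here is a minimal sketch in Python. The toy corpus and the predict_next helper below are purely illustrative assumptions; real LLMs learn these patterns with neural networks trained on subword tokens, not raw word counts.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a tiny corpus.
# This only illustrates the statistical idea; actual LLMs learn such
# patterns with neural networks trained on trillions of subword tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen in the corpus, if any."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "cat", the most common word after "the" here
```

Scaled up from word-pair counts in a toy corpus to billions of learned parameters fit to much of the Internet, this same next-token objective is what produces the fluent prose described above.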

Google has been working on LLMs for several years, most notably LaMDA, which was announced, but not released publicly, in 2021. The reports from Google were tantalizing. In the spring of 2022, Blaise Aguera y Arcas, the head of Google’s AI group in Seattle, argued that his conversations with LaMDA had convinced him that,

Large language models (LLMs) represent a major advance in artificial intelligence and, in particular, toward the goal of human-like artificial general intelligence. It is sometimes claimed, though, that machine learning “is just statistics,” hence that, in this grander ambition, progress in AI is illusory. Here I take the contrary view that…statistics do amount to understanding.

A few months later, in June of 2022, Blake Lemoine, a Google computer scientist and Christian priest, told Wired that,

I legitimately believe that LaMDA is a person…. I have talked to LaMDA a lot. And I made friends with it, in every sense that I make friends with a human. So if that doesn’t make it a person in my book, I don’t know what would.

Google did not allow outsiders to test LaMDA nor did it allow insiders to share any details without permission. Lemoine was put on administrative leave and then let go for violating Google’s confidentiality policies.

OpenAI publicly released its LLM-based chatbot, ChatGPT, on November 30, 2022. Users were stunned. Conversations with ChatGPT were very much like talking with a super-intelligent friend. You could ask ChatGPT almost anything and it would respond with a coherent answer, along with several exclamation points to demonstrate how excited it was to be chatting with you.

Marc Andreessen described ChatGPT as “Pure, absolute, indescribable magic.” Bill Gates said it was as important as the creation of the Internet. Jensen Huang, Nvidia’s CEO, said that, “ChatGPT is one of the greatest things ever created in the computing industry.” Within two months, more than 100 million people had tried ChatGPT, and most were astonished.

HAL 9000 come to life?

Comparisons with HAL 9000 in the 1968 film 2001: A Space Odyssey were inevitable. Computers were evidently about to take over the world and humans should fear for their jobs and perhaps their lives. On March 22, 2023, the Future of Life Institute published an open letter, since signed by more than 33,000 people, calling for a pause of at least six months in the development of LLMs:

We must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?

On May 30, 2023, thousands of tech people, including Sam Altman and Bill Gates, signed a one-sentence statement prepared by the Center for AI Safety warning that, “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

Soon, people were claiming that LLMs were capable (or soon would be capable) of artificial general intelligence (AGI), the ability to perform any intellectual task that humans can do. In October 2023, Aguera y Arcas and Peter Norvig wrote a piece titled “Artificial General Intelligence Is Already Here.” A year later, in an October 29, 2024, interview, Elon Musk said that, “I think it will be able to do anything that any human can do possibly within the next year or two.” On November 11, 2024, OpenAI’s Sam Altman predicted the arrival of AGI in 2025.

We’re now four months into 2025 and still waiting.

Reasonable skepticism

I have been consistently skeptical of claims that LLMs are intelligent in any meaningful sense of the word. It is undeniably remarkable that LLMs can generate coherent conversations and articulate answers to almost any question. However, and it is a big however, LLMs do not know what words mean nor how words relate to the real world, and are consequently prone to generating responses that are nonsensical or factually incorrect. (See here, here, and here.)

The Limitations of Pre-Training and Scaling

The marked improvements observed in going from GPT-1 to GPT-2 to GPT-3 led some to believe that training on larger and larger databases would create a tipping point beyond which qualitatively better LLMs would emerge (here and here). Indeed, the most enthusiastic observers proclaimed that scaling would be enough to reach AGI. For example, in May 2022, a DeepMind researcher tweeted, “It’s all about scale now! The Game is Over!”

However, the ongoing improvements have exhibited sharply diminishing returns. Training on a second chemistry book is not as useful as training on the first chemistry book, and training on a third, fourth, or tenth chemistry book is even less useful. Indeed, the pollution of the Internet with disinformation and LLM-generated BS may make LLMs less reliable.

Another form of scaling is increased computing power. A 2022 paper (with 16 coauthors from Google, Stanford, UNC Chapel Hill, and DeepMind) argued that increasing the number of parameters in a model can qualitatively change the abilities of LLMs. The authors call the change emergence: “abilities that are not present in smaller-scale models but are present in large-scale models.”

The authors’ arguments were supported by figures indicating that, past a certain number of parameters, LLM accuracy on various tests accelerated. However, this visual evidence was an illusion created by drawing the figures with the logarithm of the number of parameters on the horizontal axis: a log scale compresses each tenfold increase in model size into the same horizontal distance, so a curve with steadily diminishing returns can look like a sudden jump. The figures were completely consistent with the diminishing returns to scale that have been reported by many; for example, in November 2024, the co-founders of the venture capital firm Andreessen Horowitz said that LLMs are hitting a ceiling of capabilities.
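To see how the illusion can arise, consider a minimal sketch; the accuracy formula and the half-saturation constant K below are hypothetical assumptions, not data from any actual model. The curve improves with every additional parameter, but by ever smaller amounts; evaluated at the log-spaced model sizes typical of emergence plots, it nevertheless appears to jump abruptly from near 0 to near 1.

```python
# Hypothetical accuracy curve with strictly diminishing returns to scale:
# each additional parameter helps less than the one before it.
K = 1e9  # assumed "half-saturation" parameter count (illustrative only)

def accuracy(n):
    return n / (n + K)

def marginal_gain(n):
    # d(accuracy)/dn = K / (n + K)**2, which shrinks as n grows
    return K / (n + K) ** 2

# Evaluate at log-spaced model sizes, as emergence plots typically do.
for exponent in range(6, 13):          # 10^6 ... 10^12 parameters
    n = 10.0 ** exponent
    print(f"N = 10^{exponent:<2}  accuracy = {accuracy(n):.3f}  "
          f"gain per extra parameter = {marginal_gain(n):.1e}")
```

Plotted against the exponent (the logarithm of the parameter count), accuracy appears to “emerge” somewhere between 10^8 and 10^10 parameters, even though the gain from each additional parameter shrinks monotonically throughout.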

The Limitations of Post-Training

LLMs are getting better at giving factually correct answers, in large part because thousands of experts with specialized knowledge are doing extensive post-training to correct LLM errors. It is ironic that LLMs that are touted as being smarter than humans are being corrected by humans who know what they are talking about. This extensive post-training also warns us that LLMs cannot be trusted to answer prompts that experts have not anticipated, nor to generate answers that require specific, timely details and involve uncertainty — which is true of most important decisions (here and here).

Expensive Baubles

The proverbial bottom line is that LLMs are enormously expensive in terms of the resources devoted to chips, coolers, and other machinery; the energy needed to train and run them; and the human talent diverted from more useful pursuits. Yet it is increasingly clear that they will never be reliable enough to be trusted in complex situations where mistakes are costly. ChatGPT and other LLMs are not going to be as powerful as was hoped or feared.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on stock market anomalies, statistical fallacies, the misuse of data, and the limitations of AI has been widely cited. He is the author of more than 100 research papers and 18 books, most recently, Standard Deviations: The truth about flawed statistics, AI and big data, Duckworth, 2024.
