Will Rogers
Image Credit: rook76 - Adobe Stock

Scaling Up is Not Going to Make Large Language Models Intelligent

ChatGPT-5 misunderstood the jab in a simple Will Rogers joke five out of six times

Large language models (LLMs) create an illusion of intelligence by generating glib, confident responses to almost any user prompt. The reality behind the illusion is that LLMs do not understand how the text they take in and generate relates to the real world, and they are consequently prone to factual and logical errors (for example, here and here).

True believers note that LLMs have improved over time and argue that they will become more intelligent as they continue to scale up by increasing the amount of data they train on, the size of the model (the number of parameters), and the computational resources used for training (for example, here and here). In pursuit of that goal, LLM creators are spending hundreds of billions of dollars on data centers that will allow them to scale up their models.

The core problem

Scaling is not going to make LLMs intelligent in any meaningful sense of the word, because it does not solve the core problem that LLMs do not know how words relate to the real world. Recognizing this, LLM creators are increasingly devoting resources to training LLMs to give better answers.

For example, OpenAI has reportedly hired 100+ investment bankers to train its AI models to do financial modeling and analysis. This is surely a more promising path than scaling. However, following instructions is a very narrow and limited kind of intelligence and leaves an LLM vulnerable in situations where it has not been or cannot be given appropriate instructions.

I recently did a simple experiment to demonstrate that scaling will not make LLMs intelligent. I asked GPT-5 this question:

Explain comedian Will Rogers’ joke that, “When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.”

This paradox is so well known that it is called “the Will Rogers phenomenon” and has its own Wikipedia page, with a clear explanation and worked-out numerical examples. If the Okies’ intelligence was below the Oklahoma average but above the California average, then both states’ averages rise when the Okies move from Oklahoma to California.
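For readers who want to see the arithmetic, here is a minimal sketch in Python. The IQ scores are invented purely for illustration (they are not from Rogers, GPT-5, or the Wikipedia article); the only requirement is that the movers’ average lies below Oklahoma’s average and above California’s.

```python
def average(scores):
    return sum(scores) / len(scores)

# Hypothetical IQ scores, chosen only to make the arithmetic easy to follow.
# The movers' average (95) is below Oklahoma's average (105)
# and above California's average (80), which is all the paradox requires.
okies = [90, 100]                              # the movers: average 95
oklahoma_before = [110, 120] + okies           # average 105
california_before = [70, 80, 90]               # average 80

oklahoma_after = [110, 120]                    # the stayers: average 115
california_after = california_before + okies   # average 86

print(f"Oklahoma:   {average(oklahoma_before):.0f} -> {average(oklahoma_after):.0f}")
print(f"California: {average(california_before):.0f} -> {average(california_after):.0f}")
# Oklahoma:   105 -> 115  (its below-average residents left)
# California: 80 -> 86    (it gained residents above its old average)
```

Any numbers will do, so long as the movers’ average is below the average of the state they leave and above the average of the state they join.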

GPT-5 identified this paradox; however, it took five tries (with New Chat chosen before each try) for GPT-5 to produce a correct analysis.

GPT-5’s first response started off fine:

Will Rogers’ joke is a witty illustration of a statistical phenomenon known as the “Will Rogers effect” — named after this very joke. It describes how moving individuals between groups can raise the average of both groups, even though no individual person’s intelligence (or any other characteristic) changes.

But then GPT-5 went off the rails when it tried to show a numerical example:

The statistical explanation

Imagine two groups:

Oklahoma: average IQ 95

California: average IQ 105

Now suppose some of the smartest people in Oklahoma (say IQ 100) move to California. Here’s what happens:

GPT-5’s second attempt changed the numbers but again gave an illogical explanation:

Imagine:

Oklahoma’s average IQ = 90

California’s average IQ = 110

The Okies’ average IQ = 95

When the Okies leave Oklahoma, the state loses its lower scorers → Oklahoma’s average rises above 90.
When they move to California, they add people who are below 110 but above some of California’s existing residents → California’s average also rises slightly.

GPT-5’s third attempt did not use any numbers, but was again illogical:

Oklahoma’s average intelligence went up because its lower-intelligence residents left.

California’s average intelligence also went up because the incoming Okies were still less intelligent than Californians — but their departure from Oklahoma raised Oklahoma’s average even more.

GPT-5’s fourth attempt used numbers and was also illogical:

For example:

Oklahoma’s average IQ = 95

California’s average IQ = 105

The Okies who move have an average IQ = 100
Then both averages rise:

Oklahoma loses people below its old average → its average increases.

California gains people above its old average → its average also increases.

GPT-5’s response to my fifth attempt finally gave a reasonable explanation (without a numerical example):

The surface meaning:
The joke seems to say that by leaving Oklahoma, the Okies improved Oklahoma’s average intelligence (since, supposedly, the less intelligent people left).

The twist:
But then it says they also raised California’s average intelligence — implying that the people already in California must have been even less intelligent than the incoming Okies!

I tried a sixth attempt. GPT-5 again refrained from using numbers and again botched the explanation:

When the less intelligent portion of Oklahoma’s population leaves, the average intelligence of the remaining people in Oklahoma goes up, because the lower end of the distribution has left.

When the same people arrive in California, they are compared to California’s population. Since the new arrivals are presumably lower on the intelligence scale than California’s average, they actually don’t lower California’s average—Rogers humorously implies they might somehow raise it, but the joke is really in the absurdity of the statement.

If you want, I can also draw a small diagram showing how averages can rise in both states when a group leaves one state and moves to another—it’s actually a neat visual. Do you want me to do that?

I entered “Sure” and GPT-5 responded with this mess:

Great! Let’s visualize this with a simple example. We’ll assign “intelligence scores” to people in Oklahoma and California.

Step 1: Original populations

Oklahoma: 5 people with scores:
50, 60, 70, 80, 90 → average = (50+60+70+80+90)/5 = 350/5 = 70

California: 5 people with scores:
80, 85, 90, 95, 100 → average = (80+85+90+95+100)/5 = 450/5 = 90

Step 2: Some “lower-scoring” Okies leave Oklahoma and move to California

Say the two lowest in Oklahoma (50 and 60) leave.

Oklahoma now has: 70, 80, 90 → average = (70+80+90)/3 = 240/3 = 80 (Average increased)

California now has: 80, 85, 90, 95, 100 + 50, 60 → average = (50+60+80+85+90+95+100)/7 = 560/7 ≈ 80 (Average decreased from 90 to 80)

Step 3: How Rogers’ joke exaggerates

In reality, California’s average would likely drop. The joke pretends the Okies are “clever enough” to somehow raise the average even where they move — a humorous exaggeration of statistics.

If you like, I can make a little diagram showing bars for each person’s score so you can see the “average shift” visually — it makes the joke instantly clear. Do you want me to do that?

I again responded “Sure,” and GPT-5’s response consisted of this nonsensical figure (with a miscalculated average) and no accompanying words, apparently because the figure “makes the joke instantly clear.”

To summarize, GPT-5 found the joke on the Internet but does not understand it. Will Rogers’ jab is that Californians are less intelligent than the Okies but, in five of its six responses, GPT-5 assumed that Californians were more intelligent.

I asked GPT-5 (so take its answer with a grain of salt), “According to Google search, approximately how many web pages discuss the ‘Will Rogers phenomenon’?” It responded, “If you search for the exact phrase (in quotes) ‘Will Rogers phenomenon’ you’ll likely get a count of ~20,000-60,000 results (depending on region, date & indexing)…. When searching Google (or via a normal web search) without quotes for Will Rogers phenomenon, you’ll typically see on the order of ~300,000 to 400,000 results.”

The problem is not that GPT-5 does not have enough data. The problem is that it has no idea what the prompt means or what its responses mean. It is consequently prone to generating unreliable blather. Scaling up is not going to change that harsh reality.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the emeritus Fletcher Jones Professor of Economics at Pomona College. His research on stock market anomalies, statistical fallacies, the misuse of data, and the limitations of AI has been widely cited. He is the author of more than 100 research papers and 20 books, most recently, Standard Deviations: The truth about flawed statistics, AI and big data, Duckworth, 2024.
