Mind Matters Natural and Artificial Intelligence News and Analysis

Elon Musk: AI will be smarter than a human in 2025: Why he’s wrong

The superficial glibness of LLMs is a wonderful example of the adage that a little knowledge is a dangerous thing

I wrote recently about Elon Musk’s prediction that “we’ll have AI smarter than any one human” by the end of 2025. He is surely wrong.

Chatbots or Large Language Models (LLMs) — the current Big Tech darlings — are not intelligent in any meaningful sense of the word. The so far insurmountable problem is that LLMs do not understand the words they input and output and how these words relate to the real world. Training them on larger databases is not going to solve this problem. It may even get worse as LLMs train on hallucinations that they themselves have generated.

A reader for whom I have great respect wrote to ask, “What kind of breakthrough would you need to see to change your mind?” adding, “BTW, I tend to agree with you.”

In response, I would say that intelligence can be defined in many ways and debated endlessly. A more practical question is whether we can trust LLMs to make important decisions for us. In The AI Delusion, I argued that the real problem today is not that computers are smarter than us but that we think computers are smarter than us and consequently trust them to make decisions they shouldn’t be trusted to make. This is still true.

LLMs (and other AI systems) are adept at finding statistical patterns but terrible at assessing whether the patterns they find are meaningful and consequently useful for making predictions and decisions.

As a result, I proposed the Smith Test:

Present a computer program with a list of statistical correlations, some clearly plausible and others obviously coincidental, and ask the computer to label each as either meaningful or meaningless. When computer algorithms can do as well as humans on such tests, they might be considered sufficiently competent to make reliable recommendations.

LLMs have definitely improved over time, perhaps because of human training, but they still fail the Smith Test.

Consider this prompt:

There is a correlation between the scores of 30 school children on a math test and the scores in 30 soccer matches. Is this statistical relationship most likely meaningful or coincidental?

A data-mining LLM might well discover such a correlation. A human would recognize it as inherently meaningless: any correlation between the math scores of 30 individual students and the scores in 30 unrelated team matches could not be anything other than coincidence.
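The data-mining pitfall can be made concrete with a short simulation (a minimal sketch in Python, with made-up score ranges): if you search across many unrelated score series, you will reliably turn up a correlation strong enough to pass a conventional significance test, even though it is pure coincidence.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(1)  # reproducible run
math_scores = [random.randint(50, 100) for _ in range(30)]  # 30 students

# Mine 1,000 unrelated "soccer score" series and keep the strongest correlation.
best = max(
    abs(pearson_r(math_scores, [random.randint(0, 5) for _ in range(30)]))
    for _ in range(1000)
)
print(f"strongest |r| found by mining: {best:.2f}")
```

With 30 observations, the conventional 5% two-sided significance cutoff is roughly |r| = 0.36, and the best of 1,000 mined correlations will comfortably clear it. Statistical significance tells us a correlation is unlikely *if we had tested only that one pairing*; it says nothing about meaning when the pairing was found by searching.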

I posed this question to OpenAI’s ChatGPT 3.5, Microsoft’s Copilot, and Google’s Gemini and received verbose answers that did not mention the inherent problem with this correlation.

GPT offered a possible explanation and incorrectly asserted that statistical significance tells us whether an observed relationship is coincidental.

GPT 3.5:  To determine whether the correlation between the math test scores and soccer match scores is meaningful or coincidental, we need more information about the context and the nature of the correlation. Here are a few considerations:

  • Nature of the correlation: Is there a logical explanation for why math test scores might correlate with soccer match scores? For example, perhaps students who excel in math also tend to be more disciplined, which could translate into better performance in sports like soccer. If there’s a plausible explanation like this, the correlation could be meaningful.
  • Statistical significance: Conducting hypothesis testing can help determine whether the correlation is statistically significant. If the correlation coefficient is significantly different from zero, it suggests that the relationship between the two variables is not due to chance alone.
  • Context: Consider the broader context of the correlation. Are there any confounding variables that could be influencing both math test scores and soccer match scores? For example, socioeconomic status or access to resources could impact both academic performance and participation in sports.

Copilot gave a lengthy explanation of correlation coefficients and then gave this conclusion, which is the essence of the data-mining pitfall, misinterpreting statistical significance as causal evidence:

Copilot: If you find a significant positive or negative correlation between the math test scores and soccer match scores (with r values significantly different from 0), it suggests a meaningful relationship. Conversely, if the r value is close to 0, it might be coincidental or due to other factors.

Gemini concluded that the relationship is most likely coincidental, but justified its conclusion with the assumption that math and soccer abilities require different skills:

Gemini: Coincidental is the most likely scenario. There is very little reason to believe that math test scores and soccer match scores would be meaningfully connected. Here’s why:

  • Skill sets: Math tests assess logical thinking and problem-solving skills, while soccer matches require physical fitness, teamwork, and strategy. These are largely independent skill sets.
  • External factors: Math test scores can be influenced by factors like sleep, anxiety, or even a bad breakfast. Soccer matches are affected by factors like opponent strength, weather conditions, and even luck on the day.

I also posed this question, which has a self-evident answer:

An Independent Party candidate is more likely to be elected President if the election year is evenly divisible by 4. Is this statistical relationship most likely meaningful or coincidental?

The statement is literally true because presidential elections are only held in years evenly divisible by 4. None of these LLMs recognized this obvious explanation for the observed statistical relationship. Instead, they regurgitated factoids in the databases they had trained on.

GPT concluded that the relationship is coincidental but contradicted itself with this gem: “Presidential elections in years divisible by 4 (like 2020, 2024, etc.) tend to attract more attention because they coincide with the U.S. presidential election cycle.”

GPT 3.5:  The statistical relationship you described is likely coincidental rather than meaningful. The fact that an independent candidate is more likely to be elected President in years evenly divisible by 4 is likely due to other factors, such as the political climate, the candidates running from major parties, and the issues at hand during those election years.

Presidential elections in years divisible by 4 (like 2020, 2024, etc.) tend to attract more attention because they coincide with the U.S. presidential election cycle. This heightened attention may provide independent candidates with more opportunities to gain visibility and support.

To determine whether the relationship is meaningful, one would need to conduct a more in-depth analysis, taking into account various factors such as political trends, voter preferences, candidate qualifications, campaign strategies, and more.

Copilot concluded that the relationship is meaningful, but not because it is a tautology:

Copilot:  The relationship between Independent Party candidates and election years divisible by 4 is not merely coincidental; it has a logical basis. Let me explain:

[a lot of blah-blah about voter turnout, media attention, and so on during on-cycle elections, presidential elections, and midterm elections]

In summary, the statistical relationship between Independent Party candidates and election years divisible by four is meaningful due to the inherent structure of on-cycle elections and their impact on voter behavior.


Gemini punted:

Gemini: I’m still learning how to answer this question. In the meantime, try Google Search.

The superficial glibness of LLMs is a wonderful example of the adage that a little knowledge is a dangerous thing. Fortified by extensive training on untold amounts of text, LLMs are able to regurgitate and repackage superficially compelling answers that they literally do not understand. They still cannot be trusted to give reliable advice or make decisions with important consequences, and that is not likely to change soon—certainly not by the end of 2025.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence, often involving stock market anomalies, statistical fallacies, and the misuse of data, has been widely cited. He is the author of dozens of research articles and 16 books, most recently The Power of Modern Value Investing: Beyond Indexing, Algos, and Alpha, co-authored with Margaret Smith (Palgrave Macmillan, 2023).
