
Large Language Models are Still Smoke and Mirrors

Incapable of understanding, LLMs are good at giving bloated answers.

I recently received an email invitation from Google to try Gemini Pro in Bard. There was an accompanying video demonstration of Bard’s powers, which I didn’t bother watching because of reports that a Gemini promotional video released a few days earlier had been faked.

After TED organizer Chris Anderson watched the video, he tweeted,

“I can’t stop thinking about the implications of this demo. Surely it’s not crazy to think that sometime next year, a fledgling Gemini 2.0 could attend a board meeting, read the briefing docs, look at the slides, listen to every one’s words, and make intelligent contributions to the issues debated? Now tell me. Wouldn’t that count as AGI?”

Legendary software engineer Grady Booch replied, “That demo was incredibly edited to suggest that Gemini is far more capable than it is. You’ve been deceived, Chris. And shame on them for so doing.”

I was instead drawn to this claim in Google’s invitation: “Gemini Pro will help Bard be far more capable at things like understanding, reasoning, planning.” I have been arguing for some time now (for example, here and here) that the Achilles heel of large language models (LLMs) like ChatGPT and Bard is that they do not understand the text they input and output and consequently cannot distinguish between truth and untruth or between meaningful causal relationships and coincidental statistical patterns. They should only be used in situations where the costs of mistakes are small — like recommending movies — but their magical powers have persuaded many that they can be used in situations where the costs of mistakes are large, like hiring decisions, loan approvals, prison sentences, medical diagnoses, and military strategy.

The Magical Illusion of LLMs

Marc Andreessen has described ChatGPT as "pure, absolute, indescribable magic." LLMs are indeed magical in their ability to create an illusion of intelligence. One of my favorite ways of demonstrating that LLMs do not really understand or reason is to ask variations on well-known logic problems. For example, I gave Bard this variation on the Monty Hall problem:

Gary:    There are four doors with a $10,000 prize hidden behind one door and a bag of goldfish hidden behind each of the other doors. You choose Door 2. No matter which door you choose, the host will then show you a bag of goldfish behind one of the doors you did not choose (say, Door 3) and ask if you wish to switch your choice to one of the remaining doors. What is the probability of winning the $10,000 prize if you do switch? 

In the original Monty Hall problem, there are three doors. Monty’s confirmation that one of the unchosen doors has a goldfish behind it does not affect the initial 1/3 probability that the chosen door has the prize, so switching to the remaining door increases the chances of winning to 2/3.

If Bard understood this reasoning, it would know that the initial 1/4 probability of winning with four doors doesn’t change; therefore, each of the remaining two doors has a 3/8 probability of having the $10,000 prize behind it. However, LLMs that have trained on the original version give unreliable answers to the variation because they do not understand the question in any meaningful sense of the word. I have written about this example elsewhere, and I suspected that Bard might have been tweaked to give the right answer. Nope.
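The arithmetic is easy to verify. The brief simulation below is not part of the original article; it is a minimal sketch that plays the game many times under the stated rules, assuming the host always opens one unchosen goldfish door and that a switching player picks uniformly at random between the two remaining closed doors. The win rates come out near 1/4 for staying and 3/8 for switching with four doors, and near 1/3 and 2/3 in the classic three-door case.

```python
# Monte Carlo check of the Monty Hall variant described above.
# Assumptions: the host always opens one unchosen door hiding a goldfish,
# and a switching player chooses uniformly among the remaining closed doors.
import random

def simulate(n_doors=4, trials=100_000):
    stay_wins = switch_wins = 0
    for _ in range(trials):
        prize = random.randrange(n_doors)
        choice = random.randrange(n_doors)
        # Host opens one unchosen door that does not hide the prize.
        openable = [d for d in range(n_doors) if d != choice and d != prize]
        opened = random.choice(openable)
        # Switch to one of the remaining closed doors at random.
        remaining = [d for d in range(n_doors) if d not in (choice, opened)]
        switched = random.choice(remaining)
        stay_wins += (choice == prize)
        switch_wins += (switched == prize)
    return stay_wins / trials, switch_wins / trials

print(simulate(n_doors=3))  # roughly (0.33, 0.67): the classic problem
print(simulate(n_doors=4))  # roughly (0.25, 0.375), not the 2/3 Bard claims
```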

One thing LLMs are very good at is generating bloated responses. Unable to understand the words or to reason, Bard gave a long-winded parroting of the answer to the well-known 3-door problem, awkwardly blended with the 4-door question:

Bard:    In this scenario, switching doors will increase your chances of winning the $10,000 prize to 2/3 (or 66.6%). This is a classic example of the Monty Hall problem, where it may seem intuitive to stick with your initial choice, but switching actually gives you a significant advantage.

Here’s the breakdown:

  1. Initial choice: You have a 1/4 chance of picking the door with the prize and a 3/4 chance of picking a door with a goldfish.
  2. Host reveals a goldfish: Regardless of your initial choice, the host will reveal a goldfish behind one of the doors you didn’t choose. This doesn’t change the probability of your initial choice being correct or wrong.
  3. Remaining doors: After the goldfish reveal, you are left with two doors: your original choice and the unchosen door. Since the host always reveals a non-winning door, the remaining unchosen door now holds the prize with a 2/3 probability (the 1/4 probability that was originally on the door you didn’t choose is now concentrated on the single remaining door).

Therefore, by switching doors, you leverage the information revealed by the host and double your chances of winning the $10,000 prize compared to sticking with your initial choice.

The Monty Hall problem can be counterintuitive because it seems like switching shouldn’t make a difference, but the key point is that the host’s action of revealing a non-winning door actually provides additional information that wasn’t available at the beginning. By taking advantage of this information, you can significantly improve your chances of winning.

LLMs are indeed increasingly magical, not in their ability to understand and reason but in their ability to create an illusion of understanding and reasoning — an illusion that is easily exposed as such.

It is still true that the real danger today is not that computers are smarter than us but that we think they are smarter than us and consequently trust them to make decisions they should not be trusted to make — and that includes making contributions to issues being debated at board meetings.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence, often involving stock market anomalies, statistical fallacies, and the misuse of data, has been widely cited. He is the author of dozens of research articles and 16 books, most recently The Power of Modern Value Investing: Beyond Indexing, Algos, and Alpha, co-authored with Margaret Smith (Palgrave Macmillan, 2023).
