
The Large Language Model (LLM) “Superpower” Illusion Dies Hard

Historical confirmation bias around ESP and spirit cabinets makes for an interesting comparison with the current need to believe in the abilities of LLMs

Beginning in the 1930s, J. B. Rhine conducted millions of extra-sensory perception (ESP) tests using a 25-card Zener deck, which has five cards for each of five symbols: circle, cross, wavy lines, square, and star.

Rhine’s belief in parapsychology was so strong that he was blinded to the flaws in his experiments and to his misinterpretation of the results. For example, he counted an unusual number of misses as evidence of ESP, on the theory that the subjects were deliberately giving wrong answers in order to embarrass him! He also looked for position effects, where success rates were above or below average in different parts of an experiment’s record sheet. Other times, he would study the results and discover that, while the guesses did not match the contemporaneous Zener cards, they might match the next card or the card two ahead.
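
To see how easily such “patterns” emerge from pure chance, consider a minimal simulation sketch in Python (the scoring rules below are illustrative, not a reconstruction of Rhine’s actual protocols). A subject with no ESP guesses randomly against a shuffled Zener deck; we then score direct hits and “forward displacement” hits, just as a motivated analyst might:

```python
import random

SYMBOLS = ["circle", "cross", "waves", "square", "star"]

def run_session():
    # A shuffled Zener deck: five cards of each of the five symbols.
    deck = SYMBOLS * 5
    random.shuffle(deck)
    # A subject with no ESP simply guesses symbols at random.
    guesses = [random.choice(SYMBOLS) for _ in deck]
    hits = sum(g == c for g, c in zip(guesses, deck))
    # "Forward displacement": score each guess against the next card instead.
    fwd = sum(g == c for g, c in zip(guesses, deck[1:]))
    return hits, fwd

random.seed(1)
results = [run_session() for _ in range(10_000)]

# Chance expectation is 5 direct hits per 25-card run.
print("mean hits per run:", sum(h for h, _ in results) / len(results))

# Comb through enough subjects and scoring rules, and "remarkable" runs
# (high hits, suspiciously low hits, high displacement scores) turn up
# by chance alone -- the raw material of confirmation bias.
print("runs with >= 9 direct hits:", sum(h >= 9 for h, _ in results))
print("runs with <= 1 direct hit:", sum(h <= 1 for h, _ in results))
print("runs with >= 9 forward hits:", sum(f >= 9 for _, f in results))
```

With thousands of runs and several scoring rules in play, some results are bound to look striking. Counting those after the fact, without a pre-specified hypothesis, is the statistical heart of Rhine’s error.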

In 1926, before he began his ESP experiments, Rhine and his wife attended a séance conducted by a famous medium, Mina “Margery” Crandon, recommended by Sir Arthur Conan Doyle, the author of the Sherlock Holmes novels. Margery claimed that she could channel deceased people using a “spirit cabinet,” that is, a large wooden box or curtained-off area that separates the medium from the audience.

Margery was put into a trance and tied securely to a chair inside the cabinet with a few props scattered on the floor. When the door was closed, a spirit supposedly visited Margery. The audience heard loud noises like a bell being rung and metal pans clanging together. Some props were tossed out of the cabinet. Then the cabinet was opened and Margery was still in a trance and tied to the chair.

Most of the audience was convinced that a spirit had entered the cabinet and was responsible for what happened. It was obvious to the Rhines, however, that Margery was not tightly restrained; thus, she could easily cause everything that happened without the assistance of spirits.

The Rhines wrote an exposé, arguing that “the whole game was a base and brazen trickery, carried out cleverly enough under the guise of spirit manifestations.”

Doyle was incensed. He sent a letter to the Boston Herald condemning the Rhines’ “colossal impertinence” and paid for display ads in Boston newspapers that said simply, “J.B. RHINE IS A MONUMENTAL ASS.”

Confirmation bias

These ESP experiments and spirit-cabinet performances are examples of confirmation bias — the inclination to embrace information that might support our prior beliefs and dismiss evidence to the contrary. Doyle and many others wanted to believe that the deceased can communicate with the living and thus they ignored the obvious fakery. Rhine didn’t believe in spiritualism and declared spirit cabinets, séances, and the like to be fraudulent. However, he believed in ESP and grasped for evidence so fervently that he even interpreted failures as successes.

The large language model (LLM) illusion

ChatGPT and other large language models (LLMs) are very much like a magic act in that their confident responses to almost any query conceal the fact that they are not intelligent in any meaningful sense of the word. True believers do not see the fakery. In April 2024, for example, Elon Musk said, “My guess is that we’ll have AI smarter than any one human around the end of next year.”

I recently argued that LLMs will not lead to artificial general intelligence (AGI) anytime soon. Pre-training on a vast amount of text data will not yield AGI because LLMs are not intended to know — and, in practice, do not know — how any of the text they train on relates to the real world. They are consequently prone to generating confident responses that are factually incorrect and, in some cases, defy common sense.

Post-training that tries to keep LLMs from going off the rails and guide them to logical and accurate responses won’t solve this problem. The post-training cannot reliably anticipate the information needed to give trustworthy responses and cannot assess the uncertainty that affects the outcomes of many consequential decisions; for example, should I settle this legal case or go to trial?

Many readers agreed with my assessment but some pushed back. One very intelligent person, a portfolio manager at a top-50 investment management company, emailed:

Doesn’t training vehicles using simulation and human data somewhat capture the decision-making-under-uncertainty aspect of driving? How did AlphaGo learn to play a complex game like Go at championship levels?

These two examples are unrelated to the question of whether LLMs will lead to AGI. LLMs are not used to drive cars or play Go, and driving a car and playing Go are not at all like deciding whether to settle a legal case.

Another reader, a sensible and productive economics professor, emailed:

I agree with you that the creators of these things are likely over-selling the AGI singularity and, like you, that the market is probably more likely to be repeating the dot-com era than the era of the steam engine or the personal computer….but, but.. I also think that “good enough” could be a pretty dramatic turn of events in this case. I have never seen anything like ChatGPT or the robots on Tesla’s or BMW’s or Amazon’s factories and warehouses.

ChatGPT is astonishing, but I will never trust it to make important decisions for me. And, again, LLMs do not control robots in factories and warehouses, and the use of robots is not at all like deciding whether to settle a legal case.

Large language models and uncertainty

Imagine that you are the lawyer for a client who is accused of murder. You have followed the old adage, “Don’t settle until you see the whites of the jurors’ eyes.” The jury has now been selected and the trial is about to begin. The prosecutor offers a last-minute deal — a lighter sentence in exchange for pleading guilty to a lesser charge. What do you advise the client to do?


As an experienced trial lawyer, you consider the evidence, the composition of the jury, the competency of the prosecutor, whether your client will testify and how that might go, and other relevant information. You tell your client the possible outcomes and your assessment of the likelihood of each. How could an LLM — no matter the amount of pre-training and post-training — offer equally well-informed and trustworthy advice? An LLM has no way of understanding the relevant information and no means of coming up with subjective probabilities.
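
For concreteness, here is a toy expected-value sketch in Python; every number in it is invented for illustration. It ignores risk preferences and is nothing like how a lawyer actually reasons, but even this stripped-down version of the decision turns entirely on subjective probabilities that an LLM has no way to ground:

```python
# Toy plea-vs-trial sketch with invented numbers, to make the role of
# subjective probabilities concrete. Not legal advice.

plea_sentence = 5.0  # years offered in the hypothetical last-minute deal

# The lawyer's subjective probabilities over trial outcomes, formed from
# the evidence, the jury, the prosecutor, and past experience.
trial_outcomes = {
    "acquittal":       (0.30,  0.0),   # (probability, sentence in years)
    "lesser charge":   (0.25,  4.0),
    "full conviction": (0.45, 15.0),
}

expected_trial = sum(p * s for p, s in trial_outcomes.values())
print(f"expected sentence if tried: {expected_trial:.2f} years")  # 7.75
print(f"plea offer:                 {plea_sentence:.2f} years")

# Move 0.20 of probability from "full conviction" to "acquittal" and the
# expected trial sentence falls to 4.75 years, below the plea offer, so
# the advice flips. The arithmetic is trivial; the probabilities are not.
```

And a real client cares about worst cases, not just averages, which makes honest probability assessments matter even more.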

If your client rejects the plea bargain, then you need to decide the most useful evidence, the most promising way of presenting the case, and the lines of cross-examination that are most likely to be successful. Again, this is based on your knowledge of the details of this particular case and your past experience handling other relevant cases.

Someone blinded by confirmation bias might believe that robots and Go victories are evidence that ChatGPT would be a reliable lawyer. Don’t be that someone. If that someone were to use ChatGPT instead of a competent human lawyer, then ChatGPT would have a fool for a client.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on stock market anomalies, statistical fallacies, the misuse of data, and the limitations of AI has been widely cited. He is the author of more than 100 research papers and 18 books, most recently, Standard Deviations: The truth about flawed statistics, AI and big data, Duckworth, 2024.
