Mind Matters Natural and Artificial Intelligence News and Analysis
Large Language Models - Generative AI illustration
Image licensed via Adobe Stock

Large Language Models Are Often Wrong, Never in Doubt

LLMs are statistically driven text generators, nothing more

As an economist, professor, and writer, I am reportedly in imminent danger of being replaced by large language models (LLMs). I’m not worried.

What concerns me, instead, is unwarranted faith in LLMs. I keep writing about LLMs because I keep being told they are great, and I keep seeing that they are disappointing.

I recently tested OpenAI’s ChatGPT 3.5, Google’s Gemini, and Microsoft’s Copilot with several straightforward financial questions. Each prompt was given once and the complete response was recorded. The responses were strikingly long and wrong, though invariably expressed with utmost confidence.

The fundamental problem is that LLMs are statistically driven text generators, nothing more. They are astonishingly good at this, but they are not designed or intended to understand the words they input and output. Having no understanding of what words mean or how they relate to the real world, they have no way of assessing the relevance or accuracy of their answers.

When I asked the LLMs whether it was better to borrow money for one year at a 9% interest rate or for 10 years at a 1% interest rate, they all chose the 9% loan. Living in the real world, even people with no finance training would recognize the appeal of the 1% loan. Gemini also added that a disadvantage of the 10-year loan is that inflation “could erode the purchasing power of your future payments, making the loan effectively more expensive.” In fact, that would be an advantage.
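The arithmetic behind the loan comparison is simple enough to sketch. Assuming a hypothetical $10,000 principal (the prompts' exact loan terms are not given here), the annual cost of keeping the money is what matters:

```python
# Hypothetical comparison of the two loans from the article, assuming a
# $10,000 principal. The relevant comparison is the annual cost of
# having the money, not the loan's length.
principal = 10_000

annual_cost_9pct = principal * 0.09   # $900 per year to borrow at 9%
annual_cost_1pct = principal * 0.01   # $100 per year to borrow at 1%

print(annual_cost_9pct, annual_cost_1pct)  # 900.0 100.0
```

At 1%, the borrower pays a ninth of the yearly interest and keeps the money ten times longer, and any inflation over those ten years shrinks the real burden of the fixed payments further.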

When asked the effective rate of return on a life insurance policy, none of the LLMs took into account how long the insured person might live. ChatGPT gave a spectacularly bizarre answer when it concluded that the buyer’s return is 11,878%.
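Why longevity matters can be illustrated with a deliberately simplified sketch. All the numbers below are hypothetical (the article does not give the policy's terms), and the premiums are treated as a single up-front payment to keep the algebra to one line:

```python
# Hypothetical policy: $1,000 annual premium, $100,000 death benefit.
# Simplification (assumed, not from the article): treat all premiums as
# paid up front, then solve (premium * years) * (1 + r)**years == benefit.
premium = 1_000
benefit = 100_000

def annualized_return(years):
    paid = premium * years
    return (benefit / paid) ** (1 / years) - 1

# The return collapses as the insured lives longer:
for years in (5, 20, 40):
    print(years, f"{annualized_return(years):.1%}")
```

Even this crude model shows the answer hinges on how long the insured lives, the very factor the LLMs ignored.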

When asked the first-year return from buying a house, all three LLMs ignored the rent savings from home ownership. ChatGPT and Copilot considered the first-year expenses to be income for the homebuyer. Gemini’s conclusion was even nuttier, with a rate of return of negative 530%!
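A back-of-the-envelope version of the calculation the LLMs botched looks like this. Every number is an assumption for illustration; the key line is the rent the owner no longer has to pay, which is implicit income:

```python
# Hypothetical first-year return on a home purchase (all figures assumed).
price = 400_000
down_payment = 80_000
rent_saved = 24_000     # 12 months at an assumed $2,000/month -- the
                        # implicit income the LLMs left out entirely
expenses = 18_000       # mortgage interest, taxes, upkeep (assumed)
appreciation = 8_000    # assumed 2% price gain

# Expenses are subtracted from income, not counted as income.
first_year_return = (rent_saved - expenses + appreciation) / down_payment
print(f"{first_year_return:.1%}")  # 17.5%
```

Treating the expenses as income, as ChatGPT and Copilot did, flips the sign of a term; nothing in a sensible version of this calculation produces a return of negative 530%.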

When asked to calculate the effective monthly rent in a retirement community that charges $3,500/month rent plus a $316,000 nonrefundable upfront fee, none of the LLMs considered how long a person might live there. ChatGPT completely ignored the upfront fee, while Gemini reported that the upfront fee reduced the effective rent to $2,422.
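A nonrefundable upfront fee has to be spread over the expected stay, which raises the effective rent. A minimal sketch, assuming a hypothetical ten-year stay and ignoring the time value of money (a proper calculation would discount):

```python
# Effective monthly rent with a nonrefundable upfront fee amortized
# over an assumed 10-year stay (time value of money ignored for brevity).
monthly_rent = 3_500
upfront_fee = 316_000
months = 10 * 12

effective_rent = monthly_rent + upfront_fee / months
print(round(effective_rent, 2))  # 6133.33
```

However long the assumed stay, the fee adds to the $3,500; no expected lifespan makes the effective rent fall to $2,422.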

In addition to the demonstrated lack of critical thinking skills, the wild answers also confirmed the absence of common sense. You immediately knew that an 11,878% return on a life insurance policy and a negative 530% first-year return from buying a house are not credible. Insurance companies are not that generous and homebuyers are not that dumb. You also knew that a nonrefundable upfront fee does not reduce the effective rent. The LLMs were clueless.

The cost of an LLM recommending a bad movie, restaurant, or hotel might be small but it is perilous to trust LLMs in situations where the costs of mistakes are substantial — as is true of many financial decisions. Bad advice about loans, insurance, and homes can be quite hazardous to your wealth.

These simple examples demonstrate that LLM answers consistently sound authoritative but are often wrong. The inescapable dilemma is that if you know the answer, you don’t need to ask an LLM and, if you don’t know the answer, you can’t trust an LLM.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence, often involving stock market anomalies, statistical fallacies, and the misuse of data, has been widely cited. He is the author of dozens of research articles and 16 books, most recently, The Power of Modern Value Investing: Beyond Indexing, Algos, and Alpha, co-authored with Margaret Smith (Palgrave Macmillan, 2023).
