Mind Matters Natural and Artificial Intelligence News and Analysis
Large Language Models - Generative AI illustration
Image licensed via Adobe Stock

Large Language Models Are Often Wrong, Never in Doubt

LLMs are statistically driven text generators, nothing more

As an economist, professor, and writer, I am reportedly in imminent danger of being replaced by large language models (LLMs). I’m not worried.

What concerns me, instead, is unwarranted faith in LLMs. I keep writing about LLMs because I keep being told they are great, and I keep seeing that they are disappointing.

I recently tested OpenAI’s ChatGPT 3.5, Google’s Gemini, and Microsoft’s Copilot with several straightforward financial questions. Each prompt was given once and the complete response was recorded. The responses were strikingly long and wrong, though invariably expressed with utmost confidence.

The fundamental problem is that LLMs are statistically driven text generators, nothing more. They are astonishingly good at this, but they are not designed or intended to understand the words they input and output. Having no understanding of what words mean or how they relate to the real world, they have no way of assessing the relevance or accuracy of their answers.

When I asked the LLMs whether it was better to borrow money for one year at a 9% interest rate or for 10 years at a 1% interest rate, they all chose the 9% loan. Living in the real world, even people with no finance training would recognize the appeal of the 1% loan. Gemini also added that a disadvantage of the 10-year loan is that inflation “could erode the purchasing power of your future payments, making the loan effectively more expensive.” In fact, that would be an advantage.
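The arithmetic behind the loan comparison is simple enough to sketch. Assuming a hypothetical $10,000 principal (the prompts' exact loan terms are not given here), the annual cost of keeping the money is what matters:

```python
# Hypothetical comparison of the two loans from the article, assuming a
# $10,000 principal. The relevant comparison is the annual cost of
# having the money, not the loan's length.
principal = 10_000

annual_cost_9pct = principal * 0.09   # $900 per year to borrow at 9%
annual_cost_1pct = principal * 0.01   # $100 per year to borrow at 1%

print(annual_cost_9pct, annual_cost_1pct)  # 900.0 100.0
```

At 1%, the borrower pays a ninth of the yearly interest and keeps the money ten times longer, and any inflation over those ten years shrinks the real burden of the fixed payments further.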

When asked the effective rate of return on a life insurance policy, none of the LLMs took into account how long the insured person might live. ChatGPT gave a spectacularly bizarre answer when it concluded that the buyer’s return is 11,878%.
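Why longevity matters can be illustrated with a deliberately simplified sketch. All the numbers below are hypothetical (the article does not give the policy's terms), and the premiums are treated as a single up-front payment to keep the algebra to one line:

```python
# Hypothetical policy: $1,000 annual premium, $100,000 death benefit.
# Simplification (assumed, not from the article): treat all premiums as
# paid up front, then solve (premium * years) * (1 + r)**years == benefit.
premium = 1_000
benefit = 100_000

def annualized_return(years):
    paid = premium * years
    return (benefit / paid) ** (1 / years) - 1

# The return collapses as the insured lives longer:
for years in (5, 20, 40):
    print(years, f"{annualized_return(years):.1%}")
```

Even this crude model shows the answer hinges on how long the insured lives, the very factor the LLMs ignored.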

When asked the first-year return from buying a house, all three LLMs ignored the rent savings from home ownership. ChatGPT and Copilot considered the first-year expenses to be income for the homebuyer. Gemini’s conclusion was even nuttier, with a rate of return of negative 530%!
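A back-of-the-envelope version of the calculation the LLMs botched looks like this. Every number is an assumption for illustration; the key line is the rent the owner no longer has to pay, which is implicit income:

```python
# Hypothetical first-year return on a home purchase (all figures assumed).
price = 400_000
down_payment = 80_000
rent_saved = 24_000     # 12 months at an assumed $2,000/month -- the
                        # implicit income the LLMs left out entirely
expenses = 18_000       # mortgage interest, taxes, upkeep (assumed)
appreciation = 8_000    # assumed 2% price gain

# Expenses are subtracted from income, not counted as income.
first_year_return = (rent_saved - expenses + appreciation) / down_payment
print(f"{first_year_return:.1%}")  # 17.5%
```

Treating the expenses as income, as ChatGPT and Copilot did, flips the sign of a term; nothing in a sensible version of this calculation produces a return of negative 530%.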

When asked to calculate the effective monthly rent in a retirement community that charges $3,500/month rent plus a $316,000 nonrefundable upfront fee, none of the LLMs considered how long a person might live there. ChatGPT completely ignored the upfront fee, while Gemini reported that the upfront fee reduced the effective rent to $2,422.
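A nonrefundable upfront fee has to be spread over the expected stay, which raises the effective rent. A minimal sketch, assuming a hypothetical ten-year stay and ignoring the time value of money (a proper calculation would discount):

```python
# Effective monthly rent with a nonrefundable upfront fee amortized
# over an assumed 10-year stay (time value of money ignored for brevity).
monthly_rent = 3_500
upfront_fee = 316_000
months = 10 * 12

effective_rent = monthly_rent + upfront_fee / months
print(round(effective_rent, 2))  # 6133.33
```

However long the assumed stay, the fee adds to the $3,500; no expected lifespan makes the effective rent fall to $2,422.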

In addition to the demonstrated lack of critical thinking skills, the wild answers also confirmed the absence of common sense. You immediately knew that an 11,878% return on a life insurance policy and a negative 530% first-year return from buying a house are not credible. Insurance companies are not that generous and homebuyers are not that dumb. You also knew that a nonrefundable upfront fee does not reduce the effective rent. The LLMs were clueless.

The cost of an LLM recommending a bad movie, restaurant, or hotel might be small but it is perilous to trust LLMs in situations where the costs of mistakes are substantial — as is true of many financial decisions. Bad advice about loans, insurance, and homes can be quite hazardous to your wealth.

These simple examples demonstrate that LLM answers consistently sound authoritative but are often wrong. The inescapable dilemma is that if you know the answer, you don’t need to ask an LLM and, if you don’t know the answer, you can’t trust an LLM.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence, often involving stock market anomalies, statistical fallacies, and the misuse of data, has been widely cited. He is the author of dozens of research articles and 16 books, most recently, The Power of Modern Value Investing: Beyond Indexing, Algos, and Alpha, co-authored with Margaret Smith (Palgrave Macmillan, 2023).
