The fundamental mistake programmers made in designing AIs
At Futurism, Victor Tangermann reports that Sam Altman’s company, OpenAI, thinks it has figured out the origin of the hallucinations that plague large language models (LLMs) like ChatGPT.

The bad news is that it’s unclear what can be done about it.
In a paper published last week, a team of OpenAI researchers attempted to come up with an explanation. They suggest that large language models hallucinate because when they’re being created, they’re incentivized to guess rather than admit they simply don’t know the answer.
Hallucinations “persist due to the way most evaluations are graded — language models are optimized to be good test-takers, and guessing when uncertain improves test performance,” the paper reads.
Conventionally, the output of an AI is graded in a binary way, rewarding it when it gives a correct response and penalizing it when it gives an incorrect one.
In other words, guessing is rewarded because it might turn out to be right, while an AI admitting it doesn’t know the answer will be graded as incorrect no matter what.
“OpenAI Realizes It Made a Terrible Mistake,” September 14, 2025
From the open access paper: “This ‘epidemic’ of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards, rather than introducing additional hallucination evaluations. This change may steer the field toward more trustworthy AI systems.”
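To make the incentive concrete, here is a minimal sketch of the scoring issue, assuming a toy question-answering benchmark. The function names, the “I don’t know” string, and the penalty value are all hypothetical illustrations, not code or numbers from the paper. It contrasts the conventional binary grading described above with an abstention-aware rule in the spirit of the paper’s suggestion, where a wrong answer costs more than admitting uncertainty.

```python
# Hypothetical illustration (not code from the OpenAI paper): how binary
# exact-match grading rewards guessing, and how an abstention-aware grader
# removes that incentive. Names and the penalty value are assumptions.

def binary_score(answer: str, truth: str) -> float:
    """Conventional benchmark scoring: 1 for a correct answer, 0 for anything
    else. Saying "I don't know" earns the same 0 as a wrong guess."""
    return 1.0 if answer == truth else 0.0


def abstention_aware_score(answer: str, truth: str, wrong_penalty: float = 1.0) -> float:
    """Modified scoring: abstaining scores 0, a wrong answer scores negative,
    so confident errors are penalized instead of being treated like abstentions."""
    if answer.strip().lower() == "i don't know":
        return 0.0
    return 1.0 if answer == truth else -wrong_penalty


abstain = "I don't know"
guess = "Lyon"
truth = "Paris"

print("Binary score for a wrong guess:", binary_score(guess, truth))          # 0.0
print("Binary score for abstaining:   ", binary_score(abstain, truth))        # 0.0 -- no better than guessing
print("Aware score for a wrong guess: ", abstention_aware_score(guess, truth))    # -1.0
print("Aware score for abstaining:    ", abstention_aware_score(abstain, truth))  #  0.0 -- abstaining now wins

# Expected value of guessing when the model is only 20% sure of its guess:
p = 0.2
print("Expected binary score if guessing:          ", p * 1.0)                  #  0.20 > 0, so guess
print("Expected abstention-aware score if guessing:", p * 1.0 + (1 - p) * -1.0) # -0.60 < 0, so abstain
```

Under the binary rule, an unsure model maximizes its expected score by guessing; under the penalized rule, the same model does better by abstaining, which is the change in benchmark scoring the researchers argue for.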
Is the error fundamental to that type of system? The company argues that the problem is fixable, but just how easy will that be? As Tangermann says, “For now, the AI industry will have to continue reckoning with the problem as it justifies tens of billions of dollars in capital expenditures and soaring emissions.”
And meanwhile, “GPT-5 Is Making Huge Factual Errors, Users Say.”
Indeed. You may also wish to look at economist Gary Smith’s dialogues with error-prone GPT-5: here, here, and here.