Analysts: AIs Are Often Wrong But Never Uncertain
Gary Marcus and Ernest Davis note that getting AIs to admit uncertainty is "one of the most important *unsolved* challenges" in the field.

At Psychology Today, University of Queensland psychology prof Thomas Suddendorf asks how we know if other humans, animals, or AI are conscious:
Consciousness is a private affair. There is no way of directly knowing what it is like to be another. We can only infer. And we readily do infer that others have conscious experiences like we do, for essentially three kinds of reasons:
(1) they act like me
(2) they look like me
(3) they tell me
“How Can We Know That an Animal, or AI, Is Conscious?,” February 12, 2025

He worries that people will believe that advanced AI models meet these tests:
Few of us may hence currently attribute consciousness to AI. But it is easy to imagine how that would change as the loading on all three reasons for our inferences increases. An AI could simply be programmed to tell us that it is conscious, and it could also act more like us once better integrated with robotics.
Furthermore, it may appear to be much more like us if it was fused with an actual biological body. “Is Conscious?”
Of course, that doesn’t show that it is conscious.
Interesting as his discussion is, something about it feels artificial. A story that broke today illustrates the problem.
When the chatbot was entered in the math contest…
At his Substack, AI analyst Gary Marcus hosts New York University computer science prof Ernest Davis to discuss what really happened when LLMs were asked to tackle the questions from the USA Math Olympiad (March 19–20):
Hours after it was completed, so there could be virtually no chance of data leakage, a team of scientists gave the problems to some of the top large language models, whose mathematical and reasoning abilities have been loudly proclaimed: o3-Mini, o1-Pro, DeepSeek R1, QwQ-32B, Gemini-2.0-Flash-Thinking-Exp, and Claude-3.7-Sonnet-Thinking. The proofs output by all these models were evaluated by experts. The results were dismal: None of the AIs scored higher than 5% overall.
“Reports of LLMs mastering math have been greatly exaggerated,” April 5, 2025

Davis and Marcus concede that the math problems are very difficult. But — and this is quite significant — that wasn’t really the problem for the LLMs. Their problem was that they couldn’t know whether they had solved the problems or not:
What matters here is the nature of the failure: the AIs were never able to recognize when they had not solved the problem. In every case, rather than give up, they confidently output a proof that had a large gap or an outright error. To quote the report: “The most frequent failure mode among human participants is the inability to find a correct solution. Typically, human participants have a clear sense of whether they solved a problem correctly. In contrast, all evaluated LLMs consistently claimed to have solved the problems.”
The refusal of these kinds of AI to admit ignorance or incapacity and their obstinate preference for generating incorrect but plausible-looking answers instead are one of their most dangerous characteristics. It is extremely easy for a user to pose a question to an LLM, get what looks like a valid answer, and then trust to it, without doing the careful inspection necessary to check that it is actually right. “Greatly exaggerated”
The authors go on to admit something remarkable: "Getting AIs to answer 'I don't know' is one of the most important *unsolved* challenges facing the field." (Emphasis in original.)
So why did DeepMind’s AlphaProof and AlphaGeometry systems achieve a Silver Medal performance in the 2024 International Math Olympiad? Because, believe it or not, “they rely in part on powerful, completely hand-written, symbolic reasoning systems.” Thus, they “can fail to find a proof, but they cannot generate an incorrect proof.” LLMs have no such human backup system.
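To make the contrast concrete, here is a minimal sketch in Python of the difference between a generator backed by a symbolic checker and one without. The problem and the checker are invented for illustration; this is not how AlphaProof actually works, only an analogy for why a verifier-backed system can fail to find a proof but cannot emit a wrong one.

```python
# Toy illustration (not DeepMind's actual architecture): the "proofs" here
# are just candidate factorizations of an integer, and the checker is a
# small piece of hand-written symbolic code that cannot be fooled.
from typing import Optional

def check_factorization(n: int, factors: list[int]) -> bool:
    """Symbolic check: accept only if every factor exceeds 1 and the
    factors really multiply back to n."""
    product = 1
    for f in factors:
        if f <= 1:
            return False
        product *= f
    return product == n

def verified_solver(n: int, candidates: list[list[int]]) -> Optional[list[int]]:
    """Verifier-backed system: it may fail to find an answer,
    but it can never return an incorrect one."""
    for cand in candidates:
        if check_factorization(n, cand):
            return cand
    return None  # an honest "no proof found"

def unverified_solver(n: int, candidates: list[list[int]]) -> list[int]:
    """Bare generator: it always returns its most plausible-looking
    guess, right or wrong, and never says "I don't know"."""
    return candidates[0]

guesses = [[3, 7], [2, 11]]            # plausible-looking candidate answers
print(verified_solver(22, guesses))    # [2, 11] -- only a checked answer gets out
print(verified_solver(23, guesses))    # None    -- fails, but never lies
print(unverified_solver(23, guesses))  # [3, 7]  -- confidently wrong
```

The design point is the one Davis and Marcus make: adding a checker does not make the generator smarter, but it converts "confidently wrong" into "no answer found."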
The authors conclude,
The really important challenge is not to get the AIs to solve more USAMO problems; it is to get them to say “I give up” when they can’t. And we have yet to see any evidence that any kind of prompt helps in that regard.
Ernest Davis and Gary Marcus are sorry to have to break bad news, once again. “Greatly exaggerated”
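For readers wondering what "any kind of prompt" looks like in practice, here is a hypothetical sketch of an abstention instruction of the sort Davis and Marcus report has not helped. The wording and the `call_model` placeholder are inventions for illustration, not taken from their report.

```python
# Hypothetical "abstain if unsure" wrapper; `call_model` stands in for
# whatever LLM API you use, and the instruction text is invented here.
from typing import Callable

ABSTAIN_INSTRUCTION = (
    "If you are not certain that your proof is complete and correct, "
    "reply with exactly: I give up."
)

def ask_with_abstention(call_model: Callable[[str], str], problem: str) -> str:
    """Prepend the abstention instruction and flag an explicit give-up."""
    answer = call_model(f"{ABSTAIN_INSTRUCTION}\n\nProblem: {problem}")
    if answer.strip() == "I give up":
        return "Model abstained."
    # In the USAMO evaluation described above, the models essentially never
    # took this way out; they returned confident-looking proofs instead.
    return answer
```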
One way of accounting for the fact that the LLMs couldn't consider the possibility that they had not solved the problems is that they are not conscious. They are machines programmed to arrive at "solutions," which are not necessarily solutions grounded in symbolic reasoning. And they are not going to suddenly step outside their nature and start using symbolic reasoning because they somehow "know" they should. They stick to their programming, of course.
In the real world, we will probably spend a long time dealing with problems caused by non-conscious AI that some people suppose is conscious before we ever face problems with genuinely conscious AI, if we ever do.