A digital tablet casting a hologram of a chatbot icon, symbolizing advanced customer service technology.
Image Credit: FantasyLand86 - Adobe Stock

Common sense is still out of reach for chatbots


On AI analyst Gary Marcus's Substack, he and New York University computer science professor Ernest Davis discuss AI chatbots' continuing lack of common sense after seven decades of research and invention.

One proposal they address is that common sense might somehow “emerge” from vast quantities of data, in the way that some scientists argue that life somehow “emerged” from a lifeless void:

A second hypothesis has been that commonsense might automatically "emerge" in "foundation models" such as LLMs trained on vast quantities of data, without any need for specific systems devoted to physical reasoning.

We first discussed this idea in 2020, when the hot new thing was GPT-3. At the time, we ran some experiments and found that its misunderstandings of basic reality were common and ludicrous, concluding “All GPT-3 really has is a tunnel-vision understanding of how words relate to one another; it does not, from all those words, ever infer anything about the blooming, buzzing world.”

Two and a half years later, more recent models no longer make the same specific mistakes that GPT-3 did, but still lack a robust understanding of the real external world. For example, OpenAI's o1 is able to create and execute Python code that can do geometric calculation, but because it does not understand how objects are embedded in space and interact in space or, indeed, what spatial relationships actually mean, it often makes very basic mistakes. In some experiments Ernie carried out on GPT-4o-preview with spatial and physical reasoning, the AI seemed to think that an astronaut standing on the far side of the moon would be able to see the earth; it confused the upswing of a pendulum with its downswing; it confused two things moving apart with two particles moving together; and it made other similar mistakes. Gur Kimchi's example (at the top of this essay) of elephants in swimming pools is another. TED talks by Fei Fei Li and Yejin Choi also pointed to the unreliability of the pure LLM approach with respect to commonsense, including physical and spatial reasoning. Any one example can be fixed, but examples keep rolling in because no robust solution has been found.

"AI still lacks 'common' sense, 70 years later," January 5, 2025
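The astronaut example is worth pausing on, because the correct answer follows from one well-known fact plus a one-line calculation: the Moon is tidally locked, so the Earth hangs roughly fixed over the near side and never rises over the far side. The sketch below is a minimal illustration of that geometry, not Davis's actual test harness; the constants and simplified two-body setup are assumptions for the example.

```python
import numpy as np

MOON_RADIUS_KM = 1_737.4
EARTH_MOON_DIST_KM = 384_400.0  # mean center-to-center distance

def earth_visible(observer_dir: np.ndarray) -> bool:
    """True if Earth is above the horizon for an observer standing on the
    lunar surface at MOON_RADIUS_KM * observer_dir (a unit vector from the
    Moon's center). Because the Moon is tidally locked, Earth is modeled as
    fixed on the +x axis."""
    observer = MOON_RADIUS_KM * observer_dir / np.linalg.norm(observer_dir)
    earth = np.array([EARTH_MOON_DIST_KM, 0.0, 0.0])
    to_earth = earth - observer
    # Earth clears the local horizon exactly when the vector toward it has a
    # positive component along the observer's local "up" (the surface normal).
    return float(np.dot(to_earth, observer)) > 0.0

print(earth_visible(np.array([1.0, 0.0, 0.0])))   # near side -> True
print(earth_visible(np.array([-1.0, 0.0, 0.0])))  # far side  -> False
```

A model with a grounded spatial representation has, in effect, this dot product available; a model that only tracks how the words "moon," "astronaut," and "earth" co-occur does not, which is Marcus and Davis's point.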

Apparently, common sense does not just emerge from a vast dump of data.

Marcus and Davis do not think this means chatbots are useless:

This is not to say that large databases of text and video are without value. LLMs could even be argued to have knowledge of some form. But one important component of physical and conceptual understanding is reasoning about entities and their properties, and current approaches have consistently fallen short on this.

In our view, it is only once AI researchers grapple directly with the challenging problems inherent in commonsense reasoning about entities that interact in space and persist in time that the field of general-purpose AI will finally begin to mature.

"AI still lacks 'common' sense, 70 years later," January 5, 2025

In the meantime, don't get them involved in important decisions.
