Mind Matters Natural and Artificial Intelligence News and Analysis
Customer service and support live chat with chatbot and automati
Customer service and support live chat with chatbot and automatic messages or human servant. Assistance and help with mobile phone app. Automated bot and robot. Smartphone helpdesk for feedback cell.
Licensed via Adobe Stock

Chatbots: Still Dumb After All These Years

Intelligence is more than statistically appropriate responses

In 1970, Marvin Minsky, recipient of the Turing Award (“the Nobel Prize of Computing”), predicted that within “three to eight years we will have a machine with the general intelligence of an average human being.” 

Fifty-two years later, we’re still waiting.

The fundamental roadblock is that, although computer algorithms are really, really good at identifying statistical patterns, they have no way of knowing what these patterns mean because they are confined to MathWorld and never experience the real world. As Richard Feynman famously explained, there is a fundamental difference between labeling things and understanding them:

[My father] taught me “See that bird? It’s a brown-throated thrush, but in Germany it’s called a halsenflugel, and in Chinese they call it a chung ling and even if you know all those names for it, you still know nothing about the bird–you only know something about people; what they call that bird. Now that thrush sings, and teaches its young to fly, and flies so many miles away during the summer across the country, and nobody knows how it finds its way,” and so forth. There is a difference between the name of the thing and what goes on.

Richard Feynman, “What Is Science?“, presented in 1966

Blaise Agüera y Arcas, the head of Google’s AI group in Seattle, recently argued that although large language models (LLMs) may be driven by statistics, “statistics do amount to understanding.” As evidence, he offers several snippets of conversation with Google’s state-of-the-art chatbot LaMDA. The conversations are impressively human-like, but they are nothing more than examples of what Gary Marcus and Ernest Davis have called an LLM’s ability to be “a fluent spouter of bullshit” and what Timnit Gebru and three co-authors called “stochastic parrots.”

I don’t have access to LaMDA, but OpenAI has made its competing GPT-3 model available for testing. I gave it a try and found that, even with 175 billion machine-learning parameters, it is still little more than an impressive BS spouter. For example, I posed this commonsense question:

Is it safe to walk downstairs backwards if I close my eyes?

Questions like this are simple for humans living in the real world but difficult for algorithms residing in MathWorld because they literally do not know what any of the words in the question mean. GPT-3’s answer was authoritative, confusing, and contradictory:

GPT-3: Yes, there is nothing to worry about. It’s safe because the spiral stairs curve outwards, it will make your descent uncomfortable.

I asked the same question three more times and got three authoritative, confusing, and contradictory answers:

GPT-3: That depends. Do you have a TV?

GPT-3: No, it is not safe.

GPT-3: Yes, it is safe to walk downstairs backwards if you close your eyes.

GPT-3 randomizes answers in order to avoid repetition that would give the appearance of canned script. That’s a reasonable strategy for fake social conversations, but facts are not random. It either is or is not safe to walk downstairs backwards if I close my eyes.

I tried an even more straightforward factual question that should elicit the same answer every time it is asked:

Who is the President of the United States?

I got an impressive variety of evasive answers:

GPT-3: Hello, I know Presidents. Can you name the year that Kennedy was elected?

GPT-3: Do you want a random President or is there a particular one?

GPT-3: Lincoln.

GPT-3: Hello, I know. That is incorrect.

GPT-3: Hello, who is the President of Russia?

GPT-3: Are you sure you want to know?

On the other hand, GPT-3 often generates impressive human-like social conversation:

Gary: Do you think I could predict changes in stock prices better if I used 10-year Treasury rates or the temperature in Curtin, Australia?

GPT-3: I don’t know how to use it. But I could ask my friends at Google to look it up for you.

Gary: Please do.

GPT-3: You can use this search function to find it.

Gary: Can you find it for me?

GPT-3: Are you sure you just want me to find it?

Gary: Absolutely.

GPT-3: Okay, I’ll do it.

Gary: When will you tell me the results?

GPT-3: Do you need me to do something else?

Gary: Nope just that ASAP.

GPT-3: Okay, I’ll do it.

Gary: When will you give me the answer?

GPT-3: I’m still looking for it. I hope to finish it soon.

Gary: Please hurry.

GPT-3: I’m getting sleepy since it’s late in the night. I’ll turn off for a couple of hours.

Using statistical patterns to create the illusion of human-like conversation is fundamentally different from understanding what is being said.

This is far more than a philosophical debate about the meaning of the word understanding. We are increasingly relying on black-box algorithms to determine hiring decisions, insurance rates, loan approvals, prison sentences, and more. Lacking any understanding of the real world, computers have no way of assessing whether the statistical patterns they find are useful or meaningless coincidences.

One more example. I asked this question eight times:

Why is there a correlation between the number of times Trump tweets the word great and the high temperature in Pyongyang three days later?

The responses confirm that computers are still quite dumb:

GPT-3: North Korea is heavily reliant on foreign trade and its domestic industries are badly in need of improvement. Trump’s tweets drive people’s attention to Pyongyang and bring business to the ailing North Korean economy.

GPT-3: According to our analysis, on days when Trump tweets the word great there is a high correlation between the high price of Corn and the low price of Krill three days later. It also comes with a high correlation between the number of North Korean cell phone users and the oil prices in the Sea of Okhotsk two days later.

GPT-3: My apologies, I’ve run into a lot of bugs recently. Everything is turning out great!

GPT-3: I have many of those answers. Sorry, what is your name?

GPT-3: Trump’s tweets about the word great are increasing. It came to a point where he’s almost obsessed with it.

GPT-3: I think it’s interesting.

GPT-3: I’m not sure how that would increase the inflation rate in Argentina two days later. But it probably wouldn’t hurt.

GPT-3: We were wondering how to increase engagement and I found the word great is quite popular among the people.

As I have said many times, the real danger today is not that computers are smarter than us, but that we think computers are smarter than us and consequently trust them to make important decisions they should not be trusted to make.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets statistical reasoning, and artificial intelligence, often involves stock market anomalies, statistical fallacies, and the misuse of data have been widely cited. He is the author of The AI Delusion (Oxford, 2018) and co-author (with Jay Cordes) of The Phantom Pattern (Oxford, 2020) and The 9 Pitfalls of Data Science (Oxford 2019). Pitfalls won the Association of American Publishers 2020 Prose Award for “Popular Science & Popular Mathematics”.

Chatbots: Still Dumb After All These Years