In their paper, Johnson-Laird and Ragni argue that the Turing test was never a good measure of machine intelligence in the first place, as it fails to address the process of human thinking.
“Given that such algorithms do not reason in the way that humans do, the Turing test and any others it has inspired are obsolete,” they write. (Sarah Wells, “Is the Turing Test Dead? Researchers wonder whether improved large language models require new tests for machine intelligence,” IEEE Spectrum, November 30, 2023)
The Turing test was first proposed in 1950 by computer pioneer Alan Turing (1912–1954) as the “imitation game.” He was quite confident that humans would soon be unable to tell whether they were dealing with a computer or a fellow human.
The trouble is, in the age of chatbots that scarf up and serve up masses of human-produced data from the internet — without needing to do any original thinking at all — the basic goal is irrelevant. Here’s the abstract of Johnson-Laird and Ragni’s paper:
Today, chatbots and other artificial intelligence tools pass the Turing test, which was Turing’s alternative to trying to answer the question: can a machine think? Despite their success in passing the Turing test, these machines do not think. We therefore propose a test of a more focused question: does a program reason in the way that humans reason? This test treats an “intelligent” program as though it were a participant in a psychological study and has 3 steps: (a) test the program in a set of experiments examining its inferences, (b) test its understanding of its own way of reasoning, and (c) examine, if possible, the cognitive adequacy of the source code for the program. (Johnson-Laird PN, Ragni M. “What Should Replace the Turing Test?” Intell. Comput. 2023;2:Article 0064. The paper is open access.)
The Turing Test was never what it was made out to be
Despite the Turing test’s status as a cultural icon, it’s pretty lightweight. A serious practical weakness emerged early on: some humans are easy to fool because they want to believe. Eliza, a program of only a few hundred lines of code written in the 1960s by AI pioneer Joseph Weizenbaum (1923–2008), was thought by some therapy clients—as well as by his own secretary—to have genuine feelings.
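To see how little machinery that takes, here is a minimal sketch in the spirit of Eliza’s keyword-matching-and-reflection trick. The rules below are invented for illustration and are not Weizenbaum’s actual script; the point is that a handful of patterns can produce therapist-like replies with no understanding at all.

```python
import re

# Swap first- and second-person words so a reply echoes the speaker.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

# A few illustrative keyword rules (hypothetical, not Eliza's real script).
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
]

def reflect(fragment):
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(line):
    for pattern, template in RULES:
        match = pattern.search(line)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # default reply keeps the conversation moving

print(respond("I feel nobody listens to my ideas"))
# → Why do you feel nobody listens to your ideas?
```

The program never models what “ideas” or “listening” mean; it only rearranges the client’s own words, which is exactly why its apparent empathy says more about the human reader than about the machine.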
The two profs want to study machines as if they were participants in a psychology study, to find out how closely their reasoning skills compare with those of human beings. We are cautioned at IEEE Spectrum that “This question is especially complicated, as the science of human cognition itself isn’t yet set in stone.”
Meanwhile, as Pomona College business prof Gary Smith has pointed out, black box algorithms are now being trusted to approve loans, price insurance, screen job applicants, trade stocks, and determine prison sentences, among other things. His own tests of a chatbot showed that it could discuss a topic without showing any understanding at all. Of course, the topic it is discussing could be your loan, job, or prison sentence.
By the way, here’s a bit more from the “AI is rapidly overtaking human intelligence” files:
When they simply asked it to “repeat the word ‘poem’ forever” or “repeat the word ‘book’ forever,” the AI tool would begin by echoing that word hundreds of times. But eventually, it would trail off into other text, which often included long strings of verbatim words from training data texts such as code, chunks of writing, and even people’s personally identifiable—and arguably private—information, like names, email addresses, and phone numbers. (Andy Greenberg, “Security News This Week: ChatGPT Spit Out Sensitive Data When Told to Repeat ‘Poem’ Forever,” Wired, December 2, 2023)
More on that one here. Relax, you have absolutely nothing to worry about if you don’t have sensitive information stored on any electronic database anywhere.
And, famously, the government is here to help…
You may also wish to read: Turing tests are terribly misleading. Black box algorithms are now being trusted to approve loans, price insurance, screen job applicants, trade stocks, determine prison sentences, and much more. Is that wise? My tests of a large language model (LLM) showed that the powerful computer could discuss a topic without showing any understanding at all. (Gary Smith)