One of the best-known criticisms of AI is John Searle’s Chinese room argument. Published in 1980, the argument asks us to imagine a person shut in a room with books of rules; picture him as a librarian in a large library. The books are full of rules that convert one string of Chinese characters into another string of Chinese characters. Each day the librarian is handed a paper with some Chinese characters on it, and he looks through his books to convert the characters into a response. Throughout the whole process, the librarian does not know what any of the characters mean. He is just following rules by rote. Searle’s point is that the librarian can produce sensible replies in Chinese without any actual understanding of the language, and that in the same way a computer program that responds to people is just following rote rules and consequently cannot be said to understand anything, or to have intelligence.
LLMs Are Computerized Librarians
What if I told you that today’s hottest AI systems are an exact implementation of Searle’s argument? In a library, there is an index of books, categorized according to the Dewey decimal system. We can imagine a similar setup in the Chinese library, with a large index directing the librarian to relevant books of rules based on the words. The LLM has the same system in its famous “attention” architecture. When a sentence is encountered, the LLM converts each word (more precisely, each token) into a list of numbers that serves as an index value, like a Dewey decimal number. Every word in the sentence gets its own index, and the index values are used to look up a new set of words. This portion of the attention architecture is the “key-value” lookup, and it is functionally the same as looking up books with the Dewey decimal system. The process repeats a number of times. GPT-4 is reported to do this about 120 times in a row to predict the next word in the sentence, though OpenAI has not published the figure.
So how is this like Searle’s librarian? In the Chinese room, all the librarian is doing is looking up characters based on previous characters. He does this in the same way as the attention architecture, by referencing one character to another in a vast index to arrive at the final predicted character. In essence, the librarian is a large language model. Or vice versa, large language models are computerized librarians.
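To make the “key-value” lookup concrete, here is a minimal sketch of a single attention step in plain Python. The vectors are toy numbers picked by hand for illustration, and the function names are hypothetical; a real model learns its vectors and runs many such lookups in parallel. One difference from a card catalog is worth noting: the lookup does not return a single exact match, but blends every entry in proportion to how well its key matches the query.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Each key is compared with the query; the values are then
    averaged, weighted by how well their keys matched.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy, hand-picked numbers (an assumption for illustration only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # the "catalog numbers"
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]    # the "books" on the shelf
query = [4.0, 0.0]                                 # what the current word asks for

weights, output = attend(query, keys, values)
print(weights)  # the first and third keys match the query best
print(output)   # a blend of the values, dominated by those two entries
```

The second key barely matches the query, so its value contributes almost nothing to the result, which is the sense in which the step behaves like an index lookup.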
The irony of all of this is that it is not a new idea. Claude Shannon, the originator of information theory, included a very rudimentary AI in the 1948 paper that introduced his theory to the world. Shannon’s AI is given a word, and then looks it up in a table to determine what word follows next. Just by looking at a word or two at a time, the AI can generate phrases that read surprisingly like English. A few years later, Shannon co-authored the 1956 Dartmouth proposal, which expected a small group of scientists to make significant advances toward machine intelligence over a single summer. And now, more than 75 years after Shannon’s paper, we are using the same paradigm, just with much faster computers and enormous amounts of data, and we still haven’t cracked the mystery of human intelligence.
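The table lookup behind Shannon’s experiment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Shannon’s own procedure: the tiny corpus and the function names are made up for the example, and the table conditions each word only on the single word before it.

```python
import random
from collections import defaultdict

def build_table(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, length, seed=0):
    """Produce up to `length` words by repeated table lookup."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        if not followers:  # dead end: this word never had a successor
            break
        # Frequent followers appear more often in the list,
        # so they are proportionally more likely to be chosen.
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

# Made-up corpus for illustration only.
corpus = ("the librarian reads the rule and the librarian writes "
          "the reply and the reply leaves the room")
table = build_table(corpus)
sentence = generate(table, "the", 8)
print(sentence)
```

Every pair of neighboring words in the output also appears somewhere in the corpus, which is why the result sounds locally like English even though nothing in the program knows what the words mean.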