Joseph Wilson, a linguist and journalist who has done considerable work with oral languages (languages not yet written down), offers some thoughts on claims that chatbots such as LaMDA, the Google system that engineer Blake Lemoine believed to be sentient, really speak like human persons. He draws a sharp distinction between oral language and the written language that chatbots are trained on:
But this excludes all unwritten forms of communication: sign language, oral histories, body language, tone of voice, and the broader cultural context in which people find themselves speaking. In other words, it leaves out much of the interesting stuff that makes nuanced communication between people possible.
Joseph Wilson, “Why AI Will Never Fully Capture Human Language” at Sapiens (October 12, 2022)
We really don’t know how old spoken language is (Wilson suggests 50,000 years), but written language can be traced back only about 5,400 years. And only about half of the world’s languages (roughly 7,100 today, by his estimate) have ever been written down. Most human communication is oral. Thus a wide gap opens:
In daily life, conversations unfold as participants use an enormous repertoire of communicative signals. Real conversations are messy, with people talking over one another, negotiating for the right to speak, and pausing to search for the right word; they unfold in an intricate and subtle process akin to an improvised dance.
Joseph Wilson, “Why AI Will Never Fully Capture Human Language” at Sapiens (October 12, 2022)
We often don’t say quite what we mean, expecting others to pick up the cues. Take, for example, “I told her I’d miss her.” That could mean “I told her what is true.” Or it could mean “I told her that just to be polite.” Or “There. I told her what you expected me to” (without any reference to how the speaker feels about it). The hearer will usually grasp the intended meaning from a variety of signals and contexts.
Half the secret of classic, timeless English-language fiction is to approach the complexity of spoken language and render it faithfully, if only in a limited way.
In limited spheres, such as text-based conversation, machine-generated prose can be almost indistinguishable from that of a human. Yet, from purely oral languages to the nonwritten cues present in everyday conversation, language as it is spoken is vastly more complex and fascinating than what can be read on a page or a screen.
Joseph Wilson, “Why AI Will Never Fully Capture Human Language” at Sapiens (October 12, 2022)