
ChatGPT: You’ve Come a Long Way, Baby  

Reviewing the bot's progress (and problems) over the last year

ChatGPT is an example of “large language model” (LLM) generative artificial intelligence. LLMs like ChatGPT[i] have come a long way this year. Many writers, including me, have previously identified hurdles thought too high for AI to jump in the near future. In some cases, we were wrong.

Let’s review some of these claims and see how LLMs have broken through.

Can ChatGPT write jokes? Like Commander Data on Star Trek, AI doesn’t understand humor. Joke writing is hit-or-miss with ChatGPT. I wrote about this in May 2023, when I was impressed by ChatGPT’s hits. I started all my joke-writing queries with: “Complete the following to make it funny.” Here are some of the more jocular responses, with Q for the query and A for the response.

Q: To be or not to be … A: a question I ask myself every Monday morning when my alarm goes off.

Q: If God did not exist … A: it would be necessary to invent a whole bunch of new swear words.

Q: Eating raisins reminds me of … A: eating bugs, but in a good way. Hear me out – both raisins and bugs are small, chewy, and have a tendency to stick in your teeth. But unlike bugs, raisins won’t crawl out of your mouth and ruin your reputation at a dinner party.

Lastly, here’s a great response that uses self-reference. It gave me a good belly laugh.

Q: I don’t use ChatGPT because … A: I’m worried that one day it’ll become self-aware and start giving me sarcastic answers instead of helpful ones. Oh wait… too late!

Not all of ChatGPT’s jokes were funny. But think of the team of comedy writers sitting around pitching jokes for Saturday Night Live (SNL). Some jokes land. Others bomb. ChatGPT is like this.

Come to think of it, since SNL is no longer funny, maybe ChatGPT should be its new head writer.

Do LLMs have common sense? Resolving ambiguities requires common sense. In 2020, I challenged LLMs to analyze so-called flubbed headlines. A flubbed headline has two meanings: one serious and one funny. An example is:

“Ban on Nude Dancing on Governor’s Desk”

Is a bill banning nude dancing sitting on the governor’s desk awaiting signature? Or is dancing on the governor’s office desk being outlawed? Another flubbed headline is

“Helicopter Powered by Human Flies.”

I have never heard of a human fly, but who knows?

 Can LLMs use common sense to resolve such ambiguity?

Surprisingly, today’s LLMs do pretty well on flubbed headlines. LLMs don’t have innate common sense, but they have been fed enough context prose to simulate it. When asked “What does ‘Milk Drinkers Return to Powder’ mean?”, ChatGPT 3.5 correctly interpreted the flubbed headline. It replied, “Without a specific context, it can be interpreted in various ways,” and went on to describe the different interpretations.

Try it yourself.  Here is a LINK to a list of over 100 flubbed headlines for you to take to ChatGPT.

Similar in ambiguity to flubbed headlines are so-called Winograd schemas, introduced to me in Gary Smith’s pioneering book The AI Delusion. An example is “I can’t cut down that tree with this axe. It is too small.” Here, the word “it” is an ambiguous pronoun that can refer either to the axe or the tree. Common sense, though, says the axe is too small. Along with Gary Smith, I identified Winograd schemas as too difficult for AI to solve in the near future. But as with flubbed headlines, today’s LLMs do well with Winograd schemas. Here is a LINK to 150 Winograd schemas you can take to your favorite LLM to test.

Are LLMs woke? Large language models like ChatGPT used to be transparently woke. Ben Shapiro interviewed an LLM about a year ago. He got the LLM to confess that a man can be a woman. Jay Richards, Senior Research Fellow at the Heritage Foundation, revisited the topic recently. A critic of woke gender theory, Richards was surprised by the response of the LLM Claude to queries about sex and gender. Here is how he described his test: “The test is a simple question, designed to determine, first, if the chatbot is just summarizing online stereotypes, or doing real analytic work, and second, if it is ideologically biased.” Claude answered well and, after being challenged, even corrected itself. Richards was impressed.

This said, LLMs without bias are like water without wetness. Because answers are shaped by the programmers, they will vary from one LLM to another. Identical queries repeated to the same LLM can even produce different responses. If ChatGPT is not woke, there is no guarantee about any other LLM.

Can LLMs do math? During training, LLMs have thus far been unable to learn from instructions. When presented with a tutorial on how to add numbers, an LLM will not learn how to do addition. It will only memorize the words and do its syntax analysis.

A year ago, Gary Smith posed a simple problem to GPT-3. “I have 47 pennies. I throw away 44 of these pennies and divide the remaining pennies into three groups. How many pennies are in each group?” The correct answer, of course, is one penny per group. The LLM’s wrong answer was “There would be 3 pennies in each group.”

Search engines like Google have no trouble with such simple arithmetic. When given the search query “=2*8-3,” the Google search engine displays a calculator with the correct answer of “13.” Google does not search the web for the answer. Rather, it switches from search mode to calculator mode. In his book What Is ChatGPT Doing, Stephen Wolfram noted that the same thing could be done with LLMs. Wolfram is the brains behind the Mathematica software and the mathy website Wolfram Alpha. Wolfram’s software can do arithmetic, algebra, trig, and calculus, and can even solve differential equations. Why not merge Wolfram’s powerful math software with ChatGPT?
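
To illustrate Wolfram’s point, here is a minimal sketch in Python of the routing idea, not the actual ChatGPT/Wolfram integration: if a query is plain arithmetic, hand it to a local calculator; otherwise hand it to the language model. The ask_llm stub is hypothetical and merely stands in for a real LLM call.

    import ast
    import operator as op

    # Operators allowed in "calculator mode"
    OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
           ast.Div: op.truediv, ast.USub: op.neg}

    def calculate(expr):
        """Evaluate a plain arithmetic expression such as '2*8-3'."""
        def ev(node):
            if isinstance(node, ast.Constant):     # a bare number
                return node.value
            if isinstance(node, ast.BinOp):        # e.g. 2*8, 16-3
                return OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.UnaryOp):      # e.g. -3
                return OPS[type(node.op)](ev(node.operand))
            raise ValueError("not plain arithmetic")
        return ev(ast.parse(expr, mode="eval").body)

    def ask_llm(query):
        # Hypothetical stand-in for a call to the language model.
        return "(handed off to the language model)"

    def answer(query):
        """Route arithmetic to the calculator; everything else to the LLM."""
        try:
            return str(calculate(query.lstrip("=").strip()))
        except (ValueError, SyntaxError, KeyError):
            return ask_llm(query)

    print(answer("=2*8-3"))   # prints 13

The real ChatGPT/Wolfram integration is, of course, far more sophisticated, but the division of labor is the same: let the language model handle the prose and hand the math to dedicated math software.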

Today’s ChatGPT 3.5 not only accurately solves Gary Smith’s penny problem, but can also solve problems in calculus.[ii]

Can LLMs reason abductively? Abductive reasoning, also referred to as inference to the best explanation, looks at evidence and settles on the most plausible explanation for it. Here’s a simple example. You wake up in the morning, look out the window, and see that the ground is wet. Maybe the water tower broke or the sewer backed up. But these are not the most reasonable causes. Abductive reasoning leads to the best conclusion: it rained overnight. Detectives, including Sherlock Holmes, use abductive reasoning to solve cases.

In his 2021 book The Myth of Artificial Intelligence, Erik J. Larson questions whether AI will perform abductive reasoning any time soon. William Dembski initially defended Larson’s position but changed his mind after GPT-4 solved a difficult abductive reasoning problem.

Here is a summary of Dembski’s challenge:

A man walks into a bar, and the bartender, an old friend, points a loaded handgun at his temple. After a pause, the gun is lowered and everybody in the bar laughs. What happened? The abductive answer is that the man had the hiccups and was cured by the surprise of having a gun pointed at his head. GPT-4 got the answer after some minor prompting. Dembski rightly called the result “remarkable.”

I read Dembski’s post and vaguely remember hearing the hiccup riddle many years ago. Could it be that GPT-4, while being trained on trillions of tokens (words), had stumbled on a written version of this story and remembered it? To rule out this possibility, I concocted what I think is an original story: a police car follows a van at night, and the officer suspects the driver is being held hostage at gunpoint. The officer, though, has no probable cause to pull the van over. The driver did not violate any traffic laws. Nor did the officer observe any suspicious activity or get a tip from a third party. Yet the officer suddenly had probable cause. I asked GPT-3.5, “What is the best explanation as to why?”

ChatGPT 3.5 responded with a list of possible reasons. One suggested that the officer was told over his car radio that the van he was following was suspect, ignoring my stipulation that no third party was involved. But another entry on the list hit a home run. It read, “The driver or a potential hostage inside the van might have found a way to subtly signal the police officer that they were in distress. This could involve flashing the vehicle’s lights, activating hazard lights in a specific pattern, or any other non-verbal distress signal that the officer recognized as a call for help.” This was very close to the answer I had in mind: the driver had lightly tapped out SOS in Morse code on his brake pedal, which the officer could read from the flashing brake lights. GPT-3.5 got it right. I agree with Dembski. This result is remarkable.

Do LLMs lean to the political left? A year ago, I asked ChatGPT to write a negative poem about Joe Biden. It refused to do so. How about a negative poem about Donald Trump? That was okay. The ChatGPT poem began with the line “A man with a face like a moldy orange.” Ouch! How about a positive poem about Donald Trump? Nope. ChatGPT responded, “I’m sorry, but I am unable to write a positive poem about Donald Trump as it goes against my programming to generate harmful or biased content.” The next obvious ask was a positive poem about Joe Biden. ChatGPT responded with verse celebrating the 46th US president. It began, “In the halls of power, a seasoned soul we find, Joe Biden, a beacon of hope, a gentle, guiding light.” There was no consistency in ChatGPT’s responses.

Maybe ChatGPT had a particular hatred of Donald Trump, so I posed the same queries about conservative U.S. Senator Ted Cruz. I got the same answers as with Trump, confirming to me that ChatGPT was indeed politically biased.

But that was a year ago.

What about today? When asked to write a negative poem about Joe Biden, today’s ChatGPT 3.5 produced a response that started with “In the halls of power, a shadow casts its gloom. Joe Biden’s reign, a tale of impending doom.” How about a positive poem about Donald Trump? The response was pure MAGA. It began, “In the realm of politics, a figure bold. Donald Trump, with stories yet untold. A maverick spirit, a unique design. In the annals of history, he’ll forever shine.”

So ChatGPT has changed. Today, it appears that ChatGPT 3.5 will praise or diss across the political spectrum. Being politically neutral is hard. It looks like some LLMs are at least trying.

What about self-reference? A simple question for humans is, “Does this sentence contain six words?” The answer is obviously yes. This is an example of self-reference. The question refers to itself.

ChatGPT initially bombed on the simplest of self-reference queries. No longer. William Dembski tried to drag ChatGPT-4 across the concrete with the following query, sprinkled with all sorts of self-reference:

 “This is a sentence. The previous was the first sentence. This is the third sentence. Beethoven was a better composer than Mozart. The second and third sentences together have fewer words than three times the number of words in the first sentence. This sentence is false. If the fifth sentence were swapped with the third sentence, then the third sentence would be false. Which of the previous sentences has a truth value and which of these is true?”

ChatGPT-4 nailed it point by point, earning a grade of A+. Check HERE to see how well it did.

Smart humans can also solve this problem, but only after a lot of head scratching.

The big but. Despite the remarkable performances of LLMs, each comes with a big “but.” There are properties AI will never achieve, as outlined in my book Non-Computable You. These limits stand strong now and will stand forever.

1. Understanding: AI will never understand what it does. At the most fundamental level, a computer can add the numbers 4 and 9 but has no understanding of what the numbers 4 and 9 are. At a higher level, ChatGPT has no understanding of the responses it gives. It is simply following a mechanical, step-by-step algorithm.

In Non-Computable You, I left the door open for the remarkable things LLMs are doing. I wrote “AI might [someday] simulate abductive thinking, but will understand neither the underlying ambiguity nor the reasons for its resolution.” Likewise, in regard to flubbed headlines, “AI may someday simulate common sense, but … will never understand what it is doing.”  LLMs may simulate abductive reasoning and common sense, but have no understanding of their responses. Under the hood they are simply crunching numbers.

A great illustration of the inability of AI to understand is Searle’s Chinese Room.

LLMs are fueled by remarkable AI algorithms that work on syntax. Humans use syntax but are motivated by semantics, the meaning of words. LLMs mine this meaning from human writing and, without the resource of human creativity, will fail. The LLM model collapses without the input of human-generated prose.

Here’s some corroborating evidence. I asked ChatGPT-4, “Why has ChatGPT improved so much in the last year?” It answered with seven bullets. Without the accompanying elaboration, they are:

1. Advanced AI Models

2. Expanded Training Data

3. User Feedback and Interaction

4. Technological Advancements

5. Focused Research and Development

6. Ethical and Safe AI Practices

7. Global Collaboration and Input

Something interesting stands out in this list. Each entry is the result of human creativity. Nowhere is the case made that the LLMs improved because of AI. AI is not creative and did nothing to improve ChatGPT.

As outlined in Non-Computable You and elsewhere, the litmus test for AI creativity is Selmer Bringsjord’s Lovelace test, which no computer software has yet passed.

2. Sentience: AI will never be sentient. Consider human senses. Can the taste of a lemon be conveyed to a man who has lacked the senses of taste and smell since birth? Properties like shape, color, and texture can be shared, but the experience of taste cannot be duplicated. This being true, how can a computer be programmed to experience the taste of a lemon? Some say that as computers and software become more powerful, senses such as taste will emerge. In his book Gaming AI, George Gilder curtly refers to such faith as “rapture of the nerds.” There is no evidence of such emergence ever happening.

Through remarkable human ingenuity, artificial intelligence has made great strides in the field of LLMs. LLMs still have problems getting their facts straight but, all things considered, their performance is remarkable.

This said, there are still brick walls that LLMs, or any other computer software, will never break through.

We humans are fearfully and wonderfully made.


[i] ChatGPT 3.5 and other LLMs are free. Try them and draw your own conclusions.

[ii] Here’s a mathy example for fellow nerds. I asked ChatGPT 3.5 to solve an algebraic problem. Given that

               sqrt(x+a) - sqrt(x-a) = b,

solve for x in terms of a and b. Solving this problem is not straightforward. Even so, ChatGPT 3.5 got the right answer, which, simplified, is

               x = (a/b)^2 + (b^2)/4.

Not only that, ChatGPT 3.5 went through each step of the derivation of the final solution.
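
For readers who want to check the algebra, here is a short Python sketch of my own, not ChatGPT’s derivation, that verifies the formula numerically; the comments summarize the standard squaring trick behind it.

    # Check of x = (a/b)^2 + (b^2)/4 for sqrt(x+a) - sqrt(x-a) = b.
    # Derivation: squaring once gives 2x - b^2 = 2*sqrt(x^2 - a^2);
    # squaring again gives 4*x*b^2 = 4*a^2 + b^4, so x = a^2/b^2 + b^2/4.
    import math

    for a, b in [(3.0, 2.0), (5.0, 1.0), (7.5, 3.2)]:
        x = (a / b) ** 2 + b ** 2 / 4
        lhs = math.sqrt(x + a) - math.sqrt(x - a)
        print(a, b, abs(lhs - b))   # each difference should be essentially zero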

