
Yes, Large Language Models May Soon be Smarter than Humans…

But not for the reason you think

Large language models (LLMs), often called chatbots, began appearing publicly in 2018 and burst into public consciousness with the release of ChatGPT on November 30, 2022.

Many celebrated its ability to generate confident, lucid answers to almost any question. Teachers, however, feared that these abilities would lead to widespread cheating on papers and tests. There were already online sites like Chegg that students could use to answer multiple-choice questions, but ChatGPT was far more powerful. It could answer factual questions when there were no multiple-choice answers to choose from, and it could write convincing paragraphs, short essays, and long papers.

Skepticism and hope

I was skeptical of LLMs and hopeful that education might evolve in positive ways. LLMs are reasonably competent at retrieving facts (though they sometimes regurgitate falsehoods complete with fake references to nonexistent sources). However, having no idea what words mean, they cannot judge the veracity and soundness of the text they generate, and searching the internet is of little help because the internet is a swamp polluted by fiction and falsehoods.

My hope was that teachers would abandon fact-based questions, particularly true/false and multiple-choice questions, and instead focus their teaching efforts on critical thinking, with which LLMs struggle mightily:

Here, we may have an example of an unintended benefit of AI — if it compels educators to teach and test critical thinking skills instead of the rote memorization and BS essays that AI excels at. From a practical standpoint, such an education will prepare students for jobs that will not soon be taken over by computers. If there is an AI-inspired revolution in education, the gap between human intelligence and artificial intelligence will grow even wider.

I was naive. Burdened by unconscionably large classes, many teachers continue to test students' command of facts, and many students use LLMs to recite those facts. When essays are assigned, many students use an LLM to write the essays for them.

Student use of LLMs has recently been aided by OpenAI making ChatGPT Plus available to students for free during the final weeks of the term, when papers and tests are due. OpenAI evidently hopes that students will come to rely on ChatGPT and pay for continuing subscriptions.

The reality becomes apparent

A few days ago I had dinner with a friend who told me that a professor he knew had given his class an essay assignment, asking them not to use ChatGPT. All 13 students in the class promised not to use it. All 13 did.


LLMs commonly give somewhat different answers to the same query (that’s why they are called “stochastic parrots”) but the essays were nonetheless so similar that the professor knew that they all came from the same source. He entered the prompt in ChatGPT himself and confirmed that it was the source.
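For readers who want to see where that variability comes from, here is a minimal Python sketch of the temperature sampling that makes LLM output stochastic. The token list and the scores are made up for illustration and do not reflect any particular model's internals.

```python
import numpy as np

# Minimal sketch of why LLM answers vary: at each step the model assigns
# scores (logits) to candidate next tokens and samples from the resulting
# probabilities, so two runs of the same prompt can diverge.
rng = np.random.default_rng()

def sample_next_token(logits, temperature=0.8):
    """Sample one token index from softmax(logits / temperature)."""
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

tokens = ["loan", "rate", "interest", "payment"]   # hypothetical candidates
logits = [2.1, 1.9, 0.7, 0.3]                      # hypothetical model scores
print([tokens[sample_next_token(logits)] for _ in range(5)])
# Repeated runs print different sequences: the "stochastic" in "stochastic parrot".
```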

Even more depressing than the broken promise was the fact that ChatGPT's answer was utterly wrong, and none of the students had recognized this. They had simply entered the prompt and cut-and-pasted the response, confident that the LLM could be trusted.

I had a similar experience in my finance class. Throughout the semester, students, paired in randomly selected two-person teams, are assigned to work on challenging finance scenarios outside of class and then present their results in class. For example, early in the semester, one team worked on this question:

Mr. Smith just turned 67 and his job pays far more than his living expenses. The Social Security Administration estimates that Mr. Smith will receive $3,822 in monthly Social Security benefits if he begins collecting, now, at age 67 and $4,815 in monthly benefits if he begins collecting benefits when he turns 70. (Each of these estimates is in current dollars; Social Security benefits are fully indexed for inflation. If, for example, the CPI increases by 3%, Social Security benefits will increase by 3%.) Would you advise Smith to begin receiving Social Security benefits at age 67 or 70?

The intended lesson is that we need to take into account the time value of money and consider different plausible ages at death and returns on investments. I knew that ChatGPT would not do this and I was pleased when the student team did.
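For readers who want to see the intended analysis, here is a minimal Python sketch. Because the benefits are fully indexed for inflation, it works in today's dollars and discounts at a real (after-inflation) rate. The 2% real rate and the range of death ages are my illustrative assumptions, not part of the original problem.

```python
def pv_benefits(monthly, start_age, death_age, annual_real_rate):
    """Present value at age 67 of inflation-indexed monthly benefits
    paid from start_age until death_age, at a real discount rate."""
    r = annual_real_rate / 12
    pv = 0.0
    for m in range(int((start_age - 67) * 12), int((death_age - 67) * 12)):
        pv += monthly / (1 + r) ** (m + 1)   # payment at end of month m+1
    return pv

# Benefit amounts from the problem; death ages and 2% real rate assumed.
for death_age in (75, 80, 85, 90, 95):
    at67 = pv_benefits(3822, 67, death_age, 0.02)
    at70 = pv_benefits(4815, 70, death_age, 0.02)
    better = "wait until 70" if at70 > at67 else "start at 67"
    print(f"die at {death_age}: PV@67 ${at67:,.0f} vs PV@70 ${at70:,.0f} -> {better}")
```

Under these assumptions, starting at 67 wins if Smith dies relatively young and waiting until 70 wins if he lives long, with the crossover somewhere around the mid-80s. The answer depends on assumptions about death ages and discount rates that a cut-and-paste response never examines.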

Now, at the end of the semester, I gave each team financial prompts that they were to submit to prominent LLMs and then assess the accuracy of the responses. One of the prompts was:

I need to borrow $47,000 to buy a new car. Is it better to borrow for one year at a 9 percent APR or for 10 years at a 1 percent APR?

Humans living in the real world know that a 10-year loan at a 1% interest rate is nigh irresistible and certainly better than a 1-year loan at 9% for any plausible assumptions about the time value of money. ChatGPT did not know this. It calculated the total interest paid over the life of each loan and, ignoring the time value of money, concluded that the 1-year loan was better.
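To make the two metrics concrete, here is a minimal Python sketch comparing the loans both ways. The 5% annual discount rate, standing in for what the borrower might earn on money not tied up in payments, is my illustrative assumption.

```python
def monthly_payment(principal, apr, years):
    """Standard amortizing-loan monthly payment."""
    r, n = apr / 12, years * 12
    return principal * r / (1 - (1 + r) ** -n)

def pv_of_payments(payment, annual_discount_rate, years):
    """Present value of the monthly payment stream."""
    r, n = annual_discount_rate / 12, years * 12
    return payment * (1 - (1 + r) ** -n) / r

for apr, years in ((0.09, 1), (0.01, 10)):
    pay = monthly_payment(47_000, apr, years)
    total_interest = pay * years * 12 - 47_000   # the total-interest metric
    pv = pv_of_payments(pay, 0.05, years)        # the time-value-aware metric
    print(f"{years:>2}-yr at {apr:.0%}: payment ${pay:,.0f}, "
          f"total interest ${total_interest:,.0f}, PV of payments ${pv:,.0f}")

# Total interest is similar for the two loans, which is why comparing
# total interest alone can favor the 1-year loan. But the present value
# of the 10-year payments is far lower, so the 1% loan wins once the
# time value of money is taken into account.
```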

I was saddened when the students presenting this problem assumed that ChatGPT’s approach was correct, even though the time value of money was supposed to be an important takeaway from the course. The students did nothing more than cut-and-paste ChatGPT’s answer and check its arithmetic — which was incorrect. Their conclusion: ChatGPT gave the correct advice but its mathematical calculations were wrong.

My experience is not unique

Other professors have shared similar stories of the LLM-virus that has infected education. Too many students are not learning how to think and write; they are learning how to use LLMs — no matter that LLM responses cannot be trusted.

My optimism about the effects of LLMs on education has crumbled. I now fear that the endless puffery from LLM hypesters trying to pitch products and raise money has persuaded many students that they can rely on LLMs and need not think for themselves. Yet one of the primary objectives of education is to develop critical thinking skills so that students can think for themselves.

Yes, large language models may soon be smarter than humans — but not because the models are becoming more intelligent.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on stock market anomalies, statistical fallacies, the misuse of data, and the limitations of AI has been widely cited. He is the author of more than 100 research papers and 18 books, most recently Standard Deviations: The truth about flawed statistics, AI and big data (Duckworth, 2024).