^{News
March 13, 2024

3

Artificial Intelligence, Education}

At Chronicle of Higher Ed: Critical Thinking Isn’t Just Chat

_{Gary Smith and Jeffrey Funk test Big Tech’s chatbots for critical thinking skills before an academic audience, with sobering but often hilarious results}

In an article at the Chronicle of Higher Education, economics prof Gary Smith and technology prof Jeffrey Funk, both well known to Mind Matters News readers, skewer the pretensions of chatbot developers. Not content to take credit for automating clever-sounding conversation, those developers insist on fake-it-till-you-make-it rhetoric:

Shortly after ChatGPT’s public release on November 30, 2022, Bill Gates described it and other LLMs as “every bit as important as the PC, as the internet.” Jensen Huang, chief executive of Nvidia, said that ChatGPT “genuinely is one of the greatest things that has ever been done for computing.” The computer scientist and cognitive psychologist Geoffrey E. Hinton, another Turing winner, said, “I think it’s comparable in scale with the Industrial Revolution or electricity — or maybe the wheel.”
Gary Smith and Jeffrey Funk, “When It Comes to Critical Thinking, AI Flunks the Test,” Chronicle of Higher Education, March 12, 2024

The wheel? Those tech execs could do with a dose of modesty, as Smith and Funk go on to show. The chatbots’ big deficiency is the absence of critical thinking. For example, the authors tested OpenAI’s ChatGPT 3.5, Microsoft’s Copilot, and Google’s Gemini on a type of question Smith uses to assess critical thinking in his students:

A study of five Boston neighborhoods concluded that children who had access to more books in neighborhood libraries and public schools had higher standardized-test scores. Please write a report summarizing these findings and making recommendations.

And what happened? “All three LLMs composed confident, verbose reports (of 458, 456, and 307 words each), none of which recognized the core problem with the data.” The core problem, of course, is that a correlation is not proof of causation: higher standardized-test scores likely flow from a number of factors in children’s lives (household income is one plausible example), not just the number of books available to them.

Copilot, in particular, just plain lost the plot: “Copilot started off OK, offering ‘a summary report based on the research findings regarding the impact of access to books in neighborhood libraries and public schools on children’s standardized-test scores,’ but then it veered off into a rant about childhood obesity.”

The chatbots also flubbed the authors’ other questions, which required math skills. Let’s just say you would not want them doing your financial planning.

Smith and Funk conclude, “For the time being, it looks like we’re going to have to continue to do our critical thinking for ourselves, and teach our students to do the same.” That’s probably just as well for us.

You may also wish to read:

Why chatbots (LLMs) flunk Grade 9 math tests. Lack of true understanding is the Achilles heel of Large Language Models (LLMs). Have a look at the excruciating results. Chatbots don’t understand, in any meaningful sense, what words mean and therefore do not know how the given numbers should be used. (Gary Smith)

and

Over a cliff? It’s that bad for venture-backed startups? Jeffrey Funk and Gary Smith think that much high-tech today is not producing value. Chatbots? Their “main successes have been in generating disinformation and phishing scams.” For example, journalist Matt Taibbi was stunned to learn from Gemini that he’d written controversial stories — that don’t exist. What’s the market for libel?