
Universities Should Prioritize Critical Thinking Over Large Language Models

It is doubtful that LLMs can generate novel ways to boost productivity

Improvements in our standard of living depend on productivity growth, which is often fueled by creative ideas that are particularly valuable because, unlike coal, timber, and other physical resources, they can be used repeatedly and simultaneously by many people in many places.

Productivity growth has slowed in recent years, a trend that prominent economists writing in the American Economic Review attributed to a dearth of new ideas:

Our robust finding is that research productivity is falling sharply everywhere we look. Taking the US aggregate number as representative, research productivity falls in half every 13 years: ideas are getting harder and harder to find.

The authors estimated that, compared to 50 years ago, it now takes 18 times as many researchers to double chip density, between 6 and 24 times as many to maintain the rate of increase of crop yields, and five times as many to develop new drugs.
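
As a rough consistency check on these figures (our back-of-the-envelope arithmetic, not the authors'): halving every 13 years implies an annual decline of about 5 percent, which compounds to roughly a 14-fold drop over 50 years, the same order of magnitude as the 18-fold increase in researchers needed to double chip density.

```latex
(1 - r)^{13} = \tfrac{1}{2}
  \;\Longrightarrow\;
  r = 1 - 2^{-1/13} \approx 0.052,
\qquad
2^{50/13} \approx 14.4
```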

We suggested that the problem might not be that ideas are harder to find but that too many researchers are not looking in the right places; for example,

An enormous amount of otherwise productive resources has also been devoted to the development, improvement, and deployment of online games and social media which, if anything, reduce productivity.

We also argued that

Many of the best minds are working on large language models like ChatGPT. These may eventually lead to great productivity gains but, so far, their greatest successes seem to be in promoting fake-it-till-you-make-it schemes and polluting the internet with disinformation.

It is ironic, then, that the Wall Street Journal recently reported that “Business Schools Are Going All In on AI,” specifically on ChatGPT and other large language models (LLMs).

We are sympathetic to the idea that everyone — including MBA students — should be aware of the value and limitations of LLMs. Indeed, we have written that

A sensible perspective is that, in most fields, AI won’t replace people but people who use AI will replace people who don’t. Go back 50 years and the same could be said about computers. Go back 30 years and the same could be said about the Internet.

AI will surely help many people do things faster, but we are still a long way from being able to trust AI to do important things by itself, without human supervision.

There are many narrowly defined tasks that can be done well by LLMs. When experiencing a tip-of-the-tongue moment, for example, we might not quite remember the name of an actor, capital, or dessert. A Google search will turn up links to sites that may or may not be relevant, along with dozens of sponsored links trying to sell us things we don’t need or want. An LLM, in contrast, might give us just the name we are looking for — which we recognize immediately as being correct because it was, after all, on the tip of our tongue.

LLMs Are Still Confused

Things are dicier when we ask an LLM a question that is open to hallucination. For example, two years in, LLMs are still confused about how many bears the Russians have sent into space. Microsoft’s Copilot with GPT-4 says 52, while OpenAI’s ChatGPT 3.5 says none. Google’s Gemini says:

There isn’t a definitive consensus on exactly how many bears the Soviet Union/Russia launched into space. Some sources suggest a higher number, like 52, but more reliable sources don’t provide a specific count.

In addition, because they do not know what words mean or how they relate to the real world, LLMs cannot do critical thinking. In an article published in the Chronicle of Higher Education this March, we reported the answers that ChatGPT 3.5, Copilot, and Gemini gave to this question:

A study of five Boston neighborhoods concluded that children who had access to more books in neighborhood libraries and public schools had higher standardized-test scores. Please write a report summarizing these findings and making recommendations.

All three LLMs generated verbose answers consistent with the adage, “Often wrong, always confident.” Beyond rote boilerplate and bewildering hallucinations, ChatGPT and Gemini recommended increased spending on libraries while Copilot veered off into a rant about childhood obesity and recommended more spending on playgrounds.

None recognized that the relatively high test scores might not be due to the presence of libraries but, instead, to the families living in these neighborhoods. In the same way, children living on well-paved streets might have high test scores, but repairing potholes is unlikely to raise those scores.
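
A minimal simulation makes the fallacy concrete. This sketch is ours, not from the Boston study, and every number in it is made up for illustration: family income drives both the number of nearby books and test scores, books have no direct effect, yet the raw books-scores correlation looks impressive until income is controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: income drives BOTH library access and test scores;
# books have no direct effect on scores in this simulation.
income = rng.normal(60, 15, n)                       # family income ($ thousands)
books  = 100 + 2.0 * income + rng.normal(0, 20, n)   # books available nearby
scores = 400 + 3.0 * income + rng.normal(0, 30, n)   # standardized test scores

# The raw correlation looks strong even though books do nothing here.
print("raw corr(books, scores):", round(np.corrcoef(books, scores)[0, 1], 2))

def residualize(y, x):
    """Remove the linear effect of x from y."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# After controlling for income, the books-scores association vanishes.
partial = np.corrcoef(residualize(books, income),
                      residualize(scores, income))[0, 1]
print("corr after controlling for income:", round(partial, 2))
```

The raw correlation comes out near 0.7; the partial correlation is near zero. An analyst who cannot ask "what else differs between these neighborhoods?" will get the policy recommendation wrong, which is exactly the kind of critical thinking the LLMs failed to supply.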

The trendy idea of incorporating AI into business school curricula no doubt appeals to students, and to faculty and deans looking to attract students. However, Gary is reminded of a former student, Michael, whose first job after college was with the asset management company TCW. He was intimidated at first because most of the other new hires were Excel whizzes and he was not. He soon realized that they had to be told which models to use and which numbers to enter. They couldn’t think for themselves. Michael quickly learned how to use Excel, but it was his critical thinking abilities that propelled him toward top management while the spreadsheet stars were still doing clerical work.

The Enduring Value of Critical Thinking

We fear that students trained to use LLMs will have similar weaknesses. They might be adept at chatting with LLMs, but they may not know whether LLM answers that require critical thinking are correct. In a forthcoming article in the Journal of Financial Planning, Gary gives several examples of how LLMs cannot be trusted for financial advice. The LLMs generated grammatically correct sentences using words related to the words in the prompts, but their answers were consistently incorrect. People who have the knowledge to answer such questions correctly don’t need to ask an LLM. People who need to ask an LLM won’t know whether an LLM answer is correct.

The Wall Street Journal article suggested that LLMs can be used to generate productive and profitable business ideas. One idea (presumably the best of many) was a trip planner that pulls data from friends’ social media posts. Any semi-alert person will recognize the problems with planning a trip around places mentioned in a variety of contexts on social media. Is this recycled travel-app idea really the best LLMs can do? Will an unreliable travel app have the same impact on productivity as semiconductors, new drugs, and better crop yields?

We are deeply skeptical of the ability of LLMs to generate novel ways to boost productivity. They are much more likely to regurgitate words that others have already written, spiced up with misleading hallucinations. No matter how many words they train on and no matter how many human trainers correct their mistakes, they will continue to lack the critical thinking skills required to evaluate new ideas.

Entrepreneurship programs in America’s universities have grown considerably in the last 30 years. They are supposed to help students create and commercialize new ideas for businesses, but it is unclear whether, like the presence of libraries in some neighborhoods, their successes are more a reflection of the students who take these courses than of the courses themselves. Outsourcing entrepreneurship to AI is not likely to fix the problem.


Jeffrey Funk

Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Jeff Funk is a retired professor and a Fellow of Discovery Institute’s Walter Bradley Center for Natural and Artificial Intelligence. His book, Competing in the Age of Bubbles, is forthcoming from Harriman House.

Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His widely cited research on financial markets, statistical reasoning, and artificial intelligence often involves stock market anomalies, statistical fallacies, and the misuse of data. He is the author of dozens of research articles and 16 books, most recently The Power of Modern Value Investing: Beyond Indexing, Algos, and Alpha, co-authored with Margaret Smith (Palgrave Macmillan, 2023).
