What Does AI in Education Mean for Critical Thinking Skills?
The COVID pandemic pushed a lot of school coursework onto the internet, with an increased reliance on true/false and multiple-choice tests that can be taken online and graded quickly and conveniently. Not surprisingly, once questions went online, so did answers, with several companies posting (for a fee) solutions for students who would rather Google answers than watch Zoomed lectures.
To fit into a true/false or multiple-choice format, the questions are generally little more than a recitation of definitions, facts, and calculations. Here, for example, are three statistics questions I found at a question/answer site:

Question: True or false: A group of subjects selected from the group of all subjects under study is called a sample.
Answer: True
Question: You are interested in how stress affects heart rate in humans. Your dependent variable would be the _______. A. Interest; B. Heart rate; C. Humans; D. Stress
Answer: B. Heart rate
Question: What are the mode and the mean for the following set of numbers? {4, 9, 8, 2, 16, 4, 4, 8, 9, 6} A. Mean = 7, mode = 8; B. Mean = 7, mode = 4; C. Mean = 6, mode = 8; D. Mean = 8, mode = 9
Answer: B. Mean = 7, mode = 4.
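A quick check with Python's standard statistics module confirms the arithmetic:

```python
from statistics import mean, mode

data = [4, 9, 8, 2, 16, 4, 4, 8, 9, 6]
print(mean(data))  # 7: the sum, 70, divided by the 10 values
print(mode(data))  # 4: it appears three times, more than any other value
```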
One might think that essay assignments would get around the online-answer problem. But one would be wrong. A recent article at Motherboard reported that students are increasingly using GPT-3 and other text-generator programs to write essays for them.
These programs use vast databases of written text to identify “tokens,” common sequences of characters (typically words but also parts of words), and to predict which token is most likely to come next in a given sequence. The result is often well-written sentences, paragraphs, and essays. Because the text is generated one token at a time rather than copied from existing sources, it is not flagged by plagiarism-detection programs. In addition to generating text, GPT-3 flawlessly answered the three statistics questions given above.
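To make the token-by-token idea concrete, here is a toy sketch in Python. A bigram model trained on a handful of words is a vastly simplified stand-in for GPT-3, but it generates text the same basic way: by repeatedly sampling a plausible next token.

```python
import random
from collections import defaultdict

# Toy stand-in for a large language model: a bigram model that, like
# GPT-3 at an enormously larger scale, produces text one token at a
# time by sampling a token likely to follow the current one.
corpus = ("a group of subjects selected from the group of all "
          "subjects under study is called a sample").split()

# Record which tokens follow each token in the training text.
followers = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current].append(nxt)

def generate(start, max_tokens=10):
    token, output = start, [start]
    for _ in range(max_tokens):
        candidates = followers.get(token)
        if not candidates:  # no known continuation; stop
            break
        token = random.choice(candidates)  # sample the next token
        output.append(token)
    return " ".join(output)

print(generate("a"))
```

The point is not the quality of this toy's output but the mechanism: nothing is copied wholesale, which is why plagiarism detectors have little to latch onto.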
Motherboard quoted one student who rationalized the reliance on GPT-3 as a shrewd work-around for “busywork”:
I like to learn a lot [and] sometimes schoolwork that I have done before makes me procrastinate and not turn in the assignment… Being able to do it faster and more efficient seems like a skill to me.
Learning how to cheat is certainly not a traditional educational goal! In addition, a reliance on GPT-3 and the like sidesteps creativity, hard work, and resilience, which are, in and of themselves, worthwhile educational objectives.
Jeremy Roschelle of Digital Promise has written that
One of the hardest things about being an educator in the age of #AI is going to be the need to re-think what we are really trying to develop in our students and how the assignments we make align to those elements of knowledge, skills and abilities.
Some fear that the uncanny ability of computer algorithms to answer test questions and write human-like essays foreshadows the end of educational testing. It isn’t fair to reward humans with A grades if algorithms are doing the assignments.
On the other hand, one of the most important educational goals, perhaps the most important, is to help students develop critical thinking abilities. Robert Ennis has identified several hallmarks of critical thinking, including:
● Judges well the credibility of sources
● Identifies conclusions, reasons, and assumptions
● Asks appropriate clarifying questions
● Judges well the quality of an argument, including the acceptability of its reasons, assumptions, evidence
● Can well develop and defend a reasonable position
● Formulates plausible hypotheses, plans experiments well
● Draws conclusions when warranted, but with caution
Such skills are exactly what AI lacks. AI algorithms will fail class assignments that assess critical-thinking abilities.
I tested GPT-3 with two questions from the midterm examination that I recently gave to my introductory statistics class. Both GPT-3 answers were wrong. GPT-3’s answer to the second question is a nice example of what Ernie Davis and Gary Marcus mean when they call a program like GPT-3 “a fluent spouter of bullshit.”
Human: A Vanguard study of all mutual funds operating in 2013 found that 62% had beaten the overall stock market during the previous five years. What statistical problem do you see here?
AI: I don’t see any statistical problems here.
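GPT-3 sees no problem, but there is a glaring one: survivorship bias. Mutual funds that perform poorly are routinely closed or merged into other funds, so a study of the funds operating in 2013 looks back only at the survivors. A small simulation (with made-up parameters, purely for illustration) shows how a majority of utterly skill-free funds can appear to beat the market:

```python
import random

# Illustrative simulation of survivorship bias: none of these
# simulated funds has any skill, yet most of the funds that survive
# five years appear to have beaten the market.
random.seed(1)
n_funds, n_years = 1000, 5
market_return = 0.0  # the market's annual return, for simplicity

survivors = []
for _ in range(n_funds):
    returns = [random.gauss(0.0, 0.10) for _ in range(n_years)]
    # A fund is closed, and drops out of the study, after a very bad year.
    if min(returns) > -0.15:
        survivors.append(returns)

beat = sum(1 for r in survivors if sum(r) / n_years > market_return)
print(f"{beat / len(survivors):.0%} of surviving funds beat the market")
# Typically well over half, even though no fund has any skill at all.
```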
Human: In 1999 Sally Clark, an English solicitor, was accused of murdering her two infant sons; she said that both children had died of sudden infant death syndrome (SIDS). A pediatric professor testified for the prosecution that the probability that a child in an affluent family would suffer SIDS is 1/8500 and, therefore, the probability that two children in the family would suffer SIDS is (1/8500)(1/8500) = 1/72,250,000, or about 1 in 72 million. As an expert witness for the defense, how might you challenge the 1-in-72 million calculation?
AI: There are several ways to challenge the 1 in 72 million calculation. For example, the defense could argue that the statistic is based on an outdated understanding of SIDS, that the statistic does not take into account all of the relevant factors, or that the statistic is overly simplistic and does not reflect the complexities of the issue.
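Plausible-sounding, but none of it identifies the actual flaw: the independence assumption. Multiplying the two probabilities is valid only if the two deaths are independent events, and genetic and environmental risk factors shared within a family make that assumption untenable. The naive arithmetic itself is easy to reproduce:

```python
# The prosecution's figure multiplies the two probabilities, which
# assumes the two deaths are independent. If shared family risk
# factors make a second SIDS death more likely after a first, the
# true probability is far higher than 1 in 72 million.
p_one = 1 / 8500
p_both_naive = p_one * p_one
print(f"1 in {1 / p_both_naive:,.0f}")  # 1 in 72,250,000
```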
AI has overpromised and underdelivered in many ways. Here, we may have an example of an unintended benefit of AI — if it compels educators to teach and test critical thinking skills instead of the rote memorization and BS essays that AI excels at. From a practical standpoint, such an education will prepare students for jobs that will not soon be taken over by computers. If there is an AI-inspired revolution in education, the gap between human intelligence and artificial intelligence will grow even wider.