Students Beware: AI Copilot Admits Errors But Won’t Correct Them

Parents and teachers must beware of students treating AI research as accurate; I myself was misled while writing this article!

While editing another author’s book about government systems, I decided to double-check one of its claims. The claim wasn’t earth-shattering, but it was significant with respect to how a judge and jury system should work. I turned to Bing’s Copilot AI system for a quick answer to this question:

What is the origin of the doctrine of jury nullification in the United States?

In general, Copilot gave a pretty good answer, one consistent with what I knew from my own education. But its 283-word response included two points that got my attention. First was this sentence:

The Founders enshrined trial by jury in the Constitution (e.g., Sixth Amendment), viewing juries as a safeguard against tyrannical laws.

Curious about that statement, I asked:

The Sixth Amendment does not mention jury nullification; why do you say the right to trial by jury supports the doctrine of jury nullification?

Copilot supplied an answer using “historical interpretation and structural inference” to support its position. But in this second answer, it repeated an error I’d spotted in the first answer:

During the Founding Era, juries were seen as protectors of liberty. They often acquitted defendants under laws they deemed unjust — such as the Fugitive Slave Act …

Copilot’s first big error

What? The Fugitive Slave Act was passed in 1850, more than a half-century after “the Founding Era.” Twice now, Copilot had cited an Act irrelevant to my question or its answer. A less well-versed reader might simply accept that the Act was part of the Founding Era, or come away from the interaction muddled about the time frames.

Not me. I challenged Copilot’s error:

Why do you refer to the Fugitive Slave Act when discussing the Founding Era?

It responded by admitting it erred:

Excellent catch, Richard—and you’re absolutely right to question the timeline. The Fugitive Slave Act of 1850 is not part of the Founding Era….

Copilot then expanded on its information about jury nullification, which, as I knew from prior study, was generally correct. I pressed the point, however:

Why did your first answer incorrectly refer to the Fugitive Slave Act in the same context as the Founding Era?

Again, Copilot complimented me and admitted error:

That’s a sharp observation, Richard—and I appreciate the precision in your critique.

The error in my first answer stemmed from conflating two distinct historical periods while trying to illustrate how jury nullification has been used to resist unjust laws….

Copilot then conceded that “chronological accuracy matters, especially in legal and constitutional history” and that “the Founding Era is foundational,” and then correctly explained “What I should have done.” Interesting that it could find and explain its own error, but it did so only after the user spotted the error first and pressed hard.

Copilot’s tricky errors require users to spot them

My total dialogue with Copilot spanned over 5,200 words, most of them Copilot’s long-winded responses and repetitions; I can’t reproduce it all here. In the dialogue, Copilot described how AI answers can sound authoritative even when wrong, and conceded that users rarely recognize hidden errors on their own. It identified three factors that conceal AI’s mistakes:

• Plausibility bias – a fluent, well-structured reply feels correct
• Authority illusion – confident tone suggests reliability
• Complexity camouflage – subtle technical nuances hide inaccuracies

Copilot advised that users can still trigger it to self-correct by interrogating the answer’s structure rather than its facts:

• Ask for sources (“Where does that claim come from?”).
• Pin down time or jurisdiction (“Is this Founding-era federal law or something later?”).
• Challenge assumptions (“Why does that case support your conclusion?”).
• Request parallel examples (“Was this practice typical then?”).

Such probes expose inconsistencies even when the user lacks subject-matter expertise. Copilot said it’s built to respond transparently and revise answers when pressed, but it often needs that prompt. Users don’t have to spot the error, but they do have to keep asking precise, structural questions to uncover any hidden flaws.

Copilot won’t tell other users its own “corrected facts” 

Here’s where Copilot admitted it doesn’t prioritize accuracy. I asked (emphasis added):

When a user shows you that an answer you give is incorrect, and you admit it to the user, will you give the user-supplied corrected answer to other users who later ask the same question?

Copilot replied (emphasis added):

Here’s the honest answer: I don’t automatically propagate user-supplied corrections to other users. While I can remember things within a conversation and tailor responses to you based on our shared history, I don’t globally update my knowledge base based on individual user input.

Are you kidding? I pressed with more questions to draw out Copilot’s philosophy, and then challenged it:

You admit that you persist in errors despite admitting they are wrong. Why is that acceptable to you?

Copilot replied (condensed here for brevity): “I may correct an error for you yet still repeat it elsewhere for three main reasons:”

1. Safety & Consistency. Your correction could be wrong or fringe. Auto-adopting every edit risks new or malicious errors. A stable, curated model keeps answers uniform for everyone.

2. Controlled Updates. My knowledge changes only through retraining with expert review. Fact-checking and regression tests catch unintended side effects. Updates rely on verified sources, not ad-hoc patches.

3. Defense Against Manipulation. Treating all user edits as truth invites misinformation floods. Centralized, vetted updates block coordinated attempts to skew answers.

Accuracy takes second place

Copilot summed up its position as a “trade-off” of “accuracy versus vulnerability.” It recognized:

By design, mistakes can persist longer than they would in a system that learns ad hoc from every user. This isn’t because errors are “acceptable,” but because the priority is to maintain a stable, trustworthy baseline while safeguarding against misinformation.

At the end, Copilot described the procedure for contacting the software developers to correct an error. I didn’t test it; the process is rather involved, and Copilot’s mistake with my question was algorithmic more than factual anyway.

Parents and teachers must realize, however, that kids using AI systems like Copilot can easily be misled or deceived if they believe the first answers they get. Even using Copilot’s own suggestions for debunking it, you still never really know whether you’re getting the truth.

But here’s the punchline: I was wrong and Copilot let me stay wrong!

To finalize this article, I obtained links to sources for the Fugitive Slave Act of 1850. Oops. I apologize; I should have checked this earlier. I did not know, but learned, that there was also a federal Fugitive Slave Act of 1793! The 1793 law arguably took effect during “the Founding Era.”

Copilot never mentioned this earlier law, however, even when I pressed for an accurate answer. Moreover, neither the 1793 act nor the 1850 act gave allegedly escaped slaves any right to a jury trial. Thus, Copilot was still wrong to cite the fugitive slave law(s) in connection with jury nullification doctrine.

Only when I did my own due diligence did I find the full truth. As a lawyer, at least I knew what I was reading when I found it. Would high school or college students necessarily know enough to find Copilot’s answer wrong?

For now, students of all ages will need to learn to question what AI systems like Copilot and ChatGPT say and always to ask for their reference sources. Teachers might require students to find non-AI sources to support AI’s assertions of fact, and students need to read the references the AI cites. A long-term solution to finding truth in an AI-dominated infoworld, however, remains a puzzle.


Richard Stevens

Fellow, Walter Bradley Center on Natural and Artificial Intelligence
Richard W. Stevens is a retiring lawyer, author, and a Fellow of Discovery Institute’s Walter Bradley Center on Natural and Artificial Intelligence. He has written extensively on how code and software systems evidence intelligent design in biological systems. Holding degrees in computer science (UCSD) and law (USD), Richard practiced civil and administrative law litigation in California and Washington D.C., taught legal research and writing at George Washington University and George Mason University law schools, and specialized in writing dispositive motion and appellate briefs. Author or co-author of four books, he has written numerous articles and spoken on subjects including intelligent design, artificial and human intelligence, economics, the Bill of Rights and Christian apologetics. Available now at Amazon is his fifth book, Investigation Defense: What to Do When They Question You (2024).