Mind Matters Natural and Artificial Intelligence News and Analysis

AI in the Courtroom: How to Program a Hot Mess

Could AI make competent judicial choices in the court?

Imagine we’re assigned to design artificial intelligence (AI) software that carries out legal analysis of cases as a human judge would. Our project is “CourtGPT,” a system that receives a factual and legal problem in a case with two opposing parties, analyzes how certain statutes and other legal principles apply to the facts, and delivers a decision in favor of one of the parties. CourtGPT will make “legal decisions,” not decide “jury questions of fact,” and thus will function like a judge (not a juror).

To write a computer program of any complexity, we start by describing the entire program’s operations in English (my native tongue). Pro tip: If you cannot describe how your program operates in human language, then you cannot write a program in computer language that will work.

A Seven-Module AI System Design

We would have to write out in human language exactly what the computer program would do. Perhaps we would first specify seven high-level modules, such as:

(1) receive and diagram the fact pattern;

(2) identify applicable statutes;

(3) identify other relevant legal principles;

(4) locate published precedent case decisions and extract the fact-law analogies relevant to the presented case;

(5) determine how the case facts satisfy the elements of the statute(s) and/or the principles of law;

(6) evaluate whether any located precedent case analogizes closely enough to fit and influence or direct the decision; and

(7) in a reasonable time frame, compute and deliver a decision about how the statutes and principles apply to the case, holding in favor of one or the other party.

Next, we take each top-level module separately and specify in English: (a) what the input information looks like and how to parse and store it as binary data; (b) what functions the computer must carry out upon the data; and (c) what output the module generates to pass to the next module. We thus must specify sub-routines that carry out carefully defined tasks. Within sub-routines can be lower-level sub-routines, following the same input-process-output design. 
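To make the input-process-output idea concrete, here is a minimal Python sketch of how the seven modules might be chained. Everything in it is an assumption made for illustration: the names CaseFile, diagram_facts, unspecified, and run_pipeline are hypothetical, and the sentence-splitting in module (1) is a trivial stand-in, not real fact analysis. Modules that have no English-language specification yet are deliberately left as stubs that fail loudly.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CaseFile:
    """Data passed from one module to the next (all field names are hypothetical)."""
    fact_pattern: str                                     # raw narrative input
    facts: List[str] = field(default_factory=list)        # output of module (1)
    statutes: List[str] = field(default_factory=list)     # output of module (2)
    principles: List[str] = field(default_factory=list)   # output of module (3)
    precedents: List[str] = field(default_factory=list)   # output of module (4)
    decision: str = ""                                    # output of module (7)


# Every module has the same shape: it takes a CaseFile and returns a CaseFile.
Module = Callable[[CaseFile], CaseFile]


def diagram_facts(case: CaseFile) -> CaseFile:
    """Module (1), trivial stand-in: treat each sentence as one 'fact'."""
    case.facts = [s.strip() for s in case.fact_pattern.split(".") if s.strip()]
    return case


def unspecified(name: str) -> Module:
    """Build a stub for a module whose processing has not been described yet."""
    def module(case: CaseFile) -> CaseFile:
        raise NotImplementedError(f"{name}: no English-language specification yet")
    return module


def run_pipeline(case: CaseFile, modules: List[Module]) -> CaseFile:
    """Feed each module's output to the next module as its input."""
    for module in modules:
        case = module(case)
    return case


if __name__ == "__main__":
    case = run_pipeline(CaseFile("A hit B. B sued A."), [diagram_facts])
    print(case.facts)
```

The design choice worth noting is the shared CaseFile type: because every module reads and writes the same structure, sub-routines can be added, replaced, or nested without changing their neighbors. A stub created with unspecified("judge") would stop the pipeline the moment it is reached, which is all a program can do with a step nobody has described.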

The software engineers who developed MS Windows, Apple iOS, and all such supervisory-level operating systems had to carry out this design procedure. The compatibility of new applications (apps) depends upon the operating systems’ being designed with common definitions for inputs and outputs from other programs. Every decent chatbot and substantial computer program arose by similar design methods.

Define the Judge Module’s Functions

CourtGPT on its face presents a mammoth programming design project before a single line of code is written. Let’s fast forward to designing the “Judge Module” that actually renders the decision based upon the facts, statutes, principles, and precedents. How might we specify the Judge Module in English?

Turning to the legal literature to help explain a human judge’s mental processes, I found Prof. Dan Simon’s comprehensive 1998 article, “A Psychological Model of Judicial Decision Making.” Simon’s article looks at how judges render decisions of law, aiming to describe the processes involved. Working to sketch a high-level psychological model of the judges’ task, Simon quotes famous and influential judges and professors who described the legal decision-making process. For instance, Benjamin Cardozo, an appellate court judge and later a Supreme Court Justice, declared:

Any judge, one might suppose, would find it easy to describe the process which he had followed a thousand times and more. Nothing could be farther from the truth.

Simon reviewed the statements of well-respected judicial decision-making authorities whom most other judges and lawyers recognize, including Judge Learned Hand, Justice Oliver Wendell Holmes, Judge Richard A. Posner, Justice Roger Traynor, and more. About the thought processes of judges, Simon summarized:

With no systematic account at their disposal, judges have tended to relate to their activity by means of loose, metaphorical terms. They typically portray the judicial decision as constituting a “strange compound,” an “incalculable mixture,” a “brew,” and a formula requiring “the wisest and most just mixture.” Judging is occasionally described as artistic creation, and as various forms of craftsmanship, including cooking, weaving and carpentering. Other judges summarize their account of the decision-making process by emphasizing the role of the hunch, or intuition.

Indeed, Simon’s article examines the “role of the hunch” in judicial decisions more closely, stating:

[The] reliance on the hunch as an aid in decision making is probably more germane than most commentators believe, however, in its unexplored form it is too nebulous to illuminate the process in a meaningful way.

The renowned experts in how judges think reveal that even they do not know how decisions are reached in real life. The experts do not confirm the seven-module method we imagined (above). The experts do not offer anything like an English-language description of the thinking process expressed in step-by-step terms. Based upon the experts’ comments, judging is an intellectual hot mess. But, of course, AI requires us to understand the step-by-step processes, the algorithms, before we write one line of computer code.

Our CourtGPT software team, even if populated by seasoned judges, could not describe the program to be written. Writing code to deliver a “hunch” is the stuff of funny satire. If the judges cannot describe the process, then the software people can’t write the program to mimic the process.

Neural Networks Cannot Overcome the Biased Data Sources

Indeed, Simon’s article also implicitly predicts why AI like ChatGPT could not validly provide legal reasoning leading to judgment by analyzing millions of published precedent decisions. Recall that ChatGPT draws its knowledge and methods ostensibly from analyzing millions of online documents. To do legal analysis, if CourtGPT were designed like ChatGPT, it would presumably draw from the millions of published legal decisions.

Here’s the fatal flaw with ChatGPT-style methods. Written judicial decisions (e.g., appellate and supreme court decisions) are typically crafted to persuade the readers (lawyers, judges, legislators, the public) that the decision is a near certainty, derived from clear and applicable principles, bolstered by judicial, legislative, and public policy considerations. Such decisions do not explain the reasoning process that looks at one side, then another, and still another. Indeed, judicial decisions typically focus on a selected set of principles and certain key facts that produce a result the judges can defend as logical. Simon’s article thus shares:

[Justice] Cardozo found it “comic” that while jurists fail to agree on defining the premises of their activity, they confidently manufacture decisions “out of what, they cannot tell you, and by a formula they cannot state.” In a similar vein, [Judge] Jerome Frank described adjudication as involving features that “words cannot ensnare;” a process guided by a “wordless knowledge.”  

Therefore, ChatGPT-style neural network methods cannot extract an understanding of how to make legal decisions by computationally analyzing published decisions that were themselves produced from “wordless knowledge.” That fact leaves the CourtGPT team lacking any description of the judicial analysis algorithms, and unable to trust a massive document search to discover such algorithms.

If humans cannot describe, even theoretically, how to solve a problem, then the computer cannot be programmed to solve the problem. We cannot program AI to do competent judicial analysis that leads from fact pattern through principles to conclusion in the way that human judges actually do it (as suggested by Prof. Simon). Even an AI system that passes the Turing Test by seeming to speak and think like a human will not duplicate a human lawyer or judge in reasoning to a result. Human judging could certainly be improved, but the solution won’t come from CourtGPT.

Richard Stevens

Fellow, Walter Bradley Center on Natural and Artificial Intelligence
Richard W. Stevens is a lawyer, author, and a Fellow of Discovery Institute's Walter Bradley Center on Natural and Artificial Intelligence. He has written extensively on how code and software systems evidence intelligent design in biological systems. He holds a J.D. with high honors from the University of San Diego Law School and a computer science degree from UC San Diego. Richard has practiced civil and administrative law litigation in California and Washington D.C., taught legal research and writing at George Washington University and George Mason University law schools, and now specializes in writing dispositive motion and appellate briefs. He has authored or co-authored four books, and has written numerous articles and spoken on subjects including legal writing, economics, the Bill of Rights and Christian apologetics. His fifth book, Investigation Defense, is forthcoming.
