Mind Matters Natural and Artificial Intelligence News and Analysis
Close-up Of A Robot's Hand Holding Stethoscope

Doctors Won’t Be Obsolete Anytime Soon

Despite fanfare and positive portrayals in pop culture, artificial intelligence “doctors” are failing to live up to the hype.

A careful analysis of British hospital records found that an annual average of 1,600 adults over the age of 30 had used outpatient child and adolescent psychiatry services and that a comparable number of youths aged 0–19 years old had used outpatient geriatric services. With tongue firmly in cheek, the authors speculated that, “We are not clear why so many adults seem to be availing themselves of pediatric services, but it might be part of an innovative exchange program with pediatric patients attending geriatric services.”

They also found that thousands of men used outpatient obstetrics, gynecology, and midwifery services each year, though there were fewer women availing themselves of vasectomies.

These were clearly clerical errors made by fallible humans recording patient data. Could computers do better?

In the popular media, robots are generally smarter — much smarter — than humans, including doctors and nurses. In the movie Big Hero 6, for example, Baymax is a lovable superhero and all-knowing health care provider that has been programmed to have the feelings and emotions of its creator’s father. In real life, instead of Baymax, we have IBM’s Watson, which has no feelings or emotions and is little more than a sometimes helpful doctor’s assistant.

After Watson defeated the best human Jeopardy players, IBM boasted that Watson would soon revolutionize health care. After all, health care is all about gathering data and analyzing it with cutting-edge medical knowledge. Watson can surely store more data and medical knowledge than any doctor, retrieve the relevant information faster, and apply that knowledge to specific patients more reliably. Computer algorithms might even use data mining techniques to discover diagnoses and treatments that medical researchers are not aware of. And, unlike doctors, computer algorithms never get tired or make mistakes.

Alas, as the very old proverb goes, “There’s many a slip twixt the cup and the lip.” The possible spills begin with patient data, which doctors and nurses know are notoriously unreliable. Electronic Medical Records (EMRs) have imposed some standardization on patient data collection, though the well-meaning effort to record all relevant information has created what has been called iPatient medicine, with voluminous data logged into computers for every patient visit and procedure.

A 2019 report from the National Academy of Sciences estimated that, on average, doctors and nurses spend 50 percent of their workday interacting with their computer screens instead of their patients. Doctors complain of burnout from EMR overload, but it is now what is expected, indeed required, of them. A survey of emergency room doctors found that it typically took 6 clicks to order an aspirin, 8 clicks for a chest X-ray, and 15 clicks for a prescription. On average, more than 4 hours of a 10-hour emergency room shift was spent entering data in computers, even though the label “emergency” suggests there is some urgency to treating patients.

Even worse, EMRs are intended more for improving the billing process than for improving patient care. Check boxes often cannot adequately convey the important details of a doctor’s observations. One doctor complained that, “There might be 8, 9 pages or more on something you could categorize in 2 or 3 pages. It’s numbing, mind numbing. It’s ream after ream of popup screens and running data.” Another said that the written patient notes that preceded EMRs could document a patient’s condition and treatment and the doctor’s observations in ways that the doctor and other doctors could easily understand: “But now I get a 14-page note, and if you don’t know what I did, you’d have to really dive through 14 pages to find what should be an easy evaluation. We’re creating mountains of data that make more noise than pertinent fact. I think the computer is a great tool, but I went into medicine to work with people and not to be a data entry clerk.”

EMRs may be easily processed by computer software, but that doesn’t make them more useful. When entering patient information on a clipboard or in a computer, an age of 80 will occasionally be recorded as “8” and vice versa; and the male box will be checked when female was intended. It doesn’t happen often, but it is important that a diagnosis and treatment not be led astray when it does happen. Human doctors will recognize such mistakes and correct them. Computer algorithms won’t because they have no idea what age and sex mean, which may lead to an incorrect and possibly dangerous diagnosis and treatment.
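The difference is that a human reviewer applies common-sense plausibility rules automatically, while a computer only catches what someone has explicitly programmed it to catch. For illustration only, here is a minimal Python sketch of such rules; the field names, service labels, and age thresholds are hypothetical, not drawn from any real EMR system:

```python
def flag_implausible(record):
    """Return a list of warnings for suspicious EMR entries.

    Encodes, explicitly, the kind of domain knowledge a human
    reviewer applies without thinking. Purely illustrative.
    """
    warnings = []
    age = record["age"]
    sex = record["sex"]
    service = record["service"]

    # A dropped digit can turn an age of 80 into 8 (and vice versa);
    # flag ages that conflict with the service used.
    if service == "geriatrics" and age < 30:
        warnings.append(f"age {age} implausible for geriatrics")
    if service == "pediatrics" and age > 19:
        warnings.append(f"age {age} implausible for pediatrics")

    # A mis-checked box can record a man as receiving obstetric care.
    if service == "obstetrics" and sex == "M":
        warnings.append("sex 'M' implausible for obstetrics")

    return warnings


records = [
    {"age": 8,  "sex": "F", "service": "geriatrics"},   # likely "80" mistyped
    {"age": 45, "sex": "M", "service": "obstetrics"},   # likely wrong checkbox
    {"age": 72, "sex": "F", "service": "geriatrics"},   # plausible
]

for r in records:
    print(r, flag_implausible(r))
```

The sketch illustrates the underlying point: every such rule must be anticipated and hand-coded, because the algorithm itself has no idea what “age” or “sex” means.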

EMRs may also mislead computer algorithms because doctors would rather use work-arounds for software errors than report software bugs. In addition, doctors may recognize, while computers won’t, that some EMR entries were intended to ensure reimbursement for insurance claims.

Using data mining to discover novel diagnoses and treatments is dodgy enough without adding EMR data errors to the mix. An expert told me that,

Physicians are well aware of the limitations of EMR and routinely work around them. Dealing with uncertainty and understanding context for treating the individual patient is part of their job. I am not saying that EMRs are unsafe. However, the problems in EMR data are major challenges when using these data for AI algorithms. 

There are similar problems with a computer’s library of healthcare research. In 2011 an IBM Senior Vice President for Cognitive Solutions and Research boasted that, “Watson can read all of the healthcare texts in the world in seconds, and that’s our first priority, creating a ‘Dr. Watson,’ if you will.” Sounds impressive, but inputting is not understanding. Computer algorithms do not understand words and have no effective means of separating the best research papers from thousands of so-so and garbage papers. Nor do they have any reliable way of recognizing when previously reported results have been reversed by subsequent studies.

Computer algorithms also struggle with complex medical conditions involving multiple health problems. Multiple diseases are the norm in the real world, particularly with elderly patients, and people who are ill often take multiple medications. This reality is a problem both for medical studies of patients who have one disease and take one medication and for data-mining algorithms that make the same assumption.

So, how has it worked out? Have computer algorithms replaced doctors? Houston’s MD Anderson Cancer Center began employing Watson in 2013, accompanied by great hope and fanfare. A story headlined, “IBM supercomputer goes to work at MD Anderson,” began,

First he won on Jeopardy!, now he’s going to try to beat leukemia. The University of Texas MD Anderson Cancer Center announced Friday that it will deploy Watson, IBM’s famed cognitive computing system, to help eradicate cancer.

The idea was that Watson would analyze enormous amounts of patient data, looking for clues to help in diagnosis and recommend treatments for cancer patients based on the research papers in its database and the patterns it discovered through data mining.

Five years and $60 million later, MD Anderson fired Watson after “multiple examples of unsafe and incorrect treatment recommendations.” Internal IBM documents recounted the blunt comments of a doctor at Jupiter Hospital in Florida: “This product is a piece of s—.… We can’t use it for most cases.”

IBM spent more than $15 billion on Dr. Watson with no peer-reviewed evidence that it improved patient health outcomes. Watson Health has been such a disappointment that IBM is now looking for someone to take it off its hands.

Dr. Watson was the most hyped computerized health care system, but it is hardly the only disappointing one. Most recently, a 2021 study looked at 2,212 research papers published during the period from January 1, 2020, to October 3, 2020, that described new machine learning models for diagnosing COVID-19 or predicting its course from chest radiographs and chest computed tomography images. The authors’ conclusion: “None of the models identified are of potential clinical use.”

In his book The Digital Doctor, Robert Wachter wrote that,

One of the great challenges in healthcare technology is that medicine is at once an enormous business and an exquisitely human endeavor; it requires the ruthless efficiency of the modern manufacturing plant and the gentle hand-holding of the parish priest; it is about science, but also about art; it is eminently quantifiable and yet stubbornly not.

Dr. Watson is no Baymax.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence — often involving stock market anomalies, statistical fallacies, and the misuse of data — has been widely cited. He is the author of The AI Delusion (Oxford, 2018) and co-author (with Jay Cordes) of The Phantom Pattern (Oxford, 2020) and The 9 Pitfalls of Data Science (Oxford, 2019). Pitfalls won the Association of American Publishers 2020 Prose Award for “Popular Science & Popular Mathematics”.