Smith and Cordes’ Phantom Pattern Problem: A Top 2020 Book

Published by Oxford in 2020, it deals with the “patterns” Big Data throws up that aren’t really there.
David Auerbach has picked The Phantom Pattern Problem (2020) by Gary Smith and Jay Cordes as one of the top books of 2020 in the science and tech category.
Auerbach, who describes himself as “a writer and software engineer, trying to bridge the two realms,” is the author of BITWISE: A Life in Code (2018). He has an interesting way of choosing books to recommend: Those that resist the “increasingly desperate and defensive oversimplification” of popular culture:
I hesitate to mention too many other books for fear of neglecting the others, but I will say that of the science and technology books, several deal with subjects that are currently inundated with popularizations. In my eye, those below are notably superior to the rest of their crowd, though the marketplace of ideas has apparently and frustratingly failed to raise these books above their brethren. To a lesser extent, the same applies to history and politics.
Jacob Burckhardt said that the 20th century would be the age of oversimplification. The 21st has so far been the age of increasingly desperate and defensive oversimplification, across all domains of knowledge. Here’s to the fight against it.

David Auerbach, “David Auerbach’s Books of the Year 2020” at Waggish
Big Data, crunched by powerful computers, has created the “phantom pattern,” a comparatively new problem in false and sometimes nonsensical research findings: In a big enough data set (think very powerful computers here), we are sure to find many apparent patterns that are actually random noise. That’s because randomness is not even; it’s bumpy.
So in a vast survey of health care data, we may find that more people die on the 15th of the month than on the 16th. As is normal and reasonable, we look for a cause. But there may be no actual “cause.” Then, as masses more data accumulate, the pattern will tend to disappear.
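That washing-out effect is easy to see in a simulation. The sketch below (an illustration, not anything from the book) assigns simulated deaths uniformly at random to the days of a 28-day month and compares the counts on the 15th and the 16th: with a small sample the relative gap between the two days can look like a real pattern, but with a much larger sample it shrinks toward zero.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

def relative_gap(n_deaths):
    """Assign n_deaths uniformly to days 1-28 and return the gap
    between the counts on the 15th and 16th, relative to the
    expected count per day."""
    counts = [0] * 29  # index 0 unused; days are 1..28
    for _ in range(n_deaths):
        counts[random.randint(1, 28)] += 1
    expected_per_day = n_deaths / 28
    return abs(counts[15] - counts[16]) / expected_per_day

# Small samples show sizeable "patterns"; large samples wash them out.
for n in (1_000, 1_000_000):
    print(f"{n:>9} deaths: relative gap = {relative_gap(n):.3f}")
```

With a thousand simulated deaths, a gap of 20 or 30 percent between two equally likely days is unremarkable; with a million, the same purely random process leaves only a fraction of a percent.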
But researchers are tempted to investigate and justify these phantom patterns, sometimes publishing papers in respected journals in which they fudge and cherry-pick the data to make a spurious pattern look like a genuine result.
Just yesterday, Gary Smith noted here, at Mind Matters News, a paper purporting to prove that in England, people were more likely to be admitted to hospital on unlucky Friday the 13th than on an ordinary Friday the 6th. And another paper claimed to show that Americans of East Asian descent were more likely to die on the reputedly unlucky 4th day of the month. He shows why it is all nonsense, yet both papers were published in a venerable and respected journal, the British Medical Journal.
Some consequences of this search for patterns could also be needless suffering. “Cancer maps,” for example, can be, as Smith notes, “an expensive source of phantom patterns”:
The National Institutes of Health web site has cancer rates for twenty-two different types of cancer, two sexes, four age groups, six races and ethnicities, and more than three thousand counties. With millions of possible cancer clusters, some are bound to appear, just by chance alone. Some places will have above-average cancer rates and other places will have below-average rates, just as in 100,000 coin flips there are bound to be places where there happen to be 9 or 10 heads in a row and other places where there happen to be 9 or 10 tails in a row.

Gary Smith, “Cancer Maps—An Expensive Source of Phantom Patterns” at Mind Matters News (November 9, 2020)
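Smith’s coin-flip claim is easy to check directly. This short sketch (my own illustration) flips a fair coin 100,000 times and finds the longest streak of identical outcomes; streaks of 9 or 10 are essentially guaranteed, and the longest run is typically well into the teens.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def longest_run(flips):
    """Length of the longest streak of identical consecutive outcomes."""
    best = cur = 1
    for prev, nxt in zip(flips, flips[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

flips = [random.choice("HT") for _ in range(100_000)]
print("longest streak:", longest_run(flips))
```

In 100,000 fair flips the expected number of streaks of length 10 or more is on the order of a hundred, so a map of those flips would be dotted with “clusters” that mean nothing at all.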
People who follow cancer maps may experience needless anxiety if they discover that they live in a “cancer cluster” town—based on the equivalent of coin flips. Perhaps the public money would be better spent on cancer research and treatment than on raising fears based on statistical blips.
In an interview here at Mind Matters News, Smith noted that our tendency to seek patterns must be balanced by other human qualities, such as wisdom:
Mind Matters News: So you are saying that a human tendency to seek patterns coincides with an AI tendency to produce a variety of meaningless patterns like “An unborn baby’s sex can be predicted by the amount of breakfast cereal the mother eats.” And the combination results in bad data finding its way into science journals. Is that a fair characterization?
Gary N. Smith: I would put it this way. AI is currently based on finding patterns in numbers, pixels, sound waves, and other kinds of data. We humans are hard-wired to presume that the patterns we observe are meaningful—so, we are overly impressed by AI’s pattern-discovery prowess.
We do not fully appreciate the fact that even random data contain patterns. Thus the patterns that AI algorithms discover may well be meaningless. Our seduction by patterns underlies the publication of nonsense in good peer-reviewed journals.
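The point that “even random data contain patterns” can be made concrete with a multiple-comparisons sketch (my own illustration, not Smith’s). Screen enough purely random “predictor” series against a random target, the way a data-mining algorithm might, and the best of them will correlate impressively:

```python
import random
import statistics

random.seed(2)  # fixed seed so the illustration is reproducible

def corr(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# A short random "outcome" series, like 20 noisy observations.
target = [random.gauss(0, 1) for _ in range(20)]

# Screen 1,000 equally random candidate predictors against it
# and keep the strongest absolute correlation found.
best = max(
    abs(corr(target, [random.gauss(0, 1) for _ in range(20)]))
    for _ in range(1000)
)
print("best |correlation| among 1,000 random series:", round(best, 2))
```

Every series here is pure noise, yet the winning correlation is routinely strong enough to look publishable, which is exactly how breakfast-cereal-style findings get manufactured.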
The study you mentioned about moms and breakfast cereal was published in Proceedings of the Royal Society.

News, “Interview: New book outlines the perils of Big (Meaningless) data” at Mind Matters News (September 30, 2020)
The book by Smith and Cordes is an argument for the enduring value of “human wisdom and experience,” and you can read a brief summary here.
You may also enjoy this recent piece by Gary Smith: Torturing data can destroy a career: The case of Brian Wansink. Wansink wasn’t alone. A surprising number of studies published in highly respected peer-reviewed journals are complete nonsense and could not be replicated with fresh data. When we hear a provocative claim, we should flip on our BS detector and consider the possibility that the data have been bullied, tormented, and tortured into saying it.