- Distrust: Big Data, Data-Torturing, and the Assault on Science by Gary Smith, Oxford University Press, 2023.
The pandemic proved a lot of things, one of them being that science is under assault. In this enlightening and entertaining new book, Professor Gary Smith shows us how much of the assault has its roots in what scientists do.
The easiest impact to understand is the Internet, which was originally created by scientists in the 1970s to exchange scientific information. Now it has become a great way to spread disinformation on almost every subject. A former chief historian of NASA noted that: “The reality is, the internet has made it possible for people to say whatever the hell they like to a broader number of people than ever before.”
Smith recounts endless examples of this disinformation. Many of them are already familiar, but taken together they are mind-blowing, particularly when people act on the false information. For instance, a private citizen raided a pizzeria that was purportedly being used by Hillary Clinton to manage a pedophile ring. Pizzagate soon evolved into QAnon, which extended the conspiracy far beyond the Clintons. A 2021 survey found that 16% of respondents agreed that much of America is controlled by a group of Satan-worshipping pedophiles who run a global child sex-trafficking operation.
While raids by private citizens are somewhat rare, other surveys have also found that many people have strange beliefs. Recent polls have found that 20% to 30% of British, American, and Russian adults believe America’s moon landing was staged, partly because such beliefs are easily shared on the Internet.
Growing Distrust in the Media
Sometimes this misinformation begins on the internet and then appears in the mainstream news. For instance, Fox News ran a three-part series on “the puppet master, George Soros,” in which he was portrayed as the archetypical Jewish financier who controls politicians around the globe. More recently, Fox News settled a defamation suit over its claims that Dominion voting systems switched votes from Trump to Biden.
The result is growing distrust in the mainstream media. Surveys of Americans have found that 30% of respondents now answer “none at all” when asked how much they trust mass media, up from 5% in 1975. Facebook data shows that users engage more with fake news than with mainstream news, a very disturbing trend.
Defenders of the status quo may say this has little to do with science, even if scientists created the Internet. After all, there is a long history of tabloid newspapers and fake stories, even going back to Ben Franklin, also recounted in this book. Smith argues, however, that many governments have used science to justify rules and regulations, including high taxes, onerous building codes, and mandatory vaccines. The result is that people end up disliking both science and government. As Smith says: science created tools that businesses and government use to spy on us.
P-Hacking and “Torturing” Data
Part 2 of the book, entitled Data Torture, turns to a different set of tools that scientists both developed and use, and that sometimes produce faulty information. One of those practices is called p-hacking. The term comes from researchers hunting for results with a sufficiently low p-value, usually below 0.05, to get published in scientific journals. A p-value measures the probability of obtaining results at least as extreme as those observed if there is in fact no real effect, that is, if chance alone is at work.
Scientists are supposed to develop hypotheses through examination of previous research papers, and then test them using random trials and statistics. But many scientists work backward; they look for patterns with a p-value smaller than 0.05, then concoct a theory to explain the results, and in the process reach worthless conclusions.
Smith describes how a p-hacker might begin by considering the data as a whole. But then he or she would quickly begin comparing males vs. females, adults vs. children, and then different age cutoffs, all in an attempt to find something publishable. P-hackers may also continue gathering data, at least until they get a p-value smaller than 0.05. As Ronald Coase says, “if you torture data long enough, they will confess.”
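The subgroup hunt described above can be sketched as a small simulation. This is an illustration of the general idea, not an example taken from Smith's book: the sample size, the seven subgroup definitions, and the function names are all assumptions chosen for the sketch. Every simulated study has a treatment with no effect at all, so any "significant" result is a false positive.

```python
import math
import random


def two_sided_p(group_a, group_b):
    """Two-sided p-value from a z-test on the difference in means.

    Assumes both groups have a known standard deviation of 1, which is
    true for the simulated outcomes generated below.
    """
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / math.sqrt(1 / n_a + 1 / n_b)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_study(rng, n=200):
    """Simulate one study in which the treatment has NO effect."""
    people = [
        {
            "treated": rng.random() < 0.5,
            "male": rng.random() < 0.5,
            "age": rng.randint(5, 80),
            "outcome": rng.gauss(0, 1),  # pure noise, unrelated to treatment
        }
        for _ in range(n)
    ]
    # Hypothetical subgroups a p-hacker might try one after another.
    subgroups = {
        "everyone": lambda p: True,
        "males": lambda p: p["male"],
        "females": lambda p: not p["male"],
        "children": lambda p: p["age"] < 18,
        "adults": lambda p: p["age"] >= 18,
        "over 40": lambda p: p["age"] > 40,
        "over 60": lambda p: p["age"] > 60,
    }
    p_values = {}
    for name, keep in subgroups.items():
        treated = [p["outcome"] for p in people if keep(p) and p["treated"]]
        control = [p["outcome"] for p in people if keep(p) and not p["treated"]]
        # Skip a subgroup only if it is too small to compare at all.
        if len(treated) >= 2 and len(control) >= 2:
            p_values[name] = two_sided_p(treated, control)
    return p_values


def false_positive_rates(n_studies=1000, seed=0):
    """Share of no-effect studies that look 'significant' at p < 0.05."""
    rng = random.Random(seed)
    honest = hacked = 0
    for _ in range(n_studies):
        p_values = simulate_study(rng)
        honest += p_values["everyone"] < 0.05          # one pre-planned test
        hacked += min(p_values.values()) < 0.05        # best of all subgroups
    return honest / n_studies, hacked / n_studies


if __name__ == "__main__":
    honest, hacked = false_positive_rates()
    print(f"single pre-planned test:  {honest:.1%} of null studies 'significant'")
    print(f"best of seven subgroups:  {hacked:.1%} of null studies 'significant'")
```

By construction, the single pre-planned test should flag roughly 5% of these no-effect studies, while reporting the best of several subgroup comparisons flags noticeably more. The exact figures depend on the seed and on how many subgroups are tried.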
Another problem with the obsession of journals and researchers with p-values is that the size of the effect gets lost. The economist Deirdre McCloskey calls this “oomph”: the impact in real terms. Smith describes papers that relied only on statistical significance to test hypotheses even though the results had less than a 1% impact, not enough to warrant publication in Smith’s mind.
Unfortunately, most social science papers only report statistical significance, so we never learn whether the effect was large enough to matter. For instance, one study of empirical articles published in the 1980s and 1990s in the American Economic Review, considered to be one of the most prestigious economics journals, reported that 70% of the papers published in the 1980s didn’t distinguish between statistical significance and impact and that 82% of the 1990s papers made the same error. Should a paper be published even if the results won’t have a big impact?
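The gap between statistical significance and "oomph" is easy to see with a little arithmetic. The sketch below is my own illustration, not a calculation from the book. It assumes a two-group comparison analyzed with a simple z-test, and it assumes the observed difference in means is exactly 0.01 standard deviations, a deliberately tiny effect. Only the sample size changes.

```python
import math


def p_value_for_effect(effect_in_sd, n_per_group):
    """Two-sided z-test p-value when the observed difference in means
    equals `effect_in_sd` standard deviations and each of the two
    groups contains `n_per_group` observations."""
    z = effect_in_sd / math.sqrt(2 / n_per_group)
    return math.erfc(abs(z) / math.sqrt(2))


def significance_table(effect_in_sd=0.01, sizes=(1_000, 100_000, 1_000_000)):
    """P-values for the same tiny effect at increasing sample sizes."""
    return [(n, p_value_for_effect(effect_in_sd, n)) for n in sizes]


if __name__ == "__main__":
    for n, p in significance_table():
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"n per group = {n:>9,}: p = {p:.3g} ({verdict})")
```

The effect never changes, yet with enough observations it crosses the 0.05 threshold. A low p-value therefore says nothing about whether the result matters in real terms.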
Data Mining, the title of Part 3, takes data torture to a further extreme: looking for patterns without considering explanations, perhaps because many researchers don’t think explanations are important. As one Harvard Business Review article says: “Traditional database inquiry requires some level of hypothesis, but mining big data reveals relationships and patterns that we didn’t know what to look for.” Smith says that while data mining was once considered a sin, it is seemingly everywhere now, “cropping up in medicine, economics, and management, and even history.” He describes many such papers, too many to be summarized here, and their number is growing quickly thanks to the hype around artificial intelligence.
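Why is pattern-hunting without a hypothesis risky? A minimal sketch, again my own and not drawn from the book: generate one random "target" series and many random "candidate" series that have nothing to do with it, then keep whichever candidate correlates best. The number of observations, the number of candidates, and the function names are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_correlation(n_obs=30, n_candidates=1000, seed=0):
    """Mine unrelated random series for the one that best 'explains'
    a random target. Every series is independent noise."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(candidate, target)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    r = best_spurious_correlation()
    print(f"strongest correlation found among pure noise: r = {r:.2f}")
```

With enough candidates, the search reliably turns up a correlation that looks impressive even though every series is noise. That is the reason a pattern with no explanation behind it deserves suspicion.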
Part 4 deals with AI and Part 5 deals with “The Crisis.” I will bypass AI because Gary and I have written many papers together on AI, some of them summarized in his book. In Part 5, he distinguishes between reproducibility (whether others using the original data obtain the reported results) and replication (whether others using fresh data obtain results consistent with those originally reported).
Fabricated Data, More Retractions
By now most of us have heard of these crises, probably hearing of the reproducibility crisis more than the replication crisis. Both are important.
Smith gives many examples, but I will only mention a few from the pandemic. With more than 100,000 papers published on Covid, there is a lot to analyze, and Smith recounts stories of Covid-19 cures that initially sounded good but turned out, on closer inspection, to rest on shoddy analysis. For instance, hard work by scientific “sleuths” found that the research studies claiming hydroxychloroquine and ivermectin cured patients used data that was fabricated. Some of the papers were retracted, an outcome that is becoming more common.
As a percentage of published papers, retractions have grown from almost zero to 0.07% of papers. While the percentage is small, and thus might be the result of more low-quality journals, the retraction rate is high even for top-ranked journals. One analysis found that the percentage retracted was higher for widely cited journals than for less widely cited ones.
What to do about these problems? Clearly there is a lot to think about, but one solution is to require authors of published papers to post their data. However, few do. Analyses of Science and other journals find that most authors still don’t post their data, and little can be done after a paper is accepted.
Smith believes that the obsession with publishing papers is behind the rising incidence of retractions. He says there were three million papers published in 42,000 journals in 2018. Some of the journals are free and some charge for publication. Some of those that charge authors use names very similar to famous journals that do not charge authors.
Science Needs a Better Success Metric
There are no good solutions to these problems, although Smith presents some ideas. “A direct way to fight p-hacking is to eliminate the incentive by removing statistical significance as a hurdle for publication. P-values can help us assess the extent to which chance might explain empirical results, but they should not be the primary measure of a model’s success.” “More journal articles should recognize this unfortunate incentive and publish papers that are well-done with important results that do not happen to be statistically significant.” Furthermore, “artificial thresholds like p<0.05 encourage unsound practices.”
Solving the publish-or-perish problem is tougher, and Smith does not address it. I personally believe there are too many papers published and too many statistics in the published papers, including too many p-values. The number of papers published is not a good measure of quality even if only papers in top-ranked journals are counted, nor is the number of citations per paper.
Science needs to find a better measure of research success. While counting papers or using h-indexes might tell us what is popular with other scientists, it does not tell us what is worthwhile. Finding what is worthwhile takes many years, and it requires evaluators to consider the links between papers and new products, processes, work methods, policies, and teaching methods, linkages that scientists often ignore because they are too busy writing papers. I believe that a person who writes even one great paper that actually leads to some important new products, processes, work methods, policies, or teaching methods is a great researcher. Few people write those kinds of papers anymore.