Mind Matters Natural and Artificial Intelligence News and Analysis

The British Medical Journal’s Top Picks in Offbeat Medical Science

In its legendary Christmas edition, the Journal highlights interesting findings that are often junk science

The British Medical Journal (BMJ) is one of the world’s oldest and most prestigious medical journals. Each Christmas, it takes a break from the usual dry academic papers and publishes studies that are noteworthy for their originality: “We don’t want to publish anything that resembles anything we’ve published before.”

Although the papers are unusual, BMJ’s editors state that:

While the subject matter may be more light-hearted, research papers in the Christmas issue adhere to the same high standards of novelty, methodological rigour, reporting transparency, and readability as apply in the regular issue. Christmas papers are subject to the same competitive selection and peer review process as regular papers.

The articles are often goofy, and four have won the dreaded satiric Ig Nobel Prizes for research that is trivial and laughable:

● Side effects of sword swallowing. There can be problems if the swallower is distracted or the swords are unusual.

● People with acute appendicitis driving over speed bumps. Speed bumps are more painful for people with acute appendicitis.

● MRI imaging of male and female genitals during sexual intercourse. Move along; nothing to see here.

● The effect of ale, garlic, and soured cream on the appetite of leeches. The experiment was abandoned for ethical reasons after two leeches died from exposure to garlic.

The BMJ Christmas issue is widely anticipated, read, and reported in the mainstream media. That attention leads some fame-chasing researchers to torture data in ways I’ve described before, and the issue provides an interesting example of how determined researchers can slip tortured research past even the best journal editors.

Tortured research often involves provocative, semi-plausible conclusions, with the prospective publicity luring researchers to do what it takes to achieve statistical significance. What it takes often involves discarding inconvenient data that stand in the way of statistical significance. If a study reports that something unusual was found for Korean women between the ages of 64 and 72 who were living in Texas between February 1, 2002, and December 31, 2014, red flags should be waving.

Here are three examples:

Friday the 13th

One Christmas BMJ paper compared the number of hospital admissions in the South West Thames region of England on six of the nine Friday the 13ths that occurred during the years 1989–1992 with the number of admissions on the preceding Friday the 6ths. The restriction to the South West Thames region of England and the omission of three Friday the 13ths are suspect but there is more. The researchers first compared emergency room admissions for accidents and poisoning on the 6th and 13th, and did not find anything statistically persuasive. So, they looked at all hospital admissions for accidents and poisoning and again found nothing. Then they separated hospital admissions into the five sub-categories shown in the table below: accidental falls; injuries caused by animals and plants; not determined whether accidental or intentional; accidental poisoning; and transportation accidents.


Overall, there were more hospital admissions on the 6th than on the 13th but there was one category, transportation, where hospital admissions were higher on the 13th. So the researchers concluded their study with this dire warning: “Friday 13th is unlucky for some. The risk of hospital admission as a result of a transport accident may be increased by as much as 52%. Staying at home is recommended.”

Numbers of Admissions for South West Thames Residents by Type of Accident

Cause            Friday the 6th   Friday the 13th
Falling                     370               343
Animals                       1                 3
Transportation               45                65
Total                       454               440

This is a clear example of “Seek a pattern, and you will find one.” Even though there were more hospital admissions on the 6th than on the 13th, the researchers persisted in searching for some category, any category, until they found something that could be published in the BMJ Christmas issue.
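
The arithmetic behind this criticism can be sketched in a few lines of Python. The counts come from the table above. The figure of seven comparisons (emergency room admissions, all admissions, and five sub-categories) and the assumption that these are independent tests at the 5 percent level are simplifications made here for illustration, not details reported in the paper. The percent changes are raw ratios of the table's counts and need not match the paper's own estimate.

```python
# Counts taken from the table above: (Friday the 6th, Friday the 13th).
admissions = {
    "Falling": (370, 343),
    "Transportation": (45, 65),
    "Total": (454, 440),
}

def percent_change(sixth, thirteenth):
    """Percent change in admissions from the 6th to the 13th."""
    return 100.0 * (thirteenth - sixth) / sixth

changes = {cause: percent_change(*counts) for cause, counts in admissions.items()}

def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests is
    'significant' at level alpha when there is no real effect."""
    return 1.0 - (1.0 - alpha) ** num_tests

# Assumption: 7 comparisons, treated as independent for simplicity.
false_positive_risk = chance_of_false_positive(7)

for cause, change in changes.items():
    print(f"{cause}: {change:+.1f}%")
print(f"Chance of at least one false positive in 7 tests: {false_positive_risk:.2f}")
```

On these counts, total admissions are about 3 percent lower on the 13th while transportation admissions are about 44 percent higher. Under the independence assumption, roughly three in ten such searches would turn up at least one "significant" category by chance alone.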

Scared to Death

In Japanese, Mandarin, and Cantonese, the pronunciations of “four” and “death” are very similar. Not surprisingly, many Japanese and Chinese consider 4 to be an unlucky number. Surprisingly, it has been argued that this aversion to 4 is so strong that Japanese and Chinese Americans are susceptible to heart attacks on the fourth day of every month. The idea is preposterous, but a study making this silly claim was published in a BMJ Christmas issue with the title “The Hound of the Baskervilles Effect,” referring to Sir Arthur Conan Doyle’s story in which Charles Baskerville is pursued by a vicious dog and dies of a heart attack:

The dog, incited by its master, sprang over the wicket-gate and pursued the unfortunate baronet, who fled screaming down the yew alley. In that gloomy tunnel it must indeed have been a dreadful sight to see that huge black creature, with its flaming jaws and blazing eyes, bounding after its victim. He fell dead at the end of the alley from heart disease and terror.

Are Asian Americans really so superstitious and fearful that the fourth day of the month—which, after all, happens every month—is as terrifying as being chased down a dark alley by a ferocious dog?

The Baskervilles study (isn’t the BS acronym tempting?) examined data for Japanese and Chinese Americans who died of coronary disease. A natural test would be a comparison of the number of coronary deaths on the third, fourth, and fifth days of the month. For the time period they studied, 33.9 percent of the coronary deaths on these three days occurred on the fourth day of the month, which does not differ substantially or statistically from the expected 33.3 percent. If days 3, 4, and 5 are equally likely days for coronary deaths, we can expect a difference this large more often than not.
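
One way to check a claim like this is a two-sided test of whether the day-4 share differs from one-third. The sketch below uses a normal approximation. The article does not report the number of deaths, so the `total_deaths` value in the usage line is purely hypothetical and is chosen only to show how the calculation runs; the function name is likewise an illustration.

```python
import math

def day4_p_value(day4_share, total_deaths, expected_share=1.0 / 3.0):
    """Two-sided p-value (normal approximation) for the hypothesis that
    deaths on days 3, 4, and 5 are equally likely."""
    standard_error = math.sqrt(expected_share * (1.0 - expected_share) / total_deaths)
    z = (day4_share - expected_share) / standard_error
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical sample size; the source gives only the 33.9 percent share.
p_value = day4_p_value(0.339, total_deaths=1000)
print(f"p-value: {p_value:.2f}")
```

A p-value above 0.5 corresponds to the statement that a gap this large would arise more often than not if the three days were equally likely. The actual value depends on the real number of deaths, which would need to be substituted for the placeholder.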

So, how did the Baskervilles study come to the opposite conclusion? The authors didn’t report the overall 33.9 percent figure. Instead, they reported deaths from some kinds of heart disease, but not others. In the International Classification of Diseases, coronary deaths are divided into several categories. In some categories, more than one-third of the deaths occurred on day 4. In other categories, fewer deaths occurred. The Baskervilles study reported results only for the former. They discarded data that did not support their theory.

The lead author of the Baskervilles study coauthored two different studies that used all of the heart disease categories and a third study that used completely different categories. The only reason for using different categories in different studies is to manufacture support for otherwise unsupported theories.

When we suspect that a researcher made choices after looking at the data, this suspicion can be tested by trying to replicate the results with fresh data. The Baskervilles study used data for the years 1989–1998. When 1969–1988 and 1999–2001 data were used to retest the heart disease categories reported in the Baskervilles study, the results were neither substantial nor statistically significant. In the 1969–1988 data, there were more deaths on day 5 than on day 4; in the 1999–2001 data, there were more deaths on day 3. It is also revealing that the authors could have used the 1969–1988 data (and did so in other studies) but chose not to do so in the Baskervilles study. We can guess why.
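
The logic of testing with fresh data can be illustrated with a small simulation. This sketch is an illustration, not a reconstruction of the Baskervilles analysis. Deaths in ten hypothetical disease categories fall on days 3, 4, and 5 with equal probability, so there is no real day-4 effect. Choosing the categories with the highest day-4 share after seeing the data produces an apparent effect, which should fade when the same categories are checked in a fresh sample. The category count, sample sizes, and seed are arbitrary assumptions.

```python
import random

def day4_shares(rng, num_categories, deaths_per_category):
    """Simulate the day-4 share in each category when days 3, 4, and 5
    are equally likely (no real effect)."""
    shares = []
    for _ in range(num_categories):
        day4 = sum(rng.randrange(3) == 1 for _ in range(deaths_per_category))
        shares.append(day4 / deaths_per_category)
    return shares

def mean(values):
    """Arithmetic mean; equals the pooled share when categories are equal in size."""
    return sum(values) / len(values)

rng = random.Random(4)       # arbitrary seed, for reproducibility
NUM_CATEGORIES = 10          # hypothetical
DEATHS_PER_CATEGORY = 2000   # hypothetical

original = day4_shares(rng, NUM_CATEGORIES, DEATHS_PER_CATEGORY)
fresh = day4_shares(rng, NUM_CATEGORIES, DEATHS_PER_CATEGORY)

# Choose categories after looking at the data: keep the top half by day-4 share.
ranked = sorted(range(NUM_CATEGORIES), key=lambda i: original[i], reverse=True)
selected = ranked[: NUM_CATEGORIES // 2]

share_all = mean(original)
share_selected = mean([original[i] for i in selected])
share_replicated = mean([fresh[i] for i in selected])

print(f"All categories, original data:      {share_all:.3f}")
print(f"Selected categories, original data: {share_selected:.3f}")
print(f"Selected categories, fresh data:    {share_replicated:.3f}")
```

Because the selected categories were chosen for their high day-4 share, they look elevated in the original data by construction. In the fresh data, the same categories should sit close to one-third, which is the pattern the replication described above found.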

Unhappy Birthdays

The most recent BMJ Christmas issue includes a study reporting that surgeries are more likely to be fatal if they are done on the surgeon’s birthday. It is a damning indictment if patients are indeed dying because surgeons are distracted by birthday plans and good wishes from their assistants. However, several red flags are waving.

The study involved Medicare beneficiaries aged 65 to 99 years who underwent one of 17 common surgeries between 2011 and 2014: four common cardiovascular surgeries and the 13 most common non-cardiovascular surgeries in the Medicare population.

The authors justify their heart surgery selections by references to four similar studies that investigated the relationship between surgical mortality and surgeon age, surgeon experience, hospital volume, and surgeon age and sex. Two of the co-authors of the fourth study were also co-authors of the birthday study.

A comparison of the birthday study with these four papers is helpful for identifying choices that may have been made to bolster the case the authors wanted to make. For example, one paper considered 6 cardiovascular and 8 cancer operations; two papers examined 4 cardiovascular and 4 cancer operations; and the fourth paper considered 4 cardiovascular surgeries and the 16 most common non-cardiovascular surgeries in the Medicare population. The birthday paper’s choice of 17 surgeries is suspiciously peculiar.

None of the other four studies exclude patients with cancer but the birthday study does, with this unconvincing explanation: “To avoid patients’ care preferences (including end-of-life care) affecting postoperative mortality.”

The birthday study defines operative mortality as death within 30 days after surgery. All four of the other papers (including the paper with overlapping co-authors) define operative mortality as death before hospital discharge or within 30 days after the operation. One paper explains that, “Because, for some procedures, a large proportion of operative deaths before discharge occurred more than 30 days after surgery, 30-day mortality alone would not adequately reflect the true operative mortality.”


Our lives have been enriched immensely by the scientific method’s insistence that beliefs and theories should not be accepted uncritically, but tested empirically. Unfortunately, clever researchers have found ways to game the system; for example, by torturing data in order to prove whatever they want to prove.

Real science still moves forward with fair tests of plausible theories that can be replicated with additional tests. Our challenge is to distinguish real science from junk science.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His widely cited research on financial markets, statistical reasoning, and artificial intelligence often involves stock market anomalies, statistical fallacies, and the misuse of data. He is the author of The AI Delusion (Oxford, 2018) and co-author (with Jay Cordes) of The Phantom Pattern (Oxford, 2020) and The 9 Pitfalls of Data Science (Oxford, 2019). Pitfalls won the Association of American Publishers 2020 Prose Award for “Popular Science & Popular Mathematics.”
