How Data Can Appear in Science Papers — Out of Thin Air!

At Retraction Watch, Gary Smith explains how one author team apparently copied and pasted missing data about green innovation in various countries

Recently, Retraction Watch, a site that helps keep science honest, noted some statistical peculiarities in a paper published last September in the Journal of Cleaner Production, “Green innovations and patents in OECD countries.” A PhD student in economics had tipped off the site after noticing a problem: “For several countries, observations for some of the variables the study tracked were completely absent.”

But that wasn’t the big surprise. The big surprise was when the student wrote to one of the authors:

In email correspondence seen by Retraction Watch and a follow-up Zoom call, [Almas] Heshmati told the student he had used Excel’s autofill function to mend the data. He had marked anywhere from two to four observations before or after the missing values and dragged the selected cells down or up, depending on the case. The program then filled in the blanks. If the new numbers turned negative, Heshmati replaced them with the last positive value Excel had spit out. …

But it got worse. Heshmati’s data, which the student convinced him to share, showed that in several instances where there were no observations to use for the autofill operation, the professor had taken the values from an adjacent country in the spreadsheet. New Zealand’s data had been copied from the Netherlands, for example, and the United States’ data from the United Kingdom.

“No data? No problem! Undisclosed tinkering in Excel behind economics paper,” Retraction Watch, February 5, 2024

While many researchers decried the approach, University of Copenhagen econometrician Søren Johansen said something worth pondering: “The reason it’s cheating isn’t that he’s done it, but that he hasn’t written it down,” adding, “It’s pretty egregious.” (Retraction Watch)
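
To see why filled-in numbers could "turn negative," here is a rough Python sketch of the procedure as Retraction Watch describes it. It is not the author's spreadsheet and not Excel's actual code. It assumes the fill acts like a straight line fitted to the selected cells and extended into the blanks. The function names, the seed count, and the sample numbers are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def autofill_down(values, n_seed=3):
    """Fill trailing None entries by extending a line through the last
    n_seed observed values (assumed stand-in for dragging the fill handle).
    A negative fill is replaced by the last positive value seen so far,
    mirroring the rule reported in the article."""
    observed = [(i, v) for i, v in enumerate(values) if v is not None]
    seeds = observed[-n_seed:]
    slope, intercept = fit_line([i for i, _ in seeds], [v for _, v in seeds])

    filled = list(values)
    last_positive = seeds[-1][1]  # assumption: fall back to the last observation
    for i in range(observed[-1][0] + 1, len(values)):
        predicted = intercept + slope * i
        if predicted < 0:
            predicted = last_positive  # overwrite the negative extrapolation
        else:
            last_positive = predicted
        filled[i] = predicted
    return filled


# Hypothetical series: three real observations, then three missing years.
series = [10.0, 7.0, 4.0, None, None, None]
result = autofill_down(series)
print(result)  # [10.0, 7.0, 4.0, 1.0, 1.0, 1.0]
```

In this toy case the line falls below zero after one step, so the last two "observations" are simply the previous invented value repeated. Nothing in the finished column shows which numbers were measured and which were manufactured.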

Pomona College business prof Gary Smith, who often writes here at Mind Matters News, weighed in at Retraction Watch yesterday, explaining how blanks can come to seem like information in statistical papers.

Imputation (the technique the authors were using), he says, is not always unfair: “If we are measuring the population of an area and are missing data for 2011, it is reasonable to fit a trend line and, unless there has been substantial immigration or emigration, use the predicted value for 2011. Using stock returns for 2010 and 2012 to impute a stock return for 2011 is not reasonable.” In other words, whether imputation is unfair depends on whether anything was likely to have happened in the period for which data is missing that would change the results.
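
Smith's population example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population figures are invented, the trend is a plain least-squares straight line, and the helper names are hypothetical. The function also returns the list of imputed years, since disclosure is the point at issue.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_with_trend(series):
    """Replace None values with trend-line predictions.
    Returns the filled series and the years that were imputed,
    so the imputation can be reported alongside the results."""
    known = [(year, value) for year, value in series.items() if value is not None]
    slope, intercept = fit_line([y for y, _ in known], [v for _, v in known])

    filled = dict(series)
    imputed_years = []
    for year, value in series.items():
        if value is None:
            filled[year] = intercept + slope * year
            imputed_years.append(year)
    return filled, imputed_years


# Hypothetical population (in thousands) growing steadily; 2011 is missing.
population = {2008: 100.0, 2009: 102.0, 2010: 104.0,
              2011: None, 2012: 108.0, 2013: 110.0}
filled, imputed_years = impute_with_trend(population)
print(round(filled[2011], 3), imputed_years)  # a value close to 106, and [2011]
```

The same code would run just as happily on yearly stock returns, which is Smith's warning. The arithmetic always produces a number. Whether that number means anything depends on whether the underlying series really follows a steady trend.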

But, he says, the way the authors of the controversial paper were using the technique was another story:

The most extreme cases are where a country has no data for a given variable. The authors’ solution was to copy and paste data for another country. Iceland has no MKTcap data, so all 29 years of data for Japan were pasted into the Iceland cells. Similarly, the ENVpol (environmental policy stringency) data for Greece (with six years imputed) were pasted into Iceland’s cells and the ENVpol data for Netherlands (with 2013-2018 imputed) were pasted into New Zealand’s cells. The WASTE (municipal waste per capita) data for Belgium (with 1991-1994 and 2018 imputed) were pasted into Canada. The United Kingdom’s R&Dpers (R&D personnel) data were pasted into the United States (though the 10.417 entry for the United Kingdom in 1990 was inexplicably changed to 9.900 for the United States).

The copy-and-pasted countries were usually adjacent in the alphabetical list (Belgium and Canada, Greece and Iceland, Netherlands and New Zealand, United Kingdom and United States), but there is no reason an alphabetical sorting gives the most reasonable candidates for copying and pasting. Even more troubling is the pasting of Japan’s MKTcap data into Iceland and the simultaneous pasting of Greece’s ENVpol data into Iceland. Iceland and Japan are not adjacent alphabetically, suggesting this match was chosen to bolster the desired results.

Gary Smith, “How (not) to deal with missing data: An economist’s take on a controversial study,” Retraction Watch, February 21, 2024

He concludes, “There is no justification for a paper not stating that some data were imputed and describing how the imputation was done.”

Perhaps Elsevier, the journal’s publisher, agrees with his view. Retraction Watch announced today that the company would retract the paper:

As we reported earlier this month, Almas Heshmati of Jönköping University mended a dataset full of gaps by liberally applying Excel’s autofill function and copying data between countries – operations other experts described as “horrendous” and “beyond concern.” …

Elsevier, in whose Journal of Cleaner Production the study appeared, moved quickly on the new information. A spokesperson for the publisher told us yesterday: “We have investigated the paper and can confirm that it will be retracted.”

“Exclusive: Elsevier to retract paper by economist who failed to disclose data tinkering,” Retraction Watch, February 22, 2024

If Elsevier doesn’t end up retracting the paper, that will certainly say something about what counts as science today.

Note: As mentioned above, the paper’s first author, Almas Heshmati, was the one the student originally contacted. The second author, Mike Tsionas, died recently.

