Is Psychology Heading for Another Big Replication Crisis?
The use of Amazon’s MTurk in survey research risks a second scandal in which findings are low quality and can’t be replicated, critics warn

At Britain’s Daily Sceptic, we learn about a curious change in the way psychology studies are done. The older practice of roping in undergrad university students for psych testing has largely been replaced by the use of Amazon’s MTurk.
Briefly, MTurk recruits study participants who, like students, answer surveys for pay. The system caught on, probably because MTurk participants come from a broader cross-section of the population than students do. Over 40% of some journals’ articles now draw on MTurk data, according to Noah Carl at the Daily Sceptic. But it hasn’t gone well:
A growing body of evidence indicates that MTurk data is of very low quality, due to the high percentage of MTurk workers who are careless responders. By ‘high percentage’ I mean upwards of 75%. Careless responders may click options randomly, or they may engage in what’s called ‘straightlining’ where they click the first option that appears for each successive question. Both types of responding yield data that is worthless.
Noah Carl, “Are a Large Percentage of Recent Psychology Studies Flawed?,” Daily Sceptic, May 21, 2024
Carl points to a preprint of a study by Union College psychologist Cameron Kay. From the Abstract:
Do items that assess clearly contradictory content show positive correlations on the platform? We administered 27 semantic antonyms—pairs of items that assess incompatible beliefs or behaviours (e.g., “I am an extrovert” and “I am an introvert”)—to a sample of MTurk participants (N = 400). Over 96% of the semantic antonyms were positively correlated in the sample. For example, “I talk a lot” was positively correlated with “I rarely talk”; “I am narcissistic” was positively correlated with “I am a selfless person”; and “I like order” was positively correlated with “I crave chaos.” Moreover, 67% of the correlations remained positive even after we excluded nearly half of the sample for failing common attention check measures. These findings provide clear evidence that data collected on MTurk cannot be trusted, at least without a considerable amount of screening.
Cameron Kay, “Extraverted introverts, cautious risk-takers, and selfless narcissists: A demonstration of why you can’t trust data collected on MTurk,” PsyArXiv Preprints, April 28, 2024
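To see how carelessness alone could produce that pattern, here is a minimal simulation sketch. It is illustrative only: the sample split, response scale, and effect sizes are assumptions, not Kay’s data or code. Attentive respondents answer a pair of antonym items in opposite directions, but straightliners give the same answer to both items, and enough of them can drag the overall correlation positive.

```python
# Illustrative sketch only: NOT Kay's analysis or data. It shows how
# straightlining careless responders can flip the correlation between
# two antonym items (e.g., "I talk a lot" vs. "I rarely talk").
import numpy as np

rng = np.random.default_rng(0)
n_attentive, n_careless = 220, 180  # hypothetical mix with heavy carelessness

# Attentive responders: one latent trait drives opposite answers
# to the two antonym items, on a 1-5 agreement scale.
trait = rng.normal(0, 1, n_attentive)
item_a_att = np.clip(np.round(3 + 1.2 * trait + rng.normal(0, 0.5, n_attentive)), 1, 5)
item_b_att = np.clip(np.round(3 - 1.2 * trait + rng.normal(0, 0.5, n_attentive)), 1, 5)

# Careless "straightliners": the same answer to both items,
# whatever that answer happens to be.
constant = rng.integers(1, 6, n_careless).astype(float)
item_a_car, item_b_car = constant, constant.copy()

item_a = np.concatenate([item_a_att, item_a_car])
item_b = np.concatenate([item_b_att, item_b_car])

print("attentive only:", np.corrcoef(item_a_att, item_b_att)[0, 1])  # strongly negative
print("full sample:   ", np.corrcoef(item_a, item_b)[0, 1])          # pulled positive
```

On numbers like these, the attentive-only correlation between the antonyms is strongly negative, as it should be, while the pooled correlation comes out positive, which is the signature Kay reports. Attention checks help only insofar as they actually catch the straightliners before the correlations are computed.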
The undergrads recruited for studies in the past may not have been representative of the general public, but at least they were paying attention… A different online survey platform, CloudResearch Connect, gave more reliable results, Kay reports, so the problem may be specific to MTurk. But that only sharpens the question of why such an unreliable platform is so widely used. Carl comments,
There has been much talk of a ‘replication crisis’ in psychology and other disciplines, which has been attributed to a combination of publication bias and questionable research practices. We may soon hear about a ‘second replication crisis’, stemming from the overreliance on MTurk data.
Carl, “Recent Psychology Studies Flawed?”
If this might be the “second replication crisis,” what was the first one?
The great replication crisis of the 2010s
According to Psychology Today,
Some scientists have warned for years that certain ways of collecting, analyzing, and reporting data, often referred to as questionable research practices, make it more likely that results will appear to be statistically meaningful even though they are not. Flawed study designs and a “publication bias” that favors confirmatory results are other longtime sources of concern.
“Replication Crisis,” Psychology Today
But it took some serious scandals to get people’s attention, and that didn’t happen until 2011, courtesy of former Tilburg University social psychologist Diederik Stapel and Cornell University psychologist Daryl Bem:
According to the Association for Psychological Science, Diederik Stapel “fabricated data for over 50 peer-reviewed articles, many of which were published in leading journals, including Science… Given that Stapel’s deception went undetected for many years, one may expect a cunning scheme of data-fabrication. However, the book reveals that Stapel’s trickery was remarkably unsophisticated, even clumsy.”
That certainly raises some questions, doesn’t it? To its credit, the Association admits, “Stapel insists on the almost complete absence of scientific control structures. This made it just too hard for him to resist temptation… In his descriptions of methodological practice in psychology, Stapel appears to underscore the conclusions from the Levelt committee that investigated the fraud case: It was not just Stapel who failed, but the scientific community as a whole.”
Now about Daryl Bem: He published a paper in the prestigious Journal of Personality and Social Psychology in 2011, claiming that the ability to see the future (precognition) is real. Many academic psychologists were outraged. But that’s not the story. This is the story: In an effort to debunk Bem’s findings,
… a group called the Open Science Collaboration organized a massive replication study. And 270 scientists from 17 countries signed up. They picked 100 studies published in the year 2008 as their test sample — all from reputable, peer-reviewed psychology journals.
The plan was to repeat all 100 experiments exactly as described, and then see what happens. The findings came out in 2015. The results were stunning: only 36 percent of replications were successful.
Alexander B. Kim, “Psychologists confront impossible finding, triggering a revolution in the field,” CBC, November 1, 2019.
In other words, a single paper validating ESP — sound or unsound — was the least of the psychologists’ problems! Huge tracts of establishment-approved findings were now suspect.
We are told that the wake-up call has led journals to apply more scrutiny before results are published. Let’s hope so. We need science. “Trust the science” shouldn’t be allowed to degenerate into a bad joke.
You may also wish to read: Science writer: Maybe we need fewer scientists, science journals. Cameron English sees a rise in partisan advocacy as part of the problem of increasing retractions in science journals. English argues that science research, which is already mostly paid for by taxpayers, should be open access so more of us can see what’s happening.