Fake It ’til You Make It – The Power Pose Parable Part II
Where do p-hacking and the replication crisis leave the state of scientific studies?

Last time, we explored the findings of a 2010 psychological study, which concluded that assuming a “power pose” for two minutes increases testosterone (confidence) and decreases cortisol (stress). But it turned out that p-hacking affected the results of the initial study, and that subsequent studies debunked the “power pose” findings. Dana Carney, the lead author of the original paper, acknowledged the faults of the original study and updated her views “to reflect the evidence.”
Today, we explore the implications:
Carney’s willingness to acknowledge the p-hacks and to support efforts to redo the power-pose tests is convincing evidence that the p-hacks were well-intentioned. This was how a lot of research was done at the time. Joseph Simmons, Leif Nelson, and Uri Simonsohn, who have persuasively documented the dangers of p-hacking, have written that,
We knew many researchers—including ourselves—who readily admitted to dropping dependent variables, conditions, or participants so as to achieve significance. Everyone knew it was wrong, but they thought it was wrong the way it’s wrong to jaywalk. We [now know that it] was wrong the way it’s wrong to rob a bank.
A 2011 survey of 2,000 research psychologists found that 72 percent admitted to having used statistical significance to decide either to collect additional data or to stop collecting data; 38 percent to “deciding whether to exclude data after looking at the impact of doing so on the results”; and 46 percent to selectively reporting studies that “worked.”
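To see why the first of those practices is so corrosive, here is a minimal simulation sketch (my illustration, not part of the survey) of “optional stopping”: two groups are drawn from identical distributions, so any significant difference is a false positive, and the analyst either tests once at a fixed sample size or peeks early and collects more data only when the early result is not significant. The sample sizes, number of simulations, and the choice of a t-test are arbitrary assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative simulation of optional stopping ("collect more data if not yet
# significant"). Both groups come from the same distribution, so every
# significant result is a false positive. All parameters are arbitrary.
rng = np.random.default_rng(0)
n_sims = 10_000
fp_fixed = 0      # analyze once at n = 40 per group
fp_peeking = 0    # test at n = 20; add 20 more only if p >= 0.05

for _ in range(n_sims):
    a = rng.normal(size=40)
    b = rng.normal(size=40)

    # Fixed-sample analysis: one test at the planned sample size.
    _, p = stats.ttest_ind(a, b)
    fp_fixed += p < 0.05

    # Optional stopping: test early, continue only if not yet significant.
    _, p_early = stats.ttest_ind(a[:20], b[:20])
    if p_early < 0.05:
        fp_peeking += 1
    else:
        _, p_late = stats.ttest_ind(a, b)
        fp_peeking += p_late < 0.05

print(f"Fixed-sample false-positive rate:      {fp_fixed / n_sims:.3f}")
print(f"Optional-stopping false-positive rate: {fp_peeking / n_sims:.3f}")
```

In runs like this, the fixed-sample procedure stays near the nominal 5 percent false-positive rate, while the peek-and-continue procedure climbs noticeably above it; add more peeks, or layer on the other practices from the survey, and the inflation only gets worse.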
Brian Nosek’s Reproducibility Project attempted to replicate 100 studies that had been published in what are arguably the top three psychology journals. Only 36 continued to have p-values below 0.05 and to have effects in the same direction as in the original studies.
An interesting side study was conducted while Nosek’s Reproducibility Project was under way. Approximately two months before 44 of the replication studies were scheduled to be completed, researchers set up 44 prediction markets in which psychology researchers could bet on whether each replication would succeed. The markets were open for two weeks, and the people conducting the replication studies were not allowed to participate.
The most important takeaway is how skeptical psychology researchers are of research in their own field. The final market prices indicated that traders believed there was, on average, only a 55 percent chance of a successful replication. Of the 41 studies that were completed on time, 19 (46 percent) were given less than a 50 percent chance of replicating. Even that dismal expectation turned out to be too optimistic: 25 of the 41 (61 percent) did not replicate.
I happened to attend a conference at Google’s corporate headquarters in 2015 where the replication crisis in all fields was a hot topic. A prominent social psychologist told the audience that, “My default assumption is that anything published in my field is wrong.”
Criticism of Cuddy is unfair to the extent that she was doing what others did at the time — indeed what her professors taught her to do. On the other hand, she was quick to exaggerate fragile results based on a tiny sample and slow to acknowledge that larger studies did not support her conclusions. Audiences want simple, powerful messages, and that is what she gave them. She seemed to be following her own advice when she confidently promoted her power-pose story: fake-it-til-you-make-it, fake-it-til-you-become-it.
Some experienced researchers are understandably defensive and have reacted to failed replications by criticizing the replicators. Susan Fiske, past president of the Association for Psychological Science (APS) and Amy Cuddy’s mentor and co-author, has lamented what she calls “methodological terrorism.”
Others recognize the consequences of p-hacking and are working hard to restore the credibility of scientific research. Andrew Gelman has described the old way as the “find-statistical-significance-any-way-you-can-and-declare-victory paradigm,” and written that, “I can see that to people such as Fiske who’d adapted to the earlier lay of the land, these changes can feel catastrophic.” Michael Inzlicht, professor of psychology at the University of Toronto, spoke for many when he wrote that, “I want a better tomorrow, I want social psychology to change. But, the only way we can really change is if we reckon with our past, coming clean that we erred; and erred badly…. Our problems are not small and they will not be remedied by small fixes. Our problems are systemic and they are at the core of how we conduct our science.”
What’s left of power posing? Artificial poses don’t seem to affect people’s hormones or their behavior. Even the possibility of a modest effect on feelings is suspect because the subjects may realize what researchers expect to find and change their behavior to meet those expectations. Here, people who are instructed to assume very unusual and possibly awkward poses and then asked how powerful they feel may well know the desired answer. This explanation is supported by one study that found that people who had viewed Cuddy’s TED talk were more likely to report feeling powerful after assuming a high-power pose.
The real value of the power-posing parable is that it is a compelling example of how p-hacking has fueled the replication crisis that has shaken science. The original study was unreliable because the goal was to deliver a simple media-friendly message and the consequences of p-hacking were not yet widely appreciated. An enormous amount of valuable time and resources was then spent showing that the study was unreliable. What we have learned from this parable is that it is far better to do studies correctly the first time, and that is what serious researchers are striving to do now.
In case you missed it:
Fake It ’til You Make It — The Power Pose Parable. Why a study “proving” a unique way to boost confidence and reduce stress turned out to be wrong. A popular 2010 study on power poses has since been debunked. But why did the study go wrong in the first place?