UC Davis researchers, faculty speak to the reasons behind and the impact of the replication crisis
A 2015 exposé revealed that political scientist Michael LaCour had fabricated results in his groundbreaking paper about the effects a conversation with gay canvassers can have on voters’ preferences. After recently reading the article, I found my interest in the validity of research piqued. After just a few clicks, I discovered that the scientific community faces a more nuanced issue than outright fabrication: a replication crisis.
The replication crisis refers to researchers’ inability to reproduce the results of past studies when they rerun them. Repeated replication is vital to upholding the integrity of the scientific process and to assessing the validity and applicability of experiments. Major, well-known findings that have failed to replicate in recent years include the Stanford marshmallow test and power poses. In 2016, the weekly science journal Nature surveyed 1,500 scientists about replication: 90% said there was a crisis (52% called it a significant crisis and 38% a slight crisis).
Noah Yardeny, a third-year chemistry major, discussed his own experience with irreplicability while working in David Olson’s lab where he helps conduct research on psychoactive drugs, exploring their potential to increase neuroplasticity to treat anxiety and depression.
When attempting to yield a specific chemical by recreating previous experiments, Yardeny said he finds that, oftentimes, his results don’t align with the research. He described one instance where the research from an older, peer-reviewed journal didn’t even come close to producing the stated results.
“I followed it to a tee,” Yardeny said. “Did what they did. I had the reagents, my timing, my solvent, but it went to absolute sh-t. And I was like, I wonder what this is? I went to one of my graduate mentors, and they’re like, ‘Where did you find this?’ I said the source and they said there’s no way that’s reliable.”
It’s natural that some studies and their results are irreplicable, especially ones from older, less reputable journals like the one Yardeny referenced. The inability to replicate on larger scales, however, is a problem that even the most revered journals, like Nature and Science, have.
In a study published by Nature, researchers attempted to replicate 21 social science studies from Nature and Science. Of the 21 experiments, only 13 were replicable, and many of those 13 yielded effects considerably weaker than the original papers reported. A failure to replicate is by no means a cut-and-dried condemnation of a study, as irreplicability can arise for a variety of reasons: for example, the replicating team may have botched its own attempt or obtained its own false result.
The impact of nonreplication is not evenly distributed across scientific disciplines. Experiments that deal with human subjects, especially in social psychology, have a particularly hard time holding up to scrutiny. That said, replication issues are not confined to psychology; they have reared their heads in other areas, such as economics and pharmaceutical studies.
Kate Laskowski, an assistant professor in the Department of Evolution and Ecology, remarked that a couple of breakthrough studies on the effect of water acidification on fish behavior within her own field drew attention for replication issues.
“A handful of papers came out a few years ago that ocean acidification is having really horrible effects on fish, changing their behavior in really detrimental ways, and it got a lot of attention,” Laskowski said. “Recently, a group of researchers tried to replicate these results with an actual, true replication. They redid the experiments, they reran analyses and the results didn’t replicate. They found no effects of ocean acidification on fish behavior. So the moral of the story was that one or two fancy results do not necessarily mean that this is the be-all to end-all truth.”
Within the academic community, there is a formidable bias toward groundbreaking work: the more groundbreaking the research, the more funding and prestige it attracts. This skews publications toward positive results and leads to the publication of a higher number of false positives. Emily Merchant, an assistant professor and historian of science and technology at UC Davis, discussed her understanding of the problem.
“I think it is more of a structural issue in science than an issue of people doing sketchy things with their data,” Merchant said. “There are instances of outright fraud, but I don’t think that’s really what’s behind this replication crisis. I think it’s been that there hasn’t been any incentive to try to replicate. And there has been a bias toward positive results without any attempt to figure out whether it’s a true positive.”
Merchant explained how nonreplication can be a byproduct of the current structure of academic publishing.
“In the social sciences, in particular psychology, it’s really hard to publish negative results,” Merchant said. “You can only publish a new finding where you’re statistically able to reject the null hypothesis with an alpha value of .05. That’s a confidence of 95%. With that standard, you get false positives 5% of the time, but if you can only publish the positives, then you have a higher concentration of false positives in what gets published.”
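The arithmetic behind Merchant’s point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 10% share of tested hypotheses that are actually true and the 80% statistical power are hypothetical numbers chosen for the example, not figures from Merchant or from any study cited here. Only the 5% false positive rate comes from her description.

```python
def false_positive_share(alpha: float, power: float, prior_true: float) -> float:
    """Share of positive (publishable) results that are false positives.

    alpha      -- chance a test of a false hypothesis comes back positive
    power      -- chance a test of a true hypothesis comes back positive (assumed)
    prior_true -- fraction of tested hypotheses that are really true (assumed)
    """
    false_positives = (1 - prior_true) * alpha
    true_positives = prior_true * power
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Hypothetical field: 1 in 10 tested ideas is true, studies have 80% power.
    share = false_positive_share(alpha=0.05, power=0.8, prior_true=0.1)
    print(f"False positives among published positives: {share:.0%}")
```

Under these assumed numbers, 36% of the positive results would be false, even though each individual test is wrong only 5% of the time when the hypothesis is false. A different assumed prior or power changes the figure, but the concentration effect remains whenever only positives are published.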
For people without a background in statistics, Merchant rephrased the issue in layman’s terms: “A really useful thing that I heard someone say is that if all you find is groundbreaking research, you just end up with a bunch of holes in the ground.”
This effect is further augmented by the “publish or perish” environment where researchers fight to establish credibility and advance their careers by breaking into major publications.
“Within academic research, we just don’t have enough jobs for all the people that are trained,” Laskowski said. “It’s highly, highly competitive.”
Laskowski stressed that the pressure placed on researchers varies between universities, with some being more high-pressure and others more accommodating.
“I think it’s important to note that academia is extremely heterogeneous,” Laskowski said. “There are differences from university to university and how strong the level of competition is and certainly, there are many universities where it’s cutthroat and you are expected to have really high power publications coming out every year or every few years. Other universities are, I think, a bit more humane. They value quality science and people who are making good, solid progress, even if maybe that doesn’t result in a Nature paper every other year.”
The pressure to publish positive results can leak into experiment methodology when researchers p-hack, sometimes without realizing it. P-hacking is the practice of changing the original hypothesis during the experiment and retesting the data to render a desired, but often flawed, positive.
“If you run the data in 20 slightly different ways, you’re going to get a positive result at least one time, even if it’s not a true positive,” Merchant said.
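As a rough check on the scale of the effect Merchant describes, the sketch below computes the chance of at least one false positive across 20 analyses at the .05 threshold. It assumes the 20 analyses are statistically independent and that there is no real effect; reanalyses of the same data set are usually correlated, so the true figure in practice may be lower. The function names are illustrative only.

```python
import random


def chance_of_any_false_positive(alpha: float, n_tests: int) -> float:
    """Probability that at least one of n independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** n_tests


def simulate(alpha: float, n_tests: int, trials: int, seed: int = 0) -> float:
    """Estimate the same probability by simulation.

    When there is no real effect, a p-value is uniformly distributed
    between 0 and 1, so each analysis is modeled as one uniform draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    exact = chance_of_any_false_positive(alpha=0.05, n_tests=20)
    estimate = simulate(alpha=0.05, n_tests=20, trials=20_000)
    print(f"Exact: {exact:.2f}, simulated: {estimate:.2f}")
```

Under the independence assumption the exact value is about 0.64, so a spurious positive is more likely than not after 20 tries, although it is not strictly guaranteed.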
Merchant takes issue with the phrasing “crisis.” She noted that the issues of replication are not occurring due to an increase in faulty methodological practices, but because only recently has there been a significant effort to replicate results.
“The way that some people see this is kind of the self-correcting nature of science,” Merchant said. “The issue is that we don’t know if these findings are robust until we try to replicate them.”
So how does the scientific community lead the charge in tackling replication? Although some solutions require long-term institutional transformations, there are a couple of helpful steps to be taken in the meantime. Merchant suggests establishing procedures for hypothesis registration to curtail the effects of p-hacking.
“The idea of registering a hypothesis is that you can only register one hypothesis and then you can only test that hypothesis,” Merchant said. “So you’re not doing 20 different experiments, you’re just doing one.”
Laskowski mentioned the need for more rigorous training in statistics for researchers.
“If you have a firm understanding of statistics, you’re at least more aware of these problems so you can’t make them ignorantly,” Laskowski said. “I think maybe a more firm grounding in statistical training certainly helps. It lets people understand the power of what they can do with the data that they collect. It also helps them collect better data and actually test the hypotheses they want.”
Laskowski also advocated for more transparency within the publication process by increasing accessibility to the procedures and data used to carry out studies.
“The other thing I think is really important is moving toward more open and transparent data practices, in the sense that any data that you collect and [publish] should be uploaded with that paper,” Laskowski said. “My data and the code that I use to analyze my data should be available for anyone to look at assuming there’s no problems with personal privacy.”
Replication is not a new issue in science; however, the widespread discovery of nonreplicating experiments is. The magnitude of this issue cannot be overstated, especially when politicians and special interests use irreplicability as a crutch to undermine science that they don’t agree with, most often setting their sights on fields such as climate science.
Addressing the issue demands careful reflection about the pressure academic institutions and academic publications place on positive results.
“You’re much less likely to get published if you show, ‘I did this, this and this and this and it doesn’t work,’” Yardeny said. “You did the science, you had a hypothesis, you ran a good experiment, you just didn’t get the result. There is still this power in saying something didn’t work.”
Written by: Andrew Williams — email@example.com