The Reproducibility Crisis: Back to Basics

February 2, 2020

The ability to reproduce experimental results reported in academic and scientific publications sits at the heart of the scientific method. Put simply, if experimental results cannot be reproduced by third parties, for any reason, then the scientific merit of these results is fundamentally questionable. In the past two decades, there has been a growing awareness that many reported academic and scientific studies are difficult or even impossible to reproduce. This calls into question the veracity and validity of these studies and inevitably leads to an erosion of public trust in academia and science. The phrase "Reproducibility Crisis" (or "Replicability Crisis"), coined in the past decade, was meant to raise awareness of the gravity of the situation so that proper community-wide interventions can take shape to counter this crisis.

As an educator and researcher in computer science and electronics engineering, I have seen first-hand how the reproducibility crisis has grown over the past two decades. In this blog post, I will draw upon my own experiences and observations to try to explain the reproducibility crisis and its causes. I will then present examples of this crisis in different disciplines, along with its consequences. After that, I will make suggestions to mitigate the impact of this crisis before drawing some conclusions.

The Reproducibility Crisis: What? and Why?

In a revealing survey of more than 1,500 scientists conducted by Nature in 2016 [1], 70% of researchers surveyed said they had tried and failed to reproduce other groups' experimental results. Asked about reproducing their own experimental results, more than half said they could not! It is this failure to reproduce so many research results published in the literature that has given rise to what is commonly known these days as the "reproducibility crisis". This is not just an academic concern, however, for much of our system of invention and enterprise relies fundamentally on reproducible research results. Failure to address this problem would call into question whole systems of academic and scientific research, innovation and enterprise, and ultimately the wider economy.

The reasons behind the failure to reproduce so many published results are diverse, and include:

Failure to disclose the full details of experimental design in publications. This is often because experimental design was not fully and formally captured beforehand. The lack of a universal requirement to report on experimental design in publication submissions is a compounding factor
Lack of proper training in research methods, which has accompanied a boom in the number of trainee researchers in the last few decades. In particular, a lack of proper training in statistics leads to basic mistakes in hypothesis testing and statistical inference
Selective reporting of data and results that suit a particular hypothesis while ignoring other data that do not support the hypothesis
Hyper-competition in research [2], driven in part by the obsession with metrics e.g. Journal Impact Factor, has led to a "Publish or Perish" culture in academia, with perverse incentives to see data and experimental design as a competitive advantage not to be shared with other researchers
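The effect of selective reporting on statistical inference can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies: it assumes a hypothetical researcher who measures several independent outcomes in a study where no real effect exists, and reports the study as a positive finding if any single outcome reaches p < 0.05. The function names, sample sizes and the use of a z-test with known variance are all illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def two_sample_p_value(a, b, sigma=1.0):
    """Two-sided z-test p-value for equal means, assuming a known sigma."""
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=20,
                        alpha=0.05, seed=0):
    """Fraction of no-effect studies with at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: there is no effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sample_p_value(a, b))
        # Selective reporting: only the best-looking outcome is considered.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


def expected_rate(n_outcomes, alpha=0.05):
    """Analytic chance of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_outcomes


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, round(false_positive_rate(k), 3), round(expected_rate(k), 3))
```

Under these assumptions, the chance of at least one spurious "significant" result is 1 - 0.95^k for k independent outcomes, which is about 40% for k = 10 even though each individual test keeps its nominal 5% error rate. Reporting every outcome that was measured, not just the one that worked, is what lets a reader correct for this.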

The Reproducibility Crisis Across Disciplines

The reproducibility crisis touches a wide range of disciplines in engineering & physical sciences, biological & medical sciences, and social sciences.

Biological & medical sciences are perhaps the disciplines where the impact of the reproducibility crisis is most acutely felt, as it can have a direct bearing on the health of ordinary citizens. For instance, uptake of the MMR vaccine in the UK was severely affected after the publication of a controversial study in 1998, which suggested a link between the MMR vaccine and autism, even though the paper reported a small case series with no controls [3]. Subsequent studies refuted this finding, but the public scare the original publication created led to noticeable measles outbreaks.

In social sciences, economics is often cited as the prime example. There, debate continues to rage around the vast range of economic analyses and forecasts, even though these often draw on a common pool of data. For instance, a 2016 study in the journal Science found that more than a third of 18 experimental studies from two top-tier economics journals could not be replicated [4]. When we factor in the importance of economic forecasts in political life and decision making, one can easily see the damage that the reproducibility crisis is doing.

Closer to my own area of research interest, engineering & physical sciences are not immune to the reproducibility crisis. Indeed, selective reporting is still commonplace, especially in engineering, where there is no strong culture of experimental design reporting as there is in biological sciences. For instance, it is still not uncommon for benchmarking studies in computer engineering to report on performance, e.g. speed, with no reference to trade-offs such as circuit area/code size or power/energy consumption. It is also not uncommon for publications to report on synthetic benchmarks which "artificially" show a particular hardware or software solution in a good light at the expense of competitors. Worse, practical concerns such as cost, sourcing, robustness, extensibility and maintainability are still routinely omitted.
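One modest way to discourage speed-only reporting is to make the trade-offs mandatory fields of the benchmark record itself. The following is a minimal sketch under stated assumptions: BenchmarkReport and run_benchmark are hypothetical names, and the code size and energy figures are assumed to be measured separately (for example from the build output and a power meter) and passed in by the caller.

```python
import platform
import statistics
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkReport:
    """A result record that cannot be built without its trade-off fields."""
    workload: str
    runs: int
    median_seconds: float
    stdev_seconds: float
    code_size_bytes: int          # measured externally, supplied by the caller
    energy_joules_per_run: float  # measured externally, supplied by the caller
    python_version: str
    machine: str


def run_benchmark(workload, func, code_size_bytes, energy_joules_per_run,
                  runs=5):
    """Time func over several runs and bundle timings with the trade-offs."""
    if runs < 2:
        raise ValueError("at least two runs are needed to report a spread")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return BenchmarkReport(
        workload=workload,
        runs=runs,
        median_seconds=statistics.median(timings),
        stdev_seconds=statistics.stdev(timings),
        code_size_bytes=code_size_bytes,
        energy_joules_per_run=energy_joules_per_run,
        python_version=platform.python_version(),
        machine=platform.machine(),
    )


if __name__ == "__main__":
    # The size and energy values here are placeholders, not measurements.
    report = run_benchmark("sum_to_100k", lambda: sum(range(100_000)),
                           code_size_bytes=2048, energy_joules_per_run=0.01)
    print(asdict(report))
```

Recording the run count, the spread of the timings and the execution environment alongside the headline figure gives a third party what they need to judge whether their own reproduction attempt agrees with the published one.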

Back To Basics

The reproducibility crisis could and should be addressed by going back to the basics of the scientific method. Reproducibility must be a cardinal precondition for scientific publishing. Below are practical ways to achieve that:

Making the sharing of experimental design, including raw data when appropriate, part of the publication submission process. In some disciplines, the submission of such information might be required before the experiment is conducted, and a decision to accept or reject the paper would be made solely on the basis of the experimental design submission i.e. prior to the experiment being conducted. This would reduce the likelihood of bias towards publishing positive results
Formal and rigorous training of researchers (e.g. doctoral students) in research methodologies and statistics. Making this a mandatory requirement will go a long way towards reducing some of the basic mistakes seen in many publications, e.g. conflating correlation with causation, confusing sensitivity with specificity, and misinterpreting p-values
Encouraging replication and triangulation attempts in undergraduate and postgraduate teaching, e.g. as part of a dedicated course on research methodologies, study projects or final year Bachelor theses. This would serve a dual purpose: training students in research and research methodologies on the one hand, and providing a useful critical review of research results published in the literature on the other, which should ultimately lead to better outcomes
Funding for replication studies, especially in areas where the published science has a direct bearing on people's lives and the potential for reproducibility problems is high. Note that the propensity to seek groundbreaking results introduces a bias against this type of research from both researchers themselves and funding bodies
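The confusion between sensitivity and specificity mentioned in the list above is a good candidate for a worked example in such training. The sketch below applies Bayes' theorem to a hypothetical diagnostic test; the function name and the numbers are illustrative assumptions, not figures from any cited study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true positive.

    sensitivity: P(test positive | condition present)
    specificity: P(test negative | condition absent)
    prevalence:  P(condition present) in the tested population
    """
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical test: 99% sensitive, 95% specific, 1% prevalence.
    ppv = positive_predictive_value(0.99, 0.95, 0.01)
    print(f"Chance that a positive result is real: {ppv:.1%}")
```

With these assumed numbers, only about one positive result in six (roughly 17%) is a true positive, despite the 99% sensitivity, because false positives from the much larger unaffected population dominate. The same reasoning explains why a small p-value is not the probability that a hypothesis is true: how plausible the hypothesis was beforehand matters too.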

None of the above is particularly challenging to implement; the biggest hurdle is ultimately cultural. The academic and scientific community has to develop a collective consciousness around the scientific method and the assessment of research. In particular, we need to align the interests of individual researchers with the collective interest of society. For instance, as long as we continue to reward flawed metrics such as number of publications or even journal impact factors, we will perpetuate the perverse incentives of the "Publish or Perish" culture.

Conclusions

The reproducibility crisis is a major threat to public trust in science and research. A combination of a fall in standards of practice exacerbated by a "Publish or Perish" culture, and a lack of adequate frameworks for the assessment of research quality means that this problem is set to persist if no decisive actions are taken. Essentially, this is a classical collective action problem and the future will ultimately be shaped by any of the following developments (or combinations thereof):

Regulation: from governments and funding agencies e.g. to require publishers and authors of scientific publications to share experimental design and/or data required for reproducibility studies
Privatization: whereby research is privately owned and hence access to research output is regulated by private owners in order to maximize utility. It might seem far-fetched but universities and publicly funded research institutions were not always the dominant players in research. Indeed, research labs within private technology companies, conducting much of their research in-house, used to be a dominant model. The inefficiencies brought about by the reproducibility crisis and their underlying causes might well push companies towards that model again
Self-organization: a community-driven approach to address the root causes of the reproducibility crisis. The Declaration on Research Assessment (DORA) [5] and subsequent initiatives [6][7] are seminal actions in this direction

Clearly, self-organization is the most desirable outcome. However, that will only become a reality if the underlying principles and actions are collectively adopted by a large number of researchers and scientists. I posit that a larger weight of responsibility lies on the shoulders of established academics and researchers. If that does not materialize, however, then regulation and/or privatization might well be the only ways forward.

You can keep up with the latest from Arm Education and contribute to the wider debate in the comments sections below.

Arm Education

References

[1] "Is There a Reproducibility Crisis in Science?", Nature Video, May 28, 2016, https://www.scientificamerican.com/video/is-there-a-reproducibility-crisis-in-science/

[2] Open Research and Publishing: Reflections, Arm Education Media. Jan. 2020.

[3] https://www.bmj.com/content/342/bmj.c7452

[4] "Evaluating replicability of laboratory experiments in economics", C. F. Camerer et al., Science, 25 Mar 2016, Vol. 351, Issue 6280, pp. 1433-1436

[5] https://sfdora.org/read/

[6] http://www.leidenmanifesto.org/

[7] https://www.coalition-s.org
