The Reproducibility Crisis: Back to Basics

Khaled Benkrid
February 2, 2020

The ability to reproduce experimental results reported in academic and scientific publications sits at the heart of the scientific method. Put simply, if experimental results cannot be reproduced by third parties, for any reason, then the scientific merit of those results is fundamentally questionable. Over the past two decades, there has been growing awareness that many reported academic and scientific studies are difficult or even impossible to reproduce. This calls the veracity and validity of these studies into question and inevitably erodes public trust in academia and science. The phrase "Reproducibility Crisis" (or "Replicability Crisis") was coined in the past decade to raise awareness of the gravity of the situation, so that proper community-wide interventions could take shape to counter it.

As an educator and researcher in computer science and electronics engineering, I have seen first-hand how the reproducibility crisis has grown over the past two decades. In this blog, I will draw upon my own experiences and observations to explain the reproducibility crisis and its causes. I will then present examples of this crisis in different disciplines and its consequences, before making suggestions to mitigate its impact and drawing some conclusions.

The Reproducibility Crisis: What and Why?

In a revealing survey of more than 1,500 scientists conducted by Nature in 2016 [1], 70% of the researchers surveyed said they had tried and failed to reproduce other groups' experimental results. Asked about reproducing their own experimental results, more than half said they could not! It is this failure to reproduce so many published research results that has given rise to what is now commonly known as the "reproducibility crisis". This is not just an academic concern, however: much of our system of invention and enterprise relies fundamentally on reproducible research results. Failure to address the problem would call into question whole systems of academic and scientific research, innovation, and enterprise, and ultimately the wider economy.

The reasons why so many published results fail to reproduce are diverse, and include:

  • Failure to disclose the full details of experimental design in publications, often because the design was not fully and formally captured beforehand. The lack of a universal requirement to report experimental design in publication submissions is a compounding factor
  • Lack of proper training in research methods, which has accompanied a boom in the number of trainee researchers in recent decades. In particular, inadequate training in statistics leads to basic mistakes in hypothesis testing and statistical inference
  • Selective reporting of data and results that suit a particular hypothesis, while ignoring data that do not support it (the sketch after this list shows why this practice guarantees spurious findings)
  • Hyper-competition in research [2], driven in part by an obsession with metrics such as the Journal Impact Factor, which has led to a "Publish or Perish" culture in academia, with perverse incentives to treat data and experimental design as a competitive advantage not to be shared with other researchers
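To make the points about statistics and selective reporting concrete, consider a minimal simulation of my own (the data and the simple z-test helper are made up for illustration, and are not drawn from any study cited here). It runs many experiments in which there is no real effect; roughly 5% of them still cross the conventional p < 0.05 threshold, so reporting only the "significant" runs manufactures findings out of pure noise:

```python
# An illustrative simulation (made-up data): run many experiments where
# there is NO real effect, and count how many still look "significant".
import random
import statistics
from math import erf, sqrt

def two_sample_p(a, b):
    """Approximate two-sided p-value for a difference in means (z-test)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

random.seed(0)
n_experiments, false_positives = 1000, 0
for _ in range(n_experiments):
    # Both groups are drawn from the SAME distribution: any "effect" is noise.
    group_a = [random.gauss(0, 1) for _ in range(30)]
    group_b = [random.gauss(0, 1) for _ in range(30)]
    if two_sample_p(group_a, group_b) < 0.05:
        false_positives += 1

# Roughly 5% of null experiments cross the threshold; selectively
# publishing only those runs turns pure noise into "findings".
print(f"{false_positives}/{n_experiments} null experiments were 'significant'")
```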

The Reproducibility Crisis Across Disciplines

The reproducibility crisis touches a wide range of disciplines in engineering & physical sciences, biological & medical sciences, and social sciences. 

Biological & medical sciences are perhaps the disciplines where the impact of the reproducibility crisis is most keenly felt, as it can have a direct bearing on the health of ordinary citizens. For instance, uptake of the MMR vaccine in the UK was severely affected by the publication of a controversial study in 1998 suggesting a link between the MMR vaccine and autism, even though the paper reported a small case series with no controls [3]. Subsequent studies refuted this finding, but the public scare the original publication created led to noticeable measles outbreaks.

In the social sciences, economics is often cited as the prime example: debate continues to rage around a vast range of economic analyses and forecasts despite an often-common pool of underlying data. For instance, a 2016 study in the journal Science found that a third of 18 experimental studies from two top-tier economics journals could not be reproduced [4]. Factor in the importance of economic forecasts in political life and decision making, and one can easily see the damage the reproducibility crisis is doing.

Closer to my own area of research, engineering & physical sciences are not immune to the reproducibility crisis either. Selective reporting is still commonplace, especially in engineering, where there is no strong culture of experimental design reporting as there is in the biological sciences. For instance, it is still not uncommon for benchmarking studies in computer engineering to report on performance (e.g. speed) with no reference to trade-offs such as circuit area, code size, or power/energy consumption. Nor is it uncommon for publications to report on synthetic benchmarks that "artificially" show a particular hardware or software solution in a good light at the expense of competitors. Worse, practical concerns such as cost, sourcing, robustness, extensibility, and maintainability are still routinely omitted.
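To illustrate, a benchmark report could pair speed with at least one trade-off metric. The harness below is a hypothetical sketch of my own (the workloads and the helper function are illustrative, not a standard tool); it reports wall-clock time alongside peak memory, so that one cannot easily be quoted without the other:

```python
# A hypothetical micro-benchmark harness (illustrative, not a standard
# tool): report wall-clock time alongside peak memory, so that speed is
# never quoted without at least one of its trade-offs.
import timeit
import tracemalloc

def benchmark(label, func):
    """Measure mean wall-clock time and peak memory for one workload."""
    elapsed = timeit.timeit(func, number=10) / 10   # mean seconds per run
    tracemalloc.start()
    func()                                          # one traced run
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{label:>10}: {elapsed * 1e3:8.2f} ms, peak {peak / 1024:9.1f} KiB")

# Two implementations of the same task with opposite trade-offs:
# the list allocates everything up front (fast, memory-hungry),
# the generator streams its values (near-constant memory).
benchmark("list", lambda: sum([i * i for i in range(100_000)]))
benchmark("generator", lambda: sum(i * i for i in range(100_000)))
```

Here the list-based version typically wins on speed at a markedly higher peak memory, exactly the kind of trade-off a speed-only report would hide.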

Back to Basics

The reproducibility crisis could and should be addressed by going back to the basics of the scientific method. Reproducibility must be a cardinal precondition for scientific publishing. Below are practical ways to achieve that:

  • Making the sharing of experimental design, including raw data where appropriate, part of the publication submission process. In some disciplines, this information might be required before the experiment is conducted, with the decision to accept or reject the paper made solely on the basis of the experimental design. This would reduce the likelihood of bias towards publishing positive results
  • Formal and rigorous training of researchers (e.g. doctoral students) in research methodologies and statistics. Making this mandatory would go a long way toward reducing the basic mistakes seen in many publications, such as conflating correlation with causation, confusing sensitivity with specificity, and misinterpreting p-values (see the worked example after this list)
  • Encouraging replication and triangulation attempts in undergraduate and postgraduate teaching, e.g. as part of a dedicated course on research methodologies, study projects, or final-year Bachelor's theses. This serves a dual purpose: training students in research and research methodologies, and providing a useful critical review of results published in the literature, which should ultimately lead to better outcomes
  • Funding for replication studies, especially in areas where the published science has a direct bearing on people's lives and the potential for reproducibility problems is high. Note that the propensity to seek groundbreaking results biases both researchers and funding bodies against this type of research
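To illustrate one such basic mistake, here is a short worked example with hypothetical numbers: a test with excellent sensitivity and specificity still yields mostly false positives when the condition it detects is rare, so neither quantity should be conflated with the probability that a positive result is real:

```python
# A worked example (hypothetical numbers) of why sensitivity and
# specificity must not be confused with the probability that a
# positive result is real (the positive predictive value).
sensitivity = 0.99   # P(test positive | condition present)
specificity = 0.95   # P(test negative | condition absent)
prevalence = 0.01    # P(condition present) in the tested population

true_positives = sensitivity * prevalence
false_positives = (1 - specificity) * (1 - prevalence)

# Bayes' rule: P(condition | positive test)
ppv = true_positives / (true_positives + false_positives)
print(f"P(condition | positive test) = {ppv:.1%}")  # ~16.7%
```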

None of the above is particularly challenging to implement; the biggest hurdle is ultimately cultural. The academic and scientific community has to develop a collective consciousness around the scientific method and the assessment of research. In particular, we need to align the interests of individual researchers with the collective interest of society. As long as we continue to reward flawed metrics such as publication counts or journal impact factors, we will perpetuate the perverse incentives of the "Publish or Perish" culture.

Conclusions

The reproducibility crisis is a major threat to public trust in science and research. A fall in standards of practice, exacerbated by a "Publish or Perish" culture and a lack of adequate frameworks for assessing research quality, means that the problem is set to persist if no decisive action is taken. Essentially, this is a classic collective-action problem, and the future will be shaped by any of the following developments (or a combination thereof):

  • Regulation: governments and funding agencies could, for example, require publishers and authors of scientific publications to share the experimental design and/or data needed for reproducibility studies
  • Privatization: whereby research is privately owned, and access to research output is regulated by private owners to maximize utility. This might seem far-fetched, but universities and publicly funded research institutions were not always the dominant players in research; research labs within private technology companies, conducting much of their research in-house, were once a dominant model. The inefficiencies brought about by the reproducibility crisis, and their underlying causes, might well push companies towards that model again
  • Self-organization: a community-driven approach to addressing the root causes of the reproducibility crisis. The San Francisco Declaration on Research Assessment [5] and subsequent initiatives [6][7] are seminal actions in this direction

Clearly, self-organization is the most desirable outcome. However, it will only become a reality if the underlying principles and actions are collectively adopted by a large number of researchers and scientists; I posit that the larger weight of responsibility lies on the shoulders of established academics and researchers. If that does not materialize, then regulation and/or privatization might well be the only ways forward.

You can keep up with the latest from Arm Education and contribute to the wider debate in the comments section below.

Arm Education

References

[1] "Is There a Reproducibility Crisis in Science?", Nature Video, May 28, 2016, https://www.scientificamerican.com/video/is-there-a-reproducibility-crisis-in-science/

[2] Open Research and Publishing: Reflections, Arm Education Media. Jan. 2020.

[3] https://www.bmj.com/content/342/bmj.c7452

[4] "Evaluating replicability of laboratory experiments in economics", C. F. Camerer et al., Science, 25 Mar 2016, Vol. 351, Issue 6280, pp. 1433-1436

[5] https://sfdora.org/read/

[6] http://www.leidenmanifesto.org/

[7] https://www.coalition-s.org
