January 24 – 29 , 2021, Dagstuhl Seminar 21041

CANCELLED Toward Scientific Evidence Standards in Empirical Computer Science

Due to the Covid-19 pandemic, this seminar was cancelled. A related Dagstuhl Seminar was scheduled to October 30 – November 4 , 2022 – Seminar 22442.


Brett A. Becker (University College Dublin, IE)
Christopher D. Hundhausen (Washington State University – Pullman, US)
Ciera Jaspan (Google Inc. – Mountain View, US)
Andreas Stefik (University of Nevada – Las Vegas, US)
Thomas Zimmermann (Microsoft Corporation – Redmond, US)

For support, please contact

Dagstuhl Service Team


Scientists in a variety of fields are increasingly concerned about the quality of gathered evidence in the sciences. This concern stems from many things, including a lack of procedures to detect fraud, the challenges in replication, our lack of use of pre-registration, and statistical problems like p-hacking. For example, pre-registration requires researchers to register the methodologies of their studies before running an experiment and can be closely tied to the publication process. Besides the aforementioned p-hacking, this also helps prevent the file drawer problem: publishing only the results that confirm the authors' biases. No journal in computer science of which we are aware has checks and balances such as these in place.

Issues of evidence quality have also motivated political change and are beginning to limit the ability of computer scientists to win government grants. For example, the Every Student Succeeds Act in the United States places empirical studies into "Tiers" of evidence, which automatically discount many papers in our field because of our lack of evidence standards. Further, while not all scholars hold the same view, a recent Naturesurvey found that more than 70% of researchers have tried and failed to reproduce another scientist's experiments, more than 50% have failed to reproduce their own experiments, and 90% believe there is a replication crisis. Researchers in many fields are thus concerned about standards of evidence, replication, and other issues that are meaningful to the credibility of the science.

The discipline of computer science is not immune to any of these challenges and faces its own unique difficulties in addressing them. Multiple years-long investigations into software engineering and programming languages have furnished compelling evidence that authors in these fields fail to present rigorous evidence in their publications and lack basic checks and balances like "gathering data,” "having a control group,” or testing people "other than the authors of the publication.” Researchers have raised concerns about the quality of research in computer science education and specifically the lack of replication. Further, no journal or conference in the field has formalized a standard of evidence for its publications, which makes comparisons across studies difficult.

We invite you to a Dagstuhl Seminar that has three primary objectives: 1) to establish a process for creating a computer science-specific evidence standard for empirical research, 2) to build a community of scholars in software engineering, human factors, and computer science education to discuss what a general standard should include, and 3) to collaborate between these sub-fields in analyzing and drafting evidence standards that can be adopted by journals in our respective sub-fields. Our overall goal is to define a standard that is general and flexible, focusing on what should be reported in empirical studies, but not prescribing to scholars the content of what they decide to investigate. The bulk of the activities will be focused on discussion time to determine what could work across and within our respective sub-fields of computer science.

Motivation text license
  Creative Commons BY 3.0 DE
  Brett A. Becker, Christopher D. Hundhausen, Ciera Jaspan, Andreas Stefik, and Thomas Zimmermann

Related Dagstuhl Seminar


  • Human-Computer Interaction
  • Other Computer Science
  • Software Engineering


  • Community Evidence Standards
  • Human Factors


In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.


Download overview leaflet (PDF).

Dagstuhl's Impact

Please inform us when a publication was published as a result from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.


Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.