Toward Scientific Evidence Standards in Empirical Computer Science (Cancelled)
(24 Jan – 29 Jan 2021)
- Brett A. Becker (University College Dublin, IE)
- Christopher D. Hundhausen (Washington State University - Pullman, US)
- Ciera Jaspan (Google Inc. - Mountain View, US)
- Andreas Stefik (University of Nevada - Las Vegas, US)
- Thomas Zimmermann (Microsoft Corporation - Redmond, US)
- Shida Kunz (for scientific questions)
- Annette Beyer (for administrative questions)
Scientists in a variety of fields are increasingly concerned about the quality of the evidence gathered in the sciences. This concern stems from many sources, including a lack of procedures to detect fraud, challenges in replication, the limited use of pre-registration, and statistical problems such as p-hacking. Pre-registration, for example, requires researchers to register the methodology of a study before running the experiment, and it can be closely tied to the publication process. Besides guarding against p-hacking, this also helps prevent the file drawer problem: publishing only the results that confirm the authors' biases. We are not aware of any computer science journal that has checks and balances such as these in place.
Issues of evidence quality have also motivated political change and are beginning to limit the ability of computer scientists to win government grants. For example, the Every Student Succeeds Act in the United States sorts empirical studies into "tiers" of evidence, which automatically discount many papers in our field because of our lack of evidence standards. Further, while not all scholars hold the same view, a recent Nature survey found that more than 70% of researchers have tried and failed to reproduce another scientist's experiments, more than 50% have failed to reproduce their own experiments, and 90% believe there is a replication crisis. Researchers in many fields are therefore concerned about standards of evidence, replication, and other issues that bear on the credibility of their science.
The discipline of computer science is not immune to any of these challenges and faces its own unique difficulties in addressing them. Multiple years-long investigations into software engineering and programming languages have furnished compelling evidence that authors in these fields fail to present rigorous evidence in their publications and lack basic checks and balances such as "gathering data," "having a control group," or testing people "other than the authors of the publication." Researchers have also raised concerns about the quality of research in computer science education, and specifically about the lack of replication. Further, no journal or conference in the field has formalized a standard of evidence for its publications, which makes comparisons across studies difficult.
We invite you to a Dagstuhl Seminar with three primary objectives: 1) to establish a process for creating a computer science-specific evidence standard for empirical research, 2) to build a community of scholars in software engineering, human factors, and computer science education to discuss what a general standard should include, and 3) to foster collaboration among these sub-fields in analyzing and drafting evidence standards that journals in our respective sub-fields can adopt. Our overall goal is to define a standard that is general and flexible, focusing on what should be reported in empirical studies without prescribing what scholars choose to investigate. Most of the seminar's activities will be devoted to discussion aimed at determining what could work across and within our respective sub-fields of computer science.
- Human-Computer Interaction
- Other Computer Science
- Software Engineering
- Community Evidence Standards
- Human Factors