Dagstuhl Seminar 16041

Reproducibility of Data-Oriented Experiments in e-Science

(Jan 24 – Jan 29, 2016)


Permalink
Please use the following short URL to reference this page: https://www.dagstuhl.de/16041

Organizers
  • Juliana Freire (New York University, US)
  • Norbert Fuhr (Universität Duisburg-Essen, DE)
  • Andreas Rauber (TU Wien, AT)

Motivation

In many subfields of computer science (CS), experiments play an important role. Besides the theoretical properties of algorithms or methods, their effectiveness and performance can often only be validated via experimentation. In most of these cases, the experimental results depend on the input data, the settings of input parameters, and potentially on characteristics of the computational environment in which the experiments were designed and run. Unfortunately, most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available.

This has serious implications. Scientific discoveries do not happen in isolation. Important advances are often the result of sequences of smaller, less significant steps. In the absence of results that are fully documented, reproducible, and generalizable, it becomes hard to reuse and extend these results. Besides hindering the ability of others to leverage our work, and consequently limiting the impact of our field, the absence of reproducible experiments also puts our reputation at stake, since the reliability and validity of empirical results are basic scientific principles.

Reproducible results are not just beneficial to others; in fact, they bring many direct benefits to the researchers themselves. Making an experiment reproducible forces the researcher to document execution pathways. This in turn enables the pathways to be analyzed (and audited), and it also helps newcomers (e.g., new students and post-docs) to get acquainted with the problem and the tools used. Reproducibility also forces portability, which simplifies the dissemination of the results. Last but not least, there is preliminary evidence that reproducibility increases impact, visibility, and research quality.

However, attaining reproducibility for computational experiments is challenging. It is hard both for authors to derive a compendium that encapsulates all the components (e.g., data, code, parameter settings, environment) needed to reproduce a result, and for reviewers to verify the results. There are also other barriers, ranging from practical issues, such as the use of proprietary data, software, and specialized hardware, to social ones, such as the lack of incentives for authors to spend the extra time making their experiments reproducible.
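
As a purely illustrative sketch (not a tool or format discussed at this seminar), the Python snippet below shows one way such a compendium manifest could be assembled for a single run: it records the parameter settings, checksums of the input data, and the software environment. The file name corpus.csv and the parameter values are hypothetical placeholders.

    import hashlib
    import json
    import platform
    import subprocess
    import sys
    from datetime import datetime, timezone

    def sha256(path):
        """Checksum an input file so others can verify they use the same data."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def build_manifest(data_files, parameters):
        """Collect the data, parameter settings, and environment of one run."""
        return {
            "recorded_at": datetime.now(timezone.utc).isoformat(),
            "parameters": parameters,                    # input parameter settings
            "data": {p: sha256(p) for p in data_files},  # input data checksums
            "environment": {
                "python": sys.version,
                "platform": platform.platform(),
                # Installed package versions; assumes pip is available.
                "packages": subprocess.run(
                    [sys.executable, "-m", "pip", "freeze"],
                    capture_output=True, text=True, check=True,
                ).stdout.splitlines(),
            },
        }

    if __name__ == "__main__":
        # Hypothetical inputs: replace with the actual data and settings of a run.
        manifest = build_manifest(["corpus.csv"], {"k": 10, "seed": 42})
        with open("compendium.json", "w") as f:
            json.dump(manifest, f, indent=2)

Archiving compendium.json next to the code and data gives reviewers a concrete, checkable record of what a result depends on, although it does not by itself address proprietary data or specialized hardware.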

In this seminar, we will bring together experts from various subfields of computer science to create a joint understanding of the problems of reproducibility of experiments, discuss existing solutions and impediments, and propose ways to overcome current limitations. Topics we intend to cover include reproducibility requirements, infrastructure, new CS research directions required to support reproducibility, incentives, and education. Each participant is expected to present the state of the art, requirements, and issues related to reproducibility in their subfield. Working groups will be formed to discuss cross-cutting issues.

The expected outcome of the seminar will be a manifesto proposing guidelines, procedures, and further activities for improving reproducibility and broadening its adoption in CS.

More specifically, we will address the following key issues:

  • Understanding and requirements
  • Technical aspects of repeatability
  • Benchmarks and worksets
  • Intellectual property rights (IPR), public availability of research, non-consumptive research
  • Infrastructures
  • New challenges and research directions
  • Specialization and integration across disciplines
  • Awareness, education and communication
  • Incentives for repeatability

Summary

In many subfields of computer science, experiments play an important role. Besides the theoretical properties of algorithms or methods, their effectiveness and performance can often only be validated via experimentation. In most of these cases, the experimental results depend on the input data, the settings of input parameters, and potentially on characteristics of the computational environment in which the experiments were designed and run. Unfortunately, most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available.

This has serious implications. Scientific discoveries do not happen in isolation. Important advances are often the result of sequences of smaller, less significant steps. In the absence of results that are fully documented, reproducible, and generalizable, it becomes hard to reuse and extend these results. Besides hindering the ability of others to leverage our work, and consequently limiting the impact of our field, the absence of reproducible experiments also puts our reputation at stake, since the reliability and validity of empirical results are basic scientific principles.

Reproducible results are not just beneficial to others; in fact, they bring many direct benefits to the researchers themselves. Making an experiment reproducible forces the researcher to document execution pathways. This in turn enables the pathways to be analyzed (and audited). It also helps newcomers (e.g., new students and post-docs) to get acquainted with the problem and the tools used. Furthermore, reproducibility facilitates portability, which simplifies the dissemination of the results. Last but not least, there is preliminary evidence that reproducibility increases impact, visibility, and research quality.
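
As a minimal sketch of what documenting an execution pathway can look like in practice (the script name, file names, and use of git are assumptions for illustration, not artifacts of this seminar), the Python snippet below wraps one experimental step and appends its command line, code revision, random seed, and exit status to a run log.

    import json
    import subprocess
    import sys
    from datetime import datetime, timezone

    def log_run(command, seed, logfile="runs.jsonl"):
        """Execute one experiment step and append what was run to a run log."""
        started = datetime.now(timezone.utc).isoformat()
        # Code revision; assumes the experiment is tracked in a git repository.
        revision = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip()
        result = subprocess.run(command, capture_output=True, text=True)
        record = {
            "started_at": started,
            "command": command,
            "seed": seed,
            "git_revision": revision,
            "exit_code": result.returncode,
        }
        with open(logfile, "a") as f:
            f.write(json.dumps(record) + "\n")
        return result

    if __name__ == "__main__":
        # Hypothetical command: replace with the actual experiment script and arguments.
        log_run([sys.executable, "experiment.py", "--seed", "42"], seed=42)

A log of this kind is what makes execution pathways auditable after the fact and gives newcomers a concrete trace of how results were produced.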

However, attaining reproducibility for computational experiments is challenging. It is hard both for authors to derive a compendium that encapsulates all the components (e.g., data, code, parameter settings, environment) needed to reproduce a result, and for reviewers to verify the results. There are also other barriers, ranging from practical issues, such as the use of proprietary data, software, and specialized hardware, to social ones, such as the lack of incentives for authors to spend the extra time making their experiments reproducible.

This seminar brought together experts from various sub-fields of Computer Science as well as experts from several scientific domains to create a joint understanding of the problems of reproducibility of experiments, discuss existing solutions and impediments, and propose ways to overcome current limitations.

Beyond a series of short presentations covering tools, the state of the art of reproducibility in various domains, and "war stories" of things not working, participants specifically explored ways to overcome barriers to the adoption of reproducibility. A series of break-out sessions gradually built on top of each other, (1) identifying different types of repeatability and their merits; (2) identifying the actors involved and the incentives and barriers they face; (3) developing guidelines for actors (specifically editors, authors, and reviewers) on how to determine the level of reproducibility of papers and the merits of reproduction papers; and (4) examining the specific challenges faced by user-oriented experimentation in Information Retrieval.

This led to the definition of corresponding typologies and guidelines, as well as to the identification of specific open research problems. We defined a set of actions to reach out to stakeholders, notably publishers and funding agencies, and identified follow-up liaisons with various reproducibility task forces in different communities, including the ACM, FORCE11, STM, and Science Europe.

The key message resulting from this workshop, copied from and elaborated in more detail in Section 6.5, is:

Transparency, openness, and reproducibility are vital features of science. Scientists embrace these features as disciplinary norms and values, and it follows that they should be integrated into daily research activities. These practices give confidence in the work; help research as a whole to be conducted at a higher standard and be undertaken more efficiently; provide verifiability and falsifiability; and encourage a community of mutual cooperation. They also lead to a valuable form of paper, namely, reports on evaluation and reproduction of prior work. Outcomes that others can build upon and use for their own research, whether a theoretical construct or a reproducible experimental result, form a foundation on which science can progress. Papers that are structured and presented in a manner that facilitates and encourages such post-publication evaluations benefit from increased impact, recognition, and citation rates. Experience in computing research has demonstrated that a range of straightforward mechanisms can be employed to encourage authors to produce reproducible work. These include: requiring an explicit commitment to an intended level of provision of reproducible materials as a routine part of each paper’s structure; requiring a detailed methods section; separating the refereeing of the paper’s scientific contribution and its technical process; and explicitly encouraging the creation and reuse of open resources (data, or code, or both).

Copyright Juliana Freire, Norbert Fuhr, and Andreas Rauber

Participants
  • Vanessa Braganholo (Fluminense Federal University, BR) [dblp]
  • Fernando Chirigati (NYU Tandon School of Engineering, US) [dblp]
  • Christian Collberg (University of Arizona - Tucson, US) [dblp]
  • Shane Culpepper (RMIT University - Melbourne, AU) [dblp]
  • David De Roure (University of Oxford, GB) [dblp]
  • Arjen P. de Vries (Radboud University Nijmegen, NL) [dblp]
  • Jens Dittrich (Universität des Saarlandes, DE) [dblp]
  • Nicola Ferro (University of Padova, IT) [dblp]
  • Juliana Freire (New York University, US) [dblp]
  • Norbert Fuhr (Universität Duisburg-Essen, DE) [dblp]
  • Daniel Garijo (Technical University of Madrid, ES) [dblp]
  • Carole Goble (University of Manchester, GB) [dblp]
  • Kalervo Järvelin (University of Tampere, FI) [dblp]
  • Noriko Kando (National Institute of Informatics - Tokyo, JP) [dblp]
  • Randall J. LeVeque (University of Washington - Seattle, US) [dblp]
  • Matthias Lippold (Universität Duisburg-Essen, DE) [dblp]
  • Bertram Ludäscher (University of Illinois at Urbana-Champaign, US) [dblp]
  • Mihai Lupu (TU Wien, AT) [dblp]
  • Tanu Malik (University of Chicago, US) [dblp]
  • Rudolf Mayer (SBA Research - Wien, AT) [dblp]
  • Alistair Moffat (The University of Melbourne, AU) [dblp]
  • Kevin Page (University of Oxford, GB) [dblp]
  • Raul Antonio Palma de Leon (Poznan Supercomputing and Networking Center, PL)
  • Martin Potthast (Bauhaus-Universität Weimar, DE) [dblp]
  • Andreas Rauber (TU Wien, AT) [dblp]
  • Paul Rosenthal (TU Chemnitz, DE) [dblp]
  • Claudio T. Silva (New York University, US) [dblp]
  • Stian Soiland-Reyes (University of Manchester, GB) [dblp]
  • Benno Stein (Bauhaus-Universität Weimar, DE) [dblp]
  • Rainer Stotzka (KIT - Karlsruher Institut für Technologie, DE) [dblp]
  • Evelyne Viegas (Microsoft Research - Redmond, US) [dblp]
  • Stefan Winkler-Nees (DFG - Bonn, DE)
  • Torsten Zesch (Universität Duisburg-Essen, DE) [dblp]
  • Justin Zobel (The University of Melbourne, AU) [dblp]

Classification
  • bioinformatics
  • data bases / information retrieval
  • society / human-computer interaction

Keywords
  • experimentation
  • reliability
  • validity