Dagstuhl Seminar 16041
Reproducibility of Data-Oriented Experiments in e-Science
( Jan 24 – Jan 29, 2016 )
Organizers
- Juliana Freire (New York University, US)
- Norbert Fuhr (Universität Duisburg-Essen, DE)
- Andreas Rauber (TU Wien, AT)
Contact
- Dagmar Glaser (for administrative matters)
Impacts
- Dagstuhl Manifesto : Reproducibility of Data-Oriented Experiments in e-Science : pp. 330-333 : article - Freire, Juliana; Fuhr, Norbert; Rauber, Andreas - Berlin : Springer, 2016 - (Informatik Spektrum : 39. 2016, 4).
- Increasing Reproducibility in IR : Findings from the Dagstuhl Seminar on “Reproducibility of Data-Oriented Experiments in e-Science” : article : pp. 68-82 - Agosti, Maristella; Fuhr, Norbert; Järvelin, Kalervo; Kando, Noriko; Lippold, Matthias; Zobel, Justin - New York : ACM, 2016 - (Sigir forum : 50. 2016, 1).
- Reproducibility Challenges in Information Retrieval Evaluation : article - Ferro, Nicola - New York : ACM, 2017 - (Journal of Data and Information Quality ; 8. 2017, 2 : Article 8).
- Reproducibility in Information Retrieval : Tools and Infrastructures - Ferro, Nicola; Fuhr, Norbert; Rauber, Andreas - New York : ACM, 2018 - (Journal of Data and Information Quality ; 10. 2018, 4).
- The Road Towards Reproducibility in Science : The Case of Data Citation : paper presented at Italian Research Conference on Digital Libraries 2017 (IRCDL 2017) - Ferro, Nicola; Silvello, Gianmaria - Padua : University, 2017 - (Italian Research Conference on Digital Libraries (IRCDL) 2017 ; paper).
- Towards Open-Source Shared Implementations of Keyword-Based Access Systems to Relational Data : article in KARS 2017 - Badan, Alex; Benvegnu, Luca; Biasetton, Matteo; Ferro, Nicola; Simionato, Riccardo; Soleti, Nicolo; Tessarotto, Matteo; Tonon, Andrea; Vendramin, Federico; Marchesin, Stefano; Minetto, Alberto; Pellegrina, Leonardo; Purpura, Alberto; Bonato, Giovanni; Brighente, Alessandro; Cenzato, Alberto; Ceron, Piergiorgio; Cogato, Giovanni - Aachen : CEUR, 2017 - (CEUR workshop series ; 1810).
- Yin & Yang : Demonstrating Complementary Provenance from noWorkflow & YesWorkflow : article in LNCS 9672, IPAW 2016 : pp. 161-165 - Pimentel, Joao Felipe; Dey, Saumen; MacPhillips, Timothy; Belhajjame, Khalid; Murta, Leonardo - Berlin : Springer, 2016 - (Lecture notes in computer science ; 9672 : article).
In many subfields of computer science (CS), experiments play an important role. Besides theoretical properties of algorithms or methods, their effectiveness and performance can often be validated only via experimentation. In most of these cases, the experimental results depend on the input data, settings for input parameters, and potentially on characteristics of the computational environment where the experiments were designed and run. Unfortunately, most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available.
This has serious implications. Scientific discoveries do not happen in isolation. Important advances are often the result of sequences of smaller, less significant steps. In the absence of results that are fully documented, reproducible, and generalizable, it becomes hard to re-use and extend these results. Besides hindering the ability of others to leverage our work, and consequently limiting the impact of our field, the absence of reproducible experiments also puts our reputation at stake, since reliability and validity of empirical results are basic scientific principles.
Reproducible results are not just beneficial to others – in fact, they bring many direct benefits to the researchers themselves. Making an experiment reproducible forces the researcher to document execution pathways. This in turn enables the pathways to be analyzed (and audited). It also helps newcomers (e.g., new students and post-docs) to get acquainted with the problem and the tools used. Reproducibility also forces portability, which simplifies the dissemination of the results. Last, but not least, preliminary evidence exists that reproducibility increases impact, visibility, and research quality.
However, attaining reproducibility for computational experiments is challenging. It is hard both for authors to derive a compendium that encapsulates all the components (e.g., data, code, parameter settings, environment) needed to reproduce a result, and for reviewers to verify the results. There are also other barriers, ranging from practical issues, such as the use of proprietary data, software, and specialized hardware, to social ones, such as the lack of incentives for authors to spend the extra time making their experiments reproducible.
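As a rough illustration of what such a compendium has to record, the following minimal Python sketch builds a manifest covering the four components named above: data, code, parameter settings, and environment. The function names (`build_manifest`, `same_inputs`), the manifest layout, and the sample inputs are hypothetical and chosen only for this example; real compendia typically also capture library versions, hardware details, and the data and code themselves rather than just their hashes.

```python
import hashlib
import json
import platform


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(data: bytes, code: str, params: dict) -> dict:
    """Collect data, code, parameters, and environment into one record.

    Hypothetical layout: only fingerprints of data and code are stored,
    which is enough to detect changes but not to re-run the experiment.
    """
    return {
        "data_sha256": sha256_of_bytes(data),
        "code_sha256": sha256_of_bytes(code.encode("utf-8")),
        "parameters": params,
        "environment": {
            "python": platform.python_version(),
            "implementation": platform.python_implementation(),
            "platform": platform.platform(),
        },
    }


def same_inputs(a: dict, b: dict) -> bool:
    """True when two manifests agree on data, code, and parameters."""
    keys = ("data_sha256", "code_sha256", "parameters")
    return all(a[k] == b[k] for k in keys)


# Illustrative inputs only; a real experiment would read files from disk.
manifest = build_manifest(b"1,2,3\n", "print(sum([1, 2, 3]))", {"seed": 42})
print(json.dumps(manifest, indent=2, sort_keys=True))
```

A reviewer could compare a manifest shipped with a paper against one generated locally: if `same_inputs` holds but the results differ, the environment entries are the first place to look.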
In this seminar, we will bring together experts from various sub-fields of Computer Science (CS) to create a joint understanding of the problems of reproducibility of experiments, discuss existing solutions and impediments, and propose ways to overcome current limitations. Topics we intend to cover include, but are not limited to, reproducibility requirements, infrastructure, new CS research directions required to support reproducibility, incentives, and education. Each participant is expected to present the state of the art, requirements, and issues related to reproducibility in their sub-field. Workgroups will be formed to discuss cross-cutting issues.
The expected outcome of the seminar is a manifesto proposing guidelines, procedures, and further activities for improving reproducibility and broadening its adoption in CS.
More specifically, we will address the following key issues:
- Understanding and requirements
- Technical aspects of repeatability
- Benchmarks and worksets
- IPR, public availability of research, non-consumptive research
- Infrastructures
- New challenges and research directions
- Specialization and integration across disciplines
- Awareness, education and communication
- Incentives for repeatability
In many subfields of computer science, experiments play an important role. Besides theoretical properties of algorithms or methods, their effectiveness and performance often can only be validated via experimentation. In most of these cases, the experimental results depend on the input data, settings for input parameters, and potentially on characteristics of the computational environment where the experiments were designed and run. Unfortunately, most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available.
This has serious implications. Scientific discoveries do not happen in isolation. Important advances are often the result of sequences of smaller, less significant steps. In the absence of results that are fully documented, reproducible, and generalizable, it becomes hard to re-use and extend these results. Besides hindering the ability of others to leverage our work, and consequently limiting the impact of our field, the absence of reproducible experiments also puts our reputation at stake, since reliability and validity of empirical results are basic scientific principles.
Reproducible results are not just beneficial to others; in fact, they bring many direct benefits to the researchers themselves. Making an experiment reproducible forces the researcher to document execution pathways. This in turn enables the pathways to be analyzed (and audited). It also helps newcomers (e.g., new students and post-docs) to get acquainted with the problem and the tools used. Furthermore, reproducibility facilitates portability, which simplifies the dissemination of the results. Last, but not least, preliminary evidence exists that reproducibility increases impact, visibility, and research quality.
However, attaining reproducibility for computational experiments is challenging. It is hard both for authors to derive a compendium that encapsulates all the components (e.g., data, code, parameter settings, environment) needed to reproduce a result, and for reviewers to verify the results. There are also other barriers, ranging from practical issues, such as the use of proprietary data, software, and specialized hardware, to social ones, such as the lack of incentives for authors to spend the extra time making their experiments reproducible.
This seminar brought together experts from various sub-fields of Computer Science as well as experts from several scientific domains to create a joint understanding of the problems of reproducibility of experiments, discuss existing solutions and impediments, and propose ways to overcome current limitations.
Beyond a series of short presentations covering tools, the state of the art of reproducibility in various domains, and "war stories" of things not working, participants specifically explored ways to overcome barriers to the adoption of reproducibility. A series of break-out sessions built on one another, addressing (1) the different types of repeatability and their merits; (2) the actors involved and the incentives and barriers they face; (3) guidelines for actors (specifically editors, authors, and reviewers) on how to determine the level of reproducibility of papers and the merits of reproduction papers; and (4) the specific challenges faced by user-oriented experimentation in Information Retrieval.
This led to the definition of corresponding typologies and guidelines as well as the identification of specific open research problems. We defined a set of actions to reach out to stakeholders, notably publishers and funding agencies, and identified follow-up liaison with reproducibility task forces in different communities, including the ACM, FORCE11, STM, and Science Europe.
The key message resulting from this workshop, copied from Section 6.5, where it is elaborated in more detail, is:
Transparency, openness, and reproducibility are vital features of science. Scientists embrace these features as disciplinary norms and values, and it follows that they should be integrated into daily research activities. These practices give confidence in the work; help research as a whole to be conducted at a higher standard and be undertaken more efficiently; provide verifiability and falsifiability; and encourage a community of mutual cooperation. They also lead to a valuable form of paper, namely, reports on evaluation and reproduction of prior work. Outcomes that others can build upon and use for their own research, whether a theoretical construct or a reproducible experimental result, form a foundation on which science can progress. Papers that are structured and presented in a manner that facilitates and encourages such post-publication evaluations benefit from increased impact, recognition, and citation rates. Experience in computing research has demonstrated that a range of straightforward mechanisms can be employed to encourage authors to produce reproducible work. These include: requiring an explicit commitment to an intended level of provision of reproducible materials as a routine part of each paper’s structure; requiring a detailed methods section; separating the refereeing of the paper’s scientific contribution and its technical process; and explicitly encouraging the creation and reuse of open resources (data, or code, or both).
Juliana Freire, Norbert Fuhr, and Andreas Rauber
Participants
- Vanessa Braganholo (Fluminense Federal University, BR)
- Fernando Chirigati (NYU Tandon School of Engineering, US)
- Christian Collberg (University of Arizona - Tucson, US)
- Shane Culpepper (RMIT University - Melbourne, AU)
- David De Roure (University of Oxford, GB)
- Arjen P. de Vries (Radboud University Nijmegen, NL)
- Jens Dittrich (Universität des Saarlandes, DE)
- Nicola Ferro (University of Padova, IT)
- Juliana Freire (New York University, US)
- Norbert Fuhr (Universität Duisburg-Essen, DE)
- Daniel Garijo (Technical University of Madrid, ES)
- Carole Goble (University of Manchester, GB)
- Kalervo Järvelin (University of Tampere, FI)
- Noriko Kando (National Institute of Informatics - Tokyo, JP)
- Randall J. LeVeque (University of Washington - Seattle, US)
- Matthias Lippold (Universität Duisburg-Essen, DE)
- Bertram Ludäscher (University of Illinois at Urbana-Champaign, US)
- Mihai Lupu (TU Wien, AT)
- Tanu Malik (University of Chicago, US)
- Rudolf Mayer (SBA Research - Wien, AT)
- Alistair Moffat (The University of Melbourne, AU)
- Kevin Page (University of Oxford, GB)
- Raul Antonio Palma de Leon (Poznan Supercomputing and Networking Center, PL)
- Martin Potthast (Bauhaus-Universität Weimar, DE)
- Andreas Rauber (TU Wien, AT)
- Paul Rosenthal (TU Chemnitz, DE)
- Claudio T. Silva (New York University, US)
- Stian Soiland-Reyes (University of Manchester, GB)
- Benno Stein (Bauhaus-Universität Weimar, DE)
- Rainer Stotzka (KIT - Karlsruher Institut für Technologie, DE)
- Evelyne Viegas (Microsoft Research - Redmond, US)
- Stefan Winkler-Nees (DFG - Bonn, DE)
- Torsten Zesch (Universität Duisburg-Essen, DE)
- Justin Zobel (The University of Melbourne, AU)
Classification
- bioinformatics
- data bases / information retrieval
- society / human-computer interaction
Keywords
- experimentation
- reliability
- validity

 
                 
                 
Creative Commons BY 3.0 Unported license