Dagstuhl Seminar 18412: Encouraging Reproducibility in Scientific Research of the Internet

( Oct 07 – Oct 10, 2018 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/18412

Organizers

Vaibhav Bajpai (TU München, DE)
Olivier Bonaventure (UC Louvain, BE)
Kimberly Claffy (San Diego Supercomputer Center, US)
Daniel Karrenberg (RIPE NCC - Amsterdam, NL)

Contact

Shida Kunz (for scientific matters)
Annette Beyer (for administrative matters)

Publications

Encouraging Reproducibility in Scientific Research of the Internet (Dagstuhl Seminar 18412). Vaibhav Bajpai, Olivier Bonaventure, Kimberly Claffy, and Daniel Karrenberg. In Dagstuhl Reports, Volume 8, Issue 10, pp. 41-62, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Motivation

Reproducibility of research in computer science, and in the field of networking in particular, is a well-recognized problem. For several reasons, including the sensitive and/or proprietary nature of some Internet measurements, the networking research community discounts the importance of reproducibility of results, instead tending to accept papers that appear plausible. Studies have shown that only a fraction of published papers release the artefacts (such as code and datasets) needed to reproduce results. To provide incentives to authors, conferences bestow best dataset awards and actively solicit submissions that reproduce results. Community archives (such as DatCat and CRAWDAD) provide an index of existing measurement data and invite the community to reproduce existing research. SIGCOMM Computer Communication Review allows authors to upload artefacts during paper submission so that reviewers can check for reproducibility, and relaxes page limits for reproducible papers. The Association for Computing Machinery (ACM) has recently introduced a new policy on result and artefact review and badging, which sets the terminology to use when assessing results and artefacts. ACM has also initiated a new task force on data, software and reproducibility in publication to understand how ACM can effectively promote reproducibility within the computing research community. Despite these continued efforts, reproducibility of research in computer science, and in networking in particular, remains an ongoing problem, since papers that reproduce existing research rarely get published in practice.

In this seminar, we aim to discuss challenges to improving reproducibility of scientific Internet research, and hope to develop a set of recommendations that we as a community can undertake to initiate a cultural change toward reproducibility of our work. Questions we anticipate discussing during the seminar include:

What are the challenges with reproducibility?
How can researchers (and data providers) navigate concerns with openly sharing datasets? How should we cope with datasets that lack stable ground truth?
What incentives are needed to encourage reproducibility?
What can publishers do? What can conference organisation committees do? How can we ensure that reviewers consider reproducibility when reviewing papers? How can we manage and scale the evaluation of artefacts during peer review? Do we need new venues that specifically require reproducibility of the submitted research?
What tools and systems are available to facilitate reproducibility?
How effective are emerging interactive lab notebook tools (e.g., Jupyter) at enabling or facilitating reproducibility? Should computer science course curricula integrate use of these tools for student projects to help develop skills and habits that enable reproducibility?
What guidelines (or best practices) are needed to help reproducibility?
How can we ensure authors think about reproducibility? What guidelines would assist reviewers in evaluating artefacts?

In order to encourage reproducibility of research, practitioners continue to do community service to educate the community on the need for this change.

Creative Commons BY 3.0 DE

Vaibhav Bajpai, Olivier Bonaventure, Kimberly Claffy, and Daniel Karrenberg

Summary

Reproducibility in scientific research not only helps achieve trustworthiness of results, but also lowers barriers to technology transition [1] and accelerates science by promoting incentives to data sharing. The networking research community, however, pays limited attention to the importance of reproducibility of results, instead tending to accept papers that appear plausible. Previous studies [2,3,4] have shown that only a fraction of published papers release the artifacts (such as code and datasets) needed to reproduce results. In order to encourage reproducibility of research, practitioners continue [5,6,7,8,9] to do service to educate the community on the need for this change. To provide incentives to authors, vehicles for publication of software and datasets are also emerging. For instance, Elsevier SoftwareX [10] is a new journal designed specifically to publish software contributions. DataCite [11,12] provides mechanisms for locating and citing datasets. Community Resource for Archiving Wireless Data (CRAWDAD) [13] and Information Marketplace for Policy and Analysis of Cyber-risk & Trust (IMPACT) Cyber Trust [14] provide an index of existing measurement data, not only to enable new research but also to advance network science by promoting reproducible research. Traditional conferences bestow best dataset awards and actively solicit submissions that reproduce results. SIGCOMM Computer Communication Review (CCR) allows authors to upload artifacts during paper submission so that reviewers can check for reproducibility, and relaxes page limits for reproducible papers. The Association for Computing Machinery (ACM) has recently introduced a new policy [15] on result and artifact review and badging. The policy identifies the terminology to use when assessing results and artifacts. ACM has also initiated a new task force on data, software and reproducibility in publication [16] to understand how ACM can effectively promote reproducibility within the computing research community. The National Academies of Sciences, Engineering, and Medicine, with a goal of moving towards an open science ecosystem, released a report in 2018 [17] with guidance and concrete recommendations on how to build strategies for achieving open science. The target is to ensure the free availability (and usability) of publications and associated artifacts. The National Science Foundation (NSF) is taking substantial steps [18] in this area: submitted proposals are required to provide a results dissemination plan that describes how research results will be made available to the extent necessary to independently validate the findings. Towards this end, the proposal budget [19] may request funds for the costs of documenting, preparing, publishing or otherwise making available to others the findings and products of the work conducted under the NSF grant. Despite these continued efforts, reproducibility of research remains an ongoing problem, and few papers that reproduce existing research get published [20,21,22] in practice.

Goals

In this seminar, we discussed challenges to improving reproducibility of scientific Internet research and developed a set of recommendations that we as a community can undertake to initiate a cultural change toward increased reproducibility of our work. The goal of the seminar was to discuss the questions below and to propose recommendations that would improve the state of reproducibility in computer networking research.

What are the challenges with reproducibility?
How can researchers (and data providers) navigate concerns with openly sharing datasets? How should we cope with datasets that lack stable ground truth?

The first category of questions tried to identify the challenges with reproducibility [23]. For instance, concerns with openly sharing datasets led to discussions around legal restrictions and the advantages of researchers keeping data private for their own exclusive future use. Another consideration is double-blind review practices, which require that authors expend effort to obfuscate the source of their data. Would this time be better spent documenting the datasets for sharing to enable reproducibility? We held a "gap analysis" discussion to understand whether the problem is a lack of appropriate venues, a lack of stable ground truth, or more broadly a lack of incentive to reproduce research, since publishing (and funding) agents tend to prefer novelty. There is also the inherent risk of confirmation bias toward existing results; we sought ideas on how to train young researchers to recognize and counter this tendency.

What incentives are needed to encourage reproducibility?
What can publishers do? What can conference organisation committees do? How can we ensure that reviewers consider reproducibility when reviewing papers? How can we manage and scale the evaluation of artifacts during peer review? Do we need new venues that specifically require reproducibility of the submitted research?

The second category of questions is about incentives. Questions about how publishers can promote reproducibility framed discussions on whether publishers can provide storage for authors to upload data artifacts with the associated paper in digital libraries, or whether mechanisms can be developed to highlight reproducible (and reproduced) papers. Questions on what conference organisation committees can do inspired ideas for additional incentives (such as best dataset awards or relaxed page limits) for authors to make research reproducible. We identified questions to add to review forms to ensure reviewers pay attention to reproducibility aspects. This further led to discussions on whether committees (in parallel to the regular technical program committee) should evaluate artifacts during the conference review process. Should such a committee be composed purely of young researchers or a blend of young and senior researchers? Questions on the need for specific venues triggered discussions on whether high-impact journals need to establish feature topics on reproducibility or devote a dedicated column to papers that reproduce existing research.

What tools and systems are available to facilitate reproducibility?
How effective are emerging interactive lab notebook tools (e.g., Jupyter) at facilitating reproducibility? Should computer science course curricula integrate use of these tools for student projects to help develop skills and habits that enable reproducibility?

The third category of questions attempts to identify and review tools and systems that are available to facilitate reproducibility. Enormous interest has developed recently in tools for recording experimental observations and computational analytics on large data sets. Some researchers now document the entire process for a paper in a Jupyter lab notebook, greatly facilitating reproducibility and extension of the research. The learning curve for these tools may be daunting; we discussed how faculty can evolve computer science course curricula to integrate use of these tools for student projects to help develop skills and habits that enable reproducibility.

What guidelines (or best practices) are needed to help reproducibility?
How can we ensure authors think about reproducibility? What guidelines would assist reviewers in evaluating artifacts?

The fourth category of questions attempts to develop guidelines (or best practices) to promote reproducibility of research. For instance, we discussed what language could be added to a Call for Papers (CFP) to encourage authors to describe reproducibility aspects (of both measurements and results) in their paper submissions.

Structure

The seminar lasted 2.5 days. It began with an introductory round in which each participant presented one slide giving an overview of their experience relevant to the seminar and a set of open questions that they wished to discuss during the event. These slides were collected from each participant before the seminar. We had one invited talk (§ 3.1) that we used as a basis for triggering discussions and identifying areas for group work, while a major portion of the seminar time was dedicated to breakout sessions, in which participants were split into small groups to discuss specific themes and develop ideas with consensus to propose to larger groups. The morning sessions of the following day were dedicated to continuing parallel group work, with presentations that reported the outcomes of each breakout session from the previous day. In the afternoons, we dedicated some time to seven-minute lightning talks to invite ideas for subsequent breakout sessions. One evening, we had a social dinner activity. The afternoon of the third day was spent reviewing and collecting feedback from the participants and initiating follow-up actions identified during the seminar.

References

[1] Henning Schulzrinne. Networking Research - A Reflection in the Middle Years. In Computer Communications, 2018. doi:10.1016/j.comcom.2018.07.001.
[2] Stuart Kurkowski, Tracy Camp, and Michael Colagrosso. MANET Simulation Studies: The Incredibles. In Mobile Computing and Communications Review, pages 50–61, 2005. doi:10.1145/1096166.1096174.
[3] Patrick Vandewalle, Jelena Kovacevic, and Martin Vetterli. Reproducible Research in Signal Processing. IEEE Signal Processing Magazine, 2009. doi:10.1109/MSP.2009.932122.
[4] Christian S. Collberg and Todd A. Proebsting. Repeatability in Computer Systems Research. In Communications of the ACM, 2016. doi:10.1145/2812803.
[5] Vern Paxson. Strategies for Sound Internet Measurement. In Internet Measurement Conference (IMC), pages 263–271, 2004. doi:10.1145/1028788.1028824.
[6] Balachander Krishnamurthy, Walter Willinger, Phillipa Gill, and Martin F. Arlitt. A Socratic Method for Validation of Measurement-based Networking Research. In Computer Communications, pages 43–53, 2011. doi:10.1016/j.comcom.2010.09.014.
[7] Geir Kjetil Sandve, Anton Nekrutenko, James Taylor, and Eivind Hovig. Ten Simple Rules for Reproducible Computational Research. PLOS Computational Biology Journal, 2013. doi:10.1371/journal.pcbi.1003285.
[8] Vaibhav Bajpai, Arthur W. Berger, Philip Eardley, Jörg Ott, and Jürgen Schönwälder. Global Measurements: Practice and Experience (Report on Dagstuhl Seminar #16012). Computer Communication Review, 46(2):32–39, 2016. doi:10.1145/2935634.2935641.
[9] Philip Eardley, Marco Mellia, Jörg Ott, Jürgen Schönwälder, and Henning Schulzrinne. Global Measurement Framework (Dagstuhl Seminar 13472). Dagstuhl Reports, 3(11):144–153, 2013. doi:10.4230/DagRep.3.11.144.
[10] Elsevier SoftwareX. https://www.journals.elsevier.com/softwarex [Online; last accessed 29-December-2018].
[11] Laura Rueda, Martin Fenner, and Patricia Cruse. Datacite: Lessons learned on persistent identifiers for research data. IJDC, 11(2):39–47, 2016. doi:10.2218/ijdc.v11i2.421.
[12] Catherine Jones, Brian Matthews, and Ian Gent. Software reuse, repurposing and reproducibility. In Proceedings of the 12th International Conference on Digital Preservation, iPRES 2015, Chapel Hill, North Carolina, USA, November 2-6, 2015. URL: http://hdl.handle.net/11353/10.429590.
[13] Jihwang Yeo, David Kotz, and Tristan Henderson. CRAWDAD: A Community Resource for Archiving Wireless Data at Dartmouth. SIGCOMM Computer Communication Review, pages 21–22, 2006. doi:10.1145/1129582.1129588.
[14] IMPACT Cyber Trust. https://www.impactcybertrust.org [Online; last accessed 31-December-2018].
[15] ACM Artifact Review and Badging. https://www.acm.org/publications/policies/artifact-review-badging [Online; last accessed 29-December-2018].
[16] ACM Task Force on Data, Software and Reproducibility in Publication, 2015. https://www.acm.org/publications/task-force-on-data-software-and-reproducibility [Online; last accessed 29-December-2018].
[17] National Academies of Sciences, Engineering, and Medicine. Open Science by Design: Realizing a Vision for 21st Century Research. The National Academies Press, 2018. doi:10.17226/25116.
[18] Computer and Network Systems (CNS): Core Programs. https://www.nsf.gov/pubs/2018/nsf18569/nsf18569.htm [Online; last accessed 29-December-2018].
[19] NSF Proposal & Award Policies & Procedures Guide (PAPPG). https://www.nsf.gov/pubs/policydocs/pappg18 [Online; last accessed 29-December-2018].
[20] Bryan Clark, Todd Deshane, Eli M. Dow, Stephen Evanchik, Matthew Finlayson, Jason Herne, and Jeanna Neefe Matthews. Xen and the Art of Repeated Research. In USENIX Annual Technical Conference, pages 135–144, 2004. URL: http://www.usenix.org/publications/library/proceedings/usenix04/tech/freenix/clark.html.
[21] Heidi Howard, Malte Schwarzkopf, Anil Madhavapeddy, and Jon Crowcroft. Raft Refloated: Do We Have Consensus? In Operating Systems Review, pages 12–21, 2015. doi:10.1145/2723872.2723876.
[22] Diana Andreea Popescu and Andrew W. Moore. Reproducing Network Experiments in a Time-controlled Emulation Environment. In Traffic Monitoring and Analysis (TMA), 2016. URL: http://tma.ifip.org/2016/papers/tma2016-final10.pdf.
[23] Vaibhav Bajpai, Mirja Kühlewind, Jörg Ott, Jürgen Schönwälder, Anna Sperotto, and Brian Trammell. Challenges with Reproducibility. In SIGCOMM 2017 Reproducibility Workshop, pages 1–4, 2017. doi:10.1145/3097766.3097767.

Creative Commons BY 3.0 Unported license

Vaibhav Bajpai, Olivier Bonaventure, Kimberly Claffy, and Daniel Karrenberg

Participants

Vaibhav Bajpai (TU München, DE)
Steve Bauer (MIT - Cambridge, US)
Olivier Bonaventure (UC Louvain, BE)
Anna Brunström (Karlstad University, SE)
Kenneth L. Calvert (University of Kentucky - Lexington, US)
Georg Carle (TU München, DE)
Kimberly Claffy (San Diego Supercomputer Center, US)
Alberto Dainotti (San Diego Supercomputer Center, US)
Anja Feldmann (MPI für Informatik - Saarbrücken, DE)
Ralph Holz (The University of Sydney, AU)
Luigi Iannone (Télécom Paris Tech, FR)
Daniel Karrenberg (RIPE NCC - Amsterdam, NL)
Wolfgang Kellerer (TU München, DE)
Robert Kisteleki (RIPE NCC - Amsterdam, NL)
Mirja Kühlewind (ETH Zürich, CH)
Andra Lutu (Telefónica Research - Barcelona, ES)
Matt Mathis (Google Inc. - Mountain View, US)
Jörg Ott (TU München, DE)
Aiko Pras (University of Twente, NL)
Damien Saucez (INRIA Sophia Antipolis, FR)
Quirin Scheitle (TU München, DE)
Jürgen Schönwälder (Jacobs University Bremen, DE)
Henning Schulzrinne (Columbia University - New York, US)
Georgios Smaragdakis (TU Berlin, DE)
Karen Sollins (MIT - Cambridge, US)
Joel Sommers (Colgate University - Hamilton, US)
Brian Trammell (ETH Zürich, CH)
Matthias Wählisch (FU Berlin, DE)
Klaus Wehrle (RWTH Aachen University, DE)
John Wroclawski (USC - Marina del Rey, US)
Thomas Zinner (TU Berlin, DE)

Classification

networks
world wide web / internet

Keywords

computer networks
reproducibility
