October 7–10, 2018, Dagstuhl Seminar 18412

Encouraging Reproducibility in Scientific Research of the Internet


Vaibhav Bajpai (TU München, DE)
Olivier Bonaventure (UC Louvain, BE)
Kimberly Claffy (San Diego Supercomputer Center, US)
Daniel Karrenberg (RIPE NCC – Amsterdam, NL)

Dagstuhl Report, Volume 8, Issue 10


Reproducibility in scientific research is a means not only to achieve trustworthiness of results; it also lowers barriers to technology transition [1] and accelerates science by promoting incentives for data sharing. The networking research community, however, pays limited attention to the reproducibility of results, instead tending to accept papers that appear plausible. Previous studies [2,3,4] have shown that only a fraction of published papers release the artifacts (such as code and datasets) needed to reproduce their results. To encourage reproducibility of research, practitioners continue [5,6,7,8,9] to perform community service by educating researchers on the need for this change. To provide incentives to authors, vehicles for the publication of software and datasets are also emerging. For instance, Elsevier SoftwareX [10] is a new journal designed specifically to publish software contributions. DataCite [11,12] provides mechanisms for locating and citing datasets. The Community Resource for Archiving Wireless Data (CRAWDAD) [13] and the Information Marketplace for Policy and Analysis of Cyber-risk & Trust (IMPACT) [14] provide indexes of existing measurement data, not only enabling new research but also advancing network science by promoting reproducible research. Traditional conferences bestow best dataset awards and actively solicit submissions that reproduce results. SIGCOMM Computer Communication Review (CCR) allows authors to upload artifacts during paper submission so that reviewers can check for reproducibility, and relaxes page limits for reproducible papers. The Association for Computing Machinery (ACM) has recently introduced a new policy [15] on result and artifact review and badging, which defines the terminology used to assess results and artifacts.
ACM has also initiated a new task force on data, software, and reproducibility in publication [16] to understand how ACM can effectively promote reproducibility within the computing research community. The National Academies of Sciences, Engineering, and Medicine, with the goal of moving toward an open science ecosystem, recently (2018) released a report [17] with guidance and concrete recommendations on building strategies for achieving open science. The target is to ensure the free availability (and usability) of publications and associated artifacts. The National Science Foundation (NSF) is taking substantial steps [18] in this area: submitted proposals are required to provide a results dissemination plan describing how the produced research results are made available to the extent necessary to independently validate the findings. Toward this end, the proposal budget [19] may request funds for the costs of documenting, preparing, publishing, or otherwise making available to others the findings and products of the work conducted under the NSF grant. Despite these continued efforts, reproducibility of research remains an ongoing problem, and few papers that reproduce existing research get published [20,21,22] in practice.


In this seminar, we discussed challenges to improving the reproducibility of scientific Internet research and developed a set of recommendations that we as a community can undertake to initiate a cultural change toward increased reproducibility of our work. The goal of the seminar was to discuss the questions below and to propose recommendations that would improve the state of reproducibility in computer networking research.

  • What are the challenges with reproducibility?
    How can researchers (and data providers) navigate concerns with openly sharing datasets? How should we cope with datasets that lack stable ground truth?

The first category of questions tried to identify the challenges with reproducibility [23]. For instance, concerns with openly sharing datasets led to discussions around legal restrictions and the advantages of researchers keeping data private for their own exclusive future use. Another consideration is double-blind review practices, which require that authors expend effort to obfuscate the source of their data. Would this time be better spent documenting the datasets for sharing to enable reproducibility? We held a "gap analysis" discussion to understand whether the problem is a lack of appropriate venues, a lack of stable ground truth, or, more broadly, a lack of incentive to reproduce research, since publishing (and funding) agents tend to prefer novelty. There is also the inherent risk of confirmation bias toward existing results; we sought ideas on how to train young researchers to recognize and counter this tendency.

  • What incentives are needed to encourage reproducibility?
    What can publishers do? What can conference organisation committees do? How can we ensure that reviewers consider reproducibility when reviewing papers? How can we manage and scale the evaluation of artifacts during peer review? Do we need new venues that specifically require reproducibility of the submitted research?

The second category of questions is about incentives. Questions about how publishers can promote reproducibility framed discussions on whether publishers can provide storage for authors to upload data artifacts with the associated paper in digital libraries, or whether mechanisms can be developed to highlight reproducible (and reproduced) papers. Questions on what conference organisation committees can do inspired ideas for additional incentives (such as best dataset awards or relaxed page limits) for authors to make research reproducible. We identified questions to add to review forms to ensure reviewers pay attention to reproducibility aspects. This further led to discussions on whether separate committees (in parallel to the regular technical program committee) should evaluate artifacts during the conference review process. Should such a committee be composed purely of young researchers or a blend of young and senior researchers? Questions on the need for specific venues triggered discussions on whether high-impact journals need to establish feature topics on reproducibility or devote a dedicated column to papers that reproduce existing research.

  • What tools and systems are available to facilitate reproducibility?
    How effective are emerging interactive lab notebook tools (e.g., Jupyter) at facilitating reproducibility? Should Computer Science (CS) course curricula integrate the use of these tools in student projects to help develop skills and habits that enable reproducibility?

The third category of questions attempts to identify and review tools and systems that are available to facilitate reproducibility. Enormous interest has developed recently in tools for recording experimental observations and computational analytics on large datasets. Some researchers now document the entire process for a paper in a Jupyter notebook, greatly facilitating reproducibility and extension of the research. The learning curve for these tools may be daunting; we discussed how faculty can evolve CS course curricula to integrate the use of these tools in student projects to help develop skills and habits that enable reproducibility.
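As a small illustration of the habits such tools encourage, the sketch below shows one way a notebook cell might record provenance for an analysis run: the interpreter version, the platform, and a checksum of the input data, so a later reader can verify they are re-running the same computation on the same data. This is a hypothetical minimal example, not a prescribed workflow; the file name and data are invented for illustration.

```python
import hashlib
import json
import platform
import sys

def record_provenance(data_path: str, data_bytes: bytes) -> dict:
    """Build a provenance record for an analysis run, capturing the
    Python version, the platform, and a SHA-256 checksum of the input
    data so a reproduction attempt can verify its inputs match."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "input": data_path,
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
    }

# Example: checksum a (hypothetical) measurement file held in memory.
record = record_provenance("rtt-measurements.csv",
                           b"host,rtt_ms\nexample.org,23.4\n")
print(json.dumps(record, indent=2))
```

In a real study, the record would typically be written alongside the results (and extended with package versions, e.g. via `pip freeze`), so that the notebook, data, and environment can be matched up later.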

  • What guidelines or best practices are needed to help reproducibility?
    How can we ensure authors think about reproducibility? What guidelines would assist reviewers in evaluating artifacts?

The fourth category of questions attempts to develop guidelines (or best practices) to promote reproducibility of research. For instance, we discussed what language could be added to a Call for Papers (CFP) to encourage authors to describe reproducibility aspects (of both measurements and results) in their paper submissions.


The seminar lasted 2.5 days. It began with an introductory round in which each participant presented one slide giving an overview of their experience relevant to the seminar and a set of open questions they wished to discuss during the event. These slides were collected from each participant before the seminar. We had one invited talk (§ 3.1) that we used as a basis for triggering discussions and identifying areas for group work, while a major portion of the seminar time was dedicated to breakout sessions, in which participants split into small groups to discuss specific themes and develop consensus ideas to propose to the larger group. The morning sessions of the following day were dedicated to continuing the parallel group work, with presentations reporting the outcomes of each breakout session from the previous day. In the afternoons, we dedicated some time to seven-minute lightning talks to invite ideas for subsequent breakout sessions. One evening, we had a social dinner activity. The afternoon of the third day was spent reviewing and collecting feedback from the participants and initiating follow-up actions identified during the seminar.


  1. Henning Schulzrinne. Networking Research - A Reflection in the Middle Years. In Computer Communications, 2018. doi:10.1016/j.comcom.2018.07.001.
  2. Stuart Kurkowski, Tracy Camp, and Michael Colagrosso. MANET Simulation Studies: The Incredibles. In Mobile Computing and Communications Review, pages 50–61, 2005. doi:10.1145/1096166.1096174.
  3. Patrick Vandewalle, Jelena Kovacevic, and Martin Vetterli. Reproducible Research in Signal Processing. IEEE Signal Processing Magazine, 2009. doi:10.1109/MSP.2009.932122.
  4. Christian S. Collberg and Todd A. Proebsting. Repeatability in Computer Systems Research. Communications of the ACM, 2016. doi:10.1145/2812803.
  5. Vern Paxson. Strategies for Sound Internet Measurement. In Internet Measurement Conference (IMC), pages 263–271, 2004. doi:10.1145/1028788.1028824.
  6. Balachander Krishnamurthy, Walter Willinger, Phillipa Gill, and Martin F. Arlitt. A Socratic Method for Validation of Measurement-based Networking Research. In Computer Communications, pages 43–53, 2011. doi:10.1016/j.comcom.2010.09.014.
  7. Geir Kjetil Sandve, Anton Nekrutenko, James Taylor, and Eivind Hovig. Ten Simple Rules for Reproducible Computational Research. PLOS Computational Biology Journal, 2013. doi:10.1371/journal.pcbi.1003285.
  8. Vaibhav Bajpai, Arthur W. Berger, Philip Eardley, Jörg Ott, and Jürgen Schönwälder. Global Measurements: Practice and Experience (Report on Dagstuhl Seminar #16012). Computer Communication Review, 46(2):32–39, 2016. doi:10.1145/2935634.2935641.
  9. Philip Eardley, Marco Mellia, Jörg Ott, Jürgen Schönwälder, and Henning Schulzrinne. Global Measurement Framework (Dagstuhl Seminar 13472). Dagstuhl Reports, 3(11):144–153, 2013. doi:10.4230/DagRep.3.11.144.
  10. Elsevier SoftwareX. [Online; last accessed 29-December-2018].
  11. Laura Rueda, Martin Fenner, and Patricia Cruse. Datacite: Lessons learned on persistent identifiers for research data. IJDC, 11(2):39–47, 2016. doi:10.2218/ijdc.v11i2.421.
  12. Catherine Jones, Brian Matthews, and Ian Gent. Software Reuse, Repurposing and Reproducibility. In Proceedings of the 12th International Conference on Digital Preservation, iPRES 2015, Chapel Hill, North Carolina, USA, November 2-6, 2015, 2015.
  13. Jihwang Yeo, David Kotz, and Tristan Henderson. CRAWDAD: A Community Resource for Archiving Wireless Data at Dartmouth. SIGCOMM Computer Communication Review, pages 21–22, 2006. doi:10.1145/1129582.1129588.
  14. IMPACT Cyber Trust. [Online; last accessed 31-December-2018].
  15. ACM Artifact Review and Badging. [Online; last accessed 29-December-2018].
  16. ACM Task Force on Data, Software and Reproducibility in Publication, 2015. [Online; last accessed 29-December-2018].
  17. National Academies of Sciences, Engineering, and Medicine. Open Science by Design: Realizing a Vision for 21st Century Research. The National Academies Press, 2018. doi:10.17226/25116.
  18. Computer and Network Systems (CNS): Core Programs. [Online; last accessed 29-December-2018].
  19. NSF Proposal & Award Policies & Procedures Guide (PAPPG). [Online; last accessed 29-December-2018].
  20. Bryan Clark, Todd Deshane, Eli M. Dow, Stephen Evanchik, Matthew Finlayson, Jason Herne, and Jeanna Neefe Matthews. Xen and the Art of Repeated Research. In USENIX Annual Technical Conference, pages 135–144, 2004.
  21. Heidi Howard, Malte Schwarzkopf, Anil Madhavapeddy, and Jon Crowcroft. Raft Refloated: Do We Have Consensus? In Operating Systems Review, pages 12–21, 2015. doi:10.1145/2723872.2723876.
  22. Diana Andreea Popescu and Andrew W. Moore. Reproducing Network Experiments in a Time-controlled Emulation Environment. In Traffic Monitoring and Analysis (TMA), 2016.
  23. Vaibhav Bajpai, Mirja Kühlewind, Jörg Ott, Jürgen Schönwälder, Anna Sperotto, and Brian Trammell. Challenges with Reproducibility. In SIGCOMM 2017 Reproducibility Workshop, pages 1–4, 2017. doi:10.1145/3097766.3097767.
Summary text license
  Creative Commons BY 3.0 Unported license
  Vaibhav Bajpai, Olivier Bonaventure, Kimberly Claffy, and Daniel Karrenberg


  • Networks
  • World Wide Web / Internet


  • Computer networks
  • Reproducibility

