- Thoughts from the Dagstuhl Principles of Provenance Workshop
Blog entry by Paul Groth, published March 7, 2012.
The term "provenance" refers to information about the origin, context, derivation, ownership or history of some artifact. In both art and science, provenance information is crucial for establishing the value of a real-world artifact, guaranteeing for example that the artifact is an original work produced by an important artist, or that a stated scientific conclusion is reproducible. Even in everyday situations, we unconsciously use provenance to judge the quality of an artifact or process. For example, we often decide what food to buy based on freshness, origin and "organic" labels; and we decide whether or not to believe an online news article based on its source, author, and timeliness.
Maintaining good and convincing records of provenance is difficult. It seems to require both pervasive monitoring of actions as they are performed, and a clear understanding of system boundaries and trustworthiness of actors. For example, every step in the chain of ownership of an important work of art needs to be recorded in a secure way in order to defend against forgery and deter attempts to sell stolen artwork.
Since it is much easier to copy or alter digital information than to alter real-world artifacts, there are even more opportunities for misinformation, forgery and error in the digital world than in the traditional physical world. For this reason, the need for provenance is now widely appreciated. Simple and unreliable forms of automatic provenance tracking, such as version numbering, ownership, and creation and modification timestamps in file systems, have long been supported as basic services on which more sophisticated tools can rely. In today's increasingly networked and decentralized world, however, we anticipate the need for richer provenance recording and management capabilities to be built into a wide variety of systems.
For example, "grid" or "cloud" computing infrastructures are frequently used for scientific computing, as part of a widespread trend towards "eScience", "cyberinfrastructure" or, more recently, the data-intensive "fourth paradigm" of science popularized by Jim Gray and others. These systems are complex and opaque. The correctness and repeatability of scientific conclusions (about, for example, climate change) are increasingly being questioned because of the lack of transparency of the complex computer systems used to derive the results. Provenance technology can help to restore transparency and increase the robustness of eScience, countering increasing skepticism of scientific results as evidenced by the so-called "Climategate" controversy in 2009.
This problem is already widely appreciated in scientific settings and is increasingly recognized in business, industrial and Web settings. Until recently, work on provenance has mostly taken place in relatively isolated parts of existing research communities, such as databases, scientific workflow-based distributed computing, file systems, and the Semantic Web. However, we believe that to make real progress it will be necessary to form a broader research community focusing on provenance.
In this respect, the aims of Dagstuhl Seminar 12091 "Principles of Provenance" were to:
- bring together researchers from databases, security, scientific workflows, software engineering, programming languages, and other areas to identify the commonalities and differences of provenance in these areas;
- improve the mutual understanding of these communities;
- identify the main areas for further foundational provenance research.
The seminar hosted 41 participants in total from the above communities, and included representatives from the W3C Provenance Working Group, which is in the process of standardizing a common data model for representing and exchanging provenance information.
To improve the mutual understanding of the various communities, the first day of the seminar was devoted to tutorial talks from well-respected members of each community.
The rest of the seminar consisted of presentations of recent ongoing provenance research in the various communities, as well as break-out sessions aimed at deepening discussions and identifying open problems.
- Umut A. Acar (MPI-SWS - Kaiserslautern, DE)
- Shawn Bowers (Gonzaga University - Spokane, US)
- Peter Buneman (University of Edinburgh, GB)
- Adriane Chapman (MITRE - McLean, US)
- James Cheney (University of Edinburgh, GB)
- Stephen Chong (Harvard University - Cambridge, US)
- Sarah Cohen-Boulakia (University of Paris South XI, FR)
- Victor Cuevas-Vicenttin (St. Martin-d'Heres, FR)
- Lois Delcambre (Portland State University, US)
- Kai Eckert (Universität Mannheim, DE)
- Nate Foster (Cornell University, US)
- Juliana Freire (Polytechnic Institute of NYU - Brooklyn, US)
- James Frew (University of California - Santa Barbara, US)
- Irini Fundulaki (FORTH - Heraklion, GR)
- Daniel Garijo (Technical University of Madrid, ES)
- Floris Geerts (University of Antwerp, BE)
- Ashish Gehani (SRI - Menlo Park, US)
- Carole Goble (University of Manchester, GB)
- Todd J. Green (University of California - Davis, US & LogicBlox - Atlanta, US)
- Paul Groth (VU University Amsterdam, NL)
- Torsten Grust (Universität Tübingen, DE)
- Olaf Hartig (HU Berlin, DE)
- Melanie Herschel (University of Paris South XI, FR)
- Bertram Ludäscher (University of California - Davis, US)
- Andrew Martin (University of Oxford, GB)
- Simon Miles (King's College London, GB)
- Paolo Missier (University of Newcastle, GB)
- Luc Moreau (University of Southampton, GB)
- Leon J. Osterweil (University of Massachusetts - Amherst, US)
- Christopher Ré (University of Wisconsin - Madison, US)
- Vladimiro Sassone (University of Southampton, GB)
- Martin Schäler (Universität Magdeburg, DE)
- Margo Seltzer (Harvard University - Cambridge, US)
- Christian Skalka (University of Vermont, US)
- Perdita Stevens (University of Edinburgh, GB)
- Wang-Chiew Tan (University of California - Santa Cruz, US)
- Jan Van den Bussche (Hasselt University - Diepenbeek, BE)
- Stijn Vansummeren (University of Brussels, BE)
- Marianne Winslett (University of Illinois - Urbana-Champaign, US)
- Steve Zdancewic (University of Pennsylvania - Philadelphia, US)
- Jun Zhao (University of Oxford, GB)