February 26 – March 2, 2012, Dagstuhl Seminar 12091
Principles of Provenance
The term "provenance" refers to information about the origin, context, derivation, ownership or history of some artifact. In both art and science, provenance information is crucial for establishing the value of a real-world artifact, guaranteeing for example that the artifact is an original work produced by an important artist, or that a stated scientific conclusion is reproducible. Even in everyday situations, we unconsciously use provenance to judge the quality of an artifact or process. For example, we often decide what food to buy based on freshness, origin and "organic" labels; and we decide whether or not to believe an online news article based on its source, author, and timeliness.
Maintaining good and convincing records of provenance is difficult. It seems to require both pervasive monitoring of actions as they are performed, and a clear understanding of system boundaries and trustworthiness of actors. For example, every step in the chain of ownership of an important work of art needs to be recorded in a secure way in order to defend against forgery and deter attempts to sell stolen artwork.
Since it is much easier to copy or alter digital information than to alter real-world artifacts, there are even more opportunities for misinformation, forgery and error in the digital world than there are in the traditional physical world. For this reason, the need for provenance is now widely appreciated. Simple and unreliable forms of automatic provenance tracking, such as version numbering, ownership, and creation and modification timestamps in file systems, have long been supported as basic services on which more sophisticated tools can rely. In today's increasingly networked and decentralized world, however, we anticipate the need for richer provenance recording and management capabilities to be built into a wide variety of systems.
For example, "grid" or "cloud" computing infrastructures are frequently used for scientific computing, as part of a widespread trend towards "eScience", "cyberinfrastructure" or, more recently, the data-intensive "fourth paradigm" of science popularized by Jim Gray and others. These systems are complex and opaque. The correctness and repeatability of scientific conclusions (about, for example, climate change) are increasingly being questioned because of the lack of transparency of the complex computer systems used to derive the results. Provenance technology can help to restore transparency and increase the robustness of eScience, countering increasing skepticism of scientific results as evidenced by the so-called "Climategate" controversy in 2009.
This problem is already widely appreciated in scientific settings and is increasingly recognized in business, industrial and Web settings. Until recently, work on provenance has mostly taken place in relatively isolated parts of existing research communities, such as databases, scientific workflow-based distributed computing, file systems, or the Semantic Web. However, we believe that to make real progress it will be necessary to form a broader research community focusing on provenance.
In this respect, the aims of Dagstuhl Seminar 12091 "Principles of Provenance" were to:
- bring together researchers from databases, security, scientific workflows, software engineering, programming languages, and other areas to identify the commonalities and differences of provenance in these areas;
- improve the mutual understanding of these communities;
- identify the main areas for further foundational provenance research.
The seminar hosted 41 participants in total from the above communities, and included representatives from the W3C Provenance Working Group, which is in the process of standardizing a common data model for representing and exchanging provenance information.
To improve the mutual understanding of the various communities, the first day of the seminar was devoted to tutorial talks from well-respected members of each community.
The rest of the seminar consisted of presentations of recent ongoing provenance research in the various communities, as well as break-out sessions aimed at deepening discussions and identifying open problems.
- Software Engineering