- Susanne Bach-Bernhard (for administrative matters)
Two significant trends in data management are emerging: data is moving to cloud infrastructures, and an increasing fraction of the data produced is born digital. We risk losing all record of born-digital data if we do not take explicit steps to ensure its longevity. While each of these trends raises its own set of questions, our seminar began with two fundamental questions at their intersection: What role should the cloud play in preservation? What steps should we take now to ensure that today's digital artifacts survive into the future?
We addressed these two questions by bringing together a diverse cohort of approximately thirty participants: researchers from both academia and industry, representatives from cloud providers, and archivists and librarians from memory institutions. Every participant was responsible for some aspect of the program, and the workshop was characterized by lively debate. The workshop had four primary outcomes:
- We identified key functional requirements that are critical if cloud infrastructures are to be used for long-term digital preservation.
- We identified topics on which we were unable to reach agreement; because these debates concern the future, it seems likely, though unsatisfying, that only the future will resolve them.
- We identified several specific problems requiring further work and brought together groups of people interested in pursuing those areas.
- We identified several areas that we were not able to address, either because we lacked the expertise in the room or we ran out of time; these areas represent opportunities for subsequent workshops.
Perhaps the most pressing issue with existing cloud infrastructures is the lack of standardized APIs. If data are to outlive any particular organization, then archives must span organizational boundaries; standardized APIs make this dramatically easier and more robust. Participants also agreed that some form of automated appraisal is important, but no concrete proposals emerged for how to achieve it.
We had lively debate about the long-term cost of cloud storage, in particular of public clouds; since this debate depends on assumptions about future costs, only the future will ultimately resolve it. We also discussed at length the importance of logical preservation and whether the modern world, with its readily available open-source viewers, has made the need for logical preservation obsolete.
Several small working groups coalesced around the following areas: archival exit (how to get data out of an archive), the technical design of preservation-as-a-service (PaaS), technologies for ensuring that data is "forgotten", and searching distributed archives. We hope to see these small groups evolve into productive collaborations that continue the work begun at the seminar.
Finally, there were a number of areas related to using the cloud as a preservation service that we were unable to address. For example, what legal issues arise if companies undertake digital archival initiatives? Is there a legal definition of "deletion" of data, and is it practical? Where does "records management" end and "archival" begin? Who is the customer for long-term preservation: the data provider, or perhaps the data consumer? What happens to archived data if payment cannot be made? What is the economic model behind long-term archival? These and other questions provide ample opportunity for further workshops on this topic.
The workshop was organized around a series of 90-minute sessions, each of which began with one or more short presentations followed by a moderated discussion. One person acted as scribe for each session, and the session moderators wrote the session summaries that appear in this report. We also devoted one session to smaller breakout groups, which reported back in our closing session.
- Ian F. Adams (University of California - Santa Cruz, US)
- Jean Bacon (University of Cambridge, GB)
- Mary Baker (HP Labs - Palo Alto, US)
- Christoph Becker (TU Wien, AT)
- André Brinkmann (Universität Mainz, DE)
- Nikos Chondros (University of Athens, GR)
- Milena Dobreva (University of Malta, MT)
- Erik Elmroth (University of Umeå, SE)
- Michael Factor (IBM - Haifa, IL)
- Sam Fineberg (HP Storage CT Office - Fremont, US)
- David Giaretta (APA, Dorset, GB)
- Matthias Grawinkel (Universität Mainz, DE)
- Alexandru Iosup (TU Delft, NL)
- Ross King (AIT Austrian Institute of Technology - Wien, AT)
- Hillel Kolodner (IBM - Haifa, IL)
- Ewnetu Bayuh Lakew (University of Umeå, SE)
- Natasa Milic-Frayling (Microsoft Research UK - Cambridge, GB)
- Ethan Miller (University of California - Santa Cruz, US)
- Dirk Nitschke (Oracle - Herndon, US)
- Gillian Oliver (Victoria University - Wellington, NZ)
- Peter R. Pietzuch (Imperial College London, GB)
- David S. H. Rosenthal (Stanford University Libraries, US)
- Raivo Ruusalepp (National Library of Estonia - Tallinn, EE)
- Gerhard Schneider (Universität Freiburg, DE)
- Margo Seltzer (Harvard University - Cambridge, US)
- Liuba Shrira (Brandeis University - Waltham, US)
- Joanne Syben (Google Inc. - Mountain View, US)
- Lawrence You (Google Inc. - Mountain View, US)
- data bases / information retrieval
- society / human-computer interaction
- Long-term preservation
- cloud storage
- data access
- storage systems