TOP
Search the Dagstuhl Website
Looking for information on the websites of the individual seminars? - Then please:
Not found what you are looking for? - Some of our services have separate websites, each with its own search option. Please check the following list:
Schloss Dagstuhl - LZI - Logo
Schloss Dagstuhl Services
Seminars
Within this website:
External resources:
  • DOOR (for registering your stay at Dagstuhl)
  • DOSA (for proposing future Dagstuhl Seminars or Dagstuhl Perspectives Workshops)
Publishing
Within this website:
External resources:
dblp
Within this website:
External resources:
  • the dblp Computer Science Bibliography


Dagstuhl Seminar 19111

Theoretical Foundations of Storage Systems

( Mar 10 – Mar 15, 2019 )

(Click in the middle of the image to enlarge)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/19111

Organizers

Contact


Motivation

Storage systems, including databases and file systems, are at the heart of all large data applications. Recently, some storage systems have incorporated theoretical advances in data organization techniques, with substantial improvements in performance. However, there is little contact between the systems designers who build storage systems and theoreticians who design new ways to organize data. This Seminar aims to bridge that gap, to the benefit of both communities and to improve the design of all storage systems.

External-memory algorithms are those where the data is too large to fit in memory, and hence needs to be stored on disk and accessed using I/O. Algorithmic analysis of such algorithms therefore focuses on the number of I/Os needed to complete a computation, rather than the number of instructions. This is because an I/O can be many orders of magnitude slower than a machine instruction and therefore I/Os can be the bottleneck in such computations.

The theoretical analysis of such external-memory algorithms has produced many exciting results in the last two decades. Many of these are directly relevant to practical applications, but only a few have made the leap to deployment. This Seminar aims to bring together theoreticians, who have an extensive understanding of the state of the art in external memory data structures, and storage systems researchers and practitioners, who understand the details of the problems that need to be solved.

Specific questions that would be addressed:

  • How can we use the huge improvements in string data structures to improve storage systems that manipulate strings? Many data structures, such as LSMs and Bε-trees, rely heavily on the assumption that keys are indivisible and small.
  • How can we use new multi-dimensional data indexes in working systems?
  • Many indexes suffer from fragmentation. Are there data structure improvements that would allow efficient storage on disks that are nearly full? Currently, disks are kept only a fraction full because the performance of existing data structures decays dramatically as the disk fills. This suggests another problem:
  • How can theoretical models be improved to capture such issues as:
    • full disk: The external memory model, the disk is of unbounded size.
    • sequential access: Both hard disks and SSDs require very large blocks of sequential I/O to capture a large fraction of bandwidth. The external-memory model assumes that disks are random access.
  • Concurrency: Can data structures be designed to exploit memory locality on disk while maintaining concurrency in RAM?

The storage system world is in a ferment as new hardware becomes available. Now is the time to establish deep partnerships across disciplines in computer science to solve some of the most pressing big data infrastructure problems.

Copyright Martin Farach-Colton, Inge Li Gørtz, Rob Johnson, and Donald E. Porter

Summary

Storage systems, including databases and file systems, are at the heart of all large data applications. Recently, some storage systems have incorporated theoretical advances in data organization techniques, with substantial improvements in performance. However, there is little contact between the systems designers who build storage systems and theoreticians who design new ways to organize data. This Seminar worked to bridge this gap, to the benefit of both communities and to improve the design of all storage systems.

External-memory algorithms are those where the data is too large to fit in memory, and hence needs to be stored on disk and accessed using I/O. Algorithmic analysis of such algorithms therefore focuses on the number of I/Os needed to complete a computation, rather than the number of instructions. This is because an I/O can be many orders of magnitude slower than a machine instruction and therefore I/Os can be the bottleneck in such computations.

The theoretical analysis of such external-memory algorithms has produced many exciting results in the last two decades. Many of these are directly relevant to practical applications, but only a few have made the leap to deployment. This Seminar brought together theoreticians, who have an extensive understanding of the state of the art in external-memory data structures, and storage systems researchers and practitioners, who understand the details of the problems that need to be solved.

Specific questions that were addressed in the workshop include the following:

  • How can we use the huge improvements in string data structures to improve storage systems that manipulate strings? Many data structures, such as LSMs and B^varepsilon-trees, rely heavily on the assumption that keys are indivisible and small.
  • How can we use new multi-dimensional data indexes in working systems?
  • Many indexes suffer from fragmentation. Are there data structure improvements that would allow efficient storage on disks that are nearly full? Currently, disks are kept only a fraction full because the performance of existing data structures decays dramatically as the disk fills. This suggests another problem:
  • How can theoretical models be improved to capture such issues as:
    • full disk: The external memory model, the disk is of unbounded size.
    • sequential access: Both hard disks and SSDs require very large blocks of sequential I/O to capture a large fraction of bandwidth. The external-memory model assumes that disks are random access.
  • Concurrency: Can data structures be designed to exploit memory locality on disk while maintaining concurrency in RAM?
Copyright Michael Bender, Martin Farach-Colton, Inge Li Gørtz, Rob Johnson, and Donald E. Porter

Participants
  • Michael A. Bender (Stony Brook University, US) [dblp]
  • Ioana Oriana Bercea (Tel Aviv University, IL) [dblp]
  • Jonathan Berry (Sandia National Labs - Albuquerque, US) [dblp]
  • Philip Bille (Technical University of Denmark - Lyngby, DK) [dblp]
  • Timo Bingmann (KIT - Karlsruher Institut für Technologie, DE) [dblp]
  • Alexander Conway (Rutgers University - Piscataway, US) [dblp]
  • Guy Even (Tel Aviv University, IL) [dblp]
  • Martin Farach-Colton (Rutgers University - Piscataway, US) [dblp]
  • Jeremy Fineman (Georgetown University - Washington, US) [dblp]
  • Johannes Fischer (TU Dortmund, DE) [dblp]
  • Pawel Gawrychowski (University of Wroclaw, PL) [dblp]
  • Seth Gilbert (National University of Singapore, SG) [dblp]
  • Inge Li Gørtz (Technical University of Denmark - Lyngby, DK) [dblp]
  • Magnús M. Halldórsson (Reykjavik University, IS) [dblp]
  • William Jannen (Williams College - Williamstown, US) [dblp]
  • Rob Johnson (VMware - Palo Alto, US) [dblp]
  • Tomasz Kociumaka (Univ. of Warsaw, PL & Bar-IIan Univ. Ramat-Gan, IL) [dblp]
  • Geoff Kuenning (Harvey Mudd College - Claremont, US) [dblp]
  • Bradley C. Kuszmaul (Oracle Labs - Burlington, US) [dblp]
  • William Kuszmaul (MIT - Cambridge, US) [dblp]
  • Simon Mauras (University Paris-Diderot, FR) [dblp]
  • Samuel McCauley (Bar-Ilan University - Ramat Gan, IL) [dblp]
  • Ulrich Carsten Meyer (Goethe-Universität - Frankfurt a. M., DE) [dblp]
  • Miguel A. Mosteiro (Pace University - New York, US) [dblp]
  • Ian Munro (University of Waterloo, CA) [dblp]
  • Sam H. Noh (Ulsan National Institute of Science and Technology, KR) [dblp]
  • Prashant Pandey (Carnegie Mellon University - Pittsburgh, US) [dblp]
  • Nikos Parotsidis (University of Copenhagen, DK) [dblp]
  • Cynthia A. Phillips (Sandia National Labs - Albuquerque, US) [dblp]
  • Solon Pissis (CWI - Amsterdam, NL) [dblp]
  • Donald E. Porter (University of North Carolina at Chapel Hill, US) [dblp]
  • Simon J. Puglisi (University of Helsinki, FI) [dblp]
  • Tom Ridge (University of Leicester, GB) [dblp]
  • Eva Rotenberg (Technical University of Denmark - Lyngby, DK) [dblp]
  • Siddhartha Sen (Microsoft - New York, US) [dblp]
  • Francesco Silvestri (University of Padova, IT) [dblp]
  • Shikha Singh (Wellesley College, US) [dblp]
  • Meng-Tsung Tsai (National Chiao-Tung University - Hsinchu, TW) [dblp]
  • Przemyslaw Uznanski (University of Wroclaw, PL) [dblp]
  • Janet Vorobyeva (Stony Brook University, US)
  • Gala Yadgar (Technion - Haifa, IL) [dblp]

Classification
  • data structures / algorithms / complexity
  • hardware
  • operating systems

Keywords
  • Storage Systems
  • External Memory Algorithms