http://www.dagstuhl.de/19051

January 27 – February 1 , 2019, Dagstuhl Seminar 19051

Data Structures for the Cloud and External Memory Data

Organizers

Gerth Stølting Brodal (Aarhus University, DK)
Ulrich Carsten Meyer (Goethe-Universität – Frankfurt a. M., DE)
Markus E. Nebel (Universität Bielefeld, DE)
Robert Sedgewick (Princeton University, US)

For support, please contact

Susanne Bach-Bernhard for administrative matters

Andreas Dolzmann for scientific matters

Motivation

Computing is about the processing, exchanging, handling and storage of information. The way information is stored profoundly influences the efficiency of operations over the data. Research in data structures attempts to provide ways of storing and manipulating data and information that are appropriate for the computational model at hand: A setting where data are to be stored in internal random access memory requires data structuring methods different from settings where data are to be stored on sequentially accessible tapes, or on a multiple cache-external disk memory hierarchy with highly varying access latency. Still other settings have data distributed over a network of nodes, involve large scale parallel processing, or just utilize the very small scale parallelism provided by the simultaneous processing of all the bits of one computer word.

In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances. Moreover, almost every new computing or storage technology poses new data structuring problems. This Dagstuhl Seminar will address besides the “classical” data structuring topics the following issues:

  • Distributed Data Structures for Cloud Services. There is a set of established programming models (like, e.g., Google’s MapReduce and Apache’s hadoop) and distributed data structures (like, e.g., Google’s BigTable) that serve as building blocks of many large-scale cloud services. They are designed to scale across thousands of machines in a way that allows to add additional servers without the need of reconfiguration. Due to hardware failures and latency issues components have varying and hardly predictable responsiveness and it is one of the challenges to construct a large scale responsive cloud services out of such parts. From a more abstract point of view, the following two subjects are of importance when it comes to the design of new distributed data structures for large scale applications: Problem decomposition and fault tolerance.
  • Data Structures for Modern Memory Hierarchies. The amount of data processed by typical cloud services usually exceeds the size of DRAM available within all servers even on large clusters. As a matter of fact, lots of MapReduce operations in practical applications use B-tree files as input. The subject addressed under this headline is best explained by looking at the B-tree itself, which was introduced by Bayer and McCreight in 1972. It is still in use to index data that is too big to fit in memory and thus is still at the core of database and file system design. However, B-trees somehow are designed for hardware from the 70s. On disks of that era, there was a relatively small gap between sequential and random I/Os. But bandwidth increases as the square root of the capacity, and so it has enjoyed the same type of exponential growth as capacity has, though of course with twice the doubling time. In order to address the resulting problematic aspect of the B-tree performance profile, specifically the insertion bottleneck, write optimized data structures like, e.g., buffered repository trees and Bε-trees have been developed. The best of these can match a B-tree on its best operations (searches and sequential insertions) while outperforming them by orders of magnitude on their worst operations (random insertions). These raise questions to be addressed during our seminar: What is the right way to achieve concurrency in these high-insertion dictionaries? What is the right way to implement ACID (Atomicity, Consistency, Isolation, Durability – a set of properties of database transactions)? What is the right analytical framework for these data structures?

License
  Creative Commons BY 3.0 DE
  Gerth Stølting Brodal, Ulrich Carsten Meyer, Markus E. Nebel, and Robert Sedgewick

Dagstuhl Seminar Series

Classification

  • Data Structures / Algorithms / Complexity

Keywords

  • Data structures
  • Algorithms
  • Cloud computing
  • Large data sets
  • External memory methods
  • Big data
  • Web-scale

Book exhibition

Books from the participants of the current Seminar 

Book exhibition in the library, ground floor, during the seminar week.

Documentation

In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 

Download overview leaflet (PDF).

Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication was published as a result from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.

NSF young researcher support