http://www.dagstuhl.de/19051

27. Januar – 01. Februar 2019, Dagstuhl Seminar 19051

Data Structures for the Cloud and External Memory Data

Organisatoren

Gerth Stølting Brodal (Aarhus University, DK)
Ulrich Carsten Meyer (Goethe-Universität – Frankfurt a. M., DE)
Markus E. Nebel (Universität Bielefeld, DE)
Robert Sedgewick (Princeton University, US)

Auskunft zu diesem Dagstuhl Seminar erteilen

Susanne Bach-Bernhard zu administrativen Fragen

Andreas Dolzmann zu wissenschaftlichen Fragen

Motivation

Computing is about the processing, exchanging, handling and storage of information. The way information is stored profoundly influences the efficiency of operations over the data. Research in data structures attempts to provide ways of storing and manipulating data and information that are appropriate for the computational model at hand: A setting where data are to be stored in internal random access memory requires data structuring methods different from settings where data are to be stored on sequentially accessible tapes, or on a multiple cache-external disk memory hierarchy with highly varying access latency. Still other settings have data distributed over a network of nodes, involve large scale parallel processing, or just utilize the very small scale parallelism provided by the simultaneous processing of all the bits of one computer word.

In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances. Moreover, almost every new computing or storage technology poses new data structuring problems. This Dagstuhl Seminar will address besides the “classical” data structuring topics the following issues:

  • Distributed Data Structures for Cloud Services. There is a set of established programming models (like, e.g., Google’s MapReduce and Apache’s hadoop) and distributed data structures (like, e.g., Google’s BigTable) that serve as building blocks of many large-scale cloud services. They are designed to scale across thousands of machines in a way that allows to add additional servers without the need of reconfiguration. Due to hardware failures and latency issues components have varying and hardly predictable responsiveness and it is one of the challenges to construct a large scale responsive cloud services out of such parts. From a more abstract point of view, the following two subjects are of importance when it comes to the design of new distributed data structures for large scale applications: Problem decomposition and fault tolerance.
  • Data Structures for Modern Memory Hierarchies. The amount of data processed by typical cloud services usually exceeds the size of DRAM available within all servers even on large clusters. As a matter of fact, lots of MapReduce operations in practical applications use B-tree files as input. The subject addressed under this headline is best explained by looking at the B-tree itself, which was introduced by Bayer and McCreight in 1972. It is still in use to index data that is too big to fit in memory and thus is still at the core of database and file system design. However, B-trees somehow are designed for hardware from the 70s. On disks of that era, there was a relatively small gap between sequential and random I/Os. But bandwidth increases as the square root of the capacity, and so it has enjoyed the same type of exponential growth as capacity has, though of course with twice the doubling time. In order to address the resulting problematic aspect of the B-tree performance profile, specifically the insertion bottleneck, write optimized data structures like, e.g., buffered repository trees and Bε-trees have been developed. The best of these can match a B-tree on its best operations (searches and sequential insertions) while outperforming them by orders of magnitude on their worst operations (random insertions). These raise questions to be addressed during our seminar: What is the right way to achieve concurrency in these high-insertion dictionaries? What is the right way to implement ACID (Atomicity, Consistency, Isolation, Durability – a set of properties of database transactions)? What is the right analytical framework for these data structures?

License
  Creative Commons BY 3.0 DE
  Gerth Stølting Brodal, Ulrich Carsten Meyer, Markus E. Nebel, and Robert Sedgewick

Dagstuhl Seminar Series

Classification

  • Data Structures / Algorithms / Complexity

Keywords

  • Data structures
  • Algorithms
  • Cloud computing
  • Large data sets
  • External memory methods
  • Big data
  • Web-scale

Buchausstellung

Bücher der Teilnehmer 

Buchausstellung im Erdgeschoss der Bibliothek

(nur in der Veranstaltungswoche).

Dokumentation

In der Reihe Dagstuhl Reports werden alle Dagstuhl-Seminare und Dagstuhl-Perspektiven-Workshops dokumentiert. Die Organisatoren stellen zusammen mit dem Collector des Seminars einen Bericht zusammen, der die Beiträge der Autoren zusammenfasst und um eine Zusammenfassung ergänzt.

 

Download Übersichtsflyer (PDF).

Publikationen

Es besteht weiterhin die Möglichkeit, eine umfassende Kollektion begutachteter Arbeiten in der Reihe Dagstuhl Follow-Ups zu publizieren.

Dagstuhl's Impact

Bitte informieren Sie uns, wenn eine Veröffentlichung ausgehend von
Ihrem Seminar entsteht. Derartige Veröffentlichungen werden von uns in der Rubrik Dagstuhl's Impact separat aufgelistet  und im Erdgeschoss der Bibliothek präsentiert.