https://www.dagstuhl.de/19051

January 27 – February 1, 2019, Dagstuhl Seminar 19051

Data Structures for the Cloud and External Memory Data

Organizers

Gerth Stølting Brodal (Aarhus University, DK)
Ulrich Carsten Meyer (Goethe-Universität – Frankfurt a. M., DE)
Markus E. Nebel (Universität Bielefeld, DE)
Robert Sedgewick (Princeton University, US)

Documents

Dagstuhl Report, Volume 9, Issue 1

Summary

About the seminar

Data structures provide ways of storing and manipulating data and information that are appropriate for the computational model at hand. Every such model relies on assumptions that we have to keep questioning. The aim of this seminar was to exchange ideas for new algorithms and data structures, and to discuss our models of computation in light of recent technological advances. This Dagstuhl seminar was the 13th in a series of loosely related Dagstuhl seminars on data structures.

Topics

The presentations covered advances in classic fields as well as new problems and insights arising from recent trends in computing. In particular, Johnson (Section 3.12) and Muth (Section 3.17) reported on models and research opportunities in the cloud and external memory, motivated by practical demands.

A number of talks highlighted technical challenges in storing and processing large datasets: Bast (Section 3.2) demonstrated the knowledge database QLever and discussed its algorithmic aspects. Distributed frameworks were presented by Bingmann (Section 3.4), who reported on the progress of Thrill with a focus on parallel external sorting, and by Yan (Section 3.32), who introduced G-thinker. Farach-Colton (Section 3.7) analyzed the slowdown that updates cause in various filesystems over time. Owens (Section 3.19) discussed the intricacies of GPUs and presented efficient and practical data structures for this hardware.

In order to mitigate the impact of huge datasets, streaming and online algorithms were considered. Martínez (Section 3.15) discussed Affirmative Sampling, which takes uniform samples of a stream and adapts the sample size to the stream's diversity. Sedgewick (Section 3.26) revisited the cardinality estimation problem and proposed the HyperBitBit algorithm. The online matching of requests to resources was covered by Raghvendra (Section 3.22). Similarly, Mehlhorn (Section 3.16) presented a solution for assigning indivisible resources that approximately optimizes social welfare.
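
As background on the cardinality estimation problem: the task is to approximate the number of distinct items in a stream using memory far smaller than the stream itself. The sketch below is a minimal LogLog-style estimator in Python; the hash function, register count, and bias-correction constant are standard textbook choices, not details of HyperBitBit, which is a more refined member of the same algorithm family.

  import hashlib

  def estimate_distinct(stream, m=64):
      """LogLog-style estimate of the number of distinct items in a
      stream, using m small registers instead of storing the items.
      Illustrative only; HyperLogLog and HyperBitBit refine this idea."""
      ALPHA = 0.39701                    # LogLog bias-correction constant
      max_rank = [0] * m
      for item in stream:
          # Duplicates hash identically, so they never change a register;
          # the estimate therefore depends only on the distinct items.
          h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
          reg, w = h % m, h // m         # register index, remaining hash bits
          rank = (w & -w).bit_length()   # 1 + number of trailing zero bits
          max_rank[reg] = max(max_rank[reg], rank)
      return ALPHA * m * 2 ** (sum(max_rank) / m)

  print(estimate_distinct(range(100000)))  # roughly 100000, from 64 registers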

Nebel (Section 3.18) and Wild (Section 3.31) proposed and analyzed tree-based data structures. Additionally, various aspects of more general graph processing were covered, ranging from graph enumeration (Lumbroso, Section 3.14) and random sampling (Penschuck, Section 3.20), through representations for k-connectivity (Pettie, Section 3.21), to the detection of substructures (Silvestri, Section 3.28, and Tarjan, Section 3.29).

Several talks focused on complexity results: Fagerberg (Section 3.23) presented new lower bounds on the reorganization cost of B-trees, while Thankachan (Section 3.30) gave hardness results on the recognizability of Wheeler graphs. Kopelowitz (Section 3.13) considered the complexity of data structures for the set-disjointness problem. Emphasizing cloud-related security concerns, Jacob (Section 3.11) showed that a range of simple data structures must incur an Ω(log n) overhead if one wants to prevent information leakage via their access patterns.

Problems involving large text corpora were considered by Fischer (Section 3.8), who presented an external-memory bidirectional compression scheme, by Golin (Section 3.9), who discussed AIFV codes, and by Salinger (Section 3.24), who analyzed persistent full-text indices for versioned documents.

Data structures using hashing were examined by Conway (Section 3.5), Dietzfelbinger (Section 3.6), Even and Sanders (Section 3.25). Bender (Section 3.3) discussed variants of Bloom filters that adapt based on past queries.
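
To make the setting concrete, the following is a minimal static Bloom filter sketch in Python; the array size, number of hash functions, and double-hashing scheme are assumptions of this sketch rather than details from the talks. An adaptive filter in the sense of Bender's talk would additionally adjust its state when false positives are detected.

  import hashlib

  class BloomFilter:
      """Minimal Bloom filter: each key sets/tests k bits in an m-bit
      array. Lookups may return false positives, never false negatives."""

      def __init__(self, m=8192, k=5):
          self.m, self.k = m, k
          self.bits = bytearray((m + 7) // 8)

      def _positions(self, key):
          # Derive k bit positions from one SHA-256 digest via double
          # hashing; the odd step h2 visits all residues because m is a
          # power of two.
          d = hashlib.sha256(key.encode()).digest()
          h1 = int.from_bytes(d[:8], "big")
          h2 = int.from_bytes(d[8:16], "big") | 1
          return [(h1 + i * h2) % self.m for i in range(self.k)]

      def add(self, key):
          for p in self._positions(key):
              self.bits[p >> 3] |= 1 << (p & 7)

      def __contains__(self, key):
          return all(self.bits[p >> 3] & (1 << (p & 7))
                     for p in self._positions(key))

  f = BloomFilter()
  f.add("cloud")
  print("cloud" in f)  # True
  print("edge" in f)   # False, up to a small false-positive probability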

Afshani (Section 3.1) presented Fragile Complexity, a novel model of computation with an element-centric cost function (the cost of an algorithm is the maximum number of comparisons any single element takes part in), and gave bounds for various classical problems. Iacono (Section 3.10) proposed to model locality of reference more explicitly and compared his proposal to the external-memory and cache-oblivious models. Sen (Section 3.27) proposed the novel paradigm HAIbrid, which augments classic data structures with artificial intelligence.

Final Thoughts

The organizers would like to thank the Dagstuhl team for their continuous support; the welcoming atmosphere made the seminar both highly productive and enjoyable. They also thank all participants for their contributions to this seminar.

License
  Creative Commons BY 3.0 Unported license
  Gerth Stølting Brodal, Ulrich Carsten Meyer, Markus E. Nebel, and Robert Sedgewick

Classification

  • Data Structures / Algorithms / Complexity

Keywords

  • Data structures
  • Algorithms
  • Cloud computing
  • Large data sets
  • External memory methods
  • Big data
  • Web-scale

Documentation

Each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented in the series Dagstuhl Reports. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication appears as a result of your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.
