https://www.dagstuhl.de/21071

### February 14 – 19 , 2021, Dagstuhl Seminar 21071

# Scalable Data Structures

## Organizers

Gerth Stølting Brodal (Aarhus University, DK)

John Iacono (UL – Brussels, BE)

Markus E. Nebel (Universität Bielefeld, DE)

Vijaya Ramachandran (University of Texas – Austin, US)

## For support, please contact

## Documents

Dagstuhl Report, Volume 11, Issue 1

Aims & Scope

List of Participants

Dagstuhl Seminar Schedule [pdf]

## Summary

### About the seminar

Scalable data structures form the backbone for computing: Computing is about processing, exchanging, and storing data. The organization of data profoundly influences the performance of accessing and manipulating data. By optimizing the way data is stored, performance can be improved by several orders of magnitude when data scales. This Dagstuhl seminar brought together researchers from several research directions to illuminate solutions to the scalability challenge of data structures. The seminar was the 14th in a series of loosely related Dagstuhl seminars on data structures. Due to the ongoing Covid-19 pandemic the seminar was purely virtual.

### Topics

The presentations covered both advances in classic data structure fields, as well as insights addressing the scalability of computing for different models of computation.

Classic data structure questions on dictionaries, hashing, filters, and heaps were the topic of several talks. Wild (Section 4.8) considered new pointer based search trees, Kozma (Section 4.24) discussed algorithms related to self-adjusting trees and heaps, Bercea (Section 4.4) presented results for dictionaries and filters, Even (Section 4.21) discussed dynamic stable perfect hashing, and Farach-Colton (Section 4.2) presented results for succinct stable hash tables. Johnson (Section 4.25) presented the vector quotient filter based on Robin Hood hashing and designed to exploit SIMD instructions. An external memory dictionary was presented by Conway (Section 4.20) who presented the SplinterDB key-value store for NVMe solid state drives.

For data structure problems on strings, Gørtz (Section 4.9) discussed the support for random access in compact representations of strings, and Starikovskaya (Section 4.22) dictionary look-ups with mismatches.

Data structures for storing and querying static and dynamic graphs were the topic of a sequence of talks. Pettie (Section 4.1) considered trade-offs between space usage and query time for supporting exact distance queries in planar graphs. Rotenberg (Section 4.5) considered planarity testing of dynamic graphs under the insertion and deletion of edges with polylogarithmic update time. Kopelowitz (Section 4.6) considered maintaining the orientation of edges in dynamic forests under the insertion of edges, guaranteeing low out degree of all nodes. Henzinger (Section 4.16) presented an algorithm for maintaining a (Δ+1)-vertex coloring of a graph with maximal degree Δ with constant time edge insertions and deletions. Bast (Section 4.27) gave a demonstration of an implementation of algorithms for real-time searching knowledge graphs with billions of edges.

Parallel algorithms for problems on graphs were addressed in multiple talks. Liu (Section 4.10) considered a parallel algorithm for counting triangles (cliques of size three) in graphs under batched updates of edge insertions and deletions, and Blelloch (Section 4.15) considered parallel batched dynamic algorithms for the minimum spanning tree and minimum cut problems. Sun (Section 4.13) considered a parallel algorithm for the single source shortest path problem using lazy batched priority queues. Shun (Section 4.3) considered a parallel index-based algorithm for graph clustering and an approximation algorithm using locality-sensitive hashing.

Computational models supporting massive parallelism, like GPUs and TCUs, were addressed in talks by Owens (Section 4.11) who considered open-addressing hashing on GPUs, by Geil (Section 4.18) who consider how to solve the maximum clique problem on GPUs, and by Silvestri (Section 4.14) who addressed similarity search with tensor core units. Sitchinava & Jacob (Section 4.19) in their joint talk, considered the power of the atomic and non-atomic versions of the parallel fork-join model. Sanders (Section 4.12) considered how to execute MapReduce computations robustly and efficiently on realistic distributed-memory parallel machines.

Ellen (Section 4.17) considered labelling schemes for networks supporting distributed deterministic radio broadcast using labels of constant-length at the nodes of the network. Lincoln (Section 4.23) presented new techniques for proving fine-grained average-case hardness results, and Fagerberg (Section 4.26) considered the fragile complexity of adaptive algorithms. Finally, Driemel (Section 4.7) considered approximate-near-neighbor data structures for time series under the continuous Fréchet distance.

### Final Thoughts

The organizers would like to thank the Dagstuhl team for their continuous support and allowing this seminar to happen as a purely virtual Dagstuhl seminar. They also thank all participants for their contributions to this seminar.

Previous seminars in the series had few female participants. A focus for this seminar was to significantly increase the female attendance. 50% of the invited participants were female, resulting in a 38% female attendance.

Even though the seminar was challenged by the different time zones of the participants, on average 37 of the 48 participants participants attended the talks, and all talks were attended by at least 30 partcipants. In the post-seminar survey it was appreciated that the seminar took place as a virtual seminar instead of being cancelled, but it was also stated that the virtual format can never be as productive as an in-person seminar and showed how much we should appreciate the possibilities Dagstuhl offers under regular circumstances.

**Summary text license**

Creative Commons BY 4.0

Gerth Stølting Brodal, John Iacono, Markus E. Nebel, and Vijaya Ramachandran

## Dagstuhl Seminar Series

- 23211: "Scalable Data Structures" (2023)
- 19051: "Data Structures for the Cloud and External Memory Data" (2019)
- 16101: "Data Structures and Advanced Models of Computation on Big Data" (2016)
- 14091: "Data Structures and Advanced Models of Computation on Big Data" (2014)
- 10091: "Data Structures" (2010)
- 08081: "Data Structures" (2008)
- 06091: "Data Structures " (2006)
- 04091: "Data Structures" (2004)
- 02091: "Data Structures" (2002)
- 00091: "Data Structures" (2000)
- 98091: "Data Structures" (1998)
- 9609: "Data Structures" (1996)
- 9409: "Data Structures" (1994)
- 9145: "Data Structures" (1991)

## Classification

- Data Bases / Information Retrieval
- Data Structures / Algorithms / Complexity

## Keywords

- Data structures
- Algorithms
- Computational models
- Big data
- Parallel computation