Dagstuhl Seminar 23211

Scalable Data Structures

(May 21 – May 26, 2023)


Permalink
Please use the following short URL to reference this page: https://www.dagstuhl.de/23211

Organizers
  • Gerth Stølting Brodal
  • John Iacono
  • László Kozma
  • Vijaya Ramachandran

Summary

About the seminar

The scale at which data is generated and processed is increasing unabated and novel applications continue to arise, posing new challenges for data structure design and analysis. The performance of data structures can dramatically affect the overall efficiency of computing systems, motivating research on scalable data structures along the entire spectrum from purely theoretical to purely empirical.

The focus of data structure research has continuously shifted to better align with the changing realities of the underlying hardware (e.g. by refining computational models to capture memory hierarchies and parallelism) and the requirements of applications (e.g. by finding the right input models, supporting novel modes of operation, and addressing special requirements such as data privacy or adapting to side information). Suitable data structuring abstractions have often been crucial components of algorithmic breakthroughs, for instance in static or dynamic graph algorithms, e.g. for maximum flow or minimum spanning trees. Research on classical problems and long-standing open questions also continues, with surprising recent improvements, e.g. for list labeling.

Data structure research has been a core part of computer science from the beginning, and the field appears as vibrant as ever, with research continuing on deep old questions as well as on new directions reflecting the changing realities of the computational landscape. This seminar was the 15th in a series of loosely related Dagstuhl Seminars on data structures, bringing together researchers from several research directions to illuminate various aspects of, and solutions to, the problem of data structure scalability. Following the previous, purely virtual meeting, the seminar was organized as a fully on-site event.

Topics

The presentations covered advances in classic data structure fields as well as insights addressing the scalability of computing in different models of computation and across a diverse range of applications. In the following, we refer to the relevant sections of the full report.

Parallelism was an important theme of the seminar. Blelloch (Section 4.1) discussed possible ways of making parallelism a core part of a computer science curriculum, Agrawal (Section 4.12) talked about incorporating data structures in parallel algorithms, and Sun (Section 4.5) presented algorithms for Longest Increasing Subsequence, building on parallel data structures.
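
For readers less familiar with the problem, the following is a standard sequential O(n log n) computation of the length of a longest increasing subsequence (LIS). It is a generic illustration of the problem only and does not reproduce the parallel algorithms from the talk.

```python
import bisect

def lis_length(seq):
    """Length of a longest strictly increasing subsequence of seq.

    Classic O(n log n) approach: tails[k] holds the smallest possible
    tail of an increasing subsequence of length k + 1 seen so far.
    """
    tails = []
    for x in seq:
        i = bisect.bisect_left(tails, x)   # first tail >= x
        if i == len(tails):
            tails.append(x)                # extends the longest subsequence
        else:
            tails[i] = x                   # smaller tail for length i + 1
    return len(tails)

print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))  # 4, e.g. 1, 4, 5, 9
```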

Classic questions of data structure design and analysis in the comparison- or pointer-based models were the topic of multiple talks. Tarjan (Section 4.17) discussed various self-adjusting heaps and new results on their amortized efficiency. Munro (Section 4.24) talked about the ordered majority problem. Sorting was the subject of a series of talks: Wild (Section 4.3) talked about new, practically efficient sorting algorithms used in Python. Nebel (Section 4.6) talked about the practical efficiency of Lomuto’s QuickSort variant. Pettie (Section 4.21) studied efficiently sorting inputs with pattern-avoiding properties, and Jacob (Section 4.27) studied variants of the sorting problem with priced comparisons.
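
As background on the partition scheme named above, the following is a textbook quicksort using Lomuto's single-pointer partition with the last element as pivot. It is a generic sketch, not the specific variant or analysis presented in the talk.

```python
def lomuto_partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo  # a[lo..i-1] holds the elements smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = lomuto_partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)

data = [5, 2, 9, 1, 5, 6]
quicksort(data)
print(data)  # [1, 2, 5, 5, 6, 9]
```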

Several talks reflected the central importance of hashing in scalable data structures, giving a broad picture of modern developments in this area. Conway (Section 4.2) presented Iceberg Hashing and Mosaic Pages. Sanders (Section 4.7) presented Sliding Block Hashing. Farach-Colton (Section 4.9) considered a simplified hash table design with strong guarantees. Bercea (Section 4.18) presented a data structure for incremental stable perfect hashing, Johnson (Section 4.22) presented Maplets, and Even (Section 4.29) talked about dynamic filters with one memory access.

The very recent O(log^{3/2} n)-time online list labeling result was presented by Wein (Section 4.28) and results for the online list labeling problem with machine learning advice were presented by Singh (Section 4.8).
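
As background on the problem (the recent algorithm itself is not reproduced here), online list labeling asks to assign each item of a dynamically growing ordered list an integer label from a bounded universe so that labels respect the list order, while relabeling as few items as possible per insertion. The following naive sketch, assuming an illustrative universe size m, relabels everything whenever no gap is available; the cited results achieve far fewer relabelings.

```python
def insert(labeled, item, m=1 << 20):
    """Insert `item` into `labeled`, a list of (label, item) pairs kept sorted
    by item, assigning integer labels in [0, m) that respect the item order.

    Naive strategy: take the midpoint of the neighbouring labels if a gap
    exists, otherwise relabel every item evenly (assumes m is much larger
    than the number of items).
    """
    pos = 0
    while pos < len(labeled) and labeled[pos][1] < item:
        pos += 1
    left = labeled[pos - 1][0] if pos > 0 else -1
    right = labeled[pos][0] if pos < len(labeled) else m
    if right - left > 1:
        labeled.insert(pos, ((left + right) // 2, item))  # a free label exists
    else:
        items = [it for _, it in labeled]                 # no gap: rebuild all labels
        items.insert(pos, item)
        labeled[:] = [((i + 1) * m // (len(items) + 1), it)
                      for i, it in enumerate(items)]

lst = []
for x in [50, 10, 30, 20, 40]:
    insert(lst, x)
print(lst)  # (label, item) pairs, labels strictly increasing with item order
```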

Static and dynamic graph algorithms were the topic of several presentations. Kyng (Section 4.12) talked about dynamic spanners and data structuring problems arising in the context of minimum cost flow. King (Section 4.13) presented algorithms for dynamic connectivity and Rotenberg (Section 4.15) presented dynamic graph algorithms that are adaptive to sparsity of the input. For data structures in string problems, Gørtz (Section 4.20) discussed regular expression matching and Kopelowitz (Section 4.16) presented speedups of the dynamic program for the Dyck edit distance problem.

Multiple talks reflected the interplay between data structures and computational geometry: Dallant (Section 4.30) spoke about conditional lower bounds for dynamic geometric problems, Oh (Section 4.25) presented an algorithm for the planar disjoint paths problem, and Arseneva (Section 4.23) talked about morphing graph drawings, including results obtained jointly with Oh during the seminar.

Possible computational models for designing algorithms for GPUs were addressed by Sitchinava (Section 4.19) and models of in-memory processing were presented by Silvestri (Section 4.26).

Phillips (Section 4.10) presented the design and analysis of a large-scale stream monitoring system. Liu (Section 4.4) discussed the role of scalable data structures in the context of differential privacy for graph data. Xu (Section 4.11) presented a search-optimized layout for packed memory arrays.

Final thoughts

The organizers would like to thank the Dagstuhl team for their continuous support and also thank all participants for their contributions to this seminar. Following the earlier virtual seminar, the current (15th) seminar was fully on-site. The opportunity to personally meet and interact was highly appreciated by the community, as reflected by a very strong response to the first round of invitations and subsequent positive feedback. The seminar fills a unique need in bringing together data structures researchers from around the world and facilitating collaboration and exchange of ideas between them.

Earlier seminars in the series had few female participants. An important focus of both the previous and the current seminar was to significantly increase female attendance. In the current seminar, 48% of the invited participants were female, resulting in 37% female attendance. Another important focus of the seminar was to encourage interaction between senior and early-career researchers, the latter comprising 27% of the invited participants and 32% of the eventual attendees.

In the post-seminar survey, the diversity of junior/senior and female/male participants was appreciated, with respondents also drawing attention to the importance of tactful and clear communication on these matters. Respondents further praised the coverage of a diverse range of research topics, as well as the mix between theoreticians and more applied researchers.

Copyright Gerth Stølting Brodal, John Iacono, László Kozma, and Vijaya Ramachandran

Motivation

Data structures enable the efficient storage and retrieval of data in a variety of settings. Data structures are central to almost all of computing and have been an object of study since the beginning of the discipline of computer science. For a given abstract data type and a certain model of computation, speed and small space are the usual aims of data structures research. Due to the unprecedented scale at which data structures are deployed in applications, even small performance gains can have a significant effect. Thus, the study of data structures remains a very active area of research with unresolved old problems, new objects of interest, new techniques for analysis, and an interplay between theoretical data structure design, developments in computer hardware, and the needs of modern applications.

Data structures are not designed in isolation from the computing devices that they run on. The ultimate goal is to design data structures that are fast in practice. As hardware advances, models of computation must also evolve. The earliest and simplest models included the random access machine (RAM) and comparison models. Later models attempted to more accurately capture various hardware effects; these include the external memory (DAM) model, the cache-oblivious model of multi-level memory systems, and the streaming model. A compelling goal of data structure research is to develop algorithms that are optimal for several computational models at the same time, and as a counterpoint, to study inherent trade-offs between the cost measures of the different models, e.g., query time vs. space usage, read time vs. write time, computational time vs. block transfers to/from external memory, or optimality in the external memory model vs. optimality in the cache-oblivious model.
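
To make the external-memory (DAM) cost measure mentioned above concrete, the following back-of-the-envelope calculation, under assumed illustrative parameters (10^9 eight-byte keys, 4 KiB blocks), compares the block transfers of a linear scan (about N/B) with those of a B-tree-style search (about log_B N).

```python
import math

# Assumed illustrative parameters: one billion 8-byte keys, 4 KiB disk blocks.
N = 10**9       # number of keys
B = 4096 // 8   # keys per block transfer

scan_ios = math.ceil(N / B)              # linear scan: ~N/B block transfers
search_ios = math.ceil(math.log(N, B))   # B-tree-style search: ~ceil(log_B N)

print(scan_ios, search_ios)  # 1953125 4
```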

Parallel computation is crucial to the scalability of data structures and has attained particular importance as single processor hardware is no longer prevalent, even on mobile devices. Modern parallelism takes many forms, from several cores on a modern CPU to several machines on a network, to specialized hardware. Each of these settings requires us to revisit classical models of parallelism such as the PRAM, fork-join, and bulk synchronous parallel models, in order to evaluate how well they apply and then to develop effective methods. GPUs present their own challenges, as they are often the most powerful computing unit in a modern machine, yet have a particular architecture that is a vestige of their origins in graphics processing and defies simple modeling.
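
As a minimal illustration of the fork-join pattern referred to above, the following sketch forks one task per chunk of the input and joins their partial sums. It uses Python's concurrent.futures purely to show the structure and is not meant as an efficient parallel implementation.

```python
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(chunk):
    return sum(chunk)

def fork_join_sum(data, workers=4):
    """Fork one task per chunk of `data`, then join the partial sums."""
    step = (len(data) + workers - 1) // workers
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(chunk_sum, chunks))  # fork: one task per chunk
    return sum(partial_sums)                              # join: combine the results

if __name__ == "__main__":
    print(fork_join_sum(list(range(1_000_000))))  # 499999500000
```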

Dagstuhl Seminars on data structures have been held roughly biennially since 1991. These seminars have played a very important role in moving a community that was focused on theoretical analysis to implementation, experimentation, and deployment. This Dagstuhl Seminar will, in addition to the “classical” data structuring topics, place special emphasis on spawning new research within data structures for multithreaded, bulk-synchronous, GPU, and emerging computing paradigms such as tensor cores.

Copyright Gerth Stølting Brodal, John Iacono, László Kozma, and Vijaya Ramachandran

Participants

Related Seminars
  • Dagstuhl Seminar 9145: Data Structures (1991-11-04 - 1991-11-08) (Details)
  • Dagstuhl Seminar 9409: Data Structures (1994-02-28 - 1994-03-04) (Details)
  • Dagstuhl Seminar 9609: Data Structures (1996-02-26 - 1996-03-01) (Details)
  • Dagstuhl Seminar 98091: Data Structures (1998-03-02 - 1998-03-06) (Details)
  • Dagstuhl Seminar 00091: Data Structures (2000-02-27 - 2000-03-03) (Details)
  • Dagstuhl Seminar 02091: Data Structures (2002-02-24 - 2002-03-01) (Details)
  • Dagstuhl Seminar 04091: Data Structures (2004-02-22 - 2004-02-27) (Details)
  • Dagstuhl Seminar 06091: Data Structures (2006-02-26 - 2006-03-03) (Details)
  • Dagstuhl Seminar 08081: Data Structures (2008-02-17 - 2008-02-22) (Details)
  • Dagstuhl Seminar 10091: Data Structures (2010-02-28 - 2010-03-05) (Details)
  • Dagstuhl Seminar 14091: Data Structures and Advanced Models of Computation on Big Data (2014-02-23 - 2014-02-28) (Details)
  • Dagstuhl Seminar 16101: Data Structures and Advanced Models of Computation on Big Data (2016-03-06 - 2016-03-11) (Details)
  • Dagstuhl Seminar 19051: Data Structures for the Cloud and External Memory Data (2019-01-27 - 2019-02-01) (Details)
  • Dagstuhl Seminar 21071: Scalable Data Structures (2021-02-14 - 2021-02-19) (Details)
  • Dagstuhl Seminar 25191: Adaptive and Scalable Data Structures (2025-05-04 - 2025-05-09) (Details)

Classification
  • Data Structures and Algorithms
  • Information Retrieval

Keywords
  • Data structures
  • algorithms
  • computational models
  • big data
  • parallel computation