Dagstuhl Seminar 16101: Data Structures and Advanced Models of Computation on Big Data

Dagstuhl Seminar 16101

Data Structures and Advanced Models of Computation on Big Data

( Mar 06 – Mar 11, 2016 )

(Click in the middle of the image to enlarge)

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/16101

Organizers

Alejandro Lopez-Ortiz (University of Waterloo, CA)
Ulrich Carsten Meyer (Goethe-Universität - Frankfurt a. M., DE)
Markus E. Nebel (TU Kaiserslautern, DE)
Robert Sedgewick (Princeton University, US)

Contact

Andreas Dolzmann (for scientific matters)
Annette Beyer (for administrative matters)

Publications

Data Structures and Advanced Models of Computation on Big Data (Dagstuhl Seminar 16101). Alejandro Lopez-Ortiz, Ulrich Carsten Meyer, Markus E. Nebel, and Robert Sedgewick. In Dagstuhl Reports, Volume 6, Issue 3, pp. 1-23, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Impacts

Schedule

Schedule

Motivation

Show Motivation

Computing is about the processing, exchanging, handling and storage of information. The way information is stored profoundly inuences the efficiency of operations over the data. Research in data structures attempts to provide ways of storing and manipulating data and information that are appropriate for the computational model at hand: A setting where data are to be stored in internal random access memory requires data structuring methods different from settings where data are to be stored on sequentially accessible tapes, or on a multiple cache-external disk memory hierarchy with highly varying access latency. In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances. Moreover, almost every new computing or storage technology poses new data structuring problems. This seminar will address besides the "classical" data structuring topics the following issues:

Distributed Data Structures for Cloud Services. Many large scale cloud services are based on distributed data structures such as Google's BigTable that are designed to scale across hundreds or thousands of machines. Design goals for such data structures to be addressed at this seminar are the prevention of large latency with only a small computational overhead and the ability for the simple use of additional servers without the need of reconfiguration. Furthermore, the ever growing amount of data to be processed asks for compression; here a modernized view of information assimilating elements of space, time, structure, semantics, and context might be helpful to improve the data structures performance. Accordingly, one focus of this seminar will be on Information Theory of Data Structures; a starting point for the discussion at our workshop will be the search for measures and e_cient algorithms to appraise the amount of organization and structure embodied in artifacts and natural objects.
Memory Hierarchy Methods for CMPs. Processor speeds and external memory sizes have increased exponentially over the last decades. However the transfer speed between external and internal memory has not kept pace. The „von Neumann bottleneck“ is becoming a „von Neumann capillary“. Data structuring research has been addressing this problem and has worked on solutions that attempt to minimize the data transfer between external and internal memory and the various caches in between (see e.g. GI-Dagstuhl Seminar 02112 and Dagstuhl Seminar 14091). However, many data structuring problems still have unsatisfactory solutions and new challenges arise from multicore computers in which certain caches are shared. Here, new models are needed to guide the design of I/O efficient data structures and algorithms. Our seminar will start a discussion on how such models should look like.
Energy-Aware Data Structure Design. Reducing the energy consumption of computations has become an important issue both for large scale computing and for small mobile devices. Clearly, the power consumption of a program depends on its execution time and memory usage, but the details are quite sophisticated. Unstructured memory access patterns, for example, have been shown to severely inuence the energy consumption of smart phones - more than could be expected by their respective time delay. In addition, it has been observed that a „better“ data structure (runtime of related algorithm on a given machine) not necessarily gives rise to a smaller energy consumption (on that machine). Thus, suitable theoretical models and experimental studies will be needed to develop meaningful data structures and algorithms for energy-efficient computing. Our seminar will host a discussion on such models.

Summary

Show Summary

About the Seminar

Data structures provide ways of storing and manipulating data and information that are appropriate for the computational model at hand. Every such model relies on assumptions that we have to keep questioning. The aim of this seminar was to exchange ideas for new algorithms and data structures, and to discuss our models of computations in light of recent technological advances. This Dagstuhl seminar was the 12th in a series of loosely related Dagstuhl seminars on data structures.

Topics

The presentations covered both advances in classic fields, as well as new models for recent trends in computing, in particular the appearance of big-data applications.

The talks by Brodal (§3.5), Penschuck (§3.19), Silvestri (§3.29), and Vahrenhold (§3.31) covered methods in the external-memory model that models the situation that data does no longer fit into internal memory. This limit can be pushed a bit further by using succinct data structures, which use only as much memory as absolutely necessary. Such methods were covered in the talks of Hagerup (§3.14), Raman (§3.25), and Gog (§3.12). If the task is to generate large random instances, Even (§3.9) showed that one can delay generation of large parts until they really become requested.

Big-data applications rely on parallel computation to speed up processing. Bingmann (§3.4) announced the creation of a new framework to simplify developing such applications. Brodnik (§3.6) presented a parallel string-searching algorithm. Since such methods are often used in a distributed setting, the cost of communication can become dominating. Sanders (§3.24) discussed several algorithms from this point of view.

Iacono (§3.15) and Mehlhorn (§3.18) reported on recent advances in the long-standing open problem of dynamic optimality of binary search trees (BSTs). The classic problem of finding optimal static BSTs was taken up by Munro (§3.21): it becomes significantly harder if the objective is to minimize the number of binary comparisons instead of the classic ternary comparisons.

Wild (§3.32) used the connection between BSTs and recursion trees of Quicksort to analyze Quicksort on inputs with equal keys, including multi-way partitioning Quicksort. The latter was discussed in detail by Aumüller (§3.3) who presented a novel analysis for comparison-optimal partitioning.

Neumann (§3.22) introduced a new randomized dictionary implementation based on jumplists. Kopelowitz (§3.17) showed a much simplified solution to the file-maintenance problem.

In the context of large sparse graphs, Andoni (§3.2), Fagerberg (§3.10), and Sun (§3.30) showed how to exploit special structure in the input for algorithmic applications. Pettie (§3.28) showed how to efficiently answer connectivity queries in graphs when vertices can be deleted.

The seminar also enjoyed contributions on new algorithms: two innovative applications of hashing were presented by Silvestri (§3.29) and Jacob (§3.16); Meyer auf der Heide (§3.20) applied the primal-dual approach for online algorithms to online leasing problems. Driemel (§3.7) reported on clustering methods for time series.

The theory-focused talks were complemented by broader perspectives from practice: Ajwani (§3.1) presented his vision for future communication tools that are supported by context-sensitive agents, and Sedgewick (§3.27) sketched his views on the future of higher education. Finally, Salinger (§3.26) summarized the approaches taken by SAP to include data-specific algorithms directly in their HANA database system.

New models of computation were also discussed. Owens (§3.23) explained how the architecture of graphic cards calls for different approaches to design data structures; Dütsch (§3.8) discussed the cost of virtual address translation in several algorithms. Finally, Farach-Colton (§3.11) and Graefe (§3.13) challenged the claim that data structures are independent of the application they are used in: they showed intriguing examples where the context a data structure was applied in entailed unforeseen additional requirements.

Final Thoughts

The organizers would like to thank the Dagstuhl team for their continuous support; the welcoming atmosphere made the seminar both highly productive and enjoyable. They also thank all participants for their contributions to this seminar.

Creative Commons BY 3.0 Unported license

Alejandro Lopez-Ortiz, Ulrich Carsten Meyer, Markus E. Nebel, and Robert Sedgewick

Participants

Show Participants

Deepak Ajwani (Bell Labs - Dublin, IE) [dblp]
Helmut Alt (FU Berlin, DE) [dblp]
Alexandr Andoni (Columbia University - New York, US) [dblp]
Martin Aumüller (IT University of Copenhagen, DK) [dblp]
Timo Bingmann (KIT - Karlsruher Institut für Technologie, DE) [dblp]
Gerth Stølting Brodal (Aarhus University, DK) [dblp]
Andrej Brodnik (University of Primorska, SI) [dblp]
Martin Dietzfelbinger (TU Ilmenau, DE) [dblp]
Anne Driemel (TU Eindhoven, NL) [dblp]
Fabian Dütsch (Universität Münster, DE)
Guy Even (Tel Aviv University, IL) [dblp]
Rolf Fagerberg (University of Southern Denmark - Odense, DK) [dblp]
Martin Farach-Colton (Rutgers University - Piscataway, US) [dblp]
Simon Gog (KIT - Karlsruher Institut für Technologie, DE) [dblp]
Mordecai Golin (HKUST - Kowloon, HK) [dblp]
Goetz Graefe (HP Labs - Madison, US) [dblp]
Torben Hagerup (Universität Augsburg, DE) [dblp]
Herman J. Haverkort (TU Eindhoven, NL) [dblp]
John Iacono (New York University, US) [dblp]
Riko Jacob (IT University of Copenhagen, DK) [dblp]
Tsvi Kopelowitz (University of Michigan - Ann Arbor, US) [dblp]
Moshe Lewenstein (Bar-Ilan University - Ramat Gan, IL) [dblp]
Alejandro Lopez-Ortiz (University of Waterloo, CA) [dblp]
Jérémie Lumbroso (Princeton University, US) [dblp]
Conrado Martinez (UPC - Barcelona, ES) [dblp]
Kurt Mehlhorn (MPI für Informatik - Saarbrücken, DE) [dblp]
Ulrich Carsten Meyer (Goethe-Universität - Frankfurt a. M., DE) [dblp]
Friedhelm Meyer auf der Heide (Universität Paderborn, DE) [dblp]
Ian Munro (University of Waterloo, CA) [dblp]
Markus E. Nebel (TU Kaiserslautern, DE) [dblp]
Elisabeth Neumann (TU Kaiserslautern, DE)
John Owens (University of California, Davis, US) [dblp]
Manuel Penschuck (Goethe-Universität - Frankfurt a. M., DE) [dblp]
Seth Pettie (University of Michigan - Ann Arbor, US) [dblp]
Rajeev Raman (University of Leicester, GB) [dblp]
Alejandro Salinger (SAP SE - Walldorf, DE) [dblp]
Peter Sanders (KIT - Karlsruher Institut für Technologie, DE) [dblp]
Robert Sedgewick (Princeton University, US) [dblp]
Francesco Silvestri (IT University of Copenhagen, DK) [dblp]
He Sun (University of Bristol, GB) [dblp]
Jan Vahrenhold (Universität Münster, DE) [dblp]
Sebastian Wild (TU Kaiserslautern, DE) [dblp]

Related Seminars

Dagstuhl Seminar 9145: Data Structures (1991-11-04 - 1991-11-08) (Details)
Dagstuhl Seminar 9409: Data Structures (1994-02-28 - 1994-03-04) (Details)
Dagstuhl Seminar 9609: Data Structures (1996-02-26 - 1996-03-01) (Details)
Dagstuhl Seminar 98091: Data Structures (1998-03-02 - 1998-03-06) (Details)
Dagstuhl Seminar 00091: Data Structures (2000-02-27 - 2000-03-03) (Details)
Dagstuhl Seminar 02091: Data Structures (2002-02-24 - 2002-03-01) (Details)
Dagstuhl Seminar 04091: Data Structures (2004-02-22 - 2004-02-27) (Details)
Dagstuhl Seminar 06091: Data Structures (2006-02-26 - 2006-03-03) (Details)
Dagstuhl Seminar 08081: Data Structures (2008-02-17 - 2008-02-22) (Details)
Dagstuhl Seminar 10091: Data Structures (2010-02-28 - 2010-03-05) (Details)
Dagstuhl Seminar 14091: Data Structures and Advanced Models of Computation on Big Data (2014-02-23 - 2014-02-28) (Details)
Dagstuhl Seminar 19051: Data Structures for the Cloud and External Memory Data (2019-01-27 - 2019-02-01) (Details)
Dagstuhl Seminar 21071: Scalable Data Structures (2021-02-14 - 2021-02-19) (Details)
Dagstuhl Seminar 23211: Scalable Data Structures (2023-05-21 - 2023-05-26) (Details)
Dagstuhl Seminar 25191: Adaptive and Scalable Data Structures (2025-05-04 - 2025-05-09) (Details)
Dagstuhl Seminar 28221: Simplicity and Efficiency of Data Structures (2028-05-28 - 2028-06-02) (Details)

Classification

data bases / information retrieval
data structures / algorithms / complexity
world wide web / internet

Keywords

data structures
algorithms
cloud services
information theory
large data sets
external memory methods
big data
streaming
web-scale

Seminar 16101

Search the Dagstuhl Website

Schloss Dagstuhl Services

Seminars

Within this website:

External resources:

Publishing

Within this website:

External resources:

dblp

Within this website:

External resources:

Dagstuhl Seminar 16101

Data Structures and Advanced Models of Computation on Big Data

( Mar 06 – Mar 11, 2016 )

Permalink

Organizers

Contact

Publications

Impacts

Schedule

Motivation

Summary

About the Seminar

Topics

Final Thoughts

Participants

Related Seminars

Classification

Keywords