23.02.14 - 28.02.14, Seminar 14091

Data Structures and Advanced Models of Computation on Big Data

Diese Seminarbeschreibung wurde vor dem Seminar auf unseren Webseiten veröffentlicht und bei der Einladung zum Seminar verwendet.


Computing is about the processing, exchanging, handling and storage of information. The way information is stored profoundly influences the efficiency of operations over the data. Research in data structures attempts to provide ways of storing and manipulating data and information that are appropriate for the computational model at hand: internal random access memory, sequentially accessible data streams, multiple cache-external disk memory hierarchies, parallel and distributed computing, etc. In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances. Almost every new computing or storage technology poses new data structuring problems. This Dagstuhl Seminar will address besides the "classical" data structuring topics the following issues:

Big data

One consequence of the exponential growth of computing usage has been a large increase in the scale of the problems addressed by computer scientists. For example, twenty-five years ago a few tens of megabytes constituted a substantial text corpus. Nowadays search engine indexes handle document collections whose size is in the order of a hundred terabytes of data or so. We have seen similar growth in the size of large data collections such as those derived from internet traffic management, genomic data, social networks, large scale scientific experiments such as the Large Hadron Collider (LHC) and sensor networks in general. In this context the introduction of novel data structures that can operate over massively distributed computing platforms with minimal interprocess communication is a must.

Memory Hierarchy Methods for CMPs and SSDs

Processor speeds and external memory sizes have increased exponentially over the last decades. However the transfer speed between external and internal memory has not kept pace. Data structuring research has been addressing this problem and has worked on solutions that attempt to minimize the data transfer between external and internal memory and the various caches in between. However, many data structuring problems still have unsatisfactory solutions in this model. New challenges arise from multicore computers in which certain caches are shared, giving rise to issues such as cache contention, coherence and optimal order of execution. Similarly, flash memory storage, suffer from certain peculiarities like read/write imbalances and wear, which are not captured in the standard I/O model.

Energy-Aware Data Structure Design

Reducing the energy consumption of computations has become an important issue both for large scale computing and for small mobile devices. Clearly, the power consumption of a program depends on its execution time and memory usage, but the details are quite sophisticated. Unstructured memory access patterns, for example, have be shown to severely influence the energy consumption of smart phones -- more than could be expected by their respective time delay. Thus, suitable theoretical models and experimental studies will be needed to develop meaningful data structures and algorithms for energy-efficient computing.

Web-scale Data Structures

They are playing an increasingly important role in modern computing. The focus on implementing them has properly been on systems and commercial issues, but as the scale continues to increase, the scientific basis for studying performance and developing new algorithms that has proven so successful for classical data structures is certain to play a critical role in the evolution of the web.