http://www.dagstuhl.de/14091

February 23 – 28, 2014, Dagstuhl Seminar 14091

Data Structures and Advanced Models of Computation on Big Data

Organizers

Alejandro Lopez-Ortiz (University of Waterloo, CA)
Ulrich Carsten Meyer (Goethe-Universitšt Frankfurt am Main, DE)
Robert Sedgewick (Princeton University, US)


1 / 2 >

For support, please contact

Annette Beyer for administrative matters

Marc Herbstritt for scientific matters

Dagstuhl Reports

As part of the mandatory documentation, participants are asked to submit their talk abstracts, working group results, etc. for publication in our series Dagstuhl Reports via the Dagstuhl Reports Submission System.

Documents

List of Participants
Shared Documents
Dagstuhl Seminar Wiki
Dagstuhl Seminar Schedule [pdf]

(Use seminar number and access code to log in)

Motivation

Computing is about the processing, exchanging, handling and storage of information. The way information is stored profoundly influences the efficiency of operations over the data. Research in data structures attempts to provide ways of storing and manipulating data and information that are appropriate for the computational model at hand: internal random access memory, sequentially accessible data streams, multiple cache-external disk memory hierarchies, parallel and distributed computing, etc. In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances. Almost every new computing or storage technology poses new data structuring problems. This Dagstuhl Seminar will address besides the "classical" data structuring topics the following issues:

Big data

One consequence of the exponential growth of computing usage has been a large increase in the scale of the problems addressed by computer scientists. For example, twenty-five years ago a few tens of megabytes constituted a substantial text corpus. Nowadays search engine indexes handle document collections whose size is in the order of a hundred terabytes of data or so. We have seen similar growth in the size of large data collections such as those derived from internet traffic management, genomic data, social networks, large scale scientific experiments such as the Large Hadron Collider (LHC) and sensor networks in general. In this context the introduction of novel data structures that can operate over massively distributed computing platforms with minimal interprocess communication is a must.

Memory Hierarchy Methods for CMPs and SSDs

Processor speeds and external memory sizes have increased exponentially over the last decades. However the transfer speed between external and internal memory has not kept pace. Data structuring research has been addressing this problem and has worked on solutions that attempt to minimize the data transfer between external and internal memory and the various caches in between. However, many data structuring problems still have unsatisfactory solutions in this model. New challenges arise from multicore computers in which certain caches are shared, giving rise to issues such as cache contention, coherence and optimal order of execution. Similarly, flash memory storage, suffer from certain peculiarities like read/write imbalances and wear, which are not captured in the standard I/O model.

Energy-Aware Data Structure Design

Reducing the energy consumption of computations has become an important issue both for large scale computing and for small mobile devices. Clearly, the power consumption of a program depends on its execution time and memory usage, but the details are quite sophisticated. Unstructured memory access patterns, for example, have be shown to severely influence the energy consumption of smart phones -- more than could be expected by their respective time delay. Thus, suitable theoretical models and experimental studies will be needed to develop meaningful data structures and algorithms for energy-efficient computing.

Web-scale Data Structures

They are playing an increasingly important role in modern computing. The focus on implementing them has properly been on systems and commercial issues, but as the scale continues to increase, the scientific basis for studying performance and developing new algorithms that has proven so successful for classical data structures is certain to play a critical role in the evolution of the web.

Dagstuhl Seminar Series

Classification

  • Data Bases / Information Retrieval
  • Data Structures / Algorithms / Complexity

Keywords

  • Data structures
  • Algorithms
  • Large data sets
  • External memory methods
  • Big data
  • Streaming
  • Web-scale

Book exhibition

Books from the participants of the current Seminar 

Book exhibition in the library, 1st floor, during the seminar week.

Documentation

In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 

Download overview leaflet (PDF).

Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication was published as a result from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.