http://www.dagstuhl.de/14091

### February 23 – 28, 2014, Dagstuhl Seminar 14091

# Data Structures and Advanced Models of Computation on Big Data

## Organizers

Alejandro Lopez-Ortiz (University of Waterloo, CA)

Ulrich Carsten Meyer (Goethe-Universität – Frankfurt a. M., DE)

Robert Sedgewick (Princeton University, US)

## Documents

- Dagstuhl Report, Volume 4, Issue 2
- Aims & Scope
- List of Participants
- Shared Documents
- Dagstuhl Seminar Schedule (PDF)

## Summary

A persistent theme of the presentations at this Dagstuhl seminar was the need to refine our models of computation to reflect modern architectures if we are to develop a scientific basis for inventing efficient algorithms for real-world problems. For example, Mehlhorn's presentation on the cost of address translation, Iacono's reexamination of the cache-oblivious model, and Sanders' description of communication efficiency all left many participants questioning basic assumptions they had held for many years, and these talks are certain to stimulate new research.

A better understanding of the properties of modern processors can certainly be fruitful. For example, several presentations, such as the papers by Aumüller, López-Ortiz, and Wild on Quicksort and the paper by Bingmann on string sorting, described faster versions of classic algorithms based on a careful examination of modern processor design.
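To make the Quicksort discussion concrete, the following Python sketch shows dual-pivot partitioning in the style of Yaroslavskiy's algorithm, the kind of multi-pivot variant studied in this line of work. It is an illustrative assumption, not code from any of the presentations: the two pivots are simply the first and last elements, and there is no pivot sampling or insertion-sort cutoff, so already-sorted input still causes quadratic time and deep recursion.

```python
from typing import Optional


def dual_pivot_quicksort(a: list, lo: int = 0, hi: Optional[int] = None) -> None:
    """Sort a[lo..hi] in place using two pivots p <= q (Yaroslavskiy-style partitioning)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Use the two end elements as pivots, ordered so that p <= q.
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]
    # Invariants: a[lo+1 .. lt-1] < p, a[lt .. i-1] in [p, q], a[gt+1 .. hi-1] > q.
    lt, gt, i = lo + 1, hi - 1, lo + 1
    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
        elif a[i] > q:
            # Skip elements on the right that already belong to the "> q" part.
            while a[gt] > q and i < gt:
                gt -= 1
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
            # The element swapped in from the right may belong to the "< p" part.
            if a[i] < p:
                a[i], a[lt] = a[lt], a[i]
                lt += 1
        i += 1
    lt -= 1
    gt += 1
    # Move both pivots into their final positions.
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]
    # Recurse on the three parts: < p, between the pivots, > q.
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)


data = [5, 3, 9, 1, 5, 8, 2]
dual_pivot_quicksort(data)
print(data)  # [1, 2, 3, 5, 5, 8, 9]
```

Splitting the range into three parts per pass is what distinguishes this from classic single-pivot Quicksort; the engineering work discussed at the seminar concerns how such variants behave on real memory hierarchies, which this sketch does not model.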

Many presentations also described experience with data from real applications. For example, the presentations by Driemel and Vahrenhold on trajectory data described a relatively new big-data application that underscores the importance and broad applicability of classic techniques in computational geometry and data structure design.

Other presentations that addressed large data sets on modern architectures included Jacob's lower bound for parallel external-memory list ranking, which also applies to the MapReduce and BSP models commonly used on large distributed platforms, and Hagerup's talk on the standard problem of performing a depth-first search (DFS) on a graph, a task that is trivial on small graphs but extremely complex on "big data" sets such as the Facebook graph. He proposed a space-efficient algorithm that reduces the space required by DFS by a factor of log n, which amounts to about an order of magnitude on practical data sets.
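List ranking asks, for every node of a linked list, for its distance to the end of the list. As an illustration of the problem itself (not of Jacob's lower bound), the Python sketch below simulates Wyllie's classic pointer-jumping algorithm: in each synchronous round every node adds its successor's rank to its own and then jumps to its successor's successor, so about log2 n rounds suffice. Representing the tail by a self-loop (`succ[i] == i`) and the function name `list_rank` are conventions assumed for this sketch.

```python
from typing import List, Tuple


def list_rank(succ: List[int]) -> Tuple[List[int], int]:
    """Return (rank, rounds), where rank[i] is the number of links from node i to the tail.

    succ[i] is the successor of node i; the tail is marked by succ[i] == i.
    """
    n = len(succ)
    nxt = list(succ)
    # Invariant: rank[i] is the distance from i to nxt[i] (0 for the tail's self-loop).
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    rounds = 0
    # A node is finished once it points at the tail, i.e. nxt[i] is a fixed point.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # One synchronous parallel step: all reads use the previous round's values.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds


# The list 3 -> 0 -> 2 -> 1, with node 1 as the tail.
print(list_rank([2, 1, 1, 0]))  # ([2, 0, 1, 3], 2)
```

Each round touches successors in an essentially arbitrary order, which is cheap in a PRAM-style model but costly when the list is stored in external memory or spread across machines, the settings in which list ranking becomes hard.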

Schweikardt presented a model for MapReduce computations, MapReduce being a very common computing platform on very large server farms. Salinger considered the opposite end of the spectrum, namely how to simplify programming so as to make optimal use of a single server, which has its own degree of parallelism from multiple cores, GPUs, and other parallel facilities.
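As a rough illustration of the MapReduce programming model (a toy single-process simulation, not Schweikardt's formal model), the sketch below runs one map-shuffle-reduce round for word counting. The helper names `map_phase`, `shuffle`, `reduce_phase`, and `run_mapreduce_round` are hypothetical; a real platform would run each phase in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

KV = Tuple[Hashable, object]


def map_phase(records: Iterable, mapper: Callable[[object], Iterable[KV]]) -> List[KV]:
    """Apply the mapper to every input record and concatenate the emitted (key, value) pairs."""
    return list(chain.from_iterable(mapper(r) for r in records))


def shuffle(pairs: Iterable[KV]) -> Dict[Hashable, List[object]]:
    """Group values by key; on a cluster this is the data exchange between machines."""
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Hashable, List[object]], reducer) -> Dict[Hashable, object]:
    """Apply the reducer independently to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_mapreduce_round(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Word count, the customary first MapReduce example.
def wc_mapper(line: str):
    for word in line.split():
        yield word.lower(), 1


def wc_reducer(word, counts):
    return sum(counts)


counts = run_mapreduce_round(
    ["big data big models", "data structures for big data"], wc_mapper, wc_reducer
)
print(sorted(counts.items()))
# [('big', 3), ('data', 3), ('for', 1), ('models', 1), ('structures', 1)]
```

Because reducers see only the values grouped under their own key, the shuffle is the one step that moves data between machines, which is why communication is a natural quantity to account for when modelling MapReduce computations.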

In terms of geometric data structures for large data sets, Afshani presented sublinear algorithms for the I/O model that generalize earlier work on sublinear algorithms. Sublinear algorithms are of key importance for very large data sets, which presumably do not fit in main memory; yet most previously proposed sublinear algorithms assumed that such data sets reside in main memory. Toma gave an external-memory representation of the popular quadtree data structure, which is commonly used in computer graphics and other spatial-data applications.
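For readers unfamiliar with quadtrees, the following Python sketch implements a basic in-memory point-region quadtree: each node covers a square, stores up to `capacity` points, and splits into four equal quadrants when it overflows, while range queries prune quadrants that do not intersect the query rectangle. This is a hypothetical illustration only; an external-memory representation must additionally organize nodes into disk blocks, which this sketch does not attempt. The `max_depth` guard is an added assumption that prevents endless splitting when many identical points are inserted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadTree:
    """Point-region quadtree over the half-open square [x0, x0+size) x [y0, y0+size)."""
    x0: float
    y0: float
    size: float
    capacity: int = 4
    depth: int = 0
    max_depth: int = 16  # assumption: at this depth nodes stop splitting and just grow
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None  # the four quadrants, once split

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x0 + self.size and self.y0 <= y < self.y0 + self.size

    def insert(self, x: float, y: float) -> bool:
        """Insert (x, y); return False if the point lies outside this node's square."""
        if not self.contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        return self._insert_into_child(x, y)

    def _insert_into_child(self, x: float, y: float) -> bool:
        # Quadrants are half-open, so exactly one child accepts the point.
        for child in self.children:
            if child.insert(x, y):
                return True
        return False

    def _split(self) -> None:
        """Create four equal quadrants and push this node's points down into them."""
        h = self.size / 2
        self.children = [
            QuadTree(self.x0 + dx, self.y0 + dy, h, self.capacity, self.depth + 1, self.max_depth)
            for dy in (0.0, h)
            for dx in (0.0, h)
        ]
        old, self.points = self.points, []
        for px, py in old:
            self._insert_into_child(px, py)

    def query(self, qx0: float, qy0: float, qx1: float, qy1: float,
              out: Optional[List[Point]] = None) -> List[Point]:
        """Return all stored points with qx0 <= x < qx1 and qy0 <= y < qy1."""
        if out is None:
            out = []
        # Prune: skip this node if its square does not intersect the query rectangle.
        if (qx1 <= self.x0 or qx0 >= self.x0 + self.size
                or qy1 <= self.y0 or qy0 >= self.y0 + self.size):
            return out
        for px, py in self.points:
            if qx0 <= px < qx1 and qy0 <= py < qy1:
                out.append((px, py))
        for child in self.children or []:
            child.query(qx0, qy0, qx1, qy1, out)
        return out


qt = QuadTree(0.0, 0.0, 100.0, capacity=2)
for p in [(10, 10), (20, 80), (55, 55), (60, 70), (90, 5), (52, 51)]:
    qt.insert(*p)
print(sorted(qt.query(50, 50, 75, 75)))  # [(52, 51), (55, 55), (60, 70)]
```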

**License**

Creative Commons BY 3.0 Unported license

Alejandro Lopez-Ortiz, Ulrich Carsten Meyer, and Robert Sedgewick

## Dagstuhl Seminar Series

- 19051: "Data Structures for the Cloud and External Memory Data" (2019)
- 16101: "Data Structures and Advanced Models of Computation on Big Data" (2016)
- 10091: "Data Structures" (2010)
- 08081: "Data Structures" (2008)
- 06091: "Data Structures" (2006)
- 04091: "Data Structures" (2004)
- 02091: "Data Structures" (2002)
- 00091: "Data Structures" (2000)
- 98091: "Data Structures" (1998)
- 9609: "Data Structures" (1996)
- 9409: "Data Structures" (1994)
- 9145: "Data Structures" (1991)

## Classification

- Data Bases / Information Retrieval
- Data Structures / Algorithms / Complexity

## Keywords

- Data structures
- Algorithms
- Large data sets
- External memory methods
- Big data
- Streaming
- Web-scale