http://www.dagstuhl.de/13251

June 16 – 21, 2013, Dagstuhl Seminar 13251

Parallel Data Analysis

Organizers

Artur Andrzejak (Universität Heidelberg, DE)
Joachim Giesen (Universität Jena, DE)
Raghu Ramakrishnan (Microsoft Corporation – Redmond, US)
Ion Stoica (University of California – Berkeley, US)

For information about this Dagstuhl Seminar, please contact

Dagstuhl Service Team

Documents

Dagstuhl Report, Volume 3, Issue 6
Motivation text
List of participants
Shared documents

Summary

Motivation and goals

Parallel data analysis accelerates the investigation of data sets of all sizes, and is indispensable when processing huge volumes of data. The current ubiquity of parallel hardware such as multi-core processors, modern GPUs, and computing clusters has created an excellent environment for this approach. However, exploiting these computing resources effectively requires significant effort due to the lack of mature frameworks, software, and even algorithms designed for data analysis in such computing environments.

As a result, parallel data analysis is often used only as a last resort, i.e., when the data size becomes too big for sequential data analysis. It is hardly ever used for analyzing small and medium-sized data sets, though it could also be beneficial there, e.g., by cutting compute time down from hours to minutes or even making the data analysis process interactive. The barrier to adoption is even higher for specialists from other areas such as the sciences, business, and commerce. These users often have to make do with slower, yet much easier to use, sequential programming environments and tools, regardless of the data size.

The seminar participants have tried to address these challenges by focusing on the following goals:

  • Providing user-friendly parallel programming paradigms and cross-platform frameworks or libraries for easy implementation and experimentation.
  • Designing efficient and scalable parallel algorithms for machine learning and statistical analysis in connection with an analysis of use cases.

The program

The seminar program consisted of individual presentations on new results and ongoing work, a plenary session, and work in two working groups. The primary role of the working groups was to foster collaboration among the participants, allowing cross-disciplinary knowledge sharing and insights. Work in one group is still ongoing and is aimed at a magazine publication.

The topics of the plenary session and the working groups were the following:

  • Panel "From Big Data to Big Money"
  • Working group "A": Algorithms and applications
  • Working group "P": Programming paradigms, frameworks and software

License
  Creative Commons BY 3.0 Unported license
  Artur Andrzejak, Joachim Giesen, Raghu Ramakrishnan, and Ion Stoica

Classification

  • Artificial Intelligence / Robotics
  • Data Bases / Information Retrieval
  • Data Structures / Algorithms / Complexity

Keywords

  • Parallel machine learning
  • Parallel data processing
  • Data mining
  • Software frameworks
  • Storage and database systems

Book exhibition

Books by the participants

Book exhibition on the ground floor of the library

(only during the week of the event).

Documentation

All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.

Download overview flyer (PDF).

Publications

It remains possible to publish a comprehensive collection of peer-reviewed papers in the Dagstuhl Follow-Ups series.

Dagstuhl's Impact

Please let us know when a publication results from your seminar. Such publications are listed separately in the Dagstuhl's Impact category and displayed on the ground floor of the library.