http://www.dagstuhl.de/13251

June 16 – 21, 2013, Dagstuhl Seminar 13251

Parallel Data Analysis

Organizers

Artur Andrzejak (Universität Heidelberg, DE)
Joachim Giesen (Universität Jena, DE)
Raghu Ramakrishnan (Microsoft Corporation – Redmond, US)
Ion Stoica (University of California – Berkeley, US)

Documents

Dagstuhl Report, Volume 3, Issue 6

Summary

Motivation and goals

Parallel data analysis accelerates the investigation of data sets of all sizes, and is indispensable when processing huge volumes of data. The current ubiquity of parallel hardware such as multi-core processors, modern GPUs, and computing clusters has created an excellent environment for this approach. However, exploiting these computing resources effectively requires significant effort due to the lack of mature frameworks, software, and even algorithms designed for data analysis in such computing environments.

As a result, parallel data analysis is often used only as a last resort, i.e., when the data size becomes too big for sequential data analysis. It is hardly ever used for analyzing small and medium-sized data sets, even though it could be beneficial there as well, e.g., by cutting compute time from hours to minutes or even making the data analysis process interactive. The barrier to adoption is even higher for specialists from other areas such as the sciences, business, and commerce. These users often have to make do with slower, yet much easier to use, sequential programming environments and tools, regardless of the data size.

The seminar participants have tried to address these challenges by focusing on the following goals:

  • Providing user-friendly parallel programming paradigms and cross-platform frameworks or libraries for easy implementation and experimentation.
  • Designing efficient and scalable parallel algorithms for machine learning and statistical analysis in connection with an analysis of use cases.

The program

The seminar program consisted of individual presentations on new results and ongoing work, a plenary session, and work in two working groups. The primary role of the working groups was to foster collaboration among the participants, allowing cross-disciplinary knowledge sharing and insights. Work in one group is still ongoing and is aimed at a publication in a magazine.

The topics of the plenary session and the working groups were as follows:

  • Panel "From Big Data to Big Money"
  • Working group "A": Algorithms and applications
  • Working group "P": Programming paradigms, frameworks and software

License

  Creative Commons BY 3.0 Unported license
  Artur Andrzejak, Joachim Giesen, Raghu Ramakrishnan, and Ion Stoica

Classification

  • Artificial Intelligence / Robotics
  • Data Bases / Information Retrieval
  • Data Structures / Algorithms / Complexity

Keywords

  • Parallel machine learning
  • Parallel data processing
  • Data mining
  • Software frameworks
  • Storage and database systems

Book exhibition

Books from the participants of the current Seminar 

Book exhibition in the library, ground floor, during the seminar week.

Documentation

Each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented in the series Dagstuhl Reports. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 

Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication results from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.
