

Dagstuhl Seminar 13251

Parallel Data Analysis

( Jun 16 – Jun 21, 2013 )

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/13251


Motivation

Parallel data analysis accelerates the investigation of data sets of all sizes, and is indispensable when processing huge volumes of data. The current ubiquity of parallel hardware such as multi-core processors, modern GPUs, and computing clusters has created an excellent environment for this approach. However, exploiting these computing resources effectively requires significant effort because mature frameworks, software, and even algorithms designed for data analysis in such computing environments are lacking.

As a result, parallel data analysis is often used only as a last resort, i.e., when the data size becomes too big for sequential data analysis, and it is hardly ever used for analyzing small and medium-sized data sets. The barrier to adoption is even higher for specialists from other areas such as the sciences, business, and commerce. These users often have to make do with slower, yet much easier to use, sequential programming environments and tools, regardless of the data size.

The seminar will try to address these challenges by focusing on three major goals:

  • Designing efficient and scalable parallel algorithms for machine learning and statistical analysis.
  • Providing user-friendly parallel programming paradigms and cross-platform frameworks or libraries for easy implementation and experimentation.
  • Developing benchmarks, standardized data sets, and public platforms for evaluating (parallel) data analysis algorithms and environments.

To achieve this, the seminar will bring together academic researchers and industry practitioners to foster cross-disciplinary interactions on parallel analysis of scientific and business data. In particular, it will target the communities in the areas of machine learning and data mining, parallel and distributed systems, database systems, and languages and tools for data analysis.

The seminar program will include individual presentations on new research results, tools and usage scenarios, plenary sessions, as well as work in focus groups. The primary role of the focus groups will be to foster the collaboration of the participants on new project proposals, research papers, and the creation of benchmarks for parallel data analysis algorithms and tools.


Summary

Motivation and goals

Parallel data analysis accelerates the investigation of data sets of all sizes, and is indispensable when processing huge volumes of data. The current ubiquity of parallel hardware such as multi-core processors, modern GPUs, and computing clusters has created an excellent environment for this approach. However, exploiting these computing resources effectively requires significant effort because mature frameworks, software, and even algorithms designed for data analysis in such computing environments are lacking.

As a result, parallel data analysis is often used only as a last resort, i.e., when the data size becomes too big for sequential data analysis, and it is hardly ever used for analyzing small and medium-sized data sets, although it could also be beneficial there, e.g., by cutting compute time from hours to minutes or even making the data analysis process interactive. The barrier to adoption is even higher for specialists from other areas such as the sciences, business, and commerce. These users often have to make do with slower, yet much easier to use, sequential programming environments and tools, regardless of the data size.

The seminar participants tried to address these challenges by focusing on the following goals:

  • Providing user-friendly parallel programming paradigms and cross-platform frameworks or libraries for easy implementation and experimentation.
  • Designing efficient and scalable parallel algorithms for machine learning and statistical analysis in connection with an analysis of use cases.

The program

The seminar program consisted of individual presentations on new results and ongoing work, a plenary session, and work in two working groups. The primary role of the working groups was to foster collaboration among the participants, allowing cross-disciplinary knowledge sharing and insights. Work in one group is still ongoing and is aimed at a publication in a magazine.

The topics of the plenary session and the working groups were as follows:

  • Panel "From Big Data to Big Money"
  • Working group "A": Algorithms and applications
  • Working group "P": Programming paradigms, frameworks and software
Copyright Artur Andrzejak, Joachim Giesen, Raghu Ramakrishnan, and Ion Stoica

Participants
  • Artur Andrzejak (Universität Heidelberg, DE) [dblp]
  • Ron Bekkerman (Carmel Ventures - Herzeliya, IL) [dblp]
  • Joos-Hendrik Böse (SAP SE - Berlin, DE) [dblp]
  • Sebastian Breß (Universität Magdeburg, DE) [dblp]
  • Patrick Briest (McKinsey&Company - Düsseldorf, DE) [dblp]
  • Jürgen Broß (FU Berlin, DE) [dblp]
  • Lutz Büch (Universität Heidelberg, DE) [dblp]
  • Michael J. Cafarella (University of Michigan - Ann Arbor, US) [dblp]
  • Surajit Chaudhuri (Microsoft Corporation - Redmond, US) [dblp]
  • Tyson Condie (Yahoo! Inc. - Burbank, US) [dblp]
  • Giuseppe Di Fatta (University of Reading, GB) [dblp]
  • Rodrigo Fonseca (Brown University - Providence, US) [dblp]
  • Johannes Fürnkranz (TU Darmstadt, DE) [dblp]
  • Joao Gama (University of Porto, PT) [dblp]
  • Joachim Giesen (Universität Jena, DE) [dblp]
  • Philipp Große (SAP SE - Walldorf, DE) [dblp]
  • Max Heimel (TU Berlin, DE) [dblp]
  • Yves J. Hilpisch (Visixion GmbH, DE)
  • Anthony D. Joseph (University of California - Berkeley, US) [dblp]
  • George Karypis (University of Minnesota - Minneapolis, US) [dblp]
  • Shonali Krishnaswamy (Infocomm Research - Singapore, SG) [dblp]
  • Soeren Laue (Universität Jena, DE) [dblp]
  • Frank McSherry (Microsoft Corp. - Mountain View, US) [dblp]
  • Klaus Mueller (Stony Brook University, US) [dblp]
  • Jens K. Müller (Universität Jena, DE) [dblp]
  • Srinivasan Parthasarathy (Ohio State University - Columbus, US) [dblp]
  • Tom Peterka (Argonne National Laboratory, US) [dblp]
  • Raghu Ramakrishnan (Microsoft Corporation - Redmond, US) [dblp]
  • Ion Stoica (University of California - Berkeley, US) [dblp]
  • Domenico Talia (University of Calabria, IT) [dblp]
  • Alexandre Termier (University of Grenoble, FR) [dblp]
  • Markus Weimer (Microsoft Corporation - Redmond, US) [dblp]
  • Hans-Martin Will (SpaceCurve - Seattle, US) [dblp]
  • Matei Zaharia (University of California - Berkeley, US) [dblp]
  • Osmar Zaiane (University of Alberta - Edmonton, CA) [dblp]

Classification
  • artificial intelligence / robotics
  • data bases / information retrieval
  • data structures / algorithms / complexity

Keywords
  • Parallel machine learning
  • parallel data processing
  • data mining
  • software frameworks
  • storage and database systems