http://www.dagstuhl.de/14462

09. – 12. November 2014, Dagstuhl Seminar 14462

Systems and Algorithms for Large-scale Graph Analytics

Organisatoren

Derek Murray (San Francisco, US)
Amitabha Roy (EPFL – Lausanne, CH)
Eiko Yoneki (University of Cambridge, GB)

Auskunft zu diesem Dagstuhl Seminar erteilt

Dagstuhl Service Team

Dokumente

Dagstuhl Report, Volume 4, Issue 11 Dagstuhl Report
Motivationstext
Teilnehmerliste
Gemeinsame Dokumente
Programm des Dagstuhl Seminars [pdf]

Summary

Graph analytics, the class of data analysis that deals with data forming networks, is emerging as a huge consumer of computational resources due to its complex, data-hungry algorithms. Social networking, personal medicine, bioinformatics, text/graphics content analysis and search engines are a few examples where Tera-, Peta- or even Exa-scale graph processing is required. Graph algorithms are becoming increasingly important for solving multiple problems in diverse fields. As these problems grow in scale, parallel computing resources are required to meet the computational and memory requirements. Notably, the algorithms, software and hardware that have worked well for developing mainstream parallel applications are not usually effective for massive-scale graphs from the real world, which exhibits more complex and irregular structures than traditional data in scientific computing.

Research into large scale graph processing within the computer science community is currently at an early and fragmented stage. This seminar brought together researchers from systems, computer architecture, algorithms and databases to discuss emerging trends and to identify opportunities for future advancement. Prior to the seminar, we had prepared a range of research questions below.

  1. What is the correct algorithmic abstraction for systems handling large graphs? Algorithmic complexity researchers use PRAM and I/O complexity models to characterize the algorithmic complexity of graph processing. On the other hand, systems researchers largely build systems that implement scatter-gather and label propagation models of computation. These are different world views rendering theory and practice incompatible with each other. We will begin work towards a formal algorithmic model for existing large scale graph processing systems as part of this seminar with a view to answering this question. This model should accurately describe large-scale graph processing systems built by the systems community as well as be formal enough to enable algorithmic complexity researchers to draw useful conclusions about their scalability with data set size. This will require close co-operation between theoretical computer scientists on the algorithms front to talk to practical systems researchers.
  2. What is the taxonomy of applications that graph processing systems should support? Can it be reduced to a set of representative benchmarks that researchers in this area need to care about? We can currently identify two main interest areas. The first is large scale graph traversal, of interest to the high performance computing and web-services community; primarily driven by security applications and from data mining needs. The second is spectral approaches, primarily of interest to the machine learning community, building systems such as Graphlab. The output from this agenda item will be a clear set of well defined applications that the community can agree will serve as objects of study for building high performance graph processing systems.
  3. What is the correct interface to the system that may be assumed when building a graph DSL? DSL researchers are interested in productive and easy ways to specify graph computations. However they have given relatively little thought to interfacing in an efficient way to systems that execute graph computation. The agenda item therefore will be a discussion between programming language researchers and systems, algorithms and database people researchers about the correct level of interface between a DSL and the underlying systems. A good model in this regard is the decoupling of database systems from ways to query them using declarative languages like SQL. The litmus test for success for this agenda item will therefore be a sketch for a DSL that exposes opportunities for optimization, is productive to use and at the same time is oblivious to the underlying system.
  4. What is the design trade-off among different graph processing approaches. For example, (i) the general graph processing system vs. the dedicated approaches specially optimized for specific graph problems, (ii) the running time between pre-processing and graph processing, (iii) performance vs. running expense.

The seminar identified whole graph analytics and point queries on graphs that explore neighborhoods of vertices as distinct application domains, which require separate treatment and systems. All the participants agreed that there was an urgent need to standardize benchmarks and datasets in order to make meaningful progress with graph processing - particularly given the diverse nature of the communities involved. In addition, the seminar identified a number of interesting approaches and trends. There was also considerable participation from industry, which included work in graph databases as well as new systems architectures that will require practitioners to rethink traditional approaches for graph processing.

The seminar consisted of 6 sessions on focused topic presentations and discussions, followed by a joint session with the seminar 14461 on "High-performance Graph Algorithms and applications computational Science". At the last day of the seminar, the whole morning was dedicated to the discussion on the challenges and future directions of large-scale graph processing (see Section Challenges and Future directions).

License
  Creative Commons BY 3.0 Unported license
  Eiko Yoneki and Amitabha Roy and Derek Murray

Classification

  • Data Structures / Algorithms / Complexity
  • Optimization / Scheduling
  • Software Engineering

Keywords

  • Large-scale graph processing
  • Graph structured data
  • Graph algorithms
  • Graph traversal
  • Machine Learning
  • Databases
  • Computer systems
  • Computer architecture
  • Parallel I/O
  • Parallel programming
  • Storage

Buchausstellung

Bücher der Teilnehmer 

Buchausstellung im Erdgeschoss der Bibliothek

(nur in der Veranstaltungswoche).

Dokumentation

In der Reihe Dagstuhl Reports werden alle Dagstuhl-Seminare und Dagstuhl-Perspektiven-Workshops dokumentiert. Die Organisatoren stellen zusammen mit dem Collector des Seminars einen Bericht zusammen, der die Beiträge der Autoren zusammenfasst und um eine Zusammenfassung ergänzt.

 

Download Übersichtsflyer (PDF).

Publikationen

Es besteht weiterhin die Möglichkeit, eine umfassende Kollektion begutachteter Arbeiten in der Reihe Dagstuhl Follow-Ups zu publizieren.

Dagstuhl's Impact

Bitte informieren Sie uns, wenn eine Veröffentlichung ausgehend von
Ihrem Seminar entsteht. Derartige Veröffentlichungen werden von uns in der Rubrik Dagstuhl's Impact separat aufgelistet  und im Erdgeschoss der Bibliothek präsentiert.