http://www.dagstuhl.de/14462

November 9 – 12 , 2014, Dagstuhl Seminar 14462

Systems and Algorithms for Large-scale Graph Analytics

Organizers

Derek Murray (San Francisco, US)
Amitabha Roy (EPFL – Lausanne, CH)
Eiko Yoneki (University of Cambridge, GB)

For support, please contact

Dagstuhl Service Team

Documents

Dagstuhl Report, Volume 4, Issue 11 Dagstuhl Report
Aims & Scope
List of Participants
Shared Documents
Dagstuhl Seminar Schedule [pdf]

Summary

Graph analytics, the class of data analysis that deals with data forming networks, is emerging as a huge consumer of computational resources due to its complex, data-hungry algorithms. Social networking, personal medicine, bioinformatics, text/graphics content analysis and search engines are a few examples where Tera-, Peta- or even Exa-scale graph processing is required. Graph algorithms are becoming increasingly important for solving multiple problems in diverse fields. As these problems grow in scale, parallel computing resources are required to meet the computational and memory requirements. Notably, the algorithms, software and hardware that have worked well for developing mainstream parallel applications are not usually effective for massive-scale graphs from the real world, which exhibits more complex and irregular structures than traditional data in scientific computing.

Research into large scale graph processing within the computer science community is currently at an early and fragmented stage. This seminar brought together researchers from systems, computer architecture, algorithms and databases to discuss emerging trends and to identify opportunities for future advancement. Prior to the seminar, we had prepared a range of research questions below.

  1. What is the correct algorithmic abstraction for systems handling large graphs? Algorithmic complexity researchers use PRAM and I/O complexity models to characterize the algorithmic complexity of graph processing. On the other hand, systems researchers largely build systems that implement scatter-gather and label propagation models of computation. These are different world views rendering theory and practice incompatible with each other. We will begin work towards a formal algorithmic model for existing large scale graph processing systems as part of this seminar with a view to answering this question. This model should accurately describe large-scale graph processing systems built by the systems community as well as be formal enough to enable algorithmic complexity researchers to draw useful conclusions about their scalability with data set size. This will require close co-operation between theoretical computer scientists on the algorithms front to talk to practical systems researchers.
  2. What is the taxonomy of applications that graph processing systems should support? Can it be reduced to a set of representative benchmarks that researchers in this area need to care about? We can currently identify two main interest areas. The first is large scale graph traversal, of interest to the high performance computing and web-services community; primarily driven by security applications and from data mining needs. The second is spectral approaches, primarily of interest to the machine learning community, building systems such as Graphlab. The output from this agenda item will be a clear set of well defined applications that the community can agree will serve as objects of study for building high performance graph processing systems.
  3. What is the correct interface to the system that may be assumed when building a graph DSL? DSL researchers are interested in productive and easy ways to specify graph computations. However they have given relatively little thought to interfacing in an efficient way to systems that execute graph computation. The agenda item therefore will be a discussion between programming language researchers and systems, algorithms and database people researchers about the correct level of interface between a DSL and the underlying systems. A good model in this regard is the decoupling of database systems from ways to query them using declarative languages like SQL. The litmus test for success for this agenda item will therefore be a sketch for a DSL that exposes opportunities for optimization, is productive to use and at the same time is oblivious to the underlying system.
  4. What is the design trade-off among different graph processing approaches. For example, (i) the general graph processing system vs. the dedicated approaches specially optimized for specific graph problems, (ii) the running time between pre-processing and graph processing, (iii) performance vs. running expense.

The seminar identified whole graph analytics and point queries on graphs that explore neighborhoods of vertices as distinct application domains, which require separate treatment and systems. All the participants agreed that there was an urgent need to standardize benchmarks and datasets in order to make meaningful progress with graph processing - particularly given the diverse nature of the communities involved. In addition, the seminar identified a number of interesting approaches and trends. There was also considerable participation from industry, which included work in graph databases as well as new systems architectures that will require practitioners to rethink traditional approaches for graph processing.

The seminar consisted of 6 sessions on focused topic presentations and discussions, followed by a joint session with the seminar 14461 on "High-performance Graph Algorithms and applications computational Science". At the last day of the seminar, the whole morning was dedicated to the discussion on the challenges and future directions of large-scale graph processing (see Section Challenges and Future directions).

License
  Creative Commons BY 3.0 Unported license
  Eiko Yoneki and Amitabha Roy and Derek Murray

Classification

  • Data Structures / Algorithms / Complexity
  • Optimization / Scheduling
  • Software Engineering

Keywords

  • Large-scale graph processing
  • Graph structured data
  • Graph algorithms
  • Graph traversal
  • Machine Learning
  • Databases
  • Computer systems
  • Computer architecture
  • Parallel I/O
  • Parallel programming
  • Storage

Book exhibition

Books from the participants of the current Seminar 

Book exhibition in the library, ground floor, during the seminar week.

Documentation

In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 

Download overview leaflet (PDF).

Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication was published as a result from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.

NSF young researcher support