Dagstuhl Seminar 14461
High-performance Graph Algorithms and Applications in Computational Science
( Nov 09 – Nov 14, 2014 )
- Ulrich Carsten Meyer (Goethe-Universität - Frankfurt a. M., DE)
- Henning Meyerhenke (KIT - Karlsruher Institut für Technologie, DE)
- Ali Pinar (Sandia Nat. Labs - Livermore, US)
- Ilya Safro (Clemson University, US)
- Andreas Dolzmann (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Graphs (or networks) are a very powerful abstraction of various phenomena that can be expressed as a relation between entities. For several decades researchers in theoretical computer science and discrete mathematics have been developing a wealth of graph theory and graph algorithms. Recently, however, we see a qualitative change how graph algorithms are used in practice. This change is due to (i) the complex structure of graphs in new and emerging applications, (ii) the size of typical inputs, and (iii) the computer systems on which graph problems are solved.
Application areas driving the aforementioned change can be very diverse. To name a few, consider microblogging services in social media, genome assembly in bioinformatics, routing in traffic networks, network analysis and graph mining (e. g. predicting network evolution or detecting communities), technology diffusion and decentralized trade in economics, and combinatorial scientific computing (e. g. spectral graph theory, graph partitioning). These application areas have in common that current and emerging data sources show massive scale and irregular or non-uniform distribution of structural and semantic properties of entities and their interactions.
Despite the increasing importance of these complex networks, research on their many aspects is still in early stages, and we are facing daunting challenges. One big challenge stems from the fact that the governing principles on how some of these networks evolve and function are unknown. Without such principles, we cannot predict what the Internet will look like in five years, we cannot simulate email exchange networks, and we cannot tell whether two graphs are similar or not.
Additional challenges are posed by the hardware. Tomorrow’s computer architectures are becoming increasingly parallel and heterogeneous. Yet, many graph algorithms as well as techniques in relevant applications are inherently sequential. That is why many previous results do not transfer easily into efficient computational tools. To fully exploit emerging architectures, however, it is important to develop parallel algorithms and tools. Other technological challenges, for example deep memory hierarchies and solid-state disks (SSDs), and particularly their impact on algorithm design have also to be considered in this context. Algorithmic techniques that show promise try to be oblivious to the specific hardware, and optimize for general classes of typical hardware.
As a consequence, the central goal of the seminar is to exploit different techniques for highperformance graph algorithm design in various application areas. This will span the issues of efficient and effective implementation from well known network analysis and graph mining techniques that are traditionally sequential to modern methods that require a high degree of parallelism to applications that need the performance of emerging hardware architectures. Only if larger inputs can be processed faster than before, the underlying science can reach new levels.
We will exploit synergies by bringing together researchers that represent various communities such as (a) researchers designing, analyzing, and implementing graph algorithms relevant to the seminar topics; (b) specialists in innovative models of computation as well as in parallel and highperformance computing; (c) specialists in combinatorial scientific computing; (d) researchers working on graph mining and network analysis techniques, who may come from very diverse fields such as databases, data mining, theoretical computer science, applied mathematics, and applied physics; (e) domain scientists as users of algorithms and software tools developed by the communities mentioned before.
Research on high-performance graph algorithms and applications in computational science is currently scattered across several communities with different cultures. With this seminar we want to promote this emerging scientific area as a subfield in the computing sciences. We hope to nurture strong collaborations and a “common language” between communities that are otherwise disjoint to some extent. By bringing together experts from different fields, we hope that the next-generation tools of the trade become significantly faster and more feature-rich than the current state of the art. This, in turn, will allow domain scientists to process larger data sets more effectively, which will allow them to tackle problems previously out of reach.
Many presentations in this Dagstuhl seminar emphasized recent trends regarding typical inputs and their effect on graph algorithm development for practical purposes. One can divide these presentations into four categories: (i) Traditional graph (or matrix) problems in new scenarios, (ii) graph analytics algorithms of various sorts, (iii) parallel computing aspects such as tools, computational models, load balancing, and communication; and finally (iv) emerging high-performance application and hardware trends. The following four paragraphs give a brief overview over the talks presented in each of these categories.
Pothen discussed different matching problems and how the emergence of complex networks have changed various matching algorithms recently. Road networks, in turn, are by no means complex and the traditional Dijkstra algorithm solves queries on continental instances in few seconds. Yet, for more challenging scenarios, for example millions of queries per second on webservers or multiple optimization critieria, more elaborate solutions are necessary, as presented by Sanders. Toledo addressed the importance of communication efficiency on large-scale parallel systems for traditional numerical problems such as LU decomposition. A similar numerical topic was the solution of Laplacian linear systems, for which new combinatorial solvers and related techniques from the theory community were presented and discussed by Madry and by Toledo. Furthermore, Boman and Toledo initiated a tangible plan for a scientific competition on solvers for this class of linear systems.
The analytics algorithms part experienced a number of talks on graph clustering and community detection, which means the identification of natural vertex groups in graphs. Several very fast algorithms and their implementation were discussed and compared. Centralities are used for finding important (but in general unrelated) vertices or edges in a graph. Catalyürek showed how to exploit parallelism in centrality algorithms to speed them up in different hardware settings, including accelerators. Bergamini, in turn, used approximation to obtain a speedup in dynamic graphs. Many other analytics tasks and algorithms were discussed, including anomaly detection presented by Miller and label inference by Chakrabarti, who both focused on techniques for very large graphs. Graph size was also a motivation for sparsification as discussed by Parthasarathy, either to save space or running time (or both) in later stages of an algorithmic pipeline.
Parallelism was the common theme in the third category. Here we summarize algorithmic techniques such as load balancing by graph partitioning, computational models as well as tools and middleware. Several speakers outlined challenges and/or algorithmic solutions in graph partitioning, in particular for complex networks or massively parallel systems. It became also clear that the development of graph algorithms for massive inputs benefits from suitable computational models. An example is the parallel external memory model for which Meyer as well as Veith showed algorithmic solutions. Another prerequisite for efficient graph algorithms in practice is tool support, including building block standards (proposed by Buluc) and communication middleware (presented by Lumsdaine). The pros and cons of different tools were discussed in an animated manner with the co-located Dagstuhl seminar 14462 "Systems and Algorithms for Large-scale Graph Analytics" within a joint session. The organizers are confident that this discussion has led to a better understanding of each other's community and their contributions. We also hope and think that this exchange will lead to an accelerated dissemination of the respective leading research results across community borders.
Finally, Brugger presented innovative hardware specifically designed to support certain graph algorithms. Talks with a particular focus on innovative applications from outside the core of computer science were presented by several speakers as well. Both Srivastav and Buluc, for example, described algorithms for sequence assembly, a problem in bioinformatics with massive data sets.
- Deepak Ajwani (Bell Labs - Dublin, IE) [dblp]
- Elisabetta Bergamini (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Rob Bisseling (Utrecht University, NL) [dblp]
- Erik Boman (Sandia National Labs - Albuquerque, US) [dblp]
- Christian Brugger (TU Kaiserslautern, DE) [dblp]
- Aydin Buluc (Lawrence Berkeley National Laboratory, US) [dblp]
- Ümit V. Çatalyürek (Ohio State University, US) [dblp]
- Deepayan Chakrabarti (Facebook - Menlo Park, US) [dblp]
- Tiago de Paula Peixoto (Universität Bremen, DE)
- Yann Disser (TU Berlin, DE) [dblp]
- John Feo (Pacific Northwest National Lab. - Richland, US) [dblp]
- Enver Kayaaslan (ENS - Lyon, FR) [dblp]
- Dominique LaSalle (University of Minnesota - Minneapolis, US) [dblp]
- Andrew Lumsdaine (Indiana University - Bloomington, US) [dblp]
- Kamesh Madduri (Pennsylvania State University - University Park, US) [dblp]
- Aleksander Madry (EPFL - Lausanne, CH) [dblp]
- Fredrik Manne (University of Bergen, NO) [dblp]
- Ulrich Carsten Meyer (Goethe-Universität - Frankfurt a. M., DE) [dblp]
- Friedhelm Meyer auf der Heide (Universität Paderborn, DE) [dblp]
- Henning Meyerhenke (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Benjamin A. Miller (MIT Lincoln Laboratory - Lexington, US) [dblp]
- Petra Mutzel (TU Dortmund, DE) [dblp]
- Braxton Osting (University of Utah, US) [dblp]
- Srinivasan Parthasarathy (Ohio State University - Columbus, US) [dblp]
- Francois Pellegrini (University of Bordeaux, FR) [dblp]
- Ali Pinar (Sandia Nat. Labs - Livermore, US) [dblp]
- Alex Pothen (Purdue University - West Lafayette, US) [dblp]
- Ilya Safro (Clemson University, US) [dblp]
- Peter Sanders (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Christian Schulz (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Anand Srivastav (Universität Kiel, DE) [dblp]
- Christian Staudt (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Veronika Strnadova (University of California - Santa Barbara, US) [dblp]
- Sivan Toledo (Tel Aviv University, IL) [dblp]
- Jesper Larsson Träff (TU Wien, AT) [dblp]
- Bora Uçar (ENS - Lyon, FR) [dblp]
- Panayot S. Vassilevski (LLNL - Livermore, US) [dblp]
- David Veith (Goethe-Universität - Frankfurt a. M., DE) [dblp]
- Katharina A. Zweig (TU Kaiserslautern, DE) [dblp]
- data bases / information retrieval
- data structures / algorithms / complexity
- (Hyper)Graph and network algorithms
- high-performance computing
- massive network analytics
- parallel algorithmic machine models
- graph-based computational applications