TOP
Search the Dagstuhl Website
Looking for information on the websites of the individual seminars? - Then please:
Not found what you are looking for? - Some of our services have separate websites, each with its own search option. Please check the following list:
Schloss Dagstuhl - LZI - Logo
Schloss Dagstuhl Services
Seminars
Within this website:
External resources:
  • DOOR (for registering your stay at Dagstuhl)
  • DOSA (for proposing future Dagstuhl Seminars or Dagstuhl Perspectives Workshops)
Publishing
Within this website:
External resources:
dblp
Within this website:
External resources:
  • the dblp Computer Science Bibliography


Dagstuhl Seminar 19491

Big Graph Processing Systems

( Dec 01 – Dec 06, 2019 )

(Click in the middle of the image to enlarge)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/19491

Organizers

Contact



Motivation

In our world, data is not just getting bigger, it is also getting more connected. Exploring, describing, predicting, and explaining phenomena connected to the interconnected world requires the use of an adequate data abstraction. Graphs are recognized as a general, natural, and flexible data-abstraction that can model complex relationships, interactions, and interdependencies between objects. Graphs have been widely used to represent datasets and encode problems across an already extensive range of application domains. The ever-increasing size of graph-structured data for these applications creates a critical need for scalable and even elastic systems that can process large amounts of it efficiently. Additionally, the complexity of using multiple datasets simultaneously in complex analysis, raises numerous challenges for graph processing, from new requirements to new capabilities.

This Dagstuhl Seminar is planning to discuss some of these open challenges on the interplay between graph data, abstractions, systems, performance engineering, and software engineering, with a main focus on the following key topics related to big graph processing systems:

  1. Design Decisions of Big Graph Processing Ecosystems: In modern setups, graph-processing is not a self-sustained, independent activity, but rather part of a larger big-data processing ecosystem with many system alternatives and possible design decisions. We need a clear understanding of the impact and the trade-offs of the various decisions in order to effectively guide the developers of big graph processing applications.
  2. High-Level Graph Processing Abstractions: While imperative programming models, such as vertex-centric or edge-centric programming models, are popular, they are lacking a high-level exposition to the end user. To increase the power of graph processing systems and foster the usage of graph analytics in applications, we need to design high-level graph processing abstractions. It is currently completely open how future declarative graph processing abstractions could look like.
  3. Application and Domain Specific Requirements: Due to the ubiquity of graph-shaped data, users and applications deal with these data in their daily tasks from private to professional life. It becomes thus crucial to understand the user’s requirements in executing queries and complex analytical tasks on graph data, and to understand their actual usage of these data in production environments.
  4. Performance and Scalability Evaluation: Traditionally, performance and scalability are measures of efficiency, e.g. FLOPS, throughput, or speedup, are difficult to apply for graph processing, especially since performance is non-trivially dependent on platform, algorithm, and dataset. Moreover, running graph-processing workloads in the cloud leverages additional challenges. Such performance-related issues are key to identify, design, and build upon widely recognized benchmarks for graph processing.

To address these topics the seminar will bring together researchers, developers, and practitioners actively working on these topics. The aim is to use the insights and the results of the discussions of the seminar to provide a roadmap that can guide the development of several aspects in the future of big graph processing systems.

Copyright Angela Bonifati, Alexandru Iosup, Sherif Sakr, and Hannes Voigt

Press Room

Press Reviews


Summary

In memoriam:

This seminar is dedicated to the memory of our co-organizer and friend Sherif Sakr (1979-2020), whose unexpected early departure happened a few months after the seminar. Sherif was a leading scientist in the field of Big Data Technologies. We are grateful to him for the time spent together and the joint work preceding and following the seminar. He will be deeply missed.

The world has become more interconnected than ever. Through an advancing wave of technologies and applications, our society is producing and consuming data at an unprecedented scale and complexity. To model the data, graphs offer a general model and mathematical abstraction, in the simplest form based on arbitrary objects (vertices) connected by relationships (edges), with possibly additional information (properties). Graphs enable already a remarkable range of application domains, from industry to science, from society to governance, from education to gaming, but their true potential is just beginning to be unlocked. However, the tremendous increase in the size, complexity, and diversity of the graph-structured data and their applications, and the increasing community using graphs to understand and automate the world around us, raises new challenges for computer science. Under these new circumstances, the potential benefits of graph processing could be canceled by the difficulty to understand, create, develop, and automate graph processing for the masses. Focusing on the interplay between graph data, abstractions, systems, performance engineering, and software engineering, this seminar brings together researchers, developers, and practitioners actively working on this topic, to discuss emph{timely and relevant} open challenges with a main focus on the following topics: trade-off of design decisions of big graph processing systems, high-level graph programming abstractions and graph query languages, the specific requirements for different application domains for benchmarking and graph engineering purposes, systems and ecosystems for graph processing, the fundamental processes and methods leading to the science, design, and engineering of graph processing.

The seminar focused on the following key topics related to big graph processing systems:

Topic 1. Design Decisions of Big Graph Processing Ecosystems: In modern setups, graph-processing is not a self-sustained, independent activity, but rather part of a larger big-data processing ecosystem. Typical examples include the Giraph's deployment in the Facebook MapReduce ecosystem, Powergraph in the GraphLab machine learning and data-mining ecosystem, and GraphX in the Apache Spark ecosystem. In general, more alternatives usually mean harder decisions for choice. In practice, with the wide spectrum of big graph processing systems, with different design decisions, that are currently available, it becomes very challenging to decide by intuition which system is the most adequate for a given application requirements, workload, or the underlying ecosystem. Making such decisions requires significant knowledge about the graph complexity, graph size, world requirements, and even the implementation details of the various available systems. Currently, we lack the fundamental models to understand and quantitatively analyze, estimate, and describe the complexity of big graph processing jobs. In addition, there is no understanding on the relationship between the graph complexity and the computational complexity of big graph processing jobs. Therefore, we need a clear understanding for the impact and the trade-offs of the various decisions (e.g., centralized vs distributed, partitioning strategy, programming model, graph representation model, memory storage vs disk storage) in order to effectively guide the developers of big graph processing applications.

Topic 2. High-Level Graph Processing Abstractions: While imperative programming models, such as vertex-centric or edge-centric programming models, are popular, they are lacking a high-level exposition to the end user. This way the end user is required more technical programming, which limits the end user productivity in building graph processing pipelines. In contrast, graph query languages build on more high-level, declarative constructs. Query language abstraction give more power to the less technical user and allow for extensive performance optimization by the underlying graph processing system. Current graph query languages, however, lack the power required in many graph analytics use cases. To increase the power of graph processing systems and foster the usage of graph analytics in applications, we need to design high-level graph processing abstractions. It is currently completely open how future declarative graph processing abstractions could look like, which the best level of abstraction is, how abstraction for analytics integrate with existing graph query languages, and we can evaluate new graph processing abstractions regarding utility, simplicity, expressiveness, and optimization potential.

Topic 3. Performance and Scalability Evaluation: Traditionally, performance and scalability are measures of efficiency, contrasting the ability of systems to utilize resources: FLOPS, throughput (e.g., EVPS), or speedup (i.e., compared to either a single-node, or a sequential implementation). Such metrics are difficult to apply for graph processing, especially since performance is non-trivially dependent on platform, algorithm, and dataset (i.e., the PAD triangle). Therefore, many important questions arise: how to compare the performance of graph-processing systems?, how to define scalability?, should one compare largely different systems, e.g., a distributed, heterogeneous system with a highly-tuned, hand-written sequential implementation?, how to design a framework for reproducible performance evaluation?. Moreover, running graph-processing workloads in the cloud leverages additional challenges. First, we would like to understand whether the intrinsic cloud elasticity could be harnessed for graph processing. Second, clouds are known to be impacted by large degrees of performance variability due to colocation and virtualization overheads. Studying the impact of cloud performance variability onto graph-processing workloads is another topic of interest. Such performance-related issues are key to identify, design, and build upon widely recognized benchmarks for graph processing.

For each topic, the discussion also considered specific and general applications of graph processing, at various volume, velocity, and other dimensions.

The seminar brought together over 40 diverse and high quality researchers with core expertise from two generally distinct communities, data management and (large-scale) computer systems. The seminar was successful, and addressed in particular topics around graph processing systems: ecosystems, abstractions and other fundamental theory, and performance. To this end, we structured the seminar as follows:

  1. Prior to the seminar, the co-organizers have contacted each participant, eliciting commitment for one or several topics, and ideas for key elements of the discussion.
  2. During the first day of the seminar, the morning was dedicated to short presentations by each participant, and a long break-out session per topic. The former allowed the participants to better understand each other's core ideas and keywords, to identify synergies and to find experts for keywords not entirely familiar.
  3. For the next two days, each morning challenged at least one half the participants with a tutorial given by a leading expert from the other community, then proceeded with break-out sessions organized per topic, and ended with a plenary session to share the main ideas. The tutorials were given by Tamer Özsu on "Graph Processing: A Panoramic View and Some Open Problems", on behalf of the data management community, and by Antonino Tumeo on "Big Graph Processing: The System Perspective", on behalf of the systems community. The main results of these two days of intense work were making terminology more uniform across the participants, and the core ideas about challenges (open problems), directions for long-term research, and identification of concrete short-term plans for continuation.
  4. During the last day of the seminar, the participants finalized the immediate conclusions of the seminar (see Section "In Conclusion: Challenges and Future Directions for Big Graph Processing Systems"), and agreed on the plans for continuation.
Copyright Alexandru Iosup, Angela Bonifati, Sherif Sakr, and Hannes Voigt

Participants
  • Khaled Ammar (Thomson Reuters Labs, CA) [dblp]
  • Renzo Angles (University of Talca - Chile, CL) [dblp]
  • Walid Aref (Purdue University - West Lafayette, US) [dblp]
  • Marcelo Arenas (PUC - Santiago de Chile, CL) [dblp]
  • Maciej Besta (ETH Zürich, CH) [dblp]
  • Peter A. Boncz (CWI - Amsterdam, NL) [dblp]
  • Angela Bonifati (University Claude Bernard - Lyon, FR) [dblp]
  • Khuzaima Daudjee (University of Waterloo, CA) [dblp]
  • Emanuele Della Valle (Polytechnic University of Milan, IT) [dblp]
  • Stefania Dumbrava (ENSIIE - Evry, FR) [dblp]
  • Olaf Hartig (Linköping University, SE) [dblp]
  • Bernhard Haslhofer (AIT - Austrian Institute of Technology - Wien, AT) [dblp]
  • Tim Hegeman (VU University Amsterdam, NL) [dblp]
  • Jan Hidders (Birkbeck, University of London, GB) [dblp]
  • Katja Hose (Aalborg University, DK) [dblp]
  • Adriana Iamnitchi (University of South Florida - Tampa, US) [dblp]
  • Alexandru Iosup (VU University Amsterdam, NL) [dblp]
  • Vasiliki Kalavri (Boston University, US) [dblp]
  • Hugo Kapp (Oracle Labs Switzerland - Zürich, CH)
  • Wim Martens (Universität Bayreuth, DE) [dblp]
  • M. Tamer Özsu (University of Waterloo, CA) [dblp]
  • Eric Peukert (Universität Leipzig, DE) [dblp]
  • Stefan Plantikow (Neo4j - Berlin, DE) [dblp]
  • Mohamed Ragab (University of Tartu, EE) [dblp]
  • Matei R. Ripeanu (University of British Columbia - Vancouver, CA) [dblp]
  • Sherif Sakr (University of Tartu, EE) [dblp]
  • Semih Salihoglu (University of Waterloo, CA) [dblp]
  • Christian Schulz (Universität Wien, AT) [dblp]
  • Petra Selmer (Neo4j - London, GB) [dblp]
  • Juan F. Sequeda (data.world - Austin, US) [dblp]
  • Joshua Shinavier (Uber Engineering - Palo Alto, US) [dblp]
  • Gábor Szárnyas (Budapest Univ. of Technology & Economics, HU) [dblp]
  • Riccardo Tommasini (Polytechnic University of Milan, IT)
  • Antonino Tumeo (Pacific Northwest National Lab. - Richland, US) [dblp]
  • Alexandru Uta (VU University Amsterdam, NL) [dblp]
  • Ana Lucia Varbanescu (University of Amsterdam, NL) [dblp]
  • Hannes Voigt (Neo4j - Leipzig, DE) [dblp]
  • Hsiang-Yun Wu (TU Wien, AT) [dblp]
  • Nikolay Yakovets (TU Eindhoven, NL) [dblp]
  • Da Yan (The University of Alabama - Birmingham, US) [dblp]
  • Eiko Yoneki (University of Cambridge, GB) [dblp]

Related Seminars
  • Dagstuhl Seminar 25171: Holistic Graph-Processing Systems: Enabling Real-World Scale and Societal Impact (2025-04-21 - 2025-04-25) (Details)

Classification
  • data bases / information retrieval
  • world wide web / internet

Keywords
  • Big Data
  • Big Graphs
  • Graph Processing Systems