Higher-Order Graph Models: From Theoretical Foundations to Machine Learning
( 29. Aug – 01. Sep, 2021 )
- Tina Eliassi-Rad (Northeastern University - Boston, US)
- Vito Latora (Queen Mary University of London, GB)
- Martin Rosvall (University of Umeå, SE)
- Ingo Scholtes (Universität Würzburg, DE & Universität Zürich, CH)
- Michael Gerke (für wissenschaftliche Fragen)
- Jutka Gasiorowski (für administrative Fragen)
- Sequential Motifs in Observed Walks - LaRock, Timothy; Scholtes, Ingo; Eliassi-Rad, Tina - Cornell University : arXiv.org, 2022. - 31 pp..
- Inference of time-ordered multibody interactions - Alvarez-Rodriguez, Unai; Petrovic, Luka V.; Scholtes, Ingo - Cornell University : arXiv.org, 2021. - 10 pp..
- De Bruijn goes Neural : Causality-Aware Graph Neural Networks for Time Series Data on Dynamic Graphs - Qarkaxhija, Lisi; Perri, Vincenzo; Scholtes, Ingo - Cornell University : arXiv.org, 2022. - 15 pp..
- One Graph to Rule them All : Using NLP and GraphNeural Networks to analyse Tolkien’s Legendarium - Perri, Vincenzo; Qarkaxhija, Lisi; Zehe, Albin; Hotho, Andreas; Scholtes, Ingo - Cornell University : arXiv.org, 2022. - 26 pp..
Graph and network models are a cornerstone of data science and machine learning applications in computer science, social sciences, humanities, and life sciences. Most state-of-the-art network analysis and graph-based learning techniques build on simple graph abstractions, where nodes represent a system's elements, and links represent dyadic interactions, relations, or dependencies between those elements. This mathematical formalism has proven useful for reasoning about, for example, the centrality of nodes, the evolution and control of dynamical processes, and the community or cluster structure in complex systems.
While the advantages of graph models of relational data are undisputed, we often have access to rich data with multiple types of higher-order, inherently non-dyadic interactions that simple graphs cannot represent in a meaningful way. Important examples include relational data on actors in social systems who engage in group collaborations, time-stamped interaction data giving rise to chronologically ordered sequences of (dyadic) links, sequential data such as user clickstreams, mobility trajectories or citation paths with sequences of traversed nodes, and data on networked systems with multiple types or layers of connectivity. Over the past years, researchers have shown that the presence of such higher-order interactions can fundamentally influence our understanding of complex networked systems. They can change our notion of the importance of nodes captured by centrality measures, affect the detection of cluster and community structures in graphs, and influence dynamical processes like diffusion or epidemic spreading, as well as associated control or containment strategies in non-trivial ways.
To address this important challenge, researchers in topological data analysis, network science, machine learning, and physics recently started to generalize network analysis to higher-order graph models that capture more than dyadic relations. Over the past few years, this research community has developed a rich portfolio of higher-order network models and representations. Exemplary modelling approaches successfully use hypergraphs, simplicial network models, high-dimensional De Bruijn graphs, higher-, variable-, and multi-order Markov chains, as well as multi-layer and multiplex networks. These modelling approaches address the same fundamental limitation of graph models, namely that we cannot understand the structure and dynamics of complex systems by decomposing direct and indirect interactions between elements into a set of dyadic relations with a single type. However, the similarities and differences between these different modelling approaches, and the machine learning techniques derived from them, are poorly understood.
Addressing this gap, this Dagstuhl Seminar aims to improve our understanding of the strengths, weaknesses, commonalities, and differences of these different approaches along with their resulting computational challenges. Bringing together key researchers from different communities, the seminar aims to form a common foundation for the developing graph mining and machine learning techniques that use recent advances in the study of higher-order graph models. We specifically aim to develop a common language and a shared research platform that fosters progress in data analytics and machine learning for data with complex relational structure.
The network science and graph mining community has created a rich portfolio of data analysis and visualisation techniques that have become a cornerstone for knowledge extraction from relational data on complex systems. Most of those techniques build on simple graph abstractions, where nodes represent a system’s elements, and links represent dyadic interactions, relations, or dependencies between those elements. This mathematical formalism has proven useful for reasoning, e.g., about the centrality of nodes, the evolution and control of dynamical processes, and the community or cluster structure in complex systems, given that we have access to relational data . However, the graph abstractions used in those methods typically do not account for higher-order relations between nodes that are present in many real complex systems. Important examples for such data include:
- relational data that is inherently non-dyadic, such as (unordered) sets of authors co-authoring scientific articles, protein triplets in a cell that simultaneously interact with each other, or actors in social systems engaging in group collaborations,
- time-stamped data on social networks with chronologically ordered sequences of (dyadic) interactions, where specific sequences of nodes interact via causal paths
- sequential data on networked systems, such as user click streams, mobility trajectories, financial transaction sequences, citation paths, or directed acyclic graphs that give rise to a chronologically or topologically ordered sequences of nodes traversed by processes
- data on networked systems with multiple types or layers of links that cannot be reduced to a simple graph model
Over the past years, researchers have shown that the presence of such higher-order interactions can fundamentally alter our understanding of complex systems. They can change our notion of the importance of nodes captured by centrality measures, affect the detection of cluster and community structures in graphs, and influence dynamical processes like diffusion or epidemic spreading, as well as associated control strategies in non-trivial ways [24, 28, 33, 4, 5, 10, 35]. To further develop graph-based representations of data and broaden their potential application in pattern recognition, data analysis, and machine learning, over the past few years researchers have developed a rich portfolio of higher-order network models and representations that capture more than just dyadic dependencies in complex systems. The organisers of this seminar have recently summarised current research and open challenges in this area in three independent overview and perspective articles [1, 30, 15]. An incomplete list of approaches explored over the past few years include:
- hypergraphs, where each hyperedge can connect an arbitrary number of nodes 
- simplicial network models, where simplices represent d-dimensional group interactions [12, 10]
- d-dimensional De Bruijn graphs, where edges capture ordered sets of d dyadic interactions [28, 16]
- memory networks, where memory nodes capture Non-Markovian properties in time series data 
- higher-, variable-, and multi-order Markov models for temporal networks [33, 20, 4]
- multi-layer and multiplex networks with multiple types of links between nodes 
- applications of categorical sequence mining techniques to model patterns in sequences of node sets 
In Figure 1, we illustrate some of the higher-order graph models listed above. All these modelling approaches address the same fundamental limitation of graph models when studying complex systems: we cannot understand a system’s structure and dynamics by decomposing direct and indirect interactions between elements into a set of dyadic relations with a single type. However, the similarities and differences between these different approaches are still not fully understood.
At a critical time for the community, this Dagstuhl seminar intended to improve our understanding of the strengths, weaknesses, commonalities, and differences of these different approaches along with their resulting computational and epistemological challenges. The seminar aimed to create a common foundation for developing graph mining and machine learning techniques that use recent advances in the study of higher-order graph models by gathering key researchers from different communities, including machine learning, information retrieval and data mining, complex systems theory, theoretical physics, network science, computational social science, and mathematics. The participants included senior and junior researchers focusing on four related and intersecting topics: (i) Topological and Graph-Theoretic Foundations, (ii) Higher-Order Models for Dynamical Processes, (iii) Higher-Order Pattern Recognition and Machine Learning, (iv) Computational Aspects in Higher-Order Graph Analysis and Graph Mining.
The organisers used the four topics to structure the seminar program and derive the participants’ initial assignment to possible working groups. After an initial round of brief opening statements, participants introduced themselves and stated their specific interests for the seminar during five-minute lightning talks. During a match-making session taking place in the afternoon of day one, all interests expressed by the participants were consolidated into a set of working groups, addressing the following six areas: (i) Visualisation and Interpretability of Higher-Order Graph Models, (ii) Learning and Model Selection, (iii) Unification of Different Higher-Order Modelling Frameworks, (iv) Benchmark Data and Evaluation Practices, (v) Applications of Higher-Order Graph Models, and (vi) Societal Impact, Robustness, and Fairness. In the remaining time of the seminar, participants worked on those issues in the groups. This report includes summaries of the opening statements, the results of the working groups, and a summary of a panel discussion taking place on the evening of day two.
- Federico Battiston, Giulia Cencetti, Iacopo Iacopini, Vito Latora, Maxime Lucas, Alice Patania, Jean-Gabriel Young, and Giovanni Petri. Networks beyond pairwise interactions: structure and dynamics. Physics Reports, 2020.
- Austin R. Benson. Tools for higher-order network analysis. CoRR, abs/1802.06820, 2018.
- Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg. Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences, 115(48):E11221–E11230, 2018.
- Austin R. Benson, David F. Gleich, and Jure Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In Proceedings of the 2015 SIAM International Conference on Data Mining, pages 118–126, 2015.
- Austin R Benson, David F Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
- Austin R. Benson, David F. Gleich, and Lek-Heng Lim. The spacey random walk: A stochastic process for higher-order data. SIAM Review, 59(2):321–345, 2017.
- Austin R Benson, Ravi Kumar, and Andrew Tomkins. Sequences of sets. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1148–1157, 2018.
- Matthias Bolten, Stephanie Friedhoff, Andreas Frommer, Matthias Heming, and Karsten Kahl. Algebraic multigrid methods for laplacians of graphs. Linear Algebra and its Applications, 434(11):2225 – 2243, 2011. Special Issue: Devoted to the 2nd NASC 08 Conference in Nanjing (NSC).
- Daniel Edler, Ludvig Bohlin, et al. Mapping higher-order network flows in memory and multilayer networks with infomap. Algorithms, 10(4):112, 2017.
- Ernesto Estrada and Grant J. Ross. Centralities in simplicial complexes. applications to protein interaction networks. Journal of Theoretical Biology, 438:46 – 60, 2018.
- Gourab Ghoshal, Vinko Zlatić, Guido Caldarelli, and Mark EJ Newman. Random hypergraphs and their applications. Physical Review E, 79(6):066118, 2009.
- Iacopo Iacopini, Giovanni Petri, Alain Barrat, and Vito Latora. Simplicial models of social contagion. Nature communications, 10(1):1–9, 2019.
- Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno, and Mason A. Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 07 2014.
- Renaud Lambiotte, Martin Rosvall, Michael Schaub, Ingo Scholtes, and Jian Xu. Beyond graph mining: Higher-order data analytics for temporal network data. In Hands-On Tutorial at the 24th ACM SIGKDD International Conference on Knowledge Discovery, volume 38, 2018.
- Renaud Lambiotte, Martin Rosvall, and Ingo Scholtes. From networks to optimal higherorder models of complex systems. In Nature physics, Vol. 15, p. 313-320, March 25 2019
- Timothy LaRock, Vahan Nanumyan, Ingo Scholtes, Giona Casiraghi, Tina Eliassi-Rad, and Frank Schweitzer. HYPA: Efficient detection of path anomalies in time series data on networks. In Proceedings of the 2020 SIAM International Conference on Data Mining, pages 460–468. SIAM, 2020.
- Vito Latora, Vincenzo Nicosia, and Giovanni Russo. Complex Networks: Principles, Methods and Applications. Cambridge University Press, 2017.
- Naoki Masuda, Mason A. Porter, and Renaud Lambiotte. Random walks and diffusion on networks. Physics Reports, 716-717:1 – 58, 2017. Random walks and diffusion on networks.
- Huda Nassar, Caitlin Kennedy, Shweta Jain, Austin R. Benson, and David F. Gleich. Using cliques with higher-order spectral embeddings improves graph visualizations. In Proceedings of the 2020 Web Conference (WWW), pages 2927–2933, 2020.
- Tiago P Peixoto and Martin Rosvall. Modelling sequences and temporal networks with dynamic community structures. Nature communications, 8(1):1–12, 2017.
- Vincenzo Perri and Ingo Scholtes. Hotvis: Higher-order time-aware visualisation of dynamic graphs. In Proceedings of the 28th International Symposium on Graph Drawing and Network Visualization (Graph Drawing 2020), Vancouver, BC, Canada (to appear), 2020.
- Luka V. Petrovic and Ingo Scholtes. PaCo: Fast Counting of Causal Paths in Temporal Network Data. In Proceedings of the 11th Temporal Web Analytics Workshop (TempWeb 2021) held in conjunction with The Web Conference 2021, Ljubljana, Slovenia, April 2021
- Luka V. Petrovic and Ingo Scholtes. Learning the markov order of paths in a network, 2020. arXiv 2007.02861.
- Martin Rosvall, Alcides V Esquivel, Andrea Lancichinetti, Jevin D West, and Renaud Lambiotte. Memory in network flows and its effects on spreading dynamics and community detection. Nature communications, 5:4630, 2014.
- Mandana Saebi, Giovanni Luca Ciampaglia, Lance M Kaplan, and Nitesh V Chawla. Honem: Network embedding using higher-order patterns in sequential data. arXiv preprint arXiv:1908.05387, 2019.
- Mandana Saebi, Jian Xu, Lance M. Kaplan, Bruno Ribeiro, and Nitesh V. Chawla. Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection. EPJ Data Sci., 9(1):15, 2020.
- Ingo Scholtes. When is a network a network? multi-order graphical model selection in pathways and temporal networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, page 1037–1046, New York, NY, USA, 2017. Association for Computing Machinery.
- Ingo Scholtes, Nicolas Wider, René Pfitzner, Antonios Garas, Claudio Tessone, and Frank Schweitzer. Causality-driven slow-down and speed-up of diffusion in non-markovian temporal networks. Nature communications, 5:5024, 2014.
- Yingxia Shao, Shiyue Huang, Xupeng Miao, Bin Cui, and Lei Chen. Memory-aware framework for efficient second-order random walk on large graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, page 1797–1812, 2020.
- Leo Torres, Ann S Blevins, Danielle S Bassett, and Tina Eliassi-Rad. The why, how, and when of representations for complex systems. arXiv preprint arXiv:2006.02870, 2020.
- Francesco Tudisco, Austin R. Benson, and Konstantin Prokopchik. Nonlinear higher-order label spreading. CoRR, abs/2006.04762, 2020.
- Tao Wu, Austin R. Benson, and David F. Gleich. General tensor spectral co-clustering for higher-order data. In Advances in Neural Information Processing Systems 29, pages 2559–2567, 2016.
- Jian Xu, Thanuka L Wickramarathne, and Nitesh V Chawla. Representing higher-order dependencies in networks. Science advances, 2(5):e1600028, 2016.
- Hao Yin, Austin R. Benson, Jure Leskovec, and David F. Gleich. Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 555–564, 2017.
- Yan Zhang, Antonios Garas, and Ingo Scholtes. Higher-order models capture changes in controllability of temporal networks. In Journal of Physics: Complexity, Vol. 2, No. 1, January 29, 2021
- Unai Alvarez-Rodriguez (Universität Zürich, CH) [dblp]
- Luca Gallo (University of Catania, IT) [dblp]
- Christoph Gote (ETH Zürich, CH) [dblp]
- Jürgen Hackl (University of Liverpool, GB) [dblp]
- Andreas Hotho (Universität Würzburg, DE) [dblp]
- Kang-Ju Lee (Seoul National University, KR) [dblp]
- Leonie Neuhäuser (RWTH Aachen, DE) [dblp]
- Vincenzo Perri (Universität Zürich, CH) [dblp]
- Giovanni Petri (ISI Foundation - Torino, IT) [dblp]
- Luka Petrovic (Universität Zürich, CH) [dblp]
- Martin Rosvall (University of Umeå, SE) [dblp]
- Michael Schaub (RWTH Aachen, DE) [dblp]
- Maximilian Schich (Tallinn University, EE) [dblp]
- Ingo Scholtes (Universität Würzburg, DE & Universität Zürich, CH) [dblp]
- Frank Schweitzer (ETH Zürich, CH) [dblp]
- Markus Strohmaier (RWTH Aachen, DE) [dblp]
- Federico Battiston (Central European University - Vienna, AT) [dblp]
- Ginestra Bianconi (Queen Mary University of London, GB) [dblp]
- Rebekka Burkholz (Harvard School of Public Health - Boston, US) [dblp]
- Giulia Cencetti (Bruno Kessler Foundation - Trento, IT) [dblp]
- Gabriele DiBona (Queen Mary University of London, GB) [dblp]
- Daniel Edler (University of Umeå, SE) [dblp]
- Tina Eliassi-Rad (Northeastern University - Boston, US) [dblp]
- Anton Eriksson (University of Umeå, SE)
- David F. Gleich (Purdue University - West Lafayette, US) [dblp]
- Stephan Günnemann (TU München, DE) [dblp]
- Heather Harrington (University of Oxford, GB) [dblp]
- Desmond J. Higham (University of Edinburgh, GB) [dblp]
- Fariba Karimi (Complexity Science Hub - Wien, AT) [dblp]
- Danai Koutra (University of Michigan - Ann Arbor, US) [dblp]
- Renaud Lambiotte (University of Oxford, GB) [dblp]
- Timothy LaRock (Northeastern University - Boston, US) [dblp]
- Vito Latora (Queen Mary University of London, GB) [dblp]
- Yamir Moreno (University of Zaragoza, ES) [dblp]
- Natasa Przulj (Barcelona Supercomputing Center, ES) [dblp]
- Lisi Qarkaxhija (Universität Würzburg, DE)
- Alice Schwarze (University of Washington - Seattle, US) [dblp]
- Jelena Smiljanic (University of Umeå, SE) [dblp]
- Michele Starnini (ISI Foundation, IT) [dblp]
- Chester Tan (National University of Singapore, SG)
- Leo Torres (Northeastern University - Boston, US) [dblp]
- Anatol Wegner (University College London, GB) [dblp]
- Data Structures and Algorithms
- Machine Learning
- Social and Information Networks
- topological data analysis
- (social) network analysis
- graph theory
- statistical relational learning
- graph mining