https://www.dagstuhl.de/21352

### August 29 – September 1 , 2021, Dagstuhl Seminar 21352

# Higher-Order Graph Models: From Theoretical Foundations to Machine Learning

## Organizers

Tina Eliassi-Rad (Northeastern University – Boston, US)

Vito Latora (Queen Mary University of London, GB)

Martin Rosvall (University of Umeå, SE)

Ingo Scholtes (Universität Würzburg, DE & Universität Zürich, CH)

## For support, please contact

## Documents

Dagstuhl Report, Volume 11, Issue 7

Aims & Scope

List of Participants

Dagstuhl Seminar Schedule [pdf]

## Summary

The network science and graph mining community has created a rich portfolio of data analysis and visualisation techniques that have become a cornerstone for knowledge extraction from relational data on complex systems. Most of those techniques build on simple graph abstractions, where nodes represent a system’s elements, and links represent *dyadic interactions, relations, or dependencies* between those elements. This mathematical formalism has proven useful for reasoning, e.g., about the centrality of nodes, the evolution and control of dynamical processes, and the community or cluster structure in complex systems, given that we have access to *relational data* [17]. However, the graph abstractions used in those methods typically do not account for **higher-order relations** between nodes that are present in many real complex systems. Important examples for such data include:

- relational data that is inherently non-dyadic, such as (unordered) sets of authors co-authoring scientific articles, protein triplets in a cell that simultaneously interact with each other, or actors in social systems engaging in group collaborations,
- time-stamped data on social networks with chronologically ordered sequences of (dyadic) interactions, where specific sequences of nodes interact via
*causal paths* - sequential data on networked systems, such as user click streams, mobility trajectories, financial transaction sequences, citation paths, or directed acyclic graphs that give rise to a chronologically or topologically ordered sequences of nodes traversed by processes
- data on networked systems with multiple types or layers of links that cannot be reduced to a simple graph model

Over the past years, researchers have shown that the presence of such higher-order interactions can fundamentally alter our understanding of complex systems. They can change our notion of the importance of nodes captured by centrality measures, affect the detection of cluster and community structures in graphs, and influence dynamical processes like diffusion or epidemic spreading, as well as associated control strategies in non-trivial ways [24, 28, 33, 4, 5, 10, 35]. To further develop graph-based representations of data
and broaden their potential application in pattern recognition, data analysis, and machine learning, over the past few years researchers have developed a rich portfolio of **higher-order network models and representations** that capture more than just dyadic dependencies in complex systems. The organisers of this seminar have recently summarised current research and open challenges in this area in three independent overview and perspective articles [1, 30, 15]. An incomplete list of approaches explored over the past few years include:

- hypergraphs, where each
*hyperedge*can connect an arbitrary number of nodes [11] - simplicial network models, where
*simplices*represent d-dimensional group interactions [12, 10] *d*-dimensional De Bruijn graphs, where edges capture*ordered*sets of*d*dyadic interactions [28, 16]- memory networks, where memory nodes capture Non-Markovian properties in time series data [24]
- higher-, variable-, and multi-order Markov models for temporal networks [33, 20, 4]
- multi-layer and multiplex networks with multiple types of links between nodes [13]
- applications of categorical sequence mining techniques to model patterns in sequences of node sets [7]

**Figure 1**Figure 1 Illustration of standard graph model (left) and four modelling approaches capturing different types of higher-order interactions proposed in topological data analysis, network science, and computer science. Figure adapted from [15].

In Figure 1, we illustrate some of the higher-order graph models listed above. All these modelling approaches address the same fundamental limitation of graph models when studying complex systems: **we cannot understand a system’s structure and dynamics by decomposing direct and indirect interactions between elements into a set of dyadic relations with a single type.** However, the similarities and differences between these different approaches are still not fully understood.

At a critical time for the community, this Dagstuhl seminar intended to improve our understanding of the strengths, weaknesses, commonalities, and differences of these different approaches along with their resulting computational and epistemological challenges. The seminar aimed to create a common foundation for developing graph mining and machine learning techniques that use recent advances in the study of higher-order graph models by gathering key researchers from different communities, including machine learning, information retrieval and data mining, complex systems theory, theoretical physics, network science, computational social science, and mathematics. The participants included senior and junior researchers focusing on four related and intersecting topics: (i) Topological and Graph-Theoretic Foundations, (ii) Higher-Order Models for Dynamical Processes, (iii) Higher-Order Pattern Recognition and Machine Learning, (iv) Computational Aspects in Higher-Order Graph Analysis and Graph Mining.

The organisers used the four topics to structure the seminar program and derive the participants’ initial assignment to possible working groups. After an initial round of brief opening statements, participants introduced themselves and stated their specific interests for the seminar during five-minute lightning talks. During a match-making session taking place in the afternoon of day one, all interests expressed by the participants were consolidated into a set of working groups, addressing the following six areas: (i) Visualisation and Interpretability of Higher-Order Graph Models, (ii) Learning and Model Selection, (iii) Unification of Different Higher-Order Modelling Frameworks, (iv) Benchmark Data and Evaluation Practices, (v) Applications of Higher-Order Graph Models, and (vi) Societal Impact, Robustness, and Fairness. In the remaining time of the seminar, participants worked on those issues in the groups. This report includes summaries of the opening statements, the results of the working groups, and a summary of a panel discussion taking place on the evening of day two.

### References

- Federico Battiston, Giulia Cencetti, Iacopo Iacopini, Vito Latora, Maxime Lucas, Alice Patania, Jean-Gabriel Young, and Giovanni Petri. Networks beyond pairwise interactions: structure and dynamics. Physics Reports, 2020.
- Austin R. Benson. Tools for higher-order network analysis. CoRR, abs/1802.06820, 2018.
- Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg. Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences, 115(48):E11221–E11230, 2018.
- Austin R. Benson, David F. Gleich, and Jure Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In Proceedings of the 2015 SIAM International Conference on Data Mining, pages 118–126, 2015.
- Austin R Benson, David F Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
- Austin R. Benson, David F. Gleich, and Lek-Heng Lim. The spacey random walk: A stochastic process for higher-order data. SIAM Review, 59(2):321–345, 2017.
- Austin R Benson, Ravi Kumar, and Andrew Tomkins. Sequences of sets. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1148–1157, 2018.
- Matthias Bolten, Stephanie Friedhoff, Andreas Frommer, Matthias Heming, and Karsten Kahl. Algebraic multigrid methods for laplacians of graphs. Linear Algebra and its Applications, 434(11):2225 – 2243, 2011. Special Issue: Devoted to the 2nd NASC 08 Conference in Nanjing (NSC).
- Daniel Edler, Ludvig Bohlin, et al. Mapping higher-order network flows in memory and multilayer networks with infomap. Algorithms, 10(4):112, 2017.
- Ernesto Estrada and Grant J. Ross. Centralities in simplicial complexes. applications to protein interaction networks. Journal of Theoretical Biology, 438:46 – 60, 2018.
- Gourab Ghoshal, Vinko Zlatić, Guido Caldarelli, and Mark EJ Newman. Random hypergraphs and their applications. Physical Review E, 79(6):066118, 2009.
- Iacopo Iacopini, Giovanni Petri, Alain Barrat, and Vito Latora. Simplicial models of social contagion. Nature communications, 10(1):1–9, 2019.
- Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno, and Mason A. Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 07 2014.
- Renaud Lambiotte, Martin Rosvall, Michael Schaub, Ingo Scholtes, and Jian Xu. Beyond graph mining: Higher-order data analytics for temporal network data. In Hands-On Tutorial at the 24th ACM SIGKDD International Conference on Knowledge Discovery, volume 38, 2018.
- Renaud Lambiotte, Martin Rosvall, and Ingo Scholtes. From networks to optimal higherorder models of complex systems. In Nature physics, Vol. 15, p. 313-320, March 25 2019
- Timothy LaRock, Vahan Nanumyan, Ingo Scholtes, Giona Casiraghi, Tina Eliassi-Rad, and Frank Schweitzer. HYPA: Efficient detection of path anomalies in time series data on networks. In Proceedings of the 2020 SIAM International Conference on Data Mining, pages 460–468. SIAM, 2020.
- Vito Latora, Vincenzo Nicosia, and Giovanni Russo. Complex Networks: Principles, Methods and Applications. Cambridge University Press, 2017.
- Naoki Masuda, Mason A. Porter, and Renaud Lambiotte. Random walks and diffusion on networks. Physics Reports, 716-717:1 – 58, 2017. Random walks and diffusion on networks.
- Huda Nassar, Caitlin Kennedy, Shweta Jain, Austin R. Benson, and David F. Gleich. Using cliques with higher-order spectral embeddings improves graph visualizations. In Proceedings of the 2020 Web Conference (WWW), pages 2927–2933, 2020.
- Tiago P Peixoto and Martin Rosvall. Modelling sequences and temporal networks with dynamic community structures. Nature communications, 8(1):1–12, 2017.
- Vincenzo Perri and Ingo Scholtes. Hotvis: Higher-order time-aware visualisation of dynamic graphs. In Proceedings of the 28th International Symposium on Graph Drawing and Network Visualization (Graph Drawing 2020), Vancouver, BC, Canada (to appear), 2020.
- Luka V. Petrovic and Ingo Scholtes. PaCo: Fast Counting of Causal Paths in Temporal Network Data. In Proceedings of the 11th Temporal Web Analytics Workshop (TempWeb 2021) held in conjunction with The Web Conference 2021, Ljubljana, Slovenia, April 2021
- Luka V. Petrovic and Ingo Scholtes. Learning the markov order of paths in a network, 2020. arXiv 2007.02861.
- Martin Rosvall, Alcides V Esquivel, Andrea Lancichinetti, Jevin D West, and Renaud Lambiotte. Memory in network flows and its effects on spreading dynamics and community detection. Nature communications, 5:4630, 2014.
- Mandana Saebi, Giovanni Luca Ciampaglia, Lance M Kaplan, and Nitesh V Chawla. Honem: Network embedding using higher-order patterns in sequential data. arXiv preprint arXiv:1908.05387, 2019.
- Mandana Saebi, Jian Xu, Lance M. Kaplan, Bruno Ribeiro, and Nitesh V. Chawla. Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection. EPJ Data Sci., 9(1):15, 2020.
- Ingo Scholtes. When is a network a network? multi-order graphical model selection in pathways and temporal networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, page 1037–1046, New York, NY, USA, 2017. Association for Computing Machinery.
- Ingo Scholtes, Nicolas Wider, René Pfitzner, Antonios Garas, Claudio Tessone, and Frank Schweitzer. Causality-driven slow-down and speed-up of diffusion in non-markovian temporal networks. Nature communications, 5:5024, 2014.
- Yingxia Shao, Shiyue Huang, Xupeng Miao, Bin Cui, and Lei Chen. Memory-aware framework for efficient second-order random walk on large graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, page 1797–1812, 2020.
- Leo Torres, Ann S Blevins, Danielle S Bassett, and Tina Eliassi-Rad. The why, how, and when of representations for complex systems. arXiv preprint arXiv:2006.02870, 2020.
- Francesco Tudisco, Austin R. Benson, and Konstantin Prokopchik. Nonlinear higher-order label spreading. CoRR, abs/2006.04762, 2020.
- Tao Wu, Austin R. Benson, and David F. Gleich. General tensor spectral co-clustering for higher-order data. In Advances in Neural Information Processing Systems 29, pages 2559–2567, 2016.
- Jian Xu, Thanuka L Wickramarathne, and Nitesh V Chawla. Representing higher-order dependencies in networks. Science advances, 2(5):e1600028, 2016.
- Hao Yin, Austin R. Benson, Jure Leskovec, and David F. Gleich. Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 555–564, 2017.
- Yan Zhang, Antonios Garas, and Ingo Scholtes. Higher-order models capture changes in controllability of temporal networks. In Journal of Physics: Complexity, Vol. 2, No. 1, January 29, 2021

**Summary text license**

Creative Commons BY 4.0

Ingo Scholtes, Tina Eliassi-Rad, Vito Latora, and Martin Rosvall

## Classification

- Data Structures And Algorithms
- Machine Learning
- Social And Information Networks

## Keywords

- Topological data analysis
- (social) network analysis
- Graph theory
- Statistical relational learning
- Graph mining