https://www.dagstuhl.de/19212

### May 19 – 24, 2019, Dagstuhl Seminar 19212

# Topology, Computation and Data Analysis

## Organizers

Michael Kerber (TU Graz, AT)

Vijay Natarajan (Indian Institute of Science – Bangalore, IN)

Bei Wang (University of Utah – Salt Lake City, US)

## Documents

Dagstuhl Report, Volume 9, Issue 5

Motivation text

List of participants

Dagstuhl's Impact: Documents available

## Summary

The Dagstuhl Seminar titled "Topology, Computation, and Data Analysis" brought together researchers in mathematics, computer science, and visualization to engage in active discussions on theoretical, computational, practical, and application aspects of topology for data analysis. The seminar has led to stronger ties between the computational topology and TopoInVis (topology-based visualization) communities and to the identification of research challenges and open problems that can be addressed together.

### Context

Topology is the study of the connectivity of space; it abstracts away geometry and provides succinct representations of the space and of functions defined on it. Topology-based methods for data analysis have received considerable attention in recent years, given their promise to handle the large and feature-rich data that are becoming increasingly common. Computing topological properties in the data domain and/or range is a step in the direction of more abstract, higher-level data analysis and visualization. Such an approach has become more important in the context of automatic and semi-automatic data exploration, analysis, and understanding. The primary attraction of topology-based methods is the ability to generate "summary" qualitative views of large data sets. Such views often require fewer geometric primitives to be extracted, stored, and visualized than views obtained directly from the raw data.

Two communities, computational topology and TopoInVis (topology-based visualization), have made significant progress during the past two decades on developing topological abstractions and applying them to data analysis. In addition, there are multiple other research programs (relatively fewer in number) on this topic within the statistics and machine learning fields, and within a few application domains. Computational topology grew from within computational geometry and algebraic topology and studies algorithmic questions on topological structures. The focus of topological data analysis and TopoInVis is data: algorithms, methods, and systems for improved and intuitive understanding of data via the application of topological structures. Researchers in computational topology typically have a math or theoretical computer science background, whereas TopoInVis researchers have a computational, computer engineering, or applied background. There is very little communication between the two communities due to their different origins and the fact that there are no common conferences or symposia where both communities participate.

### Goals

The Dagstuhl Seminar 17292 (July 2017) successfully brought together researchers with mixed backgrounds to talk about problems of mutual interest. Following that seminar, the benefits of the inter-community ties were well appreciated, at least by the attendees of the seminar. The goal of the current seminar was to strengthen existing ties, establish new ones, identify challenges that require the two communities to work together, and establish mechanisms for increased communication and transfer of results from one to the other. During the previous Dagstuhl Seminar, we also noticed significant interaction between researchers within the individual communities, with, say, theoretical and applied backgrounds. We wanted to continue to encourage such interaction.

### Topics

We chose four current and emerging topics that will benefit from an inter-community discussion. Topics are common to both communities, with different aspects studied within an individual community.

**Reeb graphs, Reeb spaces, and mappers.** The Reeb graph, its loop-free version called the contour tree, and the higher-dimensional generalization called the Reeb space are topological structures that capture the connectivity of level sets of univariate or multivariate functions. They are independently well studied within the computational topology and TopoInVis communities. Recent developments define stable distance measures between Reeb graphs, inspired by analogous distance measures in persistent homology. Barring a few exceptions, the theoretical results have no practical realizations. On the practical side, effective visual exploration and visual analysis methods based on Reeb graphs and spaces have been developed for a wide variety of domains including combustion studies, climate science, astronomy, and molecular modeling. These applications often utilize only a simplified version of the topological structure. One such simplification, the mapper algorithm, consists of a discretized version of Reeb graphs and has shown immense industrial potential. Very recently, the theoretical aspects of the mapper algorithm and its generalizations have moved into the focus of research. Exchange of ideas and results between the two communities will help advance this progress further.
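
To make the idea of mapper as a discretized Reeb graph concrete, the following is a minimal sketch, not a reference implementation. It assumes a one-dimensional filter function, a uniform cover of its range by overlapping intervals, and single-linkage clustering at a fixed scale `eps`. The names `mapper_graph` and `single_linkage` and all parameters are hypothetical and do not come from any particular library.

```python
import math
from itertools import combinations


def single_linkage(members, points, eps, dist):
    """Connected components of the eps-neighborhood graph on `members`."""
    parent = {j: j for j in members}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(members, 2):
        if dist(points[p], points[q]) <= eps:
            parent[find(p)] = find(q)

    clusters = {}
    for j in members:
        clusters.setdefault(find(j), set()).add(j)
    return [frozenset(c) for c in clusters.values()]


def mapper_graph(points, filter_values, n_intervals, overlap, eps, dist=math.dist):
    """Return (nodes, edges) of a mapper graph (illustrative sketch).

    nodes: list of frozensets of point indices (one per cluster).
    edges: set of index pairs (u, v), u < v, whose clusters share a point.
    """
    lo, hi = min(filter_values), max(filter_values)
    length = (hi - lo) / n_intervals   # length of each base interval
    pad = 0.5 * overlap * length       # widen every interval on both sides
    nodes = []
    for i in range(n_intervals):
        a = lo + i * length - pad
        b = lo + (i + 1) * length + pad
        # Preimage of the i-th cover element under the filter function.
        members = [j for j, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(single_linkage(members, points, eps, dist))
    # Nerve of the clustered cover, restricted to its 1-skeleton.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# Usage: 40 evenly spaced points on the unit circle, filtered by x-coordinate.
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
          for k in range(40)]
nodes, edges = mapper_graph(circle, [p[0] for p in circle],
                            n_intervals=4, overlap=0.5, eps=0.4)
```

With these particular parameter choices the resulting graph forms a single cycle, mirroring the loop in the Reeb graph of the x-coordinate on a circle. The outcome is sensitive to the cover and to `eps`, which is one reason the theoretical analysis of mapper mentioned above is of interest.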

**Topological analysis and visualization of multivariate data.** Multivariate datasets arise in many scientific applications. Consider, for example, combustion or climate simulations where multiple physical measurements (say, temperature and pressure) or concentrations of chemical species are computed simultaneously. We model these variables mathematically as multiple continuous, real-valued functions. We are interested in understanding the relationships between these functions, and more generally, in developing efficient and effective tools for their analysis and visualization. Unlike for real-valued functions, very few tools exist for studying multivariate data topologically. Besides the aforementioned Reeb spaces and mappers, notable examples of these tools are Jacobi sets, Pareto sets, and Joint Contour Nets. Understanding the theoretical properties of these tools and adapting them for analysis and visualization remains a very active research area. In addition, combining these topological tools with multivariate statistical analysis would be of interest. On the other hand, research on multidimensional persistence would help advance multivariate data analysis both mathematically and computationally. We plan to expand our discussion on multidimensional persistent homology to include topics such as identifying meaningful and computable topological invariants, computability and applicability in the multidimensional setting, comparison of multidimensional data, kernel methods for multidimensional persistence, and adapting multidimensional persistence in visualization.

**New opportunities for vector field topology.** Vector field topology for visualization, pioneered by Helman and Hesselink, has inspired much research in topological analysis and visualization of vector fields. A large body of work on time-independent vector fields deals with fixed (critical) points, invariant sets, separatrices, periodic orbits, saddle connectors, and Morse decompositions, as well as vector field simplification that reduces complexity. Research on time-dependent vector fields is concerned with critical point tracking, Finite Time Lyapunov Exponents (FTLE), Lagrangian coherent structures (LCS), streak line topology, and unsteady vector field topology. For this workshop, we ask the following questions: can advancements in computational topology help bring new opportunities for the study of vector field topology? In particular, can they help develop novel, scalable, and mathematically rigorous ways to rethink vector field data? An example is the topological notion of robustness, a cousin of persistence, introduced via the well diagram and well group theory. Robustness has been shown to be very useful in quantifying feature stability for steady and unsteady vector fields.
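
As a small illustration of the fixed (critical) points mentioned above, the sketch below applies the classical first-order classification of a critical point of a 2D vector field from the trace and determinant of its Jacobian. It is a textbook simplification, not the robustness-based analysis discussed in this paragraph; the function name, the label strings, and the tolerance `tol` are assumptions made for this example.

```python
def classify_critical_point(a, b, c, d, tol=1e-12):
    """Classify the origin of the linear field v(x, y) = (a*x + b*y, c*x + d*y).

    (a, b, c, d) are the entries of the Jacobian [[a, b], [c, d]].
    Returned labels are illustrative names, not a standard vocabulary.
    """
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4.0 * det   # discriminant of the characteristic polynomial

    if abs(det) <= tol:
        return "degenerate"            # a zero eigenvalue: first-order test is inconclusive
    if det < 0:
        return "saddle"                # real eigenvalues of opposite sign
    if abs(trace) <= tol:
        return "center"                # purely imaginary eigenvalues
    kind = "spiral" if disc < 0 else "node"   # complex vs. real eigenvalues
    return ("repelling " if trace > 0 else "attracting ") + kind


# Usage on a few hand-picked Jacobians.
examples = {
    "saddle": classify_critical_point(1, 0, 0, -1),
    "rotation": classify_critical_point(0, -1, 1, 0),
    "sink": classify_critical_point(-1, 0, 0, -2),
    "spiral source": classify_critical_point(1, -2, 2, 1),
}
```

Saddles are the points from which separatrices emanate, so a classification of this kind is typically the first step before tracing the topological skeleton of a steady field.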

**Software tools and libraries.** How do we make topological data analysis applicable to large datasets? A natural first step is algorithm and software engineering. This refers to developing the best algorithms for a particular problem and optimizing the implementation of these algorithms. The state of affairs within the communities is quite diverse: while scalable algorithms are available for some problems (e.g., computation of Reeb graphs or persistence diagrams in low dimensions), current developments make significant progress on other fronts, for example the computation of approximate persistence diagrams of Vietoris-Rips complexes. At the other extreme, the theory of multi-dimensional persistence is just beginning to be supported by algorithmic contributions. Besides these efforts, parallelizable and distributed algorithms play an important role in achieving practicality. One further important aspect of software design is interface design, that is, making those implementations available to non-experts. While this final development step is usually rather neglected in theoretical research, there have been efforts in both communities towards generally applicable and easy-to-use software. Software contributors from both communities will profit from exchanging ideas and experiences.
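
For orientation, the simplest case of the computations mentioned above can be sketched in a few lines. The example computes the exact (not approximate) degree-0 persistence of a Vietoris-Rips filtration with a union-find structure, processing edges in order of length as in Kruskal's algorithm. It is quadratic in the number of points and only meant as an illustration. The function name is hypothetical, and the convention that an edge enters the filtration at a scale equal to its length (rather than half of it) is an assumption.

```python
import math
from itertools import combinations


def rips_h0_deaths(points):
    """Finite death times of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0; one component never dies and is
    not reported. Assumes an edge appears at scale equal to its length.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, sorted by the scale at which they appear.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))

    deaths = []
    for scale, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge: one of them dies
            deaths.append(scale)
    return deaths


# Usage: four points on a line with gaps 1, 2, and 4.
deaths = rips_h0_deaths([(0, 0), (1, 0), (3, 0), (7, 0)])
```

For the four collinear points in the usage example, the three finite deaths coincide with the gaps between neighboring points. Higher-degree persistence requires boundary-matrix reduction and is where the engineering effort described above is concentrated.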

### Participants, Schedule, and Organization

The invitees were identified according to the focus topics of the seminar while ensuring diversity in terms of gender, country / region of workplace, and experience. The aim was to bring together a sufficient number of experts interested in each topic and representing the two communities to facilitate an engaging discussion.

We planned for different talk types (longer overviews and shorter contributed research talks) and breakout sessions. We scheduled six overview talks on the first day. These overview talks were aligned with the four topics of the seminar, planned to be accessible to members of both communities, and set the stage for the discussions and shorter research talks on the following days. The speakers Ulrich Bauer (Reeb graphs), Christoph Garth (topology-based methods in visualization), Gunther Weber (topological analysis for exascale), Michael Lesnick (computational aspects of 2-parameter persistence), Claudia Landi (multi-parameter persistence), and Vanessa Robins (discrete Morse theory and image analysis) each gave a gentle introduction to the area followed by a state-of-the-art report and a discussion of open problems.

Participants gave short research talks (16 total) during Tuesday-Friday with a focus on challenges and opportunities. These talks were organized during the morning sessions.

We scheduled breakout sessions on the afternoons of Tuesday and Thursday. On Tuesday, we solicited discussion topics and identified three topics of interest: *multivariate data, reconstruction,* and *tensor field topology*. Participants chose to join a group based on their interest. All groups contained participants from both communities. We formed two discussion groups with inputs from experts on multi-parameter persistence who were part of a different group on Tuesday. The second breakout session was on *Multi-parameter persistence computation*, where they discussed and analyzed a recently proposed algorithm. All groups presented a summary of their discussion and plans during a plenary session at the end of the day.

Many participants joined an organized excursion to Bernkastel-Kues on Wednesday afternoon. On Friday morning, we scheduled a discussion and brainstorming session to close the seminar and to plan for future events.

### Results and Reflection

Participants unanimously agreed that the seminar was successful in enabling cross-fertilization and identifying important, challenging problems that require both communities to work together. The breakout sessions were instrumental in identifying some of the challenges and topics for further collaboration. At least two such challenges (together with motivating applications) were identified, possibly leading to collaborative efforts.

The breakout sessions were planned for the entire afternoon after lunch. The longer duration allowed for in-depth and technical discussions that stimulate further work after the seminar. Based on feedback during informal discussions and the brainstorming session on Friday, we expect that multiple working groups will be formed to write expository and survey articles. Members of the two communities have also shown enthusiasm for participating in each other's workshops and conferences. In conclusion, we believe that the seminar has achieved the goal of bringing together the two communities and charting a path for tackling bigger challenges in the area of topological data analysis.

**Summary text license**

Creative Commons BY 3.0 Unported license

Michael Kerber, Vijay Natarajan, and Bei Wang

## Dagstuhl Seminar Series

- 23192: "Topological Data Analysis and Applications" (2023)
- 17292: "Topology, Computation and Data Analysis" (2017)

## Classification

- Data Structures / Algorithms / Complexity

## Keywords

- Computational topology
- Topological data analysis