https://www.dagstuhl.de/17292

### July 16 – 21 , 2017, Dagstuhl Seminar 17292

# Topology, Computation and Data Analysis

## Organizers

Hamish Carr (University of Leeds, GB)

Michael Kerber (TU Graz, AT)

Bei Wang (University of Utah – Salt Lake City, US)

## For support, please contact

## Documents

Dagstuhl Report, Volume 7, Issue 7

Aims & Scope

List of Participants

Shared Documents

Dagstuhl's Impact: Documents available

Dagstuhl Seminar Wiki

Dagstuhl Seminar Schedule [pdf]

(Use seminar number and access code to log in)

## Summary

The Dagstuhl Seminar titled *Topology, Computation and Data Analysis*
has brought together researchers with mathematical and computational backgrounds in addressing emerging directions within computational topology for data analysis in practice. The seminar has contributed to the convergence between mathematical and computational thinking, in the development of mathematically rigorous theories and data-driven scalable algorithms.

### Context

In the last two decades, considerable effort has been made in a number of research communities into computational applications of topology. Inherently, topology abstracts functions and graphs into simpler forms, and this has an obvious attraction for data analysis. This attraction is redoubled in the era of extreme data, in which humans increasingly rely on tools that extract mathematically well-founded abstractions that the human can examine and reason about. In effect, topology is applied as a form of data compression or reduction: topology is one of the most powerful forms of mathematical compression that we know how to apply to data.

Efforts to apply topology computationally to data, however, have largely been fragmented so far, with work progressing in a number of communities, principally computational topology, topological data analysis, and topological visualization. Of these, computational topology expands from computational geometry and algebraic topology to seek algorithmic approaches to topological problems, while topological data analysis and topological visualization seeks to apply topology to data analysis, of graphs and networks in the first case and of (usually) simulated volumetric data in the second. The research in these communities can roughly be clustered into theory (what are the underlying mathematical concepts), applications (how are they used for data analysis), and computation (how to compute abstractions for real datasets). It is crucial to advances in this area that these three branches go hand-in-hand, and communication between theoretical, applied, and computational researchers are therefore indispensable. On the other hand, there has been surprisingly little communication between the computational topology and topological visualization communities, mostly caused by the fact that each community has its own set of regular venues. As a consequence, the linkages in the two communities have been independent of each other, and results can take years to migrate from one community to the other.

### Vision

Our goal was therefore to soften the aforementioned rather strict separation between computational topology and topological visualization by establishing new inter-community ties. The seminar aimed to bring together cross sections of both communities, including researchers with theoretical, applied, and computational backgrounds. By reducing redundancy and accelerating cross-communication, we expected a significant boost to both areas, perhaps even leading to a singular more dynamic community. As a side effect, we also wanted to provide a communication platform within each community between theory and application.

### Topics

We identified specific research topics reflecting emerging trends in both communities. These topics were chosen to span the spectrum from the theoretical (category theory), to applicable theory (multidimensional persistent homology), and from applied theory (singularity theory and fiber topology) to the computational (scalable topological computation, applications) aspect.

**Category theory: theory and applications.** Category theory has recently gained momentum in computational topology, in particular through sheaves and cosheaves, which are extremely useful as an alternative foundation for level set persistence. Recent work has shown that the data of a Reeb graph can be stored in a category-theoretic object called a cosheaf, and this opens the way to define a metric for Reeb graphs known as the interleaving distance. Sheaves can also be used in deriving theoretical understandings between the Reeb space and its discrete approximations. Research into sheaves and their relationship with computation is, however, in its infancy, and would benefit from pooling the resources of experts in category theory and topological data analysis, to address questions such as how to simplify theories in computational topology, how to reinterpret persistent homology, or how to compare topological structures.

**Multidimensional persistent homology.** The second area of active research, both mathematically and computationally, is the extension of unidimensional persistence to multidimensional persistence.
Mathematically, the lack of a complete discrete invariant for the multidimensional case raises the theoretical question of identifying meaningful topological invariants to compute. Some earlier proposals have been complemented by recent approaches and raise the immediate question of computability and applicability. Besides the invariants themselves, other questions such as the comparison of multidimensional data, or the efficient generation of cell complexes suitable for the multidimensional case are crucial, but hardly studied questions in this context. Computationally, existing algorithms for topological constructs rely on filtrations to encapsulate a sweep order through the data, thus serializing the problem for algorithmic implementation. For multidimensional data, this serialization is hard to achieve, and progress in this area is, therefore, crucial for computational advances in the topological analysis of data.

**Singularity theory and fiber topology in multivariate data analysis.** Singularity theory and fiber topology both seek to extend Morse theory from scalar fields to multivariate data described as functions mapping f: X to R^d. Since multivariate datasets are near-ubiquitous in scientific applications such as oceanography, astrophysics chemistry, meteorology, nuclear engineering and molecular dynamics, advances here are also crucial for topological data analysis and visualization. Methods from computational topology have been developed to support the analysis of scalar field data with widespread applicability. However, very few tools exist for studying multivariate data topologically: the most notable examples of these tools are the Jacobi set, the Reeb space, and its recent computational approximation, the Joint Contour Net. Here, we aim to bring together researchers in singularity theory, fiber topology and topological data analysis to develop new theory and algorithms driving a new generation of analytic tools.

**Scalable computation.** At the opposite pole from theory is the practical question: how do we apply topological analysis to ever-larger data sets? This question spans questions of algorithmic performance to the accuracy of representation: using the metaphor of compression, do we want lossy or lossless compression, how fast can we perform it, and what do we lose in the process? Moreover, the largest data sets are necessarily computed and stored on clusters, and scalability of topological computation therefore also depends on building distributed and parallel algorithms. For example, the standard algorithm for computing persistent homology is cubic in the number of simplices, but can be speeded up in theory and practice, and further improved by parallel computation. However, many challenges remain, including efficient generation, storage and management of simplicial complexes, streaming computation, I/O efficient computation, approximate computation, and non-simplicial complexes. Some of these approaches have already been applied in topological visualization, and cross-fertilization between the two communities is therefore of great interest.

### Participants, Schedule, and Organization

The invitees were chosen according to the topics, bringing together enough expertise for each topic and resulting in a representative subset of both communities. Out of the 37 invited researchers in the first round, 28 accepted our invitation, pointing out the general interest for the seminar topic in both communities.

We decided for a mixed setup with introductory talks, contributed research talks and breakout sessions.

For the first day, we scheduled two overview talks per listed topic, which were delivered by Steve Oudot and Elizabeth Munch (Category theory), Michael Lesnick and Claudia Landi (Multidimensional persistent homology), Osamu Saeki and Julien Tierny (Singularity theory), and Yusu Wang and Valerio Pascucci (Scalable Computation). Further contributed talks by participants took place from Tuesday to Friday morning, resulting in a total of 19 contributed talks.

The afternoons of Tuesday and Thursday were used for breakout sessions. The format was different on the two days. Based on the discussions on Monday, we identified the topics "multivariate topology" and "scalable computation" as topics of general interest. We decided to let every participant discuss both topics, so we organized 4 discussion groups on multivariate topology in the early afternoon, and 3 discussion groups on scalable computation in the later afternoon (plus an alternative group with a different topic). We composed these groups mostly randomly, making sure that members of both communities are roughly balanced in each group. On Thursday afternoon, we let participants propose their topics of interest. 5 groups were formed discussing various aspects raised in contributed talks. On Wednesday and Friday morning, the outcomes of every discussion group were summarized and discussed in a plenary session.

Moreover, the majority of the participants joined an organized excursion to Trier on Wednesday afternoon.

### Results and Reflection

The participants gave the unanimous feedback that the breakout sessions were a full success (and several proposed more time for such discussions in possible upcoming seminars). We first let people from a mixed background to discuss rather vague topics on Tuesday, and asked for specific topics on Thursday. Such an organizational plan led to a stimulating working environment, and helped to avoid idle breakout sessions.

We believe that we have fully achieved the goal of softening the separation between the two communities involved in this seminar. We expect visible evidence of newly formed inter-community ties fostered by the seminar, for instance through joint research projects and/or survey articles summarizing major open problems on the interface of both communities. To the best of our knowledge, 3 working groups are being formed and at least 1 position paper is underway that will combine expertise from both communities to tackle key research questions raised during the seminar.

**Summary text license**

Creative Commons BY 3.0 Unported license

Hamish Carr, Michael Kerber, and Bei Wang

## Dagstuhl Seminar Series

- 23192: "Topological Data Analysis and Applications" (2023)
- 19212: "Topology, Computation and Data Analysis" (2019)

## Classification

- Data Structures / Algorithms / Complexity

## Keywords

- Computational topology
- Topological data analysis