Dagstuhl Seminar 24122

Low-Dimensional Embeddings of High-Dimensional Data: Algorithms and Applications

( Mar 17 – Mar 22, 2024 )

Permalink
Please use the following short URL to reference this page: https://www.dagstuhl.de/24122

Motivation

Low-dimensional embeddings are widely used for unsupervised data exploration across many scientific fields, from single-cell biology to artificial intelligence. These fields routinely deal with high-dimensional characterization of millions of objects, and the data often contain rich structure with hierarchically organised clusters, progressions, and manifolds. Researchers increasingly use 2D embeddings (t-SNE, UMAP, autoencoders, etc.) to get an intuitive understanding of their data and to generate scientific hypotheses or follow-up analysis plans. With so many scientific insights hinging on these visualisations, it becomes urgent to examine the current state of these techniques mathematically and algorithmically.

This Dagstuhl Seminar intends to bring together machine learning researchers working on algorithm development, mathematicians interested in provable guarantees, and practitioners applying embedding methods in biology, chemistry, the humanities, the social sciences, and other fields. Our aim is for these leading experts to (i) survey the state of the art; (ii) identify critical shortcomings of existing methods; (iii) brainstorm ideas for the next generation of methods; and (iv) forge collaborations to help make these a reality.

This seminar should lay the groundwork for future methods that rise to the challenge of visualising high-dimensional data sets while emphasising their idiosyncrasies and scaling to tens, hundreds, and potentially thousands of millions of data points.

Seminar topics:

  • Manifold assumption and manifold learning.
  • Spectral methods, diffusion, Laplacian methods, etc.
  • Relationships and trade-offs between different embedding algorithms.
  • Limitations and shortcomings of low-dimensional embeddings. Danger of over-interpretation.
  • Local, global, and hierarchical structure preservation.
  • Non-Euclidean embeddings, such as hyperbolic or spherical.
  • Low- (~2) vs. mid-range- (~256) dimensional embeddings: unique challenges.
  • Low-dimensional embeddings in actual practice: embeddings of cells, molecules, texts, graph nodes, images, etc. Data modalities and their challenges.
  • Scaling up for larger datasets, runtime considerations.
  • Self-supervised embeddings via contrastive learning.
  • Theoretical guarantees and mathematical properties of unsupervised and self-supervised embeddings.
  • Topological data analysis in the embedding space; topological constraints on embeddings.

Copyright Fred Hamprecht, Dmitry Kobak, Smita Krishnaswamy, and Gal Mishne

Classification
  • Data Structures and Algorithms
  • Machine Learning

Keywords
  • dimensionality reduction
  • visualization
  • high-dimensional