https://www.dagstuhl.de/16382
September 18 – 23, 2016, Dagstuhl Seminar 16382
Foundations of Unsupervised Learning
Organizers
Maria-Florina Balcan (Carnegie Mellon University – Pittsburgh, US)
Shai Ben-David (University of Waterloo, CA)
Ruth Urner (MPI für Intelligente Systeme – Tübingen, DE)
Ulrike von Luxburg (Universität Tübingen, DE)
Documents
Dagstuhl Report, Volume 6, Issue 9
Summary
The success of machine learning methods for prediction depends crucially on data preprocessing, such as building a suitable feature representation. With the recent explosion of data availability, there is a growing tendency to "let the data speak for itself". Thus, unsupervised learning is often employed as a first step in data analysis to build a good feature representation, but also, more generally, to detect patterns and regularities independently of any specific prediction task. A wide range of tasks is frequently performed for these purposes, such as representation learning, feature extraction, outlier detection, dimensionality reduction, manifold learning, clustering, and latent variable models.
The outcome of such an unsupervised learning step has far-reaching effects. The quality of a feature representation affects the quality of a predictor learned from that representation; a learned model of the data-generating process may lead to conclusions about causal relations; and a data mining method applied to a database of people may identify certain groups of individuals as "suspects" (for example, of being prone to developing a specific disease or of being likely to commit certain crimes).
However, in contrast to the well-developed theory of supervised learning, systematic analysis of unsupervised learning tasks is currently scarce, and our understanding of the subject is rather meager. It is therefore more than timely to put effort into developing solid foundations for unsupervised learning methods, and it is important to understand and be able to analyze the validity of the conclusions drawn from them. The goal of this Dagstuhl Seminar was to foster the development of a solid and useful theoretical foundation for unsupervised machine learning tasks.
The seminar hosted academic researchers from the fields of theoretical computer science and statistics as well as some researchers from industry. Bringing together experts from a variety of backgrounds highlighted the many facets of unsupervised learning. The seminar included a number of technical presentations and discussions about the state of the art of research on statistical and computational analysis of unsupervised learning tasks.
We held lively discussions concerning the development of objective criteria for the evaluation of unsupervised learning tasks, such as clustering. These discussions converged on a consensus that such universal criteria cannot exist and that there is a need to incorporate specific domain expertise to develop different objectives for different intended uses of the clusterings. Consequently, there was a debate concerning ways in which theoretical research could build useful tools for practitioners to assist them in choosing suitable methods for their tasks. One promising direction for progress towards better alignment of algorithmic objectives with application needs is the development of paradigms for interactive algorithms for such unsupervised learning tasks, that is, learning algorithms that incorporate adaptive "queries" to a domain expert. The seminar included presentations and discussions of various frameworks for the development of such active algorithms as well as tools for analysis of their benefits.
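As a rough illustration of what adaptive "queries" to a domain expert can look like, the following minimal sketch groups items by asking pairwise same-cluster questions. It is not a method presented at the seminar: the function `cluster_with_queries`, the simulated `expert`, and the `hidden_labels` data are all hypothetical, chosen only to make the idea concrete.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def cluster_with_queries(
    items: Sequence[str],
    same_cluster: Callable[[str, str], bool],
) -> Tuple[List[List[str]], int]:
    """Group items by asking an expert pairwise same-cluster questions.

    Each item is compared against one representative per existing cluster,
    so which questions get asked depends on earlier answers (adaptive).
    Returns the clusters and the number of queries used.
    """
    clusters: List[List[str]] = []
    queries = 0
    for item in items:
        for cluster in clusters:
            queries += 1
            if same_cluster(cluster[0], item):
                cluster.append(item)
                break
        else:
            # No existing cluster matched: the item starts a new one.
            clusters.append([item])
    return clusters, queries


# Hypothetical expert: a hidden labelling stands in for domain knowledge.
hidden_labels: Dict[str, int] = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}


def expert(x: str, y: str) -> bool:
    return hidden_labels[x] == hidden_labels[y]


clusters, n_queries = cluster_with_queries(list(hidden_labels), expert)
print(clusters, n_queries)
```

In this toy run the expert answers 6 questions, compared with the 10 that would be needed to ask about every pair of the five items.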
We believe the seminar was a significant step towards further collaborations between different research groups with related but different views on the topic. A very active interchange of ideas took place, and participants expressed their satisfaction at having gained new insights into directions of research relevant to their own. As a group, we developed a higher-level perspective on the important challenges that research on unsupervised learning is currently facing.


Classification
- Artificial Intelligence / Robotics
- Data Structures / Algorithms / Complexity
Keywords
- Machine learning
- Theory of computing
- Unsupervised learning
- Representation learning
- Clustering