http://www.dagstuhl.de/15101

March 1 – 6, 2015, Dagstuhl Seminar 15101

Bridging Information Visualization with Machine Learning

Organizers

Daniel A. Keim (Universität Konstanz, DE)
Tamara Munzner (University of British Columbia – Vancouver, CA)
Fabrice Rossi (University of Paris I, FR)
Michel Verleysen (University of Louvain, BE)

For information about this Dagstuhl Seminar, please contact

Dagstuhl Service Team

Documents

Dagstuhl Report, Volume 5, Issue 3

Summary

Motivations and context of the seminar

Following the success of Dagstuhl seminar 12081 "Information Visualization, Visual Data Mining and Machine Learning" [1, 2], which gave participants from the information visualization (IV) and machine learning (ML) communities common ground for understanding each other, this Dagstuhl seminar aimed to bring the two communities together once again.

Information visualization and visual data mining leverage the human visual system to provide insight into and understanding of unorganized data. Visualizing data in a way that is appropriate for the user's needs proves essential in a number of situations: gaining insight into data before further, more quantitative analysis (e.g., for expert selection of the number of clusters in a data set), presenting data to a user through well-chosen tables, graphs, or other structured representations, relying on the cognitive skills of humans to show them extensive information in a compact way, etc.

The scalability of visualization methods is an issue: human vision is intrinsically limited to between two and three dimensions, and the human preattentive system cannot handle more than a few combined features. In addition, the computational burden of many visualization methods is too large for real-time interactive use with large datasets. To address these scalability issues and to enable visual data mining of massive sets of high-dimensional data (so-called "big data"), simplification methods are needed that select and/or summarize important dimensions and/or objects.

Traditionally, two scientific communities have developed tools to address these problems: the machine learning (ML) and information visualization (IV) communities. On the one hand, ML provides a collection of automated data summarization and compression solutions. Clustering algorithms summarize a set of objects with a smaller set of prototypes, while projection algorithms reduce the dimensionality of objects described by high-dimensional vectors. On the other hand, the IV community has developed user-centric and interactive methods to handle the human vision scalability issue.
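
The two kinds of ML simplification mentioned above can be illustrated with a minimal sketch. It is not taken from the seminar material: the choice of k-means for clustering and of a one-dimensional principal component projection, as well as the toy data and all function names, are assumptions made purely for illustration.

```python
# Illustrative sketch only: k-means and PCA are assumed stand-ins for
# "clustering" and "projection"; names and data are hypothetical.

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: summarize `points` with `k` prototypes (centroids)."""
    n = len(points)
    # Naive deterministic initialization: k evenly spaced input points.
    prototypes = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest prototype (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, prototypes[j])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old prototype if a cluster becomes empty
                prototypes[j] = [sum(col) / len(members) for col in zip(*members)]
    return prototypes


def first_principal_axis(points, iterations=100):
    """Power iteration on the covariance matrix; returns (mean, unit axis)."""
    n, d = len(points), len(points[0])
    mean = [sum(col) / n for col in zip(*points)]
    centered = [[x - m for x, m in zip(p, mean)] for p in points]
    cov = [
        [sum(row[i] * row[j] for row in centered) / n for j in range(d)]
        for i in range(d)
    ]
    axis = [1.0] * d
    for _ in range(iterations):
        axis = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = sum(a * a for a in axis) ** 0.5
        axis = [a / norm for a in axis]
    return mean, axis


def project(points, mean, axis):
    """Reduce each point to a single coordinate along `axis`."""
    return [sum((x - m) * a for x, m, a in zip(p, mean, axis)) for p in points]


# Toy data: two well-separated groups of three-dimensional points.
data = [
    (0.0, 0.1, 0.0), (0.2, 0.0, 0.1), (0.1, 0.2, 0.0),
    (5.0, 5.1, 0.0), (5.2, 5.0, 0.1), (5.1, 5.2, 0.0),
]

prototypes = kmeans(data, k=2)           # six objects summarized by two prototypes
mean, axis = first_principal_axis(data)
coords = project(data, mean, axis)       # three dimensions reduced to one
print(prototypes)
print(coords)
```

Here the six objects are replaced by two prototypes (fewer objects), and each three-dimensional vector is replaced by one coordinate (fewer dimensions); either result is small enough to display directly.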

Building upon seminar 12081, the present seminar aimed at understanding key challenges such as interactivity, quality assessment, platforms and software, and others.

Organization

The seminar was organized to maximize discussion time and to avoid a conference-like program with classical scheduled talks. After a lightning introduction by each participant, the seminar began with two tutorial talks: one on machine learning (focused on visualization-related topics), followed by one on information visualization. Indeed, while some attendees of the present seminar had participated in seminar 12081, most had not. The tutorials helped establish a common vocabulary and gave an idea of ongoing research in ML and IV.

After those talks, the seminar was organized in parallel working groups with periodic plenary meetings and discussions, as described below.

Topics and groups

After the two tutorials, the participants spent some time identifying topics they would like to discuss during the seminar. Twenty-one topics emerged:

  1. Definition and analysis of quantitative evaluation measures for dimensionality reduction (DR) methods (and for other methods);
  2. In the context of dimensionality reduction: visualization of quality measures and of the sensitivity of some results to user inputs;
  3. What IV tasks (in addition to DR related tasks) could benefit from ML? What ML tasks could benefit from IV?
  4. Reproducible/stable methods and the link of those aspects to sensitivity and consensus results;
  5. Understanding the role of the user in mixed systems (which include both a ML and an IV component);
  6. Interactive steerable ML methods (relation to intermediate results);
  7. Methods from both fields for dynamic multivariate networks;
  8. ML methods that can scale up to IV demands (especially in terms of interactivity);
  9. Interpretable/transparent decisions;
  10. Uncertainty;
  11. Matching vocabularies/taxonomies between ML and IV;
  12. Limits to ML;
  13. Causality;
  14. User guidance: precalculating results, understanding user intentions;
  15. Mixing user-driven and data-driven evaluation (leveraging a ROC curve, for instance);
  16. Privacy;
  17. Applications and use cases;
  18. Prior knowledge integration;
  19. Formalizing task definition;
  20. Usability;
  21. Larger scope ML.

After some clustering and voting, those topics were merged into six popular broader subjects, which were discussed in working groups throughout the rest of the week:

  1. Dynamic networks
  2. Quality
  3. Emerging tasks
  4. Role of the user
  5. Reproducibility and interpretability
  6. New techniques for Big Data

The rest of the seminar was organized as a series of working group meetings interleaved with plenary meetings, which allowed the working groups to report on their joint work, to steer the global process, etc.

Conclusion

As reported in the rest of this document, the working groups were very productive, as was the whole week. In particular, the participants identified a number of issues that mostly revolve around the complex systems being built for visual analytics. Those systems need to be scalable, and they need to support rich interaction, steering, objective evaluation, etc. The results must be stable and interpretable, but the system must also be able to incorporate uncertainty into the process (in addition to prior knowledge). Position papers and roadmaps have been written as a concrete output of the discussions on those complex visual analytics systems.

The productivity of the week confirmed that researchers from information visualization and from machine learning share some common medium- to long-term research goals. It also became clear that there is still a strong need for better understanding between the two communities. It was therefore decided to work on joint tutorial proposals for upcoming IV and ML conferences. To facilitate exchange between the communities outside of the perfect conditions provided by Dagstuhl, the blog "Visualization meets Machine Learning" was initiated.

Finally, it should be noted that the seminar was greatly appreciated by the participants, as reported by the survey. Because of the practical organization of the seminar, participants did not know each other's fields very well, and it might have been better to allow slightly more time for personal introductions. Some open research questions from each field that seem interesting to the other field could also have been presented. But the positive consequences of avoiding a conference-like schedule were greatly appreciated. The participants were pleased by the ample time for discussions, the balance between the two communities, and the quality of the discussions. Those aspects are quite unique to Dagstuhl.

References

  1. Daniel A. Keim, Fabrice Rossi, Thomas Seidl, Michel Verleysen, and Stefan Wrobel. Dagstuhl Manifesto: Information Visualization, Visual Data Mining and Machine Learning (Dagstuhl Seminar 12081). Informatik-Spektrum, 35:58–83, 8 2012.
  2. Daniel A. Keim, Fabrice Rossi, Thomas Seidl, Michel Verleysen, and Stefan Wrobel (editors). Information Visualization, Visual Data Mining and Machine Learning (Dagstuhl Seminar 12081). Dagstuhl Reports, 2(2):58–83, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2012. http://dx.doi.org/10.4230/DagRep.2.2.58
License
  Creative Commons BY 3.0 Unported license
  Daniel A. Keim, Tamara Munzner, Fabrice Rossi, and Michel Verleysen

Classification

  • Computer Graphics / Computer Vision
  • Data Bases / Information Retrieval
  • Soft Computing / Evolutionary Algorithms

Keywords

  • Information visualization
  • Machine learning
