February 19 – 24, 2012, Dagstuhl Seminar 12081
Information Visualization, Visual Data Mining and Machine Learning
Daniel A. Keim (Universität Konstanz, DE)
Fabrice Rossi (University of Paris I, FR)
Thomas Seidl (RWTH Aachen, DE)
Michel Verleysen (University of Louvain, BE)
Stefan Wrobel (Fraunhofer IAIS, St. Augustin & University of Bonn, DE)
Information visualization and visual data mining leverage the human visual system to provide insight into and understanding of unorganized data. Visualizing data in a way appropriate to the user's needs proves essential in a number of situations: gaining insight into data before a more quantitative analysis, presenting data through well-chosen tables, graphs, or other structured representations, relying on human cognitive skills to convey extensive information compactly, etc.
Machine learning enables computers to automatically discover complex patterns in data and, when examples of such patterns are available, to learn from those examples how to recognize occurrences of the patterns in new data. Machine learning has proven quite successful in day-to-day tasks such as spam filtering and optical character recognition.
Both research fields share a focus on data and information, and it might seem at first that the main difference between the two is the predominance of visual representations of the data in information visualization compared to their relatively low presence in machine learning. However, it should be noted that visual representations are used quite systematically in machine learning, for instance to summarize predictive performance, i.e., whether a given system is performing well in detecting some pattern. This can be traced back, for instance, to a long tradition of statistical graphics. Dimensionality reduction is also a major topic in machine learning: the aim here is to describe data as accurately as possible with a small number of variables rather than with the original, possibly numerous, ones. Principal component analysis is the simplest and best-known example of such a method. In the extreme case where one uses only two or three variables, dimensionality reduction is a form of information visualization, as the new variables can be used to display the original data directly.
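As a concrete illustration of this contact point, the following is a minimal sketch (not from the seminar itself) of principal component analysis used for visualization: high-dimensional data is projected onto its first two principal components, yielding coordinates that can be plotted directly.

```python
import numpy as np

def pca_2d(X):
    """Project data onto its first two principal components."""
    Xc = X - X.mean(axis=0)  # center each variable
    # SVD of the centered data matrix; rows of Vt are the principal directions,
    # ordered by decreasing singular value (i.e., decreasing explained variance)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T     # coordinates of each point in the 2-D subspace

# Example: 100 points in 5 dimensions, reduced to 2 for display
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = pca_2d(X)
print(Y.shape)  # (100, 2)
```

The two resulting columns are exactly the kind of "new variables" mentioned above: a scatter plot of `Y` is both a dimensionality-reduction result and an information-visualization artifact.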
Even if this could be seen as an oversimplification of reality, one could consider that machine learning (ML) tends to provide scalability through automated methods based on the optimization of some ad hoc quality measure, while information visualization (IV) tends to rely on the user to direct the summarizing process through adapted interactive techniques. Yet the two fields remain quite isolated, despite some well-known contact points such as the Self-Organizing Map and Multidimensional Scaling.
The main difference between the two fields is the role of the user in data exploration and modeling. The ultimate goal of machine learning is, in a sense, to remove the user from the loop: everything should be fully automated and carried out by a computer. While the user may still play a role, e.g., by choosing the data description or the type of algorithm to use, his or her influence should be kept to a strict minimum. Information visualization puts forward quite the opposite point of view: visual representations are designed to be leveraged by a human to extract knowledge from the data. Patterns are discovered by the user, models are adjusted to the data under user steering, etc.
The seminar was organized in this context with the specific goal of bringing together researchers from both communities in order to tighten the loose links between them.
It became clear that a large effort is still needed at the algorithmic and software levels. First, fast machine learning techniques are needed that can be embedded in interactive visualization systems. Second, there is a need for a standard software environment that can be used in both communities. The unavailability of such a system hurts research to some extent, as some actively used environments in one field do not include even basic facilities from the other. One typical example is the R statistical environment, with which a large part of machine learning research is conducted and whose interactive visualization capabilities are limited, in particular in comparison to its state-of-the-art static visualization possibilities. One possible solution foreseen at the seminar was the development of a dynamic data-sharing standard that could be implemented in several software environments, allowing fast communication between them and facilitating software reuse.
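No such standard existed at the time; the following is only a hypothetical sketch of the underlying idea, using JSON as an assumed interchange format: an analysis result is serialized in one environment so that another tool (e.g. an R-based or interactive-visualization system) could reload and display it.

```python
import json

# Hypothetical interchange record: a 2-D embedding plus point labels,
# serialized so that another software environment could reload it.
embedding = {
    "points": [[0.1, 0.2], [0.3, -0.4], [-0.5, 0.0]],
    "labels": ["a", "b", "c"],
}

payload = json.dumps(embedding)   # could be written to disk or sent over a socket
restored = json.loads(payload)    # the receiving tool parses it back
print(restored["labels"])         # ['a', 'b', 'c']
```

The point of a shared standard would be precisely that both sides agree on such a schema, so that a fast ML back end and an interactive visualization front end can exchange data without bespoke glue code.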
Judging by the liveliness of the discussions and the number of joint research projects proposed at the end of the seminar, this meeting between the machine learning and information visualization communities was much needed. The flexible format of the Dagstuhl seminars is perfectly adapted to this type of meeting, and the only frustration perceivable at the end of the week was that it had indeed reached its end. It was clear that researchers from the two communities were starting to understand each other and were eager to share more thoughts and actually start working on joint projects. This calls for further seminars.
More information about the Dagstuhl seminar can be found at http://www.dagstuhl.de/12081.
- Information Visualization
- Machine Learning
- Computer Graphics
- Information Retrieval
- Soft Computing
- Information visualization
- Machine learning
- Nonlinear dimensionality reduction
- Exploratory data analysis