August 13 – 18, 2000, Dagstuhl Seminar 00331

Intelligent Data Analysis


M. Berthold (Berkeley), R. Kruse (Magdeburg), X. Liu (London), H. Szczerbicka (Hannover)

For information about this Dagstuhl Seminar, please contact

Dagstuhl Service Team


Dagstuhl Seminar Report 283


Over the last decade or so, the size of machine-readable data sets has increased dramatically, and the problem of "data explosion" has become apparent. At the same time, recent developments in computing have provided the basic infrastructure for fast access to vast amounts of online data, and many of the advanced computational methods for extracting information from large quantities of data are beginning to mature. These developments have created a new range of problems and challenges for analysts, as well as new opportunities for intelligent systems in data analysis. They have led to the emergence of the field of Intelligent Data Analysis (IDA), which combines diverse disciplines, Artificial Intelligence and Statistics in particular. These two fields often complement each other: many statistical methods, particularly those for large data sets, rely on computation, but brute computing power is no substitute for statistical knowledge.

The goal of this seminar is to bring together experts from the various disciplines to discuss important issues in Intelligent Data Analysis, review current progress in the field, and identify challenging and fruitful areas for further research. The seminar will focus on the following key issues in Intelligent Data Analysis, from both the applied and the theoretical side:

  • Strategies: Data analysis in a problem-solving context is typically an iterative process involving problem formulation, model building, and interpretation of the results. The question of how data analysis can be carried out effectively should lead us to look closely not only at the individual components of the data analysis process, but also at the process as a whole, asking what would constitute a sensible analysis strategy.
  • Integration: In addition to careful thinking at every stage of an analysis and intelligent application of relevant domain expertise regarding both the data and the subject matter, Intelligent Data Analysis requires critical assessment and selection of relevant analysis approaches. This often means a sensible integration of techniques stemming from different disciplines, given that techniques from one field can improve a method from another.
  • Data Quality: Data are now viewed as a key organizational resource, and the use of high-quality data for decision making has received increasing attention. Data can be noisy, incomplete, and inconsistent, and these problems are not always easy to handle. Research on data quality has attracted significant attention from different communities and progress has been made, but further work is urgently needed to develop practical and effective methods for managing different kinds of data quality problems in large databases.
  • Scalability: One of the key issues in large-scale data analysis is "scalability": if a method works well for a task involving a dozen variables, will it still perform well for one with over 100 or 1000 variables? Technical reports on the analysis of "big data" are currently still sketchy. Analysis of big, opportunistic data (data collected for an unrelated purpose) is beset with many statistical pitfalls. We need to accumulate much more practical experience in analyzing large, complex real-world data sets in order to obtain a deep understanding of the IDA process.

Due to the interdisciplinary nature of the proposed audience, the real challenge of this meeting will be to initiate interactions across different disciplines. To provide the necessary background, we plan to have one introductory tutorial-style presentation each day, aiming to familiarize researchers with concepts from the various fields. In addition, we will ask researchers from industry to describe an application from the "real world" each afternoon, which will demonstrate the applicability of methods across boundaries. We hope this will also raise interest in grounding academic research in practical applications. We also ask all participants to prepare an informal presentation on an issue of their own interest, such as a report on a particular application or a brief discussion of an interesting algorithm or methodology. These presentations will be scheduled spontaneously during the late morning or afternoon sessions as the need arises or when participants feel that a certain point needs special attention.


All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. Together with the seminar's collector, the organizers compile a report that gathers the authors' contributions and adds a summary.


