https://www.dagstuhl.de/00331

August 13–18, 2000, Dagstuhl Seminar 00331

Intelligent Data Analysis

Organizers

M. Berthold (Berkeley), R. Kruse (Magdeburg), X. Liu (London), H. Szczerbicka (Hannover)

Documents

Dagstuhl-Seminar-Report 283

Motivation

For the last decade or so, the size of machine-readable data sets has increased dramatically, and the problem of "data explosion" has become apparent. At the same time, recent developments in computing have provided the basic infrastructure for fast access to vast amounts of online data, and many of the advanced computational methods for extracting information from large quantities of data are beginning to mature. These developments have created a new range of problems and challenges for analysts, as well as new opportunities for intelligent systems in data analysis. They have also led to the emergence of the field of Intelligent Data Analysis (IDA), which combines diverse disciplines, in particular Artificial Intelligence and Statistics. These fields often complement each other: many statistical methods, particularly those for large data sets, rely on computation, but brute computing power is no substitute for statistical knowledge.

The goal of this seminar is to bring together experts from various disciplines to discuss important issues in Intelligent Data Analysis, review current progress in the field, and identify challenging and fruitful areas for further research. The seminar will focus on several key issues in intelligent data analysis that are directly relevant to the aspects above, from both the applied and the theoretical side:

  • Strategies: Data analysis in a problem-solving context is typically an iterative process involving problem formulation, model building, and interpretation of the results. Asking how data analysis can be carried out effectively requires a close look not only at the individual components of the data analysis process but also at the process as a whole, and at what constitutes a sensible analysis strategy.
  • Integration: In addition to careful thinking at every stage of the analysis process and the intelligent application of relevant domain expertise regarding both the data and the subject matter, Intelligent Data Analysis requires the critical assessment and selection of suitable analysis approaches. This often means a sensible integration of techniques stemming from different disciplines, since a technique from one field can often improve a method from another.
  • Data Quality: Data are now viewed as a key organizational resource, and the use of high-quality data for decision making has received increasing attention. Data can be noisy, incomplete, and inconsistent, and these problems are not always easy to handle. Data quality has attracted significant attention from several research communities and progress has been made, but further work is urgently needed to develop practical and effective methods for managing the different kinds of data quality problems found in large databases.
  • Scalability: One of the key issues in large-scale data analysis is "scalability": for example, if a method works well for a task involving a dozen variables, will it still perform well for one with 100 or 1000 variables? Reports on the analysis of "big data" are currently still sketchy. The analysis of big, opportunistic data (data collected for an unrelated purpose) is beset with many statistical pitfalls. We need to accumulate much more practical experience in analyzing large, complex real-world data sets in order to gain a deep understanding of the IDA process.

Due to the interdisciplinary nature of the intended audience, the real challenge of this meeting will be to initiate interactions across different disciplines. To provide the necessary background, we plan to have one introductory, tutorial-style presentation each day, aimed at familiarizing researchers with concepts from the various fields. In addition, we will ask researchers from industry to describe a "real world" application each afternoon, demonstrating the applicability of methods across disciplinary boundaries. We hope this will also raise interest in grounding academic research in practical applications. We also ask all participants to prepare an informal presentation on issues of interest to them, such as a report on a particular application or a brief discussion of an interesting algorithm or methodology. These presentations will be scheduled spontaneously in the late-morning or afternoon sessions, as the need arises or when participants feel that a particular point needs special attention.

Documentation

Each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented in the series Dagstuhl Reports. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.


Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication results from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.