Dagstuhl Seminar 00331

Intelligent Data Analysis

( Aug 13 – Aug 18, 2000 )

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/00331

Organizers
  • H. Szczerbicka (Hannover)
  • M. Berthold (Berkeley)
  • R. Kruse (Magdeburg)
  • X. Liu (London)




Motivation

Over the last decade or so, the size of machine-readable data sets has increased dramatically, and the problem of "data explosion" has become apparent. At the same time, recent developments in computing have provided the basic infrastructure for fast access to vast amounts of online data, and many of the advanced computational methods for extracting information from large quantities of data are beginning to mature. These developments have created a new range of problems and challenges for analysts, as well as new opportunities for intelligent systems in data analysis, and have led to the emergence of the field of Intelligent Data Analysis (IDA), which combines diverse disciplines, Artificial Intelligence and Statistics in particular. These fields often complement each other: many statistical methods, particularly those for large data sets, rely on computation, but brute computing power is no substitute for statistical knowledge.

The goal of this seminar is to bring together experts from the various disciplines to discuss important issues in Intelligent Data Analysis, review current progress in the field, and identify challenging and fruitful areas for further research. The seminar will focus on the following key issues in Intelligent Data Analysis, from both the applied and the theoretical side:

  • Strategies: Data analysis in a problem-solving context is typically an iterative process involving problem formulation, model building, and interpretation of the results. The question of how data analysis may be carried out effectively should lead us to look closely not only at the individual components of the data analysis process, but also at the process as a whole, asking what would constitute a sensible analysis strategy.
  • Integration: In addition to careful thinking at every stage of an analysis process and intelligent application of relevant domain expertise regarding both data and subject matter, Intelligent Data Analysis requires critical assessment and selection of relevant analysis approaches. This often means a sensible integration of techniques stemming from different disciplines, given that a technique from one field can improve a method from another.
  • Data Quality: Data are now viewed as a key organizational resource and the use of high-quality data for decision making has received increasing attention. Data can be noisy, incomplete and inconsistent, and it is not always easy to handle these problems. Research on data quality has attracted a significant amount of attention from different communities and progress has been made, but further work is urgently needed to come up with practical and effective methods for managing different kinds of data quality problems in large databases.
  • Scalability: One of the key issues in large-scale data analysis is "scalability": if a method works well for a task involving a dozen variables, will it still perform well for one with over 100 or 1000 variables? Currently, technical reports on analyzing "big data" are still sketchy. Analysis of big, opportunistic data (data collected for an unrelated purpose) is beset with many statistical pitfalls. We need to accumulate much more practical experience in analyzing large, complex real-world data sets in order to obtain a deep understanding of the IDA process.

Due to the interdisciplinary nature of the proposed audience, the real challenge of this meeting will be to initiate interactions across different disciplines. To provide the necessary background, we plan to have one introductory tutorial-style presentation each day, aiming to familiarize researchers with concepts from the various fields. In addition, we will ask researchers from industry to describe an application from the "real world" each afternoon, which will demonstrate the applicability of methods across boundaries. We hope this will also raise interest in grounding academic research in practical applications. We also ask all participants to prepare an informal presentation on an issue of their own interest, such as a report on a particular application or a brief discussion of an interesting algorithm or methodology. These presentations will be scheduled spontaneously during the late morning or afternoon sessions as the need arises or when participants feel that a certain point needs special attention.


Participants
  • H. Szczerbicka (Hannover)
  • M. Berthold (Berkeley)
  • R. Kruse (Magdeburg)
  • X. Liu (London)