https://www.dagstuhl.de/02181

April 29 – May 3, 2002, Dagstuhl Seminar 02181

Information Integration

Organizers

V. Krishnamurthy (Oracle Redwood Shores CA, USA), F. Leymann (IBM & Univ. Stuttgart, D), N. Mattos (IBM San Jose CA, USA), B. Mitschang (Univ. Stuttgart, D)

Information Integration subsumes all technologies needed to provide uniform manipulation of information scattered over many data stores while presenting a single system image. The data stores to be integrated are inherently heterogeneous, owned by different organizations, and distributed all over the world. Data can be structured (e.g. relational data), semi-structured (e.g. XML documents or hyper-linked HTML pages), or unstructured (e.g. opaque flat files, multi-media streams). Access to the data can be based on standardized interfaces (e.g. SQL) or on proprietary APIs (e.g. RYO, i.e. roll-your-own, solutions).

Information integration is expected to become a key technology in many application areas, such as product data management, business process management, enterprise application integration, life science (including drug design and health care management), and entertainment (e.g. media on demand), to name but a few. Software vendors are beginning to deliver first products, each currently focusing on a particular application area. Research in Information Integration is currently carried out in several different disciplines.

The major goal of the seminar is to bring together representatives from the different communities (research, software vendors, and users) for a first stocktaking: to reach a joint in-depth understanding of the issues, to identify and prioritize the main research items, to identify standardization needs, and to discuss demanding questions and open problems in detail. The areas to discuss include:

  • How to get access to the various data stores?
    Different technologies like SQL/MED wrappers, J2EE connectors, EAI adapters, and Web Services can be used for this purpose. When should each of these technologies be used? Can they be unified?
  • What are possible system structures?
    Which roles will database systems, application servers, workflow systems, messaging systems, portal servers, etc. play? How do they relate and cooperate?
  • Does "Web Database Technology" suffice?
    Can XML be used as the language for describing the integrated information base? How to capture the "navigational access" based on hyper-linked HTML pages that is performed today in many application areas? How to combine search and query functionality? How is XML stored: sliced and diced, as a whole document in a file in the file system, or as a whole document combined with other documents in the file system? How do you index these effectively? How do you combine SQL and an XML-based query over the same data (i.e., an XML query against SQL data and SQL against XML)? Is a pure XML database the way to go, or will an extended relational engine be the right solution?
  • How is information described?
    As different data stores are combined in a dynamic manner, the quality of the information available in a data store becomes key. Which information qualities are needed? How are they described? How can qualities be compared, assessed, measured, …? Which metadata is relevant (schema, ontologies, …)?
  • Which federated database technologies can be used?
    What is a federated schema if structured and unstructured data are brought together? Which schema integration techniques and which federated query and search technologies are applicable?
  • Which transaction model is appropriate?
    Some of the underlying data stores support classical transactions, others don't. Collective manipulation of data stores demands transactional guarantees. Which guarantees are needed? Data stores are owned by different legal entities and are often accessed via the Internet. Which concurrency models and recovery models are applicable?
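
To make the access and query questions above more concrete, the following is a minimal sketch in Python of what "uniform access" over a structured and a semi-structured store might look like. It is an illustration only, not a technology proposed by the seminar: the table `parts`, the XML document, and the helper functions `relational_rows`, `xml_rows`, and `federated_join` are all hypothetical names invented for this example. Each wrapper exposes its store as a stream of plain dictionaries, and a simple hash join combines them.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: prices kept in an XML document.
XML_DOC = """
<prices>
  <price part="p1" amount="9.50"/>
  <price part="p2" amount="12.00"/>
</prices>
"""


def relational_rows(conn):
    """Wrapper 1: expose a SQL table as a stream of plain dicts."""
    cur = conn.execute("SELECT id, name FROM parts ORDER BY id")
    for part_id, name in cur:
        yield {"part": part_id, "name": name}


def xml_rows(xml_text):
    """Wrapper 2: expose XML elements as the same kind of dicts."""
    root = ET.fromstring(xml_text.strip())
    for elem in root.findall("price"):
        yield {"part": elem.get("part"), "amount": float(elem.get("amount"))}


def federated_join(left, right, key):
    """Join two wrapped sources on a shared key (simple hash join)."""
    index = {row[key]: row for row in right}
    for row in left:
        match = index.get(row[key])
        if match is not None:
            yield {**row, **match}


def demo():
    # Hypothetical relational store, held in memory for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO parts VALUES (?, ?)",
        [("p1", "bolt"), ("p2", "nut"), ("p3", "washer")],
    )
    result = list(
        federated_join(relational_rows(conn), xml_rows(XML_DOC), "part")
    )
    conn.close()
    return result


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The sketch deliberately sidesteps the hard parts raised in the list: it assumes both stores already agree on the key `part`, it offers no transactional guarantees across the two sources, and it says nothing about information quality or schema integration.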

    With this seminar we would like to bring together, for the first time, people from different areas who all work on the broad topic of 'Information Integration'. The topic ranges from application-oriented areas like geographic information systems or product management systems to generic areas in computer science like repository technology, database federation, or data exchange. We expect the discussions in this seminar to provide a first step towards finding the solutions needed for the various forms of 'Information Integration'. The participant list includes well-known researchers and practitioners as well as young scientists from both industry and academia. It is our hope that the seminar will improve the understanding of this field and stimulate new collaborations between the different communities.

    Documentation

    In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

     

    Publications

    Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

    Dagstuhl's Impact

    Please inform us when a publication results from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.