April 29 – May 3, 2002, Dagstuhl Seminar 02181
V. Krishnamurthy (Oracle Redwood Shores CA, USA), F. Leymann (IBM & Univ. Stuttgart, D), N. Mattos (IBM San Jose CA, USA), B. Mitschang (Univ. Stuttgart, D)
Information Integration subsumes all technologies needed to provide uniform manipulation of information scattered over many data stores while supporting a single system image. The data stores to be integrated are inherently heterogeneous, owned by different organizations, and distributed across the world. Data can be structured (e.g. relational data), semi-structured (e.g. XML documents or hyperlinked HTML pages), or unstructured (e.g. opaque flat files, multimedia streams). Access to the data can be through standardized interfaces (e.g. SQL) or through proprietary APIs (e.g. roll-your-own (RYO) solutions).
Information integration is expected to become a key technology in many application areas, such as product data management, business process management, enterprise application integration, the life sciences (including drug design and health care management), and entertainment (e.g. media on demand), to name but a few. Software vendors are beginning to deliver their first products, each currently focusing on a particular application area. Research in Information Integration is currently carried out in several different disciplines.
The major goal of the seminar is to bring together representatives of the different communities (from research as well as from software vendors and users) for a first stocktaking: to reach a joint in-depth understanding of the issues, to identify and prioritize the main research items, to identify standardization needs, and to discuss demanding questions and open problems in detail. The areas to discuss include:
- How to get access to the various data stores?
- Different technologies, such as SQL/MED wrappers, J2EE connectors, EAI adapters, and Web Services, can be used for this purpose. When should each of these technologies be used? Can they be unified?
- What are possible system structures?
- Which roles will database systems, application servers, workflow systems, messaging systems, portal servers, etc. play? How do they relate and cooperate?
- Does "Web Database Technology" suffice?
- Can XML be used as the language for describing the integrated information base? How can the "navigational access" based on hyperlinked HTML pages, as performed today in many application areas, be captured? How can search and query functionality be combined? How is XML stored: sliced and diced, as a whole document in a file in the file system, or as a whole document combined with other documents in the file system? How do you index these effectively? How do you combine SQL and an XML-based query over the same data (i.e., an XML query against SQL data and SQL against XML)? Is a pure XML database the way to go, or will an extended relational engine be the right solution?
- How is information described?
- As different data stores are combined in a dynamic manner, the quality of the information available in a data store becomes key. Which information qualities are needed? How are they described? How can qualities be compared, assessed, measured, ...? Which metadata is relevant (schema, ontologies, ...)?
- Which federated database technologies can be used?
- What is a federated schema if structured and unstructured data are brought together? Which schema integration techniques, federated query and search technologies are applicable?
- Which transaction model is appropriate?
- Some of the underlying data stores support classical transactions, others don't. Collective manipulation of data stores demands transactional guarantees. Which guarantees are needed? Data stores are owned by different legal entities and are often accessed via the Internet. Which concurrency models and recovery models are applicable?
With this seminar we would like to bring together, for the first time, people from different areas who all work on the broad topic of 'Information Integration'. We see this topic as ranging from application-oriented areas, such as geographic information systems or product management systems, to generic areas in computer science, such as repository technology, database federation, or data exchange. We expect the discussions in this seminar to be a first step toward finding the solutions needed for the various forms of 'Information Integration'. The participant list includes well-known researchers as well as young scientists from both industry and academia. It is our hope that the seminar will improve the understanding of this field and stimulate new collaborations between the different communities.