November 7–12, 2010, GI-Dagstuhl Seminar 10452
Data Exchange, Integration, and Streams
Nowadays, electronic data are ubiquitous and exist in different formats, in different locations, and in rapidly increasing volumes. Furthermore, data are often in the form of a stream that is transmitted via a network. Information integration is the problem of combining data from multiple heterogeneous sources into a unifying format accessible by end-users. It is regarded as a major challenge faced by every modern organization concerned with data collection and analysis, data migration, and data evolution. In fact, in a 2008 article in the Communications of the ACM, Phil Bernstein of Microsoft Research and Laura Haas of IBM Research wrote: "Large enterprises spend a great deal of time and money on information integration ... Frequently cited as the biggest and most expensive challenge that information-technology shops face, information integration is thought to consume about 40% of their budget." Information integration is also important in scientific research, where discovery depends crucially on the integration of scientific data from multiple sources.
The research community has addressed the information integration challenge by investigating in depth certain specific facets of information integration, the most prominent of which are data exchange, data integration, and data streams. Data exchange and data integration both deal with the execution of information integration, but they adopt distinctly different approaches. Data exchange is the problem of transforming data residing in different sources into data structured under a target schema; in particular, data exchange entails the materialization of data, after the data have been extracted from the sources and restructured into the unified format. In contrast, data integration can be described as symbolic or virtual integration: users are provided with the capability to pose queries and obtain answers via the unified format interface, while the data remain in the sources and no materialization of the restructured data takes place. The study of data exchange and data integration has been facilitated by the systematic use of schema mappings, which are high-level specifications (typically expressed in a suitable logical formalism) that describe the relationship between two database schemas. As a matter of fact, schema mappings are often described as the essential building blocks in data exchange and data integration, and have been the object of extensive research investigations in recent years. These investigations span a wide spectrum of topics, from semantics and algorithms to the design and development of systems for data exchange and data integration based on schema mappings.
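To make the contrast between materialized and virtual integration concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of the seminar material: the source relation Emp(name, dept), the target relations Works and Dept, and the schema mapping Emp(n, d) -> exists m . Works(n, d) AND Dept(d, m). The unknown manager m is represented by a placeholder value (a "labeled null"), and the sketch reuses one null per department, which is only one of many valid target instances.

```python
from itertools import count

# Hypothetical source relation Emp(name, dept); all names are illustrative.
SOURCE_EMP = [("alice", "db"), ("bob", "db"), ("carol", "theory")]


def exchange(emp_rows):
    """Data exchange: materialize a target instance under the assumed mapping
    Emp(n, d) -> exists m . Works(n, d) AND Dept(d, m)."""
    fresh = count(1)
    managers = {}  # one labeled null per department stands in for the unknown manager
    works, dept = [], []
    for name, d in emp_rows:
        if d not in managers:
            managers[d] = f"_N{next(fresh)}"
            dept.append((d, managers[d]))
        works.append((name, d))
    return {"Works": works, "Dept": dept}


def members_virtual(emp_rows, dept_name):
    """Data integration: answer the target query 'who works in dept_name?'
    by rewriting it over the source, without materializing any target data."""
    return sorted(name for name, d in emp_rows if d == dept_name)


# Materialized route: build the target once, then query it.
target = exchange(SOURCE_EMP)
members_materialized = sorted(n for n, d in target["Works"] if d == "db")

# Virtual route: the same question answered directly against the source.
members_from_source = members_virtual(SOURCE_EMP, "db")
```

Both routes return the same members for the "db" department; the difference is whether the restructured data are stored (data exchange) or only the query is translated (data integration).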
In the basic data stream model, the input data consist of one or several streams of data items that can be read only sequentially, one after the other. This scenario is relevant for a large number of applications where massive amounts of data need to be processed. Typically, algorithms have to work with one or a few passes over the data and a memory buffer significantly smaller than the input. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints. This theory involves the design of efficient algorithms, techniques for proving lower bounds on the resources required for solving specific problems, and the design of general-purpose data-stream management systems.
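As an illustration of these constraints, the following sketch implements the classic Misra-Gries frequent-items summary in Python. The choice of algorithm is ours and is not named in the seminar description; the function name and the parameter k are illustrative. The summary reads the stream once, keeps at most k - 1 counters regardless of the stream length, and guarantees that every item occurring more than n/k times in a stream of n items is still present at the end.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k - 1 counters.

    Any item occurring more than n/k times in a stream of length n is
    guaranteed to be among the returned keys. The stored counts are
    lower bounds on the true frequencies, not exact values.
    """
    counters = {}
    for item in stream:  # items are read sequentially, exactly once
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# The stream is passed as an iterator to emphasize that it cannot be re-read.
summary = misra_gries(iter(["a", "b", "a", "c", "a", "d", "a"]), k=2)
```

Here "a" occurs 4 times out of 7, which exceeds 7/2, so it survives in the summary even though only a single counter is available. A second pass over the data would be needed to verify exact frequencies.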
The main aim of DEIS'10 is to expose young researchers from both academia and industry to state-of-the-art developments in information integration and to prepare them for productive research in data exchange, data integration, and data streams.
DEIS'10 Webpage (with detailed information about the schedule and the application procedure).
- Data Management
- Data exchange
- Data integration
- Data streams
- Heterogeneous databases
- Data interoperability
- Metadata management
- Query answering and query rewriting
- Inconsistent databases