August 31 – September 5, 2003, Dagstuhl Seminar 03362
Data Quality on the Web
Although techniques for managing, querying, and integrating data on the Web have significantly matured over the last few years, well-founded and applicable approaches to determine or even to guarantee a certain degree of quality of the data are still missing. Reasons for this include in particular the lack of common, agreed-upon models of quality measurements and the difficulty of handling quality information during data integration and query processing. The problem of data quality arises in many scenarios, e.g., during the integration of business or scientific data, in Web mining, data dissemination, and in particular in querying the Web using search and meta-search engines. Furthermore, it affects various kinds of data, such as structured and semistructured data, text documents as well as streaming data. Information about data quality is becoming more and more important since it provides some kind of yardstick describing the value and reliability of (possibly heterogeneous) forms of distributed or integrated data.
The aim of this seminar was to foster collaboration among researchers from different areas working on problems related to data quality. These areas included, but were not limited to, data integration, information retrieval (particularly search engines), scientific data warehousing, and application domains from the computational sciences and bioinformatics. In all these areas, data quality plays a crucial role, and each area has therefore developed its own specific solutions. Sharing and exchanging this knowledge could result in significant synergy effects.
The seminar focused on the following major issues:
- Criteria and measurements for quality of Web data,
- Representation and exchange of quality information as metadata,
- Usage and maintenance of data quality in Web querying and data integration.
The intention was to clarify terminologies and models, analyze the state of the art in the different areas, discuss problems, approaches, and applications of quality-aware Web data management, and identify future trends and research directions in the above-mentioned areas.
For this purpose, the seminar was organized into four working groups:
- "Metadata & Modeling",
- "Information Quality Assessment and Measurement",
- "Do you Trust in Data Quality?",
- "Data Integration".
Participants discussed the special issues within these groups and afterwards presented their results to the other group members.
Summaries of the working groups can be found in the Seminar Report.