August 31 – September 5, 2003, Dagstuhl Seminar 03362

Data Quality on the Web


Michael Gertz (University of California – Davis, US)
M. Tamer Özsu (University of Waterloo, CA)
Gunter Saake (Universität Magdeburg, DE)
Kai-Uwe Sattler (TU Ilmenau, DE)



Although techniques for managing, querying, and integrating data on the Web have matured significantly over the last few years, well-founded and applicable approaches to determine, let alone guarantee, a certain degree of data quality are still missing. The main reasons are the lack of common, agreed-upon models of quality measurement and the difficulty of handling quality information during data integration and query processing. The problem of data quality arises in many scenarios, e.g., during the integration of business or scientific data, in Web mining, in data dissemination, and in particular when querying the Web using search and meta-search engines. Furthermore, it affects various kinds of data, such as structured and semistructured data, text documents, and streaming data. Information about data quality is becoming increasingly important because it provides a yardstick for the value and reliability of (possibly heterogeneous) forms of distributed or integrated data.

The aim of this seminar was to foster collaboration among researchers from different areas working on problems related to data quality. These areas included, but were not limited to, data integration, information retrieval (particularly search engines), scientific data warehousing, and application domains from the computational sciences and bioinformatics. In all these areas, data quality plays a crucial role, and specific solutions have therefore been developed in each of them. Sharing and exchanging this knowledge could yield significant synergies.

The seminar focused on the following major issues:

  • Criteria and measurements for quality of Web data,
  • Representation and exchange of quality information as metadata,
  • Usage and maintenance of data quality in Web querying and data integration.

The intention was to clarify terminologies and models, analyze the state of the art in the different areas, discuss problems, approaches, and applications of quality-aware Web data management, and identify future trends and research directions in the above-mentioned areas.

For this purpose, the seminar was organized in four working groups

  • "Metadata & Modeling",
  • "Information Quality Assessment and Measurement",
  • "Do you Trust in Data Quality?",
  • and "Data Integration"

in which participants discussed the specific issues and afterwards presented their results to the other group members.

Summaries of the working groups can be found in the Seminar Report.


In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.


