August 31 – September 5, 2003, Dagstuhl Seminar 03362

Data Quality on the Web


Michael Gertz (University of California – Davis, US)
M. Tamer Özsu (University of Waterloo, CA)
Gunter Saake (Universität Magdeburg, DE)
Kai-Uwe Sattler (TU Ilmenau, DE)

For information about this Dagstuhl Seminar, please contact

Dagstuhl Service Team


Dagstuhl's Impact: Documents available


Although techniques for managing, querying, and integrating data on the Web have significantly matured over the last few years, well-founded and applicable approaches to determine or even to guarantee a certain degree of quality of the data are still missing. Reasons for this include in particular the lack of common, agreed-upon models of quality measurements and the difficulty of handling quality information during data integration and query processing. The problem of data quality arises in many scenarios, e.g., during the integration of business or scientific data, in Web mining, data dissemination, and in particular in querying the Web using search and meta-search engines. Furthermore, it affects various kinds of data, such as structured and semistructured data, text documents as well as streaming data. Information about data quality is becoming more and more important since it provides some kind of yardstick describing the value and reliability of (possibly heterogeneous) forms of distributed or integrated data.

The aim of this seminar was to foster collaboration among researchers from different areas working on problems related to data quality. These areas included, but were not limited to, data integration, information retrieval (particularly search engines), scientific data warehousing, and application domains from the computational sciences and bioinformatics. In all these areas, data quality plays a crucial role, and therefore different specific solutions have been developed. Sharing and exchanging this knowledge could result in significant synergy effects.

The seminar focused on the following major issues:

  • Criteria and measurements for quality of Web data,
  • Representation and exchange of quality information as metadata,
  • Usage and maintenance of data quality in Web querying and data integration.

The intention was to clarify terminologies and models, analyze the state of the art in the different areas, discuss problems, approaches, and applications of quality-aware Web data management, and identify future trends and research directions in the above-mentioned areas.

For this purpose, the seminar was organized into four working groups

  • "Metadata & Modeling",
  • "Information Quality Assessment and Measurement",
  • "Do you Trust in Data Quality?",
  • and "Data Integration"

where participants discussed the special issues and presented their results to the other group members afterwards.

Summaries of the working groups can be found in the Seminar Report.


All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.


Download overview leaflet (PDF).


It is also possible to publish a comprehensive collection of peer-reviewed papers in the Dagstuhl Follow-Ups series.

Dagstuhl's Impact

Please let us know when a publication results from your seminar. Such publications are listed separately in the Dagstuhl's Impact section and displayed on the ground floor of the library.