Dagstuhl Seminar 26281: Multimodal Data Quality – Human, Computational, and Institutional Perspectives

( Jul 05 – Jul 10, 2026 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/26281

Organizers

Gianluca Demartini (University of Queensland - Brisbane, AU)
Vanessa Murdock (Amazon Web Services - Seattle, US)
Felix Naumann (Hasso-Plattner-Institut, Universität Potsdam, DE)
Shazia Sadiq (University of Queensland - Brisbane, AU)
Divesh Srivastava (AT&T - Bedminster, US)

Contact

Marsha Kleinbauer (for scientific matters)
Susanne Bach-Bernhard (for administrative matters)

Dagstuhl Reports

As part of the mandatory documentation, participants are asked to submit their talk abstracts, working group results, etc. for publication in our series Dagstuhl Reports via the Dagstuhl Reports Submission System.

Motivation

Advanced applications (e.g., in education, finance, and health) and emerging technologies such as large language models (LLMs) rely on multimodal data of ever-increasing scale and variety. This reliance has created the need for new tools, methods, algorithms, and even new professions (e.g., data quality officer) to ensure that data coming from different sources is fit for purpose.

Data quality in its many dimensions is generally improved by data curation, which includes at least data preparation and cleaning (e.g., dealing with data quality issues, different formats and structures; data transformations; integration and augmentation); data annotation (e.g., collecting human labels to then train supervised machine learning models to scale up the annotation process); data synthesis and generation; and inclusion of human and institutional oversight in large-scale automated curation tasks.

This Dagstuhl Seminar is relevant to the diverse fields of data management, data engineering, human computation and crowdsourcing, data-driven decision-making, responsible AI, and data governance. The seminar looks at three perspectives: Human, Computational, and Institutional. There is an evident need to bring together perspectives from domain experts who understand the properties and semantics of the datasets (human perspective); from algorithmic advancements that can help improve data quality (computational perspective); and from institutional imperatives including regulations, standards and organizational policies that create necessary safeguards for governance of data pipeline processes (institutional perspective).

Topics discussed in the seminar will span the three perspectives on data quality outlined above, including but not limited to:

Human perspective

Data Bias and Quality of human annotations collected at scale via crowdsourcing
Active Learning methods to optimize data annotation strategies
Supporting human labelling by means of data augmentation or LLM-generated data
The human impact of data quality due to bias, fairness, robustness, toxicity, privacy

Computational perspective

The role of generative AI in data annotation
Hallucination, reliability, trust of LLMs
Training AI with noisy or unbalanced data and evaluating on representative data
Data-Centric AI: machine-generated data, data quality for fine-tuning, evaluation/training data for heterogeneous agentic systems
Algorithmic fairness, multi-calibration
Data security and privacy (e.g., jail-breaking LLMs)

Institutional perspective

Data Governance and Responsible Information Use
Information Resilience across the data value chain
Compliance with data standards and regulation
Ethics related to the use of low-quality data to inform decisions and train AI
Organizational structures and best practice
Data monetization and value realization

Creative Commons BY 4.0

Gianluca Demartini, Vanessa Murdock, Felix Naumann, Shazia Sadiq, and Divesh Srivastava

Participants

Ziawasch Abedjan
Maribel Acosta
Abraham Bernstein
Robert Busa-Fekete
Gianluca Demartini
Lisa Ehrlinger
Anna Fariha
Donatella Firmani
Leon Fröhling
Ujwal Gadiraju
Helena Galhardas
Lukasz Golab
Hazar Harmouch
Danula Hettiachchi
Jin Ke
Ramayya Krishnan
Vanessa Murdock
Felix Naumann
Jahna Otterbacher
Rema Padman
Paolo Papotti
Maria Angela Pellegrino
Francesca Pezzuti
Amy Rechkemmer
Anna Richter
Anisa Rula
Shazia Sadiq
Irina Shklovski
Divesh Srivastava
Nicola Tonellotto
Matthias Weidlich
Sonja Zillner

Classification

Computers and Society
Databases
Human-Computer Interaction

Keywords

Data Quality
Bias and Fairness
Responsible AI
