Dagstuhl Seminar 25142

Explainability in Focus: Advancing Evaluation through Reusable Experiment Design

(Mar 30 – Apr 02, 2025)

Permalink
Please use the following short URL to reference this page: https://www.dagstuhl.de/25142

Organizers
  • Ruth Mary Josephine Byrne (Trinity College Dublin, IE)
  • Elizabeth M. Daly (IBM Research - Dublin, IE)
  • Simone Stumpf (University of Glasgow, GB)
  • Stefano Teso (University of Trento, IT)

Summary

This summary outlines the key outcomes of Dagstuhl Seminar 25142, which focused on the role of explanations in advancing Responsible and Ethical AI. The discussion emphasized the importance of explainability in AI systems to:

  • Demystify AI systems: Helping users understand the rationale behind AI-generated outcomes.
  • Promote accountability: Enabling users to verify that decisions are based on valid, unbiased data.
  • Encourage transparency: Reinforcing trust and confidence in AI technologies through clear, interpretable outputs.
  • Support debugging and decision-making: Assisting users in evaluating whether to trust a prediction or recommendation.

This seminar brought together researchers, practitioners, and experts in the field of explainable AI to collaboratively develop reusable resources aimed at standardizing the evaluation of explainability methods. The goal was to ensure that evaluation practices are robust, consistent, and adaptable across diverse contexts and applications.

A major outcome of the seminar was the identification of three key challenges:

  1. Balancing technical rigor with human-centric considerations when determining which aspects of explanations should be assessed;
  2. Developing consistent and reliable metrics for evaluating the selected criteria (see the illustrative sketch after this list); and
  3. Ensuring that both criteria and measurements are appropriately tailored to specific use cases where explainability is critical.
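
As one illustration of what such a metric can look like (challenge 2), the sketch below computes a simple deletion-based faithfulness score for a feature-attribution explanation: features are removed in order of attributed importance, and a faithful attribution should make the model's confidence drop quickly. This is a minimal sketch under assumed interfaces (the `model` callable, the `baseline` value, and the toy example at the end are hypothetical), not a metric prescribed by the seminar.

```python
import numpy as np

def deletion_faithfulness(model, x, attribution, baseline=0.0, steps=10):
    """Deletion-based faithfulness of a feature-attribution explanation.

    Features are set to `baseline` in order of decreasing attributed importance;
    a faithful attribution should make the model's predicted probability drop
    quickly. Returns the mean prediction over the deletion curve, so lower
    values indicate a more faithful explanation.
    """
    order = np.argsort(-attribution)          # most important features first
    x_perturbed = x.astype(float)             # astype returns a copy; x is untouched
    curve = [model(x_perturbed)]              # prediction with nothing removed
    chunk = max(1, len(order) // steps)
    for start in range(0, len(order), chunk):
        x_perturbed[order[start:start + chunk]] = baseline
        curve.append(model(x_perturbed))      # prediction after removing more features
    return float(np.mean(curve))

# Toy usage: a logistic model whose weighted inputs double as a "ground-truth" attribution.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=20)

    def model(v):
        return float(1.0 / (1.0 + np.exp(-(v @ w))))

    x = rng.normal(size=20)
    print(deletion_faithfulness(model, x, attribution=w * x))                 # informative attribution
    print(deletion_faithfulness(model, x, attribution=rng.normal(size=20)))   # random comparison
```

In practice such a score would be averaged over many inputs and compared against a random-attribution baseline before drawing conclusions, and it covers only computational faithfulness, not the human-centric criteria raised in challenge 1.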

To illustrate practical applications of the discussed frameworks, we presented case studies showcasing end-to-end evaluation examples.

Copyright Simone Stumpf, Elizabeth Daly, and Stefano Teso

Motivation

Explainability in artificial intelligence (AI) is paramount for ensuring the responsible and ethical deployment of these technologies across various domains. It plays a crucial role in establishing trust between AI systems and humans, particularly in applications that directly impact individuals’ lives, such as healthcare, finance, and criminal justice. The ability to provide clear, comprehensible explanations of AI-driven decisions helps demystify these complex systems, allowing users to understand the rationale behind outcomes. This transparency promotes accountability, enabling users to verify that AI systems are making decisions based on valid and unbiased data, ultimately reinforcing trust and confidence in the technology.

Yet a crucial aspect tends to be overlooked: explanations can be leveraged for different objectives, and this objective needs to be taken into account when evaluating the utility of explanation methods. Explanations can enhance transparency, help users form a cognitive model of a trained ML system, aid in debugging, or assist users in determining whether to place trust in a prediction or recommendation. While many explanatory mechanisms have been proposed in the community, comparing these solutions remains challenging without more standardized evaluation strategies. Compounding this issue is the versatile nature of explanations, which means that, in practice, algorithm designers should tailor their evaluation strategies to specific tasks.

The objective of this seminar is to bring together researchers, practitioners, and experts in the field of explainable AI to collaboratively develop reusable experiment designs. The evaluation of explainability methods has not been standardized by the community, so each author must develop and justify their own approach. This makes it harder for researchers to publish their findings and can hinder progress in this space.

To address this, we aim to identify the different objectives and tasks for explainability methods and to use this Dagstuhl Seminar to create a repository of adaptable tasks and experiments that will be made available to the community as open-source resources. By fostering discussions, sharing insights, and creating practical frameworks, the seminar aims to accelerate progress in the field of explainability, ensuring that evaluation practices are robust, consistent, and applicable across a wide range of contexts and applications.
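
To make the notion of a reusable, adaptable experiment concrete, the following sketch shows one possible shape for an entry in such a repository: a small record tying an explanation objective to a task, the explanation types it applies to, the metrics to report, and the study protocol. The schema, field names, and example entry are illustrative assumptions, not a structure adopted by the seminar.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExperimentDesign:
    """One reusable entry in a hypothetical repository of evaluation experiments."""
    objective: str                # e.g. "debugging", "trust calibration", "transparency"
    task: str                     # concrete user or benchmark task the explanation supports
    explanation_types: List[str]  # e.g. ["feature attribution", "counterfactual"]
    metrics: List[str]            # names of the evaluation criteria to compute
    protocol: str = "offline"     # "offline" (computational) or "user study"
    notes: str = ""

# An illustrative registry with a single entry; a real repository would collect many,
# each reusable across datasets and explanation methods.
REGISTRY: List[ExperimentDesign] = [
    ExperimentDesign(
        objective="debugging",
        task="identify which input features drive a misclassification",
        explanation_types=["feature attribution"],
        metrics=["deletion_faithfulness"],   # e.g. the metric sketched in the Summary above
        protocol="offline",
        notes="Lower deletion scores indicate a more faithful attribution.",
    ),
]
```

A researcher could then filter such a registry by objective, reuse the associated task and protocol, and report the listed metrics, which is the kind of standardized, reusable evaluation the seminar aims to enable.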

Our goal is to fill this gap in the community, lowering the barrier to entry for AI researchers to properly evaluate their contributions with sound evaluation strategies grounded in cognitive science.

Copyright Ruth Mary Josephine Byrne, Elizabeth M. Daly, Simone Stumpf, and Stefano Teso

Participants

  • Elisabeth André (Universität Augsburg, DE) [dblp]
  • Jaesik Choi (KAIST - Daejeon, KR) [dblp]
  • Peter Clark (Allen Institute for AI - Seattle, US) [dblp]
  • Elizabeth M. Daly (IBM Research - Dublin, IE) [dblp]
  • Peter Flach (University of Bristol, GB) [dblp]
  • Jasmina Gajcin (IBM Research - Dublin, IE)
  • Tobias Huber (TH Ingolstadt, DE)
  • Eda Ismail-Tsaous (bidt - München, DE)
  • Patricia Kahr (TU Eindhoven, NL)
  • Francesca Naretto (University of Pisa, IT)
  • Talya Porat (Imperial College London, GB)
  • Daniele Quercia (Nokia Bell Labs - Cambridge, GB) [dblp]
  • Lindsay Sanneman (Arizona State University - Tempe, US) [dblp]
  • Ute Schmid (Universität Bamberg, DE) [dblp]
  • Kacper Sokol (ETH Zürich, CH) [dblp]
  • Timo Speith (Universität Bayreuth, DE) [dblp]
  • Wolfgang Stammer (TU Darmstadt, DE) [dblp]
  • Simone Stumpf (University of Glasgow, GB) [dblp]
  • Stefano Teso (University of Trento, IT) [dblp]
  • Nava Tintarev (Maastricht University, NL) [dblp]

Classification
  • Artificial Intelligence
  • Human-Computer Interaction
  • Machine Learning

Keywords
  • Explainability
  • Mental Models
  • Interactive Machine Learning
  • Experiment Design
  • Human-centered AI