Dagstuhl Perspectives Workshop 24352

Conversational Agents: A Framework for Evaluation (CAFE)

(Aug 25 – Aug 30, 2024)

Permalink
Please use the following short URL to link to this page: https://www.dagstuhl.de/24352

Motivation

Conversational Agents (CA) as frontends to Information Retrieval (IR) and Recommender Systems (RS) are becoming more popular in everyday life, with a wider range of users and usages. The latest developments in Large Language Models (LLMs) will have tremendous consequences, especially for the workplace and education. In this Dagstuhl Perspectives Workshop, we want to focus on the evaluation of these conversational systems, as appropriate methods are still missing. The quality of these systems is limited in terms of personalization, veracity and correctness, bias, transparency, trustworthiness, and understandability. Thus, evaluation methods must address these shortcomings. Furthermore, user- and usage-oriented aspects should become a more prominent and integral component of evaluations, as both the user population and the tasks these systems are used for become more heterogeneous. For this reason, the topic-centric view of relevance has to be extended to a broad range of facets that are important for the different usage scenarios. Suitable evaluation criteria therefore have to be specified, forming the basis for defining appropriate measures. Most importantly, the range of evaluation methods must be revisited and extended: popular methods like the Cranfield approach or crowdsourcing must be complemented by new evaluation methods and strategies specifically tailored to this new type of system.

In more detail, we will focus our discussion on several key open issues, including the following:

  • how to cross the borders of different areas, mainly Information Retrieval and Recommender Systems in our case, but also Natural Language Processing;
  • how to create experimental collections and evaluate Large Language Models in terms of their bias, explainability, veracity, correctness, and hallucination in the CA context;
  • how to incorporate user- and usage-oriented facets in order to understand how users’ perceived conversation qualities (e.g., attentiveness, adaptability, understanding, and response quality) and perceived recommendation qualities (e.g., accuracy, novelty, interaction adequacy, and explanation) might interact with each other in a CA to affect user beliefs (e.g., perceived usefulness, perceived ease of use, transparency, user control, rapport, humanness), user attitudes (e.g., user satisfaction, trust), and behavioral intentions (e.g., intention to use);
  • how to measure information leakage and privacy, and how to ensure that a CA does not propagate sensitive information;
  • how to devise proper simulation approaches to support both the development and the evaluation of a CA, avoiding circularity (the techniques used for simulation are similar to those used for developing systems), ensuring reliability, and reducing the gap between offline measurements and online user evaluations;
  • how to evaluate to what extent answers/recommendations produced by a CA are appropriate, tailored to, and understandable for a specific audience, e.g., school kids, the general public, professionals, and people with (cognitive) disabilities.

Overall, the above questions call for envisioning, as one possible output of the workshop, a reference architecture for CA systems that is geared towards evaluation. Such an architecture would allow the different areas to cooperate on common ground and to share a common roadmap for improving our understanding of CA systems and making them more effective.

Copyright Christine Bauer, Li Chen, Nicola Ferro, and Norbert Fuhr

Classification
  • Artificial Intelligence
  • Human-Computer Interaction
  • Information Retrieval

Keywords
  • Conversational Agents
  • Information Retrieval
  • Recommender Systems
  • Evaluation
  • User Interaction