
Dagstuhl Perspectives Workshop 24492

Human in the Loop Learning through Grounded Interaction in Games

(Dec 01 – Dec 06, 2024)

Over the past few years, there has been a decisive move in Artificial Intelligence (AI) towards human-centered intelligence and AI models that can learn through interaction. This shift is the result of the appearance of Large Language Models (LLMs) that can act as intelligent assistants, such as InstructGPT, ChatGPT, BARD, or Lamda-2 (Ouyang et al, 2022; OpenAI, 2022; Touvron et al, 2023), and that achieve an entirely new level of performance in many AI tasks. Much of the success of these models is due to training regimes that combine supervised learning with learning from interaction with humans, such as Reinforcement Learning from Human Feedback (Christiano et al, 2017; Ouyang et al, 2022). In particular, the most recent of these models, such as GPT-4, are trained on multimodal data and can produce output in different modalities. However, these models also have well-known issues, such as hallucinations, which has led researchers to speak of a Generative AI Paradox (West et al, 2023).

In parallel with the above developments, there has also been substantial progress on grounded interaction: developing models that are aware of the situation in which they operate (a physical world in the case of robots, a virtual world in the case of artificial agents) and able to, e.g., understand and produce references to this situation (Fitzgerald et al, 2013; Kazemzadeh et al, 2014; Kennington & Schlangen 2017; Chevalier-Boisvert et al, 2019; Testoni and Bernardi 2021; Suglia et al, 2024), perhaps through negotiation (Clark and Brennan, 1990). However, communication between the intelligent assistant and grounded interaction communities is still limited (Krishnamurthy & Kollar 2013).

A particularly promising approach to studying learning through grounded interaction with human agents is virtual world games: games in which conversational agents impersonating characters can learn to perform tasks, or improve their communicative ability, by interacting with human players on platforms such as Minecraft or Light (Johnson et al, 2016; Urbanek et al, 2019; Narayan-Chen et al, 2019; Szlam et al, 2019; Kiseleva et al, 2022; Zhou et al, 2023). Games have been shown to be a promising platform for collecting data from thousands of players (von Ahn, 2006; Yu et al, 2023); virtual worlds approach the complexity of the real world; and virtual agents operating in such worlds need to develop a variety of interactional skills to be perceived as 'real' (Schlangen, 2023; Chalamalasetti et al, 2023).

This Dagstuhl Perspectives Workshop aims, first of all, to bring together the communities working on the related areas of learning through interaction, (conversational) agents in games, dialogue and interaction, and collecting judgments from crowds through games, to make each community aware of the most recent developments in the other areas. We also intend to discuss current challenges, and whether advances in one area (e.g., grounded interaction) can benefit other areas (e.g., interactive learning). Topics to be discussed include:

  • Are there still gains to be had by training LLMs via games? Is interaction in games still a useful approach to training LLMs in this era when millions of people interact daily with them?
  • How beneficial is grounded human-in-the-loop interaction? Can grounded human-in-the-loop interaction result in better learning than purely textual interaction, or interaction involving text and images but without reference to a scene? E.g., can it help with hallucinations?
  • Are there benefits from more complex interaction in interactive learning? Is there any advantage in moving towards a type of interaction more similar to actual human-human interaction – e.g., one in which conversational agents are also allowed to make clarification requests or take the initiative?
  • Gamification vs worldification: how does making externally motivated goals more game-like compare with making games more world-like?
Copyright Raffaela Bernardi, Julia Hockenmaier, Udo Kruschwitz, and Massimo Poesio

Classification
  • Computation and Language
  • Computer Science and Game Theory
  • Human-Computer Interaction

Keywords
  • Artificial intelligence
  • Conversational agents in games
  • Human-in-the-loop learning
  • Grounded dialogue and interaction