Dagstuhl Seminar 20021

Spoken Language Interaction with Virtual Agents and Robots (SLIVAR): Towards Effective and Ethical Interaction

(Jan 05 – Jan 10, 2020)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/20021

Motivation

Recent times have seen growing interest in spoken language-based interaction between human beings and so-called ‘intelligent’ machines. Presaged by the release of Apple’s Siri in 2011, speech-enabled devices - such as Amazon Echo, Google Home, and Apple HomePod - are now becoming a familiar feature in people’s homes. Coming years are likely to see the appearance of more embodied social agents (such as robots), but, as yet, there is no clear theoretical basis, nor even practical guidelines, for the optimal integration of spoken language interaction with such entities.

One possible reason for this situation is that the spoken language processing (SLP) and human-robot interaction (HRI) communities are fairly distinct, with only modest overlap. This means that spoken language technologists are often working with arbitrary robots (or limit themselves to conversational agents), and roboticists are typically using off-the-shelf spoken language components without too much regard for their appropriateness. As a consequence, an artefact’s visual, vocal, and behavioural affordances are often not aligned (such as providing non-human robots with inappropriate humanlike voices), and usability suffers – the human-machine interface is not ‘habitable’.

These usability issues can only be resolved by the establishment of a meaningful dialogue between the SLP and HRI communities. Both would benefit from a deeper understanding of each other's methodologies and research perspectives through an open and flexible discussion. The aim of this Dagstuhl Seminar is thus to bring together a critical mass of researchers from the SLP and HRI communities in order to (i) provide a timely opportunity to review the critical open research questions, (ii) propose appropriate evaluation protocols for speech-based human-robot interaction, (iii) investigate opportunities to collect and share relevant corpora, and (iv) consider the ethical and societal issues associated with such machines.

Examples of issues to be addressed at the seminar are:

  • How real is the ‘habitability’ issue, how can it be measured, and what steps could/should be taken to mitigate its effects?
  • What are the dimensions of multimodal affordances in human-robot interaction involving spoken language?
  • What role do emotions play in speech-based HRI?
  • What is the relation between human ‘natural’ language and the language deployed in speech-based human-robot interaction?
  • What can be learnt about the process of ‘understanding’ language that is situated and grounded in real-world action and interaction by using robots as an investigative platform?
  • To what extent do robots have to understand gestures, eye-gaze, and non-linguistic sounds as part of natural language interactions?
  • To what extent do robots have to understand indirect speech acts?
  • How important is the ability to acquire/learn/integrate novel linguistic concepts in real-time?
  • What issues arise from environments containing multiple agents (human or non-human)?
  • What standards and resources already exist, and what is missing?
  • What ethical issues arise from the development of spoken language enabled artefacts?
Copyright Laurence Devillers, Tatsuya Kawahara, Roger K. Moore, and Matthias Scheutz


Summary

Motivation and aims

Recent times have seen growing interest in spoken language-based interaction between human beings and so-called “intelligent” machines. Presaged by the release of Apple’s Siri in 2011, speech-enabled devices – such as Amazon Echo, Google Home, and Apple HomePod – are now becoming a familiar feature in people’s homes. Coming years are likely to see the appearance of more embodied social agents (such as robots), but, as yet, there is no clear theoretical basis, nor even practical guidelines, for the optimal integration of spoken language interaction with such entities.

One possible reason for this situation is that the spoken language processing (SLP) and human-robot interaction (HRI) communities are fairly distinct, with only modest overlap. This means that spoken language technologists are often working with arbitrary robots (or limit themselves to conversational agents), and roboticists are typically using off-the-shelf spoken language components without too much regard for their appropriateness. As a consequence, an artefact’s visual, vocal, and behavioural affordances are often not aligned (such as providing non-human robots with inappropriate human-like voices), and usability suffers – the human-machine interface is not “habitable”.

These usability issues can only be resolved by the establishment of a meaningful dialogue between the SLP and HRI communities. Both would benefit from a deeper understanding of each other’s methodologies and research perspectives through an open and flexible discussion. The aim of the seminar was thus to bring together a critical mass of researchers from the SLP and HRI communities in order to (i) provide a timely opportunity to review the critical open research questions, (ii) propose appropriate evaluation protocols for speech-based human-robot interaction, (iii) investigate opportunities to collect and share relevant corpora, and (iv) consider the ethical and societal issues associated with such machines.

Participants

A broad range of expertise was represented by the seminar participants, with a total of 38 attendees including industry experts, PhD students and academics from 14 different countries. The research areas of this interdisciplinary group included SLP, robotics, virtual agents, HRI, dialogue systems, natural language processing, as well as other intersections of SLIVAR.

Seminar overview

The seminar opened with short presentations in which participants were invited to introduce themselves and their research, as well as share their insights on challenges and opportunities in SLIVAR. The presentations were interwoven with four stimulus talks given by leading experts in their respective fields. In light of these presentations, participants formed discussion groups based on the clustering of related topics. The seminar’s schedule was intentionally adaptable to allow for discussions to shift and new groups to form over the course of the week. Alongside discussions, “Show and Tell” sessions were organised to provide participants with an opportunity to demonstrate their work and further stimulate discussion.

A non-exhaustive list of the topics covered is outlined below, along with a selection of the questions discussed within groups.

  • Adaptability
    • How do you cope with the frontier between user adaptation and system adaptation?
    • Are there representations that better enable adaptivity to users?
  • Architecture
    • What are the desiderata for a spoken dialogue system-robot architecture?
  • Ethics
    • What can we do as scientists and engineers to create ethical agents?
    • Should a robot be able to pursue goals that you do not know?
  • Evaluation
    • How do we evaluate HRI systems effectively and efficiently?
    • What are the existing evaluation approaches for SLIVAR?
  • Interaction
    • How do we bridge the gap between dialogue management and interaction management?
    • What kind of interaction modules are useful for dialogue and why?
  • Multimodality
    • What are the minimum representational units for different modalities?
    • What is the added value of multimodal features of spoken interaction in HRI?
  • Natural Language Understanding (NLU) Scalability
    • How should we approach large scale supervised learning for NLU?
  • Speech in Action
    • How can we create challenging interaction situations where speech performance is coordinated with a partner's action?
  • Usability
    • What are the use cases for SLIVAR systems?
    • What is the role of physical or virtual embodiment?

Seminar outcomes

The topics and questions outlined above facilitated a stimulating week of discussion and interdisciplinary collaboration, from which several next steps were established. These include participation in a number of workshops, special sessions and conferences, including but not limited to:

  • SIGdial 2020 Special Session on Situated Dialogue with Virtual Agents and Robots
  • HRI 2020 Second Workshop on Natural Language Generation for HRI
  • IJCAI 2020 ROBOTDIAL Workshop on Dialogue Models for HRI
  • 29th IEEE International Conference on Robot & Human Interactive Communication
  • Interspeech 2020

Research and position papers were also discussed, specifically focusing on the evaluation and ethics of SLIVAR systems. For the former, suggestions included a survey of existing evaluation approaches, a report paper on issues in SLIVAR and HRI evaluation, and investigations into the automation of SLIVAR system objective evaluation. For the latter, next steps included a survey of existing architectures for embedded ethical competence and a position paper on ethical machine learning and artificial intelligence.

The final, and perhaps most valuable, outcome of the seminar was the establishment of a new SLIVAR community. There was strong enthusiasm for the seminar's discussions to continue at a second SLIVAR meeting, as well as suggestions for growing the community through the formal establishment of a special interest group. Overall, the seminar provided a unique opportunity to create a foundation for collaborative research in SLIVAR, which will no doubt have a positive impact on future work in this field.

Participants
  • Hugues Ali Mehenni (CNRS - Orsay, FR)
  • Gérard Bailly (University Grenoble Alpes, FR)
  • Bruce Balentine (Enterprise Integration Group - Zürich, CH)
  • Roberto Basili (University of Rome "Tor Vergata", IT)
  • Timo Baumann (Universität Hamburg, DE)
  • Michael C. Brady (American University of Central Asia, KG)
  • Hendrik Buschmeier (Universität Bielefeld, DE)
  • Nick Campbell (Trinity College Dublin, IE)
  • Nigel Crook (Oxford Brookes University, GB)
  • Laurence Devillers (CNRS - Orsay, FR)
  • Johanna Dobbriner (TU Dublin, IE)
  • Jens Edlund (KTH Royal Institute of Technology - Stockholm, SE)
  • Mary Ellen Foster (University of Glasgow, GB)
  • Emer Gilmartin (ADAPT Centre - Dublin, IE)
  • Manuel Giuliani (University of the West of England - Bristol, GB)
  • Martin Heckmann (Honda Research Europe - Offenbach, DE)
  • Kristiina Jokinen (AIST - Tokyo Waterfront, JP)
  • Tatsuya Kawahara (Kyoto University, JP)
  • Casey Kennington (Boise State University, US)
  • Evangelia Kordoni (HU Berlin, DE)
  • Ivana Kruijff-Korbayová (DFKI - Saarbrücken, DE)
  • Pierre Lison (Norwegian Computing Center, NO)
  • Joseph J. Mariani (CNRS - Orsay, FR)
  • Cynthia Matuszek (University of Maryland, Baltimore County, US)
  • Roger K. Moore (University of Sheffield, GB)
  • Mikio Nakano (Honda Research Institute Japan - Wako, JP)
  • Catherine Pelachaud (Sorbonne University - Paris, FR)
  • Roberto Pieraccini (Google Switzerland - Zürich, CH)
  • Matthias Scheutz (Tufts University - Medford, US)
  • David Schlangen (Universität Potsdam, DE)
  • Abhishek Shrivastava (Indian Institute of Technology - Guwahati, IN)
  • Gabriel Skantze (KTH Royal Institute of Technology - Stockholm, SE)
  • Lucy Skidmore (University of Sheffield, GB)
  • Serge Thill (Radboud University Nijmegen, NL)
  • David R. Traum (USC - Playa Vista, US)
  • Matthew Walter (TTIC - Chicago, US)
  • Lun Wang (Sapienza University of Rome, IT)
  • Preben Wik (Furhat Robotics - Stockholm, SE)

Classification
  • artificial intelligence / robotics
  • society / human-computer interaction

Keywords
  • spoken language technology
  • human-robot interaction
  • embodied conversational agents