https://www.dagstuhl.de/21052

January 31 – February 5, 2021, Dagstuhl Seminar 21052

CANCELLED Privacy in Speech and Language Technology

Due to the Covid-19 pandemic, this seminar was cancelled. A related Dagstuhl Seminar has been scheduled for August 21–26, 2022 (Seminar 22342).

Organizers

Simone Fischer-Hübner (Karlstad University, SE)
Dietrich Klakow (Universität des Saarlandes, DE)
Peggy Valcke (KU Leuven, BE)
Emmanuel Vincent (INRIA Nancy – Grand Est, FR)


Motivation

In the last few years, voice assistants have become a widespread means of interacting with smart devices and services. Chatbots and related technologies such as automated translation and predictive typing are also widely used. These technologies often rely on cloud-based machine learning systems trained on speech or text data collected from their users.

The recording, storage and processing of users' speech or text data pose severe privacy threats. This data contains a wealth of personal information about the user, e.g., their personality, ethnicity and health state, which may be (mis)used for targeted advertising or other targeted processing. It also includes information about the user's identity, which an attacker could exploit to impersonate them. News articles exposing these threats to the general public have made national headlines.

A new generation of privacy-preserving speech and language technologies is needed that ensures user privacy while still providing users with the same benefits and providing companies with the training data needed to develop these technologies. Recent regulations such as the European General Data Protection Regulation (GDPR), which promotes the principle of privacy-by-design, have further fueled interest in this area. Yet efforts in this direction have suffered from a lack of collaboration across research communities. These efforts include the development of encryption tools such as homomorphic encryption and secure multiparty computation, machine learning frameworks such as federated or decentralized learning, and anonymization techniques targeting speech and language specifically. Privacy in speech and language technology has also recently attracted the interest of law researchers and data protection authorities.

To the best of our knowledge, this Dagstuhl Seminar will be the first event that aims to bring together academic researchers, industry representatives, and policy makers in the fields of speech processing, natural language processing, privacy-enhancing technologies (PETs), machine learning, and law and ethics, in order to develop cross-disciplinary solutions. The questions to be addressed include (but are not limited to) the following:

  • What are the threats to privacy arising from the recording, storage and processing of user-generated speech and language data? What is their probability of occurrence and their impact?
  • What are the related ethical and moral issues?
  • How shall those threats be translated into actionable, formal privacy models? Do existing general-purpose privacy models apply or are new domain-specific models needed?
  • Which existing PETs can be leveraged to address privacy requirements regarding raw speech and language data? How shall they be combined into holistic solutions?
  • How should secondary data, e.g., models trained on raw data, be treated?
  • Which new PETs are being developed? Can they benefit from cross-disciplinary collaboration?
  • What privacy goals can these PETs achieve? Which metrics shall be used to assess their success?
  • How shall these PETs be implemented in practice, so as to provide transparent information and management capabilities to the users? How can formal guarantees be made and explained?
  • What are the expected limitations of these PETs? What is the research roadmap to address them?
  • How will privacy laws affect these new developments? Conversely, how will they be impacted by these new developments?

The Dagstuhl Seminar will involve a mix of plenary talks and subgroup discussions aiming to achieve a shared understanding of problems and solutions and to sketch a cross-disciplinary roadmap, which we hope to publish as a joint position paper. In addition, there will be multiple breaks for invitees to socialize and to foster new cross-disciplinary collaborations.

D. Klakow and E. Vincent acknowledge support from the European Union's Horizon 2020 Research and Innovation Program within project COMPRISE "Cost-effective, multilingual, privacy-driven voice-enabled services" (www.compriseh2020.eu).

Motivation text license
  Creative Commons BY 3.0 DE
  Simone Fischer-Hübner, Dietrich Klakow, Peggy Valcke, and Emmanuel Vincent

Related Dagstuhl Seminar

  • Dagstuhl Seminar 22342 (August 21–26, 2022)

Classification

  • Computation And Language
  • Computers And Society
  • Cryptography And Security

Keywords

  • Speech and language technology
  • Privacy
  • Data protection
  • Privacy-enhancing technologies
  • Law and policy

Documentation

In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 


Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication has been published as a result of your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.