https://www.dagstuhl.de/21052

January 31 – February 5, 2021, Dagstuhl Seminar 21052

Privacy in Speech and Language Technology

Organizers

Simone Fischer-Hübner (Karlstad University, SE)
Dietrich Klakow (Universität des Saarlandes, DE)
Peggy Valcke (KU Leuven, BE)
Emmanuel Vincent (INRIA Nancy – Grand Est, FR)

For information about this Dagstuhl Seminar, please contact

Simone Schilke for administrative matters

Andreas Dolzmann for scientific matters

Motivation

In the last few years, voice assistants have become the preferred means of interacting with smart devices and services. Chatbots and related technologies such as automated translation or typing prediction are also widely used. These technologies often rely on cloud-based machine learning systems trained on speech or text data collected from the users.

The recording, storage and processing of users' speech or text data poses serious privacy threats. This data contains a wealth of personal information about the user, e.g., their personality, ethnicity and health state, which may be (mis)used for targeted processing or advertising. It also includes information about the user's identity, which an attacker could exploit to impersonate them. News articles exposing these threats to the general public have made national headlines.

A new generation of privacy-preserving speech and language technologies is needed that ensures user privacy while still providing users with the same benefits and companies with the training data needed to develop these technologies. Recent regulations such as the European General Data Protection Regulation (GDPR), which promotes the principle of privacy-by-design, have further fueled interest. Yet efforts in this direction have suffered from a lack of collaboration across research communities. These efforts include the development of encryption tools such as homomorphic encryption and secure multiparty computation, machine learning frameworks such as federated or decentralized learning, and anonymization techniques targeting speech and language specifically. Privacy in speech and language technology has also recently attracted the interest of law researchers and data protection authorities.

To the best of our knowledge, this Dagstuhl Seminar will be the first event that aims to bring together academic researchers, industry representatives, and policy makers in the fields of speech processing, natural language processing, privacy-enhancing technologies (PETs), machine learning, and law and ethics, in order to devise cross-disciplinary solutions. The questions to be addressed include (but are not limited to) the following:

  • What are the threats to privacy arising from the recording, storage and processing of user-generated speech and language data? What is their probability of occurrence and their impact?
  • What are the related ethical and moral issues?
  • How shall those threats be translated into actionable, formal privacy models? Do existing general-purpose privacy models apply or are new domain-specific models needed?
  • Which existing PETs can be leveraged to address privacy requirements regarding raw speech and language data? How shall they be combined into holistic solutions?
  • How should secondary data, e.g., models trained on raw data, be treated?
  • Which new PETs are being developed? Can they benefit from cross-disciplinary collaboration?
  • What privacy goals can these PETs achieve? Which metrics shall be used to assess their success?
  • How shall these PETs be implemented in practice, so as to provide transparent information and management capabilities to the users? How can formal guarantees be made and explained?
  • What are the expected limitations of these PETs? What is the research roadmap to address them?
  • How will privacy laws affect these new developments? Conversely, how will they be impacted by these new developments?

The Dagstuhl Seminar will consist of a mix of plenary talks and subgroup discussions aiming to achieve a shared understanding of problems and solutions and to sketch a cross-disciplinary roadmap, which we hope to publish as a joint position paper. In addition, there will be multiple breaks for invitees to socialize and to foster new cross-disciplinary collaborations.

D. Klakow and E. Vincent acknowledge support from the European Union's Horizon 2020 Research and Innovation Program within project COMPRISE "Cost-effective, multilingual, privacy-driven voice-enabled services" (www.compriseh2020.eu).

Motivation text license
  Creative Commons BY 3.0 DE
  Simone Fischer-Hübner, Dietrich Klakow, Peggy Valcke, and Emmanuel Vincent

Classification

  • Computation And Language
  • Computers And Society
  • Cryptography And Security

Keywords

  • Speech and language technology
  • Privacy
  • Data protection
  • Privacy-enhancing technologies
  • Law and policy

Documentation

All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.

Download overview flyer (PDF).

Publications

It is also possible to publish a comprehensive collection of peer-reviewed papers in the Dagstuhl Follow-Ups series.

Dagstuhl's Impact

Please let us know if a publication results from your seminar. Such publications are listed separately in the Dagstuhl's Impact section and displayed on the ground floor of the library.