Dagstuhl Seminar 24242

Computational Analysis and Simulation of the Human Voice

( Jun 09 – Jun 14, 2024 )

Permalink
Please use the following short URL to reference this page: https://www.dagstuhl.de/24242


Motivation

The human voice can produce a very rich set of different sounds, making it the single most important channel for human-to-human communication, and potentially also for human-computer interaction. Spoken communication can be thought of as a stack of layered transport protocols that includes language, speech, voice, and sound. In this Dagstuhl Seminar, we will be concerned with the voice and its function as a transducer from neurally encoded speech patterns to sound. This very complex mechanism remains insufficiently explained, both in terms of analysing voice sounds, as for example in the medical assessment of vocal function, and in terms of simulating them from first principles, as in talking or singing machines. There will be four main themes to the seminar:

Voice Analysis: Measures derived from voice recordings are clinically attractive, being non-invasive and relatively inexpensive. Quantitative objective measures of vocal status have been researched for some seven decades, yet in clinical voice assessment, perceptual assessment by listening is still the dominant method. Isolating the properties of a voice (the machine) from those of its owner’s speech or singing (the process) is far from trivial. We will explore how computational approaches might facilitate a functional decomposition that can advance beyond conventional cut-off values of metrics and indices.

Voice Visualization: Trained listeners can deduce some of what is going on in the larynx and the vocal tract, but we cannot easily see it or document it. The multidimensionality of the voice poses interesting challenges for making effective visualizations. Most current visualizations are textbook transforms of the acoustic signal, but they are not as clinically or pedagogically relevant as they might be. Can functionally or perceptually informed visualizations improve on this situation?

Voice Simulation: Balancing low- and high-order models. A “complete” physics-based computational model of the voice organ would have to account for bidirectional energy exchange between fluids and moving structures at high temporal and spatial resolutions, in 3D. Computational brute force is still not an option for representing voice production in all its complexity, and a proper balance between high- and low-order approaches has to be found. We will discuss strategies for choosing effective partitionings or hybrids of the simulation tasks that could be suitable for specific sub-problems.

Data Science and Voice Research: With today’s machine learning and deep neural network methods, end-to-end systems for both text-to-speech and speech recognition have become remarkably successful, but they remain quite ignorant of the basics of vocal function. Yet machine learning and big data science approaches should be very useful for helping us deal with and account for the variability in voices. Rather than seeking automated discrimination between normal and pathological voices, clinicians wish for objective assessments of the progress of an intervention, while researchers wish for ways to distil succinct models of voice production from multi-modal big-data observations. We will explore how techniques such as domain-specific feature selection and auto-encoding can make progress toward these goals.

We expect that this seminar will result in (1) leading researchers in the vocological community becoming up to date on recent computational advances, (2) seasoned computer scientists and data analysts becoming engaged in voice-related challenges, (3) a critical review of the potential and limitations of deep learning and computational mechanics techniques, as applied to the analysis and simulation of the voice, and (4) a week of creative brainstorming, leading to a roadmap for pursuing outstanding issues in computational voice research.

Copyright Peter Birkholz, Oriol Guasch Fortuny, Nathalie Henrich Bernardoni, and Sten Ternström

Classification
  • Machine Learning
  • Sound

Keywords
  • voice analysis
  • voice simulation
  • health care
  • visualization