January 27 – February 1, 2019, Dagstuhl Seminar 19052

Computational Methods for Melody and Voice Processing in Music Recordings


Emilia Gómez (UPF – Barcelona, ES)
Meinard Müller (Universität Erlangen-Nürnberg, DE)
Yi-Hsuan Yang (Academia Sinica – Taipei, TW)



Dagstuhl Report, Volume 9, Issue 1


In our daily lives, we are constantly surrounded by music, and we are deeply influenced by it. Making music together can create strong ties between people while fostering communication and creativity. This is demonstrated, for example, by the large community of singers active in choirs and by the fact that music constitutes an important part of our cultural heritage. The availability of music in digital formats and its distribution over the World Wide Web have changed the way we consume, create, enjoy, explore, and interact with music. To cope with the increasing amount of digital music, one requires computational methods and tools that allow users to find, organize, analyze, and interact with music – topics that are central to the research field known as Music Information Retrieval (MIR).

This Dagstuhl Seminar is devoted to a branch of MIR that is of particular importance: processing melodic voices using computational methods. It is often the melody, a specific succession of musical tones, that constitutes the leading element in a piece of music. In this seminar we want to discuss how to detect, extract, and analyze melodic voices as they occur in recorded performances of a piece of music. Even though it may be easy for a human to recognize and hum a main melody, the automated extraction and reconstruction of such information from audio signals is extremely challenging due to the superposition of different sound sources as well as the intricacy of musical instruments and, in particular, the human voice. As one main objective of the seminar, we want to critically review the state of the art of computational approaches to various MIR tasks related to melody processing, including pitch estimation, source separation, instrument recognition, singing voice analysis and synthesis, and performance analysis (timbre, intonation, expression). Second, we aim to trigger interdisciplinary discussions that leverage insights from fields such as audio processing, machine learning, music perception, music theory, and information retrieval. Third, we shall explore novel applications in music and multimedia retrieval, content creation, musicology, education, and human-computer interaction.

By gathering internationally renowned key players from different research areas, our goal is to highlight and better understand the problems that arise while dealing with a highly interdisciplinary topic such as melody extraction or voice separation. Special emphasis will be placed on increasing the diversity of the MIR community in collaboration with the mentoring program developed by the Women in Music Information Retrieval (WiMIR) initiative. General questions and issues that may be addressed in this seminar include, but are not limited to, the following:

  • Model-based melody extraction and singing voice separation
  • Multipitch and predominant frequency estimation
  • Modeling of musical aspects such as vibrato, tremolo, and glissando
  • Informed singing voice separation (exploiting additional information such as score and lyrics)
  • Integrated models for voice separation, melody extraction, and singing analysis
  • Alignment methods for synchronizing lyrics and recorded songs
  • Assessment of singing style, expression, skills, and enthusiasm
  • Content-based retrieval (e.g. query-by-humming, theme identification)
  • Data mining techniques for identifying singing-related resources on the World Wide Web
  • Collecting and annotating musical sounds via crowdsourcing
  • User interfaces for singing information analysis and visualization
  • Singing for gaming and medical purposes
  • Understanding expressiveness in singing
  • Extraction of emotion-related parameters from melodic voices
  • Extraction of vocal quality descriptors
  • Singing voice synthesis and transformation
  • Cognitive and sensorimotor factors in singing
  • Deep learning approaches for melody extraction and voice processing
  • Deep autoencoder models for feature design and classification
  • Hierarchical models for short-term/long-term dependencies

  Creative Commons BY 3.0 DE
  Emilia Gómez, Meinard Müller, and Yi-Hsuan Yang



Classification

  • Data Bases / Information Retrieval
  • Multimedia
  • Society / Human-computer Interaction


Keywords

  • Music information retrieval
  • Music processing
  • Singing voice processing
  • Audio signal processing
  • Machine learning

