February 20 – 25, 2022, Dagstuhl Seminar 22082

Deep Learning and Knowledge Integration for Music Audio Analysis


Organizers

Rachel Bittner (Spotify – Paris, FR)
Meinard Müller (Universität Erlangen-Nürnberg, DE)
Juhan Nam (KAIST – Daejeon, KR)



Given the increasing amount of digital music, the development of computational tools that allow users to find, organize, analyze, and interact with music has become central to the research field known as Music Information Retrieval (MIR). As in general multimedia processing, many of the recent advances in MIR have been driven by techniques based on deep learning (DL). For example, DL-based techniques have led to significant improvements in numerous MIR tasks, including music source separation, music transcription, chord recognition, melody estimation, beat tracking, tempo estimation, and lyric alignment. In particular, significant improvements have been achieved for specific music scenarios where sufficient training data is available. A particular strength of DL-based approaches is their ability to extract complex features directly from raw audio data, which can then be used to make predictions based on hidden structures and relations.

However, DL-based approaches also come at a cost: they are data-hungry and computationally intensive. The design of suitable network architectures can be cumbersome, and the behavior of DL-based systems is often hard to understand. These general properties of DL-based approaches can also be observed when processing music, which spans an enormous range of forms and styles, not to mention the many ways music may be generated and represented. In music analysis and classification problems, one aims to capture musically relevant aspects related to melody, harmony, rhythm, or instrumentation, yet data-driven approaches often capture confounding factors that may not directly relate to the target concept. In contrast, one main advantage of classical model-based engineering approaches is that they result in explainable and explicit models that can be adjusted intuitively. On the downside, such hand-engineered approaches require substantial signal processing expertise and domain knowledge, and they may result in highly specialized solutions that cannot be directly transferred to other problems.

In this Dagstuhl Seminar, we will critically review the potential and limitations of recent deep learning techniques, using music as a challenging application domain. As one main objective of the seminar, we want to systematically explore how musical knowledge can be integrated into neural network architectures to obtain explainable models that are less vulnerable to data biases and confounding factors. Besides explainability and generalization, we will also discuss robustness and efficiency issues in both the learning and inference stages. To give the seminar cohesion, our main focus will be on music analysis tasks applied to audio representations (rather than symbolic music representations). However, related research problems in neighboring fields such as music generation and audio synthesis may also play a role.

More specific questions and issues that will be addressed in this seminar include, but are not limited to, the following:

  • Data mining, collection, and annotation
  • Data accessibility and copyright issues
  • Preprocessing of music data for deep learning
  • Musically informed data augmentation
  • Multitask learning
  • Transfer learning
  • Explainable deep learning models
  • Differentiable digital signal processing
  • Hierarchical models for short-term/long-term dependencies
  • Efficiency and robustness issues
  • Musical conditioning of deep learning models
  • Musically informed input representations
  • Structured output spaces
  • Integrating knowledge from music-perception and neuroscience research in deep learning systems
  • Human-in-the-loop systems for music processing

Motivation text license
  Creative Commons BY 4.0
  Rachel Bittner, Meinard Müller, and Juhan Nam

Classification

  • Information Retrieval
  • Machine Learning
  • Sound


Keywords

  • Music information retrieval
  • Audio signal processing
  • Deep learning
  • Knowledge representation
  • User interaction and interfaces

