February 20 – 25, 2022, Dagstuhl Seminar 22082

Deep Learning and Knowledge Integration for Music Audio Analysis


Rachel Bittner (Spotify – Paris, FR)
Meinard Müller (Universität Erlangen-Nürnberg, DE)
Juhan Nam (KAIST – Daejeon, KR)

For information about this Dagstuhl Seminar, please contact

Simone Schilke for administrative matters

Michael Gerke for scientific matters




Given the increasing amount of digital music, the development of computational tools that allow users to find, organize, analyze, and interact with music has become central to the research field known as Music Information Retrieval (MIR). As in general multimedia processing, many of the recent advances in MIR have been driven by techniques based on deep learning (DL). For example, DL-based techniques have led to significant improvements for numerous MIR tasks, including music source separation, music transcription, chord recognition, melody estimation, beat tracking, tempo estimation, and lyric alignment. In particular, significant improvements have been achieved for specific music scenarios where sufficient training data is available. A particular strength of DL-based approaches is their ability to extract complex features directly from raw audio data, which can then be used to make predictions based on hidden structures and relations.

However, DL-based approaches also come at a cost: they are data-hungry and computationally intensive. The design of suitable network architectures can be cumbersome, and the behavior of DL-based systems is often hard to understand. These general properties of DL-based approaches can also be observed when processing music, which spans an enormous range of forms and styles, not to mention the many ways music may be generated and represented. In music analysis and classification problems, one aims to capture musically relevant aspects related to melody, harmony, rhythm, or instrumentation, yet data-driven approaches often capture confounding factors that may not directly relate to the target concept. One main advantage of classical model-based engineering approaches is that they result in explainable and explicit models that can be adjusted intuitively. On the downside, such hand-engineered approaches require profound signal processing skills and domain knowledge and may result in highly specialized solutions that cannot be directly transferred to other problems.

In this Dagstuhl Seminar, we will critically review the potential and limitations of recent deep learning techniques using music as a challenging application domain. As one main objective of the seminar, we want to systematically explore how musical knowledge can be integrated into neural network architectures to obtain explainable models that are less vulnerable to data biases and confounding factors. Furthermore, besides explainability and generalization aspects, we will also discuss robustness and efficiency issues in both the learning and inference stages. To give the seminar cohesion, our main focus will be on music analysis tasks applied to audio representations (rather than symbolic music representations). However, related research problems in neighboring fields such as music generation and audio synthesis may also play a role.

More specific questions and issues that will be addressed in this seminar include, but are not limited to, the following:

  • Data mining, collection, and annotation
  • Data accessibility and copyright issues
  • Preprocessing of music data for deep learning
  • Musically informed data augmentation
  • Multitask learning
  • Transfer learning
  • Explainable deep learning models
  • Differentiable digital signal processing
  • Hierarchical models for short-term/long-term dependencies
  • Efficiency and robustness issues
  • Musical conditioning of deep learning models
  • Musically informed input representations
  • Structured output spaces
  • Integrating knowledge from music-perception and neuroscience research in deep learning systems
  • Human-in-the-loop systems for music processing

Motivation text license
  Creative Commons BY 4.0
  Rachel Bittner, Meinard Müller, and Juhan Nam



  • Information Retrieval
  • Machine Learning
  • Sound


  • Music information retrieval
  • Audio signal processing
  • Deep learning
  • Knowledge representation
  • User interaction and interfaces


All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.



Dagstuhl's Impact

Please let us know if a publication results from your seminar. Such publications are listed separately under Dagstuhl's Impact and displayed on the ground floor of the library.


It remains possible to publish a comprehensive collection of peer-reviewed papers in the Dagstuhl Follow-Ups series.