https://www.dagstuhl.de/22082
February 20 – 25, 2022, Dagstuhl Seminar 22082
Deep Learning and Knowledge Integration for Music Audio Analysis
Organizers
Rachel Bittner (Spotify – Paris, FR)
Meinard Müller (Universität Erlangen-Nürnberg, DE)
Juhan Nam (KAIST – Daejeon, KR)
Press Room
- Week 71: The Dagstuhl Dawgs (folk-rnn v Swedish + Sturm) - Blog entry by Bob L. T. Sturm on Tunes from the Ai Frontiers.
Motivation
Given the increasing amount of digital music, the development of computational tools that allow users to find, organize, analyze, and interact with music has become central to the research field known as Music Information Retrieval (MIR). As in general multimedia processing, many of the recent advances in MIR have been driven by techniques based on deep learning (DL). For example, DL-based techniques have led to significant improvements for numerous MIR tasks, including music source separation, music transcription, chord recognition, melody estimation, beat tracking, tempo estimation, and lyric alignment. In particular, significant improvements could be achieved for specific music scenarios where sufficient training data is available. A particular strength of DL-based approaches is their ability to extract complex features directly from raw audio data, which can then be used to make predictions based on hidden structures and relations.
However, DL-based approaches also come at a cost: they are data-hungry and computationally intensive. The design of suitable network architectures can be cumbersome, and the behavior of DL-based systems is often hard to understand. These general properties of DL-based approaches can also be observed when processing music, which spans an enormous range of forms and styles, not to mention the many ways music may be generated and represented. In music analysis and classification problems, one aims to capture musically relevant aspects related to melody, harmony, rhythm, or instrumentation, yet data-driven approaches often capture confounding factors that may not directly relate to the target concept. One main advantage of classical model-based engineering approaches is that they result in explainable and explicit models that can be adjusted intuitively. On the downside, such hand-engineered approaches require profound signal processing skills and domain knowledge and may result in highly specialized solutions that cannot be directly transferred to other problems.
In this Dagstuhl Seminar, we will critically review the potential and limitations of recent deep learning techniques using music as a challenging application domain. As one main objective of the seminar, we want to systematically explore how musical knowledge can be integrated into neural network architectures to obtain explainable models that are less vulnerable to data biases and confounding factors. Furthermore, besides explainability and generalization aspects, we will also discuss robustness and efficiency issues in both the learning and inference stages. To give the seminar cohesion, our main focus will be on music analysis tasks applied to audio representations (rather than symbolic music representations). However, related research problems in neighboring fields such as music generation and audio synthesis may also play a role.
More specific questions and issues that will be addressed in this seminar include, but are not limited to, the following:
- Data mining, collection, and annotation
- Data accessibility and copyright issues
- Preprocessing of music data for deep learning
- Musically informed data augmentation
- Multitask learning
- Transfer learning
- Explainable deep learning models
- Differentiable digital signal processing
- Hierarchical models for short-term/long-term dependencies
- Efficiency and robustness issues
- Musical conditioning of deep learning models
- Musically informed input representations
- Structured output spaces
- Integrating knowledge from music-perception and neuroscience research in deep learning systems
- Human-in-the-loop systems for music processing
Motivation text license Creative Commons BY 4.0
Rachel Bittner, Meinard Müller, and Juhan Nam
Dagstuhl Seminar Series
- 19052: "Computational Methods for Melody and Voice Processing in Music Recordings" (2019)
- 16092: "Computational Music Structure Analysis" (2016)
- 11041: "Multimodal Music Processing" (2011)
Classification
- Information Retrieval
- Machine Learning
- Sound
Keywords
- Music information retrieval
- Audio signal processing
- Deep learning
- Knowledge representation
- User interaction and interfaces