Dagstuhl Seminar 26272
Open Music Data for Music Processing Research
(Jun 28 – Jul 03, 2026)
Organizers
- Magdalena Fuentes (NYU - Brooklyn, US)
- Dasaem Jeong (Sogang University - Seoul, KR)
- Meinard Müller (Universität Erlangen-Nürnberg, DE)
Contact
- Michael Gerke (for scientific matters)
- Christina Schwarz (for administrative matters)
Over the past decades, Music Information Retrieval (MIR) has developed into a multidisciplinary field connecting signal processing, machine learning, musicology, and the digital humanities. MIR engages with melody, harmony, rhythm, timbre, and cultural diversity across audio recordings, symbolic scores, lyrics, videos, and metadata. At its core, MIR is data-driven: progress depends on reliable, diverse, and representative datasets. Yet despite advances in artificial intelligence and deep learning, open and sustainable music data resources remain scarce, fragmented, and difficult to share.
This Dagstuhl Seminar addresses one of the central challenges in the field: how to build a more open, reliable, and inclusive ecosystem for music data. While computer vision and natural language processing benefit from large-scale benchmark datasets, MIR still faces persistent barriers. Existing datasets are often narrow in scope, focusing on Western or popular music while neglecting other traditions. Copyright restrictions, unstable hosting platforms, and inconsistent annotations further hinder accessibility, reproducibility, and sustainability. These issues not only slow progress but also reinforce inequalities, as groups with privileged data access gain advantages while newcomers and underrepresented communities are left behind.
The seminar aims to bring together researchers, developers, educators, and practitioners from MIR, machine learning, and the computational humanities. Key topics include:
- Complexity and Representation: Capturing the richness of music and aligning multimodal data such as audio, symbolic, and textual sources.
- Annotation and Bias: Developing reliable annotation practices, addressing subjectivity, and mitigating cultural and stylistic bias.
- Legal and Ethical Barriers: Navigating copyright and licensing while considering the roles of public domain, Creative Commons, and synthetic music data.
- Reproducibility and Sustainability: Building infrastructures, standards, and documentation practices for long-term usability.
- Community and Collaboration: Creating shared frameworks, open-source tools, and recognition mechanisms such as citation standards, dataset papers, and community awards that properly value dataset curation and foster inclusivity.
The seminar will emphasize discussion and collaboration over formal presentations. Plenary sessions, breakout groups, and hands-on demos will provide space to exchange perspectives, present tools and datasets, and explore solutions. Creative and social activities, including informal music-making, will strengthen community bonds and highlight the cultural dimensions of music research.
The seminar seeks to define practical steps toward more transparent, sustainable, and inclusive music data resources. By connecting expertise across disciplines, we hope to lay the groundwork for lasting resources that strengthen research and creativity in MIR and beyond.

Classification
- Databases
- Machine Learning
- Sound
Keywords
- music information retrieval
- audio signal processing
- deep learning
- open source
- user interaction and interfaces