February 28 – March 4, 2016, Dagstuhl Seminar 16092
Computational Music Structure Analysis
In this executive summary, we start with a short introduction to computational music structure analysis and then summarize the main topics and questions raised in this seminar. Furthermore, we briefly describe the background of the seminar's participants, the various activities, and the overall organization. Finally, we reflect on the most important aspects of this seminar and conclude with future implications and acknowledgments.
One of the attributes distinguishing music from other types of multimedia data and general sound sources is the presence of rich, intricate, and hierarchical structures that inherently organize notated and performed music. On the lowest level, one may have sound events such as individual notes, which are characterized by the way they sound, i.e., their timbre, pitch and duration. Such events form larger structures such as motives, phrases, and chords, and these elements again form larger constructs that determine the overall layout of the composition. This higher structural level is specified in terms of musical parts and their mutual relations. The general goal of music structure analysis is to segment or decompose music into patterns or units that possess some semantic relevance and then to group these units into musically meaningful categories.
While humans often have an intuitive understanding of musical patterns and their relations, it is generally hard to explicitly describe, quantify, and capture musical structures. Because of different organizing principles and the existence of temporal hierarchies, musical structures can be highly complex and ambiguous. First of all, a temporal segmentation of a musical work may be based on various properties such as homogeneity, repetition, and novelty. While the musical structure of one piece of music may be explained by repeating melodies, the structure in other pieces may be characterized by a certain instrumentation or tempo. Then, one has to account for different musical dimensions, such as melody, harmony, rhythm, or timbre. For example, in Beethoven's Fifth Symphony the "fate motive" is repeated in various ways -- sometimes the motive is shifted in pitch, sometimes only the rhythmic pattern is preserved. Furthermore, the segmentation and structure will depend on the musical context to be considered; in particular, the threshold of similarity may change depending on the timescale or hierarchical level of focus. For example, the recapitulation of a sonata may be considered a kind of repetition of the exposition on a coarse temporal level even though there may be significant modifications in melody and harmony. In addition, the complexity of the problem can depend on how the music is represented. For example, while it is often easy to detect certain structures such as repeating melodies in symbolic music data, it is often much harder to automatically identify such structures in audio representations. Finally, certain structures may emerge only in the aural communication of music. For example, grouping structures may be imposed by accent patterns introduced in performance. Hence, such structures are the result of a creative or cognitive process of the performer or listener rather than being an objective, measurable property of the underlying notes of the music.
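To make the remark about symbolic data concrete, the following is a minimal sketch, not a method discussed at the seminar, of how a repeated melody that has been shifted in pitch can be found in a note sequence. It assumes the melody is given as a list of MIDI pitch numbers; the function name `find_transposed_repeats` and the toy melody are hypothetical and chosen only for illustration. The idea is that the differences between successive pitches stay the same under transposition, so equal interval patterns indicate a transposed repetition.

```python
from collections import defaultdict


def find_transposed_repeats(pitches, length):
    """Return interval patterns spanning `length` notes that occur more than once.

    The result maps each repeated interval pattern to the list of note
    indices at which it starts. Occurrences may overlap.
    """
    if length < 2:
        raise ValueError("a pattern needs at least two notes")
    # Successive pitch differences are unchanged by transposition.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    n = length - 1  # a pattern of `length` notes spans `length - 1` intervals
    occurrences = defaultdict(list)
    for start in range(len(intervals) - n + 1):
        occurrences[tuple(intervals[start:start + n])].append(start)
    # Keep only the patterns that appear at least twice.
    return {p: s for p, s in occurrences.items() if len(s) > 1}


# Hypothetical toy melody (MIDI pitches): C D E C, then G A B G,
# i.e. the same four-note shape transposed up a fifth.
melody = [60, 62, 64, 60, 67, 69, 71, 67]
repeats = find_transposed_repeats(melody, 4)
print(repeats)
```

This exact-match approach only covers strict transposition. Cases such as those mentioned above, where only the rhythmic pattern is preserved or where the music is given as audio, need a similarity measure with a tolerance rather than exact equality.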
Main Topics and Questions
In this seminar, we brought together experts from diverse fields including psychology, music theory, composition, computer science, music technology, and engineering. Through the resulting interdisciplinary discussions, we aimed to better understand the structures that emerge in composition, performance, and listening, and how these structures interrelate. For example, while there are certain structures inherent in the note content of music, the perception and communication of structure are themselves also creative acts subject to interpretation. There may be some structures intended by the composer or improviser that are not fully communicated by symbolic descriptions such as musical score notation. The performer, if different from the composer, must then interpret structures from the score and decide on the prosodic means by which to convey them. When a listener then tries to make sense of the performed piece, that act of sense-making, of constructing structure and meaning from an auditory stream, is also a creative one. As a result, different people along this communication chain may come up with different solutions, depending on their experiences, their musical backgrounds, and their current thinking or mood.
Based on our discussions of various principles and aspects that are relevant for defining musical patterns and structures, the following questions were raised.
- In which way do these notions depend on the music style and tradition?
- How can one account for the relations within and across different hierarchical levels of structural patterns?
- How can long-term structures be built up from short-term patterns, and, vice versa, how can the knowledge of global structural information support the analysis of local events?
- How can information on rhythm, melody, harmony, timbre, or dynamics be fused within unifying structural models?
- How can the relevance of these aspects be measured?
- How do computational models need to be changed to account for human listeners?
By addressing such fundamental questions, we aimed for a better understanding of the principles and model assumptions on which current computational procedures are based, as well as the identification of the main challenges ahead.
Another important goal of this seminar was to discuss how computational structure analysis methods may open up novel ways for users to find and access music information in large, unstructured, and distributed multimedia collections. Computational music structure analysis is not just an end in itself; it forms the foundation for many music processing and retrieval applications. Computational methods for structuring and decomposing digitized artifacts into semantically meaningful units are of fundamental importance not only for music content but also for general multimedia content including speech, image, video, and geometric data. Decomposing a complex object into smaller units often constitutes the first step for simplifying subsequent processing and analysis tasks, for deriving compact object descriptions that can be efficiently stored and transmitted, and for opening up novel ways for users to access, search, navigate, and interact with the content. In the music context, many of the current commercially available services for music recommendation and playlist generation employ context-based methods, where textual information (e.g., tags, structured metadata, user access patterns) surrounding the music object is exploited. However, there are numerous data mining problems for which context-based analysis is insufficient, as it tends to be low on specifics and unevenly distributed across artists and styles. In such cases, one requires content-based methods, where the information is obtained directly from the analysis of audio signals, scores, and other representations of the music. In this context, the following questions were raised.
- How can one represent partial and complex similarity relations within and across music documents?
- What are suitable interfaces that allow users to browse, interact, adapt, and understand musical structures?
- How can musical structures be visualized?
- How can structural information help improve the organizing and indexing of music collections?
Participants, Interaction, Activities
In our seminar, we had 31 participants, who came from various locations around the world including North America (8 participants from the U.S.), Asia (2 participants from Japan), and Europe (21 participants from Austria, France, Germany, Netherlands, Portugal, Spain, United Kingdom). Many of the participants came to Dagstuhl for the first time and expressed enthusiasm about the open and retreat-like atmosphere. Besides its international character, the seminar was also highly interdisciplinary. While most of the participating researchers work in the field of music information retrieval, we also had participants with a background in musicology, cognition, psychology, signal processing, and other fields. This led to many cross-disciplinary intersections and provoking discussions as well as numerous social activities including playing music together. One particular highlight of such social activities was a concert on Thursday evening, where various participant-based ensembles performed a wide variety of music including popular music, jazz, and classical music. Some of the performed pieces were original compositions by the seminar's participants.
Overall Organization and Schedule
Dagstuhl seminars are known for having a high degree of flexibility and interactivity, which allows participants to discuss ideas and to raise questions rather than to present research results. Following this tradition, we fixed the schedule during the seminar, asking for spontaneous contributions with future-oriented content, thus avoiding a conference-like atmosphere, where the focus tends to be on past research achievements. After the organizers had given an overview of the Dagstuhl concept and the seminar's overall topic, we started the first day with self-introductions, where all participants introduced themselves and expressed their expectations and wishes for the seminar. We then continued with a small number of ten-minute stimulus talks, where specific participants were asked to address some critical questions on music structure analysis in a nontechnical fashion. Each of these talks seamlessly moved towards an open discussion among all participants, where the respective presenters took over the role of a moderator. These discussions were well received and often lasted for more than half an hour. The first day closed with a brainstorming session on central topics covering the participants' interests while shaping the overall schedule and format of our seminar. During the next days, we split into small groups, each group discussing a more specific topic in greater depth. The results and conclusions of these parallel group sessions, which lasted between 60 and 90 minutes, were then presented to, and discussed with, the plenum. Furthermore, group discussions were interleaved with additional stimulus talks spontaneously given by participants. This mixture of presentation elements gave all participants the opportunity to present their ideas to the plenum while avoiding a monotonous conference-like presentation format.
Finally, on the last day, the seminar concluded with a session we called "self-outroductions" where each participant presented his or her personal view of the main research challenges and the seminar.
Conclusions and Acknowledgement
In organizing this Dagstuhl seminar, our aim was to gather researchers from different fields including information retrieval, signal processing, musicology, and psychology. This allowed us to approach the problem of music structure analysis by looking at a broad spectrum of data analysis techniques (including signal processing, machine learning, probabilistic models, user studies), by considering different domains (including text, symbolic, image, audio representations), and by drawing inspiration from creative perspectives of the agents (composer, performer, listener) involved. As a key result of this seminar, we achieved some significant progress towards understanding, modeling, representing, extracting, and exploiting musical structures. In particular, our seminar contributed to further closing the gap between music theory, cognition, and the computational sciences.
The Dagstuhl seminar gave us the opportunity to have interdisciplinary discussions in an inspiring and retreat-like atmosphere. The generation of novel, technically oriented scientific contributions was not the focus of the seminar. Naturally, many of the contributions and discussions were on a rather abstract level, laying the foundations for future projects and collaborations. Thus, the main impact of the seminar is likely to take place in the medium to long term. Some more immediate results, such as plans to share research data and software, also arose from the discussions. As measurable outputs from the seminar, we expect to see several joint papers and applications for funding.
Besides the scientific aspect, the social aspect of our seminar was just as important. We had an interdisciplinary, international, and very interactive group of researchers, consisting of leaders and future leaders in our field. Many of our participants were visiting Dagstuhl for the first time and enthusiastically praised the open and inspiring setting. The group dynamics were excellent with many personal exchanges and common activities. Some scientists expressed their appreciation for having the opportunity for prolonged discussions with researchers from neighboring research fields -- something that is often impossible during conference-like events.
In conclusion, our expectations of the seminar were not only met but exceeded, in particular with respect to networking and community building. We would like to express our gratitude to the Dagstuhl board for giving us the opportunity to organize this seminar, the Dagstuhl office for their exceptional support in the organization process, and the entire Dagstuhl staff for their excellent service during the seminar. In particular, we want to thank Susanne Bach-Bernhard, Roswitha Bardohl, Marc Herbstritt, and Sascha Daeges for their assistance during the preparation and organizing of the seminar.
Creative Commons BY 3.0 Unported license
Juan Pablo Bello, Elaine Chew, and Meinard Müller
Related Dagstuhl Seminar
- 11041: "Multimodal Music Processing" (2011)
- Data Bases / Information Retrieval
- Society / Human-computer Interaction
- Music information retrieval
- Music processing
- Music perception and cognition
- Music composition and performance
- Knowledge representation
- User interaction and interfaces