P4-05: Learning Unsupervised Hierarchies of Audio Concepts
Afchar, Darius*, Hennequin, Romain, Guigue, Vincent
Subjects (starting with primary): MIR fundamentals and methodology -> music signal processing ; Domain knowledge -> representations of music ; Musical features and properties -> musical style and genre ; MIR tasks -> automatic classification ; Domain knowledge -> machine learning/artificial intelligence for music ; Musical features and properties -> musical affect, emotion and mood
Presented In-person, in Bengaluru: 4-minute short-format presentation
Music signals are difficult to interpret from their low-level features, perhaps even more so than images: for example, highlighting part of a spectrogram or an image is often insufficient to convey high-level ideas that are genuinely relevant to humans. In computer vision, concept learning was proposed to adjust explanations to the right abstraction level (e.g. detecting clinical concepts from radiographs). These methods have yet to be used for MIR.
In this paper, we adapt concept learning to the realm of music and its particularities. For instance, music concepts are typically non-independent and of mixed nature (e.g. genre, instruments, mood), whereas previous work assumed disentangled concepts.
We propose a method to learn numerous music concepts from audio and then automatically hierarchise them to expose their mutual relationships. We conduct experiments on datasets of playlists from a music streaming service, which provide a few annotated examples for diverse concepts. Evaluations show that the mined hierarchies are aligned both with ground-truth hierarchies of concepts -- when available -- and with proxy sources of concept similarity in the general case.
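
To make the idea of hierarchising learned concepts concrete, here is a minimal sketch. The abstract does not specify how the hierarchy is built, so this is not the authors' method: it assumes each concept has already been reduced to the set of tracks on which it is detected, and it swaps in a simple subsumption heuristic (a concept becomes the child of the smallest larger concept that covers most of its tracks). The names `inclusion` and `build_hierarchy`, the `threshold` value, and the toy data are all hypothetical.

```python
from typing import Dict, Optional, Set


def inclusion(child: Set[str], parent: Set[str]) -> float:
    """Fraction of the child's tracks on which the parent is also detected."""
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def build_hierarchy(
    activations: Dict[str, Set[str]], threshold: float = 0.8
) -> Dict[str, Optional[str]]:
    """Map each concept to its most specific plausible parent, or None.

    Assumption: a subsumption heuristic stands in for whatever
    hierarchisation step the paper actually uses.
    """
    parents: Dict[str, Optional[str]] = {}
    for name, tracks in activations.items():
        # A candidate parent must be strictly broader and cover most of the child.
        candidates = [
            other
            for other, other_tracks in activations.items()
            if other != name
            and len(other_tracks) > len(tracks)
            and inclusion(tracks, other_tracks) >= threshold
        ]
        # Pick the smallest enclosing concept so that chains stay as deep as possible.
        parents[name] = min(
            candidates, key=lambda c: len(activations[c]), default=None
        )
    return parents


# Toy, made-up activations: concept -> tracks on which it was detected.
toy = {
    "rock": {"t1", "t2", "t3", "t4", "t5", "t6"},
    "metal": {"t1", "t2", "t3"},
    "death metal": {"t1", "t2"},
    "jazz": {"t7", "t8"},
}
hierarchy = build_hierarchy(toy)
```

On this toy input, `death metal` is attached under `metal`, `metal` under `rock`, and `rock` and `jazz` remain roots. Because the inclusion score is asymmetric, the heuristic tolerates the non-independent concepts the abstract describes, but it is only one of many possible ways to derive a hierarchy.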