P7-03: Music Representation Learning Based on Editorial Metadata from Discogs
Alonso-Jiménez, Pablo*, Serra, Xavier, Bogdanov, Dmitry
Subjects (starting with primary): MIR fundamentals and methodology -> metadata, tags, linked data, and semantic web ; MIR tasks -> similarity metrics ; Musical features and properties -> musical style and genre ; MIR tasks -> automatic classification ; Musical features and properties -> musical affect, emotion and mood ; Musical features and properties -> representations of music
Presented in person in Bengaluru: 4-minute short-format presentation
This paper revisits the idea of music representation learning supervised by editorial metadata, contributing to the state of the art in two ways. First, we exploit the public editorial metadata available on Discogs, an extensive community-maintained music database containing information about artists, releases, and record labels. Second, we use a contrastive learning setup based on COLA, in contrast to previous systems based on triplet loss. We train models targeting several associations derived from the metadata and experiment with stacked combinations of the learned representations, evaluating them on standard music classification tasks. Additionally, we consider learning all the associations jointly in a multi-task setup. We show that it is possible to improve the performance of current self-supervised models by using inexpensive metadata commonly available in music collections, producing representations comparable to those learned in classification setups. We find that the resulting representations based on editorial metadata outperform a system trained with the music style tags available in the same large-scale dataset, which motivates further research using this type of supervision. Finally, we give insights into how to preprocess Discogs metadata to build training objectives and provide publicly available pre-trained models.
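For readers unfamiliar with COLA-style training, the following minimal PyTorch sketch illustrates the kind of objective the abstract refers to: two tracks that share a metadata-derived association (e.g., the same artist, release, or record label) form a positive pair, and all other in-batch examples serve as negatives under a bilinear similarity with multi-class cross-entropy, as in COLA. The function names, tensor shapes, and sampling strategy here are illustrative assumptions; the paper's actual encoder and pairing details are not specified in this abstract.

```python
import torch
import torch.nn.functional as F

def cola_metadata_loss(anchors: torch.Tensor,
                       positives: torch.Tensor,
                       w: torch.Tensor) -> torch.Tensor:
    """COLA-style contrastive loss with metadata-derived positives.

    anchors, positives: (batch, dim) embeddings of track pairs that
        share an editorial-metadata association (hypothetical pairing).
    w: (dim, dim) learnable matrix defining the bilinear similarity.
    """
    # Bilinear similarity between every anchor and every candidate.
    logits = anchors @ w @ positives.t()  # (batch, batch)
    # Each anchor's true positive lies on the diagonal; all other
    # in-batch examples act as negatives (multi-class cross-entropy).
    labels = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, labels)

# Illustrative usage with random embeddings standing in for the
# outputs of an audio encoder applied to metadata-paired tracks.
batch, dim = 32, 128
w = torch.nn.Parameter(torch.eye(dim))
loss = cola_metadata_loss(torch.randn(batch, dim),
                          torch.randn(batch, dim), w)
```

Under this formulation, a separate model per association (artist, release, label) or a single multi-task model sharing the encoder across associations are both natural variants, matching the stacked and joint setups the abstract describes.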