P2-12: Latent feature augmentation for chorus detection

Du, Xingjian*, Liang, Huidong, Wan, Yuan, Lin, Yuheng, Chen, Ke, Zhu, Bilei, Ma, Zejun

Subjects (starting with primary): MIR tasks -> music transcription and annotation

Presented Virtually: 4-minute short-format presentation

Abstract:

In this paper, we introduce LA-Chorus, a chorus detection model based on latent feature augmentation and a ResNet-FPN architecture. Our contributions are threefold. First, we propose a method for implicitly augmenting chorus data in the latent space during training. Compared with augmentations on the audio surface, such as time stretching and pitch shifting, latent augmentations represent higher-level changes to the original audio, which increases the diversity and sufficiency of the training data. Second, we apply a Feature Pyramid Network (FPN) to generate additional embeddings from low to high dimensions, yielding a multi-scale training paradigm. Finally, we release Di-Chorus, a new open-source dataset covering diverse genres and languages, for the music structure analysis community. Using it together with other public datasets, we conduct comprehensive experiments comparing LA-Chorus with other state-of-the-art models. The results show that LA-Chorus outperforms these models and confirm the effectiveness of the proposed latent feature augmentation.
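The abstract does not say how the latent augmentation is implemented. Purely as an illustration, the sketch below assumes one common way to augment in feature space: each encoder embedding is perturbed with Gaussian noise drawn from the covariance of its own class (for example, chorus vs. non-chorus), so augmented samples vary along directions that the class already varies in. The function names `class_covariances` and `augment_latent`, the `strength` parameter, and the choice of class-conditional Gaussian noise are all hypothetical assumptions, not the LA-Chorus method.

```python
import numpy as np


def class_covariances(features, labels, num_classes):
    """Estimate one covariance matrix per class from latent features.

    features: (N, D) array of embeddings (assumed to come from some encoder).
    labels:   (N,) integer class ids in [0, num_classes).
    Returns a (num_classes, D, D) array; classes with fewer than 2 samples get zeros.
    """
    dim = features.shape[1]
    covs = np.zeros((num_classes, dim, dim))
    for c in range(num_classes):
        class_feats = features[labels == c]
        if len(class_feats) > 1:
            covs[c] = np.cov(class_feats, rowvar=False)
    return covs


def augment_latent(features, labels, covs, strength=0.5, rng=None):
    """Hypothetical latent augmentation: z' = z + eps, eps ~ N(0, strength * Sigma_y).

    Noise is sampled per class so that every embedding is perturbed according to
    the spread of its own class in latent space. strength=0 returns the inputs.
    """
    rng = np.random.default_rng() if rng is None else rng
    dim = features.shape[1]
    augmented = np.empty_like(features, dtype=float)
    for c in np.unique(labels):
        idx = labels == c
        noise = rng.multivariate_normal(
            np.zeros(dim), strength * covs[c], size=int(idx.sum())
        )
        augmented[idx] = features[idx] + noise
    return augmented


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4))      # 8 toy embeddings of dimension 4
    labels = np.array([0, 1] * 4)        # 0 = non-chorus, 1 = chorus (toy labels)
    covs = class_covariances(feats, labels, num_classes=2)
    aug = augment_latent(feats, labels, covs, strength=0.5, rng=rng)
    print(aug.shape)  # (8, 4)
```

In a training loop, such augmented embeddings would typically be fed to the classification head alongside the originals with the same labels; this is cheaper than re-rendering audio because no waveform or spectrogram has to be recomputed.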
