Multi-pitch Estimation meets Microphone Mismatch: Applicability of Domain Adaptation

Bittner, Franca*; Gonzalez, Marcel; Richter, Maike L; Lukashevich, Hanna; Abeßer, Jakob

P4-11: Multi-pitch Estimation meets Microphone Mismatch: Applicability of Domain Adaptation

Subjects (starting with primary): MIR fundamentals and methodology -> music signal processing ; Evaluation, datasets, and reproducibility -> MIR tasks ; Domain knowledge -> machine learning/artificial intelligence for music ; Evaluation, datasets, and reproducibility -> novel datasets and use cases ; Applications -> music training and education ; MIR tasks -> music transcription and annotation

Presented In-person, in Bengaluru: 4-minute short-format presentation

Abstract:

The performance of machine learning (ML) models is known to be affected by discrepancies between training (source) and real-world (target) data distributions. This problem is referred to as domain shift and is commonly approached using domain adaptation (DA) methods. As one relevant scenario, automatic piano transcription algorithms in music learning applications potentially suffer from domain shift since pianos are recorded in different acoustic conditions using various devices. Yet, most currently available datasets for piano transcription only cover ideal recording situations with high-quality microphones. Consequently, a transcription model trained on these datasets will face a mismatch between source and target data in real-world scenarios. To address this issue, we employ a recently proposed dataset which includes annotated piano recordings covering typical real-life recording settings for a piano learning application on mobile devices. We first quantify the influence of the domain shift on the performance of a deep learning-based piano multi-pitch estimation (MPE) algorithm. Then, we employ and evaluate four unsupervised DA methods to reduce domain shift. Our results show that the studied MPE model is surprisingly robust to domain shift in microphone mismatch scenarios and the DA methods do not notably improve the transcription performance.
