P5-12: EnsembleSet: a new high quality synthesised dataset for chamber ensemble separation

Sarkar, Saurjya*, Benetos, Emmanouil, Sandler, Mark

Subjects (starting with primary): Domain knowledge -> machine learning/artificial intelligence for music ; Evaluation, datasets, and reproducibility -> novel datasets and use cases ; MIR tasks -> sound source separation

Presented In-person, in Bengaluru: 4-minute short-format presentation


Music source separation research has made great advances in recent years, especially on the problem of separating vocals, drums, and bass stems from mastered songs. These advances can be directly attributed to the availability of large-scale multitrack research datasets for these stems. Tasks such as separating similar-sounding sources from an ensemble recording have seen limited research due to the lack of sizeable, bleed-free multitrack datasets. In this paper, we introduce a novel multitrack dataset called EnsembleSet, generated with the Spitfire BBC Symphony Orchestra library from ensemble scores in the RWC Classical Music Database and Mutopia. Our data generation method introduces automated articulation mapping for different playing styles based on the input MIDI/MusicXML data. The sample library also enables us to render the dataset with 20 different mix/microphone configurations, allowing us to study various recording scenarios for each performance. The dataset presents 80 tracks (6+ hours) with a range of string, wind, and brass instruments arranged as chamber ensembles. We also present a benchmark on our synthesised dataset using a permutation-invariant time-domain separation model for chamber ensembles, which produces generalisable results when tested on real recordings from existing datasets.
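
For readers unfamiliar with the term, a permutation-invariant objective scores the separated outputs against the reference stems under every possible output-to-source assignment and keeps the best one, so the model is not penalised for emitting similar-sounding sources in a different order. The sketch below illustrates the idea only; it is not the authors' implementation. The choice of scale-invariant SDR as the metric and the function names `si_sdr` and `pit_si_sdr` are assumptions made for illustration, since the abstract does not specify them.

```python
import itertools
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals (assumed metric)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    return 10 * np.log10((np.sum(projection ** 2) + eps) / (np.sum(noise ** 2) + eps))

def pit_si_sdr(estimates, targets):
    """Return (best mean SI-SDR, best permutation) over all source orderings.

    estimates, targets: arrays of shape (n_sources, n_samples).
    best_perm[i] is the index of the estimate assigned to target i.
    """
    n_sources = targets.shape[0]
    best_score, best_perm = -np.inf, None
    # Exhaustive search is fine for small chamber ensembles (n! assignments).
    for perm in itertools.permutations(range(n_sources)):
        score = np.mean([si_sdr(estimates[p], targets[i]) for i, p in enumerate(perm)])
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm
```

In training, the negative of the best score would typically serve as the loss; the exhaustive search grows factorially with the number of sources, which is manageable for duos to quartets.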
