AudioLoader: a hassle-free Pytorch audio dataset loader

Kin Wai Cheuk, Kwan Yee Heung, Dorien Herremans

LV-16: AudioLoader: a hassle-free Pytorch audio dataset loader

Abstract: AudioLoader is a PyTorch package that helps users automatically download, unzip, and preprocess (audio resampling, segmenting, data splitting) common audio datasets that are not yet available in the official torchaudio dataset collection. AudioLoader supports a wide range of datasets for different applications, such as speech recognition (Multilingual LibriSpeech (MLS), TIMIT, SpeechCommands v2 with 12 classes), automatic music transcription (MAPS, MusicNet, MAESTRO), and music source separation (MusdbHQ). Slakh2100 will also be included in a future release. AudioLoader is designed to be hassle-free: once called, it returns a PyTorch dataset that can be combined with torch.utils.data.DataLoader as usual. This design lets users and researchers spend less time on dataset preparation and more on research and model development. In this paper, we demonstrate the usage of AudioLoader with several datasets, using MLS, MAESTRO, and MusdbHQ as examples. AudioLoader is an ongoing open-source project available on GitHub. More datasets will be supported in the future, and contributions from the community are highly welcome.
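To illustrate the pattern the abstract describes (a dataset that plugs into torch.utils.data.DataLoader), here is a minimal sketch. It does not use AudioLoader itself: ToyAudioDataset, its parameters, and the synthetic waveforms are hypothetical stand-ins, chosen only to mimic the map-style dataset interface that DataLoader expects.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyAudioDataset(Dataset):
    """Hypothetical stand-in for a dataset returned by an audio loader.

    This class is NOT part of AudioLoader. It only mimics the map-style
    Dataset interface (__len__ and __getitem__) using synthetic waveforms.
    """

    def __init__(self, num_clips=8, sample_rate=16000, segment_seconds=1.0):
        self.num_clips = num_clips
        self.segment_length = int(sample_rate * segment_seconds)
        generator = torch.Generator().manual_seed(0)
        # Synthetic "recordings", each twice as long as one segment.
        self.clips = torch.randn(
            num_clips, 2 * self.segment_length, generator=generator
        )

    def __len__(self):
        return self.num_clips

    def __getitem__(self, index):
        # Segmenting step: keep the first fixed-length segment of the clip.
        waveform = self.clips[index, : self.segment_length]
        label = index % 2  # dummy label, for illustration only
        return {"waveform": waveform, "label": label}


dataset = ToyAudioDataset()
# The dataset is passed to DataLoader like any other PyTorch dataset.
loader = DataLoader(dataset, batch_size=4, shuffle=False)

batch = next(iter(loader))
print(batch["waveform"].shape)  # torch.Size([4, 16000])
print(batch["label"])           # tensor([0, 1, 0, 1])
```

Because DataLoader depends only on the dataset interface, batching, shuffling, and multi-worker loading behave the same way for any dataset object that follows this protocol.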