Abstract:

Melody tracks deserve special attention in symbolic music information retrieval (MIR) because they contribute more to music perception than many other musical components. However, many existing symbolic MIR systems neglect melody track identification (MTI) and are less effective as a result. Existing MTI methods are also not robust: they perform poorly on MIDI files representing music of unusual genres, arrangements, or formats. To address this problem, we propose a CNN-Transformer-based MTI model designed to robustly identify a single melody track in a given MIDI file. Because identification can take a sizable amount of time for long songs, we also use a sparse Transformer to speed up attention computation. Our experiments show that the proposed model outperforms state-of-the-art (SOTA) algorithms in accuracy and can also benefit downstream MIR tasks.
