LV-43: SCHMUBERT: A SYMBOLIC CREATIVE HARMONIC MUSIC UNMASKING BIDIRECTIONAL ENCODER REPRESENTATION TRANSFORMER

Matthias Plasser, Silvan Peter

Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have shown great success in generating high-quality samples in both discrete and continuous domains. However, Discrete Denoising Diffusion Probabilistic Models (D3PMs) have not yet been shown to be directly applicable to the domain of symbolic music. In this work we present the direct generation of polyphonic symbolic music using D3PMs. Our model not only exhibits state-of-the-art sample quality, but also allows for various conditioning methods at sampling time without extra training. Because the model is trained to reconstruct randomly masked-out tokens, it can be conditioned on an existing piece of symbolic music. Such conditioning scenarios include, but are not limited to, accompaniment (one track is provided, the accompaniment tracks are masked out) and infilling/completion (one or multiple tracks with temporal gaps are provided). We provide our implementation, trained model weights, and some selected samples at https://github.com/plassma/symbolic-music-discrete-diffusion.
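
The two conditioning scenarios named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the token grid layout (one list of tokens per track), the `MASK` id, the helper names, and the unmasking schedule are all assumptions made for illustration, and `dummy_denoiser` is a random stand-in for the trained model. The sketch only shows the general idea that provided tokens stay fixed while masked positions are filled in over several steps.

```python
import random

MASK = -1  # hypothetical id for the mask token (assumption)

def accompaniment_mask(tokens, given_track):
    """Accompaniment: keep one track, mask every other track completely."""
    return [row[:] if t == given_track else [MASK] * len(row)
            for t, row in enumerate(tokens)]

def infilling_mask(tokens, start, end):
    """Infilling: mask the time steps in [start, end) in every track."""
    return [[MASK if start <= i < end else tok for i, tok in enumerate(row)]
            for row in tokens]

def dummy_denoiser(grid, vocab_size, rng):
    """Stand-in for a trained model: proposes a token for every position."""
    return [[rng.randrange(vocab_size) for _ in row] for row in grid]

def sample_conditioned(grid, vocab_size, steps=4, seed=0):
    """Fill masked positions over `steps` iterations; given tokens never change."""
    rng = random.Random(seed)
    grid = [row[:] for row in grid]  # work on a copy
    masked = [(t, i) for t, row in enumerate(grid)
              for i, tok in enumerate(row) if tok == MASK]
    rng.shuffle(masked)  # random unmasking order (assumption)
    for step in range(steps):
        remaining_steps = steps - step
        # Unmask an equal share of the remaining positions (ceil division).
        n = -(-len(masked) // remaining_steps)
        proposal = dummy_denoiser(grid, vocab_size, rng)
        for _ in range(n):
            t, i = masked.pop()
            grid[t][i] = proposal[t][i]
    return grid

# Toy example: two tracks, four time steps each.
tokens = [[1, 2, 3, 4], [5, 6, 7, 8]]
accompanied = sample_conditioned(accompaniment_mask(tokens, 0), vocab_size=10)
infilled = sample_conditioned(infilling_mask(tokens, 1, 3), vocab_size=10)
```

In both calls the unmasked tokens act as the conditioning signal: track 0 of `accompanied` and the first and last time steps of `infilled` are returned unchanged, while every masked position receives a proposed token.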