LV-43: SCHMUBERT: A SYMBOLIC CREATIVE HARMONIC MUSIC UNMASKING BIDIRECTIONAL ENCODER REPRESENTATION TRANSFORMER
Matthias Plasser, Silvan Peter
Abstract:
Denoising Diffusion Probabilistic Models (DDPMs) have
shown great success in generating high-quality samples in
both discrete and continuous domains. However, Discrete
Denoising Diffusion Probabilistic Models (D3PMs) have not
yet been shown to be directly applicable to the domain of
symbolic music. In this work we present the direct
generation of polyphonic symbolic music using D3PMs.
Our model not only exhibits
state-of-the-art sample quality, but also allows for
various conditioning methods at sampling time without
extra training. As the model is trained to reconstruct
randomly masked-out tokens, conditioning on an existing
piece of symbolic music is possible. Such conditioning
scenarios include, but are not limited to, accompaniment
(one track is provided, accompaniment tracks are masked
out) and infilling/completion (one or multiple tracks
with temporal gaps are provided). We provide our
implementation, trained model weights, and some selected
samples at https://github.com/plassma/symbolic-music-discrete-diffusion.
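
The two conditioning scenarios named above can be illustrated with a minimal sketch. It assumes a piece is stored as a list of tracks, each a list of integer tokens per time step, and that masked positions carry a sentinel value. The names `MASK`, `accompaniment_mask`, `infilling_mask`, and `fill_masked` are hypothetical and are not taken from the linked repository; the `proposal` argument stands in for whatever tokens a trained model would predict.

```python
MASK = -1  # hypothetical sentinel marking a masked-out token


def accompaniment_mask(piece, given_track):
    """Keep one track as conditioning and mask every other track entirely."""
    return [
        list(track) if i == given_track else [MASK] * len(track)
        for i, track in enumerate(piece)
    ]


def infilling_mask(piece, gap_start, gap_end):
    """Mask the time steps in [gap_start, gap_end) on every track."""
    return [
        [MASK if gap_start <= t < gap_end else tok for t, tok in enumerate(track)]
        for track in piece
    ]


def fill_masked(conditioned, proposal):
    """Accept proposed tokens only at masked positions.

    Tokens that were provided as conditioning are left unchanged.
    """
    return [
        [p if c == MASK else c for c, p in zip(c_track, p_track)]
        for c_track, p_track in zip(conditioned, proposal)
    ]


# Toy two-track piece with four time steps per track (values are arbitrary).
piece = [[60, 62, 64, 65], [48, 48, 43, 43]]

accompaniment_input = accompaniment_mask(piece, given_track=0)
infilling_input = infilling_mask(piece, gap_start=1, gap_end=3)

# Stand-in for model output: a constant proposal for every position.
proposal = [[1, 1, 1, 1], [2, 2, 2, 2]]
completed = fill_masked(infilling_input, proposal)
```

In this sketch the conditioning is purely a matter of which positions start out masked, which is consistent with the claim that no extra training is needed; how the actual model iteratively unmasks tokens is not specified here.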