Pop Music Generation with Controllable Phrase Lengths

Naruse, Daiki*; Takahata, Tomoyuki; Mukuta, Yusuke; Harada, Tatsuya

P1-14: Pop Music Generation with Controllable Phrase Lengths

Subjects (starting with primary): MIR tasks -> music generation ; Domain knowledge -> representations of music ; Musical features and properties -> representations of music ; Musical features and properties -> structure, segmentation, and form ; Domain knowledge -> machine learning/artificial intelligence for music

Presented Virtually: 4-minute short-format presentation

Abstract:

Research on music generation using deep learning has attracted growing attention; in particular, Transformer-based models have succeeded in generating coherent musical pieces. Recently, an increasing number of studies have focused on phrases, which are smaller musical units, and several have addressed phrase-level control. In this study, we propose a method for sequentially generating a piece that enables control of the length of each phrase and, consequently, of the entire piece. To existing event-based music representations, we added PHRASE and a new event, BAR COUNTDOWN, which indicates the number of bars remaining in the phrase. To reflect user input specifying the phrase lengths of the piece being generated, we used an autoregressive generation model: these two events are added to the generated event-token sequence according to the user input, and the resulting sequence is used as input for the next time step. Subjective listening tests showed that the pieces generated by our method had the designated phrase lengths and ended naturally at the specified length.
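The generation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token spellings (BAR, PHRASE, BAR_COUNTDOWN_k, EOS), their order within a bar, the convention that the countdown includes the current bar, and the helper names generate_with_phrase_lengths and placeholder_sample_bar are all assumptions made for illustration. The placeholder stands in for a trained Transformer that would sample the note events of one bar conditioned on the sequence so far.

```python
from typing import Callable, List


def placeholder_sample_bar(context: List[str]) -> List[str]:
    """Hypothetical stand-in for a trained model.

    A real system would sample the note events of one bar conditioned on
    `context`; here a single fixed token is returned.
    """
    return ["NOTE_60"]


def generate_with_phrase_lengths(
    phrase_lengths: List[int],
    sample_bar: Callable[[List[str]], List[str]],
) -> List[str]:
    """Build an event-token sequence whose phrases have the requested lengths.

    The control tokens are inserted from the user input, not sampled, and the
    growing sequence is passed back to the model for the next bar.
    """
    tokens: List[str] = []
    for length in phrase_lengths:
        # Assumed convention: the countdown includes the current bar,
        # so it runs from `length` down to 1.
        for bars_remaining in range(length, 0, -1):
            tokens.append("BAR")
            if bars_remaining == length:
                tokens.append("PHRASE")  # marks the first bar of a phrase
            tokens.append(f"BAR_COUNTDOWN_{bars_remaining}")
            # The model continues from everything generated so far.
            tokens.extend(sample_bar(tokens))
    tokens.append("EOS")  # the piece ends once every phrase is complete
    return tokens


# Request a piece made of a 2-bar phrase followed by a 1-bar phrase.
sequence = generate_with_phrase_lengths([2, 1], placeholder_sample_bar)
```

Because the control events come from the user input, the total number of bars equals the sum of the requested phrase lengths regardless of what the model samples within each bar.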
