P1-16: Modeling the Rhythm from Lyrics for Melody Generation of Pop Songs

Zhang, Daiyu*, Wang, Ju-Chiang, Kosta, Katerina, Smith, Jordan B. L., Zhou, Shicen

Subjects (starting with primary): MIR fundamentals and methodology -> multimodality; MIR tasks -> music generation

Presented in person in Bengaluru: 4-minute short-format presentation

Abstract:

Creating a pop song melody from pre-written lyrics is a typical practice for composers. A computational model of how lyrics are set to melodies is important for automatic composition systems, but an end-to-end lyric-to-melody model would require enormous amounts of paired training data. To mitigate this data constraint, we adopt a two-stage approach that divides the task into lyric-to-rhythm and rhythm-to-melody modules. The lyric-to-rhythm task remains challenging, however, due to its multimodality. In this paper, we propose a novel lyric-to-rhythm framework that incorporates part-of-speech tags to achieve better text setting, along with a Transformer architecture designed to model long-term syllable-to-note associations. For the rhythm-to-melody task, we adapt a chord-conditioned melody Transformer that has achieved state-of-the-art results. Experiments on Chinese lyric-to-melody generation show that the proposed framework models key characteristics of the rhythm and pitch distributions in the dataset, and in a subjective evaluation, the melodies generated by our system were rated as similar to or better than those of a state-of-the-art alternative.
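To make the two-stage decomposition concrete, below is a minimal structural sketch in Python. It is not the authors' code: all names (SyllableToken, RhythmToken, LyricToRhythmModel, RhythmToMelodyModel, lyrics_to_melody) are hypothetical, and the placeholder generation logic merely stands in for the trained Transformer models described in the abstract.

```python
# Hypothetical sketch of the two-stage lyric-to-melody pipeline described
# in the abstract: stage 1 maps syllables (with POS tags) to a rhythm;
# stage 2 assigns pitches to that rhythm, conditioned on chords.
from dataclasses import dataclass
from typing import List


@dataclass
class SyllableToken:
    text: str      # one lyric syllable (one character, for Chinese lyrics)
    pos_tag: str   # part-of-speech tag of the word the syllable belongs to


@dataclass
class RhythmToken:
    onset: float     # note onset in beats
    duration: float  # note duration in beats


@dataclass
class Note:
    onset: float
    duration: float
    pitch: int  # MIDI pitch number


class LyricToRhythmModel:
    """Stage 1: syllables + POS tags -> note onsets and durations.
    A real implementation would be a Transformer trained on paired data."""

    def generate(self, syllables: List[SyllableToken]) -> List[RhythmToken]:
        # Placeholder: one quarter note per syllable on successive beats.
        return [RhythmToken(onset=float(i), duration=1.0)
                for i, _ in enumerate(syllables)]


class RhythmToMelodyModel:
    """Stage 2: rhythm + chord sequence -> pitched notes, in the spirit of
    a chord-conditioned melody Transformer."""

    def generate(self, rhythm: List[RhythmToken],
                 chords: List[str]) -> List[Note]:
        # Placeholder: a single held pitch; a real model samples pitches
        # conditioned on the chord active at each note's onset.
        return [Note(r.onset, r.duration, pitch=60) for r in rhythm]


def lyrics_to_melody(lyrics: str, pos_tags: List[str],
                     chords: List[str]) -> List[Note]:
    """Run the full pipeline: lyrics -> rhythm -> melody."""
    syllables = [SyllableToken(ch, tag) for ch, tag in zip(lyrics, pos_tags)]
    rhythm = LyricToRhythmModel().generate(syllables)
    return RhythmToMelodyModel().generate(rhythm, chords)


if __name__ == "__main__":
    # Hypothetical 4-syllable Chinese lyric with per-syllable POS tags.
    for note in lyrics_to_melody("你好世界", ["r", "a", "n", "n"], ["C", "G"]):
        print(note)
```

One motivation for this structure, as the abstract notes, is data efficiency: the intermediate rhythm representation lets each stage be trained or replaced independently rather than requiring one large paired lyric-to-melody corpus.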
