CLIN 29 in Groningen

A Neural Network Approach to Automatic Songwriting
Tim Van de Cruys

Combined song text generation and melody composition is a challenging task for a computational system. First, each modality must satisfy its own requirements in order to be meaningful: the song text needs to be syntactically well-formed and topically coherent, while the melody needs to adhere to certain musicological constraints. Second, song text and melody need to be properly aligned with one another, such that the rhythms of both modalities match. Finally, the tonality of the melody needs to be consistent with the mood or sentiment expressed in the song text. We present a recurrent neural encoder-decoder network for the joint generation of song text and melody. The encoder network consists of a melody component, trained on a corpus of existing songs, and a text component, trained on a large corpus of general texts. Their joint representation is passed to the decoder network, which is trained to generate both song text and melody according to rhythmic constraints and sentiment conditions. Previous work has either addressed text generation and music generation separately, or considered the task of generating a melody for a given text. To our knowledge, this is the first work that considers their joint generation.