CLIN 29 in Groningen

Demo: Automatically Generating Papers of the Speech Group
Lyan Verwimp and Patrick Wambacq

The Speech Group at KU Leuven was founded more than 30 years ago. Over those years, its members have written many papers, master's theses and PhD theses. As a fun experiment to celebrate the 30th anniversary, we collected these texts, performed some minimal normalization, and trained an LSTM language model on them. The training data consist of 2.8M words, with a vocabulary of 100k words. We can now generate new papers from our 'Speech Group LM' by feeding it a seed word or sentence and then either picking the most probable word at every time step or sampling from the output distribution. The generated papers typically contain many popular Speech Group topics and well-known names, which demonstrates the generalization capabilities of LSTMs. Papers can be generated from the command line or through our web service, listed at https://www.spraak.org/webservice/
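
The two generation strategies mentioned above, picking the most probable word versus sampling from the output distribution, can be illustrated with a minimal sketch. It is not the Speech Group's implementation: the table `TOY_LM` is a hypothetical stand-in for the trained LSTM and conditions only on the previous word, and all function names are invented for illustration.

```python
import random

# Hypothetical stand-in for the trained LSTM (an assumption, not the real model):
# it maps the previous word to a probability distribution over next words.
# A real LSTM would condition on the whole history, not just the last word.
TOY_LM = {
    "speech": {"recognition": 0.6, "group": 0.3, "signal": 0.1},
    "recognition": {"system": 0.7, "speech": 0.3},
    "group": {"speech": 1.0},
    "signal": {"speech": 1.0},
    "system": {"speech": 1.0},
}


def next_word_probs(history):
    """Return the next-word distribution given the words generated so far."""
    return TOY_LM[history[-1]]


def pick_greedy(probs):
    """Greedy decoding: take the single most probable word."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Sampling: draw a word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(seed, steps, mode="greedy", rng=None):
    """Extend the seed by `steps` words using the chosen strategy."""
    rng = rng or random.Random()
    words = list(seed)
    for _ in range(steps):
        probs = next_word_probs(words)
        if mode == "greedy":
            words.append(pick_greedy(probs))
        else:
            words.append(pick_sampled(probs, rng))
    return words


if __name__ == "__main__":
    print(" ".join(generate(["speech"], 4, mode="greedy")))
    print(" ".join(generate(["speech"], 4, mode="sample", rng=random.Random(0))))
```

In this toy setting, greedy decoding is deterministic and quickly falls into a repeating loop, whereas sampling produces a different continuation for each random seed.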