Berton, Thomas [UCL]
Delvenne, Jean-Charles [UCL]
Saerens, Marco [UCL]
This thesis addressed the importance of the vocals, the instrumentals and the lyrics in Music Emotion Classification. More precisely, it investigated which of the three elements best predicts the emotion of a song; the combination of the three was analysed as well. Furthermore, we compared two machine learning models, namely Random Forest and Logistic Regression, to determine which performs better in the context of this work. To the best of our knowledge, this work is unprecedented.

The classification is based on state-of-the-art features for both audio and lyrics. The instrumentals are represented by a mix of high- and low-level features, whereas the vocals are described only by spectrum-based features. For the lyrics, various preprocessing approaches were tested, together with tf-idf scores. The conceptualization of the emotions is based on Hevner's adjective checklist, reduced to four emotions: calmness, sadness, happiness and anger.

Regarding the data, a ground-truth dataset of around 300 songs was created; this size is in line with the field, where databases rarely exceed 1000 songs. Each song is associated with its vocals, its instrumental and its lyrics. The separation of the voice from the instrumental was achieved with the audio software RX7, whose use is unprecedented in Music Emotion Recognition. Since datasets are a major bottleneck in MER and much time was dedicated to building this ground truth, we freely released the databases on which the experiments are based. They are available at the following url: https://github.com/thomasberton/musicemotionrecognition.

Experiments showed that the best classification accuracy is obtained from the instrumentals (75%), followed by the combination of the three elements (67%); the vocals reached 63%, whereas the lyrics reached 49%. Regarding the models, Random Forest outperformed Logistic Regression for the audio analyses, i.e. vocals and instrumentals, and for the multimodal analysis, but Logistic Regression achieved better accuracy on the lyrics. Logistic Regression also predicted happy songs better when it considered only the vocals. Both models were best at distinguishing songs of opposite arousal, i.e. calm songs versus angry songs, which can be explained by our predominant use of global features, known to induce this kind of effect.
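As a rough illustration of the kind of spectrum-based, global (song-level) features mentioned above, the following minimal Python sketch extracts a few common spectral descriptors with librosa and pools them into one fixed-length vector per track. The specific descriptors, parameters and mean/std pooling are illustrative assumptions, not the exact feature set used in the thesis.

    # Minimal sketch: global spectral features for one separated track.
    # The chosen descriptors and mean/std pooling are assumptions for
    # illustration, not the thesis's exact feature set.
    import numpy as np
    import librosa

    def global_spectral_features(path):
        y, sr = librosa.load(path, sr=22050, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbre
        centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # brightness
        rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)    # spectral shape
        frames = np.vstack([mfcc, centroid, rolloff])
        # Song-level statistics yield one fixed-length vector per track,
        # which is what "global features" refers to in the abstract.
        return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])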
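Similarly, a minimal sketch of the lyrical pipeline and the model comparison with scikit-learn is given below. The toy corpus and hyperparameters are placeholders standing in for the released dataset; only the four emotion classes come from the thesis.

    # Minimal sketch: tf-idf lyric features, Random Forest vs. Logistic Regression.
    # The corpus below is a toy placeholder; only the four emotion labels
    # (calm, sad, happy, angry) come from the thesis.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    lyrics = ["dancing all night under shining lights",
              "tears fall slowly in the empty room",
              "rage and fire burning through my veins",
              "quiet morning, gentle river flowing"] * 10
    labels = ["happy", "sad", "angry", "calm"] * 10

    X = TfidfVectorizer(stop_words="english").fit_transform(lyrics)

    # Same comparison as in the thesis: which model classifies better?
    for model in (RandomForestClassifier(n_estimators=200, random_state=0),
                  LogisticRegression(max_iter=1000)):
        scores = cross_val_score(model, X, labels, cv=5)
        print(type(model).__name__, round(scores.mean(), 3))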


Bibliographic reference: Berton, Thomas. Importance of instrumentals, vocals and lyrics in music emotion classification: based on random forest and logistic regression. Ecole polytechnique de Louvain, Université catholique de Louvain, 2019. Prom.: Delvenne, Jean-Charles; Saerens, Marco.
Permanent URL: http://hdl.handle.net/2078.1/thesis:21981