Speaker: Manuel Otero

Abstract:

This master’s thesis addresses the analysis and recognition of emotions in speech, within the framework of the EmoSPeech 2024 challenge. The state of the art is surveyed, from traditional methods to current models such as Wav2Vec2 and Wav2Vec2-BERT. The experiments use the Spanish MEACorpus 2023 database, and the results are evaluated with the F1-score metric. Experiments are carried out with Wav2Vec2 models fine-tuned on Spanish and with Wav2Vec2-BERT, as well as with a fusion of Wav2Vec2 models trained on different languages for emotion recognition. A new system, called Whisper-for-emotions, is also developed, adapting Whisper (originally designed for ASR) to this task. After running and evaluating the experiments, the results obtained are analysed and conclusions are drawn. Finally, possible future lines of research to improve the recognition of emotions in speech are proposed.
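As context for the evaluation metric mentioned above: in multi-class emotion recognition the F1-score is typically macro-averaged, i.e. computed per emotion class and then averaged, so minority emotions count as much as frequent ones. A minimal sketch of that computation (the label names here are illustrative, not taken from MEACorpus 2023):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Toy example with three hypothetical emotion labels:
y_true = ["joy", "joy", "anger", "neutral", "anger", "neutral"]
y_pred = ["joy", "neutral", "anger", "neutral", "anger", "anger"]
print(round(macro_f1(y_true, y_pred), 4))  # → 0.6556
```

In practice this is equivalent to `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`.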