A Speech Emotion Recognition Approach Using Discrete Wavelet Transform and Deep Learning Techniques in a Brazilian Portuguese Corpus
The study of emotion encompasses the human mind's cognitive processes and psychological states. With the rapid advancement and declining cost of technology, researchers have increasingly focused on capturing voice, gestures, facial expressions, and other expressions of emotion. In this study, we combine a Deep Learning model with the Wavelet Transform technique for the task of Speech Emotion Recognition, which aims to detect and identify emotions in speech, here the informal and spontaneous speech of a Brazilian Portuguese corpus. We achieve a macro F1-score of 0.566 and a ROC-AUC score of 0.7217 on the CORAA database, surpassing by up to 11% macro F1 the results of another work presented at the International Conference on Computational Processing of Portuguese Language 2022, which uses the same architecture together with transfer learning techniques. Our methodology integrates a deep learning model with advanced signal processing techniques. Specifically, we leverage a pre-trained large-scale neural network architecture tailored for audio analysis, incorporating Discrete Wavelet Transform and Mel Spectrogram features to enhance the model's performance. Additionally, we apply the SpecAugment technique for data augmentation. Among the works presented at the event, our approach ranks second overall and first among methods that do not use open-set techniques, such as training on other datasets or applying transfer learning, and it is one of the few that exceeded the proposed baselines.
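As a rough illustration of two of the signal processing steps named above, the sketch below computes a single-level Discrete Wavelet Transform and applies SpecAugment-style frequency and time masking to a spectrogram-shaped array. It is not the pipeline used in this work: the Haar wavelet, the single decomposition level, the mask widths, and the function names `haar_dwt` and `spec_augment` are all assumptions chosen for brevity, and the time-warping step of SpecAugment is omitted.

```python
import numpy as np

def haar_dwt(signal):
    """Single-level Haar DWT (assumed wavelet): returns (approximation, detail)."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        # Pad odd-length input by repeating the last sample.
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass branch
    detail = (even - odd) / np.sqrt(2.0)  # high-pass branch
    return approx, detail

def spec_augment(spec, max_freq_mask=8, max_time_mask=16, rng=None):
    """Zero one random frequency band and one random time band.

    The mask widths are hypothetical defaults, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the input array untouched
    n_freq, n_time = out.shape
    # Frequency mask: width f drawn from [0, max_freq_mask], start f0 chosen to fit.
    f = int(rng.integers(0, min(max_freq_mask, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    # Time mask: width t drawn from [0, max_time_mask], start t0 chosen to fit.
    t = int(rng.integers(0, min(max_time_mask, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out
```

In a setup like the one described, the wavelet coefficients and a Mel Spectrogram would be computed per utterance and the masking applied only to training examples; how the two feature types are combined before being passed to the network is not stated here and is left open in this sketch.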