Comparative Wavelet and MFCC Speech Emotion Recognition Experiments on the RAVDESS Dataset
DOI: https://doi.org/10.17762/msea.v71i3.468

Abstract
Emotion recognition (ER) from speech is an active research area. Its central challenge is speech feature extraction: finding a representation that efficiently captures speaker-independent emotional information in the speech signal. This paper compares the windowed Fourier transform, Mel-frequency cepstral coefficients (MFCCs), and continuous and discrete wavelet transforms in terms of fixed versus variable time-frequency localization, using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The wavelet transform has proven to be a promising tool for analysing non-stationary signals and has been applied successfully to image recognition, compression, and other tasks, while MFCCs have long been the standard feature set for speech. The aim is to compare the two approaches using a Random Forest classifier with similar hyperparameters.
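To make the contrast between fixed and variable time-frequency localization concrete, the following is a minimal NumPy sketch, not the authors' pipeline. It computes MFCCs from a fixed 512-sample Hamming window (constant resolution at every frequency) and subband statistics from a multi-level Haar discrete wavelet transform (dyadic tiling: finer time resolution at high frequencies, finer frequency resolution at low ones), then pools each into one fixed-length vector per utterance. The 16 kHz sample rate, window and hop sizes, 26 mel filters, 13 coefficients, Haar wavelet, five decomposition levels, and mean/std pooling are all assumptions; the abstract does not state the paper's settings or wavelet family, and the continuous wavelet transform is not shown. The helper names (`frame_signal`, `mfcc`, `haar_dwt`, `wavelet_features`, `utterance_features`) are hypothetical.

```python
import numpy as np

# Assumed settings for illustration only; not taken from the paper.
SR, N_FFT, HOP, N_MELS, N_MFCC, DWT_LEVELS = 16000, 512, 256, 26, 13, 5


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    n_frames = 1 + max(0, int(np.ceil((len(x) - frame_len) / hop)))
    pad = (n_frames - 1) * hop + frame_len - len(x)
    x = np.pad(x, (0, max(0, pad)))
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced uniformly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                 # rising edge
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling edge
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(x, sr, n_mfcc=N_MFCC, n_mels=N_MELS, n_fft=N_FFT, hop=HOP):
    """MFCCs from a fixed-length window: constant time-frequency tiling."""
    frames = frame_signal(x, n_fft, hop) * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    return dct_ii(np.log(mel_energy + 1e-10), n_mfcc)  # (frames, n_mfcc)


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT returning [cD1, ..., cDL, cAL].

    Each level halves the time resolution and the analysed band, which is
    the variable (dyadic) localization of the wavelet transform.
    """
    approx = np.asarray(x, dtype=float)
    coeffs = []
    for _ in range(levels):
        if len(approx) % 2:                           # pad odd lengths by one zero
            approx = np.append(approx, 0.0)
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2.0))    # detail (high-pass) band
        approx = (even + odd) / np.sqrt(2.0)          # approximation (low-pass)
    coeffs.append(approx)
    return coeffs


def wavelet_features(x, levels=DWT_LEVELS):
    """Log-energy and standard deviation of each Haar DWT subband."""
    feats = []
    for c in haar_dwt(x, levels):
        feats += [np.log(np.sum(c ** 2) + 1e-10), np.std(c)]
    return np.array(feats)


def utterance_features(x, sr=SR):
    """Pool each representation into one fixed-length vector per utterance."""
    m = mfcc(x, sr)
    mfcc_vec = np.concatenate([m.mean(axis=0), m.std(axis=0)])
    return mfcc_vec, wavelet_features(x)


if __name__ == "__main__":
    t = np.arange(SR) / SR
    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(SR)  # synthetic 1 s tone
    mfcc_vec, wav_vec = utterance_features(x)
    print(mfcc_vec.shape, wav_vec.shape)  # (26,) (12,)
```

Because each feature set yields one fixed-length vector per utterance, both could be passed to the same classifier, for example scikit-learn's `RandomForestClassifier` with identical `n_estimators` and other hyperparameters, so that differences in accuracy reflect the features rather than the model. A real experiment would more likely rely on tested implementations such as librosa's `librosa.feature.mfcc` and PyWavelets' `pywt.wavedec`.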