Comparative Wavelet and MFCC Speech Emotion Recognition Experiments on the RAVDESS Dataset

Authors

  • Aayush Bajaj, Abhishek Jha, Lakshay Vashisth, Dr. K.C. Tripathi

DOI:

https://doi.org/10.17762/msea.v71i3.468

Abstract

Emotion recognition (ER) from speech is an active research area. The central challenge in ER is choosing a speech feature extraction method that efficiently captures speaker-independent emotional information from speech signals. This paper compares the performance of the windowed Fourier transform, Mel-frequency cepstral coefficients (MFCCs), and continuous/discrete wavelet transforms on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), from the perspective of constant versus variable time-frequency localization. The wavelet transform has proven to be a promising non-linear tool for signal analysis and has been successfully applied in image recognition, compression, and other tasks. MFCCs have been a standard in feature extraction for speech. The aim here is to compare the two methods using the Random Forest algorithm with similar hyperparameters.
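The contrast between constant and variable time-frequency localization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a Haar wavelet (chosen only because it is simple to write out), uses a synthetic signal rather than RAVDESS audio, and omits the MFCC and Random Forest steps. The function names `haar_step`, `dwt_band_energies`, `frame_energies` and `toy_signal` are hypothetical and introduced only for this illustration.

```python
import math


def haar_step(samples):
    """One Haar DWT level: scaled pairwise sums (approximation) and differences (detail)."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(samples) - 1, 2)
    approx = [(samples[i] + samples[i + 1]) * scale for i in pairs]
    detail = [(samples[i] - samples[i + 1]) * scale for i in pairs]
    return approx, detail


def dwt_band_energies(samples, levels):
    """Energy of the detail coefficients at each level, then the final approximation.

    A detail coefficient at level k spans 2**k input samples, so time
    resolution is fine for high-frequency bands and coarse for low ones.
    """
    approx = list(samples)
    energies = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def frame_energies(samples, frame_len):
    """Energy per non-overlapping frame: one fixed window length for all frequencies."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def toy_signal(n=1024, rate=8000):
    """Synthetic stand-in for a speech clip (assumption): 200 Hz tone plus a short 3 kHz burst."""
    out = []
    for i in range(n):
        t = i / rate
        x = math.sin(2 * math.pi * 200 * t)
        if 500 <= i < 532:
            x += 0.8 * math.sin(2 * math.pi * 3000 * t)
        out.append(x)
    return out


if __name__ == "__main__":
    sig = toy_signal()
    print("fixed-window energies:", [round(e, 2) for e in frame_energies(sig, 128)])
    print("wavelet band energies:", [round(e, 2) for e in dwt_band_energies(sig, 4)])
```

In a comparison like the one described, per-band or per-frame summaries of this kind would be the feature vectors handed to the classifier. Because the Haar transform is orthonormal, the band energies sum to the total signal energy when the signal length is divisible by two at every level, which gives a quick sanity check on the decomposition.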

Published

2022-08-19

How to Cite

Aayush Bajaj, Abhishek Jha, Lakshay Vashisth, Dr. K.C. Tripathi. (2022). Comparative Wavelet and MFCC Speech Emotion Recognition Experiments on the RAVDESS Dataset. Mathematical Statistician and Engineering Applications, 71(3), 1288–1293. https://doi.org/10.17762/msea.v71i3.468

Section

Articles