Performance Evaluation of Tuned and Untuned Machine Learning Models in Speech Emotion Recognition
DOI:
https://doi.org/10.30595/juita.v14i1.29015
Keywords:
Emotion recognition, machine learning, GridSearchCV, confusion matrix.
Abstract
This study compares three machine learning models, Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forest (RF), for recognizing emotional states in speech using the RAVDESS dataset. The pipeline combines audio feature extraction, model training with and without hyperparameter tuning, and evaluation using accuracy, precision, recall, and F1-score. Before tuning, SVM achieved the highest accuracy at 79%, followed by MLP at 76% and RF at 71%. After tuning, only SVM improved, reaching 80%, whereas MLP and RF showed negligible or no improvement. Confusion matrix analysis showed that SVM produced the most evenly distributed predictions and reduced misclassification errors, particularly for the “calm” and “happy” emotion categories. These results provide empirical support for SVM as a robust baseline model for speech emotion recognition in localized settings and offer insights into model optimization that could inform future work on speech-based human–computer interaction.
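The tuned-versus-untuned comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code. It assumes scikit-learn (the "gridsearchcv" keyword suggests its GridSearchCV class), uses synthetic features from make_classification as a stand-in for features extracted from RAVDESS audio, and searches a small hypothetical parameter grid. The class count, feature count, and grid values are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for extracted audio features (NOT RAVDESS data):
# 600 samples, 40 features, 4 hypothetical emotion classes.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=20,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Untuned baseline: SVC with scikit-learn's default hyperparameters.
untuned = make_pipeline(StandardScaler(), SVC())
untuned.fit(X_train, y_train)

# Tuned model: exhaustive search over a small, illustrative grid
# (the grid values are assumptions, not taken from the paper).
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01],
    "svc__kernel": ["rbf", "linear"],
}
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid, cv=5, scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate both models on the same held-out split.
results = {}
for name, model in [("untuned", untuned), ("tuned", search.best_estimator_)]:
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro"),
        "recall": recall_score(y_test, y_pred, average="macro"),
        "f1": f1_score(y_test, y_pred, average="macro"),
        # Rows are true classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, y_pred),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items() if k != "confusion"})

print("best parameters:", search.best_params_)
```

The same pattern would apply to MLP and RF by swapping the estimator and its grid. Because tuning selects parameters by cross-validation on the training split, the tuned model is not guaranteed to score higher on the held-out split, which is consistent with the negligible gains the abstract reports for MLP and RF.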
Copyright (c) 2026 Muhammad Hudzaifah Nasrullah, Dede Cahyadi, Tilly Raycitra Widya, Anggraeni Pratama Indrianto, Lilik Tiara Giantri

This work is licensed under a Creative Commons Attribution 4.0 International License.

JUITA: Jurnal Informatika is licensed under a Creative Commons Attribution 4.0 International License.








