Performance Evaluation of Tuned and Untuned Machine Learning Models in Speech Emotion Recognition

Authors

Muhammad Hudzaifah Nasrullah, Yarsi Pratama University
Dede Cahyadi, Yarsi Pratama University
Tilly Raycitra Widya, Yarsi Pratama University
Ewin Suciana, Yarsi Pratama University
Lilik Tiara Giantri, Yarsi Pratama University

DOI:

https://doi.org/10.30595/juita.v14i1.29015

Keywords:

Emotion recognition, machine learning, GridSearchCV, confusion matrix.

Abstract

This study compares three machine learning models, Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forest (RF), for recognizing emotional states in speech using the RAVDESS dataset. The pipeline combines audio feature extraction, model training with and without hyperparameter tuning, and evaluation using accuracy, precision, recall, and F1-score. Before tuning, SVM achieved the highest accuracy at 79%, followed by MLP at 76% and RF at 71%. After tuning, only SVM improved, reaching 80%, whereas MLP and RF showed negligible or no improvement. Confusion matrix analysis showed that SVM produced the most evenly distributed predictions and reduced misclassification errors, particularly for the “calm” and “happy” emotion categories. These results provide empirical support for SVM as a robust baseline model for speech emotion recognition in localized settings, and they offer insights into model optimization and development that could inform future implementations in speech-based human–computer interaction.
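As an illustration of the evaluation step named in the abstract, the following minimal sketch derives accuracy and per-class precision, recall, and F1-score from a confusion matrix. It is not the authors' code: the function names, the three-class label set, and the toy matrix `TOY_CM` are assumptions made up for this example, and the numbers are not results from the paper.

```python
from typing import Dict, List

# Hypothetical toy data for illustration only; NOT taken from the paper.
# Rows are true classes, columns are predicted classes.
TOY_LABELS: List[str] = ["calm", "happy", "sad"]
TOY_CM: List[List[int]] = [
    [9, 1, 0],  # true "calm"
    [2, 6, 2],  # true "happy"
    [1, 1, 8],  # true "sad"
]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_class_metrics(
    cm: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1-score for each class of a square matrix."""
    results: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                  # true class i, predicted otherwise
        fp = sum(row[i] for row in cm) - tp   # predicted i, true class otherwise
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


metrics = per_class_metrics(TOY_CM, TOY_LABELS)
print(f"accuracy: {accuracy(TOY_CM):.3f}")
for name, scores in metrics.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Reading the per-class rows alongside overall accuracy is what makes a comparison such as the one for “calm” and “happy” possible, since a single accuracy figure hides which classes are being confused with each other.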

Author Biographies

Muhammad Hudzaifah Nasrullah, Yarsi Pratama University

Department of Informatics Engineering

Dede Cahyadi, Yarsi Pratama University

Department of Informatics Engineering

Tilly Raycitra Widya, Yarsi Pratama University

Department of Informatics Engineering

Ewin Suciana, Yarsi Pratama University

Department of Informatics Engineering

Lilik Tiara Giantri, Yarsi Pratama University

Department of Informatics Engineering

Published

2026-03-31

How to Cite

Nasrullah, M. H., Cahyadi, D., Widya, T. R., Suciana, E., & Giantri, L. T. (2026). Performance Evaluation of Tuned and Untuned Machine Learning Models in Speech Emotion Recognition. JUITA: Jurnal Informatika, 14(1), 205–213. https://doi.org/10.30595/juita.v14i1.29015

Issue

JUITA Vol. 14 Issue 1, March 2026

Section

Articles

License

Copyright (c) 2026 Muhammad Hudzaifah Nasrullah, Dede Cahyadi, Tilly Raycitra Widya, Anggraeni Pratama Indrianto, Lilik Tiara Giantri

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

JUITA: Jurnal Informatika is licensed under a Creative Commons Attribution 4.0 International License.
