Artificial Intelligent for Human Emotion Detection with the Mel-Frequency Cepstral Coefficient (MFCC)

Anita Ahmad Kasim, Muhammad Bakri, Irwan Mahmudi, Rahmawati Rahmawati, Zulnabil Zulnabil

Abstract


Emotions are an important aspect of human communication, and expressions of emotion can be identified from the voice. Speech recognition technology has developed rapidly and helps improve human-machine interaction. This study aims to classify emotions by analysing recordings of the human voice. One of the most frequently used methods for sound detection is the Mel-Frequency Cepstral Coefficient (MFCC) method, in which sound waves are converted into several types of representation. MFCCs are the coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of the log power spectrum on a nonlinear mel scale of frequency. The primary data used in this research are recordings made by the authors. The secondary data are 500 voice recordings from the "Berlin Database of Emotional Speech". MFCC features can extract information implicit in the human voice, in particular the feelings a speaker experiences while producing the sound. In this study, the highest accuracy, 85%, was obtained when training for 10,000 epochs.
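The MFCC definition above (short-term power spectrum, mel filterbank, logarithm, cosine transform) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names and every parameter value (16 kHz sample rate, 400-sample frames, 160-sample hop, 512-point FFT, 26 mel filters, 13 coefficients) are assumptions chosen as common textbook defaults, since the abstract does not state the study's settings. The mel formula used is one common variant, and pre-emphasis is omitted for brevity.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    signal = np.asarray(signal, dtype=float)
    # 1. Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Short-term power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3. Mel filterbank energies, then logarithm (small offset avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 4. Unnormalised DCT-II across the filterbank axis; keep the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return log_energies @ basis.T

# Usage: one second of a synthetic 440 Hz tone stands in for a voice recording.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2.0 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13): one row of 13 coefficients per frame
```

Each row of the result is the feature vector for one frame. A classifier such as the one trained in this study would typically consume these rows directly or a per-recording summary of them, for example the mean over frames.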

Keywords


Human emotions; Voice Feature; MFCC

DOI: 10.30595/juita.v11i1.15435

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2579-8901