Emotional Text Classification Using TF-IDF (Term Frequency-Inverse Document Frequency) And LSTM (Long Short-Term Memory)

Muhammad Ibnu Alfarizi, Lailis Syafaah, Merinda Lestandy

Abstract


People can express their feelings verbally or non-verbally when they communicate, and verbal communication may be oral or written. A person's emotions can usually be read from their behavior, tone of voice, and facial expression, but not everyone can recognize emotion from writing alone, whether in words, sentences, or paragraphs. A classification system is therefore needed to help determine the emotions contained in a piece of writing. The novelty of this study is that it extends previous research that used a similar method, LSTM, by improving the word weighting step: TF-IDF weights are computed and passed on to the LSTM classifier. The proposed approach falls within Natural Language Processing (NLP). The purpose of this study was to compare an LSTM (Long Short-Term Memory) model with added TF-IDF (Term Frequency-Inverse Document Frequency) word weighting against a LinearSVC model, and to increase accuracy in determining the emotion (sadness, anger, fear, love, joy, or surprise) contained in a text. The dataset contains 18,000 records, divided into 16,000 training records and 2,000 test records, labeled with six emotion classes: sadness, anger, fear, love, joy, and surprise. The LSTM method achieved a classification accuracy of 97.50%, while the LinearSVC method achieved an accuracy of 89%.
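The TF-IDF word weighting mentioned above can be illustrated with a minimal sketch. It assumes a common textbook variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); the abstract does not state which variant the authors used, and the function name `tf_idf` and the three sample sentences are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant (not confirmed by the abstract):
    tf = count / document length, idf = log(N / df).
    """
    # Naive whitespace tokenization; a real pipeline would also clean the text.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample sentences, not taken from the paper's dataset.
docs = ["i feel so happy today", "i feel afraid of the dark", "i am so angry"]
weights = tf_idf(docs)
```

Under this variant, a word that occurs in every document (here "i") receives a weight of zero, while a word unique to one document (here "happy") receives the largest weight in that document, which is why the weighting tends to emphasize emotion-bearing words over common ones.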

Keywords


Emotional Text Classification, TF-IDF, LSTM, LinearSVC



DOI: 10.30595/juita.v10i2.13262



This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2579-8901