Analysis of the Impact of Vectorization Methods on Machine Learning-Based Sentiment Analysis of Tweets Regarding Readiness for Offline Learning

Yesi Novaria Kunang, Widya Putri Mentari

Abstract


Twitter users express their emotions about a topic, whether criticism or praise, through their tweets. Analyzing the opinions or sentiments in those tweets can reveal how users feel about a particular topic. This study aims to determine the impact of vectorization methods on sentiment analysis of public opinion regarding readiness for offline learning in Indonesia during the Covid-19 pandemic. We labeled sentiment using two approaches: manually, and automatically with the TextBlob NLP library. We compared three vectorization methods: count vectorization, TF-IDF, and a combination of both. The resulting feature vectors were then classified with three methods (naïve Bayes, logistic regression, and k-nearest neighbor) for both the manually and the automatically labeled data. We assessed the sentiment analysis models using accuracy, precision, recall, and F1-score. The logistic regression classifier with features that combine count vectorization and TF-IDF performed best on both the manually and the automatically labeled data.
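To make the three feature representations concrete, the following is a minimal standard-library sketch, not the authors' implementation. It rests on several assumptions: the abstract does not say how count vectorization and TF-IDF were combined, so the sketch simply concatenates the two vectors; the TF-IDF weighting uses a smoothed IDF with L2 normalization, which is one common variant; and the tokenizer, the function names, and the three toy tweets are hypothetical illustrations, not data from the study.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption: real preprocessing would
    # also clean, stem, and remove stop words).
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct token across the corpus.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_vectors(docs, vocab):
    # Count vectorization: raw term frequency per vocabulary entry.
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vectors


def tfidf_vectors(docs, vocab):
    # TF-IDF with smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by
    # L2 normalization of each document vector (assumed variant).
    n_docs = len(docs)
    counts = count_vectors(docs, vocab)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    vectors = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        vectors.append([x / norm for x in weighted])
    return vectors


def combined_vectors(docs, vocab):
    # Assumed combination: concatenate count and TF-IDF features.
    counts = count_vectors(docs, vocab)
    tfidf = tfidf_vectors(docs, vocab)
    return [c + t for c, t in zip(counts, tfidf)]


# Hypothetical toy tweets, made up for illustration only.
tweets = [
    "school is ready for offline learning",
    "not ready for offline learning yet",
    "students miss school",
]
vocab = build_vocabulary(tweets)
combined = combined_vectors(tweets, vocab)
print(len(vocab), len(combined[0]))
```

Each combined vector is twice as long as the vocabulary, so a classifier trained on it sees both the raw frequencies and the rarity-weighted scores of every term.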

Keywords


naïve Bayes, k-nearest neighbor, logistic regression, sentiment analysis, offline learning


DOI: 10.30595/juita.v11i2.17568


This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2579-8901