Non-linear Kernel Optimisation of Support Vector Machine Algorithm for Online Marketplace Sentiment Analysis

Abdul Fadlil, Imam Riadi, Fiki Andrianto

Abstract


Twitter is a widely used social media platform, and the speed of communication and interaction on it makes it a valuable source of data for sentiment analysis. This research classifies public opinion about marketplaces in Indonesia as positive or negative sentiment, using a non-linear SVM algorithm on 1276 tweets. The stages are data pre-processing, labeling, feature extraction using TF-IDF, and division of the data into three scenarios: 80% training and 20% test data, 50% training and 50% test data, and 20% training and 80% test data. In the final stage, GridSearchCV combines cross-validation with a search over non-linear SVM hyperparameters, and the resulting models are evaluated using a confusion matrix. The best SVM model came from the scenario with 80% training and 20% test data, with hyperparameters Gamma = 100 and C = 0.01, achieving 89% accuracy. When tested on never-before-seen data, accuracy rose to 90%, with an F1-score of 91%, precision of 88%, and recall of 95% on negative sentiments. In conclusion, tuning non-linear SVM kernels over combinations of hyperparameter values can improve classification accuracy for public sentiment about online marketplaces.
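The pipeline summarized in the abstract (TF-IDF features, three train/test scenarios, a cross-validated grid search over non-linear SVM hyperparameters, and confusion-matrix evaluation) can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' code: the toy corpus, the RBF kernel choice, the candidate values for C and gamma, the 3-fold cross-validation, and the helper name `run_scenario` are all assumptions, since the abstract does not state them.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the labeled tweets (NOT the study's data).
positive = ["fast delivery and great service", "good price, very satisfied",
            "easy checkout, recommended seller", "helpful support and quick refund"]
negative = ["late delivery and bad service", "item damaged, very disappointed",
            "checkout failed, terrible app", "slow support and no refund"]
texts = [f"{t} order {i}" for i in range(5) for t in positive + negative]
labels = ([1] * 4 + [0] * 4) * 5  # 1 = positive, 0 = negative

# Assumed search grid; the abstract reports only the best pair (Gamma = 100, C = 0.01).
param_grid = {"svm__C": [0.01, 1, 100], "svm__gamma": [0.01, 1, 100]}


def run_scenario(test_size):
    """Split the data, tune a non-linear (RBF) SVM with GridSearchCV, evaluate on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=0)
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", SVC(kernel="rbf"))])
    search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    pred = search.predict(X_test)
    return {
        "best_params": search.best_params_,
        "accuracy": search.score(X_test, y_test),
        "confusion": confusion_matrix(y_test, pred, labels=[0, 1]),
        # Metrics for the negative class (label 0), as reported in the abstract.
        "precision_neg": precision_score(y_test, pred, pos_label=0, zero_division=0),
        "recall_neg": recall_score(y_test, pred, pos_label=0, zero_division=0),
        "f1_neg": f1_score(y_test, pred, pos_label=0, zero_division=0),
    }


# The three scenarios from the abstract: 20%, 50%, and 80% of the data held out for testing.
results = {size: run_scenario(size) for size in (0.2, 0.5, 0.8)}
for size, r in results.items():
    print(size, r["best_params"], round(r["accuracy"], 2))
```

Placing the vectorizer inside the pipeline means TF-IDF statistics are refit on each cross-validation training fold, so the validation folds do not leak into feature extraction. Scores on this toy corpus say nothing about the accuracy figures reported above.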

Keywords


Marketplace, non-linear SVM, Indonesia, machine learning



DOI: 10.30595/juita.v12i1.19798



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2579-8901