Sentiment Analysis of the Public Towards the Kanjuruhan Tragedy with the Support Vector Machine Method

A tragedy struck Indonesian football during the Arema FC vs. Persebaya match on October 1, 2022, leaving approximately 714 casualties: 131 fatalities and 583 injuries. The tragedy is believed to have been caused by tear gas fired into the spectator stands and the closure of exits at Kanjuruhan Stadium. The event sparked a wide range of public responses on social media, which can be studied through sentiment analysis. In this study, we used the Support Vector Machine (SVM) algorithm, known for its speed and accuracy in text classification, to analyze tweets posted from October 1 to 31, 2022, and YouTube comments about the Kanjuruhan tragedy posted from October 1 to November 20, 2022. With three labels, the RBF kernel achieved the highest accuracy, precision, recall, and F1-Score among the SVM kernels tested: 76.40%, 75.74%, 76.40%, and 75.18%, respectively. The RBF kernel also performed best with two labels, where accuracy, precision, recall, and F1-Score rose to 81.54%, 81.56%, 81.54%, and 81.56%, respectively.


I. INTRODUCTION
On October 1, 2022, a tragic incident occurred during the football match between Arema FC and Persebaya at Kanjuruhan Stadium in Malang Regency, East Java, Indonesia. The incident resulted in at least 714 casualties, comprising 131 fatalities and 583 injuries, according to police spokesperson Irjen Pol Dedi Prasetyo, as of October 9, 2022 [1]. The high number of casualties was attributed to the firing of tear gas into the spectator stands, which led to a stampede, and to the closure of stadium exits, which trapped spectators and caused chaos [2].
In today's digital age, individuals can express their opinions and sentiments regarding events and incidents that affect them, particularly on social media platforms. These expressed opinions are commonly referred to as sentiments [3]. With the growing number of people sharing their sentiments, it becomes challenging to gauge the overall sentiment of the public. Sentiment analysis is a technique that classifies the emotions conveyed in written text, such as comments and reviews, into positive, negative, or neutral categories [4].
Various approaches exist for identifying the sentiment of a text or document, including unsupervised learning (lexicon-based), supervised learning (machine learning), and hybrid methods [5]. The lexicon-based, or dictionary-based, approach involves generating a list of words commonly used to express opinions [6]. These words are typically adjectives that serve as indicators of sentiment in a sentence, such as "good," "bad," and "beautiful." The supervised learning approach, on the other hand, relies on machine learning algorithms such as Naive Bayes, SVM, K-NN, Logistic Regression, and Decision Tree to analyze sentiment [7].
Recent studies have employed machine learning methods for sentiment analysis. For instance, the Naive Bayes algorithm was applied to analyze Twitter sentiment regarding COVID-19 in the Philippines [8]. Another study combined Naive Bayes with K-means to analyze sentiment in product reviews [9]. A hybrid approach using the Random Forest and Support Vector Machine algorithms was also used for sentiment analysis [10], and SVM has been employed to predict the direction of market movements [11]. The Naive Bayes method is known for its accuracy and computational efficiency compared to other methods [12], [13]. SVM, in turn, is commonly used in a variety of sentiment classification scenarios [14]. In a 2019 study comparing SVM and KNN for sentiment analysis of the 2019 Indonesian presidential election, SVM showed faster testing time and higher average accuracy than KNN [15].
The novelty of this study lies in its dataset, the Kanjuruhan tragedy dataset, which was built from tweets and YouTube comments related to the tragedy. This study also compared the performance of SVM with other algorithms, namely Naive Bayes, Random Forest, Decision Tree, and K-NN. The results showed that SVM achieved the highest accuracy in classifying sentiments related to the Kanjuruhan tragedy.
The main contributions of this study are insights into the public's response to the Kanjuruhan tragedy and a demonstration of how sentiment analysis can inform disaster response efforts. The study also contributes to disaster management by presenting an approach to sentiment analysis that analyzes public sentiment toward specific events or topics. Finally, the study contributes to digital communication by using Twitter and YouTube as sources for analyzing public opinion regarding the tragedy.

II. METHOD
The sentiment analysis in this study was conducted in several stages, shown in the flow diagram in Fig. 1.

A. Data Collection
Data for this study were collected from two sources: Twitter and YouTube. The Twitter data consisted of tweets about the Kanjuruhan tragedy from October 1 to October 31, 2022, while the YouTube data consisted of comments on the YouTube content "Tragedi Kanjuruhan #UsutsampaiTuntas | Mata Najwa" from the initial upload until November 20, 2022. Tweet crawling was performed using the Snscrape library. YouTube comment crawling was conducted using Spreadsheet Apps Script with the YouTube API. The crawled data were saved in CSV format, resulting in 15,224 tweets and 3,999 YouTube comments. The data collection period was chosen because of the high intensity of tweets and comments related to the Kanjuruhan tragedy during that time.

B. Preprocessing
Preprocessing is the initial stage that refines the crawled data for easier processing. Common steps include case folding, which converts all letters to lowercase; tokenization, which breaks the document into parts called tokens; filtering, which removes punctuation and non-alphabetic characters; stop-word removal, which discards words that are not considered significant in the text mining process; and stemming, which reduces words to their base form by removing affixes [16]-[19]. This study applies preprocessing in two steps. The first step covers cleaning, case folding, filtering, and normalization. The second step covers tokenization, stop-word removal, and lemmatization, which converts words to their dictionary form, taking part of speech and context into account, so that different forms of the same word are grouped together for text classification. Lemmatization is not applied in the first step because the dataset is labeled with the Indonesian Roberta sentiment classifier, which can label Indonesian text accurately without lemmatization or stemming [20]-[22].
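The two passes described above can be sketched as follows. This is a minimal illustration, not the study's actual code: the function names, the regular expressions, and the tiny stop-word list are assumptions made for the example, and normalization and lemmatization are left out because they depend on language resources that are not specified here.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list; a real run would load a
# full Indonesian stop-word resource instead.
STOP_WORDS = {"yang", "dan", "di", "ke", "itu", "ini"}


def first_pass(text: str) -> str:
    """Cleaning, case folding, and filtering (applied before labeling)."""
    text = text.lower()                        # case folding
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep alphabetic characters only
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def second_pass(text: str) -> List[str]:
    """Tokenization and stop-word removal (applied after labeling)."""
    tokens = text.split()                      # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


cleaned = first_pass("Usut TUNTAS tragedi di Kanjuruhan!!! https://t.co/abc @user 2022")
tokens = second_pass(cleaned)
print(cleaned)
print(tokens)
```

Keeping the two passes as separate functions mirrors the design in the text: the labeling model receives the output of the first pass, while the classifier receives the output of the second.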

C. Data Labeling
The data were labeled using the Indonesian Roberta Sentiment Classifier Inference model, a deep learning model with a reported accuracy of 95.36% in classifying text, comments, or reviews in the Indonesian language [20]. This model was previously used in a study on sentiment analysis of public opinion regarding the COVID-19 vaccine [23]. Manual labeling was therefore not required, and using a single model keeps the labeling consistent across the dataset. Labeling was carried out under two schemes: two labels and three labels. In the 2-label scheme, positive sentiment is assigned if the labeling score is positive, and negative sentiment is assigned otherwise. In the 3-label scheme, positive sentiment is assigned if the input yields a positive score, neutral sentiment if it yields a neutral score, and negative sentiment if the input is classified as neither positive nor neutral.
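The two labeling schemes can be illustrated with a short sketch. It assumes that the classifier returns one score per class in a dictionary with the keys "positive", "neutral", and "negative"; that output format, the function names, and in particular the tie-breaking and the 2-label comparison rule are assumptions for illustration, since the text does not spell out how the scores are compared.

```python
from typing import Dict


def three_label(scores: Dict[str, float]) -> str:
    """Assign the class with the highest score among the three classes."""
    return max(("positive", "neutral", "negative"), key=lambda c: scores[c])


def two_label(scores: Dict[str, float]) -> str:
    """Assumed rule: positive if the positive score is at least the
    negative score, negative otherwise (the neutral score is ignored)."""
    return "positive" if scores["positive"] >= scores["negative"] else "negative"


# Hypothetical scores for a single text.
example = {"positive": 0.20, "neutral": 0.50, "negative": 0.30}
print(three_label(example), two_label(example))
```

Under a rule of this kind, a text that is neutral in the 3-label scheme is reassigned to whichever polarity scores higher in the 2-label scheme.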

D. Classification with SVM
After the second preprocessing step, the data are divided into training and testing sets and weighted with TF-IDF. A Grid Search then determines the optimal hyperparameters for each SVM kernel, and sentiment classification is performed using SVM with four kernels, each with its best parameters. Tuning several kernels in this way allows a broad exploration of configurations for sentiment classification and is intended to improve the robustness and accuracy of the results.
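The split, weighting, and per-kernel tuning described above could be wired together as in the following sketch, which assumes the scikit-learn library (the text does not name the toolkit used). The toy corpus, the candidate hyperparameter values, the 3-fold cross-validation, and the random seed are illustrative assumptions, not the study's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the labeled tweets and comments.
texts = ["usut tuntas tragedi", "turut berduka cita", "aparat harus tanggung jawab",
         "semoga keluarga tabah", "gas air mata kejam", "doa untuk korban"] * 5
labels = ["negative", "positive", "negative", "positive", "negative", "positive"] * 5

# 80/20 split, stratified so both sets keep the label proportions.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Candidate values are illustrative assumptions, not the grids used in the study.
grids = {
    "linear": {"svm__C": [0.1, 1, 10]},
    "poly": {"svm__C": [0.1, 1, 10], "svm__degree": [2, 3]},
    "sigmoid": {"svm__C": [0.1, 1, 10], "svm__gamma": [0.1, 1]},
    "rbf": {"svm__C": [0.1, 1, 10], "svm__gamma": [0.1, 1]},
}

best = {}
for kernel, grid in grids.items():
    # The vectorizer sits inside the pipeline, so TF-IDF is fitted on the
    # training folds only and then applied to the held-out data.
    pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svm", SVC(kernel=kernel))])
    search = GridSearchCV(pipeline, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    best[kernel] = {
        "params": search.best_params_,
        "test_accuracy": search.score(X_test, y_test),
    }

for kernel, result in best.items():
    print(kernel, result["params"], round(result["test_accuracy"], 4))
```

Running one search per kernel, rather than a single search over all kernels, yields a best parameter set for each kernel, which is what a per-kernel comparison needs.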
Support Vector Machine (SVM) is an effective machine learning technique with good generalization performance for classification. SVM is a supervised learning method that finds a hyperplane dividing the input space into two classes [15]. Three terms are central to SVM: the support vectors, which are the data points of the two classes that lie closest to each other; the hyperplane, which is the decision boundary between the classes; and the margin, which is the distance between the support vectors and the hyperplane [24]-[26]. The margin is maximized so that the classifier generalizes well to data lying close to the other class. SVM can also be combined with several kernels, such as the Polynomial, Sigmoid, Linear, and Radial Basis Function (RBF) kernels [27] (Table I). These kernels implicitly map the input data into a higher-dimensional feature space in which a separating hyperplane can be identified. The choice of kernel function and parameters depends on the data, since each kernel has strengths and weaknesses. The Polynomial kernel adds a constant to a dot product and raises the result to a certain power, the Sigmoid kernel uses a hyperbolic tangent function, the Linear kernel computes a plain dot product, and the RBF kernel measures distance with a Gaussian function [27].
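For reference, the four kernels are commonly written as shown below. These are the standard textbook forms; the notation is an assumption and may differ from that of Table I. Here x and x' are two feature vectors, γ is the kernel coefficient, r is the constant term, and d is the polynomial degree.

```latex
\[
\begin{aligned}
\text{Linear:}     \quad & K(x, x') = x^{\top} x' \\
\text{Polynomial:} \quad & K(x, x') = \left(\gamma\, x^{\top} x' + r\right)^{d} \\
\text{Sigmoid:}    \quad & K(x, x') = \tanh\!\left(\gamma\, x^{\top} x' + r\right) \\
\text{RBF:}        \quad & K(x, x') = \exp\!\left(-\gamma\, \lVert x - x' \rVert^{2}\right)
\end{aligned}
\]
```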

E. Evaluation
During the evaluation phase, the classification results are examined by comparing the predictions with the labeled data in a confusion matrix (Table II). This evaluation assesses the model's performance in terms of accuracy, precision, recall, and F1-Score, and indicates how well the trained model classifies sentiment in the data related to the Kanjuruhan tragedy.
Based on Table II, accuracy is the proportion of correct predictions, both positive and negative, out of all data. Precision is the ratio of correct positive predictions to the total number of positive predictions. Recall, also known as sensitivity, is the number of correct positive predictions divided by the total number of data labeled as positive. F1-Score combines precision and recall in a single measure [10], [28]. These values are computed using equations (1)-(4).
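Written out with the confusion-matrix counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the standard definitions that match these descriptions are given below. The symbols are assumed here and may differ from the notation of Table II.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \\
\text{Precision} &= \frac{TP}{TP + FP} \tag{2} \\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{3} \\
\text{F1-Score}  &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}
\end{align}
```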

III. RESULTS AND DISCUSSION
A. Data Collection
A total of 15,224 tweets containing the keyword "kanjuruhan" were collected from Twitter for the period October 1 to October 31, 2022, capturing the high intensity of tweets related to the Kanjuruhan tragedy. In addition, 3,999 comments posted between October 6 and November 20, 2022, were retrieved from the video titled "Tragedi Kanjuruhan #UsutsampaiTuntas | Mata Najwa." This time-specific approach lets the study focus on sentiment during those particular periods and mitigates the complexities of real-time sentiment fluctuations. The Twitter collection period coincides with the peak of public engagement with the tragedy, while the YouTube comments capture sentiments expressed while the investigative series by Mata Najwa was available. Narrowing the data to these time ranges gives insight into the public's response to the Kanjuruhan tragedy at those specific points in time. The findings contribute to a better understanding of public sentiment within a specific context and provide a foundation for improving future responses. Examples of the collected data can be seen in Table III.

B. Data Preprocessing
The preprocessing phase is divided into two steps. The first step consists of data cleaning, case folding, filtering 1, normalization, data merging, and filtering 2. In filtering 1, keywords were selected to remove content not relevant to the Kanjuruhan tragedy, such as "judi bola," "sambo," "produk," "jualan," "itaewon," and similar terms, which appeared frequently in the tweet dataset. For the YouTube data, the same criteria were applied, with additional keywords such as "najwa" and "narasi tv" to reduce subjectivity. This was done so that the retained comments focus on the Kanjuruhan tragedy rather than on the organizers or YouTube content creators, specifically Mata Najwa or Narasi TV. After the two datasets were merged, a relevance filter (filtering 2) was applied by selecting keywords from the top 10 hashtags associated with the Kanjuruhan tragedy (Fig. 3).
The second preprocessing step is carried out after labeling. It consists of tokenization, stop-word removal, and lemmatization.

C. Data Labeling
This study used the Indonesian Roberta Sentiment Classifier Inference deep learning model to label the data. The data from the first preprocessing step were labeled with this model under two schemes, three sentiments and two sentiments (Fig. 4).
The labeling results for 3 and 2 sentiments are depicted in Fig. 4. With 3 sentiments, 6907 texts are classified as neutral, 4901 as negative, and 1897 as positive. With 2 sentiments, 6957 texts are classified as positive and 6748 as negative. Examples of the labeled data with 3 and 2 sentiments can be found in Table IV.

D. Data Splitting
The data are divided into training data, which make up 80% of the entire dataset and are used to train the model, and test data, which make up the remaining 20% and are used to evaluate the algorithm's performance. Table V shows the results of the data-splitting process for three sentiments and two sentiments.

E. Term Weighting
The labeled dataset is partitioned into training and testing data with a ratio of 80% to 20%. The training and testing data are then subjected to the TF-IDF weighting process (Table VI).
The results of the TF-IDF weighting process for the training and testing data are presented in Table VI, which lists the TF-IDF weight assigned to each term in the dataset. The first column is the test text index, the second column is the term index, and the third column is the TF-IDF weight of that term in that text. For instance, the row "0 579 0.387712" indicates that the term with index 579 in the test text with index 0 has a TF-IDF weight of 0.387712.
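To make the (text index, term index, weight) layout concrete, the sketch below computes such triples in plain Python. The weighting variant is an assumption, because the text does not state which one was used: raw term counts, a smoothed inverse document frequency ln((1 + n) / (1 + df)) + 1, and L2 normalization of each text's weight vector. The function name and the two toy documents are hypothetical. For brevity, the vocabulary is built from the same documents that are weighted, whereas in a train/test setting it would be built from the training data only.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_triples(docs: List[List[str]]) -> Tuple[Dict[str, int], List[Tuple[int, int, float]]]:
    """Return a vocabulary and a list of (doc index, term index, weight) triples."""
    # Term indices follow alphabetical order of the vocabulary.
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for d in docs for t in set(d))
    triples = []
    for doc_idx, doc in enumerate(docs):
        tf = Counter(doc)
        # Raw count times smoothed idf.
        raw = {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
        norm = math.sqrt(sum(w * w for w in raw.values()))  # L2 norm of this row
        for term, weight in sorted(raw.items(), key=lambda kv: vocab[kv[0]]):
            triples.append((doc_idx, vocab[term], weight / norm))
    return vocab, triples


docs = [["usut", "tuntas", "tragedi"], ["tragedi", "kanjuruhan"]]
vocab, triples = tfidf_triples(docs)
for doc_idx, term_idx, weight in triples:
    print(doc_idx, term_idx, round(weight, 6))
```

In this toy example, "tragedi" occurs in both texts and therefore receives a lower weight than terms that occur in only one of them.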

F. SVM Classification
After the data were divided into training and test sets and weighted with TF-IDF, classification was performed using the Support Vector Machine (SVM) method. This study uses four SVM kernels: linear, polynomial, sigmoid, and RBF. Each kernel is tuned with the Grid Search method, which takes candidate hyperparameter values as input, tests their combinations on the training data, and returns the best combination for that kernel. The resulting best hyperparameters of each kernel are presented in Table VII.
After the optimal parameter values for each kernel were determined, the performance of each kernel was evaluated in terms of accuracy, precision, recall, and F1-Score. For data with three labels, the RBF kernel achieved the highest accuracy, precision, recall, and F1-Score, with values of 76.40%, 75.74%, 76.40%, and 75.18%, respectively. For data with two labels, the RBF kernel again performed best, with accuracy, precision, recall, and F1-Score rising to 81.54%, 81.56%, 81.54%, and 81.56%, respectively. The other kernels also showed improved accuracy, precision, recall, and F1-Score on data with two labels. The results of the evaluation are presented in Table VIII.
With only two labels, the model has an easier time distinguishing between the positive and negative classes, which raises accuracy, precision, recall, and F1-Score. Reducing the number of labels may also reduce the amount of ambiguous or unclear data, allowing the model to focus on more clearly separable examples. It may also reduce overfitting because, with fewer classes, the model has fewer opportunities to learn patterns specific to a single label.
The RBF kernel performed better than the other kernels in this study, which can be attributed to its ability to handle non-linear separations. The RBF kernel adapts its decision contours flexibly, which leads to more effective separation of the sentiment classes, and its smooth decision boundary aligns well with the data distribution. This capacity to adapt to complex data distributions makes the RBF kernel well suited to sentiment classification.

G. Evaluation
The comparison between the actual test data and the predictions of the best-performing SVM kernel can be observed in the confusion matrices for two and three sentiments, as depicted in Fig. 5 and Table IX. Table IX compares the actual and predicted data for the three sentiment labels: Negative, Neutral, and Positive. In the 3-sentiment case, the Negative label has 981 actual data and 1026 predicted data, the Neutral label has 1354 actual data and 1503 predicted data, and the Positive label has 406 actual data and 212 predicted data. In the 2-sentiment case, there are 1330 actual data and 1352 predicted data for the Negative label, and 1420 actual data and 1411 predicted data for the Positive label. The differences between the actual and predicted data are evident, with a difference of 45 data for the Negative label and 194 data for the Positive label in the 3-sentiment case. In the 2-sentiment case, there is a slight difference of 22 data for both the Negative and Positive labels as in Fig. 6.
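The reported recall equals the reported accuracy in both settings (76.40% and 81.54%), which is what a support-weighted average of the per-class recall always produces. The sketch below shows how weighted metrics could be derived from a confusion matrix under that assumption about the averaging. The function name is hypothetical and the matrix is a made-up toy example, not the study's results.

```python
from typing import Dict, List


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and support-weighted precision, recall, and F1.

    cm[i][j] is the number of samples whose actual class is i and whose
    predicted class is j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for i in range(n_classes):
        tp = cm[i][i]
        actual = sum(cm[i])                   # support: samples actually in class i
        predicted = sum(row[i] for row in cm) # samples predicted as class i
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = actual / total               # weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy 3-class confusion matrix (rows: actual, columns: predicted).
toy_cm = [[8, 2, 0],
          [1, 12, 1],
          [1, 2, 3]]
metrics = weighted_metrics(toy_cm)
print({name: round(value, 4) for name, value in metrics.items()})
```

The row sums of the matrix correspond to the actual counts per label and the column sums to the predicted counts, which are the two quantities compared in the text.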
The differences between the actual and predicted data in SVM classification can be attributed to various factors, including features that do not fully represent the complexity of the data and the influence of the quantity and quality of the training data on the predictions. To improve classification accuracy, it is suggested to incorporate more informative features, refine the data preprocessing steps, apply techniques such as dimensionality reduction or feature selection, and improve the quantity and quality of the training data. Ensemble techniques and data normalization may also contribute to higher accuracy.
The graphs in Fig. 6 compare the sentiment classification performance of SVM with that of other algorithms, namely Naive Bayes, Random Forest, Decision Tree, and K-Nearest Neighbours (KNN). The SVM algorithm with the RBF kernel exhibits the highest accuracy in sentiment classification for the Kanjuruhan tragedy dataset, outperforming the other machine learning algorithms in both the 3-label and the 2-label sentiment cases.

IV. CONCLUSION
Based on the testing conducted on the tweet and YouTube comment data discussing the Kanjuruhan tragedy, a total of 13705 final data were labeled with the Indonesian Roberta sentiment classifier inference model. With three sentiment labels, the data comprise 6907 neutral, 4901 negative, and 1897 positive texts. With two sentiment labels, the data comprise 6748 negative and 6957 positive texts. SVM was implemented successfully with the help of preprocessing, labeling, TF-IDF weighting, and hyperparameter tuning of each kernel. The SVM kernels and their optimal parameters were used to predict data with three labels: negative, neutral, and positive. The RBF kernel with C = 10 and gamma = 1 achieved the highest accuracy, precision, recall, and F1-Score among all kernels, with values of 76.40%, 75.74%, 76.40%, and 75.18%, respectively. The RBF kernel also obtained the highest accuracy, precision, recall, and F1-Score on the data with two labels, negative and positive, where the scores rose to 81.54%, 81.56%, 81.54%, and 81.56%, respectively. This increase also occurred in the other kernels, namely linear, polynomial, and sigmoid. Although the predicted labels differ from the actual labels for part of the data, the accuracy obtained is quite good: 76.40% for data with three labels and 81.54% for data with two labels.
This study suggests several areas and techniques that could be explored to improve the accuracy and efficiency of sentiment analysis in similar contexts. These include advanced feature engineering techniques such as n-gram analysis, syntactic parsing, and semantic analysis, which can provide richer contextual information and improve the model's handling of subtle language nuances. Ensemble methods, which combine multiple classifiers or algorithms to leverage the strengths of different models, are also recommended for improving overall performance. Further research on advanced natural language processing techniques, on increasing the quantity and quality of training data, and on adapting models to specific contexts is advised. By exploring these areas and techniques, sentiment analysis is expected to yield more precise and contextually relevant results in future studies.

Fig. 6 Comparison diagram of SVM evaluation results with other algorithms