Implementation of PPCA Imputation, SMOTE-N Class Balancing in Hepatitis Classification Using Naïve Bayes
DOI:
https://doi.org/10.30595/juita.v12i2.21528
Keywords:
classification, naïve bayes, ppca, smote-n, hepatitis
Abstract
Complete data is crucial in research, especially in the initial stages. The Hepatitis data used in this study had missing values and class imbalance, both of which hindered its optimal use. Missing values were filled in with probabilistic principal component analysis (PPCA) imputation. The imputed data were then balanced with the SMOTE-N class balancing method and classified with Gaussian Naïve Bayes. The aim of this research was to compare Naïve Bayes classification performance on hepatitis data when PPCA imputation and SMOTE-N class balancing are applied. The best result in each scenario was an AUC of 0.833 in the first scenario, with an 80:20 split between training and testing data, and an AUC of 0.875 in the second scenario, with a 90:10 split. The highest AUC was obtained when PPCA imputation was combined with SMOTE-N class balancing and Naïve Bayes classification. This indicates that PPCA imputation together with SMOTE-N class balancing improves the performance of Naïve Bayes classification.
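The last two stages described in the abstract, Gaussian Naïve Bayes classification and evaluation by AUC, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`fit_gaussian_nb`, `predict_proba`, `auc`), the `var_floor` parameter, and the toy data are all hypothetical, and the PPCA imputation and SMOTE-N steps are not shown. The AUC is computed as the fraction of (positive, negative) pairs that the classifier ranks correctly, which is equivalent to the area under the ROC curve.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior, per-feature means, and per-feature variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (an assumption of this sketch) avoids division by zero
        # when a feature is constant within a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_proba(model, row, positive=1):
    """Posterior probability of the positive class for one sample."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        total = math.log(prior)
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_joint[label] = total
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint.values())
    weights = {k: math.exp(v - top) for k, v in log_joint.items()}
    return weights[positive] / sum(weights.values())


def auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not the Hepatitis dataset): one feature, two classes.
X_train = [[1.0], [1.2], [0.8], [3.0], [3.2], [2.8]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.9], [2.9], [1.1], [3.1]]
y_test = [0, 1, 0, 1]

model = fit_gaussian_nb(X_train, y_train)
scores = [predict_proba(model, row) for row in X_test]
print(auc(y_test, scores))
```

In a pipeline like the one the abstract describes, imputation and class balancing would come before the fitting step. Resampling is usually applied to the training portion only, so that the AUC is measured on test data that has not been oversampled; whether the study followed that order is not stated in the abstract.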
License
Copyright (c) 2024 JUITA : Jurnal Informatika

This work is licensed under a Creative Commons Attribution 4.0 International License.

JUITA: Jurnal Informatika is licensed under a Creative Commons Attribution 4.0 International License.