Performance Analysis of Resampling Techniques for Overcoming Data Imbalance in Multiclass Classification

Anggit Larasati; Sugiyarto Surono; Aris Thobirin; Deshinta Arrova Dewi

doi:10.30595/juita.v13i1.25270

Authors

Anggit Larasati, Ahmad Dahlan University
Sugiyarto Surono, Ahmad Dahlan University
Aris Thobirin, Ahmad Dahlan University
Deshinta Arrova Dewi, INTI International University, Malaysia

Keywords:

Convolutional Neural Network, data augmentation, optimizer comparison, resampling.

Abstract

In the digital era, modern technology has significantly transformed the medical world. The main objective of this research is to evaluate the performance of deep learning models in classifying kidney disease. Using a Convolutional Neural Network (CNN) model, the classification process can be analyzed effectively and efficiently. However, data imbalance strongly affects a model's performance evaluation, so data resampling techniques are required. This research applies two resampling techniques, bootstrap-based random oversampling and random undersampling, to the training data and adds data augmentation to increase image variation and reduce model overfitting. The architecture is MobileNetV2, with hyperparameter fine-tuning compared across three optimizers (RMSprop, Adam, and Nadam). The results show that MobileNetV2 trained with bootstrap-based random oversampling achieves higher accuracy than with random undersampling or with no resampling. Oversampling with the RMSprop optimizer produced the highest accuracy, 95%, with precision, recall, and F1-score of 0.93, 0.95, and 0.94, respectively. Oversampling with the Adam and Nadam optimizers reached an accuracy of 94%. The contribution of this research is therefore to show that bootstrap-based oversampling combined with data augmentation yields good model performance for classifying medical images.
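The two resampling techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact procedure, so the function names, the seed, and the choice to balance every class to the largest (or smallest) class size are assumptions made for illustration. Oversampling here keeps all original samples and adds bootstrap draws (sampling with replacement) to the minority classes; undersampling keeps a random subset (sampling without replacement) of each class.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples, labels):
    """Map each class label to the list of samples carrying that label."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Bootstrap-style oversampling (assumed variant): keep every original
    sample and draw extras WITH replacement until each class reaches the
    size of the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        extra = rng.choices(items, k=target - len(items))  # with replacement
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Random undersampling: keep a random subset of each class, drawn
    WITHOUT replacement, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        kept = rng.sample(items, k=target)  # without replacement
        out_samples.extend(kept)
        out_labels.extend([label] * target)
    return out_samples, out_labels


if __name__ == "__main__":
    # Hypothetical imbalanced training set: integers stand in for images,
    # and the labels "a", "b", "c" are placeholders, not the paper's classes.
    images = list(range(11))
    classes = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
    _, over_labels = random_oversample(images, classes)
    _, under_labels = random_undersample(images, classes)
    print("original:    ", Counter(classes))
    print("oversampled: ", Counter(over_labels))
    print("undersampled:", Counter(under_labels))
```

As in the abstract, resampling of this kind would be applied to the training split only, so that validation and test data keep their original class distribution.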

Author Biographies

Anggit Larasati, Ahmad Dahlan University

Department of Mathematics

Sugiyarto Surono, Ahmad Dahlan University

Department of Mathematics

Aris Thobirin, Ahmad Dahlan University

Department of Mathematics

Deshinta Arrova Dewi, INTI International University, Malaysia

Faculty of Data Science and Information Technology
