Comparative Analysis of Machine Learning Algorithms for Groundwater Potability Classification in Jakarta

Authors

  • Diky Arianto Tarihoran, Universitas Mercu Buana
  • Hadi Santoso, Universitas Mercu Buana

DOI:

https://doi.org/10.30595/juita.v13i3.27348

Keywords:

Groundwater quality, machine learning, Internet of Things (IoT), hyperparameter tuning

Abstract

Groundwater quality is fundamental to meeting clean water needs, particularly in urban areas such as Jakarta, where supply is severely limited by contamination from domestic waste, chemical pollutants, industrial activities, and septic tank leakage. This study compares the performance of nine machine learning algorithms for classifying groundwater potability from physical parameters. Real-time data were collected from three administrative regions in Jakarta using Internet of Things (IoT) sensors that monitored pH, temperature, total dissolved solids (TDS), and turbidity. Model evaluation involved hyperparameter tuning, cross-validation, feature importance analysis, LIME interpretation, and performance metrics including AUC, accuracy, precision, recall, and F1-score. The results indicate that CatBoost achieved the highest overall performance (AUC: 0.9448, accuracy: 0.9318, F1-score: 0.9209). LightGBM was competitive, with an F1-score of 0.9211 and an AUC of 0.9431, while XGBoost recorded the highest recall at 0.9359. Random Forest and AdaBoost also performed consistently, with a precision of 0.9094 and a recall of 0.9327, respectively. In contrast, Support Vector Machine (SVM) yielded the lowest performance (AUC: 0.8860, accuracy: 0.8499). Based on this comprehensive evaluation, the CatBoost model is recommended as the most suitable model for IoT-based groundwater quality classification systems.
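The performance metrics named in the abstract (AUC, accuracy, precision, recall, F1-score) can be illustrated with a minimal Python sketch. This is not the authors' code: the labels, scores, and the 0.5 decision threshold below are made-up illustrative values, and the function names are hypothetical. AUC is computed here through its pairwise-ranking interpretation, which is one of several equivalent ways to obtain it.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = potable, 0 = not potable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative (made-up) ground truth and model scores for eight water samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Threshold-based metrics (accuracy, precision, recall, F1) depend on the chosen cut-off, whereas AUC depends only on how the scores rank the samples, which is one reason a model can lead on AUC while another leads on F1 or recall.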

Published

2025-11-08

How to Cite

Tarihoran, D. A., & Santoso, H. (2025). Comparative Analysis of Machine Learning Algorithms for Groundwater Potability Classification in Jakarta. JUITA: Jurnal Informatika, 13(3), 371–381. https://doi.org/10.30595/juita.v13i3.27348
