Fraud Detection Using Random Forest Classifier, Logistic Regression, and Gradient Boosting Classifier Algorithms on Credit Cards

Muhamad Sopiyan; Fauziah Fauziah; Yunan Fauzi Wijaya

doi:10.30595/juita.v10i1.12050

Authors

Muhamad Sopiyan Universitas Nasional, Indonesia
Fauziah Fauziah Universitas Nasional, Indonesia
Yunan Fauzi Wijaya Universitas Nasional, Indonesia

Keywords:

Data Mining, Fraud Detection, Gradient Boosting Classifier (GBC), Logistic Regression (LGR), Random Forest Classifier (RFC).

Abstract

This study uses a Kaggle dataset of 284,807 credit card transactions made by European cardholders over two days. The dataset is highly imbalanced: only 492 transactions (0.172% of the 284,807) are fraudulent. The purpose of this study is to identify the best-performing model and then simulate its use in electronically detecting unauthorized financial transactions in bank payment systems. The classes are split roughly 99.80% for the majority class and 0.2% for the minority class. This class imbalance is addressed by oversampling the minority class with the Synthetic Minority Oversampling Technique (SMOTE). To determine the most appropriate and accurate classifier for the class imbalance problem, the Random Forest Classifier (RFC), Logistic Regression (LGR), and Gradient Boosting Classifier (GBC) algorithms were compared. The results show that RFC outperforms the other algorithms: it achieves the highest accuracy, 100% on the training data and 99.99% on the test data, with an AUC score of 0.9999.
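
The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python illustration of the core SMOTE idea, in which each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    This is an illustrative sketch, not the implementation used in the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: 3 "fraud" rows against 12 "legitimate" rows.
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
legit = [[float(x), float(x % 3)] for x in range(12)]

# Oversample the minority class until both classes have the same size.
new_fraud = smote(fraud, n_new=len(legit) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))  # 12 12
```

In practice, oversampling of this kind is usually applied to the training split only, so that the test set keeps the original class distribution; whether the study followed that protocol is not stated in the abstract.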

Author Biographies

Muhamad Sopiyan, Universitas Nasional, Indonesia

Informatics, Faculty of Communication and Information Technology

Fauziah Fauziah, Universitas Nasional, Indonesia

Informatics, Faculty of Communication and Information Technology

Yunan Fauzi Wijaya, Universitas Nasional, Indonesia

Informatics, Faculty of Communication and Information Technology
