Comparative Study of Predictive Classification Models on Data with Severely Imbalanced Predictors

Embay Rohaeti, Ani Andriyati

Abstract


Analysing pre-COVID-19 unemployment in West Java is vital for understanding and addressing Indonesia’s economic challenges. The region has a high unemployment rate, and its pre-pandemic unemployment patterns have become more relevant during the country’s post-pandemic recovery. This study evaluates four machine learning models (Random Forest, Linear SVM, RBF SVM, and Polynomial SVM) for classifying employment status from demographic and job-related variables. The objective is to identify the most suitable model given the imbalanced nature of the case-study data. The data come from the August 2019 National Labor Force Survey (SAKERNAS) and comprise 54,429 respondents across the districts of West Java. The four models are evaluated using stratified holdout validation with a 70:30 split, repeated 100 times. Results indicate that the random forest model outperforms the others in balanced accuracy, F1-score, and computational time. The random forest model also highlights the importance of gender and age in classifying employment status in West Java, suggesting a need for targeted intervention, especially for female citizens and individuals in productive age groups.
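The evaluation protocol described above (four classifiers, a stratified 70:30 holdout repeated many times, balanced accuracy and F1-score, plus fitting time) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses a synthetic imbalanced dataset in place of the SAKERNAS data, treats label 1 as the minority class for F1, and relies on default hyperparameters. The function name `repeated_holdout`, the model labels, and the reduced number of repeats are hypothetical choices made for brevity.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def repeated_holdout(X, y, models, n_repeats=100, test_size=0.30, seed=0):
    """Average balanced accuracy, F1, and total fit time over repeated stratified holdouts."""
    splitter = StratifiedShuffleSplit(
        n_splits=n_repeats, test_size=test_size, random_state=seed
    )
    raw = {name: {"balanced_accuracy": [], "f1": [], "fit_seconds": 0.0}
           for name in models}
    for train_idx, test_idx in splitter.split(X, y):
        for name, model in models.items():
            start = time.perf_counter()
            model.fit(X[train_idx], y[train_idx])  # refits from scratch each repeat
            raw[name]["fit_seconds"] += time.perf_counter() - start
            pred = model.predict(X[test_idx])
            raw[name]["balanced_accuracy"].append(
                balanced_accuracy_score(y[test_idx], pred)
            )
            # Assumption: label 1 is the minority (positive) class.
            raw[name]["f1"].append(f1_score(y[test_idx], pred, zero_division=0))
    return {
        name: {
            "balanced_accuracy": float(np.mean(r["balanced_accuracy"])),
            "f1": float(np.mean(r["f1"])),
            "fit_seconds": r["fit_seconds"],
        }
        for name, r in raw.items()
    }


# Synthetic stand-in for the survey data: roughly 90% majority, 10% minority.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Hypothetical model labels; the SVMs are scaled first, as is common practice.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "linear_svm": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "rbf_svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "poly_svm": make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3)),
}

# Five repeats keep the demonstration quick; the study reports 100.
results = repeated_holdout(X, y, models, n_repeats=5)
for name, scores in results.items():
    print(name, scores)
```

Balanced accuracy is the mean of per-class recall, so a model that always predicts the majority class scores about 0.5 rather than the inflated value plain accuracy would give, which is why it suits imbalanced data of this kind.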


Keywords


Unemployment, Random Forest, Linear SVM, RBF SVM, Polynomial SVM

DOI: 10.30595/juita.v12i1.21491

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2579-8901