A Blending Ensemble Approach to Predicting Student Dropout in Massive Open Online Courses (MOOCs)

Authors

  • Muhammad Ricky Perdana Putra, Universitas Amikom Yogyakarta
  • Ema Utami, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.30595/juita.v13i1.24061

Keywords:

blending ensemble learning, MOOC, prediction, dropout, SMOTE

Abstract

A major problem in the implementation of Massive Open Online Courses (MOOCs) is the high dropout (DO) rate, which reaches 90% and exceeds the dropout rate of formal schools. Preventive action is needed to minimize the impact on MOOC providers, instructors, and students. One solution is machine learning (ML) based prediction. However, ML prediction performance is often insufficiently accurate, and it can be improved through blending ensemble learning (BEL). This research builds a two-layer BEL model: a base layer with K-Nearest Neighbors (KNN), Decision Tree, and Naïve Bayes algorithms, and a meta layer with XGBoost. The dataset, from KDD Cup 2015, contains clickstream data from the XuetangX platform. The pre-processing stage includes selecting the course with the most participants, normalization, SMOTE oversampling, feature selection, and splitting the data into three parts: ensemble, blender, and test sets. Evaluation of the BEL model yields an accuracy of 90.16%, precision of 85.64%, recall of 97.31%, F1-score of 91.10%, and AUC of 92.83%.
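The two-layer blending scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses synthetic data in place of the KDD Cup 2015 clickstream features, scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost, and omits the normalization, SMOTE, and feature-selection steps. The three-way split (ensemble / blender / test) and the base-to-meta handoff mirror the structure in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the clickstream-derived feature matrix.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.3, 0.7], random_state=42)

# Three-way split: ensemble data trains the base models,
# blender data trains the meta model, test data evaluates the whole stack.
X_ens, X_rest, y_ens, y_rest = train_test_split(X, y, test_size=0.4,
                                                random_state=42)
X_blend, X_test, y_blend, y_test = train_test_split(X_rest, y_rest,
                                                    test_size=0.5,
                                                    random_state=42)

# Layer 1: the three base models named in the abstract.
base_models = [KNeighborsClassifier(),
               DecisionTreeClassifier(random_state=42),
               GaussianNB()]
for model in base_models:
    model.fit(X_ens, y_ens)

def meta_features(models, X):
    # Each base model contributes one column: its predicted dropout probability.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Layer 2: the meta model learns from the base models' predictions.
meta_model = GradientBoostingClassifier(random_state=42)
meta_model.fit(meta_features(base_models, X_blend), y_blend)

y_pred = meta_model.predict(meta_features(base_models, X_test))
print(f"Blending accuracy on held-out test data: {accuracy_score(y_test, y_pred):.4f}")
```

Unlike stacking with cross-validated out-of-fold predictions, blending holds out a dedicated blender split, which keeps the meta model's training inputs disjoint from the base models' training data at the cost of using less data per layer.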

Author Biographies

Muhammad Ricky Perdana Putra, Universitas Amikom Yogyakarta

Master of Informatics

Ema Utami, Universitas Amikom Yogyakarta

Master of Informatics

Published

2025-03-18

How to Cite

Putra, M. R. P., & Utami, E. (2025). A Blending Ensemble Approach to Predicting Student Dropout in Massive Open Online Courses (MOOCs). JUITA: Jurnal Informatika, 13(1), 11–18. https://doi.org/10.30595/juita.v13i1.24061
