Improving Stroke Detection with Hybrid Sampling and Cascade Generalization

Widya Putri Nurmawati, Indahwati Indahwati, Farit Mochamad Afendi

Abstract


The prevalence of stroke in Indonesia has increased. One survey that records the health conditions of the Indonesian population is the Indonesian Family Life Survey (IFLS). In IFLS5, the proportions of stroke and non-stroke respondents are extremely imbalanced; this research therefore addresses the imbalance with SMOTE, SMOTE-Tomek Link, and SMOTE-ENN, and then classifies the balanced dataset with ensemble and cascade approaches to improve the detection of stroke risk and to identify the important variables. However, stroke respondents remained difficult to classify after the class imbalance was handled, presumably because of the large amount of data before and after balancing. The solution is to balance only a portion of the training data, testing various percentages. The results showed that the best setting was to balance 5% of the training data with SMOTE-ENN, and that the ensemble method with the cascade approach increased both sensitivity and balanced accuracy. Combining random forest and logistic regression produced the best performance, with a classification tree as the final model. The important variables obtained from this combination are the probabilities added by the random forest and logistic regression models, history of hypertension, age, and physical activity.
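The abstract relies on three technical pieces: SMOTE-ENN hybrid sampling, the cascade step that passes base-model probabilities to a final classification tree, and sensitivity and balanced accuracy as evaluation metrics. The following is a minimal, dependency-free Python sketch of those pieces under stated assumptions; it is not the authors' implementation. The function names (`smote`, `enn`, `smote_enn`, `cascade_features`, `sensitivity_and_balanced_accuracy`), the defaults (k = 5 for SMOTE, k = 3 for ENN), the use of Euclidean distance, and the assumption of a binary label with numerically encoded features are all illustrative choices. Library implementations such as `SMOTEENN` in imbalanced-learn would normally be used instead.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic minority samples.

    Each new sample lies at a random point on the segment between a random
    minority sample and one of its k nearest minority neighbours
    (Euclidean distance; requires at least two minority samples).
    """
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours (Wilson's rule): drop every sample whose
    label differs from the majority label of its k nearest neighbours."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        majority = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority == yi:
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


def smote_enn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Hybrid sampling for a binary label: oversample the minority class with
    SMOTE until both classes are the same size, then clean both with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new = smote(minority, max(n_majority - len(minority), 0), k_smote, rng)
    return enn(list(X) + new, list(y) + [minority_label] * len(new), k_enn)


def cascade_features(X, base_models):
    """Cascade generalization step: append each base model's predicted
    positive-class probability to the original features. `base_models` are
    hypothetical callables mapping a feature vector to a probability."""
    return [tuple(x) + tuple(model(x) for model in base_models) for x in X]


def sensitivity_and_balanced_accuracy(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN);
    balanced accuracy = (sensitivity + specificity) / 2."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, (sensitivity + specificity) / 2
```

In the setting the abstract describes, `smote_enn` would be applied only to a sampled fraction (for example 5%) of the training rows; the random forest and logistic regression would presumably be fitted on that balanced sample, their probability functions passed to `cascade_features`, and the final classification tree trained on the extended features, while the test set would normally stay unbalanced. The brute-force neighbour search is O(n²), so this sketch only suits small samples.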


Keywords


Ensemble; IFLS; Imbalanced; Cascade; Stroke

DOI: 10.30595/juita.v12i1.19386

This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2579-8901