Simulation Study to Identify Factors Affecting the Performance of LSTM and XGBoost for Anomaly Detection on Labeled Time Series Data

Authors

  • Muhammad Rizky Nurhambali, IPB University
  • Yenni Angraini, IPB University
  • Anwar Fitrianto, IPB University

DOI:

https://doi.org/10.30595/juita.v13i2.26604

Keywords:

Anomaly Detection, Forecasting, LSTM, Time Series, XGBoost

Abstract

Time series analysis has evolved to include both forecasting and anomaly detection, which are applied in many fields. Machine learning methods such as long short-term memory (LSTM) and extreme gradient boosting (XGBoost) have been widely developed because they are considered superior to conventional methods. Both detect anomalies through a forecasting approach. However, the factors that may limit their anomaly detection performance, such as data length, labeling method, and number of anomalies, have not been explored. This study therefore aims to identify the factors that affect the performance of LSTM and XGBoost in forecasting and anomaly detection across various scenarios and to compare their evaluation metrics. The study uses Jakarta's air quality index data for 2018–2023, which were preprocessed and augmented for simulation purposes. The results show that LSTM outperforms XGBoost, with a lower MAPE (14.7024%), a lower RMSE (13.9909), and a higher balanced accuracy (0.9935). A significant Mann-Whitney test between the two methods reinforces this result, indicating a difference in their accuracy. In addition, the Kruskal-Wallis test was significant for each combination of method and treatment, indicating that data length, labeling method, and number of anomalies affect each method's accuracy.
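The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy series, the function names, and the fixed residual threshold used to flag anomalies are all assumptions made for illustration, and the abstract does not state which decision rule or labeling method the study applies to the forecast errors.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error in percent (assumes no zero actual values)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def flag_anomalies(actual, predicted, threshold):
    """Flag a point as anomalous (1) when its absolute forecast error exceeds
    the threshold. The fixed threshold is a hypothetical rule for illustration."""
    return [1 if abs(a - p) > threshold else 0 for a, p in zip(actual, predicted)]


def balanced_accuracy(true_labels, pred_labels):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy data (invented for illustration, not taken from the study).
actual = [50, 52, 51, 120, 53, 49]      # observed index values
predicted = [51, 51, 52, 54, 52, 50]    # forecasts from any model
labels = [0, 0, 0, 1, 0, 0]             # 1 marks a labeled anomaly

flags = flag_anomalies(actual, predicted, threshold=10)
print("MAPE (%):", mape(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("Balanced accuracy:", balanced_accuracy(labels, flags))
```

Balanced accuracy averages the detection rate on anomalous and on normal points, so it is less flattering than plain accuracy when anomalies are rare, which is typically the case in labeled time series.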

Published

2025-08-04

How to Cite

Nurhambali, M. R., Angraini, Y., & Fitrianto, A. (2025). Simulation Study to Identify Factors Affecting the Performance of LSTM and XGBoost for Anomaly Detection on Labeled Time Series Data. JUITA: Jurnal Informatika, 13(2), 219–228. https://doi.org/10.30595/juita.v13i2.26604

Section

Articles
