Characteristics of Machine Learning-based Univariate Time Series Imputation Method
DOI:
https://doi.org/10.30595/juita.v12i2.23453

Keywords:
characteristics, imputation, machine learning, missing data, univariate time series

Abstract
Handling missing values in univariate time series analysis poses a challenge and can lead to inaccurate conclusions, especially when consecutive missing values occur frequently. Machine Learning-based Univariate Time Series Imputation (MLBUI) methods, which employ Random Forest Regression (RFR) and Support Vector Regression (SVR), aim to address this challenge. Taking into account time series patterns, missing-data patterns, and the amount of missing data, this study evaluates the performance of MLBUI on simulated Autoregressive Integrated Moving Average (ARIMA) datasets. Three missing-data proportions (6%, 10%, and 14%) and eight model scenarios were examined: Autoregressive models AR(1) and AR(2); Moving Average models MA(1) and MA(2); Autoregressive Moving Average models ARMA(1,1) and ARMA(2,2); and Autoregressive Integrated Moving Average models ARIMA(1,1,1) and ARIMA(1,2,1), each with standard deviations of 0.5, 1, and 2. Five comparative methods were also used: Kalman StructTS, Kalman Auto-ARIMA, Spline Interpolation, Stine Interpolation, and Moving Average. The findings indicate that MLBUI performs exceptionally well in imputing consecutive missing values, achieving a MAPE below 10% across all scenarios.
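To illustrate the kind of experimental setup summarized above, the sketch below simulates an AR(1) series, removes a block of consecutive observations, imputes the gap with Random Forest Regression and Support Vector Regression trained on lagged features, and scores the result with MAPE. This is a minimal sketch, not the authors' implementation: the lag order, gap position, level offset, and model hyperparameters are illustrative assumptions, and statsmodels and scikit-learn are used only as convenient stand-ins.

```python
# Minimal sketch (not the authors' implementation): simulate an AR(1) series,
# create a block of consecutive missing values, impute it with RFR and SVR
# trained on lagged features, and evaluate the imputation with MAPE.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_percentage_error

np.random.seed(42)
N_LAGS = 5  # assumed lag order for the feature window

# Simulate an AR(1) process x_t = 0.7 * x_{t-1} + e_t with sigma = 1 (assumed values);
# the +10 offset keeps the series away from zero so MAPE stays well defined.
ar_process = ArmaProcess(ar=np.r_[1, -0.7], ma=np.r_[1])
series = ar_process.generate_sample(nsample=500, scale=1.0) + 10.0

# Introduce ~6% consecutive missing values at an arbitrary position.
gap_start, gap_len = 300, 30
true_gap = series[gap_start:gap_start + gap_len].copy()
observed = series.copy()
observed[gap_start:gap_start + gap_len] = np.nan

def make_lagged(data, n_lags):
    """Build (X, y) pairs: X holds the previous n_lags values, y the current value."""
    X, y = [], []
    for t in range(n_lags, len(data)):
        window = data[t - n_lags:t]
        if not np.isnan(window).any() and not np.isnan(data[t]):
            X.append(window)
            y.append(data[t])
    return np.array(X), np.array(y)

X_train, y_train = make_lagged(observed, N_LAGS)

for name, model in [("RFR", RandomForestRegressor(n_estimators=200, random_state=0)),
                    ("SVR", SVR(kernel="rbf", C=10.0))]:
    model.fit(X_train, y_train)
    filled = observed.copy()
    # Impute the gap recursively: each prediction becomes part of the next lag window.
    for t in range(gap_start, gap_start + gap_len):
        filled[t] = model.predict(filled[t - N_LAGS:t].reshape(1, -1))[0]
    mape = mean_absolute_percentage_error(true_gap, filled[gap_start:gap_start + gap_len])
    print(f"{name}: MAPE on the gap = {mape:.2%}")
```

The recursive prediction loop mimics the consecutive-gap setting studied in the paper, where each imputed point must rely on earlier imputed points rather than on observed neighbors.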
License
Copyright (c) 2024 JUITA: Jurnal Informatika

This work is licensed under a Creative Commons Attribution 4.0 International License.
