Year 16, Issue 3 (May - June 2022) | Iran J Med Microbiol 2022, 16(3): 221-232

Talkhi N, Akhavan Fatemi N, Jabbari Nooghabi M. Revealing Behavior Patterns of SARS-CoV-2 using Clustering Analysis and XGBoost Error Forecasting Models. Iran J Med Microbiol. 2022;16(3):221-232.
1- Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
2- Department of Statistics, Ferdowsi University of Mashhad, Mashhad, Iran
3- Department of Statistics, Ferdowsi University of Mashhad, Mashhad, Iran
Abstract:

Background and Objective: COVID-19 is a highly contagious infectious disease that has disrupted people's daily lives and raised serious concern among governments and public health officials. Forecasting its future behavior may help in allocating medical resources and defining effective disease-control strategies.
Methods: The collected data were the cumulative and absolute numbers of confirmed cases, deaths, and recovered cases of COVID-19 from February 20 to July 03, 2021. Hierarchical cluster analysis was used to identify similar behavior patterns. To forecast the future behavior of COVID-19, the following models were used: Auto-Regressive Integrated Moving Average (ARIMA), Exponential Smoothing (ETS), Automatic Forecasting Procedure (Prophet), Naive, Seasonal Naive (s-Naive), boosted ARIMA, and boosted Prophet.
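As an illustration of the clustering step, the following is a minimal sketch of agglomerative hierarchical clustering on case-count series. The abstract does not state the distance measure, the linkage rule, or any preprocessing, so z-score standardization, Euclidean distance, and average linkage are assumptions made here. The series and the `region_*` labels are synthetic and hypothetical; they are not data from the study.

```python
import math

def zscore(series):
    """Standardize a series so clustering compares shape rather than scale.
    Assumes the series is not constant (standard deviation > 0)."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / sd for x in series]

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(series_by_name, n_clusters):
    """Average-linkage agglomerative clustering.
    Starts with one cluster per series and repeatedly merges the two
    closest clusters until n_clusters remain."""
    points = {name: zscore(s) for name, s in series_by_name.items()}
    clusters = [[name] for name in points]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [euclidean(points[a], points[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Synthetic toy series (hypothetical labels, not study data):
# two rising curves and two falling curves.
toy = {
    "region_a": [1, 2, 4, 8, 16, 32],
    "region_b": [2, 4, 8, 16, 32, 64],
    "region_c": [32, 16, 8, 4, 2, 1],
    "region_d": [60, 30, 16, 8, 4, 2],
}
groups = agglomerative(toy, n_clusters=2)
print(groups)
```

On this toy input the two rising series end up in one cluster and the two falling series in the other. Standardizing first is a design choice: without it, series would be grouped mainly by magnitude rather than by the shape of the curve.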
Results: Clustering showed that the coronavirus behaved similarly in Iran and in other countries, including France, Russia, Turkey, the United Kingdom (UK), Argentina, Colombia, Italy, Spain, Germany, Poland, Mexico, and Indonesia. It also revealed similar patterns of SARS-CoV-2 for the same countries in six groups. The XGBoost family of models had higher accuracy than the other models.
Conclusion: In Iran, COVID-19 showed behavior patterns similar to those in the developed countries studied. The XGBoost family of models gave practical results and high precision in forecasting the behavior patterns of the virus. Given the rapid spread of the virus worldwide, these models can be used to forecast the behavior patterns of SARS-CoV-2. Preventing the spread of the coronavirus, controlling the disease, and breaking its chain of transmission require community assistance, and in this mission the role of statisticians cannot be neglected.
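The "boosted" models named in the abstract pair a base forecaster with a second model fitted to the base model's errors. The sketch below shows only that two-stage idea and is not the authors' implementation: a seasonal naive forecaster stands in for ARIMA or Prophet, and a mean-error correction stands in for XGBoost. The weekly season length, the function names, and the series are all assumptions; the data are synthetic.

```python
import math

SEASON = 7  # assumed weekly seasonality for daily counts

def seasonal_naive(history, horizon, season=SEASON):
    """Base forecaster: repeat the last observed season."""
    last = history[-season:]
    return [last[h % season] for h in range(horizon)]

def fit_error_model(history, season=SEASON):
    """Stand-in error learner: the mean in-sample error of the base
    forecaster. The study's models use XGBoost at this stage; a constant
    correction is a deliberately simple substitute."""
    errors = [history[t] - history[t - season]
              for t in range(season, len(history))]
    return sum(errors) / len(errors)

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic series: linear trend plus a weekly pattern (not COVID-19 data).
weekly = [0, 3, 5, 4, 2, -1, -4]
series = [10 + 2 * t + weekly[t % SEASON] for t in range(35)]
train, test = series[:28], series[28:]

# Stage 1: base forecast. Stage 2: add the predicted error back.
base = seasonal_naive(train, horizon=len(test))
correction = fit_error_model(train)
boosted = [b + correction for b in base]

base_rmse = rmse(test, base)
boosted_rmse = rmse(test, boosted)
print(base_rmse, boosted_rmse)
```

Because the toy series has a noise-free trend, the base forecast lags by a constant amount and the error model removes that lag completely. Real case counts would not behave this cleanly, which is the motivation for a flexible error learner such as XGBoost.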

Type of Study: Original Research Article | Subject: Deep learning
Received: 2021/07/18 | Accepted: 2022/01/30 | ePublished: 2022/03/20



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2022 CC BY-NC 4.0 | Iranian Journal of Medical Microbiology
