Applied Machine Learning and Multi-Criteria Decision-Making in Healthcare

Comparison of Forecasting Models in the HIV Epidemiology Using Machine Learning Methods

Author(s): Önder Yakut*, Murat Sayan and Emine Doğru Bolat

Pp: 98-123 (26)

DOI: 10.2174/9781681088716121010009



The biggest problem in analyzing the dynamics of the human immunodeficiency virus (HIV) epidemic is uncertainty when planning for the future. Predicting what might happen makes the outcomes of decisions more realistic and gives policymakers the opportunity to take precautions against negative changes. Machine learning methods that produce accurate forecasts are therefore needed to plan future policies, mitigate adverse developments, and support decision-making under uncertainty. This study explains the theory behind seven machine learning models used for time-series analysis in medicine: Linear Regression, RepTree, Alternating Model Trees, M5, k Nearest Neighbor (kNN), Autoregressive Integrated Moving Average (ARIMA), and Random Forest. The time series of HIV epidemic dynamics in Turkey was first made stationary, with its correlation structure taken into account. It was then smoothed using the Moving Average technique and divided into a training set (2/3) and a test set (1/3). The models were trained and tested on these sets, and their parameters were optimized. The models were then used to forecast the dynamics of the HIV epidemic in Turkey for the three years from 2019-Q4 to 2022-Q3. Random Forest produced the lowest error, measured as Mean Absolute Percentage Error (MAPE), among the seven models. For the Random Forest model, R² (the coefficient of determination) was 82.16%, E (efficiency) was 0.6268, the slope was 2.3362, and MAPE was 5.4132%. The Random Forest model was observed to give excellent results for the three-year forecast of the dynamics of the HIV epidemic in Turkey.
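The preprocessing and evaluation steps named in the abstract (Moving Average smoothing, a 2/3 to 1/3 split, and MAPE) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names, the trailing window of 3, and the sample numbers are hypothetical, and the abstract does not specify the window length or how the split point is rounded.

```python
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], window: int = 3) -> List[float]:
    """Trailing moving average (window length is an assumption).

    Returns len(series) - window + 1 smoothed points.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def chronological_split(
    series: Sequence[float], train_fraction: float = 2 / 3
) -> Tuple[List[float], List[float]]:
    """Split in time order: earliest points for training, latest for testing."""
    cut = round(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is 0, so that case raises an error.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical quarterly counts, invented purely for illustration.
    raw = [120, 135, 128, 150, 162, 158, 171, 180, 176, 190, 204]
    smoothed = moving_average(raw, window=3)
    train, test = chronological_split(smoothed)
    # Naive baseline: repeat the last training value over the test horizon.
    baseline = [train[-1]] * len(test)
    print(f"train={len(train)} test={len(test)} MAPE={mape(test, baseline):.2f}%")
```

The split is chronological rather than random because shuffling a time series would let a model see future values during training.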

Keywords: Akaike Information Criterion (AIC), Alternating Model Trees, ARIMA, Autocorrelation Function (ACF), Bayes Information Criterion (BIC), Chi-square, Efficiency, HIV Epidemiology, kNN, Ljung-Box Q-statistic (LBQ).

© 2023 Bentham Science Publishers