
Artificial Intelligence In Medicine 119 (2021) 102157

Contents lists available at ScienceDirect

Artificial Intelligence In Medicine


journal homepage: www.elsevier.com/locate/artmed

Dengue models based on machine learning techniques: A systematic literature review
William Hoyos (a, b, *), Jose Aguilar (b, c, d), Mauricio Toro (b)
(a) Grupo de Investigaciones Microbiológicas y Biomédicas de Córdoba, Universidad de Córdoba, Montería, Colombia
(b) Grupo de Investigación en I+D+i en TIC, Universidad EAFIT, Medellín, Colombia
(c) Centro de Estudios en Microelectrónica y Sistemas Distribuidos, Universidad de Los Andes, Mérida, Venezuela
(d) Universidad de Alcalá, Depto. de Automática, Alcalá de Henares, Spain

ARTICLE INFO

Keywords: Dengue; Diagnostic model; Epidemic model; Intervention model; Machine learning

ABSTRACT

Background: Dengue modeling is a research topic that has grown in recent years. Early prediction and decision-making are key factors in controlling dengue. This Systematic Literature Review (SLR) analyzes three approaches to modeling dengue: diagnostic, epidemic and intervention. These approaches require prediction, prescription and optimization models. This SLR establishes the state of the art in dengue modeling with machine learning in recent years.
Methods: Several databases were searched for articles, and the selection followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology. Sixty-four articles were obtained and analyzed to describe their strengths and limitations. Finally, challenges and opportunities for research on machine learning for dengue modeling were identified.
Results: Logistic regression was the most used modeling approach for the diagnosis of dengue (59.1%). For the epidemic approach, linear regression (17.4%) was the most used technique within spatial analysis. Finally, the most used intervention modeling technique was the General Linear Model (70%).
Conclusions: We conclude that cause-effect models may improve the diagnosis and understanding of dengue. Models that manage uncertainty can also be helpful, given the low quality of healthcare data. Finally, decentralizing data through federated learning may decrease computational costs and allow models to be built without compromising data security.

Abbreviations: DENV, dengue virus; WHO, World Health Organization; SD, severe dengue; DSS, dengue shock syndrome; SLR, systematic literature review; PRISMA, preferred reporting items for systematic reviews and meta-analyses; LoR, logistic regression; ANN, artificial neural networks; MLP, multilayer perceptron; MCS, modified cuckoo search algorithm; PSO, particle swarm optimization; SE, structural equations; AUC, area under the curve; OR, odds ratio; CI, confidence intervals; RF, random forest; PAF, platelet activating factor; S1P, sphingosine 1-phosphate; IL-1β, interleukin 1 beta; TNFα, tumor necrosis factor alpha; IL-10, interleukin 10; FL, fuzzy logic; VEGF, vascular endothelial growth factor; SVM, support vector machines; DNA, deoxyribonucleic acid; DT, decision trees; APRI, aspartate aminotransferase/platelet count ratio index; LASSO, adaptive least absolute shrinkage and selection operator; GAM, generalized additive model; CART, classification and regression trees; GBM, gradient boosting machines; BRT, boosted regression trees; GWR, geographically weighted regression; SOM, self-organizing maps; SARIMAX, seasonal autoregressive integrated moving average with exogenous variables; MAE, mean absolute error; LiR, linear regression; BI, Breteau index; HI, house index; CoI, container index; AI, adult index; KNN, k-nearest neighbors; MSE, mean square error; GT, Google Trends; DBSI, dengue Baidu search index; RMSE, root mean square error; GLM, generalized linear model; ALT, alanine aminotransferase; BTS, Bayesian time series; CRF, climate risk factor index; SS, stochastic simulation.
* Corresponding author at: Grupo de Investigaciones Microbiológicas y Biomédicas de Córdoba, Universidad de Córdoba, Montería, Colombia.
E-mail address: whoyos@correo.unicordoba.edu.co (W. Hoyos).

https://doi.org/10.1016/j.artmed.2021.102157
Received 19 November 2020; Received in revised form 8 May 2021; Accepted 17 August 2021
Available online 24 August 2021
0933-3657/© 2021 Elsevier B.V. All rights reserved.

1. Introduction

Dengue is a vector-borne disease of great importance to public health [1]. It is widely distributed worldwide, especially in tropical and subtropical areas [2]. The disease is caused by an arbovirus of the same name, the dengue virus (DENV). To date, four virus serotypes have been identified: DENV-1, DENV-2, DENV-3 and DENV-4 [3]. The infection is transmitted to humans by the bite of mosquitoes of the genus Aedes, mainly A. aegypti and A. albopictus [4].
In 1997, the World Health Organization (WHO) classified the disease as dengue fever and dengue hemorrhagic fever [5]. A new classification, proposed in 2009, was based on the severity level of the
disease: non-severe dengue (with or without warning signs) and severe dengue (SD). The latter includes dengue shock syndrome (DSS) [6]. According to the WHO, more than 350 million dengue virus infections occur annually worldwide, and 20,000 dengue-related deaths occur over the same period [7].
Dengue has been the subject of various studies worldwide. Its high prevalence in tropical and subtropical regions of the world has generated interest in its diagnosis, treatment and control. Several systematic literature reviews (SLRs) have been carried out on dengue. Most of them have focused on the evaluation of molecules for vaccine development, control of transmission, epidemiology and the development of rapid-detection tests. In what follows, we briefly describe previous SLRs.
Several SLRs have described the epidemiology of dengue. Jing and Wang [8] described the epidemiology of dengue according to its geographical and temporal distribution. In addition, Jing and Wang evaluated risk factors for the transmission and control of dengue. Alhaeli et al. [9] reviewed the epidemiology of dengue in Saudi Arabia, where environmental conditions are extreme. Other reviews on the epidemiology of dengue have been carried out in different countries, such as Pakistan [10], Thailand [11], Malaysia [12], the Philippines [13], Mexico [14] and Brazil [15]. Finally, Villar et al. [1] conducted an SLR of the epidemiological trends of dengue in Colombia over 12 years (2000-2011).
Another group of SLRs has focused on the production of rapid-detection tests and vaccines against the virus. For instance, Lim et al. [16] and Luo et al. [17] conducted reviews and meta-analyses to assess the economic impact of rapid-screening tests. Reviews have also been conducted to identify the latest economic studies of dengue vaccination [18,19]. Regarding vaccine development, the immunogenicity, safety and efficacy of dengue vaccines have been evaluated [20,21,22].
In recent years, with the emergence of machine learning and the increase in data generation, computational methods have been developed to predict and evaluate disease-transmission dynamics. This has motivated SLRs on the subject that survey the latest developments and opportunities in this domain. For example, Louis et al. [23] conducted an SLR of dengue to identify the main approaches to modeling disease risk. Another SLR on computational methods was conducted by Naish et al. [24], focusing on quantitative modeling with respect to climate change. Andraud et al. [25] reviewed deterministic models of dengue transmission to identify features for future models. Finally, Lourenço et al. [26] published a review of the challenges in dengue research from a computational perspective. The authors focused on real-time data collection, genetic analysis and integrative modeling approaches. In particular, integrative modeling approaches simulate the epidemiology and molecular evolution of the virus.
We present a review of three modeling approaches to dengue: diagnostic, epidemic and intervention. The goal is to present the development of machine learning models in these contexts. The first approach determines whether a patient has dengue or any of its variants. The second analyzes dengue epidemics at the population level, including morbidity and mortality rates. The third analyzes the impact of interventions to mitigate dengue epidemics. To date, no SLR has studied these three aspects of the disease together. In addition, this is the first SLR to focus on models that evaluate the impact of interventions to mitigate dengue epidemics. Finally, this SLR establishes the state of the art in these approaches and defines new challenges and opportunities for future research. The objectives of this SLR are:
• To collect and describe machine learning models for dengue.
• To identify challenges for future work in dengue modeling.
The present document is structured as follows: Section 2 describes the search and selection process of relevant articles; Section 3 describes the general results of the research; Section 4 discusses the papers, as well as the challenges and opportunities for research on dengue modeling for diagnosis, epidemics and interventions to control dengue. The last section presents the conclusions, with a description of the work that should be prioritized in this research domain.

2. Methodology

This review was based on the PRISMA methodology [27]. The first step is to establish research questions; the second is to define a search strategy to delimit the findings; the third is to select the papers using eligibility criteria; and, finally, the last is to analyze the articles to extract strengths, limitations and challenges to overcome. To achieve the goal of this review, three research questions were proposed:

Q1. Which machine learning models have been developed for dengue diagnosis?
Q2. Which machine learning models have been developed for the analysis of dengue epidemics?
Q3. Which machine learning models have been developed for the evaluation of dengue control strategies?

2.1. Search strategy

We used several digital libraries (databases): ScienceDirect, IEEE Xplore, Google Scholar, Emerald, Taylor & Francis and PubMed. The inclusion criteria for the selection of publications were: i) articles from January 2015 to March 2021, in the English language, related to the development and implementation of diagnostic, epidemic and intervention models of dengue; ii) articles that match the search terms that describe the research questions. The exclusion criteria were: i) articles representing the personal opinions of individual experts; ii) conference papers, posters, abstracts, short articles and unpublished works; and iii) articles using ordinary differential equation models and other deterministic approaches. Table 1 shows the search strings derived from the research questions. Search strings were structured using the logical operators “OR” and “AND”.

Table 1
Search strings used for each research question.
Question | Approach | Search string
Q1 | Diagnostic | [(diagnosis OR diagnostic OR infection) AND (gender OR age OR phenotype OR race OR “clinical profile”) AND model AND dengue]
Q2 | Epidemic | [(epidemic OR outbreak) AND (predictive OR predicting OR prediction) AND model AND (dengue OR aedes)]
Q3 | Intervention | [(fumigation OR vaccine OR “biologic control” OR “decision making” OR intervention OR prescriptive) AND model AND (dengue OR aedes)]

2.2. Selection procedure

The selection procedure was carried out in three stages using the inclusion and exclusion criteria mentioned above: i) we chose articles by evaluating their title and keywords to exclude any non-relevant work, and we removed duplicate papers; ii) we examined the abstracts of the candidate papers from Stage 1 and evaluated each paper to decide whether it should proceed to the next stage; iii) we evaluated the full texts of the papers selected in Stage 2 to exclude papers that did not meet the criteria. Fig. 1 shows the flowchart of the selection process.
A total of 19,327 papers were retrieved from the scientific libraries. After reviewing the title and keywords, and removing duplicates (Stage 1), 418 papers were selected. Stage 2 consisted of abstract review, which allowed the selection of 203 papers. Finally, 64 articles met all the eligibility criteria (Stage 3), of which 27 were about diagnostic modeling, 29 about epidemic models and 8 about prescriptive
(intervention) modeling.

Fig. 1. Flowchart of the selection process.

[Figure: number of reviewed papers per country, with a Frequency scale from 0 to 9; Taiwan, Vietnam and Brazil are labeled.]
Fig. 2. Worldwide distribution of the reviewed papers on dengue modeling.

2.3. Preliminary analysis

This subsection shows the preliminary results of the selected articles. Fig. 2 shows the distribution of the reviewed articles on dengue modeling around the world. The highest number of studies came from Taiwan, with 9 (14%) articles, followed by Brazil and Vietnam, with 7 (11%) articles each. It was expected that Taiwan, Brazil and Vietnam would be in the top positions: first, because they are endemic countries where more data are available; second, because they are close to the equator and contain tropical and subtropical regions. In contrast, some dengue-endemic countries have few publications on diagnostic, epidemic and
intervention modeling. Among these are Colombia, Ecuador and Venezuela. These countries may have few publications because they invest very little in science and research [28].
Many predictors (features or variables) are currently used for dengue modeling; they can be classified according to their characteristics and their method of collection. We classified them as demographic, economic, clinical, laboratory, environmental and climatic, among others. Table 2 gives a brief description of each type, with some examples. In general, the most used variables were socio-economic and demographic data, probably because they are easy to access and widely available. The combination of clinical and laboratory data was the most used for diagnostic modeling. This data type is crucial for this approach because it allows finding relationships between the data and the early detection of the disease. For the epidemic approach, the most widely used predictors were climatic, environmental and meteorological. These data types are widely used in the spatial-temporal analysis of dengue to map the distribution of the mosquito or the disease. Finally, intervention modeling focused on evaluating mosquito-control strategies; for this reason, entomological data were the most used for this purpose.
According to this review, the least used variables for dengue modeling are genomic, cellphone and thermal-imaging data. Genetic tests, which produce genomic data, are not usually performed for the diagnosis of dengue. Cellphone data are not easily acquired due to user-privacy issues, and thermal imaging requires specialized tools that are not available in clinical practice.

3. Analysis of reviewed papers

Sixty-four articles were reviewed and analyzed to find what has been developed on diagnostic, epidemic and intervention modeling of dengue.

3.1. Diagnostic models of dengue

This section analyzes articles related to diagnostic models of dengue. The analysis was organized according to aspects of the disease that are currently important: early detection, seroprevalence (used to determine populations at risk of acquiring the disease), cytokines and plasma leakage as early markers of severity, and new diagnostic methods such as Raman spectroscopy.

3.1.1. Early diagnosis of dengue

Early diagnosis of dengue could prevent complications and death. For this reason, Macedo-Hair et al. [29] analyzed the clinical profiles of 523 dengue patients. Macedo-Hair et al. used unsupervised learning to find natural clusters associated with clinical patterns in confirmed cases of dengue. The results showed that the model can classify dengue into four states (4 clusters): dengue without warning signs, dengue with warning signs, SD and an intermediate state. These clusters can be used as risk criteria to diagnose dengue. Fernandez et al. [30] presented a model based on logistic regression (LoR) to differentiate dengue from other febrile diseases. Laboratory, clinical and demographic data were useful to reveal the association of predictive variables with the risk of dengue. The results showed a strong association of dengue with the explanatory variables male sex, petechiae, skin rashes, myalgias, retro-ocular pain, positive tourniquet test and gingival bleeding. The accuracy of the model to diagnose dengue was 69.2%.
Artificial neural networks (ANN) are machine learning algorithms that describe functional dependencies between input and output variables.
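To make the idea of a functional dependency between inputs and outputs concrete, the following is a minimal sketch of the forward pass of a one-hidden-layer MLP with a single output for a binary dengue/no-dengue decision. It is an illustration only, not the model of any reviewed study: the layer sizes, the ReLU and logistic activations, the random weights and the four standardized input values are all assumptions, and training (fitting the weights to data) is omitted.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function: maps any real score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Rectified linear unit used as the (assumed) hidden-layer activation."""
    return max(0.0, z)


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron with one output unit.

    x        -- list of input features
    w_hidden -- one weight list per hidden unit, each the same length as x
    b_hidden -- one bias per hidden unit
    w_out    -- one weight per hidden unit for the output unit
    b_out    -- bias of the output unit
    Returns a value in (0, 1), read as an estimated probability of dengue.
    """
    # Each hidden unit: weighted sum of the inputs plus bias, then ReLU.
    hidden = [relu(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    # Output unit: weighted sum of hidden activations, squashed by the sigmoid.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical, untrained weights: random placeholders for illustration only.
random.seed(0)
n_inputs, n_hidden = 4, 5
w_hidden = [[random.gauss(0.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]
b_out = 0.0

# Hypothetical standardized values for four clinical/laboratory features.
x = [0.3, -1.2, 0.8, -0.5]
p = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(round(p, 3))
```

With trained weights, the output would be read as an estimated probability, and a cut-off (0.5 is a common but assumed choice) would turn it into a class label. The hidden layer's nonlinearity is what lets such a model capture dependencies that a single logistic regression cannot.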
One drawback of ANN lies in their optimization: training an ANN is a non-convex optimization problem, which may have local minima that are not global minima. To overcome this, Chatterjee et al. [31] combined an ANN (multilayer perceptron, MLP) with a modified cuckoo search algorithm (ANN-MCS) to classify healthy people, patients with dengue and patients with SD. The results showed that ANN-MCS improves model accuracy (95.65%) compared to the unmodified ANN (87.5%). Gambhir et al. [32] also used an ANN (MLP) for early prediction of dengue. The authors supplemented the ANN with the particle swarm optimization (PSO) algorithm. The combined model showed an accuracy of 87%, higher than that of the ANN without PSO (79%). A deep neural network (DNN) is an ANN with several hidden layers between the input and output layers. This kind of ANN can model more complex nonlinear relationships [33]. Ho et al. [34] used a DNN to identify laboratory-confirmed dengue cases using only four input variables (age, body temperature, leukocyte count and platelets). The model developed by Ho et al. was compared to LoR and DT, using the area under the curve (AUC) to evaluate performance. The results showed similar performance across the models (DNN = 0.86 vs. DT = 0.85 vs. LoR = 0.84), with the DNN slightly better.
Park et al. [35] developed models to classify dengue, SD and DSS using structural equation (SE) modeling. Park et al. used clinical and laboratory data for this purpose, and their models showed good performance for each disease variety (dengue: AUC = 0.84, SD: AUC = 0.67, DSS: AUC = 0.70). Finally, Khosavanna et al. [36] developed two diagnostic algorithms (LoR and DT) based on clinical symptoms of dengue patients. The models performed similarly in specificity (LoR = 0.63 vs. DT = 0.67) and sensitivity (LoR = 0.78 vs. DT = 0.77).

Table 2
Description, examples and frequency of predictors used in dengue modeling in the reviewed papers.
Type of predictor | Definition | Examples | Articles
Demographic + Social + Economic + Population | Characteristics related to the development of a population, from a quantitative perspective. | Age, sex, population, housing type, socio-economic level. | 31
Climatic + Environmental + Meteorological + Topographic | Characteristics related to climate and environment. | Temperature, rainfall, precipitation, elevation. | 29
Laboratory | Analytical determinations of metabolites in the blood that may be altered in patients with dengue. | Platelet and leukocyte count, hematocrit, albumin, transaminases. | 25
Clinical | Signs and symptoms of patients with dengue. | Blood pressure, fever, joint pain, headache, retro-ocular pain, arthralgia, myalgia. | 21
Search index and social networks | Data from the Internet. | Google Trends, Baidu Search Index, Twitter. | 7
Entomological | Data related to the biological vector and its propagation. | Breteau index, container index, house index, adult index, predation rates. | 7
Genomic | Genetic data. | Gene-expression levels. | 2
Thermal images | Images obtained from infrared cameras. | Thermograms. | 1
Cellphone | Data obtained from cellphones. | Geo-localization. | 1
Mobility | Air-passenger travel data. | Destination country. | 1

3.1.2. Seroprevalence of dengue

Dengue seroprevalence studies reveal past or current circulation of the virus in a specific area. This allows, among other things, determining the populations at risk for the disease and evaluating the mechanisms of transmission [37]. Al-Raddadi et al. [38] developed the first multivariate model using LoR to estimate dengue seroprevalence in
four endemic cities in Saudi Arabia. The authors analyzed the association of risk factors (demographic, clinical and environmental) with the disease using a multivariate LoR. They used the odds ratio (OR) with a 95% confidence interval (CI) to present the results. The predictors associated with the highest seroprevalence rate were: age over 30 years (OR [95% CI] = 3.91 [2.78, 5.50]), housing type (OR [95% CI] = 1.93 [1.62, 2.31]), absence of pest-control activities (OR [95% CI] = 1.39 [1.13, 1.72]) and presence of mosquitoes at home (OR [95% CI] = 1.39 [1.14, 1.70]). Aguas et al. [39] used random forest (RF) on laboratory data (antibody titration) to estimate the proportion of asymptomatic dengue in children. The algorithm achieved an accuracy of 99.45%, correctly classifying 361 of 363 cases.

3.1.3. Cytokines

Cytokines are molecules whose blood levels increase after a severe infection. Jayasundara et al. [40] presented a study describing the role of cytokines in SD: platelet activating factor (PAF), sphingosine 1-phosphate (S1P), interleukin-1 beta (IL-1β), tumor necrosis factor-alpha (TNFα) and interleukin-10 (IL-10). They used fuzzy logic (FL) to diagnose SD. Patients were analyzed at 96, 108 and 120 h from the onset of fever, using the blood levels of these cytokines at each time point. The model showed its best accuracy (85%) at 108 h from the onset of fever. Low et al. [41] evaluated vascular endothelial growth factor (VEGF) and pentraxin-3 to classify the disease as severe or non-severe. The proposed diagnostic model using LoR showed a sensitivity of 76.2% and a specificity of 73.6%. According to the results, pentraxin-3 is not useful to differentiate SD from non-severe dengue: there was no significant difference between the two groups in the blood levels of pentraxin-3.

3.1.4. Raman spectroscopy

In recent years, Raman spectroscopy has been used in the medical diagnosis of conditions such as cancer [42,43], liver diseases [44,45], and infectious diseases such as tuberculosis [46] and Chagas disease [47]. Khan et al. [48] extracted Raman spectra from serum samples of healthy people and dengue patients. The goal was to classify the samples as normal or pathological using RF. The model performed well (accuracy = 91%). The same authors published another paper [49], this time using a support vector machine (SVM) for the classification task. Different kernels were tested: linear, polynomial and radial. The best kernel was the polynomial of degree one; however, the accuracy of the SVM was lower (85%) than that of RF (91%).

3.1.5. Plasma leakage and severity

Plasma extravasation is a warning sign for SD and should be detected early to avoid complications and death. This sign is characterized by serous effusions in various cavities, such as the pleura, pericardium and peritoneum. For this reason, Suwarto et al. [50] developed a scoring system to detect pleural and/or ascitic effusion. Suwarto et al. implemented an LoR using laboratory data, assigning a score to each factor to determine the risk of plasma leakage. The higher the score assigned to a patient, the more likely the patient is to leak plasma. The model detected plasma leakage or ascites with an accuracy of 77.4%. Another study, by da Silva et al. [51], used the same regression technique (LoR) to evaluate risk factors for hospitalization after dengue infection. The explanatory variables were demographic, clinical and laboratory. The authors found that multi-organ failure was the most influential factor for hospitalization (OR [95% CI] = 5.75 [3.53, 9.37]). Fernandez et al. [52] used a multivariate logistic model in Honduran patients, using demographic, clinical and laboratory data. Fernandez et al. used plasma leakage as the target variable, since this is the main warning sign of SD. The model achieved an accuracy of 70.9%, with a sensitivity of 76.4% and a specificity of 70.3%. The same modeling technique was used by Phuong et al. [53], with the addition of free plasma deoxyribonucleic acid (DNA) as a predictor variable. The model achieved a sensitivity of 87.5% and a specificity of 54.7%. Davi et al. [54] used gene expression data to diagnose SD. The authors used an MLP model with an average accuracy of 86%. Another study, conducted by Tuan et al. [55] in Vietnam, applied multivariate LoR models to demographic, clinical and laboratory data. The AUC was 0.95, with a sensitivity of 87% and a specificity of 88%. Another model, using the same data and technique, was developed by Ahmad et al. [56]. The authors evaluated the main warning signs for SD. The model had a sensitivity of 91% when at least one warning sign was present, and a specificity of 99% when more than 5 warning signs were present. Phakhounthong et al. [57] used decision trees (DT) to diagnose SD in children. The model was based on clinical and socio-demographic data of dengue patients. Its sensitivity, specificity and accuracy were 60.5%, 65% and 64.1%, respectively. Finally, Huang et al. [58] developed several models to diagnose severe dengue using demographic information and laboratory-test results. Huang et al. applied several machine learning techniques, such as LoR, RF, GBM, SVM and ANN. The best model was the ANN, with an accuracy of 75% and an AUC of 0.83.
Zhang et al. [59] proposed a new variable to diagnose SD: the aspartate aminotransferase/platelet count ratio index (APRI). The authors developed an LoR to evaluate the performance of APRI in conjunction with other laboratory variables, such as prothrombin time and leukocyte count. The model performed well, with an AUC of 0.87. Lin et al. [60] also used LoR, but with hyaluronic acid as a feature. Their model had moderate performance, with an AUC of 0.69, a specificity of 55% and a sensitivity of 76%. Lee et al. [61] implemented LoR models to develop a clinical risk score for the early diagnosis of SD. The researchers proposed that the model coefficients can be used as a risk score. The best model obtained an AUC of 0.92, with a sensitivity of 80.3% and a specificity of 85.8%.
DSS is a potentially life-threatening complication of the disease. Lam et al. [62] proposed a diagnostic model to detect DSS in children. The authors developed LoR models to determine the relationships between clinical and laboratory variables and the presence of DSS. In addition, alternative techniques, such as adaptive least absolute shrinkage and selection operator (LASSO), generalized additive model (GAM), classification and regression trees (CART) and gradient boosting machine (GBM), were compared. The logistic model performed favorably (AUC = 0.74) compared to the alternative modeling strategies (LASSO = 0.73, CART = 0.61, GAM = 0.69, GBM = 0.72). Another study using LoR was carried out by Lam et al. [63] in Vietnam, where they evaluated platelet count for the diagnosis of DSS. The model performed better with platelet count than without it (AUC = 0.73 vs. AUC = 0.66).
In summary, most of the modeling approaches for dengue diagnosis were based on LoR (see Fig. 3). Logistic models are widely used in the health field because they are simple to fit and their results are easy to interpret. These models were developed primarily to assess the factors associated with the risk of dengue infection and to determine the association of predictive factors with disease severity. The categories of variables most frequently used to construct these models were demographic, clinical and laboratory parameters (see Fig. 3), possibly because these types of features are the most available in all countries with mandatory surveillance systems. GBM, FL and SE models are possibly the least used because of their mathematical complexity and difficulty of implementation. Finally, the reviewed diagnostic models were implemented as classification tasks: they only determined whether the patient had the disease or not. Diagnostic models should go further and evaluate causal relationships between predictors and dengue.

[Figure: frequency (0 to 40) of each model (LoR, ANN, RF, DT, SVM, GBM, FL, SE) by data source: clinical, demographic, environmental, genomic, laboratory, Raman spectrograms, social.]
Fig. 3. Types of models vs. type of data sources for diagnostic models. The frequency indicates the number of times the technique was implemented in the studies. Abbreviations: LoR = Logistic Regression, ANN = Artificial Neural Networks, RF = Random Forest, DT = Decision Trees, SVM = Support Vector Machines, GBM = Gradient Boosting Machines, FL = Fuzzy Logic, SE = Structural Equations.

3.2. Epidemic models of dengue

This section analyzes epidemic modeling approaches to dengue. It is divided into subsections according to key aspects, such as
the type of analysis performed (e.g., spatial-temporal analysis) and the types of data used (e.g., Internet data from social networks and search indexes). Finally, an important subsection deals with the prediction of mortality, a latent problem that should be addressed with predictive modeling.

3.2.1. Spatial-temporal analysis of dengue

Spatial-temporal analysis is the most studied aspect of dengue among the reviewed articles. Many studies use machine learning to evaluate the spread of vectors and of the disease. Rossi et al. [64] used boosted regression trees (BRT) to conduct a spatial-temporal analysis of dengue with data from 76 countries. BRT is a modeling technique used primarily in ecology to explain or predict a phenomenon. The data collected were temperature, rainfall, migration and population density. The study showed that higher population density and shorter distances between countries with dengue outbreaks are relevant factors that characterize the disease. Delmelle et al. [65] used geographically weighted regression (GWR) to evaluate the role of environmental and socioeconomic determinants of dengue in Cali, Colombia. The authors found that socioeconomic status, population density, proximity to tire shops and plant nurseries, and the presence of sewage systems are related to the disease. Mao et al. [66] used RF to predict the presence of dengue cases in a given area from topographic, climatic and population data. Since people are more likely to become ill when they travel to other locations, an important contribution of this work is the use of cellphone tracking data. With these data, the authors reported an accuracy of 95%. Finally, Mutheneni et al. [67] mapped the levels of dengue endemicity in some districts of India to identify groups at risk. For this purpose, the authors used self-organizing maps (SOM) with environmental data. The results indicated that the districts of Warangal, Karimnagar, Khammam and Vizianagaram are hot-spot regions.

Akter et al. [68] used linear regression (LiR) with ecological and socio-demographic factors to observe the spatial-temporal trend of dengue in Australia. The regression analysis showed that dengue incidence increased with some factors, such as housing type and households with rainwater tanks. Yue et al. [69] and Reyes-Castro et al. [70] used the same technique in five districts of China and two arid cities of Mexico, respectively. Both studies used environmental and spatial data to build the models and added socioeconomic data to improve them. Yue et al. found that these factors were significantly and positively correlated with dengue outbreaks, whereas Reyes-Castro et al. showed that transmission foci started in neighborhoods with high population density and low access to health services.

In Brazil, according to the Ministry of Public Health, a year is considered epidemic in a city if the incidence exceeds 100 cases per 100,000 inhabitants [71]. On this basis, Stolerman et al. [72] developed an SVM to predict whether a year will be epidemic, using 16 years of climatic and epidemiological data. The SVM predicted the epidemicity of a year with 91% accuracy.

Several studies have compared the performance of different machine learning models to predict dengue burden, outbreaks and the importation of dengue into Europe. Carvajal et al. [73] compared machine learning models based on weather factors in the Philippines, with the objective of finding which meteorological factors were the best predictors of dengue in that country. The techniques used were GAM, seasonal autoregressive integrated moving average with exogenous variables (SARIMAX), RF and GBM. They reported that relative humidity was the most important meteorological factor in the model. In terms of mean absolute error (MAE), RF performed best (0.23), followed by GBM (0.24). Zhao et al. [74] compared RF and ANN to predict dengue burden in Colombia at national and local scales. The models were compared using MAE, and the results showed that RF (0.86)
performs slightly better than ANN (0.95). Regarding the level of prediction, RF performed better at the national level than at the sub-national level, as shown by lower MAE values in 12-week forecasts (national = 0.86 vs. local = 0.97). Salim et al. [75] used various machine learning techniques (SVM, DT, ANN, Bayes network), with climatic variables as predictors, to predict epidemics in Malaysia. According to Salim et al., linear SVM performed best on the original data, with an accuracy of 70%, a specificity of 95% and a sensitivity of 14%; class balancing raised the sensitivity to 64%. Finally, Salami et al. [76] developed and compared machine learning models to predict dengue importation into Europe. They used air-passenger data to create connectivity indices between source and destination countries. The techniques implemented were RF, GLM, GBM and partial least squares. GBM had the best performance, with an AUC of 0.94, a sensitivity of 0.94 and a specificity of 0.93.

3.2.2. Distribution of the vector

Simulating the distribution of the vector that transmits dengue is important for health authorities to establish control strategies. For this type of modeling, entomological data such as the Breteau index (BI), house index (HI), container index (CoI) and adult index (AI) are commonly used. BI is the number of positive containers per 100 houses inspected. HI is the percentage of houses infested with mosquito larvae or pupae. CoI is the percentage of water containers infested with mosquito larvae or pupae. Finally, AI is the number of female mosquitoes captured divided by the number of houses inspected [77,78].

Parra et al. [79] developed a GAM using BI, mosquito genetic data and meteorological data. The proposed model required 71.5% fewer human and operational resources than the BI measurement. Similarly, Chang et al. [77] used entomological indices as a tool for the early prediction of dengue epidemics. The implemented regression models obtained accuracies of 83.8%, 87.8%, 88.3% and 88.4% for BI, HI, CoI and AI, respectively. Ding et al. [80] simulated the distribution of A. aegypti and A. albopictus using environmental, climatic and social data and three machine learning methods (SVM, GBM and RF). RF models performed best, followed by GBM; however, there were no significant differences between their AUC values (A. aegypti: 0.973 vs. 0.974; A. albopictus: 0.971 vs. 0.972). Jacome et al. [81] used LiR with environmental and spatial data to identify the most important risk factors for the distribution of A. aegypti in a coastal zone of Ecuador. Temperature and population density were the factors most likely to predict the number of cases.

Modeling of mosquito breeding sites using remote sensing is gaining interest in the scientific community. Scavuzzo et al. [78] implemented several machine learning techniques (SVM, multilayer perceptron (MLP), k-nearest neighbors (KNN) and DT) to model the oviposition activity of A. aegypti from time series obtained from satellite images. These techniques revealed non-linear relationships between environmental variables and the oviposition of A. aegypti. The models were evaluated with the mean squared error (MSE): the best model was KNN (MSE = 0.49), followed by MLP (MSE = 0.52), SVM (MSE = 0.61) and DT (MSE = 0.77).

3.2.3. Search-index data

Internet search data have become a useful tool to predict disease outbreaks, and Google Trends (GT) is the reference in this field. GT is a Google tool that displays the most popular search terms for a given time period and location; data associated with these searches are used for prediction.

Wu et al. [82] combined climatic data from Taiwan with GT data. The model was built with DT, and the findings revealed that temperature and humidity were the most relevant factors, with the greatest classification power, while age and gender were the least relevant. Wu et al. also found that adding GT data decreased the accuracy of the model (96% vs. 94%). Strauss et al. [83] compared the accuracy of GT with conventional surveillance systems in Venezuela over 10 years. The authors used LiR to predict, from GT data, the cases officially reported by the Ministry of Health. The overall coefficient of determination (R²) was 0.75.

In countries such as China, where GT is not available to users, there are alternatives such as Baidu, a search engine that stores the searches made by its users. Li et al. [84] used data from the Baidu index database to calculate a Dengue Baidu Search Index (DBSI) and improve the prediction of local dengue epidemics in Guangzhou. Li et al. also used climatic data to train a GAM, whose performance was evaluated with the root mean square error (RMSE). The model with DBSI outperformed the model without it (RMSE = 59.9 vs. 203.3). Another study in China, by Liu et al. [85], applied regression trees to DBSI data. The results demonstrated a strong association between DBSI and dengue incidence, and the accuracy of the models was above 90%.

3.2.4. Social networks

Social networks provide information on the mobility of individuals in a population, because a large percentage of social-network data is geo-tagged. According to this review, Twitter is the social network most used to predict dengue. Marques-Toledo et al. [86] used tweets, supplemented with demographic and incidence variables, to predict dengue at local and national levels in Brazil. The model was able to predict an outbreak up to 8 weeks in advance, with an MAE of 0.35. In a similar work, Ramadona et al. [87] used geo-tagged Twitter data and a generalized linear model (GLM) to predict the risk of dengue in Yogyakarta, Indonesia. The model yielded an RMSE of 0.78 when Twitter data were included together with a dynamic incidence index weighted by mobility. Finally, Souza et al. [88] used Twitter data to create unsupervised models that detect spatial clusters to characterize high-risk regions of dengue.

3.2.5. Prediction of morbidity and mortality

The mortality rate in patients with SD is high, especially in children and elderly patients [89,90]. For this reason, it is important to develop models that predict morbidity and mortality. Kesorn et al. [91] applied DT, KNN, SVM (with linear, polynomial and radial kernels) and ANN to climatic and demographic data to predict the morbidity rate of SD. A. aegypti infection rates were also used to improve model performance. SVM with a radial kernel achieved the best accuracy (88.4%) when the mosquito infection rate was added. Md-Sani et al. [92] developed an LoR model in Malaysia to identify risk factors that allow the prediction of mortality. The best model included age, serum bicarbonate, serum lactate and alanine aminotransferase (ALT), with an AUC of 0.84.

Huang et al. [89] used an LoR model with demographic, clinical and laboratory data of patients over 65 years of age (N = 627) to estimate dengue mortality. The model predicted a mortality of 57.1% when at least two predictors were present. Huang et al. [90] conducted another study with a larger sample (N = 2358), using a model similar to that of [89] to assign each patient a score reflecting their probability of death. This model had an AUC of 0.85.

3.2.6. Thermal images

Nagori et al. [93] used thermal images of pediatric patients to train a GLM for the prediction of DSS. The developed model demonstrated the usefulness of thermal imaging to predict DSS, with an AUC of 0.76.

In summary, whereas LoR was the most common technique in diagnostic models (see Fig. 3), epidemic models show greater variability in the modeling approaches used. Although LiR was the most used approach for prediction, other techniques, with a high
frequency, such as RF, SVM, LoR and GAM/GLM, were also implemented. Spatial-temporal analyses relied on regression approaches for prediction, and the most used technique for this task was LiR, which was implemented 16 times across the reviewed studies. As Fig. 4 shows, there is a relationship between the types of data and the types of models used. The LiR models used socioeconomic, demographic and environmental data. These data have been widely used with LiR because they allow mapping the distribution of the mosquito and of the disease. They were used in almost all the epidemic modeling approaches (see Fig. 4), except LoR, which did not use environmental data. Clinical and laboratory data were mainly used with LoR, a technique widely used by medical personnel and epidemiologists to predict the presence or absence of a disease (a classification task, 9 papers). ANN have been little used, probably because of the low quality and limited availability of dengue data; the performance of ANN depends strongly on the quantity and quality of the data [94]. Cellphone data have also been little used to map dengue risk, which may be explained by limited data availability and data-privacy concerns. Finally, the use of GAM/GLM has increased with data extracted from the Internet (search indexes and social networks). These flexible models are increasingly used because GAMs, in particular, can capture non-linear relationships in data derived from unstructured sources, such as search trends or tweets.

3.3. Strategy evaluation models to control dengue

Few intervention models have been developed to evaluate dengue control strategies. In this subsection, we analyze the articles according to the approach studied: biological control using copepods, entomopathogenic fungi, Wolbachia strains, vaccination and fumigation.

3.3.1. Copepods

In recent years, biological control of the vectors A. aegypti and A. albopictus using predators called copepods has emerged. These are crustaceans with the ability to systematically devour young mosquito larvae [95]. Kalimuthu et al. [95] used a GLM to evaluate the predation efficiency of Mesocyclops formosanus on young larval populations. The model showed the effectiveness of copepods in controlling the vector, and thus the disease. Another study, by Udayanga et al. [96], also used a GLM to compare the predation efficiency rates of five copepod species on Aedes larvae. According to the model, the highest predation efficiency rates were obtained with Mesocyclops leuckarti: 17.45% and 16.75% for A. aegypti and A. albopictus, respectively.

3.3.2. Entomopathogenic fungi

Entomopathogenic fungi cause disease and death in insects and other arthropods, which makes them a useful alternative to control the mosquito that transmits dengue. Lee et al. [97] built a GLM to evaluate the pathogenic activity of six fungi (Beauveria, Cordyceps, Metarhizium, Paecilomyces, Purpureocillium and Verticillium) on A. albopictus. The model showed that Metarhizium anisopliae had the highest activity against A. albopictus, with a mortality rate of 73% after two days and 90% after five days.

3.3.3. Wolbachia strains

Wolbachia is a bacterium that naturally infects insects. When males infected with this bacterium mate with uninfected females, the resulting offspring are not viable [98]. This approach has been
developed as a method to control the spread of Aedes. Nazni et al. [99] used a Bayesian time series (BTS) model to estimate the reduction of dengue cases in Malaysia after infecting mosquitoes with the Wolbachia wAlbB strain. The model estimated a 40.3% reduction in dengue cases at the intervention sites. Other authors have also developed models to evaluate the impact of Wolbachia against Aedes. For instance, Indriani et al. [100] used a GLM to estimate the effect of Wolbachia (wMel strain) on dengue incidence in Indonesia; the model estimated a 73% (95% CI: 49%–86%) reduction in dengue incidence. Finally, Ryan et al. [101] used the same model and the same Wolbachia strain and estimated a 96% (95% CI: 84%–99%) reduction in dengue incidence in Australia.

Fig. 4. Types of models vs. type of data sources for epidemic models. The frequency indicates the number of times the technique was implemented by the studies. [Figure: frequency by model (LiR, RF, SVM, LoR, GAM/GLM, DT, GBM, ANN, GWR, KNN, BRT, Other) and data source (Cell Phone; Clim. + Env. + Met. + Top.; Clinical + Lab.; Entomological; GT + DBSI + Twitter; Mobility; Soc. + Econ. + Dem. + Pop.; Thermal Images).] Abbreviations: Clim = Climatic, Env = Environmental, Met = Meteorological, Top = Topographic, Lab = Laboratory, GT = Google Trends, DBSI = Dengue Baidu Search Index, Soc = Social, Econ = Economic, Dem = Demographic, Pop = Population, LiR = Linear Regression, RF = Random Forest, SVM = Support Vector Machines, LoR = Logistic Regression, GAM = Generalized Additive Model, GLM = Generalized Linear Model, DT = Decision Trees, GBM = Gradient Boosting Machine, ANN = Artificial Neural Networks, GWR = Geographically Weighted Regression, KNN = K-Nearest Neighbors, BRT = Boosted Regression Trees.

3.3.4. Vaccination

Another option for dengue control is vaccination. Lee et al. [102] implemented a GLM to validate the relationship between a climatic risk factor (CRF) index and dengue incidence, in order to estimate the vaccination coverage rate and the number of doses required. The CRF index was created using 12-month moving averages of climatic and non-climatic factors. The climatic factors were temperature, precipitation and humidity; the non-climatic factors were population, density and elevation. The study was conducted in Colombia, Thailand and Vietnam, and the estimated vaccination coverage rates were 63%, 90% and 91%, respectively.

3.3.5. Fumigation

Fumigation has been widely used worldwide to reduce the burden of dengue-virus-infected mosquitoes. Hladish et al. [103] constructed a stochastic simulation (SS) model to predict the effectiveness of spraying in Yucatán, Mexico. The results indicate that proactive application of this control method could reduce symptomatic infections by up to 89.7% in the first year and by 78.2% cumulatively over five years.

In summary, GLM was the most applied technique for intervention modeling. GLM was implemented with entomological, climatic and population data, entomological data being the most used. A stochastic simulation was used in only one work, with entomological and population data. Finally, BTS was implemented in one work with entomological data. Entomological data are the most frequent because dengue control is based mainly on controlling the mosquito, the vector of dengue transmission. Fig. 5 shows the types of data used for each intervention modeling approach.

3.4. A general analysis of data types and machine-learning techniques for dengue modeling

The types of data and techniques considered in the reviewed studies were analyzed by question in the previous subsections; however, it is also important to analyze them globally. Fig. 6 shows the intersection between the types of data and the techniques used in all the reviewed studies. The predominant technique was LoR and, as noted in the section on diagnostic models, it was frequently used with clinical data and laboratory results; LoR models are often used by medical personnel to diagnose dengue. According to Fig. 6, the second most used technique is RF, which was applied to a variety of data sources such as climatic, clinical and sociodemographic variables. This variability is explained by the fact that RF was found in both diagnostic and epidemic models, and can be used for both regression (spatio-temporal analysis) and classification (dengue diagnosis) tasks. Fig. 6 also shows that LiR is used in spatio-temporal analysis where, as expected, climatic and sociodemographic variables were predominantly used to map the distribution of the disease or the vector. Finally, another of the most used techniques was GAM/GLM, mainly in epidemic and intervention models. In the first case, the objective was to determine relationships between predictors and dengue incidence, and in the second case, the objective was to evaluate the impact of dengue
control strategies.

Fig. 5. Types of models vs. type of data sources for intervention models. The frequency indicates the number of times the technique was implemented by the studies. [Figure: frequency by model (GLM, SS, BTS) and data source (Climatic, Entomological, Population).] Abbreviations: GLM = Generalized Linear Model, SS = Stochastic Simulation, BTS = Bayesian Time Series.

Fig. 6. Types of models vs. type of data sources for all models. The thickness of the links indicates the number of times the technique and data type was implemented by the studies. [Figure: links between data sources (Thermal Images; Entomological; GT + DBSI + Twitter; Clim. + Env. + Met. + Top.; Genomic; Soc. + Econ. + Dem. + Pop.; Mobility; Clinical + Lab.; Raman spectrograms; Cell Phone) and models (GAM/GLM, BRT, KNN, LiR, Other, GWR, ANN, DT, SVM, GBM, RF, LoR).] Abbreviations: GT = Google Trends, DBSI = Dengue Baidu Search Index, Clim = Climatic, Env = Environmental, Met = Meteorological, Top = Topographic, Soc = Social, Econ = Economic, Dem = Demographic, Pop = Population, LoR = Logistic Regression, RF = Random Forest, LiR = Linear Regression, GAM = Generalized Additive Model, GLM = Generalized Linear Model, DT = Decision Trees, SVM = Support Vector Machines, ANN = Artificial Neural Networks, GBM = Gradient Boosting Machine, KNN = K-Nearest Neighbors, GWR = Geographically Weighted Regression, BRT = Boosted Regression Trees.

4. Discussion

Dengue modeling is a key tool for the early detection of dengue and the evaluation of risk factors for SD, and may also be useful to control the vectors that transmit the disease. Although extensive work has been done on these issues, it is important to identify which aspects of dengue modeling have not yet been addressed, so that future work can achieve a significant decrease in disease morbidity. The main objective of this work was to give an overview of diagnostic, epidemic and intervention modeling, and to identify important challenges for future work.

4.1. Limitations of the studies and challenges

This section focuses on the limitations of the reviewed studies. Based on those limitations, we describe research challenges and opportunities for each of the approaches presented: diagnostic, epidemic and intervention.

4.1.1. Diagnostic models

The diagnostic models reported in the reviewed articles focused on detecting dengue or differentiating it from other diseases, such as Zika, chikungunya and malaria [29,30,35,39,51,59]. This is useful because the signs and symptoms of these diseases are also present in dengue. However, it is necessary to go further and develop cause-effect models that provide a deeper understanding of the main causes of the high morbidity and mortality rates of dengue and of the importance of the factors that contribute to the disease. Specifically, there is a need to understand both the interrelationships among the characteristics present in the disease and their influence on the variants of dengue. For example, comorbidities are underlying diseases that can occur jointly with dengue, the most common being chronic kidney disease and chronic liver disease. In these diseases, the blood levels of some parameters are elevated, and the same parameters also increase in dengue. This aspect must be taken into account when developing future dengue models.

According to the reviewed articles, many predictors/features have been used in the diagnostic, epidemic and intervention models of dengue (see Table 2). However, other predictors could be used and evaluated for this purpose. Raman spectroscopy data would be useful because the technique can diagnose the disease early [48]. Genomic data are another type of predictor that could be used for modeling. According to this review, only two papers [31,54] have used this type of data to look for relationships with the disease. Techniques to measure genetic data are expensive and are not usually performed in clinical practice, which makes such data difficult to obtain. Wide collection of genetic data would allow a better understanding of the dynamics of dengue at the population level, providing key insights into genetic factors that are difficult to track with clinical records alone.

A large number of variables could be useful to model dengue; however, too many descriptors can lead to the curse of dimensionality, in which the high dimensionality of the feature space makes patterns hard to recognize [104]. This phenomenon can also hinder the optimization and slow the execution of the models. Different preprocessing techniques have been developed to address this problem. However, very little use of these techniques was reported in the reviewed papers (see Fig. 8). The most reported preprocessing technique was normalization (9.4%), followed by PCA (6.3%). This shows that it is necessary to increase the
use of preprocessing techniques (feature engineering processes [105,106]) that allow, among other things, analyzing all the available predictors to find the features most influential on the disease and weight their influence.

4.1.2. Epidemic models

According to this literature review, epidemic models are currently the most published dengue modeling approach. Most of the reviewed studies use environmental and climatic data to analyze the distribution of the mosquito and the disease.

In recent years, there has been increasing interest in mapping the distribution of mosquitoes (A. aegypti and A. albopictus), since knowledge of the distribution of these vectors can help prevent the disease [99]. For this task, geographic information systems (GIS) have been used to analyze the relationship between climatic conditions and the distribution of the vector [64,79,97,102]. However, the information collected and used to generate these models does not have high resolution [80]. For this reason, a deep-learning approach with high-resolution Google images could be considered, using significant features (shrubs, urban areas, roads and puddles) identified from the images. The goal is to predict regions suitable for mosquitoes on a finer scale.

Data quality is an important aspect for some machine learning algorithms because their performance depends on it [107]. Managing the quality of dengue data is currently one of the most important challenges. Most of the databases with clinical and epidemiological data reported by surveillance systems have problems such as incorrect or missing data [51]. The high demand for healthcare in some places may cause medical personnel to fill out the epidemiological forms incorrectly. This was one of the most common limitations found in the reviewed studies [35,38,40,53,57,59,93]. When such data need to be analyzed, this problem sometimes forces the elimination of complete records, which reduces the size of the database. None of the reviewed papers used models to manage the uncertainty related to data quality. To solve this problem in dengue, several machine learning alternatives could be used, such as Bayesian models and fuzzy approaches, which have been used in other domains [108,109,110,111]. Moreover, approaches that use robust estimators to deal with missing values and outliers have been developed [112]. Another option is to generate data complementary to the existing data, that is, new data with characteristics (e.g., distribution) similar to those of the available data. In recent years, synthetic data generation has allowed the construction of more robust models in other areas of knowledge, such as environmental sciences [113,114], and could also be explored in this domain.

4.1.3. Intervention models

Of the three approaches described in this article, this is the most promising for future work, because the use of these models in decision-making for dengue is very limited [102]. Below, we describe some fields that have not been much explored in dengue modeling.

Prescriptive models have uncertainty and, being probabilistic, can lead to incorrect decisions [115]. In particular, the uncertainty of prescriptive models is one of the main challenges in dengue. Its key components are the lack of certainty of the model, the uncertainty in the quality of the data, and the subjectivity of the humans who build the prescriptive model [116]. There is a need to generate prescriptive models that rely more on data and less on expert domain knowledge. Additionally, real-time data processing has not been explored. The developed models are static, and the priority is to use strategies capable of processing time-varying data. Automation of prescriptive processes is needed, so that the model is continually adjusted whenever a problem occurs. In this way, higher-quality decisions can be made in the shortest possible time.

Standard machine-learning approaches require the centralization of training data in one device or data center, which is a problem when the data are in different locations. According to this review, all articles used the centralized machine-learning approach, yet the dengue information systems of many countries do not collect their data in one place. To overcome this drawback, federated learning, an approach introduced by Google, could be used. Federated learning is a form of collaborative machine learning without centralized training data, and works as follows (see Fig. 7): each device downloads the current model, improves it by learning from its local data, and then summarizes the changes as a small, focused update. Only this update is sent to the cloud, using encrypted communication, where it is immediately averaged with other users' updates to improve the shared model. All training data remain on the device, and no individual update is stored in the cloud [117].

Finally, no studies were found that have implemented combined models for diagnosis, prediction and prescription. An important challenge of dengue modeling is the development of such hybrid models. Combined diagnostic, epidemic and intervention models could outperform the three models developed separately. Additionally, real-time updating of diagnostic and epidemic models, together with automated prescriptive decision-making, could considerably decrease the uncertainty present in these types of problems.

4.2. General summary

This section briefly summarizes the challenges identified, which could guide future work. Fig. 8 summarizes the aspects evaluated in this article. Its upper part shows these aspects: i) the most used preprocessing techniques, ii) the applied machine learning tasks, iii) the type of modeling approach, iv) the technique used and v) the varieties of dengue studied. Fig. 8 thus shows the characteristics most and least used to model dengue. The width of the nodes and links is proportional to the number of reviewed articles that fall into each category, and different link colors are used to ease the visualization of the connections. Fig. 8 clearly shows the limited use of preprocessing techniques (feature engineering) in the reviewed articles: 45 of 64 articles do not report any preprocessing technique. Perhaps the authors consider preprocessing an implicit stage of modeling. We can also observe that almost all articles on diagnostic modeling used a classification task to detect the presence of the disease, whereas the regression task was closely related to epidemic and intervention models. As mentioned above, intervention modeling was the least frequent approach in the reviewed articles. Finally, DSS is the least studied variant of the disease, mainly because of the low frequency and low quality of the data related to this syndrome.

For the diagnostic modeling approach, three challenges can be identified. As shown in this work, models to detect the disease and differentiate it from similar ones are widely implemented; it is now fundamental to develop cause-effect models of dengue, which are necessary to understand the importance of the factors that can contribute to the disease. As can be seen in Fig. 3, several types of features have been used for the diagnosis of dengue. Adding new predictors could shorten detection time, enabling timely treatment and preventing deaths. Although using many predictors for diagnosis could be beneficial, it could also cause difficulties related to the dimensionality of the data. One opportunity is therefore the creation or improvement of preprocessing techniques that enable the identification of the key characteristics most related to dengue.

Although epidemic modeling is the most reported dengue modeling approach in the literature, the available dengue data have quality problems, and the uncertainty of models based on these data is high. One of the main challenges for epidemic approaches is the development of models that address and clearly express the associated uncertainty and measure the reliability of the predictions. It is crucial that improvements are
Fig. 7. Description of federated learning. The device personalizes the model locally, based on the usage (A). Many users' updates are aggregated (B) to form a
consensus change (C) to the shared model, after which the procedure is repeated. Source: https://tinyurl.com/y9ykdbve

Fig. 8. Trends of reviewed papers for dengue modeling. Abbreviations: ML = Machine Learning, NR = Not reported, PCA = Principal Component Analysis, SD:
Severe Dengue, DSS = Dengue Shock Syndrome.

developed, or new techniques are created, to generate more accurate Decision-making for treatment and control of the disease depends on
results. this modeling approach. Prescriptive analysis, in general, presents
The frequency of articles on intervention modeling of dengue is low. challenges that must be taken into account. One of these is the low

12
W. Hoyos et al. Artificial Intelligence In Medicine 119 (2021) 102157

number of automatic prescriptive models for data-based decision-making systems. The subjectivity of the domain expert could affect the quality of decisions, and static models cannot handle changes as they occur. Data-based, up-to-date prescriptions would be a valuable tool for the treatment and/or control of dengue.

Finally, a challenge common to all three approaches reviewed in this article is the development of combined models (diagnostic, epidemic and intervention) to automate prescription. Autonomous cycles of data analysis tasks (see [118,119] for more details about this concept), which integrate these models, can support decision-making as quickly as possible. This is crucial for dengue, given the high morbidity and mortality of the disease.

5. Conclusions

We conducted an SLR on machine-learning-based dengue modeling. The main objective was to identify the diagnostic, epidemic and intervention models that have been developed for the disease. Sixty-four articles were selected from several scientific libraries and analyzed to establish the state of the art in these three approaches. The results show that dengue modeling is a steadily growing field.

The most frequent diagnostic models were based on LoR, one of the most widely used modeling techniques because it is easy to implement and its results are easy to interpret. Other techniques, such as decision trees, can also be interpreted easily. However, they can consist of a large number of nodes, so understanding a particular prediction may require significant mental effort. In contrast, an LoR model is simply a list of coefficients, which makes the influence of each feature on the target variable easy to assess. In addition, LoR does not require continuous independent variables to follow a normal distribution, and continuous and discrete predictors can be used together. Regarding feature categories, the most used for these models were demographic, clinical and laboratory data, which are readily available from local health authorities in each country.

In general, the most frequent epidemic models were based on LiR, RF and SVM, using socioeconomic, demographic, climatic and environmental data. Within this category, the most explored approach is the spatio-temporal analysis of dengue and its transmission vector. These techniques are commonly used to map disease risk in endemic areas and to establish relationships between risk factors and dengue incidence.

Studies on intervention systems for dengue are quite limited. In this review, we found only eight studies that developed models for disease control. The techniques used were GLM, SS and BTS, and the main data used were entomological. The morbidity and mortality of the disease clearly depend on the decisions made by health authorities, so more studies are needed in this field to support those decisions. Finally, an important general remark is that the diagnostic, epidemic and intervention models of dengue are usually machine learning models of the predictive, diagnostic or prescriptive type.

Several limitations were found in the reviewed papers. These include the failure to report the preprocessing techniques used and small sample sizes for disease variants such as SD and DSS. Reviewing the strengths and limitations of the articles allowed us to identify the following directions for future research: i) cause-effect models for dengue diagnosis; ii) use of new features, such as genetic data and Raman spectroscopy, for disease diagnosis; iii) a preprocessing phase based on feature engineering processes; iv) implementation of Bayesian or fuzzy models that adequately manage data uncertainty; v) automatic prescriptive models for data-based decision-making systems; and vi) models combining the three approaches discussed in this article using autonomous cycles of data analysis.

Among these future works, priority should be given to cause-effect models for disease diagnosis. Detecting the disease is critical, but so is assessing the factors that most influence the infection. A better understanding of dengue-related causes, together with more robust diagnostic models, would help considerably in prevention and in reducing complications and deaths. Likewise, models that manage data uncertainty are urgently needed, because the low quality of epidemiological data on dengue is one of the main obstacles to improving existing models. Other types of predictors, such as genetic data and spectrograms, could be useful, but their high cost of acquisition and collection could be a limitation. Finally, autonomous cycles of data analysis tasks would make it possible to automate decisions for disease control.

This study has a few limitations. First, only some online databases (ScienceDirect, IEEE Xplore, Google Scholar, Emerald, Taylor & Francis and PubMed) were searched, so relevant articles from other digital libraries may have been missed. Second, the search was restricted to English, because most of the latest advances in dengue modeling are published in this language. The absence of articles in other languages, such as Spanish and Portuguese, limits the scope of the results of this study.

CRediT authorship contribution statement

All authors participated in the design of the original study, data analysis and interpretation of results. They also participated in the writing, review and approval of the final manuscript.

Declaration of competing interest

The authors declare no conflict of interest.

Acknowledgements

This study was partially funded by the Colombian Administrative Department of Science, Technology and Innovation - COLCIENCIAS (grant number 111572553478) (M. Toro) and the Colombian Ministry of Science and Technology Bicentennial PhD Grant (W. Hoyos).

References

[1] Villar LA, Rojas DP, Besada-Lombana S, Sarti E. Epidemiological trends of dengue disease in Colombia (2000–2011): a systematic review. PLoS Negl Trop Dis 2015;9:e0003499. https://doi.org/10.1371/journal.pntd.0003499.
[2] Savargaonkar D, Sinha S, Srivastava B, Nagpal BN, Sinha A, Shamim A, et al. An epidemiological study of dengue and its coinfections in Delhi. Int J Infect Dis 2018;74:41–6. https://doi.org/10.1016/j.ijid.2018.06.020.
[3] Martina B, Koraka P, Osterhaus A. Dengue virus pathogenesis: an integrated view. 2009. https://doi.org/10.1128/CMR.00035-09.
[4] Wilder-Smith A, Ooi E-E, Horstick O, Wills B. Dengue. Lancet 2019;393:350–63. https://doi.org/10.1016/S0140-6736(18)32560-1.
[5] World Health Organization. Dengue hemorrhagic fever: diagnosis, treatment, prevention and control. Technical report. WHO; 1997. URL: http://www.who.int/csr/resources/publications/dengue/Denguepublication/en (Accessed 20 April 2020).
[6] World Health Organization. Dengue guidelines for diagnosis, treatment, prevention and control: new edition. Technical report. WHO; 2009. URL: https://apps.who.int/iris/bitstream/handle/10665/44188/9789241547871_eng.pdf?sequence=1&isAllowed=y (Accessed 10 April 2020).
[7] World Health Organization. Dengue and severe dengue. Technical report. WHO; 2010. URL: https://www.who.int/denguecontrol/en/ (Accessed 12 June 2020).
[8] Jing Q, Wang M. Dengue epidemiology. Glob Health J 2019;3:37–45. https://doi.org/10.1016/j.glohj.2019.06.002.
[9] Alhaeli A, Bahkali S, Ali A, Househ MS, El-Metwally AA. The epidemiology of dengue fever in Saudi Arabia: a systematic review. 2016. https://doi.org/10.1016/j.jiph.2015.05.006.
[10] Humphrey JM, Cleton NB, Reusken CB, Glesby MJ, Koopmans MP, Abu-Raddad LJ. Dengue in the Middle East and North Africa: a systematic review. PLoS Negl Trop Dis 2016;10. https://doi.org/10.1371/journal.pntd.0005194.
[11] Limkittikul K, Brett J, L’Azou M. Epidemiological trends of dengue disease in Thailand (2000–2011): a systematic literature review. PLoS Negl Trop Dis 2014;8. https://doi.org/10.1371/journal.pntd.0003241.
[12] Mohd-Zaki AH, Brett J, Ismail E, L’Azou M. Epidemiology of dengue disease in Malaysia (2000–2012): a systematic literature review. PLoS Negl Trop Dis 2014;8. https://doi.org/10.1371/journal.pntd.0003159.
[13] Bravo L, Roque VG, Brett J, Dizon R, L’Azou M. Epidemiology of dengue disease in the Philippines (2000–2011): a systematic literature review. PLoS Negl Trop Dis 2014;8. https://doi.org/10.1371/journal.pntd.0003027.
[14] Dantés HG, Farfán-Ale JA, Sarti E. Epidemiological trends of dengue disease in Mexico (2000–2011): a systematic literature search and analysis. PLoS Negl Trop Dis 2014;8. https://doi.org/10.1371/journal.pntd.0003158.
[15] Teixeira MG, Siqueira JB, Ferreira GL, Bricks L, Joint G. Epidemiological trends of dengue disease in Brazil (2000–2010): a systematic literature search and analysis. 2013. https://doi.org/10.1371/journal.pntd.0002520.
[16] Lim JK, Alexander N, Di Tanna GL. A systematic review of the economic impact of rapid diagnostic tests for dengue. BMC Health Serv Res 2017;17. https://doi.org/10.1186/s12913-017-2789-8.
[17] Luo R, Fongwen N, Kelly-Cirino C, Harris E, Wilder-Smith A, Peeling RW. Rapid diagnostic tests for determining dengue serostatus: a systematic review and key informant interviews. 2019. https://doi.org/10.1016/j.cmi.2019.01.002.
[18] Endo IC, Ziegelmann PK, Patel A. The economic promise of developing and implementing dengue vaccines: evidence from a systematic review. 2016. https://doi.org/10.1016/j.vaccine.2016.10.037.
[19] Supadmi W, Suwantika AA, Perwitasari DA, Abdulah R. Economic evaluations of dengue vaccination in Southeast Asia Region: evidence from a systematic review. Value Health Reg Issues 2019;18:132–44. https://doi.org/10.1016/j.vhri.2019.02.004.
[20] Agarwal R, Wahid MH, Yausep OE, Angel SH, Lokeswara AW. The immunogenicity and safety of CYD-Tetravalent Dengue Vaccine (CYD-TDV) in children and adolescents: a systematic review. Acta Med Indones 2017;49(1). https://pubmed.ncbi.nlm.nih.gov/28450651/.
[21] Da Silveira LTC, Tura B, Santos M. Systematic review of dengue vaccine efficacy. 2019. https://doi.org/10.1186/s12879-019-4369-5.
[22] Godói IP, Lemos LLP, Araújo VE De, Bonoto BC, Godman B, Júnior AA Guerra. CYD-TDV dengue vaccine: systematic review and meta-analysis of efficacy, immunogenicity and safety. 2017. https://doi.org/10.2217/cer-2016-0045.
[23] Louis VR, Phalkey R, Horstick O, Ratanawong P, Wilder-Smith A, Tozan Y, et al. Modeling tools for dengue risk mapping - a systematic review. Int J Health Geogr 2014;13. https://doi.org/10.1186/1476-072X-13-50.
[24] Naish S, Dale P, Mackenzie JS, McBride J, Mengersen K, Tong S. Climate change and dengue: a critical and systematic review of quantitative modeling approaches. BMC Infect Dis 2014;14. https://doi.org/10.1186/1471-2334-14-167.
[25] Andraud M, Hens N, Marais C, Beutels P. Dynamic epidemiological models for dengue transmission: a systematic review of structural approaches. PLoS One 2012;7. https://doi.org/10.1371/journal.pone.0049085.
[26] Lourenço J, Tennant W, Faria NR, Walker A, Gupta S, Recker M. Challenges in dengue research: a computational perspective. Evol Appl 2018;11:516–33. https://doi.org/10.1111/eva.12554.
[27] Moher D, Liberati A, Tetzlaff J, Altman DG, Altman D, Antes G, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009;6:e1000097. https://doi.org/10.1371/journal.pmed.1000097.
[28] OECD. Gross domestic spending on R&D. 2019. URL: https://data.oecd.org/rd/gross-domestic-spending-on-r-d.htm (Accessed 30 April 2020).
[29] Macedo-Hair G, Nobre F Fonseca, Brasil P. Characterization of clinical patterns of dengue patients using an unsupervised machine learning approach. BMC Infect Dis 2019;19:1–11. https://doi.org/10.1186/s12879-019-4282-y.
[30] Fernández E, Smieja M, Walter SD, Loeb M. A predictive model to differentiate dengue from other febrile illness. BMC Infect Dis 2016;16:1–7. https://doi.org/10.1186/s12879-016-2024-y.
[31] Chatterjee S, Dey N, Shi F, Ashour AS, Fong SJ, Sen S. Clinical application of modified bag-of-features coupled with hybrid neural-based classifier in dengue fever classification using gene expression data. Med Biol Eng Comput 2018;56:709–20. https://doi.org/10.1007/s11517-017-1722-y.
[32] Gambhir S, Malik SK, Kumar Y. PSO-ANN based diagnostic model for the early detection of dengue disease. New Horizons Transl Med 2017;4:1–8. https://doi.org/10.1016/j.nhtm.2017.10.001.
[33] Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw 2015;61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003. arXiv:1404.7828.
[34] Ho TS, Weng TC, Wang JD, Han HC, Cheng HC, Yang CC, et al. Comparing machine learning with case-control models to identify confirmed dengue cases. PLoS Negl Trop Dis 2020;14:1–21. https://doi.org/10.1371/journal.pntd.0008843.
[35] Park S, Srikiatkhachorn A, Kalayanarooj S, Macareo L, Green S, Friedman JF, et al. Use of structural equation models to predict dengue illness phenotype. PLoS Negl Trop Dis 2018;12:e0006799. https://doi.org/10.1371/journal.pntd.0006799.
[36] Khosavanna RR, Kareko BW, Brady AC, Booty BL, Nix CD, Lyski ZL, et al. Clinical symptoms of dengue infection among patients from a non-endemic area and potential for a predictive model: a multiple logistic regression analysis and decision tree. Am J Trop Med Hyg 2021;104:121–9. https://doi.org/10.4269/AJTMH.20-0192.
[37] Eick SM, Dale AP, McKay B, Lawrence C, Ebell MH, Cordero JF, et al. Seroprevalence of Dengue and Zika Virus in blood donations: a systematic review. Transfus Med Rev 2019;33:35–42. https://doi.org/10.1016/j.tmrv.2018.10.001.
[38] Al-Raddadi R, Alwafi O, Shabouni O, Akbar N, Alkhalawi M, Ibrahim A, et al. Seroprevalence of dengue fever and the associated sociodemographic, clinical, and environmental factors in Makkah, Madinah, Jeddah, and Jizan, Kingdom of Saudi Arabia. Acta Trop 2019;189:54–64. https://doi.org/10.1016/j.actatropica.2018.09.009.
[39] Aguas R, Dorigatti I, Coudeville L, Luxemburger C, Ferguson NM. Cross-serotype interactions and disease outcome prediction of dengue infections in Vietnam. Sci Rep 2019;9:1–12. https://doi.org/10.1038/s41598-019-45816-6.
[40] Jayasundara SD, Perera SS, Malavige GN, Jayasinghe S. Mathematical modeling and a systems science approach to describe the role of cytokines in the evolution of severe dengue. BMC Syst Biol 2017;11:1–14. https://doi.org/10.1186/s12918-017-0415-3.
[41] Low GKK, Gan SC, Zainal N, Naidu KD, Amin-Nordin S, Khoo CS, et al. The predictive and diagnostic accuracy of vascular endothelial growth factor and pentraxin-3 in severe dengue. Pathog Glob Health 2018;112:334–41. https://doi.org/10.1080/20477724.2018.1516417.
[42] Nargis HF, Nawaz H, Ditta A, Mahmood T, Majeed MI, Rashid N, et al. Raman spectroscopy of blood plasma samples from breast cancer patients at different stages. Spectrochim Acta A Mol Biomol Spectrosc 2019;222:117210. https://doi.org/10.1016/j.saa.2019.117210.
[43] Bahreini M, Hosseinzadegan A, Rashidi A, Miri SR, Mirzaei HR, Hajian P. A Raman-based serum constituents’ analysis for gastric cancer diagnosis: in vitro study. Talanta 2019;204:826–32. https://doi.org/10.1016/j.talanta.2019.06.068.
[44] Shao L, Zhang A, Rong Z, Wang C, Jia X, Zhang K, et al. Fast and non-invasive serum detection technology based on surface-enhanced Raman spectroscopy and multivariate statistical analysis for liver disease. Nanomedicine 2018;14:451–9. https://doi.org/10.1016/j.nano.2017.11.022.
[45] Gurian E, Giraudi P, Rosso N, Tiribelli C, Bonazza D, Zanconati F, et al. Differentiation between stages of non-alcoholic fatty liver diseases using surface-enhanced Raman spectroscopy. Anal Chim Acta 2020;1110:190–8. https://doi.org/10.1016/j.aca.2020.02.040.
[46] Khan S, Ullah R, Shahzad S, Anbreen N, Bilal M, Khan A. Analysis of tuberculosis disease through Raman spectroscopy and machine learning. Photodiagn Photodyn Ther 2018;24:286–91. https://doi.org/10.1016/j.pdpdt.2018.10.014.
[47] Pérez A, Prada YA, Cabanzo R, González CI, Mejía-Ospino E. Diagnosis of chagas disease from human blood serum using surface-enhanced Raman scattering (SERS) spectroscopy and chemometric methods. Sens Bio-Sens Res 2018;21:40–5. https://doi.org/10.1016/j.sbsr.2018.10.003.
[48] Khan S, Ullah R, Khan A, Sohail A, Wahab N, Bilal M, et al. Random Forest-based evaluation of Raman spectroscopy for dengue fever analysis. Appl Spectrosc 2017;71:2111–7. https://doi.org/10.1177/0003702817695571.
[49] Khan S, Ullah R, Khan A, Wahab N, Bilal M, Ahmed M. Analysis of dengue infection based on Raman spectroscopy and support vector machine (SVM). Biomed Opt Express 2016;7:2249. https://doi.org/10.1364/boe.7.002249.
[50] Suwarto S, Nainggolan L, Sinto R, Effendi B, Ibrahim E, Suryamin M, et al. Dengue score: a proposed diagnostic predictor for pleural effusion and/or ascites in adults with dengue infection. BMC Infect Dis 2016;16:1–7. https://doi.org/10.1186/s12879-016-1671-3.
[51] Silva NS da, Undurraga EA, Silva Ferreira ER da, Estofolete CF, Nogueira ML. Clinical, laboratory, and demographic determinants of hospitalization due to dengue in 7613 patients: a retrospective study based on hierarchical models. Acta Trop 2018;177:25–31. https://doi.org/10.1016/j.actatropica.2017.09.025.
[52] Fernández E, Smieja M, Walter SD, Loeb M. A retrospective cohort study to predict severe dengue in Honduran patients. BMC Infect Dis 2017;17. https://doi.org/10.1186/s12879-017-2800-3.
[53] Phuong NTN, Manh DH, Dumre SP, Mizukami S, Weiss LN, Thuong N Van, et al. Plasma cell-free DNA: a potential biomarker for early prediction of severe dengue. Ann Clin Microbiol Antimicrob 2019;18. https://doi.org/10.1186/s12941-019-0309-x.
[54] Davi CCM, Pastor A, Oliveira T, Neto FB Lima, Braga-Neto U, Bigham A, et al. Severe dengue prognosis using human genome data and machine learning. IEEE Trans Biomed Eng 2019. https://doi.org/10.1109/TBME.2019.2897285.
[55] Tuan NM, Nhan HT, Van Vinh Chau N, Hung NT, Tuan HM, Tram T Van, et al. An evidence-based algorithm for early prognosis of severe dengue in the outpatient setting. Clin Infect Dis 2017;64:656–63. https://doi.org/10.1093/cid/ciw863.
[56] Ahmad MH, Ibrahim MI, Mohamed Z, Ismail N, Abdullah MA, Shueb RH, et al. The sensitivity, specificity and accuracy of warning signs in predicting severe dengue, the severe dengue prevalence and its associated factors. Int J Environ Res Public Health 2018;15. https://doi.org/10.3390/ijerph15092018.
[57] Phakhounthong K, Chaovalit P, Jittamala P, Blacksell SD, Carter MJ, Turner P, et al. Predicting the severity of dengue fever in children on admission based on clinical features and laboratory indicators: application of classification tree analysis. BMC Pediatr 2018;18:1–9. https://doi.org/10.1186/s12887-018-1078-y.
[58] Huang SW, Tsai HP, Hung SJ, Ko WC, Wang JR. Assessing the risk of dengue severity using demographic information and laboratory test results with machine learning. PLoS Negl Trop Dis 2020;14:1–19. https://doi.org/10.1371/journal.pntd.0008960.
[59] Zhang H, Xie Z, Xie X, Ou Y, Zeng W, Zhou Y. A novel predictor of severe dengue: the aspartate aminotransferase/platelet count ratio index (APRI). J Med Virol 2018;90:803–9. https://doi.org/10.1002/jmv.25021.
[60] Lin CY, Kolliopoulos C, Huang CH, Tenhunen J, Heldin CH, Chen YH, et al. High levels of serum hyaluronan is an early predictor of dengue warning signs and perturbs vascular integrity. EBioMedicine 2019;48:425–41. https://doi.org/10.1016/j.ebiom.2019.09.014.
[61] Lee IK, Liu JW, Chen YH, Chen YC, Tsai CY, Huang SY, et al. Development of a simple clinical risk score for early prediction of severe dengue in adult patients. PLoS One 2016;11:e0154772. https://doi.org/10.1371/journal.pone.0154772.
[62] Lam PK, Tam DTH, Dung NM, Tien NTH, Kieu NTT, Simmons C, et al. A prognostic model for development of profound shock among children presenting with dengue shock syndrome. PLoS One 2015;10:1–13. https://doi.org/10.1371/journal.pone.0126134.
[63] Lam PK, Ngoc TV, Thuy TT Thu, Van NT Hong, Thuy TT Nhu, Tam DT Hoai, et al. [86] d. A. Marques-Toledo C, Degener CM, Vinhal L, Coelho G, Meira W, Codeço CT,
The value of daily platelet counts for predicting dengue shock syndrome: results et al. Dengue prediction by the web: tweets are a useful tool for estimating and
from a prospective observational study of 2301 Vietnamese children with dengue. forecasting Dengue at country and city level. PLoS Negl Trop Dis 2017;11.
PLoS Negl Trop Dis 2017;11. https://doi.org/10.1371/journal.pntd.0005498. https://doi.org/10.1371/journal.pntd.0005729.
[64] Rossi G, Karki S, Smith RL, Brown WM, Ruiz MO. The spread of mosquito-borne [87] Ramadona AL, Tozan Y, Lazuardi L, Rocklöv J. A combination of incidence data
viruses in modern times: A spatio-temporal analysis of dengue and chikungunya. and mobility proxies from social media predicts the intraurban spread of dengue
Spatial Spatio-temporal Epidemiol 2018;26:113–25. https://doi.org/10.1016/j. in Yogyakarta, Indonesia. PLoS Negl Trop Dis 2019;13. https://doi.org/10.1371/
sste.2018.06.002. journal.pntd.0007298.
[65] Delmelle E, Hagenlocher M, Kienberger S, Casas I. A spatial model of [88] Souza RC, Assunção RM, Oliveira DM, Neill DB, Meira W. Where did I get
socioeconomic and environmental determinants of dengue fever in Cali, dengue? Detecting spatial clusters of infection risk with social network data.
Colombia. Acta Trop 2016;164:169–76. https://doi.org/10.1016/j. Spatial Spatio-temporal Epidemiol 2019;29:163–75. https://doi.org/10.1016/j.
actatropica.2016.08.028. sste.2018.11.005.
[66] Mao L, Yin L, Song X, Mei S. Mapping intra-urban transmission risk of dengue [89] C. C. Huang, C. C. Hsu, H. R. Guo, S. B. Su, H. J. Lin, Dengue fever mortality score:
fever with big hourly cellphone data. Acta Trop 2016;162:188–95. https://doi. A novel decision rule to predict death from dengue fever, J Inf Secur 75 (2017 a)
org/10.1016/j.actatropica.2016.06.029. 532–540. doi: https://doi.org/10.1016/j.jinf.2017.09.014.
[67] Mutheneni SR, Mopuri R, Naish S, Gunti D, Upadhyayula SM. Spatial distribution [90] Huang HS, Hsu CC, Ye JC, Su SB, Huang CC, Lin HJ. Predicting the mortality in
and cluster analysis of dengue using self organizing maps in Andhra Pradesh, geriatric patients with dengue fever. Medicine 2017 b;96. https://doi.org/
India, 2011–2013. Parasite Epidemiol Contr 2018;3:52–61. https://doi.org/ 10.1097/MD.0000000000007878.
10.1016/j.parepi.2016.11.001. [91] Kesorn K, Ongruk P, Chompoosri J, Phumee A, Thavara U, Tawatsin A, et al.
[68] Akter R, Naish S, Hu W, Tong S. Socio-demographic, ecological factors and Morbidity rate prediction of dengue hemorrhagic fever (DHF) using the support
dengue infection trends in Australia. PLoS One 2017;12:1–18. https://doi.org/ vector machine and the Aedes aegypti infection rate in similar climates and
10.1371/journal.pone.0185551. geographical areas. PLoS One 2015;10:1–16. https://doi.org/10.1371/journal.
[69] Y. Yue, J. Sun, X. Liu, D. Ren, Q. Liu, X. Xiao, L. Lu, Spatial analysis of dengue pone.0125049.
fever and exploration of its environmental and socio-economic risk factors using [92] Md-Sani SS, Md-Noor J, Han WH, Gan SP, Rani NS, Tan HL, et al. Prediction of
ordinary least squares: a case study in five districts of Guangzhou City, China, mortality in severe dengue cases. BMC Infect Dis 2018;18. https://doi.org/
2014, Int J Infect Dis 75 (2018) 39–48. doi: https://doi.org/10.1016/j.ijid.2018.0 10.1186/s12879-018-3141-6.
7.023. [93] Nagori A, Dhingra LS, Bhatnagar A, Lodha R, Sethi T. Predicting hemodynamic
[70] Reyes-Castro PA, Harris RB, Brown HE, Christopherson GL, Ernst KC. Spatio- shock from thermal images using machine learning. Sci Rep 2019;9. https://doi.
temporal and neighborhood characteristics of two dengue outbreaks in two arid org/10.1038/s41598-018-36586-8.
cities of Mexico. Acta Trop 2017;167:174–82. https://doi.org/10.1016/j. [94] Dua X, Rajendra A. Data mining in biomedical imaging, signaling, and systems.
actatropica.2017.01.001. Boca Raton: Auerbach Publications; 2016. https://doi.org/10.1201/b10917.
[71] Ministry of Health. Epidemiological report–dengue fever (January to June, [95] Kalimuthu K, Panneerselvam C, Chou C, Tseng LC, Murugan K, Tsai KH, et al.
2008)., Technical Report, Ministry of Health. URL: http://bvsms.saude.gov.br Control of dengue and Zika virus vector Aedes aegypti using the predatory
/bvs/publicacoes/informe_epidemiologico_dengue_janeiro_junho_2008.pdf; copepod Megacyclops formosanus: synergy with Hedychium coronarium-
2008. synthesized silver nanoparticles and related histological changes in targeted
[72] Stolerman LM, Maia PD, Kutz JN. Forecasting dengue fever in Brazil: an mosquitoes. Process Saf Environ Prot 2017;109:82–96. https://doi.org/10.1016/
assessment of climate conditions. PLoS One 2019;14:e0220106. https://doi.org/ j.psep.2017.03.027.
10.1371/journal.pone.0220106. [96] Udayanga L, Ranathunge T, Iqbal MC, Abeyewickreme W, Hapugoda M.
[73] Carvajal TM, Viacrusis KM, Hernandez LFT, Ho HT, Amalin DM, Watanabe K. Predatory efficacy of five locally available copepods on Aedes larvae under
Machine learning methods reveal the temporal pattern of dengue incidence using laboratory settings: an approach towards bio-control of dengue in Sri Lanka. PLoS
meteorological factors in metropolitan Manila, Philippines. BMC Infect Dis 2018; One 2019;14:1–14. https://doi.org/10.1371/journal.pone.0216140.
18:1–15. https://doi.org/10.1186/s12879-018-3066-0. [97] Lee SJ, Kim S, Yu JS, Kim JC, Nai YS, Kim JS. Biological control of Asian tiger
[74] Zhao N, Charland K, Carabali M, Nsoesie EO, Maheu-Giroux M, Rees E, et al. mosquito, Aedes albopictus (Diptera: Culicidae) using Metarhizium anisopliae
Machine learning and dengue forecasting: comparing random forests and JEF-003 millet grain. J Asia Pac Entomol 2015;18:217–21. https://doi.org/
artificial neural networks for predicting dengue burden at national and sub- 10.1016/j.aspen.2015.02.003.
national scales in Colombia. PLoS Negl Trop Dis 2020;14:1–16. https://doi.org/ [98] Benelli G, Jeffries CL, Walker T. Biological control of mosquito vectors: past,
10.1371/journal.pntd.0008056. present, and future. Insects 2016;7:1–18. https://doi.org/10.3390/
[75] Salim NAM, Wah YB, Reeves C, Smith M, Yaacob WFW, Mudin RN, et al. insects7040052.
Prediction of dengue outbreak in Selangor Malaysia using machine learning [99] Nazni WA, Hoffmann AA, NoorAfizah A, Cheong YL, Mancini MV, Golding N,
techniques. Sci Rep 2021;11. https://doi.org/10.1038/s41598-020-79193-2. et al. Establishment of Wolbachia Strain wAlbB in Malaysian populations of Aedes
[76] Salami D, Sousa CA, R. O. Martins Md, Capinha C. Predicting dengue importation aegypti for dengue control. Curr Biol 2019;29. https://doi.org/10.1016/j.
into Europe, using machine learning and model-agnostic methods. Sci Rep 2020; cub.2019.11.007 (4241–4248.e5).
10. https://doi.org/10.1038/s41598-020-66650-1. [100] Indriani C, Tantowijoyo W, Rancès E, Andari B, Prabowo E, Yusdi D, et al.
[77] Chang FS, Tseng YT, Hsu PS, Chen CD, Lian IB, Chao DY. Re-assess vector indices Reduced dengue incidence following deployments of Wolbachia-infected Aedes
threshold as an early warning tool for predicting dengue epidemic in a dengue aegypti in Yogyakarta, Indonesia: a quasi-experimental trial using controlled
non-endemic country. PLoS Negl Trop Dis 2015;9. https://doi.org/10.1371/ interrupted time series analysis. Gates Open Res 2020;4:1–16. https://doi.org/
journal.pntd.0004043. 10.12688/gatesopenres.13122.1.
[78] Scavuzzo JM, Trucco F, Espinosa M, Tauro CB, Abril M, Scavuzzo CM, Frery AC. Modeling dengue vector population using remotely sensed data and machine learning. Acta Trop 2018;185:167–75. https://doi.org/10.1016/j.actatropica.2018.05.003. arXiv:1805.02590.
[79] Parra MCP, Fávaro EA, Dibo MR, Mondini A, Eiras ÁE, Kroon EG, et al. Using adult Aedes aegypti females to predict areas at risk for dengue transmission: a spatial case-control study. Acta Trop 2018;182:43–53. https://doi.org/10.1016/j.actatropica.2018.02.018.
[80] Ding F, Fu J, Jiang D, Hao M, Lin G. Mapping the spatial distribution of Aedes aegypti and Aedes albopictus. Acta Trop 2018;178:155–62. https://doi.org/10.1016/j.actatropica.2017.11.020.
[81] Jácome G, Vilela P, Yoo CK. Present and future incidence of dengue fever in Ecuador nationwide and coast region scale using species distribution modeling for climate variability's effect. Ecol Model 2019;400:60–72. https://doi.org/10.1016/j.ecolmodel.2019.03.014.
[82] Wu CH, Kao SC, Shih CH, Kan MH. Open data mining for Taiwan's dengue epidemic. Acta Trop 2018;183:1–7. https://doi.org/10.1016/j.actatropica.2018.03.017.
[83] Strauss RA, Castro JS, Reintjes R, Torres JR. Google dengue trends: an indicator of epidemic behavior. Venezuelan case. Int J Med Inform 2017;104:26–30. https://doi.org/10.1016/j.ijmedinf.2017.05.003.
[84] Li Z, Liu T, Zhu G, Lin H, Zhang Y, He J, et al. Dengue Baidu Search Index data can improve the prediction of local dengue epidemic: a case study in Guangzhou, China. PLoS Negl Trop Dis 2017;11:e0005354. https://doi.org/10.1371/journal.pntd.0005354.
[85] Liu K, Wang T, Yang Z, Huang X, Milinovich GJ, Lu Y, et al. Using Baidu Search Index to predict dengue outbreak in China. Sci Rep 2016;6. https://doi.org/10.1038/srep38040.
[101] Ryan PA, Turley AP, Wilson G, Hurst TP, Retzki K, Brown-Kenyon J, et al. Establishment of wMel Wolbachia in Aedes aegypti mosquitoes and reduction of local dengue transmission in Cairns and surrounding locations in northern Queensland, Australia. Gates Open Res 2020;3:1547. https://doi.org/10.12688/gatesopenres.13061.2.
[102] Lee JS, Lim JK, Dang DA, Nguyen THA, Farlow A. Dengue vaccine supplies under endemic and epidemic conditions in three dengue-endemic countries: Colombia, Thailand, and Vietnam. Vaccine 2017;35:6957–66. https://doi.org/10.1016/j.vaccine.2017.10.070.
[103] Hladish TJ, Pearson CA, Rojas DP, Gomez-Dantes H, Halloran ME, Vazquez-Prokopec GM, et al. Forecasting the effectiveness of indoor residual spraying for reducing dengue burden. PLoS Negl Trop Dis 2018;12:1–16. https://doi.org/10.1371/journal.pntd.0006570.
[104] Salimi A, Ziaii M, Amiri A, Hosseinjani Zadeh M, Karimpouli S, Moradkhani M. Using a Feature Subset Selection method and Support Vector Machine to address curse of dimensionality and redundancy in Hyperion hyperspectral data classification. Egypt J Remote Sens Space Sci 2018;21:27–36. https://doi.org/10.1016/j.ejrs.2017.02.003.
[105] Araujo M, Aguilar J, Aponte H. Fault detection system in gas lift well based on artificial immune system. In: Proceedings of the international joint conference on neural networks. vol. 3. IEEE; 2003. p. 1673–7. https://doi.org/10.1109/ijcnn.2003.1223658.
[106] Puerto E, Aguilar J, Vargas R, Reyes J. An AR2P deep learning architecture for the discovery and the selection of features. Neural Process Lett 2019;50:623–43. https://doi.org/10.1007/s11063-019-10062-4.
[107] Li Y, Sperrin M, Martin GP, Ashcroft DM, van Staa TP. Examining the impact of data quality and completeness of electronic health records on predictions of patients' risks of cardiovascular disease. Int J Med Inform 2020;133:104033. https://doi.org/10.1016/j.ijmedinf.2019.104033.
[108] Chen L, Duan G, Wang SY, Ma JF. A Choquet integral based fuzzy logic approach to solve uncertain multi-criteria decision making problem. Expert Syst Appl 2020;149:113303. https://doi.org/10.1016/j.eswa.2020.113303.
[109] Qin B, Xia Y, Wang S, Du X. A novel Bayesian classification for uncertain data. Knowl-Based Syst 2011;24:1151–8. https://doi.org/10.1016/j.knosys.2011.04.011.
[110] Shaheen O, El-Nagar AM, El-Bardini M, El-Rabaie NM. Probabilistic fuzzy logic controller for uncertain nonlinear systems. J Frankl Inst 2018;355:1088–106. https://doi.org/10.1016/j.jfranklin.2017.12.015.
[111] Sugasawa S, Kubokawa T. Bayesian estimators in uncertain nested error regression models. J Multivar Anal 2017;153:52–63. https://doi.org/10.1016/j.jmva.2016.09.011.
[112] Velasco H, Laniado H, Toro M, Leiva V, Lio Y. Robust three-step regression based on comedian and its performance in cell-wise and case-wise outliers. Mathematics 2020;8:1259. https://doi.org/10.3390/math8081259.
[113] Quick H, Waller LA. Using spatiotemporal models to generate synthetic data for public use. Spatial Spatio-temporal Epidemiol 2018;27:37–45. https://doi.org/10.1016/j.sste.2018.08.004.
[114] Silver M, Karnieli A, Marra F, Fredj E. An evaluation of weather radar adjustment algorithms using synthetic data. J Hydrol 2019;576:408–21. https://doi.org/10.1016/j.jhydrol.2019.06.064.
[115] Lepenioti K, Bousdekis A, Apostolou D, Mentzas G. Prescriptive analytics: literature review and research challenges. Int J Inf Manag 2020;50:57–70. https://doi.org/10.1016/j.ijinfomgt.2019.04.003.
[116] Menezes BC, Kelly JD, Leal AG, Le Roux GC. Predictive, prescriptive and detective analytics for smart manufacturing in the information age. IFAC-PapersOnLine 2019;52:568–73. https://doi.org/10.1016/j.ifacol.2019.06.123.
[117] Google. Federated learning: collaborative machine learning without centralized training data. URL: https://ai.googleblog.com/2017/04/federated-learning-collaborative.html; 2017.
[118] Aguilar J, Sánchez M, Cordero J, Valdiviezo-Díaz P, Barba-Guamán L, Chamba-Eras L. Learning analytics tasks as services in smart classrooms. Univ Access Inf Soc 2018;17:693–709. https://doi.org/10.1007/s10209-017-0525-0.
[119] Aguilar J, Cordero J, Buendia O. Specification of the autonomic cycles of learning analytic tasks for a smart classroom. J Educ Comput Res 2018;56:866–91. https://doi.org/10.1177/0735633117727698.