Investigating Machine Learning Techniques For Predicting Risk of Asthma Exacerbations: A Systematic Review

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

Investigating machine learning techniques for

predicting risk of asthma exacerbations: A


systematic review
Widana Kankanamge Darsha Jayamini1,2*, Farhaan Mirza1 ,
M. Asif Naeem3 , Amy Hai Yan Chan4
1* School of Engineering, Computer and Mathematical Sciences,
Auckland University of Technology, Auckland, 1010, New Zealand.
2 Department of Software Engineering, Faculty of Computing and

Technology, University of Kelaniya, Kelaniya, 11300, Sri Lanka.


3 Department of Data Science & Artificial Intelligence, National

University of Computer and Emerging Sciences (NUCES), Islamabad,


44000, Pakistan.
4 School of Pharmacy, Faculty of Medical and Health Sciences,

University of Auckland, Auckland, 1142, New Zealand.

*Corresponding author(s). E-mail(s): darsha.jayamini@autuni.ac.nz,


darshaj@kln.ac.lk;

Abstract
Asthma, a common chronic respiratory disease among children and adults, affects
more than 200 million people worldwide and causes about 450,000 deaths each
year. Machine learning is increasingly applied in healthcare to assist health prac-
titioners in decision-making. In asthma management, machine learning excels in
performing well-defined tasks, such as diagnosis, prediction, medication, and man-
agement. However, there remain uncertainties about how machine learning can
be applied to asthma exacerbation prediction. The study aimed to systematically
review recent applications of machine learning techniques in predicting the risk of
asthma attacks to assist asthma control and management. A total of 860 studies
were initially identified from the five databases, and after the screening and full-
text review, 20 were selected for inclusion in this review. The review considered
recent studies published from the year 2010 to the present. The included stud-
ies used machine learning techniques to support future asthma risk prediction by
using various data sources such as clinical, medical, biological, socio-demographic,

1
as well as environmental, and meteorological data. While some studies considered
prediction as a category, other studies predicted the probability of exacerbation.
Only a group of studies applied prediction windows. The paper proposes a con-
ceptual model to summarise how machine learning and available data sources can
be leveraged to produce effective models for the early detection of asthma attacks.
The review has also generated a list of data sources that other researchers may
use in similar work. Furthermore, we present opportunities for further research
and the limitations of the preceding studies.

Keywords: Asthma, Risk of attack, Prediction, Machine learning

1 Introduction
Asthma, a common long-term lung condition, is caused by inflammation in the air-
ways of the respiratory system. Asthma affected 262 million people in 2019, and it
causes 461,000 deaths each year on average [38]. Often beginning in childhood, asthma
affects all ages [27]. Asthma diagnosis considers clinical signs and symptoms, including
wheezing, breathlessness, chest tightening, and coughing [11]. Asthma exacerbations
could arise in patients due to multiple factors, including genetic, clinical, and envi-
ronmental factors, which also affect asthma control. For instance, if asthma patients
are exposed to atmospheric changes such as dust, fumes and pollen, their asthma can
worsen, and it can lead to asthma attacks.
The reasons for asthma exacerbations are multifactorial, and worsening of asthma
control may go unnoticed in patients if their symptoms are mild. However, asthma
can worsen rapidly, leading to hospitalisation or death [37]. The National Review of
Asthma Deaths (NRAD) conducted in the UK found that 58% of patients with asthma
who died were diagnosed as having mild or moderately severe asthma, with almost
half having had no asthma review in the previous 12 months [19]. This highlights
the need for innovative approaches to managing asthma and responding to attacks
to ensure that care is delivered in a timely manner. Another study [34] found that
hospitalisation due to COVID-19 has increased among children (5–17 years) with
poorly controlled asthma compared to children with well-controlled or without asthma.
This highlights the increased risk that asthma patients face and emphasises the need
for regular medical review, treatment and proper management since asthma control
can be affected by other factors, including environmental and meteorological conditions
[17, 18, 25], workplace conditions, and severe adverse life events [33]. Therefore, early
detection of asthma symptoms or exacerbations is essential to ensure rapid mitigation.
Even so, asthma prediction is challenging and complex due to its heterogeneous
nature and the diverse multifactorial triggers unique to individuals [10, 30]. Patients
with asthma are thus at a high risk of needing unscheduled medical care, which could
be prolonged and costly to the health system. Any approach to predict asthma attacks
will enhance asthma management and will save costs while increasing the quality of
life.

2
Machine learning (ML) techniques that have emerged to manage and treat various
diseases are increasingly being applied, particularly in terms of risk prediction. ML is
developed based on mathematics, logic, probability, neuroscience, and decision theory.
Using these concepts, different algorithms are constructed that can store important
information from data elements via continuous training sessions. Consequently, the
models built upon these algorithms have the ability to identify patterns and generate
outcomes from more complex data without any human-generated explicit program-
ming code [26]. These computer methodologies have been found to perform better than
traditional statistical approaches that can facilitate personalised medicine by address-
ing individual patient characteristics while processing vast amounts of data [21, 39].
Previous studies have shown that ML models can support asthma monitoring, pre-
diction, diagnosis, and control in children and adults. These techniques complement
clinicians if expertise or resources are limited [16] preventing or mitigating tragic cir-
cumstances. This may be particularly true for exacerbation prediction as it will be
difficult for health practitioners to consider a variety of factors, other than medical
and clinical, in determining future attacks. Computer-based assistance can help pre-
dict impending attacks by considering a range of factors, and it will assist clinicians
in making the appropriate decisions for the patient. The existence of a risk prediction
model will be helpful for physicians in monitoring disease control. It can also help
patients as they will be made aware of any possible upcoming attacks.
There are three categories of ML algorithms: Supervised, Unsupervised and Rein-
forcement learning algorithms. These main categories have sub-divisions, as illustrated
in Figure 1. Typical applications of classification and regression algorithms are seen
in spam filters, fraud detection systems, recommendation engines, image recognition
systems, and so on. Figure 1 below lists a few of the examples for each sub-category.
This systematic review explores the application of ML techniques to predict the risk
of asthma attacks. Specifically, we will identify the ML techniques used, categorise the
predictive models based on the outcome, and identify the suitability of ML approaches
as they relate to the scenarios analysed. It will also identify the best-performing ML
algorithms and provide prospective research directions.

2 Methodology
This review adopted the Preferred Reporting Items for Systematic reviews and
Meta-Analyses (PRISMA) methodology [29]. The protocol has been registered with
the PROSPERO International Prospective Register of Systematic Reviews (PROS-
PERO CRD42023402178). We extracted the data sources that the previous studies
incorporated to predict the risk of asthma attacks.

2.1 Inclusion criteria


Primary studies of ML-based solutions for asthma in adults and children published in
English from 2010 onward were included in our survey. No limitations were placed on
the study design, setting, or minimum follow-up though studies were limited to the
English language due to the language of the research team.

3
• Logistic Regression
• Naïve Bayes
• Decision Trees
• Support Vector Machines
Classification • Neural Networks
• Deep Learning
• Transfer Learning
Supervised
• Linear Regression
• Decision Trees
• Support Vector Regression
Regression • Neural Networks
• Deep Learning
Machine Learning

• Transfer Learning

Unsupervised Clustering • K-Means Clustering


• Agglomerative Hierarchical Clustering

• Q-learning
Reinforcement Decision Making • R Learning

Fig. 1: Overview of Machine Learning algorithms

2.2 Exclusion criteria


Studies focusing on the diagnosis/prediction of asthma itself (as a condition) rather
than asthma exacerbations were excluded from the current study. Additionally,
studies focusing on predicting asthma symptoms/control/severity level, number of
Emergency Department (ED) visits, and Peak Exploratory Flow Rate (PEFR) were
eliminated from the study pool. Further, we excluded research work published as
reviews, systematic reviews, editorials, letters, comments to the editor, book chapters,
abstracts, conceptual papers, opinions, unavailable sources, protocols, commentary,
and unpublished full-text documents.

2.3 Data sources and search strategy


We searched the databases Medline (via PubMed), Cochrane (via Wiley), Embase,
IEEE Xplore and Google Scholar using the search terms Asthma AND (attack*
OR exacerbat* OR control* OR symptom*) AND (detect* OR predict* OR diag-
nos* OR manag*) AND (”artificial intelligence” OR AI OR ”machine learning” OR
”deep learning” OR ”neural network” OR computer-based OR ”computer based”
OR computer-assisted OR ”computer assisted” OR ”computer technology” OR
technology).

4
2.4 Selection process
The database search resulted in finding 860 records in total. We used the tool RAYYAN
for the study selection process [28]. After removing 122 duplicates, the titles and the
abstracts of the remaining studies were screened. At the end of this step, we eliminated
667 records, and the rest of the studies were sent through full-text screening, which
resulted in the selection of 20 studies to be included in our review. These were blindly
reviewed by two different researchers. This study selection process is illustrated in
Figure 2.

2.5 Data extraction


A data extraction table was created to record the specific data extracted from the
selected studies. The table structure was designed as informed by previous research. We
extracted the year, country, title, aim, data source/s, sample size, research techniques,
findings, evaluation metrics, model performance and the limitations and strengths of
each study.

2.6 Risk of bias assessment


We conducted the risk of bias assessment of the selected studies using the tool Crit-
ical Appraisal Skills Programme (CASP) for Clinical Prediction Rule [3]. The CASP
checklist was completed for each study independently by two authors. Any disagree-
ments were resolved by consensus discussion. An additional file shows the quality of
each study (see Supplementary file S1).

3 Results
3.1 Characteristics of included studies
In this review, we identified the data sources that the previous studies incorporated
to predict the risk of asthma attacks. Table 1 shows the different data domains,
including clinical, medical, environmental, and meteorological, hospital, medical and
socio-demographic, that have been used to develop the asthma risk prediction models.
In addition, biological data was used in asthma risk prediction.

Table 1: Data sources used in previous studies.


Data source Previous studies

Biological [40]
Clinical [1, 4, 8, 9, 22, 23, 32, 35, 39–42]
Environmental and Meteorological [1, 5, 12, 31, 40]
Hospital and Medical [5, 7, 14, 15, 20, 22, 23, 39–41]
Socio-demographic [12, 13, 15, 20, 22, 23, 31, 35, 39–41]

5
Embase
IEEE
Cochrane 504
Xplore
92
147

Google
Medline
Scholar
47
70
Records identified
n = 860

Duplicates removed
n = 122

Title and abstract


screened
n = 738
Records excluded based
on title and abstract
n = 667

Full text screened


n = 71
Not asthma attacks - 29
Not asthma specific - 13
Records excluded
Not ML - 7
n = 51
Article unavailable -1
Not using real data - 1
Records included
n = 20

Fig. 2: Study inclusion process via PRISMA.

Included studies were conducted in a range of different countries, represented in


Figure 3. One study [42], used an international data set from multiple countries is not
included in the figure. Importantly, as can be seen from Figure 3, the USA is the main
country of origin for many of these studies predicting asthma attack risk using ML
techniques.

3.2 ML Models
Included studies that used different ML techniques to predict the risk of asthma
attacks. The outcome, asthma exacerbation, was considered either as a categorical
variable or as a continuous variable in the form of a probability. The studies can there-
fore be categorised into 2 groups: 1) studies that predict the risk of asthma attacks as
a classification and 2) studies that predict the risk of asthma attacks as a probability.
The classification group was further divided into 2 groups: studies with and without
a prediction window. The studies with a prediction window can again be subdivided

6
Fig. 3: Asthma risk prediction study count based on country.

based on the window size: less than and more than a month. This categorisation of
the studies is illustrated in Figure 4.

Predicting risk of
asthma attacks as a
probability (n=2)
Biological
Hospital
and Medical
Predicting risk of asthma
attacks without a
Data Studies prediction window (n=7)
Predicting risk of
asthma attacks as a
Clinical classification (n=18)
Predicting risk of asthma
attacks with a prediction
window (n=11)

Environmental
and
Meteorological
Socio-demographic
Predicting risk of asthma Predicting risk of asthma
attacks with a prediction attacks with a prediction
window less than a window of more than a
month (n=6) month (n=5)

Fig. 4: Presentation of the results of the review

In the literature, many studies predicted the risk of asthma exacerbations without
considering the temporal effect. For instance, the impact of weather data from the
previous day or a few days ago might have triggered the symptoms of asthma patients.
Therefore, it is critical to consider the impact of the different factors from previous

7
days (lags) in forecasting the risk of asthma attacks. Further, instead of just making a
prediction, a group of studies have constructed models to predict asthma attacks for
a specific time (prediction window), such as the coming 3 days, 7 days, 3 months or 1
year and so on. Table 2 shows the summary of the studies that developed ML models
to predict asthma risk as a category using the prediction window concept. Table 4
represents the study summary for predicting asthma risk without implementing the
prediction window concept. Table 5 shows the summary of the studies that predict
the risk of asthma attacks as a probability.
Among the ML algorithms employed in these studies, Logistic Regression (LR),
Decision Trees (DT), Random Forest (RF), Gradient Boost Machines (GBM), Extreme
Gradient Boost (XGB), Support Vector Machines (SVM), and Neural Network (NN)
were used most often. Most of the studies exercised the k-fold cross-validation tech-
nique to validate the model on the training data. Different studies chose different k
values such as 3 [31], 4 [20, 22], 5 [1, 4, 35, 39, 42] and 10 [7, 14, 15].

3.3 Model performance


To evaluate and compare the performance of the models, the studies have used dif-
ferent evaluation metrics as shown in Table 3. These were predominantly accuracy,
Area Under the Receiver Operating Curve (AUC-ROC), specificity, sensitivity, Posi-
tive Predictive Value (PPV), and Negative Predictive Value (NPV) metrics. Accuracy
is the ratio of correctly predicted outcomes and the total number of samples, simply
the overall correctness of the model. AUC-ROC value represents the capability of the
model to distinguish between the classes. Sensitivity, also called recall, measures the
completeness of positive predictions, while specificity measures the completeness of
negative predictions. PPV, also called precision, is the accuracy of positive predictions,
while NPV is the accuracy of negative predictions.

8
Table 2: Summary of the studies on predicting risk of asthma attacks as a classification - with prediction window.
Study Study aim Important predictors Feature Data Target ML Prediction Lookback Outcomes/Findings
identified selection balanc- algorithms window window
ing

[4] Develop and validate None None None Occurrence XGB, 2 days 5 days ML models reached higher
prediction models for of a severe SVM, LR (same discriminative performance
short-term prediction asthma exac- day or than a simple clinical rule
of severe asthma erbation(0 next
exacerbations or 1) day)
[8] Develop an algorithm Not given Not given Not Asthma CART 1 day 7 days Developed a model that
that predicts asthma given exacerbation predicts asthma
exacerbation one day (”no–alert” exacerbation on next day
in advance based on or ”high– based on the previous 7-day
the previous 7-day alert”) window
window
[9] Explore the use of Zone 7 MDL Not Asthma NB, ABN, 1 day 7 days Demonstrated the potential
telemonitoring data clear exacerbation SVM of ML techniques in
for building ML (”no-alert” personalised decision
algorithms that or support, found the
predict asthma ”high-alert”) attributes of day seven alone

9
exacerbations had more predictive power
[13] Develop a clinical obesity, atopy, None None Asthma LASSO, 30 days 30 days Near-term prediction
decision support tool medication, asthma exacerbation RF, XGB models performed better,
to predict the risk of controller plan, (0 or 1) and clinical factors alone
asthma exacerbation patient service had more predictive power
in children on a utilisation history
monthly- basis
[14] Develop a model to previous asthma None None Asthma Elastic-net 6 6 ML algorithms performed
predict asthma exacerbations, length exacerbation LR, RF, months months moderately in predicting
exacerbation in the 6 of treatment with (0 or 1) GBM asthma attacks after
months after stopping biologics stopping asthma biologics
biologic therapy using
ML techniques
[23] Develop a method Not given None None Asthma XGB 1 year Not Asthma risk prediction
which automatically hospital visit given model with automatic
explains the results of (0 or 1) explanation component
an asthma risk
prediction model on
imbalanced tabular
data
Study Study aim Important predictors Feature Data Target ML Prediction Lookback Outcomes/Findings
identified selection balanc- algorithms window window
ing

[20] Develop ML models comorbidity burden, Collinearity Random Asthma RF, XGB, 15 days Multiple Prior clinical information is
for predicting risk of previous exacerbations Under- exacerbation RNN, (10, 30, not sufficient for near-term
asthma exacerbations sampling (moderate or LR(LASSO, 60, 90, asthma risk prediction
and severe) Ridge, 365
Weight- Elastic- days)
ing net)
[22] Discover the usability mean number of daily None None Asthma LR, RF, 5 days Not Digital inhaler data with
of digital inhaler data inhalations during exacerbation XGB given demographic and clinical
merged with prior 4 days, (0 or 1) factors can be used for
demographic and comparison to baseline predicting impending
clinical data for inhalation parameters, asthma exacerbations
predicting future previous exacerbations
asthma exacerbations and comorbidities
[35] Analyse the impact of age, hospital stay None Under- A second RF, SVM 1 year 1 year Socio-markers can be
using demographic, (duration), blight, sampling hospital visit reliable predictors of
social, and symptoms neighbourhood (0 or 1) hospital re-visits and
related features in inequality combined demographics,
asthma hospital socio-markers and

10
re-visit prediction on biomarkers can predict
pediatric asthma future hospital re-visits
patients moderate accurately.
[15] Predict asthma Asthma medications None None Asthma Elastic-net 1 year Not Inclusion of medication-
exacerbations in the exacerbation LR, RF, given related covariates increase
next year with the use (0 or 1) XGB the odds of having as
of electronic data asthma exacerbation
conditional on
previous adherence to
asthma medications
[42] Derive and validate a Not given PCA, Random Asthma LR, NB, 4 days None ML models combined with
model predicting Recursive over- exacerbation DT, NN (same telemonitoring data such as
future exacerbations Feature sampling, (0 or 1) day or PEFR
using telemonitoring Elimina- Ran- next 3
data tion dom days)
under-
sampling,
SMOTE
Table 3: Performance of the ML models developed for asthma risk prediction.
Application Study Best model (overall) AUC-ROC1 AUPRC2 Accuracy Specificity Sensitivity NPV3 PPV4
type

Classification [1] GBM 0.97 - 97% - 0.98 - -


- Without [7] NB - - 70.7% 0.73 0.70 0.53 0.85
prediction [12] NN - - 94% - - - -
window
[32] BPNN - - 86% 0.86 0.85 - -
[31] XGB 0.84 - - - - - -
[40] RF 0.50 - - - - - -
Non-severe: 0.71 0.67 0.64 0.78 0.51
[41] XGB ED visits: 0.88 - - 0.76 0.84 0.99 0.12
Hospitalisation: 0.85 0.73 0.86 1.0 0.05

Classification [4] LR 0.88 - - - - - -


- With [8] CART - - 80.9% 0.97 0.65 - -
prediction [9] ABN - - 100% 1.00 1.00 - -
window
[13] XGB 0.76 - - - - - -

11
Primary outcome: XGB 0.74
[14] - - - -
Secondary outcome: RF 0.68
[15] Elastic-net LR 0.70 - - 0.57 0.72 - -
Patients without comorbidities : XGB 0.90 0.007 0.007 0.03
[20] - - -
All patients: LightGBM 0.90 0.01 0.01 0.03
[23] XGB 0.86 - - - - - -
[22] XGB 0.83 - - - - - -
[35] RF - - 66.05% 0.68 0.65 - -
[42] LR 0.83 - - 0.83 0.90 - -

Probability [5] LSTM RMSE5 = 0.093 - - - - - -


prediction [39] TSANN 0.69 - - - - - -

1 Area under the receiver operating characteristic curve


2 Area under the precision-recall curve
3 Negative Predictive Value
4 Positive Predictive Value
5 Root Mean Squared Error
Table 4: Summary of the studies on predicting risk of asthma attacks as a classification - without prediction window.
Study Study aim Important predictors Feature Data Target ML algo- Outcomes/Findings
identified selection imbalance rithms
technique handling

[1] Utilising bio-signals and smoking status, LASSO SMOTE Asthma DT, DT has the poorest
environmental triggers for allergies, normal PEFR Regres- using SVM attack (0 or GBM, performance while GBM
asthma risk prediction and reading, asthma sion (SMOTESVM) 1) LR, RF, has the best
symptoms (last 4 SVM performance
weeks), medication (last
4 weeks), no. of asthma
attacks (last 4 weeks),
temperature, wind
speed, relative
humidity, AQI
[7] To evaluate models for Not identified Not N/A (data Asthma NB, DT, NB performed better
predicting the severity of conducted not highly exacerba- E-DT, among ML models, but
asthma exacerbation in ED imbalanced) tion (Mild SVM, both the PRAM score
visits, constructed from data or Moder- IB1, IB10 and the NB model were
using ML techniques and to ate/Severe) less accurate than
select the best-performing model physicians

12
to compare with Pediatric
Respiratory Assessment Measure
(PRAM) score and predictions
made by ED physicians
[12] Exploring the impact of weather Not identified Not None Asthma NN Personalisation has
on asthma exacerbations and conducted attack(low positive effects on
examines the effectiveness and or high) asthma
limitations of several recent self-management, and
asthma self-management tools user demographics and
and applications weather have positive
effects on personalised
weather-based
healthcare
[32] Determine the severity of Not identified Not None Risk of BPNN, BPNN achieved higher
asthma using real-time data conducted asthma SVM, accuracy than other
from medical IoT devices while (low, KNN classifiers
monitoring continuously medium,
high)
Study Study aim Important predictors Feature Data Target ML Outcomes/Findings
identified selection imbalance algorithm
technique handling

[31] Compare the performance of ML patient vital signs, None None Hospitalisation DT, Except DT, all other
approaches incorporating a few acuity, age, weight, (0 or 1) LASSO models perform well in
data domains with community socioeconomic status, Regres- prediction and adding
viral load for early prediction of dry bulb temperature, sion, RF, weight, SES, and
the need for hospital-level care relative humidity, wind XGB weather data improved
in pediatric asthma speed, and station the model performance
pressure
[40] Validating the hypothesis that set of SNPs RF None Severe RF RF modelling can
the use of RF classifier to select asthma produce accurate results
SNPs would result in an exacerba- using hundreds of SNPs
improved predictive model of tion (0 or and RF-selected SNPs
asthma exacerbations 1) contain information
about exacerbation,
while the random SNPs
do not
[41] Develop ML models with large- age, long-acting β None None None-severe LR, RF, Higher albumin was
scale and real-world local data agonist, high-dose exacerba- Light- related to lower risk of
to predict asthma exacerbations inhaled glucocorticoid, tion (0 or GBM asthma exacerbation,

13
or chronic oral 1); ED and for patients with
glucocorticoid therapy, visits (0 or low FEV1 or FEV1 to
low FEV1 and FEV1 to 1); Hospi- FVC ratio, these
FVC ratio talisation (0 features significantly
or 1) affect their individual
risk prediction
4 Discussion
4.1 Data for predicting risk of asthma attacks
A conceptual model for predicting the risk of asthma attacks using machine learn-
ing techniques is presented in Figure 5. As asthma is a health condition affected by
many factors, data from heterogeneous domains have been integrated to predict the
risk of asthma exacerbations using the ML algorithms as listed in Table 1. Asthma
is affected by socio-demographic factors, including age, sex, education level, socioe-
conomic status, and marital status [6]. Socio-demographic features including age,
gender, race, insurance type, area of residence [12–15, 35, 39–41] have been used with
clinical factors such as asthma symptoms, PEFR, reliever medications, inhalations
[1, 4, 8, 22, 41]. While diagnoses, BMI, smoking status, previous asthma attacks, dis-
pensed medications, Charlson Comorbidity Index, comorbidities, ED visits, hospital
visits [1, 5, 7, 13–15, 20] were embedded as hospital and medical factors.
Environmental and meteorological data can be used for health forecasting with
respect to respiratory diseases [36]. Meteorological data, such as temperature, humid-
ity, air pressure, and wind speed have impacts on asthma [17, 18]. Air pollutants can
have chronic effects on humans and thereby pose a greater risk to a larger population.
These factors are useful for predicting asthma since they could lead to a high number
of asthma-related hospital admissions. Asthma triggers, including temperature, rela-
tive humidity, wind speed, AQI, air pressure, PM2.5 , pollen and gasses like CO, N O2 ,
SO2 were the environmental and meteorological factors investigated in the predictor
list by [1, 5, 12, 13, 31, 32]. Additionally, we found one study using the biological
factor SNP [40].
Among several factors collected from multiple domains, previous studies have found
that smoking status [1], normal PEFR and daily readings, previous asthma attacks,
asthma symptoms, medication usage [1, 13, 15, 20, 22, 39], comorbidities [13, 20, 22],
hospital stays [35] are important predictors of future asthma attacks. Socio-economic
factors including SES [31], age [31, 35, 41], gender, race [39] have shown importance
in prediction. Weather changes highly affect the control level of asthma patients, and
accordingly, past studies show that temperature, wind speed, relative humidity,[1, 31]
and AQI [1] have more predictive power.
These results show the importance of including multiple factors from different
domains (see Figure 5). However, only limited studies have utilised predictors from
more than three of the above heterogeneous domains. Therefore, it seems important
to incorporate different factors from multiple domains to forecast impending asthma
attacks, as previous works have shown important predictors from each domain. Fur-
ther, with the vast development of technology, smart devices such as smart peak flow
meters and smart inhalers are in the hands of asthma patients. Utilising this data will
be more accurate than using data from daily diaries because manually entered data
is prone to human errors, and recall bias. Moreover, at present, social media usage
has become more popular among people, where they share their feelings and opinions.
Therefore, it is also worth investigating the use of social media data streams in asthma
risk prediction.

14
Table 5: Summary of the studies on predicting risk of asthma attacks as a prob-
ability.

Study Study aim Important Feature Data Target ML Outcomes/


predic- selection balanc- algorithm Findings
tors technique/s ing

[5] Develop an None None None Relative LSTM DQN framework


asthma risk risk for predicting
prediction (numeric) asthma risk
model by with the use of
investigating personal risk
the ability of scores of
Q-learning triggers
method

[39] Predict the Gender, Attention None Asthma TSANN A TSANN


risk of asthma race, weights of exac- model
exacerbations some the model erba- employing
and explore diagnosis tion self-attention on
potential risk codes (prob- both code-level
factors and med- abil- and visit-level
ications ity) and addition of
elapsed time
improved model
performance

4.2 Handling data imbalance


In real-world data, especially health data, the existence of positive cases is very low
compared to the negatives. For instance, the proportion of asthma patients who are
admitted to hospitals due to asthma attacks is very low compared to the number
of patients who are not admitted to hospitals. The aim of this kind of research is
to identify the minority class over the majority. Hence, class imbalance needs to be
carefully addressed while applying ML models that highly depend on data. Otherwise,
the model will not be able to learn the patterns of positive cases and thereby produce
a lower performance for the minority class.
There are several ways to handle data imbalance. The most commonly used tech-
niques are random under-sampling [20, 35, 42] and random over-sampling [42]. In
random under-sampling, the number of records in the majority class will be reduced
to match with the minority class, and it is vice versa for random over-sampling. In
addition to these techniques, SMOTE is a more advanced technique that is used to
handle data imbalance. Alternative to the random over-sampling technique mentioned
above, SMOTE uses a more robust approach to generate new points based on the
existing data. Further, the Edited Nearest Neighbors (ENN) and Tomek link are some
other under-sampling techniques that can be used for down-sampling data.
Data imbalance should be handled; otherwise, classifiers would be biased or gener-
ate misleading results [2, 17, 20]. However, applying a data balancing technique is not
necessary, provided that the data set has considerable balance among target values. It
should be noted that any of these data balancing techniques need to be applied only on
the training set; otherwise, a data leakage issue arises. These techniques may improve
the predictions in some cases by artificially eliminating or simplifying the records

15
Clinical

Environmental
and
Hospital
Meteorological
and Medical

Biological
Social media Socio-demographic
Smart device

Data

Handling data
imbalance

Predicting risk of
asthma attacks

Category Probability

Without a With a prediction


prediction window window

0 days 365 days


| |

2 days
Prediction window

Decreasing model
5 days performance
7 days
15 days

180 days

Fig. 5: Conceptual overview for asthma risk prediction using ML

in the data set. The prediction model using the random under-sampling technique
showed the best performance with higher specificity and sensitivity values compared
to the other models (see Table 3). This highlights the importance of addressing data
imbalance prior to developing the ML models.

4.3 ML techniques in asthma risk prediction


Clinical practices are mostly dependent on the professional skills of individuals
acquired with practice and experience. However, the human brain has limited abilities
to identify longitudinal patterns and aggregate and extract data from massive amounts

16
of data [26]. ML-based models can assist in analysing the growing amount of patient
data, achieving patient-oriented decisions [24]. ML techniques can discover undisclosed
patterns and find new information from the data. Costs incurred in healthcare could
be reduced with the support of ML. Even though the knowledge and experience of
a physician is irreplaceable, computer-aided support could increase the efficiency and
accuracy of the diagnostic process. Therefore, ML could play a significant role in
diagnosing and predicting asthma.
There are several machine learning algorithms that previous studies have used to
predict imminent asthma attacks, as mentioned in Table 2 and Table 4. According to
Table 3, in terms of classifying a patient having or not having an asthma attack in the
future, XGM, RF and LR algorithms have achieved the best performance compared to
other classifiers, regardless of the presence of a prediction window. On the other hand,
both studies that predict the risk probability have used extensions of NN. In contrast
to predicting the risk of asthma attacks as a category (yes or no), the probability of
having an asthma attack will deliver more information to the health practitioner in
making decisions. Limited studies for probability prediction identified through this
review (see Table 5 emphasise that further research is required to understand the
applicability of ML techniques in predicting the risk of asthma attacks as a probability.
Further, to apply these prediction models into practice, model interpretation should
be included.

4.4 Prediction window


When anticipating the risk of attacks, it is more useful if the prediction can be made
within a specified time window. As illustrated in Table 4, researchers set the prediction
window from 1 day [8, 9] to 1 year [15, 23, 35]. However, none of the studies mentioned
a specific reason for choosing the window size. They used either a random value or
choose the window size according to the nature of the dataset. From the analysis of
the results of those models, we identified that short-term predictions achieved better
performance compared to long-term predictions, which is demonstrated in Figure 5. In
parallel, they entered the past data into the prediction models with a lookback window,
which is also called lag. While most of the studies have prepared data considering
lookback windows, nevertheless, these are different from the prediction window size.
Also, in selecting the prediction window size, it was either randomly selected or based
on the nature of the data collected. It is important to include data from the past within
a certain timeframe, especially those considering triggers, such as weather factors
changing over a short time, where the impact on asthma patients will be observed
after a couple of days. Therefore, there is a lack of knowledge about the optimum size
of the lookback and prediction windows in predicting the risk of asthma attacks.

5 Conclusion
With growing ML applications in healthcare, several studies have employed these
advanced computer-based techniques for predicting impending asthma attacks using
heterogeneous data sources. In this review, we have presented a conceptual overview
of the landscape of this research so that potentially the researchers can use Figure 4

17
to determine which strategy works better for them in terms of designing a research
project. Secondly, this review has generated a list of data sources available for
researchers, which is available in Table 1. Thirdly, we managed to present in a tab-
ular form for each branch of the review, showing lists of studies predicting the risk
of asthma attacks as a classification with and without a prediction window (Tables 2
and 4) and prediction as a probability (Table 5).
The findings of this review confirm the importance of using different predictors
from multiple data sources for predicting asthma attacks. The analysis shows the
applicability of ML algorithms and their ability to perform on asthma patient data,
including other factors. For classifying future attacks, ensemble methods based on DTs
have given better performance, while extended NN has achieved acceptable results on
probability prediction. Further, the summary of the past studies confirmed that having
a shorter prediction window improves the prediction outcome. For future research in
ML applications in asthma prediction, data imbalance handling needs to be carefully
followed on the integrated data from multiple sources while exploring the performance
of models in predicting the probability of the risk of asthma with an explanation while
identifying the optimum lookback and prediction window.
Supplementary information. Supplementary file S1: Results of risk of bias
assessment.

Statements and Declarations


• Ethics approval and consent to participate
Not applicable
• Consent for publication
Not applicable
• Availability of data and materials
Not applicable
• Competing interests
AC is the recipient of the Auckland Medical Research Foundation senior research
fellowship, which investigates asthma attack prediction models. She has also received
funding from the Health Research Council for research into asthma. AC also sits
on the board of Asthma New Zealand, is affiliated with Asthma UK Centre of
Applied Research, member of the Respiratory Effectiveness Group and is part of
the European Respiratory Society CONNECT Clinical Research Collaboration. The
authors have no relevant financial or non-financial interests to disclose.
• Funding
No funding was received for conducting this study
• Author’s contribution
WJ searched the databases and conducted the initial article selection. The studies
included in the study were blindly selected by WJ, FM and AC and finalised through
a discussion. WJ performed the data extraction. WJ, FM and AC blindly conducted
risk of bias assessments to check the quality of the studies and discussed them. WJ
was a major contributor to writing the manuscript. All authors read and approved
the final manuscript.

18
References
[1] Alharbi E, Cherif A, Nadeem F, et al (2022) Machine learning models for early
prediction of asthma attacks based on bio-signals and environmental triggers.
In: 2022 IEEE/ACS 19th International Conference on Computer Systems and
Applications (AICCSA), pp 1–7, https://doi.org/10.1109/AICCSA56895.2022.
10017305

[2] Bose S, Kenyon CC, Masino AJ (2021) Personalized prediction of early childhood
asthma persistence: A machine learning approach. PLOS ONE 16(3):1–17. https:
//doi.org/10.1371/journal.pone.0247784

[3] Critical Appraisal Skills Programme (2022) Casp (clinical prediction rule check-
list). URL https://casp-uk.net/

[4] De Hond AAH, Kant IMJ, Honkoop PJ, et al (2022) Machine learning did not
beat logistic regression in time series prediction for severe asthma exacerbations.
Sci Rep 12(1):20363. https://doi.org/10.1038/s41598-022-24909-9

[5] Do Q, Tran S, Doig A (2019) Reinforcement learning framework to identify


cause of diseases - predicting asthma attack case. In: 2019 IEEE International
Conference on Big Data (Big Data), pp 4829–4838, https://doi.org/10.1109/
BigData47090.2019.9006407

[6] Eisner MD, Katz PP, Yelin EH, et al (2000) Risk factors for hospitalization
among adults with asthma: the influence of sociodemographic factors and asthma
severity. Respiratory research 2(1):1–8. https://doi.org/10.1186/rr37

[7] Farion KJ, Wilk S, Michalowski W, et al (2013) Comparing predictions made by


a prediction model, clinical score, and physicians: Pediatric asthma exacerbations
in the emergency department. Appl Clin Informatics 4(3):376–391. https://doi.
org/10.4338/ACI-2013-04-RA-0029

[8] Finkelstein J, Jeong IC (2016) Using cart for advanced prediction of asthma
attacks based on telemonitoring data. In: 2016 IEEE 7th Annual Ubiquitous
Computing, Electronics and Mobile Communication Conference (UEMCON), pp
1–5, https://doi.org/10.1109/UEMCON.2016.7777890

[9] Finkelstein J, Jeong IC (2017) Machine learning approaches to personalize early


prediction of asthma exacerbations. Annals of the New York Academy of Sciences
1387(1):153–165. https://doi.org/10.1111/nyas.13218

[10] Gaudillo J, Rodriguez JJR, Nazareno A, et al (2019) Machine learning approach


to single nucleotide polymorphism-based asthma prediction. PLOS One 14(12):1–
12. https://doi.org/10.1371/journal.pone.0225574

19
[11] Global Initiative for Asthma (2021) Global strategy for asthma management and
prevention. Tech. rep., Global Initiative for Asthma, URL https://ginasthma.org/

[12] Haque R, Ho SB, Chai I, et al (2021) Intelligent asthma self-management


system for personalised weather-based healthcare using machine learning. In:
Advances and Trends in Artificial Intelligence. Artificial Intelligence Prac-
tices. Springer International Publishing, pp 297–308, https://doi.org/10.1007/
978-3-030-79457-6 26

[13] Hurst JH, Zhao C, Hostetler HP, et al (2022) Environmental and clinical data util-
ity in pediatric asthma exacerbation risk prediction models. BMC Medical Infor-
matics and Decision Making 22(1). https://doi.org/10.1186/s12911-022-01847-0

[14] Inselman JW, Jeffery MM, Maddux JT, et al (2023) A prediction model for
asthma exacerbations after stopping asthma biologics. Ann Allergy Asthma
Immunol 130(3):305–311. https://doi.org/10.1016/j.anai.2022.11.025

[15] Jiao T, Schnitzer ME, Forget A, et al (2022) Identifying asthma patients at high
risk of exacerbation in a routine visit: A machine learning model. Respir Med
198:106866. https://doi.org/10.1016/j.rmed.2022.106866

[16] Kaplan A, Cao H, FitzGerald JM, et al (2021) Artificial intelligence/machine


learning in respiratory medicine and potential role in asthma and copd diagnosis.
The Journal of Allergy and Clinical Immunology: In Practice 9(6):2255–2261.
https://doi.org/10.1016/j.jaip.2021.02.014

[17] Khatri KL, Tamil LS (2018) Early detection of peak demand days of chronic
respiratory diseases emergency department visits using artificial neural networks.
IEEE Journal of Biomedical and Health Informatics 22(1):285–290. https://doi.
org/10.1109/JBHI.2017.2698418

[18] Kim MS, Lee JH, Jang YJ, et al (2020) Hybrid deep learning algorithm with open
innovation perspective: a prediction model of asthmatic occurrence. Sustainability
12(15):6143. https://doi.org/10.3390/su12156143

[19] Levy ML, Winter R (2015) Asthma deaths: what now? Thorax 70(3):209–210.
https://doi.org/10.1136/thoraxjnl-2015-206800, URL https://thorax.bmj.com/
content/70/3/209

[20] Lisspers K, Ställberg B, Larsson K, et al (2021) Developing a short-term pre-


diction model for asthma exacerbations from swedish primary care patients’
data using machine learning - based on the arctic study. Respiratory Medicine
185:106483. https://doi.org/10.1016/j.rmed.2021.106483

[21] Lovrić M, Banić I, Lacić E, et al (2021) Predicting treatment outcomes


using explainable machine learning in children with asthma. Children 8(5):376.
https://doi.org/10.3390/children8050376

20
[22] Lugogo NL, DePietro M, Reich M, et al (2022) A predictive machine learning
tool for asthma exacerbations: Results from a 12-week, open-label study using
an electronic multi-dose dry powder inhaler with integrated sensors. J Asthma
Allergy 15:1623–1637. https://doi.org/10.2147/jaa.S377631

[23] Luo G, Johnson MD, Nkoy FL, et al (2020) Automatically explaining machine
learning prediction results on asthma hospital visits in patients with asthma:
Secondary analysis. JMIR Med Inform 8(12):e21965. https://doi.org/10.2196/
21965

[24] Mariani S, Lahr MM, Metting E, et al (2021) Developing an ml pipeline for


asthma and copd: The case of a dutch primary care service. International Journal
of Intelligent Systems 36(11):6763–6790. https://doi.org/10.1002/int.22568

[25] Maung TZ, Bishop JE, Holt E, et al (2022) Indoor air pollution and the health
of vulnerable groups: A systematic review focused on particulate matter (pm),
volatile organic compounds (vocs) and their effects on children and people with
pre-existing lung disease. International journal of environmental research and
public health 19(14). https://doi.org/10.3390/ijerph19148752

[26] Messinger AI, Luo G, Deterding RR (2020) The doctor will see you now: How
machine learning and artificial intelligence can extend our understanding and
treatment of asthma. Journal of Allergy and Clinical Immunology 145(2):476–478.
https://doi.org/10.1016/j.jaci.2019.12.898

[27] National Health Service (2021) Asthma. URL https://www.nhs.uk/conditions/


asthma/, accessed: 2021-09-18

[28] Ouzzani M, Hammady H, Fedorowicz Z, et al (2016) Rayyan—a web and mobile


app for systematic reviews. Systematic Reviews 5(1):210. https://doi.org/10.
1186/s13643-016-0384-4

[29] Page MJ, McKenzie JE, Bossuyt PM, et al (2021) The prisma 2020 statement: an
updated guideline for reporting systematic reviews. Systematic Reviews 10(1):89.
https://doi.org/10.1136/bmj.n71

[30] Patel D, Hall GL, Broadhurst D, et al (2021) Does machine learning have a role
in the prediction of asthma in children? Paediatric Respiratory Reviews https:
//doi.org/10.1016/j.prrv.2021.06.002

[31] Patel SJ, Chamberlain DB, Chamberlain JM (2018) A machine learning approach
to predicting need for hospitalization for pediatric asthma exacerbation at the
time of emergency department triage. Acad Emerg Med 25(12):1463–1470. https:
//doi.org/10.1111/acem.13655

21
[32] Priya CK, Sudhakar M, Lingampalli J, et al (2021) An advanced fog based
health care system using ann for the prediction of asthma. In: 2021 5th Interna-
tional Conference on Computing Methodologies and Communication (ICCMC),
pp 1138–1145, https://doi.org/10.1109/ICCMC51019.2021.9418248

[33] Sandberg S, Paton JY, Ahola S, et al (2000) The role of acute and chronic stress
in asthma attacks in children. The Lancet 356(9234):982–987. https://doi.org/
10.1016/S0140-6736(00)02715-X

[34] Shi T, Pan J, Katikireddi SV, et al (2022) Risk of covid-19 hospital admission
among children aged 5–17 years with asthma in scotland: a national incident
cohort study. The Lancet Respiratory Medicine 10(2):191–198. https://doi.org/
10.1016/S2213-2600(21)00491-4

[35] Shin EK, Mahajan R, Akbilgic O, et al (2018) Sociomarkers and biomarkers:


predictive modeling in identifying pediatric asthma patients at risk of hospital
revisits. NPJ Digit Med 1:50. https://doi.org/10.1038/s41746-018-0056-y

[36] Soyiri IN, Reidpath DD (2012) Evolving forecasting classifications and appli-
cations in health forecasting. International journal of general medicine 5:381.
https://doi.org/10.2147/IJGM.S31079

[37] Tsang KCH, Pinnock H, Wilson AM, et al (2020) Application of machine learn-
ing to support self-management of asthma with mhealth. Annual International
Conference of the IEEE Engineering in Medicine and Biology Society IEEE
Engineering in Medicine and Biology Society Annual International Conference
2020:5673–5677. https://doi.org/10.1109/EMBC44109.2020.9175679

[38] World Health Organization (2021) Asthma. URL https://www.who.int/


news-room/fact-sheets/detail/asthma, accessed: 2021-09-15

[39] Xiang Y, Ji H, Zhou Y, et al (2019) Asthma exacerbation prediction and interpre-


tation based on time-sensitive attentive neural network: A retrospective cohort
study. medRxiv p 19012161. https://doi.org/10.1101/19012161

[40] Xu M, Tantisira KG, Wu A, et al (2011) Genome wide association study to predict


severe asthma exacerbations in children using random forests classifiers. BMC
Med Genet 12. https://doi.org/10.1186/1471-2350-12-90

[41] Zein JG, Wu CP, Attaway AH, et al (2021) Novel machine learning can predict
acute asthma exacerbation. Chest 159(5):1747–1757. https://doi.org/10.1016/j.
chest.2020.12.051

[42] Zhang O, Minku LL, Gonem S (2021) Detecting asthma exacerbations using daily
home monitoring and machine learning. Journal of Asthma 58(11):1518–1527.
https://doi.org/10.1080/02770903.2020.1802746

22

You might also like