Asthma Attack Prediction Models IEEE Conference
Abstract—Asthma is a common respiratory disease affected by different bio-signals and environmental triggers. Early prediction of asthma attacks is crucial to saving a patient’s life. Several machine learning models have been designed to predict asthma attacks. However, few models have exploited both bio-signals and environmental triggers to build an asthma attack prediction model. Additionally, little attention has been devoted to feature selection algorithms and the variation of machine learning models. This study develops an asthma attack prediction model by testing different machine learning classifiers. The dataset has two main parts: the bio-signals dataset, recorded daily from 21 volunteers for three months, and the environmental dataset, which is available online. The classifiers are support vector machine, logistic regression, decision tree, random forest, and gradient boosting model. Each classifier was grid searched to find the best values for its primary hyper-parameters, and five-fold cross-validation was used to train each model. Results show that the gradient boosting model outperforms the other classifiers when trained with a depth of 9 and a sub-sample ratio of 0.5. Prediction on the testing set yields 97.2% accuracy and 97.1% recall.
Index Terms—Asthma attack, Bio-signals, Environmental, Prediction, Machine learning.
I. INTRODUCTION
Asthma is one of the most prevalent inflammatory diseases and can affect anyone at any age [1]. Many researchers have investigated the causes and risk factors of asthma and the likelihood of having an asthma attack. Bio-signals and environmental triggers have been classified as risk triggers [2], [3]. Bio-signals are related to the patient’s health, including allergies, symptoms, and medical history. Any external stimulus, such as weather and air pollution, is considered an environmental trigger. The Air Quality Index (AQI), which is calculated from the concentrations of existing air contaminants, is generally used to assess the overall state of air pollution. Asthmatic people may experience health concerns if the AQI exceeds 51 [4].
Early prediction of asthma attacks is essential for enhancing the patient’s quality of life. Machine learning (ML), a sub-field of artificial intelligence (AI), uses algorithms to predict outcomes from large amounts of disease-related data [5]. Asthma attack prediction models based on machine learning approaches have been proposed in recent publications. Most of these studies [6]–[8] used bio-signals as triggers for predicting asthma attacks. Others [9]–[14] used weather and AQI to predict asthma attacks without considering bio-signal triggers. Modern models should recognize both individuals’ bio-signals and environmental triggers, which are required for a reliable asthma attack prediction model. We found that few models had used both bio-signals and environmental triggers to predict asthma attacks [15]–[17]. In addition, little consideration has been given to feature selection algorithms and the variation of machine learning models.
To fill this gap, this study uses both bio-signals and environmental triggers to create an effective asthma attack prediction model. It aims to find an optimum classifier for asthma attack prediction. Five classifiers are compared in terms of accuracy and recall: support vector machine (SVM), logistic regression (LR), decision tree (DT), random forest (RF), and gradient boosting model (GBM). In addition, an L1-based feature selection algorithm is applied with these classifiers to provide more accurate models. This study also introduces a new asthma dataset that includes many bio-signal and environmental triggers.
The structure of this article is as follows. Section II introduces the related work. Section III explains the study’s methodology. Section IV presents the findings for all classifiers. Section V gives the conclusion and future research.
II. RELATED WORK
Recent studies have utilized different triggers and classifiers to build prediction models for asthma attacks. Finkelstein and Jeong [7], [8] developed models that predict asthma attacks from multiple bio-signal triggers without considering environmental triggers. Their first study [7] compared three machine learning classifiers: SVM, naive Bayesian (NB), and adaptive Bayesian network (ABN). The outcomes demonstrated that the ABN classifier had greater specificity and sensitivity (100 %) than the other classifiers. Their second study [8] developed a prediction model for asthma exacerbation using the Classification and Regression Tree (CART) algorithm. The model predicts whether the patient’s asthma status on the upcoming day will be normal or abnormal. The final model achieved 80 % accuracy, 64 % sensitivity, and 97 % specificity.
Another study by Lee et al. [15] demonstrated how applying several predictors from various sources can improve the model’s performance. Thirty-seven attributes were used, including multiple bio-signals and environmental triggers. The utilized ML algorithms were Pattern-Based Class-Association Rule (PBCAR) and Pattern-Based Decision Tree (PBDT).
According to the reported performance, asthma attacks can be predicted with an accuracy of 0.87. The experimental results show that PBDT performs slightly better than PBCAR, with accuracies of 87 % and 86 %, respectively.
Kaffash-Charandabi et al. [16] developed another prediction model for asthma attacks. They used a dataset that includes bio-signals data obtained from three participants in addition to the environmental triggers. The SVM algorithm was used, and the model’s accuracy was 93 %. However, it is uncertain how well the model would perform on a larger population because it was trained and tested on only three people.
Khasha et al. [17] conducted another comparison between five classifiers: RF, DT, SVM, LR, and GBM. The decision tree classifier outperformed all other classifiers. However, the classifiers’ performance was not reported precisely, because the results were conveyed only through figures.
Hosseini et al. [18] proposed another model that utilizes only two bio-signal triggers together with AQI. They used the RF classifier and 10-fold cross-validation to categorize the risk of an asthma attack as low, medium, or high. The suggested model achieved an accuracy of 80%.
According to these studies, few asthma attack prediction models utilize both bio-signal and environmental triggers. Adding extra triggers as features can improve model performance, as seen in [15]. Also, none of the related works utilized a feature selection algorithm, which may provide better model performance. Table I summarizes the related work and its limitations. In this paper, multiple triggers are considered and an additional feature selection stage is applied. We compare different machine learning classifiers to build an effective asthma attack prediction model, since different classifiers produce different results.
III. RESEARCH METHODOLOGY
The research methodology includes three main stages. The first stage consists of generating and preparing an integrated dataset that includes both bio-signals and environmental triggers. The second stage uses the L1-based feature selection algorithm, which helps find the feature set that contributes most to the model. The selected feature set is then used to train the five selected classifiers using grid search and 5-fold cross-validation to find the average accuracy and recall. The third stage selects the best-trained model and tests it to obtain the final prediction results. Fig. 1 shows the study methodology as a schematic flow chart. The following subsections explain each stage in detail.
A. Stage 1: Data Generation and Preparation
The dataset used in this study integrates patients’ bio-signal data and environmental factors. The bio-signal data was collected through Google Forms. It involves 21 volunteers older than 12 years with different levels of asthma severity (mild, moderate, and severe). The dataset has two main parts: historical and daily records. The historical data consist of the patients’ diagnostic data for the previous four weeks, and the daily records include asthma symptoms that were recorded for the next three months (from 24 March to 30 June 2021). Most of the collected data is categorical and is based on a five-point scale: never, rarely, sometimes, often, and always. The environmental data was collected online [19]. It includes weather variables and AQI. The two datasets were then integrated based on the patients’ locations and the date.
The integrated dataset includes 655 entries with various bio-signal and environmental features, as shown in Table II. The target is whether the user had an asthma attack (class 1) or not (class 0). The data was cleaned by replacing missing values with the feature means, and standard normalization was then applied to generate the final dataset. The integrated dataset was imbalanced, since the ratio between class 0 and class 1 is 7:58. To solve this issue, the Synthetic Minority Over-sampling Technique using Support Vector Machine (SMOTE-SVM) [20] was applied to reach a balanced dataset with a ratio of 1:1.
B. Stage 2: Feature Selection and Classifiers Training
Feature selection reduces the dataset’s dimensionality by deleting irrelevant and unnecessary features from the original dataset [21]. The remaining features are fed into the prediction model as input. This pre-processing stage can improve the classification accuracy of the prediction model.
LASSO regularization (L1) [22] is one of the most common feature selection algorithms. It applies a penalty to the model parameters to prevent over-fitting. In a linear model, the penalty is applied to the coefficients that multiply each feature. Unlike other forms of regularization, the Lasso (L1) penalty can shrink some coefficients exactly to zero. As a result, the model can be trained without the corresponding features.
This work applies the L1 algorithm to the integrated dataset to reduce the set of features. The resulting dataset is split into training and testing sets with a ratio of 80:20. Five classifiers were then trained on the data: SVM, LR, DT, RF, and GBM. Grid search was used to tune the hyper-parameters and thereby increase classifier performance. What follows is a brief overview of each classifier and its tuned hyper-parameters with the searched values.
• Decision Tree (DT) incrementally builds a classification model as a tree composed of a root, internal nodes, and leaf nodes. It classifies instances by sorting them from the tree’s root to a leaf node based on a given criterion [23]. The most important parameters tuned in DTs are the tree depth and the split criterion. The grid search used six depth values (2, 4, 6, 8, 10, 12) and two split criteria, gini and entropy.
• Random Forest (RF) is an ensemble machine learning approach that fits a number of decision tree classifiers on various dataset sub-samples. It uses averaging to raise prediction accuracy and reduce over-fitting [24]. Usually, each tree is built using a subset of the dataset based on
TABLE I
SUMMARY OF RELATED WORK MODELS WITH THEIR LIMITATIONS
Fig. 3. Grid search for the RF model and its hyper-parameters using (a) all features and (b) the L1-based feature set
Fig. 4. Grid search for the LR model and its hyper-parameters using (a) all features and (b) the L1-based feature set
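The grid search with 5-fold cross-validation described in Stage 2 can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the key names `max_depth` and `criterion` are borrowed from scikit-learn's decision tree and are an assumption, as are the helper names `expand_grid` and `k_fold_indices`. Only the DT search values (depths 2 to 12, gini and entropy) come from the text.

```python
from itertools import product

# DT search space from the text; the key names are an assumption
# (borrowed from scikit-learn's DecisionTreeClassifier).
DT_GRID = {
    "max_depth": [2, 4, 6, 8, 10, 12],
    "criterion": ["gini", "entropy"],
}


def expand_grid(grid):
    """Return every hyper-parameter combination in the grid as a dict."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous folds (no shuffling)."""
    start = 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


candidates = expand_grid(DT_GRID)
n_fits = len(candidates) * 5  # one fit per candidate per fold
```

With these values the DT search covers 12 candidate settings, so 5-fold cross-validation requires 60 model fits for this classifier alone.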
TABLE III
COMPARISON OF THE ML CLASSIFIERS USING ALL FEATURES
Fig. 6. Grid search for the SVM model and its hyper-parameters using (a) all features and (b) the L1-based feature set
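The L1-based feature set named in these captions relies on the fact that an L1 penalty can drive coefficients exactly to zero. The sketch below illustrates that mechanism under a simplifying assumption: for standardized, mutually uncorrelated features, the Lasso solution equals the soft-thresholded unpenalized coefficient. The feature names, coefficient values, and penalty strength are hypothetical and are not taken from the study's data.

```python
def soft_threshold(coef, penalty):
    """Shrink a coefficient toward zero by `penalty`; return 0.0 inside [-penalty, penalty]."""
    if coef > penalty:
        return coef - penalty
    if coef < -penalty:
        return coef + penalty
    return 0.0


# Hypothetical unpenalized coefficients for three illustrative features.
unpenalized = {"feature_a": 0.90, "feature_b": -0.15, "feature_c": 0.05}
penalty = 0.20  # hypothetical L1 strength

lasso_coefs = {name: soft_threshold(c, penalty) for name, c in unpenalized.items()}
# Features whose coefficient survives the shrinkage form the selected set.
selected = [name for name, c in lasso_coefs.items() if c != 0.0]
```

Here only `feature_a` keeps a non-zero coefficient, so it is the only feature passed on to the classifiers. A larger penalty removes more features.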
TABLE IV
COMPARISON OF THE ML CLASSIFIERS USING THE L1-BASED SELECTED FEATURE SET
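The two metrics used to compare the classifiers, accuracy and recall, can be computed from true and predicted labels as sketched below. The function name and the toy labels are illustrative assumptions and not study data. Class 1 denotes an asthma attack, as in the dataset description.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for binary labels; recall is for the `positive` class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    accuracy = correct / len(y_true)
    # Recall is undefined when no positive samples exist; report 0.0 in that case.
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Toy labels (not study data): 1 = attack, 0 = no attack.
acc, rec = accuracy_and_recall([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Accuracy is the share of all predictions that are correct, while recall is the share of actual attacks that the model flags.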