Paper 10-Dinacci2022
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 201
T. Chen et al. (eds.), Artificial Intelligence in Healthcare, Brain Informatics and Health,
https://doi.org/10.1007/978-981-19-5272-2_10
202 M. Dinacci et al.
advent of vaccines and the gradual ending of lockdowns, the social, economic and
cultural effects of the pandemic will cast a long shadow into the future.
The development of Artificial Intelligence (AI) techniques has been accelerated by recent advances in machine learning and data analytics [9]. These advances have led to numerous successful applications in various domains, including healthcare [7, 12, 19, 20]. In the context of COVID-19, AI has been widely used in disease detection and diagnosis, virology and pathogenesis, drug and vaccine development, and epidemic and transmission prediction [6].
In particular, the diagnosis of infection is a significant part of COVID-19 research and practice. The detection methods currently used for COVID-19 mainly include nucleic acid testing, serological diagnosis, and chest X-ray and CT image inspection [6]. Real-time Reverse Transcriptase Polymerase Chain Reaction (RT-PCR) has high sensitivity and specificity, and it is the current standard technology for detecting SARS-CoV-2, the virus that causes COVID-19. Isothermal nucleic acid amplification and blood testing methods are also commonly used for rapid screening of SARS-CoV-2. Medical imaging is another widely used clinical approach to COVID-19 detection and diagnosis, and it generally comprises chest X-ray and lung CT imaging.
The testing and detection methods already used in medical practice also underpin recent research that applies AI and machine learning. This research aims to develop more robust and accurate computer-assisted techniques that complement medical analysis [18, 21]. Blood tests, CT and X-ray scans, respiratory sounds and RT-PCR results have been used extensively. Researchers have also experimented with diagnosis based only on a questionnaire survey, without any physiological analysis [25]. Despite numerous efforts across the wider AI research community, a recent study [18] analysed a large number of papers and identified methodological pitfalls and biases. These findings suggest that the reported performance is overly optimistic.
This paper works towards demonstrating a use case of machine learning for the diagnosis of COVID-19, and it aims to establish an effective model for automatic diagnosis. The paper uses XGBoost, a powerful machine learning technique, to examine a number of model hyperparameters and data preprocessing techniques. It then uses Shapley values to identify the predictors that are most informative of the diagnosis. The model is applied to a collection of anonymised patient data from SARS-CoV-2 RT-PCR and additional laboratory tests. The best model obtained demonstrates high diagnostic performance and points out factors that may be worth further clinical attention.
The remainder of this chapter is structured as follows. Section 2 reviews related work on machine learning for COVID-19 diagnosis in the recent literature. Section 3 presents the experimental settings, results and discussion. Section 4 concludes the chapter and points out potential future work.
A Case Study of Using Machine Learning Techniques … 203
2 Literature Review
This section reviews popular machine learning methods applied to COVID-19 prediction. It organises them by the source material used for diagnosis, which generally includes medical imaging, blood tests and respiratory sound.
For research based on medical imaging, the two most common imaging techniques are chest X-rays and chest Computed Tomography (CT). In [16], a deep learning network based on the popular ResNet50 architecture was used to predict COVID-19. The model extracts features from a series of CT slices and combines them with a max-pooling operation. The result is fed to a fully connected layer and a softmax activation function to obtain a probability score for each diagnostic category, i.e., COVID-19, Community Acquired Pneumonia (CAP) and non-pneumonia. The dataset consisted of 4356 chest CT examinations from 3322 patients. On an independent test set made of 10% of the original image files, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.96.
It is, however, worth noting two limitations of this research. First, patients affected by COVID-19 may show imaging characteristics similar to those of pneumonia caused by other viruses, yet CAP was the only type of pneumonia used for comparison. Second, the results produced by the neural network are difficult to interpret. This is a common issue for most deep learning methods, and it is particularly significant in this area, where predictions may have a direct impact on human life.
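The pooling-and-softmax head described for [16] can be illustrated with a minimal NumPy sketch. This sketch assumes that the per-slice feature vectors are already available, since the ResNet50 backbone is not reproduced. The weights `W` and `b` are untrained random values, and the names `classify_scan` and `softmax` are hypothetical. The example only shows how a variable number of slices collapses into one probability vector over the three categories.

```python
import numpy as np

CLASSES = ["COVID-19", "CAP", "non-pneumonia"]


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classify_scan(slice_features: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Turn per-slice feature vectors of one CT examination into class probabilities.

    slice_features has shape (n_slices, d): one feature vector per slice. In [16]
    these would come from a ResNet50 backbone, which is not reproduced here.
    """
    pooled = slice_features.max(axis=0)  # element-wise max over slices -> shape (d,)
    logits = pooled @ W + b              # fully connected layer -> shape (n_classes,)
    return softmax(logits)


rng = np.random.default_rng(0)
d = 8
features = rng.normal(size=(30, d))     # 30 hypothetical slices with random features
W = rng.normal(size=(d, len(CLASSES)))  # untrained, random weights
b = np.zeros(len(CLASSES))
probs = classify_scan(features, W, b)
print(dict(zip(CLASSES, probs.round(3))))
```

Max-pooling makes the output independent of the number and order of slices, which is convenient when examinations differ in length.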
On the other hand, chest X-rays (CXR) are cheap and widely available, so there is considerable interest in the clinical community in using them to discriminate COVID-19. In [22], pre-training and transfer learning of standard CNN models (including ResNet, COVID-Net and DenseNet) were empirically evaluated on six datasets. These included the COVID Radiographic images Data-set for AI (CORDA), which was created from 386 patients who were screened for COVID-19. Whilst the results were promising, the study concluded that the amount of CXR data needs to be scaled up by a factor of two or more to determine whether CNNs can be used effectively as an aid in the fight against the COVID-19 pandemic. This is consistent with the findings of a recent survey, which concluded that the lack of large-scale datasets is the main challenge hindering the implementation of AI-based imaging inspection [6].
Apart from medical imaging, routine blood tests may significantly facilitate the diagnosis of COVID-19, because they provide numerous important indicators that may correlate with COVID-19 infection [1]. For instance, [2] opted for an interpretable model based on decision trees in order to obtain more insight. The authors concluded that parameters such as white blood cells (WBC), C-reactive protein (CRP), neutrophils (NEU), lymphocytes (LYM), monocytes (MONO), eosinophils (EOS), basophils (BAY), aspartate and alanine aminotransferase (AST and ALT, respectively), lactate dehydrogenase (LDH) and others were highly correlated with a COVID-19 diagnosis. A similar study [4] identified prognostic serum biomarkers in patients at greatest risk of mortality from COVID-19, where a model was developed to predict whether a patient would expire
The dataset consists of 111 features and 5644 entries. The vast majority of features are derived from standard blood tests, such as the counts of red blood cells, platelets, leukocytes and lymphocytes, and the hematocrit. Other features record the presence of other viruses, such as Influenza A and B and Rhinovirus, including coronaviruses such as Coronavirus229E and CoronavirusOC43. About 15% of the features, such as urobilinogen, ketone bodies and esterase, are obtained from urine samples.
In terms of missing values, approximately 25% of the attributes have values for less than 1% of the entries. Some attributes have a large percentage of invalid values (up to 100%), so we removed any column where most values were encoded as “Not a Number” (NaN).
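The column filter above can be sketched in pandas. This is a minimal sketch that assumes the data sit in a `DataFrame` in which missing entries are NaN. The function `drop_sparse_columns` and the demo column names are hypothetical. The `max_missing` parameter makes the cut-off explicit so that it can be varied; the default of 0.5 drops a column when more than half of its values are missing, which matches "most values".

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop every column whose fraction of NaN values exceeds max_missing."""
    missing_fraction = df.isna().mean()  # per-column share of missing values
    keep = missing_fraction[missing_fraction <= max_missing].index
    return df.loc[:, keep]


# Hypothetical toy data, not taken from the study's dataset.
demo = pd.DataFrame({
    "age_quantile": [3, 7, 12, 15],
    "leukocytes": [0.1, np.nan, -0.3, 0.8],              # 25% missing -> kept
    "urine_esterase": [np.nan, np.nan, np.nan, "absent"],  # 75% missing -> dropped
})
print(drop_sparse_columns(demo).columns.tolist())  # ['age_quantile', 'leukocytes']
```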
Some columns that were not relevant to the task were also removed, namely ‘Patient admitted to regular ward’, ‘Patient admitted to semi-intensive unit’ and ‘Patient admitted to intensive care unit’.
Before the dataset was used for training, all values in Portuguese were translated into English with Google Translate. For example, “Ausentes” was translated to “absent”. Some Boolean features were represented with the strings “true” or “false”, and others with 0s and 1s; we converted all of these features to a native Boolean representation. String and object-based features were encoded as integers, which normalises labels so that they contain only values between 0 and n_classes-1.
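The cleaning steps above (dropping the admission columns, translating values, unifying Booleans and encoding strings) can be sketched in pandas. This is a sketch under assumptions. Only the “Ausentes” to “absent” mapping comes from the text, and the other translations would have to be added by hand. The function names and demo columns are hypothetical. Integer codes are taken from `pd.Categorical`, which sorts the categories and assigns codes 0..n_classes-1, with missing values coded as -1.

```python
import pandas as pd

# Columns named in the text as irrelevant to the diagnosis task.
ADMISSION_COLUMNS = [
    "Patient admitted to regular ward",
    "Patient admitted to semi-intensive unit",
    "Patient admitted to intensive care unit",
]
TRANSLATIONS = {"Ausentes": "absent"}  # only this entry is given in the text
BOOL_STRINGS = {"true": True, "false": False}


def to_bool(series: pd.Series) -> pd.Series:
    """Map 'true'/'false' strings and 0/1 numbers to a nullable Boolean column."""
    def convert(v):
        if isinstance(v, str):
            return BOOL_STRINGS[v.strip().lower()]
        if pd.isna(v):
            return pd.NA  # keep missing values missing
        return bool(v)
    return series.map(convert).astype("boolean")


def preprocess(df: pd.DataFrame, bool_cols: list) -> pd.DataFrame:
    """Hypothetical cleaning pipeline mirroring the steps described in the text."""
    df = df.drop(columns=[c for c in ADMISSION_COLUMNS if c in df.columns])
    df = df.replace(TRANSLATIONS)
    for col in bool_cols:
        df[col] = to_bool(df[col])
    # Encode the remaining non-numeric columns as integer codes 0..n_classes-1.
    for col in df.columns:
        if col in bool_cols or pd.api.types.is_numeric_dtype(df[col]):
            continue
        df[col] = pd.Categorical(df[col]).codes
    return df


raw = pd.DataFrame({
    "Patient admitted to regular ward": [0, 1, 0],
    "urine_esterase": ["Ausentes", "present", None],
    "influenza_a": ["true", "false", "true"],
    "rhinovirus": [1, 0, 0],
    "sars_cov_2_result": ["negative", "positive", "negative"],
})
clean = preprocess(raw, bool_cols=["influenza_a", "rhinovirus"])
print(clean)
```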
As shown in Fig. 2, the dataset is highly imbalanced, since most patients tested negative for COVID-19. It contains 5086 negative cases and 558 positive ones. To achieve reliable results, we rebalanced the dataset through sampling, combining random oversampling of the minority class with random undersampling of the majority class. In oversampling, we randomly duplicated examples from the minority class and added them to the training dataset. In undersampling, we did the opposite and randomly removed samples from the majority class. We also tried simply removing the rows of negative cases that contained multiple null values.
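The three rebalancing options above can be sketched with pandas. The text does not state the target class sizes, so this sketch assumes that each sampling method balances the two classes to equal size. It also reads "multiple null values" as two or more nulls (`max_nulls=1`). All function names, column names and data are hypothetical.

```python
import numpy as np
import pandas as pd


def random_oversample(df, label, minority, seed=0):
    """Duplicate random minority rows (with replacement) until both classes are equal in size."""
    minority_rows = df[df[label] == minority]
    majority_rows = df[df[label] != minority]
    extra = minority_rows.sample(n=len(majority_rows) - len(minority_rows),
                                 replace=True, random_state=seed)
    return pd.concat([df, extra], ignore_index=True)


def random_undersample(df, label, minority, seed=0):
    """Randomly drop majority rows until both classes are equal in size."""
    minority_rows = df[df[label] == minority]
    majority_rows = df[df[label] != minority]
    kept = majority_rows.sample(n=len(minority_rows), replace=False, random_state=seed)
    return pd.concat([minority_rows, kept], ignore_index=True)


def drop_sparse_negative_rows(df, label, negative, max_nulls=1):
    """Remove negative rows with more than max_nulls missing values; positive rows are untouched."""
    is_negative = df[label] == negative
    too_sparse = df.isna().sum(axis=1) > max_nulls
    return df[~(is_negative & too_sparse)].reset_index(drop=True)


demo = pd.DataFrame({
    "result": ["negative"] * 6 + ["positive"] * 2,
    "leukocytes": [0.1, np.nan, 0.3, np.nan, 0.5, 0.6, 0.7, np.nan],
    "platelets": [1.0, np.nan, 0.2, 0.4, np.nan, 0.9, np.nan, np.nan],
})
print(random_oversample(demo, "result", "positive")["result"].value_counts())
print(drop_sparse_negative_rows(demo, "result", "negative"))
```

Unlike the two sampling methods, the last option does not duplicate or randomly discard rows. It removes only the least informative negative records.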
After balancing, the dataset was split into training (70%), test (15%) and validation (15%) sets. The validation set was used to tune the model hyperparameters. The test set was used to evaluate the predictive power of the model on data it had not seen before (Table 1).
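A 70/15/15 split can be sketched by shuffling row indices once and cutting them into three parts. This is only an illustration, and `split_indices` is a hypothetical helper. The example uses 5644 rows, the size of the original dataset; the balanced dataset would have a different size. The same result could also be obtained with two successive calls to scikit-learn's `train_test_split`.

```python
import numpy as np


def split_indices(n: int, train: float = 0.70, val: float = 0.15, seed: int = 0):
    """Shuffle row indices 0..n-1 and cut them into train/validation/test parts.

    The rows left after the train and validation parts form the test set,
    so the three parts always cover all n rows exactly once.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(5644)
print(len(train_idx), len(val_idx), len(test_idx))  # 3951 847 846
```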
The prediction model was developed using XGBoost [8], which belongs to the family of gradient boosting algorithms. We chose it because it is a very efficient and flexible distributed method that has found numerous successful applications [23]. XGBoost can be used for both regression and classification, and it produces a model composed of an ensemble of decision trees.
XGBoost is particularly effective at dealing with imbalanced datasets, since it makes no assumptions about the data distribution or the relationships among features. It can also be configured with the scale_pos_weight hyperparameter, which scales the gradient weights of the positive (minority) class during training. Changing the relative weights of the positive (minority) and negative (majority) classes over-corrects the errors the model makes on the positive class, which ultimately results in a better model.
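The effect of scale_pos_weight can be sketched in plain Python. XGBoost's parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical value to consider. As we understand it, the built-in logistic objective multiplies the weight of each positive instance by this factor, which scales both its gradient and its Hessian. The functions below are hypothetical re-implementations of that idea for illustration and are not XGBoost's own code. The class counts come from the dataset description.

```python
import math


def scale_pos_weight(n_negative: int, n_positive: int) -> float:
    """Negative-to-positive ratio, a common starting value for scale_pos_weight."""
    return n_negative / n_positive


def weighted_logistic_grad_hess(margin: float, label: int, pos_weight: float):
    """Gradient and Hessian of the logistic loss for one example, w.r.t. the raw margin.

    Positive examples (label == 1) have both quantities multiplied by pos_weight,
    so their errors pull harder on the next tree than those of negative examples.
    """
    p = 1.0 / (1.0 + math.exp(-margin))  # predicted probability of the positive class
    w = pos_weight if label == 1 else 1.0
    return w * (p - label), w * p * (1.0 - p)


w = scale_pos_weight(5086, 558)
print(round(w, 2))  # 9.11
params = {"objective": "binary:logistic", "scale_pos_weight": w}  # e.g. passed to xgboost.train
print(weighted_logistic_grad_hess(0.0, 1, w))
```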
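The prediction scores of the individual trees are summed to obtain the final score:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \qquad (1)$$

Here $K$ is the number of trees and $\mathcal{F}$ is the space of regression trees. The trees are learned by minimising the regularised objective [8]

$$\mathcal{L} = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k),$$

where the first term is the training loss function and the second is the regularisation term, which helps the model avoid overfitting.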
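Equation (1) can be made concrete with a toy ensemble of two depth-1 trees (stumps). The example also includes the tree complexity penalty of [8], $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$, where $T$ is the number of leaves and $w$ the leaf weights. The stumps, thresholds and leaf weights are invented for illustration. The global base score that XGBoost adds to the sum is omitted.

```python
import math
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]


def stump(feature: int, threshold: float, left: float, right: float) -> Tree:
    """Depth-1 regression tree: returns the weight of the leaf that x falls into."""
    return lambda x: left if x[feature] < threshold else right


def predict_margin(trees: List[Tree], x: Sequence[float]) -> float:
    """Equation (1): the raw score is the sum of the K tree outputs f_k(x)."""
    return sum(f(x) for f in trees)


def predict_proba(trees: List[Tree], x: Sequence[float]) -> float:
    """For binary classification, map the summed score through the logistic function."""
    return 1.0 / (1.0 + math.exp(-predict_margin(trees, x)))


def omega(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Complexity penalty: gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


trees = [stump(0, 0.5, -0.4, 0.6), stump(1, 0.0, 0.1, -0.2)]
x = [0.9, -1.0]
print(predict_margin(trees, x))  # first tree contributes 0.6, second contributes 0.1
print(predict_proba(trees, x))
```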
Tree boosting is fundamentally similar to Random Forests, as both techniques use tree ensembles as their models. A Random Forests classifier could also have been a possible choice. However, according to Chen [5], there is a high probability that a bootstrap sample (the subset of training data points to which a decision tree is fitted) contains few or even none of the minority class. This results in a tree that performs poorly at predicting the minority class. That is undesirable here, because the minority class consists of the positive COVID-19 cases, which are the main class to predict.
3.5 Evaluation
To evaluate the model, we considered both precision and recall. A false negative misdiagnoses a positive case, so recall is potentially more significant than precision. It is more affordable to take further tests to find out whether someone is actually positive than to miss a positive case, which could put more people at risk.
On the other hand, patients who were incorrectly classified as having COVID-19 might have a different illness; the authors of [16] highlighted the ambiguity between predictions of COVID-19 and various types of pneumonia. We therefore cannot simply discard precision.
A good trade-off between recall and precision is the F1 score, which is the harmonic mean of precision and recall:

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
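Precision, recall and the F1 score can be computed directly from predicted and true labels. The sketch below does this in plain Python, with the positive class encoded as 1 (an assumption). The toy labels are invented and do not come from the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # one missed positive, one false alarm
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Because F1 is a harmonic mean, it drops sharply when either precision or recall is low, so a model cannot score well by maximising only one of them.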
To understand how well the model separates the two classes, we also computed the area under the receiver operating characteristic curve (ROC AUC) from the prediction scores. This metric is the probability that the model ranks a randomly chosen positive COVID-19 case higher than a randomly chosen negative case.
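The ranking interpretation of ROC AUC gives a direct way to compute it. The sketch compares every positive case with every negative case and counts the pairs in which the positive case scores higher, with ties counting as one half. This is quadratic in the number of cases, so it is meant only to illustrate the definition. Library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently. The scores below are invented.

```python
from itertools import product


def roc_auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp, sn in product(pos, neg):
        if sp > sn:
            wins += 1.0
        elif sp == sn:
            wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75: one of four pairs is mis-ranked
```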
We experimented with different thresholds. Removing columns that contained at least 23 null values produced the model with the best ROC AUC and F1 score, as reported in Table 1.
The best results are presented in Table 3 and show a high F1 score, with similar values for precision and recall. To better visualise how well the model separates the two classes, Fig. 3 depicts the ROC curve, which far outperforms the random-guess diagonal line.
In general, the results are very encouraging. We obtained an F1 score of 0.96 and an AUC of 95.33% on a held-out test set composed of 15% of the original data. The imbalance in the dataset was addressed by removing the rows of negative cases that contained multiple null values. This simple approach was more effective than oversampling the minority class and undersampling the majority class.
To visually assess the quality of the classifier, we plotted a confusion matrix and inspected the results. As Fig. 4 shows, the model performed well. However, it produced 22 false negatives (positive cases predicted as negative) and 53 false positives (negative cases predicted as positive).
Furthermore, to identify the subset of predictors that are most informative and contribute most to the decision, we used SHapley Additive exPlanations (SHAP) values [17]. SHAP builds on the Shapley value, a game-theory concept that represents the average marginal contribution of a player across all possible coalitions; here the players are the features. SHAP explains a classifier's prediction by computing the Shapley value of each feature, which quantifies how much that feature contributes to the prediction.
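The Shapley value of feature $i$ averages its marginal contribution over all coalitions $S$ of the other features, with weight $|S|!\,(n-|S|-1)!/n!$. The sketch below computes this exactly by enumerating every coalition, so its cost grows exponentially with the number of features; it is practical only for a handful of features. It makes one simplifying assumption: the value of a coalition is the model output when the features outside the coalition are set to a fixed baseline. SHAP [17] defines the value through expectations, and its explainers use model-specific algorithms (for example, for tree ensembles) instead of brute-force enumeration. The toy models are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


def linear(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


def interaction(z):
    return z[0] * z[1]


print(shapley_values(linear, [1.0, 3.0, 2.0], [0.0, 1.0, 0.0]))  # approximately [2, -2, 1]
print(shapley_values(interaction, [1.0, 1.0], [0.0, 0.0]))       # credit split equally
```

For a linear model, each value reduces to the weight times the feature's deviation from the baseline. In every case, the values add up to the difference between the prediction and the baseline output.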
The top 20 most important features are plotted in Fig. 5. The most important features identified are the patient age quantile and a high count of white blood cells (leukocytes, eosinophils, monocytes, lymphocytes), which is a sign that the patient's body is reacting to a pathogen. This is in line with results found in the literature, such as [2], where positive COVID-19 cases were strongly correlated with an increased white blood cell count.
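A ranking like the one in Fig. 5 can be produced from a matrix of per-sample SHAP values (rows are samples, columns are features). This sketch averages the absolute value of each column and sorts the features by that average. To our knowledge, the SHAP package's summary plots use a similar ordering. The function `top_features`, the feature names and the SHAP values below are made up for illustration.

```python
import numpy as np


def top_features(shap_values: np.ndarray, feature_names, k: int = 20):
    """Rank features by mean absolute SHAP value across samples; return the top k."""
    importance = np.abs(shap_values).mean(axis=0)  # one score per feature (column)
    order = np.argsort(importance)[::-1][:k]       # indices sorted from largest to smallest
    return [(feature_names[i], float(importance[i])) for i in order]


# Made-up SHAP values for 3 samples and 3 hypothetical features.
S = np.array([
    [0.5, -0.1, 0.0],
    [-0.7, 0.2, 0.1],
    [0.6, -0.3, 0.0],
])
names = ["feature_a", "feature_b", "feature_c"]
print(top_features(S, names, k=2))
```

Taking the absolute value matters: a feature that pushes some predictions up and others down would otherwise appear unimportant because its contributions cancel out.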
4 Conclusion
References
1. Alves MA, Castro GZ, Oliveira BAS, Ferreira LA, Ramírez JA, Silva R, Guimarães FG (2021)
Explaining machine learning based diagnosis of covid-19 from routine blood tests with decision
trees and criteria graphs. Comput Biol Med 132:104335
2. Alves MA, Castro GZ, Oliveira BAS, Ferreira LA, Ramírez JA, Silva R, Guimarães FG (2021) Explaining machine learning based diagnosis of covid-19 from routine blood tests with decision trees and criteria graphs. Comput Biol Med 132:104335. Accessed from https://www.sciencedirect.com/science/article/pii/S0010482521001293. https://doi.org/10.1016/j.compbiomed.2021.104335
3. Barbier EB, Burgess JC (2020) Sustainability and development after covid-19. World Develop
135:105082
4. Booth AL, Abels E, McCaffrey P (2021) Development of a prognostic model for mortality
in covid-19 infection using machine learning. Modern Pathol 34(3):522–531. Accessed from
https://doi.org/10.1038/s41379-020-00700-x
5. Chen C (2004) Using random forest to learn imbalanced data
6. Chen J, Li K, Zhang Z, Li K, Yu PS (2021) A survey on applications of artificial intelligence
in fighting against covid-19. ACM Comput Surv (CSUR) 54(8):1–32
7. Chen T, Antoniou G, Adamou M, Tachmazidis I, Su P (2021) Automatic diagnosis of attention
deficit hyperactivity disorder using machine learning. Appl Artif Intell 1–13
8. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the
22nd ACM SIGKDD international conference on knowledge discovery and data mining
9. Chen T, Keravnou-Papailiou E, Antoniou G (2021) Medical analytics for healthcare intelligence
- recent advances and future directions. Artif Intell Med 112:102009
10. Chen T, Lucock M (2022) The mental health of university students during the covid-19 pandemic: an online survey in the UK. Plos One 17(1):e0262562
11. Chen T, Shang C, Yang J, Li F, Shen Q (2020) A new approach for transformation-based fuzzy rule interpolation. IEEE Trans Fuzzy Syst. Accessed from https://doi.org/10.1109/TFUZZ.2019.2949767
12. Chen T, Su P, Shen Y, Chen L, Mahmud M, Zhao Y, Antoniou G (2022) A dominant set-
informed interpretable fuzzy system for automated diagnosis of dementia. Front Neurosci
13. Coppock H, Gaskell A, Tzirakis P, Baird A, Jones L, Schuller B (2021) End-to-end convolutional neural network enables covid-19 detection from breath and cough audio: a pilot study. BMJ Innov 7(2):356–362. Accessed from https://innovations.bmj.com/content/7/2/356. https://doi.org/10.1136/bmjinnov-2021-000668
14. Friedman J (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
15. Kaiser MS, Mahmud M, Noor MBT, Zenia NZ, Al Mamun S, Mahmud KA et al (2021)
iworksafe: towards healthy workplaces during covid-19 with an intelligent phealth app for
industrial settings. IEEE Access 9:13814–13828
16. Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, Xia J (2020) Using artificial intelligence to detect covid-19 and community-acquired pneumonia based on pulmonary ct: evaluation of the diagnostic accuracy. Radiology 296(2):E65–E71. Accessed from https://europepmc.org/articles/PMC7233473. https://doi.org/10.1148/radiol.2020200905
17. Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: Guyon I et al (eds) Advances in neural information processing systems, vol 30, pp 4765–4774. Curran Associates, Inc. Accessed from http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
18. Roberts M, Driggs D, Thorpe M, Gilbey J, Yeung M, Ursprung S, AIX-COVNET (2021)
Common pitfalls and recommendations for using machine learning to detect and prognosticate
for covid-19 using chest radiographs and ct scans. Nat Mach Intell 3(3):199–217. Accessed
from https://doi.org/10.1038/s42256-021-00307-0
19. Stirling J, Chen T, Bucholc M (2020) Diagnosing alzheimer’s disease using a self-organising
fuzzy classifier. In: Fuzzy logic recent applications and developments. Springer
20. Su P, Chen T, Xie J, Zheng Y, Qi H, Borroni D, Liu J (2020) Corneal nerve tortuosity grading
via ordered weighted averaging-based feature extraction. Med Phys
21. Syeda HB, Syed M, Sexton KW, Syed S, Begum S, Syed F, Yu Jr F (2021) Role of machine learning techniques to tackle the covid19 crisis: systematic review. JMIR Med Inform 9(1):e23811. Accessed from http://medinform.jmir.org/2021/1/e23811/
22. Tartaglione E, Barbano CA, Berzovini C, Calandri M, Grangetto M (2020) Unveiling covid-19
from chest x-ray with deep learning: a hurdles race with small data. Int J Environ Res Public
Health 17(18). Accessed from https://www.mdpi.com/1660-4601/17/18/6933
23. Wang J, Yue-Xin L, Chun-Ying W (2019) Survey of recommendation based on collaborative
filtering. J Phys: Conf Ser 1314
24. World Health Organisation (2022) Coronavirus disease (COVID-19) pandemic. https://www.who.int/emergencies/diseases/novel-coronavirus-2019
25. Zoabi Y, Deri-Rozov S, Shomron N (2021) Machine learning-based prediction of covid-19 diagnosis based on symptoms. npj Digit Med 4(1):3. Accessed from https://doi.org/10.1038/s41746-020-00372-6