
FIRAT UNIVERSITY

GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES


TÜRKİYE

COMPARISON OF DEEP LEARNING METHODS IN MEDICAL IMAGE CLASSIFICATION

Muhammad Sani DANLADI

Master's Thesis
DEPARTMENT OF SOFTWARE ENGINEERING
Program of

JANUARY 2023
FIRAT UNIVERSITY
GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES
TÜRKİYE

Department of Software Engineering

Master's Thesis

COMPARISON OF DEEP LEARNING METHODS IN MEDICAL IMAGE CLASSIFICATION

Author
Muhammad Sani DANLADI

Supervisor
Asst. Prof. Muhammet BAYKARA

JANUARY 2023
ELAZIG
FIRAT UNIVERSITY
GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES
TÜRKİYE

Department of Software Engineering

Master's Thesis

Title:
Comparison of Deep Learning Methods in Medical Image Classification

Author: Muhammad Sani DANLADI


Submission Date: 25 December 2022
Defense Date: 20 January 2023

THESIS APPROVAL
This thesis, which was prepared according to the thesis writing rules of the Graduate
School of Natural and Applied Sciences, Fırat University, was evaluated by the undersigned
committee members and unanimously approved after a defense exam open to the academic
audience.

Signature

Supervisor: Asst. Prof. Muhammet BAYKARA Approved


Fırat University, Faculty of Technology.

Chair: Asst. Prof. Ahmet Arif AYDIN Approved


Inonu University, Faculty of Engineering

Member: Asst. Prof. Bihter DAŞ Approved


Fırat University, Faculty of Technology

This thesis was approved by the Administrative Board of the Graduate School on

....... / ........ / 20 ..........


Signature

Prof. Dr. Burhan ERGEN


Director of the Graduate School
DECLARATION

I hereby declare that I wrote this Master's Thesis titled “Comparison of Deep Learning Methods in
Medical Image Classification” in accordance with the thesis writing guide of the Graduate School of Natural
and Applied Sciences, Firat University. I also declare that all information in it is correct, that I acted
according to scientific ethics in producing and presenting the findings, cited all the references I used, and
acknowledged all institutions, organizations, and persons that supported the thesis financially. I have never
used the data and information presented here to obtain a degree of any kind.
20 January 2023

Muhammad Sani DANLADI


PREFACE

Deep learning methods continue to demonstrate their might in almost every aspect of human life
owing to their robustness and increased accuracy in performance. Although training deep learning models
requires computers with high-performance GPUs consuming a substantial amount of time, this thesis study
strives to leverage their power on medical images, particularly chest X-rays, to detect life-threatening
diseases like Covid-19 and Pneumonia.

Three categories of deep learning approaches, employing conventional CNN, hybrid, and pre-trained
models, were trained on two generated datasets, namely Dataset-A and Dataset-B, for three-class and
binary classification, after resampling, augmentation, and other necessary data pre-processing. The
outcomes of the experimental analyses indicate that these models can aid medical experts in early disease
diagnoses and treatments to curtail complications and sudden deaths, especially amidst the pandemic.

I would like to convey my heartfelt appreciation to my supervisor, Assistant Professor Muhammet


BAYKARA, for his unwavering support and mentorship throughout this study. My profound gratitude goes
to my mother, Aisha Aliyu Danladi, for her love, encouragement, and prayers. I love you from the bottom
of my heart. Finally, I would like to express my sincere gratitude to the Nigerian Information Technology
Development Agency (NITDA) for their full financial backing, without which this program could not
have been accomplished.

This research work was supported by grants given to the MSc thesis project by the Scientific
Research Projects Administration Unit of Fırat University, Elazığ, Turkey [grant number: TEKF.22.32].
The author would like to express immense gratitude for this unit's financial support throughout the
research study.

Muhammad Sani DANLADI


ELAZIG, 2023

TABLE OF CONTENTS

Preface ..................................................................................................................................................... iv
Abstract ................................................................................................................................................... vii
Özet ...................................................................................................................................................... viii
List of Figures .......................................................................................................................................... ix
List of Tables ............................................................................................................................................ x
Symbols and Abbreviations ..................................................................................................................... xi
1. INTRODUCTION .............................................................................................................. 1
1.1. Problem Statement ............................................................................................................................ 1
1.2. Purpose of Study............................................................................................................................... 3
1.3. Hypothesis ........................................................................................................................................ 4
1.4. Thesis Structure ................................................................................................................................ 4
2. LITERATURE REVIEW .................................................................................................... 6
2.1. Medical Image Classification Using Regular CNN .......................................................................... 6
2.2. Medical Image Classification Using Transfer Learning ................................................................... 9
2.3. Medical Image Classification Using Hybrid Models ...................................................................... 12
3. BACKGROUND OF STUDY ............................................................................................. 15
3.1. Artificial Intelligence ...................................................................................................................... 15
3.2. Applications of AI .......................................................................................................................... 16
3.3. Machine Learning (ML) and Deep Learning (DL) ......................................................................... 16
3.4. Artificial Neural Network (ANN) .................................................................................................. 18
3.5. Types of Artificial Neural Networks .............................................................................................. 19
3.5.1. The Perceptron Model ......................................................................................................... 19
3.5.2. Multi-Layer Perceptron (MLP) or Feed Forward Neural Network ..................................... 20
3.5.3. Radial Basis Function (RBF) Network................................................................................ 21
3.5.4. Convolutional Neural Network (CNN) ............................................................................... 22
3.5.5. Recurrent Neural Network (RNN) ...................................................................................... 30
3.6. Medical Imaging and Computer-Aided Diagnoses......................................................................... 31
4. MATERIAL AND METHOD ............................................................................................ 33
4.1. Datasets Description ....................................................................................................................... 33
4.2. Random Oversampling, Data Augmentation, and Other Data Pre-processing ............................... 36
4.3. Experimental Environment ............................................................................................................. 38
4.4. Proposed CNN Model .................................................................................................................... 38
4.5. The Pre-trained Models .................................................................................................................. 43
5. RESULTS AND DISCUSSION ........................................................................................... 47
5.1. Experimental Results ...................................................................................................................... 47
5.2. Evaluation Metrics .......................................................................................................................... 49
5.2.1. Accuracy ............................................................................................................................. 49
5.2.2. Precision .............................................................................................................................. 50
5.2.3. Recall .................................................................................................................................. 50
5.2.4. F1-Score .............................................................................................................................. 50
5.3. Confusion Matrix............................................................................................................................ 51
5.4. Discussion of Further Outcomes .................................................................................................... 53

6. CONCLUSIONS .............................................................................................................. 54
Recommendations ................................................................................................................................... 55
References ............................................................................................................................................... 56
CURRICULUM VITAE

ABSTRACT

Comparison of Deep Learning Methods in Medical Image Classification


Muhammad Sani DANLADI

Master's Thesis

FIRAT UNIVERSITY
Graduate School of Natural and Applied Sciences
Department of Software Engineering

January 2023, Page: xi + 61

Owing to technological advancements in medicine, current medical imaging techniques provide
means of diagnosing disorders like the recent Covid-19 and Pneumonia. However, the lack of
sufficient medical experts, particularly amidst the outbreak of the epidemic, poses severe challenges in
early diagnoses and treatments, resulting in complications and unexpected fatalities. In this thesis study, a
CNN model and two hybrid models (VGG16 + XGBoost and VGG16 + SVM) were used for three-class image classification on a
generated dataset named Dataset-A with 6,432 chest X-Ray (CXR) images (containing Normal, Covid-19,
and Pneumonia classes). Then, pre-trained ResNet50, Xception, and DenseNet201 models were employed
for binary classification on Dataset-B with 7,000 images (consisting of Normal and Covid-19). The
suggested CNN model achieved a test accuracy of 98.91 %. Then the hybrid models (VGG16 + XGBoost
and VGG16 + SVM) gained accuracies of 98.44 % and 95.60 %, respectively. The fine-tuned ResNet50,
Xception, and DenseNet201 models achieved accuracies of 98.90 %, 99.14 %, and 99.00 %, respectively.
Finally, the models were further evaluated and tested, yielding impressive results. These outcomes
demonstrate that the models can aid radiologists with robust tools for early disease diagnoses and treatment.

Keywords: Artificial Intelligence, Deep Learning, Machine Learning, Medical Imaging, Transfer Learning

ÖZET

Tıbbi Görüntü Sınıflandırmasında Derin Öğrenme Yöntemlerinin Karşılaştırılması
Muhammad Sani DANLADI

Yüksek Lisans Tezi

FIRAT ÜNİVERSİTESİ
Fen Bilimleri Enstitüsü
Yazılım Mühendisliği Anabilim Dalı

Ocak 2023, Sayfa: xi + 61

Günümüzde, mevcut görüntüleme teknikleri, tıptaki teknolojik gelişmeler nedeniyle Covid-19 ve Pnömoni
gibi rahatsızlıkların teşhis edilmesi için araçlar sunmaktadır. Bununla birlikte, özellikle salgının patlak
verdiği bir dönemde yeterli tıbbi uzmanın bulunmaması, erken teşhis ve tedavilerde ciddi zorluklara yol
açarak komplikasyonlara ve beklenmedik ölümlere yol açmaktadır. Bu tez çalışmasında, ikili ve üçlü
sınıflandırma için CNN modeli, VGG16 + XGBoost ve VGG16 + SVM modelleri kullanılmıştır. Dataset-A
6.432 farklı göğüs röntgeni (CXR) görüntüsü (Normal, Covid- 19 ve Pnömoni sınıfları) içermektedir.
Ardından, 7.000 görüntülü (Normal ve Covid-19'dan oluşan) Dataset-B üzerinde ikili sınıflandırma için
önceden eğitilmiş ResNet50, Xception ve DenseNet201 modelleri kullanılmıştır. Önerilen CNN modeli,
%98.91'lik bir test doğruluğu elde etmiştir. Daha sonra hibrit modeller (VGG16 + XGBoost ve VGG16 +
SVM) sırasıyla %98,44 ve %95,60 doğruluk oranlarına ulaşmıştır. Parametre ayarlamaları yapılmış
ResNet50, Xception ve DenseNet201 modelleri sırasıyla %98,90, %99,14 ve %99,00 doğruluk elde
etmiştir. Son olarak, modeller çeşitli çalışmalarla yeniden değerlendirilerek test edilmiş ve etkileyici
sonuçlar elde edilmiştir. Elde edilen sonuçlar, modellerin erken hastalık teşhisi ve tedavisi için sağlam
araçlarla radyologlara yardımcı olabileceğini göstermektedir.

Anahtar Kelimeler: Yapay Zeka, Derin Öğrenme, Makine Öğrenimi, Tıbbi Görüntüleme, Transfer
Öğrenme

LIST OF FIGURES
Page
Figure 1.1. Number of COVID-19 deaths worldwide (Top 10 most affected countries) ............................. 2

Figure 3.1. The depiction of AI definitions in four different forms [65] .................................................... 15

Figure 3.2. Representation of AI, ML, and DL .......................................................................................... 17

Figure 3.3. Architecture of ANN ............................................................................................................... 18

Figure 3.4. A simple perceptron model [73] .............................................................................................. 20

Figure 3.5. A block diagram of MLP [74] ................................................................................................. 20

Figure 3.6. The architecture of an RBF network [76] ................................................................................ 21

Figure 3.7. The Building Blocks of CNN Architecture [78] ...................................................................... 23

Figure 3.8. A process of Convolution Operation [80] ................................................................................ 25

Figure 3.9. The ReLU Activation Function and its Derivative [81] ........................................................... 25

Figure 3.10. Depiction of Max and Average Pooling [83] .......................................................................... 27

Figure 3.11. A Depiction of Batch Normalization [84] ............................................................................... 28

Figure 3.12. An Illustration of Dropout in NN [85]..................................................................................... 29

Figure 3.13. A structure of a Simple RNN [87] ........................................................................................... 30

Figure 4.1. Samples of CXR Images for the Three-Class Distribution from Dataset-A .............................. 33

Figure 4.2. Distribution of Imbalanced Data in the Training Set of Dataset-A ............................................. 34

Figure 4.3. Distribution of Imbalanced Data in the Test/Validation Set of Dataset-A .................................. 35

Figure 4.4. Decision functions of Undersampled and Resampled Class Distribution [98] ......................... 37

Figure 4.5. Random Samples of the Augmented Images ........................................................................ 38

Figure 4.6. The Proposed ConvNet Architecture ......................................................................................... 40

Figure 4.7. The Pseudocode for the Proposed CNN model. ....................................................................... 40

Figure 4.8. The Architecture of the Hybrid Model ............................................................................... 42

Figure 4.9. Process Flow Chart of the Pre-trained Models .......................................................................... 45

Figure 5.1. Learning Curves of Losses and Accuracies of (a) CNN Model (b) ResNet50 Model ............. 47

Figure 5.2. Learning Curves of Losses and Accuracies of (c) Xception Model (d) DenseNet201 Model . 48

Figure 5.3. Confusion Matrices of (a) CNN Model, (b) VGG16+XGBoost, and (c) VGG16+SVM Hybrid
Models for Three-class Classification Using Dataset-A .......................................................... 51

Figure 5.4. Confusion Matrices of (d) ResNet50, (e) Xception, and (f) DenseNet201 Models for Binary
Classification Using Dataset-B ................................................................................................ 51

Figure 5.5. Supplementary Testing of the Proposed Models on Random Samples from (a) Datasets-A and
(b) Dataset-B ............................................................................................................................ 53

LIST OF TABLES
Page
Table 4.1. Distribution of the Unprocessed and Unbalanced CXR images in Dataset-A ......................... 34

Table 4.2. Distribution of the Pre-processed and Balanced CXR images in Dataset-A............................ 35

Table 4.3. Distribution of the Pre-processed and Balanced CXR images in Dataset-B ............................ 36

Table 4.4. The Structure of the Proposed ConvNet model ....................................................................... 39

Table 4.5. The Hyper-Parameters of XGBoost and SVM Classifiers ....................................................... 42

Table 4.6. The Structure of the Top Layers of The Pre-trained Models ................................................... 44

Table 5.1. Results of Experiment Conducted on Dataset-A ..................................................................... 49

Table 5.2. Results of Experiment Conducted on Dataset-B ...................................................................... 49

Table 5.3. Performance Comparisons with Some State-of-the-art Studies from the Literature ................ 52

SYMBOLS AND ABBREVIATIONS
Symbols

μ : Mean
σ : Standard Deviation
∞ : Infinity

Abbreviations
AI : Artificial Intelligence
ANN : Artificial Neural Network
CNN : Convolutional Neural Network
RNN : Recurrent Neural Network
FFN : Feed Forward Network
CXR : Chest X-Ray
NLP : Natural Language Processing
ReLU : Rectified Linear Unit
RT-PCR : Reverse Transcriptase Polymerase Chain Reaction
MLP : Multi-Layer Perceptron
RBF : Radial Basis Function
ML : Machine Learning
DL : Deep Learning
DHL : Deep Hybrid Learning
LSTM : Long Short-Term Memory
SGD : Stochastic Gradient Descent
SVM : Support Vector Machine
VGG : Visual Graphics Group
ResNet : Residual Network
XGBoost : Extreme Gradient Boosting
MRI : Magnetic Resonance Imaging
SARS-COV-2 : Severe Acute Respiratory Syndrome Coronavirus 2

1. INTRODUCTION

Medical image classification and analysis involve using several techniques to scrutinize
and categorize medical images, resulting in the identification of diseases by human medical
experts or intelligent algorithms. Intelligent algorithms, like the prominent Artificial Intelligence
(AI) based architectures encompassing machine learning and deep learning, are currently widely
employed in medicine, particularly in imaging analysis and clinical decision support, to assist in
searching and uncovering insights from medical data [1]. Medical experts like radiologists,
oncologists, and so on have been employing manual processes to diagnose various ailments for
decades. Nonetheless, using machine algorithms to classify medical images offers numerous
benefits, such as error and cost reduction, fast and efficient diagnoses, and contextual
relevance. For example, the outbreak of the recent pandemic
has forced researchers around the globe to embrace the use of deep learning techniques as an
alternative method of early detection of the deadly virus due to the challenges posed by the
widely utilized and validated reverse transcriptase polymerase chain reaction (RT-PCR) [2–4].
As a result, this thesis study employs several deep learning techniques, ranging from regular
convolutional neural networks (CNNs) to large pre-trained models with millions of parameters, on
real-world datasets from the Kaggle open-source repository, consisting mainly of chest X-ray (CXR)
images, to conduct extensive research on their use in medical image classification. The aim is to
develop a robust tool that can be employed for the early, highly accurate diagnosis of lung
diseases, such as the novel COVID-19 infection and Pneumonia, even amid an epidemic outbreak.

1.1. Problem Statement

Just recently, the entire world was in a state of extreme fear and trepidation due to the
outbreak of the novel Coronavirus (COVID-19) epidemic that started in Wuhan, Hubei province
of China, in December 2019 [5,6]. This virus, also known as SARS-CoV-2 or the severe acute
respiratory syndrome coronavirus 2, holds an unknown etiology and is zoonotic, meaning it can
propagate from animals to human beings [7]. Because of the novel virus's airborne nature, any
infected person can spread it to people around them by merely breathing, speaking, sneezing, or
coughing [8,9]. This airborne nature allowed the virus to circulate rapidly. Seeing
the rate at which it spread, the World Health Organization (WHO) assessed and
characterized it as a global pandemic on March 11, 2020 [10]. The virus poses serious
illness, particularly to the elderly, children, and persons with underlying medical conditions [11–
14]. According to WHO, Worldometer, and Statista websites, there have been over 6.5 million
reported deaths and over 635 million documented cases of SARS-CoV-2 infections worldwide as
of this writing [15–17]. Figure 1.1 shows the globally reported death cases and the top ten
countries hit hardest by the virus, with the US leading the list with over a million deaths.
The health sector is only one of many that have been negatively affected by this deadly
virus. As a direct result of the lockdown of all activities, which precipitated a global economic
collapse, people's ability to make a living has been severely compromised. These adverse effects
can be observed in the education sector [18], global food insecurity [19], declining stock markets
[20], severe environmental pollution caused by the high number of chemicals related to face
masks usage [21], and falls in the tourism industry [22], among others.

Figure 1.1. Number of COVID-19 deaths worldwide (Top 10 most affected countries)

To combat the effects of the virus, which has ravaged numerous aspects of human life,
scientists have been studying it nonstop. The scientific community has provided several
preventive measures, including getting vaccinated, early testing, wearing a mask, washing and
cleaning hands, and keeping a safe distance [23,24]. In addition to self-prevention, vaccination
and early testing are the most effective measures to end the pandemic by cutting the transmission
chain if utilized and followed appropriately [25]. However, early testing and diagnosis are
challenging, especially in the event of a worldwide epidemic, when the number of infected people
exceeds the capacity of hospitals and healthcare professionals. SARS-CoV-2 coronavirus is
diagnosed using a range of techniques: RT-PCR tests, lateral flow tests,

immunoenzyme serological tests, rapid antigen and antibody tests, and other alternative methods
[26]. Most of these methods are not without caveats, making them unsuited for application in
early testing and diagnosis. Even the widely used and validated method, RT-PCR, is not entirely
effective, especially in patients with low viral loads. Moreover, the cost of running laboratories
that require expensive instruments, affirmative examinations, and trained personnel remains a
significant drawback [27].

1.2. Purpose of Study

Due to the advancement and diversity of various medical imaging and computer-aided
diagnoses (CAD) techniques, mainly in the field of radiology and oncology, AI-based models
(machine learning and deep learning) provide an instantaneous, precise, as well as an inexpensive
means of diagnosing COVID-19 and other related diseases [28]. The primary cause of this
significant milestone is the remarkable breakthrough these models have accomplished,
particularly in binary and multi-class image classification and segmentation tasks with
high accuracy [29]. It is now simpler to employ the power of these models to support radiologists
with early diagnostic techniques even amid a global pandemic, thanks to the abundance of
medical images from technologies like magnetic resonance imaging (MRI), computed
tomography (CT) scans, and X-ray [30]. Because of their improved classification performance,
robustness, and enormous data processing power, these machine-learning algorithms significantly
reduce the limitations of human medical experts, thereby mitigating the occurrence of severe
illnesses and deaths [31].
This thesis study aims to investigate and demonstrate the effectiveness of AI-based
models on CXR images for binary and three-class image classification tasks utilizing two openly
accessible datasets from Kaggle. The first dataset, named Dataset-A in this study, is obtained
from the repository with the name Chest X-ray (COVID-19 & Pneumonia) [32], which comprises
three classes, namely COVID-19, Normal, and Pneumonia. This dataset is unbalanced and
insufficient for deep learning models to make accurate predictions, posing a risk of overfitting. To
overcome these challenges, we employed the random oversampling technique to balance the class
distribution and increase the total number of images to approximately 15,384.
After pre-processing the images, three models were used for the classification task: a CNN and
two hybrid models (using VGG16 to extract deep-level features followed by XGBoost and SVM
classifiers to classify the obtained features). The second dataset, named Dataset-B, obtained under the name
COVID-19 Radiography Database from the same repository [33], contains 3,500 random images
from each of the COVID-19 and Normal classes. After pre-processing and data augmentation, three pre-
trained models, namely, Xception, ResNet50, and DenseNet201, were used to train and classify
the images using transfer learning (TL) and fine-tuning techniques. Finally, the study's outcome

was analyzed and compared with recent works in this domain. To fully demonstrate the relevance
of this thesis study, we precisely summarized its contributions as follows:
1) Employing the random oversampling technique to handle the issue of class imbalance
and limited input data significantly improves the performance of the models. High test
accuracies on unseen images, indicating less overfitting and more generalizable models,
are evidence of these improvements.
2) Utilizing proper image pre-processing techniques, like resizing and cropping the images
to remove unnecessary features and enable the models to learn the appropriate ones, has
significantly improved their performance and conserved memory and resources during
training.
3) Training different models, including standard convolutional neural networks, hybrid
models, and pre-trained models that achieve high test accuracies, demonstrates that this
research can undoubtedly aid radiologists in making accurate early diagnoses, even
during pandemics. In turn, this could reduce severe illness and save lives.
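The random oversampling described in contribution (1) can be sketched in a few lines. The following is a minimal illustration using only Python's standard library, with hypothetical file names; the actual pipeline may instead rely on a library implementation such as imblearn's RandomOverSampler:

```python
import random
from collections import Counter

def random_oversample(paths, labels, seed=42):
    """Duplicate randomly chosen minority-class samples until every
    class reaches the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for path, label in zip(paths, labels):
        by_class.setdefault(label, []).append(path)
    target = max(len(items) for items in by_class.values())
    out_paths, out_labels = [], []
    for label, items in by_class.items():
        # sample with replacement to make up the shortfall for this class
        extra = rng.choices(items, k=target - len(items))
        for path in items + extra:
            out_paths.append(path)
            out_labels.append(label)
    return out_paths, out_labels

# Toy example mirroring the class imbalance in Dataset-A (hypothetical file names)
paths = [f"img_{i}.png" for i in range(10)]
labels = ["covid"] * 2 + ["pneumonia"] * 3 + ["normal"] * 5
bal_paths, bal_labels = random_oversample(paths, labels)
print(Counter(bal_labels))  # every class now matches the largest class
```

Because minority samples are duplicated rather than synthesized, oversampling of this kind is normally applied only to the training split, so duplicates cannot leak into the test set.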
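Likewise, the resizing and cropping mentioned in contribution (2) can be sketched with NumPy alone. This is an illustrative sketch, not the thesis's actual pre-processing code: the 280-pixel crop size and nearest-neighbour interpolation are assumptions (a real pipeline would more likely use OpenCV or Keras utilities):

```python
import numpy as np

def center_crop(img, size):
    """Keep only the central size x size region of a 2-D image array."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize via integer index arrays."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

# A dummy 299 x 299 "X-ray": crop away border pixels, then shrink and rescale
dummy = np.arange(299 * 299, dtype=np.float32).reshape(299, 299)
cropped = center_crop(dummy, 280)            # drop uninformative borders
resized = resize_nearest(cropped, 224, 224)  # match the network's input size
normalized = resized / resized.max()         # scale intensities to [0, 1]
```

Cropping before resizing removes border regions that carry no diagnostic information, so the network spends its capacity on lung features rather than scanner artifacts, and the smaller tensors conserve memory during training.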

1.3. Hypothesis

It is hypothesized that, by employing essential data pre-processing and optimization
techniques and subsequently applying deep learning-based methods to radiological chest
X-ray (CXR) images, lung diseases like COVID-19 and Pneumonia can be accurately
identified with improved performance. This will serve as an efficient tool to assist
radiologists in minimizing severe illness and death through early detection and diagnosis.

1.4. Thesis Structure

The thesis’s remaining sections are ordered as follows:


Chapter 2 offers an in-depth review of the most recent related literature and studies
conducted in this domain. These state-of-the-art studies serve as the backbone of this
thesis: they supplied much of the information on the core concepts and techniques used, and
they revealed the gaps that this work addresses.
Chapter 3 details the background of the study. Theoretical explanations of AI, ML, and
DL, their constituents, and their key application domains are provided. In addition, a succinct
summary of contemporary medical imaging techniques is presented.

Chapter 4 describes the materials and methods employed in training these models for the
image classification tasks, including information regarding the datasets, software, and hardware
specifications used.
Chapter 5 presents the necessary experimental results in the form of images and tables,
accompanied by a fitting discussion that helps the reader fully comprehend and appreciate the
milestones achieved by the thesis work.
Chapter 6 consists of a precise conclusion of the study. Ultimately, detailed and valuable
recommendations concerning future work are presented.

2. LITERATURE REVIEW

This chapter reviews recent studies conducted in medical image classification,
particularly those addressing lung diseases. CXR and computed tomography (CT) scan-based
images are generally preferred due to their ability to capture numerous lung details for further
examination. Moreover, deep learning's ability to engineer features from these details, drawn
from thousands of images, and classify them with high accuracy is what attracted researchers
to these studies. Visiting these recent state-of-the-art works and outlining their findings
highlights essential takeaways in the field of medical image classification and the actual
milestones achieved by these researchers and, most importantly, identifies a gap to which this
research work can contribute.

2.1. Medical Image Classification Using Regular CNN

Regular CNNs are flexible architectures that allow one to add or modify layers freely
and tune hyperparameters easily, yielding classification and segmentation results that can
equal or surpass pre-trained models in some instances. Caseneuve et al. [34] proposed a CNN
model that removes the need for metadata when classifying hard-to-identify chest X-ray images
by applying edge-detection techniques based on the Sobel and Scharr operators to clean the
input images before feeding them to the network for training. They used a shallow CNN to
perform binary classification of the ChestX-ray8 dataset, consisting of 3,402 edge-detected
images split into 2,381 for training, 340 for testing, and 681 for validation, labeling each
image as clear or noisy. The model attained an accuracy of 95%, and the work was intended for
reuse by other researchers in future disease classification tasks. Another impressive study was conducted by
Hussain and colleagues [35] to assess the performance of CNNs for binary classification and
diagnosis of COVID-19 on medical CXR images. They built three CNN models with one, three, and
four layers and trained each on a combined collection of 13,808 CXR images drawn from several
freely available public datasets. They pre-processed the data by resizing the images from
their initial 299 × 299 to 30 × 30 pixels and applied data augmentation to increase the amount
of data and improve classification performance. The dataset comprised 3,616 COVID-19 and
10,192 Normal images, of which 70% were used for training and the remainder for testing and
validation, in two settings with and without augmentation. The models performed better with
augmentation, and the three-layered model achieved the highest accuracy of 96%, followed by
the one- and four-layered models with 95% and 94%, respectively. Other recorded metrics were
96% precision, 94% recall, and a 96% F-score. To help with fast and inexpensive
methods of combating the recent pandemic, Arivoli and others [36] presented another deep
learning model based on the standard CNN architecture. Their work involved developing
software that combines a web-design approach with the Keras library to create an interactive
graphical user interface (GUI). Using Python for the backend (with Flask) and HTML, CSS, and
JavaScript for the front end, their software, CoviExpert, makes independent combined
predictions after training on 1,584 CXR images labeled as positive or negative COVID-19
cases. Using 177 images for testing, the model achieved a 99% classification accuracy. They
also noted that the software can be deployed on any device, giving medical professionals a
fast and convenient tool to detect COVID-positive patients.
Another study was conducted by Emin S. M. [37], employing radiological CXR images
for rapid diagnosis and treatment of coronavirus disease. A COVID-19 dataset of 13,824 images
was used to train a ConvNet model for a binary classification task tagging images as either
COVID cases or Normal. The suggested model was compared with two pre-trained models,
MobileNetV2 and ResNet50, and demonstrated better test accuracy, recording a test accuracy of
96.71% and an F1-score of 97%. The proposed ConvNet architecture proved remarkably efficient,
with far fewer trainable parameters than the pre-trained models while still achieving higher
test accuracy. Ghose
et al. [38] presented another deep learning model based on standard CNN architectures using a
large number of CXR images. Their model was trained on two datasets for multi-class and
binary classification. The first, dataset-1, consists of 10,293 CXR images with the classes
COVID-19, Normal, and Pneumonia, attaining an accuracy, precision, F1-score, specificity, and
sensitivity of 98.5%, 99.2%, 98.3%, 98.9%, and 99.2%, respectively. For the second, dataset-2,
7,075 CXR images were used for training and validation on two classes, COVID-19 and Pneumonia,
achieving an accuracy of 99.60% along with a 99.30% F1-score, 99.60% specificity, and 99.90%
sensitivity. The model thus demonstrates high efficiency with few drawbacks. Deep CNN
architectures are ConvNets that are tens or sometimes hundreds of layers deep, making them
powerful feature learners with high performance in training and validation. Musallam et al.
[39] employed a deep CNN to develop a model named
DeepChest to identify the Normal, Pneumonia, and COVID-19 classes among 7,512 CXR images
obtained from different sources on the Kaggle website. They balanced and pre-processed the
images and then adopted a three-phase training strategy of ten epochs each, with the best
weights saved and reloaded before the next phase. Although based on a deep CNN, the model
uses fewer layers and training iterations than previous models, making it fast and efficient.
It achieved an overall accuracy of 96.56%, with 99.40% and 99.32% in detecting COVID-19 and
Pneumonia, respectively.
Furthermore, Singh et al. [40] applied deep learning to radiological CXR images to
detect lung diseases in three categories: COVID-19, Pneumonia, and No Findings. Their model
combines single and triple-layered convolution configurations, totaling 19 layers, for a
multi-class classification task on a dataset drawn from two publicly available sources. Using
a total of 600 images, the model attained 87% classification accuracy, although it suffers
from limited data and an unaddressed class imbalance. Hussain and colleagues [41] proposed a
novel CNN model, CoroDet, leveraging raw CT scans and CXR images to assist radiologists with
early detection and diagnosis, particularly in developing countries where testing kits are in
short supply. Their 22-layered architecture was applied to a dataset they generated, called
COVID-R, to perform binary, three-class, and four-class classification. The dataset combines
eight openly accessible datasets covering image types such as COVID-19, Normal, non-COVID-19
viral pneumonia, and bacterial pneumonia. The model reached classification accuracies of
99.1%, 94.2%, and 91.2% for the binary, three-class, and four-class tasks, outperforming ten
other existing studies in comparisons. In another research study, Gilani et al. [42]
combined three public datasets with a locally obtained one from Pakistan, consisting of a mix
of CT scans and CXR images, to train a 14-layered CNN to categorize COVID-19, Normal, and
Pneumonia images as a fast alternative to the RT-PCR test in that region. The model achieved
an average accuracy of 96.68%, with sensitivity of 96.24% and specificity of 95.65%. They
further deployed the model on the abundance of medical images in a local hospital to assist
radiologists by integrating a printed circuit board (PCB) with CT scanners and X-ray machines.
In a study by Hassantabar et al. [43], three deep learning models for the classification and
segmentation of CXR and CT scan images were developed. A deep neural network (DNN) performed
classification on fractal features of the CXR images, while a CNN used the images directly.
The CNN showed better classification results, with 93.2% accuracy and 96.1% sensitivity,
whereas the DNN achieved 83.4% accuracy and 86% sensitivity. For segmentation, another CNN
architecture was employed to locate infected tissues in the CXR images, yielding an accuracy
of 83.84%.
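The shallow CNN designs reviewed above all rest on the same convolution, activation, and pooling building blocks. A minimal NumPy sketch of one such forward pass is given below; the 6 × 6 input, 3 × 3 kernel, and random values are illustrative assumptions, not taken from any of the cited models.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """Rectified linear activation, applied element-wise."""
    return np.maximum(x, 0.0)

def max_pool2(x):
    """2x2 max pooling with stride 2 (assumes even dimensions)."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((6, 6))            # stand-in for a tiny grayscale CXR patch
kernel = rng.standard_normal((3, 3))  # one learnable filter
feature_map = max_pool2(relu(conv2d(image, kernel)))
print(feature_map.shape)  # (2, 2)
```

Stacking many such filter banks, with the kernels learned from data, is what gives the deeper models above their feature-learning power.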

2.2. Medical Image Classification Using Transfer Learning

Transfer learning (TL) is a method of fine-tuning a pre-trained network, usually one
trained on large datasets with millions of parameters, so that its learned weights can be
reused. This technique increases the training and validation accuracy of many models and is
part of what makes deep learning highly efficient compared to other machine learning
algorithms. Mahmud and colleagues [44] proposed a network
named CovXNet, a deep CNN architecture that extracts varied features from radiological CXR
images using depth-wise convolutions with multiple dilation rates. They initially trained the
model on a large dataset of CXR images ranging from healthy lungs to viral or bacterial
pneumonia-infected ones, then fine-tuned it to transfer the learned features to a smaller
dataset containing COVID-19 and other pneumonia cases. They also employed a stacking algorithm
to combine the different model variations, and gradient-based discriminative localization to
identify the type of Pneumonia from abnormal regions of the X-ray image. Applying these
algorithms to two additional datasets from the Zhang Lab Data on the Mendeley data website,
they achieved accuracies of 97.4% for Normal versus COVID-19, 96.9% for COVID-19 versus viral
Pneumonia, 94.7% for COVID-19 versus bacterial Pneumonia, and 90.2% for multi-class
classification of COVID-19, Normal, bacterial, and viral Pneumonia. Dilshad and
others [45] used a MobileNet_V2 pre-trained model on three diverse datasets totaling 894 CXR
images to offer an early automatic COVID-19 recognition technique based on binary
classification, aimed at low computational power and cost in medical diagnosis. The three
merged datasets were Covid-Chest-X-Ray-Dataset, CXR8, and Local (Aligarh). They split the
resulting dataset into training, validation, testing, and Local subsets, with the last used
mainly to confirm the diagnostic efficiency of the model on local data from India. With 447
COVID-19 and 447 No-Findings CXR images, the model achieved 96.33% accuracy, with F1-scores
on the test and locally acquired data of 93% and 96% and False Negative Rates (FNR) of 12%
and 0%, respectively, lower than that of the standard real-time RT-PCR test. Li et al. [46]
aimed to minimize the performance reduction of models
caused by obtaining datasets from various sources in image classification problems. To
achieve this, they proposed a self-supervised block for feature standardization and
optimization, comprising image normalization, boundary detection, and contrast enhancement,
on top of three pre-trained baseline models: VGG, Xception, and DenseNet. These blocks
extracted features from four CXR lung disease classification datasets, Kaggle-RSNA,
Shenzhen-Hospital, NLM-MontgomeryCXRSet, and the China Set-Chest X-ray database, to examine
the effectiveness of the proposed models. All three models showed improved classification
outcomes thanks to the feature standardization block: the normalization, boundary detection,
and contrast enhancement components contributed increases of 2%, 5%, and 2%, respectively,
with a 6% overall accuracy improvement across the models.
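The transfer-learning pattern these studies share, pre-train on a large source dataset and then continue training (fine-tune) on a small target dataset, can be illustrated in miniature with scikit-learn's `SGDClassifier` and `partial_fit`. The synthetic Gaussian "image feature" data below is purely an illustrative assumption, not a result from any cited work.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)

def make_data(n, shift=1.0, dim=20):
    """Two Gaussian blobs standing in for 'healthy' vs 'diseased' image features."""
    x0 = rng.normal(loc=-shift, scale=1.0, size=(n // 2, dim))
    x1 = rng.normal(loc=+shift, scale=1.0, size=(n // 2, dim))
    X = np.vstack([x0, x1])
    y = np.array([0] * (n // 2) + [1] * (n // 2))
    return X, y

# Large "source" dataset (e.g., general pneumonia images) and a small "target" one.
X_src, y_src = make_data(2000)
X_tgt, y_tgt = make_data(60)
X_test, y_test = make_data(400)

# Pre-train on the large source dataset.
clf = SGDClassifier(random_state=0)
clf.partial_fit(X_src, y_src, classes=[0, 1])

# Fine-tune: continue updating the SAME weights on the small target dataset.
for _ in range(5):
    clf.partial_fit(X_tgt, y_tgt)

acc = clf.score(X_test, y_test)
print(f"test accuracy after fine-tuning: {acc:.2f}")
```

In the deep learning setting, the "weights" carried over are millions of convolutional filter parameters rather than one linear layer, but the workflow of reuse-then-continue-training is the same.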
Veras et al. [47] suggested a transfer learning technique that used two pre-trained models,
ResNet50 and VGG16, on a total of 61,011 CXR images from three sources, COVID-DB, COVID-19,
and the NIH Chest X-Ray dataset, merged into one input collection. The resulting dataset was
pre-processed with the contrast-limited adaptive histogram equalization technique to enhance
contrast. The images were further converted to 8-bit format (already the majority) and
resized to 224 × 224, the standard CNN input size. After applying various data augmentation
techniques, they used a recent genetic fine-tuning approach to automatically determine an
optimal set of hyperparameters for the two pre-trained models. Using 80% of the input data
for training and the remainder for validation, their models achieved 97% accuracy in
multi-class classification of three classes: COVID-19, healthy, and other Pneumonia-infected
lungs. Gour and
colleagues [48] proposed an uncertainty-aware CNN model to detect and categorize COVID-19
automatically. They used fine-tuning together with Monte Carlo (MC) dropout over M forward
passes of the EfficientNet-B3 model on three CXR datasets to approximate the posterior
predictive distribution, then computed the distribution's mean and entropy to obtain the
model's prediction and its uncertainty. The model achieved a 98.2% G-Mean and 98.15%
sensitivity for multi-class classification, and for binary classification a 99.16% G-Mean and
99.30% sensitivity, demonstrating strong performance. In another study, Abbas et al. [49]
presented a robust image classification
model that efficiently uses irregular or unbalanced datasets to achieve high performance.
Their ResNet18 deep CNN model combined a self-supervised learning mechanism and super sample
decomposition with a transfer learning approach carrying knowledge from generic image
recognition over to medical CXR image classification. The model first transferred features
learned from a large unlabeled CXR dataset of 50,000 X-ray images to two other CXR datasets,
COVID-19 Datasets A and B, the former smaller than the latter. They then applied downstream
learning with a class-decomposition layer to simplify the data structure and support
knowledge transfer. After fitting, the model registered an accuracy of 99.8% on the larger
dataset and 97.54% on the smaller one.
To decrease COVID-19 prediction time under specialist supervision for early
detection, Malhotra and others [50] utilized the ResNet18 pre-trained model on two combined
datasets obtained from the public repositories GitHub and Kaggle, containing COVID-19,
Pneumonia, and Normal classes for the former and Normal and Pneumonia for the latter. They
split the resulting dataset into a training set of 1,065 CXR images and a validation set of
50. Before feeding the data to the model, they pre-processed it for better performance: the
images were resized to the uniform 224 × 224 input size required by the model; data
normalization was applied so each input parameter followed a similar distribution, speeding
up training convergence; and the images were reduced from the expected three RGB channels to
single-channel grayscale to keep training manageable. Finally, after applying various data
augmentations to avoid overfitting, they trained the model and reached 93% accuracy and 98%
recall, observing even better outcomes when the model was tried on a larger dataset. Singh
et al. [51] presented a transfer deep learning method using three pre-trained models,
ResNet50, VGG16, and VGG19, on a publicly available Kaggle CXR dataset comprising 5,863
images to detect Pneumonia. After resizing the input data to the models' input requirements,
selecting the needed layers, and fine-tuning the models' parameters, they used 5,216 and 624
CXR images for training and validation, respectively. The models classified each image as
Pneumonia-positive or Normal under four epoch settings (3, 5, 10, and 15) and performed best
with five epochs. The maximum accuracies achieved were 85.3% for ResNet50, 91.7% for VGG16,
and 91.9% for VGG19. Furthermore, Ibrahim et al. [52] aimed to resolve the prevalence
of human error in mistaking COVID-19 for Pneumonia or lung cancer in lung disease diagnosis.
They employed a combination of four pre-trained multi-class deep learning models with
standard and bi-directional gated recurrent units (GRUs) on a CXR dataset merged with
computed tomography (CT) images from numerous sources. The datasets were combined both to
exploit the advantage CT scans hold over CXR images in precisely identifying the features
needed to categorize the data into COVID-19, Pneumonia, or lung cancer, and to increase the
dataset's volume for improved performance. After training and testing, a VGG19+CNN
combination proved best with an accuracy of 98.05%, along with strong results on several
other metrics, while the remaining models recorded accuracies of 95.31%, 96.09%, and 93.36%,
respectively. In a study by Das et al. [53], three pre-trained models, ResNet50V2,
DenseNet201, and InceptionV3, were employed on a dataset generated from multiple open sources
to categorize CXR images as COVID-19 positive or negative. The models were trained
independently on a total of 1,006 images split into training, validation, and testing sets,
and were later combined using a weighted average ensembling approach, yielding a
classification accuracy of 91.62%. They further developed a GUI-based application, deployable
on any available computer, to assist medical experts in early and fast detection of the virus.

2.3. Medical Image Classification Using Hybrid Models

Hybrid models, also known as the deep hybrid learning (DHL) technique, integrate
different complementary algorithms for image classification and segmentation. These models
often perform exceptionally well where input data are insufficient and deep CNN models alone
may not achieve good performance. They usually combine conventional CNN feedforward networks
with ML classifiers (Random Forest, XGBoost, SVM, etc.), long short-term memory (LSTM)
networks, and so on. In a study by Toğaçar [54], the DarkNet-19
with an SVM classifier was used on the LC25000 dataset of 25,000 histopathological images to
classify colon and lung cancers. The dataset contains five classes of 5,000 images each:
three lung and two colon cancer disease classes. The author trained the proposed model,
DarkNet-19, containing nineteen convolution and five max-pooling layers, on this dataset from
scratch. For feature selection, the author employed the Equilibrium and Manta Ray Foraging
optimizers to eliminate inefficient features, after which the SVM classified the merged
remaining useful features. The training outcome was impressive, recording an accuracy of
99.69%. Sharma et al. [55]
used an SVM classifier to detect the presence of COVID-19 in a UCI dataset comprising 1,000
images, 700 Normal and 300 COVID-19 cases. They utilized the modified cuckoo search algorithm
for hyperparameter optimization, and the minimum redundancy maximum relevance method, a
hybrid feature selection algorithm, to improve the model's feature selection. They then used
70% and 30% of the images for training and validation across three variations of the
suggested model. The recorded accuracies were 80.42% for the standard SVM classifier, 85.67%
for the SVM with hybrid feature selection, and 96.73% for the SVM with hyperparameter
optimization, showing the usefulness of hyperparameter optimization for improving machine
learning models. Singh et al. [56]
suggested a multi-model neural network combining a VGG16 architecture with an SVM classifier
for binary classification of CXR images into Normal and COVID-19 classes. The model
incorporated the SVM after the last VGG16 layer, with additional convolutional, max-pooling,
and dense layers inserted between the two for proper synchronization, and used a radial basis
function (RBF) kernel to identify the best result. After pre-processing a publicly available
dataset called COVID-19 Image Data Collection, consisting of 220 CXR images (118 Normal and
102 COVID-19), they fed it to the proposed model and obtained 95% accuracy. Compared with
five previous models, theirs performed better in terms of recall, precision, and F-score.
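The common hybrid pattern running through these studies, a deep network (or any fixed feature extractor) feeding a classical ML classifier, can be sketched with scikit-learn. Here a simple pooled-pixel function stands in for a CNN backbone, and the 32 × 32 "scans" are synthetic illustrations, not any of the cited datasets.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

def extract_features(images):
    """Stand-in for a CNN feature extractor: coarse 4x4 average pooling."""
    n = images.shape[0]
    pooled = images.reshape(n, 4, 8, 4, 8).mean(axis=(2, 4))  # 32x32 -> 4x4
    return pooled.reshape(n, -1)

# Synthetic 32x32 "scans": class-1 images are brighter in the central region.
images = rng.random((300, 32, 32))
labels = rng.integers(0, 2, size=300)
images[labels == 1, 8:24, 8:24] += 0.5

X = extract_features(images)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0)

# Classical ML classifier on top of the extracted features (RBF kernel, as in [56]).
clf = SVC(kernel="rbf").fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"hybrid pipeline accuracy: {acc:.2f}")
```

Swapping the pooling function for a trained deep network's penultimate-layer activations, and the SVM for XGBoost or a Random Forest, yields the variants reviewed in this section.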

Prabha et al. [57] strived to ease the work of healthcare professionals by creating a
model that accurately predicts lung disease despite the complex nature of CT scan images.
They offered a new hybrid model fusing a CNN with the AdaBoost ML classifier for
pre-processing, feature extraction, and binary classification on a lung CT scan dataset
containing about 2,000 images with 1,340 positive and 660 negative COVID-19 cases. They
resized the images to 256 × 256 pixels and refined them using image binarization and
segmentation techniques. After the training and validation processes, the
presented model attained 97% accuracy. In a study by Islam and colleagues [58], a combined
deep learning approach incorporating a CNN for feature extraction and an LSTM for
classification was employed on a generated dataset of 4,575 CXR images from the public
sources GitHub, Mendeley, and Kaggle to detect COVID-19, Pneumonia, or Normal classes. Their
experiments attained 99.4% accuracy, 99.2% specificity, 99.3% sensitivity, a 98.9% F1-score,
and an AUC of 99.9%. The proposed model displayed better performance metrics than a competing
CNN architecture and compared favorably with other recent cutting-edge models. Its drawbacks
are the relatively small sample size and the inability to handle different views such as
Anterior-Posterior (AP), lateral, and so on. Alshahrni et al. [59] developed a novel deep CNN
hybrid model using a clustering approach known as Two-step-AS, ensemble bootstrap aggregating
(bagging) training, and a multiple nearest-neighbor classifier to classify multi-class CXR
images, accomplishing a classification accuracy of 98.062%.
In another study, Nasiri et al. [60] used the DenseNet169 pre-trained model for
feature extraction from combined CXR images obtained from two public sources, feeding the
resulting features to an Extreme Gradient Boosting (XGBoost) classifier for binary and
multi-class classification. Their models achieved 98.23% classification accuracy, 99.78%
specificity, and 92.08% sensitivity for the COVID-19 and No-finding classes, and 89.70%
accuracy, 100% specificity, and 95.20% sensitivity for the COVID-19, Pneumonia, and
No-finding categories. The model outperforms other recent ones while requiring less execution
time and lower computational complexity. Kumar et al. [61] suggested a hybrid multimodal
framework that fused two separate models via a weighted sum-rule fusion approach to segment
and classify CXR images and collected cough samples. After applying the necessary signal
processing and Mel-frequency cepstral coefficients to pre-process the cough samples, a deep
CNN model was used to obtain in-depth features. The framework fused the resulting features,
attaining an accuracy of 98.7%
for the CXR images and 82.7% for the cough samples. Banerjee et al. [62] also used the concept
of decision-level fusion to develop an efficient ensemble network combining the DenseNet201
model with a Random Forest (RF) on large- and small-scale COVID-19 CXR datasets. The
DenseNet201 architecture was trained once, and the cosine annealing approach was used to
generate numerous snapshots of the model, yielding abundant extracted feature information
from the datasets. An RF meta-learner blending algorithm then categorized COVID-19, Normal,
and Pneumonia with classification accuracies of 94.55% and 98.13% for the large- and
small-scale datasets, respectively. Ismael and others [63] suggested three deep
learning-based techniques, rich feature extraction, fine-tuning of large pre-trained models,
and end-to-end training of a CNN, for binary (COVID-19 versus Normal) classification of CXR
images. In the first approach, five pre-trained models, ResNet18, ResNet50, ResNet101, VGG16,
and VGG19, were employed to extract features from 380 CXR images, and an SVM classifier was
applied with various kernel functions, of which the linear kernel achieved 94.7%
classification accuracy. The pre-trained models were also trained separately and fine-tuned,
with ResNet50 acquiring the best accuracy of 92.6%. Lastly, a classification accuracy of
91.6% was recorded for the end-to-end CNN architecture.

3. BACKGROUND OF STUDY

In this chapter, the theoretical underpinnings of the thesis study are elaborated.
Details are provided on the broader field of artificial intelligence and its areas of
application, machine learning (ML) and deep learning (DL) approaches, artificial neural
networks (ANN), and the various network types used for purposes such as pattern recognition,
natural language processing, and computer vision. The goal is to provide enough theoretical
background to facilitate quick comprehension of the diverse nature and relevance of the
thesis study.

3.1. Artificial Intelligence

Intelligence is described as the capacity to learn, comprehend, and form judgments or
perspectives based on logic and reasoning [64]. Humans are said to possess the highest level
of intelligence, enabling them to make fast decisions and create the sophisticated tools
necessary for survival. Thinking and reasoning remain the most distinguishing factors between
humans and animals, even though both are termed "living things". For thousands of years, man
has remained curious about his surroundings, conducting numerous experiments and developing
new ideas. Breakthroughs in computer science and engineering brought about one of the most
revolutionary concepts in history, Artificial Intelligence (AI), conceived in the 1950s
shortly after the Second World War [65]. AI has many definitions, making it difficult to
settle on just one.

Figure 3.1. The depiction of AI definitions in four different forms [65]


Figure 3.1 depicts eight definitions organized into four groups of two, with those at
the top addressing thinking and reasoning and those at the bottom concerned with actions. In
a nutshell, AI can be described as the capacity of a digital computer or computer-driven
robot or machine to execute functions typically accomplished by intelligent beings. The
phrase often refers to the effort to create intelligent systems with human-like cognitive
abilities such as reasoning, generalization, meaning-finding, and learning from experience.
Since the development of the digital computer in the 1940s, it has been shown that computers
can be programmed to conduct highly intricate tasks with excellent mastery. These tasks are,
however, narrow compared with human flexibility across a broader range of functions, yet they
are performed with high, sometimes superhuman, proficiency. Examples include self-driving
cars, voice and handwriting recognition, computer search engines, medical diagnosis, etc.
[66]. AI is a vast study domain that embraces both machine and deep learning, along with
other practical techniques that may exclude the concept of learning. Figure 3.2 depicts how
these learning-based approaches are represented under the fold of AI.

3.2. Applications of AI

AI remains a groundbreaking concept with a wide range of applications in our daily lives.
These applications are so diverse that they are employed in nearly every aspect of human
life, sometimes unknowingly. Below is a list of some notable applications:

1. Process Diagnosis Systems
2. AI-powered Digital Assistants
3. Weather Forecasting
4. Autonomous Vehicles
5. Space Shuttle Mission Control
6. Labour Management Systems
7. Gaming Industry
8. Personalized Learning
9. Expert Systems
10. Fraud Detection and Prevention Systems
11. Engineering Work Scheduling Systems
12. Commerce and Sourcing

3.3. Machine Learning (ML) and Deep Learning (DL)

Machine learning (ML) allows computers to examine input data and their corresponding
labeled results to derive rules, unlike conventional approaches in which computers are given
programs containing hard-coded rules that are then applied to transform input data into the
desired outcomes. This extremely potent concept enables ML systems to learn valuable
representations from their input data through a process known as training, rather than being
explicitly programmed [67]. An ML system is usually shown numerous examples of input data
relevant to a specific task, sometimes in varied forms produced by data augmentation. It then
discovers usable representations that ultimately enable it to derive
appropriate rules, which may make the task completely automated. The recent availability of
abundant datasets, coupled with advances in algorithms and hardware, enables ML systems to
perform complex tasks on millions of inputs of significantly larger proportions. However,
engineering the features these systems require as input to obtain the necessary
representations remains a significant bottleneck. Deep learning (DL) is a sub-field of ML
that emerged to solve this issue: it automates a substantial part of the feature extraction
stage, reducing the amount of manual human involvement needed and thereby permitting more
extensive datasets to be utilized [68]. The main idea behind DL is to employ successive
layers that learn progressively more meaningful representations of the input data through
models known as neural networks (NN). The NN is the most widely used DL algorithm, and many
of its notions are inspired by our knowledge of brain function, comprising neuron units with
inputs, weights, biases, and outputs. The convolutional neural network (CNN) is the main
algorithm behind DL's astounding milestones in producing intelligent systems. A CNN is a
particular form of ANN that takes the input data and allocates learnable weights and biases
to its various aspects, eventually learning to distinguish inputs belonging to different
groups. Figure 3.2 depicts the relationship of AI, ML, and DL, with the latter two embodied
under artificial intelligence as a whole.

Figure 3.2. Representation of AI, ML, and DL

3.4. Artificial Neural Network (ANN)

An ANN is a computational model, inspired by the biological interpretation of the
human brain, comprising processing units known as neurons or nodes that are interconnected
and constrained by coefficients in the form of weights. Attached to each neuron are weights
and biases, which the learning algorithm adjusts and which are responsible for training and
recall [69]. Because of the connections between these structures, the NN is sometimes
referred to as a connectionist model. Typically, these models are organized in layers, with
each neuron receiving input from the neurons preceding it, processing that input, and
transmitting output to the neurons following it. Figure 3.3 shows the architecture of an ANN
consisting of an input layer, two hidden layers, and an output layer. In this architecture,
the input layer contains three neurons that process the input data individually, passing it
to the two sets of four neurons in the hidden layers and finally to a single unit in the
output layer, which produces the final result. Each input to a node is multiplied by a weight
learned during training and passed through a non-linear activation function to become the
node's output as well as an input to a neighboring neuron [70]. An ANN is also referred to as
a "black box" because, in larger architectures and dimensions, it is unclear how the neurons
interact to produce an output.

Figure 3.3. Architecture of ANN

In Equation 3.1, the individual inputs to a particular node are multiplied by their
corresponding weights, and subsequently, biases are added to them. The equation shows the
summation of the individual nodes processing their input to produce the desired outputs.

\sum_{i=1}^{m} w_i x_i + \mathrm{bias} = w_1 x_1 + w_2 x_2 + w_3 x_3 + \mathrm{bias} \qquad (3.1)

Considering a single node, Equations 3.2 and 3.3 show how the output of each neuron is
computed. Once a node receives its inputs, it initially assigns random weights to them. The
weights are crucial in determining any particular variable's relevance, with larger weights
contributing more [71]. Any node whose weighted sum meets or exceeds the defined threshold
(here, zero) is activated and sends data to the model's upcoming layer; otherwise, no data is
transmitted to the succeeding layers of the network.

\mathrm{output} = f(x) = \begin{cases} 1 & \text{if } \sum_i w_i x_i + b \ge 0 \\ 0 & \text{if } \sum_i w_i x_i + b < 0 \end{cases} \qquad (3.2)

\mathrm{output} = f(x) = 1 \text{ if } \sum_i w_i x_i + b \ge 0; \ 0 \text{ if } \sum_i w_i x_i + b < 0 \qquad (3.3)

The most relevant characteristics concerning ANN include the capability to generalize,
experience-based adaptation, learning capability, data organization, and so forth.
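
Equations 3.1–3.3 can be sketched in a few lines of Python. The weights, inputs, and bias below are illustrative values chosen for demonstration, not taken from any trained network:

```python
def neuron_output(weights, inputs, bias):
    """Weighted sum of inputs plus bias (Equation 3.1), passed through
    a step threshold at zero (Equations 3.2-3.3)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s >= 0 else 0

# Illustrative values: three inputs, three weights, one bias.
print(neuron_output([0.5, -0.6, 0.2], [1.0, 1.0, 1.0], 0.1))  # -> 1
print(neuron_output([0.5, -0.6, 0.2], [0.0, 1.0, 0.0], 0.1))  # -> 0
```

The first call fires because the weighted sum (0.2) is non-negative; the second stays silent because the sum (-0.5) falls below the threshold.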

3.5. Types of Artificial Neural Networks

ANNs vary in architecture and functionality, ranging from simple perceptron to more
complex deep CNNs and recurrent neural networks (RNNs). These variations enable several
types of ANN to be used for different purposes like image recognition and classification,
computer vision, natural language processing (NLP), future outcome predictions with time-series
data, etc.

3.5.1. The Perceptron Model

The perceptron model is the oldest NN model; it was profoundly influenced by the first-
ever electric-circuit-based neuron model, the McCulloch-Pitts (MCP) neuron model, invented by
the two neurophysiologists McCulloch and Pitts in 1943 [72]. It was created by Frank
Rosenblatt, who drew major insights from the MCP model, with a single neuron that learns
the optimum weight coefficients for the correct prediction in a binary classification task. There
can be many inputs to the model, each consisting of m samples with n features, allowing
straightforward representation of the inputs and their corresponding weights as vectors or
matrices [73]. Figure 3.4 illustrates the perceptron model with three input features and a bias,
each with its corresponding weights. Every input feature is processed sequentially by the model’s
weighted sum or net input function, where its resulting weight is optimized during training
toward the precise outcome of any prediction. Before the model computes its output, the summed
inputs and the bias, with a default value of ‘1’, are required to pass through the threshold unit
containing a decision function. The figure employs a step function that outputs ‘1’ if the input
surpasses a particular value and ‘0’ otherwise.

Figure 3.4. A simple perceptron model [73]
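
The perceptron learning rule described above can be sketched as follows. The logical AND problem, learning rate, and epoch count are illustrative choices for demonstration, not part of the original model description:

```python
# Perceptron learning rule on the logical AND problem (illustrative toy data).
def train_perceptron(samples, labels, lr=0.1, epochs=20):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
            err = y - pred  # 0 when correct; +1 or -1 otherwise
            # Nudge each weight toward the correct prediction.
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]  # AND is linearly separable, so the rule converges
w, b = train_perceptron(X, y)
preds = [1 if w[0] * a + w[1] * c + b >= 0 else 0 for a, c in X]
print(preds)  # -> [0, 0, 0, 1]
```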

3.5.2. Multi-Layer Perceptron (MLP) or Feed Forward Neural Network

Feedforward NNs, also known as multi-layer perceptrons (MLPs), consist of an input layer, at
least a single hidden layer, and an output layer. Although they are regarded as MLPs, they are
essentially formed with sigmoidal neurons, or any arbitrary activation function, in contrast to the
normal perceptron owing to the non-linear disposition of real-world problems [71]. Figure 3.5
perfectly depicts the fundamental structure of the MLP, along with the explanations preceding it.
One characteristic of MLPs is that they are arranged in layers with connections in a strict
feedforward manner. The same links or those made to previous layers, such as feedback in the
case of RNNs, are entirely ruled out [74].

Figure 3.5. A block diagram of MLP [74]

A simple block diagram of the MLP is presented in Figure 3.5, where an input vector
X = (x₁, x₂, …, xₙ) is combined with the weight vectors through scalar products to produce a net
input value [WX], which is fed to an activation function f to produce an output vector
O = (o₁, o₂, …, oₘ). These models are usually trained by repeated exposure to the data fed to
them to ensure that the proper weight combinations are achieved in all the layers for precise
predictions. MLPs remain the
underpinnings of computer vision, robotics, pattern recognition, optimization, etc.
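
As a rough sketch of such a feedforward pass, the code below (assuming NumPy and the 3-4-1 layer sizes of Figure 3.3; the random weights stand in for trained ones) propagates an input vector layer by layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    """Propagate an input vector through each layer in a strict
    feedforward manner: a = f(W a + b), layer by layer."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
# Illustrative shapes: 3 inputs -> 4 hidden units -> 1 output (as in Figure 3.3).
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(1)]
out = mlp_forward(np.array([0.5, -1.2, 3.0]), weights, biases)
print(out.shape)  # -> (1,)
```

Training would then adjust `weights` and `biases` by backpropagation rather than leaving them random.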

3.5.3. Radial Basis Function (RBF) Network

The radial basis function neural network, RBF for short, while similar in structure to the
MLP and usually employed for analogous tasks such as curve fitting and pattern classification,
is a form of neural network with a slightly distinct fundamental concept. Whereas MLPs are
designed to accommodate multiple hidden layers, starting with at least one, RBF network
architectures are constrained to a single hidden or intermediate layer. The network comprises
this hidden layer of neurons holding prototype vectors, a non-computational input layer that
directs the input data to the hidden layer, and a single output layer that uses linear activation
functions to conduct classification and regression tasks [75].

Figure 3.6. The architecture of an RBF network [76]

They employ neurons at the intermediate layer based on basis functions like the Gaussian
with radial symmetry concerning their centers to approximate an unknown function using a linear
blend of non-linear functions. These networks are considered multi-layer feedforward networks
with a supervised form of training and learning. In Figure 3.6, the architecture of an RBF network, like

the MLP, shows how the input set X = (x₁, x₂, …, xₙ) is propagated throughout the
network, passing through the hidden layer of Gaussian neurons to produce a linear output.
In contrast to its counterpart, the MLP, training is conducted in two stages. The first stage uses a
self-organized learning and radial basis functions-based allocations approach to adjust the
neurons' weights in the hidden layer, depending on the features of the supplied inputs. The second
utilizes a learning approach analogous to the MLP's output layer, a delta rule founded on
generalization, to efficiently adjust the weights of the nodes in the last layer [76]. The ability to
classify non-linearly separable data, faster training times, ease of interpreting the intermediate
neurons, and the absence of strict parametrization are the significant advantages of RBF
networks compared to MLPs. These networks are equally utilized to solve problems in pattern
recognition, time-series prediction, function approximation, curve fitting, and so on.
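
A minimal sketch of the Gaussian hidden layer described above, with illustrative prototype centers and a hypothetical width parameter `gamma` (neither taken from the text):

```python
import numpy as np

def rbf_hidden(x, centers, gamma=1.0):
    """Gaussian radial basis activations: phi_j(x) = exp(-gamma * ||x - c_j||^2).
    The output layer would then take a linear combination of these values."""
    d2 = np.sum((centers - x) ** 2, axis=1)  # squared distance to each center
    return np.exp(-gamma * d2)

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # illustrative prototype vectors
phi = rbf_hidden(np.array([0.0, 0.0]), centers)
print(phi)  # activation is exactly 1.0 at a center and decays with distance
```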

3.5.4. Convolutional Neural Network (CNN)

Convolutional neural networks, also known as ConvNets or CNNs for short, are a
unique type of ANN architecture characterized by numerous stacks of layers, each
automatically learning to identify varying features and representations straight from the
input data, thereby eliminating the necessity for manual feature extraction [77]. These
sets of deep sequences of layers employ a mathematical operation known as the convolution
operation to recognize numerous collections of input-label pairs after being exposed to them and
be able to learn how to categorize data belonging to such classes successfully. ConvNets are
structured akin to the feedforward networks with numerous supplementary operations. They are
formed with input and output layers and dozens or hundreds of hidden layers, with each layer
retaining weights and biases, providing a means for learning. Each layer's weight is initialized
with a random set of numbers, specifying the desired operations to be conducted in such layers.
Computing a collection of suitable weight values for each layer in a network so that it can
appropriately link the provided inputs to their corresponding targets is what is meant by
"learning" in this specific situation. It may appear inconceivable to determine the proper values
for every weight within those enormous collections of layers, particularly since altering the value
of one will ultimately alter the function of others. However, adopting loss functions, optimizers,
and backpropagation, the main algorithm in deep learning, makes things feasible and
straightforward. The objective or loss function computes a distance score founded on network
predictions. It employs this score as a feedback signal to adapt the weights to minimize the
current score. This modification is the optimizer's responsibility, which uses the backpropagation
algorithm to bring about these changes [67]. CNN models are remarkably invaluable in
recognizing patterns in visual data such as images and videos. They are also exceptionally
effective in structured sequence data like time series and texts. A simplified building block of a

typical CNN architecture is depicted in Figure 3.7. The figure shows how a filter or kernel,
typically the weights resulting from a convolution operation, is used to obtain a range of feature
maps from a pre-processed input image. After the convolution with an activation function, the
features are reduced in dimension, with more valuable representations retained due to the usage of
the pooling operation. All these procedures are conducted within the first convolution layer. A
similar process is further accomplished in the second, with a final fully connected layer
vectorizing the resulting feature maps to produce an output from predictions made by other
mathematical functions such as softmax or sigmoid.

Figure 3.7. The Building Blocks of CNN Architecture [78]

As can be seen, the above explanations are just a rough description of the assumed
operations in just a four-layered CNN model. To fully grasp the main ideas in these architectures,
precise explanations regarding all the necessary terms in building a CNN architecture are given
below:

 Backpropagation Algorithm

The backpropagation (BP) algorithm is the neural networks' most essential method for
providing several practical algorithms with a means to learn effectively. It entails a rational
way to train multi-layered networks since their loss values are always considered to be intricate
composition functions of the preceding layers' weights. Consequently, BP is used to calculate the
gradient of these composition functions. The BP method employs the differential calculus chain
rule from advanced mathematics, which computes error gradients as sums of the
products of local gradients along the numerous routes from any given node to the output ones.
Even though this sum contains an exponentially increasing number of routes, it can be effectively calculated

using optimization and programming techniques. Simply put, the BP method illustrates a pure
dynamic programming application [79]. BP consists of the forward and backward
or reverse phases to effectively allow neural networks to compute the gradients, where algorithms
like Stochastic Gradient Descent (SGD) utilize them to conduct learning. The former estimates
the available output and the localized derivatives at the different units, while the latter
aggregates the products of these localized values along all routes from the given node all the
way to the output.
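
The two BP phases can be illustrated on a single sigmoid neuron with a squared-error loss (all values below are illustrative): the forward pass records local quantities, the backward pass multiplies local derivatives along the route, and the result is checked against a numerical gradient:

```python
import math

def forward(w, x, b):
    z = w * x + b                   # local pre-activation
    a = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
    loss = (a - 1.0) ** 2           # squared error against target 1.0
    return z, a, loss

def backward(w, x, b):
    """Chain rule: dL/dw = (dL/da) * (da/dz) * (dz/dw)."""
    _, a, _ = forward(w, x, b)
    dL_da = 2.0 * (a - 1.0)
    da_dz = a * (1.0 - a)
    dz_dw = x
    return dL_da * da_dz * dz_dw

w, x, b = 0.5, 2.0, -0.3
analytic = backward(w, x, b)
eps = 1e-6  # central finite difference to verify the chain-rule gradient
numeric = (forward(w + eps, x, b)[2] - forward(w - eps, x, b)[2]) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)  # -> True
```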

 Convolution Operation:

In mathematics, convolution is an operation that conjoins sets of matrices, such as X and
Y, to produce an element-wise summation that typically takes the form of a single numeral. The
sum of the values from each convolution operation is stored as a weight of the CNN model for
further refinement to produce the best values for feature learning on the input data [80]. In deep
learning, a square filter or kernel, usually of 3 × 3 or 5 × 5 dimensions, is slid over the entire
span of any given input image in a left-to-right and top-to-bottom direction, using a specific
stride value, performing a dot or scalar multiplication of weight and image values. These
operations create a unique weight function that affects the
transformations incurred on the given input by the particular layer containing the filter. A key
thing to note is that convolution layers in deep learning learn local patterns in their input feature
map, whereas other layers, like dense layers, learn global patterns. The convolution layers gain
an advantage in processing their input data thanks to the distinctive property of learning
translation invariance and pattern hierarchies. For instance, a CNN convolutional layer can
recognize a specific pattern anywhere after learning from the provided input, while densely
connected layers would find it highly challenging. Convolutional layers can also extract
progressively more intricate and abstract patterns from their input. The simple process of a
convolution operation is demonstrated in Figure 3.8, where two sets of 3 x 3 matrices, X and Y,
are convolved to produce a sum of value 12. Here, each value from matrix X is multiplied by its
corresponding element-wise value from matrix Y, and the resulting values are summed together
to produce the final result (known as dot product). The most important things to note are how
these matrices are stacked and their operation direction, forming what is described as an overlay.

Figure 3.8. A process of Convolution Operation [80]
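
The sliding dot product can be sketched in NumPy as below; the 4 × 4 input and all-ones 3 × 3 kernel are illustrative, not the exact values of Figure 3.8:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel left-to-right, top-to-bottom; each step is an
    element-wise product followed by a sum (a dot product)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # one convolution step
    return out

# Illustrative 4x4 input and 3x3 kernel of ones (each output is a patch sum).
image = np.array([[1, 0, 2, 1],
                  [3, 1, 0, 0],
                  [0, 2, 1, 1],
                  [1, 0, 0, 2]])
kernel = np.ones((3, 3))
print(conv2d(image, kernel))  # 2x2 feature map
```

With stride 1, the 4 × 4 input and 3 × 3 kernel yield a 2 × 2 feature map, matching the output-size rule (4 − 3)/1 + 1 = 2.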

 Activation Functions

Whether or not a neuron should fire is determined by the activation function, which uses
simple mathematical operations to decide whether the neuron's contribution to the network's
prediction phase is relevant. Numerous activation functions, such as the sigmoid (logistic),
ReLU, tanh, and Leaky ReLU functions, may be implemented in neural networks.
Specifically, non-linear, continuously differentiable activation functions are of the utmost
preference. A non-linear activation function allows the network to learn any nonlinearity from the
real-world input data, assuming it has enough neurons and layers. When a linear function
stimulates all neurons in a feedforward network, the network corresponds to a linear function, no
matter the depth of the model. Therefore, at least one neuron unit must have a non-linear
activation function to create a non-linear network [81].

Figure 3.9. The ReLU Activation Function and its Derivative [81]

The differentiability element is also essential as we commonly train neural networks using the
gradient descent method. Even if non-gradient-based optimization techniques are used for
improving fundamental functions, gradient-based approaches are the most often employed
methods in CNN training. Rectifiers, also known as rectified linear units (ReLU), are one of
modern technology's most predominant activation functions used in both RNN and CNN models.
It is computationally efficient and unaffected by the vanishing gradient challenges. Figure 3.9
shows the representation of the ReLU function along with its derivative. Equations 3.4 and 3.5
provide its definition as follows:

f(x) = \max(0, x) \qquad (3.4)

f'(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases} \qquad (3.5)

From the above definitions, f(x) is 0 when x is less than 0; conversely, f(x) takes the value
of x when x is greater than or equal to 0. The derivative of the ReLU function always takes the
value 1 over the set of all positive real numbers without saturation; in other words, the function
has a range of [0, ∞). Due to a property of the ReLU activation, dead neurons may be created
during training: no matter the data sample, a dead neuron will always output 0. This situation
may occur if the weights feeding a neuron are altered in such a manner that the product of x
with the layer's weights is always negative, so the neuron always yields zero when passed
through the ReLU activation function. This sparsity can be advantageous because outputs that
are always zero can be removed to increase the network's efficiency. A possible drawback of
this feature is that the accuracy of a network may degrade if too many neurons die, so it is
essential to monitor the network for dead neurons during training.
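
Equations 3.4 and 3.5 translate directly into NumPy (the sample inputs are illustrative):

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x), Equation 3.4."""
    return np.maximum(0, x)

def relu_derivative(x):
    """f'(x) = 0 for x < 0 and 1 for x >= 0, Equation 3.5."""
    return np.where(x < 0, 0.0, 1.0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # negatives clamped to 0, positives unchanged
print(relu_derivative(x))  # 0 for negatives, 1 elsewhere
```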

 Pooling

Pooling involves using a downsampling approach to reduce the dimensionality of
feature maps from the output of a convolution operation, usually with a 2 × 2 filter smaller than
the feature map size [81]. The above implies that the pooling layer almost always decreases the
size of a convolved feature map by a factor of 2. By utilizing a pooling function, the net output at
a specific location may be substituted with the statistical mean of the outcomes in the surrounding
region, thereby drastically reducing the computational power in processing the input data. The
max pooling method, for instance, returns the maximum output within a specified rectangular
area. Pooling functions that are often used include the min, max, and average pooling to assist in
preserving tractability. Thanks to pooling, if the input is translated significantly, the

representation may become approximately invariant, aiding the extraction of prevalent features.
Translation invariance means the input can be translated by a small amount without appreciably
changing the values of the pooled outputs [82]. It is standard practice to pool or
downsample the result of a convolution operation to address varying input data sizes, which are
required for many tasks.

Figure 3.10. Depiction of Max and Average Pooling [83]

For instance, the input to the classification layer must be of constant dimension to classify
images of varying proportions. Commonly, this is accomplished by altering the size of the offset
between pooling regions such that the classification layer receives the same number of
summary statistics no matter the size of the input. Figure 3.10 shows two types of pooling to
aid easy conception of these operations. As stated earlier, pooling almost always halves each
dimension of a resulting feature map, and both forms of pooling shown prove this statement:
the initial feature map is of size 4 × 4 pixels, while the two results are 2 × 2 pixels in dimension.
As their names imply, max pooling takes the highest value in every quarter of the feature map,
while average pooling computes the average value of each quarter and uses these values as
the pooled feature map.
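
Max and average pooling over non-overlapping 2 × 2 windows can be sketched as follows; the 4 × 4 feature map is illustrative, not the exact values of Figure 3.10:

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Downsample a feature map with non-overlapping size x size windows,
    halving each dimension when size == 2 (as in Figure 3.10)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    blocks = fmap.reshape(h, size, w, size)  # group pixels into windows
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

# Illustrative 4x4 feature map.
fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 5, 8]])
print(pool2d(fmap, mode="max"))      # highest value per quarter
print(pool2d(fmap, mode="average"))  # mean value per quarter
```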

 Batch Normalization

Batch normalization (BN) is a highly effective way of enhancing the training of a deep
neural network with minimal computation overhead. Applying BN to the outputs of neurons
before the activation function improves overall performance; that is to say, the inputs of one
layer's neurons to the activation function are made zero-centered with unit variance, stabilizing
the distribution of the activations. In DL, it is often essential to scale or normalize the
input data before conducting training. For instance, if a data collection contains features with

widely distinct ranges of values, such as [0,1] and [10,100], these data should be normalized to
accelerate the network's learning and improve its prediction to achieve high performance.

Figure 3.11. A Depiction of Batch Normalization [84]

When training CNN models, the distribution of the inputs to each of the model's layers
fluctuates, lowering the optimization algorithm's convergence speed; BN is a method widely
employed to address this issue. Figure 3.11 shows how BN is represented, allowing a
simultaneous form of learning by the layers in a deep NN without waiting for outputs from
previous layers, thereby improving speed and performance. After the input data is initially
pre-processed and fed to the input layer, a series of normalizations is applied to the outputs of
all the subsequent hidden layers before the final output layer. Equation 3.6 describes how the BN layer
transforms input data denoted by x to produce the output designated by z.

z = \gamma \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \qquad (3.6)

The above transformation implements the mean-variance normalization (utilizing the μ and
σ) and linear scaling and shifting (with the γ and β) to the x input, respectively. An exponential
moving average determines the resulting layer's normalization parameters (μ and σ) over the
complete training data set. These normalization parameters are not trainable, whereas γ and β are
said to be trainable. In addition, the normalization parameters calculated throughout the training
set are utilized for the forward pass in the testing period, and they stay unaltered [81]. Typically,

the BN layer is positioned between the convolution or fully connected layer and the preferred
activation function used in these layers. Advantages of using BN in deep learning include fast
training, internal covariate shift handling, loss function smoothening, and so on.
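
Equation 3.6 can be sketched for a single mini-batch as follows. This is a simplified view: the trainable γ and β are left at fixed values, and μ and σ² are computed directly from the batch rather than from a moving average:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Equation 3.6: z = gamma * (x - mu) / sqrt(sigma^2 + eps) + beta,
    with mu and sigma^2 computed over the batch dimension."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

# Illustrative batch whose two features have widely different ranges
# ([0, 1] vs. [10, 100]), as in the example in the text.
batch = np.array([[0.2, 15.0],
                  [0.8, 90.0],
                  [0.5, 40.0]])
z = batch_norm(batch)
print(z.mean(axis=0))  # approximately zero per feature
print(z.std(axis=0))   # approximately one per feature
```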

 Dropout

Dropout is a straightforward, inexpensive regularization technique widely
employed to combat overfitting concerns and speed up convergence in DL models. The dropout
technique takes a different path to assemble a CNN ensemble by sampling at nodes rather than
edges. If a node is removed from the network, all connections to and from it must also be
disjointed. Only the nodes in the input and hidden layers of the network are sampled; bear in
mind that it is impossible to create a prediction or calculate the loss function if the output
node(s) are dropped. Sometimes the probability of sampling the input nodes is distinct from
that of the intermediate nodes. The total number of possible sampled networks is thus 2^N if
the entire architecture has N nodes [79]. Because the shut-down of nodes is random, the
learning algorithm is unaware of which nodes will be dropped from the network, forcing it to
spread the weights in both forward and backward propagation rather than focusing on specific
nodes. This results in a simpler network with fewer co-adapted features, significantly resolving
overfitting issues. Figure 3.12
provides an illustration of dropout in NN architectures. As shown, two neuron units from the
network's hidden layer are crossed out in red to depict shut-down units in this scenario. Despite
the absence of dropout in the input and output layers, the overall number of active network
parameters is expected to be significantly reduced. The figure is a simple illustration of a
primary feedforward network with only one hidden layer; this is meant just for intuition, as
practical CNN architectures are enormous, with tens or hundreds of intermediate layers.

Figure 3.12. An Illustration of Dropout in NN [85]
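
A sketch of dropout as a random mask over a layer's activations. This uses the common "inverted dropout" variant, which rescales the surviving units by 1/(1 − p) during training so that no change is needed at test time; the layer size and drop probability are illustrative:

```python
import numpy as np

def dropout(activations, p=0.5, rng=None):
    """Randomly shut down units with probability p and rescale the
    survivors so the expected activation is unchanged (inverted dropout)."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(activations.shape) >= p  # True = unit survives
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(42)
h = np.ones(10)  # illustrative hidden-layer activations
out = dropout(h, p=0.5, rng=rng)
print(out)  # each unit is either 0.0 (dropped) or 2.0 (kept and rescaled)
```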

3.5.5. Recurrent Neural Network (RNN)

Recurrent neural networks (RNNs) are another form of ANNs that store the output of any
given layer and feed these values back to the layer’s input to predict the outcome from the same
layer. RNNs are best suited for processing time series data or other forms of data consisting of
sequential dependencies among their attributes. These networks take information from
their preceding inputs to alter the present input and output values, providing them with what is
known as "memory." In contrast to the input-and-output-independence assumed by conventional
deep neural networks, recurrent neural networks' output is conditional on the present position in
the sequence. Whereas FFNs' weights vary between nodes, RNNs' weight parameters are constant
throughout all layers. However, these weights are still updated via BP and gradient descent
algorithms to aid in the learning process. Although analogous to standard BP, the gradients in
RNNs are determined using the more specialized backpropagation through time (BPTT)
technique. BPTT works equivalently to conventional backpropagation, in which a model trains
itself by back-propagating errors from the output to the input layer [86]. Because of these
computations, we may fine-tune and match the model's parameters to prevent the exploding and
vanishing gradient issues that usually arise when the gradients are too large or too small,
respectively. Typical applications of these DL techniques include voice recognition, image
captioning, NLP, and general language translation, all of which are ordinal. Figure 3.13 shows a
simple RNN, where a simple FFN illustrates how this network is formed, providing a clear
intuition for ease of understanding. The blue, brown, and green nodes denoted by x, h, and y
represent the input, hidden, and output layers, respectively. In addition, the letters N, M, and L are
used to indicate the network parameters at the three corresponding layers. These parameters are
highly essential in improving the network's overall output.

Figure 3.13. A structure of a Simple RNN [87]

The input layer x of the above-illustrated network accepts the input data, evaluates it, and
transmits it to the hidden layer(s), each having its activation functions along with biases and
weights. The RNN shares these learnable network parameters such that every hidden
layer has equivalent network parameters. Then, rather than constructing numerous
intermediate layers, it will generate one and iterate over it as often as necessary.
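
A minimal sketch of this recurrence, reusing the same weight matrices at every time step (all sizes and values are illustrative; real RNNs would learn the matrices via BPTT):

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, Why, bh, by):
    """Iterate one hidden layer over the sequence, reusing the same
    weight parameters at every time step."""
    h = np.zeros(Whh.shape[0])
    outputs = []
    for x in xs:
        # "Memory": h depends on the current input and all past inputs.
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        outputs.append(Why @ h + by)
    return outputs, h

rng = np.random.default_rng(1)
# Illustrative sizes: 2 input features, 3 hidden units, 1 output.
Wxh = rng.standard_normal((3, 2))
Whh = rng.standard_normal((3, 3))
Why = rng.standard_normal((1, 3))
bh, by = np.zeros(3), np.zeros(1)
seq = [np.array([0.5, -0.1]), np.array([1.0, 0.2]), np.array([-0.3, 0.7])]
outputs, h = rnn_forward(seq, Wxh, Whh, Why, bh, by)
print(len(outputs), h.shape)  # one output per time step; final hidden state
```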

3.6. Medical Imaging and Computer-Aided Diagnoses

Medical imaging (MI), often known as radiography, is the branch of medicine in which
medical experts reconstruct diagnostic or therapeutic images of various body parts using
sophisticated systems. MI procedures comprise unobtrusive diagnostics that enable physicians to
identify injuries and illnesses without invasive procedures. Three widely employed
techniques, including computer-aided diagnosis (CADx), that use computer systems to assist in
detecting and diagnosing ailments are described below:

 X-Ray Imaging

X-ray scanning is the earliest type of medical imaging, and contemporary medicine would
be inconceivable without it. As ionizing X-rays radiate across the body, the tissues, bones, and
organs absorb them to variable degrees. The residual radiation exposes either an analog X-ray
film, an imaging plate that is subsequently read out with an imaging plate scanner and digitized,
or an X-ray detector that digitizes the visual data directly; herein lies the distinction between
analog and digital X-ray technology. This technique is most often employed in radiology, orthopedics, and
surgery and is especially useful for diagnosing lung disorders, bone fractures, and tumor diseases.
The traditional and more recent digital X-ray imaging equipment, which also uses X-rays, has
given rise to several variants. The images depict various body parts in distinct tones of black and
white. The main reason is that different tissues absorb varying quantities of radiation. The
calcium in bones absorbs the most X-rays; therefore, bones appear white [88]. Less-absorbent
soft tissues, such as fat, appear gray, and because air absorbs the least, the lungs appear black.
X-rays are most often used to detect fractures (broken bones), but they also have additional
applications. Chest X-rays, for instance, may detect chest diseases like tuberculosis and
pneumonia, and mammograms screen for breast cancer by employing X-rays.

 Computed Tomography (CT) Scan

A computed tomography (CT) scan integrates a succession of X-ray images captured from
various viewpoints around a patient's body with high-performance computing to generate slices
or cross-sectional images. These images capture numerous organs and tissues, such as bones,

blood vessels, and other soft tissues within the patient's body. CT scans deliver more
sophisticated details than conventional X-ray scans. A CT scan might view virtually all body
parts and is used to detect sickness or injuries and plan for surgical, medical, or different radiation
therapy issues [89]. Several sections of photographs of a particular region of the patient's body are
captured and then merged by a computer system to generate a three-dimensional image. While a
CT scan typically utilizes X-rays, the tissue's density and thickness play a significant role in the
imaging process. The Hounsfield unit (HU) is the general unit of measurement for the density
and thickness of a patient's tissue. The various HUs correlate to the grayscale levels displayed
by the device, ranging from +1000, rendered white for bone, through zero, depicted grey for
water content, to −1000, rendered black for air. Other terms like hyperdense, isodense,
and hypodense are generally used to characterize a CT scan's results as alternative means to refer
to the same thing. The CT scan's benefits over traditional X-rays include a 3D portrayal of organs
or tissues and a higher-resolution imaging technique. There are numerous CT scanning
approaches, including the single-slice CT scan (SSCT), the multi-slice CT scan (MSCT), and
the spiral CT scan [90].

 Magnetic Resonance Imaging (MRI)

Magnetic resonance imaging, or MRI for short, is a cross-sectional medical imaging
technique that uses magnetism and computer-generated radio waves called high-frequency pulses
to produce detailed images of various body tissue. These images vary depending on the
concentration and density of hydrogen atoms in the body. Most, if not all, of these MRI machines
are constructed with massive and tubular magnets. The magnetic field of an MRI scanner
momentarily readjusts the patient's bodily water molecules, which generate feeble signals from
radio waves that are used to construct cross-sectional images. The MRI scanner can also produce
several three-dimensional viewable images from differing viewpoints [91]. Low-field or open
MRI devices are contrasted with high-field or closed MRI devices on the grounds of the
strength of the magnetic field employed. Supplementary magnetic resonance coils, which function
as sensors and a means for detecting the proportions of the available body protons, are essential to
capture the magnetic waves. In addition to the organs, MR imaging is used to evaluate the
musculoskeletal system, muscles, blood vessels, and soft tissue layers. Cranial imaging, a method
used to assess the skull and brain, is another significant area in which MRI is applicable.

4. MATERIAL AND METHOD

This segment comprehensively describes the materials and methods employed during the
study's experimental process. We initially obtained two publicly available datasets from the
famous ML specialist and data scientist community, Kaggle [92], a subsidiary of the renowned
Google company. These two datasets consisted of CXR images and were appropriately pre-
processed to allow the deep learning models, such as the CNN, hybrid, and pre-trained models
created in the study, to train for subsequent use in lung disease predictions. The pre-processing
technique used included resizing, cropping, reshaping, normalizing the range of image values,
random oversampling to correct the class imbalance, data augmentation to mitigate overfitting,
and splitting the data into three distributions for training, testing, and validation. The experiment
was conducted in two categories. The first used one of the datasets for a three-class image
classification task with COVID-19, Normal, and Pneumonia groups. The second experiment
deployed the other dataset with a binary classification task to discern Normal and COVID-19-
infected CXR images. The outcomes were impressive, achieving high test accuracies, which
underscores the study's contribution.

4.1. Datasets Description

As stated earlier, the datasets utilized in this research are obtained from the trusted Kaggle open-source data repository, which comprises more than 50,000 datasets made publicly accessible to anyone wishing to conduct AI-based experimental analysis.
These datasets are named Dataset-A and Dataset-B in this study to make any further references
easier. Figure 4.1 shows some random samples from Dataset-A, which is used for the three-class
image classification.

Figure 4.1. Samples of CXR Images for the Three-Class Distribution from Dataset-A
 Dataset-A

Dataset-A consists of 6,432 CXR images from the following categories: Normal,
Pneumonia, and COVID-19. These images are collected from three distinct open sources [93–95]
to form the resulting dataset. Table 4.1 illustrates the image allocation in the train and test folders.
As shown, the dataset comprises 1,266 and 317 normal images, 3,418 and 855 pneumonia-
infected images, and 460 and 116 COVID-19 positive images in the train and test folders,
respectively.

Table 4.1. Distribution of the Unprocessed and Unbalanced CXR images in Dataset-A

Folder            Normal   Pneumonia   COVID-19   Total
Train Folder      1,266    3,418       460        5,144
Test/Val Folder   317      855         116        1,288
Total             1,583    4,273       576        6,432

Furthermore, Figures 4.2 and 4.3 show these distributions graphically to aid quick comprehension. As shown, the dataset contains a considerable class imbalance, a major issue to address when training deep learning models, since it can produce networks that overfit a particular class. If not curtailed, it results in poorly trained models that do not generalize to unseen data. The random oversampling technique, described in detail later, was used to address this issue.

Figure 4.2. Distribution of Imbalanced Data in the Training Set of Dataset-A

Figure 4.3. Distribution of Imbalanced Data in the Test/Validation Set of Dataset-A

 Dataset-B

The original dataset was created in a partnership between medical professionals and researchers from universities in Qatar, Bangladesh, Malaysia, and Pakistan [96,97]. It won an award from the Kaggle community as the best COVID-19 dataset on the platform. Dataset-B, the dataset formed in this thesis study, is composed by randomly selecting 3,500 CXR images from each of the COVID-19-positive and Normal CXR image folders. A total of 7,000 images is thus formed and used for training and validation after applying several data augmentation techniques such as rescaling, zooming, and rotation. The images in Dataset-B are well-composed and evenly distributed between the two classes, eliminating the class imbalance concern. Tables 4.2 and 4.3 show the breakdown of the fully pre-processed and balanced data distributed among the classes for the three-class and binary image classifications on Dataset-A and Dataset-B, respectively.

Table 4.2. Distribution of the Pre-processed and Balanced CXR images in Dataset-A

Data              Normal   Pneumonia   COVID-19   Total
Training Data     3,418    3,418       3,418      10,254
Testing Data      428      428         428        1,284
Validation Data   427      427         427        1,281
Total             4,273    4,273       4,273      12,819

Table 4.3. Distribution of the Pre-processed and Balanced CXR images in Dataset-B

Data              Normal   COVID-19   Total
Training Data     2,800    2,800      5,600
Testing Data      350      350        700
Validation Data   350      350        700
Total             3,500    3,500      7,000

4.2. Random Oversampling, Data Augmentation, and Other Data Pre-processing

Several data pre-processing approaches were applied to the formed datasets before the models utilized them to classify the images from the various classes. In fact, data pre-processing is one of the essential aspects of deep learning, as a deep learning model cannot work when fed inappropriately shaped input. The input data's dimensions must match the model's input dimensions. These inputs are stored in tensors of varying ranks, with image data having different dimensions from video and time-series data. The CXR images in Dataset-A were initially cropped and resized to 150 x 150 pixels for the CNN and hybrid models and 224 x 224 pixels for the pre-trained models, with three channels, since the original images are of higher resolution. These steps are crucial as they greatly reduce computational complexity and make the images easier to classify, because cropping discards irrelevant regions of the images, fully exposing the chest regions to the classifying algorithms.
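As an illustration, the crop-and-resize step can be sketched with Pillow as below; the exact crop coordinates used in the study are not specified, so the crop box here is a caller-supplied assumption:

```python
from PIL import Image
import numpy as np

def preprocess_cxr(path, size=(150, 150), crop_box=None):
    """Crop away irrelevant border regions and resize a CXR image.

    crop_box is a (left, upper, right, lower) tuple; the exact crop used
    in the study is not specified, so it is left to the caller.
    """
    img = Image.open(path).convert("RGB")  # force three channels
    if crop_box is not None:
        img = img.crop(crop_box)
    # e.g. 150 x 150 for the CNN/hybrid models, 224 x 224 for pre-trained ones
    img = img.resize(size)
    return np.asarray(img)                 # array of shape (H, W, 3)
```

The returned array can then be fed to the later normalization and splitting steps.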

 Random Oversampling Technique

The random oversampling technique offers a naïve and straightforward approach to harmonizing the class allocations of an imbalanced dataset. The RandomOverSampler [98] is employed in this study to generate new samples of the under-represented classes by arbitrarily sampling and replacing the existing images. Figure 4.4 illustrates this technique by depicting how a minority class is rebalanced by duplicating samples from the existing category, letting the decision function represent all available classes equally and preventing the majority class from dominating the training process. One drawback of this technique is that, if not carefully implemented or if employed on samples with high disparity, it causes the trained models to overfit specific data, since those data are significantly duplicated. The issue of overfitting is addressed here by using data augmentation, dropout, batch normalization, and callbacks. Also, the data was adequately distributed into training, validation, and testing sets, with the testing data withheld and then utilized to evaluate the performance of the models on never-before-seen data. The test results on these data confirm the absence of overfitting in the trained models.

Figure 4.4. Decision Functions of the Undersampled and Resampled Class Distributions [98]

 Data Augmentation

Data augmentation involves artificially creating additional data samples from existing data to
boost the extent and diversity of the training data. Augmentation may include making minimal
modifications to data or utilizing ML models to create extra data samples in the underlying space
of the initial data to augment the dataset and reduce overfitting. Several geometric
transformations, like translation, flipping, rescaling, rotation, and so on, are used on the original
images to generate numerous variations of each image. These rendered images will appear
different from the classifying algorithms, thereby increasing the volume of data and curtailing the
over-memorization issue that learning algorithms fall short of due to imbalance or limited
available data. It differs from synthetic data generation, in which data is created artificially.
Figure 4.5 provides random samples of images that were created because of using numerous data
augmentation approaches in the study. These images are generated in batches of 32 with target
sizes like the pre-defined image dimensions, a positive shuffle value as True, and appropriate
class modes for the binary and three-class classifications.
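A minimal NumPy sketch of such batched geometric augmentation is given below; it implements only random flips and small translations, whereas the study used library generators that also rescale, zoom, and rotate:

```python
import numpy as np

def augment(image, rng):
    """A minimal sketch of geometric augmentation: random horizontal
    flip plus a small random translation."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                        # horizontal flip
    dy, dx = rng.integers(-10, 11, size=2)           # shift by up to 10 px
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))  # translate (wraps at edges)
    return out

def augmented_batches(images, labels, batch_size=32, seed=0):
    """Yield shuffled batches of freshly augmented images, mirroring the
    batches of 32 with shuffle = True described above."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield np.stack([augment(images[i], rng) for i in idx]), labels[idx]
```

Because a new random transform is drawn every time a batch is produced, the model rarely sees the exact same pixels twice, which is what curbs over-memorization.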

Figure 4.5. Random Samples of the Augmented Images

4.3. Experimental Environment

All experiments for this thesis study were run using the premium Google Colaboratory Pro subscription, which provides K80, P100, and P4 GPUs and 32 GB of RAM. An HP Notebook computer with 16 GB of 2,667 MHz memory, a 1.60-2.11 GHz Intel Core i5 CPU, a 64-bit operating system (OS), and an NVIDIA GeForce MX110 GPU supplied the means of accessing the purchased cloud infrastructure for the successful completion of the experimental analysis.

4.4. Proposed CNN Model

After applying the random oversampling and data augmentation techniques to Dataset-A, the data types were converted from 8-bit unsigned integers to 32-bit floating point numbers using the astype() Python method. This step is crucial because the mathematical operations in the deep learning process involve continuous rather than discrete values. These values are then normalized to the range [0, 1] by dividing by the highest image pixel value of 255 to aid faster computation and avoid exhausting computer resources unnecessarily. The final step before building the proposed model involved splitting the pre-processed and normalized images into 80:10:10 ratios for training, validation, and testing. These ratios signify that the model uses 80 % of the input images during the training phase, 10 % for the validation phase, and the remaining 10 % for the testing phase. Table 4.4 presents the breakdown of the proposed CNN model developed in this study.
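These casting, normalization, and splitting steps can be sketched as follows; the stratified two-stage split is an assumption, since the study does not state how the 80:10:10 split was randomized:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def normalize_and_split(images, labels, seed=42):
    """Cast uint8 pixels to float32, scale to [0, 1], then split
    80:10:10 into train/validation/test sets."""
    X = images.astype(np.float32) / 255.0          # continuous values in [0, 1]
    # First peel off 20 %, then split that hold-out evenly into val/test.
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, labels, test_size=0.20, random_state=seed, stratify=labels)
    X_val, X_test, y_val, y_test = train_test_split(
        X_hold, y_hold, test_size=0.50, random_state=seed, stratify=y_hold)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```

The test portion is then withheld until the final evaluation, as described above.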

Table 4.4. The Structure of the Proposed ConvNet model

Block     Type of Layer        Output Shape      Filter Size   Pool Size   Probability   Parameters #
Block 1   Conv2D               (150, 150, 256)   3 x 3         _           _             7,168
          BatchNormalization   (150, 150, 256)   _             _           _             1,024
          MaxPooling2D         (75, 75, 256)     _             2 x 2       _             _
          Dropout              (75, 75, 256)     _             _           0.2           _
Block 2   Conv2D               (75, 75, 128)     3 x 3         _           _             295,040
          BatchNormalization   (75, 75, 128)     _             _           _             512
          MaxPooling2D         (37, 37, 128)     _             2 x 2       _             _
          Dropout              (37, 37, 128)     _             _           0.2           _
Block 3   Conv2D               (37, 37, 256)     3 x 3         _           _             295,168
          BatchNormalization   (37, 37, 256)     _             _           _             1,024
          MaxPooling2D         (18, 18, 256)     _             2 x 2       _             _
          Dropout              (18, 18, 256)     _             _           0.2           _
Block 4   Conv2D               (18, 18, 256)     3 x 3         _           _             590,080
          BatchNormalization   (18, 18, 256)     _             _           _             1,024
          MaxPooling2D         (9, 9, 256)       _             2 x 2       _             _
          Dropout              (9, 9, 256)       _             _           0.2           _
Block 5   Conv2D               (9, 9, 512)       3 x 3         _           _             1,180,160
          BatchNormalization   (9, 9, 512)       _             _           _             2,048
          MaxPooling2D         (4, 4, 512)       _             2 x 2       _             _
          Dropout              (4, 4, 512)       _             _           0.2           _
Block 6   Flatten              8,192             _             _           _             _
          Dense                512               _             _           _             4,194,816
          BatchNormalization   512               _             _           _             2,048
          Dropout              512               _             _           0.2           _
          Dense                3                 _             _           _             1,539
Total Number of Parameters: 6,571,651

The table summarizes all the types of layers used in generating the model. Further details such as layer output shapes, convolution filter sizes, pooling sizes, dropout probabilities, and all the parameters generated by the model are clearly shown. Initially, we set the shape of the model's input layer to 150 x 150 x 3, corresponding to the dimensions of the input images. Then, as shown in the table, a six-block structure was constructed, with the first five blocks containing a repetition of similar sets of convolution and regularization operations with varying filters. These first five blocks comprise convolution operations with 3 x 3 kernels, batch normalization layers, max-pooling layers with 2 x 2 pool sizes, dropout layers with a probability of 0.2, and ReLU activations. The last block contains a Flatten layer that vectorizes the output of the preceding convolution blocks, a Dense layer of 512 units, batch normalization and dropout layers, and a final Dense layer of three units with a softmax function to classify the images into the three categories. Figure 4.6 below shows a visual depiction of the proposed architecture to aid quick assimilation of these explanations.
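As a complement to the table, the architecture can be sketched in Keras as follows; 'same' convolution padding is assumed since it reproduces the output shapes and the 6,571,651-parameter total listed in Table 4.4, and the callback patience values are illustrative assumptions:

```python
from tensorflow.keras import callbacks, layers, models, optimizers

def build_convnet(input_shape=(150, 150, 3), num_classes=3):
    """Rebuild the six-block architecture of Table 4.4."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (256, 128, 256, 256, 512):      # Blocks 1-5
        x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Dropout(0.2)(x)
    x = layers.Flatten()(x)                        # Block 6
    x = layers.Dense(512, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",  # labels are integer-coded
                  metrics=["accuracy"])
    return model

# The two callbacks used during training; the patience values here are
# assumptions, as the study does not report them.
training_callbacks = [
    callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    callbacks.ReduceLROnPlateau(factor=0.3, patience=3),
]
```

Calling model.fit with batch_size=32, epochs=60, and these callbacks reproduces the training setup described in the text.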

Figure 4.6. The Proposed ConvNet Architecture

Figure 4.7. The Pseudocode for the Proposed CNN model.

We also wrote the pseudocode for the proposed CNN architecture, as shown in Figure 4.7 above, to guide readers through the workings of CNN models. The built model was then compiled with an RMSprop optimizer with a 0.001 learning rate, a Sparse Categorical Crossentropy loss function (since the input labels were not one-hot encoded), and an accuracy metric. The compiled model generated 6,571,651 parameters from its weights and biases, of which only 3,840 are not trainable. The model was trained on the training and validation data in batches of 32 for an initialized maximum of 60 epochs. Two callback functions, EarlyStopping and ReduceLROnPlateau, were used during training to aid quick convergence and mitigate overfitting of the suggested model. Training was terminated after the 23rd epoch, and the early stopping callback restored the model's weights from the 13th epoch, which had a very high accuracy of about 99.7 %. The previously held-out, never-before-seen test portion of the pre-processed dataset was then employed to evaluate the model's overall performance, attaining an accuracy of 98.9 %. The discrepancy between the training and test accuracies is less than 1 %, making it a well-trained model with high generalization and overall performance. Other metrics like precision, recall, and F-score were also used for further evaluation.

The Hybrid Models

The second architecture proposed in the thesis study is the hybrid model. This hybrid
model is an example of the late fusion method of deep hybrid learning (DHL) techniques; it is termed late fusion because a separate classifier makes the predictions only at the final stage of the process. DL methods, here the remarkable feature-extracting power of a pre-trained model (VGG16 in this case), are leveraged to perform automatic feature extraction, while classical ML classifiers are employed to make predictions from the features generated from Dataset-A. Because
ML classifiers are considered fast, less computationally expensive, and more efficient,
particularly in the event of limited data, this model was created to combine the benefits of these
two AI approaches. Initially, the VGG16 pre-trained model was loaded without the top fully
connected layers with an input shape of 150 x 150 x 3 using the ImageNet weights and making
the loaded layers non-trainable to allow exclusively feature extraction and avoid retraining the
model from scratch. Then the same images from Dataset-A in the earlier CNN model were passed
through the VGG16 feature extractor. Two classifiers, XGBoost and SVM, were utilized to
provide predictions for three-class image categorization based on the derived features from the
training and validation sets of data. Table 4.5 shows the parameters that were tuned while performing the mentioned predictions with these classifiers. For the SVM classifier, a grid search approach, in which the best value is selected from a range of candidates during classification, was utilized for the gamma and C hyper-parameters.

Table 4.5. The Hyper-Parameters of XGBoost and SVM Classifiers

Classifier   Parameter              Value/Type
XGBoost      Learning rate          0.3
             Number of estimators   300
             Objective              'multi:softmax'
             Metric                 mlogloss
SVM          Kernel                 RBF
             Gamma                  [1e-3, 1e-4]
             C                      [1, 10, 100, 1000]
             Cross-validation       5

Later, the test set of images was utilized to evaluate the performance of the classifiers, which produced highly accurate classifications. Other performance indicators, including the F-score, precision, and recall, were applied for further analysis. Figure 4.8 gives a graphical illustration of the proposed model architecture for a better comprehension of the whole process.

Figure 4.8. The Architecture of the Hybrid Model

Short descriptions of the AI methods used in developing the hybrid model in this experiment are provided below.

 VGG16: This architecture is a prominent CNN model presented by [99], which enhances
its predecessor, the AlexNet model, by substituting the 11 x 11 and 5 x 5 kernels in the
first two convolution layers with several 3 x 3 ones in succession. The model is 528 MB
in size with a recorded 90.1% top-5 accuracy on ImageNet data and about 138.4 million

parameters. The ImageNet dataset possesses around 14 million images from 1000
categories. VGG16 was trained on powerful GPUs over a period of several weeks.

 XGBoost: Coined from Extreme Gradient Boosting, it is a scalable and distributed gradient-boosting library, prominent for its high efficiency, flexibility, and portability, which executes ML algorithms under the gradient-boosting framework. XGBoost implements the tree-boosting approach known as GBDT or GBM and efficiently and precisely solves classification and regression problems involving billions of cases or more [100].

 SVM: An acronym for Support Vector Machine, it is among the most prominent algorithms for classification and regression problems, along with outlier detection, in supervised learning settings. SVM is an ML technique that strives to classify data points using a hyperplane in N-dimensional space, with N the number of features. It is a memory-efficient method that applies varying kernel functions in the decision function and can solve problems where the number of dimensions surpasses the number of samples [101].

4.5. The Pre-trained Models

The last set of models developed in this study comprises the pre-trained models established on transfer learning concepts. The second dataset, Dataset-B, is used for binary image classification. Similar to the pre-processing conducted on the first dataset, the images in Dataset-B were also reshaped, normalized, augmented, and split into an 80:10:10 ratio for training, validation, and testing. Then the ResNet50, Xception, and DenseNet201 models were loaded with the ImageNet pre-trained weights but without their respective top (fully connected) layers. The same head layers, comprising global average pooling, dropout with a probability of 0.2, batch normalization, and a Dense layer with two units using the softmax activation function, were then added to each model to aid the classification. Table 4.6 provides the details of these layers' structure, output shapes, and the top-layer parameters they generate.

Table 4.6. The Structure of the Top Layers of The Pre-trained Models

Model         Type of Layer            Output Shape   Parameters #
ResNet50      GlobalAveragePooling2D   2048           _
              Dropout                  2048           _
              BatchNormalization       2048           8,192
              Dense                    2              4,098
Xception      GlobalAveragePooling2D   2048           _
              Dropout                  2048           _
              BatchNormalization       2048           8,192
              Dense                    2              4,098
DenseNet201   GlobalAveragePooling2D   1920           _
              Dropout                  1920           _
              BatchNormalization       1920           7,680
              Dense                    2              3,842
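A hedged Keras sketch of attaching this head to one of the backbones is shown below; the study loaded weights="imagenet", while weights=None is used here only to keep the sketch self-contained without a weight download:

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import ResNet50

def build_transfer_model(num_classes=2):
    """Attach the Table 4.6 head to a ResNet50 backbone.

    The study loaded weights="imagenet"; weights=None keeps this
    sketch runnable offline."""
    base = ResNet50(include_top=False, weights=None, input_shape=(224, 224, 3))
    base.trainable = False                            # feature extraction only
    x = layers.GlobalAveragePooling2D()(base.output)  # -> 2048-dim vector
    x = layers.Dropout(0.2)(x)
    x = layers.BatchNormalization()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(base.input, outputs)
    # The study compiled with Adam, learning rate 1e-5 and decay 1e-5/60.
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Swapping ResNet50 for Xception or DenseNet201 yields the other two models, with the head parameter counts of Table 4.6.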

We then compiled each model using the Adam optimizer with a learning rate of 0.00001 and a decay of 0.00001/60. Other hyperparameters used in the compilation process include the categorical cross-entropy loss function and an accuracy metric. The overall parameter counts of these models are approximately 23.5, 20.8, and 18.3 million for ResNet50, Xception, and DenseNet201, respectively. The training process used a maximum of 60 epochs, a batch size of 128, and the EarlyStopping and ReduceLROnPlateau callbacks. The models were evaluated individually using accuracy, F1-score, precision, recall, and confusion matrices. The overall performances of these models were quite impressive. Figure 4.9 visualizes the entire process of the suggested pre-trained models using a flow diagram. Below is a concise description of these pre-trained models.

Figure 4.9. Process Flow Chart of the Pre-trained Models

 ResNet50: It is a kind of ANN that builds networks by stacking blocks of residual connections, as demonstrated by the 50-layer ConvNet featuring 48 convolution layers, a max-pooling layer, and an average-pooling layer [102]. Trained on ImageNet data, the 224 x 224 input-sized network can categorize images into a thousand object groups. Owing to the weights learned from this dataset, the network has acquired rich feature representations for a wide variety of input images. By utilizing the notion of shortcut connections, it became possible to add further convolutional layers to ConvNets without encountering the vanishing gradient dilemma. The ResNet architecture adheres to two fundamental design principles. First, the number of filters within each layer remains constant regardless of the extracted feature map's size. Second, the number of filters is doubled whenever the feature map size is halved, to preserve the time complexity per layer.

 Xception: This architecture was created to improve on the fundamentals of the Inception model, in which 1 x 1 convolution operations were used to compress the input data before varying filters were applied over each depth space. Xception, short for Extreme Inception, essentially reverses this process: it applies the filters to each depth map first and then performs the 1 x 1 convolution across the depth to shrink the input space. This technique is known as the depth-wise separable convolution operation [103]. A further difference between the two methods is that the former employs the ReLU function to apply non-linearity after each convolution, whereas the latter does not introduce such intermediate non-linearity.

 DenseNet201: This network has a 201-layer design in which directly linked channels form dense connections. Each layer obtains extra inputs from all previous layers and passes its own feature maps to all succeeding layers using a concatenation strategy. Splitting the 3 x 3 convolution operation into a 1 x 1 and a 3 x 3 minimizes the number of model parameters, since each layer receives feature maps from the preceding ones. The growth rate, denoted by k, signifies that the number of feature maps rises by k each time a dense block is traversed, which helps to keep the parameter count low. The dense blocks are joined by transition layers that significantly reduce the number of features and retain the most effective ones from each layer [104].

5. RESULTS AND DISCUSSION

In this part, all data collected during the experimental investigation of the thesis are presented and appropriately explained. The majority of these findings are presented as tables, plots, and figures displaying actual values from libraries such as Python's Matplotlib and Scikit-learn, with tables reconstructed in Microsoft Word to depict the outcomes of the experiments. Software such as Photoshop and Paint was also used to resize, crop, and combine figures to enhance the presentation of the findings.

5.1. Experimental Results

After training the models in this study with the two distinct datasets, we evaluated them using the test portions of the preserved data to measure their test loss and accuracy. The plots of these losses and accuracies against the number of training epochs are known as learning curves. Figures 5.1 (a) and (b) depict the learning curves for our suggested ConvNet model and the ResNet50 deep learning model, respectively.

Figure 5.1. Learning Curves of Losses and Accuracies of (a) CNN Model (b) ResNet50 Model
Figures 5.2 (c) and (d) present the learning curves of the Xception and DenseNet201 models, respectively. These plots are significant because they indicate the performance of the models in terms of training and validation losses and accuracies. Typically, they are used to detect overfitting and underfitting in models. Based on these curves, the models perform well, since the gap between the training and validation losses and accuracies is minimal. The variation in the number of epochs shown by the various plots is another factor worth considering. These differences are due to the EarlyStopping and ReduceLROnPlateau callbacks used during model training, which continuously monitor the models' convergence to determine the optimal learning rates. Consequently, training was terminated early, at between 17 and 30 epochs, demonstrating the quality of the models. These factors also cause large models such as ResNet50 and DenseNet201 to have non-smooth learning curves while still producing excellent results.

Figure 5.2. Learning Curves of Losses and Accuracies of (c) Xception Model (d) DenseNet201 Model

After obtaining the plots of the individual deep learning models' learning curves, the
classification report from the Scikit-learn library was utilized to generate the average precision,
recall, f1-score, and accuracy metrics from the test data. Table 5.1 shows the summary of the
experimental results obtained from the proposed models conducted on Dataset-A for the three-
class image classification. The proposed CNN model achieved an estimated accuracy of 98.91 %.
It also attained an average precision value of 98.92 %, 98.90 % recall, and 98.90 % F1-score for

COVID-19, Pneumonia, and Normal classification. The VGG16 + XGBoost and VGG16 + SVM hybrid models obtained 98.44 % and 95.60 % accuracies, respectively. However, both recorded the same average precision, recall, and F1-score values of 96.00 %.

Table 5.1. Results of Experiment Conducted on Dataset-A

Models            Precision (%)   Recall (%)   F1-Score (%)   Accuracy (%)
CNN               98.92           98.90        98.90          98.91
VGG16 + XGBoost   96.00           96.00        96.00          98.44
VGG16 + SVM       96.00           96.00        96.00          95.60

For the fine-tuned pre-trained models, an analogous procedure was followed to evaluate the models' outcomes. Table 5.2 presents the results obtained from these models using Dataset-B for binary image classification. In this category, the Xception model performed the best, reaching a test accuracy of 99.14 %; its precision, recall, and F1-score all attained the equivalent value of 99.14 %. Next is the DenseNet201 model, with accuracy, precision, recall (sensitivity), and F1-score of 99.00 % each. Finally, the ResNet50 pre-trained model acquired an accuracy of 98.90 %, accompanied by equal precision, sensitivity (recall), and F1-score values of 98.86 %.

Table 5.2. Results of Experiment Conducted on Dataset-B

Models        Precision (%)   Recall (%)   F1-Score (%)   Accuracy (%)
ResNet50      98.86           98.86        98.86          98.90
Xception      99.14           99.14        99.14          99.14
DenseNet201   99.00           99.00        99.00          99.00

5.2. Evaluation Metrics

Below are concise discussions of the four evaluation metrics (accuracy, F-measure,
precision, and recall) used to assess the performance of the suggested models. This wide range of
evaluation metrics is necessary to ensure that the proposed models are correctly scrutinized and
evaluated in different forms to guarantee the validity of their outcomes.

5.2.1. Accuracy

The accuracy of an AI model is defined via the total number of valid predictions it produces and is crucial for measuring how efficiently the model classifies and recognizes input data. It is represented as the proportion of accurate predictions to all classifications generated by the model on the input data [105].

Accuracy = (TP + TN) / (TP + TN + FP + FN)        (5.1)

where TP (true positive) denotes predictions in which the models correctly classify the CXR images with ailments; TN (true negative) denotes cases where the models accurately identify the Normal images in the datasets; FP (false positive) denotes cases in which the models wrongly classify Normal images as either Pneumonia- or COVID-19-infected CXR images; and FN (false negative) characterizes instances in which COVID-19- or Pneumonia-positive CXR images are classified as Normal by the models.

5.2.2. Precision

The precision of a given model is defined as the ratio of the TP (correct predictions) to all
the positive predictions (FP included) in its dataset. The model has a satisfactory prediction
quality when it appropriately predicts all the positive classes in the dataset [105].

P = TP / (TP + FP)        (5.2)

Where P is the Precision.

5.2.3. Recall

The recall of the model is expressed as the ratio of the TP (correct predictions) to the
complete number of accurate entities in the given dataset [105]. It is given as follows:

R = TP / (TP + FN)        (5.3)

Where R is the Recall.

5.2.4. F1-Score

The F-measure (also F-score) of an AI model is the weighted harmonic mean of the recall and precision, ranging between zero (0) and one (1). A higher F-measure value indicates better classification performance [105].

F = 2 · (Precision · Recall) / (Precision + Recall)        (5.4)
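Taken together, Equations (5.1) to (5.4) can be computed directly from the four confusion counts, as in this small sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (5.1)-(5.4) from the four confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # Eq. (5.1)
    precision = tp / (tp + fp)                            # Eq. (5.2)
    recall = tp / (tp + fn)                               # Eq. (5.3)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (5.4)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, with TP = 9, TN = 9, FP = 1, and FN = 1, all four metrics evaluate to 0.9.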

5.3. Confusion Matrix

A confusion matrix is a table-like plot widely employed to describe the performance of classification models on a collection of test data whose true labels are known. The confusion matrix is fairly simple to comprehend, although its terminology may be confusing. It offers far more information than a plain accuracy score: it indicates whether a dataset is balanced, i.e., whether the output classes have comparable counts, and, in the multi-class case, it illustrates how inaccurate a prediction may be when the output classes are ordinal. When dealing with imbalanced data, where there is a considerable gap between the sizes of the different groups, accuracy alone is insufficient. Figures 5.3 (a), (b), and (c) present the confusion matrices for the ConvNet and hybrid models obtained using Dataset-A. The diagonal blocks of light pink color contain the true and exact predictions provided by these models.
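A small scikit-learn sketch of building such a matrix is shown below; the label vectors are hypothetical stand-ins for a model's test-set output:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for a three-class problem;
# 0, 1, 2 could stand for COVID-19, Normal, and Pneumonia.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# The diagonal holds the correct predictions for each class,
# mirroring the light-pink diagonal blocks of Figure 5.3.
correct = int(np.trace(cm))
```

Off-diagonal cells immediately show which classes are confused with each other, information a single accuracy number cannot convey.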

Figure 5.3. Confusion Matrices of (a) CNN Model, (b) VGG16+XGBoost, and (c) VGG16+SVM Hybrid
Models for Three-class Classification Using Dataset-A

Figure 5.4. Confusion Matrices of (d) ResNet50, (e) Xception, and (f) DenseNet201 Models for Binary
Classification Using Dataset-B

Table 5.3. Performance Comparisons with Some State-of-the-art Studies from the Literature

Methods       Authors                     Dataset                            Size     Top Accuracy (%)
CNN-Based     Caseneuve et al. [34]       ChestX-ray8                        3,402    95.00
Models        Hussain et al. [35]         COVID-19 Radiography               13,808   96.00
              Musallam et al. [39]        Chest X-Ray Images (Pneumonia);    7,512    96.56
                                          Covid-chestxray-dataset;
                                          COVID-XRay-5k
              Singh et al. [40]           COVID-19 Image Data Collection;    600      87.00
                                          ChestX-ray8
              Gilani et al. [42]          COVID-R                            8,087    96.68
              Proposed Model              Dataset-A                          6,432    98.90
Pre-trained   Mahmud and others [44]      Zhang Lab Data                     5,856    97.40
Models        Dilshad and others [45]     Covid-Chest-X-Ray-Dataset;         894      96.33
                                          CXR8; Aligarh
              Veras and colleagues [47]   COVID-DB; COVID-19;                61,011   97.00
                                          NIH Chest X-Ray
              Abbas and others [49]       COVID-19 Datasets A;               50,000   97.00
                                          COVID-19 Datasets B
              Ibrahim and others [52]     Multiple Datasets                  33,676   98.05
              Proposed Model              Dataset-B                          7,000    99.1
Hybrid        Toğaçar [54]                LC25000                            25,000   99.69
Models        Sharma et al. [55]          UCI ML Repository                  1,000    96.73
              Singh et al. [56]           COVID-19 Image Data Collection     220      95.00
              Prabha et al. [57]          Lung CT image dataset              2,000    97.00
              Nasiri et al. [60]          COVID-19 Image Data Collection;    1,125    98.23
                                          ChestX-ray8
              Proposed Model              Dataset-A                          6,432    98.40

The dark blocks, on the other hand, represent the numbers of incorrect predictions made by the three-class classification models. Additionally, Figures 5.4 (d), (e), and (f) provide the confusion matrices for the binary classification by the fine-tuned ResNet50, Xception, and DenseNet201 pre-trained models, obtained from the test results on Dataset-B. In the dark blocks of the figures, a minimum of six incorrect predictions is identified for the Xception and DenseNet201 models, whereas the ResNet50 model made eight wrong predictions in total, making it less accurate than the former two.

5.4. Discussion of Further Outcomes

Furthermore, a comparison table was created by selecting the best accuracies of related work in the literature and comparing them with those obtained by our proposed models. The comparison is organized into three sections representing the three kinds of models trained in this thesis research. In each section, five recent state-of-the-art works were selected, with details and volumes of their datasets, and compared with the suggested model's best accuracy. For the ConvNet and pre-trained models, the proposed models have the best accuracies compared with these studies; only one study in the hybrid model area surpasses our model's outcome, and by a slight margin. These results also indicate that, if appropriately enhanced in the future, these models will be very hard for other researchers to match. Table 5.3 shows the comparison table underlying these observations. Moreover, we conducted additional testing of the proposed models using random CXR images from the two datasets. Figure 5.5 shows the outcome of this supplementary testing process. In Figures 5.5 (a) and (b), COVID-19-positive and Normal labeled CXR images were randomly selected and passed through the models. The models correctly predicted the mentioned labels, demonstrating how well they have been trained. This additional random testing shows how robust and reliable the proposed models are and makes us confident that they are indeed generalizable. In addition, their results can be regarded as more thoroughly validated than those of earlier research, since not all previous studies included this vital step in their experimental analysis.

Figure 5.5. Supplementary Testing of the Proposed Models on Random Samples from (a) Dataset-A and
(b) Dataset-B

6. CONCLUSIONS

With the advance of technology in medicine, numerous medical imaging techniques, such as
X-ray, CT, and MRI, are deployed to provide efficient ways of identifying diseases.
Several AI-based methods are widely deployed on large volumes of these images to quickly
diagnose a range of ailments, particularly in the presence of an epidemic, to minimize severe
cases and sudden death. In this thesis study, conventional ConvNets, hybrid, and pre-trained
models were used on two generated datasets, namely Dataset-A (containing three classes,
Normal, Pneumonia, and COVID-19) and Dataset-B (comprising Normal and COVID-19
classes), with an initial 6,432 and 7,000 CXR images, respectively. The random oversampling
technique was used on the first dataset to alleviate the issue of class imbalance. Further, data
augmentation was used to introduce different variations of the images to minimize overfitting.
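The random oversampling step just described can be sketched as follows. This is a minimal NumPy illustration on a toy array, not the exact thesis implementation (which used the imbalanced-learn library [98]); the function name and sample sizes are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_oversample(X, y):
    """Duplicate minority-class samples at random until every class
    matches the majority-class count (simplified sketch)."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_out, y_out = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - count, replace=True)
            X_out.append(X[extra])
            y_out.append(y[extra])
    return np.concatenate(X_out), np.concatenate(y_out)

# toy "images": 10 majority-class and 3 minority-class samples
X = rng.random((13, 4))
y = np.array([0] * 10 + [1] * 3)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # -> [10 10]
```

Because the duplicated samples are exact copies, the data augmentation mentioned above is what introduces variation into the repeated instances and helps curb overfitting.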
After all the necessary data pre-processing, six models, namely a proposed CNN,
hybridizations of VGG16 with XGBoost and SVM classifiers, and the ResNet50, Xception, and
DenseNet201 pre-trained models, were proposed for binary and three-class image classification.
The suggested CNN model achieved precision, recall, F1-score, and accuracy of 98.92 %, 98.90
%, 98.90 %, and 98.91 % for Dataset-A containing Normal, Pneumonia, and COVID-19
categories. Then the hybrid VGG16 + XGBoost and VGG16 + SVM models were applied to the
same dataset, attaining precision, recall, F1-score, and accuracy values of 96.00 %, 96.00 %,
96.00 %, and 98.44 %, and 96.00 %, 96.00 %, 96.00 %, and 95.60 %, respectively. For the
binary image classification on Dataset-B, the fine-tuned ResNet50 achieved a precision of 98.86 %,
recall of 98.86 %, F1-score of 98.86 %, and accuracy of 98.90 %. The Xception model attained a
precision of 99.14 %, a recall (sensitivity) of 99.14 %, an F1-score of 99.14 %, and an accuracy
of 99.14 %. Lastly, the DenseNet201 model realized precision, sensitivity, F1-score, and accuracy
of 99.00 %, 99.00 %, 99.00 %, and 99.00 %, respectively.
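The hybrid models follow the pattern of a frozen feature extractor feeding a classical classifier head. The snippet below is only a minimal sketch of that pattern: synthetic Gaussian features stand in for the VGG16 bottleneck features, and an SVM head is fitted on them; the class separations, sample counts, and variable names are all hypothetical:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Stand-in for VGG16 bottleneck features: in the real pipeline these would
# come from the frozen convolutional base, e.g. base.predict(images).
n_per_class, n_feat = 100, 64
feats = np.vstack([rng.normal(loc=c, scale=1.0, size=(n_per_class, n_feat))
                   for c in range(3)])           # three well-separated classes
labels = np.repeat(np.arange(3), n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    feats, labels, test_size=0.2, stratify=labels, random_state=0)

clf = SVC(kernel="rbf")   # SVM head replacing the network's dense layers
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"test accuracy: {acc:.2f}")
```

Swapping `SVC` for an XGBoost classifier yields the other hybrid variant; the feature-extraction half of the pipeline is unchanged.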
In addition, we used the confusion matrix to evaluate the proposed models, achieving
outstanding results with very few FN and FP values, as represented by the dark blocks in the
two sets of experiments. We also compared the outcomes of the presented models with
state-of-the-art results obtained from the literature. The comparisons demonstrated the
achievements of the models in this study, with only one study surpassing the accuracy of the
hybrid models. Finally, we conducted additional testing of the models by passing randomly
selected CXR images from the two datasets through them and obtaining correct predictions.
These outcomes show the significance of the study and make us confident that the models can
indeed be deployed in real-life scenarios to provide radiologists with robust tools for early
disease detection and diagnosis, particularly those based on CXR images.
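The confusion-matrix evaluation can be reproduced in a few lines. The labels below are hypothetical stand-ins for a three-class (Normal/Pneumonia/COVID-19) test split, not the thesis results:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical test labels for a 3-class problem
# (0 = Normal, 1 = Pneumonia, 2 = COVID-19)
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted
# Per class, FN = row sum minus the diagonal entry,
# and FP = column sum minus the diagonal entry.
fn = cm.sum(axis=1) - np.diag(cm)
fp = cm.sum(axis=0) - np.diag(cm)
print(cm)
print("FN per class:", fn)   # -> [1 0 1]
print("FP per class:", fp)   # -> [1 1 0]
```

The off-diagonal cells of `cm` are exactly the "dark block" misclassifications discussed above.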
RECOMMENDATIONS

The conduct of this thesis study was not without obstacles and limitations. To guide
improvements in future studies, we offer the following recommendations:

1) We recommend the use of alternative resampling methods, such as combinations of
undersampling and oversampling, or the synthetic minority oversampling technique
(SMOTE), to address the class imbalance. The rationale is that using random
oversampling alone, without other approaches such as data augmentation,
regularization, and appropriate data splitting, would cause the model to overfit by
exposing it to repeated instances of the same data.
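A minimal sketch of the SMOTE idea, interpolating between a minority sample and one of its nearest neighbours, could look like the following; it omits the details of the full algorithm (and of imbalanced-learn's implementation), and all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like(X_min, n_new, k=3):
    """Minimal SMOTE-style interpolation: each synthetic sample lies on the
    segment between a minority point and one of its k nearest neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances to all minority points; skip index 0 of the argsort (itself)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]
        j = rng.choice(nn)
        lam = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = rng.random((5, 2))                    # 5 minority samples, 2 features
X_new = smote_like(X_min, n_new=10)
print(X_new.shape)  # -> (10, 2)
```

Unlike plain duplication, each synthetic point is new, which is why SMOTE reduces the overfitting risk described above.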

2) We also advise exploring additional hybrid models, such as merging CNNs with extreme
learning machines (ELMs), or combining convolutional auto-encoders with deep
learning classifiers that operate on the encoded features. Research in these
domains will allow exploring other frontiers of image classification and creating robust
models that aid medical experts in the quick diagnosis and treatment of ailments.
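For reference, an extreme learning machine is a single-hidden-layer network whose input weights stay random and whose output weights are solved in closed form by least squares. Below is a minimal NumPy sketch on toy data; in the suggested hybrid, the inputs would instead be CNN features of CXR images, and all names and sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, y_onehot, n_hidden=50):
    """Extreme learning machine: random hidden layer, output weights
    solved in closed form by least squares (minimal sketch)."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                  # fixed random hidden features
    beta = np.linalg.pinv(H) @ y_onehot     # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# toy two-class data standing in for CNN-extracted features
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
y = np.repeat([0, 1], 50)
Y = np.eye(2)[y]
W, b, beta = elm_fit(X, Y)
acc = (elm_predict(X, W, b, beta) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Because only the output layer is trained, and in one linear solve, an ELM head is far cheaper to fit than back-propagating through dense layers, which is its main appeal in such hybrids.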

3) We suggest that the institution equip its computer labs with GPU-enabled PCs that
are easily accessible to students. In this way, students will be encouraged to
begin developing powerful models that can be applied in a variety of problem-solving
circumstances and compete with the nations leading in AI.
REFERENCES
[1] “IBM,” Artificial Intelligence in Medicine | IBM, (n.d.). https://www.ibm.com/topics/artificial-
intelligence-medicine (accessed November 8, 2022).
[2] L. Cai, J. Gao, D. Zhao, A review of the application of deep learning in medical image classification
and segmentation, Ann. Transl. Med. 8 (2020) 713–713. https://doi.org/10.21037/atm.2020.02.44.
[3] E. Miranda, M. Aryuni, E. Irwansyah, A survey of medical image classification techniques, Proc.
2016 Int. Conf. Inf. Manag. Technol. ICIMTech 2016. (2017) 56–61.
https://doi.org/10.1109/ICIMTech.2016.7930302.
[4] I. Castiglioni, L. Rundo, M. Codari, G. Di Leo, C. Salvatore, M. Interlenghi, F. Gallivanone, A.
Cozzi, N.C. D’Amico, F. Sardanelli, AI applications to medical images: From machine learning to
deep learning, Phys. Medica. 83 (2021) 9–24. https://doi.org/10.1016/j.ejmp.2021.02.006.
[5] Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al, Early transmission dynamics in Wuhan,
China, of novel coronavirus–infected pneumonia, N. Engl. J. Med. 382 (2020) 1199–1207.
[6] M. Abed Alah, S. Abdeen, V. Kehyayan, The first few cases and fatalities of Corona Virus Disease
2019 (COVID-19) in the Eastern Mediterranean Region of the World Health Organization: A rapid
review, J. Infect. Public Health. 13 (2020) 1367–1372. https://doi.org/10.1016/j.jiph.2020.06.009.
[7] J.S. Mackenzie, D.W. Smith, COVID-19: a novel zoonotic disease caused by a coronavirus from
China: what we know and what we don't, (n.d.).
[8] A.R. Rahmani, M. Leili, G. Azarian, A. Poormohammadi, Sampling and detection of corona viruses
in air: A mini review, Sci. Total Environ. 740 (2020) 140207.
https://doi.org/10.1016/j.scitotenv.2020.140207.
[9] “World Health Organization,” Coronavirus, (n.d.). https://www.who.int/health-
topics/coronavirus#tab=tab_1 (accessed October 31, 2022).
[10] T.A. Ghebreyesus, WHO Director-General’s opening remarks at the media briefing on COVID-19 -
11 March 2020, (2020). https://www.who.int/director-general/speeches/detail/who-director-
general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020 (accessed October
28, 2022).
[11] P.T. Campbell, O.K. Fix, COVID-19 and Implications on the Liver, Clin. Liver Dis. (2022) 1–19.
https://doi.org/10.1016/j.cld.2022.08.003.
[12] P. Sirohiya, S. Vig, T. Mathur, J.K. Meena, S. Panda, G. Goswami, R. Gupta, A. konkimalla, D.
Kondamudi, N. Gupta, B.K. Ratre, R. Singh, B. Kumar, A. Pandit, K. Sikka, A. Thakar, S.
Bhatnagar, Airway management, procedural data, and in-hospital mortality records of patients
undergoing surgery for mucormycosis associated with coronavirus disease (COVID-19), J. Med.
Mycol. 32 (2022) 101307. https://doi.org/10.1016/j.mycmed.2022.101307.
[13] Y. Rolland, M. Baziard, A. De Mauleon, E. Dubus, P. Saidlitz, M.E. Soto, Coronavirus Disease-
2019 in Older People with Cognitive Impairment, Clin. Geriatr. Med. 38 (2022) 501–517.
https://doi.org/10.1016/j.cger.2022.03.002.
[14] E.J. Chow, J.A. Englund, Severe Acute Respiratory Syndrome Coronavirus 2 Infections in Children,
Infect. Dis. Clin. North Am. 36 (2022) 435–479. https://doi.org/10.1016/j.idc.2022.01.005.
[15] “WHO,” WHO Coronavirus (COVID-19) Dashboard | WHO Coronavirus (COVID-19) Dashboard
With Vaccination Data, (n.d.). https://covid19.who.int/ (accessed October 31, 2022).
[16] “worldometer,” COVID Live - Coronavirus Statistics - Worldometer, (n.d.).
https://www.worldometers.info/coronavirus/ (accessed October 31, 2022).
[17] “Statista,” Novel coronavirus (COVID-19) deaths by country worldwide 2022 | Statista, (n.d.).
https://www.statista.com/statistics/1093256/novel-coronavirus-2019ncov-deaths-worldwide-by-
country/ (accessed October 31, 2022).
[18] “Harvard,” Harvard EdCast: COVID’s Impact on Education in Developing Countries | Harvard
Graduate School of Education, (n.d.). https://www.gse.harvard.edu/news/21/12/harvard-edcast-
covids-impact-education-developing-countries (accessed November 1, 2022).
[19] B. Saboori, R. Radmehr, Y.Y. Zhang, S. Zekri, A new face of food security: A global perspective of
the COVID-19 pandemic, Prog. Disaster Sci. 16 (2022) 100252.
https://doi.org/10.1016/j.pdisas.2022.100252.
[20] M. Guven, B. Cetinguc, B. Guloglu, F. Calisir, The effects of daily growth in COVID-19 deaths,
cases, and governments’ response policies on stock markets of emerging economies, Res. Int. Bus.
Financ. 61 (2022). https://doi.org/10.1016/j.ribaf.2022.101659.
[21] E.J. Hou, Y.Y. Hsieh, T.W. Hsu, C.S. Huang, Y.C. Lee, Y.S. Han, H.T. Chu, Using the concept of
circular economy to reduce the environmental impact of COVID-19 face mask waste, Sustain.
Mater. Technol. 33 (2022) e00475. https://doi.org/10.1016/j.susmat.2022.e00475.
[22] M. Henseler, H. Maisonnave, A. Maskaeva, Annals of Tourism Research Empirical Insights
Economic impacts of COVID-19 on the tourism sector in Tanzania, Ann. Tour. Res. Empir.
Insights. 3 (2022) 100042. https://doi.org/10.1016/j.annale.2022.100075.
[23] “WHO-Covid-Prevention,” Coronavirus, (n.d.). https://www.who.int/health-
topics/coronavirus#tab=tab_2 (accessed November 1, 2022).
[24] “CDC-Covid-Prevention,” How to Protect Yourself and Others | CDC, (n.d.).
https://www.cdc.gov/coronavirus/2019-ncov/prevent-getting-sick/prevention.html (accessed
November 1, 2022).
[25] K. Shimizu, K. Kondo, Y. Osugi, M. Negita, H. Mase, T. Kondo, M. Aoki, K. Taniguchi, K.
Shibuya, Y. Tokuda, Early COVID‐19 testing is critical to end the pandemic, J. Gen. Fam. Med. 22
(2021) 67. https://doi.org/10.1002/JGF2.420.
[26] “Medical Device Network,” What are the different types of Covid-19 test and how do they work?,
(n.d.). https://www.medicaldevice-network.com/analysis/types-of-covid-19-test-antibody-pcr-
antigen/ (accessed November 1, 2022).
[27] L. Falzone, G. Gattuso, A. Tsatsakis, D.A. Spandidos, M. Libra, Current and innovative methods for
the diagnosis of COVID-19 infection (Review), Int. J. Mol. Med. 47 (2021) 1–23.
https://doi.org/10.3892/ijmm.2021.4933.
[28] A. Barragán-Montero, U. Javaid, G. Valdés, D. Nguyen, P. Desbordes, B. Macq, S. Willems, L.
Vandewinckele, M. Holmström, F. Löfman, S. Michiels, K. Souris, E. Sterpin, J.A. Lee, Artificial
intelligence and machine learning for medical imaging: A technology review, Phys. Medica. 83
(2021) 242–256. https://doi.org/10.1016/j.ejmp.2021.04.016.
[29] M. Aljabri, M. AlGhamdi, A review on the use of deep learning for medical images segmentation,
Neurocomputing. 506 (2022) 311–335. https://doi.org/10.1016/j.neucom.2022.07.070.
[30] P. Asha, P. Srivani, R. iqbaldoewes, A. Al Ayub Ahmed, A. Kolhe, M.Z.M. Nomani, Artificial
intelligence in medical Imaging: An analysis of innovative technique and its future promise, Mater.
Today Proc. 56 (2022) 2236–2239. https://doi.org/10.1016/j.matpr.2021.11.558.
[31] A. Singhal, M. Phogat, D. Kumar, A. Kumar, M. Dahiya, V.K. Shrivastava, Study of deep learning
techniques for medical image analysis: A review, Mater. Today Proc. 56 (2022) 209–214.
https://doi.org/10.1016/j.matpr.2022.01.071.
[32] “Chest X-ray (Covid-19 & Pneumonia),” Chest X-ray (Covid-19 & Pneumonia) | Kaggle, (n.d.).
https://www.kaggle.com/datasets/prashant268/chest-xray-covid19-pneumonia (accessed November
2, 2022).
[33] “Dataset-B,” COVID-19 Radiography Database | Kaggle, (n.d.).
https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database (accessed
December 5, 2022).
[34] G. Caseneuve, I. Valova, N. Leblanc, M. Thibodeau, Chest X-Ray Image Preprocessing for Disease
Classification, Procedia Comput. Sci. 192 (2021) 658–665.
https://doi.org/10.1016/j.procs.2021.08.068.
[35] M.G. Hussain, Y. Shiren, Recognition of COVID-19 Disease Utilizing X-Ray Imaging of the Chest
Using CNN, Proc. - 2021 Int. Conf. Comput. Electron. Commun. Eng. ICCECE 2021. (2021) 71–76.
https://doi.org/10.1109/iCCECE52344.2021.9534839.
[36] A. Arivoli, D. Golwala, R. Reddy, CoviExpert: COVID-19 detection from chest X-ray using CNN,
Meas. Sensors. 23 (2022) 100392. https://doi.org/10.1016/j.measen.2022.100392.
[37] M. Emin Sahin, Deep learning-based approach for detecting COVID-19 in chest X-rays, Biomed.
Signal Process. Control. 78 (2022) 103977. https://doi.org/10.1016/j.bspc.2022.103977.
[38] P. Ghose, M.A. Uddin, U.K. Acharjee, S. Sharmin, Deep viewing for the identification of Covid-19
infection status from chest X-Ray image using CNN based architecture, Intell. Syst. with Appl. 16
(2022) 200130. https://doi.org/10.1016/j.iswa.2022.200130.
[39] A.S. Musallam, A.S. Sherif, M.K. Hussein, Efficient framework for detecting COVID-19 and
pneumonia from chest X-ray using deep convolutional network, Egypt. Informatics J. 23 (2022)
247–257. https://doi.org/10.1016/j.eij.2022.01.002.
[40] S. Singh, P. Sapra, A. Garg, D.K. Vishwakarma, CNN based Covid-aid: Covid 19 Detection using
Chest X-ray, Proc. - 5th Int. Conf. Comput. Methodol. Commun. ICCMC 2021. (2021) 1791–1797.
https://doi.org/10.1109/ICCMC51019.2021.9418407.
[41] E. Hussain, M. Hasan, M.A. Rahman, I. Lee, T. Tamanna, M.Z. Parvez, CoroDet: A deep learning
based classification for COVID-19 detection using chest X-ray images, Chaos, Solitons and
Fractals. 142 (2021) 110495. https://doi.org/10.1016/j.chaos.2020.110495.
[42] G. Gilanie, U.I. Bajwa, M.M. Waraich, M. Asghar, R. Kousar, A. Kashif, R.S. Aslam, M.M. Qasim,
H. Rafique, Coronavirus (COVID-19) detection from chest radiology images using convolutional
neural networks, Biomed. Signal Process. Control. 66 (2021) 102490.
https://doi.org/10.1016/j.bspc.2021.102490.
[43] S. Hassantabar, M. Ahmadi, A. Sharifi, Diagnosis and detection of infected tissue of COVID-19
patients based on lung x-ray image using convolutional neural network approaches, Chaos, Solitons
and Fractals. 140 (2020) 110170. https://doi.org/10.1016/j.chaos.2020.110170.
[44] T. Mahmud, M.A. Rahman, S.A. Fattah, CovXNet: A multi-dilation convolutional neural network
for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable
multi-receptive feature optimization, Comput. Biol. Med. 122 (2020) 103869.
https://doi.org/10.1016/j.compbiomed.2020.103869.
[45] S. Dilshad, N. Singh, M. Atif, A. Hanif, N. Yaqub, W.A. Farooq, H. Ahmad, Y. Chu, M. Tamoor,
Results in Physics Automated image classification of chest X-rays of COVID-19 using deep
transfer learning, Results Phys. 28 (2021) 104529. https://doi.org/10.1016/j.rinp.2021.104529.
[46] X. Li, L. Shen, Z. Lai, Z. Li, J. Yu, Z. Pu, L. Mou, M. Cao, H. Kong, Y. Li, W. Dai, A self-
supervised feature-standardization-block for cross-domain lung disease classification, Methods.
(2021) 1–8. https://doi.org/10.1016/j.ymeth.2021.05.007.
[47] R.M.S. Veras, R.A.L. Rabêlo, R.R. V Silva, Classification of COVID-19 in X-ray images with
Genetic, Comput. Electr. Eng. 96 (2021) 107467.
https://doi.org/10.1016/j.compeleceng.2021.107467.
[48] M. Gour, S. Jain, Uncertainty-aware convolutional neural network for COVID-19 X-ray images
classification, Comput. Biol. Med. 140 (2022) 105047.
https://doi.org/10.1016/j.compbiomed.2021.105047.
[49] A. Abbas, M.M. Abdelsamea, M.M. Gaber, 4S-DT: Self-Supervised Super Sample Decomposition
for Transfer Learning with Application to COVID-19 Detection, IEEE Trans. Neural Networks
Learn. Syst. 32 (2021) 2798–2808. https://doi.org/10.1109/TNNLS.2021.3082015.
[50] R. Malhotra, H. Patel, B.D. Fataniya, Prediction of COVID-19 Disease with Chest X-Rays Using
Convolutional Neural Network, Proc. 3rd Int. Conf. Inven. Res. Comput. Appl. ICIRCA 2021.
(2021) 545–550. https://doi.org/10.1109/ICIRCA51532.2021.9544991.
[51] U. Singh, Deep Learning Model to Predict Pneumonia Disease, (2020) 1315–1320.
[52] D.M. Ibrahim, N.M. Elshennawy, A.M. Sarhan, Deep-chest: Multi-classification deep learning
model for diagnosing COVID-19, pneumonia, and lung cancer chest diseases, Comput. Biol. Med.
132 (2021). https://doi.org/10.1016/j.compbiomed.2021.104348.

[53] A.K. Das, S. Ghosh, S. Thunder, R. Dutta, S. Agarwal, A. Chakrabarti, Deep learning-based
approach for detecting COVID-19 in chest X-rays, Pattern Anal. Appl. 24 (2021) 1111–1124.
https://doi.org/10.1007/s10044-021-00970-4.
[54] M. Toğaçar, Disease type detection in lung and colon cancer images using the complement approach
of inefficient sets, Comput. Biol. Med. 137 (2021).
https://doi.org/10.1016/j.compbiomed.2021.104827.
[55] D.K. Sharma, M. Subramanian, P. Malyadri, B.S. Reddy, M. Sharma, M. Tahreem, Classification of
COVID-19 by using supervised optimized machine learning technique, Mater. Today Proc. (2021).
https://doi.org/10.1016/j.matpr.2021.11.388.
[56] S. Singh, A. Prasad, A. Kumar, CovXmlc: High performance COVID-19 detection on X-ray images
using Multi-Model classification, Biomed. Signal Process. Control. 71 (2022) 103272.
https://doi.org/10.1016/j.bspc.2021.103272.
[57] B. Prabha, S. Kaur, J. Singh, P. Nandankar, S. Kumar Jain, H. Pallathadka, Intelligent predictions of
Covid disease based on lung CT images using machine learning strategy, Mater. Today Proc.
(2021). https://doi.org/10.1016/j.matpr.2021.07.372.
[58] M.Z. Islam, M.M. Islam, A. Asraf, A combined deep CNN-LSTM network for the detection of
novel coronavirus (COVID-19) using X-ray images, Informatics Med. Unlocked. 20 (2020)
100412. https://doi.org/10.1016/j.imu.2020.100412.
[59] M.M. Alshahrni, M.A. Ahmad, M. Abdullah, N. Omer, M. Aziz, An intelligent deep convolutional
network based COVID-19 detection from chest X-rays, Alexandria Eng. J. (2022).
https://doi.org/10.1016/j.aej.2022.09.016.
[60] H. Nasiri, S. Hasani, Automated detection of COVID-19 cases from chest X-ray images using deep
neural network and XGBoost, Radiography. 28 (2022) 732–738.
https://doi.org/10.1016/j.radi.2022.03.011.
[61] S. Kumar, R. Nagar, S. Bhatnagar, R. Vaddi, S.K. Gupta, M. Rashid, A.K. Bashir, T. Alkhalifah,
Chest X ray and cough sample based deep learning framework for accurate diagnosis of COVID-
19, Comput. Electr. Eng. 103 (2022) 108391. https://doi.org/10.1016/j.compeleceng.2022.108391.
[62] A. Banerjee, A. Sarkar, S. Roy, P.K. Singh, R. Sarkar, COVID-19 chest X-ray detection through
blending ensemble of CNN snapshots, Biomed. Signal Process. Control. 78 (2022).
https://doi.org/10.1016/j.bspc.2022.104000.
[63] A.M. Ismael, A. Şengür, Deep learning approaches for COVID-19 detection based on chest X-ray
images, Expert Syst. Appl. 164 (2021). https://doi.org/10.1016/j.eswa.2020.114054.
[64] “Cambridge Dictionary,” INTELLIGENCE | English meaning - Cambridge Dictionary, (n.d.).
https://dictionary.cambridge.org/dictionary/english/intelligence (accessed November 10, 2022).
[65] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach (4th Edition), 2021.
https://books.google.com.br/books?id=koFptAEACAAJ.
[66] “Britannica,” artificial intelligence | Definition, Examples, Types, Applications, Companies, & Facts
| Britannica, (n.d.). https://www.britannica.com/technology/artificial-intelligence#ref219078
(accessed November 10, 2022).
[67] F. Chollet, Deep Learning with Python, Second Edition, 2021.
https://www.manning.com/books/deep-learning-with-python-second-edition.
[68] “IBM,” AI vs. Machine Learning vs. Deep Learning vs. Neural Networks: What’s the Difference? |
IBM, (n.d.). https://www.ibm.com/cloud/blog/ai-vs-machine-learning-vs-deep-learning-vs-neural-
networks (accessed November 11, 2022).
[69] S. Shanmuganathan, A hybrid artificial neural network (ANN) approach to spatial and non-spatial
attribute data mining: A case study experience, 2016. https://doi.org/10.1007/978-3-319-28495-
8_21.
[70] “Towards Data Science,” Why We Will Never Open Deep Learning’s Black Box | by Paul J. Blazek
| Towards Data Science, (n.d.). https://towardsdatascience.com/why-we-will-never-open-deep-
learnings-black-box-4c27cd335118 (accessed November 14, 2022).

[71] “IBM,” What are Neural Networks? | IBM, (n.d.). https://www.ibm.com/cloud/learn/neural-
networks (accessed November 15, 2022).
[72] F. Fogelman Soulié, Neural networks and computing, 1991. https://doi.org/10.1016/0167-
739X(91)90017-R.
[73] “Towards Data Science,” Understanding basic machine learning with Python — Perceptrons and
Artificial Neurons | by Ben Fraser | Towards Data Science, (n.d.).
https://towardsdatascience.com/understanding-basic-machine-learning-with-python-perceptrons-
and-artificial-neurons-dfae8fe61700 (accessed November 15, 2022).
[74] S. Chakraverty, S. Mall, Artificial neural networks for engineers and scientists: Solving ordinary
differential equations, 2017. https://doi.org/10.1201/9781315155265.
[75] L. Ramadhan, Radial Basis Function Neural Network Simplified | by Luthfi Ramadhan | Towards
Data Science, (n.d.). https://towardsdatascience.com/radial-basis-function-neural-network-
simplified-6f26e3d5e04d (accessed November 16, 2022).
[76] I.N. da Silva, D.H. Spatti, R.A. Flauzino, L.H.B. Liboni, S.F. dos Reis Alves, Artificial neural
networks: A practical course, Artif. Neural Networks A Pract. Course. (2016) 1–307.
https://doi.org/10.1007/978-3-319-43162-8/COVER.
[77] “Mathworks,” What is a Convolutional Neural Network? - MATLAB & Simulink, (n.d.).
https://www.mathworks.com/discovery/convolutional-neural-network-matlab.html (accessed
November 17, 2022).
[78] A.S. Lundervold, A. Lundervold, An overview of deep learning in medical imaging focusing on
MRI, Z. Med. Phys. 29 (2019) 102–127. https://doi.org/10.1016/j.zemedi.2018.11.002.
[79] C.C. Aggarwal, Neural Networks and Deep Learning, Neural Networks Deep Learn. (2018).
https://doi.org/10.1007/978-3-319-94463-0.
[80] Z. Elhamraoui, Introduction to convolutional neural network | by Zahra Elhamraoui | Analytics
Vidhya | Medium, (n.d.). https://medium.com/analytics-vidhya/introduction-to-convolutional-
neural-network-6942c189a723 (accessed November 17, 2022).
[81] H. Habibi Aghdam, E. Jahani Heravi, S.I.P. AG, Guide to Convolutional Neural Networks A
Practical Application to Traffic-Sign Detection and Classification, 2018.
[82] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
https://books.google.com.tr/books?id=-s2MEAAAQBAJ.
[83] S. Saha, A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way | by Sumit
Saha | Towards Data Science, (n.d.). https://towardsdatascience.com/a-comprehensive-guide-to-
convolutional-neural-networks-the-eli5-way-3bd2b1164a53 (accessed November 29, 2022).
[84] “Medium,” Batch Normalization. Heap Normalization | by Zehra | medium, (n.d.).
https://coimer.medium.com/batch-normalization-b7d73c9cc6df (accessed November 29, 2022).
[85] I. Dabbura, Coding Neural Network — Dropout. Dropout is a regularization technique… | by Imad
Dabbura | Towards Data Science, (n.d.). https://towardsdatascience.com/coding-neural-network-
dropout-3095632d25ce (accessed November 29, 2022).
[86] “IBM Cloud Education,” What are Recurrent Neural Networks? | IBM, (n.d.).
https://www.ibm.com/cloud/learn/recurrent-neural-networks (accessed November 30, 2022).
[87] A. Biswal, Recurrent Neural Network (RNN) Tutorial: Types and Examples [Updated] |
Simplilearn, (n.d.). https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn (accessed
November 30, 2022).
[88] “Medline Plus,” X-Rays: MedlinePlus, (n.d.). https://medlineplus.gov/xrays.html (accessed
November 30, 2022).
[89] “Mayo Clinic,” CT scan - Mayo Clinic, (n.d.). https://www.mayoclinic.org/tests-procedures/ct-
scan/about/pac-20393675 (accessed November 30, 2022).
[90] “Ken Hub,” Bildgebende Verfahren - Röntgen, CT und MRT [Imaging techniques - X-ray, CT, and
MRI] | Kenhub, (n.d.). https://www.kenhub.com/de/library/anatomie/medizinische-bildgebung-und-radiologische-anatomie (accessed November 30, 2022).
[91] “Mayo Clinic MRI,” MRI - Mayo Clinic, (n.d.). https://www.mayoclinic.org/tests-
procedures/mri/about/pac-20384768 (accessed November 30, 2022).
[92] “Kaggle,” Kaggle: Your Machine Learning and Data Science Community, (n.d.).
https://www.kaggle.com/ (accessed December 1, 2022).
[93] “First Dataset-A,” GitHub - ieee8023/covid-chestxray-dataset: We are building an open database of
COVID-19 cases with chest X-ray or CT images., (n.d.). https://github.com/ieee8023/covid-
chestxray-dataset (accessed December 1, 2022).
[94] “Second Dataset-A,” Chest X-Ray Images (Pneumonia) | Kaggle, (n.d.).
https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia (accessed December
1, 2022).
[95] “Third Dataset-A,” agchung · GitHub, (n.d.). https://github.com/agchung (accessed December 1,
2022).
[96] M.E.H. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M.A. Kadir, Z. Bin Mahbub, K.R.
Islam, M.S. Khan, A. Iqbal, N. Al Emadi, M.B.I. Reaz, M.T. Islam, Can AI Help in Screening Viral
and COVID-19 Pneumonia?, IEEE Access. 8 (2020) 132665–132676.
https://doi.org/10.1109/ACCESS.2020.3010287.
[97] T. Rahman, A. Khandakar, Y. Qiblawey, A. Tahir, S. Kiranyaz, S. Bin Abul Kashem, M.T. Islam, S.
Al Maadeed, S.M. Zughaier, M.S. Khan, M.E.H. Chowdhury, Exploring the effect of image
enhancement techniques on COVID-19 detection using chest X-ray images, Comput. Biol. Med.
132 (2021). https://doi.org/10.1016/J.COMPBIOMED.2021.104319.
[98] “Imblearn,” 2. Over-sampling — Version 0.9.1, (n.d.). https://imbalanced-
learn.org/stable/over_sampling.html (accessed December 5, 2022).
[99] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition,
3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc. (2015) 1–14.
[100] “XGBoost,” XGBoost Documentation — xgboost 1.7.1 documentation, (n.d.).
https://xgboost.readthedocs.io/en/stable/ (accessed December 8, 2022).
[101] “Scikit Learn,” 1.4. Support Vector Machines — scikit-learn 1.1.3 documentation, (n.d.).
https://scikit-learn.org/stable/modules/svm.html (accessed December 8, 2022).
[102] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, Proc. IEEE Comput.
Soc. Conf. Comput. Vis. Pattern Recognit. 2016-Decem (2016) 770–778.
https://doi.org/10.1109/CVPR.2016.90.
[103] F. Chollet, Xception: Deep learning with depthwise separable convolutions, Proc. - 30th IEEE Conf.
Comput. Vis. Pattern Recognition, CVPR 2017. 2017-Janua (2017) 1800–1807.
https://doi.org/10.1109/CVPR.2017.195.
[104] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional
networks, Proc. - 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017. 2017-Janua
(2017) 2261–2269. https://doi.org/10.1109/CVPR.2017.243.
[105] “Evaluation Metrics,” Classification: Accuracy | Machine Learning | Google Developers, (n.d.).
https://developers.google.com/machine-learning/crash-course/classification/accuracy (accessed
December 12, 2022).

CURRICULUM VITAE

Muhammad Sani DANLADI

ACADEMIC ACTIVITIES
Papers:
1. Danladi, M.S., Baykara, M. (2022). Design and implementation of temperature and humidity
monitoring system using LPWAN technology. Ingénierie des Systèmes d’Information, Vol. 27,
No. 4, pp. 521-529. https://doi.org/10.18280/isi.270401.
2. Danladi, M.S., Baykara, M. (2022). Low power wide area network technologies: Open
problems, challenges, and potential applications. Review of Computer Engineering Studies,
Vol. 9, No. 2, pp. 71-78. https://doi.org/10.18280/rces.090205.
