20bec048 Sa ML
Abstract - The COVID-19 pandemic has had a significant impact on global public health and has posed significant challenges for healthcare systems worldwide. In this paper, we explore the relationship between blood tests and COVID-19 using machine learning techniques. Specifically, we analyze the blood test results of COVID-19 patients and non-COVID-19 patients to identify potential biomarkers and their correlation with the disease. We also investigate the use of machine learning algorithms to predict COVID-19 infection based on blood test results. Our results show that certain blood test parameters, such as lymphocyte count, neutrophil count, and C-reactive protein (CRP) levels, are significantly associated with COVID-19. This research has important implications for the early detection and management of COVID-19, as well as the development of personalized treatment plans for infected individuals.

Keywords – COVID-19, Machine Learning, Blood Tests, KNN, SVM, AdaBoost, Random Forest, Ensemble Learning

I. INTRODUCTION

The COVID-19 pandemic has had a significant impact on global public health since its emergence in late 2019. In early 2020, the World Health Organization declared it a public health emergency of international concern, leading to urgent efforts to contain its spread and mitigate its impact. One of the key challenges in the management of COVID-19 has been the timely and accurate diagnosis of infected individuals. With the increasing number of cases and limited resources, healthcare systems have faced the daunting task of diagnosing patients quickly and efficiently. In response to this challenge, researchers have turned to machine learning as a promising tool to aid in the diagnosis of COVID-19. In this paper, we present an approach that trains a model on patients' blood reports and COVID-19 test results in order to predict the result for new patients. This approach aims to provide doctors with a quick and reliable method of diagnosing COVID-19, which can potentially save lives and help curb the spread of the disease. Our method involves training a machine learning model on a large dataset of blood reports and COVID reports of patients to identify key biomarkers that are indicative of the disease. We demonstrate the effectiveness of our approach through extensive experimentation and evaluation, achieving high accuracy in predicting COVID-19 infection from blood reports. Our work has significant implications for the management of COVID-19 and can be a valuable addition to the toolkit of healthcare professionals in the fight against this global health crisis.

In the following section, we examine a dataset acquired from a Brazilian hospital [1] that contains over 5000 blood test results along with their COVID-19 diagnosis. While the dataset covers various blood tests, the ultimate aim is to establish a connection between crucial independent variables and the binary classification of COVID-19 (positive or negative). We discuss various models and compare their efficacy in the upcoming sections.
II. LITERATURE REVIEW

Machine learning has been widely applied in research, especially during the ongoing COVID-19 pandemic, to help detect patterns and insights related to the infection. This section addresses some of the published papers relevant to the subject.

[...] Coronavirus infections using complete blood count test results. Three datasets obtained from hospitals in Italy, Brazil, and Indonesia are used to train the models. The average AUC scores obtained for the models trained with datasets from San Raphael Hospital in Italy, Albert Einstein Hospital in Brazil, and Pasar Minggu Hospital in Indonesia are 0.87, 0.90, and 0.88, respectively.
A. RANDOM FOREST

Random Forest is a popular ensemble learning algorithm used in machine learning for classification and regression tasks. It is a collection of decision trees that work together to improve the accuracy and stability of predictions.
In the Random Forest algorithm, multiple decision trees are built on randomly selected subsets of the training data and features, resulting in a forest of trees. During training, each tree is grown by recursively splitting the data into smaller subsets based on the most discriminative features until a stopping criterion is reached.
During prediction, the Random Forest algorithm combines the outputs of individual trees to make a final prediction.
One of the main advantages of Random Forest is its ability to handle high-dimensional data with a large number of features.

B. SVM

Support Vector Machine (SVM) is a popular supervised learning algorithm used in machine learning for classification and regression tasks. SVM aims to find the optimal hyperplane in a high-dimensional space that maximally separates the data points of different classes.
In SVM, each data point is represented as a vector in the feature space and is classified based on its position relative to the hyperplane. The hyperplane is chosen such that the margin between the hyperplane and the closest data points of each class is maximized. The closest data points are called support vectors, and they define the margin of the hyperplane.
SVM has several advantages, including its ability to handle high-dimensional data with a large number of features and its robustness to outliers. SVM can also provide a clear boundary between classes, making it easy to interpret and visualize the results.
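
The two classifiers described in Sections A and B can be illustrated with a minimal sketch. It is not the implementation used in this paper: the data are synthetic (two made-up features, not the hospital dataset), every function name is hypothetical, each "tree" is simplified to a single split (a decision stump) rather than a recursively grown tree, and the SVM is a linear soft-margin model trained by stochastic sub-gradient descent on the hinge loss, which is only one of several possible ways to fit it.

```python
import random
from collections import Counter


def make_data(n, seed):
    """Synthetic 2-feature data with labels +1/-1 (NOT the hospital dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        # Class +1 is centred at (2, 2), class -1 at (-2, -2).
        X.append([rng.gauss(2 * label, 1.0), rng.gauss(2 * label, 1.0)])
        y.append(label)
    return X, y


def fit_stump(X, y, features):
    """One-split tree: keep the (feature, threshold, sign) with most correct labels."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                correct = sum((sign if row[f] > t else -sign) == lab
                              for row, lab in zip(X, y))
                if best is None or correct > best[0]:
                    best = (correct, f, t, sign)
    return best[1:]


def predict_stump(stump, row):
    f, t, sign = stump
    return sign if row[f] > t else -sign


def fit_forest(X, y, n_trees=25, seed=0):
    """Each stump sees a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]         # sample rows with replacement
        feats = rng.sample(range(d), max(1, int(d ** 0.5)))  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Combine the individual outputs by majority vote."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


def fit_linear_svm(X, y, lam=0.01, lr=0.01, epochs=50, seed=0):
    """Linear soft-margin SVM: SGD on lam/2 * |w|^2 + hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):        # shuffled pass over the data
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1 - lr * lam) * wj for wj in w]          # regularization shrink
            if margin < 1:                                 # inside the margin: hinge update
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict_svm(model, row):
    """Classify by which side of the hyperplane w.x + b = 0 the point lies on."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1


def support_vectors(model, X, y):
    """Training points on or inside the margin, i.e. y * (w.x + b) <= 1."""
    w, b = model
    return [row for row, lab in zip(X, y)
            if lab * (sum(wj * xj for wj, xj in zip(w, row)) + b) <= 1]


def accuracy(predict, X, y):
    return sum(predict(row) == lab for row, lab in zip(X, y)) / len(y)


if __name__ == "__main__":
    X_train, y_train = make_data(200, seed=0)
    X_test, y_test = make_data(100, seed=1)
    forest = fit_forest(X_train, y_train)
    svm = fit_linear_svm(X_train, y_train)
    print("forest accuracy:", accuracy(lambda r: predict_forest(forest, r), X_test, y_test))
    print("svm accuracy:", accuracy(lambda r: predict_svm(svm, r), X_test, y_test))
    print("support vectors:", len(support_vectors(svm, X_train, y_train)))
```

In practice a library implementation would normally be preferred over hand-written code like this; the sketch only makes the bootstrap sampling, random feature selection, voting, and margin ideas concrete.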
C. ADAPTIVE BOOSTING

Adaptive Boosting (AdaBoost) is a popular ensemble learning algorithm used in machine learning for classification tasks. It works by combining multiple weak classifiers into a strong classifier, where each weak classifier is trained on a differently weighted version of the training data.
During training, AdaBoost assigns higher weights to misclassified data points, which allows subsequent weak classifiers to focus on the hard-to-classify examples. The final strong classifier is a weighted combination of the individual weak classifiers, with higher weights assigned to the more accurate classifiers.

VI. CONCLUSION

In this paper we investigated various machine learning algorithms to explore the relation of blood tests with COVID-19. We applied SVM, AdaBoost, random forest, and K nearest neighbor. A real dataset obtained from a Brazilian hospital was used to test the models. When accuracy and F-measure are considered, SVM gives the highest accuracy. The class imbalance in the data may also reduce the true positive rate. Future research directions include applying these and other machine learning models to more, and more varied, real datasets.

XIV. REFERENCES

[1] N. Crokidakis, "Data analysis and modeling of the evolution of COVID-19 in Brazil," arXiv preprint arXiv:2003.12150, 2020.
[2] N. Hany, N. Atef, N. Mostafa, S. Mohamed, M. ElSahhar and A. AbdelRaouf, "Detection COVID-19 using Machine Learning from Blood Tests," 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 2021, pp. 229-234, doi: 10.1109/MIUCC52538.2021.9447639.
[6] https://www.javatpoint.com/machine-learning-random-forest-algorithm
[7] https://www.geeksforgeeks.org/boosting-in-machine-learning-boosting-and-adaboost/