

Supervised Machine Learning Approaches for Brain Stroke Detection

Dhyey V. Desai (1), Department of Computer Science and Engineering, Manipal University Jaipur, dvdesai06@gmail.com
Tarun Jain (2), Department of Computer Science and Engineering, Manipal University Jaipur, tarunjainjain02@gmail.com
Priyesh Tiwari (3), Greater Noida Institute of Technology, Greater Noida, priyesht81@gmail.com
Abstract - A brain stroke occurs when the blood flow to a part of the brain is reduced or restricted. Because of this, brain cells in the affected part of the brain start to die at a very fast rate due to a lack of oxygen and nutrients. There are two types of brain stroke: (a) Ischemic stroke and (b) Haemorrhagic stroke, of which Ischemic stroke is more likely to occur. Here we have used 8 classifiers on the brain stroke detection dataset from Kaggle. Our experimental results showed that the Logistic Regression Model gives the best accuracy of all 8 classifiers, at 97%, followed by the SVM classifier and Random Forest, both with an accuracy of 96%, and the Ensemble Model of Logistic Regression, Random Forest and KNN at 95%. We have trained the 8 models and compared them based on their respective accuracies.

Index Terms - Machine Learning, Classification, KNN, Logistic Regression Model, Random Forest Model, Decision Tree Model, SVM Model, Naive Bayes Model.

I. INTRODUCTION

A brain stroke occurs when the blood flow to a part of the brain is reduced or restricted. Because of this, brain cells in the affected part of the brain start to die at a very fast rate due to a lack of oxygen and nutrients. There are two types of brain stroke: (a) Ischemic stroke and (b) Hemorrhagic stroke, of which Ischemic stroke is more likely to occur.

We implemented 8 different models and compared the results of these Machine Learning models, such as the Logistic Regression Model, Random Forest Model, Decision Tree Model, SVM Model, KNN Model, and Naive Bayes Model. Among the 8 models that we trained, we obtained the highest accuracy of 97% with the Logistic Regression Model.

II. LITERATURE REVIEW

Akbasli, Izzet Turkalp et al. in [1] created the "Brain Stroke Prediction" dataset and did a major part of the pre-processing on the collected data. Tazin, Tahia, et al. in [2] observed an accuracy of 96% using the Random Forest algorithm, 94% using the Decision Tree algorithm, and 87% using the Logistic Regression model. B. R. Gaidhani, in [3], achieved an accuracy of 96-97% using a classification model. Akash, Kunder, et al. in [4] achieved an accuracy of 94% using the Decision Tree Classifier. K. D. Mohana Sundaram, in [5], achieved an accuracy of 95% using the Random Forest Classifier. Karthik, R., et al. in [6] got the best accuracy using a CNN model. R, Mahesh, et al. in [7] implemented 4 classifiers and achieved the best accuracy using the Logistic Regression model. Kanchana R., et al. in [9] achieved the best accuracy of 99.79% using the RFC model. Musuka, T. D., et al. in [10] diagnosed the causes of brain stroke. Sathyanarayan Rao in [11] performed accuracy prediction using the Ensemble Voting Model and obtained an accuracy of 74.22%. Samantha Lin in [12] performed accuracy prediction using different models such as Logistic Regression, SVM, etc., and obtained the best accuracy of 95% from the Logistic Regression and Random Forest Models [8].

III. METHODOLOGY

A. Dataset

A dataset is an ordered collection of data, often present in tabular form. A dataset may also consist of a collection of files or documents.

We obtained the brain stroke classification dataset from Kaggle, which has close to 4981 entries. We have trained 8 classification models on this dataset. Each case (row) in the dataset is described by 10 attributes (columns). The proposed system architecture is shown in Fig. 1, and a minimal sketch of loading this dataset follows the figure.

B. Proposed System

Fig. 1. Proposed System for Accuracy Prediction.
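As a first step of the proposed system, the dataset described above can be loaded and inspected with pandas. The sketch below is a minimal illustration, not the authors' code; the file name brain_stroke.csv is an assumption, while the row count and the binary stroke column come from the paper.

import pandas as pd

# Load the Kaggle brain stroke dataset (file name is an assumption).
df = pd.read_csv("brain_stroke.csv")

# The paper reports 4981 rows, 10 descriptive attributes and a binary
# output column named "stroke".
print(df.shape)
print(df["stroke"].value_counts())   # roughly 4732 zeros and 249 ones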



C. Data Pre-processing

Before creating a model, data pre-processing is required to remove unwanted noise and outliers from the dataset that could otherwise lower the model's accuracy by causing problems such as overfitting or underfitting of the model [13].

(a) Before Pre-processing

The stroke prediction dataset was used to perform the study. There were 4981 rows and 10 columns in this dataset. The value of the output column (stroke) is either 1 or 0. The value 0 indicates that there is no risk of a stroke occurring, while the value 1 indicates that there is a risk that a stroke might occur. In this dataset the number of 0s in the output column (stroke) far exceeds the number of 1s: only 249 rows in the stroke column have the value 1, whereas 4732 rows have the value 0.

To improve accuracy, data pre-processing is required to balance the data. Bar graphs for the unprocessed data and the processed data are shown in Fig. 2 and Fig. 3 respectively.

Fig. 2. Bar Graph for unprocessed data.

(b) After Pre-processing

Fig. 3. Bar Graph for processed data.
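The paper does not state which balancing technique was used to obtain the processed data of Fig. 3; as one common option, the sketch below randomly oversamples the minority (stroke = 1) class with scikit-learn's resample utility. This is an assumed illustration, not the authors' method.

import pandas as pd
from sklearn.utils import resample

# df is the stroke dataframe loaded earlier; "stroke" is the output column.
majority = df[df["stroke"] == 0]     # 4732 rows in the paper's dataset
minority = df[df["stroke"] == 1]     # 249 rows in the paper's dataset

# Randomly duplicate minority rows until both classes are the same size.
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=42)

balanced_df = pd.concat([majority, minority_upsampled])
print(balanced_df["stroke"].value_counts())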
D. Evaluation Metrics Used for the Models

Accuracy: The percentage of accurate predictions among the test results.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision: Precision gives the proportion of the correctly classified data to the total data classified as positive (True).

Precision = TP / (TP + FP)

Recall: Recall gives the proportion of the correctly classified data to the total data present in the positively classified class.

Recall = TP / (TP + FN)

F1 Score: The F1 Score is used to summarise the performance of a given Machine Learning Model.

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
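As a worked illustration of these four formulas (not code from the paper), the sketch below computes them from a confusion matrix using scikit-learn; the label vectors are placeholder values.

from sklearn.metrics import confusion_matrix

# Placeholder true labels and predictions, only to make the example runnable.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1_score  = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1_score)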
(a) SVM Model

The SVM Model is a Machine Learning algorithm that divides data into two categories. The main aim of the SVM Model is to determine to which category a new data value belongs.

Accuracy Precision F1 Score Recall
0.96 0.95 0.96 0.96

Table 1. Evaluation Table for SVM

Here we obtained an accuracy of 96%.
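A minimal sketch of how such an SVM classifier could be trained and evaluated with scikit-learn is shown below; the 80/20 split, the default SVC settings and the get_dummies encoding of categorical attributes are assumptions, since the paper does not give these details. The same train/predict/evaluate pattern applies to the remaining classifiers in this section.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# balanced_df is the pre-processed dataframe from the balancing sketch above.
X = pd.get_dummies(balanced_df.drop(columns=["stroke"]))  # encode categorical attributes (assumption)
y = balanced_df["stroke"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_model = SVC()                    # default RBF-kernel support vector classifier
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))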


(b) Naïve Bayes Model

The Naive Bayes Model assumes strong (naive) independence between the attributes of the data values.

Bayes Theorem:

P(A|B) = P(B|A) * P(A) / P(B)

Accuracy Precision F1 Score Recall
0.85 0.97 0.92 0.87

Table 2. Evaluation Table for Naïve Bayes

Our model yields an accuracy of 85%.
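The paper does not state which Naive Bayes variant was used; assuming a Gaussian Naive Bayes classifier, a minimal scikit-learn sketch on the same assumed split as above would be:

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# X_train, X_test, y_train, y_test come from the split in the SVM sketch above.
nb_model = GaussianNB()              # treats attributes as conditionally independent
nb_model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, nb_model.predict(X_test)))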

(c) Decision Tree Model

The Decision Tree Model is a hierarchical tree-structured classifier, which has internal nodes, branches, and leaf nodes.

Accuracy Precision F1 Score Recall
0.93 0.96 0.96 0.96

Table 3. Evaluation table for Decision Tree

It gives an accuracy of 93%.
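A corresponding hedged sketch with scikit-learn's decision tree implementation, reusing the assumed split from the SVM sketch:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

dt_model = DecisionTreeClassifier(random_state=42)  # splits on attribute values at internal nodes
dt_model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, dt_model.predict(X_test)))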

(d) Random Forest Classifier

A Random Forest Model is a supervised Machine Learning model, built from an ensemble of decision trees, that can be used for classification.

Accuracy Precision F1 Score Recall
0.96 0.95 0.97 1.00

Table 4. Evaluation table for Random Forest

Our implementation gives an accuracy of 96%.
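A hedged scikit-learn sketch for the Random Forest, again reusing the assumed split (100 trees is the library default, not a value stated in the paper):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, rf_model.predict(X_test)))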
(e) Logistic Regression Classifier

Logistic Regression is a model that is used to predict a binary outcome from a set of independent inputs.

Accuracy Precision F1 Score Recall
0.97 0.95 0.97 1.00

Table 5. Evaluation table for Logistic Regression

Our implementation gives an accuracy of 97%.

(f) K-Nearest Neighbour Classifier

A K-NN algorithm predicts whether a data value belongs to one group or the other, as formed during classification, depending on the values of its closest data points.

Accuracy Precision F1 Score Recall
0.97 0.95 0.97 1.00

Table 6. Evaluation table for KNN

Our model gives an accuracy of 96%.

(g) KMeans Classifier

KMeans is a clustering algorithm that divides the observations into k clusters.

Accuracy Precision F1 Score Recall
0.78 0.96 0.87 0.79

Table 7. Evaluation table for KMeans

Our model gives an accuracy of 78%.

(h) Ensemble Voting Classifier

If we build and combine multiple models, we can increase the overall accuracy. Ensemble modelling is a machine learning approach that combines multiple models [15,16].

Our ensemble model gives an accuracy of 96%.

Steps for executing the above-mentioned models (a minimal end-to-end sketch following these steps is given after the list):
1. Install the libraries that are needed.
2. Load the dataset on which you want to train and test.
3. Pre-process the data if required.
4. Split the dataset into train and test sets in appropriate proportions.
5. Use the train set to train the given models.
6. Use the test set to measure the accuracy obtained by each model.
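The sketch below strings the six steps together and builds the voting ensemble of Logistic Regression, Random Forest and KNN mentioned in the abstract. The file name, categorical encoding, split ratio and estimator settings are assumptions; this is a reconstruction, not the authors' code.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Step 2: load the Kaggle dataset (file name assumed).
df = pd.read_csv("brain_stroke.csv")

# Step 3: minimal pre-processing (categorical encoding assumed; class
# balancing can be added as in the earlier resampling sketch).
X = pd.get_dummies(df.drop(columns=["stroke"]))
y = df["stroke"]

# Step 4: split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: train a hard-voting ensemble of Logistic Regression, Random Forest and KNN.
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=42)),
                ("knn", KNeighborsClassifier())],
    voting="hard")
ensemble.fit(X_train, y_train)

# Step 6: measure accuracy on the test set.
print("Ensemble accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))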
IV. RESULTS AND COMPARISON

All of the algorithms proposed in this work, i.e., the Decision Tree model, Naïve Bayes model, SVM model, Random Forest Classifier, KNN Classifier, KMeans Classifier, Logistic Regression, and Ensemble Voting Classifier, are compared using a bar graph. A comparison of the model accuracies of our implementations is shown in Fig. 4.
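A small sketch of how such a comparison chart could be produced with matplotlib is given below, using the accuracies reported in the evaluation tables and text above; the plotting details are assumptions.

import matplotlib.pyplot as plt

# Accuracies taken from the tables and accuracy statements above.
models = ["SVM", "Naive Bayes", "Decision Tree", "Random Forest",
          "Logistic Regression", "KNN", "KMeans", "Ensemble Voting"]
accuracies = [0.96, 0.85, 0.93, 0.96, 0.97, 0.96, 0.78, 0.96]

plt.figure(figsize=(10, 4))
plt.bar(models, accuracies)
plt.ylabel("Accuracy")
plt.ylim(0, 1)
plt.xticks(rotation=30, ha="right")
plt.title("Comparison of model accuracies")
plt.tight_layout()
plt.show()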

Fig. 4. A comparison of model accuracies of our implementations.

We have also used bar graphs to compare the accuracy of our models with the accuracy of models from other papers [17,18].

(A) The bar graph in Fig. 5 shows the accuracy comparison of our Logistic Regression Model with the accuracies reported in two other papers.

Fig. 5. Comparing Logistic Regression with already existing models.

(B) The bar graph in Fig. 6 shows the accuracy comparison of our Random Forest Model with the accuracies reported in two other papers.

Fig. 6. Comparing Random Forest Model with already existing models.

(C) The bar graph in Fig. 7 shows the accuracy comparison of our Ensemble Voting Model with the accuracies reported in two other papers.

Fig. 7. Comparing Ensemble Voting Model with already existing models.

(D) The bar graph in Fig. 8 shows the accuracy comparison of our Decision Tree Model with the accuracy reported in another paper.

Fig. 8. Comparing Decision Tree Model with already existing models.


V. CONCLUSION

Stroke is a harmful medical illness that should be detected and treated as soon as possible. A Machine Learning model of this kind could help in the early detection of stroke, which can in turn help in the early treatment of the symptoms. Here we used several prediction models, such as the KNN, Logistic Regression, Random Forest, Decision Tree, SVM, and Naive Bayes models, of which the Logistic Regression Model gave the best accuracy of 97%. Several of these models, namely the Logistic Regression, SVM, Random Forest, Decision Tree, KNN, and Ensemble Models, showed good accuracy and a very low number of false positives and false negatives in identifying stroke-prone patients. We have also compared the accuracies of all the models that we created, and compared the accuracies of the Logistic Regression, Random Forest and SVM Models with the accuracies reported in other papers.

VI. REFERENCES

[1] Akbasli, Izzet Turkalp. "Brain Stroke Prediction Dataset." Kaggle, 8 July 2022.

[2] Tazin, Tahia, et al. "Stroke Disease Detection and Prediction Using Robust Learning Approaches." 26 Nov. 2021.

[3] B. R. Gaidhani, R. R. Rajamenakshi, and S. Sonavane, "Brain Stroke Detection Using Convolutional Neural Network and Deep Learning Models," 2019 2nd International Conference on Intelligent Communication and Computational Techniques (ICCT), 2019, pp. 242-249, DOI: 10.1109/ICCT46177.2019.8969052.

[4] Akash, Kunder, et al. "Prediction of Stroke Using Machine Learning." June 2020.

[5] Sundaram, K. D. Mohana, et al. "Detection of Brain Stroke Using Machine Learning Algorithm." Quest Journals, Academia.edu, 30 Apr. 2022.

[6] Karthik, R., et al. "Neuroimaging and Deep Learning for Brain Stroke Detection - A Review of Recent Advancements and Future Prospects." ScienceDirect, 26 Dec. 2020.

[7] R, Mahesh, et al. "Automated Prediction of Brain Stroke Disease Classification Using Machine Learning Algorithm Techniques." 7 June 2022.

[8] "Stroke Research | NHLBI, NIH." NHLBI, NIH, 20 May 2022.

[9] Kanchana R., Menaka R. "Ischemic stroke lesion detection, characterization and classification in CT-images with optimal features selection." Biomed. Eng. Lett. 10, 333-344 (2020).

[10] Musuka, T. D., Wilton, S. B., Traboulsi, M., & Hill, M. D. (2015). "Diagnosis and management of acute ischemic stroke: speed is critical." CMAJ: Canadian Medical Association Journal, 187(12), 887-893.

[11] Rao, Sathyanarayan. "Ensemble Models with Voting and XGBoost." Kaggle, 13 Aug. 2022.

[12] Lin, Samantha. "Stroke Prediction. Constructing Prediction Model for the…" Geek Culture, Medium, 8 Mar. 2021.

[13] R, K., Johnson, A., Anand, S. & R, M. (2020). "Neuroimaging and deep learning for brain stroke detection - A review of recent advancements and prospects." Computer Methods and Programs in Biomedicine, 197. DOI: 10.1016/j.cmpb.2020.105728.

[14] Pathanjali C, Monisha G, Priya T, Ruchita Sudarshan K, Samyuktha Bhaskar. "Machine Learning for Predicting Ischemic Stroke." International Journal of Engineering Research & Technology (IJERT), Volume 09, Issue 05 (May 2020).

[15] Inamdar MA, Raghavendra U, Gudigar A, et al. A Review on Computer-Aided Diagnosis of Acute Brain

[16] Soni, Kartik M., Amisha Gupta, and Tarun Jain. "Supervised Machine Learning Approaches for Breast Cancer Classification and a High Performance Recurrent Neural Network." Third International Conference on Inventive Research in Computing Applications (ICIRCA), IEEE, 2021.

[17] T. Jain, A. Jain, P. S. Hada, H. Kumar, V. K. Verma and A. Patni, "Machine Learning Techniques for Prediction of Mental Health," 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), 2021.

[18] A. Yadav, T. Jain, V. K. Verma and V. Pal, "Evaluation of Machine Learning Algorithms for the Detection of Fake Bank Currency," 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2021.
