Springer Parkinson
Janvi Malhotra¹, Khushal Thakur¹*, Divneet Singh Kapoor¹, Kiran Jot Singh¹ and Anshul Sharma¹
¹Kalpana Chawla Center for Research in Space Science & Technology, Chandigarh University, Mohali – 140413, Punjab, India
*Corresponding Author: khushal.ece@gmail.com
1 Introduction
are responsible for the disease. People who have family members with Parkinson’s disease are at elevated risk, as are people who have been exposed to certain pesticides or who have suffered prior head injuries. Parkinson’s disease typically occurs in people over the age of 60 [7, 9]. Males are affected more often, with a male-to-female ratio of about 3:2. Onset before the age of 50 is called early-onset Parkinson’s disease. Average life expectancy after diagnosis is reported to be between 7 and 15 years.
The cure for this disease is not known, but medications and therapies such as physiotherapy and speech therapy, especially in the preliminary stages, can significantly improve quality of life. Detecting Parkinson's disease in its preliminary stages can also reduce the estimated cost of pathology. One common early-stage symptom of Parkinson's disease is degradation of the voice [16], and the analysis of voice measurements is simple and non-invasive. Thus, voice measurements can be used for the diagnosis of Parkinson's disease. Data science is one approach to diagnosing Parkinson’s disease in its preliminary stages. Data science is the study of enormous amounts of data that uses systems, algorithms, and processes to extract meaningful and useful information from raw, structured, and unstructured data [4, 6]. Data is a precious asset for any organization; in data science, data is manipulated to derive new and meaningful information. In data science, knowledge is extracted from (typically large) datasets, and by applying that knowledge, several significant problems related to those datasets can be solved. Data science is playing a significant role in the healthcare industry: physicians use it to analyse their patients’ health and make weighty decisions, and it helps hospital management teams enhance care and reduce waiting times [25]. For this purpose, voice data has been collected via telemonitoring and tele-diagnosis systems, which are economical and easy to use. Furthermore, advances in technologies such as the Internet of Things, wireless sensor networks, and computer vision can be used to develop newer multi-domain solutions [10, 11, 19, 22–24].
2 Background
The first step in performing classification using machine learning algorithms is the collection of data. For this research, the data was downloaded from www.kaggle.com; it was originally collected for the UCI Machine Learning Repository. Some common features of the data are as follows. The dataset has a total of 756 instances and 754 features and contains data from both Parkinson’s disease patients and healthy people. It holds voice-based data from 188 Parkinson’s disease patients (107 men, 81 women) and 64 healthy individuals (23 men, 41 women). Three repetitions of sustained phonations were performed for each subject, which is how the 756 instances were formed.
The four models chosen for this problem statement were the Support Vector Machine (kernel=’linear’), Decision Tree Classifier, Random Forest Classifier, and Extra Trees Classifier. All these models are explained in detail below.
1. Support Vector Machine (SVM) [26]: The Support Vector Machine is an ML algorithm used in both classification and regression problem statements and is one of the most widely used machine learning algorithms. The SVM algorithm creates a line (decision boundary), as shown in Fig. 1, that segregates data points in N-dimensional space so that new data points can be assigned to a particular category. Extreme points, called support vectors, are considered while creating this decision boundary (hyperplane); these extreme cases give the Support Vector Machine model its name. If we have a dataset to be classified into two categories (red circles and green stars), there can be many candidate decision boundaries. The job of the SVM algorithm is to find the best decision boundary (the hyperplane) that separates the two classes most efficiently: the hyperplane is the decision boundary with the largest margin (the distance between the decision boundary and the closest points).
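The largest-margin idea can be sketched in a few lines. This is an illustrative example only, assuming scikit-learn is available; the one-dimensional toy data below is hypothetical and not from the Parkinson's dataset.

```python
# Illustrative sketch: a linear-kernel SVM on hypothetical toy data.
from sklearn.svm import SVC

# Two well-separated groups: class 0 near x = 0-2, class 1 near x = 10-12.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")  # hyperplane = boundary with the largest margin
clf.fit(X, y)

# The support vectors are the extreme points closest to the boundary.
print(clf.support_vectors_)
print(clf.predict([[1.5], [11.5]]))  # -> [0 1]
```

New points are assigned to whichever side of the learned hyperplane they fall on, which is exactly the segregation behaviour described above.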
2. Decision Tree Classifier (dt) [5]: The Decision Tree Classifier classifies data in a way that resembles human decision making. Its method is given in the flowchart shown in Fig. 2. To understand the logic behind the Decision Tree Classifier, we need to be familiar with some terms:
ROOT NODE (parent node): The starting point of the whole tree; the node from which the entire dataset is divided into sets.
LEAF NODE: The final nodes, which cannot be divided further.
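The root-node and leaf-node terms can be seen directly by printing a small fitted tree. A minimal sketch, assuming scikit-learn; the data and the feature name `voice_feature` are hypothetical.

```python
# Illustrative sketch: a small decision tree on hypothetical toy data.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The first printed line is the ROOT NODE's split condition;
# the "class: ..." lines are the LEAF NODES.
print(export_text(tree, feature_names=["voice_feature"]))
print(tree.predict([[2.5], [10.5]]))  # -> [0 1]
```

For this perfectly separable data a single split suffices, so the root node's two children are already leaf nodes.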
3. Random Forest Classifier (rf) [18]: The Random Forest Classifier works on the principle of ensemble learning, in which multiple classifiers work together to predict the result and so improve the model's performance. In a Random Forest Classifier there are multiple decision trees working together, and the class with the maximum votes across all decision trees becomes the final output. The working of the Random Forest Classifier has two phases. In the first phase, the algorithm builds a decision tree on randomly selected points from the training data, repeated for N trees. In the second phase, every decision tree makes its prediction for new data, and the final classified category is the one with the maximum number of votes. It is explained in Fig. 3.
4. Extra Trees Classifier (et) [1]: The Extra Trees Classifier is also based on the Decision Tree Classifier and is conceptually remarkably similar to the Random Forest Classifier. In this algorithm, many decision trees (without pruning) are trained on the training data, and the final output is the majority of the predictions made by all decision trees individually. There are two differences between the Extra Trees Classifier and the Random Forest Classifier: the Extra Trees Classifier does not bootstrap observations, and its nodes are split at random cut-points rather than at the best split.
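The first of these two differences is visible directly in the estimators' defaults, assuming scikit-learn is the implementation used.

```python
# Sketch: the bootstrapping difference between the two ensembles,
# as exposed by scikit-learn's default parameters.
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

rf = RandomForestClassifier()
et = ExtraTreesClassifier()

print(rf.bootstrap)  # True  -> each tree is trained on a bootstrap sample
print(et.bootstrap)  # False -> each tree sees the full training set

# The second difference (random rather than best splits) lives in the
# tree-building procedure itself and has no single parameter to print.
```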
3 Methodology
The flow diagram for building a model for voice-feature-based detection of Parkinson's disease using machine learning algorithms, presented as a sequence of model-building steps, is given in Fig. 4.
Each step in the diagram is explained individually below.
A. Data Pre-Processing
This step is a combination of two processes: outlier removal and feature selection. Both are explained below:
Outlier Removal [8]: An outlier is something that differs markedly from the rest of the data. Most machine learning algorithms are affected when some attribute values are not in the same range as the others. Outliers are mostly the result of errors during data collection, whether in measurement or execution, and these extreme values lie far away from the other observations. Machine learning models can be misled by outliers, which cause various problems during training and eventually yield a less efficient model. There are many diverse ways to remove outliers. Some outliers were observed in the Parkinson's disease dataset, and an attempt was made to remove them, using the most important features for each specific machine learning algorithm as a base. After performing this step, the number of instances in the dataset decreases.
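One common removal rule (not necessarily the one used in this work) is the interquartile-range fence: values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR] are dropped. A minimal sketch with simple index-based quartiles and hypothetical feature values:

```python
# IQR-rule outlier removal (one of many possible techniques).
def iqr_filter(values):
    """Keep only values inside the 1.5 * IQR fences."""
    s = sorted(values)
    n = len(s)
    q1 = s[n // 4]          # simple index-based lower quartile
    q3 = s[(3 * n) // 4]    # simple index-based upper quartile
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lo <= v <= hi]

feature = [4.1, 4.3, 4.0, 4.2, 4.4, 25.0]  # 25.0 is an obvious outlier
print(iqr_filter(feature))  # -> [4.1, 4.3, 4.0, 4.2, 4.4]
```

As the text notes, the number of instances decreases after this step, since the filtered list is shorter than the input.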
Feature Selection [17]: Data collected today is often high-dimensional and rich in information; it is quite common to encounter a data frame with hundreds of attributes. Feature selection is a technique for selecting the most prominent features from the n given features. Feature selection is important for several reasons:
• While training models, the time taken grows rapidly as the number of features increases.
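One way to perform this selection, used later in this work with the Extra Trees Classifier, is SelectFromModel, which keeps the features whose importances exceed a threshold. A minimal sketch assuming scikit-learn and NumPy; the synthetic data and parameter choices are illustrative only.

```python
# Sketch: tree-importance-based feature selection with SelectFromModel.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))               # 20 candidate features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # only 2 features are informative

selector = SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))
X_reduced = selector.fit_transform(X, y)     # keep above-threshold features
print(X.shape, "->", X_reduced.shape)
```

Training subsequent models on `X_reduced` rather than `X` addresses the training-time concern raised in the bullet above.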
C. Model Evaluation
To find the best of all our proposed models, model evaluation is necessary. Once all the models are trained and tested, the next step is evaluation to find the machine learning model best suited to the given problem. Different performance metrics are used to evaluate the most efficient model, namely accuracy, precision, F1-score, recall, and the AUC-ROC curve. Performance metrics judge whether the models are improving or not. To get a correct evaluation of a machine learning model, the metrics should be chosen very carefully.
Confusion Matrix: The problem we are solving is a classification problem, and one of the most preferred ways to evaluate a classification model is the confusion matrix, which summarizes the performance of machine learning models. The confusion matrix is the most intuitive performance metric because it evaluates the performance of a model with count values. The technique is used in both multiclass and binary classification. The confusion matrix for binary classification is explained below (Fig. 5). It is a two-dimensional table divided into four parts, each of which is explained below:
True Positive: This measures the ability of a machine learning algorithm to classify positive instances as positive. The true positive section counts instances that are predicted as Label 1 and actually belong to Label 1. It is expressed as the True Positive Rate (TPR), often called sensitivity, which is the proportion of correctly predicted positive samples to actual positive samples.
Sensitivity = TP / (TP + FN) (1)
True Negative: This measures the ability of a machine learning algorithm to classify negative instances as negative. The true negative section counts instances that are predicted as Label 0 and actually belong to Label 0. It is expressed as the True Negative Rate (TNR), often called specificity, which is the proportion of correctly predicted negative samples to actual negative samples.
Specificity = TN / (TN + FP) (2)
False Positive: This is a case in which the model makes a false prediction. This section counts instances that are predicted as Label 1 but actually belong to Label 0. It is expressed as the False Positive Rate (FPR), the proportion of negative cases predicted as positive to the actual negative cases.
FPR = FP / (TN + FP) (3)
False Negative: This is also a case in which the model makes a false prediction. This section counts instances that are predicted as Label 0 but actually belong to Label 1. It is expressed as the False Negative Rate (FNR), the proportion of positive cases predicted as negative to the actual positive cases.
FNR = FN / (TP + FN) (4)
Accuracy: Accuracy decides how accurately our ML model is working; it is all about correct predictions. Hence it is the proportion of correct predictions to total predictions.
Accuracy = (TP + TN) / (TP + TN + FP + FN) (5)
Precision: Precision considers the accuracy of the positively predicted class. It is the ratio of correctly predicted positive instances to total predicted positive instances.
Precision = TP / (TP + FP) (6)
Recall: Recall is another name for the sensitivity of the confusion matrix.
Recall = TP / (TP + FN) (7)
F1-Score: This is another performance metric, formed by the harmonic mean of recall and precision.
F1-score = 2·TP / (2·TP + FP + FN) (8)
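Equations (1)-(8) are simple arithmetic on the four confusion-matrix counts, so they can be checked directly. The counts below are hypothetical, not results from this study.

```python
# Metrics (1)-(8) computed from hypothetical confusion-matrix counts.
TP, TN, FP, FN = 90, 10, 4, 2

sensitivity = TP / (TP + FN)                   # Eq. (1), also called recall
specificity = TN / (TN + FP)                   # Eq. (2)
fpr         = FP / (TN + FP)                   # Eq. (3)
fnr         = FN / (TP + FN)                   # Eq. (4)
accuracy    = (TP + TN) / (TP + TN + FP + FN)  # Eq. (5)
precision   = TP / (TP + FP)                   # Eq. (6)
f1          = 2 * TP / (2 * TP + FP + FN)      # Eqs. (7)-(8)

print(round(accuracy, 4), round(precision, 4), round(f1, 4))
```

Note that sensitivity + FNR = 1 and specificity + FPR = 1, which is a quick sanity check on any reported confusion-matrix metrics.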
AUC-ROC Curve: The AUC (area under the curve) of the ROC (receiver operating characteristics) curve is one important way to evaluate the performance of a classifier, and one of the most important metrics for model evaluation. The ROC is a probability curve, and the AUC measures separability; a better machine learning model has a higher AUC. Its value lies between 0 and 1. The ROC curve plots the True Positive Rate (y-axis) against the False Positive Rate (x-axis). If, say, the AUC of a model comes out to be 0.8, there is an 80% chance that the model can distinguish the Label 1 class from the Label 0 class.
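The probabilistic reading of AUC can be computed directly: it equals the chance that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counting half). A minimal sketch with hypothetical model scores:

```python
# AUC from its probabilistic definition: the fraction of
# (positive, negative) pairs ranked correctly by the model's scores.
def auc(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0     # pair ranked correctly
            elif p == n:
                wins += 0.5     # tie counts half
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.6]   # model scores for Label-1 samples
neg = [0.7, 0.3, 0.1]   # model scores for Label-0 samples
print(auc(pos, neg))    # 8 of 9 pairs ranked correctly -> 0.888...
```

This pairwise count is exactly the "chance the model can distinguish the two classes" interpretation given above; production code would use a rank-based formula for efficiency, but the value is the same.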
4 Results
In our trained model, out of a total of 106 test subjects, 92 are Parkinson's disease positive and 14 are healthy, and the trained model predicts 99 as Parkinson’s positive and 7 as healthy. The confusion matrix of the Extra Trees Classifier after feature selection is given below (Fig. 7). Improvement is clearly visible after feature selection: without it there were 7 cases that were not Parkinson's disease positive but were predicted as affected, whereas now the number has reduced to six, and consequently one more true negative case is gained (now 98 cases are predicted positive, of which 96 are true positives, and 8 cases are predicted negative).
ROC curves for the Random Forest Classifier and the Extra Trees Classifier after applying feature selection are depicted in Fig. 8 and Fig. 9. The area under the curve for the Random Forest Classifier is 0.91, while for the Extra Trees Classifier it is 0.96. Both results are quite satisfactory, but on balance the Extra Trees Classifier is better than all the other machine learning models in our work. The performance metrics of the ExtraTreesClassifier combined with SelectFromModel are excellent, and even compared to some already proposed models this machine learning model performs significantly better. A comparative analysis of our proposed model and some existing models for Parkinson’s disease detection is given in Table 3. It can therefore be observed that the Extra Trees Classifier, when combined with the SelectFromModel feature selection technique, works very efficiently.
• The Extra Trees Classifier works efficiently for detecting Parkinson’s disease, and its efficiency increases further when combined with the SelectFromModel feature selection technique.
• It is strongly recommended to apply a feature selection technique in machine learning, because a large number of features can make the training process complex.
Though the results for our proposed machine learning model are quite satisfactory, there is always scope for improvement. In the future, the accuracy of the model could be increased by applying other techniques such as data balancing and cross-validation. Judging by its performance metrics, the model we propose is efficient on this dataset and can be relied upon for this problem statement.
References
https://doi.org/10.1063/1.4977376.
19. Sachdeva, P., Singh, K.J.: Automatic segmentation and area calculation of optic disc
in ophthalmic images. 2015 2nd Int. Conf. Recent Adv. Eng. Comput. Sci. RAECS
2015. (2016). https://doi.org/10.1109/RAECS.2015.7453356.
20. Sakar, B.E. et al.: Collection and analysis of a Parkinson speech dataset with multiple
types of sound recordings. IEEE J. Biomed. Health Informatics. 17, 4, 828–834 (2013).
https://doi.org/10.1109/JBHI.2013.2245674.
21. Sakar, C.O. et al.: A comparative analysis of speech signal processing algorithms for
Parkinson’s disease classification and the use of the tunable Q-factor wavelet
transform. Appl. Soft Comput. 74, 255–263 (2019).
https://doi.org/10.1016/J.ASOC.2018.10.022.
22. Sharma, A. et al.: Exploration of IoT Nodes Communication Using LoRaWAN in
Forest Environment. Comput. Mater. Contin. 71, 2, 6240–6256 (2022).
https://doi.org/10.32604/CMC.2022.024639.
23. Sharma, A., Agrawal, S.: Performance of Error Filters on Shares in Halftone Visual
Cryptography via Error Diffusion. Int. J. Comput. Appl. 45, 23–30 (2012).
24. Singh, K. et al.: Image retrieval for medical imaging using combined feature fuzzy
approach. 2014 Int. Conf. Devices, Circuits Commun. ICDCCom 2014 - Proc.
(2014). https://doi.org/10.1109/ICDCCOM.2014.7024725.
25. Subrahmanya, S.V.G. et al.: The role of data science in healthcare advancements:
applications, benefits, and future prospects. Ir. J. Med. Sci. 1–11 (2021).
https://doi.org/10.1007/S11845-021-02730-Z/FIGURES/5.
26. Zhang, Y.: Support vector machine classification algorithm and its application.
Commun. Comput. Inf. Sci. 308 CCIS, PART 2, 179–186 (2012).
https://doi.org/10.1007/978-3-642-34041-3_27.