Classification of Parkinson's Disease by Analyzing Multiple Vocal Features Sets
Abstract—Parkinson’s disease (PD) is a growing, chronic neurodegenerative disease with a wide range of motor and non-motor symptoms. In the initial stages, most PD patients face difficulties in regular movements. Vocal disorders are among their common symptoms, and diagnosis systems centered on vocal disorders are one of the leading areas in recent PD detection studies. In this paper, the dataset was taken from the UCI Machine Learning Repository and a feature selection technique was applied. Because the dataset contains a large number of features, Analysis of Variance (ANOVA) is used to rank them, and the top 50 features are selected according to the ANOVA F-score. Multiple machine learning classification methods were applied and compared with related existing works. Experimental results show that the highest accuracy score of 0.91 was achieved by the Random Forest classifier trained on the selected features. ANOVA-based feature selection successfully identified the significant features that differentiate PD patients from healthy individuals and also improved the classification accuracy.

Index Terms—Parkinson’s Disease (PD), Feature Extraction, Analysis of Variance (ANOVA), Classification

I. INTRODUCTION

In recent years, health informatics systems have played a vital role in identifying and monitoring different diseases. Parkinson’s disease (PD) is a rapidly growing neurodegenerative disease with a wide range of motor and non-motor signs [1]. The early death of dopamine-generating neurons in the substantia nigra region results in PD [2]. As PD progresses, the amount of dopamine produced in the brain diminishes gradually and the affected person becomes unable to control his or her actions normally. People aged 55 to 75 years are more at risk of being affected by PD. The increasing risk raises the need for accurate PD diagnosis and monitoring. In the US, about one million people are affected by Parkinson’s disease. Medications and surgeries are possible treatments for its symptoms. Still, no effective solution or therapy for Parkinson’s disease has been discovered [3].

PD detection methods focus on observing and estimating the severity of the symptoms using numerous types of devices. One of the most common signs of PD is vocal difficulty, experienced by 90% of PD patients in the early stages of the disease [4]. Systems based on these vocal problems hold a leading position in PD detection studies [5]. In these studies, numerous features were obtained using speech signal processing methods, and the extracted features were fed into different machine learning algorithms.

The diagnosis of PD is difficult in some cases because its symptoms overlap with those of other diseases. Only 75% of clinical diagnoses of PD are confirmed to be idiopathic Parkinson’s disease at autopsy [6]. An enormous amount of research has been carried out on PD classification. Classification methods can increase the accuracy and efficiency of such diagnosis systems and can also make the diagnosis process more time-efficient. In recent studies, TQWT features produced better results than other vocal features, and combining MFCC and TQWT features also improved the classification accuracy [4].

In this paper, different classifiers are applied to recognize Parkinson’s disease. A feature selection technique, Analysis of Variance (ANOVA), is used, and the top 50 features are selected according to the ANOVA F-value. The selected features are fed into different classification algorithms, which generate the final prediction. The overall workflow is shown in Fig. 2 in the Methodology section. Various classification methods are implemented and their accuracy is evaluated. In terms of classification accuracy, the random forest classifier yields a 91% correct classification rate.

II. DATASET

A. Dataset Description

The dataset used in this research was collected from the UCI Machine Learning Repository [7]. This dataset was also used in [4], [8]. It contains data from 188 PD patients, of whom 107 are men and 81 are women. The ages of the PD patients range from 33 to 87 years. The control group contains 64 healthy individuals, of whom 23 are men and 41 are women, with ages ranging from 41 to 82 years. The dataset was collected at the Department of Neurology in Cerrahpasa Faculty of Medicine, Istanbul University. The frequency response of the microphone was fixed at 44.1 kHz while gathering the data. After the doctor’s
[Figure: workflow blocks labeled Data Preprocessing, Feature Selection, Classification]