
[1] 'Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection',
Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM.
BioMedical Engineering OnLine 2007, 6:23 (26 June 2007)

WORKING METHODOLOGY:
The dataset was taken from the UCI Machine Learning Repository [1]. It was created by Max Little
of the University of Oxford in collaboration with the National Center for Voice and Speech,
Denver, Colorado, who recorded the speech signals. The dataset consists of biomedical voice
measurements from 31 people, 23 of whom have Parkinson’s disease (PD). Each column is a
particular voice measure, and each row corresponds to one of the 195 voice recordings from these
individuals; each person contributed about six recordings. The primary aim of the dataset is to
differentiate healthy people from people with PD according to the “status” column, which is set
to 0 for healthy and 1 for PD. The dataset has 23 features with non-null values of data type
float. The features include:
MDVP:Fo(Hz) - Average vocal fundamental frequency
MDVP:Fhi(Hz) - Maximum vocal fundamental frequency
MDVP:Flo(Hz) - Minimum vocal fundamental frequency
MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP - Several measures of
variation in fundamental frequency
MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA -
Several measures of variation in amplitude
NHR, HNR - Two measures of the ratio of noise to tonal components in the voice
These measures are produced by speech signal processing algorithms used to extract clinically
critical information.
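The layout described above (one row per recording, numeric voice measures as columns, and a “status” label) can be illustrated with a minimal sketch. It assumes pandas is available and uses a tiny in-memory table with made-up placeholder values rather than real measurements. The "name" identifier column and the row labels are assumptions for illustration; only the measure names and "status" come from the description above.

```python
import io

import pandas as pd

# Made-up placeholder rows, NOT real measurements. Only the column names
# MDVP:Fo(Hz), HNR and status follow the dataset description; the "name"
# identifier column is an assumption for illustration.
csv_text = """name,MDVP:Fo(Hz),HNR,status
rec_a,120.0,21.0,1
rec_b,118.5,19.5,1
rec_c,200.1,26.0,0
rec_d,197.3,25.2,0
rec_e,150.7,20.3,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Features are the numeric voice measures; the label is the "status" column.
X = df.drop(columns=["name", "status"])
y = df["status"]

# Count recordings per class (0 = healthy, 1 = PD).
class_counts = y.value_counts().to_dict()
print(X.shape, class_counts)
```

With the real file, the in-memory string would be replaced by the path to the downloaded data, and the same feature/label separation would apply.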
First, the dataset was imported into a Jupyter Notebook launched from Anaconda Navigator. The
features of the dataset were inspected and their pairwise correlations were computed. The data
were divided into training and test sets in a ratio of 80 to 20. KNeighborsClassifier, Logistic
Regression, Random Forest, and Linear SVM models were trained using the features as inputs.
Randomized Search CV and Grid Search CV were each applied to the best-performing model up to
that point. Cross-validation was then run with the better-performing tuning process, namely
Grid Search CV, and used to finalize the model. Finally, an evaluation matrix was populated
with the tuned model.
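The workflow above can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions, not the notebook used in this work: the data are a synthetic stand-in generated with make_classification, the hyperparameter grids and the feature scaling step are arbitrary choices, and the baseline comparison uses cross-validated accuracy on the training split, which is one reasonable reading of "best-performing model".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV,
                                     cross_val_score, train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the voice data (assumption): 195 rows, 22 features.
X, y = make_classification(n_samples=195, n_features=22, n_informative=8,
                           random_state=0)

# 80/20 split into training and test sets, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The four model families named in the text, each with an illustrative grid.
# Grid keys use the step names that make_pipeline derives from class names.
candidates = {
    "knn": (KNeighborsClassifier(),
            {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]}),
    "logreg": (LogisticRegression(max_iter=1000),
               {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}),
    "forest": (RandomForestClassifier(n_estimators=50, random_state=0),
               {"randomforestclassifier__max_depth": [3, 5, 8, None]}),
    "linear_svm": (LinearSVC(max_iter=10000),
                   {"linearsvc__C": [0.01, 0.1, 1.0, 10.0]}),
}

# Scale features inside each pipeline so scaling is refit on every CV fold.
pipelines = {name: make_pipeline(StandardScaler(), est)
             for name, (est, _) in candidates.items()}

# Baseline comparison: mean 5-fold accuracy on the training split only.
baseline = {name: cross_val_score(pipe, X_train, y_train, cv=5).mean()
            for name, pipe in pipelines.items()}
best_name = max(baseline, key=baseline.get)
best_pipe, best_grid = pipelines[best_name], candidates[best_name][1]

# Tune the best baseline model with both search strategies.
random_search = RandomizedSearchCV(best_pipe, best_grid, n_iter=3, cv=5,
                                   random_state=0).fit(X_train, y_train)
grid_search = GridSearchCV(best_pipe, best_grid, cv=5).fit(X_train, y_train)

# Keep whichever search reached the higher cross-validated score, then
# evaluate the refit model once on the held-out test set.
tuned = max((random_search, grid_search), key=lambda s: s.best_score_)
test_accuracy = tuned.score(X_test, y_test)
print(best_name, round(baseline[best_name], 3), tuned.best_params_,
      round(test_accuracy, 3))
```

Selecting and tuning on the training split alone keeps the 20% test set untouched until the final evaluation, so the reported accuracy is not biased by the model selection step. Because each person contributes several recordings, a group-aware split by person may be worth considering on the real data.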
Fig 1. Correlation Heat Map
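The matrix behind a correlation heat map such as Fig 1 can be computed along the following lines. This is a hedged sketch using pandas on synthetic columns; the names feature_a, feature_b and feature_c and the 0.9 threshold are hypothetical and do not come from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic columns (assumption): feature_b is nearly a rescaled copy of
# feature_a, while feature_c is generated independently of both.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({
    "feature_a": base,
    "feature_b": base * 2.0 + rng.normal(scale=0.1, size=200),
    "feature_c": rng.normal(size=200),
})

# Pairwise Pearson correlations (the pandas default); this square matrix
# is what a heat map visualizes cell by cell.
corr = df.corr()

# List each pair of distinct features whose absolute correlation exceeds
# an illustrative threshold.
threshold = 0.9
cols = list(corr.columns)
strong_pairs = [(a, b)
                for i, a in enumerate(cols)
                for b in cols[i + 1:]
                if abs(corr.loc[a, b]) > threshold]
print(corr.round(2))
print(strong_pairs)
```

Strongly correlated pairs carry largely redundant information, which is why a heat map is a common first check before model fitting.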
