Major Project Presentation


KIET GROUP OF INSTITUTIONS

DELHI NCR - 201013



Parkinson’s Disease Detection Using Machine Learning Techniques

Prepared By: Harshit Mishra, Kundan Verma, Himanshu Yadav
Project Supervisor: Professor Ajay Kumar

DEPARTMENT OF INFORMATION TECHNOLOGY


Problem Statement
• Parkinson’s disease (PD) is a neurological disorder that impairs motor movement and usually occurs in the elderly.
• The project aims to detect the disease using voice samples and various ML techniques.
• The project applies several ML models (Linear Regression, Logistic Regression, Decision Tree, Random Forest, XGBoost, AdaBoost, Neural Networks, and SVM) and evaluates their performance.
Project Need and Objectives
● Biomarkers derived from the human voice can offer insight into neurological disorders such as Parkinson's disease (PD), because speech depends on underlying cognitive and neuromuscular function.

● PD is a progressive neurodegenerative disorder that affects more than 7 million people worldwide (mostly older adults), with approximately 150,000 new clinical diagnoses made each year. Historically, PD has been difficult to detect: doctors have tended to focus on some symptoms while ignoring others, relying primarily on subjective rating scales. Because reduced motor control is the hallmark of the disease and also affects speech, voice can be used as a means to detect and diagnose PD.

● With advances in technology and the prevalence of audio-recording devices in daily life, reliable models that turn this audio data into a diagnostic tool for healthcare professionals could provide cheaper and more accurate diagnoses. We provide evidence for this concept using a voice dataset collected from people with and without PD.
Introduction to Project
● Diagnosing PD is difficult. Early diagnosis allows better control of the disease and a longer life expectancy. The primary way to detect the disease was MRI, but advances in technology have produced alternatives. A change in voice is one of the observable symptoms of PD: the voices of 252 people were recorded, and the samples were processed with speech feature-extraction methods such as MFCCs, TQWT, and vocal fold features.

● This project explores the effectiveness of supervised classification algorithms, such as Logistic Regression, Support Vector Machines, Decision Trees, Random Forest, XGBoost, AdaBoost, and Neural Networks, for accurately diagnosing individuals with the disease from the data described above.

● So far, the peak accuracy of 86% achieved by the machine learning models (on our dataset) exceeds the average clinical diagnostic accuracy of non-experts (73.8%) and of movement disorder specialists (79.6% without follow-up, 83.9% after follow-up), figures measured with pathological post-mortem examination as ground truth.
Data Flow Diagram

Methodology Used
Algorithms Used

• Linear Regression: Simple linear regression is useful for finding the relationship between two continuous variables.
One is a predictor or independent variable and the other is a response or dependent variable.
• Logistic Regression: Logistic regression is one of the most commonly used machine learning algorithms for binary classification problems, i.e. problems with two class values, such as “this or that,” “yes or no,” and “A or B.” A logistic regression model predicts a binary dependent variable by modelling its relationship with one or more independent variables; it outputs a probability that is thresholded to give the class.
• Decision Tree Model: A decision tree is a flowchart-like structure that shows a clear pathway to a decision. In data analytics, it is an algorithm that classifies data through a sequence of conditional (if-then) tests. A decision tree starts at a single test, the root node, which splits the data into two or more branches; each branch is split further until a leaf assigns a class.
• Support Vector Machine Model: Support vector machines are supervised learning models, with associated learning algorithms, that analyse data for classification and regression analysis. The goal of the SVM algorithm is to find the best decision boundary (a hyperplane) that separates N-dimensional space into two classes with the largest possible margin, so that new data points can be placed in the correct category.
• Random Forest: A random forest is a supervised learning algorithm that can be used for both regression and classification problems, and it is flexible and easy to use. It combines the predictions of multiple decision trees, each trained on a random sample of the data. It has numerous applications, such as feature selection, recommender systems, and fault detection.
• XGBoost: XGBoost (eXtreme Gradient Boosting) is a well-known gradient boosting (ensemble) technique with enhanced performance and speed for tree-based machine learning. In boosting, trees are built sequentially so that each subsequent tree reduces the errors of the previous ones: each new tree is fit to the residual errors left by its predecessors, so the next tree in the sequence learns from an updated version of the residuals.
• Neural Networks: Neural networks are a set of algorithms, modeled loosely on the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text, or time series, must be translated.
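• AdaBoost: AdaBoost, short for Adaptive Boosting, is a popular boosting technique that combines multiple “weak classifiers” into a single “strong classifier”. After each round it increases the weights of misclassified training samples, so the next weak classifier concentrates on the harder cases.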
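The classifiers described above can be tried side by side with scikit-learn. The sketch below is hypothetical, not the project's code: it uses synthetic data from `make_classification` in place of the voice features, and all hyperparameters are illustrative assumptions. It also shows that linear regression outputs a continuous value that must be thresholded (at 0.5 here, an assumption) before it can be scored as a classifier.

```python
# Hypothetical comparison on synthetic stand-in data (not the PD voice dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Two synthetic classes standing in for 0 = healthy, 1 = PD
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

results = {}

# Linear regression predicts a continuous value; threshold it at 0.5 to get a class.
lin = LinearRegression().fit(X_train, y_train)
lin_pred = (lin.predict(X_test) >= 0.5).astype(int)
results["Linear Regression (thresholded)"] = accuracy_score(y_test, lin_pred)

# These models output class labels directly.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Note that calling `lin.score(X_test, y_test)` instead would return R² (the coefficient of determination), which is not an accuracy and can be negative when the fit is poor.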
[Figures: Neural Network and AdaBoost diagrams]
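A neural network and AdaBoost can be set up in scikit-learn as in the hypothetical sketch below. The data are synthetic, and the network's hidden-layer sizes, iteration limit, and the number of AdaBoost rounds are illustrative assumptions rather than the project's settings.

```python
# Hypothetical sketch on synthetic stand-in data (not the PD voice dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    # Feed-forward network with two hidden layers (sizes are illustrative);
    # inputs are standardised first because MLPs are sensitive to feature scale.
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
    ),
    # scikit-learn's default AdaBoost weak learner is a depth-1 decision tree (a stump);
    # each round gives more weight to samples the previous stumps misclassified.
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")
```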
[Figure: XGBoost diagram]
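The residual-fitting idea behind XGBoost can be illustrated in a few lines. This is a simplified gradient-boosting sketch for binary labels, not XGBoost itself (XGBoost additionally uses second-order gradient information, regularisation, and many speed optimisations). The data are synthetic, and `boost` and `predict` are hypothetical helper names.

```python
# Simplified gradient boosting for 0/1 labels; NOT the XGBoost library.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def boost(X, y, n_trees=50, learning_rate=0.1, max_depth=3):
    """Fit trees one after another, each on the residuals of the ensemble so far."""
    p0 = y.mean()
    f0 = np.log(p0 / (1 - p0))  # initial score: log-odds of the positive class
    scores = np.full(len(y), f0)
    trees, losses = [], []
    for _ in range(n_trees):
        # Residuals = negative gradient of the log-loss with respect to the score
        residuals = y - sigmoid(scores)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        # The new tree corrects part of the remaining error
        scores += learning_rate * tree.predict(X)
        trees.append(tree)
        p = np.clip(sigmoid(scores), 1e-12, 1 - 1e-12)
        losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return {"f0": f0, "trees": trees, "lr": learning_rate}, losses


def predict(model, X):
    """Sum the initial score and all scaled tree outputs, then threshold at 0.5."""
    scores = model["f0"] + model["lr"] * sum(t.predict(X) for t in model["trees"])
    return (sigmoid(scores) >= 0.5).astype(int)


X, y = make_classification(n_samples=300, n_features=20, random_state=0)
model, losses = boost(X, y)
train_acc = (predict(model, X) == y).mean()
print(f"training log-loss: after tree 1 {losses[0]:.3f}, after tree {len(losses)} {losses[-1]:.3f}")
print(f"training accuracy: {train_acc:.2f}")
```

The falling training loss shows each tree learning from the residuals its predecessors left behind; a held-out set would still be needed to judge generalisation.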
Tools/Platforms/Technologies/Languages Used
Dataset:
Title: Parkinson’s Disease Classification
Data Set Characteristics: Multivariate
Number of Instances: 756 (252 subjects × 3 recordings each)
Area: Computer
Attribute Characteristics: Integer, Real
Number of Attributes: 754
Date Donated: 2018-11-05
Associated Tasks: Classification
Missing Values? N/A

Data Set Information:

The data used in this study were gathered from 188 patients with PD (107 men and 81 women) aged 33 to 87 at the Department of Neurology in the Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) aged 41 to 82. During data collection, the microphone was set to 44.1 kHz and, following the physician’s examination, the sustained phonation of the vowel /a/ was recorded from each subject three times.
Various speech signal processing algorithms, including time-frequency features, Mel-frequency cepstral coefficients (MFCCs), wavelet transform based features, vocal fold features, and tunable Q-factor wavelet transform (TQWT) features, were applied to the speech recordings of Parkinson's disease (PD) patients to extract clinically useful information for PD assessment.
Programming Language used: Python

Platform Used: MS Windows 11, Google Colab

Machine Learning Libraries used: NumPy, Matplotlib, Seaborn, pandas, scikit-learn, XGBoost.

Machine Learning Algorithms Used: Linear Regression, Logistic Regression, Decision Trees, Support Vector
Machine, Random Forest, XGBoost, Neural Network, AdaBoost

Evaluation Methods and Metrics Used: Confusion Matrix, Classification Report, F1-Score, Accuracy, Precision, Recall.
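Every metric listed above can be derived from the confusion-matrix counts (TP, TN, FP, FN). The sketch below uses ten hypothetical labels, not project results, to compute accuracy, precision, recall, and F1 by hand and check them against scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels for 10 recordings (1 = PD, 0 = healthy); not project data.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

# Confusion matrix: rows are true classes, columns are predicted classes (order 0, 1).
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# The metrics written out from the confusion-matrix counts ...
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of recordings flagged as PD, how many really are PD
recall = tp / (tp + fn)     # of real PD recordings, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

# ... agree with scikit-learn's functions.
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.70 precision=0.80 recall=0.67 f1=0.73
print(classification_report(y_true, y_pred, target_names=["healthy", "PD"]))
```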
Implementation

1. Standard Scaling
2. Train Test Split
3. Linear Regression Model.
4. Logistic Regression Model.
5. Decision Tree.
6. Support Vector Machine.
7. Random Forest.
8. XGBoost.
9. Neural Network.
10. AdaBoost.
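The standard scaling and train/test split steps can be sketched with scikit-learn as follows. This is an illustrative assumption, not the project's code: the data are synthetic with only the dataset's shape (756 × 754), the 80/20 split ratio is assumed, and the scaler is fitted on the training split only. That last choice differs from the order in the list above; it is a common practice that keeps test-set statistics out of training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in matching only the dataset's shape: 756 rows x 754 features.
X, y = make_classification(
    n_samples=756, n_features=754, n_informative=20, random_state=0
)

# Train/test split: hold out 20% (assumed ratio), preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standard scaling: z = (x - mean) / std per feature, statistics from training rows only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
assert np.allclose(X_train_scaled, manual)

# Model steps: a pipeline fits the scaler on the training data and reuses those
# statistics on the test rows before the classifier sees them.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)  # mean accuracy of the final classifier
print(f"test accuracy: {acc:.2f}")
```

Any of the listed models can replace `LogisticRegression` as the last pipeline step.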
Comparisons
Accuracies:
• Achieved 75% Accuracy using SVM.
• Achieved 78% Accuracy using the Decision Tree Model.
• Achieved 75% Accuracy using the Logistic Regression Model.
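• Linear Regression Model returned a score of -15012. This is not a valid accuracy (accuracy lies between 0% and 100%); because linear regression is a regression model, the value is most likely a regression score such as R², which can be negative.
• Achieved 86% Accuracy using the XGBoost Model.
• Achieved 85% Accuracy using Random Forest.
• Achieved 85% Accuracy using AdaBoost.
• Achieved 77% Accuracy using Neural Networks.

The highest accuracy was obtained with XGBoost (86%), followed by Random Forest and AdaBoost (85% each).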
Comparisons (Precision, Recall, F1-Score)
Project Outcome
Future Innovations in the same Project
● The project achieved a peak accuracy of 86%. My next aim is to reach an accuracy in the 90-100% range.

● To do this, I plan to try other neural network techniques, such as Recurrent Neural Networks, Multilayer Perceptrons, Autoencoders, Modular Neural Networks, and Deep Belief Networks.