
PREDICTION OF PARKINSON’S DISEASE USING SVM AND
LOGISTIC REGRESSION ALGORITHM

A PROJECT REPORT

Submitted by

DURKKA DEVI S (711117104015)
GOKILA S (711117104017)
JOY PRINCY J (711117104024)
MONICKA G (711117104034)

in partial fulfilment for the award of the degree

of

BACHELOR OF ENGINEERING

in

COMPUTER SCIENCE AND ENGINEERING

JANSONS INSTITUTE OF TECHNOLOGY, COIMBATORE

ANNA UNIVERSITY :: CHENNAI 600 025

APRIL 2021
ANNA UNIVERSITY :: CHENNAI 600 025
BONAFIDE CERTIFICATE

Certified that this project report “PREDICTION OF PARKINSON’S DISEASE USING SVM AND LOGISTIC REGRESSION ALGORITHM” is the bonafide work of “DURKKA DEVI S (711117104015), GOKILA S (711117104017), JOY PRINCY J (711117104024), MONICKA G (711117104034)”, who carried out the project work under my supervision.

……………………….. …………………………
SIGNATURE SIGNATURE
Dr.A.VELAYUDHAM M.E., Ph.D Ms.M.PAVITHRA M.E
HEAD OF THE DEPARTMENT SUPERVISOR

Head and Professor Assistant Professor


Department of CSE Department of CSE
Jansons Institute of Technology Jansons Institute of Technology
Karumathampatti Karumathampatti
Coimbatore-641 659 Coimbatore-641659

Submitted for the ANNA UNIVERSITY practical examination project work


viva-voce held on _____________.

………………………. ………………………
INTERNAL EXAMINER EXTERNAL EXAMINER
ACKNOWLEDGEMENT

We would like to express our sincere thanks to the honourable Chairman Rtn.
MPHF. Shri. T.S. NATARAJAN and Vice Chairmen Mr. T.N. KALAIMANI &
Mr. T.N. THIRUKUMAR for providing all the facilities to do the project in the
college campus.

We have the unique pleasure of thanking our respected Principal Dr. V. NAGARAJAN M.E., Ph.D for his continuous encouragement to do this project.

We express our gratitude to Dr. A. VELAYUDHAM M.E., Ph.D Head


and Professor, Department of Computer Science and Engineering for his excellent
guidance and for providing necessary facilities to carry out the project.

We would like to thank our project Supervisor Ms. M. PAVITHRA M.E.,


Assistant Professor, Department of Computer Science and Engineering for her
constant support and motivation in the success of this work.

We heartily express our thanks to the Project Co-ordinator


Dr. E.S. SHAMILA M.E., Ph.D, Professor, Department of Computer Science
and Engineering for her guidance and suggestions during this project work.

We extend our sincere thanks to all the technical and non-technical staff members of our department who helped us in all aspects throughout this project.

We also thank GOD ALMIGHTY for giving us the courage and everything needed to fulfil this project.
TABLE OF CONTENTS

CHAPTER TITLE PAGE

NO. NO.

ABSTRACT
LIST OF FIGURES

1 INTRODUCTION 1

1.1 OBJECTIVE 5
1.2 PROBLEM DEFINITION 5

2 LITERATURE SURVEY 6

3 SYSTEM ANALYSIS 11

3.1 EXISTING SYSTEM 11

3.2 DISADVANTAGE 11

4 SYSTEM DESIGN 12

4.1 PROPOSED SYSTEM 12

4.2 ARCHITECTURAL DESIGN 13

4.3 MODULE DESCRIPTION 19

4.4 DATA FLOW DIAGRAM 24

4.5 IMPLEMENTATION DETAILS 30


5 SYSTEM REQUIREMENTS 32

5.1 HARDWARE SPECIFICATION 32

5.2 SOFTWARE SPECIFICATION 32

5.3 PYTHON PROGRAMMING LANGUAGE 33

6 CONCLUSION 43

REFERENCES 46

APPENDIX 47

SAMPLE CODE 48
LIST OF FIGURES

FIGURE TITLE PAGE

NO. NO.

1 ARCHITECTURE DIAGRAM 13

2 DATASET 35
3 GRAPHICAL REPRESENTATION OF 36
DISEASE’S PREDICTION
ABSTRACT

This project aims at detecting Parkinson’s disease through data mining in a systematic way. Since there is no standard test to detect Parkinsonism, we propose a statistical approach using the most common symptoms of PD, which are gait, tremors and micrographia. This includes analysing the correlation between the symptoms and classifying the collected data using different classification algorithms in order to find the algorithm that gives the highest accuracy in diagnosing PD patients.
CHAPTER 1

INTRODUCTION
Machine learning is computational learning that uses algorithms to learn from data and make predictions on it. Machine learning is a powerful new technology that helps analysts focus on the most important information in their data warehouses. Open-source support vector machine implementations help extract the featured attributes from the dataset.

Sequential Minimal Optimization uses polynomial kernels to predict the rate of disease in a graphical manner. A type of machine learning, SVM allows categorization of an individual's previously unseen data into a predefined group using a classification algorithm developed on a training data set.

In recent years, SVM has been successfully applied in the context of disease
diagnosis, transition prediction and treatment prognosis, using both structural and
functional neuroimaging data.

Standard univariate analysis of neuroimaging data has revealed a host of neuroanatomical and functional differences between healthy individuals and patients suffering from a wide range of neurological and psychiatric disorders. However, because these findings are significant only at the group level, they have had limited clinical translation, and recent attention has turned toward alternative forms of analysis, including the Support Vector Machine (SVM).
This idea leads to improved quality of patient service and good patient retention and satisfaction.
Parkinson's disease (PD) is a neurodegenerative disease which often
affects patients' movement. Currently, PD is diagnosed via various
neurological examinations by specialists. The most common symptoms of PD
are tremor, gait disturbance, stiffness, and slowness.
Through this project we are trying to correlate different symptoms in order to increase the accuracy of diagnosing Parkinson’s. The dataset includes features such as jitter and stride. This data is analyzed using different classification techniques, thus providing a reliable and accurate approach to diagnosing Parkinson’s at an early stage.

1.1 OBJECTIVE

The main objective is to overcome the difficulty of recognizing constrained association rules for Parkinson’s illness prediction. Support vector machine techniques have been employed by various works in the literature to analyze various symptoms, for instance gait, tremor, micrographia and extreme nervousness. PD prediction is considered to be one of the basic data mining problems, which aims to discern collections of items, values or patterns that co-occur regularly in a dataset. The term Parkinson’s illness covers the various diseases that affect the brain and nervous system. This technique is used while prescribing for the patient, and this system predicts which remedy, in the form of medicines and medical tests, suits best.

1.2 PROBLEM DEFINITION

 Analyze the relation between the different symptoms of Parkinson’s, such as tremors, gait and stride.

 Classify the data using different classification algorithms by means of the Python software.

 Calculate the accuracy of each algorithm and deduce the most appropriate algorithm for the diagnosis (a minimal workflow sketch follows this list).
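For illustration, the following is a minimal sketch of this workflow using scikit-learn; the feature matrix X and label vector y are assumed to be already loaded and pre-processed, and all names are illustrative:

from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hold out a test set for an unbiased accuracy estimate
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit each candidate classifier and compare test accuracies
for name, model in [("SVM", LinearSVC()), ("Logistic Regression", LogisticRegression())]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))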
CHAPTER 2

LITERATURE SURVEY

2.1 Rahul R. Zaveri and Prof. Pramila M. Chawan, International Research Journal of Engineering and Technology, vol. 07, issue 10, 2020.

Dr. Anupam Bhatia and Raunak Sulekh [1]

“Predictive Model for Parkinson’s Disease through Naive Bayes Classification”. In this study, Naive Bayes was applied to predict the performance of the dataset. RapidMiner 7.6.001 was the tool used to explore, statistically analyze, and mine the data. The Naive Bayes model performs with 98.5% accuracy and 99.75% precision.

Carlo Ricciardi, et al [2]


“Using gait analysis parameters to classify Parkinsonism: a data mining approach”. In this system, Random Forest is used for classification and compared with Gradient Boosted Trees. The results are categorized into three different categories, namely PSP, de novo Parkinson’s Disease and stable Parkinson’s Disease, with Random Forest accuracy as high as 86.4%, compared to Gradient Boosted Trees, which were accurate to a meagre 70%. The precision of Random Forest reached a maximum of 90%, against a maximum of around 85% for Gradient Boosted Trees.

Mehrbakhsh Nilashi et al [3]

“A hybrid intelligent system for the prediction of Parkinson’s Disease progression using Machine Learning techniques”. In this system, a method was proposed for UPDRS (Total-UPDRS and Motor-UPDRS) prediction using machine learning. ISVR was used to predict the Total-UPDRS and Motor-UPDRS. SOM and NIPALS were used for clustering and data dimensionality reduction. The results show that the method combining the SOM, NIPALS and ISVR techniques was effective in predicting the Total-UPDRS and Motor-UPDRS.

Arvind Kumar Tiwari [4]

“Machine Learning based Approaches for Prediction of Parkinson’s Disease”. In this system, minimum redundancy maximum relevance feature selection algorithms were used to select the most important features for predicting Parkinson’s disease. This feature selection, together with Random Forests, provided an accuracy of 90.3% and a precision of 90.2%.

M. Abdar and M. Zomorodi-Moghadam [5]

“Impact of Patients’ Gender on Parkinson’s Disease using Classification Algorithms”. In this system, SVM and Bayesian networks were used to classify the data based on the gender of the patient. The accuracy for SVM was 90.98% and for the Bayesian network 88.62%. This test proved that the SVM algorithm had a great ability to identify the gender of a patient suffering from PD.

Dragana Miljkovic et al [6]

“Machine Learning and Data Mining Methods for Managing Parkinson’s Disease”. In this system, based on the initial patient examination and medications taken, the Predictor part was able to predict each Parkinson’s Disease symptom separately, covering 15 different Parkinson’s Disease symptoms in total. The accuracy of prediction ranges from 57.1% to 77.4% depending on the symptom, with the highest accuracy achieved for tremor detection.

Md. Redone Hassan et al [7]

“A Knowledge Base Data Mining based on Parkinson’s Disease”. In this system, the results and output of the support vector machine (SVM), K-nearest neighbor and decision tree algorithms were shown in the output section of the train data. The decision tree offered the highest precision of 78.2%.

Satish Srinivasan, Michael Martin & Abhishek Tripathi [8]

“ANN based Data Mining Analysis of Parkinson’s Disease”


In this study, it was intended to understand how different types of pre-processing steps could affect the prediction accuracy of the classifier. In the process of classifying the Parkinson’s Disease dataset using the ANN-based MLP classifier, a significantly high prediction accuracy was observed when the dataset was pre-processed using both the discretization and resample techniques, both in the case of 10-fold cross-validation and an 80:20 split. In the 70:30 split, it was found that the combination of the pre-processing steps, namely resampling and SMOTE, resulted in higher prediction accuracy using the MLP classifier. On an 80:20 split of the pre-processed (discretized and resampled) dataset, the ANN-based MLP classifier achieved 100% classification accuracy, with the F1-score and MCC both being 100%.

Ramzi M. Sadek et al [9]
“Parkinson’s Disease Prediction using Artificial Neural Network”. In this system, the 195 samples in the dataset were divided into 170 training samples and 25 validation samples. After importing the dataset into the Just Neural Network (JNN) environment, the Artificial Neural Network model was trained and validated. The most important attributes contributing to the ANN model were identified. The ANN model was 100% accurate.

2.2 Saykin, A. J., Shen, L., Foroud, T. M., Potkin, S. G., Swaminathan, S., Kim, S., et al. "Alzheimer's disease Neuroimaging Initiative biomarkers as quantitative phenotypes: genetics core aims, progress, and plans". Alzheimers Dement. vol. 6, no. 3, pp. 265-273, 2010.

The role of the Alzheimer’s Disease Neuroimaging Initiative Genetics


Core is to facilitate the investigation of genetic influences on disease onset
and trajectory as reflected in structural, functional, and molecular imaging
changes; fluid biomarkers; and cognitive status. Major goals include (1)
blood sample processing, genotyping, and dissemination, (2) genome-wide
association studies (GWAS) of longitudinal phenotypic data, and (3)
providing a central resource, point of contact and planning group for
genetics within the Alzheimer’s Disease Neuroimaging Initiative.

Genome-wide array data have been publicly released and updated, and
several neuroimaging GWAS have recently been reported examining
baseline magnetic resonance imaging measures as quantitative phenotypes.
Other preliminary investigations include copy number variation in mild
cognitive impairment and Alzheimer’s disease and GWAS of baseline
cerebrospinal fluid biomarkers and longitudinal changes on magnetic
resonance imaging.

Blood collection for RNA studies is a new direction. Genetic studies of longitudinal phenotypes hold promise for elucidating disease mechanisms and risk, development of therapeutic strategies, and refining selection criteria for clinical trials.

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM:

Clinical decisions are often made based on doctors’ intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service provided to patients.
❖ Use of wearable technologies through the implementation of Internet of
things.

❖ Handwriting as a marker for the diagnosis of PD using support vector machines, achieving an accuracy of 88.13%.

❖ Using 3D visualization techniques to provide an intuitive tool for the assessment of Parkinson’s.

❖ Visually guided tracking performance of PD patients using data mining techniques.

❖ Using Voice and speech data to detect Parkinson’s.

3.2 DISADVANTAGES

❖ Speech samples require speech segmentation and noise removal.

❖ Breath samples require dedicated sensors.

❖ Handwriting samples can be influenced by other factors.

❖ Considering a single symptom requires less calculation.

❖ Results and accuracy are based on a single symptom.

CHAPTER 4

SYSTEM DESIGN

4.1 PROPOSED SYSTEM

❖ Parkinson’s disease detection using gait, tremors and handwriting samples as the dataset, in order to increase the accuracy by finding the correlation between these symptoms.

❖ Individual analysis of every symptom has some drawback attached to it: for example, handwriting is a complex activity where other factors can influence motor movement; in speech recognition, additional steps such as noise removal and speech segmentation are required; and breath samples have been shown to fail to meet clinically relevant results.

❖ Thus, in order to avoid the above problems, we have included multiple symptoms rather than relying on just one of them.

ADVANTAGES

❖ No additional pre-processing steps are needed.

❖ No special sensors are required, and there is no need to solve the typical problems of acoustic signal acquisition and processing.

❖ Additional symptoms are taken into account.

❖ Although the analysis of multiple symptoms may require additional calculations, the results and accuracy are based on multiple correlated symptoms, making them more reliable.

4.2 ARCHITECTURAL DESIGN

SYSTEM ARCHITECTURE

Fig 1. Architecture diagram. The dataset is pre-processed and machine learning is applied: the SVM algorithm and logistic regression each train a model and predict the result.

4.3 SYSTEM ANALYSIS AND DESIGN

Introduction
Computer Aided Diagnosis is a rapidly growing, dynamic area of research in the medical industry. Recent research in machine learning promises improved accuracy of predictions; computers are enabled to think by developing intelligence through learning. There are many types of machine learning techniques, and these are used to classify data sets.

Requirement Analysis

The Software Requirement Specification (SRS) is the starting point of the software development activity. As systems grew more complex, it became evident that the goals of the entire system could not be easily comprehended; hence the need for the requirement phase arose. The software project is initiated by the client's needs. The SRS is the means of translating the ideas in the minds of the clients (the input) into a formal document (the output of the requirement phase).

Under requirement specification, the focus is on specifying what has been found during analysis; issues such as representation, specification languages and tools, and checking of the specifications are addressed during this activity.

The requirement phase terminates with the production of the validated SRS document. Producing the SRS document is the basic goal of this phase.

The purpose of the Software Requirement Specification is to reduce the communication gap between the clients and the developers. The Software Requirement Specification is the medium through which the client and user needs are accurately specified. It forms the basis of software development. A good SRS should satisfy all the parties involved in the system.

Functional Requirements

The proposed application should be able to predict Parkinson's disease from given vocal and keystroke symptoms. Prediction is performed by support vector machine (SVM) and logistic regression techniques.

Product Perspective

The application is developed in such a way that any future enhancement can
be easily implementable. The project is developed in such a way that it requires
minimal maintenance. The software used are open source and easy to install. The
application developed should be easy to install and use.

Product features

o The application is developed in such a way that disease prediction can be performed using the machine learning algorithms support vector machine and logistic regression.
o The dataset is taken from kaggle.com.
o We can compare the accuracy of the implemented algorithms.

User characteristics

The application is developed in such a way that it is:

❖ Easy to use

❖ Error free

❖ Usable with minimal or no training

❖ Suitable for regular patient monitoring

Assumption & Dependencies

It is considered that the dataset taken fulfils all the requirements.

Domain Requirements

This document is the only one that describes the requirements of the system. It is meant for use by the developers, and will also be the basis for validating the final delivered system. Any changes made to the requirements in the future will have to go through a formal change approval process.

User Requirements

The user can compare the prediction accuracies to decide which algorithm should be used in real-time predictions.

Non Functional Requirements

➢ Dataset collected should be in the CSV format

➢ The column values should be numerical values

➢ Test set are stored as CSV files

➢ Error rates can be calculated for prediction algorithms

Efficiency:

Less time is required for detection and prediction.

Reliability:
Maturity, fault tolerance and recoverability

Portability:
The software can easily be transferred to another environment, including installability.

Usability:
How easy it is to understand, learn and operate the software system

Organizational Requirements:
The required ports should not be blocked by the Windows firewall. An Internet connection should be available.

Implementation Requirements

An Internet connection is required to collect the dataset (taken from kaggle.com) and to install the related libraries.

Engineering Standard Requirements

User Interfaces
The user interface is developed in Python, which takes inputs such as the patient's symptom values.

Hardware Interfaces

Ethernet
Ethernet on the AS/400 supports TCP/IP, Advanced Peer-to-Peer
Networking (APPN) and advanced program-to-program communications (APPC).

ISDN
To connect the AS/400 to an Integrated Services Digital Network (ISDN) for faster, more accurate data transmission. An ISDN is a public or private digital communications network that can support data, fax, image, and other services over the same physical interface. Other protocols can be used on ISDN, such as IDLC and X.25.

Software Interfaces
No specific software interface is used.

Operational Requirements

• Economic
The developed product is economic as it is not required any hardware
interface etc.

• Environmental
Statements of fact and assumptions that define the expectations of the
system in terms of mission objectives, environment, constraints, and measures of
effectiveness and suitability (MOE/MOS). The customers are those that perform the
eight primary functions of systems engineering, with special emphasis on the
operator as the key customer.

• Health and Safety


The software may be safety-critical. If so, there are issues associated
with its integrity level. The software may not be safety-critical although it forms
part of a safety-critical system. For example, software may simply log transactions.
If a system must be of a high integrity level and if the software is shown to be of
that integrity level, then the hardware must be at least of the same integrity level.
There is little point in producing 'perfect' code in some language if hardware and
system software (in widest sense) are not reliable. If a computer system is to run

software of a high integrity level then that system should not at the same time
accommodate software of a lower integrity level. Systems with different
requirements for safety levels must be separated. Otherwise, the highest level of
integrity required must be applied to all systems in the same environment.

MODULE DESCRIPTION:

DATA PRE-PROCESSING
We have taken multiple symptoms in our case study, combining the patient’s dataset with the speech and keystroke datasets. Pre-processing of the dataset is done to convert the string attributes to numerals, and records with missing data are dropped. The pre-processed data is stored in the “newdata.csv” file, which is given as input to the machine learning models.
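A minimal sketch of this pre-processing step, assuming pandas and an abbreviated, illustrative value mapping (the full mapping appears in the appendix code):

import pandas as pd

# Load the combined patient, speech and keystroke dataset
df = pd.read_csv("Output/fulldata.csv")
# Convert string attributes to numerals (mapping shown here is abbreviated)
df = df.replace({"True": 1, "False": 0, "Male": 0, "Female": 1})
# Drop records with missing data
df = df.dropna()
# Store the pre-processed data as input for the machine learning models
df.to_csv("Output/newdata.csv", index=False)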

SUPPORT VECTOR MACHINES ALGORITHM

A Support Vector Machine is a supervised learning algorithm. An SVM models the data into k categories, performing classification by forming an N-dimensional hyperplane. These models are very similar to neural networks. Consider a dataset of N dimensions. The SVM plots the training data into an N-dimensional space. The training data points are then divided into k different regions, depending on their labels, by hyperplanes. After the training phase is complete, the test points are plotted in the same N-dimensional space. Depending on which region the points are located in, they are appropriately classified.
We split our dataset into train and test sets and fit the SVM model, as given below.

X_train, X_test, Y_train, Y_test = train_test_split(X_all, y_all)

clf = svm.LinearSVC()
clf.fit(X_train,Y_train)
pred = clf.predict(X_test)

The result is stored in a CSV file using the code below:

result2=open("Output/resultSVM.csv","w")
result2.write("ID,Predicted Value" + "\n")
for j in range(len(pred)):
    result2.write(str(j+1) + "," + str(pred[j]) + "\n")
result2.close()
The output is stored in a CSV file.
Logistic Regression

Logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. When selecting the model for a logistic regression analysis, another important consideration is the model fit. Adding independent variables to a logistic regression model will always increase the amount of variance explained. A pseudo-R² value is also available to indicate the adequacy of the regression model.
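As an illustration, a minimal sketch of one common pseudo-R² (McFadden’s variant, an assumption since the text does not name a specific one) compares the fitted model’s log-likelihood with that of an intercept-only null model:

import numpy as np
from sklearn.metrics import log_loss

def mcfadden_pseudo_r2(y_true, p_pred):
    # log_loss returns the negative log-likelihood when normalize=False
    ll_model = -log_loss(y_true, p_pred, normalize=False)
    # Null model: predict the base rate of the positive class for everyone
    p_null = np.full(len(y_true), np.mean(y_true))
    ll_null = -log_loss(y_true, p_null, normalize=False)
    return 1.0 - ll_model / ll_null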

We split our dataset into train and test sets and fit the logistic regression model, as given below.

X_train, X_test, Y_train, Y_test = train_test_split(X_all, y_all)


sclf = linear_model.LogisticRegression(C=1e5)
sclf.fit(X_train,Y_train)
pred = sclf.predict(X_test)

result2=open("Output/resultLogisticRegression.csv","w")
result2.write("ID,Predicted Value" + "\n")
for j in range(len(pred)):
result2.write(str(j+1) + "," + str(pred[j]) + "\n")
result2.close()

The output is stored in a CSV file.
Flow chart of the Logistic Regression algorithm:

1. Start
2. Input: training data and testing data
3. Compute the regression coefficients of the training data
4. Apply the sigmoid function
5. Find the relationship between the training data and the testing data
6. Output: the predicted result
7. End
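A minimal sketch of the sigmoid step, assuming the regression coefficients w and intercept b have already been computed from the training data (names are illustrative):

import numpy as np

def sigmoid(z):
    # Map any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b, threshold=0.5):
    # Probability of the positive class (Parkinson's) for each row of X
    p = sigmoid(X @ w + b)
    # Class label 1 if the probability exceeds the threshold, else 0
    return (p >= threshold).astype(int)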
DATA FLOW DIAGRAM

Level 0

The user provides input values; the data is pre-processed and split; a regressor/classifier is trained; the trained model then takes the test input and produces the predicted result.

Level 1

The same flow, with the regressor/classifier refined into the two concrete models, regression and SVM.
UML DIAGRAM

USE CASE DIAGRAM

Figure: Use case diagram. The use cases are: load the dataset CSV, data cleaning and pre-processing, split into train and test sets, train the model, provide test input, apply the algorithm, and view the predicted results.

SEQUENCE DIAGRAM

Figure: Sequence diagram. The dataset is loaded, pre-processing is applied to produce the train and test sets, and the ML algorithm analyzes both.

ACTIVITY DIAGRAM

Figure: Activity diagram. The input dataset is pre-processed and split into a train set and a test set; the ML model is trained on the train set and produces the predicted result for the test input.

ER DIAGRAM

Figure: ER diagram. The dataset entity is linked to the vocal attributes (MDVP:Fo, MDVP:Fhi, MDVP:Flo, MDVP:Jitter, MDVP:Shimmer, NHR, HNR), which undergo vocal frequency, frequency variation, vocal component and HNR/NHR checks, and to the keystroke attributes (hand, hold time, latency, direction), which together determine the result.
IMPLEMENTATION DETAILS

DATASET
We considered multiple symptoms of patients, such as speech and keystroke data, for Parkinson’s disease prediction.

The attribute columns considered are as follows:

Attribute name : Attribute description
name : ASCII subject name and recording number
MDVP:Fo(Hz) : Average vocal fundamental frequency
MDVP:Fhi(Hz) : Maximum vocal fundamental frequency
MDVP:Flo(Hz) : Minimum vocal fundamental frequency
MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP : Several measures of variation in fundamental frequency
MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA : Several measures of variation in amplitude
NHR, HNR : Two measures of the ratio of noise to tonal components in the voice
status : Health status of the subject: (one) Parkinson's, (zero) healthy
RPDE, D2 : Two nonlinear dynamical complexity measures
DFA : Signal fractal scaling exponent
spread1, spread2, PPE : Three nonlinear measures of fundamental frequency variation
UserKey : 10-character code for that user
Date : YYMMDD
Timestamp : HH:MM:SS.SSS
Hand : L or R key pressed
Hold time : Time between press and release for the current key, in milliseconds
Direction : Previous to current key: LL, LR, RL, RR (and S for a space key)
Latency time : Time between pressing the previous key and pressing the current key, in milliseconds
Flight time : Time between release of the previous key and press of the current key, in milliseconds
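For illustration, a minimal sketch of how the keystroke timing features could be derived from hypothetical per-key press and release timestamp columns (the column names press and release are assumptions, not part of the dataset):

import pandas as pd

def add_timing_features(df):
    # Hold time: release minus press for the current key, in milliseconds
    df["Hold time"] = (df["release"] - df["press"]).dt.total_seconds() * 1000
    # Latency time: press of the current key minus press of the previous key
    df["Latency time"] = df["press"].diff().dt.total_seconds() * 1000
    # Flight time: press of the current key minus release of the previous key
    df["Flight time"] = (df["press"] - df["release"].shift(1)).dt.total_seconds() * 1000
    return df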

CHAPTER 5

SYSTEM REQUIREMENTS

5.1 HARDWARE AND SOFTWARE SPECIFICATION

HARDWARE REQUIREMENTS
Processor : Any Processor above 500 MHz.
Ram : 4 GB
Hard Disk : 4 GB
Input device : Standard Keyboard and Mouse.
Output device : VGA and High Resolution Monitor.

SOFTWARE REQUIREMENTS
Operating System : Windows 7 or higher
Programming : Python 3.6 and related libraries

5.2 SOFTWARE DESCRIPTION

PYTHON
Python is an interpreted high-level programming language for general-
purpose programming. Created by Guido van Rossum and first released in 1991,
Python has a design philosophy that emphasizes code readability, notably using
significant whitespace. It provides constructs that enable clear programming on both
small and large scales.
Python features a dynamic type system and automatic memory
management. It supports multiple programming paradigms, including object-
oriented, imperative, functional and procedural, and has a large and comprehensive
standard library.
Python interpreters are available for many operating systems. CPython, the
reference implementation of Python, is open source software and has a community-
based development model, as do nearly all of its variant implementations. CPython is
managed by the non-profit Python Software Foundation.

Features and philosophy


Python is a multi-paradigm programming language. Object-oriented
programming and structured programming are fully supported, and many of its
features support functional programming and aspect-oriented programming
(including by meta programming and meta objects (magic methods)). Many other
paradigms are supported via extensions, including design by contract and logic
programming.
Python uses dynamic typing, and a combination of reference counting and
a cycle-detecting garbage collector for memory management. It also features
dynamic name resolution (late binding), which binds method and variable names
during program execution.
Python's design offers some support for functional programming in the Lisp

tradition. It has filter(), map(), and reduce() functions; list comprehensions,
dictionaries, and sets; and generator expressions. The standard library has two
modules (itertools and functools) that implement functional tools borrowed from
Haskell and Standard ML.
The language's core philosophy is summarized in the document The Zen of
Python (PEP 20), which includes aphorisms such as:

➢ Beautiful is better than ugly

➢ Explicit is better than implicit

➢ Simple is better than complex

➢ Complex is better than complicated

➢ Readability counts

Rather than having all of its functionality built into its core, Python was
designed to be highly extensible. This compact modularity has made it particularly
popular as a means of adding programmable interfaces to existing applications. Van
Rossum's vision of a small core language with a large standard library and easily
extensible interpreter stemmed from his frustrations with ABC, which espoused the
opposite approach.

While offering choice in coding methodology, the Python philosophy rejects


exuberant syntax (such as that of Perl) in favor of a simpler, less-cluttered grammar.
As Alex Martelli put it: "To describe something as 'clever' is not considered a
compliment in the Python culture." Python's philosophy rejects the Perl "there is
more than one way to do it" approach to language design in favor of "there should be
one—and preferably only one—obvious way to do it".

Python's developers strive to avoid premature optimization, and reject patches to
non-critical parts of CPython that would offer marginal increases in speed at the cost
of clarity. When speed is important, a Python programmer can move time-critical
functions to extension modules written in languages such as C, or use PyPy, a just-
in-time compiler. Cython is also available, which translates a Python script into C
and makes direct C-level API calls into the Python interpreter.

An important goal of Python's developers is keeping it fun to use. This is


reflected in the language's name—a tribute to the British comedy group Monty
Python and in occasionally playful approaches to tutorials and reference materials,
such as examples that refer to spam and eggs (from a famous Monty Python sketch)
instead of the standard foo and bar.

A common neologism in the Python community is pythonic, which can


have a wide range of meanings related to program style. To say that code is pythonic
is to say that it uses Python idioms well, that it is natural or shows fluency in the
language, that it conforms with Python's minimalist philosophy and emphasis on
readability. In contrast, code that is difficult to understand or reads like a rough
transcription from another programming language is called un-pythonic.
Users and admirers of Python, especially those considered knowledgeable or experienced, are often referred to as Pythonists, Pythonistas, and Pythoneers.

Syntax and semantics

Python is meant to be an easily readable language. Its formatting is visually


uncluttered, and it often uses English keywords where other languages use
punctuation. Unlike many other languages, it does not use curly brackets to delimit
blocks, and semicolons after statements are optional. It has fewer syntactic
exceptions and special cases than C or Pascal.
Indentation

Python uses whitespace indentation, rather than curly brackets or keywords,


to delimit blocks. An increase in indentation comes after certain statements; a
decrease in indentation signifies the end of the current block. This feature is also
sometimes termed the off-side rule.

Statements and control flow

Python's statements include (among others):


The assignment statement (token '=', the equals sign). This operates
differently than in traditional imperative programming languages, and this
fundamental mechanism (including the nature of Python's version of variables)
illuminates many other features of the language. Assignment in C, e.g., x = 2,
translates to "typed variable name x receives a copy of numeric value 2". The (right-
hand) value is copied into an allocated storage location for which the (left-hand)
variable name is the symbolic address. The memory allocated to the variable is large
enough (potentially quite large) for the declared type. In the simplest case of Python
assignment, using the same example, x = 2, translates to "(generic) name x receives a
reference to a separate, dynamically allocated object of numeric (int) type of value
2." This is termed binding the name to the object.

Since the name's storage location doesn't contain the indicated value, it is
improper to call it a variable. Names may be subsequently rebound at any time to
objects of greatly varying types, including strings, procedures, complex objects with
data and methods, etc. Successive assignments of a common value to multiple
names, e.g., x = 2; y = 2; z = 2 result in allocating storage to (at most) three names
and one numeric object, to which all three names are bound. Since a name is a
generic reference holder it is unreasonable to associate a fixed data type with it.
However at a given time a name will be bound to some object, which will have a
type; thus there is dynamic typing.
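A short example of this binding behavior:

x = 2          # the name x is bound to an int object with value 2
y = x          # y is bound to the same object, not to a copy
print(x is y)  # True: both names refer to one object
x = "spam"     # rebinding: x now refers to a str object; y still refers to 2
print(y)       # 2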

o The if statement, which conditionally executes a block of code, along


with else and elif (a contraction of else-if).
o The for statement, which iterates over an iterable object, capturing each
element to a local variable for use by the attached block.
o The while statement, which executes a block of code as long as its
condition is true.
o The try statement, which allows exceptions raised in its attached code
block to be caught and handled by except clauses; it also ensures that
clean-up code in a finally block will always be run regardless of how
the block exits.
o The class statement, which executes a block of code and attaches its
local namespace to a class, for use in object-oriented programming.
o The def statement, which defines a function or method.
The with statement, from Python 2.5 (released in September 2006), encloses a code block within a context manager (for example, acquiring a lock before the block of code is run and releasing the lock afterwards, or opening a file and then closing it), allowing Resource Acquisition Is Initialization (RAII)-like behavior and replacing a common try/finally idiom.
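For example, the common try/finally idiom for files and its with-statement equivalent:

# try/finally idiom
f = open("data.txt")
try:
    contents = f.read()
finally:
    f.close()

# equivalent using a context manager
with open("data.txt") as f:
    contents = f.read()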

The pass statement, which serves as a NOP. It is syntactically needed to


create an empty code block. The assert statement, used during debugging to check
for conditions that ought to apply. The yield statement, which returns a value from a
generator function. From Python 2.5, yield is also an operator. This form is used to
implement coroutines.
The import statement, which is used to import modules whose functions or
variables can be used in the current program. There are four ways of using import:
import <module name> or from <module name> import * or import numpy as np or
from numpy import pi as Pie.

The print statement was changed to the print() function in Python 3.


Python does not support tail call optimization or first-class continuations,
and, according to Guido van Rossum, it never will. However, better support for
coroutine-like functionality is provided in 2.5, by extending Python's generators.
Before 2.5, generators were lazy iterators; information was passed unidirectionally
out of the generator. From Python 2.5, it is possible to pass information back into a
generator function, and from Python 3.3, the information can be passed through
multiple stack levels.

Expressions

Some Python expressions are similar to languages such as C and Java, while
some are not:
Addition, subtraction, and multiplication are the same, but the behavior of division differs. There are two types of division in Python: floor division (//) and true, floating-point division (/). Python also added the ** operator for exponentiation.
From Python 3.5, the new @ infix operator was introduced. It is intended to
be used by libraries such as NumPy for matrix multiplication.
In Python, == compares by value, versus Java, which compares numerics
by value and objects by reference. (Value comparisons in Java on objects can be
performed with the equals() method.) Python's is operator may be used to compare
object identities (comparison by reference). In Python, comparisons may be chained,
for example a <= b <= c.
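For example:

a = [1, 2]
b = [1, 2]
print(a == b)       # True: equal values
print(a is b)       # False: distinct objects
print(1 <= 2 <= 3)  # True: chained comparison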
Python uses the words and, or, not for its boolean operators rather than the
symbolic &&, ||, ! used in Java and C.
Python has a type of expression termed a list comprehension. Python 2.4
extended list comprehensions into a more general expression termed a generator
expression.

Anonymous functions are implemented using lambda expressions; however,


these are limited in that the body can only be one expression.

Conditional expressions in Python are written as x if c else y (different in


order of operands from the c ? x : y operator common to many other languages).

Python makes a distinction between lists and tuples. Lists are written as [1,
2, 3], are mutable, and cannot be used as the keys of dictionaries (dictionary keys
must be immutable in Python). Tuples are written as (1, 2, 3), are immutable and
thus can be used as the keys of dictionaries, provided all elements of the tuple are
immutable. The + operator can be used to concatenate two tuples, which does not
directly modify their contents, but rather produces a new tuple containing the
elements of both provided tuples. Thus, given the variable t initially equal to (1, 2,
3), executing t = t + (4, 5) first evaluates t + (4, 5), which yields (1, 2, 3, 4, 5), which
is then assigned back to t, thereby effectively "modifying the contents" of t, while
conforming to the immutable nature of tuple objects. Parentheses are optional for
tuples in unambiguous contexts.

Python features sequence unpacking where multiple expressions, each


evaluating to anything that can be assigned to (a variable, a writable property, etc.),
are associated in the identical manner to that forming tuple literals and, as a whole,
are put on the left-hand side of the equal sign in an assignment statement. The
statement expects an iterable object on the right hand side of the equal sign that
produces the same number of values as the provided writable expressions when
iterated through, and will iterate through it, assigning each of the produced values to
the corresponding expression on the left.

Python has a "string format" operator %. This functions analogous to printf


format strings in C, e.g. "spam=%s eggs=%d" % ("blah", 2) evaluates to "spam=blah
eggs=2". In Python 3 and 2.6+, this was supplemented by the format() method of the
str class, e.g. "spam={0} eggs={1}".format("blah", 2). Python 3.6 added "f-strings":
blah = "blah"; eggs = 2; f'spam={blah} eggs={eggs}'.

Python has various kinds of string literals:

Strings delimited by single or double quote marks. Unlike in Unix shells,


Perl and Perl-influenced languages, single quote marks and double quote marks
function identically. Both kinds of string use the backslash (\) as an escape character.
String interpolation became available in Python 3.6 as "formatted string literals".[5]

Triple-quoted strings, which begin and end with a series of three single or
double quote marks. They may span multiple lines and function like here documents
in shells, Perl and Ruby.

Raw string varieties, denoted by prefixing the string literal with an r. Escape
sequences are not interpreted; hence raw strings are useful where literal backslashes
are common, such as regular expressions and Windows-style paths. Compare "@-
quoting" in C#.

Python has array index and array slicing expressions on lists, denoted as
a[key], a[start:stop] or a[start:stop:step]. Indexes are zero-based, and negative
indexes are relative to the end. Slices take elements from the start index up to, but
not including, the stop index. The third slice parameter, called step or stride, allows
elements to be skipped and reversed. Slice indexes may be omitted, for example a[:]
returns a copy of the entire list. Each element of a slice is a shallow copy.
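For example:

a = [0, 1, 2, 3, 4, 5]
print(a[1:4])    # [1, 2, 3]: from start up to, but not including, stop
print(a[-2:])    # [4, 5]: negative indexes count from the end
print(a[::2])    # [0, 2, 4]: every second element
print(a[::-1])   # [5, 4, 3, 2, 1, 0]: reversed
b = a[:]         # shallow copy of the entire list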

In Python, a distinction between expressions and statements is rigidly


enforced, in contrast to languages such as Common Lisp, Scheme, or Ruby. This
leads to duplicating some functionality. For example:

List comprehensions vs. for-loops


Conditional expressions vs. if blocks
The eval() vs. exec() built-in functions (in Python 2, exec is a statement);
the former is for expressions, the latter is for statements.
Statements cannot be a part of an expression, so list and other
comprehensions or lambda expressions, all being expressions, cannot contain
statements. A particular case of this is that an assignment statement such as a = 1
cannot form part of the conditional expression of a conditional statement. This has
the advantage of avoiding a classic C error of mistaking an assignment operator = for
an equality operator == in conditions: if (c = 1) { ... } is syntactically valid (but
probably unintended) C code but if c = 1: ... causes a syntax error in Python.

Methods
Methods on objects are functions attached to the object's class; the syntax
instance.method(argument) is, for normal methods and functions, syntactic sugar for
Class.method(instance, argument). Python methods have an explicit self parameter to
access instance data, in contrast to the implicit self (or this) in some other object-
oriented programming languages.
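For example:

class Greeter:
    def greet(self, name):
        return "hello " + name

g = Greeter()
print(g.greet("spam"))           # hello spam
print(Greeter.greet(g, "spam"))  # the same call, written explicitly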

Typing
Python uses duck typing and has typed objects but untyped variable names.
Type constraints are not checked at compile time; rather, operations on an object
may fail, signifying that the given object is not of a suitable type. Despite being
dynamically typed, Python is strongly typed, forbidding operations that are not well-
defined (for example, adding a number to a string) rather than silently attempting to
make sense of them.
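For example:

x = "3" + "4"  # '34': concatenating two strings is well-defined
y = 3 + 4      # 7: adding two numbers is well-defined
z = "3" + 4    # raises TypeError: str and int are not silently mixed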

Python allows programmers to define their own types using classes, which are
most often used for object-oriented programming. New instances of classes are
constructed by calling the class (for example, SpamClass() or EggsClass()), and the
classes are instances of the metaclass type (itself an instance of itself), allowing
metaprogramming and reflection.

CHAPTER 6
CONCLUSION

RESULT AND DISCUSSION


In our implementation, Parkinson's disease prediction is done using two machine learning algorithms, logistic regression and support vector machine. The results show that the logistic regression model achieves higher accuracy than the support vector machine. The following table shows the implemented algorithms and the accuracy achieved, representing the performance of the two algorithms over the considered dataset.

Algorithm             Accuracy (%)
SVM                   81.81
Logistic Regression   88.63

Table: Accuracy comparison of the proposed machine learning algorithms

The following charts show the error values, namely the mean squared error (MSE), mean absolute error (MAE), root mean square error (RMSE) and the R-squared value.

CONCLUSION

The use of multiple instance learning for detecting Parkinson’s disease symptoms was studied. The proposed work addressed the formulation of PD symptom detection from weakly labeled data as a semi-supervised multiple instance learning problem. The features were carefully chosen to address the subject-specific and symptom-specific nature of the problem. We show promising preliminary results on four days of monitoring performed with two PD subjects.

FUTURE ENHANCEMENTS

In future work, we plan to increase our subject pool and utilize optimal
feature selection strategies under MIL frameworks for developing robust person-
specific models. These techniques can potentially be adapted to various other
physiological sensing and monitoring applications as well.

REFERENCES

 [1] Dragana Miljkovic et al, “Machine Learning and Data Mining Methods for
Managing Parkinson’s Disease” LNAI 9605, pp 209-220, 2016.
 [2] Arvind Kumar Tiwari, “Machine Learning based Approaches for
Prediction of Parkinson’s Disease,” Machine Learning and Applications- An
International Journal (MLAU) vol. 3, June 2016.
 [3] Dr. Anupam Bhatia and Raunak Sulekh, “Predictive Model for
Parkinson’s Disease through Naive Bayes Classification” International Journal
of Computer Science & Communication vol. 9, March 2018.
 [4] M. Abdar and M. Zomorodi-Moghadam, “Impact of Patients’ Gender on
Parkinson’s disease using Classification Algorithms” Journal of AI and Data
Mining, vol. 6, 2018.
 [5] Md. Redone Hassan et al, “A Knowledge Base Data Mining based on
Parkinson’s Disease” International Conference on System Modelling &
Advancement in Research Trends, 2019.

APPENDIX

SAMPLE SCREENSHOT

Fig 2. Dataset

SAMPLE CODE
DATA PREPROCESS:

import os
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn import linear_model, datasets
from sklearn import svm

from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve
from matplotlib.colors import ListedColormap
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.metrics import roc_curve, auc

newdata=[]

with open("Output/fulldata.csv", "r") as f:


reader = csv.reader(f, delimiter="\t")
for lines in reader:
#print(lines)
for m in lines:
n=m.split(",")
a=[]
c=0
for x in n:
c=c+1
48
if (c!=1 and c!=25 and c!=28 and c!=38 and c!=40):
x=x.replace(" ", "")
x=x.replace("\"", "")
if x=="True":
a.append(1)
elif x=="False":
a.append(0)
elif x=="":
a.append(0)
elif x=="Male":
a.append(0)
elif x=="Female":
a.append(1)
elif x=="None":
a.append(0)
elif x=="Left":
a.append(1)
elif x=="Right":
a.append(2)
elif x=="Don'tknow":
a.append(0)
elif x=="------":
a.append(0)
elif x=="Mild":
a.append(1)
elif x=="Medium":
a.append(2)
elif x=="Severe":
a.append(3)
elif x=="L":
a.append(0)
elif x=="R":
a.append(1)
elif x=="S":
a.append(2)
elif x=="LL":
a.append(0)
elif x=="LR":
a.append(1)
elif x=="LS":
a.append(2)
elif x=="RL":
a.append(3)
elif x=="RR":
49
a.append(4)
elif x=="RS":
a.append(5)
elif x=="SL":
a.append(6)
elif x=="SR":
a.append(7)
elif x=="SS":
a.append(8)
else:
a.append(x)
newdata.append(a)

#print(newdata)

with open("Output/newdata.csv", 'w', newline='') as myfile:


wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
for m in newdata:
#print(m)
wr.writerow(m)

df_main = pd.read_table("Output/newdata.csv", sep=',')

df_main.astype(float)
# Normalize values to range [0:1]
df_main /= df_main.max()
# split data into independent and dependent variables
y_all=df_main.iloc[:,16]
X_all = df_main.drop(df_main.columns[[16]], axis=1)

fig, axs = plt.subplots(nrows=1, ncols=1, sharey=False, figsize=(10,5))


axs.set_xlabel('No Disease Disease')
axs.set_title('DataSet')
axs.grid()
axs.hist(y_all)
axs.get_children()[0].set_color('g')
axs.get_children()[9].set_color('r')
plt.show()

ncols=3
plt.clf()
f = plt.figure(1)
f.suptitle(" Data Histograms", fontsize=12)
vlist = list(df_main.columns)
nrows = len(vlist) // ncols
if len(vlist) % ncols > 0:
    nrows += 1
for i, var in enumerate(vlist):
    plt.subplot(nrows, ncols, i+1)
    plt.hist(df_main[var].values, bins=15)
    plt.title(var, fontsize=10)
    plt.tick_params(labelbottom='off', labelleft='off')
plt.tight_layout()
plt.subplots_adjust(top=0.88)
plt.show()

SVM AND LOGISTIC REGRESSION:

import os
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn import linear_model, datasets
from sklearn import svm

from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve
from matplotlib.colors import ListedColormap
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.metrics import roc_curve, auc

mse=[]
mae=[]
rsq=[]
rmse=[]
acy=[]
df_main = pd.read_table("Output/newdata.csv", sep=',')

df_main.astype(float)
# Normalize values to range [0:1]
df_main /= df_main.max()
# split data into independent and dependent variables
y_all=df_main.iloc[:,16]
X_all = df_main.drop(df_main.columns[[16]], axis=1)

X_train, X_test, Y_train, Y_test = train_test_split(X_all, y_all)

clf = svm.LinearSVC()
clf.fit(X_train,Y_train)
pred = clf.predict(X_test)

result2=open("Output/resultSVM.csv","w")
result2.write("ID,Predicted Value" + "\n")
for j in range(len(pred)):
result2.write(str(j+1) + "," + str(pred[j]) + "\n")
result2.close()

print("---------------------------------------------------------")
print("MSE VALUE FOR SVM IS %f " % mean_squared_error(Y_test, pred))
print("MAE VALUE FOR SVM IS %f " % mean_absolute_error(Y_test, pred))
print("R-SQUARED VALUE FOR SVM IS %f " % r2_score(Y_test, pred))
rms = np.sqrt(mean_squared_error(Y_test, pred))
print("RMSE VALUE FOR SVM IS %f " % rms)
ac=accuracy_score(Y_test,pred) * 100
print ("ACCURACY VALUE SVM IS %f" % ac)
print("---------------------------------------------------------")
mse.append(mean_squared_error(Y_test, pred))
mae.append(mean_absolute_error(Y_test, pred))
rsq.append(r2_score(Y_test, pred))
rmse.append(rms)
acy.append(ac)

X_train, X_test, Y_train, Y_test = train_test_split(X_all, y_all)


sclf = linear_model.LogisticRegression(C=1e5)
sclf.fit(X_train,Y_train)
pred = sclf.predict(X_test)
result2=open("Output/resultLogisticRegression.csv","w")
result2.write("ID,Predicted Value" + "\n")
for j in range(len(pred)):
result2.write(str(j+1) + "," + str(pred[j]) + "\n")
result2.close()

print("---------------------------------------------------------")
print("MSE VALUE FOR Logistic Regression IS %f " % mean_squared_error(Y_test,
pred))
print("MAE VALUE FOR Logistic Regression IS %f " %
mean_absolute_error(Y_test, pred))
print("R-SQUARED VALUE FOR Logistic Regression IS %f " % r2_score(Y_test,
pred))
rms = np.sqrt(mean_squared_error(Y_test, pred))
print("RMSE VALUE FOR Logistic Regression IS %f " % rms)
ac=accuracy_score(Y_test,pred) * 100
print ("ACCURACY VALUE Logistic Regression IS %f" % ac)
print("---------------------------------------------------------")
mse.append(mean_squared_error(Y_test, pred))
mae.append(mean_absolute_error(Y_test, pred))
rsq.append(r2_score(Y_test, pred))
rmse.append(rms)
acy.append(ac)

al = ['SVM','Logistic Regression']

result2=open('Output/MSE.csv', 'w')
result2.write("Algorithm,MSE" + "\n")
for i in range(0,len(mse)):
    result2.write(al[i] + "," +str(mse[i]) + "\n")
result2.close()

colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#8c564b"]


explode = (0.1, 0, 0, 0, 0)

#Barplot for the dependent variable


fig = plt.figure(0)
df = pd.read_csv('Output/MSE.csv')
acc = df["MSE"]
alc = df["Algorithm"]
plt.bar(alc,acc,align='center', alpha=0.5,color=colors)
plt.xlabel('Algorithm')
plt.ylabel('MSE')
plt.title("MSE Value");
fig.savefig('Output/MSE.png')
plt.show()

result2=open('Output/MAE.csv', 'w')
result2.write("Algorithm,MAE" + "\n")
for i in range(0,len(mae)):
    result2.write(al[i] + "," +str(mae[i]) + "\n")
result2.close()

fig = plt.figure(0)
df = pd.read_csv('Output/MAE.csv')
acc = df["MAE"]
alc = df["Algorithm"]
plt.bar(alc,acc,align='center', alpha=0.5,color=colors)
plt.xlabel('Algorithm')
plt.ylabel('MAE')
plt.title('MAE Value')
fig.savefig('Output/MAE.png')
plt.show()

result2=open('Output/R-SQUARED.csv', 'w')
result2.write("Algorithm,R-SQUARED" + "\n")
for i in range(0,len(rsq)):
    result2.write(al[i] + "," +str(rsq[i]) + "\n")
result2.close()

fig = plt.figure(0)
df = pd.read_csv('Output/R-SQUARED.csv')
acc = df["R-SQUARED"]
alc = df["Algorithm"]
colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#8c564b"]
explode = (0.1, 0, 0, 0, 0)
plt.bar(alc,acc,align='center', alpha=0.5,color=colors)
plt.xlabel('Algorithm')
plt.ylabel('R-SQUARED')
plt.title('R-SQUARED Value')
fig.savefig('Output/R-SQUARED.png')
plt.show()

result2=open('Output/RMSE.csv', 'w')
result2.write("Algorithm,RMSE" + "\n")
for i in range(0,len(rmse)):
    result2.write(al[i] + "," +str(rmse[i]) + "\n")
result2.close()

fig = plt.figure(0)
df = pd.read_csv('Output/RMSE.csv')
acc = df["RMSE"]
alc = df["Algorithm"]
plt.bar(alc, acc, align='center', alpha=0.5,color=colors)
plt.xlabel('Algorithm')
plt.ylabel('RMSE')
plt.title('RMSE Value')
fig.savefig('Output/RMSE.png')
plt.show()

result2=open('Output/Accuracy.csv', 'w')
result2.write("Algorithm,Accuracy" + "\n")
for i in range(0,len(acy)):
    result2.write(al[i] + "," +str(acy[i]) + "\n")
result2.close()

fig = plt.figure(0)
df = pd.read_csv('Output/Accuracy.csv')
acc = df["Accuracy"]
alc = df["Algorithm"]
plt.bar(alc, acc, align='center', alpha=0.5,color=colors)
plt.xlabel('Algorithm')
plt.ylabel('Accuracy')
plt.title('Accuracy Value')
fig.savefig('Output/Accuracy.png')
plt.show()
