DBMS LAB REPORT SAMPLE

DEPARTMENT OF ELECTRONICS & COMPUTER ENGINEERING
SREENIDHI INSTITUTE OF SCIENCE & TECHNOLOGY (AUTONOMOUS)

CERTIFICATE

This is to certify that the project work PARKINSON'S DISEASE PREDICTION, submitted by B. SAHITHI SRIYA (20311A19D5), N. NAVANITHA (20311A19D9) and G. PRADYUMN (20311A19C4) towards partial fulfillment of the requirements for the award of the Bachelor's Degree in Electronics & Computer Engineering from Sreenidhi Institute of Science & Technology, Ghatkesar, Hyderabad, is a record of bonafide work done by them. The results embodied in this work have not been submitted to any other University or Institute for the award of any degree or diploma.

Dr. D. Mohan, HOD, ECM
K. Sreelatha, Assistant Professor, ECM

LIST OF FIGURES
1. Accuracy of training data
2. Accuracy of test data
3. Prediction of disease

1. INTRODUCTION

Parkinson's disease (PD) is the second most common complex neurodegenerative disorder worldwide. Both polygenic and environmental factors can cause PD. It is found that in about 1%-2% of all PD cases (mainly familial), the disease development occurs through a single gene. The main symptoms of PD are bradykinesia (motor features), muscle stiffness, and tremor, along with other symptoms such as sleep disorders (non-motor features), cardiac arrhythmia, and constipation. Alteration of voice and speech is one of the features of PD. The Unified Parkinson's Disease Rating Scale (UPDRS), which captures the presence and severity of symptoms, is mainly used for tracking PD symptom progression. UPDRS is considered a well-validated test and the most widely used clinical rating scale for patients with PD. UPDRS includes four sections: UPDRS I, UPDRS II, UPDRS III, and UPDRS IV, which evaluate psychiatric symptoms in PD, activities of daily living, motor symptoms measured by physical exam, and complications of treatment, respectively. In many studies, this scale is considered in terms of Total-UPDRS, with a range of 0-176 (176 indicating total disability and 0 representing healthy), and Motor-UPDRS, which covers the UPDRS motor section with a range of 0-108 (108 indicating severe motor impairment and 0 indicating a healthy state).

The goal of this work is to present a comparison of machine learning approaches for remote tracking of Parkinson's disease progression. The comparative study is based on clustering and prediction learning approaches. To further improve the accuracy of UPDRS prediction, this study uses ensemble learning in the final stage of the proposed method. Ensemble learning approaches have proven to be effective in prediction tasks, but few studies have incorporated them in the development of disease diagnosis systems, and further investigation of their effectiveness in UPDRS prediction is needed. Accordingly, we use ensembles of support vector regression and different clustering techniques for PD data clustering. The results are then compared with other prediction learning approaches: deep belief networks (DBN), support vector regression, multiple linear regression, and neuro-fuzzy techniques.
2. ML ALGORITHMS

1. K-Nearest Neighbors (KNN): KNN is a simple and intuitive algorithm used for classification and regression tasks. It classifies objects based on the majority vote of their neighbors. Given a new data point, it identifies its nearest neighbors in the training set based on a chosen distance metric (e.g., Euclidean distance) and assigns the class label (for classification) or predicts the value (for regression) based on the labels/values of those neighbors (a small from-scratch sketch of this voting rule is given after this list).

2. Support Vector Machines (SVM): SVM is a powerful supervised learning algorithm used for both classification and regression tasks. Its primary objective in classification is to find the hyperplane that best separates the classes while maximizing the margin between them. It works well in high-dimensional spaces and can handle non-linear decision boundaries by using kernel functions to map data into higher-dimensional spaces.

3. Random Forest: Random Forest is an extremely popular supervised machine learning algorithm used for classification and regression problems. Just as a forest comprises numerous trees and becomes more robust as trees are added, the greater the number of trees in a Random Forest, the higher its accuracy and problem-solving ability. Random Forest is a classifier that builds several decision trees on different subsets of the given dataset and combines their outputs to improve the predictive accuracy on that dataset. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and improve the performance of the model.

4. Logistic Regression: Logistic regression is a supervised machine learning algorithm that performs binary classification by predicting the probability of an outcome, event, or observation. The model delivers a binary or dichotomous outcome limited to two possible values: yes/no, 0/1, or true/false. Logistic regression analyzes the relationship between one or more independent variables and classifies the data into discrete classes. It is extensively used in predictive modeling, where the model estimates the probability of whether an instance belongs to a specific category or not; for example, 0 represents the negative class and 1 represents the positive class. It is commonly used in binary classification problems where the outcome variable takes one of two categories (0 and 1).
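The following is a minimal, illustrative sketch of the KNN voting rule described in item 1. It is not taken from the project code; the toy points, the value of k, and the function name knn_predict are placeholders.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # majority vote over the neighbors' class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy usage with made-up 2D points and binary labels
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.05]), k=3))  # prints 1

For a regression task the same rule applies, with the neighbors' values averaged instead of voted on.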
3. EXISTING MODEL

In recent years, machine learning algorithms have generated a great impact in the Parkinson's research community for the detection of Parkinson's disorder. Furthermore, such techniques have produced distinctly more precise outcomes in disease prediction in comparison with other established classification techniques. Prompted by this, the authors used three distinguished machine learning algorithms for the detection and correct diagnosis of Parkinson's patients. The main purpose of this task is to examine the overall performance of various prominent classification methods; for this assignment, multiple machine learning techniques were used, including Support Vector Machine, Logistic Regression, and other learning algorithms. Moreover, the overall performance of the three classifiers was evaluated using different methods.

[Figure: Parkinson's disease motor and non-motor symptoms]

4. PROPOSED MODEL

By using machine learning techniques, the problem can be solved with a minimal error rate. The voice dataset of Parkinson's disease from the UCI machine learning repository is taken as input; additionally, our proposed system provides accurate results by integrating spiral drawing inputs of normal and Parkinson's-affected patients. We propose a hybrid model that gives accurate results by analyzing both the voice and spiral drawing data of the patient; by combining both results, the medical doctor can conclude normality or abnormality and prescribe medication based on the affected stage. K-Nearest Neighbors (KNN) classifies data points based on similarity to neighboring points. Decision Trees create a tree-like structure to make decisions based on feature values. Random Forest constructs multiple decision trees to improve accuracy and prevent overfitting. The Voting Classifier combines multiple models to enhance overall performance, while AdaBoost focuses on sequentially correcting the errors of weak classifiers.

The proposed methodology collects audio data from PPMI and UCI about Parkinson's patients' voice modulations. The dataset contains information about jitter, shimmer, and MDVP measures of vowel phonations. The data is preprocessed, analyzed, and visualized for a thorough understanding of the attributes. Four models (logistic regression, SVM, random forest, and k-nearest neighbors) are trained on 75% of the data. The models are trained to classify given audio data as PD or healthy, based on variations in frequency. The models are tested on the remaining 25% of the data and evaluated on sensitivity, precision, accuracy, the confusion matrix, and the ROC-AUC score; a sketch of this workflow is given below.
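The following sketch illustrates the training and evaluation workflow just described. It assumes the UCI voice dataset presented in the next section is available locally as parkinsons.csv with a binary status column; the 75/25 split follows the text, while the file path and all model hyperparameters are placeholder choices rather than the project's exact settings.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             confusion_matrix, roc_auc_score)

data = pd.read_csv("parkinsons.csv")        # placeholder path
X = data.drop(columns=["name", "status"])   # 22 voice features
y = data["status"]                          # 1 = PD, 0 = healthy

# 75% of the records for training, 25% held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    y_prob = clf.predict_proba(X_test)[:, 1]
    print(name)
    print("  accuracy             :", accuracy_score(y_test, y_pred))
    print("  precision            :", precision_score(y_test, y_pred))
    print("  sensitivity (recall) :", recall_score(y_test, y_pred))
    print("  ROC-AUC              :", roc_auc_score(y_test, y_prob))
    print("  confusion matrix     :\n", confusion_matrix(y_test, y_pred))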
5. WORKING

The dataset for Parkinson's disease prediction is taken from Kaggle. The datasets available on Kaggle include information about disease prediction and other relevant information.

DATASET: This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD. The data is in ASCII CSV format; each row of the CSV file contains one instance corresponding to one voice recording. There are around six recordings per patient, and the name of the patient is identified in the first column.

Matrix column entries (attributes):
name - ASCII subject name and recording number
MDVP:Fo(Hz) - Average vocal fundamental frequency
MDVP:Fhi(Hz) - Maximum vocal fundamental frequency
MDVP:Flo(Hz) - Minimum vocal fundamental frequency
MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP - Several measures of variation in fundamental frequency
MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA - Several measures of variation in amplitude
NHR, HNR - Two measures of the ratio of noise to tonal components in the voice
status - Health status of the subject: (one) - Parkinson's, (zero) - healthy
RPDE, D2 - Two nonlinear dynamical complexity measures
DFA - Signal fractal scaling exponent
spread1, spread2, PPE - Three nonlinear measures of fundamental frequency variation

1. Data Preprocessing: As suggested by previous studies, the data is preprocessed to obtain a more accurate prediction of UPDRS. The goal of data preprocessing in this study is to handle the dataset's null values. In general, we included the preprocessing stage in the proposed method because it is typically completed during the first step of data analysis.

2. Data Clustering: We use an unsupervised learning technique in this stage for clustering the PD data. The objective of this step is to increase patient record readability by grouping the patients into different groups. We used ensembles of EM (expectation-maximization) to obtain a better cluster analysis of the data.

3. Dimensionality Reduction: To remove noise from the data, the PCA method was used in this phase to lower the dimensionality of the data. Multicollinearity has a considerable impact on the accuracy of predictors and is a major issue in the field of disease diagnosis; the accuracy of SVR predictors is affected by the multicollinearity of the data.

4. UPDRS Prediction: This stage predicts UPDRS from the input features. In contrast to previous prediction methods for PD diagnosis, we used ensembles of SVR to perform this task. SVR is trained to build prediction models from the training datasets. In various clinical settings it is common practice to seek the advice of several doctors who are experts in the field; the ultimate decision for a specific therapy is thus normally made through consultation and a combination of the opinions of a committee of specialists. Ensemble learning systems serve a similar function in the machine learning context: the total error can be reduced by combining the outputs of different prediction models through an algebraic expression (e.g., the mean value of the predictions), as the individual models' errors are averaged out. A sketch of steps 2-4 is given after this list.
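The following is a minimal sketch of steps 2-4 (EM clustering, PCA, and an averaged ensemble of SVR predictors), not the project's actual pipeline. It assumes a feature matrix and a Total-UPDRS target such as the telemonitoring data described in the introduction; here both are replaced by synthetic placeholder arrays, and the numbers of clusters, PCA components, and SVR replicates are arbitrary choices.

import numpy as np
from sklearn.mixture import GaussianMixture      # EM-based clustering
from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# placeholder data standing in for the real voice features and UPDRS scores
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
y = rng.uniform(0, 176, size=500)                # Total-UPDRS lies in [0, 176]

X = StandardScaler().fit_transform(X)

# Step 2: cluster the patient records with an EM (Gaussian mixture) model
clusters = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# Step 3: reduce dimensionality with PCA to limit multicollinearity
X_reduced = PCA(n_components=8).fit_transform(X)

# Step 4: within each cluster, train several SVRs on bootstrap samples and
# average their predictions (the algebraic combination mentioned above)
for c in np.unique(clusters):
    mask = clusters == c
    if mask.sum() < 20:                          # skip degenerate clusters
        continue
    Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(
        X_reduced[mask], y[mask], test_size=0.2, random_state=0)
    preds = []
    for _ in range(5):                           # five bootstrap replicates
        idx = rng.integers(0, len(Xc_tr), size=len(Xc_tr))
        svr = SVR(kernel="rbf", C=1.0).fit(Xc_tr[idx], yc_tr[idx])
        preds.append(svr.predict(Xc_te))
    ensemble_pred = np.mean(preds, axis=0)       # averaged UPDRS prediction
    print(f"cluster {c}: {len(Xc_te)} held-out records predicted")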
6. RESULTS AND OUTPUTS

[Output screenshots]
6.1. Loading data from the CSV file into a pandas DataFrame and printing the first 5 rows
6.2. Statistical measures about the data
6.3. Data preprocessing and separating features and target
6.4. Status of the data
6.5. Splitting the data into training and test data
6.6. Accuracy score of training data
6.7. Accuracy score of test data
6.8. Prediction of disease

7. CONCLUSION

The presented methods for PD prediction depend strongly on human proficiency [96]. The benefits of deploying ML in the medical sector are that it provides objective, context-independent, and data-driven analysis. ML approaches have been utilized effectively in disease diagnosis and severity prediction. In particular, ML has also been utilized in analyzing data collected from wearable IMU sensors for automated evaluation of motor disorders like PD. Hence, the practical aim of this study is to provide supplementary, quick, and accurate methods that can aid experts in reaching more objective medical decisions concerning PD diagnosis. By deploying these methods in the appropriate systems, several gains can be acquired, including reducing the expense of manual diagnosis and minimizing diagnosis time.

Many previous works have been conducted focusing on patient classification, severity prediction, and remote monitoring. Still, there are future routes in each field to be investigated. Several sensors, such as the magnetometer, accelerometer, and gyroscope, have been utilized and assessed. Additionally, MRI, EEG signals, fMRI, and DATSCAN images have been utilized to present accurate predictions of the disease. Other research directions can be followed by utilizing other signals such as ECG, EMG, and PCG; further sensing modalities can be explored and combined to present a more accurate classification of the disease.

8. REFERENCES

APPENDIX

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import svm
from sklearn.metrics import accuracy_score

# loading the data from the csv file into a pandas DataFrame
parkinsons_data = pd.read_csv("/content/parkinsons.csv")

# printing the first 5 rows of the dataframe
parkinsons_data.head()

# number of rows and columns in the dataframe
parkinsons_data.shape

# getting more information about the dataset
parkinsons_data.info()

# checking for missing values in each column
parkinsons_data.isnull().sum()

# getting some statistical measures about the data
parkinsons_data.describe()

# distribution of the target variable
parkinsons_data['status'].value_counts()

# grouping the data based on the target variable
parkinsons_data.groupby('status').mean()

# separating the features and the target
X = parkinsons_data.drop(columns=['name', 'status'], axis=1)
Y = parkinsons_data['status']
print(X)
print(Y)

# splitting the data into training and test data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
print(X.shape, X_train.shape, X_test.shape)

# standardizing the features
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
print(X_train)

model = svm.SVC(kernel='linear')
# training the SVM model with the training data
model.fit(X_train, Y_train)

# accuracy score on the training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score of training data : ', training_data_accuracy)

# accuracy score on the test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score of test data : ', test_data_accuracy)

# the 22 voice measurements of one recording, used as example input
input_data = (197.07600, 206.89600, 192.05500, 0.00289, 0.00001, 0.00166,
              0.00168, 0.00498, 0.01098, 0.09700, 0.00563, 0.00680, 0.00802,
              0.01689, 0.00339, 26.77500, 0.422229, 0.741367, -7.348300,
              0.177551, 1.743867, 0.085569)

# changing the input data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the numpy array
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)

# standardize the data
std_data = scaler.transform(input_data_reshaped)

prediction = model.predict(std_data)
print(prediction)

if (prediction[0] == 0):
    print("The Person does not have Parkinsons Disease")
else:
    print("The Person has Parkinsons")
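As a brief illustrative extension (not part of the original appendix listing), the fitted SVM, scaled test features, and test labels from the appendix above can also be scored with the confusion matrix and ROC-AUC mentioned in the methodology section:

from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

X_test_prediction = model.predict(X_test)
print(confusion_matrix(Y_test, X_test_prediction))
print(classification_report(Y_test, X_test_prediction))
# decision_function returns a real-valued margin usable as the ROC score for SVC
print('ROC-AUC : ', roc_auc_score(Y_test, model.decision_function(X_test)))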
