MLDM PPT
TEAM 27
P Viswa Teja Reddy CSE18095
Sai Charan K M CSE18109
Sai Teja Prasanth V CSE18111
S Vikas Reddy CSE18113
S Sura Reddy CSE18114
DATASET EXPLORATION
• Abstract:
Experimental data is used for binary classification: predicting whether the
temperature at a particular place in the ocean is at or above the average
temperature, or below it.
• The dataset used is CalCOFI, which contains over 60 years of oceanographic data.
• The attributes used from this dataset are
• Depth
• Temperature
• Salinity
• O2 Saturation Level
• No of instances: 2000
• No of attributes: 5
BASIC IMPORTS
DBSCAN
• To choose the value of eps, a graph of nearest-neighbour distances is plotted.
• From the graph, an eps value of 0.025 appears optimal.
• Several values of min_samples were tried and the Silhouette Coefficient was
calculated for each; min_samples = 25 returned the highest Silhouette
Coefficient.
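A minimal sketch of how such a distance graph can be produced, assuming it is the usual k-distance curve (each point's distance to its k-th nearest neighbour, sorted). The helper name `k_distances` and the toy points are illustrative assumptions, not taken from the slides; eps is typically read off where the sorted curve bends sharply upward.

```python
import math

def k_distances(points, k):
    """Return, in ascending order, each point's distance to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Toy 2-D data (made up): two tight groups plus one far-away outlier.
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (1.0, 1.0), (1.0, 1.1), (1.1, 1.0),
          (5.0, 5.0)]

curve = k_distances(points, k=2)
# The last value is much larger than the rest: that jump is the "knee".
# A candidate eps would sit just below it.
print(curve)
```

In practice k is usually tied to the min_samples value being considered, so the curve is redrawn for each candidate.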
KNN CLASSIFICATION
• Finds K points in the training set that are nearest to the given test input and
counts how many members of each class are in this set.
• For classification it assigns the majority class; for regression it returns the
mean of the target values of these K nearest training points.
• KNN is a lazy learner.
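The steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions (Euclidean distance, unweighted votes, made-up toy data, hypothetical function names), not the code used for the results in this deck.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return order[:k]

def knn_classify(train_X, train_y, query, k):
    """Majority class among the k nearest training points."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k):
    """Mean target value of the k nearest training points."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / k

# Toy one-feature data (made up).
train_X = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
labels = ["low", "low", "low", "high", "high", "high"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

predicted_class = knn_classify(train_X, labels, (2.5,), k=3)
predicted_value = knn_regress(train_X, values, (2.5,), k=3)
```

There is no training step: all work happens at query time, which is what "lazy learner" refers to.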
KNN CLASSIFICATION
• The first 2000 instances of the dataset and only the attributes which
determine temperature are taken.
• These are classified into 2 classes.
• Class 1 where temperature of the ocean is greater than or equal to the
average.
• Class 2 where temperature of the ocean is less than the average.
• KNN classifier is used to predict which class the temperature falls under.
• The observed minimum, mean and maximum temperatures are 2.78 °C, 9.26 °C and
19.76 °C.
• There are 925 records of class 1 and 1022 records of class 2.
• A graph for k vs accuracy is plotted.
• Maximum accuracy is obtained for k=5.
• The observed precision, accuracy, recall and f1 score are as below.
• The confusion matrix is also as follows.
202 5
2 181
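Precision, accuracy, recall and f1 score can all be derived from a 2×2 confusion matrix. A minimal sketch, assuming rows are actual classes and columns are predicted classes, and treating the first row's class as the positive class; both are assumptions, since the slides do not state the orientation. Accuracy does not depend on that choice.

```python
def metrics_from_confusion(matrix):
    """Derive metrics from [[tp, fn], [fp, tn]] (assumed layout: rows = actual, cols = predicted)."""
    (tp, fn), (fp, tn) = matrix
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# KNN confusion matrix from the slide above.
knn = metrics_from_confusion([[202, 5], [2, 181]])
print(round(knn["accuracy"], 3))  # 0.982, i.e. 383 of 390 test records correct
```

The same function can be applied to the other confusion matrices in this deck.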
LINEAR REGRESSION
RMSE: 0.53
Mean absolute error: 0.41
R2-score: 0.72
MULTI VARIABLE SINGLE VARIABLE
RMSE: 0.23
Mean absolute error: 0.18
R2-score: 0.94
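For reference, the three regression metrics quoted above follow standard definitions. A minimal sketch with made-up values; the function name `regression_metrics` is hypothetical and the toy data are not from the CalCOFI dataset.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, mean absolute error, R2-score) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

# Toy example: only the last prediction is off, by 1.0.
rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A lower RMSE and mean absolute error and an R2-score closer to 1 indicate a better fit.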
NAIVE BAYES CLASSIFICATION
• The first 2000 instances of the dataset and only the attributes which determine
temperature are taken.
• These are classified into 2 classes.
• Class 1 where temperature of the ocean is greater than or equal to the
average.
• Class 2 where temperature of the ocean is less than the average.
• Naïve Bayes is used to predict which class the temperature falls under.
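The two-class labelling described above can be sketched as follows. The function name `label_by_mean` and the sample temperatures are illustrative assumptions; the rule itself (class 1 at or above the mean, class 2 below it) is the one stated in the bullets.

```python
from statistics import mean

def label_by_mean(temperatures):
    """Label each reading 1 if it is >= the mean temperature, else 2."""
    avg = mean(temperatures)
    labels = [1 if t >= avg else 2 for t in temperatures]
    return labels, avg

# Made-up readings in °C; their mean is 5.0.
labels, avg = label_by_mean([2.0, 4.0, 6.0, 8.0])
print(labels)  # [2, 2, 1, 1]
```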
NAIVE BAYES CLASSIFICATION WITH K-FOLD
• The observed precision, accuracy, recall and f1 score are as below.
• The confusion matrix is also as follows.
968 54
34 891
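The rows of this matrix sum to 1022 and 925, which match the class counts reported earlier, so the matrix appears to pool the predictions from every fold, with each record tested exactly once. A minimal sketch of how k-fold splitting achieves that; the number of folds used in this work is not stated, so k = 3 and the helper name `k_fold_indices` are assumptions, and no shuffling is done here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; test folds are contiguous and differ in size by at most one."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    # A model would be fitted on train_idx and evaluated on test_idx here;
    # adding up the per-fold confusion matrices gives the pooled matrix.
    print(test_idx)
```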
NAIVE BAYES CLASSIFICATION WITH MINMAX SCALER
• The observed precision, accuracy, recall and f1 score are as below.
• The confusion matrix is also as follows.
204 12
7 167
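Min-max scaling maps each attribute linearly onto the range 0 to 1. A minimal sketch applied to a single column; the function name `min_max_scale` is hypothetical, and the three input values are the minimum, mean and maximum temperatures quoted earlier, used here only as sample inputs.

```python
def min_max_scale(column):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

scaled = min_max_scale([2.78, 9.26, 19.76])
print(scaled)
```

To avoid leaking information, the minimum and maximum are normally taken from the training portion only and then reused for the test portion.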
NAIVE BAYES CLASSIFICATION WITH MINMAX AND K-FOLD
• The observed precision, accuracy, recall and f1 score are as below.
• The confusion matrix is also as follows.
969 53
32 893
DECISION TREE
• A decision tree is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test,
and each leaf node (terminal node) holds a class label.
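The structure described above can be illustrated with a small hand-built tree. The attributes, thresholds and layout below are hypothetical and chosen only to show internal nodes, branches and leaves; they are not the tree learned in this work.

```python
# Internal nodes test one attribute against a threshold; leaves hold a class label.
tree = {
    "attribute": "depth", "threshold": 100.0,
    "left": {"label": 1},                      # branch taken when depth <= 100.0
    "right": {                                 # branch taken when depth > 100.0
        "attribute": "salinity", "threshold": 34.0,
        "left": {"label": 1},                  # salinity <= 34.0
        "right": {"label": 2},                 # salinity > 34.0
    },
}

def classify(node, sample):
    """Walk from the root to a leaf, following the outcome of each attribute test."""
    while "label" not in node:
        branch = "left" if sample[node["attribute"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

shallow = classify(tree, {"depth": 50.0, "salinity": 33.0})
deep = classify(tree, {"depth": 500.0, "salinity": 34.5})
```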
DECISION TREE
• The first 2000 instances of the dataset and only the attributes which
determine temperature are taken.
• These are classified into 2 classes.
• Class 1 where temperature of the ocean is greater than or equal to the
average.
• Class 2 where temperature of the ocean is less than the average.
• A decision tree is used to predict which class the temperature falls under.
Decision tree with k fold and min max scaler.
• The observed precision, accuracy, recall and f1 score are as below.
• The confusion matrix is also as follows.
1543 0
1 1378
Decision tree with test train split as 0.8
• The observed precision, accuracy, recall and f1 score are as below.
• The confusion matrix is also as follows.
479 0
0 398
SVM CLASSIFICATION
• The first 2000 instances of the dataset and only the attributes which
determine temperature are taken.
• These are classified into 2 classes.
• Class 1 where temperature of the ocean is greater than or equal to the
average.
• Class 2 where temperature of the ocean is less than the average.
• SVM is used to predict which class the temperature falls under.
SVM CLASSIFICATION WITH LINEAR KERNEL
205 0
1 184
SVM CLASSIFICATION WITH MINMAX NORMALIZATION
• The observed precision, accuracy, recall and f1 score are as below.
• The confusion matrix is also as follows.
198 3
1 188
SVM CLASSIFICATION WITH MINMAX NORMALIZATION AND K-FOLD
168 18
9 195
1. STANDARDIZATION
USING THE FIRST AND SECOND PRINCIPAL COMPONENT
USING THE SECOND AND THIRD PRINCIPAL COMPONENT
USING THE FIRST AND THIRD PRINCIPAL COMPONENT
K MEANS