Machine Learning File Final
Submitted By:
Rishit Lowanshi
Enrollment No-
0875CS191106
CS-B/ III Year/ VI Semester
Objective:- To apply the Find-S algorithm to a real-world problem and find the most specific hypothesis from the
training data.
Outcome:- Students will be able to apply the Find-S algorithm to a real-world problem and find the
most specific hypothesis from the training data.
Theory:-
Find-S Algorithm:- Find-S is a basic concept learning algorithm in machine learning. It finds the most
specific hypothesis that fits all the positive examples. Note that the algorithm considers only the positive
training examples and ignores the negative ones. Find-S starts with the most specific hypothesis and
generalizes it each time it fails to cover an observed positive training example. Hence, the hypothesis
moves from the most specific towards the more general, but only as far as the positive examples
require.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
      For each attribute constraint ai in h
         If the constraint ai is satisfied by x
            Then do nothing
            Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Training Example:-
Programme:-
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
with open('enjoysport.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(len(a)):
    # Find-S ignores negative examples.
    if a[i][num_attributes] == 'yes':
        for j in range(num_attributes):
            if hypothesis[j] == '0':
                # First positive example: copy its attribute value.
                hypothesis[j] = a[i][j]
            elif a[i][j] != hypothesis[j]:
                # Mismatch with a positive example: generalize to "any value".
                hypothesis[j] = '?'
    print(" For Training instance No:{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)
Data Set:
Output :-
Objective:- To know the candidate elimination algorithm and output a description of the set of all
hypotheses consistent with the training examples.
Outcome:- The students will be able to apply the candidate elimination algorithm and output a description
of the set of all hypotheses consistent with the training examples.
Theory:-
Candidate Elimination:-
You can consider this as an extended form of the Find-S algorithm.
It considers both positive and negative examples.
Positive examples are used as in the Find-S algorithm: they generalize the specific hypothesis
just enough to cover them.
Negative examples are used to specialize the general hypothesis so that it no longer covers them.
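The idea of a hypothesis being "consistent with the training examples" can be made concrete with a small check. The sketch below is only an illustration: the helper names covers and consistent are hypothetical, and the rows are the classic EnjoySport examples, assumed here to match the training file.

```python
def covers(hypothesis, instance):
    """Return True if every constraint is '?' or equals the instance value."""
    return all(h == '?' or h == x for h, x in zip(hypothesis, instance))


def consistent(hypothesis, examples):
    """A hypothesis is consistent if it covers exactly the positive examples."""
    return all(covers(hypothesis, x) == (label == 'Yes') for x, label in examples)


# Illustrative EnjoySport rows (assumed data, not read from the CSV file).
examples = [
    (['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'], 'Yes'),
    (['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'], 'Yes'),
    (['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'], 'No'),
    (['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'], 'Yes'),
]

specific = ['Sunny', 'Warm', '?', 'Strong', '?', '?']
general = [['Sunny', '?', '?', '?', '?', '?'],
           ['?', 'Warm', '?', '?', '?', '?']]
too_general = ['?'] * 6  # also covers the negative example

print(consistent(specific, examples))
print([consistent(g, examples) for g in general])
print(consistent(too_general, examples))
```

Every hypothesis lying between the specific and the general boundary passes the same check, which is what the version space describes.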
Programme:-
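import csv

with open("trainingdata.csv") as f:
    csv_file = csv.reader(f)
    data = list(csv_file)

# data[0] is taken to be the header row, so the first example is data[1].
s = data[1][:-1]
g = [['?' for i in range(len(s))] for j in range(len(s))]

for step, i in enumerate(data, start=1):
    if i[-1] == "Yes":
        # Positive example: generalize the specific hypothesis.
        for j in range(len(s)):
            if i[j] != s[j]:
                s[j] = '?'
                g[j][j] = '?'
    elif i[-1] == "No":
        # Negative example: specialize the general hypotheses.
        for j in range(len(s)):
            if i[j] != s[j]:
                g[j][j] = s[j]
            else:
                g[j][j] = "?"
    print("\nSteps of Candidate Elimination Algorithm", step)
    print(s)
    print(g)

# Keep only the general hypotheses that still carry a constraint.
gh = []
for i in g:
    for j in i:
        if j != '?':
            gh.append(i)
            break

print("\nFinal specific hypothesis:\n", s)
print("\nFinal general hypothesis:\n", gh)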
Output:-
Steps of Candidate Elimination Algorithm 1
['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]
Objective:- To demonstrate the working of the decision tree-based ID3 algorithm, use an appropriate
data set for building the decision tree and apply this knowledge to classify a new sample.
Outcome:- The student will be able to demonstrate the working of the decision tree-based ID3
algorithm, use an appropriate data set for building the decision tree and apply this
knowledge to classify a new sample.
Theory:-
Decision Tree:- The decision tree is one of the most widely used tools for classification and prediction. A
decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute,
each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
A decision tree is basically a tree-shaped diagram that shows a graphical representation of a problem
with all the possible solutions to a decision. All decisions are based on conditions, which are used to
determine a solution for the problem.
A tree can be “learned” by splitting the source set into subsets based on an attribute value test. This
process is repeated on each derived subset in a recursive manner called recursive partitioning.
Decision tree terminology
Information Gain:- The information gain is the decrease in entropy after a data
set is split on an attribute: the entropy of the parent node minus the weighted average
entropy of the child nodes.
Root node:- The topmost node. It represents the whole data set, that is, the problem to be solved.
Leaf node:- A node that cannot be divided into further nodes. The final node is the
leaf node.
Splitting:- Splitting is dividing a node into sub-nodes.
Branches:- Formed by splitting the root node into nodes and those nodes into sub-nodes.
Decision Node:- When a sub-node splits into further sub-nodes, it is called a
decision node.
Parent/ Child node:- A node that is split is the parent of the nodes derived from
it, and the derived nodes are its child nodes.
Pruning:- The opposite of splitting: removing sub-nodes from a decision node,
that is, removing unwanted nodes/branches.
Algorithms
CART (Gini Index)
ID3 (Entropy, Information Gain)
Note:- Here we will understand the ID3 algorithm
Algorithm Concepts:-
Data set:-
Output:-
Calculating Entropy of Whole Data-set
Output:-
Output:-
In the same way, we calculate the information gain of the remaining attributes; the attribute
with the highest information gain is chosen as the best attribute.
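The entropy and information gain calculation described above can be sketched in a few lines. This is a minimal illustration, not the lab's own program: the function names are hypothetical, and the rows are the Outlook and Play columns of the classic PlayTennis table, assumed here as sample data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the parent minus the weighted entropy of the children."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Assumed sample data: Outlook and Play columns of the PlayTennis table.
outlook = ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
           'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
rows = [{'Outlook': o, 'Play': p} for o, p in zip(outlook, play)]

print(round(entropy(play), 3))                             # about 0.940
print(round(information_gain(rows, 'Outlook', 'Play'), 3)) # about 0.247
```

ID3 repeats this calculation for every remaining attribute at each node and splits on the one with the largest gain.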
Output:-
Note:-The pprint module provides a capability to pretty-print arbitrary Python data structures in a well-
formatted and more readable way.
Note:- After running the algorithm the output will be very large because we have also called the
information gain function in it, which is required for the ID3 Algorithm.
Output:-
Shivajirao Kadam Institute of Technology and Management, Indore
(Acropolis Technical Campus)
Department of Computer Science and Engineering
Machine Learning (CS-601)
Experiment No. 4
Objective:- To build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
Outcome:- The student will be able to build an Artificial Neural Network by implementing the
Backpropagation algorithm and test the same using appropriate data sets.
Theory:-
BackPropagation
Backpropagation is a supervised learning algorithm for training Neural Networks.
Every node in a Neural Network represents a neuron, so we can say that a Neural
Network is a circuit of neurons.
A Neural Network consists of an input layer, one or more hidden layers and an output layer;
see the diagram.
Forward Propagation:-
Back-propagation Implementation
import numpy as np
x = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
print("small x",x)
#original output
y = np.array(([92], [86], [89]), dtype=float)
X = x/np.amax(x,axis=0) #scale each column by its maximum (axis=0 runs down the rows)
print("Capital X",X)
Output:-
#Variables initialization
epoch=7000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of input layer neurons
hiddenlayer_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer
Note:
In this code, we define the sigmoid function and its derivative.
The network is trained by passing over the same training data many times; the number of
passes is the number of epochs.
Below that, we define only the number of neurons in each layer.
Note:
Here we define random weights and biases.
First we define the weights and bias for the hidden layer (here we have only one
hidden layer).
After that, we define the weights and bias for the output layer.
Keep in mind the size of each weight matrix: (number of neurons in the previous layer,
number of neurons in the layer the weights belong to).
Size of each bias: (1, number of neurons in the layer the bias belongs to).
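The sigmoid helpers and the weight and bias shapes described in the notes can be sketched as follows. This is an assumed reconstruction rather than the original listing: the names sigmoid, derivatives_sigmoid, wh, bh, wout and bout are taken from the training loop, while the uniform initialisation and the fixed seed are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def derivatives_sigmoid(s):
    """Derivative of the sigmoid, written in terms of its output s."""
    return s * (1.0 - s)


inputlayer_neurons = 2   # number of input layer neurons
hiddenlayer_neurons = 3  # number of hidden layer neurons
output_neurons = 1       # number of neurons at output layer

rng = np.random.default_rng(0)  # fixed seed is an assumption, for repeatability

# Weights: (neurons in previous layer, neurons in this layer).
wh = rng.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
wout = rng.uniform(size=(hiddenlayer_neurons, output_neurons))
# Biases: one row, one column per neuron in the layer.
bh = rng.uniform(size=(1, hiddenlayer_neurons))
bout = rng.uniform(size=(1, output_neurons))

print(wh.shape, bh.shape, wout.shape, bout.shape)
```

Because the sigmoid output lies between 0 and 1, the target values usually need to be scaled into that range as well before training.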
#Forward Propagation
for i in range(epoch):
    hinp1=np.dot(X,wh)
    hinp=hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1=np.dot(hlayer_act,wout)
    outinp= outinp1+ bout
    output = sigmoid(outinp)
Note:
Here we are just calculating the output of our model: first for the hidden layer,
then for the output layer, which gives the final output.
np.dot computes the matrix product of two arrays.
    #Backpropagation Algorithm (still inside the epoch loop)
    EO = y-output
    outgrad = derivatives_sigmoid(output)
    d_output = EO* outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad #error at the hidden layer
    #Updating Weights
    wout += hlayer_act.T.dot(d_output) *lr
    wh += X.T.dot(d_hiddenlayer) *lr
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" ,output)
Output:-
Experiment No. 5
Problem Statement:- Write a program to implement the naive Bayesian classifier for a sample training
dataset stored as a .CSV file. Compute the accuracy of the classifier, considering few test datasets.
Objective:- To apply a naive bayesian classifier for the relevant problem and analyse the result.
Outcome:- The student will be able to apply a naive bayesian classifier for the relevant problem and
analyse the result.
Theory:-
Introduction to Bayes
Naive Bayes is one of the simplest yet effective classification algorithms. It is based on Bayes
Theorem with an assumption of independence among the predictors. The Naive Bayes classifier assumes
that, given the class, the presence of a feature is not related to the presence of any other feature.
Naive Bayes applies to binary and multi-class classification problems.
Bayes theorem:-
Bayes theorem describes the probability of an event based on prior knowledge of conditions that may
be related to the event.
Conditional probability can be found this way:
Assume we have a Hypothesis (H) and evidence (E).
According to Bayes theorem, the relationship between the probability of the hypothesis before getting the
evidence, represented as P(H), and the probability of the hypothesis after getting the evidence, represented
as P(H|E), is:
P(H|E) = P(E|H)*P(H)/P(E)
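A short worked example may help to see the formula in action. The numbers below are hypothetical and chosen only for illustration; P(E) is obtained from the law of total probability.

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) from the prior P(H) and the two likelihoods."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e


# Hypothetical values: a rare hypothesis and fairly reliable evidence.
p = posterior(p_h=0.01, p_e_given_h=0.9, p_e_given_not_h=0.05)
print(round(p, 4))  # about 0.1538
```

Even with reliable evidence, the posterior stays modest here because the prior P(H) is small.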
Training Example:-
Programme:-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import csv
def pre_processing(df):
Output:-
Experiment No. 6
Problem Statement:- Define a classifier to mark email as spam in your inbox based on the defined Naive
Bayes classifier.
Objective:- To apply a naive bayesian classifier for document classification and analyse the results.
Outcome:- The student will be able to apply a naive bayesian classifier for document classification and
analyse the results.
Theory:-
Naive Bayes classification is a simple probabilistic algorithm based on the assumption that all features
of the model are independent. In the context of the spam filter, we suppose that every word in the message
is independent of all other words, and we count the words without regard to their context.
First, we apply the Bayes formula for conditional probability to our task.
The probability that a message containing the words (w1, w2, w3, …) is spam is proportional
to the prior probability of spam multiplied by the product of the probabilities of every word
given spam. The same holds for ham.
Alpha: the smoothing coefficient for the cases when a word in the message is absent in
our dataset.
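The formulas referred to above did not survive in this copy. The following LaTeX is a reconstruction that assumes they match the program in this experiment, where N_{w_i|Spam} is the number of times word w_i occurs in spam messages, N_{Spam} is the total number of words in spam messages and N_{Vocabulary} is the vocabulary size.

```latex
P(\mathrm{Spam} \mid w_1, w_2, \ldots, w_n) \;\propto\; P(\mathrm{Spam}) \prod_{i=1}^{n} P(w_i \mid \mathrm{Spam})

P(w_i \mid \mathrm{Spam}) = \frac{N_{w_i \mid \mathrm{Spam}} + \alpha}{N_{\mathrm{Spam}} + \alpha \, N_{\mathrm{Vocabulary}}}
```

The ham formulas are obtained by replacing Spam with Ham throughout.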
Dataset:-
Implementation
We will follow the formulas mentioned above and define the main values:
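Nspam = train_data.loc[train_data['Label'] == 'spam', 'SMS'].apply(len).sum()
Nham = train_data.loc[train_data['Label'] == 'ham', 'SMS'].apply(len).sum()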
Nvoc = len(train_data.columns) - 3
alpha = 1
Program:-
def p_w_spam(word):
    if word in train_data.columns:
        return (train_data.loc[train_data['Label'] == 'spam', word].sum() + alpha) / (Nspam + alpha*Nvoc)
    else:
        return 1

def p_w_ham(word):
    if word in train_data.columns:
        return (train_data.loc[train_data['Label'] == 'ham', word].sum() + alpha) / (Nham + alpha*Nvoc)
    else:
        return 1
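The two word-probability functions are normally combined with the class priors to label a message. The sketch below shows that step on a tiny in-memory corpus using only the standard library; the corpus, the function names p_word and classify, and the use of plain dictionaries instead of the train_data table are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; the labels and messages are illustrative only.
train = [
    ("spam", "win money now"),
    ("spam", "win prize money"),
    ("ham", "see you at lunch"),
    ("ham", "lunch money is on the table"),
]
alpha = 1  # smoothing coefficient

counts = {"spam": Counter(), "ham": Counter()}
n_messages = Counter()
for label, text in train:
    counts[label].update(text.split())
    n_messages[label] += 1

vocabulary = set(counts["spam"]) | set(counts["ham"])
n_voc = len(vocabulary)
n_words = {label: sum(c.values()) for label, c in counts.items()}


def p_word(word, label):
    """Smoothed P(word | label); unknown words contribute a factor of 1."""
    if word not in vocabulary:
        return 1.0
    return (counts[label][word] + alpha) / (n_words[label] + alpha * n_voc)


def classify(message):
    """Return the label with the larger prior times product of word probabilities."""
    scores = {}
    for label in counts:
        score = n_messages[label] / len(train)  # class prior
        for word in message.split():
            score *= p_word(word, label)
        scores[label] = score
    return max(scores, key=scores.get)


print(classify("win money"))
print(classify("see you at lunch"))
```

For long messages the product of many small probabilities can underflow, so summing logarithms instead of multiplying is a common variation.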
Output:-
Experiment No. 7
Problem Statement:- Define a Bayesian network model to demonstrate the diagnosis of heart patients
using standard Heart Disease Data Set.
Objective:- To apply a bayesian network for the medical data and demonstrate the diagnosis of heart
patients using standard Heart Disease Data Set.
Outcome:- Define a Bayesian network model to demonstrate the diagnosis of heart patients using
standard Heart Disease Data Set.
Theory:- Bayesian networks are a type of probabilistic graphical model that uses Bayesian
inference for probability computations. Bayesian networks aim to model conditional
dependence by representing it with edges in a directed graph; the edges are often read as
causal links, although conditional dependence alone does not establish causation. Through these
relationships, one can efficiently conduct inference on the random variables in the graph through
the use of factors. A Bayesian network is a directed acyclic graph in
which each edge corresponds to a conditional dependency, and each node corresponds to a unique random
variable. A Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional
probability distributions.
• The directed acyclic graph represents the random variables as nodes and their dependencies as edges.
• The conditional probability distribution of a node (random variable) is defined for every
possible outcome of its parent node(s).
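How the graph and the conditional probability tables combine can be shown with inference by enumeration on a three-node network. Everything in this sketch is hypothetical: the structure (smoker and high_chol as parents of disease) and all probabilities are made-up values for illustration, not estimates from the Heart Disease Data Set.

```python
from itertools import product

# Hypothetical network: smoker -> disease <- high_chol (all values are 0 or 1).
p_smoker = {1: 0.3, 0: 0.7}
p_high_chol = {1: 0.4, 0: 0.6}
# P(disease = 1 | smoker, high_chol), one entry per parent combination.
p_disease_given = {(0, 0): 0.05, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.6}


def joint(s, c, d):
    """Joint probability as the product of each node's conditional distribution."""
    p_d = p_disease_given[(s, c)]
    return p_smoker[s] * p_high_chol[c] * (p_d if d == 1 else 1 - p_d)


# The joint distribution must sum to 1 over all assignments.
total = sum(joint(s, c, d) for s, c, d in product((0, 1), repeat=3))

# Query: P(smoker = 1 | disease = 1), summing out high_chol.
evidence = sum(joint(s, c, 1) for s, c in product((0, 1), repeat=2))
posterior = sum(joint(1, c, 1) for c in (0, 1)) / evidence

print(round(total, 6))
print(round(posterior, 4))  # about 0.6207
```

Libraries such as pgmpy perform the same kind of computation with variable elimination, which avoids enumerating every assignment.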
Dataset:-
Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
• Value 1: typical angina
• Value 2: atypical angina
• Value 3: non-anginal pain
• Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
• Value 0: normal
• Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of
> 0.05 mV)
• Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved.
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak = ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment.
• Value 1: upsloping
• Value 2: flat
• Value 3: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. Heartdisease: It is integer valued from 0 (no presence) to 4.
age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  HeartDisease
63   1    1   145       233   1    2        150      0      2.3      3      0   6     0
67   1    4   160       286   0    2        108      1      1.5      2      3   3     2
67   1    4   120       229   0    2        129      1      2.6      2      2   7     1
41   0    2   130       204   0    2        172      0      1.4      1      1   3     0
62   0    4   140       268   0    2        160      0      3.6      3      3   3     3
60   1    4   130       206   0    2        132      1      2.4      2      2   7     4
Program:
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination
#read Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)
Output:-
Experiment No. 8
Problem Statement:-Take data set as .CSV file and apply EM algorithm and k-Means algorithm to
cluster. Compare the results of these two algorithms and comment on the quality of clustering.
Objective:-To apply EM algorithm and k-Means algorithm for clustering and analyse the data.
Outcome:-The student will be able to apply EM algorithm and k-Means algorithm for clustering and
analyse the results.
Theory:-
GMMs are probabilistic models that assume all the data points are generated from a mixture of several
Gaussian distributions with unknown parameters. They differ from k-means clustering in that GMMs
incorporate information about the center(mean) and variability(variance) of each cluster and provide
posterior probabilities.
The K-means approach is an example of a hard assignment clustering, where each point can belong to
only one cluster. Expectation-Maximization algorithm is a way to generalize the approach to consider the
soft assignment of points to clusters so that each point has a probability of belonging to each cluster.
EM algorithm:
EM is an iterative algorithm to find the maximum likelihood estimate when there are latent variables. The
algorithm alternates between an expectation (E) step, which uses the current parameter estimates to
compute the posterior distribution of the latent variables and, from it, the expected log-likelihood,
and a maximization (M) step, which computes new parameters by maximizing the expected log-likelihood
from the E step. The parameter estimates from the M step are then used in the next E step.
K-means clustering groups unlabelled points by repeatedly assigning each point to its nearest cluster
centre (mean) and then recomputing each centre as the mean of the points assigned to it. The variable k
is the number of clusters, or groups, into which the data will be gathered. Each iteration reduces the
error function, which is the sum of squared distances between the points and their cluster centres.
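The difference between the hard assignment of k-means and the soft assignment of the E step can be seen on one-dimensional data. The sketch below is an illustration only: the function names, the data and the starting centres are hypothetical, and the soft assignment assumes equal mixing weights and a common fixed variance rather than a full GMM fit.

```python
from math import exp


def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm in one dimension; returns final centroids and labels."""
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def responsibilities(point, means, sigma=1.0):
    """E-step style soft assignment with equal weights and a shared sigma."""
    weights = [exp(-((point - m) ** 2) / (2 * sigma ** 2)) for m in means]
    total = sum(weights)
    return [w / total for w in weights]


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # hypothetical data
centroids, labels = kmeans_1d(points, [0.0, 6.0])
print(centroids, labels)
print(responsibilities(3.0, [1.0, 5.0]))  # a point midway between the means
print(responsibilities(1.5, [1.0, 5.0]))
```

K-means would force the midway point into exactly one cluster, whereas the soft assignment reports that it belongs equally to both.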
Dataset:-
Program:-
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset=load_iris()
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
#REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')
#KMeans -PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')
#GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')
Output:-
Experiment No. 9
Problem Statement:- Classify iris data set using k-Nearest Neighbour algorithm. Print both correct and
wrong predictions. Python ML library classes can be used for this problem.
Objective:- To implement k-Nearest Neighbour algorithm to classify the iris data set and Print both
correct and wrong predictions.
Outcome:- The student will be able to implement k-Nearest Neighbour algorithm to classify the iris
data set and Print both correct and wrong predictions.
Theory:-
K-Nearest Neighbor (KNN)
KNN is a simple supervised learning algorithm used for both regression and classification
problems.
KNN basically stores all available cases and classifies new cases based on their similarity
to the stored cases.
Concept: KNN works on similarity measurements. For
example, a mango is more similar to an apple than to a dog or a cat, so
KNN will put it in the category of fruits, not in the category of animals.
Let's understand the concept of the KNN algorithm with the iris flower problem.
Data: This data consists of a total of 150 instances (samples), 4 features, and three classes (targets).
Problem: Using the four features we have to classify which flower belongs to which category.
Importing Data-set
import sklearn
import pandas as pd
from sklearn.datasets import load_iris
iris=load_iris()
iris.keys()
df=pd.DataFrame(iris['data'])
print(df)
print(iris['target_names'])
iris['feature_names']
Output :-
Note:
1. Now we need a target and data so that we can train the model.
2. We have to find out the class from the features we have.
3. With this logic, our target is the classes (0,1,2) and our data is in df.
X=df
y=iris['target']
Splitting Data
1. The data is split so that we can train the model on one part and test it on the
remaining part to check how well our model performs.
2. To do this we have an inbuilt function in sklearn
from sklearn.model_selection import train_test_split
Note: The split holds out 33% of the data as testing data; the remaining data is our training data.
KNN Classifier and Training of the Model
from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=3)
Note:
1. It implements the concepts of KNN. Here we have taken the number of neighbors (K) = 3.
2. First, it calculates the distance from the test point to every training point and then
selects the three points with the lowest distance.
3. The test data point is then classified into the class most common among those three.
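The three steps in the note can be sketched without any library, which makes the distance and voting logic visible. This is a minimal illustration under stated assumptions: the function name knn_predict and the two-feature toy points are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every training point, nearest first.
    by_distance = sorted(train, key=lambda item: dist(item[0], query))
    # Step 2: keep the k nearest neighbours.
    nearest = by_distance[:k]
    # Step 3: the most common class among them wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (features, class label).
train = [
    ((1.0, 1.0), 'a'), ((1.2, 0.8), 'a'), ((0.9, 1.1), 'a'),
    ((5.0, 5.0), 'b'), ((5.2, 4.9), 'b'), ((4.8, 5.1), 'b'),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

An odd value of K is a common choice with two classes because it avoids tied votes.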
knn.fit(X_train,y_train)
Note:- Training the model with features values (data) and target values (target)
prediction=knn.predict(x_new)
iris['target_names'][prediction]
Output
Note: As we can see, our point belongs to class 0 (the setosa class); this demo is just for
understanding.
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
y_pred=knn.predict(X_test)
cm=confusion_matrix(y_test,y_pred)
print(cm)
print(" correct prediction",accuracy_score(y_test,y_pred))
print(" wrong prediction",(1-accuracy_score(y_test,y_pred)))
Output :-
Note: As you can see in the confusion matrix, only one prediction is wrong, and our accuracy is
0.98 (98%).
Practice KNN - We have a dataset that contains information about multiple social network users
and whether or not they are interested in buying an SUV car.
Experiment No. 10
Problem Statement:- Pick an appropriate data set for your experiment, draw graphs using the non-
parametric Locally Weighted Regression algorithm in order to fit data points.
Objective:- To understand and implement linear regression and analyse the results with change in the
parameters.
Outcome:- To understand and implement linear regression and analyse the results with change in the
parameters
Theory:-
Locally weighted Linear Regression
Locally weighted regression builds on linear regression: instead of fitting one line to all the
data, it fits a separate weighted linear model around each query point.
What is linear regression then
linear regression - Linear regression is a supervised learning algorithm. It basically
works on the concept of the line equation.
Y = mX + C
m - Coefficient of X and C - Constant
Linear regression predicts a dependent variable (Y) based on the
independent variable (X).
So, basically what we do in this modeling is try to find the best fit line (regression
line). As we can see from the equation, it will of course be a straight line.
Note:
Here the line represents Linear Regression. If the data does not follow a straight line like this,
then what will we do? See the diagram below
Note:
Here we need a non-linear (polynomial type) model, and this is where the concept of Locally
weighted regression comes in.
Let's understand it with the cost function (least square error)
Cost Function (Linear regression)
Cost Function (Locally weighted regression)
1. As we can see, the only difference between the two is the weight.
2. Here we use the weighted least squared error.
3. Let's see it by formula
So, the interesting fact in this formula is that by changing the value of τ (tau) we can get a
non-linear regression model that follows curves as flexibly as a polynomial regression: a small
tau follows the data closely, while a large tau approaches ordinary linear regression.
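The cost functions and the weight formula appeared as images and are missing from this copy. The LaTeX below is a reconstruction under the assumption that they follow the standard definitions and the radial_kernel function in this experiment; the symbols J, beta and m (the number of training points) are notation chosen here.

```latex
J_{\text{linear}}(\beta) = \sum_{i=1}^{m} \left( y_i - \beta^{T} x_i \right)^2

J_{\text{local}}(\beta; x_0) = \sum_{i=1}^{m} w(x_i, x_0) \left( y_i - \beta^{T} x_i \right)^2

w(x_i, x_0) = \exp\!\left( -\frac{\lVert x_i - x_0 \rVert^2}{2 \tau^2} \right)

\beta(x_0) = \left( X^{T} W X \right)^{-1} X^{T} W y, \qquad h(x_0) = x_0^{T} \beta(x_0)
```

Here W is the diagonal matrix whose i-th diagonal entry is w(x_i, x_0), so training points far from x_0 contribute almost nothing to the fit.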
How to implement it
import numpy as np
import matplotlib.pyplot as plt
X=np.linspace(-3,3,1000)
print(X)
X+=np.random.normal(scale=0.05,size=1000)
Y=np.log(np.abs((X**2)-1)+0.5)
print(Y)
plt.scatter(X,Y,alpha=0.32)
1. Here we are creating a function; this function calculates our final h(x0).
2. As you can see in the formulas above, we have 2 forms of beta(x0); here we
are using the lower one (in the orange box), which is a modified form of the upper one.
3. np.r_ concatenates its arguments along the first axis (here it builds a single row vector),
and np.c_ stacks its arguments as columns (here it builds a matrix with one column per argument).
4. We have defined below the radial_kernel function which calculates our weight
w(x,x0).
5. X.T is the transpose of the matrix (array).
6. Here @ represents matrix multiplication, and pinv computes the pseudo-inverse of the matrix.
def radial_kernel(x0, X, tau):
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))
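The local_regression function described in the numbered points is not present in this copy. The sketch below is an assumed reconstruction that follows those points (np.r_, np.c_, the radial kernel, @ and pinv); the check on exactly linear data is added only to show that the weighted fit behaves sensibly.

```python
import numpy as np


def radial_kernel(x0, X, tau):
    """Gaussian weight of every row of X relative to the query point x0."""
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))


def local_regression(x0, X, Y, tau):
    """Predict h(x0) with a weighted least squares fit centred on x0."""
    x0 = np.r_[1, x0]                       # add the bias term to the query
    X = np.c_[np.ones(len(X)), X]           # add a bias column to the inputs
    xw = X.T * radial_kernel(x0, X, tau)    # X^T W, using broadcasting
    beta = np.linalg.pinv(xw @ X) @ xw @ Y  # beta(x0) = (X^T W X)^+ X^T W y
    return x0 @ beta


# Sanity check on exactly linear data (assumed test data): y = 2x + 1.
X_lin = np.linspace(-3, 3, 50)
Y_lin = 2 * X_lin + 1
pred = local_regression(0.5, X_lin, Y_lin, tau=0.5)
print(pred)  # close to 2.0
```

To draw the fitted curve, the function would be called once for every query point in the plotting domain, for example in a list comprehension over np.linspace(-3, 3, num=300).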
Note:
1. Here we have defined our query point x0, and then called our
function local_regression.
2. After that we plot our original data and the predicted curve (model).
3. As you can see in the plot, our model fits the data closely; if you change the value of tau
then the red line (model) will change.
4. The shape of our model depends on the value of tau: a smaller tau gives a more wiggly
curve and a larger tau gives a smoother one.
Output :-