Machine Learning File Final

Shivajirao Kadam Institute of Technology and Management, Indore
Department of Computer Science & Engineering

Machine Learning [CS-601]
LAB ASSIGNMENT

Submitted To:
Prof. Kunal Batra (Asst. Prof., CSE Dept.)

Submitted By:
Rishit Lowanshi
Enrollment No: 0875CS191106
CS-B / III Year / VI Semester


Experiment No. 1
Problem Statement:- Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Objective:- To apply the Find-S algorithm to a real-world problem and find the most specific hypothesis from the
training data.

Outcome:- Students will be able to apply the Find-S algorithm to a real-world problem and find the
most specific hypothesis from the training data.

Theory:-
Find-S Algorithm:- The Find-S algorithm is a basic concept-learning algorithm in machine learning. It
finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers
only the positive training examples. Find-S starts with the most specific hypothesis and generalizes it
each time it fails to classify an observed positive training example. Hence, the Find-S algorithm moves
from the most specific hypothesis towards the most general hypothesis.

FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   For each attribute constraint ai in h:
      If the constraint ai is satisfied by x, then do nothing;
      Else replace ai in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.

Training Example:-

Programme:-
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
with open('enjoysport.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialise the hypothesis with the attribute values of the first training example
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    # Consider only the positive examples
    if a[i][num_attributes] == 'yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print(" For Training instance No:{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)

Data Set:

Output :-

The Given Training Data Set


['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
The initial value of hypothesis: ['0', '0', '0', '0', '0', '0']

Find S: Finding a Maximally Specific Hypothesis
For Training Example No:0 the hypothesis is ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
For Training Example No:1 the hypothesis is ['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:2 the hypothesis is ['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:3 the hypothesis is ['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally Specific Hypothesis for a given Training Examples:
['sunny', 'warm', '?', 'strong', '?', '?']
Experiment No. 2
Problem Statement:- For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

Objective:- To understand the Candidate-Elimination algorithm and output a description of the set of all
hypotheses consistent with the training examples.

Outcome:- The students will be able to apply the candidate elimination algorithm and output a description
of the set of all hypotheses consistent with the training examples.

Theory:-
Candidate Elimination:-
 You can consider this as an extended form of the Find-S algorithm.
 It considers both positive and negative examples.
 Positive examples are handled as in the Find-S algorithm: the specific boundary is generalized to cover them.
 Negative examples are used to specialize the general boundary.

Programme:-

import csv

with open("trainingdata.csv") as f:
    csv_file = csv.reader(f)
    data = list(csv_file)

# Initialise the specific boundary S from the data
# and the general boundary G with the most general hypotheses
s = data[1][:-1]
g = [['?' for i in range(len(s))] for j in range(len(s))]

for i in data:
    if i[-1] == "Yes":
        # Positive example: generalise S and prune G
        for j in range(len(s)):
            if i[j] != s[j]:
                s[j] = '?'
                g[j][j] = '?'
    elif i[-1] == "No":
        # Negative example: specialise G
        for j in range(len(s)):
            if i[j] != s[j]:
                g[j][j] = s[j]
            else:
                g[j][j] = "?"
    print("\nSteps of Candidate Elimination Algorithm", data.index(i) + 1)
    print(s)
    print(g)

# Keep only the general hypotheses that are not completely unconstrained
gh = []
for i in g:
    for j in i:
        if j != '?':
            gh.append(i)
            break

print("\nFinal specific hypothesis:\n", s)
print("\nFinal general hypothesis:\n", gh)

Output:-
Steps of Candidate Elimination Algorithm 1
['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]

Steps of Candidate Elimination Algorithm 2


['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]

Steps of Candidate Elimination Algorithm 3


['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', 'Same']]

Steps of Candidate Elimination Algorithm 4


['Sunny', 'Warm', '?', 'Strong', '?', '?']
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Final specific hypothesis:


['Sunny', 'Warm', '?', 'Strong', '?', '?']

Final general hypothesis:


[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Experiment No. 3
Problem Statement:- Write a program to demonstrate the working of the decision tree-based ID3
algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.

Objective:- To demonstrate the working of the decision tree-based ID3 algorithm, use an appropriate
data set for building the decision tree, and apply this knowledge to classify a new sample.

Outcome:- The student will be able to demonstrate the working of the decision tree-based ID3
algorithm, use an appropriate data set for building the decision tree and apply this
knowledge to classify a new sample.

Theory:-
Decision Tree:- A decision tree is a popular and widely used tool for classification and prediction. A
decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute,
each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
A decision tree is basically a tree-shaped diagram that gives a graphical representation of a problem
with all the possible solutions to a decision. All decisions are based on conditions, which are used to
determine a solution for the problem.
A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This
process is repeated on each derived subset in a recursive manner, called recursive partitioning.
Decision tree terminology

 Information Gain:- Information gain is based on the decrease in entropy after a data set is split
on an attribute; it is the difference between the entropy before the split and the weighted entropy
after the split.
 Root node:- The topmost node; it represents the whole problem (the entire data set) to be solved.
 Leaf node:- A node that cannot be divided into further nodes. The final nodes of the tree are the
leaf nodes.
 Splitting:- Dividing a node into sub-nodes.
 Branches:- Formed by splitting the root node into nodes and further nodes into sub-nodes.
 Decision Node:- When a sub-node splits into further sub-nodes, it is called a decision node.
 Parent/ Child node:- A node is the parent of the nodes derived from it, and the derived nodes are
its child nodes.
 Pruning:- The opposite of splitting: removing sub-nodes from a decision node, i.e. removing
unwanted nodes/branches.

Algorithms
 CART (Gini Index)
 ID3 (Entropy, Information Gain)
Note:- Here we will understand the ID3 algorithm

Algorithm Concepts:-

1. To understand this concept, we take an example; assume we have the data set below.


2. Based on this data, we have to find out whether we can play on a given day or not.
3. We have four attributes in the data set. Now how do we decide which attribute we
should put at the root node?
4. For this, we calculate the information gain of all the attributes (features); the attribute
with the maximum information gain will be our root node.

Data set:-

Step1: Creating a root node


Entropy (entropy of the whole data set):
Entropy(S) = -(p/(p+n))*log2(p/(p+n)) - (n/(p+n))*log2(n/(p+n))
where p stands for the number of positive examples
and n stands for the number of negative examples.

Step2: For Every Attribute/Features


Average Information (AIG of a particular attribute):
I(Attribute) = Sum over all values of the attribute of ((pi+ni)/(p+n)) * Entropy(attribute value)
where pi stands for the number of positive examples and ni for the number of negative examples
for a particular attribute value, and Entropy(attribute value) is calculated in the same way as for the
whole data set.

Step3: Calculate the Information Gain
Gain = Entropy(S) - I(Attribute)
1. If all examples are positive, return the single-node tree with label = +.
2. If all examples are negative, return the single-node tree with label = -.
3. If the attribute set is empty, return the single-node tree (labelled with the most common class).
A small worked example of these steps is given below.
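The following sketch computes the entropy and the information gain for illustrative counts (assumed here: 9 positive and 5 negative examples overall, and an attribute that splits them into groups of (2+, 3-), (4+, 0-) and (3+, 2-)); it is only a worked example, not part of the lab program.

import math

def entropy(p, n):
    # Entropy(S) = -(p/(p+n))*log2(p/(p+n)) - (n/(p+n))*log2(n/(p+n))
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:
            prob = count / total
            result -= prob * math.log2(prob)
    return result

p, n = 9, 5                              # positive and negative examples in the whole data set (assumed)
entropy_S = entropy(p, n)                # about 0.940 for these counts
groups = [(2, 3), (4, 0), (3, 2)]        # (pi, ni) for each value of the attribute (assumed)
avg_info = sum(((pi + ni) / (p + n)) * entropy(pi, ni) for pi, ni in groups)
gain = entropy_S - avg_info              # about 0.247 for these counts

print("Entropy(S) =", round(entropy_S, 3))
print("I(Attribute) =", round(avg_info, 3))
print("Gain =", round(gain, 3))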

Step4: Pick The Highest Gain Attribute


1. The attribute with the highest information gain becomes a node; we then group the data by its
values and process each group in the same way as we did for the parent (root) node.
2. Again, the feature with the maximum information gain within each group becomes the next node,
and this process continues until we reach a leaf node.

Step5: Repeat until we reach the final node (leaf node).

Implementation of Decision-Tree (ID3) Algorithm

#Importing important libraries


import pandas as pd
from pandas import DataFrame
#Reading Dataset
df_tennis = pd.read_csv('DS.csv')
print( df_tennis)

Output:-
Calculating Entropy of Whole Data-set

#Function to calculate final Entropy


def entropy(probs):
    import math
    return sum([-prob * math.log(prob, 2) for prob in probs])

#Function to calculate probabilities of positive and negative examples
def entropy_of_list(a_list):
    from collections import Counter
    cnt = Counter(x for x in a_list)  # Count the positive and negative examples
    num_instances = len(a_list)
    # Calculate the probabilities that we require for our entropy formula
    probs = [x / num_instances for x in cnt.values()]
    # Call the entropy function for the final entropy
    return entropy(probs)

total_entropy = entropy_of_list(df_tennis['PT'])
print("\n Total Entropy of PlayTennis DataSet:", total_entropy)

Output:-

Calculate Information Gain for each Attribute:-

#Defining Information Gain Function


def information_gain(df, split_attribute_name, target_attribute_name, trace=0):
    print("Information Gain Calculation of ", split_attribute_name)
    print("target_attribute_name", target_attribute_name)
    # Grouping the data by the values of the current attribute
    df_split = df.groupby(split_attribute_name)
    for name, group in df_split:
        print("Name: ", name)
        print("Group: ", group)
    nobs = len(df.index) * 1.0
    print("NOBS", nobs)
    # Calculating the entropy of each attribute value and the probability part of the formula
    df_agg_ent = df_split.agg({target_attribute_name: [entropy_of_list, lambda x: len(x) / nobs]})[target_attribute_name]
    df_agg_ent.columns = ['Entropy', 'Prob1']  # name the aggregated columns so they can be referenced below
    print("df_agg_ent", df_agg_ent)
    # Information gain = entropy before the split - weighted entropy after the split
    avg_info = sum(df_agg_ent['Entropy'] * df_agg_ent['Prob1'])
    old_entropy = entropy_of_list(df[target_attribute_name])
    return old_entropy - avg_info

print('Info-gain for Outlook is :' + str(information_gain(df_tennis, 'Outlook', 'PT')), "\n")

Output:-

In the same way, we calculate the information gain of the remaining attributes; the attribute with the
highest information gain is chosen as the best attribute.

#Defining ID3 Algorithm Function


def id3(df, target_attribute_name, attribute_names, default_class=None):
    # Counting the total number of yes and no classes (positive and negative examples)
    from collections import Counter
    cnt = Counter(x for x in df[target_attribute_name])
    if len(cnt) == 1:
        return next(iter(cnt))
    # Return the default class for an empty data set or an empty attribute list
    elif df.empty or (not attribute_names):
        return default_class
    else:
        default_class = max(cnt.keys())
        print("attribute_names:", attribute_names)
        gainz = [information_gain(df, attr, target_attribute_name) for attr in attribute_names]
        # Separating the maximum information gain attribute after calculating the information gain
        index_of_max = gainz.index(max(gainz))  # Index of best attribute
        best_attr = attribute_names[index_of_max]  # Choosing the best attribute
        # The tree is initially an empty dictionary
        tree = {best_attr: {}}  # Initiate the tree with the best attribute as a node
        remaining_attribute_names = [i for i in attribute_names if i != best_attr]
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset,
                          target_attribute_name,
                          remaining_attribute_names,
                          default_class)
            tree[best_attr][attr_val] = subtree
        return tree
Note
# Get Predictor Names (all but 'class')
attribute_names = list(df_tennis.columns)
print("List of Attributes:", attribute_names)
attribute_names.remove('PT')
#Remove the class attribute
print("Predicting Attributes:", attribute_names)

Output:-

# Run Algorithm (Calling ID3 function)


from pprint import pprint
tree = id3(df_tennis,'PT',attribute_names)
print("\n\nThe Resultant Decision Tree is :\n")
pprint(tree)
attribute = next(iter(tree))
print("Best Attribute :\n",attribute)
print("Tree Keys:\n",tree[attribute].keys())

Note:-The pprint module provides a capability to pretty-print arbitrary Python data structures in a well-
formatted and more readable way.

Note:- After running the algorithm the output will be very large because we have also called the
information gain function in it, which is required for the ID3 Algorithm.

Output:-
Experiment No. 4

Problem Statement:- Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.

Objective:- To build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.

Outcome:- The student will be able to build an Artificial Neural Network by implementing the
Backpropagation algorithm and test the same using appropriate data sets.

Theory:-
BackPropagation
 Backpropagation is a supervised learning algorithm for training neural networks.
 Every node in a neural network represents a neuron, so we can say that a neural
network is a circuit of neurons.
 A neural network consists of an input layer, an output layer and a hidden layer, as
shown in the diagram.

What is the Role of Backpropagation?


1. First of all, if I want to create a neural network, then I have to initialize some weights.
2. Now, whatever values I have selected for weights I do not know how much they are
correct.
3. To check that the weight values that I have selected are correct or incorrect I have
to calculate the error of the model.
4. Suppose my model error occurred too much
5. Meaning my predicated output is very different from the actual output, so what shall I
do? I will try to minimize the error.
Gradient Descent
1. We have several optimizers but here we are using a Gradient descent optimizer.
2. Gradient descent works as an optimizer, for finding the minimum of a function.
3. In our case, we update the weights using gradient descent and try to minimize
the error function.

How does the backpropagation algorithm work?


Suppose we have a neural network that has an input layer, a hidden layer and an output layer
Step1: First, we give random weights to the model.
Step2: Forward propagation (normal neural network calculation)
Step3: Calculate total error.
Step4: Backward propagation (gradient descent), updating parameters (weights and bias)
Step5: Repeat steps 2-4 until the error is minimized (the predicted output is approximately equal to the
actual output).
The formulas that we are using here

Forward Propagation:-

1. To calculate the value of h1

2. To calculate the output of h1

3. To calculate error of output of h1

4. To calculate the total error of the model
(These formulas, together with the backward-propagation ones, are reconstructed after the list below.)

Now we will propagate backward.


Backward Propagation
1. Here we write out the process and formulas used to update the weight w5.
2. For that, we need to know how much the total error changes with respect to w5. By the chain rule:

1. Calculate the change of the total error with respect to output o1.
2. Calculate the change of output o1 with respect to net output o1.
3. Calculate the change of net output o1 with respect to weight w5.
4. Calculate the updated weight.
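The formulas referred to in the forward and backward steps above appear as figures in the source and are not reproduced here. A hedged reconstruction of their standard forms (using h1 for a hidden neuron, o1 for an output neuron, w5 for a hidden-to-output weight, a sigmoid activation and a squared-error loss) is:

Forward propagation:
net_h1 = w1*x1 + w2*x2 + b1
out_h1 = 1 / (1 + e^(-net_h1))
E_o1 = (1/2) * (target_o1 - out_o1)^2
E_total = sum of E_o over all output neurons

Backward propagation (chain rule for w5):
dE_total/dw5 = (dE_total/dout_o1) * (dout_o1/dnet_o1) * (dnet_o1/dw5)
dE_total/dout_o1 = -(target_o1 - out_o1)
dout_o1/dnet_o1 = out_o1 * (1 - out_o1)
dnet_o1/dw5 = out_h1

Updated weight:
w5_new = w5 - (learning rate) * dE_total/dw5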

Back-propagation Implementation

Initializing variable values

import numpy as np

# Input features and original output
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
print("X (raw input):\n", X)
# Original output
y = np.array(([92], [86], [89]), dtype=float)
# Scale each input feature by its maximum along the first axis
X = X / np.amax(X, axis=0)
print("X (scaled):\n", X)

Output:-

#Defining Sigmoid Function for output


def sigmoid (x):
return (1/(1 + np.exp(-x)))

#Derivative of Sigmoid Function


def derivatives_sigmoid(x):
return x * (1 - x)

#Variables initialization
epoch=7000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of input layer neurons
hiddenlayer_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer

Note:
In this code, we have defined the sigmoid function and its derivative.
As you know, we train the neural network over many iterations, so we need the number of epochs.
Below that we have defined the number of neurons in each layer.

#Defining weight and biases for hidden and output layer


wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))

Note:
Here we have defined random weights and biases.
As we know, we first define the weights and biases for the (single) hidden layer, and after that the
weights and bias for the output layer.
Keep in mind that the shape of a weight matrix is (number of neurons in the previous layer, number of
neurons in the layer the weights feed into), and the shape of a bias is (1, number of neurons in that
layer), as checked below.
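As a quick check of the shapes described above (with inputlayer_neurons = 2, hiddenlayer_neurons = 3 and output_neurons = 1 as defined earlier), the arrays created by the code have the following shapes; this small print-out is only illustrative and not part of the original program.

print(wh.shape)    # (2, 3) - weights between the input layer and the hidden layer
print(bh.shape)    # (1, 3) - bias of the hidden layer
print(wout.shape)  # (3, 1) - weights between the hidden layer and the output layer
print(bout.shape)  # (1, 1) - bias of the output layer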

#Forward Propagation
for i in range(epoch):
    hinp1 = np.dot(X, wh)            # input to the hidden layer
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)       # activation of the hidden layer
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)         # predicted output

Note:
Here we are just calculating the output of our model: first for the hidden layer, then for the output
layer, which finally gives the output.
np.dot is used for the dot product (matrix product) of two matrices.
#Backpropagation Algorithm (these steps continue inside the same epoch loop as the forward pass)
    EO = y - output                                  # error at the output layer
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)                        # error propagated back to the hidden layer
    hiddengrad = derivatives_sigmoid(hlayer_act)

    # how much the hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad

    # dot product of the next layer's error and the current layer's output
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr

    # Updating weights of the hidden layer
    wh += X.T.dot(d_hiddenlayer) * lr

print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)

Output:-
Experiment No. 5
Problem Statement:- Write a program to implement the naive Bayesian classifier for a sample training
dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

Objective:- To apply a naive bayesian classifier for the relevant problem and analyse the result.

Outcome:- The student will be able to apply a naive bayesian classifier for the relevant problem and
analyse the result.

Theory:-
Introduction to Bayes
Naive Bayes is one of the simplest yet powerful algorithms for classification, based on Bayes' theorem
with an assumption of independence among the predictors. The Naive Bayes classifier assumes that the
presence of a feature in a class is unrelated to any other feature. Naive Bayes is a classification
algorithm for binary and multi-class classification problems.
Bayes theorem:-
Bayes' theorem describes the probability of an event based on prior knowledge of conditions that may
be related to the event; the conditional probability can be found this way.
Assume we have a hypothesis (H) and evidence (E).
According to Bayes' theorem, the relationship between the probability of the hypothesis before seeing
the evidence, represented as P(H), and the probability of the hypothesis after seeing the evidence,
represented as P(H|E), is:
P(H|E) = P(E|H)*P(H)/P(E)
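A small worked example with made-up numbers: suppose P(H) = 0.01, P(E|H) = 0.9 and P(E) = 0.1. Then

P(H|E) = P(E|H)*P(H)/P(E) = (0.9 * 0.01) / 0.1 = 0.09

so observing the evidence raises the probability of the hypothesis from 1% to 9%.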

Training Example:-
Programme:-
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

def pre_processing(df):

    """ Partitioning data into features and target """

    X = df.drop([df.columns[-1]], axis=1)
    y = df[df.columns[-1]]
    return X, y

class NaiveBayes:

    # __init__ and fit are not shown in the source; a minimal version is filled in
    # here so that the methods below run end-to-end.
    def __init__(self):
        self.features = list()
        self.likelihoods = {}
        self.class_priors = {}
        self.pred_priors = {}
        self.X_train = None
        self.y_train = None
        self.train_size = 0

    def fit(self, X, y):
        self.features = list(X.columns)
        self.X_train = X
        self.y_train = y
        self.train_size = X.shape[0]
        for feature in self.features:
            self.likelihoods[feature] = {}
            self.pred_priors[feature] = {}
        self._calc_class_prior()
        self._calc_likelihoods()
        self._calc_predictor_prior()

    def _calc_class_prior(self):
        """ P(c) - Prior Class Probability """
        for outcome in np.unique(self.y_train):
            outcome_count = sum(self.y_train == outcome)
            self.class_priors[outcome] = outcome_count / self.train_size

    def _calc_likelihoods(self):
        """ P(x|c) - Likelihood of each feature value given the class """
        for feature in self.features:
            for outcome in np.unique(self.y_train):
                outcome_count = sum(self.y_train == outcome)
                feat_likelihood = self.X_train[feature][
                    self.y_train[self.y_train == outcome].index.values.tolist()
                ].value_counts().to_dict()
                for feat_val, count in feat_likelihood.items():
                    self.likelihoods[feature][feat_val + '_' + outcome] = count / outcome_count

    def _calc_predictor_prior(self):
        """ P(x) - Prior probability of each feature value """
        for feature in self.features:
            feat_vals = self.X_train[feature].value_counts().to_dict()
            for feat_val, count in feat_vals.items():
                self.pred_priors[feature][feat_val] = count / self.train_size

    def predict(self, X):
        """ Calculates the posterior probability P(c|x) """
        results = []
        X = np.array(X)
        for query in X:
            probs_outcome = {}
            for outcome in np.unique(self.y_train):
                prior = self.class_priors[outcome]
                likelihood = 1
                evidence = 1
                for feat, feat_val in zip(self.features, query):
                    likelihood *= self.likelihoods[feat][feat_val + '_' + outcome]
                    evidence *= self.pred_priors[feat][feat_val]
                posterior = (likelihood * prior) / evidence
                probs_outcome[outcome] = posterior
            result = max(probs_outcome, key=lambda x: probs_outcome[x])
            results.append(result)
        return np.array(results)

if __name__ == "__main__":

    # Weather Dataset
    print("\nWeather Dataset:")
    df = pd.read_table("../Data/weather.txt")
    # print(df)
    # Split features and target
    X, y = pre_processing(df)
    nb_clf = NaiveBayes()
    nb_clf.fit(X, y)
    print("Train Accuracy: {}".format(accuracy_score(y, nb_clf.predict(X))))
    # Query 1:
    query = np.array([['Rainy', 'Mild', 'Normal', 't']])
    print("Query 1:- {} ---> {}".format(query, nb_clf.predict(query)))
    # Query 2:
    query = np.array([['Overcast', 'Cool', 'Normal', 't']])
    print("Query 2:- {} ---> {}".format(query, nb_clf.predict(query)))
    # Query 3:
    query = np.array([['Sunny', 'Hot', 'High', 't']])
    print("Query 3:- {} ---> {}".format(query, nb_clf.predict(query)))

Output:-
Experiment No. 6
Problem Statement:- Define a classifier to mark email as spam in your inbox based on the defined Naive
Bayes classifier.

Objective:- To apply a naive bayesian classifier for document classification and analyse the results.

Outcome:- The student will be able to apply a naive bayesian classifier for document classification and
analyse the results.

Theory:-
Naive Bayes classification is a simple probabilistic algorithm based on the assumption that all features
of the model are independent. In the context of the spam filter, we suppose that every word in the
message is independent of all other words, and we count them while ignoring the context.

First of all, we take the Bayes formula for the conditional probability and apply it to our task:

the probability that a message containing the words (w1, w2, w3, ...) is spam is proportional to the
probability of spam multiplied by the product of the probabilities that each word in the message
belongs to a spam message.

P_spam — the proportion of spam messages in our dataset.

P_wi_spam — the probability that word wi is found in the spam messages.

By the same logic we define:

 P_not_spam — the proportion of non-spam messages in the dataset.

 P_wi_non_spam — the probability that word wi is found in the non-spam messages.


But we still do not know how to calculate the individual word probabilities. For that, we have another
formula (reconstructed below):

What do we have here:

 N_vocabulary — the number of unique words in the whole dataset.

 N_spam — the total number of words in the spam messages.

 N_wi_spam — the number of times word wi appears across all spam messages.

 Alpha — the smoothing coefficient for the cases when a word in the message is absent from

our dataset.
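A hedged reconstruction of the two formulas referred to above (they appear as figures in the source, but the same expressions are used by the code below):

P(Spam | w1, w2, ..., wn) is proportional to P_spam * P(w1|Spam) * P(w2|Spam) * ... * P(wn|Spam)

P(wi|Spam) = (N_wi_spam + Alpha) / (N_spam + Alpha * N_vocabulary)

and analogously P(wi|Ham), using the non-spam counts.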

Dataset:-

Implementation

We will follow the formulas mentioned above and define the main values:

# probability of a message being spam
Pspam = train_data['Label'].value_counts()['spam'] / train_data.shape[0]

# probability of a non-spam message
Pham = train_data['Label'].value_counts()['ham'] / train_data.shape[0]

# the number of words in spam messages
Nspam = train_data.loc[train_data['Label'] == 'spam', 'SMS'].apply(len).sum()

# the number of words in non-spam messages
Nham = train_data.loc[train_data['Label'] == 'ham', 'SMS'].apply(len).sum()

# the size of the vocabulary (the word-count columns only)
Nvoc = len(train_data.columns) - 3

# smoothing coefficient
alpha = 1

Program:-

def p_w_spam(word):
    if word in train_data.columns:
        return (train_data.loc[train_data['Label'] == 'spam', word].sum() + alpha) / (Nspam + alpha * Nvoc)
    else:
        return 1

def p_w_ham(word):
    if word in train_data.columns:
        return (train_data.loc[train_data['Label'] == 'ham', word].sum() + alpha) / (Nham + alpha * Nvoc)
    else:
        return 1

Our classification function:-


def classify(message):
    p_spam_given_message = Pspam
    p_ham_given_message = Pham
    for word in message:
        p_spam_given_message *= p_w_spam(word)
        p_ham_given_message *= p_w_ham(word)
    if p_ham_given_message > p_spam_given_message:
        return 'ham'
    elif p_ham_given_message < p_spam_given_message:
        return 'spam'
    else:
        return 'needs human classification'

Output:-
Experiment No. 7
Problem Statement:- Define a Bayesian network model to demonstrate the diagnosis of heart patients
using standard Heart Disease Data Set.

Objective:- To apply a bayesian network for the medical data and demonstrate the diagnosis of heart
patients using standard Heart Disease Data Set.

Outcome:- The student will be able to define a Bayesian network model to demonstrate the diagnosis of
heart patients using the standard Heart Disease Data Set.

Theory:- Bayesian networks are a type of probabilistic graphical model that uses Bayesian
inference for probability computations. Bayesian networks aim to model conditional
dependence, and therefore causation, by representing conditional dependencies as edges in a
directed graph. Through these relationships, one can efficiently conduct inference on the random
variables in the graph through the use of factors. A Bayesian network is a directed acyclic graph in
which each edge corresponds to a conditional dependency and each node corresponds to a unique random
variable (the joint distribution factorizes as sketched below). A Bayesian network consists of two major
parts: a directed acyclic graph and a set of conditional probability distributions.
• The directed acyclic graph is a set of random variables represented by nodes.
• The conditional probability distribution of a node (random variable) is defined for every
possible outcome of the preceding causal node(s).
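Concretely, the joint distribution represented by such a graph factorizes into one conditional probability distribution per node (a standard identity, stated here for reference):

P(X1, X2, ..., Xn) = product over i of P(Xi | Parents(Xi))

For the network built in the program below this gives
P(age, sex, exang, cp, heartdisease, restecg, chol) = P(age) * P(sex) * P(exang) * P(cp) * P(heartdisease | age, sex, exang, cp) * P(restecg | heartdisease) * P(chol | heartdisease).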

Dataset:-

Title: Heart Disease Databases.


The Cleveland database contains 76 attributes, but all published experiments refer to using a subset of 14
of them. In particular, the Cleveland database is the only one that has been used by ML researchers to
date. The "Heart Disease" field refers to the presence of heart disease in the patient. It is integer valued
from 0 (no presence) to 4.
Database: 0 1 2 3 4 Total
Cleveland: 164 55 36 35 13 303

Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
• Value 1: typical angina
• Value 2: atypical angina
• Value 3: non-anginal pain
• Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
• Value 0: normal
• Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of
> 0.05 mV)
• Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved.
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak = ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment.
• Value 1: upsloping
• Value 2: flat
• Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
13. Heartdisease: integer valued from 0 (no presence) to 4.

Some instance from the dataset:

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal heartdisease
63 1 1 145 233 1 2 150 0 2.3 3 0 6 0
67 1 4 160 286 0 2 108 1 1.5 2 3 3 2
67 1 4 120 229 0 2 129 1 2.6 2 2 7 1
41 0 2 130 204 0 2 172 0 1.4 1 1 3 0
62 0 4 140 268 0 2 160 0 3.6 3 3 3 3
60 1 4 130 206 0 2 132 1 2.4 2 2 7 4

Program:

import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

# Read the Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# Display the data
print('Sample instances from the dataset are given below')
print(heartDisease.head())

# Display the attribute names and datatypes
print('\n Attributes and datatypes')
print(heartDisease.dtypes)

# Create the model - Bayesian Network
model = BayesianModel([('age', 'heartdisease'), ('sex', 'heartdisease'),
                       ('exang', 'heartdisease'), ('cp', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'chol')])

# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)

# Computing the probability of heartdisease given restecg
print('\n 1.Probability of HeartDisease given evidence= restecg :1')
q1 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)

# Computing the probability of heartdisease given cp
print('\n 2.Probability of HeartDisease given evidence= cp:2 ')
q2 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)

Output:-
Experiment No. 8
Problem Statement:- Take a data set as a .CSV file and apply the EM algorithm and the k-Means algorithm
to cluster it. Compare the results of these two algorithms and comment on the quality of clustering.

Objective:-To apply EM algorithm and k-Means algorithm for clustering and analyse the data.

Outcome:-The student will be able to apply EM algorithm and k-Means algorithm for clustering and
analyse the results.

Theory:-
GMMs are probabilistic models that assume all the data points are generated from a mixture of several
Gaussian distributions with unknown parameters. They differ from k-means clustering in that GMMs
incorporate information about the center(mean) and variability(variance) of each cluster and provide
posterior probabilities.

The K-means approach is an example of a hard assignment clustering, where each point can belong to
only one cluster. Expectation-Maximization algorithm is a way to generalize the approach to consider the
soft assignment of points to clusters so that each point has a probability of belonging to each cluster.

EM algorithm:
EM is an iterative algorithm to find the maximum likelihood when there are latent variables. The
algorithm iterates between performing an expectation (E) step, which creates a heuristic of the posterior
distribution and the log-likelihood using the current estimate for the parameters, and a maximization (M)
step, which computes parameters by maximizing the expected log-likelihood from the E step. The
parameter-estimates from M step are then used in the next E step.
K-means clustering gradually learns how to cluster the unlabelled points into groups by analysing the
distances of the points to the cluster means. Here the variable k denotes the number of clusters or
groups into which the data will be gathered. The algorithm works by assigning points to clusters in
such a way that the error function is minimized (this error function is written out below).
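The error function that k-Means minimizes can be written out explicitly (standard form, stated here for reference): the sum of squared distances of every point to the centroid of its assigned cluster,

J = sum over clusters k of ( sum over points x in cluster k of ||x - mu_k||^2 )

where mu_k is the mean (centroid) of cluster k. The EM algorithm for a Gaussian mixture replaces this hard assignment with the probability that each point belongs to each cluster.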
Dataset:-

Program:-
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset=load_iris()
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])

#REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')
#KMeans -PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')

#GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')
plt.show()  # display the three subplots
Output:-
Experiment No. 9
Problem Statement:- Classify iris data set using k-Nearest Neighbour algorithm. Print both correct and
wrong predictions. Python ML library classes can be used for this problem.

Objective:- To implement k-Nearest Neighbour algorithm to classify the iris data set and Print both
correct and wrong predictions.

Outcome:- The student will be able to implement k-Nearest Neighbour algorithm to classify the iris
data set and Print both correct and wrong predictions.

Theory:-
K-Nearest Neighbor (KNN)
 KNN is a simple supervised learning algorithm used for both regression and classification
problems.
 KNN basically stores all available cases and classifies new cases based on their similarity
to the stored cases.
 Concept: KNN works on similarity measurements. For example, a mango is more similar
to an apple than to a dog or a cat, so KNN will put it in the category of fruits, not in the
category of animals.

K - the number of nearest neighbours


K=1 means the test sample is given the same label as the closest example in the training set.
K=4 means the labels of the four closest examples are checked and the most common class is assigned
to the test sample.
How does KNN work?
KNN Algorithm

Let's understand the concept of the KNN algorithm with the iris flower problem.
Data: This data set consists of 150 instances (samples), 4 features, and three classes (targets).
Problem: Using the four features, we have to classify which flower belongs to which category.

Importing Data-set
import sklearn
import pandas as pd
from sklearn.datasets import load_iris
iris=load_iris()
iris.keys()
df=pd.DataFrame(iris['data'])
print(df)
print(iris['target_names'])
print(iris['feature_names'])
Output :-

Note:

1. Now we need a target and data so that we can train the model.
2. As we know, we have to find out the class from the features we have.
3. With this logic, our target is the classes (0, 1, 2) and our data is in df.

X = df
y = iris['target']

Splitting Data

1. The data is split so that we can train the model with some of the data and test it on the
remaining data to check how well our model performs.
2. To do this we have an inbuilt function in sklearn.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Note: It will split off 33% of the data as testing data; the remaining data is our training data.
KNN Classifier and Training of the Model

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
Note:

1. It implements the concept of KNN. Here we have taken the number of neighbours (K) = 3.
2. First, it will calculate the distance from the test point to all the training points and then
select the three points with the lowest distances (a small sketch of this mechanism is given below).
3. The test data point is then classified to the class most common among those three.
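As an illustration of what KNeighborsClassifier does internally, here is a minimal sketch (not part of the lab program) that classifies a single test point by computing Euclidean distances to all training points and voting among the 3 nearest; the helper name knn_predict_one is hypothetical.

import numpy as np
from collections import Counter

def knn_predict_one(test_point, X_train, y_train, k=3):
    # Euclidean distance from the test point to every training point
    distances = np.sqrt(((np.asarray(X_train) - np.asarray(test_point)) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels
    votes = Counter(np.asarray(y_train)[nearest])
    return votes.most_common(1)[0][0]

# Example call with an illustrative query point:
# knn_predict_one([5.1, 3.5, 1.4, 0.2], X_train, y_train)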

knn.fit(X_train, y_train)

Note:- Training the model with the feature values (data) and the target values (target).

Prediction and Accuracy


Demo:

1. Here I want to show the prediction for just one data point.
2. We have a data point x_new, and we want to see the class or category of this point.

x_new = [[5.1, 3.5, 1.4, 0.2]]  # hypothetical sample; the original value was shown in a figure
prediction = knn.predict(x_new)
print(iris['target_names'][prediction])

Output
Note: As we can see, our point belongs to class 0 (the setosa class); this demo is just for
understanding.
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
y_pred=knn.predict(X_test)
cm=confusion_matrix(y_test,y_pred)
print(cm)
print(" correct predicition",accuracy_score(y_test,y_pred))
print(" worng predicition",(1-accuracy_score(y_test,y_pred)))

Output :-

Note: As you can see in the confusion matrix, only one prediction is wrong, and our accuracy is
0.98 (98%).
Practice KNN - We have a dataset that contains information about multiple users of a social
network and whether or not they are interested in buying an SUV car.
Experiment No. 10
Problem Statement:- Pick an appropriate data set for your experiment, draw graphs using the non-
parametric Locally Weighted Regression algorithm in order to fit data points.

Objective:- To understand and implement linear regression and analyse the results with change in the
parameters.

Outcome:- To understand and implement linear regression and analyse the results with change in the
parameters

Theory:-
Locally weighted Linear Regression
 Locally weighted regression is built on top of ordinary linear regression.
 What is linear regression, then?
 Linear regression - Linear regression is a supervised learning algorithm. It basically
works on the concept of the line equation:
Y = mX + C
where m is the coefficient of X and C is a constant.

 Linear regression performs the task of predicting a dependent variable (Y) based on the
independent variable (X).
 So, basically, in this modelling we try to find the best fit line (regression line); as we can
see from the equation, it will of course be a straight line.
Note:
Here the line represents linear regression. If our data does not look like this, then what will we do? See
the diagram below.

Note:
Here we need a polynomial-type model, and this is where the concept of locally weighted regression
comes in.
Let's understand it with the cost function (which calculates the least-squares error):
Cost Function (Linear regression)
Cost Function (Locally weighted regression)

1. As we can see, the only difference between the two is the weight.
2. Here we use the weighted least-squares error.
3. Let's see it by formula.

So, the interesting fact about this formula is that we can get a non-linear regression model, as strong as
polynomial regression of any degree, just by changing the value of T (tau).

Where T(tau) is bandwidth parameter


x = query point
x0 = training point
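The cost functions and the weight formula referred to above appear as figures in the source; their standard forms (a hedged reconstruction, consistent with the radial_kernel code below and with the Y = mX + C notation used earlier) are:

Cost function (linear regression):            J = sum over training points x0 of (y0 - (m*x0 + C))^2
Cost function (locally weighted regression):  J(x) = sum over training points x0 of w(x, x0) * (y0 - (m*x0 + C))^2
Weight of training point x0 for query point x: w(x, x0) = exp( -(x0 - x)^2 / (2 * tau^2) )

The locally weighted cost is minimized separately for every query point x, so each query gets its own local line.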
Locally weighted regression

 It is a supervised learning algorithm and an extended form of linear regression.


 It is non-parametric; there is no training phase, only a testing (query) phase.

How to implement it
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(-3, 3, 1000)
print(X)
X += np.random.normal(scale=0.05, size=1000)
Y = np.log(np.abs((X ** 2) - 1) + 0.5)
print(Y)

Note: numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)


What linspace does here is create an array of 1000 evenly spaced values between -3 and 3.
Demo of linspace :
np.linspace(2.0, 3.0, num=5)
Output :-
array([ 2. , 2.25, 2.5 , 2.75, 3. ])
Note: Here Y is only a function that has a non-linear relationship with X.

plt.scatter(X, Y, alpha=0.32)

Note: To see the relationship between X and Y, we make a scatter plot.


def local_regression(x0, X, Y, tau):
    # Add the bias term to the query point and to every training point
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]
    # Weight each training point with the radial kernel centred on x0
    xw = X.T * radial_kernel(x0, X, tau)
    # print(xw)
    # Closed-form weighted least-squares solution
    beta = np.linalg.pinv(xw @ X) @ xw @ Y
    # Prediction at the query point
    return x0 @ beta
Note:

1. Here we are creating a function that calculates our final h(x0) for one query point.
2. As you can see in the formulas above, there are two expressions for beta(x0); here we are
using the second one (shown in the orange box in the figure), which is a modified form of the first.
3. np.r_[1, x0] prepends the bias term 1 to the query point; np.c_[np.ones(len(X)), X] adds a
column of ones (the bias term) to X.
4. We have defined below the radial_kernel function, which calculates our weight w(x, x0).
5. X.T is the transpose of the matrix (array).
6. Here @ represents matrix multiplication, and pinv computes the (pseudo-)inverse of the matrix.
def radial_kernel(x0, X, tau):
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

Note: It's a simple function to calculate local weight w(x,x0)

def plot_lwr(tau):
    domain = np.linspace(-3, 3, num=300)
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plt.scatter(X, Y, alpha=0.3)
    plt.plot(domain, prediction, color='red')
    return plt

plot_lwr(0.01)

Note:
1. Here we have defined our query points (the domain) and then called our function
local_regression for each of them.
2. After that, we plot the original data and the predicted curve (model).
3. As you can see in the plot, our model fits the data well; if you change the value of tau,
the red line (model) will change.
4. The shape of our model depends on the value of tau: if you change the value of tau,
the shape will change.

Output :-
