IT ML Lab
Certified that this is the bonafide record of work done by Mr. / Ms. ………………………………………. in
the U19CS312 – Machine Learning Laboratory during the 6th Semester of the academic year 2023 – 2024
(Even Semester).
Ex:No:1 FIND-S ALGORITHM IMPLEMENTATION
Date:
AIM:
To write a Python program to implement the Find-S algorithm to find the most specific hypothesis based
on a given set of training data samples.
ALGORITHM:
1. Start with the most specific hypothesis.
h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
2. Take the next example; if it is negative, make no change to the hypothesis.
3. If the example is positive and the current hypothesis is too specific to cover it, generalize the
hypothesis: replace each attribute value that disagrees with the example by '?'.
4. Repeat the above steps until all the training examples have been processed.
5. After all the training examples have been processed, the final hypothesis can be used
to classify new examples.
PROGRAM:
import csv
from google.colab import files  # Colab only: upload PlayTennis_1.csv to the runtime

uploaded = files.upload()

# Number of attribute columns; the class label ('Yes'/'No') is in column index 4
num_attributes = 4
a = []
print("\n The Given Training Data Set \n")
with open('PlayTennis_1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
# '0' marks the most specific value (no positive example accepted yet)
hypothesis = ['0'] * num_attributes
print(hypothesis)

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
# Row 0 is the header, so the training examples start at row 1
for i in range(1, len(a)):
    if a[i][num_attributes] == 'Yes':
        for j in range(num_attributes):
            if hypothesis[j] == '0':
                # First positive example: copy its attribute value
                hypothesis[j] = a[i][j]
            elif a[i][j] != hypothesis[j]:
                # A later positive example disagrees: generalize to '?'
                hypothesis[j] = '?'
    # Negative examples leave the hypothesis unchanged
    print(" For Training instance No:{0} the hypothesis is".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)
OUTPUT:
'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same',True
'Sunny', 'Warm', 'High', 'Strong', 'Warm','Same',True
'Rainy', 'Cold', 'High', 'Strong', 'Warm','Change',False
'Sunny', 'Warm', 'High', 'Strong', 'Cool','Change',True
Maximally Specific set
[['Sunny', 'Warm', '?', 'Strong', '?', '?']]
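The hypothesis listed in the output can be checked by hand against the four examples shown. The block below is a minimal sketch, not the program from this record: the function name `find_s` and the in-memory `examples` list are assumptions made for illustration, with the data copied from the output lines above.

```python
def find_s(examples):
    """Return the maximally specific hypothesis for (attributes, is_positive) pairs."""
    hypothesis = None
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # negative examples never change the hypothesis
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # keep values that agree, replace disagreeing values by '?'
            hypothesis = [h if h == a else '?' for h, a in zip(hypothesis, attributes)]
    return hypothesis

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
result = find_s(examples)
print(result)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

The third example is negative and is skipped, so only the humidity, water and forecast attributes are generalized.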
RESULT:
Thus the Find-S Algorithm Implementation was completed successfully
Ex:No:2 CANDIDATE ELIMINATION ALGORITHM IMPLEMENTATION
Date:
AIM:
To write a program to implement Candidate-Elimination algorithm to output a description of the set of
all hypotheses consistent with the training examples.
ALGORITHM:
1. Load the data set.
2. Initialize the general hypothesis and the specific hypothesis.
3. For each training example:
4. If the example is a positive example:
   if attribute_value == hypothesis_value:
      Do nothing
   else:
      replace the attribute value with '?' (basically generalizing it)
5. If the example is a negative example:
   Make the general hypothesis more specific.
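The steps above can be sketched as follows. This is a simplified variant that is common in lab exercises, not the full version-space algorithm: it keeps one specific hypothesis and one general hypothesis per attribute, and it assumes the first usable example is positive. The function name and the sample data (the same four examples used for Find-S, in lower case) are assumptions for illustration.

```python
def candidate_elimination(examples):
    """Simplified Candidate-Elimination over (attributes, is_positive) pairs."""
    n = len(examples[0][0])
    specific = None
    general = [['?'] * n for _ in range(n)]  # one general hypothesis per attribute
    for attributes, is_positive in examples:
        if specific is None:
            if not is_positive:
                continue  # simplification: wait for the first positive example
            specific = list(attributes)
        for j in range(n):
            if is_positive:
                # generalize the specific hypothesis where it disagrees
                if attributes[j] != specific[j]:
                    specific[j] = '?'
                    general[j][j] = '?'
            else:
                # specialize the general hypothesis using the specific boundary
                if attributes[j] != specific[j] and specific[j] != '?':
                    general[j][j] = specific[j]
                else:
                    general[j][j] = '?'
    # drop general hypotheses that stayed fully general
    general = [g for g in general if any(v != '?' for v in g)]
    return specific, general

examples = [
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), True),
    (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), True),
    (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), False),
    (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), True),
]
specific, general = candidate_elimination(examples)
print(specific)
print(general)
```

On these four examples the sketch ends with the specific boundary ('sunny', 'warm', '?', 'strong', '?', '?') and two general hypotheses, one constraining the first attribute to 'sunny' and one constraining the second to 'warm'.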
PROGRAM:
OUTPUT:
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('?', '?', '?', '?', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]
RESULT:
Thus the Candidate Elimination Algorithm Implementation was completed successfully
Ex:No:3 DECISION TREE BASED ID3 ALGORITHM IMPLEMENTATION
Date:
AIM:
To write a python program to implement the decision tree based ID3 algorithm to classify a new sample by
using a given data set.
ALGORITHM:
1. It begins with the original set S as the root node.
2. On each iteration, the algorithm iterates through every unused attribute of the set S and calculates
the entropy (H) and information gain (IG) of that attribute.
3. It then selects the attribute which has the smallest entropy or, equivalently, the largest information gain.
4. The set S is then split by the selected attribute to produce subsets of the data.
5. The algorithm continues to recur on each subset, considering only attributes never selected before.
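The entropy and information gain used in steps 2 and 3 can be sketched as below. The helper names are assumptions for illustration, and the two columns are taken from the widely used 14-row PlayTennis example rather than from the file used in this record.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Illustrative data: outlook attribute and play label of the classic PlayTennis table
outlook = ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'rain', 'overcast',
           'sunny', 'sunny', 'rain', 'sunny', 'overcast', 'overcast', 'rain']
play = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
        'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']

print(round(entropy(play), 3))                    # 0.94
print(round(information_gain(outlook, play), 3))  # 0.247
```

ID3 would compute this gain for every unused attribute and split on the one with the largest value.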
PROGRAM:
import graphviz
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# data_PlayTennis is assumed to be a pandas DataFrame that already holds the
# PlayTennis data with a 'play' target column (the loading step is not shown).

# Create object from LabelEncoder and encode every categorical column as integers
# (assumed step: the classifier cannot be fitted on string values)
label_En = LabelEncoder()
for column in data_PlayTennis.columns:
    data_PlayTennis[column] = label_En.fit_transform(data_PlayTennis[column])

X = data_PlayTennis.drop(['play'], axis=1)
Y = data_PlayTennis['play']

# Stratified 70/30 split so both classes appear in the training and test sets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, stratify=Y, random_state=101)
print(len(X))
print(len(Y))
print(len(x_train))
print(len(y_train))
print(len(x_test))
print(len(y_test))

# Create object from Decision Tree Classifier (entropy criterion, as in ID3)
D_T_C_Model = DecisionTreeClassifier(criterion='entropy', random_state=10)
D_T_C_Model.fit(x_train, y_train)
print(D_T_C_Model.score(x_train, y_train))
print(D_T_C_Model.score(x_test, y_test))
y_pred = D_T_C_Model.predict(x_test)
print(y_pred)
print(D_T_C_Model.predict_proba(x_test))

# Graphviz
graph_data = tree.export_graphviz(D_T_C_Model, out_file=None)
graph = graphviz.Source(graph_data)
graph

# Visualize the tree using tree.plot_tree
tree.plot_tree(D_T_C_Model)

# Check accuracy score(y_test, y_pred)
print(accuracy_score(y_test, y_pred))
OUTPUT:
outlook
   overcast
      b'yes'
   rain
      wind
         b'strong'
            b'no'
         b'weak'
            b'yes'
   sunny
      humidity
         b'high'
            b'no'
         b'normal'
            b'yes'
RESULT:
Thus the Decision tree based ID3 Algorithm Implementation was completed successfully
Ex:No:4 ARTIFICIAL NEURAL NETWORK IMPLEMENTATION
Date:
AIM:
To write a python program to implement the Artificial Neural Network using Back propagation
algorithm and test the same using appropriate data sets.
ALGORITHM:
1. Feed forward: get the weighted sum of the inputs of each unit using the function h(x).
2. Plug the value from step 1 into the activation function (f(a) = a in this example) and use the
resulting activation value (i.e. the output of the activation function) as the input feature for the
connected nodes in the next layer.
3. Feeding forward therefore uses the following function:
f(a) = a
4. Feeding backward then happens through the partial derivatives of those functions. There is no
need to go through the working of arriving at these derivatives. All we need to know is that the above
function leads to:
f'(a) = 1
J'(w) = Z . delta
where Z is the activation value that feeds the weight w and delta is the error term of the unit the weight leads to.
5. Update the weights.
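The steps above can be traced on a very small network. The sketch below is an illustration under stated assumptions, not the program of this exercise: it uses one input, one hidden unit and one output, the identity activation f(a) = a from the steps, and a made-up target y = 3x. The names w1, w2 and learning_rate are hypothetical.

```python
# Training data for a hypothetical target function y = 3x
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 * x for x in xs]

w1, w2 = 0.5, 0.5        # input->hidden and hidden->output weights
learning_rate = 0.05

for epoch in range(100):
    grad_w1 = grad_w2 = 0.0
    for x, y in zip(xs, ys):
        # Feed forward: weighted sum, then identity activation f(a) = a
        z1 = w1 * x
        out = w2 * z1
        # Feed backward: each delta is the error times f'(a) = 1
        delta_out = (out - y) * 1.0
        delta_hidden = delta_out * w2 * 1.0
        # J'(w) = Z . delta, accumulated over the training set
        grad_w2 += z1 * delta_out
        grad_w1 += x * delta_hidden
    # Update the weights with the averaged gradients
    w2 -= learning_rate * grad_w2 / len(xs)
    w1 -= learning_rate * grad_w1 / len(xs)

print(w1 * w2)  # the product of the weights approaches 3
```

Because both layers are linear, the network can only represent out = (w1 * w2) * x, so training drives the product of the two weights towards the slope of the target.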
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data: columns 3 to 12 are the features, column 13 is the target
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13]
y = dataset.iloc[:, 13]

# One-hot encode the categorical columns, dropping the first level of each
geography = pd.get_dummies(X["Geography"], drop_first=True)
gender = pd.get_dummies(X['Gender'], drop_first=True)
X = pd.concat([X, geography, gender], axis=1)
X = X.drop(['Geography', 'Gender'], axis=1)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize the features using statistics from the training set only
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
RESULT:
Thus the Artificial Neural Network Algorithm Implementation using back propagation was
completed successfully
Ex:No:5 NAIVE BAYESIAN CLASSIFIER ALGORITHM IMPLEMENTATION
Date:
AIM:
To write a python program to implement the naive Bayesian classifier for a sample training data set.
ALGORITHM:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Fitting Naive Bayes to the Training set
4. Predicting the test result
5. Test accuracy of the result (Creation of Confusion matrix)
6. Visualizing the test set result.
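Steps 1 and 2 can be illustrated with a single categorical feature. The block below is a minimal sketch under assumptions: the helper names are hypothetical, and the outlook and play columns come from the classic 14-row PlayTennis table rather than from the data set used in the program.

```python
from collections import Counter

outlook = ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'rain', 'overcast',
           'sunny', 'sunny', 'rain', 'sunny', 'overcast', 'overcast', 'rain']
play = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
        'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']

# Step 1: frequency tables for the class and for (feature value, class) pairs
class_counts = Counter(play)
pair_counts = Counter(zip(outlook, play))

def likelihood(value, label):
    """Step 2: P(feature = value | class = label) from the frequency tables."""
    return pair_counts[(value, label)] / class_counts[label]

def posterior(value):
    """P(class | feature = value) by Bayes' rule, normalized over the classes."""
    scores = {label: likelihood(value, label) * class_counts[label] / len(play)
              for label in class_counts}
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

result = posterior('sunny')
print(result)  # P(yes | sunny) is about 0.4, P(no | sunny) about 0.6
```

With several features, the naive independence assumption multiplies one such likelihood per feature before normalizing.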
PROGRAM:
import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate indices for the dataset list randomly to pick elements for training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    # creates a dictionary of classes 1 and 0 where the values are the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    # the last column is the class label, so its summary is dropped
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dictionary of (mean, std) tuples for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    # class and attribute information as mean and sd
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            # take mean and sd of every attribute for class 0 and 1 separately
            mean, stdev = classSummaries[i]
            x = inputVector[i]  # i-th attribute of the test vector
            # use normal distribution
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    # assigns the class which has the highest probability
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions
OUTPUT:
Confusion matrix
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Accuracy metrics
   Precision  Recall  f1-score  Support
0  1.00       1.00    1.00      17
1  1.00       1.00    1.00      17
2  1.00       1.00    1.00      11
RESULT:
Thus the naive Bayesian classifier Algorithm Implementation was
completed successfully.
Ex:No:6 NAIVE BAYESIAN CLASSIFIER ALGORITHM IMPLEMENTATION
(BUILT-IN LIBRARIES)
Date:
AIM:
To write a python program to implement the naive Bayesian classifier using Built-In
Libraries for a sample training data set.
ALGORITHM:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Fitting Naive Bayes to the Training set
4. Predicting the test result
5. Test accuracy of the result (Creation of Confusion matrix)
6. Visualizing the test set result.
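For text messages, the frequency tables of step 1 are word counts per class. The block below is a minimal sketch of that idea in plain Python, with Laplace (add-one) smoothing. It is not the library-based program of this exercise: the function names and the four training messages are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training messages with 'pos' / 'neg' labels
train = [
    ("i love this sandwich", "pos"),
    ("this is an amazing place", "pos"),
    ("i do not like this restaurant", "neg"),
    ("i am tired of this stuff", "neg"),
]

def train_nb(samples):
    """Build class counts, per-class word counts and the vocabulary."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for text, label in samples:
        word_counts[label].update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocabulary

def predict(text, class_counts, word_counts, vocabulary):
    """Return the class with the highest log posterior (add-one smoothing)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in text.split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb(train)
print(predict("i love this place", *model))  # pos
print(predict("i am tired", *model))         # neg
```

Working in log space avoids the underflow that multiplying many small word probabilities would cause.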
PROGRAM:
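import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# Assumed steps (not given in the record): build the word-count matrix
# and fit the classifier on it
count_vect = CountVectorizer()
X_counts = count_vect.fit_transform(X)
print(count_vect.get_feature_names_out())
print(pd.DataFrame(X_counts.toarray(), columns=count_vect.get_feature_names_out()))
clf = MultinomialNB().fit(X_counts, y)

# docs_new must be a list of new message strings to classify
X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predictednew):
    print('%s->%s' % (doc, category))  # 1 = pos, 0 = neg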
OUTPUT
['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal',
'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my',
'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to',
'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']
about am amazing an and awesome beers best boss can ... today \
0 1 0 0 0 0 0 1 0 0 0 ... 0
1 0 0 0 0 0 0 0 1 0 0 ... 0
2 0 0 1 1 0 0 0 0 0 0 ... 0
3 0 0 0 0 0 0 0 0 0 0 ... 1
4 0 0 0 0 0 0 0 0 0 0 ... 0
5 0 1 0 0 1 0 0 0 0 0 ... 0
6 0 0 0 0 0 0 0 0 0 1 ... 0
7 0 0 0 0 0 0 0 0 0 0 ... 0
8 0 1 0 0 0 0 0 0 0 0 ... 0
9 0 0 0 1 0 1 0 0 0 0 ... 0
10 0 0 0 0 0 0 0 0 0 0 ... 0
11 0 0 0 0 0 0 0 0 1 0 ... 0
12 0 0 0 1 0 1 0 0 0 0 ... 0
RESULT:
Thus the naive Bayesian classifier Algorithm Implementation using built-in
libraries was completed successfully.
Ex:No:7 BAYESIAN NETWORK IMPLEMENTATION FOR MEDICAL DATA
Date:
AIM:
To write a python program to implement the Bayesian Network for medical data.
ALGORITHM:
1. First, identify the main variables in the problem to solve. Each variable
corresponds to a node of the network. It is important to choose the number of states for each
variable; for instance, there are usually two states (true or false).
2. Second, define the structure of the network, that is, the causal relationships between all the
variables (nodes).
3. Third, define the probability rules governing the relationships between the variables.
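The three steps can be illustrated with a tiny hand-built network. The sketch below is an assumption-laden example, not the heart disease model of this exercise: the chain age -> disease -> symptom, the two-state variables and every probability value are made up for illustration. Inference is done by summing the joint distribution over the hidden variable.

```python
# Step 1: three binary variables (1 = true, 0 = false)
# Step 2: structure age -> disease -> symptom
# Step 3: probability tables (all values hypothetical)
p_age = {1: 0.3, 0: 0.7}                     # P(age = 1) and P(age = 0)
p_disease_given_age = {1: 0.4, 0: 0.1}       # P(disease = 1 | age)
p_symptom_given_disease = {1: 0.8, 0: 0.2}   # P(symptom = 1 | disease)

def joint(age, disease, symptom):
    """Joint probability as the product of each node given its parent."""
    p_d = p_disease_given_age[age] if disease else 1 - p_disease_given_age[age]
    p_s = (p_symptom_given_disease[disease] if symptom
           else 1 - p_symptom_given_disease[disease])
    return p_age[age] * p_d * p_s

def query_disease(symptom):
    """P(disease | symptom), summing out the unobserved age variable."""
    scores = {d: sum(joint(a, d, symptom) for a in (0, 1)) for d in (0, 1)}
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}

result = query_disease(symptom=1)
print(result)  # P(disease = 1 | symptom = 1) is about 0.484
```

Libraries such as pgmpy perform the same computation with variable elimination, which avoids enumerating the full joint table.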
PROGRAM:
import numpy as np
import pandas as pd
from pgmpy.inference import VariableElimination
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator

# Column names of the heart disease data set
names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca',
         'thal', 'heartdisease']
# Read the data and mark missing values ('?') as NaN
heartDisease = pd.read_csv('heart.csv', names=names)
heartDisease = heartDisease.replace('?', np.nan)

# Network structure: each pair is a directed edge (parent, child)
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'), ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'), ('heartdisease', 'restecg'),
                       ('heartdisease', 'thalach'), ('heartdisease', 'chol')])
# Learn the conditional probability tables from the data
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Query P(heartdisease | age = 37, sex = 0) by variable elimination
HeartDisease_infer = VariableElimination(model)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 37, 'sex': 0})
# Depending on the pgmpy version, query() may return the factor itself; print(q) may then be needed
print(q['heartdisease'])
OUTPUT:
RESULT:
Thus, the Bayesian network for medical data Implementation was completed
successfully.
Ex:No:8 EM ALGORITHM IMPLEMENTATION
Date:
AIM:
To write a python program to implement EM Algorithm.
ALGORITHM:
1. Given a set of incomplete data, consider a set of starting parameters.
2. Expectation step (E – step): Using the observed available data of the dataset, estimate
(guess) the values of the missing data.
3. Maximization step (M – step): Complete data generated after the expectation (E) step is
used in order to update the parameters.
4. Repeat step 2 and step 3 until convergence.
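The E-step and M-step can be written out for the simplest case, a mixture of two one-dimensional Gaussians. The block below is a minimal sketch under assumptions: the function names, the fixed iteration count and the ten sample points are made up for illustration, and the "missing data" is the unknown cluster membership of each point.

```python
import math

def gaussian_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def em_two_gaussians(data, iterations=50):
    # Step 1: starting parameters
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            scores = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate the parameters from the weighted points
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the variance positive
    return weights, means, variances

# Hypothetical sample: two well-separated groups around 1.0 and 5.0
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```

A fixed number of iterations stands in for the convergence test of step 4; a fuller version would stop when the log-likelihood no longer improves.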
PROGRAM:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import pandas as pd
X=pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()
#code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n",gmm.means_)
print('\n')
print("Covariances\n",gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:,0], X[:,1],c=em_predictions,s=50)
plt.show()
Ex:No:9 K-NEAREST NEIGHBOUR ALGORITHM IMPLEMENTATION
Date:
AIM:
To write a python program to implement k-Nearest Neighbour Algorithm.
ALGORITHM:
1. Select the number K of neighbours.
2. Calculate the Euclidean distance from the new data point to every training point.
3. Take the K nearest neighbours as per the calculated Euclidean distance.
4. Among these K neighbours, count the number of data points in each category.
5. Assign the new data point to the category for which the number of neighbours is
maximum.
6. Our model is ready.
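The steps above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the program of this exercise: the function name knn_predict and the six two-dimensional training points with labels 'A' and 'B' are made up.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point
    distances = sorted((math.dist(point, query), label)
                       for point, label in zip(train_points, train_labels))
    # Step 3: keep the k nearest neighbours
    nearest = [label for _, label in distances[:k]]
    # Steps 4 and 5: count the labels and return the most frequent one
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical training data: two small clusters
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ['A', 'A', 'A', 'B', 'B', 'B']

print(knn_predict(train_points, train_labels, (1.5, 1.5)))  # A
print(knn_predict(train_points, train_labels, (4.5, 5.0)))  # B
```

An odd value of K is a common choice for two classes because it avoids tied votes.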
PROGRAM:
OUTPUT:
Accuracy Metrics
                 precision  recall  f1-score  support
Iris-setosa      1.00       1.00    1.00      13
Iris-versicolor  1.00       0.94    0.97      16
Iris-virginica   0.90       1.00    0.95      9
avg / total      0.98       0.97    0.97      38
correct prediction 0.973
wrong prediction 0.027
RESULT:
Thus, the k-Nearest Neighbour Algorithm Implementation was completed
successfully.
Ex:No:10 NON-PARAMETRIC LOCALLY WEIGHTED REGRESSION
ALGORITHM IMPLEMENTATION
Date:
AIM:
To write a python program to implement non-parametric Locally Weighted Regression
Algorithm.
ALGORITHM:
1. Read the given data sample into X and the curve (linear or non-linear) into Y.
2. Set the value of the smoothing parameter (free parameter) τ.
3. Set the bias / point of interest x0, which is a subset of X.
4. Determine the value of the model term parameter β.
5. Prediction = x0*β
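Step 4 does not say how β is obtained. A common choice, assumed here rather than taken from this record, is a least-squares fit in which each sample is weighted by a Gaussian (radial) kernel centred on the point of interest x0, with bandwidth τ:

```latex
w_i = \exp\!\left(-\frac{(x_i - x_0)^2}{2\tau^2}\right), \qquad
W = \operatorname{diag}(w_1, \dots, w_n)

\beta = \left(X^{\top} W X\right)^{-1} X^{\top} W Y, \qquad
\hat{y}(x_0) = x_0\,\beta
```

Here X and x0 include a leading bias term of 1. A small τ makes the fit follow nearby points closely, while a large τ approaches ordinary linear regression.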
PROGRAM:
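import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

def local_regression(x0, X, Y, tau):
    # Assumed body (not given in the record): a kernel-weighted
    # least-squares fit that follows the algorithm steps above
    x0 = np.r_[1, x0]                # add the bias term to the point of interest
    X = np.c_[np.ones(len(X)), X]    # add the bias column to the samples
    weights = np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))
    xw = X.T * weights
    beta = np.linalg.pinv(xw @ X) @ xw @ Y   # model term parameter
    return x0 @ beta                 # prediction = x0 * beta

# n, X and Y are assumed to be defined by a data-generation step not shown here
# jitter X
X += np.random.normal(scale=.1, size=n)
print("Normalised (10 Samples) X :\n", X[1:10])
domain = np.linspace(-3, 3, num=300)
print(" Xo Domain Space(10 Samples) :\n", domain[1:10])

def plot_lwr(tau):
    # prediction through regression
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(plot_width=400, plot_height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot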
OUTPUT:
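Ex:No:11 MACHINE LEARNING METHOD TO CLASSIFY AN INCOMING MAIL
Date:
AIM:
To write a python program to develop a machine learning method to classify an incoming mail as
ham or spam.
ALGORITHM:
PROGRAM: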
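import os
import numpy as np
from collections import Counter
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix

# The loops below are restored from the names the listing uses
# (mail, line, files, word, d); the body of each mail is its third line.
def make_Dictionary(train_dir):
    emails = [os.path.join(train_dir, f) for f in os.listdir(train_dir)]
    all_words = []
    for mail in emails:
        with open(mail) as m:
            for i, line in enumerate(m):
                if i == 2:
                    words = line.split()
                    all_words += words
    dictionary = Counter(all_words)
    # Remove non-alphabetic tokens and single characters
    list_to_remove = list(dictionary.keys())
    for item in list_to_remove:
        if item.isalpha() == False:
            del dictionary[item]
        elif len(item) == 1:
            del dictionary[item]
    # Keep the 3000 most frequent words as (word, count) pairs
    dictionary = dictionary.most_common(3000)
    return dictionary

def extract_features(mail_dir):
    # Word-count matrix: one row per mail, one column per dictionary word
    files = [os.path.join(mail_dir, fi) for fi in os.listdir(mail_dir)]
    features_matrix = np.zeros((len(files), 3000))
    docID = 0
    for fil in files:
        with open(fil) as fi:
            for i, line in enumerate(fi):
                if i == 2:
                    words = line.split()
                    for word in words:
                        wordID = 0
                        for j, d in enumerate(dictionary):
                            if d[0] == word:
                                wordID = j
                                features_matrix[docID, wordID] = words.count(word)
        docID = docID + 1
    return features_matrix

train_dir = 'train-mails'
dictionary = make_Dictionary(train_dir)
# 702 training mails: label 0 for the first 351, label 1 for the remaining 351
train_labels = np.zeros(702)
train_labels[351:702] = 1
train_matrix = extract_features(train_dir)
model1 = MultinomialNB()
model2 = LinearSVC()
model1.fit(train_matrix, train_labels)
model2.fit(train_matrix, train_labels)
test_dir = 'test-mails'
test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1
result1 = model1.predict(test_matrix)
result2 = model2.predict(test_matrix)
print(confusion_matrix(test_labels, result1))
print(confusion_matrix(test_labels, result2))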
OUTPUT:
Multinomial Naive Bayes (model1):
       Ham   Spam
Ham    129   1
Spam   9     121
Linear SVC (model2):
       Ham   Spam
Ham    126   4
Spam   6     124
RESULT:
Thus, the machine learning method to classify an incoming mail was implemented
successfully.