
MACHINE LEARNING LABORATORY RECORD

DEPARTMENT OF INFORMATION TECHNOLOGY

SRI ESHWAR COLLEGE OF ENGINEERING


KINATHUKADAVU
COIMBATORE – 641 202
SRI ESHWAR COLLEGE OF ENGINEERING
KINATHUKADAVU, COIMBATORE-641202
(An Autonomous Institution Affiliated to Anna University, Chennai)

DEPARTMENT OF INFORMATION TECHNOLOGY


BONAFIDE CERTIFICATE

Certified that this is the bonafide record of work done by Mr. / Ms..……………………………………….

Register No: ……….......……………… of 3rd Year B.E……………………………………………….… in

the U19CS312–Machine Learning Laboratory during the 6th Semester of the academic year 2023 –2024
(Even Semester).

Signature of Staff In-charge Head of the Department

Submitted for the practical examinations of Anna University, held on…………….

Internal Examiner External Examiner


Contents

Exp. No   Date   List of Experiments                                                       Page No   Marks (50)   Faculty Signature

1    FIND-S ALGORITHM IMPLEMENTATION
2    CANDIDATE ELIMINATION ALGORITHM IMPLEMENTATION
3    DECISION TREE BASED ID3 ALGORITHM IMPLEMENTATION
4    ARTIFICIAL NEURAL NETWORK IMPLEMENTATION
5    NAIVE BAYESIAN CLASSIFIER ALGORITHM IMPLEMENTATION
6    NAIVE BAYESIAN CLASSIFIER ALGORITHM IMPLEMENTATION (BUILT-IN LIBRARIES)
7    BAYESIAN NETWORK IMPLEMENTATION FOR MEDICAL DATA
8    EM ALGORITHM IMPLEMENTATION
9    K-NEAREST NEIGHBOUR ALGORITHM IMPLEMENTATION
10   NON-PARAMETRIC LOCALLY WEIGHTED REGRESSION ALGORITHM IMPLEMENTATION

CONTENT BEYOND SYLLABUS

11   DEVELOP A MACHINE LEARNING METHOD TO CLASSIFY AN INCOMING MAIL
Average Marks:

Average (in words):


Signature of the Faculty
Ex:No:1 FIND-S ALGORITHM IMPLEMENTATION

Date:

AIM:
To write a Python program to implement the Find-S algorithm to find the most specific hypothesis based
on a given set of training data samples.

ALGORITHM:
1. Start with the most specific hypothesis.
h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
2. Take the next example; if it is negative, no changes occur to the hypothesis.
3. If the example is positive and the current hypothesis is found to be too specific, update the current
hypothesis to a more general condition that covers the example.
4. Keep repeating the above steps until all the training examples have been processed.
5. After all the training examples have been processed, we will have the final hypothesis, which can be
used to classify new examples.
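For reference, the program below expects PlayTennis_1.csv to contain a header row followed by the training
instances, with the target in the last column. A sample layout is sketched here: the header names are assumed
(the classic enjoy-sport attributes), the data rows follow the output shown below, and the target is written as
Yes/No to match the check in the program (if the file stores True/False instead, adjust the comparison
accordingly).

Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes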

PROGRAM:
import csv
from google.colab import files

uploaded = files.upload()

num_attributes = 6  # six attribute columns; the last column holds the target label
a = []
print("\n The Given Training Data Set \n")
with open('PlayTennis_1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# initialise the hypothesis with the first training instance
for j in range(0, num_attributes):
    hypothesis[j] = a[1][j]
print(hypothesis)

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(1, len(a)):
    if a[i][num_attributes] == 'Yes':  # positive example (label string as stored in the CSV)
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    else:
        pass
    print(" For Training instance No:{0} the hypothesis is".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)

OUTPUT:
'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same',True
'Sunny', 'Warm', 'High', 'Strong', 'Warm','Same',True
'Rainy', 'Cold', 'High', 'Strong', 'Warm','Change',False
'Sunny', 'Warm', 'High', 'Strong', 'Cool','Change',True
Maximally Specific set
[['Sunny', 'Warm', '?', 'Strong', '?', '?']]

RESULT:
Thus the Find-S Algorithm Implementation was completed successfully
Ex:No:2 CANDIDATE ELIMINATION ALGORITHM IMPLEMENTATION

Date:

AIM:
To write a program to implement the Candidate-Elimination algorithm and output a description of the set of
all hypotheses consistent with the training examples.

ALGORITHM:
1. Load the data set.
2. Initialize the General hypothesis (G) and the Specific hypothesis (S).
3. For each training example:
4. If the example is positive:
       if attribute_value == hypothesis_value:
           do nothing
       else:
           replace the attribute value in S with '?' (basically generalizing it)
5. If the example is negative:
       make the general hypothesis more specific.
PROGRAM:

from google.colab import files
import csv

uploaded = files.upload()

with open("training_Data_Exp_2.csv") as f:
    csv_file = csv.reader(f)
    data = list(csv_file)

s = data[1][:-1]
g = [['?' for i in range(len(s))] for j in range(len(s))]
print("Specific Hypothesis:")
print(s)
print("General Hypothesis:")
print(g)

for i in data:
    if i[-1] == 'Yes':
        for j in range(len(s)):
            if i[j] != s[j]:
                s[j] = '?'
                g[j][j] = '?'
    elif i[-1] == 'No':
        for j in range(len(s)):
            if i[j] != s[j]:
                g[j][j] = s[j]
            else:
                g[j][j] = '?'
    print("\nSteps of Candidate Elimination Algorithm", data.index(i) + 1)
    print("S=", s)
    print("G=", g)

gh = []
for i in g:
    for j in i:
        if j != '?':
            gh.append(i)
            break

print("\nFinal specific hypothesis:\n", s)
print("\nFinal general hypothesis:\n", gh)

OUTPUT:
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('?', '?', '?', '?', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]

RESULT:
Thus the Candidate Elimination Algorithm Implementation was completed successfully
Ex:No:3 DECISION TREE BASED ID3 ALGORITHM IMPLEMENTATION

Date:

AIM:

To write a Python program to implement the decision tree based ID3 algorithm to classify a new sample
using the given data set.

ALGORITHM:
1. It begins with the original set S as the root node.
2. On each iteration of the algorithm, it iterates through every unused attribute of the set S and calculates
the Entropy (H) and Information Gain (IG) of this attribute.
3. It then selects the attribute which has the smallest entropy or, equivalently, the largest information gain.
4. The set S is then split by the selected attribute to produce subsets of the data.
5. The algorithm continues to recur on each subset, considering only attributes never selected before.
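For clarity, here is a small sketch (not part of the recorded program) of the entropy and information-gain
computation that ID3 performs; the program below delegates this work to
DecisionTreeClassifier(criterion='entropy'). The column names 'outlook' and 'play' are assumed from the
PlayTennis data set.

import numpy as np
import pandas as pd

def entropy(labels):
    # H(S) = -sum(p * log2(p)) over the class proportions
    probs = labels.value_counts(normalize=True)
    return -(probs * np.log2(probs)).sum()

def information_gain(df, attribute, target='play'):
    # IG(S, A) = H(S) - sum(|Sv|/|S| * H(Sv)) over the values v of attribute A
    total = entropy(df[target])
    weighted = sum(len(sub) / len(df) * entropy(sub[target])
                   for _, sub in df.groupby(attribute))
    return total - weighted

# hypothetical usage once data_PlayTennis has been loaded:
# print(information_gain(data_PlayTennis, 'outlook'))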
PROGRAM:
# Load the dataset (the file name 'PlayTennis.csv' is assumed here)
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data_PlayTennis = pd.read_csv('PlayTennis.csv')

# Create Object from LabelEncoder and encode every categorical column to numeric codes
label_En = LabelEncoder()
data_PlayTennis = data_PlayTennis.apply(label_En.fit_transform)

X = data_PlayTennis.drop(['play'], axis=1)
Y = data_PlayTennis['play']

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, stratify=Y, random_state=101)

print(len(X))
print(len(Y))
print(len(x_train))
print(len(y_train))
print(len(x_test))
print(len(y_test))

from sklearn.tree import DecisionTreeClassifier
# Create Object from Decision Tree Classifier
D_T_C_Model = DecisionTreeClassifier(criterion='entropy', random_state=10)
D_T_C_Model.fit(x_train, y_train)
print(D_T_C_Model.score(x_train, y_train))
print(D_T_C_Model.score(x_test, y_test))

y_pred = D_T_C_Model.predict(x_test)
print(y_pred)
print(D_T_C_Model.predict_proba(x_test))

# Visualize the tree with Graphviz
from sklearn import tree
import graphviz
graph_data = tree.export_graphviz(D_T_C_Model, out_file=None)
graph = graphviz.Source(graph_data)
graph

# Visualize the tree using tree.plot_tree
tree.plot_tree(D_T_C_Model)

# Check accuracy_score(y_test, y_pred)
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

OUTPUT:

outlook
    overcast  -> b'yes'
    rain
        wind
            b'strong' -> b'no'
            b'weak'   -> b'yes'
    sunny
        humidity
            b'high'   -> b'no'
            b'normal' -> b'yes'

RESULT:

Thus the Decision tree based ID3 Algorithm Implementation was completed successfully
Ex:No:4 ARTIFICIAL NEURAL NETWORK IMPLEMENTATION

Date:

AIM:
To write a Python program to implement an Artificial Neural Network using the Back Propagation
algorithm and test it using appropriate data sets.

ALGORITHM:
1. Get the weighted sum of inputs of a particular unit using the h(x) function defined earlier.
2. Plug the value from step 1 into the activation function (f(a) = a in this example) and use the
activation value (i.e. the output of the activation function) as the input feature for the connected
nodes in the next layer.
3. Feeding forward uses the following function:
f(a) = a
4. Feeding backward then happens through the partial derivatives of that function. There is no need to
go through the working of arriving at these derivatives; all we need to know is that they satisfy:
f'(a) = 1
J'(w) = Z . delta
5. Update the weights.
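As an illustration of steps 3-5 only (a sketch with made-up numbers, not the Keras program below), one
gradient-descent update for a single linear unit with f(a) = a and f'(a) = 1 looks like this:

import numpy as np

Z = np.array([[0.6, 1.0],
              [0.3, 0.5]])          # inputs, one row per sample
y = np.array([[0.9], [0.8]])        # target outputs
w = np.array([[0.1], [0.2]])        # current weights
lr = 0.1                            # learning rate

a = Z @ w                           # weighted sum, activation f(a) = a
delta = a - y                       # output error term (f'(a) = 1)
grad = Z.T @ delta                  # J'(w) = Z . delta
w = w - lr * grad                   # step 5: update the weights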

PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from google.colab import files

uploaded = files.upload()

dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13]
y = dataset.iloc[:, 13]

# One-hot encode the categorical columns
geography = pd.get_dummies(X["Geography"], drop_first=True)
gender = pd.get_dummies(X['Gender'], drop_first=True)

X = pd.concat([X, geography, gender], axis=1)
X = X.drop(['Geography', 'Gender'], axis=1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LeakyReLU, PReLU, ELU
from keras.layers import Dropout

# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(6, kernel_initializer='he_uniform', activation='relu', input_dim=11))
# Adding the second hidden layer
classifier.add(Dense(6, kernel_initializer='he_uniform', activation='relu'))
# Adding the output layer
classifier.add(Dense(1, kernel_initializer='glorot_uniform', activation='sigmoid'))

# Compiling the ANN
classifier.compile(optimizer='Adamax', loss='binary_crossentropy', metrics=['accuracy'])

# Fitting the ANN to the Training set (validation_split is needed for the validation curve plotted below)
model_history = classifier.fit(X_train, y_train, validation_split=0.33, batch_size=10, epochs=100)

# list all data in history
# print(model_history.history.keys())

# summarize history for accuracy (older Keras versions use the keys 'acc' / 'val_acc')
plt.plot(model_history.history['accuracy'])
plt.plot(model_history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
OUTPUT:
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1.         0.66666667]]

Actual Output: [[ 0.92]


[ 0.86]
[ 0.89]]

Predicted Output: [[ 0.89559591]


[ 0.88142069]
[ 0.8928407 ]]

RESULT:
Thus the Artificial Neural Network Algorithm Implementation using back propagation was
completed successfully
Ex:No:5 NAIVE BAYESIAN CLASSIFIER ALGORITHM IMPLEMENTATION

Date:

AIM:
To write a python program to implement the naive Bayesian classifier for a sample training data set.

ALGORITHM:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Fitting Naive Bayes to the Training set
4. Predicting the test result
5. Test accuracy of the result (Creation of Confusion matrix)
6. Visualizing the test set result.
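Because loadCsv in the program below converts every attribute value to a float, the frequency and likelihood
tables of steps 1-2 are replaced by a per-class mean (μ) and standard deviation (σ) for each attribute, and the
class-conditional likelihood is evaluated with the normal density (this is what calculateProbability computes):

P(x | class) = (1 / (sqrt(2·π) · σ)) · exp( -(x - μ)² / (2·σ²) )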

PROGRAM:

import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate indices for the dataset list randomly to pick elements for the training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    # creates a dictionary of classes 1 and 0 where the values are the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of (mean, std) tuples for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        # class and attribute information as mean and sd
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]  # mean and sd of every attribute for class 0 and 1 separately
            x = inputVector[i]               # test vector's i-th attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)  # use the normal distribution
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        # assigns the class which has the highest probability
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
OUTPUT:

Confusion matrix is as follows

[[17 0 0]
[ 0 17 0]
[ 0 0 11]]

Accuracy metrics
              precision    recall  f1-score   support
0                  1.00      1.00      1.00        17
1                  1.00      1.00      1.00        17
2                  1.00      1.00      1.00        11

avg / total        1.00      1.00      1.00        45

RESULT:
Thus the naive Bayesian classifier Algorithm Implementation was completed successfully.
Ex:No:6 NAIVE BAYESIAN CLASSIFIER ALGORITHM IMPLEMENTATION
(BUILT-IN LIBRARIES)
Date:
AIM:
To write a python program to implement the naive Bayesian classifier using Built-In
Libraries for a sample training data set.

ALGORITHM:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Fitting Naive Bayes to the Training set
4. Predicting the test result
5. Test accuracy of the result (Creation of Confusion matrix)
6. Visualizing the test set result.

PROGRAM:

import pandas as pd
msg=pd.read_csv('naivetext1.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum
print(X)
print(y)

#splitting the dataset into train and test data


from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)
#output of count vectoriser is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
print(count_vect.get_feature_names())
df=pd.DataFrame(xtrain_dtm.toarray(),columns=count_vect.get_feature_names())
print(df)#tabular representation
print(xtrain_dtm) #sparse matrix representation
# Training Naive Bayes (NB) classifier on training data.
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)
#printing accuracy metrics
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifer is',metrics.accuracy_score(ytest,predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('Recall and Precison ')
print(metrics.recall_score(ytest,predicted))
print(metrics.precision_score(ytest,predicted))
'''docs_new = ['I like this place', 'My boss is not my saviour']

X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predictednew):
print('%s->%s' % (doc, msg.labelnum[category]))'''

Sample contents of naivetext1.csv (message,label):

I love this sandwich,pos


This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg

OUTPUT
['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal',
'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my',
'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to',
'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']
about am amazing an and awesome beers best boss can ... today \
0 1 0 0 0 0 0 1 0 0 0 ... 0
1 0 0 0 0 0 0 0 1 0 0 ... 0
2 0 0 1 1 0 0 0 0 0 0 ... 0
3 0 0 0 0 0 0 0 0 0 0 ... 1
4 0 0 0 0 0 0 0 0 0 0 ... 0
5 0 1 0 0 1 0 0 0 0 0 ... 0
6 0 0 0 0 0 0 0 0 0 1 ... 0
7 0 0 0 0 0 0 0 0 0 0 ... 0
8 0 1 0 0 0 0 0 0 0 0 ... 0
9 0 0 0 1 0 1 0 0 0 0 ... 0
10 0 0 0 0 0 0 0 0 0 0 ... 0
11 0 0 0 0 0 0 0 0 1 0 ... 0
12 0 0 0 1 0 1 0 0 0 0 ... 0

   tomorrow  very  view  we  went  what  will  with  work
0         0     1     0   0     0     0     0     0     0
1         0     0     0   0     0     0     0     0     1
2         0     0     0   0     0     0     0     0     0
3         0     0     0   0     1     0     0     0     0
4         0     0     0   0     0     0     0     0     0
5         0     0     0   0     0     0     0     0     0
6         0     0     0   0     0     0     0     1     0
7         1     0     0   1     0     0     1     0     0
8         0     0     0   0     0     0     0     0     0

RESULT:
Thus the naive Bayesian classifier Algorithm Implementation using built-in
libraries was completed successfully.
Ex:No:7 BAYESIAN NETWORK IMPLEMENTATION FOR MEDICAL DATA

Date:

AIM:
To write a python program to implement the Bayesian Network for medical data.

ALGORITHM:
1. First, identify the main variables in the problem to be solved. Each variable corresponds to a node
of the network. It is important to choose the number of states for each variable; for instance, there
are usually two states (true or false).
2. Second, define the structure of the network, that is, the causal relationships between all the
variables (nodes).
3. Third, define the probability rules governing the relationships between the variables.
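A minimal sketch of steps 1-3 on a toy two-node network (the variable names, states and probabilities here
are illustrative only; the program below instead learns the probabilities of the heart-disease network from
heart.csv):

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# steps 1-2: two binary variables and one causal edge Disease -> Test
toy = BayesianModel([('Disease', 'Test')])

# step 3: probability rules as conditional probability tables
cpd_disease = TabularCPD('Disease', 2, [[0.99], [0.01]])               # P(Disease)
cpd_test = TabularCPD('Test', 2, [[0.95, 0.20], [0.05, 0.80]],
                      evidence=['Disease'], evidence_card=[2])         # P(Test | Disease)
toy.add_cpds(cpd_disease, cpd_test)
print(toy.check_model())                                               # True if the CPDs are consistent

print(VariableElimination(toy).query(variables=['Disease'], evidence={'Test': 1}))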

PROGRAM:
import numpy as np
import pandas as pd
from pgmpy.inference import VariableElimination
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator

names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca',
         'thal', 'heartdisease']
heartDisease = pd.read_csv('heart.csv', names=names)
heartDisease = heartDisease.replace('?', np.nan)

# Define the network structure as (parent, child) edges
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'), ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'), ('heartdisease', 'restecg'),
                       ('heartdisease', 'thalach'), ('heartdisease', 'chol')])

# Learn the conditional probability tables from the data
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Query the network
HeartDisease_infer = VariableElimination(model)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 37, 'sex': 0})
print(q)  # on older pgmpy versions: print(q['heartdisease'])
OUTPUT:

RESULT:
Thus, the Bayesian network for medical data Implementation was completed
successfully.
Ex:No:8 EM ALGORITHM IMPLEMENTATION

Date:

AIM:
To write a python program to implement EM Algorithm.

ALGORITHM:
1. Given a set of incomplete data, consider a set of starting parameters.
2. Expectation step (E – step): Using the observed available data of the dataset, estimate
(guess) the values of the missing data.
3. Maximization step (M – step): Complete data generated after the expectation (E) step is
used in order to update the parameters.
4. Repeat step 2 and step 3 until convergence.
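Before running the library version, here is a compact sketch of the E-step and M-step on toy one-dimensional
data with two Gaussian components (the data values and starting parameters are made up; GaussianMixture in
the program below performs the same iteration internally):

import numpy as np

x = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.7])   # toy observations
mu = np.array([0.0, 6.0])                      # initial means
var = np.array([1.0, 1.0])                     # initial variances
pi = np.array([0.5, 0.5])                      # initial mixing weights

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(20):
    # E-step: responsibility of each component for each point
    r = np.vstack([pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]).T
    r = r / r.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the responsibility-weighted data
    nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / len(x)

print("means:", mu, "variances:", var, "weights:", pi)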

PROGRAM:

import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import pandas as pd
X=pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()

#code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n",gmm.means_)
print('\n')
print("Covariances\n",gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:,0], X[:,1],c=em_predictions,s=50)
plt.show()

#code for Kmeans


import matplotlib.pyplot as plt1
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt.title('KMEANS')
plt1.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt1.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], color='black')
plt1.show()
OUTPUT:
RESULT:
Thus the EM Algorithm Implementation was completed successfully.
Ex:No:9 K-NEAREST NEIGHBOUR ALGORITHM IMPLEMENTATION

Date:

AIM:
To write a python program to implement k-Nearest Neighbour Algorithm.

ALGORITHM:
1. Select the number K of the neighbors
2. Calculate the Euclidean distance of K number of neighbors
3. Take the K nearest neighbors as per the calculated Euclidean distance.
4. Among these k neighbors, count the number of the data points in each category.
5. Assign the new data point to the category for which the number of neighbours is the
maximum.
6. Our model is ready.
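The following bare-bones sketch (illustrative only; the program below uses scikit-learn's KNeighborsClassifier)
shows what steps 2-5 amount to for a single query point:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))    # step 2: Euclidean distances
    nearest = np.argsort(distances)[:k]                          # step 3: indices of the K nearest points
    votes = Counter(y_train[i] for i in nearest)                 # step 4: count labels among the neighbours
    return votes.most_common(1)[0][0]                            # step 5: majority label

# hypothetical usage with numeric feature arrays:
# label = knn_predict(np.array(X_train), np.array(y_train), np.array([5.1, 3.5, 1.4, 0.2]), k=8)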

PROGRAM:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd

dataset = pd.read_csv("iris.csv")
# Split the frame into features and target (the last column of iris.csv is assumed to be the species label)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)
classifier = KNeighborsClassifier(n_neighbors=8, p=3, metric='euclidean')
classifier.fit(X_train, y_train)

# predict the test results
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print('Confusion matrix is as follows\n', cm)
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
print(" correct prediction", accuracy_score(y_test, y_pred))
print(" wrong prediction", (1 - accuracy_score(y_test, y_pred)))
OUTPUT:

Confusion matrix is as follows


[[13 0 0]
[ 0 15 1]
[ 0 0 9]]

Accuracy Metrics
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 13
Iris-versicolor 1.00 0.94 0.97 16
Iris-virginica 0.90 1.00 0.95 9
avg / total 0.98 0.97 0.97 38
correct prediction 0.973
wrong prediction 0.027

RESULT:
Thus, the k-Nearest Neighbour Algorithm Implementation was completed
successfully.
Ex:No:10 NON-PARAMETRIC LOCALLY WEIGHTED REGRESSION
ALGORITHM IMPLEMENTATION

Date:

AIM:
To write a python program to implement non-parametric Locally Weighted Regression
Algorithm.

ALGORITHM:
1. Read the Given data Sample to X and the curve (linear or non linear) to Y
2. Set the value for Smoothening parameter or Free parameter say τ
3. Set the point of interest x0, which is a subset of X
4. Determine the value of model term parameter β
5. Prediction = x0*β
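In closed form, steps 2, 4 and 5 amount to the following (this matches what the program below computes with
np.linalg.pinv; W is the diagonal matrix of the weights w_i):

w_i = exp( -(x_i - x0)² / (2·τ²) )        weight of training point x_i around the point of interest x0
β   = (X^T W X)^-1 X^T W Y                weighted normal equations
Prediction at x0 = x0 · β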

PROGRAM:

import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

def local_regression(x0, X, Y, tau):
    # add bias term
    x0 = np.r_[1, x0]                       # add one to avoid the loss in information
    X = np.c_[np.ones(len(X)), X]
    # fit model: normal equations with kernel
    xw = X.T * radial_kernel(x0, X, tau)    # X transpose * W
    beta = np.linalg.pinv(xw @ X) @ xw @ Y  # @ is matrix multiplication (dot product)
    # predict value
    return x0 @ beta                        # prediction at x0

def radial_kernel(x0, X, tau):
    # weight (radial kernel) bias function
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set ( 10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[1:10])

# jitter X
X += np.random.normal(scale=.1, size=n)
print("Normalised (10 Samples) X :\n", X[1:10])

domain = np.linspace(-3, 3, num=300)
print(" Xo Domain Space (10 Samples) :\n", domain[1:10])

def plot_lwr(tau):
    # prediction through regression
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(plot_width=400, plot_height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

# Plotting the curves with different tau
show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]]))

OUTPUT:

The Data Set ( 10 Samples) X :


[-2.99399399 -2.98798799 -2.98198198 -2.97597598 -2.96996997 -2.96396396
-2.95795796 -2.95195195 -2.94594595]
The Fitting Curve Data Set (10 Samples) Y :
[2.13582188 2.13156806 2.12730467 2.12303166 2.11874898 2.11445659
2.11015444 2.10584249 2.10152068]
Normalised (10 Samples) X :
[-3.10518137 -3.00247603 -2.9388515 -2.79373602 -2.84946247 -2.85313888
-2.9622708 -3.09679502 -2.69778859]
Xo Domain Space(10 Samples) :
[-2.97993311 -2.95986622 -2.93979933 -2.91973244 -2.89966555 -2.87959866
-2.85953177 -2.83946488 -2.81939799]
RESULT:
Thus the non-parametric Locally Weighted Regression Algorithm Implementation was
completed successfully.
CONTENT BEYOND SYLLABUS

Ex:No:11 DEVELOP A MACHINE LEARNING METHOD TO CLASSIFY AN


INCOMING MAIL.

Date:

AIM:

To write a Python program to develop a machine learning method to classify an incoming mail.

ALGORITHM:

1. Preparing the text data.
2. Creating the word dictionary.
3. Feature extraction process.
4. Training the classifier.

PROGRAM:

import os
import numpy as np
from collections import Counter
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
from sklearn.svm import SVC, NuSVC, LinearSVC
from sklearn.metrics import confusion_matrix

def make_Dictionary(train_dir):
    emails = [os.path.join(train_dir, f) for f in os.listdir(train_dir)]
    all_words = []
    for mail in emails:
        with open(mail) as m:
            for i, line in enumerate(m):
                if i == 2:  # body of the email is only the 3rd line of the text file
                    words = line.split()
                    all_words += words
    dictionary = Counter(all_words)
    # non-word removal: drop punctuation tokens and single characters
    list_to_remove = list(dictionary.keys())
    for item in list_to_remove:
        if item.isalpha() == False:
            del dictionary[item]
        elif len(item) == 1:
            del dictionary[item]
    dictionary = dictionary.most_common(3000)
    return dictionary

def extract_features(mail_dir):
    files = [os.path.join(mail_dir, fi) for fi in os.listdir(mail_dir)]
    features_matrix = np.zeros((len(files), 3000))
    docID = 0
    for fil in files:
        with open(fil) as fi:
            for i, line in enumerate(fi):
                if i == 2:
                    words = line.split()
                    for word in words:
                        wordID = 0
                        for i, d in enumerate(dictionary):
                            if d[0] == word:
                                wordID = i
                        features_matrix[docID, wordID] = words.count(word)
        docID = docID + 1
    return features_matrix

# Create a dictionary of words with its frequency
train_dir = 'train-mails'
dictionary = make_Dictionary(train_dir)

# Prepare feature vectors per training mail and its labels
train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)

# Training SVM and Naive Bayes classifiers
model1 = MultinomialNB()
model2 = LinearSVC()
model1.fit(train_matrix, train_labels)
model2.fit(train_matrix, train_labels)

# Test the unseen mails for Spam
test_dir = 'test-mails'
test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1
result1 = model1.predict(test_matrix)
result2 = model2.predict(test_matrix)

print(confusion_matrix(test_labels, result1))
print(confusion_matrix(test_labels, result2))
OUTPUT:

Multinomial NB    Ham    Spam
Ham               129       1
Spam                9     121

SVM (Linear)      Ham    Spam
Ham               126       4
Spam                6     124

Multinomial NB    Ham    Spam
Ham              6445     225
Spam              137    6680

SVM (Linear)      Ham    Spam
Ham              6490     180
Spam              109    6708

RESULT:
Thus the machine learning method to classify an incoming mail was implemented successfully.
