OBJECTIVE: For a given set of training examples stored in a .csv file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
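The program below reads its training examples from trainingexamples.csv. The file itself is not reproduced in this record; a minimal sketch, assuming the classic EnjoySport data with six attribute columns and a Y/N target in the last column (the format the code expects), would be:

Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,Y
Sunny,Warm,High,Strong,Warm,Same,Y
Rainy,Cold,High,Strong,Warm,Change,N
Sunny,Warm,High,Strong,Cool,Change,Y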
PROGRAM:
import numpy as np
import pandas as pd
data = pd.read_csv('trainingexamples.csv')

# Separate the attribute columns from the target column
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    # S starts as the first training example; G starts maximally general
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))]
                 for i in range(len(specific_h))]
    print(general_h)

    for i, h in enumerate(concepts):
        if target[i] == "Y":
            # Positive example: generalise S wherever it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "N":
            # Negative example: specialise G wherever S disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("Steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)

    # Drop rows of general_h that remained fully general (all '?')
    general_h = [g for g in general_h if g != ['?'] * len(specific_h)]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
OUTPUT:
EXPERIMENT-3
OBJECTIVE: Write a program to demonstrate the working of the decision-tree-based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
THE PROCEDURE
1) In the ID3 algorithm, begin with the original example set S as the root node.
2) On each iteration of the algorithm, iterate through every unused attribute of the remaining set and calculate the entropy (or information gain) of that attribute.
3) Then, select the attribute which has the smallest entropy (or largest information gain) value.
4) The set S is then split by the selected attribute to produce subsets of the data.
5) The algorithm continues to recurse on each subset, considering only attributes never selected before. (A small sketch of the entropy and gain computations follows this list.)
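Steps 2) and 3) reduce to two small computations. The helpers below are a minimal sketch of them (entropy and information_gain are illustrative names, not part of the lab program listed further down):

import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum over classes c of p(c) * log2(p(c))
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    # Gain(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

For example, entropy(['Yes'] * 9 + ['No'] * 5) evaluates to about 0.940, the class entropy of the full PlayTennis data.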
Dataset Details
PlayTennis dataset, which has the following structure:
Total number of instances = 14
Attributes = Outlook, Temperature, Humidity, Wind, Answer
Target concept = Answer
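The program below reads tennis.csv and drops its first column (dataset.iloc[:, 1:]), so the file is assumed to begin with a day/index column. A minimal sketch of the expected layout (the column headings and the day column are assumptions, not taken from the original file):

day,outlook,temp,humidity,wind,answer
D1,Sunny,Hot,High,Weak,No
D2,Sunny,Hot,High,Strong,No
D3,Overcast,Hot,High,Weak,Yes
...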
ID3(Learning Set S, Attribute Set A, Attribute Values V) returns a Decision Tree
Begin
Load the learning set S first, create the decision tree root node 'rootNode', and add the learning set S into the root node as its subset.
For rootNode:
1) Calculate the entropy of every attribute using the dataset.
2) Split the set into subsets using the attribute for which entropy is minimum (or information gain is maximum).
This approach employs a top-down, greedy search through the space of possible decision trees.
The algorithm starts by creating the root node for the tree.
If all the examples are positive, then return the node with a positive label.
If all the examples are negative, then return the node with a negative label.
If Attributes is empty, return the single-node tree Root, with label = the most common value of the target attribute in the examples.
Otherwise:
1. Calculate the entropy of every attribute over the data set S using the formula
   Entropy(S) = -p(a)·log2(p(a)) - p(b)·log2(p(b))
   where p(a) and p(b) are the proportions of positive and negative examples in S.
2. Split the set S into subsets using the attribute for which the resulting entropy (after splitting) is minimum (or, equivalently, information gain is maximum), using the formula
   Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) · Entropy(Sv)
   where Sv is the subset of S for which attribute A has value v.
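As a quick check of both formulas on the 14-example PlayTennis data (assuming the standard 9 Yes / 5 No class split):
Entropy(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940
Gain(S, Outlook) = 0.940 - (5/14)·0.971 - (4/14)·0 - (5/14)·0.971 ≈ 0.246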
PROGRAM:
import numpy as np
import pandas as pd
import math
import copy

dataset = pd.read_csv('tennis.csv')
X = dataset.iloc[:, 1:].values   # drop the leading day/index column
attribute = ['outlook', 'temp', 'humidity', 'wind']

class Node(object):
    def __init__(self):
        self.value = None      # attribute name, or 'Yes'/'No' at a leaf
        self.decision = None   # attribute value on the edge from the parent
        self.childs = None

def findEntropy(data, rows):
    # Entropy of the class labels over the given rows; ans flags pure nodes
    yes = no = 0
    ans = -1
    idx = len(data[0]) - 1     # index of the target column
    entropy = 0
    for i in rows:
        if data[i][idx] == 'Yes':
            yes += 1
        else:
            no += 1
    x = yes / (yes + no)
    y = no / (yes + no)
    if x != 0 and y != 0:
        entropy = -1 * (x * math.log2(x) + y * math.log2(y))
    if x == 1:
        ans = 1                # all examples positive
    if y == 1:
        ans = 0                # all examples negative
    return entropy, ans

def findMaxGain(data, rows, columns):
    # Return the best information gain, the index of the attribute that
    # achieves it, and the pure-node flag from findEntropy
    maxGain = 0
    retidx = -1
    entropy, ans = findEntropy(data, rows)
    if entropy == 0:           # node is already pure; nothing to split
        return maxGain, retidx, ans
    for j in columns:
        mydict = {}
        for i in rows:
            key = data[i][j]
            mydict[key] = mydict.get(key, 0) + 1
        gain = entropy
        for key in mydict:
            yes = no = 0
            for k in rows:
                if data[k][j] == key:
                    if data[k][-1] == 'Yes':
                        yes = yes + 1
                    else:
                        no = no + 1
            x = yes / (yes + no)
            y = no / (yes + no)
            # subtract the subset entropy, weighted by the fraction of
            # this node's rows that take this attribute value
            if x != 0 and y != 0:
                gain += (mydict[key] * (x * math.log2(x) + y * math.log2(y))) / len(rows)
        if gain > maxGain:
            maxGain = gain
            retidx = j
    return maxGain, retidx, ans

def buildTree(data, rows, columns):
    maxGain, idx, ans = findMaxGain(data, rows, columns)
    root = Node()
    root.childs = []
    if maxGain == 0:           # pure node: make a leaf
        root.value = 'Yes' if ans == 1 else 'No'
        return root
    root.value = attribute[idx]
    mydict = {}
    for i in rows:
        key = data[i][idx]
        mydict[key] = mydict.get(key, 0) + 1
    newcolumns = copy.deepcopy(columns)
    newcolumns.remove(idx)     # never reuse an attribute along a path
    for key in mydict:
        newrows = [i for i in rows if data[i][idx] == key]
        temp = buildTree(data, newrows, newcolumns)
        temp.decision = key
        root.childs.append(temp)
    return root

def traverse(root):
    # Preorder print of the tree: edge label, then node value
    print(root.decision)
    print(root.value)
    for child in root.childs:
        traverse(child)

def calculate():
    rows = [i for i in range(0, 14)]
    columns = [i for i in range(0, 4)]
    root = buildTree(X, rows, columns)
    root.decision = 'Start'
    traverse(root)

calculate()
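The objective also asks that the learned tree be used to classify a new sample, which the listing above does not do. A minimal sketch of that step (classify and the sample values are illustrative additions, not part of the original program):

def classify(root, sample):
    # Leaves store 'Yes'/'No' in value; internal nodes store an attribute
    # name, and each child's decision holds one value of that attribute
    if not root.childs:
        return root.value
    for child in root.childs:
        if child.decision == sample[root.value]:
            return classify(child, sample)
    return None   # attribute value never seen during training

tree = buildTree(X, [i for i in range(0, 14)], [i for i in range(0, 4)])
print(classify(tree, {'outlook': 'Sunny', 'temp': 'Cool',
                      'humidity': 'Normal', 'wind': 'Weak'}))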
OUTPUT: