
EXPERIMENT-2

OBJECTIVE: For a given set of training examples stored in a .csv file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the
training examples.

Candidate-Elimination Learning Algorithm


The Candidate-Elimination algorithm computes the version space containing all hypotheses from H that are
consistent with an observed sequence of training examples. It begins by initializing the version space to the
set of all hypotheses in H; that is, by initializing the G boundary set to contain the most general hypothesis in H,
G0 = {(?, ?, ?, ?, ?, ?)}
and initializing the S boundary set to contain the most specific hypothesis in H,
S0 = {(∅, ∅, ∅, ∅, ∅, ∅)}
For each training example, the S and G boundary sets are generalized and specialized, respectively, to
eliminate from the version space any hypothesis found inconsistent with that example. After all the
training examples have been processed, the computed version space contains exactly the hypotheses consistent
with them.
The algorithm is summarized below:
 Initialize G to the set of maximally general hypotheses in H
 Initialize S to the set of maximally specific hypotheses in H
 For each training example d, do:
   If d is a positive example:
     » Remove from G any hypothesis inconsistent with d
     » For each hypothesis s in S that is not consistent with d:
         Remove s from S
         Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
         Remove from S any hypothesis that is more general than another hypothesis in S
   If d is a negative example:
     » Remove from S any hypothesis inconsistent with d
     » For each hypothesis g in G that is not consistent with d:
         Remove g from G
         Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
         Remove from G any hypothesis that is less general than another hypothesis in G
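
The boundary updates above rely on two checks that are easy to state in code: whether a hypothesis covers an example, and whether one hypothesis is more general than another. A minimal sketch, assuming hypotheses are tuples of attribute values with '?' as the "any value" wildcard (the helper names are illustrative, not part of the program below):

def covers(hypothesis, example):
    # A hypothesis covers an example when every attribute is either the
    # wildcard '?' or equal to the example's value in that position.
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

def more_general_or_equal(h1, h2):
    # h1 is at least as general as h2 if every constraint in h1 is either
    # open ('?') or identical to the corresponding constraint in h2.
    return all(a == '?' or a == b for a, b in zip(h1, h2))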

PROGRAM:
import numpy as np
import pandas as pd
# Load the training examples: every column except the last holds an
# attribute value, the last column holds the target concept (Y/N).
data = pd.read_csv('trainingexamples.csv')
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    # S starts as the first training instance; G starts as one maximally
    # general hypothesis per attribute (all positions '?').
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Y":
            # Positive example: generalize S wherever it disagrees with
            # the example, and relax the matching constraint in G.
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "N":
            # Negative example: specialize G by pinning each disagreeing
            # attribute to the value S requires.
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("Steps of Candidate-Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)
    # Drop the fully unconstrained (all-'?') rows left in general_h.
    unconstrained = ['?'] * len(specific_h)
    indices = [i for i, val in enumerate(general_h) if val == unconstrained]
    for i in indices:
        general_h.remove(unconstrained)
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
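
The program expects the attribute columns first and the target concept (Y/N) in the last column. One plausible trainingexamples.csv, assuming Mitchell's four EnjoySport examples (the header names are illustrative):

sky,airtemp,humidity,wind,water,forecast,enjoysport
Sunny,Warm,Normal,Strong,Warm,Same,Y
Sunny,Warm,High,Strong,Warm,Same,Y
Rainy,Cold,High,Strong,Warm,Change,N
Sunny,Warm,High,Strong,Cool,Change,Y

With these four examples the final S boundary comes out as ['Sunny', 'Warm', '?', 'Strong', '?', '?'], and the final G boundary contains ['Sunny', '?', '?', '?', '?', '?'] and ['?', 'Warm', '?', '?', '?', '?'].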

OUTPUT:

EXPERIMENT-3

OBJECTIVE: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

The following terminology is used in this algorithm:


 Entropy: Entropy is a measure of impurity.
It is defined for a binary class with values a/b as:
Entropy = -p(a)*log2(p(a)) - p(b)*log2(p(b))

 Information Gain: measures the expected reduction in entropy caused by partitioning the examples on an attribute A:
Gain(S, A) = Entropy(S) - Sum over values v of A of (|Sv|/|S|) * Entropy(Sv)
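
For instance, a node holding 9 examples of class a and 5 of class b (the class split of the standard PlayTennis data used below) has entropy of about 0.940. A minimal check in Python:

import math

# Assumed 9/5 class split, as in the 14-instance PlayTennis data.
p_a, p_b = 9 / 14, 5 / 14
entropy = -(p_a * math.log2(p_a) + p_b * math.log2(p_b))
print(round(entropy, 3))  # 0.94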

THE PROCEDURE
1) In the ID3 algorithm, begin with the original set of examples S as the root node.
2) On each iteration, iterate through every unused attribute of the remaining set and calculate the entropy (or information gain) of that attribute.
3) Then select the attribute which has the smallest entropy (or largest information gain).
4) Split the data set by the selected attribute to produce subsets of the data.
5) Continue to recurse on each subset, considering only attributes never selected before.

Dataset Details
PlayTennis dataset, which has the following structure:
Total number of instances = 14
Attributes = Outlook, Temperature, Humidity, Wind, Answer
Target concept = Answer
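
The program below reads tennis.csv and discards the first column, so one plausible layout is the standard PlayTennis table with a leading day column (the header names are illustrative; the full file has 14 data rows):

day,outlook,temp,humidity,wind,play
D1,Sunny,Hot,High,Weak,No
D2,Sunny,Hot,High,Strong,No
D3,Overcast,Hot,High,Weak,Yes
D4,Rain,Mild,High,Weak,Yes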

ID3 (Learning Set S, Attribute Set A, Attribute Values V) Return Decision Tree
Begin
Load learning set S first; create the decision tree root node 'rootNode' and add learning set S into the root node as its subset.
For rootNode:
1) Calculate the entropy of every attribute using the dataset
2) Split the set into subsets using the attribute for which entropy is minimum (or information gain is maximum)
3) Make a decision tree node containing that attribute
4) Recurse on the subsets using the remaining attributes
End

This approach employs a top-down, greedy search through the space of possible decision trees.
 The algorithm starts by creating the root node for the tree
 If all the examples are positive, return a node with a positive label
 If all the examples are negative, return a node with a negative label
 If Attributes is empty, return the single-node tree Root with label = the most common value of Target_attribute in Examples
 Otherwise:
1. Calculate the entropy of every attribute using the data set S, using the formula
Entropy = -p(a)*log2(p(a)) - p(b)*log2(p(b))
2. Split the set S into subsets using the attribute for which the resulting entropy (after splitting) is minimum (or, equivalently, information gain is maximum), using the formula
Gain(S, A) = Entropy(S) - Sum for v from 1 to n of (|Sv|/|S|) * Entropy(Sv)
3. Make a decision tree node containing that attribute
4. Recurse on the subsets using the remaining attributes.
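
As a quick numeric check of both formulas (assuming the standard 14-instance PlayTennis data with 9 Yes and 5 No examples, and the usual 8/6 split on Wind):

import math

def entropy(p, n):
    # Entropy of a node with p positive and n negative examples.
    total = p + n
    return -sum(q * math.log2(q) for q in (p / total, n / total) if q > 0)

e_s = entropy(9, 5)                                       # ~0.940
gain_wind = e_s - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)
print(round(e_s, 3), round(gain_wind, 3))                 # 0.94 0.048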

PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
import copy

dataset = pd.read_csv('tennis.csv')
# Drop the first column (a day/index column); keep the four attributes
# and the Yes/No label in X.
X = dataset.iloc[:, 1:].values
# print(X)
attribute = ['outlook', 'temp', 'humidity', 'wind']

class Node(object):
    def __init__(self):
        self.value = None     # attribute name at an internal node, or 'Yes'/'No' at a leaf
        self.decision = None  # attribute value on the edge leading into this node
        self.childs = None

def findEntropy(data, rows):
    # Entropy of the class label over the given rows; ans is 1 when the
    # rows are all 'Yes', 0 when all 'No', and -1 when mixed.
    yes = 0
    no = 0
    ans = -1
    idx = len(data[0]) - 1  # the label is the last column
    entropy = 0
    for i in rows:
        if data[i][idx] == 'Yes':
            yes = yes + 1
        else:
            no = no + 1

    x = yes / (yes + no)
    y = no / (yes + no)
    if x != 0 and y != 0:
        entropy = -1 * (x * math.log2(x) + y * math.log2(y))
    if x == 1:
        ans = 1
    if y == 1:
        ans = 0
    return entropy, ans

def findMaxGain(data, rows, columns):
    # Pick the attribute (column index) with the maximum information gain
    # over the given rows; ans carries the class label for pure nodes.
    maxGain = 0
    retidx = -1
    entropy, ans = findEntropy(data, rows)
    if entropy == 0:
        # The rows are already pure; no attribute needs to be tested.
        return maxGain, retidx, ans

    for j in columns:
        # Count how often each value of attribute j occurs in these rows.
        mydict = {}
        for i in rows:
            key = data[i][j]
            if key not in mydict:
                mydict[key] = 1
            else:
                mydict[key] = mydict[key] + 1
        gain = entropy

        for key in mydict:
            yes = 0
            no = 0
            for k in rows:
                if data[k][j] == key:
                    if data[k][-1] == 'Yes':
                        yes = yes + 1
                    else:
                        no = no + 1
            x = yes / (yes + no)
            y = no / (yes + no)
            # Subtract the weighted entropy of this subset. (The original
            # divided by the constant 14, which is only correct at the
            # root; dividing by the current number of rows fixes that.)
            if x != 0 and y != 0:
                gain += (mydict[key] * (x * math.log2(x) + y * math.log2(y))) / len(rows)
        if gain > maxGain:
            maxGain = gain
            retidx = j

    return maxGain, retidx, ans

def buildTree(data, rows, columns):
    # Recursively grow the tree, splitting on the attribute with the
    # maximum gain. (The original passed the global X here; using the
    # `data` parameter keeps the function self-contained.)
    maxGain, idx, ans = findMaxGain(data, rows, columns)

    root = Node()
    root.childs = []
    if maxGain == 0:
        # Pure subset: make a leaf labelled with the class.
        if ans == 1:
            root.value = 'Yes'
        else:
            root.value = 'No'
        return root

    root.value = attribute[idx]
    mydict = {}
    for i in rows:
        key = data[i][idx]
        if key not in mydict:
            mydict[key] = 1
        else:
            mydict[key] += 1

    newcolumns = copy.deepcopy(columns)
    newcolumns.remove(idx)
    for key in mydict:
        # Rows where the chosen attribute takes this value form one branch.
        newrows = []
        for i in rows:
            if data[i][idx] == key:
                newrows.append(i)
        temp = buildTree(data, newrows, newcolumns)
        temp.decision = key
        root.childs.append(temp)
    return root

def traverse(root):
    # Preorder print: the incoming edge value, then the node's own value.
    print(root.decision)
    print(root.value)

    n = len(root.childs)
    if n > 0:
        for i in range(0, n):
            traverse(root.childs[i])

def calculate():
    rows = [i for i in range(0, 14)]    # the 14 training instances
    columns = [i for i in range(0, 4)]  # the 4 candidate attributes
    root = buildTree(X, rows, columns)
    root.decision = 'Start'
    traverse(root)

calculate()
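
If tennis.csv follows the PlayTennis layout sketched earlier, the attribute with the largest information gain at the root is outlook (gain ≈ 0.246), so the traversal should print 'Start' and 'outlook' first and then walk each outlook subtree, printing every node's incoming edge value followed by the node's own value, in preorder.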

OUTPUT:
