Lecture No. 2: AU-KBC Research Centre, MIT Campus, Anna University
Ravi Gupta
AU-KBC Research Centre,
MIT Campus, Anna University
Date: 8.3.2008
Today's Agenda
Step 2: FIND-S
Initialization: h0 = <∅, ∅, ∅, ∅, ∅, ∅>   (attributes a1, a2, a3, a4, a5, a6)
Iteration 1
Iteration 2: x2 = <Sunny, Warm, High, Strong, Warm, Same>
Iteration 4: x4 = <Sunny, Warm, High, Strong, Cool, Change>
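To make the trace concrete, here is a minimal Python sketch of FIND-S run on the EnjoySport data above; x1 and x3 do not appear on the slide and are assumed from the standard textbook version of this example. Iteration 3 is absent above because x3 is negative and FIND-S ignores negatives.

```python
def find_s(examples):
    """FIND-S: start from the most specific hypothesis h0 and minimally
    generalize it on each positive example; negative examples are ignored."""
    h = None                                 # None stands for <0,0,0,0,0,0>
    for x, positive in examples:
        if not positive:
            continue                         # FIND-S never looks at negatives
        if h is None:
            h = list(x)                      # first positive example: copy it
        else:
            h = [hv if hv == xv else "?"     # generalize each mismatch to '?'
                 for hv, xv in zip(h, x)]
    return h

# EnjoySport training set; x1 and x3 are assumed from the textbook example.
D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),   # x1
     (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),   # x2
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),  # x3
     (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True)]   # x4

print(find_s(D))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```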
Step 3
Earlier (i.e., FIND-S) we obtained only one maximally specific hypothesis consistent with the training data; the version space instead represents all consistent hypotheses.
Def. The version space VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all examples in D:
VS_H,D = { h ∈ H | Consistent(h, D) }
LIST-THEN-ELIMINATE Algorithm to Obtain Version Space
[Figure: the version space VS_H,D as the subset of the hypothesis space H consistent with the training examples D]
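A hedged sketch of LIST-THEN-ELIMINATE for this hypothesis space (function names are mine; D is the training set from the FIND-S sketch above): enumerate every conjunctive hypothesis in H, then keep only those consistent with every example.

```python
from itertools import product

def matches(h, x):
    """A hypothesis matches an instance if every attribute is '?' or equal."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def list_then_eliminate(examples, domains):
    """Start with all of H, then eliminate any hypothesis that is
    inconsistent with some training example."""
    # Each attribute is either a concrete value or '?'; the all-empty
    # hypothesis matches nothing, so it can never survive a positive example.
    H = product(*[list(d) + ["?"] for d in domains])
    return [h for h in H
            if all(matches(h, x) == label for x, label in examples)]

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
vs = list_then_eliminate(D, domains)    # D from the FIND-S sketch
print(len(vs))                          # 6 hypotheses remain in VS_H,D
```

This brute force is feasible only because H is tiny here (972 candidates in the enumeration above); the Candidate-Elimination algorithm below reaches the same version space without listing H.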
The maximally general and the maximally specific consistent hypotheses form boundary sets that delimit the version space within the partially ordered hypothesis space.
[Figure: the version space bounded below by the least general (specific) boundary S and above by the most general boundary G]
Candidate-Elimination Algorithm
Example
Initialization:
G0 = {<?, ?, ?, ?, ?, ?>}
S0 = {<∅, ∅, ∅, ∅, ∅, ∅>}
Intermediate boundaries on the EnjoySport data:
G2 = {<?, ?, ?, ?, ?, ?>}
S3 = {<Sunny, Warm, ?, Strong, Warm, Same>}
[Slide exercise: Yes/No checks on whether candidate boundary updates remain consistent, including the case of an empty G4 = { }]
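A hedged Python sketch of the CANDIDATE-ELIMINATION boundary updates for this conjunctive hypothesis space, where the S boundary stays a singleton (names are mine; D and domains come from the earlier sketches):

```python
ANY = "?"

def matches(h, x):
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def covers(h1, h2):
    """True if h1 is at least as general as h2."""
    return all(a == ANY or a == b for a, b in zip(h1, h2))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative example x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i in range(len(g)) if g[i] == ANY
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    s = None                                   # S0 = {<0,...,0>} as a singleton
    G = {tuple(ANY for _ in domains)}          # G0 = {<?,...,?>}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}          # drop inconsistent g
            s = (tuple(x) if s is None else              # minimally generalize s
                 tuple(a if a == b else ANY for a, b in zip(s, x)))
        else:
            newG = set()
            for g in G:
                if not matches(g, x):
                    newG.add(g)                          # already excludes x
                else:
                    newG.update(h for h in min_specializations(g, x, domains)
                                if s is not None and covers(h, s))
            G = {g for g in newG                         # keep only maximal g
                 if not any(h != g and covers(h, g) for h in newG)}
    return s, G

s, G = candidate_elimination(D, domains)   # data from the earlier sketches
print(s)          # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(sorted(G))  # [('?', 'Warm', '?', '?', '?', '?'),
                  #  ('Sunny', '?', '?', '?', '?', '?')]
```

An empty G, as probed by the slide's Yes/No check, would mean that no hypothesis in H is consistent with all the training examples, e.g. when the data are noisy or mislabeled.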
Decision Trees
An instance is classified by starting at the root node of the tree, testing the
attribute specified by this node, then moving down the tree branch
corresponding to the value of the attribute in the given example. This process
is then repeated for the subtree rooted at the new node.
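A minimal sketch of that procedure in Python; the tree literal is the PlayTennis tree this lecture derives later, written here as nested pairs (the representation is my assumption):

```python
# A node is either a leaf label or a pair (attribute, {value: subtree}).
TREE = ("Outlook", {"Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
                    "Overcast": "Yes",
                    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"})})

def classify(tree, instance):
    """Start at the root, test the node's attribute, follow the branch for
    the instance's value, and repeat until a leaf label is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree

x = {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}
print(classify(TREE, x))   # No, i.e. PlayTennis = No
```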
Decision Trees
[Figure: classifying a sample instance with the PlayTennis tree; result: PlayTennis = No]
Decision Trees
[Figure: tree anatomy. Intermediate nodes test attributes (root: Attribute A1); edges carry attribute values; leaf nodes hold the output value.]
Decision Trees
Each root-to-leaf path is a conjunction of attribute tests, and the tree as a whole represents a disjunction of these conjunctions.
[Figure: a small boolean tree; a node testing B with False/True branches and Yes/No leaves]
Decision Trees (F = A V (B ^ C))
A
  True → Yes
  False → B
    False → No
    True → C
      False → No
      True → Yes
Decision Trees (F = A XOR B)
F = (A ^ B') V (A' ^ B)
A
  False → B
    False → No
    True → Yes
  True → B
    False → Yes
    True → No
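As a quick check, the XOR tree can be written in the same nested form and evaluated with the classify() sketch from earlier:

```python
XOR_TREE = ("A", {False: ("B", {False: "No", True: "Yes"}),
                  True:  ("B", {False: "Yes", True: "No"})})

for a in (False, True):
    for b in (False, True):
        expected = "Yes" if a != b else "No"      # F = (A ^ B') V (A' ^ B)
        assert classify(XOR_TREE, {"A": a, "B": b}) == expected
```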
Decision Trees as If-then-else Rules
Each root-to-leaf path yields a rule whose antecedent is a conjunction of attribute tests, and the tree corresponds to the disjunction of these rules, e.g. IF (Outlook = Sunny) AND (Humidity = High) THEN PlayTennis = No.
ID3 employs a top-down, greedy search through the space of possible decision trees, growing the tree from the root by choosing one attribute test at a time.
[Figure: top-down construction. The root tests attribute A1; each outgoing edge carries an attribute value and leads either to a node testing A2 or A3, or to a leaf with an output value.]
Root node: which attribute to select? Outlook, Temperature, Humidity, or Wind?
Which Attribute to Select?
Claude Shannon, recalling his exchange with John von Neumann:
My greatest concern was what to call it. I thought of calling it 'information,' but the
word was overly used, so I decided to call it 'uncertainty.' When I discussed it with
John von Neumann, he had a better idea. Von Neumann told me, 'You should call
it entropy, for two reasons. In the first place your uncertainty function has
been used in statistical mechanics under that name, so it already has a name.
In the second place, and more important, no one really knows what entropy
really is, so in a debate you will always have the advantage.'
[Photo: Shannon's mouse, the maze-solving Theseus]
The entropy of a discrete random variable X is
H(X) = E[I(X)] = - Σi p(xi) log2 p(xi)
where
I(X) is the information content or self-information of X, which is itself a random variable; and
p(xi) = Pr(X = xi) is the probability mass function of X.
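A small sketch of this definition in Python (the function name is mine):

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution, given as a list
    of probabilities; terms with p = 0 contribute 0."""
    return -sum(p * log2(p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(entropy([1.0]))         # 0.0 bits: a certain outcome carries no surprise
print(entropy([9/14, 5/14]))  # ~0.940 bits: a sample with 9 positives, 5 negatives
```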
Entropy in our Context
Gain(S, A) = Entropy(S) - Σv∈Values(A) (|Sv| / |S|) Entropy(Sv)
i.e., the entropy of S minus the expected entropy of S after partitioning on attribute A.
Gain(S, A) is the information provided about the target function value, given the
value of some other attribute A. The value of Gain(S, A) is the number of bits
saved when encoding the target value of an arbitrary member of S, by knowing
the value of attribute A.
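In two-class sample terms, Entropy(S) and Gain(S, A) can be sketched as follows (the names and the (pos, neg) count convention are mine):

```python
from math import log2

def sample_entropy(pos, neg):
    """Entropy(S) in bits for a sample with pos positive, neg negative members."""
    n = pos + neg
    return -sum(c / n * log2(c / n) for c in (pos, neg) if c)

def gain(parent, subsets):
    """Gain(S, A): Entropy(S) minus the size-weighted entropy of the subsets
    Sv obtained by partitioning S on attribute A; counts are (pos, neg) pairs."""
    n = sum(p + q for p, q in subsets)
    return sample_entropy(*parent) - sum((p + q) / n * sample_entropy(p, q)
                                         for p, q in subsets)

# Full PlayTennis sample {9+, 5-} split by Outlook into Sunny {2+, 3-},
# Overcast {4+, 0-} and Rain {3+, 2-}:
print(round(gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 4))   # 0.2467
```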
Example: selecting the attribute for the Outlook = Sunny branch, S1 = {2+, 3-}, Entropy(S1) = 0.97095

Temperature:
(Hot) {0+, 2-}   Entropy(Hot) = 0
(Mild) {1+, 1-}  Entropy(Mild) = 1
(Cool) {1+, 0-}  Entropy(Cool) = 0
Gain(S1, Temperature) = 0.97095 - 2/5*0 - 2/5*1 - 1/5*0 = 0.57095

Humidity:
(High) {0+, 3-}    Entropy(High) = 0
(Normal) {2+, 0-}  Entropy(Normal) = 0
Gain(S1, Humidity) = 0.97095 - 3/5*0 - 2/5*0 = 0.97095

Wind:
(Weak) {1+, 2-}    Entropy(Weak) = 0.9183
(Strong) {1+, 1-}  Entropy(Strong) = 1.0
Gain(S1, Wind) = 0.97095 - 3/5*0.9183 - 2/5*1 = 0.01997

Humidity has the highest gain and is selected.
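The gain() sketch above reproduces these numbers:

```python
print(round(gain((2, 3), [(0, 2), (1, 1), (1, 0)]), 5))   # Temperature: 0.57095
print(round(gain((2, 3), [(0, 3), (2, 0)]), 5))           # Humidity:    0.97095
print(round(gain((2, 3), [(1, 2), (1, 1)]), 5))           # Wind:        0.01997
```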
Modified Decision Tree
[Figure: Outlook at the root; the Sunny branch now tests Humidity; the Rain branch remains to be expanded]

Gain(SRain, A): candidate splits of SRain = {3+, 2-}
Entropy(SRain) = - {2/5 log(2/5) + 3/5 log(3/5)} = 0.97095

Temperature:
(Hot) {0+, 0-}   Entropy(Hot) = 0
(Mild) {2+, 1-}  Entropy(Mild) = 0.9183
(Cool) {1+, 1-}  Entropy(Cool) = 1.0
Gain(SRain, Temperature) = 0.97095 - 0 - 3/5*0.9183 - 2/5*1 = 0.01997

Humidity:
(High) {1+, 1-}    Entropy(High) = 1.0
(Normal) {2+, 1-}  Entropy(Normal) = 0.9183
Gain(SRain, Humidity) = 0.97095 - 2/5*1 - 3/5*0.9183 = 0.01997

Wind:
(Weak) {3+, 0-}    Entropy(Weak) = 0.0
(Strong) {0+, 2-}  Entropy(Strong) = 0.0
Gain(SRain, Wind) = 0.97095 - 3/5*0 - 2/5*0 = 0.97095

Wind has the highest gain and is selected for the Rain branch.
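The same check for the Rain branch, again with the gain() sketch; Wind's gain dominates, so Wind is placed under Outlook = Rain:

```python
print(round(gain((3, 2), [(0, 0), (2, 1), (1, 1)]), 5))   # Temperature: 0.01997
print(round(gain((3, 2), [(1, 1), (2, 1)]), 5))           # Humidity:    0.01997
print(round(gain((3, 2), [(3, 0), (0, 2)]), 5))           # Wind:        0.97095
```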
Final Decision Tree
[Figure: Outlook at the root; Sunny → Humidity (High: No, Normal: Yes); Overcast → Yes; Rain → Wind (Strong: No, Weak: Yes)]
Home work
a1:
(True) {2+, 1-}
(False) {1+, 2-}
a2:
(True) {2+, 2-}   Entropy(a2 = True) = 1.0
(False) {1+, 1-}  Entropy(a2 = False) = 1.0
Gain(S, a2) = 1 - 4/6*1 - 2/6*1 = 0.0
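Checking the homework numbers with the gain() sketch from earlier also yields Gain(S, a1), which the slide leaves out:

```python
print(round(gain((3, 3), [(2, 1), (1, 2)]), 4))   # a1: 0.0817
print(round(gain((3, 3), [(2, 2), (1, 1)]), 4))   # a2: 0.0
```

Since Gain(S, a1) = 0.0817 > Gain(S, a2) = 0.0, a1 becomes the root, matching the tree on the next slide.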
Home work
a1
  True → a2
    True → + (Yes)
    False → - (No)
  False → a2
    True → - (No)
    False → + (Yes)
ID3 uses all training examples at each step in the search to make
statistically based decisions regarding how to refine its current
hypothesis. This contrasts with methods that make decisions
incrementally, based on individual training examples (e.g., FIND-S
or CANDIDATE-ELIMINATION). One advantage of using statistical
properties of all the examples (e.g., information gain) is that the
resulting search is much less sensitive to errors in individual training
examples. [Advantage]