
Categorical data

Decision Tree Classification

Which feature to split on?

Try to classify as many as possible with each split


(This is a good split)

Which feature to split on?

This is a bad split: no classifications obtained

Improving a good split

Decision Tree Algorithm


Framework (sketched in code below)

If you have positive and negative examples, use a splitting criterion to decide on the best attribute to split on
Each child is a new decision tree: call the algorithm again with the parent feature removed
If all data points in a child node are the same class, classify that node as that class
If no attributes are left, classify by majority rule
If no data points are left (no such example was seen), classify as the majority class of the entire dataset
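
A minimal Python sketch of this framework, assuming examples are (feature-dict, label) pairs; the helper names and the entropy-based splitting criterion (anticipating ID3, below) are illustrative choices, not code from the slides:

```python
from collections import Counter
from math import log2

def entropy(examples):
    """Randomness of a set of (features, label) pairs: -sum p * log2(p)."""
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def weighted_entropy(examples, attr):
    """Average randomness left after splitting on attr, weighted by subset size."""
    total = len(examples)
    result = 0.0
    for value in {f[attr] for f, _ in examples}:
        subset = [(f, l) for f, l in examples if f[attr] == value]
        result += len(subset) / total * entropy(subset)
    return result

def majority_class(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]

def build_tree(examples, attributes, default):
    """Recursive framework from the slide above (ID3-style splitting criterion).

    default is the majority class of the entire dataset, used when a branch
    has no data points left.
    """
    if not examples:                          # no data points left: no such example seen
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # all points in this node are the same class
        return labels.pop()
    if not attributes:                        # no attributes left: majority rule
        return majority_class(examples)
    best = min(attributes, key=lambda a: weighted_entropy(examples, a))
    children = {}
    for value in {f[best] for f, _ in examples}:
        subset = [(f, l) for f, l in examples if f[best] == value]
        children[value] = build_tree(subset,
                                     [a for a in attributes if a != best],  # parent feature removed
                                     default)
    return {"split on": best, "children": children}

# Usage (hypothetical data and attribute names):
# tree = build_tree(data, ["Patrons", "Hungry", "Type"], majority_class(data))
```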

Splitting Criterion

ID3 Algorithm

Some information theory


Blackboard

Issues on training and test sets

Do you know the correct classification for the test set?
If you do, why not include it in the training set to get a better classifier?
If you don't, how can you measure the performance of your classifier?

Cross Validation

Tenfold cross-validation

Ten iterations
Pull a different tenth of the dataset out
each time to act as a test set
Train on the remaining training set
Measure performance on the test set

Leave-one-out cross-validation

Similar, but leave only one point out each time, then count correct vs. incorrect (both procedures are sketched below)
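
Both procedures can be sketched in a few lines of Python; the `train`/`predict` callables and the (features, label) data format are assumptions supplied by the caller, not part of the slides:

```python
import random

def k_fold_accuracy(data, train, predict, k=10, seed=0):
    """Tenfold (k=10) cross-validation: each fold serves once as the test set."""
    data = list(data)
    random.Random(seed).shuffle(data)          # shuffle a copy so folds are not order-dependent
    folds = [data[i::k] for i in range(k)]
    correct = total = 0
    for i in range(k):
        test = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(training)                # train on the remaining folds
        correct += sum(predict(model, features) == label for features, label in test)
        total += len(test)
    return correct / total                     # performance measured only on held-out points

def leave_one_out_accuracy(data, train, predict):
    """Leave-one-out: hold out a single point each time, count correct vs. incorrect."""
    return k_fold_accuracy(data, train, predict, k=len(data))
```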

Noise and Overfitting

Can we always obtain a decision tree that is consistent with the data?
Do we always want a decision tree that is consistent with the data?
Example: Predict Carleton students who become CEOs

Features: state/country of origin, GPA letter, major, age, high school GPA, junior high GPA, ...
What happens with only a few features?
What happens with many features?

Overfitting

Fitting a classifier too closely to the data: finding patterns that aren't really there

Prevented in decision trees by pruning

When building trees, stop recursion on irrelevant attributes
Do statistical tests at each node to determine whether to continue or not (one such test is sketched below)
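
The slides do not name a particular statistical test; a common choice is a chi-squared test of whether the split's class counts differ from what an irrelevant attribute would produce. A sketch using SciPy (the function name and example counts are hypothetical):

```python
from scipy.stats import chi2_contingency

def split_is_significant(child_class_counts, alpha=0.05):
    """Chi-squared test at a node: does this attribute really separate the classes?

    child_class_counts: one row per child node, one column per class,
                        e.g. [[3, 1], [0, 4], [2, 2]] for counts of P and N.
    Returns False when the children's class distributions look like chance,
    i.e. the attribute is probably irrelevant and recursion should stop.
    """
    chi2, p_value, dof, expected = chi2_contingency(child_class_counts)
    return p_value < alpha

# Hypothetical counts for a candidate split; prune (stop recursing) if not significant.
print(split_is_significant([[3, 1], [0, 4], [2, 2]]))
```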

Examples of decision trees using Weka

Preventing overfitting by cross-validation

Another technique to prevent overfitting (is this valid?)

Keep on recursing on the decision tree as long as you continue to get improved accuracy on the test set

Review of how to decide on which attribute to split

Dataset has two classes, P and N

Relationship between information and randomness

The more random a dataset is (points in P and N), the more information is provided by the message "Your point is in class P (or N)."
The less random a dataset is, the less information is provided by the message "Your point is in class P (or N)."

Information of message = Randomness of dataset = $-p_P \log_2 p_P - p_N \log_2 p_N$

How much randomness in this split? (Patrons)

Branch with 2 points, all in N: $-0 \log_2 0 - 1 \log_2 1 = 0$
Branch with 4 points, all in P: $-1 \log_2 1 - 0 \log_2 0 = 0$
Branch with 6 points, 2 in P and 4 in N: $-\frac{2}{6} \log_2 \frac{2}{6} - \frac{4}{6} \log_2 \frac{4}{6} = 0.9183$

Weighted average: $\frac{2}{12}(0) + \frac{4}{12}(0) + \frac{6}{12}(0.9183) = 0.4591$

How much randomness in this split? (Type)

Each branch with 2 points (1 in P, 1 in N): $-\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1$
Each branch with 4 points (2 in P, 2 in N): $-\frac{2}{4} \log_2 \frac{2}{4} - \frac{2}{4} \log_2 \frac{2}{4} = 1$

Weighted average = 1

Which split is better?

Patrons split: randomness = 0.4591
Type split: randomness = 1

Patrons has less randomness, so it is a better split (both numbers are checked in the sketch below)
Randomness is often referred to as entropy (similarities with thermodynamics)
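
The two weighted averages can be checked in a few lines of Python; the per-branch (P, N) counts below are read off the arithmetic above (branches of sizes 2, 4, 6 for Patrons and 2, 2, 4, 4 for Type):

```python
from math import log2

def randomness(p, n):
    """-p_P log2 p_P - p_N log2 p_N for a branch with p positives and n negatives."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)  # treat 0 log 0 as 0

def weighted_randomness(branches):
    """Weighted average of branch entropies; weights are branch size / dataset size."""
    size = sum(p + n for p, n in branches)
    return sum((p + n) / size * randomness(p, n) for p, n in branches)

print(weighted_randomness([(0, 2), (4, 0), (2, 4)]))            # Patrons split: 0.4591
print(weighted_randomness([(1, 1), (1, 1), (2, 2), (2, 2)]))    # Type split: 1.0
```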

Learning Logical Descriptions

Hypothesis

$\forall x\ WillWait(x) \Leftrightarrow Patrons(x, Some)$
$\quad \lor\ (Patrons(x, Full) \land Hungry(x) \land Type(x, French))$
$\quad \lor\ (Patrons(x, Full) \land Hungry(x) \land Type(x, Burger))$
$\quad \lor\ (Patrons(x, Full) \land Hungry(x) \land Type(x, Thai) \land FriSat(x))$

Learning Logical Descriptions

Goal is to learn a logical hypothesis consistent with the data
Example of a hypothesis consistent with X1:

$\forall x\ WillWait(x) \Leftrightarrow Alternate(x) \land \lnot Bar(x) \land Est_{0-10}(x)$

Is this consistent with X2?

X2 is a false negative for the hypothesis if the hypothesis says negative, but it should be positive
X2 is a false positive for the hypothesis if the hypothesis says positive, but it should be negative (a sketch of this check follows)
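
A small sketch of this bookkeeping in Python, writing the hypothesis above as an ordinary predicate over a feature dictionary; the dictionary keys and the example row are illustrative, not the actual X1/X2 data:

```python
def h1(x):
    """Hypothesis consistent with X1: Alternate(x) and not Bar(x) and Est 0-10."""
    return x["Alternate"] and not x["Bar"] and x["Est"] == "0-10"

def check(hypothesis, example, actual_label):
    """Report how a hypothesis disagrees, if at all, with one labeled example."""
    predicted = hypothesis(example)
    if predicted and not actual_label:
        return "false positive"   # hypothesis says positive, but it should be negative
    if not predicted and actual_label:
        return "false negative"   # hypothesis says negative, but it should be positive
    return "consistent"

# Hypothetical feature values standing in for X2 (a negative example):
x2 = {"Alternate": True, "Bar": False, "Est": "30-60"}
print(check(h1, x2, actual_label=False))   # consistent, false positive, or false negative
```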

Current-best-hypothesis search

Start with an initial hypothesis and adjust it as you see examples
Example: based on X1, arbitrarily start with

$H_1: \forall x\ WillWait(x) \Leftrightarrow Alternate(x)$

X2 should be -, but H1 says +. H1 is not restrictive enough, so specialize it:

$H_2: \forall x\ WillWait(x) \Leftrightarrow Alternate(x) \land Patrons(x, Some)$

X3 should be +, but H2 says -. H2 is too restrictive, so generalize:

Current-best-hypothesis search

$H_2: \forall x\ WillWait(x) \Leftrightarrow Alternate(x) \land Patrons(x, Some)$
$H_3: \forall x\ WillWait(x) \Leftrightarrow Patrons(x, Some)$

X4 should be +, but H3 says -. Must generalize:

$H_4: \forall x\ WillWait(x) \Leftrightarrow Patrons(x, Some) \lor (Patrons(x, Full) \land FriSat(x))$

What if you end up with an inconsistent hypothesis that you cannot modify to make work?

Back up the search and try a different route (the revision loop is sketched below)

Tree on blackboard
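
A crude, runnable sketch of the overall loop, with a hypothesis represented as a list of disjuncts (each a dict of required attribute values). The specialize/generalize operators here are deliberately simplistic placeholders; the slides' version adds or drops individual conditions and backs up when no consistent revision exists:

```python
def predicts_positive(hypothesis, x):
    """hypothesis: list of disjuncts; each disjunct is a dict of attribute -> required value."""
    return any(all(x.get(a) == v for a, v in d.items()) for d in hypothesis)

def specialize(hypothesis, false_positive):
    """Crude specialization: drop every disjunct that wrongly covers the negative example."""
    return [d for d in hypothesis
            if not all(false_positive.get(a) == v for a, v in d.items())]

def generalize(hypothesis, false_negative):
    """Crude generalization: add a disjunct covering the missed positive example."""
    return hypothesis + [dict(false_negative)]

def current_best_hypothesis(examples, hypothesis):
    """Adjust the hypothesis one example at a time (no backtracking in this sketch)."""
    for x, label in examples:
        covered = predicts_positive(hypothesis, x)
        if covered and not label:                  # false positive: not restrictive enough
            hypothesis = specialize(hypothesis, x)
        elif not covered and label:                # false negative: too restrictive
            hypothesis = generalize(hypothesis, x)
    return hypothesis

# Hypothetical start, mirroring H1 above: WillWait iff Alternate.
h = current_best_hypothesis(
    [({"Alternate": True, "Patrons": "Full"}, False),    # stand-in for X2
     ({"Alternate": False, "Patrons": "Some"}, True)],   # stand-in for X3
    [{"Alternate": True}])
print(h)
```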

Neural Networks

Moving on to Chapter 19: neural networks
