Classification - Issues Regarding Classification and Prediction


CSI3010 Data Warehousing and Data Mining
Module-4: Classification

Dr Sunil Kumar P V
SCOPE
Classification: Overview
Introduction

• Example tasks: deciding whether a bank loan applicant is safe or risky, whether a customer will buy a computer, which treatment suits a patient best
• In each case, the data analysis task is classification
• A model or classifier is constructed to predict categorical labels, such as
• “safe” or “risky” for the loan application data
• “yes” or “no” for the marketing data
• “treatment A,” “treatment B,” or “treatment C” for the medical data
Classification Vs Prediction

• Predict how much a given customer will spend during a sale
• The model constructed predicts a continuous-valued function, or ordered value, as opposed to a categorical label
• This process is known as prediction (regression)
• The model is known as a predictor

Classification Step1

Classification Step2

Classification: Step1-Learning

• Learning step: a classifier is built from the training set
• Training set: database tuples and their associated class labels
• A tuple X = (x1, x2, . . . , xn) is an attribute vector that belongs to a predefined class label (categorical, discrete, and unordered)
• Learn y = f(X), where y is the class label for tuple X
• Because the class label of each training tuple is provided, this step is also known as supervised learning
Classification: Step2- Prediction

• A test set is used, made up of test tuples and their associated class labels, which are not used to construct the classifier
• Accuracy of a classifier is the percentage of test set tuples that are correctly classified by the classifier
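As a quick illustration of this measure, here is a minimal Python sketch; the function name `accuracy` and the sample labels are assumptions for the example, not from the slides.

```python
def accuracy(predicted, actual):
    """Percentage of test tuples whose predicted class label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical test-set labels: 4 of 5 tuples are classified correctly -> 80.0
print(accuracy(["safe", "risky", "safe", "safe", "risky"],
               ["safe", "risky", "safe", "risky", "risky"]))
```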

Points to note

• Supervised vs unsupervised learning
• In prediction, there is no class label attribute, just the attribute whose value is to be predicted
• Prediction can be viewed as a mapping or function, y = f(X), where X is the input (e.g., a tuple describing a loan applicant) and the output y is a continuous or ordered value
• E.g., the predicted loan amount

Issues Regarding Classification
and Prediction
Data Preparation

• Prepare data for classification and prediction to improve accuracy, efficiency, and scalability:
• Data cleaning to reduce noise and handle missing values
• Relevance analysis to identify whether any two given attributes are statistically related (use correlation analysis)
• Attribute subset selection to remove irrelevant/redundant attributes
• Data transformation, e.g., normalization (see the sketch below)
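As a small illustration of such a transformation, here is a hedged Python sketch of min-max normalization to [0, 1]; the function name and the income values are assumptions for the example, not taken from the slides.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a numeric attribute to [new_min, new_max] (min-max normalization)."""
    old_min, old_max = min(values), max(values)
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * scale for v in values]

# e.g. normalizing a hypothetical income attribute before training a classifier
print(min_max_normalize([12000, 35000, 58000, 98000]))  # [0.0, 0.267..., 0.534..., 1.0]
```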
Comparing Classification and Prediction Methods

• Accuracy
• Speed
• Robustness (performance given noisy data or
data with missing values)
• Scalability
• Interpretability (the level of understanding and insight provided by the classifier/predictor)

Decision Tree Induction
Decision Tree Algorithm Basics

• Three algorithms
• All were proposed in the 1980s
• ID3 (Iterative Dichotomiser)
• C4.5 (a successor of ID3)
• Classification and Regression Trees (CART)
• The basic tree-induction procedure has three parameters: D, attribute_list, and Attribute_selection_method
• Refer to Figure 6.3, Page 293, Han and Kamber
2nd Ed Textbook
Decision Tree Algorithms

• D is a data partition; initially, it is the entire training data set
• attribute_list is a list of attributes describing the tuples
• Attribute_selection_method specifies a heuristic procedure for selecting the attribute that “best” discriminates the given tuples according to class
• E.g., information gain or the Gini index (the latter produces binary splits only)
• We use information gain, as our trees may not always be binary
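Figure 6.3 itself is not reproduced here; the following Python sketch only outlines the general recursive procedure it describes. The helper names, the `(attribute_dict, class_label)` tuple format, and the majority-class fallback shown are illustrative assumptions, not the textbook's exact pseudocode.

```python
from collections import Counter

def induce_tree(D, attribute_list, attribute_selection_method):
    """Sketch of basic decision tree induction (ID3/C4.5 style).

    D: list of (attribute_dict, class_label) training tuples (the data partition)
    attribute_list: candidate splitting attributes still available
    attribute_selection_method: heuristic such as information gain (or the Gini index)
    """
    labels = [label for _, label in D]
    if len(set(labels)) == 1:                        # all tuples belong to one class
        return labels[0]                             # return a leaf with that class
    if not attribute_list:                           # no attributes left to split on
        return Counter(labels).most_common(1)[0][0]  # leaf labelled with the majority class
    best = attribute_selection_method(D, attribute_list)  # attribute that "best" separates the classes
    remaining = [a for a in attribute_list if a != best]
    subtree = {}
    for value in {x[best] for x, _ in D}:            # one branch per value of the chosen attribute
        D_v = [(x, y) for x, y in D if x[best] == value]
        subtree[value] = induce_tree(D_v, remaining, attribute_selection_method)
    return (best, subtree)
```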
Entropy(H)

• H(V) = Σ_k P(v_k) log2(1/P(v_k)) = −Σ_k P(v_k) log2 P(v_k)
• Let X be a collection of training examples
• p+ is the proportion of positive examples in X
• p− is the proportion of negative examples in X
• H(X) = −p+ log2 p+ − p− log2 p−

Entropy Examples

• H(X ) = −p+ log2 p+ − p− log2 p−


• Examples:
• H(14+, 0−) = −14/14 log2 (14/14) − 0/14 log2 (0/14) = 0 (taking 0 log2 0 = 0)
• H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• H(7+, 7−) =
−7/14 log2 (7/14) − 7/14 log2 (7/14) = 1
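These values can be reproduced with a short Python sketch of the two-class entropy; `entropy` is an illustrative helper name, not from the slides.

```python
import math

def entropy(pos, neg):
    """Two-class entropy H = -p+ log2 p+ - p- log2 p- (0 * log2 0 is taken as 0)."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h

print(round(entropy(14, 0), 2))  # 0.0
print(round(entropy(9, 5), 2))   # 0.94
print(round(entropy(7, 7), 2))   # 1.0
```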

Information Gain

• Estimates the expected reduction in the entropy of the given data caused by partitioning the examples based on an attribute
• The information gain Gain(X, A) of an attribute A, relative to a collection of examples X, is defined as
• Gain(X, A) = H(X) − Σ_{v ∈ Values(A)} (|X(v)|/|X|) H(X(v))
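A matching Python sketch of this definition follows; the names and the data layout (one dict per tuple plus a parallel list of labels) are assumptions for illustration.

```python
import math
from collections import Counter

def H(labels):
    """Entropy of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(tuples, labels, attribute):
    """Gain(X, A) = H(X) minus the weighted entropy of the partitions X(v)."""
    n = len(tuples)
    remainder = 0.0
    for value in {x[attribute] for x in tuples}:
        subset = [lab for x, lab in zip(tuples, labels) if x[attribute] == value]
        remainder += (len(subset) / n) * H(subset)
    return H(labels) - remainder
```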

Decision Tree Demo

Entropy of Outlook

• Entropy of the dataset


• H(X ) = H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• To determine the entropy of Outlook
• Values of Outlook −→
Sunny , Overcast, Rainy

Entropy of Outlook

• H(Sunny) = H(2+, 3−) = −2/5 log2 (2/5) − 3/5 log2 (3/5) = 0.971
• H(Overcast) = H(4+, 0−) =
−4/4 log2 (4/4) − 0/4 log2 (0/4) = 0
• H(Rainy ) = H(3+, 2−) =
−3/5 log2 (3/5) − 2/5 log2 (2/5) = 0.971

Information Gain of Outlook

• Gain(X, Outlook) = H(X) − Σ_{v ∈ {Sunny, Overcast, Rainy}} (|X(v)|/|X|) H(X(v))
• H(X ) − (5/14)H(Sunny ) −
(4/14)H(Overcast) − (5/14)H(Rainy )
• = 0.94 − (5/14) × 0.971 − (4/14) × 0 −
(5/14) × 0.971 = 0.2464

Information Gain of Temperature

• Entropy of the dataset


• H(X ) = H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• Entropy of Temperature
• Values of Temperature −→ Hot, Mild, Cool
• H(Hot) = H(2+, 2−) =
−2/4 log2 (2/4) − 2/4 log2 (2/4) = 1
• H(Mild) = H(4+, 2−) =
−4/6 log2 (4/6) − 2/6 log2 (2/6) = 0.9183
• H(Cool) = H(3+, 1−) = −3/4 log2 (3/4) − 1/4 log2 (1/4) = 0.8113
Information Gain of Temperature

• Gain(X, Temperature) = H(X) − Σ_{v ∈ {Hot, Mild, Cool}} (|X(v)|/|X|) H(X(v))
• = H(X) − (4/14)H(Hot) − (6/14)H(Mild) − (4/14)H(Cool)
• = 0.94−(4/14) × 1 − (6/14) × 0.9183 −
(4/14) × 0.8113 = 0.0289

Entropy of Humidity

• Entropy of the dataset


• H(X ) = H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• Entropy of Humidity
• Values of Humidity −→ High, Normal
• H(High) = H(3+, 4−) =
−3/7 log2 (3/7) − 4/7 log2 (4/7) = 0.9852
• H(Normal) = H(6+, 1−) =
−6/7 log2 (6/7) − 1/7 log2 (1/7) = 0.5916
Information Gain of Humidity

• Gain(X, Humidity) = H(X) − Σ_{v ∈ {High, Normal}} (|X(v)|/|X|) H(X(v))
• = H(X) − (7/14)H(High) − (7/14)H(Normal)
• = 0.94−(7/14) × 0.9852 − (7/14) × 0.5916 =
0.1516

Entropy of Wind

• Entropy of the dataset


• H(X ) = H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• Entropy of Wind
• Values of Wind −→ Strong , Weak
• H(Strong ) = H(3+, 3−) =
−3/6 log2 (3/6) − 3/6 log2 (3/6) = 1
• H(Weak) = H(6+, 2−) =
−6/8 log2 (6/8) − 2/8 log2 (2/8) = 0.8113
Information Gain of Wind

• Gain(X, Wind) = H(X) − Σ_{v ∈ {Strong, Weak}} (|X(v)|/|X|) H(X(v))
• = H(X) − (6/14)H(Strong) − (8/14)H(Weak)
• = 0.94−(6/14) × 1 − (8/14) × 0.8113 = 0.0478

Attribute with the Maximum Information Gain

• Gain(X , Outlook) = 0.2464


• Gain(X , Temperature) = 0.0289
• Gain(X , Humidity ) = 0.1516
• Gain(X , Wind) = 0.0478
• Attribute to be chosen is Outlook
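These four gains can be checked with a short, self-contained Python script. The 14 tuples below are the standard weather/play-tennis data that the worked example follows; they are assumed to match the dataset table in the (unreproduced) demo slide, and the helper names are illustrative.

```python
import math
from collections import Counter

# (Outlook, Temperature, Humidity, Wind) -> class label: the assumed 14-tuple weather data
data = [
    (("Sunny", "Hot", "High", "Weak"), "No"),          (("Sunny", "Hot", "High", "Strong"), "No"),
    (("Overcast", "Hot", "High", "Weak"), "Yes"),      (("Rainy", "Mild", "High", "Weak"), "Yes"),
    (("Rainy", "Cool", "Normal", "Weak"), "Yes"),      (("Rainy", "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"), (("Sunny", "Mild", "High", "Weak"), "No"),
    (("Sunny", "Cool", "Normal", "Weak"), "Yes"),      (("Rainy", "Mild", "Normal", "Weak"), "Yes"),
    (("Sunny", "Mild", "Normal", "Strong"), "Yes"),    (("Overcast", "Mild", "High", "Strong"), "Yes"),
    (("Overcast", "Hot", "Normal", "Weak"), "Yes"),    (("Rainy", "Mild", "High", "Strong"), "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def H(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, index):
    """Information gain of the attribute at position `index` over `rows`."""
    labels = [label for _, label in rows]
    remainder = 0.0
    for value in {x[index] for x, _ in rows}:
        subset = [label for x, label in rows if x[index] == value]
        remainder += (len(subset) / len(rows)) * H(subset)
    return H(labels) - remainder

for i, name in enumerate(attributes):
    print(name, round(gain(data, i), 4))
# Outlook 0.2467, Temperature 0.0292, Humidity 0.1518, Wind 0.0481 -> Outlook is the largest,
# matching the slides' rounded values 0.2464, 0.0289, 0.1516, 0.0478
```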

The tree so far

Dataset corresponding to Outlook = Sunny

X (Outlook = Sunny , Temperature)

• Entropy of the dataset (X (Sunny ))


• H(Sunny) = H(2+, 3−) = −2/5 log2 (2/5) − 3/5 log2 (3/5) = 0.97
• Values of Temperature −→ Hot, Mild, Cool
• H(Hot) = H(0+, 2−) = 0
• H(Mild) = H(1+, 1−) = 1
• H(Cool) = H(1+, 0−) = 0
• Gain(X(Sunny), Temperature) = H(Sunny) − Σ_{v ∈ {Hot, Mild, Cool}} (|X(v)|/|X(Sunny)|) H(X(v))
• 0.97 − 2/5 × 0 − 2/5 × 1 − 1/5 × 0 = 0.570
X (Outlook = Sunny , Humidity )

• Entropy of the dataset (X (Sunny ))


• H(Sunny) = H(2+, 3−) = −2/5 log2 (2/5) − 3/5 log2 (3/5) = 0.97
• Values of Humidity −→ High, Normal
• H(High) = H(0+, 3−) = 0
• H(Normal) = H(2+, 0−) = 0
• Gain(X(Sunny), Humidity) = H(Sunny) − Σ_{v ∈ {High, Normal}} (|X(v)|/|X(Sunny)|) H(X(v))
• 0.97 − 3/5 × 0 − 2/5 × 0 = 0.97

X (Outlook = Sunny , Wind)

• Entropy of the dataset (X (Sunny ))


• H(Sunny) = H(2+, 3−) = −2/5 log2 (2/5) − 3/5 log2 (3/5) = 0.97
• Values of Wind −→ Strong , Weak
• H(Strong ) = H(1+, 1−) = 1
• H(Weak) = H(1+, 2−) = 0.9183
• Gain(X(Sunny), Wind) = H(Sunny) − Σ_{v ∈ {Strong, Weak}} (|X(v)|/|X(Sunny)|) H(X(v))
• 0.97 − 2/5 × 1 − 3/5 × 0.918 = 0.0192

Attribute with the Maximum Information Gain

• Gain(X (Sunny ), Temperature) = 0.570


• Gain(X (Sunny ), Humidity ) = 0.97
• Gain(X (Sunny ), Wind) = 0.0192
• Attribute to be chosen is Humidity
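The same check can be repeated on just the five Outlook = Sunny tuples (again assuming the standard weather data and illustrative helper names). With unrounded entropies the Wind gain comes out as ≈ 0.020 rather than the slides' 0.0192, which used values rounded to 0.97 and 0.918.

```python
import math
from collections import Counter

# The five Outlook = Sunny tuples: (Temperature, Humidity, Wind) -> class label
sunny = [
    (("Hot", "High", "Weak"), "No"),    (("Hot", "High", "Strong"), "No"),
    (("Mild", "High", "Weak"), "No"),   (("Cool", "Normal", "Weak"), "Yes"),
    (("Mild", "Normal", "Strong"), "Yes"),
]

def H(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, index):
    labels = [label for _, label in rows]
    remainder = sum((len([l for x, l in rows if x[index] == v]) / len(rows))
                    * H([l for x, l in rows if x[index] == v])
                    for v in {x[index] for x, _ in rows})
    return H(labels) - remainder

for i, name in enumerate(["Temperature", "Humidity", "Wind"]):
    print(name, round(gain(sunny, i), 3))
# Temperature ≈ 0.571, Humidity ≈ 0.971, Wind ≈ 0.020 -> Humidity is chosen
```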

The tree so far

Dataset corresponding to Outlook = Rain

X (Outlook = Rain, Temperature)

• Entropy of the dataset (X (Rain))


• H(Rain) = H(3+, 2−) =
−3/5 log2 (3/5) − 2/5 log2 (2/5) = 0.97
• Values of Temperature −→ Hot, Mild, Cool
• H(Hot) = H(0+, 0−) = 0
• H(Mild) = H(2+, 1−) = 0.9183
• H(Cool) = H(1+, 1−) = 1
• Gain(X(Rain), Temperature) = H(Rain) − Σ_{v ∈ {Hot, Mild, Cool}} (|X(v)|/|X(Rain)|) H(X(v))
• 0.97 − 0/5 × 0 − 3/5 × 0.9183 − 2/5 × 1 = 0.0192
X (Outlook = Rain, Humidity )

• Entropy of the dataset (X (Rain))


• H(Rain) = H(3+, 2−) =
−3/5 log2 (3/5) − 2/5 log2 (2/5) = 0.97
• Values of Humidity −→ High, Normal
• H(High) = H(1+, 1−) = 1
• H(Normal) = H(2+, 1−) = 0.9183
• Gain(X(Rain), Humidity) = H(Rain) − Σ_{v ∈ {High, Normal}} (|X(v)|/|X(Rain)|) H(X(v))
• 0.97 − 2/5 × 1 − 3/5 × 0.9183 = 0.0192

X (Outlook = Rain, Wind)

• Entropy of the dataset (X (Rain))


• H(Rain) = H(3+, 2−) =
−3/5 log2 (3/5) − 2/5 log2 (2/5) = 0.97
• Values of Wind −→ Strong , Weak
• H(Strong ) = H(0+, 2−) = 0
• H(Weak) = H(3+, 0−) = 0
• Gain(X(Rain), Wind) = H(Rain) − Σ_{v ∈ {Strong, Weak}} (|X(v)|/|X(Rain)|) H(X(v))
• 0.97 − 2/5 × 0 − 3/5 × 0 = 0.97

Attribute with the Maximum Information Gain

• Gain(X (Rain), Temperature) = 0.0192


• Gain(X (Rain), Humidity ) = 0.0192
• Gain(X (Rain), Wind) = 0.97
• Attribute to be chosen is Wind
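And the corresponding check on the five Outlook = Rain tuples, under the same assumptions as the earlier sketches; with unrounded entropies the Temperature and Humidity gains are ≈ 0.020 rather than the slides' rounded 0.0192.

```python
import math
from collections import Counter

# The five Outlook = Rain tuples: (Temperature, Humidity, Wind) -> class label
rain = [
    (("Mild", "High", "Weak"), "Yes"),    (("Cool", "Normal", "Weak"), "Yes"),
    (("Cool", "Normal", "Strong"), "No"), (("Mild", "Normal", "Weak"), "Yes"),
    (("Mild", "High", "Strong"), "No"),
]

def H(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, index):
    labels = [label for _, label in rows]
    remainder = sum((len([l for x, l in rows if x[index] == v]) / len(rows))
                    * H([l for x, l in rows if x[index] == v])
                    for v in {x[index] for x, _ in rows})
    return H(labels) - remainder

for i, name in enumerate(["Temperature", "Humidity", "Wind"]):
    print(name, round(gain(rain, i), 3))
# Temperature ≈ 0.020, Humidity ≈ 0.020, Wind ≈ 0.971 -> Wind is chosen
```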

The Final Decision Tree

(Root: Outlook. The Overcast branch is a pure “Yes” leaf, the Sunny branch splits on Humidity, and the Rain branch splits on Wind.)
