
Classification - A Learning Paradigm

Technical Description of the Problems that are fit for Learning a ‘class’
Summary with an Exercise

Learning Paradigm - Classification

L Jeganathan

School of Computing Sciences and Engineering

L Jeganathan Learning Paradigm - Classification



SO FAR, WE HAVE LEARNT...

What is learning (in general)? - Different perspectives of Learning.


A general definition of a Machine Learning Model.
A learning paradigm called Regression.


OUTLINE

1 CLASSIFICATION - A LEARNING PARADIGM
2 TECHNICAL DESCRIPTION OF THE PROBLEMS THAT ARE FIT FOR LEARNING A ‘CLASS’
3 SUMMARY WITH AN EXERCISE


LOAN-APPLICATION CLASSIFICATION PROBLEM


A bank lends money to its customers.
A loan has to be paid back with interest, in monthly installments.
Banks calculate the ‘risk’ associated with each loan.
‘Risk’ is the probability that the customer will not pay the whole amount back.
Based on the risk-score, banks decide whether to sanction the loan.
A customer has applied for a loan in the bank. Given the information about the
customer from the loan-application, classify the application based on the risk
associated with the loan. This classification may help the bank to decide on the
loan-application.

How do we solve this problem through ‘Learning’?


REQUIREMENTS
‘Experience’ has to be made ready.

MAKING THE ‘EXPERIENCE (E)’, THE DATA SET

Identify a few factors on which the risk-score depends.

Some factors may be: Loan Amount (L), Income (I), Savings (S), Age (A),
Collateral (C) (assets pledged as security for repayment of a loan, to be
forfeited in the event of a default), Past financial history (H).
Based on the regularity in paying the monthly installments, label each customer
as ‘Low-risk’ or ‘High-risk’.
‘risk’ can take two values: ‘High’ and ‘Low’.
Customers with ‘Low-risk’ are the ones who pay the monthly installments
regularly and have a high probability of paying back the loan amount fully.
Collect the information on the above factors for each loan sanctioned by
the bank in the past, along with the ‘risk-score’ of each loan.


A TYPICAL EXPERIENCE (E)

S.no.  Loan Amount (L)  Monthly Income (I)  Monthly Savings (S)  Age (A)   Collaterals (C)  Financial History (H)  risk-score (r)
       in Lakhs         in Lakhs            in Lakhs             in years  in Lakhs
1      10               0.2                 0.02                 35        3                good                   Low
2      23               0.5                 0.1                  39        10               good                   Low
3      70               0.7                 0.01                 45        9                bad                    High
4      50               1.5                 0.5                  40        9                good                   Low
5      35               0.9                 0.3                  29        10               bad                    High
...    ...              ...                 ...                  ...       ...              ...                    ...

Factors: L, I, S, A, C, H are the input variables. r is the output variable.

Input variables may be continuous (can take any value in a range) or discrete (can take only finitely many values).
Input variables need not be numerals alone.
View the output variable as a ‘class’ (or a group). All the instances whose output is ‘Low’ are considered as the class of inputs which have ‘Low-risk’. Similarly, we have the class of ‘High-risk’.
In our E, a loan with the information (35, 0.9, 0.3, 29, 10, bad) has the output ‘High’. We interpret that a loan with such information belongs to the class ‘High-risk’.


‘EXPERIENCE’ WITH ALL THE INPUT VALUES AS NUMERALS

We code H as 1 if H is ‘good’. Code H as 0 if H is ‘bad’.


S.no.  Loan Amount (L)  Monthly Income (I)  Monthly Savings (S)  Age (A)   Collaterals (C)  Financial History (H)  risk-score (r)
       in Lakhs         in Lakhs            in Lakhs             in years  in Lakhs
1      10               0.2                 0.02                 35        3                1                      Low
2      23               0.5                 0.1                  39        10               1                      Low
3      70               0.7                 0.01                 45        9                0                      High
4      50               1.5                 0.5                  40        9                1                      Low
5      35               0.9                 0.3                  29        10               0                      High
...    ...              ...                 ...                  ...       ...              ...                    ...
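
The coding of H described above can be sketched in a few lines of Python. This is only an illustration on the five visible rows of E; the names `raw_E`, `H_CODE` and `encode_row` are hypothetical and not part of the slides.

```python
# Five visible rows of E, with H still stored as text.
# Column order is assumed to be (L, I, S, A, C, H, r).
raw_E = [
    (10, 0.2, 0.02, 35, 3, "good", "Low"),
    (23, 0.5, 0.1, 39, 10, "good", "Low"),
    (70, 0.7, 0.01, 45, 9, "bad", "High"),
    (50, 1.5, 0.5, 40, 9, "good", "Low"),
    (35, 0.9, 0.3, 29, 10, "bad", "High"),
]

# Coding rule from the slide: 'good' -> 1, 'bad' -> 0.
H_CODE = {"good": 1, "bad": 0}


def encode_row(row):
    """Return (x, r) where x is the all-numeric input and r is the class."""
    *features, h, r = row
    return tuple(features) + (H_CODE[h],), r


numeric_E = [encode_row(row) for row in raw_E]

for x, r in numeric_E:
    print(x, r)
```

Each item of `numeric_E` is a pair: the numeric input (L, I, S, A, C, H) and its class r, which matches the table with H replaced by 1 or 0.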


WHAT IS THE TASK?

TASK:
Given the information on the input attributes from a loan application, compute
the class (High-risk or Low-risk) to which the loan application belongs.


HOW TO ACCOMPLISH THE TASK (T)?

Procedure:
Propose an equation (called a hypothesis) that involves all the input
variables.
Let $\mathrm{Score}(x) = w_1 L + w_2 I + w_3 S + w_4 A + w_5 C + w_6 H + w_0$, where
$x = (L, I, S, A, C, H)$.
Score(x) is the credit score for the input x.


LEARNING PROCESS CALLED CLASSIFICATION

Considering all the values (both input and output) of the training data, we
compute the values of the unknowns (weights) $w_0, w_1, w_2, w_3, w_4, w_5, w_6$,
thereby computing the ‘Score of an input’, called the ‘Credit Score’.
Deciding the threshold: Fix a threshold for the ‘Score’. Let it be θ.
CLASSIFIER: If Score(x) is greater than θ, then x is in the class
‘High-risk’; otherwise x is in the class ‘Low-risk’.

The process of learning the weights is called Classification.

We will discuss ‘classification’ technically later.

At present, assume that, with the training data, we have learnt
$w_0, w_1, w_2, w_3, w_4, w_5, w_6$.
Also, assume that we have decided the threshold value of the score.

PROCESS - CONTINUED

Let $w_0 = -6,\ w_1 = 1,\ w_2 = -12,\ w_3 = -12,\ w_4 = -0.1,\ w_5 = -1,\ w_6 = -1$.

$\mathrm{Score}(x) = (1)L + (-12)I + (-12)S + (-0.1)A + (-1)C + (-1)H + (-6)$, where
$x = (L, I, S, A, C, H)$.
For the input $x = (35, 1, 0.5, 29, 10, 1)$,
$\mathrm{Score}(x) = (1)(35) + (-12)(1) + (-12)(0.5) + (-0.1)(29) + (-1)(10) + (-1)(1) + (-6) = -2.9$.
Let the threshold value be 0.
If Score(x) ≤ 0, then x belongs to the class ‘Low-risk’.
If Score(x) > 0, then x belongs to the class ‘High-risk’.
So, our input (loan-application) (35, 1, 0.5, 29, 10, 1) is classified as ‘Low-risk’.
‘Low-risk’ means that the probability of paying back the loan is high.
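
The worked example can be checked with a short Python sketch. It assumes the weights exactly as listed (in particular w4 = −0.1), reads the input as the six values (35, 1, 0.5, 29, 10, 1), and uses the threshold 0. The function names `score` and `classify` are illustrative only.

```python
# Weights from the worked example; W0 is the constant term.
W0 = -6
W = (1, -12, -12, -0.1, -1, -1)  # w1..w6 for (L, I, S, A, C, H)
THETA = 0  # threshold on the score


def score(x):
    """Weighted sum of the six inputs plus the constant term."""
    return W0 + sum(wi * xi for wi, xi in zip(W, x))


def classify(x, theta=THETA):
    """'High-risk' if the score exceeds the threshold, else 'Low-risk'."""
    return "High-risk" if score(x) > theta else "Low-risk"


x = (35, 1, 0.5, 29, 10, 1)
print(round(score(x), 2), classify(x))
```

With these weights the score comes out as approximately −2.9, which is not greater than 0, so the application falls in the class ‘Low-risk’.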


Thus, we have understood the process of classifying the loan-application
based on the experience E.
T is accomplished based on E, through a learning process (model) called
‘Classification’.
The ‘CLASSIFIER’ does the task by learning the values $w_0, w_1, w_2, w_3, w_4, w_5, w_6$
and properly fixing the threshold value.

HOW FAR IS OUR ‘LEARNING MODEL’ CORRECT IN PREDICTING THE CLASS OF THE ‘LOAN APPLICATION’?


PERFORMANCE MEASURE OF THE LEARNING

To compute the value of ‘P’:
For the first item in E:
$x^1 = (L^1, I^1, S^1, A^1, C^1, H^1) = (10, 0.2, 0.02, 35, 3, \text{good})$. Calculate the class of $x^1$
through our classifier. Let it be $\hat{r}^1$.
$\hat{r}^1$ is the ‘class’ predicted by our classifier for the first item in E.
From E, the actual class of the first item, $r^1$, is ‘Low’.
Check whether $\hat{r}^1$ and $r^1$ are the same.
If $\hat{r}^1 \neq r^1$, the classifier has misclassified the first item.
If the predicted class of an item and the actual class of that item are different,
there is an error in that item.
In the same way, we check for an error in each of the N items of E.
Count the total number of errors that have occurred in the items of E.
This count is called the Performance measure of the classifier.

PERFORMANCE MEASURE - CONTINUED

PERFORMANCE MEASURE OF THE CLASSIFIER
$$\text{Performance Measure } P = \sum_{t=1}^{N} k^t,$$
where
$k^t = 1$ if $\hat{r}^t \neq r^t$
$k^t = 0$ if $\hat{r}^t = r^t$
P is the count of the number of misclassifications done by the classifier.
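
As a minimal sketch of this count, the following Python compares predicted and actual classes item by item. The `actual` list is the r column of the five visible rows of E; the `predicted` list is made up purely for illustration and does not come from any classifier in the slides.

```python
def performance_measure(predicted, actual):
    """Count the items whose predicted class differs from the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # k_t is 1 for a misclassified item and 0 otherwise; P is the sum of k_t.
    return sum(1 for r_hat, r in zip(predicted, actual) if r_hat != r)


actual = ["Low", "Low", "High", "Low", "High"]      # r column of E
predicted = ["Low", "High", "High", "Low", "Low"]   # hypothetical predictions

P = performance_measure(predicted, actual)
print(P)
```

Here items 2 and 5 are misclassified, so P is 2.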
NOTE
‘P = 0’ means that our learning model has classified every item of E correctly.

P is high ⟹ our learning model has not learnt well.

Then, we try to learn through another hypothesis.

GEOMETRICAL INTERPRETATION - IDENTIFYING THE DISCRIMINANT


Consider an E for the ‘Loan-application Classification Problem’, with two input
attributes.
S.No.  Age (in years)  Income (in lakhs)  Risk
1      35              2.5                Low
2      36              5.2                Low
3      45              2.7                Low
4      56              6                  High
5      65              5.2                Low
6      67              4.5                High
7      70              5.6                High
8      27              3.5                Low
9      75              3.8                High

PLOTTING E
Any input with d factors can be plotted as a point in a d-dimensional space.

NOTE: INSTEAD OF LEARNING w0, w1, ..., wd AND THE THRESHOLD VALUE
We can also learn to classify using the rule:
If Age > 70 years and Income > 3 Lakhs, then the loan-application is
classified as ‘High-risk’. Otherwise ‘Low-risk’.

The line which separates the two classes, is called the ‘Discriminant’.
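
A rule of this kind can be evaluated with the same misclassification count used as the performance measure. The sketch below applies the rule exactly as stated (Age > 70 and Income > 3) to the nine rows of the two-attribute E; the names `E2` and `rule` are illustrative only.

```python
# Two-attribute experience E: (age in years, income in lakhs, risk).
E2 = [
    (35, 2.5, "Low"), (36, 5.2, "Low"), (45, 2.7, "Low"),
    (56, 6.0, "High"), (65, 5.2, "Low"), (67, 4.5, "High"),
    (70, 5.6, "High"), (27, 3.5, "Low"), (75, 3.8, "High"),
]


def rule(age, income):
    """Rule from the slide: High-risk only if Age > 70 and Income > 3."""
    return "High" if age > 70 and income > 3 else "Low"


# Items where the rule disagrees with the recorded class.
errors = [(age, income, r) for age, income, r in E2 if rule(age, income) != r]
P = len(errors)

print(P)
for item in errors:
    print(item)
```

On these nine rows the rule disagrees with the recorded class for the items aged 56, 67 and 70, so P is 3. Other cut-off values could be tried in the same way and compared by their P.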

TECHNICAL DESCRIPTION OF THE PROBLEMS THAT ARE FIT FOR LEARNING A ‘CLASS’
Experience (E):
The input data should be of the form
$X = \{x^t, r^t\}_{t=1}^{N}$, where $x = (x_1, x_2, \ldots, x_d) \in \mathbb{R}^d$.

x is a d-dimensional vector. $r \in \mathbb{R}$. Here, the $x_i$'s are the input attributes. x is the
input variable. r is the output variable.
$x^t = (x_1^t, x_2^t, \ldots, x_d^t)$ is the t-th item in the input data. $r^t$ is the t-th output in the data.
Task (T):
To find the relationship that involves $x_1, x_2, \ldots, x_d$ and r, by proposing a hypothesis
with weights $w_0, w_1, \ldots, w_d$.
We have to learn the weights in such a way that the error made in the learning is
minimum.
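
The general setting can be sketched in Python as follows. The sketch assumes a linear hypothesis of the same shape as the loan example (a weighted sum of the inputs plus w0, compared against a threshold). It does not learn the weights; it only shows how a given choice of weights is scored against E. All names and the tiny data set are hypothetical.

```python
from typing import List, Sequence, Tuple

Item = Tuple[Sequence[float], str]  # (x_t, r_t): input vector and its class


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """w = (w0, w1, ..., wd) and x = (x1, ..., xd); returns w0 + sum(wi * xi)."""
    if len(w) != len(x) + 1:
        raise ValueError("w must have one more entry than x")
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def classify(w: Sequence[float], x: Sequence[float], theta: float) -> str:
    """Class 'High' if the score exceeds theta, otherwise 'Low'."""
    return "High" if score(w, x) > theta else "Low"


def count_errors(w: Sequence[float], theta: float, X: List[Item]) -> int:
    """Number of items of X that the classifier puts in the wrong class."""
    return sum(1 for x, r in X if classify(w, x, theta) != r)


# Made-up data with d = 2, used only to exercise the functions.
X = [((1.0, 2.0), "Low"), ((3.0, 1.0), "High")]
print(count_errors((-2.5, 1.0, 0.0), 0.0, X))
print(count_errors((0.0, 0.0, 0.0), 0.0, X))
```

Learning then amounts to searching for the weights (and threshold) for which `count_errors` is as small as possible.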

Performance Measure:
Total number of misclassifications made by the classifier. Average of
the square of the errors (difference between the actual output and the predicted
output) made in each instance of E.


EXPERIENCE (E), IN A GENERAL SENSE

S.no.  x_1      x_2      x_3      ...  r
1      x_1^1    x_2^1    x_3^1    ...  r^1
2      x_1^2    x_2^2    x_3^2    ...  r^2
3      x_1^3    x_2^3    x_3^3    ...  r^3
...    ...      ...      ...      ...  ...
N      x_1^N    x_2^N    x_3^N    ...  r^N


SOME REAL-TIME PROBLEMS WHICH CAN USE THE ‘CLASSIFICATION’ LEARNING

E-mail spam classification Problem: To classify an e-mail as spam or not.
Disease Diagnosis Problem: To classify a patient as affected by a disease or not.


SUMMARY

Thus, we have understood:

The process of learning (in a general sense) called ‘Classification’. We are yet to
learn the actual ‘learning’ (the model which will compute the weights involved in
the hypothesis).
The technical description of the problems to which the classification model can be
applied.


EXERCISE

1. Say True or False: In a classification problem with two features as the input
variables, we can plot the data points in a two-dimensional plane.
2. Propose a new problem (not discussed by us), with a clear description of E
and T, where classification-based learning is feasible.
3. What is the main difference between the two learning paradigms,
Classification and Regression?
4. The performance measure of a classifier is defined as the total number of
misclassifications. Can we define the performance measure of a classifier as
the total number of correct classifications?
