
Subject: Statistics

Paper: Multivariate Analysis


Module: Discriminant Analysis and Classification
Development Team

Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Paper co-ordinator: Dr. Sugata SenRoy, Professor, Department of Statistics, University of Calcutta

Content writer: Souvik Bandyopadhyay, Senior Lecturer, Indian Institute of Public Health, Hyderabad

Content reviewer: Dr. Kalyan Das, Professor, Department of Statistics, University of Calcutta
Discriminant Analysis

Discriminant analysis is the technique of separating distinct observations into well-defined groups or clusters.

- Its primary difference from cluster analysis is that, unlike the latter, the characteristics of the groups are known to a certain degree.
- The problem is more one of allocating the individuals to specified groups than of defining the groups themselves.
- Hence, whereas cluster analysis is exploratory in nature, with clusters formed without any prior information regarding their nature, discriminant analysis is based on the known distinctive features of the groups.
Classification

Classification is the problem of assigning new observations to one or the other of the groups or clusters.

- Thus while discriminant analysis separates the observations into specified groups, classification allocates individual observations to these groups.
- The two problems are intrinsically related, and the distinction between them is often blurred. To quote Johnson and Wichern (2002):

  "A function that separates objects may sometimes serve as an allocator, and, conversely, a rule that allocates objects may suggest a discriminatory procedure."
The distinction

- Let x = (x1, x2, ..., xm)' be the vector of the m characteristics under study.
- Let there be n individuals in the study.
- These individuals belong to one of several groups G1, G2, ..., Gr.
- The first problem is to find functions, based on the parameters of the r groups, which discriminate between the groups and hence separate the individuals into them.
- The next problem is to find a rule which classifies a new individual into one of the r groups.
Two groups

- To begin with, let us consider r = 2, i.e. we have two groups G1 and G2.
- Let f1(x) and f2(x) be the two probability density functions that characterize the groups G1 and G2 respectively.
- Also let the probability of an individual belonging to G1 be p1 and that of belonging to G2 be p2. Here p1 and p2 are known as the prior probabilities.
- Let Ω be the sample space of x.
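This setup can be made concrete with a small numerical sketch. The snippet below assumes, purely for illustration, two univariate normal groups; the means, scales, and priors are hypothetical choices, not part of the development above. Later sketches reuse these names.

```python
# A minimal sketch of the two-group setup, assuming (hypothetically)
# univariate normal densities for G1 and G2.
from scipy.stats import norm

f1 = norm(loc=0.0, scale=1.0).pdf  # density f1(x) of group G1
f2 = norm(loc=2.0, scale=1.0).pdf  # density f2(x) of group G2
p1, p2 = 0.6, 0.4                  # prior probabilities, p1 + p2 = 1
```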
Misclassification

- Subdivide Ω into two mutually exclusive and exhaustive subsets R1 and R2 = Ω − R1 such that
  - if x ∈ R1 we assign x to G1,
  - and if x ∈ R2 we assign x to G2.
- So every individual x is assigned to one and only one of the two groups.

However, the split of Ω may be such that some individuals who actually come from G2 fall in R1 and are hence classified in G1, and vice versa. These are known as misclassifications.

Thus the aim is to find a good discriminator or separator of Ω such that the probability of misclassification is minimized.
Conditional probability of misclassification

- Let P(j|k) denote the conditional probability that an individual coming from Gk is classified in Gj.
- Then the conditional probability of misclassifying an individual from G1 as coming from G2 is
  $$P(2 \mid 1) = P(\mathbf{x} \in R_2 \mid G_1) = \int_{R_2} f_1(\mathbf{x})\, d\mathbf{x}. \tag{1}$$
- Similarly, the conditional probability of misclassifying an individual from G2 as coming from G1 is
  $$P(1 \mid 2) = P(\mathbf{x} \in R_1 \mid G_2) = \int_{R_1} f_2(\mathbf{x})\, d\mathbf{x}. \tag{2}$$
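Given a concrete region, (1) and (2) can be evaluated numerically. The sketch below uses the hypothetical normal setup from earlier together with an arbitrary threshold rule R1 = {x ≤ c}, R2 = {x > c}; the cut-off c is illustrative, not yet optimal.

```python
# Numerical evaluation of P(2|1) and P(1|2) in (1)-(2) for a threshold
# rule R1 = {x <= c}, R2 = {x > c}; c is an arbitrary illustrative cut-off.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f1 = norm(loc=0.0, scale=1.0).pdf
f2 = norm(loc=2.0, scale=1.0).pdf
c = 1.0

P_2_given_1, _ = quad(f1, c, np.inf)   # P(2|1): mass of f1 over R2
P_1_given_2, _ = quad(f2, -np.inf, c)  # P(1|2): mass of f2 over R1
print(P_2_given_1, P_1_given_2)        # both ~0.159 for these choices
```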
Unconditional probability of misclassification

The unconditional probabilities of correctly classifying an individual from G1 in G1 and from G2 in G2 are respectively

$$P(\text{correctly classified in } G_1) = P(G_1)\, P(\mathbf{x} \in R_1 \mid G_1) = p_1 P(1 \mid 1)$$

$$P(\text{correctly classified in } G_2) = P(G_2)\, P(\mathbf{x} \in R_2 \mid G_2) = p_2 P(2 \mid 2),$$

while the unconditional probabilities of misclassifying an individual from G2 in G1 and from G1 in G2 are respectively

$$P(\text{misclassified as } G_1) = P(G_2)\, P(\mathbf{x} \in R_1 \mid G_2) = p_2 P(1 \mid 2) \tag{3}$$

$$P(\text{misclassified as } G_2) = P(G_1)\, P(\mathbf{x} \in R_2 \mid G_1) = p_1 P(2 \mid 1). \tag{4}$$
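Continuing the same hypothetical example, (3) and (4) simply weight the conditional probabilities by the priors; the numbers below carry over from the previous sketch.

```python
# Unconditional misclassification probabilities (3) and (4); the priors
# and conditional probabilities are the hypothetical values computed above.
p1, p2 = 0.6, 0.4
P_2_given_1 = P_1_given_2 = 0.1587

P_misclassified_as_G1 = p2 * P_1_given_2  # came from G2, placed in R1
P_misclassified_as_G2 = p1 * P_2_given_1  # came from G1, placed in R2
print(P_misclassified_as_G1, P_misclassified_as_G2)  # ~0.063 and ~0.095
```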
Classification Rule

A classification rule may now be developed by minimizing the misclassification probabilities (3) and (4).

However, the costs of misclassification are often not the same.

Example
When classifying individuals as healthy or ailing on the basis of pathological reports, the cost of misclassifying an ailing person as healthy is much greater than the cost of misclassifying a healthy person as ailing.

Hence, in deciding on the classification rule, the cost needs to be accounted for.
Cost of misclassification

- Let C(j|k) denote the cost of misclassifying an individual from the kth group into the jth group.
- Then the expected cost of misclassification is
  $$ECM = p_1 P(2 \mid 1)\, C(2 \mid 1) + p_2 P(1 \mid 2)\, C(1 \mid 2). \tag{5}$$
- The classification rule can then be obtained by minimizing the ECM.
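Under the same hypothetical setup, (5) is a single weighted sum. The costs below are invented to make one kind of error five times as expensive as the other.

```python
# Expected cost of misclassification (5) for the illustrative example;
# the costs are hypothetical, with C(1|2) penalized more heavily.
p1, p2 = 0.6, 0.4
P_2_given_1 = P_1_given_2 = 0.1587
C_2_given_1 = 1.0  # cost of sending a G1 individual to G2
C_1_given_2 = 5.0  # cost of sending a G2 individual to G1

ECM = p1 * P_2_given_1 * C_2_given_1 + p2 * P_1_given_2 * C_1_given_2
print(ECM)  # ~0.413 for these choices
```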
The Rule

Result 1
The subsets R1 and R2 that minimize the ECM are as follows:

$$R_1: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \frac{p_2}{p_1} \cdot \frac{C(1 \mid 2)}{C(2 \mid 1)} \tag{6}$$

$$\text{and} \quad R_2: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \frac{p_2}{p_1} \cdot \frac{C(1 \mid 2)}{C(2 \mid 1)} \tag{7}$$
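Result 1 translates directly into a density-ratio classifier. The rule itself is general; in the sketch below only the densities, priors, and costs are hypothetical carry-overs from the earlier snippets.

```python
# Minimum-ECM rule (6)-(7): assign x to G1 when the density ratio
# f1(x)/f2(x) is at least (p2/p1) * (C(1|2)/C(2|1)), else to G2.
from scipy.stats import norm

f1 = norm(loc=0.0, scale=1.0).pdf
f2 = norm(loc=2.0, scale=1.0).pdf
p1, p2 = 0.6, 0.4
C_2_given_1, C_1_given_2 = 1.0, 5.0

def classify(x):
    threshold = (p2 / p1) * (C_1_given_2 / C_2_given_1)
    return "G1" if f1(x) / f2(x) >= threshold else "G2"

print(classify(0.2), classify(1.8))  # G1 near the G1 mean, G2 near the G2 mean
```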
Proof

Since R1 ∪ R2 = Ω and R1 ∩ R2 = ∅, for j = 1 and 2,

$$\int_{R_1} f_j(\mathbf{x})\, d\mathbf{x} + \int_{R_2} f_j(\mathbf{x})\, d\mathbf{x} = \int_{\Omega} f_j(\mathbf{x})\, d\mathbf{x} = 1,$$

so that

$$\begin{aligned}
ECM &= p_1 C(2 \mid 1) \int_{R_2} f_1(\mathbf{x})\, d\mathbf{x} + p_2 C(1 \mid 2) \int_{R_1} f_2(\mathbf{x})\, d\mathbf{x} \\
    &= p_1 C(2 \mid 1) \Big[ 1 - \int_{R_1} f_1(\mathbf{x})\, d\mathbf{x} \Big] + p_2 C(1 \mid 2) \int_{R_1} f_2(\mathbf{x})\, d\mathbf{x} \\
    &= p_1 C(2 \mid 1) + \int_{R_1} \big[ p_2 C(1 \mid 2) f_2(\mathbf{x}) - p_1 C(2 \mid 1) f_1(\mathbf{x}) \big]\, d\mathbf{x}. \tag{8}
\end{aligned}$$
Proof (contd.)

Now the prior probabilities p1, p2 and the costs C(2|1), C(1|2) are all nonnegative. Also f1(x) and f2(x) are nonnegative for all x. Hence the ECM will be minimized if R1 consists of exactly those x for which the bracketed quantity under the integral sign in (8) is nonpositive, i.e.

$$R_1: \; p_2 C(1 \mid 2) f_2(\mathbf{x}) - p_1 C(2 \mid 1) f_1(\mathbf{x}) \le 0.$$

By similar logic, R2 should include all x such that

$$p_2 C(1 \mid 2) f_2(\mathbf{x}) - p_1 C(2 \mid 1) f_1(\mathbf{x}) > 0.$$

Of course, in the case of equality the classification could go either way, but to avoid ambiguity it is arbitrarily assigned to one of the two subspaces. The result thus follows. □
Corollaries

The classification rule is thus primarily based on

- the density ratio f1(x)/f2(x),
- the prior probability ratio p2/p1,
- and the cost ratio C(1|2)/C(2|1).

Corollary 1
If the misclassification costs are equal, i.e. C(1|2) = C(2|1),

$$R_1: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \frac{p_2}{p_1} \quad \text{and} \quad R_2: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \frac{p_2}{p_1} \tag{9}$$
Corollaries (contd.)

Corollary 2
If the prior probabilities are equal, i.e. p1 = p2,

$$R_1: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \frac{C(1 \mid 2)}{C(2 \mid 1)} \quad \text{and} \quad R_2: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \frac{C(1 \mid 2)}{C(2 \mid 1)} \tag{10}$$

Corollary 3
If both the misclassification costs and the prior probabilities are equal, i.e. C(1|2) = C(2|1) and p1 = p2,

$$R_1: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge 1 \quad \text{and} \quad R_2: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < 1 \tag{11}$$
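All three corollaries are special cases of the single threshold appearing in (6) and (7); the small sketch below (with hypothetical values) makes that explicit.

```python
# The single cut-off behind (6)-(7) and its special cases (9)-(11):
# the density ratio f1(x)/f2(x) is always compared against this value.
def threshold(p1, p2, c12, c21):
    return (p2 / p1) * (c12 / c21)

print(threshold(0.6, 0.4, 1, 1))  # Corollary 1 (equal costs): p2/p1
print(threshold(0.5, 0.5, 5, 1))  # Corollary 2 (equal priors): C(1|2)/C(2|1)
print(threshold(0.5, 0.5, 1, 1))  # Corollary 3 (both equal): 1
```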
The TPM

A criterion alternative to the ECM is the total probability of misclassification (TPM). An optimal classification is then obtained by minimizing

$$TPM = p_1 \int_{R_2} f_1(\mathbf{x})\, d\mathbf{x} + p_2 \int_{R_1} f_2(\mathbf{x})\, d\mathbf{x} \tag{12}$$

Classification Rule

$$R_1: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \frac{p_2}{p_1} \quad \text{and} \quad R_2: \; \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \frac{p_2}{p_1} \tag{13}$$

This is readily seen to be the same rule as in Corollary 1, where the two misclassification costs are assumed equal.
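Under the hypothetical normal setup, rule (13) reduces to a single cut-off where f1(x)/f2(x) = p2/p1, and the TPM in (12) follows by integrating each density over the "wrong" region. The sketch below locates the cut-off numerically rather than in closed form.

```python
# TPM (12) under rule (13) for the hypothetical normal densities; the
# boundary between R1 and R2 is located numerically with a root-finder.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

f1 = norm(loc=0.0, scale=1.0).pdf
f2 = norm(loc=2.0, scale=1.0).pdf
p1, p2 = 0.6, 0.4

c = brentq(lambda x: f1(x) / f2(x) - p2 / p1, -10.0, 10.0)  # R1/R2 boundary
TPM = p1 * quad(f1, c, np.inf)[0] + p2 * quad(f2, -np.inf, c)[0]
print(c, TPM)  # c ~1.20, TPM ~0.15 for these choices
```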
Alternative Rule

Another alternative is to allocate an observation to a group based on the largest posterior probability P(Gi|x), i.e. the probability of belonging to group i given x, i = 1, 2.

By Bayes' rule,

$$P(G_1 \mid \mathbf{x}) = \frac{p_1 f_1(\mathbf{x})}{p_1 f_1(\mathbf{x}) + p_2 f_2(\mathbf{x})}$$

$$\text{and} \quad P(G_2 \mid \mathbf{x}) = \frac{p_2 f_2(\mathbf{x})}{p_1 f_1(\mathbf{x}) + p_2 f_2(\mathbf{x})}.$$
Alternative Rule (contd.)

The classification rule is then:

Allocate to G1 if P(G1|x) ≥ P(G2|x),
and to G2 if P(G1|x) < P(G2|x).

- But since the denominators of the two posterior probabilities are the same, the rule reduces to Corollary 1.
- However, the computations of the posterior probabilities are often of interest in themselves.
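A sketch of the posterior computation under the same hypothetical setup; as noted above, the resulting allocation coincides with the density-ratio rule of Corollary 1.

```python
# Posterior probabilities by Bayes' rule and the largest-posterior rule;
# densities and priors are the same hypothetical choices as before.
from scipy.stats import norm

f1 = norm(loc=0.0, scale=1.0).pdf
f2 = norm(loc=2.0, scale=1.0).pdf
p1, p2 = 0.6, 0.4

def posteriors(x):
    denom = p1 * f1(x) + p2 * f2(x)  # common denominator in Bayes' rule
    return p1 * f1(x) / denom, p2 * f2(x) / denom

post1, post2 = posteriors(1.0)
print(post1, post2)                      # the two posteriors sum to 1
print("G1" if post1 >= post2 else "G2")  # same allocation as Corollary 1
```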
Summary

- The distinction between discriminant analysis and classification is made.
- Classification rules based on misclassification probabilities are discussed.
- Rules based on the expected cost of misclassification (ECM) are described.
Thank You

