
Discriminant Analysis

A quick revision of techniques and related
Methodology | Purpose | Independent variable | Dependent variable
T-tests | Group difference between two groups | Categorical (nominal) | Metric (interval or ratio)
ANOVA | Group difference between more than two groups | Categorical (nominal) | Metric (interval or ratio)
Regression | Variation in a variable explained by a set of other variables | Metric (except dummy variables) | Metric
Discriminant analysis | Variation in a categorical variable because of a set of other variables | Metric (interval or ratio) | Categorical/binary (nominal)
Discriminant analysis
◦ Predicts group membership: classifies individuals/objects into one of the alternative groups on the basis of a set of predictor variables
◦ The dependent variable is categorical; the independent/predictor variables are on an interval or ratio scale
◦ E.g.
◦ Visited/Did not visit resort (dependent variable with two categories)
◦ Income, attitude towards travel, importance attached to vacation, etc. (independent variables)
◦ Types:
◦ Two-group discriminant analysis: when the dependent variable has two groups (categories)
◦ Multiple discriminant analysis: when there are more than two groups
The Discriminant Analysis Model
Mathematical form of the discriminant function:
D = b0 + b1X1 + b2X2 + ... + bkXk
D = Discriminant score
bs = Coefficients of independent variables (or discriminant coefficient)
Xs = Predictor or independent variables (continuous variable)

Principle for estimating bs – Maximize the ratio of between-group sum of squares to within-group sum of squares of the discriminant scores
Discriminant function – unstandardized (with constant), and standardized (without constant)
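The estimation principle above can be illustrated with a small sketch. It computes discriminant scores as a weighted sum of the predictors and then the between-group to within-group sum-of-squares ratio of those scores. The data and the two candidate coefficient sets are made up for illustration, and the function names are hypothetical; real software searches for the coefficients that maximize this ratio rather than comparing two guesses.

```python
def discriminant_scores(rows, coefs, constant=0.0):
    """Return D = constant + sum(b_i * X_i) for each row of predictors."""
    return [constant + sum(b * x for b, x in zip(coefs, row)) for row in rows]


def between_within_ratio(scores, groups):
    """Between-group sum of squares divided by within-group sum of squares."""
    grand_mean = sum(scores) / len(scores)
    between = 0.0
    within = 0.0
    for g in set(groups):
        members = [s for s, label in zip(scores, groups) if label == g]
        group_mean = sum(members) / len(members)
        between += len(members) * (group_mean - grand_mean) ** 2
        within += sum((s - group_mean) ** 2 for s in members)
    return between / within


# Made-up data: two predictors per case, group labels 1 and 2 (assumption).
rows = [(5.0, 1.0), (6.0, 2.0), (7.0, 1.0), (2.0, 2.0), (3.0, 1.0), (1.0, 2.0)]
groups = [1, 1, 1, 2, 2, 2]

# Candidate A puts all weight on the first predictor, candidate B on the second.
ratio_a = between_within_ratio(discriminant_scores(rows, (1.0, 0.0)), groups)
ratio_b = between_within_ratio(discriminant_scores(rows, (0.0, 1.0)), groups)
print(ratio_a, ratio_b)
```

In this toy data the first predictor separates the groups far better, so candidate A gives the larger ratio. The constant shifts every score by the same amount and therefore does not change the ratio.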
Estimation and validation sample
Cross validation:
◦ Estimation (Analysis) sample – a sample used to estimate the model
◦ Validation (hold-out) sample – a sample used to test the model with the coefficients estimated from the estimation sample

Double cross-validation
◦ Cross validation
◦ Then exchange the roles of the estimation and validation samples and conduct the validation again
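A minimal sketch of the role exchange in double cross-validation, assuming the cases are simply cut into two halves in their stored order (in practice they would usually be shuffled or stratified by group first). The function name is hypothetical.

```python
def double_cross_validation_splits(cases):
    """Return two (estimation, validation) pairs with the roles exchanged."""
    half = len(cases) // 2
    first, second = cases[:half], cases[half:]
    return [(first, second), (second, first)]


# Case identifiers are placeholders for illustration.
case_ids = list(range(1, 7))
splits = double_cross_validation_splits(case_ids)
for estimation, validation in splits:
    print("estimate on", estimation, "validate on", validation)
```

Each pair would be passed to the estimation and classification steps in turn, giving two validation hit ratios instead of one.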
Key Terms and Steps Used in Discriminant Analysis
◦ Assessing basics - descriptives, correlations (to check for multicollinearity), significance of F-ratios (income, vacation, and hsize are individually significant)
◦ Eigenvalue - Ratio of between-group variance to within-group variance of a function (the higher the better)
◦ Canonical correlation – Simple correlation coefficient between the discriminant score and the corresponding group membership (visited/not visited)
◦ The square of the canonical correlation is the proportion of variance in the dependent variable that is explained by the independent variables, i.e. the model (64% in this case)

◦ Wilks' lambda - H0: Means of all discriminant functions in all groups are equal
◦ For each predictor, the ratio of within-group sum of squares to total sum of squares (the lower the better)
◦ Takes a value between 0 and 1; a smaller value indicates that the group means seem to be different, and a larger value indicates that the group means may not be different
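The three statistics above are linked when there is a single discriminant function, as in the two-group case. A short sketch, using the standard identities canonical correlation² = eigenvalue / (1 + eigenvalue) and Wilks' lambda = 1 / (1 + eigenvalue) for the function as a whole; only the 64% figure comes from the slides, the function names are hypothetical.

```python
import math


def eigenvalue_from_r_squared(r_squared):
    """Eigenvalue implied by a squared canonical correlation."""
    return r_squared / (1.0 - r_squared)


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by an eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda(eigenvalue):
    """Wilks' lambda of a single discriminant function."""
    return 1.0 / (1.0 + eigenvalue)


eigenvalue = eigenvalue_from_r_squared(0.64)   # 64% explained variance
print(round(eigenvalue, 4))                    # about 1.7778
print(round(canonical_correlation(eigenvalue), 4))
print(round(wilks_lambda(eigenvalue), 4))
```

With 64% of the variance explained, the unexplained share (Wilks' lambda for the function) is 36%, which is why a lower lambda goes together with a higher eigenvalue.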
Key Terms and Steps Used in Discriminant Analysis
Unstandardized coefficients -> Canonical discriminant function coefficient (in table)
◦ Is interpreted the same way as regression coefficient, gives discriminant score
Standardized coefficients -> Standardized discriminant function coefficients
◦ Indicate the relative contribution of the variables in discriminating between the two groups
◦ There is no constant
Structure matrix -> Correlation between the discriminant score and each of the predictor variables
◦ Arranged in descending order; represents the variance that each predictor shares with the discriminant function
Group centroid -> Mean discriminant score of a group (visited/not visited)
Developing characteristic profile -> based on the mean value of significant variables
Key Terms and Steps Used in Discriminant Analysis
Classification of cases using discriminant function
◦ Cut-off score for classification-
◦ When sample sizes are equal: the average of the two group centroids, i.e. (-1.219 + 1.219)/2 = 0, is taken as the cut-off criterion
◦ When unequal sample size
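The cut-off rule can be sketched as follows. The equal-size case uses the centroids quoted above. For unequal sizes the slides give no formula, so the weighted version here is an assumption: a commonly cited rule that weights each centroid by the size of the other group. The group labels and which group sits on the positive side are also illustrative.

```python
def equal_size_cutoff(centroid_a, centroid_b):
    """Cut-off score when both groups have the same sample size."""
    return (centroid_a + centroid_b) / 2.0


def weighted_cutoff(centroid_a, centroid_b, n_a, n_b):
    """Assumed cut-off for unequal sizes: each centroid weighted by the
    size of the other group (not stated in the slides)."""
    return (n_a * centroid_b + n_b * centroid_a) / (n_a + n_b)


def classify(score, cutoff, high_group, low_group):
    """Assign a case to the group on its side of the cut-off."""
    return high_group if score > cutoff else low_group


cutoff = equal_size_cutoff(-1.219, 1.219)
print(cutoff)                                   # 0.0
# Hypothetical labelling: positive scores belong to "visited".
print(classify(0.8, cutoff, "visited", "not visited"))
print(weighted_cutoff(-1.5, 1.0, 10, 20))       # made-up unequal groups
```

With unequal groups the weighted cut-off moves away from the midpoint towards the centroid of the smaller group, which enlarges the region assigned to the larger group.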
Assessing classification accuracy
Classification Matrix:
◦ Hit ratio
◦ Hit ratio = Number of correct predictions / Total number of cases

◦ Cross validation - Leave-one-out classification

◦ One observation is left out and the discriminant function is estimated from the remaining observations
◦ The excluded case is then predicted by this model to belong to one of the groups
◦ The same process is repeated for every observation, and the hit ratio is calculated from these predictions

◦ Out of sample performance (Validation sample)
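The hit ratio and the leave-one-out loop can be sketched as below. To stay short, the sketch does not re-estimate a full discriminant function at each step; it swaps in a nearest-group-mean rule on a single score as a stand-in classifier. The scores and group labels are made up, and all function names are hypothetical.

```python
def hit_ratio(actual, predicted):
    """Number of correct predictions divided by the total number of cases."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def nearest_mean_predict(train_scores, train_groups, new_score):
    """Stand-in classifier: pick the group whose mean score is closest."""
    means = {}
    for g in set(train_groups):
        members = [s for s, label in zip(train_scores, train_groups) if label == g]
        means[g] = sum(members) / len(members)
    return min(means, key=lambda g: abs(new_score - means[g]))


def leave_one_out_hit_ratio(scores, groups):
    """Hold out each case in turn, refit on the rest, and classify it."""
    predicted = []
    for i in range(len(scores)):
        rest_scores = scores[:i] + scores[i + 1:]
        rest_groups = groups[:i] + groups[i + 1:]
        predicted.append(nearest_mean_predict(rest_scores, rest_groups, scores[i]))
    return hit_ratio(groups, predicted)


# Made-up scores for eight cases, four per group (assumption).
scores = [2.0, 1.5, 1.8, -0.2, -1.6, -2.1, -1.4, 0.3]
groups = [1, 1, 1, 1, 2, 2, 2, 2]
loo = leave_one_out_hit_ratio(scores, groups)
print(loo)
```

Because every case is classified by a model that never saw it, the leave-one-out hit ratio is usually a more conservative estimate than the hit ratio on the estimation sample itself.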

Assessing classification accuracy
% correctly classified vs. % by chance
◦ % of chance classification = 1/(no. of groups); e.g. for two groups, classification by chance would be 50%
◦ The recommended classification accuracy is more than 25% higher than that obtained by chance
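A small sketch of the comparison with chance. It assumes equal group sizes, so that chance accuracy is 1/(number of groups), and it reads "25% more than chance" multiplicatively (1.25 times the chance rate). That reading is an assumption; the slides do not spell out the arithmetic.

```python
def chance_accuracy(n_groups):
    """Accuracy expected by chance with equally sized groups."""
    return 1.0 / n_groups


def recommended_minimum(n_groups, margin=0.25):
    """Chance accuracy raised by the margin (assumed multiplicative)."""
    return (1.0 + margin) * chance_accuracy(n_groups)


def beats_chance(observed_hit_ratio, n_groups, margin=0.25):
    """True when the observed hit ratio exceeds the recommended minimum."""
    return observed_hit_ratio > recommended_minimum(n_groups, margin)


print(chance_accuracy(2))        # 0.5
print(recommended_minimum(2))    # 0.625
print(beats_chance(0.75, 2))     # True
```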
In-Class Exercise
Household resort visit
▪ Visit-
▪ Visited a resort in the last two years (1)
▪ Did not visit a resort in the last two years (2)
▪ Income- Annual family income in 000 dollars
▪ Travel- Attitude towards travel (9 point scale)
▪ Vacation – Importance attached to family vacation (9 point scale)
▪ Hsize- Household size
▪ Age- Age of the head of the household

▪ The data are divided into an estimation sample (size 30) and a validation sample (size 12), with equal representation of Visit/Did not visit resort
Take Home Exercise
Do a two-group discriminant analysis with the loyalty data set (shared in the drive)
a. Analyze and interpret the results
b. Remove the insignificant independent variable/s from the model, rerun the analysis and interpret the results
c. Keep the first 20 observations as the estimation sample and the remaining 10 as the validation sample. Run the analysis and interpret the results

◦ Description of the loyalty data set

◦ Brand Loyalty- Loyal (1), not loyal (2)
◦ Independent variables measured on a scale of 1 to 7, 7 being most favourable
◦ Brand- attitude towards the brand
◦ Product- attitude towards product category
◦ Shopping- attitude towards shopping
Thank You
