
____________________________________________________________________________________________________

Subject PSYCHOLOGY

Paper No and Title Paper No 2: Quantitative Methods

Module No and Title Module No.30: Discriminant analysis

Module Tag PSY_P2_M30

TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. Discriminant analysis
3.1 Discriminant functions variate
3.2 Predictive power of discriminant functions
4. Types and assumptions of discriminant analysis
4.1 Types
4.2 Assumptions
5. Using SPSS for discriminant analysis
6. Summary

PSYCHOLOGY Paper No 2: Quantitative Methods


Module No 30: Discriminant analysis
____________________________________________________________________________________________________

1. Learning Outcomes
After studying this module, you should be able to:

• Know about the concept of discriminant analysis
• Learn the types and assumptions related to the concept
• Know about the SPSS analysis for discriminant analysis

2. Introduction

Discriminant analysis

In the previous few modules we discussed the concepts related to regression and its application
to other multivariate techniques. One has understood that a dependent variable (criterion) can be
predicted if one has knowledge of the independent variables (regressors). In those modules both
the regressors and the criterion were continuous data. Sometimes, however, a research study
requires a qualitative (categorical) dependent variable to be predicted. In such circumstances two
techniques are useful: logistic regression and discriminant functional analysis. The current
module focuses on the latter technique, i.e. discriminant functional analysis.

A researcher wishes to ascertain whether the rise of hypertension among middle-aged men
can be ascribed to smoking and drinking behaviors. A representative sample of
middle-aged men was taken, and data were collected on hypertension (present/absent)
and on smoking and drinking behaviors. Is it possible to predict hypertension from the
data on smoking and drinking?

3. Discriminant functional analysis (DFA)

3.1 The Concept of discriminant functions variate

The concept of discriminant functional analysis has been briefly introduced to the readers in the
modules related to multivariate analysis of variance. The MANOVA aims to compare the groups
on the levels of performance, whereas, the DFA focuses on prediction of category membership.
The DFA may be taken up as a follow up to MANOVA.

When several variables significantly predict the criterion, one solution is to identify the
underlying dimensions. These linear combinations of the variables are known as variates (latent
variables or factors). The variables being combined are the dependent variables of the MANOVA,
which serve as the predictors in DFA. The variates are then used to separate persons into
categories or groups (hypertension present or hypertension absent), and are therefore referred to
as discriminant functions or discriminant function variates. Several discriminant functions can be
extracted from a set of such variables. (One would remember that in multiple regression only one
regression function is possible, and it includes all the predictor variables.)


So, a discriminant function is a linear combination of variables whose weights are chosen to
maximize the differences between the groups. Equations 1, 2 and 3 below show how variates can be
arrived at.

Y = b_0 + b_1 X_1 + b_2 X_2                      eqn. (1)

V_1 = b_0 + b_1 DV_1 + b_2 DV_2                  eqn. (2)

V_1 = b_0 + b_1 (smoking) + b_2 (drinking)       eqn. (3)

Equation (1) is a multiple regression equation with two predictor variables. Equations (2) and
(3) are the corresponding discriminant functions. The b values in the equations are weights and
show the relative contribution of each variable to the variate. The b values are derived from the
eigenvectors of the matrix (as this is a complex mathematical procedure involving matrix algebra,
the discussion is not possible here; readers may refer to multivariate statistics books for
further understanding). As the hypertension example has only two groups (present and absent),
only one variate can be derived.
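
Equation (3) can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: the smoking and drinking figures are invented, and the weights are obtained with Fisher's two-group solution, b = W^-1 (mean_present - mean_absent), where W is the pooled within-group sums-of-squares-and-cross-products matrix. The weights are left unscaled, so they would differ in scale from the coefficients a statistical package prints.

```python
# Hypothetical data: (cigarettes per day, drinks per week) for each man.
absent = [(2, 1), (0, 3), (5, 2), (3, 0), (1, 4), (4, 2)]            # hypertension absent
present = [(15, 8), (20, 6), (12, 10), (18, 9), (25, 7), (16, 12)]   # hypertension present


def mean_vector(rows):
    """Mean of each of the two predictors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mean):
    """2 x 2 sums-of-squares-and-cross-products matrix around the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


m0, m1 = mean_vector(absent), mean_vector(present)
s0, s1 = scatter(absent, m0), scatter(present, m1)

# Pooled within-group matrix W and its inverse (2 x 2 closed form).
w = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
w_inv = [[w[1][1] / det, -w[0][1] / det],
         [-w[1][0] / det, w[0][0] / det]]

# Weights b1 (smoking) and b2 (drinking): W^-1 times the difference in means.
diff = [m1[0] - m0[0], m1[1] - m0[1]]
b1 = w_inv[0][0] * diff[0] + w_inv[0][1] * diff[1]
b2 = w_inv[1][0] * diff[0] + w_inv[1][1] * diff[1]

# b0 is chosen so that a score of 0 lies midway between the two group centroids.
mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
b0 = -(b1 * mid[0] + b2 * mid[1])


def score(smoking, drinking):
    """Discriminant score V1 = b0 + b1*smoking + b2*drinking, as in eqn. (3)."""
    return b0 + b1 * smoking + b2 * drinking


for label, rows in (("absent", absent), ("present", present)):
    print(label, [round(score(s, d), 2) for s, d in rows])
```

With this choice of b0, a positive score points to the hypertension-present group and a negative score to the hypertension-absent group.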

To generate the discriminant functions from the data, a mathematical procedure known as the
maximization principle is followed. Under this principle the ratio of systematic to unsystematic
variance is at its maximum for the first discriminant function; each subsequently derived function
has a smaller value of this ratio. The ratio of systematic to unsystematic variance is analogous
to the F ratio of ANOVA, so saying that the first function has the largest ratio also means that
it separates the groups more strongly than any other linear combination of the variables.
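
The maximization principle can be written compactly in matrix form. The notation here is an assumption introduced for illustration and does not appear in the module: B and W are the between-groups and within-groups sums-of-squares-and-cross-products matrices, and b is the vector of weights of a variate V.

```latex
\lambda \;=\; \frac{SS_{\text{between}}(V)}{SS_{\text{within}}(V)}
        \;=\; \frac{\mathbf{b}^{\top}\mathbf{B}\,\mathbf{b}}{\mathbf{b}^{\top}\mathbf{W}\,\mathbf{b}},
\qquad
\mathbf{W}^{-1}\mathbf{B}\,\mathbf{b}_i \;=\; \lambda_i\,\mathbf{b}_i,
\qquad
\lambda_1 \ge \lambda_2 \ge \dots
```

The weight vectors that maximize the ratio are the eigenvectors of W^-1 B, and the ratio attained by each is the corresponding eigenvalue, which is why the first function carries the largest ratio.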

Of course, real research data may not be so simple: there may be more than two variables and more
than two categories. In such cases more than one discriminant function variate can be derived. The
number of variates that can be derived is the smaller of the number of variables (p) and one less
than the number of categories (k - 1).
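
The usual counting rule is that the number of discriminant functions is the smaller of p and k - 1. A minimal sketch of that rule follows; the function name is hypothetical.

```python
def number_of_functions(p: int, k: int) -> int:
    """Maximum number of discriminant functions for p variables and k groups."""
    if p < 1 or k < 2:
        raise ValueError("need at least one variable and at least two groups")
    return min(p, k - 1)


# Hypertension example: 2 predictors (smoking, drinking), 2 groups.
print(number_of_functions(p=2, k=2))
# School counselor example in section 5: 10 predictors, 3 college subjects.
print(number_of_functions(p=10, k=3))
```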

3.2 Predictive power of discriminant functions

The aim of applying discriminant functional analysis to data is not to identify the significant
differences among the means. DFA emphasizes the prediction of category membership by
ascertaining the independent variables that significantly contribute to the discriminant function
variates. The two statistics that are useful in measuring the power of DF in their ability to
discriminate among the groups are:

3.2.1 Canonical correlation

Canonical correlation is a statistical technique applicable to a research situation in which one
group of variables can be viewed as consequent upon or causally influenced by the variables in
a second group, the former being viewed as dependent variables (DV) and the latter as
independent variables (IV). A pair of linear functions, one for the DVs and the other for the IVs,
is constructed so that the correlation between them is maximized. The two functions make a
canonical variate pair, and the correlation between them is the canonical correlation.

In DFA, the canonical correlation is the correlation between the scores on a discriminant
function and the coding variables that define group membership.

3.2.2 Eigenvalue

Eigenvectors supply the coefficients of the discriminant functions. Each discriminant function
extracted from the data has an eigenvalue, represented by λ (lambda) and also called the
characteristic root.

Λ = Π_{i=1}^{d} 1 / (1 + λ_i)                    eqn. (4)

The readers have met Wilks' lambda in the module on MANOVA; it can be expressed in terms of
eigenvalues as shown in equation (4). Here Π (capital pi) denotes the product of d terms, one for
each discriminant function, Λ is Wilks' lambda, and λ_i is the eigenvalue of the i-th function.

The eigenvalue of a discriminant function is the ratio of between-groups to within-groups
variation on that function; dividing it by the sum of all the eigenvalues gives the proportion of
the explained variance accounted for by that particular function. The eigenvalues are important
because they consolidate the variance in the matrix, while the corresponding eigenvectors provide
the linear combination of variables that contributes to it. The larger the eigenvalue, and hence
the smaller the value of Wilks' lambda (Λ), the greater the power of the discriminant functions
to discriminate among the groups.
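
The relations among these statistics can be sketched numerically. The snippet below assumes two made-up eigenvalues and uses three standard identities: Wilks' lambda as the product of 1 / (1 + λ_i) from equation (4), the canonical correlation of a function as the square root of λ_i / (1 + λ_i), and the percentage of variance of a function as λ_i divided by the sum of the eigenvalues. The function name is hypothetical.

```python
import math


def summarize_functions(eigenvalues):
    """Return Wilks' lambda and a per-function summary for a list of eigenvalues."""
    total = sum(eigenvalues)
    wilks = math.prod(1.0 / (1.0 + lam) for lam in eigenvalues)
    rows = []
    for lam in eigenvalues:
        rows.append({
            "eigenvalue": lam,
            "percent_of_variance": 100.0 * lam / total,
            "canonical_correlation": math.sqrt(lam / (1.0 + lam)),
        })
    return wilks, rows


# Illustrative eigenvalues only; they are not taken from any data set.
wilks_lambda, table = summarize_functions([1.0, 0.25])
print("Wilks' lambda:", round(wilks_lambda, 3))
for row in table:
    print({key: round(value, 3) for key, value in row.items()})
```

In this sketch the first function has the larger eigenvalue, the larger canonical correlation and the larger share of variance, while the overall Wilks' lambda shrinks as the eigenvalues grow.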

4. Types and assumptions of discriminant analysis

4.1 Types of discriminant analysis

The discriminant analysis can be classified as:

• Direct DFA
• Hierarchical DFA
• Stepwise DFA

These three types of DFA parallel the corresponding methods of multiple regression. Direct DFA
is the same as simultaneous multiple regression: all the independent variables are entered in the
equation at the same time, and the contribution of each to the discriminant function variate is
assessed in the presence of all the others. If the researcher wishes to adopt DFA as a follow up
to MANCOVA or MANOVA, the direct method is the most suitable. The hierarchical method of DFA
uses theory or collateral evidence to enter the variables in the equation: the researcher sets a
schedule based on theoretical inputs, and the variables enter the equation in that order. Stepwise
DFA is the most commonly used technique. Here the researcher uses statistical criteria to
determine the order of entry of the variables in the equation. The criterion usually chosen to
enter or remove variables is Wilks' lambda (Λ): a decreasing value of lambda indicates that the
discriminant function is becoming more effective in separating the groups.

4.2 Assumptions

The DFA carries a number of restrictive assumptions, just like MANOVA and
MANCOVA. Some of the assumptions are:

• Multinormality: to apply DFA it is necessary to assume that the data are
multivariate normal. This implies that, for any given set of values on p - 1 of
the variables, the remaining variable is normally distributed. The technique
used in DFA is mathematically robust and can withstand slight skewness in the
data. However, to offset such deviations from normality the researcher must
ensure that the sample sizes are sufficiently large. In smaller samples
deviations from multivariate normality are quite possible, and one of the
serious consequences is an inflated risk of Type I error.
• Homogeneity of variance-covariance matrices: the variance-covariance matrices
of the predictors must be equal across all the groups. This assumption gains all
the more importance when the sample sizes are unequal. The presence of outliers
in the data poses a serious threat to the trustworthiness of the results, so it
is best to eliminate extreme values (if possible) from the data set. Box's M
test should be carried out and its results considered seriously. The analyst
must keep in mind, however, that Box's test is a very sensitive one; provided
that the sample sizes are equal and large, a significant result can be ignored.
• Multicollinearity: multicollinearity refers to a high inter-correlation among
the independent variables, and it should be avoided. In particular, no variable
should be an exact linear function of the other variables, a condition known as
singularity.
• Scale of measurement: it is mostly assumed that the independent variables are
quantitative in nature, but occasionally one may include qualitative independent
variables (such as sex or marital status) too, as in multiple regression.
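
The multicollinearity assumption in the list above can be screened before running the analysis. The sketch below is one simple way to do so, not a prescribed procedure: it computes Pearson correlations between every pair of predictors and flags pairs at or above a cut-off. The marks, the variable names and the 0.90 threshold are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear_pairs(predictors, threshold=0.90):
    """List predictor pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical school marks for six students.
predictors = {
    "maths": [55, 62, 70, 48, 81, 66],
    "physics": [57, 64, 72, 50, 83, 68],   # maths + 2: an exact linear function
    "art": [70, 52, 65, 80, 58, 61],
}
flagged = flag_collinear_pairs(predictors)
print(flagged)
```

Here physics is constructed as an exact linear function of maths, so the pair is flagged; in practice one of the two variables would be dropped or the two combined before running the DFA.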

5. Using SPSS for DFA


A school counselor is interested in knowing whether the scholastic and non-scholastic subjects
studied at school level can help in predicting the choice of subject at the college level. The
subject category at the college level (psychology, architecture or engineering) is the dependent
variable. The independent variables are the scholastic and non-scholastic subjects (ten in all)
taken by the students in school, measured by their school leaving examination results. The
dependent variable is categorical and the independent variables are continuous in nature; the
data from 118 students are taken.


5.1 Preparing the data view: enter the obtained data in the relevant rows and
columns.
5.2 Exploring the data: once the data set has been prepared, the next step is to identify
whether there are any violations of the assumptions of DFA in the data set. This can be
achieved by checking for extreme scores and outliers using the Explore
command.

• Choose Analyze > Descriptive Statistics > Explore; under Display select Plots
• Drag the names of the predictor (independent) variables to the Dependent List
• Drag the grouping variable (subject choice in college) to the Factor List > OK

Study the resulting boxplots. If they are satisfactory continue, otherwise the
recommended corrective measures should be adopted. The readers are advised to refer to
the multivariate statistical texts for a deeper understanding of the corrective measures;
however, some of them have already been discussed in the introductory module on
multivariate methods of this paper also.
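
The logic behind reading a boxplot for outliers can be sketched in a few lines. The example assumes the common rule that a score lying more than 1.5 times the interquartile range beyond the quartiles is an outlier; the marks are invented, and the quartile method of Python's statistics module may differ slightly from the hinges a package such as SPSS uses to draw its boxes.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical school-leaving marks in one subject for ten students.
marks = [52, 55, 58, 60, 61, 63, 65, 67, 70, 98]
print(boxplot_outliers(marks))
```

Running the same check on each predictor, separately within each college subject group, mirrors what the grouped boxplots show.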

5.3 Running the discriminant analysis: once the data have been corrected for
violations of the assumptions and the analyst finds them suitable, the
next step is to run the discriminant analysis.

• Choose Analyze > Classify > Discriminant…
• Transfer the dependent variable name (in this case subject choice in college)
to the Grouping Variable box > Define Range (in this case there are 3
categories, so type 1 in the Minimum box and 3 in the Maximum box)

On the upper right-hand side of the Discriminant Analysis dialog box one finds
buttons labeled Statistics… and Classify…; the analyst may use them to request further
useful output. Some of the options that can be chosen are descriptive statistics,
univariate ANOVAs, Box's M, etc.

• Click Classify…; the Discriminant Analysis: Classification dialog box will open
• Click Method… > stepwise

The exhibit below shows the DFA dialog box for the grouping variable study subjects
(with three levels) and several independent variables, using the stepwise method.

• Click Save… >
• Select the independent variables and transfer them to the Independents box
• OK > run the discriminant functional analysis


5.4 Output for discriminant analysis: the output of the discriminant functional
analysis is listed in the left-hand pane of the SPSS Viewer. The exhibit shown
below also shows the layout of the monitor screen showing the DFA output.

The output contains a great deal of information (group statistics, univariate ANOVAs, Wilks'
lambda and so on), and not all of it will be needed. The analyst should therefore pick the
relevant statistics with care. The percentage of variance explained by each discriminant function
(along with its level of significance), the Wilks' lambda values and the canonical correlations
are some of the important ones.


5.5 Predicting group membership: once the output has been searched for the relevant
statistics and the redundant information set aside, the analyst returns to the original
problem being investigated, i.e. whether the scholastic and non-scholastic subjects
studied at school level can help in predicting the choice of subject at the college
level. To compare the actual and the predicted subject of study, one runs the
discriminant procedure again.

• Complete the DFA dialog box as in the earlier step > click Save… > click Predicted group
membership > click Continue > OK
• The predicted group membership will appear in a new column labeled Dis_1 in Data View,
with a prediction for every case.
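
Once actual and predicted memberships sit side by side, comparing them amounts to a cross-tabulation and a hit rate. The sketch below assumes eight invented cases; the two lists stand in for the subject-choice column and the saved predicted-membership column, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical actual and predicted college subjects for eight students.
actual = ["psychology", "psychology", "architecture", "engineering",
          "engineering", "architecture", "psychology", "engineering"]
predicted = ["psychology", "architecture", "architecture", "engineering",
             "engineering", "architecture", "psychology", "psychology"]


def classification_table(actual, predicted):
    """Count how often each (actual, predicted) combination occurs."""
    return Counter(zip(actual, predicted))


def hit_rate(actual, predicted):
    """Proportion of cases whose predicted group equals the actual group."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


table = classification_table(actual, predicted)
for (a, p), count in sorted(table.items()):
    print(f"actual={a:<12} predicted={p:<12} n={count}")
print("hit rate:", hit_rate(actual, predicted))
```

The off-diagonal cells of the table (actual and predicted groups differ) show which subjects the functions tend to confuse.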

6. Summary
• A dependent variable (criterion) can be predicted if one has knowledge of the independent
variables (regressors). In ordinary regression both the regressors and the criterion are
continuous data. Sometimes a research study requires a qualitative (categorical)
dependent variable to be predicted; in such circumstances two techniques are
useful: logistic regression and discriminant functional analysis. The present module
focused on the DFA.
• The analyst is advised to use DFA when several variables significantly predict the
criterion. The solution is to identify the underlying dimensions: linear combinations of
the variables, known as variates (latent variables or factors). These variates are then
used to separate persons into categories or groups.
• To generate the discriminant functions from the data, a mathematical procedure known as
the maximization principle is followed. Under this principle the ratio of systematic to
unsystematic variance is at its maximum for the first discriminant function; each
subsequently derived function has a smaller value of this ratio.
• DFA emphasizes the prediction of category membership by ascertaining the independent
variables that significantly contribute to the discriminant function variates. The two
statistics that are useful in measuring the power of the discriminant functions to
discriminate among the groups are the canonical correlation and the eigenvalue.
• The discriminant analysis can be classified as: Direct DFA, Hierarchical DFA and
Stepwise DFA.
• The DFA carries a number of restrictive assumptions, just like MANOVA and
MANCOVA. Some of the assumptions are: multinormality, homogeneity of
variance-covariance matrices, absence of multicollinearity, and the scale of measurement used.
• The SPSS commands to apply the DFA have also been discussed in the text with the help of
an example.
