
Multivariate Data Analysis Project

MOHSIN ALI RAZA, 12918


NATIONAL UNIVERSITY OF MODERN
LANGUAGES, ISLAMABAD
Table of Contents
What is Data?
   Types of data
The Research Process
Data Analysis Process
Data Screening
Factor Analysis
Assumptions of factor analysis
Exploratory Factor Analysis
   Objectives of EFA
   Five Steps of Exploratory Factor Analysis
   KMO and Bartlett's Test
   Scree Plot Test
   Rotated Component Matrix
   Interpretation of EFA
Multiple Linear Regression
   Formula of Regression
   Assumptions of MLR
   Interpretation
Multiple Discriminant Analysis
   Linear Equation of MDA
   Assumption of MDA
   Results of MDA
   Interpretation of MDA
What is Data?
Data collection is a systematic process of gathering detailed information about a
desired objective from a selected sample under controlled settings.

Types of data:
The three major types of data are given below.
1. Time series data.

A time series is a sequence of data points recorded at successive, specific points in time.


2. Cross sectional data.

Cross-sectional data are data that are collected from participants at one point in
time. Time is not considered one of the study variables in a cross-sectional research
design. However, it is worth noting that in a cross-sectional study, all participants do
not provide data at one exact moment. Even in one session, a participant will
complete the questionnaire over some duration of time. Nonetheless, cross-sectional
data are usually collected from respondents making up the sample within a relatively
short time frame (field period). In a cross-sectional study, time is assumed to have
a random effect that produces only variance, not bias. In contrast, time series
data or longitudinal data refers to data collected by following an individual
respondent over a course of time.
3. Longitudinal data.

Longitudinal data sometimes referred to as panel data track the same sample at
different points in time. The sample can consist of individuals, households,
establishments, and so on. In contrast, repeated cross-sectional data, which also
provides long-term data, gives the same survey to different samples over time.
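The three types can be illustrated with a small Python sketch; the values and field names below are invented for illustration only:

```python
# Illustrative (hypothetical) examples of the three major data types.

# 1. Time series: one unit observed at many points in time.
time_series = {"2020": 101.5, "2021": 104.2, "2022": 109.8}  # e.g. a yearly price index

# 2. Cross-sectional: many units observed at (roughly) one point in time.
cross_section = [
    {"id": 1, "income": 42000},
    {"id": 2, "income": 55000},
    {"id": 3, "income": 38000},
]  # all collected within one field period

# 3. Longitudinal (panel): the same units tracked over time.
panel = [
    {"id": 1, "year": 2020, "income": 42000},
    {"id": 1, "year": 2021, "income": 44000},
    {"id": 2, "year": 2020, "income": 55000},
    {"id": 2, "year": 2021, "income": 56500},
]

# A panel is "balanced" when every unit appears at every time point.
ids = {row["id"] for row in panel}
years = {row["year"] for row in panel}
balanced = len(panel) == len(ids) * len(years)
print(balanced)  # True
```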

The Research Process:

Data Analysis Process:


1. Screening of data
a) Variable screening
b) Case screening
 Missing value analysis
 Unengaged responses
 Outliers
2. Validity and Reliability
3. Variable Compute
4. Feel of data
a) Graphs
b) Descriptive Analysis
 Measures of central tendency
 Measures of dispersion
5. Relationship among variables
6. Application of main technique
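The case-screening step above (missing value analysis and outlier flagging) can be sketched in Python on a hypothetical response vector; the z-score cutoff of 2 is chosen here purely for illustration, since a single extreme value inflates the standard deviation:

```python
import statistics

# Hypothetical survey responses: None marks a missing value,
# and 50 is a likely data-entry error on a 1-5 scale.
responses = [4, 5, None, 3, 4, 5, 2, None, 4, 50]

# Missing value analysis: count and drop missing cases.
missing = sum(1 for r in responses if r is None)
complete = [r for r in responses if r is not None]

# Outlier screening: flag cases with |z| > 2 (illustrative cutoff).
mean = statistics.mean(complete)
sd = statistics.stdev(complete)
outliers = [r for r in complete if abs((r - mean) / sd) > 2]

print(missing, outliers)  # 2 [50]
```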

Data Screening:

Factor Analysis:
Factor analysis is a method for identifying a structure (factors, or dimensions) that
underlies the relations among a set of observed variables. It is a technique that
transforms the correlations among a set of observed variables into a smaller number of
underlying factors, which contain all the essential information about the linear
interrelationships among the original test scores.

Assumptions of factor analysis:


 The data matrix must have a sufficient number of correlations.
 Variables must be inter-related in some way, since factor analysis seeks the
underlying common dimensions among the variables. If the variables are not
related, each variable will be its own factor.
 Rule of thumb: a substantial number of correlations should be greater than .30.
 Metric variables are assumed, although dummy variables (coded 0, 1) may be
used.
 The factors, or unobserved variables, are assumed to be independent of one
another. All variables in a factor analysis must be measured on at least an
ordinal scale; nominal data are not appropriate for factor analysis.
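The ".30 rule of thumb" can be checked directly from the correlation matrix. A sketch on simulated (hypothetical) data, where five items share one common factor:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
# Five observed variables driven by one common factor, so
# substantial inter-item correlations (> .30) are expected.
factor = rng.normal(size=n)
X = np.column_stack([factor + rng.normal(scale=0.8, size=n) for _ in range(5)])

R = np.corrcoef(X, rowvar=False)        # 5 x 5 correlation matrix
off_diag = R[np.triu_indices(5, k=1)]   # the 10 unique variable pairs
share_substantial = np.mean(np.abs(off_diag) > 0.30)

print(share_substantial)
```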

Exploratory Factor Analysis:


Exploratory Factor Analysis is concerned with how many factors are necessary to
explain the relations among a set of indicators and with estimation of factor loadings.
It is associated with theory development.
In EFA, the researcher is attempting to explore the relationships among items to
determine if the items can be grouped into a smaller number of underlying factors.
In this analysis, all items are assumed to be related to all factors.

Objectives of EFA:

 Examine the structure of, or relationships between, variables.
 Detect and assess the unidimensionality of a theoretical construct.
 Evaluate the construct validity of a scale, test, or instrument.
 Reduce the number of variables.
 Develop a parsimonious (simple) analysis and interpretation.
 Address multicollinearity (two or more variables that are correlated).
 Develop theoretical constructs.
 Support or refute proposed theories.

Five Steps of Exploratory Factor Analysis:


 Is the data suitable for factor analysis?
 How will the factors be extracted?
 What criteria will assist in determining factor extraction?
 Selection of the rotation method.
 Interpretation and labeling.

KMO and Bartlett’s Test:


KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy             .933
Bartlett's Test of Sphericity      Approx. Chi-Square       12478.574
                                   df                       903
                                   Sig.                     .000

The Kaiser–Meyer–Olkin measure verified the sampling adequacy for the analysis,
KMO = .933 (‘superb’ according to Field, 2009), and all KMO values for individual
items were > .80, which is well above the acceptable limit of .5 (Field, 2009). Bartlett’s
test of sphericity, χ² (903) = 12478.574, p < .001, indicated that correlations between
items were sufficiently large for PCA.
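Bartlett's statistic can be recomputed from first principles. A sketch on simulated (hypothetical) data, using the standard formula χ² = -[(n - 1) - (2p + 5)/6] · ln|R| with df = p(p - 1)/2:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = stats.chi2.sf(chi2, df)
    return chi2, df, p_value

rng = np.random.default_rng(1)
factor = rng.normal(size=(200, 1))
X = factor + rng.normal(scale=0.7, size=(200, 6))  # 6 correlated items

chi2, df, p_value = bartlett_sphericity(X)
print(df, p_value < 0.001)  # df = 6*5/2 = 15; correlations are clearly non-zero
```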
Total Variance Explained

                 Initial Eigenvalues           Extraction Sums of           Rotation Sums of
                                               Squared Loadings             Squared Loadings
Component    Total  % of Var.  Cum. %      Total  % of Var.  Cum. %     Total  % of Var.  Cum. %
1 14.368 33.413 33.413 14.368 33.413 33.413 5.481 12.747 12.747
2 4.540 10.559 43.972 4.540 10.559 43.972 5.201 12.096 24.843
3 3.351 7.793 51.765 3.351 7.793 51.765 5.078 11.809 36.652
4 2.640 6.140 57.906 2.640 6.140 57.906 4.731 11.002 47.655
5 2.033 4.727 62.633 2.033 4.727 62.633 4.015 9.337 56.991
6 1.871 4.350 66.983 1.871 4.350 66.983 2.776 6.457 63.448
7 1.110 2.582 69.565 1.110 2.582 69.565 2.630 6.117 69.565

An initial analysis was run to obtain eigenvalues for each component in the data.
Seven components had eigenvalues over Kaiser’s criterion of 1 and in combination
explained 69.565% of the variance.

Scree Plot Test:


The scree plot was slightly ambiguous and showed inflexions that would justify
retaining both components 5 and 7. Given the large sample size, and the
convergence of the scree plot and Kaiser’s criterion on seven components, this is the
number of components that were retained in the final analysis.
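Kaiser's criterion (retain components with eigenvalues over 1) can be illustrated on simulated data with two hypothetical underlying factors:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
# Two independent hypothetical factors, three indicators each.
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f1 + rng.normal(scale=0.6, size=n) for _ in range(3)]
    + [f2 + rng.normal(scale=0.6, size=n) for _ in range(3)]
)

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]  # largest first

# Kaiser's criterion: retain components with eigenvalue > 1.
n_retained = int(np.sum(eigenvalues > 1))
print(n_retained)  # 2, one per underlying factor
```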

Rotated Component Matrix:


Rotated Component Matrix (a)

Component

1 2 3 4 5 6 7

Useful_4 .846
Useful_5 .837
Useful_3 .831
Useful_1 .810
Useful_2 .796
Useful_6 .770
Useful_7 .712
Joy_7 .829
Joy_2 .818
Joy_4 .812
Joy_5 .801
Joy_3 .786
Joy_6 .778
Joy_1 .688
Playful_6 .831
Playful_2 .828
Playful_5 .812
Playful_3 .755
Playful_4 .707
Playful_7 .662
Playful_1 .658
DecQual_3 .812
DecQual_4 .803
DecQual_5 .777
DecQual_2 .761
DecQual_7 .668
DecQual_8 .605
DecQual_6 .540
AtypUse_3 .884
AtypUse_5 .869
AtypUse_4 .865
AtypUse_2 .848
AtypUse_1 .796
CompLatent_4 .777
CompLatent_3 .775
CompLatent_1 .771
CompLatent_5 .723
InfoAcq_1 .703
InfoAcq_2 .689
InfoAcq_3 .668
InfoAcq_5 .483

Extraction Method: Principal Component Analysis.


Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 6 iterations.

The rotated component matrix table shows the factor loadings after rotation. The items
that cluster on the same component suggest that component 1 represents usefulness
(Useful), component 2 joy (Joy), component 3 playfulness (Playful), component 4
decision quality (DecQual), component 5 atypical use (AtypUse), component 6
CompLatent, and component 7 information acquisition (InfoAcq). In this table we can
see that no item cross-loads on more than one component, so the table supports
convergent validity.

Interpretation of EFA:
An exploratory factor analysis was conducted by using principal component analysis
(PCA) on the 43 items with orthogonal rotation (varimax). The Kaiser–Meyer–Olkin
measure verified the sampling adequacy for the analysis, KMO = .933 (‘superb’
according to Field, 2009), and all KMO values for individual items were > .80, which
is well above the acceptable limit of .5 (Field, 2009). Bartlett’s test of sphericity χ²
(903) = 12478.574, p < .001, indicated that correlations between items were
sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each
component in the data. Seven components had eigenvalues over Kaiser’s criterion of
1 and in combination explained 69.565% of the variance. The scree plot was slightly
ambiguous and showed inflexions that would justify retaining both components 5 and
7. Given the large sample size, and the convergence of the scree plot and Kaiser’s
criterion on seven components, this is the number of components that were retained
in the final analysis. The rotated component matrix shows the factor loadings after
rotation. The items that cluster on the same component suggest that component 1
represents usefulness, component 2 joy, component 3 playfulness, component 4
decision quality, component 5 atypical use (AtypUse), component 6 CompLatent, and
component 7 information acquisition (InfoAcq).
Taken together, these results indicate construct validity: convergent validity can be
seen in the rotated component matrix, where each item loads cleanly on only one
component.

Multiple Linear Regression:


Multiple linear regression (MLR), also known simply as multiple regression, is a
statistical technique that uses several explanatory variables to predict the outcome of
a response variable. The goal of multiple linear regression (MLR) is to model the
linear relationship between the explanatory (independent) variables and the response
(dependent) variable.
Formula of Regression:

Yi = β0 + β1Xi1 + β2Xi2 + ... + βpXip + ϵi

where, for i = 1, ..., n observations:

Yi = dependent variable

Xi1, ..., Xip = explanatory variables

β0 = y-intercept (constant term)

β1, ..., βp = slope coefficients for each explanatory variable

ϵi = the model’s error term (also known as the residual)
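The formula above can be estimated by ordinary least squares. A minimal NumPy sketch on simulated (hypothetical) data with a known true model, y = 1.0 + 2.0·x1 - 0.5·x2 + noise:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix: intercept, x1, x2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates b0, b1, b2

residuals = y - X @ beta
r_squared = 1 - residuals.var() / y.var()
print(np.round(beta, 2), round(r_squared, 2))  # estimates near (1.0, 2.0, -0.5)
```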

Assumptions of MLR:
 There is a linear relationship between the dependent variable and the
independent variables.
 The independent variables must vary (X is not a constant).
 X should be non-stochastic and fixed in repeated samples.
 There should be no relationship among the independent variables (if such a
relationship exists, it is called multicollinearity).
 The spread of the residuals (error term) should be constant, which is called
homoscedasticity; otherwise heteroscedasticity is present.
 There should be no outliers in the residuals.

Interpretation:
Model Summary (b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .892a   .796       .792                .26862                       2.058

a. Predictors: (Constant), use, aty, play, comp, info, joy
b. Dependent Variable: dec
R represents the multiple correlation and is 0.892, which indicates a high degree of
correlation between the observed and predicted values of the dependent variable.
R Square (.796) means that 79.6% of the variation in the dependent variable, decision
quality, is explained by the independent variables (use, aty, play, comp, info, joy); the
remaining variation is unexplained. An R Square of 79.6% is very high, so the
regression equation appears to be very useful for making predictions, since the value
of R Square is close to 1.

ANOVA (a)

Model        Sum of Squares   df    Mean Square   F         Sig.
Regression   104.791          6     17.465        242.052   .000b
Residual     26.914           373   .072
Total        131.705          379

a. Dependent Variable: dec
b. Predictors: (Constant), use, aty, play, comp, info, joy

This table indicates that the regression model predicts the dependent variable
significantly well: F(6, 373) = 242.052, p < .001. Since p is less than 0.05, the
regression model, overall, statistically significantly predicts the outcome variable
(i.e., it is a good fit for the data).

Coefficients (a)

             Unstandardized        Standardized                        Collinearity
             Coefficients          Coefficients                        Statistics
Model        B        Std. Error   Beta            t        Sig.       Tolerance   VIF
(Constant)   -.058    .112                         -.515    .607
comp         -.079    .027         -.085           -2.968   .003       .667        1.500
info         1.121    .045         .820            25.055   .000       .512        1.953
aty          .059     .023         .072            2.587    .010       .708        1.412
play         -.036    .025         -.046           -1.424   .155       .525        1.905
joy          .006     .029         .006            .191     .849       .511        1.957
use          .138     .030         .144            4.564    .000       .550        1.819

a. Dependent Variable: dec
1. For comp, the unstandardized partial slope (-.079) and standardized partial
slope (-.085) are statistically significantly different from 0 (t = -2.968,
p = .003); with every one-point increase in comp (IV), decision quality (DV)
decreases by approximately .08 of a point, controlling for the other predictors.
2. For info, the unstandardized partial slope (1.121) and standardized partial
slope (.820) are statistically significantly different from 0 (t = 25.055,
p < .001); with every one-point increase in info (IV), decision quality (DV)
increases by approximately 1.12 points, controlling for the other predictors.
3. For aty, the unstandardized partial slope (.059) and standardized partial
slope (.072) are statistically significantly different from 0 (t = 2.587,
p = .010); with every one-point increase in aty (IV), decision quality (DV)
increases by approximately .06 of a point, controlling for the other predictors.
4. For play, the unstandardized partial slope (-.036) and standardized partial
slope (-.046) are not statistically significantly different from 0 (t = -1.424,
p = .155); play does not reliably predict decision quality.

5. For joy, the unstandardized partial slope (.006) and standardized partial
slope (.006) are not statistically significantly different from 0 (t = .191,
p = .849); joy does not reliably predict decision quality.

6. For use, the unstandardized partial slope (.138) and standardized partial
slope (.144) are statistically significantly different from 0 (t = 4.564,
p < .001); with every one-point increase in use (IV), decision quality (DV)
increases by approximately .14 of a point, controlling for the other predictors.

7. The variance inflation factor (VIF) should always be less than 5, and
Tolerance should be higher than 0.1; both collinearity statistics here meet
these thresholds, so there are no apparent multicollinearity problems.
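The VIF column in the table can be recomputed from its definition, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. A sketch on hypothetical data, where x3 is deliberately collinear with x1:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a predictor matrix X."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]                                 # predictor j as the outcome
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(4)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)                   # independent of x1 -> VIF near 1
x3 = x1 + rng.normal(scale=0.3, size=500)   # collinear with x1 -> inflated VIF
X = np.column_stack([x1, x2, x3])

print([round(v, 1) for v in vif(X)])
```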

Multiple Discriminant Analysis:


Discriminant Function Analysis (DA) undertakes the same task as multiple linear
regression by predicting an outcome. However, multiple linear regression is limited to
cases where the dependent variable on the Y axis is an interval variable so that the
combination of predictors will, through the regression equation, produce estimated
mean population numerical Y values for given values of weighted combinations of X
values. But many interesting variables are categorical, such as political party voting
intention, migrant/non-migrant status, making a profit or not, holding a particular
credit card, owning, renting or paying a mortgage for a house,
employed/unemployed, satisfied versus dissatisfied employees, which customers
are likely to buy a product or not buy, what distinguishes Stellar Bean clients from
Gloria Beans clients, whether a person is a credit risk or not, etc.

Linear Equation of MDA:


DA involves the determination of a linear equation like regression that will predict
which group the case belongs to. The form of the equation or function is:
D = v1X1 + v2X2 + v3X3 + ... + viXi + a
where D = the discriminant score
v = the discriminant coefficient, or weight, for that variable
X = the respondent's score on that variable
a = a constant
i = the number of predictor variables
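For two groups, the weights v and constant a of the function above can be estimated with NumPy. This is only a sketch on simulated (hypothetical) data, using the pooled within-group covariance matrix, not the author's SPSS output:

```python
import numpy as np

def fisher_discriminant(X1, X2):
    """Two-group linear discriminant: weights v and constant a for D = Xv + a."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-group covariance matrix.
    S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
    Sw = ((len(X1) - 1) * S1 + (len(X2) - 1) * S2) / (len(X1) + len(X2) - 2)
    v = np.linalg.solve(Sw, m1 - m2)  # discriminant weights
    a = -v @ (m1 + m2) / 2            # centres the cutoff at D = 0
    return v, a

rng = np.random.default_rng(5)
X1 = rng.normal(loc=[1.0, 0.5], size=(40, 2))    # hypothetical "finish" group
X2 = rng.normal(loc=[-1.0, -0.5], size=(40, 2))  # hypothetical "not finish" group

v, a = fisher_discriminant(X1, X2)
# Classify by the sign of D: positive -> group 1, negative -> group 2.
hits = np.sum(X1 @ v + a > 0) + np.sum(X2 @ v + a < 0)
print(hits / 80)  # training-sample hit ratio
```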

Assumption of MDA:
 the observations are a random sample;
 each predictor variable is normally distributed;
 each of the allocations for the dependent categories in the initial
classification are correctly classified;
 there must be at least two groups or categories, with each case belonging to
only one group so that the groups are mutually exclusive and collectively
exhaustive (all cases can be placed in a group);
 for instance, three groups taking three available levels of amounts of
housing loan.

Results of MDA:
Group Statistics

Groups: FINISH = completed PhD; NOT FINISH = did not complete PhD.
Valid N (listwise) is 25 per group (50 in total), unweighted and weighted alike.

Variable                                            FINISH               NOT FINISH           Total
                                                    Mean (Std. Dev.)     Mean (Std. Dev.)     Mean (Std. Dev.)
1=FEMALE 2=MALE                                     1.2400 (.43589)      1.4800 (.50990)      1.3600 (.48487)
OVERALL COLLEGE GPA                                 3.6296 (.17676)      3.3904 (.29072)      3.5100 (.26702)
MAJOR AREA GPA                                      3.8216 (.15135)      3.7340 (.23249)      3.7778 (.19912)
GRE SCORE ON SPECIALITY EXAM                        655.6000 (74.94887)  648.8000 (67.90434)  652.2000 (70.86319)
GRE SCORE ON QUANTATIVE                             724.0000 (46.09772)  646.8000 (55.88083)  685.4000 (63.95821)
GRE SCORE ON VERBAL                                 643.2000 (73.52551)  620.0000 (71.29750)  631.6000 (72.62877)
FIRST LETTER OF RECOMMENDATION                      7.7200 (1.10000)     6.1600 (1.06771)     6.9400 (1.33110)
SECOND LETTER OF RECOMMENDATION                     7.6400 (1.11355)     6.3600 (1.18603)     7.0000 (1.30931)
THIRD LETTER OF RECOMMENDATION                      7.9600 (.88882)      6.1600 (1.06771)     7.0600 (1.33110)
STUDENTS MOTIVATION                                 8.3600 (.81035)      7.2800 (.79162)      7.8200 (.96235)
STUDENTS EMOTIONAL STABILITY                        6.4000 (1.82574)     6.3600 (1.55134)     6.3800 (1.67685)
FINAICIAL/PERSONAL RESOURCES TO COMPLETE            5.9200 (1.60520)     5.6400 (1.80000)     5.7800 (1.69381)
MARITAL STATUS, 1=MARRIED 2=SINGLE                  1.6400 (.48990)      1.5600 (.50662)      1.6000 (.49487)
AGE IN YEARS AT ENTRY                               29.9600 (5.49606)    25.1200 (3.17962)    27.5400 (5.07177)
ABILITY TO INTERACT EASILY                          7.0000 (1.29099)     6.1600 (1.34412)     6.5800 (1.37158)
RATING OF STUDENT HOSTILITY                         2.1200 (.83267)      3.0800 (1.03763)     2.6000 (1.04978)
MEAN RATING OF SELECTORS IMPRESSION OF APPLICANT    7.2800 (1.13725)     6.8800 (1.26886)     7.0800 (1.20949)

Here 1 was assigned to female students and 2 to male students.

According to the results, the mean of the gender variable for students who finished
their degree is 1.24, which means that female students outnumber male students in
this group.
On the other hand, the mean for students who did not finish their degree is 1.48,
which means that the numbers of male and female students who did not finish are
almost equal.

Tests of Equality of Group Means

Variable                                            Wilks' Lambda   F        df1   df2   Sig.
1=FEMALE 2=MALE                                     .938            3.200    1     48    .080
OVERALL COLLEGE GPA                                 .795            12.356   1     48    .001
MAJOR AREA GPA                                      .951            2.493    1     48    .121
GRE SCORE ON SPECIALITY EXAM                        .998            .113     1     48    .738
GRE SCORE ON QUANTATIVE                             .628            28.393   1     48    .000
GRE SCORE ON VERBAL                                 .974            1.283    1     48    .263
FIRST LETTER OF RECOMMENDATION                      .650            25.889   1     48    .000
SECOND LETTER OF RECOMMENDATION                     .756            15.476   1     48    .000
THIRD LETTER OF RECOMMENDATION                      .534            41.969   1     48    .000
STUDENTS MOTIVATION                                 .679            22.722   1     48    .000
STUDENTS EMOTIONAL STABILITY                        1.000           .007     1     48    .934
FINAICIAL/PERSONAL RESOURCES TO COMPLETE            .993            .337     1     48    .564
MARITAL STATUS, 1=MARRIED 2=SINGLE                  .993            .322     1     48    .573
AGE IN YEARS AT ENTRY                               .768            14.526   1     48    .000
ABILITY TO INTERACT EASILY                          .904            5.079    1     48    .029
RATING OF STUDENT HOSTILITY                         .787            13.017   1     48    .001
MEAN RATING OF SELECTORS IMPRESSION OF APPLICANT    .972            1.378    1     48    .246

This table tests, variable by variable, whether the group means differ. Predictors
whose means do not differ significantly between groups are poor discriminators and
are excluded from the discriminant function; only the significant predictors are
retained as discriminating variables for the further tests.
A predictor is significant when p < 0.05 and insignificant when p > 0.05.
One of the important assumptions of MDA is equality of the group
variance–covariance matrices: the log determinants table should show approximately
equal values, and Box's M should be insignificant, meaning the null hypothesis of
equal covariance matrices is retained.

Log Determinants

1=COMPLETED PHD, 2=DID NOT COMPLETE PHD    Rank    Log Determinant
FINISH                                     9       4.833
NOT FINISH                                 9       6.176
Pooled within-groups                       9       6.849

The ranks and natural logarithms of determinants printed are those of the group
covariance matrices.

The log determinants of the FINISH and NOT FINISH groups are approximately equal.

Test Results

Box's M             64.518
F    Approx.        1.144
     df1            45
     df2            7569.059
     Sig.           .235

Tests the null hypothesis of equal population covariance matrices.

This table shows an insignificant result: because the significance value (.235) is
greater than 0.05, the null hypothesis of equal covariance matrices is retained.

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.760 (a)    100.0           100.0          .799

a. First 1 canonical discriminant functions were used in the analysis.

A larger eigenvalue indicates a stronger discriminant function; here the eigenvalue
(1.760) is well above 1. The canonical correlation is .799; squaring this value gives
.638, meaning that about 63.8% of the variation in the discriminant scores is
explained by group membership.

Canonical Discriminant Function Coefficients

                                    Function 1
OVERALL COLLEGE GPA                 .546
FIRST LETTER OF RECOMMENDATION      .260
SECOND LETTER OF RECOMMENDATION     .165
THIRD LETTER OF RECOMMENDATION      .486
STUDENTS MOTIVATION                 .259
AGE IN YEARS AT ENTRY               .071
ABILITY TO INTERACT EASILY          .059
RATING OF STUDENT HOSTILITY         -.197
GRE SCORE ON QUANTATIVE             .001
(Constant)                          -13.186

Unstandardized coefficients

The structure model correlates the discriminant scores with the discriminating
variables; the higher the correlation, the more a variable contributes to the function.
From the unstandardized coefficients we can write the discriminant equation:

D = -13.186 + .546 OCG + .260 FLR + .165 SLR + .486 TLR + .259 SM + .071 AYE
+ .059 AIE - .197 RSH + .001 GSQ

Functions at Group Centroids

1=COMPLETED PHD, 2=DID NOT COMPLETE PHD    Function 1
FINISH                                      1.300
NOT FINISH                                 -1.300

Unstandardized canonical discriminant functions evaluated at group means

This table gives the average discriminant score for each group: a case whose D
score is close to 1.300 is predicted to have completed the degree, while a case
whose D score is close to -1.300 is predicted not to have completed it.

Classification Results (a,c)

1=COMPLETED PHD,                           Predicted Group Membership
2=DID NOT COMPLETE PHD                     FINISH    NOT FINISH    Total

Original              Count   FINISH          22          3          25
                              NOT FINISH       2         23          25
                      %       FINISH        88.0       12.0       100.0
                              NOT FINISH     8.0       92.0       100.0
Cross-validated (b)   Count   FINISH          21          4          25
                              NOT FINISH       2         23          25
                      %       FINISH        84.0       16.0       100.0
                              NOT FINISH     8.0       92.0       100.0

a. 90.0% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified
by the functions derived from all cases other than that case.
c. 88.0% of cross-validated grouped cases correctly classified.

In this table we can see that 90.0% of the original grouped cases and 88.0% of the
cross-validated cases are correctly classified.
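The hit ratio in footnote a can be recomputed directly from the counts in the classification table; a small sketch using the original (non-cross-validated) counts:

```python
# Counts from the original classification table: (actual, predicted) -> n.
confusion = {
    ("FINISH", "FINISH"): 22, ("FINISH", "NOT FINISH"): 3,
    ("NOT FINISH", "FINISH"): 2, ("NOT FINISH", "NOT FINISH"): 23,
}
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
total = sum(confusion.values())
hit_ratio = correct / total
print(hit_ratio)  # 0.9, matching the reported 90.0%
```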

Interpretation of MDA:
A discriminant analysis was conducted to predict whether a candidate would
complete or not complete their PhD after getting a scholarship. Predictor variables
were GRE quantitative score, first letter of recommendation, second letter of
recommendation, third letter of recommendation, student motivation, age in
years at entry, rating of student hostility, and overall college GPA. Significant
mean differences were observed for these predictors on the DV. The log
determinants were quite similar, and Box's M indicated that the assumption of
equality of covariance matrices was met. The discriminant function revealed a
significant association between groups and the predictors, accounting for 63.8%
of between-group variability. The cross-validated classification showed that
overall 88% of cases were correctly classified.
