MVDA Project
Analysis Project
Types of data:
The three major types of data are given below.
1. Time series data.
2. Cross-sectional data.
Cross-sectional data are collected from participants at one point in time. Time is
not considered one of the study variables in a cross-sectional research design.
However, it is worth noting that in a cross-sectional study, not all participants
provide data at one exact moment. Even in one session, a participant will complete
the questionnaire over some duration of time. Nonetheless, cross-sectional data are
usually collected from the respondents making up the sample within a relatively
short time frame (the field period). In a cross-sectional study, time is assumed to
have a random effect that produces only variance, not bias. In contrast, time series
data or longitudinal data refer to data collected by following an individual
respondent over a course of time.
3. Longitudinal data.
Longitudinal data, sometimes referred to as panel data, track the same sample at
different points in time. The sample can consist of individuals, households,
establishments, and so on. In contrast, repeated cross-sectional data, which also
provide long-term data, come from giving the same survey to different samples over
time.
Data Screening:
Factor Analysis:
Factor analysis is a method for identifying a structure (or factors, or dimensions)
that underlies the relations among a set of observed variables. It is a technique
that transforms the correlations among a set of observed variables into a smaller
number of underlying factors, which contain all the essential information about the
linear interrelationships among the original test scores.
Objectives of EFA:
Sig. .000
The Kaiser–Meyer–Olkin measure verified the sampling adequacy for the analysis,
KMO = .933 (‘superb’ according to Field, 2009), and all KMO values for individual
items were > .80, which is well above the acceptable limit of .5 (Field, 2009).
Bartlett’s test of sphericity, χ² (903) = 12478.574, p < .001, indicated that
correlations between items were sufficiently large for PCA.
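The 903 degrees of freedom reported for Bartlett's test can be checked against the number of items. For p items the test has p(p - 1)/2 degrees of freedom, one per distinct off-diagonal correlation. The short sketch below assumes the 43 items stated in the interpretation of the EFA; the function name is only illustrative.

```python
def bartlett_df(n_items: int) -> int:
    """Degrees of freedom of Bartlett's test of sphericity for n_items variables."""
    # One degree of freedom per distinct pair of items.
    return n_items * (n_items - 1) // 2


df_for_43_items = bartlett_df(43)
print(df_for_43_items)  # 903, matching the reported chi-square df
```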
Total Variance Explained
           Initial Eigenvalues                    Extraction Sums of Squared Loadings    Rotation Sums of Squared Loadings
Component    Total  % of Variance  Cumulative %     Total  % of Variance  Cumulative %     Total  % of Variance  Cumulative %
1           14.368         33.413        33.413    14.368         33.413        33.413     5.481         12.747        12.747
2            4.540         10.559        43.972     4.540         10.559        43.972     5.201         12.096        24.843
3            3.351          7.793        51.765     3.351          7.793        51.765     5.078         11.809        36.652
4            2.640          6.140        57.906     2.640          6.140        57.906     4.731         11.002        47.655
5            2.033          4.727        62.633     2.033          4.727        62.633     4.015          9.337        56.991
6            1.871          4.350        66.983     1.871          4.350        66.983     2.776          6.457        63.448
7            1.110          2.582        69.565     1.110          2.582        69.565     2.630          6.117        69.565
An initial analysis was run to obtain eigenvalues for each component in the data.
Seven components had eigenvalues over Kaiser’s criterion of 1 and in combination
explained 69.565% of the variance.
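The percentages in the table follow directly from the eigenvalues: with standardized items the total variance equals the number of items, so each component explains eigenvalue / 43 of it. The sketch below is only a check of that arithmetic under the assumption of 43 items; it uses the seven reported eigenvalues, since the remaining ones (all at or below 1 according to the text) are not listed.

```python
# Initial eigenvalues of the seven retained components, copied from the table.
eigenvalues = [14.368, 4.540, 3.351, 2.640, 2.033, 1.871, 1.110]
n_items = 43  # assumed: total variance of standardized items equals the item count

# Kaiser's criterion: keep components whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Share of total variance explained by each retained component, in percent.
pct_variance = [100.0 * ev / n_items for ev in retained]
cumulative_pct = sum(pct_variance)

print(len(retained))
print([round(p, 3) for p in pct_variance])
print(round(cumulative_pct, 3))
```

Small differences in the last decimal place against the SPSS output are expected, because the eigenvalues in the table are themselves rounded.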
Rotated Component Matrix
Item            Component   Loading
Useful_4        1           .846
Useful_5        1           .837
Useful_3        1           .831
Useful_1        1           .810
Useful_2        1           .796
Useful_6        1           .770
Useful_7        1           .712
Joy_7           2           .829
Joy_2           2           .818
Joy_4           2           .812
Joy_5           2           .801
Joy_3           2           .786
Joy_6           2           .778
Joy_1           2           .688
Playful_6       3           .831
Playful_2       3           .828
Playful_5       3           .812
Playful_3       3           .755
Playful_4       3           .707
Playful_7       3           .662
Playful_1       3           .658
DecQual_3       4           .812
DecQual_4       4           .803
DecQual_5       4           .777
DecQual_2       4           .761
DecQual_7       4           .668
DecQual_8       4           .605
DecQual_6       4           .540
AtypUse_3       5           .884
AtypUse_5       5           .869
AtypUse_4       5           .865
AtypUse_2       5           .848
AtypUse_1       5           .796
CompLatent_4    6           .777
CompLatent_3    6           .775
CompLatent_1    6           .771
CompLatent_5    6           .723
InfoAcq_1       7           .703
InfoAcq_2       7           .689
InfoAcq_3       7           .668
InfoAcq_5       7           .483
The rotated component matrix shows the factor loadings after rotation. The items
that cluster on the same component suggest that component 1 represents Useful,
component 2 Joy, component 3 Playful, component 4 decision quality (DecQual),
component 5 AtypUse, component 6 CompLatent, and component 7 InfoAcq. No item
cross-loads on a second component, and each item loads on the component of its own
construct, so this table supports convergent validity.
Interpretation of EFA:
An exploratory factor analysis was conducted using principal component analysis
(PCA) on the 43 items with orthogonal rotation (varimax). The Kaiser–Meyer–Olkin
measure verified the sampling adequacy for the analysis, KMO = .933 (‘superb’
according to Field, 2009), and all KMO values for individual items were > .80, which
is well above the acceptable limit of .5 (Field, 2009). Bartlett’s test of
sphericity, χ² (903) = 12478.574, p < .001, indicated that correlations between
items were sufficiently large for PCA. An initial analysis was run to obtain
eigenvalues for each component in the data. Seven components had eigenvalues over
Kaiser’s criterion of 1 and in combination explained 69.565% of the variance. The
scree plot was slightly ambiguous and showed inflexions that would justify retaining
either 5 or 7 components. Given the large sample size, and the convergence of the
scree plot and Kaiser’s criterion on seven components, this is the number of
components that was retained in the final analysis. The rotated component matrix
shows the factor loadings after rotation. The items that cluster on the same
component suggest that component 1 represents Useful, component 2 Joy, component 3
Playful, component 4 decision quality (DecQual), component 5 AtypUse, component 6
CompLatent, and component 7 InfoAcq.
All of these results indicate that construct validity was obtained. Within construct
validity, convergent validity can be seen in the rotated component matrix.
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_i$
$y_i$ = dependent variable
$x_{i1}, \dots, x_{ip}$ = explanatory variables
Assumptions of MLA:
There is a linear relationship between the dependent variable and the
independent variables.
X should vary (it must not be constant).
X should be non-stochastic and fixed in repeated samples.
There should be no relationship among the independent variables (if such a
relationship exists, it is called multicollinearity).
The spread of the residuals (error term) should be constant, which is called
homoscedasticity; the opposite is heteroscedasticity.
There should be no outliers in the residuals.
Interpretation:
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .892(a)   .796       .792                .26862                       2.058
a. Predictors: (Constant), use, aty, play, comp, info, joy
b. Dependent Variable: dec
R is the multiple correlation between the observed and predicted values of the
dependent variable. It is 0.892, which indicates a high degree of correlation; a
larger value of R is better.
R Square (.796) measures the predictive power of the model: 79.6% of the variation
in the dependent variable (decision quality) is explained by the independent
variables (use, aty, play, comp, info, joy), and the remaining variation is not
explained. An R Square of 79.6% is very high. The regression equation appears to be
very useful for making predictions, since the value of R Square is close to 1.
ANOVA(a)
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   104.791          6     17.465        242.052   .000(b)
Residual     26.914           373   .072
Total        131.705          379
a. Dependent Variable: dec
b. Predictors: (Constant), use, aty, play, comp, info, joy
This table indicates that the regression model predicts the dependent variable
significantly well. The regression sum of squares is 104.791, and F(6, 373) =
242.052 with p < .001, which shows the statistical significance of the regression
model that was run. Because p is less than 0.05, the regression model overall
statistically significantly predicts the outcome variable (i.e., it is a good fit
for the data).
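The Model Summary and ANOVA figures are tied together by a few standard formulas, so they can be recomputed from the sums of squares alone. The following sketch uses only values from the ANOVA table; the variable names are illustrative, and small last-digit differences from the SPSS output come from rounding in the printed table.

```python
import math

# Values copied from the ANOVA table.
ss_regression = 104.791
ss_residual = 26.914
df_regression = 6    # number of predictors
df_residual = 373

ss_total = ss_regression + ss_residual
n_obs = df_regression + df_residual + 1  # total df (379) plus 1

# R Square: share of total variation explained by the model.
r_square = ss_regression / ss_total

# Adjusted R Square penalizes for the number of predictors.
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual

# F statistic: ratio of the two mean squares.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Standard error of the estimate: square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(round(r_square, 3), round(adj_r_square, 3))
print(round(f_stat, 2), round(std_error_estimate, 5))
```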
Coefficients(a)
Model 1      B       Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   -.058   .112                 -.515    .607
comp         -.079   .027         -.085   -2.968   .003   .667        1.500
info         1.121   .045         .820    25.055   .000   .512        1.953
aty          .059    .023         .072    2.587    .010   .708        1.412
play         -.036   .025         -.046   -1.424   .155   .525        1.905
joy          .006    .029         .006    .191     .849   .511        1.957
use          .138    .030         .144    4.564    .000   .550        1.819
B and Std. Error are the unstandardized coefficients, Beta is the standardized
coefficient, and Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: dec
1. For comp, the unstandardized partial slope (-0.079) and standardized partial
slope (-0.085) are statistically significantly different from 0 (t = -2.968,
p = .003); with every one-point increase in comp (IDV), decision quality (DV)
decreases by approximately 0.079 of a point when controlling for the other
predictors.
2. For info, the unstandardized partial slope (1.121) and standardized partial
slope (0.820) are statistically significantly different from 0 (t = 25.055,
p < .001); with every one-point increase in info (IDV), decision quality (DV)
increases by approximately 1.121 points when controlling for the other
predictors.
3. For aty, the unstandardized partial slope (0.059) and standardized partial
slope (0.072) are statistically significantly different from 0 (t = 2.587,
p = .010); with every one-point increase in aty (IDV), decision quality (DV)
increases by approximately 0.059 of a point when controlling for the other
predictors.
4. For play, the unstandardized partial slope (-0.036) and standardized partial
slope (-0.046) are not statistically significantly different from 0
(t = -1.424, p = .155, which is greater than 0.05), so there is no evidence
that play (IDV) affects decision quality (DV) when controlling for the other
predictors.
5. For joy, the unstandardized partial slope (0.006) and standardized partial
slope (0.006) are not statistically significantly different from 0 (t = 0.191,
p = .849, which is greater than 0.05), so there is no evidence that joy (IDV)
affects decision quality (DV) when controlling for the other predictors.
6. For use, the unstandardized partial slope (0.138) and standardized partial
slope (0.144) are statistically significantly different from 0 (t = 4.564,
p < .001); with every one-point increase in use (IDV), decision quality (DV)
increases by approximately 0.138 of a point when controlling for the other
predictors.
7. The variance inflation factor (VIF) should be less than 5 and Tolerance should
be higher than 0.1; both are collinearity statistics. Here every VIF is below
2 and every Tolerance is above .5, so there are no apparent multicollinearity
problems.
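Tolerance and VIF carry the same information, because VIF is the reciprocal of Tolerance. As a minimal check, the sketch below recomputes VIF from the Tolerance column of the coefficients table and applies the two cut-offs named in point 7; the dictionary and variable names are only illustrative.

```python
# Tolerance and VIF values copied from the coefficients table.
tolerance = {"comp": 0.667, "info": 0.512, "aty": 0.708,
             "play": 0.525, "joy": 0.511, "use": 0.550}
reported_vif = {"comp": 1.500, "info": 1.953, "aty": 1.412,
                "play": 1.905, "joy": 1.957, "use": 1.819}

# VIF is defined as 1 / Tolerance.
computed_vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Rule of thumb used in the text: VIF below 5 and Tolerance above 0.1.
no_collinearity_flag = all(
    computed_vif[name] < 5 and tolerance[name] > 0.1 for name in tolerance
)

for name in tolerance:
    print(name, round(computed_vif[name], 3), reported_vif[name])
print(no_collinearity_flag)
```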
Assumptions of MDA:
The observations are a random sample.
Each predictor variable is normally distributed.
Each of the allocations for the dependent categories in the initial
classification is correctly classified.
There must be at least two groups or categories, with each case belonging to
only one group, so that the groups are mutually exclusive and collectively
exhaustive (all cases can be placed in a group); for instance, three groups
taking the three available levels of housing loan amount.
Results of MDA:
Group Statistics
1=COMPLETED PHD, 2=DID NOT COMPLETE PHD Mean Std. Deviation Valid N (listwise)
Unweighted Weighted
This table shows the descriptive statistics for each group, which are easy to read.
Insignificant attributes are not treated as discriminating variables and are
excluded from the test; only significant attributes are included in the further
tests. Here, a significant variable is a discriminating variable.
Significant when p < 0.05
Insignificant when p > 0.05
One of the important assumptions of MDA is the equality of the variance-covariance
matrices across groups. The assumption holds when the Log Determinants table shows
approximately equal values and the Box's M test is insignificant, which means that
the null hypothesis is retained.
Log Determinants
Group                  Rank   Log Determinant
FINISH                 9      4.833
NOT FINISH             9      6.176
Pooled within-groups   9      6.849
The log determinants of the FINISH and NOT FINISH groups are approximately equal.
Test Results
Box's M         64.518
F    Approx.    1.144
     df1        45
     df2        7569.059
     Sig.       .235
Tests null hypothesis of equal population covariance matrices.
This table shows an insignificant result: the significance value (.235) is greater
than 0.05, so the null hypothesis of equal covariance matrices is retained.
Eigenvalues
As a rule of thumb, the eigenvalue should be greater than 1. Here the eigenvalues
table shows that the value is greater than 1. We can also see that the canonical
correlation is .799. The square of this value is 63.8%, which means that the
discriminant function explains 63.8% of the variation in the grouping variable; the
remaining variation is due to other factors that are not part of this model.
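The 63.8% figure can be reproduced by squaring the canonical correlation. The sketch below also derives the eigenvalue implied by that correlation through the standard relation eigenvalue = r² / (1 - r²); this implied value is an illustration only, not the eigenvalue printed in the SPSS table, which is not reproduced in this report.

```python
canonical_corr = 0.799  # canonical correlation reported in the text

# Proportion of variation in group membership explained by the function.
explained_share = canonical_corr ** 2

# Eigenvalue implied by the canonical correlation (assumed relation:
# eigenvalue = r^2 / (1 - r^2) for a single discriminant function).
implied_eigenvalue = explained_share / (1 - explained_share)

print(round(100 * explained_share, 1))
print(implied_eigenvalue > 1)
```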
Function
The structure matrix gives the correlations between the discriminant scores and each
discriminating variable; a high correlation means that the variable explains more of
the function. From this model we can write the discriminant equation.
D=13.186+.546OCG+.260FLO+.165SLR+.486TLR+.259SM+.071AYE+.059AIE-
197RSH+.001GSQ
FINISH 1.300
NOT FINISH -1.300
This table gives the average discriminant score of each group. If the value of D for
a student is close to 1.300, the student most likely completed the degree; if the
value of D is close to -1.300, the student most likely did not complete the degree.
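The rule described above amounts to assigning each case to the group whose average discriminant score is closest to the case's D value. A minimal sketch follows, assuming equal prior probabilities for the two groups (both have 25 cases), which puts the cut-off at D = 0. The function name and the example scores are hypothetical.

```python
# Group average discriminant scores from the table.
CENTROIDS = {"FINISH": 1.300, "NOT FINISH": -1.300}


def classify(d_score: float) -> str:
    """Return the group whose centroid is nearest to the discriminant score."""
    return min(CENTROIDS, key=lambda group: abs(d_score - CENTROIDS[group]))


# Hypothetical discriminant scores, for illustration only.
for d in (0.9, -2.0, 0.2):
    print(d, classify(d))
```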
Classification Results(a,c)
                                   Predicted FINISH   Predicted NOT FINISH   Total
Original            Count  FINISH        22                  3                 25
                           NOT FINISH     2                 23                 25
                    %      FINISH        88.0               12.0              100.0
                           NOT FINISH     8.0               92.0              100.0
Cross-validated(b)  Count  FINISH        21                  4                 25
                           NOT FINISH     2                 23                 25
                    %      FINISH        84.0               16.0              100.0
                           NOT FINISH     8.0               92.0              100.0
In this table we can see the cross-validated results: 88.0% of the cross-validated
grouped cases are correctly classified.
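The overall hit rate is the number of correctly classified cases divided by the total number of cases. The sketch below recomputes it from the counts in the classification table, with rows as actual groups and columns as predicted groups; the function name is illustrative.

```python
# Counts from the classification table: rows = actual group, columns = predicted
# group, both in the order FINISH, NOT FINISH.
original_counts = [[22, 3], [2, 23]]
cross_validated_counts = [[21, 4], [2, 23]]


def hit_rate(counts: list) -> float:
    """Percentage of cases on the diagonal (correctly classified)."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return 100.0 * correct / total


print(hit_rate(original_counts))         # 90.0
print(hit_rate(cross_validated_counts))  # 88.0
```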
Interpretation of MDA:
‘A discriminant analysis was conducted to predict whether a candidate would
complete or not complete their PhD after getting a scholarship. Predictor variables
were GRE quantitative score, first letter of recommendation, second letter of
recommendation, third letter of recommendation, student's motivation, age in
years at entry, rating of student hostility, and overall college GPA. Significant
mean differences were observed for all the predictors on the DV. The log
determinants were quite similar, and Box’s M indicated that the assumption of
equality of covariance matrices was met. The discriminant function revealed a
significant association between groups and all predictors, accounting for 63.8% of
between-group variability, although closer analysis of the structure matrix
revealed only four significant predictors, namely self-concept score (.706) and
anxiety score (–.527) with age and absence poor predictors. The cross-validated
classification showed that overall 88% were correctly classified’.