3. Multiple Correlation & Regression
Module III
MULTIPLE CORRELATION AND REGRESSION
In the previous module we had one criterion variable (Y) and one predictor variable (X) and wished to predict Y on the basis of X. In this module, we will consider the case in which we still have only one criterion (X1) but have multiple predictors (Xi, i = 2, ..., p), and want to predict X1 on the basis of simultaneous knowledge of all the predictors. The problem of multiple regression is that of finding a regression equation to predict the criterion variable on the basis of these predictors.
Multiple regression analysis is used when there is one quantitative dependent variable and two or more quantitative independent variables. The values of the independent variables are used to predict the values of the dependent variable of interest.
For example, academic performance (X1) depends on IQ (X2) and time spent on study (X3). Hence the multiple regression equation of X1 on X2 and X3 can be used to predict performance, given X2 and X3. To give another example, we might wish to predict success in graduate school (X1) on the basis of undergraduate grade point average (X2), Graduate Record Exam scores (X3) and number of courses taken in the major discipline (X4). Similarly, we might wish to predict the time it takes to go from one point in a city to another (X1) on the basis of number of traffic lights (X2), speed limit (X3) and traffic density (X4). These examples are both analyzed in about the same way, although in the first we presumably care about making a prediction, while in the second we may care more about understanding the relationship between the variables.

Multiple regression is a natural extension of simple linear regression. Its aim is to derive an equation from which we can make predictions about one particular variable of interest, usually called the criterion variable; there can, however, be several predictor variables in the equation. If you have only two variables available as predictors, multiple regression does not get too complicated, and you can actually do all of the calculations quickly with a simple electronic calculator. In psychological research, multiple regression is more often used to provide evidence that the role of one variable is more …
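In the usual notation, the regression equation of X1 on X2 and X3 is written in deviation form as

$$X_1 - \bar{X}_1 = b_{12.3}(X_2 - \bar{X}_2) + b_{13.2}(X_3 - \bar{X}_3)$$

Here b12.3 is the average change in X1 when X2 changes by 1 unit and X3 is kept constant; b13.2 is interpreted in the same way, with the roles of X2 and X3 exchanged. The partial regression coefficients are

$$b_{12.3} = \frac{\sigma_1}{\sigma_2}\cdot\frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}, \qquad b_{13.2} = \frac{\sigma_1}{\sigma_3}\cdot\frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}$$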
The above equations are actually population regression equations. When we come to the sample, as in the case of the bivariate regression model, we have to accommodate an error term, because the relationship is not exact, and we obtain sample estimates of all the statistics involved in the relationship. Thus the sample regression model to predict X1 from X2 and X3 can be written as

$$X_1 - \bar{X}_1 = b_{12.3}(X_2 - \bar{X}_2) + b_{13.2}(X_3 - \bar{X}_3) + e$$

Here the b's are estimates of the respective population values and e denotes the error.
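As a rough sketch (plain Python, no libraries), the two partial regression coefficient formulas can be evaluated directly from the standard deviations and zero-order correlations. The function name `partial_regression_coefficients` is our own, and the numbers in the usage line are assumed to be the summary statistics of the achievement-test example in this module (σ1 = 13.65, σ2 = 3.06, σ3 = 2.02, r12 = 0.41, r13 = 0.5, r23 = 0.16).

```python
def partial_regression_coefficients(s1, s2, s3, r12, r13, r23):
    """Return (b12.3, b13.2) for the regression of X1 on X2 and X3.

    s1, s2, s3: standard deviations of X1, X2, X3
    r12, r13, r23: zero-order (simple) correlations
    """
    denom = 1 - r23 ** 2
    if denom == 0:
        # X2 and X3 perfectly correlated: their separate effects cannot be separated
        raise ValueError("r23 must lie strictly between -1 and 1")
    b12_3 = (s1 / s2) * (r12 - r13 * r23) / denom
    b13_2 = (s1 / s3) * (r13 - r12 * r23) / denom
    return b12_3, b13_2


b12_3, b13_2 = partial_regression_coefficients(13.65, 3.06, 2.02, 0.41, 0.5, 0.16)
print(round(b12_3, 3), round(b13_2, 3))  # 1.511 3.013
```

Both coefficients share the denominator 1 − r23²; when the two predictors are highly correlated it approaches zero and the estimates become unstable.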
Before running a multiple regression analysis, its assumptions should be verified rather than taken for granted, because the properties of the estimators depend on them.
3.3 Assumptions of Multiple Regression Analysis
1. The dependent variable should be measured on a continuous scale (i.e., it is either an interval or a ratio variable). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
2. There must be a linear relationship between the outcome variable and the independent variables. Scatter plots can show whether there is a linear or curvilinear relationship.
3. Multivariate normality: multiple regression assumes that the errors between observed and predicted values (the residuals) are normally distributed.
Illustration
Answer:
        X1    X2    X3    X1²     X2²    X3²    X1X2   X1X3    X2X3
        54    10    35    2916    100    1225   540    1890    350
        55    12    38    3025    144    1444   660    2090    456
        57    16    31    3249    256    961    912    1767    496
        59    13    34    3481    169    1156   767    2006    442
        60    12    34    3600    144    1156   720    2040    408
        62    14    40    3844    196    1600   868    2480    560
Sum     347   77    212   20115   1009   7542   4467   12273   2712
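$$\bar{X}_1 = \frac{347}{6} = 57.8333, \quad \bar{X}_2 = \frac{77}{6} = 12.8333, \quad \bar{X}_3 = \frac{212}{6} = 35.3333$$

$$\sigma_1 = \sqrt{\frac{20115}{6} - \left(\frac{347}{6}\right)^2} = 2.7938, \quad \sigma_2 = 1.8634, \quad \sigma_3 = 2.9250$$

$$r_{12} = 0.4429, \quad r_{13} = 0.2515, \quad r_{23} = -0.2650$$

$$b_{31.2} = \frac{\sigma_3}{\sigma_1}\cdot\frac{r_{13} - r_{12}r_{23}}{1 - r_{12}^2} = 0.4805, \qquad b_{32.1} = \frac{\sigma_3}{\sigma_2}\cdot\frac{r_{23} - r_{12}r_{13}}{1 - r_{12}^2} = -0.7350$$

The regression equation of X3 on X1 and X2 is

$$X_3 - \bar{X}_3 = b_{31.2}(X_1 - \bar{X}_1) + b_{32.1}(X_2 - \bar{X}_2)$$

$$X_3 = 0.4805\,X_1 - 0.7350\,X_2 + 16.98$$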
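The same equation can be obtained directly from the raw scores by least squares. Below is a minimal plain-Python sketch, assuming the six rows of the table above, where the final equation predicts the third column (X3) from the first two. `fit_two_predictors` is a hypothetical helper name; it solves the two normal equations in deviation form by Cramer's rule.

```python
# Raw scores from the illustration (n = 6)
x1 = [54, 55, 57, 59, 60, 62]
x2 = [10, 12, 16, 13, 12, 14]
x3 = [35, 38, 31, 34, 34, 40]


def fit_two_predictors(y, u, v):
    """Least-squares fit of y = a + b_u*u + b_v*v.

    Uses the two normal equations written in deviation (centred) form
    and solves them by Cramer's rule.
    """
    n = len(y)
    my, mu, mv = sum(y) / n, sum(u) / n, sum(v) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suy = sum((ui - mu) * (yi - my) for ui, yi in zip(u, y))
    svy = sum((vi - mv) * (yi - my) for vi, yi in zip(v, y))
    det = suu * svv - suv ** 2  # zero only if u and v are perfectly collinear
    b_u = (suy * svv - svy * suv) / det
    b_v = (svy * suu - suy * suv) / det
    a = my - b_u * mu - b_v * mv
    return a, b_u, b_v


# Regress X3 on X1 and X2
a, b1, b2 = fit_two_predictors(x3, x1, x2)
print(f"X3 = {a:.2f} + {b1:.4f} X1 + ({b2:.4f}) X2")  # X3 = 16.98 + 0.4805 X1 + (-0.7350) X2
```

Working with centred sums avoids the rounding drift that creeps in when squared means such as 57.8333² are subtracted by hand.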
Example
Given the following data for a group of students:
X1 = scores on achievement test
X2 = scores on intelligence test
X3 = hours of study
M1 = 101.71, M2 = 10.06, M3 = 3.35
σ1 = 13.65, σ2 = 3.06, σ3 = 2.02
r12 = 0.41, r13 = 0.5, r23 = 0.16
a) Find the regression equation of X1 on X2 and X3.
b) If a student scores 12 in the intelligence test (X2) and 4 in hours of study (X3), what will be his estimated score in X1?
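Answer
a) The regression equation is

$$X_1 - \bar{X}_1 = b_{12.3}(X_2 - \bar{X}_2) + b_{13.2}(X_3 - \bar{X}_3)$$

where

$$b_{12.3} = \frac{\sigma_1}{\sigma_2}\cdot\frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} = \frac{13.65}{3.06}\cdot\frac{0.41 - 0.5 \times 0.16}{1 - (0.16)^2} = 1.511$$

$$b_{13.2} = \frac{\sigma_1}{\sigma_3}\cdot\frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2} = \frac{13.65}{2.02}\cdot\frac{0.5 - 0.41 \times 0.16}{1 - (0.16)^2} = 3.013$$

Hence,

$$X_1 - 101.71 = 1.511\,(X_2 - 10.06) + 3.013\,(X_3 - 3.35)$$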
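A short sketch converting the deviation form into the usual a + bX form and using it for prediction. It takes b12.3 ≈ 1.511 and b13.2 ≈ 3.013 (the values the stated σ's and r's give) and, as our own reading of the partly illegible part (b), assumes the student reports 4 hours of study (X3 = 4) together with an intelligence score of 12. `predict_x1` is a hypothetical helper name.

```python
# Means and partial regression coefficients for the achievement-test example
m1, m2, m3 = 101.71, 10.06, 3.35
b12_3, b13_2 = 1.511, 3.013


def predict_x1(x2, x3):
    """Estimated achievement score from the deviation-form regression equation."""
    return m1 + b12_3 * (x2 - m2) + b13_2 * (x3 - m3)


# Constant term when the equation is rewritten as X1 = a + b12.3*X2 + b13.2*X3
a = m1 - b12_3 * m2 - b13_2 * m3
print(round(a, 2))                  # 76.42

# Assumed reading of part (b): intelligence score 12, 4 hours of study
print(round(predict_x1(12, 4), 2))  # 106.6
```

A quick sanity check: plugging in the means X2 = 10.06 and X3 = 3.35 returns exactly the mean 101.71, since the regression plane passes through the point of means.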
The first advantage of multiple regression is the ability to determine the relative influence of one or more predictor variables on the criterion value. For example, a real estate agent could find that the size of the homes and the number of bedrooms have a strong correlation to the price of a home, while the proximity to schools has no correlation at all, or even a negative correlation if it is primarily a retirement community.

The second advantage is the ability to identify outliers, or anomalies. For example, while reviewing the data related to management salaries, the human resources manager could find that the number of hours worked, the department size and its budget all had a strong correlation to salaries, while seniority did not. Alternatively, it could be that all of the listed predictor values were correlated to each of the salaries being examined, except for one …
3.5 Disadvantages of Multiple Regression
Any disadvantage of using a multiple regression model usually comes down to the data being used. Two examples of this are using incomplete data and falsely concluding that a correlation is a causation.

When reviewing the price of homes, for example, suppose the real estate agent looked at only 10 homes, seven of which were purchased by young parents. In this case, the relationship between the proximity of schools and the sale price may lead her to believe that proximity had an effect on the sale price for all homes being sold in the community. This illustrates the pitfalls of incomplete data. Had she used a larger sample, she could have found that only ten percent of the home values were related to the proximity of schools.
Entry Method
Questions
1. Given the following data for a group of students:
σ1 = 10.35, σ2 = 10.21, σ3 = 6.02
r12 = 0.67, r13 = 0.75, r23 = 0.63
Establish the multiple regression equation of X1 on X2 and X3. If a student obtains 80 in a memory sub-test (X2) and 40 in a reasoning sub-test (X3), what can be his expected score in the intelligence test (X1)?
2. Suppose we want to predict job performance (Y) from mechanical aptitude test scores (X1) and scores on a personality test that measures conscientiousness (X2). The following data are obtained (… marks a value that is not legible). Find the multiple regression equation of Y on X1 and X2.
Y     X1    X2
…     40    25
…     45    20
…     38    30
3     50    30
2     48    28
3     55    30
3     53    34
…     55    36
4     58    32
3     40    34
5     55    38
3     48    28
3     45    30
2     55    36
…     60    34
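A variable is often influenced jointly by two or more variables. For example, academic performance is jointly influenced by variables like intelligence, time spent on study, quality of teachers, parental education and so on. Thus, in order to study how the dependent variable is affected by the joint (combined) effect of all the independent variables, we study the concept of multiple correlation; to estimate the dependent variable, given the values of the independent variables, we study the concept of the multiple regression equation.

The coefficient of multiple correlation, R1.23, measures the strength of the joint linear relationship between X1 and the pair X2, X3. In terms of the zero-order correlations,

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}}$$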
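A minimal sketch of this formula in plain Python. The function name `multiple_r` is ours; the two usage lines plug in the correlations from the two worked examples later in this module (r12 = 0.6, r13 = −0.4, r23 = −0.5, and r12 = 0.41, r13 = 0.5, r23 = 0.16).

```python
import math


def multiple_r(r12, r13, r23):
    """Multiple correlation R1.23 of X1 with X2 and X3, from zero-order correlations."""
    denom = 1 - r23 ** 2
    if denom <= 0:
        # The formula is undefined when X2 and X3 are perfectly correlated
        raise ValueError("r23 must lie strictly between -1 and 1")
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / denom
    return math.sqrt(r_squared)


print(round(multiple_r(0.6, -0.4, -0.5), 3))  # 0.611
print(round(multiple_r(0.41, 0.5, 0.16), 3))  # 0.601
```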
Note 3.1
The multiple correlation coefficient lies between 0 and 1. A higher value indicates better predictability of the dependent variable from the independent variables, with a value of 1 indicating that the predictions are exactly correct and a value of 0 indicating that no linear combination of the independent variables is a better predictor than the fixed mean of the dependent variable.
Note 3.2
A multiple correlation coefficient yields the maximum degree of linear relationship that can be obtained between two or more independent variables and a single dependent variable. R² represents the proportion of the total variance in the dependent variable that can be accounted for by the independent variables. The independent variables are each optimally weighted such that their composite has the largest possible correlation with the dependent variable.
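Note 3.2 can be checked numerically. The plain-Python sketch below fits X1 on X2 and X3 by least squares for the 15 job-satisfaction records of the illustration that follows (rows as read from its table, which reproduce its column totals) and shows that 1 − SSE/SST equals the squared correlation between X1 and the fitted values. The helper names `mean` and `centred_cross` are our own.

```python
import math

# Job satisfaction (X1), level of responsibility (X2), years of service (X3)
x1 = [2, 2, 3, 3, 5, 5, 6, 6, 6, 7, 8, 8, 8, 9, 9]
x2 = [4, 2, 3, 6, 2, 8, 4, 5, 8, 8, 9, 6, 3, 7, 9]
x3 = [5, 7, 5, 3, 3, 6, 3, 2, 7, 3, 5, 5, 8, 8, 1]


def mean(v):
    return sum(v) / len(v)


def centred_cross(u, v):
    """Sum of products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


s11, s22, s33 = centred_cross(x1, x1), centred_cross(x2, x2), centred_cross(x3, x3)
s12, s13, s23 = centred_cross(x1, x2), centred_cross(x1, x3), centred_cross(x2, x3)

# Least-squares slopes of X1 on X2 and X3 (normal equations, Cramer's rule)
det = s22 * s33 - s23 ** 2
b2 = (s12 * s33 - s13 * s23) / det
b3 = (s13 * s22 - s12 * s23) / det
a = mean(x1) - b2 * mean(x2) - b3 * mean(x3)

fitted = [a + b2 * u + b3 * v for u, v in zip(x2, x3)]
sse = sum((y - f) ** 2 for y, f in zip(x1, fitted))
r_squared = 1 - sse / s11  # proportion of the variance of X1 explained
r_fit = centred_cross(x1, fitted) / math.sqrt(s11 * centred_cross(fitted, fitted))
print(round(r_squared, 4), round(r_fit ** 2, 4))  # 0.3211 0.3211
```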
Now, let us compute a multiple correlation coefficient for the following data set.
Illustration
A large corporation is interested in predicting a measure of job satisfaction among its employees. They have collected data on 15 employees, who each supplied information on job satisfaction, level of responsibility and years of service. The data follow, with job satisfaction as the first variable (X1), level of responsibility as X2 and years of service as X3, together with the columns needed for the calculation.
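Job satisfaction (X1), Responsibility (X2), Years of service (X3)
        X1    X2    X3    X1²   X2²   X3²   X1X2   X2X3   X1X3
        2     4     5     4     16    25    8      20     10
        2     2     7     4     4     49    4      14     14
        3     3     5     9     9     25    9      15     15
        3     6     3     9     36    9     18     18     9
        5     2     3     25    4     9     10     6      15
        5     8     6     25    64    36    40     48     30
        6     4     3     36    16    9     24     12     18
        6     5     2     36    25    4     30     10     12
        6     8     7     36    64    49    48     56     42
        7     8     3     49    64    9     56     24     21
        8     9     5     64    81    25    72     45     40
        8     6     5     64    36    25    48     30     40
        8     3     8     64    9     64    24     24     64
        9     7     8     81    49    64    63     56     72
        9     9     1     81    81    1     81     9      9
Total   87    84    71    587   558   403   535    387    411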
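$$\bar{X}_1 = \frac{87}{15} = 5.8, \quad \bar{X}_2 = \frac{84}{15} = 5.6, \quad \bar{X}_3 = \frac{71}{15} = 4.7333$$

$$\sigma_1 = \sqrt{\frac{587}{15} - (5.8)^2} = \sqrt{5.4933} = 2.3438, \quad \sigma_2 = \sqrt{\frac{558}{15} - (5.6)^2} = 2.4166, \quad \sigma_3 = \sqrt{\frac{403}{15} - \left(\frac{71}{15}\right)^2} = 2.1124$$

$$r_{12} = 0.5626, \quad r_{13} = -0.0108, \quad r_{23} = -0.1384$$

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.5626)^2 + (-0.0108)^2 - 2(0.5626)(-0.0108)(-0.1384)}{1 - (-0.1384)^2}} = \sqrt{0.3211} = 0.567$$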
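The whole calculation can be sketched from the column totals alone, using the computational formula for Pearson's r, r = (nΣXY − ΣXΣY) / √[(nΣX² − (ΣX)²)(nΣY² − (ΣY)²)]. The totals below are those of the table; the helper name `corr` is ours.

```python
import math

n = 15
# Column totals from the table
s_x1, s_x2, s_x3 = 87, 84, 71
s_x1x1, s_x2x2, s_x3x3 = 587, 558, 403
s_x1x2, s_x2x3, s_x1x3 = 535, 387, 411


def corr(sxy, sx, sy, sxx, syy):
    """Pearson r from raw sums (computational formula)."""
    num = n * sxy - sx * sy
    return num / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


r12 = corr(s_x1x2, s_x1, s_x2, s_x1x1, s_x2x2)
r13 = corr(s_x1x3, s_x1, s_x3, s_x1x1, s_x3x3)
r23 = corr(s_x2x3, s_x2, s_x3, s_x2x2, s_x3x3)

# Multiple correlation of X1 with X2 and X3
R = math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))
print(round(r12, 4), round(r13, 4), round(r23, 4), round(R, 3))  # 0.5626 -0.0108 -0.1384 0.567
```

Because r does not depend on whether σ is computed with n or n − 1, the correlations and R come out the same either way.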
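Examples
In a study, a researcher wanted to know the impact of a … The correlations obtained were r12 = 0.6, r13 = −0.4 and r23 = −0.5.

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.6)^2 + (-0.4)^2 - 2(0.6)(-0.4)(-0.5)}{1 - (-0.5)^2}} = \sqrt{\frac{0.28}{0.75}} = \sqrt{0.3733} = 0.61$$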
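A researcher was interested in studying the relationship of success in a job (X1) with the training received (X2) and interest (X3). From the data he collected, r12 = 0.41, r13 = 0.5 and r23 = 0.16. Find the multiple correlation that measures the joint effect of training and interest on success in the job.

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.41)^2 + (0.5)^2 - 2(0.41)(0.5)(0.16)}{1 - (0.16)^2}} = \sqrt{\frac{0.3525}{0.9744}} = \sqrt{0.3618} = 0.601$$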
Question
1000 candidates appeared for an entrance test. The test had some sub-tests, namely, a general intelligence test, a professional awareness test, a general knowledge test and an aptitude test. A researcher got interested in knowing the impact, or the strength of the association, of any two sub-tests on the total entrance test score (X1). Initially, he took two sub-test scores, general intelligence (X2) and professional awareness (X3), and derived the necessary correlations: r12 = 0.8, r13 = 0.7 and r23 = 0.6. Compute the multiple correlation coefficient measuring the strength of the relationship between X1 and (X2, X3).
3.8 Advantages of Multiple Correlation Analysis
It serves as a measure of the … of regression and, consequently, as …
$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$
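A minimal sketch of the first-order partial correlation r12.3 in plain Python; `partial_r` is our own name. The usage line plugs in the numbers that appear in the school example below (0.18, 0.6 and 0.7); which correlations they are is not legible there, so reading them as r12 = 0.18, r13 = 0.6 and r23 = 0.7 is our assumption.

```python
import math


def partial_r(r12, r13, r23):
    """r12.3: correlation between X1 and X2 with the effect of X3 removed."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


print(round(partial_r(0.18, 0.6, 0.7), 4))  # -0.4201
```

When X3 is uncorrelated with both X1 and X2 (r13 = r23 = 0), the partial correlation reduces to r12 itself.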
Application
Partial correlation can be used as a special statistical technique for eliminating the effects of one or more variables on the two main variables, for which we want to compute an independent and reliable measure of correlation.
Example
From a certain number of schools in Delhi, a sample of 500 students studying in classes IX and X was taken. These students were evaluated in terms of their academic achievement and …

$$\frac{0.18 - 0.6 \times 0.7}{\sqrt{1 - (0.6)^2}\,\sqrt{1 - (0.7)^2}} = \frac{-0.24}{0.5713} = -0.4201$$
Question
Given r12 = 0.67, r13 = 0.75 and r23 = 0.63, find r23.1.
3.11 Advantages of Partial Correlation
Partial correlation analysis assumes great significance in cases where the phenomena under consideration have multiple factors influencing them, especially in physical and experimental sciences, where it is possible to control the variables and … The partial correlation coefficient lies between −1 and +1.
While high R² values are good, they do not tell you how far the data points are from the regression line. Additionally, R² is valid only for linear models: you cannot use R² to compare a linear model with a nonlinear model.
Additional questions
1. For the data given below, write down the multiple regression equation expressing the effect of factors 1 and 2 on yield. Find the value of R². What is your conclusion?
2. For the following data, X1 denotes systolic B.P., X2 denotes age in years and X3 denotes weight in pounds. Find the multiple regression equation taking B.P. as the dependent variable and age and weight as independent variables. Find the multiple correlation coefficient R1.23 and interpret it.
X1     X2    X3
132    52    173
143    59    184
153    67    194
162    73    211
154    64    196
168    74    220
137    54    188
149    61    188
159    65    207
128    46    167
166    72    217
3. For the data given below, taking the final marks as the dependent variable, find a multiple regression equation. Find the partial correlation coefficient by removing the effect of the marks in exam 1 from the other two variables.
Test Scores for General Psychology. The data (X1, X2, X3) are for each student: X1 = score on exam 1, X2 = score on exam 2, X3 = score on the final exam.
X1    X2    X3
73    80    152
93    88    185
89    91    180
96    98    196
73    66    142
53    46    101
69    74    149
47    56    115
87    79    175
79    70    164
69    70    141
70    65    141
93    95    184
4. Find a suitable multiple regression model for the data on Hollywood movies given below by choosing your dependent and independent variables. Find the three partial correlation coefficients and the multiple correlation coefficient. Substantiate your answer.
The data (X1, X2, X3) are for each movie: X1 = first year box office receipts (millions), X2 = total production costs (millions), X3 = total promotional costs (millions).
X1       X2      X3
85.1     8.5     5.1
106.3    12.9    5.8
50.2     5.2     2.1
130.6    10.7    8.4
54.8     3.1     2.9
30.3     3.5     1.2
79.4     9.2     3.7
91.0     9.0     7.6
135.4    15.1    7.7
89.3     10.2    4.5