
Module III
MULTIPLE CORRELATION
AND REGRESSION
In the previous module we had one criterion variable (Y) and one predictor variable (X) and wished to predict Y on the basis of X. In this module, we will consider the case in which we still have only one criterion (X₁) but have multiple predictors Xᵢ (i = 2, ..., p), and want to predict X₁ on the basis of simultaneous knowledge of all p predictors. The problem of multiple regression is that of finding a regression equation to predict the criterion variable on the basis of p predictors.
Multiple regression analysis is used when there is one quantitative dependent variable and two or more quantitative independent variables. The values of the independent variables are used to predict the values of the dependent variable of interest.
E.g., academic performance (X₁) depends on IQ (X₂) and time spent on study (X₃). Hence the multiple regression equation of X₁ on X₂ and X₃ can be used to predict the performance, given X₂ and X₃. To give another example, we might wish to predict success in graduate school (X₁) on the basis of undergraduate grade point average (X₂), Graduate Record Exam scores (X₃), and the number of courses taken in the major discipline (X₄). Similarly, we might wish to predict the time it takes to go from one point in a city to another (X₁) on the basis of the number of traffic lights (X₂), speed limit (X₃) and traffic density (X₄). These examples are both analyzed in the same way, although in the first we presumably care about predictions for individual applicants, whereas in the second we might be less interested in the prediction itself and more interested in the role of each of the predictors.

In fact, the most common use of multiple regression is to understand the relationship between variables and to make a prediction from the resulting equation. It is a natural extension of simple linear regression: we derive an equation to make predictions about one particular variable of interest, usually called the criterion variable, and there can be several predictor variables in the equation. However, if you have only two variables available as predictors, multiple regression does not get too complicated, and you can actually do all of the calculations quickly with a simple electronic calculator.
In psychological research, multiple regression is more often used to provide evidence that the role of one variable is more primary or direct in affecting the criterion than another. Multiple regression has also been used frequently to determine the relative importance of various variables in affecting psychological outcomes. However, the use of multiple regression for predictions is more mechanical and therefore easier to explain.
3.2 Multiple Regression Equation

When a linear relationship is assumed among three variables X₁, X₂ and X₃, the multiple regression equation of X₁ on X₂ and X₃ is given by

X₁ - X̄₁ = b₁₂.₃(X₂ - X̄₂) + b₁₃.₂(X₃ - X̄₃),

where b₁₂.₃ denotes the regression coefficient of X₁ on X₂ keeping X₃ constant and b₁₃.₂ denotes the regression coefficient of X₁ on X₃ keeping X₂ constant.
Given a set of observations on X₁, X₂ and X₃, these coefficients can be estimated using the principle of least squares, and the estimates are given by

b₁₂.₃ = (σ₁/σ₂) × (r₁₂ - r₁₃r₂₃)/(1 - r₂₃²)

b₁₃.₂ = (σ₁/σ₃) × (r₁₃ - r₁₂r₂₃)/(1 - r₂₃²)

in the usual notations. Here b₁₂.₃ measures the average change in X₁ when X₂ changes by 1 unit and X₃ is kept constant, and b₁₃.₂ measures the average change in X₁ when X₃ changes by 1 unit and X₂ is kept constant. The intercept term gives the value of X₁ when X₂ = X₃ = 0.
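As a quick illustration, here is a minimal Python sketch of these two formulas; the function names and the trailing example values are ours, not the text's:

# Partial regression coefficients of X1 on X2 and X3, from the
# standard deviations and pairwise correlations (formulas above).

def b12_3(s1, s2, r12, r13, r23):
    """Coefficient of X2 in the regression of X1 on X2 and X3."""
    return (s1 / s2) * (r12 - r13 * r23) / (1 - r23 ** 2)

def b13_2(s1, s3, r12, r13, r23):
    """Coefficient of X3 in the regression of X1 on X2 and X3."""
    return (s1 / s3) * (r13 - r12 * r23) / (1 - r23 ** 2)

# Illustrative (assumed) summary statistics:
print(b12_3(2.0, 1.5, 0.6, 0.4, 0.2))   # ~0.7222
print(b13_2(2.0, 3.0, 0.6, 0.4, 0.2))   # ~0.1944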
Now, by interchanging the roles of dependent and independent variables, two more regression equations can be formed, i.e. the regression equation of X₂ on X₁ and X₃ (here X₂ is the dependent variable) and the regression equation of X₃ on X₁ and X₂ (here X₃ is the dependent variable). Thus, e.g., the multiple regression equation of X₂ on X₁ and X₃ is given by

X₂ - X̄₂ = b₂₁.₃(X₁ - X̄₁) + b₂₃.₁(X₃ - X̄₃),

where

b₂₁.₃ = (σ₂/σ₁) × (r₁₂ - r₁₃r₂₃)/(1 - r₁₃²)

b₂₃.₁ = (σ₂/σ₃) × (r₂₃ - r₁₂r₁₃)/(1 - r₁₃²)
The above equations are actually the population regression equations. When we come to the sample, as in the case of the bivariate regression model, we have to accommodate an error term in the equation, and because of the presence of the error involved in the relationship, we get estimates of all the statistics. Thus, a sample regression model to predict X₁ from X₂ and X₃ can be written as

X₁ - X̄₁ = b̂₁₂.₃(X₂ - X̄₂) + b̂₁₃.₂(X₃ - X̄₃) + e,

where the hats denote estimates of the respective values.
Also, we have the following assumptions in a multiple regression model. These assumptions should be verified before doing a regression analysis; a short sketch of how some of them can be checked follows the list below. But for a beginner, these words and ideas may sound a little difficult, and for her, in an exercise on multiple regression, the assumptions can be taken for granted and she can start finding the values of the estimators without verifying the assumptions.
3.3 Assumptions of Multiple Regression Analysis

1. The dependent variable should be measured on a continuous scale (i.e., it is either an interval or ratio variable). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.

2. There must be a linear relationship between the outcome variable and the independent variables. Scatter plots can show whether there is a linear or curvilinear relationship.

3. Multivariate normality - Multiple regression assumes that the residuals are normally distributed. The errors between observed and predicted values (i.e., the residuals of the regression) should be normally distributed. This assumption may be checked by looking at a histogram or a Q-Q plot. Normality can also be checked with a goodness of fit test (e.g., the Kolmogorov-Smirnov test), though this test must be conducted on the residuals themselves. These topics will be discussed later.

4. The errors have zero mean.

5. No multicollinearity - Multiple regression assumes that the independent variables are not highly correlated with each other.

6. Homoscedasticity - This assumption states that the variance of the error terms is similar across the values of the independent variables. A plot of standardized residuals versus predicted values can show whether points are equally distributed across all values of the independent variables. There should be no clear pattern in the distribution; if there is a cone-shaped pattern, the data is heteroscedastic.

7. There should be no significant outliers. They can have a very negative effect on the regression equation that is used to predict the value of the dependent variable based on the independent variables.
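The following is a rough sketch of how some of these checks can be carried out, assuming Python with NumPy and SciPy; the data are simulated for illustration and are not from the text:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated (assumed) data: two predictors X2, X3 and a criterion X1
X2 = rng.normal(10, 3, size=100)
X3 = rng.normal(35, 3, size=100)
X1 = 0.5 * X2 + 0.7 * X3 + rng.normal(0, 1, size=100)

# Least-squares fit of X1 on X2 and X3 (column of ones = intercept)
A = np.column_stack([np.ones_like(X2), X2, X3])
coef, *_ = np.linalg.lstsq(A, X1, rcond=None)
residuals = X1 - A @ coef

print("mean of errors (should be ~0):", residuals.mean())              # assumption 4
print("residual normality p-value:", stats.shapiro(residuals).pvalue)  # assumption 3
print("correlation between predictors:", np.corrcoef(X2, X3)[0, 1])    # assumption 5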
Illustration

Set up a multiple regression equation of X₁ on X₂ and X₃.

X₁: 54 55 57 59 60 62
X₂: 10 12 16 13 12 14
X₃: 35 38 31 34 34 40
Answer:

X₁   X₂   X₃   X₁²    X₂²   X₃²    X₁X₂   X₁X₃    X₂X₃
54   10   35   2916   100   1225   540    1890    350
55   12   38   3025   144   1444   660    2090    456
57   16   31   3249   256   961    912    1767    496
59   13   34   3481   169   1156   767    2006    442
60   12   34   3600   144   1156   720    2040    408
62   14   40   3844   196   1600   868    2480    560
Sum  347  77   212    20115 1009   7542   4467   12273   2712

X̄₁ = 347/6 = 57.8333, X̄₂ = 77/6 = 12.8333, X̄₃ = 212/6 = 35.3333

σ₁ = √(20115/6 - (57.8333)²) = 2.7945
σ₂ = 1.8636
σ₃ = 2.9254

r₁₂ = 0.4331, r₁₃ = 0.2518, r₂₃ = -0.2646

b₁₂.₃ = (σ₁/σ₂) × (r₁₂ - r₁₃r₂₃)/(1 - r₂₃²) = 0.4806

b₁₃.₂ = (σ₁/σ₃) × (r₁₃ - r₁₂r₂₃)/(1 - r₂₃²) = 0.7348

The regression equation of X₁ on X₂ and X₃ is

X₁ - X̄₁ = b₁₂.₃(X₂ - X̄₂) + b₁₃.₂(X₃ - X̄₃)

i.e. X₁ - 57.8333 = 0.4806(X₂ - 12.8333) + 0.7348(X₃ - 35.3333), which simplifies to

X₁ = 0.4806 X₂ + 0.7348 X₃ + 25.7023

Example

Given the following data for a group of students:

X₁ = scores on an achievement test
X₂ = scores on an intelligence test
X₃ = scores on hours of study

M₁ = 101.71, M₂ = 10.06, M₃ = 3.35
σ₁ = 13.65, σ₂ = 3.06, σ₃ = 2.02
r₁₂ = 0.41, r₁₃ = 0.5, r₂₃ = 0.16

a) Write down the regression equation of X₁ on X₂ and X₃.
b) If a student scores 12 in the intelligence test (X₂) and 4 in hours of study (X₃), what will be his estimated score in X₁?
Answer

a) The regression equation is

X₁ - X̄₁ = b₁₂.₃(X₂ - X̄₂) + b₁₃.₂(X₃ - X̄₃)

b₁₂.₃ = (σ₁/σ₂) × (r₁₂ - r₁₃r₂₃)/(1 - r₂₃²) = (13.65/3.06) × (0.41 - 0.5 × 0.16)/(1 - (0.16)²) = 1.507

b₁₃.₂ = (σ₁/σ₃) × (r₁₃ - r₁₂r₂₃)/(1 - r₂₃²) = (13.65/2.02) × (0.5 - 0.41 × 0.16)/(1 - (0.16)²) = 3.014

Hence, X₁ - 101.71 = 1.507(X₂ - 10.06) + 3.014(X₃ - 3.35), i.e.

X₁ = 1.507 X₂ + 3.014 X₃ + 76.4

b) When X₂ = 12 and X₃ = 4, X₁ = 1.507 × 12 + 3.014 × 4 + 76.4 ≈ 107.
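A few lines of Python (a sketch, simply reusing the formulas above) reproduce this answer:

# Check of the worked example above.
M1, M2, M3 = 101.71, 10.06, 3.35
s1, s2, s3 = 13.65, 3.06, 2.02
r12, r13, r23 = 0.41, 0.50, 0.16

b123 = (s1 / s2) * (r12 - r13 * r23) / (1 - r23 ** 2)   # ~1.51
b132 = (s1 / s3) * (r13 - r12 * r23) / (1 - r23 ** 2)   # ~3.01
a = M1 - b123 * M2 - b132 * M3                          # intercept, ~76.4

print(a + b123 * 12 + b132 * 4)   # predicted X1 for X2 = 12, X3 = 4: ~107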
3.4 Advantages of Multiple Regression
There are two main advantages to analyzing data using a multiple regression model. The first is the ability to determine the relative influence of one or more predictor variables on the criterion value. E.g., the real estate agent could find that the size of the homes and the number of bedrooms have a strong correlation to the price of a home, while the proximity to schools has no correlation at all, or even a negative correlation if it is primarily a retirement community.

The second advantage is the ability to identify outliers, or anomalies. For example, while reviewing the data related to management salaries, the human resources manager could find that the number of hours worked, the department size and its budget all had a strong correlation to salaries, while seniority did not. Alternatively, it could be that all of the listed predictor values were related to each of the salaries being examined, except for one manager who was being overpaid compared to the others.
3.5 Disadvantages of Multiple Regression

Any disadvantage of using a multiple regression model usually comes down to the data being used. Two examples of this are using incomplete data and falsely concluding that a correlation is a causation.

When reviewing the price of homes, for example, suppose the real estate agent looked at only 10 homes, seven of which were purchased by young parents. In this case, the relationship may lead her to believe that the proximity of schools had an effect on the sale price for all homes being sold in the community. This illustrates the pitfalls of incomplete data. Had she used a larger sample, she could have found that, of the homes sold, only ten percent of the home values were related to a school's proximity. If she had used the buyers' ages as a predictor value, she could have found that younger buyers were willing to pay more for homes in the community than older buyers.

In the example of management salaries, suppose there was one outlier who had a smaller budget, less seniority and fewer personnel to manage but was making more than anyone else. The HR manager could look at the data and conclude that this individual is being overpaid. However, this conclusion would be erroneous if he didn't take into account that this manager was in charge of the company's website and had a highly coveted skillset in network security.

3.6 Selection Process for Multiple Regression

The basis of a multiple linear regression is to assess whether one continuous dependent variable can be predicted from a set of independent (or predictor) variables, or, in other words, how much variance in a continuous dependent variable is explained by a set of predictors. In computer packages like SPSS, we come across the following approaches for calculating the regression coefficients. Again, for a beginner, from the classroom point of view, she doesn't have to worry about which method to follow. She can directly calculate the estimates using the formulae above. This is included here just to familiarize you with some of the terminologies used in computer packages.

Entry Method

The standard method of entry is simultaneous (a.k.a. the enter method); all independent variables are entered into the equation at the same time. This is an appropriate analysis when dealing with a small set of predictors and when the researcher does not know which independent variables will create the best prediction equation.
Selection Methods

Selection, on the other hand, allows for the construction of an optimal regression equation along with investigation into specific predictor variables. The aim of selection is to reduce the set of predictor variables to those that are necessary and account for nearly as much of the variance as is accounted for by the total set. In essence, selection helps to determine the level of importance of each predictor variable. It also assists in assessing the effects once the other predictor variables are statistically eliminated. The circumstances of the study, along with the nature of the research questions, guide the selection of predictor variables.

Three selection procedures are used to yield the most appropriate regression equation: forward selection, backward elimination and stepwise selection.

Forward selection begins with an empty equation. Predictors are added one at a time, beginning with the predictor with the highest correlation with the dependent variable. Variables of greater theoretical importance are entered first. Once in the equation, the variable remains there. (A rough sketch of this procedure appears after these descriptions.)

Backward elimination (or backward deletion) is the reverse process. All the independent variables are entered into the equation first, and each one is deleted one at a time if it does not contribute to the regression equation.

Stepwise selection is considered a variation of the previous two methods. Stepwise selection involves analysis at each step to determine the contribution of the predictor variables entered previously in the equation. In this way it is possible to understand the contribution of the previous variables now that another variable has been added. Variables can be retained or deleted based on their statistical contribution.
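Here is a toy sketch of forward selection in Python with NumPy. The stopping rule is our own simplification: predictors are added while the R² gain exceeds a threshold, whereas statistical packages use F-tests or p-values as the entry criterion:

import numpy as np

def r_squared(X, y):
    """R^2 of the least-squares regression of y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

def forward_select(X, y, min_gain=0.01):
    """Greedily add the predictor that most improves R^2."""
    chosen, best = [], 0.0
    while len(chosen) < X.shape[1]:
        gains = {j: r_squared(X[:, chosen + [j]], y) - best
                 for j in range(X.shape[1]) if j not in chosen}
        j, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain < min_gain:      # stop when the improvement is negligible
            break
        chosen.append(j)
        best += gain
    return chosen, best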

Questions

1. Given the following data for a group of students:

X₁ = scores on an intelligence test
X₂ = scores on a memory sub-test
X₃ = scores on a reasoning sub-test

M₁ = 78, M₂ = 87.2, M₃ = 32.8
σ₁ = 10.21, σ₂ = 10.35, σ₃ = 6.02
r₁₂ = 0.67, r₁₃ = 0.75, r₂₃ = 0.63

a) Establish the multiple regression equation of X₁ on X₂ and X₃.
b) If a student obtains 80 in the memory sub-test and 40 in the reasoning sub-test, what can be his expected score in the intelligence test?
2. Suppose we want to predict job performance (Y) from mechanical aptitude test scores (X₁) and scores (X₂) on a personality test that measures conscientiousness. The following data are obtained. Find out the multiple regression equation of Y on X₁ and X₂.

Y   X₁   X₂
-   40   25
-   45   20
-   38   30
3   50   30
2   48   28
3   55   30
3   53   34
-   55   36
4   58   32
3   40   34
5   55   38
3   48   28
3   45   30
2   55   36
-   60   34

3.7 Multiple Correlation

In many practical situations in education and psychology, we find that the dependent variable is jointly influenced by more than two variables. E.g., academic performance is jointly influenced by variables like intelligence, time spent on study, quality of teachers, parental education and so on. Thus, in order to study how the dependent variable is affected by the joint (combined) effect of all the independent variables, we study the concept of multiple correlation, and in order to predict the value of the dependent variable, given the values of the independent variables, we study the concept of the multiple regression equation.
The coefficient of multiple correlation is a statistical tool used for denoting the strength of the linear relationship between the dependent variable and two or more independent variables. The coefficient of multiple correlation, denoted R, is a scalar that is defined as the Pearson correlation coefficient between the predicted and the actual values of the dependent variable in a linear regression model. It is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables. The correlation is said to be simple when only two variables are studied. The correlation is either multiple or partial when three or more variables are studied.
Consider the case of three variables X₁, X₂ and X₃. Though we can consider any number of independent variables or predictors in our analysis, here we restrict our attention to the case of three variables.

The multiple correlation coefficient, denoted as R₁.₂₃, measures the strength of the relationship between the dependent variable X₁ and the combined effect of X₂ and X₃. It is given by the formula

R₁.₂₃ = √[(r₁₂² + r₁₃² - 2r₁₂r₁₃r₂₃)/(1 - r₂₃²)]

where r₁₂, r₁₃ and r₂₃ respectively denote the Karl Pearson coefficients of correlation between the specified variables. Similarly, we can get expressions for R₂.₁₃ and R₃.₁₂, e.g.

R₂.₁₃ = √[(r₁₂² + r₂₃² - 2r₁₂r₁₃r₂₃)/(1 - r₁₃²)]
Note 3.1

The multiple correlation coefficient lies between 0 and 1. A higher value indicates a better predictability of the dependent variable from the independent variables, with a value of 1 indicating that the predictions are exactly correct and a value of 0 indicating that no linear combination of the independent variables is a better predictor than the fixed mean of the dependent variable.

Note 3.2

A multiple correlation coefficient yields the maximum degree of linear relationship that can be obtained between two or more independent variables and a single dependent variable. R² represents the proportion of the total variance in the dependent variable that can be accounted for by the independent variables. The independent variables are each optimally weighted such that their composite will have the largest possible correlation with the dependent variable.
Now, let us compute a multiple correlation coefficient given the following data set.

Illustration

A large corporation is interested in predicting a measure of job satisfaction among its employees. They have collected data on 15 employees, who each supplied information on job satisfaction (X₁), level of responsibility (X₂) and years of service (X₃). We need to find the joint effect of the variables responsibility and years of service on the first variable, job satisfaction. For the calculation we need the columns X₁², X₂², X₃², X₁X₂, X₁X₃ and X₂X₃ for the 15 employees; their totals work out to

ΣX₁ = 87, ΣX₂ = 84, ΣX₃ = 71,
ΣX₁² = 587, ΣX₂² = 558, ΣX₃² = 403,
ΣX₁X₂ = 535, ΣX₁X₃ = 411, ΣX₂X₃ = 387.

From these,

X̄₁ = 87/15 = 5.8, X̄₂ = 84/15 = 5.6, X̄₃ = 71/15 = 4.7333

σ₁ = √(587/15 - (5.8)²) = √5.4933 = 2.3438
σ₂ = √(558/15 - (5.6)²) = √5.84 = 2.4166
σ₃ = √(403/15 - (4.7333)²) = √4.4622 = 2.1124

r₁₂ = (ΣX₁X₂/15 - X̄₁X̄₂)/(σ₁σ₂) = (535/15 - 5.8 × 5.6)/(2.3438 × 2.4166) = 0.5626

and, similarly, r₁₃ = -0.0108 and r₂₃ = -0.1384.

R₁.₂₃² = (r₁₂² + r₁₃² - 2r₁₂r₁₃r₂₃)/(1 - r₂₃²)
       = ((0.5626)² + (-0.0108)² - 2 × 0.5626 × (-0.0108) × (-0.1384))/(1 - (-0.1384)²)
       = 0.3149/0.9808 = 0.3211

R₁.₂₃ = √0.3211 = 0.5667

Here, R² = 0.3211. This means that about 32% of the variation in the first variable, job satisfaction, can be explained by the combined effect of responsibility and years of service (as predicted from the linear regression), and the rest is unexplained by the given model.
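The same computation takes a few lines of Python (a sketch; the function name is ours):

from math import sqrt

def R1_23(r12, r13, r23):
    """Multiple correlation of X1 with X2 and X3 from the pairwise r's."""
    return sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))

# Job-satisfaction illustration above:
print(R1_23(0.5626, -0.0108, -0.1384))   # ~0.5667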

Examples

In a study, a researcher wanted to know the impact of a person's intelligence and his socio-economic status on his academic success. He computed the following correlation coefficients:

r₁₂ = 0.6, r₁₃ = 0.4, r₂₃ = 0.5

Find out the multiple correlation coefficient R₁.₂₃.

R₁.₂₃² = (r₁₂² + r₁₃² - 2r₁₂r₁₃r₂₃)/(1 - r₂₃²) = ((0.6)² + (0.4)² - 2 × 0.6 × 0.4 × 0.5)/(1 - (0.5)²) = 0.28/0.75 = 0.3733

R₁.₂₃ = √0.3733 = 0.61
A researcher was interested in studying the relationship between success in a job and the training received. He collected data regarding these and added a third variable, 'interest' (measured by an interest inventory). The correlations among the three variables are:

r₁₂ = 0.41, r₁₃ = 0.5, r₂₃ = 0.16

Find the multiple correlation that measures the joint effect of training and interest on success in the job.

R₁.₂₃² = (r₁₂² + r₁₃² - 2r₁₂r₁₃r₂₃)/(1 - r₂₃²) = ((0.41)² + (0.5)² - 2 × 0.41 × 0.5 × 0.16)/(1 - (0.16)²) = 0.3525/0.9744 = 0.3618

R₁.₂₃ = √0.3618 = 0.601
Question

1000 candidates appeared for an entrance test. The test has some sub-tests, namely, a general intelligence test, a professional awareness test, a general knowledge test and an aptitude test. A researcher got interested in knowing the impact, or the strength of the association, of any two subjects on the total entrance test score (X₁). Initially, he took two sub-test scores, the intelligence test scores (X₂) and the professional awareness scores (X₃), and derived the necessary correlations. Compute the multiple correlation coefficient for measuring the strength of the relationship between X₁ and (X₂, X₃) if

r₁₂ = 0.8, r₁₃ = 0.7, r₂₃ = 0.6
3.8 Advantages of Multiple Correlation Analysis

It serves as a measure of the degree of association between one variable taken as the dependent variable and a group of other variables taken as the independent variables.

It also serves as a measure of the goodness of fit of the calculated plane of regression and, consequently, as a measure of the general degree of accuracy of estimates made by reference to the equations for the plane of regression.

3.9 Limitations of Multiple Correlation Analysis

Multiple correlation analysis is based on the assumption that the relationship between the variables is linear. In practice, most relationships are not linear but follow some other pattern. This limits somewhat the use of multiple correlation analysis.

The second important limitation is the assumption that the effects of the independent variables on the dependent variable are separate, distinct and additive, which also may not hold good always.
3.10 Partial Correlation

In partial correlation analysis we study the correlation between two quantitative variables by eliminating the linear effect of all the other independent variables on them. If your focus is on the relationship between two particular variables, such as coffee consumption (CC) and cholesterol level (CL), but there are a number of variables that affect both of them, you will probably want to calculate a partial correlation. The problem with observing a fairly high correlation between CC and CL is that it could be due to some third variable, such as stress. Under high levels of stress, especially at work, some people may drink more coffee. It is also possible that stress directly affects CL. If these two statements are both true, CC and CL will have some spurious correlation with each other because each is related to stress. Any two variables related to stress will be correlated, even if there is no direct connection between them. If we are interested in the direct effect of CC on CL, we would want to hold stress at a constant level. This is generally not feasible to do experimentally, but we can use an independent measure of stress to predict both CC and CL and find residuals for both variables. The correlation between these two sets of residuals is the partial correlation of CC and CL, with stress partialled out of both variables. The variable partialled out (stress in the previous example) is called a covariate and is usually thought of as a 'nuisance' variable with respect to the problem at hand. For instance, if we are studying the relationship between social skills and memory in elderly subjects, age can be a nuisance variable. Social skills and memory can be correlated simply because both decline with age in the elderly. If you want to see that within a given age there is still a correlation between memory and social skills, you would partial out age. In the common case, where the covariate is correlated highly with each of the two variables of interest, the partial correlation will be lower than the original correlation. For instance, in the CC/CL example you could decide to partial out an obesity measure, in addition to stress. To see whether there is a direct connection between CC and CL, you would want to partial out all the extraneous variables that could make it look like there is a connection between CC and CL when there isn't one. Before computers, the calculation involved in partialling out more than one variable was daunting, but that is no longer a consideration.
The basic distinction between multiple and partial correlation analysis is that in the former we measure the degree of relationship between the variable X₁ and all the variables X₂, ..., Xₚ taken together. In the latter we measure the degree of relationship between X₁ and one of the variables X₂, ..., Xₚ, with the effect of all the other variables removed.
In the case of 3 variables X₁, X₂ and X₃, there are 3 partial correlation coefficients. They are represented as r₁₂.₃, r₁₃.₂ and r₂₃.₁. Here r₁₂.₃ measures the correlation between X₁ and X₂ keeping X₃ constant, or removing the linear effect of X₃.

The formula for r₁₂.₃ is

r₁₂.₃ = (r₁₂ - r₁₃r₂₃)/√[(1 - r₁₃²)(1 - r₂₃²)]

Similarly, the other two partial correlation coefficients can be defined and interpreted. Thus we have

r₁₃.₂ = (r₁₃ - r₁₂r₂₃)/√[(1 - r₁₂²)(1 - r₂₃²)]

r₂₃.₁ = (r₂₃ - r₁₂r₁₃)/√[(1 - r₁₂²)(1 - r₁₃²)]

Application

Partial correlation can be used as a special statistical technique for eliminating the effects of one or more variables on the two main variables for which we want to compute an independent and reliable measure of correlation.

Example

From a certain number of schools in Delhi, a sample of 500 students studying in classes IX and X was taken. These students were evaluated in terms of their academic achievement (X₁) and participation in co-curricular activities (X₂). Their IQs (X₃) were also tested. The correlations among these 3 variables were obtained and recorded as follows:

r₁₂ = 0.18, r₁₃ = 0.6, r₂₃ = 0.7

Find the correlation between the first two variables, removing the effect of IQ. We need

r₁₂.₃ = (r₁₂ - r₁₃r₂₃)/√[(1 - r₁₃²)(1 - r₂₃²)] = (0.18 - 0.6 × 0.7)/√[(1 - (0.6)²)(1 - (0.7)²)] = -0.24/0.5713 = -0.4201
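In Python, this first-order partial correlation is a one-line function (a sketch, checked against the example above; the function name is ours):

from math import sqrt

def r12_3(r12, r13, r23):
    """Correlation of X1 and X2 with the linear effect of X3 removed."""
    return (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))

# Delhi schools example: achievement vs. co-curricular activities, IQ removed
print(r12_3(0.18, 0.60, 0.70))   # ~ -0.42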

Question

Given r₁₂ = 0.67, r₁₃ = 0.75, r₂₃ = 0.63, find r₂₃.₁.
3.11 Advantages of Partial Correlation

Partial correlation analysis assumes great significance in cases where the phenomena under consideration have multiple factors influencing them, especially in the physical and experimental sciences, where it is possible to control the variables and the effect of each variable can be studied separately. This technique is of great use in various experimental designs where various interrelated phenomena are to be studied.

If multiple and partial correlation are studied together, a very useful analysis of the relationship between the different variables is possible.
3.12 Limitations of Partial Correlation

However, this technique suffers from some limitations, some of which are stated below.

The calculation of the partial correlation coefficient is based on the simple correlation coefficient. However, the simple correlation coefficient assumes a linear relationship. Generally, this assumption is not valid, especially in the social sciences, as a linear relationship rarely exists in such phenomena.

As the order of the partial correlation coefficient goes up, its reliability goes down. Simple correlation between two variables is called a zero order coefficient, since in simple correlation no factor is held constant. The partial correlation studied between two variables by keeping the third variable constant is called a first order coefficient, as one variable is kept constant. Similarly, we can define a second order coefficient and so on. The partial correlation coefficient varies between -1 and +1.

3.13 Standard Error of Estimate

With the help of a regression equation, perfect prediction is practically impossible. What is needed, then, is a measure which would indicate how precise the prediction of Y is, based on X or a set of independent variables. A measure that is used for this purpose is called the standard error of the estimate.

In the case of 3 variables, we try to predict the value of X₁ (dependent) when the values of the independent variables X₂ and X₃ are given. The standard error of estimate in this case is given by

σ₁.₂₃ = σ₁√(1 - R₁.₂₃²)

The standard error of the regression and R² are two key goodness-of-fit measures for regression analysis. The standard error of the regression provides the absolute measure of the typical distance that the data points fall from the regression line.

R² provides the relative measure of the percentage of the dependent variable variance that the model explains. R² can range from 0 to 100 when expressed as a percentage.
The standard error of the regression has several advantages. It tells you straight up how precise the model's predictions are, using the units of the dependent variable. This statistic indicates how far the data points are from the regression line, on average. You want lower values of the standard error, because this signifies that the distances between the data points and the fitted values are smaller. It is also valid for both linear and nonlinear regression models. This fact is convenient if you need to compare the fit between both types of models.
For R², you want the regression model to explain higher percentages of the variance. Higher R² values indicate that the data points are closer to the fitted values. While higher R² values are good, they don't tell you how far the data points are from the regression line. Additionally, R² is valid only for linear models; you can't use R² to compare a linear model with a nonlinear model.
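A minimal sketch tying the two measures together (Python; the figures are taken from the job-satisfaction illustration earlier in this module):

from math import sqrt

def std_error_of_estimate(s1, R):
    """Standard error of estimate for predicting X1: s1 * sqrt(1 - R^2)."""
    return s1 * sqrt(1 - R**2)

# With sigma1 = 2.3438 and R = 0.5667 from the earlier illustration:
print(std_error_of_estimate(2.3438, 0.5667))   # ~1.93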

Additional questions

1. For the data given below, write down the multiple regression equation of factors 1 and 2 on yield. Find the value of R². What is your conclusion?

Observation  Factor 1 (x₁)  Factor 2 (x₂)  Yield (y)
1            41.9           29.1           251.3
2            43.4           29.3           251.3
3            43.9           29.5           248.3
4            44.5           29.7           267.5
5            47.3           29.9           273.0
6            47.5           30.3           276.
7            47.9           30.5           270.3
8            50.2           30.7           274.9
9            52.8           30.8           285.0
10           53.2           30.9           290.0
11           56.7           31.5           297.0
12           57.0           31.7           302.5
13           63.5           31.9           304.5
14           65.3           32.0           309.3
15           71.1           32.1           321.7
16           77.0           32.5           330.7
17           77.8           32.9           349.0

2. For the following data, X₁ denotes systolic B.P., X₂ denotes age in years and X₃ denotes weight in pounds. Form the multiple regression equation taking B.P. as the dependent variable and age and weight as independent variables. Find the multiple correlation coefficient R₁.₂₃ and interpret it.

X₁   X₂   X₃
132  52   173
143  59   184
153  67   194
162  73   211
154  64   196
168  74   220
137  54   188
149  61   188
159  65   207
128  46   167
166  72   217
3. For the data given below, taking the final marks as the dependent variable, find a multiple regression equation. Find the partial correlation coefficient by removing the effect of the marks in exam 1 from the other two variables.

Test scores for General Psychology. The data (X₁, X₂, X₃) are for each student: X₁ = score on exam 1, X₂ = score on exam 2 and X₃ = score on the final exam.

X₁   X₂   X₃
73   80   152
93   88   185
89   91   180
96   98   196
73   66   142
53   46   101
69   74   149
47   56   115
87   79   175
79   70   164
69   70   141
65   70   141
93   95   184

4. Find a suitable multiple regression model for the data on Hollywood movies given below by choosing your dependent and independent variables. Find the three partial correlation coefficients and the multiple correlation coefficient. Substantiate your answer.

The data (X₁, X₂, X₃) are for each movie:
X₁ = first year box office receipts (millions)
X₂ = total production costs (millions)
X₃ = total promotional costs (millions)

X₁     X₂    X₃
85.1   8.5   5.1
106.3  12.9  5.8
50.2   5.2   2.1
130.6  10.7  8.4
54.8   3.1   2.9
30.3   3.5   1.2
79.4   9.2   3.7
91.0   9.0   7.6
135.4  15.1  7.7
89.3   10.2  4.5
