2 - Stat-701 Correlation

Correlation
1 of 16
CORRELATION ANALYSIS
In regression, the dependent variable Y is the variable in which we are principally interested. The
independent variable X is discussed only because of its possible effect on Y. In some
situations, however, we are interested not in the dependence of Y on X but in the general problem
of measuring the degree of association between the two variables X and Y.
The term correlation is used to describe the relationship/association between two or more
variables. If there are only two variables then the correlation between them is called simple
correlation. For example:
(i) Marks of students in physics are associated with their marks in mathematics.
(ii) The wing length of birds is related to their tail length.
(iii) The cost of a commodity in the market is related to the quantity of the commodity
available for sale in the market.
• We can determine the kind of correlation between two variables by direct observation of the
scatter plot. Correlation may be linear, when all (X, Y) points on a scatter diagram seem to
cluster near a straight line, or nonlinear, when all points seem to lie near a curve.
• Two variables may have a positive correlation, a negative correlation or they may be
uncorrelated. This holds both for linear and nonlinear correlation.
Positive correlation:-
Two variables are said to be positively correlated if they tend to change together in the same
direction, e.g. in economic theory it is postulated that the quantity of a commodity supplied is
positively correlated with its price. Similarly, for a given pressure, the temperature of a gas is
positively correlated with its volume.
[Figure: scatter plot "Positive Correlation", Y against X]
Negative correlation:-
Two variables are said to be negatively correlated if they tend to change in opposite directions,
e.g. the quantity demanded and the price of a normal good are negatively correlated. When the price
of a commodity increases, its demand is reduced. Similarly, for a given temperature, the pressure of
a gas and its volume are negatively correlated.
[Figure: scatter plot "Negative Correlation", Y against X]
Uncorrelated:-
Two variables are uncorrelated when they tend to change with no connection to each other.
[Figure: scatter plot "No Correlation", Y against X]
• The scatter diagram indicates the nature and approximate strength of the relationship
between the two variables. If the points lie close to a straight line, the correlation is linear and
strong. On the other hand, a great dispersion of points about the line or curve implies weak
correlation. Inspection of the scatter diagram gives only a rough idea of the relationship
between the two variables. For a precise quantitative measurement of the degree of
correlation between two variables we use a quantity called the correlation coefficient.
Eta (η) coefficient:- The eta coefficient measures the strength of nonlinear correlation between
two variables.
Simple linear correlation coefficient:- It is used to measure the strength of the linear
relationship between two variables. The population correlation coefficient is denoted by ρ,
whereas its point estimate from a sample is denoted by r and defined as
\[ r = \frac{S(X,Y)}{\sqrt{S(X,X)\,S(Y,Y)}} \]
where S(X,Y) = ΣXY - (ΣX)(ΣY)/n, S(X,X) = ΣX^2 - (ΣX)^2/n and S(Y,Y) = ΣY^2 - (ΣY)^2/n.
A positive value of r indicates positive correlation, a negative value indicates negative correlation
and a value of zero indicates no linear correlation.
• If r is used to measure the relationship between two variables when the relation is
curvilinear, the computed r is always an underestimate of the real relationship between
the variables. Thus, before using r to measure the strength of a relationship, it is advisable to
draw a scatter plot to see how the points are arranged, and to use r only when the points lie around
a straight line.
• Although correlation measures co-variability of variables it does not imply any functional
relationship between the variables. It discovers existing co-variation, but does not establish
or prove any causal relationship between variables.
A high correlation between two variables may describe any one of the following situations
▪ Variation in X is the cause of variation in Y
▪ Variation in Y is the cause of variation in X
▪ There is another common factor (Z) that affects X and Y in such a way as
to show a close relation between them
▪ The correlation between two variables may be due to chance.
• The use of the correlation coefficient is rather limited, since the number of situations where
we want to know whether two variables are associated, but are not interested in the equation
of the relationship, is very small. Correlation techniques are useful in the preliminary
examination of a large number of variables to see which variables are associated. However,
even there, regression techniques are rather more effective than correlation methods.
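The caution about applying r to a curvilinear relationship can be checked numerically. The sketch below is an illustration only: the data (Y = X^2 on X = -3, ..., 3) and the helper name `pearson_r` are assumptions made for this example and do not come from the notes. Y is completely determined by X, yet r comes out as zero because the relationship is not linear.

```python
import math

def pearson_r(x, y):
    """Sample correlation r = S(X,Y) / sqrt(S(X,X) * S(Y,Y))."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: Y depends exactly on X, but along a parabola.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curve = pearson_r(x, y)
print(f"r for Y = X^2: {r_curve:.3f}")
```

A scatter plot of these points would show the parabola immediately, which is why plotting before computing r is recommended.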
Properties:-
1. The range of the correlation coefficient is -1 ≤ r ≤ +1.
2. The correlation coefficient is symmetrical with respect to the variables, i.e. r_xy = r_yx.
3. The correlation coefficient is independent of the units of measurement.
4. The correlation coefficient is taken to be zero when one of the variables is constant, since then
S(X,Y) = 0.
5. The correlation coefficient is independent of change of origin and scale, i.e. it remains
unchanged when a constant is added to or subtracted from the values of one or both variables, or
when they are multiplied or divided by a positive constant (multiplying one variable by a negative
constant reverses the sign of r).
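Properties 1, 2 and 5 can be verified on a small data set. The following is a minimal sketch, assuming made-up values for `x` and `y` and a hypothetical helper `pearson_r`. It checks symmetry, invariance under a change of origin and a positive change of scale, and the sign reversal produced by a negative multiplier.

```python
import math

def pearson_r(x, y):
    """Sample correlation r = S(X,Y) / sqrt(S(X,X) * S(Y,Y))."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustration data (not from the notes).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                                 # property 2: symmetry
r_shifted = pearson_r([2.54 * a + 10 for a in x], y)   # new origin, positive scale
r_negated = pearson_r([-3 * a for a in x], y)          # negative scale flips the sign

print(r_xy, r_yx, r_shifted, r_negated)
```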
Example:- The following data represent the wing length and tail length of sparrows.

Wing length (X)   Tail length (Y)       XY        X^2        Y^2
     10.4              7.4            76.96     108.16      54.76
     10.8              7.6            82.08     116.64      57.76
     11.1              7.9            87.69     123.21      62.41
     10.2              7.2            73.44     104.04      51.84
     10.3              7.4            76.22     106.09      54.76
     10.2              7.1            72.42     104.04      50.41
     10.7              7.4            79.18     114.49      54.76
     10.5              7.2            75.60     110.25      51.84
     10.8              7.8            84.24     116.64      60.84
     11.2              7.7            86.24     125.44      59.29
     10.6              7.8            82.68     112.36      60.84
     11.4              8.3            94.62     129.96      68.89
Total 128.2            90.8          971.37    1371.32     688.40
S(X,X) = 1.72, S(X,Y) = 1.32, S(Y,Y) = 1.35
\[ r = \frac{S(X,Y)}{\sqrt{S(X,X)\,S(Y,Y)}} = \frac{1.32}{\sqrt{(1.72)(1.35)}} = 0.866 \]
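The sparrow calculation can be reproduced directly from the raw data. This is a minimal sketch using only the Python standard library; the variable names are chosen for this illustration. Because it works with unrounded sums, r comes out near 0.870, slightly above the 0.866 obtained by hand from the two-decimal values of S(X,X), S(X,Y) and S(Y,Y).

```python
import math

# Wing length (X) and tail length (Y) from the table above.
wing = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
tail = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]
n = len(wing)

# Corrected sums of squares and products.
s_xx = sum(x * x for x in wing) - sum(wing) ** 2 / n
s_yy = sum(y * y for y in tail) - sum(tail) ** 2 / n
s_xy = sum(x * y for x, y in zip(wing, tail)) - sum(wing) * sum(tail) / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"S(X,X)={s_xx:.2f}  S(X,Y)={s_xy:.2f}  S(Y,Y)={s_yy:.2f}  r={r:.3f}")
```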
INFERENCE IN SIMPLE LINEAR CORRELATION

(FROM SAMPLES TO POPULATION)
Generally, more is sought in correlation analysis than a description of observed data. One usually
wishes to draw inferences about the relationship of the variables in the population from which the
sample was taken. To draw inferences about population values based on sample results, the
following assumption is needed.
• The population from which the sample is selected should be (bivariate) normal
Test of hypothesis for ρ = 0

1) Construction of hypotheses
H0: ρ = 0
H1: ρ ≠ 0
2) Level of significance
α = 5%
3) TEST STATISTIC
\[ t = \frac{r - \rho}{SE(r)} = \frac{0.866 - 0}{0.158} = 5.48 \]
where
\[ SE(r) = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{1 - 0.866^2}{12 - 2}} = 0.158 \]
4) Decision Rule:- Reject H0 if t_cal ≥ t_{α/2}(n-2) = 2.228 or t_cal ≤ -t_{α/2}(n-2) = -2.228

5) Result:- Since t_cal = 5.48 > 2.228, reject H0 and conclude that there is a significant linear
relationship between wing length and tail length.
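The five steps above can be sketched in a few lines. The function name `t_test_rho_zero` is hypothetical, and the critical value 2.228 is simply copied from the decision rule rather than computed, since the standard library has no t-distribution quantile function.

```python
import math

def t_test_rho_zero(r, n):
    """Return (t, SE(r)) for testing H0: rho = 0 with n pairs of observations."""
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se, se

T_CRIT = 2.228  # t_{0.025}(10), taken from the worked example

t_cal, se_r = t_test_rho_zero(0.866, 12)
reject_h0 = abs(t_cal) >= T_CRIT   # two-sided decision rule
print(f"SE(r)={se_r:.3f}  t={t_cal:.2f}  reject H0: {reject_h0}")
```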
TEST OF HYPOTHESIS FOR ρ = ρ0 (WHERE ρ0 ≠ 0)

Example:- If the sample correlation coefficient from 20 observations is equal to 0.45, can we
conclude that the population correlation coefficient of a bivariate normal distribution is equal to
0.9?
1) H0: ρ = 0.9
   H1: ρ ≠ 0.9
2) α = 5%
3) TEST STATISTIC
\[ Z = \frac{z - \mu_z}{SE(z)} = \frac{0.48 - 1.47}{0.2425} = -4.08 \]
where
\[ z = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right) = \frac{1}{2}\ln\left(\frac{1 + 0.45}{1 - 0.45}\right) = 0.48 \]
\[ \mu_z = \frac{1}{2}\ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right) = \frac{1}{2}\ln\left(\frac{1 + 0.9}{1 - 0.9}\right) = 1.47 \]
\[ SE(z) = \frac{1}{\sqrt{n - 3}} = \frac{1}{\sqrt{20 - 3}} = 0.2425 \]
4) Decision Rule:- Reject H0 if Z_cal ≥ Z_{α/2} = 1.96 or Z_cal ≤ -Z_{α/2} = -1.96

5) Result:- Since Z_cal = -4.08 < -1.96, reject H0 and conclude that ρ ≠ 0.9.
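The Fisher z calculation above can be written as a short sketch. The helper names `fisher_z` and `z_test` are assumptions for this illustration. With unrounded intermediate values the statistic is about -4.07, against -4.08 when z and μ_z are first rounded to two decimals.

```python
import math

def fisher_z(r):
    """Fisher transformation z = (1/2) ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_test(r, n, rho0):
    """Z statistic for H0: rho = rho0, with SE(z) = 1 / sqrt(n - 3)."""
    se = 1 / math.sqrt(n - 3)
    return (fisher_z(r) - fisher_z(rho0)) / se

z_cal = z_test(r=0.45, n=20, rho0=0.9)
reject_h0 = abs(z_cal) >= 1.96   # two-sided test at the 5% level
print(f"Z={z_cal:.2f}  reject H0: {reject_h0}")
```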
CONFIDENCE INTERVALS FOR CORRELATION COEFFICIENT

Example:- A sample of 16 pairs of observations gives a correlation coefficient of 0.85. Construct a
95% C.I. for ρ.
The 95% C.I. is given by
\[ \frac{e^{2a} - 1}{e^{2a} + 1} \le \rho \le \frac{e^{2b} - 1}{e^{2b} + 1} \]
where
\[ a = z - Z_{\alpha/2}\,SE(z) = 1.2562 - (1.96)(0.2773) = 0.7127 \]
\[ b = z + Z_{\alpha/2}\,SE(z) = 1.2562 + (1.96)(0.2773) = 1.7997 \]
\[ z = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right) = \frac{1}{2}\ln\left(\frac{1 + 0.85}{1 - 0.85}\right) = 1.2562 \]
\[ SE(z) = \frac{1}{\sqrt{n - 3}} = \frac{1}{\sqrt{16 - 3}} = 0.2773 \]
Substituting,
\[ \frac{e^{1.4254} - 1}{e^{1.4254} + 1} \le \rho \le \frac{e^{3.5994} - 1}{e^{3.5994} + 1} \]
\[ \frac{3.16}{5.16} \le \rho \le \frac{35.58}{37.58} \]
0.61 ≤ ρ ≤ 0.95
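The interval can be reproduced with the sketch below, which follows the same three steps: transform r to z, build the limits a and b on the z scale, and transform back. The function name `fisher_ci` and the default `z_crit=1.96` are choices made for this illustration.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Confidence interval for rho via the Fisher z transformation."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)
    a = z - z_crit * se
    b = z + z_crit * se
    # Back-transform each limit: (e^{2a} - 1) / (e^{2a} + 1)
    lower = (math.exp(2 * a) - 1) / (math.exp(2 * a) + 1)
    upper = (math.exp(2 * b) - 1) / (math.exp(2 * b) + 1)
    return lower, upper

lower, upper = fisher_ci(r=0.85, n=16)
print(f"95% C.I. for rho: {lower:.2f} to {upper:.2f}")
```

The interval is not symmetric about r = 0.85, because the back-transformation compresses values near +1.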
COMPARING TWO CORRELATION COEFFICIENTS
Example: From the following information, test the hypothesis that ρ1 = ρ2.

n1 = 22, n2 = 21, r1 = 0.31, r2 = 0.542
Solution:
H0: ρ1 = ρ2
H1: ρ1 ≠ ρ2
α = 5%
Test Statistic:
\[ Z = \frac{(z_1 - z_2) - (\mu_{z_1} - \mu_{z_2})}{\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}} \]
\[ z_1 = \frac{1}{2}\ln\left(\frac{1 + r_1}{1 - r_1}\right) = \frac{1}{2}\ln\left(\frac{1 + 0.31}{1 - 0.31}\right) = 0.32055 \]
\[ z_2 = \frac{1}{2}\ln\left(\frac{1 + r_2}{1 - r_2}\right) = \frac{1}{2}\ln\left(\frac{1 + 0.542}{1 - 0.542}\right) = 0.60699 \]
Under H0, ρ1 = ρ2, so μ_{z1} - μ_{z2} = 0.
\[ \sigma_{z_1 - z_2} = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}} = \sqrt{0.0526 + 0.0556} = 0.329 \]
\[ Z = \frac{(0.32055 - 0.60699) - 0}{0.329} = -0.87 \]
Decision Rule: Reject H0 if |Z_cal| ≥ Z_{α/2} = Z_{0.025} = 1.96

Conclusion: As |Z_cal| = 0.87 < Z_{0.025} = 1.96, do not reject H0 and conclude that the difference
between the two population correlation coefficients is not significant.
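A minimal sketch of the two-sample comparison follows. The function names are hypothetical. The statistic is computed as z1 - z2, so its sign is negative here; only its absolute value matters for the two-sided decision rule.

```python
import math

def fisher_z(r):
    """Fisher transformation z = (1/2) ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1, n1, r2, n2):
    """Z statistic for H0: rho1 = rho2 from two independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

z_cal = compare_correlations(r1=0.31, n1=22, r2=0.542, n2=21)
reject_h0 = abs(z_cal) >= 1.96
print(f"Z={z_cal:.2f}  reject H0: {reject_h0}")
```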

PARTIAL CORRELATION COEFFICIENT

The relationship between two variables may be affected by other variables which either strengthen
or weaken the relationship. For example, the relationship between the monthly income and education
level of an individual is affected by the type of work and the experience of the individual. To get
the real relationship between two variables, other extraneous factors which are suspected to
affect the relationship are controlled, or partialled out, by the use of the partial correlation
coefficient.
The partial correlation coefficient measures the strength of the linear relationship between two
variables when the effect of other variables is controlled. By holding other variables constant we
mean that we are estimating what the correlation between the two variables would be if the other
variables had the same values for every individual. Partial correlation is sometimes referred to as
the correlation between two variables adjusted for additional variables. The purpose of partial
correlation is to measure that part of the correlation between two variables that is free of their
relationship with the remaining variables.
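e.g. Suppose there are three variables X1, X2, X3
• The partial correlation coefficient between X1 & X2 keeping X3 constant
For the population it is ρ12.3; its point estimate from the sample is
\[ r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \]
• The partial correlation coefficient between X1 & X3 keeping X2 constant
\[ r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} \]
• The partial correlation coefficient between X2 & X3 keeping X1 constant
\[ r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}} \]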
• The partial correlation coefficient between X1&X2 keeping X3 & X4 constant

For the population it is ρ12.34; its point estimate from the sample is
\[ r_{12.34} = \frac{r_{12.3} - r_{14.3} r_{24.3}}{\sqrt{(1 - r_{14.3}^2)(1 - r_{24.3}^2)}} = \frac{r_{12.4} - r_{13.4} r_{23.4}}{\sqrt{(1 - r_{13.4}^2)(1 - r_{23.4}^2)}} \]
Example:- Suppose that

X1 = fish length, X2 = fish weight, X3 = fish age
and r12 = 0.62, r13 = 0.79, r23 = 0.41, n = 6
\[ r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.62 - (0.79)(0.41)}{\sqrt{(1 - 0.79^2)(1 - 0.41^2)}} = 0.53 \]
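All three first-order partial correlations share one formula and differ only in which pair is correlated and which variable is held constant. The sketch below, with a hypothetical helper `partial_r`, applies that formula to the fish data. Only r12.3 is worked in the notes; the other two values are computed here for illustration.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation of A and B holding C constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Fish example: 1 = length, 2 = weight, 3 = age.
r12, r13, r23 = 0.62, 0.79, 0.41

r12_3 = partial_r(r12, r13, r23)   # length and weight, age held constant
r13_2 = partial_r(r13, r12, r23)   # length and age, weight held constant
r23_1 = partial_r(r23, r12, r13)   # weight and age, length held constant
print(f"r12.3={r12_3:.2f}  r13.2={r13_2:.2f}  r23.1={r23_1:.2f}")
```

Applying the same function to first-order partials (for example `partial_r(r12_3, r14_3, r24_3)`, given those inputs) yields the second-order coefficient r12.34.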
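Test of hypothesis for partial correlation coefficient

1) H0: ρ12.3 = 0
   H1: ρ12.3 ≠ 0
2) α = 5%
3) TEST STATISTIC
\[ t = \frac{r_{12.3} - \rho_{12.3}}{SE(r_{12.3})} = \frac{0.53 - 0}{0.4895} = 1.08 \]
where
\[ SE(r_{12.3}) = \sqrt{\frac{1 - r_{12.3}^2}{n - 2 - k}} = \sqrt{\frac{1 - 0.53^2}{6 - 2 - 1}} = 0.4895 \]
and k is the number of variables held constant (here k = 1).
4) Decision Rule:- Reject H0 if t_cal ≥ t_{α/2}(n-k-2) = 3.182 or t_cal ≤ -t_{α/2}(n-k-2) = -3.182
5) Result:- Since t_cal = 1.08 < 3.182, do not reject H0.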
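The only change from the simple-correlation t test is the loss of one degree of freedom for each variable held constant. A minimal sketch, with the hypothetical function name `partial_t_test` and the critical value 3.182 copied from the decision rule:

```python
import math

def partial_t_test(r_partial, n, k):
    """Return (t, SE) for H0: partial rho = 0, with k variables held constant."""
    se = math.sqrt((1 - r_partial ** 2) / (n - 2 - k))
    return r_partial / se, se

T_CRIT = 3.182  # t_{0.025}(3), taken from the worked example

t_cal, se_partial = partial_t_test(r_partial=0.53, n=6, k=1)
reject_h0 = abs(t_cal) >= T_CRIT
print(f"SE={se_partial:.4f}  t={t_cal:.2f}  reject H0: {reject_h0}")
```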
TEST OF HYPOTHESIS FOR ρ = ρ0 (WHERE ρ0 ≠ 0)

Example:- If the sample partial correlation coefficient between X1 and X2 keeping X3
constant from 10 triplets is equal to 0.51, can we conclude that the population partial correlation
coefficient between X1 and X2 keeping X3 constant is equal to 0.6?
1) H0: ρ12.3 = 0.6
   H1: ρ12.3 ≠ 0.6
2) α = 5%
3) TEST STATISTIC
\[ Z = \frac{z - \mu_z}{SE(z)} = \frac{0.5627 - 0.6931}{0.4082} = -0.32 \]
where
\[ z = \frac{1}{2}\ln\left(\frac{1 + r_{12.3}}{1 - r_{12.3}}\right) = \frac{1}{2}\ln\left(\frac{1 + 0.51}{1 - 0.51}\right) = 0.5627 \]
\[ \mu_z = \frac{1}{2}\ln\left(\frac{1 + \rho_{12.3}}{1 - \rho_{12.3}}\right) = \frac{1}{2}\ln\left(\frac{1 + 0.6}{1 - 0.6}\right) = 0.6931 \]
\[ SE(z) = \frac{1}{\sqrt{n - k - 3}} = \frac{1}{\sqrt{10 - 1 - 3}} = 0.4082 \]
and k is the number of variables held constant.
4) Decision Rule:- Reject H0 if Z_cal ≥ Z_{α/2} = 1.96 or Z_cal ≤ -Z_{α/2} = -1.96
5) Result:- Since Z_cal = -0.32 lies between -1.96 and 1.96, do not reject H0.
CONFIDENCE INTERVALS FOR PARTIAL CORRELATION COEFFICIENT

Example:- If the sample partial correlation coefficient between X1 and X2 keeping X3 constant
from 10 triplets is equal to 0.51, construct a 95% C.I. for the population partial correlation
coefficient.
\[ \frac{e^{2a} - 1}{e^{2a} + 1} \le \rho_{12.3} \le \frac{e^{2b} - 1}{e^{2b} + 1} \]
where a = z - Z_{α/2} SE(z) and b = z + Z_{α/2} SE(z), with z = 0.5627 and SE(z) = 0.41 from the
preceding example.
a = 0.5627 - (1.96)(0.41) = -0.2409
b = 0.5627 + (1.96)(0.41) = 1.3663
\[ \frac{e^{-0.4818} - 1}{e^{-0.4818} + 1} \le \rho_{12.3} \le \frac{e^{2.7326} - 1}{e^{2.7326} + 1} \]
\[ \frac{-0.3823}{1.6177} \le \rho_{12.3} \le \frac{14.3728}{16.3728} \]
-0.2363 ≤ ρ12.3 ≤ 0.8778
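The Fisher z test and confidence interval for a partial correlation differ from the simple case only in the standard error, 1/sqrt(n - k - 3). The sketch below covers both with hypothetical helper names. It keeps SE(z) unrounded (about 0.4082 rather than 0.41), so the interval limits differ slightly in the third decimal from the hand calculation.

```python
import math

def fisher_z(r):
    """Fisher transformation z = (1/2) ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def partial_z_test(r_partial, rho0, n, k):
    """Z statistic for H0: partial rho = rho0, with k variables held constant."""
    se = 1 / math.sqrt(n - k - 3)
    return (fisher_z(r_partial) - fisher_z(rho0)) / se

def partial_ci(r_partial, n, k, z_crit=1.96):
    """Confidence interval for a partial correlation coefficient."""
    z = fisher_z(r_partial)
    se = 1 / math.sqrt(n - k - 3)
    a, b = z - z_crit * se, z + z_crit * se
    # tanh(a) is the same back-transformation as (e^{2a} - 1) / (e^{2a} + 1)
    return math.tanh(a), math.tanh(b)

z_cal = partial_z_test(r_partial=0.51, rho0=0.6, n=10, k=1)
lower, upper = partial_ci(r_partial=0.51, n=10, k=1)
print(f"Z={z_cal:.2f}  95% C.I.: {lower:.3f} to {upper:.3f}")
```

The interval contains 0.6, which agrees with the test's failure to reject H0: ρ12.3 = 0.6.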
Semi-Partial correlation
Measures the correlation between two variables with the effect of other variables removed from
ONE variable only.
\[ r_{1(2.3)} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{23}^2}} \]
is the correlation between X1 & X2 with the effect of X3 removed from X2 only.
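As an illustration of the semi-partial formula, the sketch below reuses the fish correlations from the partial correlation example (r12 = 0.62, r13 = 0.79, r23 = 0.41). The notes do not work a semi-partial example, so the resulting value and the function name `semi_partial_r` belong to this illustration only.

```python
import math

def semi_partial_r(r12, r13, r23):
    """Correlation of X1 with X2 after removing X3 from X2 only."""
    return (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)

def partial_r(r12, r13, r23):
    """Partial correlation of X1 and X2 with X3 removed from both."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

r_semi = semi_partial_r(0.62, 0.79, 0.41)
r_part = partial_r(0.62, 0.79, 0.41)
print(f"semi-partial={r_semi:.4f}  partial={r_part:.4f}")
```

The two coefficients share a numerator. The semi-partial has the larger denominator, so it is never greater in absolute value than the partial correlation.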
MULTIPLE CORRELATION COEFFICIENTS

The multiple correlation coefficient measures the strength of the linear relationship between one
dependent variable and the joint effect of all the independent variables.
For example:
1. The yield of a crop and the joint effect of soil fertility and quantity of fertilizer used
2. The weight of a person and the joint effect of age and height
e.g. Suppose there are three variables X1, X2, X3

• The multiple correlation coefficient between X1 & X2, X3
For the population it is ρ1.23; its point estimate from the sample is
\[ R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}} \]
• The multiple correlation coefficient between X2 & X1, X3
For the population it is ρ2.13; its point estimate from the sample is
\[ R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{13}^2}} \]
• The multiple correlation coefficient between X3 & X1, X2
For the population it is ρ3.12; its point estimate from the sample is
\[ R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{12}^2}} \]
Example:- From the following information

r12 = 0.99945, r13 = 0.97606, r23 = 0.97391, n = 12
Find R1.23
Solution:-
\[ R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{0.051450}{0.051499}} = \sqrt{0.99904} = 0.9995 \]
TEST OF HYPOTHESIS OF MULTIPLE LINEAR CORRELATION

1) H0: ρ1.23 = 0
   H1: ρ1.23 ≠ 0
2) α = 5%
3) TEST STATISTIC
\[ F = \frac{R_{1.23}^2\,(n - 1 - k)}{k\,(1 - R_{1.23}^2)} = \frac{(0.99904)(12 - 1 - 2)}{2\,(1 - 0.99904)} \approx 4683 \]
where k = 2 is the number of independent variables.
4) Decision Rule:- Reject H0 if F_cal ≥ F_α(k, n-1-k) = 4.256

5) Result:- Since F_cal far exceeds 4.256, reject H0.
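The multiple correlation coefficient and its F test can be sketched as follows. The function names are hypothetical, and the critical value 4.256 is copied from the decision rule. Because 1 - R^2 is very small for these data, the result is extremely sensitive to rounding of intermediate values, so unrounded arithmetic can differ noticeably from a hand calculation. A useful sanity check is that R1.23 can never be smaller than |r12| or |r13|.

```python
import math

def multiple_r(r12, r13, r23):
    """Multiple correlation R_{1.23} of X1 with X2 and X3 jointly."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)

def f_statistic(r_multiple, n, k):
    """F statistic for H0: population multiple correlation = 0, k predictors."""
    r_squared = r_multiple ** 2
    return r_squared * (n - 1 - k) / (k * (1 - r_squared))

F_CRIT = 4.256  # F_{0.05}(2, 9), taken from the worked example

R = multiple_r(0.99945, 0.97606, 0.97391)
F = f_statistic(R, n=12, k=2)
reject_h0 = F >= F_CRIT
print(f"R1.23={R:.4f}  F={F:.0f}  reject H0: {reject_h0}")
```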