2 - Stat-701 Correlation
CORRELATION ANALYSIS
In regression, the dependent variable Y is the variable in which we are principally interested. The
independent variable X is discussed only because of its possible effect on Y. In some
situations, however, we are interested not in the dependence of Y on X but in the general problem
of measuring the degree of association between the two variables X and Y.
The term correlation is used to describe the relationship/association between two or more
variables. If there are only two variables then the correlation between them is called simple
correlation. For example
(i) Marks of students in physics are associated with the marks in mathematics.
(ii) The wing length of birds is related to their tail length.
(iii) The cost of a commodity in the market is related to the quantity of the commodity
available for sale in the market.
• We can determine the kind of correlation between two variables by direct observation of the
scatter plot. Correlation may be linear, when all (X,Y) points on a scatter diagram seem to
cluster near a straight line, or nonlinear, when all points seem to lie near a curve.
• Two variables may have a positive correlation, a negative correlation or they may be
uncorrelated. This holds both for linear and nonlinear correlation.
Correlation
Positive correlation:-
Two variables are said to be positively correlated if they tend to change together in
the same direction, e.g. in economic theory it is postulated that the quantity of a commodity
supplied is positively correlated with its price. Similarly, for a given pressure, the
temperature of a gas is positively correlated with its volume.
[Figure: scatter plot "Positive Correlation", Y plotted against X]
Negative correlation:-
Two variables are said to be negatively correlated if they tend to change in opposite
directions, e.g. the quantity demanded and the ...
[Figure: scatter plot "Negative Correlation", Y plotted against X]
• The scatter diagram indicates the nature and approximate strength of the relationship
between the two variables. If the points lie close to a straight line, the correlation is linear and
strong. On the other hand, a great dispersion of points about the line or curve implies weak
correlation. Inspection of the scatter diagram gives only a rough idea of the relationship
between the two variables. For a precise quantitative measurement of the degree of
correlation between two variables we use a quantity called the correlation coefficient.
Eta ($\eta$) coefficient:- The eta coefficient measures the strength of nonlinear correlation between
two variables.
Simple linear correlation coefficient:- It is used to measure the strength of the linear
relationship between two variables. The population correlation coefficient is denoted by $\rho$,
and its point estimate computed from a sample is denoted by $r$ and defined as
$$r = \frac{S(X,Y)}{\sqrt{S(X,X)\,S(Y,Y)}}$$
A positive value of r indicates positive correlation, a negative value indicates negative correlation, and
a value of zero indicates no linear correlation.
• If r is used to measure the relationship between two variables when this relation is
curvilinear, then the computed r is always an underestimate of the real relationship between
the variables. Thus, before using r to measure the strength of a relationship, it is advisable to
draw a scatter plot to see how the points are arranged, and to use r only when the points lie around a
straight line.
• Although correlation measures co-variability of variables it does not imply any functional
relationship between the variables. It discovers existing co-variation, but does not establish
or prove any causal relationship between variables.
A high correlation between two variables may describe any one of the following situations
▪ Variation in X is the cause of variation in Y
▪ Variation in Y is the cause of variation in X
▪ There is another common factor (Z) that affects both X and Y in such a way as
to show a close relation between them
▪ The correlation between two variables may be due to chance.
• The use of the correlation coefficient is rather limited, since the number of situations where
we want to know whether two variables are associated, but are not interested in the equation
of the relationship, is very small. Correlation techniques are useful in the preliminary
examination of a large number of variables to see which variables are associated. However,
even there, regression techniques are rather more effective than correlation methods.
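The earlier caution about curvilinear relationships can be checked numerically. The sketch below is only illustrative: the data are hypothetical, and it assumes S(X,Y) denotes the corrected sum of cross-products (sum of products of deviations from the means), which this section uses but does not define. Y is an exact function of X here, yet r is essentially zero because the relation is a parabola.

```python
import math

def pearson_r(x, y):
    """Sample correlation r = S(X,Y) / sqrt(S(X,X) * S(Y,Y)).

    Assumes S(.,.) is the corrected sum of squares / cross-products.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: Y depends perfectly on X, but along a curve (Y = X^2).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curve = pearson_r(x, y)
print(r_curve)  # essentially zero despite the exact relationship
```

A scatter plot of these points would reveal the curve immediately, which is why plotting before computing r is advisable.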
Properties:-
1. The range of the correlation coefficient is -1 ≤ r ≤ +1
2. The correlation coefficient is symmetrical with respect to the variables, i.e. $r_{xy} = r_{yx}$
3. The correlation coefficient is independent of the units of measurement
4. The correlation coefficient is taken as zero when one of the variables is constant (strictly, r is
then undefined, because S(X,X) or S(Y,Y) is zero)
5. The correlation coefficient is independent of change of origin and scale, i.e. it
remains unchanged when a constant is added to or subtracted from the values of one or both
variables, or when they are multiplied or divided by a positive constant (a negative multiplier
only reverses the sign of r)
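The properties listed above can be verified on a small data set. This is a minimal sketch with hypothetical values; `pearson_r` is an illustrative helper, not something defined in these notes, and it assumes S(X,Y) is the corrected sum of cross-products.

```python
import math

def pearson_r(x, y):
    """r = S(X,Y) / sqrt(S(X,X) * S(Y,Y)) using deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                    # symmetry: r_xy = r_yx

# Change of origin and (positive) scale on both variables.
u = [2.5 * a + 10 for a in x]
v = [(b - 3) / 4 for b in y]
r_uv = pearson_r(u, v)                    # same as r_xy

# Multiplying one variable by a negative constant flips the sign only.
w = [-2 * a for a in x]
r_wy = pearson_r(w, y)                    # equals -r_xy

print(r_xy, r_yx, r_uv, r_wy)
```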
Example:- The following data represent the wing length and tail length of sparrows

Wing length (X)   Tail length (Y)   XY        X²         Y²
10.4              7.4               76.96     108.16     54.76
10.8              7.6               82.08     116.64     57.76
11.1              7.9               87.69     123.21     62.41
10.2              7.2               73.44     104.04     51.84
10.3              7.4               76.22     106.09     54.76
10.2              7.1               72.42     104.04     50.41
10.7              7.4               79.18     114.49     54.76
10.5              7.2               75.60     110.25     51.84
10.8              7.8               84.24     116.64     60.84
11.2              7.7               86.24     125.44     59.29
10.6              7.8               82.68     112.36     60.84
11.4              8.3               94.62     129.96     68.89
Total: 128.2      90.8              971.37    1371.32    688.40
$$r = \frac{S(X,Y)}{\sqrt{S(X,X)\,S(Y,Y)}} = 0.866$$
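The calculation for the sparrow data can be reproduced as follows. The sketch assumes the usual computational forms S(X,Y) = ΣXY − (ΣX)(ΣY)/n, S(X,X) = ΣX² − (ΣX)²/n and S(Y,Y) = ΣY² − (ΣY)²/n, which are not written out in this section. Evaluated directly from the twelve tabulated pairs, r comes out near 0.870, marginally different from the 0.866 quoted; the interpretation (a strong positive linear correlation) is the same.

```python
import math

# Wing length (X) and tail length (Y) of the 12 sparrows in the table.
wing = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
tail = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]
n = len(wing)

# Column totals, as in the worked table.
sum_x, sum_y = sum(wing), sum(tail)
sum_xy = sum(x * y for x, y in zip(wing, tail))
sum_x2 = sum(x * x for x in wing)
sum_y2 = sum(y * y for y in tail)

# Corrected sums of squares and cross-products (assumed definitions).
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(sum_x, 1), round(sum_y, 1), round(sum_xy, 2))
print(round(r, 3))
```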
$$t = \frac{r - \rho}{SE(r)} = \frac{0.866 - 0}{0.158} = 5.47, \quad \text{where } SE(r) = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{1 - 0.866^2}{12 - 2}} = 0.158$$
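The t statistic above can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name is illustrative, and the two-sided 5% critical value of 2.228 for 10 degrees of freedom is taken from standard t tables rather than from the text.

```python
import math

def t_for_zero_correlation(r, n):
    """t statistic and SE(r) for testing H0: rho = 0, with n - 2 df."""
    se_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se_r, se_r

t_stat, se_r = t_for_zero_correlation(0.866, 12)

# Assumed critical value: t(0.025, 10 df) = 2.228 from standard tables.
T_CRIT = 2.228
reject_h0 = abs(t_stat) > T_CRIT

print(round(se_r, 3), round(t_stat, 2), reject_h0)
```

Carrying SE(r) unrounded gives t close to 5.48; the value 5.47 in the text reflects rounding in the intermediate steps.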
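$$z = \frac{1}{2}\ln\frac{1+r}{1-r} = \frac{1}{2}\ln\frac{1+0.45}{1-0.45} = 0.48$$
$$z_\rho = \frac{1}{2}\ln\frac{1+\rho}{1-\rho} = \frac{1}{2}\ln\frac{1+0.9}{1-0.9} = 1.47$$
$$SE(z) = \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{20-3}} = 0.2425$$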
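The three quantities above are the ingredients of a test of a non-zero hypothesised correlation (here ρ = 0.9 with r = 0.45 and n = 20). The test statistic itself does not appear at this point in the text, so the sketch below assumes the usual form Z = (z − z_ρ)/SE(z), the same form used later in these notes for the partial correlation.

```python
import math

def fisher_z(r):
    """Fisher's transformation z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

r, rho_0, n = 0.45, 0.9, 20

z = fisher_z(r)              # transformed sample correlation
z_rho = fisher_z(rho_0)      # transformed hypothesised value
se_z = 1 / math.sqrt(n - 3)

# Assumed test statistic (not shown in the text at this point).
z_stat = (z - z_rho) / se_z

print(round(z, 2), round(z_rho, 2), round(se_z, 4), round(z_stat, 2))
```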
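$$\frac{e^{2a}-1}{e^{2a}+1} \le \rho \le \frac{e^{2b}-1}{e^{2b}+1}$$
where
$$a = z - Z_{\alpha/2}\,SE(z) = 1.2562 - (1.96)(0.2773) = 0.7127$$
$$b = z + Z_{\alpha/2}\,SE(z) = 1.2562 + (1.96)(0.2773) = 1.7997$$
$$z = \frac{1}{2}\ln\frac{1+r}{1-r} = \frac{1}{2}\ln\frac{1+0.85}{1-0.85} = 1.2562$$
$$SE(z) = \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{16-3}} = 0.2773$$
$$\frac{e^{1.4254}-1}{e^{1.4254}+1} \le \rho \le \frac{e^{3.5994}-1}{e^{3.5994}+1}$$
$$\frac{3.16}{5.16} \le \rho \le \frac{35.58}{37.58}$$
$$0.61 \le \rho \le 0.95$$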
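The interval above can be reproduced step by step. This is a minimal sketch; `correlation_ci` and `back_transform` are illustrative names, and 1.96 is the 95% normal critical value used in the worked example.

```python
import math

def back_transform(v):
    """Inverse of Fisher's z: (e^(2v) - 1) / (e^(2v) + 1)."""
    return (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)

def correlation_ci(r, n, z_crit=1.96):
    """Confidence limits for rho via Fisher's z transformation."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    se_z = 1 / math.sqrt(n - 3)
    a = z - z_crit * se_z          # lower limit on the z scale
    b = z + z_crit * se_z          # upper limit on the z scale
    return back_transform(a), back_transform(b)

lower, upper = correlation_ci(0.85, 16)
print(round(lower, 2), round(upper, 2))
```

The interval is not symmetric about r = 0.85, which is expected: the limits are symmetric on the z scale and become skewed when transformed back.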
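$$\sigma_{z_1 - z_2} = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}} = \sqrt{0.0526 + 0.0556} = 0.329$$
$$Z = \frac{(z_1 - z_2) - (z_{\rho_1} - z_{\rho_2})}{\sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}} = \frac{0.60699 - 0.32055}{0.329} = 0.87$$
Conclusion: As $Z_{cal} = 0.87 < Z_{0.025} = 1.96$, do not reject $H_0$ and conclude that there is non-...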
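The comparison of two correlation coefficients can be sketched as below. The sample sizes are not stated in the surviving text; n1 = 22 and n2 = 21 are inferred from the terms 1/(n1 − 3) = 0.0526 and 1/(n2 − 3) = 0.0556 and should be treated as an assumption. The function name is illustrative.

```python
import math

def compare_correlations(z1, z2, n1, n2):
    """Z statistic for H0: rho1 = rho2, given Fisher-transformed z1 and z2."""
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se_diff, se_diff

# z1 and z2 are the transformed sample correlations from the worked example;
# n1 and n2 are inferred values (assumption, see the note above).
z_stat, se_diff = compare_correlations(0.60699, 0.32055, 22, 21)

reject_h0 = abs(z_stat) >= 1.96
print(round(se_diff, 3), round(z_stat, 2), reject_h0)
```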
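$$Z = \frac{z - z_\rho}{SE(z)} = \frac{0.5627 - 0.6931}{0.4082} = -0.32$$
where
$$z = \frac{1}{2}\ln\frac{1+r_{12.3}}{1-r_{12.3}} = \frac{1}{2}\ln\frac{1+0.51}{1-0.51} = 0.5627$$
$$z_\rho = \frac{1}{2}\ln\frac{1+\rho_{12.3}}{1-\rho_{12.3}} = \frac{1}{2}\ln\frac{1+0.6}{1-0.6} = 0.6931$$
$$SE(z) = \frac{1}{\sqrt{n-k-3}} = \frac{1}{\sqrt{10-1-3}} = 0.4082$$
4) Decision Rule:- Reject $H_0$ if $Z_{cal} \ge Z_{\alpha/2} = 1.96$ or $Z_{cal} \le -Z_{\alpha/2} = -1.96$
5) Result:- Since $Z_{cal} = -0.32$ lies between $-1.96$ and $1.96$, do not reject $H_0$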
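The partial-correlation test above can be checked numerically. This is a sketch only: the function name is illustrative, k is taken to be the number of variables held constant (k = 1 here), and the inputs r = 0.51, ρ = 0.6 and n = 10 are read from the worked example.

```python
import math

def partial_corr_z_test(r_partial, rho_0, n, k):
    """Z statistic for H0: rho_partial = rho_0, with SE = 1 / sqrt(n - k - 3)."""
    def fisher_z(v):
        return 0.5 * math.log((1 + v) / (1 - v))
    se_z = 1 / math.sqrt(n - k - 3)
    return (fisher_z(r_partial) - fisher_z(rho_0)) / se_z, se_z

z_cal, se_z = partial_corr_z_test(0.51, 0.6, 10, 1)

# Two-sided decision rule at the 5% level.
reject_h0 = z_cal >= 1.96 or z_cal <= -1.96
print(round(se_z, 4), round(z_cal, 2), reject_h0)
```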
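$$\frac{e^{2a}-1}{e^{2a}+1} \le \rho_{12.3} \le \frac{e^{2b}-1}{e^{2b}+1}, \quad \text{where } a = z - Z_{\alpha/2}\,SE(z), \; b = z + Z_{\alpha/2}\,SE(z)$$
a = 0.5627 − (1.96)(0.41) = −0.2409
b = 0.5627 + (1.96)(0.41) = 1.3663
$$\frac{e^{-0.4818}-1}{e^{-0.4818}+1} \le \rho_{12.3} \le \frac{e^{2.7326}-1}{e^{2.7326}+1}$$
$$\frac{-0.3823}{1.6176} \le \rho_{12.3} \le \frac{14.3728}{16.3728}$$
$$-0.2363 \le \rho_{12.3} \le 0.8778$$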
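The same interval can be obtained more compactly, because Fisher's transformation is the inverse hyperbolic tangent and the back-transformation (e^{2v} − 1)/(e^{2v} + 1) equals tanh(v). The sketch below is illustrative: the function name and the optional `se` argument are assumptions, added so that the rounded standard error 0.41 used in the worked example can be compared with the unrounded value 1/√(n − k − 3).

```python
import math

def partial_corr_ci(r_partial, n, k, z_crit=1.96, se=None):
    """Confidence limits for a partial correlation via Fisher's z.

    se defaults to 1 / sqrt(n - k - 3); pass a value to mimic hand rounding.
    """
    z = math.atanh(r_partial)            # same as 0.5 * ln((1 + r) / (1 - r))
    if se is None:
        se = 1 / math.sqrt(n - k - 3)
    a, b = z - z_crit * se, z + z_crit * se
    return math.tanh(a), math.tanh(b)    # tanh(v) = (e^(2v) - 1) / (e^(2v) + 1)

# With SE rounded to 0.41, as in the worked example.
lo_text, hi_text = partial_corr_ci(0.51, 10, 1, se=0.41)

# With the unrounded SE.
lo_exact, hi_exact = partial_corr_ci(0.51, 10, 1)

print(round(lo_text, 4), round(hi_text, 4))
print(round(lo_exact, 4), round(hi_exact, 4))
```

Both versions of the interval contain the hypothesised value 0.6, which agrees with the decision not to reject H0.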
Semi-Partial correlation
Measures the correlation between two variables with the effect of other variables removed from
ONE variable only
$$r_{1(2.3)} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{23}^2}}$$
where $r_{1(2.3)}$ is the correlation between X1 and X2 with the effect of X3 removed from X2 only
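The semi-partial formula is easy to evaluate once the three simple correlations are known. No numerical example is given in the text, so the values below are hypothetical. For comparison, the sketch also includes the standard first-order partial correlation formula, which is not stated in this part of the notes and is included here as an assumption.

```python
import math

def semipartial_r(r12, r13, r23):
    """r_1(2.3): X3 removed from X2 only."""
    return (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)

def partial_r(r12, r13, r23):
    """r_12.3: X3 removed from both X1 and X2 (standard formula, assumed)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical simple correlations, for illustration only.
r12, r13, r23 = 0.6, 0.5, 0.4

sp = semipartial_r(r12, r13, r23)
pr = partial_r(r12, r13, r23)
print(round(sp, 4), round(pr, 4))
```

Because the partial correlation divides by the additional factor √(1 − r13²), which is at most 1, the semi-partial correlation is never larger in absolute value than the corresponding partial correlation.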
$$F = \frac{R_{1.23}^2\,(n-1-k)}{k\,(1-R_{1.23}^2)} = \frac{(0.99)^2\,(12-1-2)}{2\,(1-0.99^2)} = 221.63$$
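The F statistic for the multiple correlation coefficient can be evaluated directly. This is a minimal sketch with an illustrative function name; R = 0.99, n = 12 and k = 2 are read from the worked example, and the degrees of freedom are taken to be k and n − 1 − k. Direct evaluation gives a value of about 221.6.

```python
def multiple_r_f(r_multiple, n, k):
    """F = R^2 (n - 1 - k) / (k (1 - R^2)), with k and n - 1 - k df."""
    r_squared = r_multiple ** 2
    return (r_squared * (n - 1 - k)) / (k * (1 - r_squared))

f_stat = multiple_r_f(0.99, 12, 2)
df_numerator, df_denominator = 2, 12 - 1 - 2

print(round(f_stat, 2), df_numerator, df_denominator)
```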