IV. (A) Correlation
Correlation analysis
Correlation analysis is the statistical tool that we can use to describe the degree to which
one variable is linearly related to another. Frequently, correlation analysis is used in conjunction
with regression analysis to measure how well the least squares line fits the data. Correlation
analysis can also be used by itself, however, to measure the degree of association between two
variables.
In this section we present two measures for describing the correlation between two variables: the
coefficient of determination and the coefficient of correlation.
According to A. M. Tuttle, “an analysis of covariation of two or more variables is usually called
correlation”.
The coefficient of correlation is a measure of the strength of the linear relationship between two
variables. It is generally denoted by r.
Types of Correlation
Correlation is described or classified in several different ways. Three of the most important are:
a. Positive and negative;
b. Simple, partial and multiple; and
c. Linear and non-linear.
Generally, there are three types of correlation. They are:
i. Simple correlation
ii. Partial correlation and
iii. Multiple correlation.
The distinction between simple, partial and multiple correlation is based upon the number of variables
studied. When only two variables are studied it is a problem of simple correlation. When three or
more variables are studied it is a problem of either multiple or partial correlation. In multiple
correlation three or more variables are studied simultaneously. For example, when we study the
relationship between the yield of rice per acre and both the amount of rainfall and the amount of
fertilizers used, it is a problem of multiple correlation. Similarly, the relationship of plastic
hardness, temperature and pressure is multivariate. In partial correlation we recognize more than
two variables but consider only two of them to be influencing each other, the effect of the other
influencing variables being held constant. For example, in the rice problem taken above, if we limit
our correlation analysis of yield and rainfall to periods when a certain average daily temperature
existed, it becomes a problem of partial correlation.
N.B.: Here we shall discuss only simple correlation.
Simple Correlation
The coefficient of correlation
Definition: Karl Pearson's product moment coefficient of correlation (or simply, the coefficient
of correlation) r is a measure of the strength of the linear relationship between two variables x and
y. It is computed (for a sample of n measurements on x and y) as follows:
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \]
where x = The values of the first variable
y = The values of the second variable
$\bar{x}$ = The mean of the x variable
$\bar{y}$ = The mean of the y variable
n = The number of observations
The numerator $\sum (x - \bar{x})(y - \bar{y})$ determines the direction of the movement, i.e., the nature of
correlation (positive or negative), and the magnitude of the co-efficient. The value of
$\sum (x - \bar{x})(y - \bar{y})$ will be positive if large values of one series occur with the large values of the other
and small values go with small values, and in such a case r will be positive. Similarly, if the value
of $\sum (x - \bar{x})(y - \bar{y})$ is negative, the value of r will be negative, indicating negative correlation.
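As an illustration of the definition above, the following is a minimal Python sketch that computes r directly from the deviations about the means. The function name `pearson_r` and the small data set are made up for this example and do not come from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (illustrative sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations; its sign is the sign of r.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations of each series.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a series is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: large x tends to go with large y, so r is positive.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746

# Reversing y makes large x go with small y, so r changes sign.
print(round(pearson_r(x, y[::-1]), 4))  # -0.7746
```

Here the sum of products of deviations is 6 for the first pair of series and -6 for the second, which is what fixes the sign of r in each case.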
Formula:
Computation of simple co-efficient of Correlation
(b) Short-cut Method: The co-efficient of correlation can also be calculated by the short-cut
method, which takes deviations from assumed means and is easier to work by hand. The formulae for
the correlation co-efficient according to the short-cut method are:
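i. \[ r = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}} \; \sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}} \]
ii. \[ r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - (\sum d_x)^2} \; \sqrt{n \sum d_y^2 - (\sum d_y)^2}} \]
iii. \[ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}} \; \sqrt{\sum y^2 - \dfrac{(\sum y)^2}{n}}} \]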
where $d_x = x - A_x$ = The deviation of x from $A_x$
$d_y = y - A_y$ = The deviation of y from $A_y$
$A_x$ = The assumed mean (an arbitrary value) in the x series.
$A_y$ = The assumed mean (an arbitrary value) in the y series.
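The short-cut method works because r does not depend on the choice of assumed means. The Python sketch below implements the second short-cut formula (the one with n multiplied through) and checks this on made-up data. The function name `pearson_r_shortcut` and the numbers are illustrative assumptions, not part of the original notes.

```python
import math

def pearson_r_shortcut(xs, ys, a_x, a_y):
    """r from deviations about assumed means a_x and a_y (illustrative sketch)."""
    n = len(xs)
    dx = [x - a_x for x in xs]  # d_x = x - A_x
    dy = [y - a_y for y in ys]  # d_y = y - A_y
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = (math.sqrt(n * sum_dx2 - sum_dx ** 2)
                   * math.sqrt(n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

# Hypothetical data; any assumed means should give the same r.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_a = pearson_r_shortcut(x, y, a_x=2, a_y=5)
r_b = pearson_r_shortcut(x, y, a_x=0, a_y=0)  # raw values, as in formula iii
print(round(r_a, 4), round(r_b, 4))  # 0.7746 0.7746
```

Taking both assumed means as zero reduces the deviations to the raw values, which is the third formula in the list.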
2. Computation of r from Grouped Data
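which may also be written as
\[ r = \frac{\sum xy}{n \, \sigma_x \sigma_y} \]
where $x = X - \bar{x}$ and $\bar{x} = \dfrac{\sum X}{n}$
$y = Y - \bar{y}$ and $\bar{y} = \dfrac{\sum Y}{n}$
X = The mid-value of the class interval of the x variable
Y = The mid-value of the class interval of the y variable
\[ \sigma_x = \sqrt{\frac{\sum x^2}{n}}, \qquad \sigma_y = \sqrt{\frac{\sum y^2}{n}} \]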
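The standard-deviation form of r can be sketched in Python as follows. This is only an illustration under a stated assumption: each observation is represented by the pair of mid-values (X, Y) of the classes it falls in, and the function name `r_sigma_form` and the mid-values are invented for the example.

```python
import math

def r_sigma_form(X, Y):
    """r = sum(xy) / (n * sigma_x * sigma_y), with x, y deviations from the means."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    x = [v - x_bar for v in X]  # x = X - mean of X
    y = [v - y_bar for v in Y]  # y = Y - mean of Y
    sigma_x = math.sqrt(sum(v * v for v in x) / n)
    sigma_y = math.sqrt(sum(v * v for v in y) / n)
    return sum(p * q for p, q in zip(x, y)) / (n * sigma_x * sigma_y)

# Hypothetical class mid-values, one (X, Y) pair per observation.
X = [5, 15, 15, 25, 35]
Y = [10, 10, 30, 30, 50]
print(round(r_sigma_form(X, Y), 3))  # 0.891
```

For these numbers the sum of products is 680 and the sums of squares are 520 and 1120, so r = 680 / sqrt(520 × 1120), which agrees with the deviation formula for r.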
(b) Short-cut Method
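The formula for computation of r by the short-cut method from grouped data is given by:
\[ r = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}} \; \sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}} \]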
Other Formulae:
1. The value of the correlation co-efficient lies between -1 and +1.
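\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} = \frac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}} \]
Answer: $|r| \le 1$
Here, $X = x - \bar{x}$ and $Y = y - \bar{y}$, and $\bar{x}$ and $\bar{y}$ are the means of x and y.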
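One way to see why r cannot fall outside the range -1 to +1 is the Cauchy-Schwarz inequality. The notes do not state which argument they intend, so the following is only a sketch of one standard route. It writes X and Y for the deviations of x and y from their means.

```latex
% Cauchy-Schwarz inequality applied to the deviations X and Y:
\[ \Bigl( \sum XY \Bigr)^2 \le \sum X^2 \, \sum Y^2 \]
% Divide both sides by the (positive) right-hand side:
\[ r^2 = \frac{\bigl( \sum XY \bigr)^2}{\sum X^2 \, \sum Y^2} \le 1 \]
% Taking square roots gives the stated bounds:
\[ |r| \le 1, \qquad \text{i.e.} \qquad -1 \le r \le +1 \]
```

Equality holds only when Y is an exact multiple of X, that is, when the points lie on a straight line.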