
Correlation Analysis

Correlation means a statistical relationship between two random variables, irrespective of
whether these variables are related by a causal (i.e., cause and effect) relation. A familiar
example is the relation between the weights and the heights of adult men: the greater the
height, the greater the weight tends to be. This is an example of a positive correlation,
where one variable tends to increase as the other increases. There are examples of negative
correlation as well, such as the one between the supply of a commodity and its selling
price. An important characteristic of this type of relationship is the absence of a definite
inherent functional relation between the variables, unlike what is assumed in least-squares
curve fitting. Thus, it would be futile here to look for an exact functional dependence of a
man’s weight on his height. Let’s now come to the other question, that of a causal relation. The
above two examples have understandable cause and effect relations within them, but correlations
may also arise without any such causal relation. An example is the annual alcohol production of
a state and the number of degrees awarded there per year: the positive correlation generally
found here may simply be the result of both quantities increasing independently with the passage
of time!
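
The effect of a shared time trend can be illustrated with a small sketch. The figures below are invented purely for illustration (they are not real production or graduation data), and the helper name `pearson_r` is hypothetical. Two series that each grow steadily with time show a high r, even though their year-to-year fluctuations are unrelated to each other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

years = range(10)
# Hypothetical fluctuations, chosen so that they are uncorrelated with each other.
wiggle_a = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
wiggle_b = [2, 2, -2, -2, 2, 2, -2, -2, 2, 2]

# Both invented series rise with time, but neither drives the other.
alcohol = [50 + 3 * t + w for t, w in zip(years, wiggle_a)]
graduates = [200 + 10 * t + w for t, w in zip(years, wiggle_b)]

r_series = pearson_r(alcohol, graduates)    # dominated by the common trend
r_wiggles = pearson_r(wiggle_a, wiggle_b)   # the fluctuations alone

print(f"r between the two series:        {r_series:.3f}")
print(f"r between the fluctuations only: {r_wiggles:.3f}")
```

In this made-up data the first value comes out close to 1 while the second is essentially zero, which is the pattern expected when the correlation is produced only by the common trend.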

A correlation is said to be positive if, as one variable increases, the other variable also tends
to increase. Similarly, a correlation is said to be negative if, as one variable increases, the
other variable tends to decrease. Examples of both are given above. Two variables are said to have
nil correlation if, as one variable increases, the other tends neither to increase nor to
decrease definitively. The degree or extent of correlation between two variables is measured
quantitatively using parameters called correlation coefficients. The most common correlation
coefficient is the Pearson correlation coefficient r (also called the Pearson product-moment
correlation coefficient), as defined below. The figure beside (Courtesy: Wikipedia) shows
various cases of positive, nil and negative correlations, with the values of the coefficient r
given in the top row. Note that r can vary from 1 to –1, indicating positive (1 ≥ r > 0),
practically nil (r ≈ 0) or negative (–1 ≤ r < 0) correlation.
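
The definition is commonly written in the following form. This is a sketch that assumes s_x and s_y denote sample standard deviations computed with the divisor n − 1; if they are computed with the divisor n instead, the leading factor 1/(n − 1) becomes 1/n and the value of r is unchanged.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
      \left( \frac{x_i - \bar{x}}{s_x} \right)
      \left( \frac{y_i - \bar{y}}{s_y} \right)
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```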

Here x̄ and ȳ are the mean values of the variables x (i.e., of xi) and y (i.e., of yi) respectively, n
is the number of observations, while sx and sy are the standard deviations of x and y respectively.

In practice, however, the following equivalent relation is used to easily calculate the value of r:

r = {n Σ xi yi – (Σ xi)(Σ yi)}/√{(n Σ xi² – (Σ xi)²) x (n Σ yi² – (Σ yi)²)}

As an illustrative example, let us consider the following data of heights and weights for 8 men:

Height (cm) 165 167 170 172 173 176 180 183
Weight (kg) 62 60 64 65 65 68 73 72

To calculate the coefficient r here, let us calculate the sums of {xi}, {yi}, {xi²}, {xi yi} and {yi²}
using the following table (we note that here the number of points n = 8):

Point No.   Height in cm (xi)   Weight in kg (yi)   xi²      xi yi    yi²
1           165                 62                  27225    10230    3844
2           167                 60                  27889    10020    3600
3           170                 64                  28900    10880    4096
4           172                 65                  29584    11180    4225
5           173                 65                  29929    11245    4225
6           176                 68                  30976    11968    4624
7           180                 73                  32400    13140    5329
8           183                 72                  33489    13176    5184
TOTAL       1386                529                 240392   91839    35127
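
The column totals of this table can be checked with a few lines of code. This is a minimal sketch; the variable names are illustrative and not part of the original text.

```python
heights = [165, 167, 170, 172, 173, 176, 180, 183]  # xi, in cm
weights = [62, 60, 64, 65, 65, 68, 73, 72]          # yi, in kg

n = len(heights)                                        # number of points
sum_x = sum(heights)                                    # sum of xi
sum_y = sum(weights)                                    # sum of yi
sum_xx = sum(x * x for x in heights)                    # sum of xi squared
sum_xy = sum(x * y for x, y in zip(heights, weights))   # sum of xi * yi
sum_yy = sum(y * y for y in weights)                    # sum of yi squared

print(n, sum_x, sum_y, sum_xx, sum_xy, sum_yy)
# prints: 8 1386 529 240392 91839 35127
```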

Thus, we get: n = 8, Σ xi = 1386, Σ yi = 529, Σ xi² = 240392, Σ xi yi = 91839, Σ yi² = 35127

So, r = {8 x 91839 – 1386 x 529}/√{(8 x 240392 – 1386 x 1386) x (8 x 35127 – 529 x 529)}
= 1518/√{2140 x 1175}
= 0.957
Thus it is a case of strong positive correlation, with the correlation coefficient r ≈ 0.96, close to its maximum value of 1.
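
The same calculation can be sketched in code as a check on the arithmetic. The function name `pearson_r_from_sums` is an illustrative choice, not something defined in the text; it takes the number of points and the five sums and applies the computational relation used above.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_xx, sum_xy, sum_yy):
    """Pearson r from the number of points and the five sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

heights = [165, 167, 170, 172, 173, 176, 180, 183]  # xi, in cm
weights = [62, 60, 64, 65, 65, 68, 73, 72]          # yi, in kg

r = pearson_r_from_sums(
    n=len(heights),
    sum_x=sum(heights),
    sum_y=sum(weights),
    sum_xx=sum(x * x for x in heights),
    sum_xy=sum(x * y for x, y in zip(heights, weights)),
    sum_yy=sum(y * y for y in weights),
)

print(round(r, 3))  # prints 0.957
```

Working from the sums avoids computing the means first, which is why this form is convenient for hand calculation.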

[To get an idea of how the two variables vary with each other in this case of strong positive
correlation, we may view the X-Y scatter plot for this data, as found below:]
