So far we have confined our discussion to 'univariate distributions' only i.e., the distributions involving
only one variable, and also saw how the various measures of central tendency, dispersion, skewness and
kurtosis can be used for the purposes of comparison and analysis. We may, however, come across certain
series where each item of the series may assume the values of two or more variables. If we measure the
heights and weights of a certain group of persons, we shall get a series in which each unit (individual) of the series
assumes two values, one relating to heights and the other relating to weights. Such distribution, in which
each unit of the series assumes two values, is called a bivariate distribution. Further, if we measure more
than two variables on each unit of a distribution, it is called a multivariate distribution. In a series, the units
on which different measurements are taken may be of almost any nature such as different individuals,
times, places, etc. For example we may have :
(i) The series of marks of individuals in two subjects in an examination.
(ii) The series of sales revenue and advertising expenditure of different companies in a particular year.
(iii) The series of exports of raw cotton in crores of rupees and imports of manufactured goods during a
number of years from 1989 to 1994, say.
(iv) The series of ages of husbands and wives in a sample of selected married couples and so on.
Thus in a bivariate distribution we are given a set of pairs of observations, one value of each pair
being the value of each of the two variables.
In a bivariate distribution, we may be interested to find if there is any relationship between the two
variables under study. Correlation is a statistical tool which studies the relationship between two
variables, and correlation analysis involves various methods and techniques used for studying and
measuring the extent of the relationship between the two variables.
WHAT THEY SAY ABOUT CORRELATION — SOME DEFINITIONS AND USES
"When the relationship is of a quantitative nature, the appropriate statistical tool for
discovering and measuring the relationship and expressing it in a brief formula is known as
correlation." - Croxton and Cowden
"Correlation is an analysis of the covariation between two or more variables." - A.M. Tuttle
"Correlation analysis contributes to the understanding of economic behaviour, aids in
locating the critically important variables on which others depend, may reveal to the economist
the connections by which disturbances spread and suggest to him the paths through which
stabilising forces may become effective." - W.A. Neiswanger
"The effect of correlation is to reduce the range of uncertainty of our prediction." - Tippett
Thus two variables are said to be correlated if the change in one variable results in a corresponding
change in the other variable.
Types of Correlation
(a) POSITIVE AND NEGATIVE CORRELATION
If the values of the two variables deviate in the same direction i.e., if the increase in the values of one
variable results, on an average, in a corresponding increase in the values of the other variable or if a
decrease in the values of one variable results, on an average, in a corresponding decrease in the values of
the other variable, correlation is said to be positive or direct.
FUNDAMENTALS OF STATISTICS
Some examples of series of positive correlation are :
(i) Heights and weights.
(ii) The family income and expenditure on luxury items.
(iii) Amount of rainfall and yield of crop (up to a point).
(iv) Price and supply of a commodity and so on.
On the other hand, correlation is said to be negative or inverse if the variables deviate in the opposite
direction i.e., if the increase (decrease) in the values of one variable results, on the average, in a
corresponding decrease (increase) in the values of the other variable.
Some examples of negative correlation are the series relating to :
(i) Price and demand of a commodity.
(ii) Volume and pressure of a perfect gas.
(iii) Sale of woollen garments and the day temperature, and so on.
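The two definitions above can be illustrated with a minimal sketch. It assumes that the direction of the relationship is judged from the average product of the deviations of the two series from their respective means (the quantity underlying the covariance method listed later in this chapter). The function name `direction_of_correlation` and all figures are hypothetical and serve only as an illustration.

```python
from statistics import mean

def direction_of_correlation(x, y):
    """Return 'positive', 'negative' or 'none' from the average product of deviations."""
    x_bar, y_bar = mean(x), mean(y)
    # Average of (x - x_bar) * (y - y_bar): positive when the two series
    # deviate in the same direction on average, negative when they deviate
    # in opposite directions.
    avg_product = mean((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if avg_product > 0:
        return "positive"
    if avg_product < 0:
        return "negative"
    return "none"

# Hypothetical figures, for illustration only.
price = [10, 12, 14, 16, 18]
supply = [40, 46, 51, 59, 64]   # rises as price rises
demand = [90, 84, 77, 70, 66]   # falls as price rises

print(direction_of_correlation(price, supply))   # positive
print(direction_of_correlation(price, demand))   # negative
```

The sketch reports only the direction of the relationship; it says nothing about its degree.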
(b) LINEAR AND NON-LINEAR CORRELATION
The correlation between two variables is said to be linear if corresponding to a unit change in one
variable, there is a constant change in the other variable over the entire range of the values. For example, let
us consider the following data :
x :   1   2   3   4   5
y :   5   7   9  11  13
Thus for a unit change in the value of x, there is a constant change viz., 2 in the corresponding values of
y. Mathematically, above data can be expressed by the relation
        y = 2x + 3
In general, two variables x and y are said to be linearly related, if there exists a relationship of the form
        y = a + bx                ...(*)
between them. But we know that (*) is the equation of a straight line with slope 'b' and which makes an
intercept 'a' on the y-axis [cf. y = mx + c form of equation of the line]. Hence, if the values of the two
variables are plotted as points in the xy-plane, we shall get a straight line. This can be easily checked for the
example given above. Such phenomena occur frequently in physical sciences but in economics and social
sciences, we very rarely come across the data which give a straight line graph. The relationship between
two variables is said to be non-linear or curvilinear if corresponding to a unit change in one variable, the
other variable does not change at a constant rate but at a fluctuating rate. In such cases if the data are plotted
on the xy-plane, we do not get a straight line curve. Mathematically speaking, the correlation is said to be
non-linear if the slope of the plotted curve is not constant. Such phenomena are common in the data relating
to economics and social sciences.
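The distinction between linear and non-linear relationship can be checked numerically. The following is a minimal sketch, assuming the tabulated values are x = 1, 2, 3, 4, 5 and y = 5, 7, 9, 11, 13 and that no two consecutive x values are equal. The helper names `is_linear` and `fit_exact_line` and the curved series are hypothetical.

```python
def is_linear(x, y, tol=1e-9):
    """True if y changes by a constant amount per unit change in x."""
    # Slope between each pair of consecutive points.
    slopes = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]
    return all(abs(s - slopes[0]) <= tol for s in slopes)

def fit_exact_line(x, y):
    """Return (a, b) with y = a + b*x, assuming the data are exactly linear."""
    b = (y[1] - y[0]) / (x[1] - x[0])   # slope
    a = y[0] - b * x[0]                 # intercept on the y-axis
    return a, b

x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]          # the data of the example above
print(is_linear(x, y))         # True
print(fit_exact_line(x, y))    # (3.0, 2.0), i.e. y = 3 + 2x

y_curved = [1, 4, 9, 16, 25]   # hypothetical non-linear data (y = x squared)
print(is_linear(x, y_curved))  # False: the slope is 3, 5, 7, 9
```

Real data from economics or social sciences will almost never pass such an exact test, which is why the chapter measures the degree of linear relationship instead of testing for a perfect line.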
Since the techniques for the analysis and measurement of non-linear relation are quite complicated and
tedious as compared to the methods of studying and measuring linear relationship, we generally assume
that the relationship between the two variables under study is linear. In this chapter, we shall confine
ourselves to the measurement of linear relationship only. The measurement of non-linear relationship is,
however, beyond the scope of this book.
The study of correlation is easy in physical sciences since on the basis of experimental
results, it is easy to establish mathematical relationship between two or more variables under study. But in
social and economic sciences, it is very difficult to establish mathematical relationship between the
variables under study since in such phenomena, the values of the variables under study are affected
simultaneously by multiplicity of factors and it is extremely difficult, sometimes impossible, to study the
effects of each factor separately. Hence, in the data relating to social and economic phenomena, the study
of correlation cannot be as accurate and precise.
CORRELATION ANALYSIS
8-2. CORRELATION AND CAUSATION
Correlation analysis enables us to have an idea about the degree of relationship between the two
variables under study. However, it fails to reflect upon the
cause and effect relationship between the variables. In a bivariate distribution, if the variables have the
cause and effect relationship, they are bound to vary in sympathy with each other and, therefore, there is
bound to be a high degree of correlation between them. In other words, causation always implies
correlation. However, the converse is not true i.e., even a fairly high degree of correlation between the two
variables need not imply a cause and effect relationship between them. The high degree of correlation
between the variables may be due to the following reasons :
1. Mutual dependence. The phenomena under study may inter-influence each other. Such situations are
usually observed in data relating to economic and business situations. For instance, it is a well-known
principle in economics that prices of a commodity are influenced by the forces of supply and demand. For
instance, if the price of a commodity increases, its demand generally decreases (other factors remaining
constant). Here increased price is the cause and reduction in demand is the effect. However, a decrease in
the demand of a commodity due to emigration of the people or due to fashion or some other factors like
changes in the tastes and habits of people may result in decrease in its price. Here, the cause is the reduced
demand and the effect is the reduced price. Accordingly, the two variables may show a good degree of
correlation due to interaction of each on the other, yet it becomes very difficult to isolate the exact cause
from the effect.
2. Both the variables being influenced by the same external factors. A high degree of correlation
between the two variables may be due to the effect or interaction of a third variable or a number of
variables on each of these two variables. For example, a fairly high degree of correlation may be observed
between the yield per hectare of two crops, say, rice and potato, due to the effect of a number of factors like
favourable weather conditions, fertilizers used, irrigation facilities, etc., on each of them. But none of the
two is the cause of the other.
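A small simulation makes this point concrete. The sketch below is an assumption-laden illustration, not real agricultural data: a hypothetical weather index drives two hypothetical yield series, and neither series enters the formula for the other. The strength of the relationship is measured with Pearson's coefficient of correlation, using the standard product-moment formula, which is assumed here since the chapter has only named the method so far.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson's coefficient of correlation for two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

random.seed(0)  # fixed seed so that the sketch is repeatable

# Hypothetical common factor (a weather index between 0 and 1) for 200 plots.
weather = [random.uniform(0, 1) for _ in range(200)]

# Each yield depends on the weather plus its own independent disturbance;
# the rice yield is never used to compute the potato yield, or vice versa.
rice = [2.0 * w + random.gauss(0, 0.1) for w in weather]
potato = [3.0 * w + random.gauss(0, 0.1) for w in weather]

r = pearson_r(rice, potato)
print(round(r, 2))  # close to +1 although neither yield causes the other
```

The coefficients 2.0 and 3.0 and the size of the disturbance are arbitrary choices; a weaker common factor would give a lower, but still positive, correlation.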
3. Pure chance. It may happen that a small randomly selected sample from a bivariate distribution may
show a fairly high degree of correlation though, actually, the variables may not be correlated in the
population. Such correlation may be attributed to chance fluctuations. Moreover, the conscious or
unconscious bias on the part of the investigator, in the selection of the sample may also result in high
degree of correlation in the sample. In this connection, it may be worthwhile to make a mention of the two
phenomena where a fairly high degree of correlation may be observed though it is not possible to conceive
them as being causally related. For example, we may observe a high degree of correlation between the size
of shoe and the intelligence of a group of persons. Such correlation is called spurious or non-sense
correlation. [For details see § 84-2 (ii).]
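The role of chance in small samples can be seen in a short simulation. This is a sketch under stated assumptions: the two variables are drawn independently from a standard normal population (so the population correlation is zero), Pearson's coefficient is computed with the standard product-moment formula, and the threshold 0.8, the sample sizes and the function name `share_of_high_r` are arbitrary illustrative choices.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson's coefficient of correlation for two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def share_of_high_r(sample_size, trials=2000, threshold=0.8, seed=1):
    """Fraction of samples of two independent variables with |r| above threshold."""
    rng = random.Random(seed)
    high = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(sample_size)]
        y = [rng.gauss(0, 1) for _ in range(sample_size)]  # drawn independently of x
        if abs(pearson_r(x, y)) > threshold:
            high += 1
    return high / trials

small = share_of_high_r(sample_size=5)     # very small samples
large = share_of_high_r(sample_size=100)   # much larger samples
print(small, large)
```

With samples of only five pairs, a noticeable share of the trials shows a numerically high coefficient even though the variables are unrelated in the population, while with one hundred pairs this practically never happens.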
METHODS OF STUDYING CORRELATION
We shall confine our discussion to the methods of ascertaining only linear relationship between two
variables (series). The commonly used methods for studying the correlation between two variables are :
(i) Scatter diagram method.
(ii) Karl Pearson's coefficient of correlation (Covariance method).
(iii) Two-way frequency table (Bivariate correlation method).
(iv) Rank method.
(v) Concurrent deviations method.
8-3. SCATTER DIAGRAM METHOD
Scatter diagram is one of the simplest ways of diagrammatic representation of a bivariate distribution
and provides us one of the simplest tools of ascertaining the correlation between two variables. Suppose we are
given n pairs of values (x1, y1), (x2, y2), ..., (xn, yn) of two variables X and Y. For example, if the variables
X and Y denote the height and weight respectively, then the pairs (x1, y1), (x2, y2), ..., (xn, yn) may represent
the heights and weights (in pairs) of n individuals. These n points may be plotted as dots (.) on the x-axis
and y-axis in the xy-plane. (It is customary to take the dependent variable along the y-axis and independent
variable along the x-axis.) The diagram of dots so obtained is known as scatter diagram. From scatter
diagram we can form a fairly good, though rough, idea about the relationship between the two variables.
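The construction of a scatter diagram can be sketched in a few lines. The example below draws the dots on a grid of characters so that it needs no plotting library; the function name `text_scatter`, the grid size and the height and weight figures are all hypothetical. Weight is treated as the dependent variable and is therefore taken along the y-axis.

```python
def text_scatter(x, y, width=20, height=10):
    """Return a list of strings drawing the points (x, y) as '*' on a character grid."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each value to a column (x) and a row (y); this assumes that
        # x and y each take at least two distinct values.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is the top of the picture
    return ["".join(line) for line in grid]

# Hypothetical heights (cm) and weights (kg) of eight individuals.
heights = [150, 155, 160, 162, 168, 170, 175, 180]
weights = [50, 53, 58, 57, 63, 66, 70, 74]

rows = text_scatter(heights, weights)
for line in rows:
    print("|" + line)          # y-axis: weight
print("+" + "-" * 20)          # x-axis: height
```

For these figures the dots run from the lower left hand corner to the upper right hand corner of the grid. A graphical plotting library would give a finer picture, but the idea is the same: one dot per pair of observations.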
The following points may be borne in mind in interpreting the scatter diagram regarding the correlation
between the two variables :
(i) If the points are very dense i.e., very close to each other, a fairly good amount of correlation may be
expected between the two variables. On the other hand, if the points are widely scattered, a poor correlation
may be expected between them.
(ii) If the points on the scatter diagram reveal any trend (either upward or downward), the variables are
said to be correlated and if no trend is revealed, the variables are uncorrelated.
(iii) If there is an upward trend rising from the lower left hand corner and going upward to the upper right
hand corner, the correlation is positive since this reveals that the values of the two variables move in the
same direction. If, on the other
hand, the points depict a downward trend from the upper left hand corner
to the lower right hand corner, the correlation is negative since in this case the values of the two variables
move in the opposite directions.
(iv) In particular, if all the points lie on a straight line starting from the left bottom and going up
towards the right top, the correlation is perfect and positive, and if all the points lie on a straight line
starting from left top and coming down to right bottom, the correlation is perfect and negative.
The following diagrams of the scattered data depict different forms of correlation.
[Figure: scatter diagrams with panels PERFECT POSITIVE CORRELATION, PERFECT NEGATIVE CORRELATION and LOW DEGREE OF POSITIVE CORRELATION]