So far we have confined our discussion to univariate distributions only, i.e., the distributions involving only one variable, and we also saw how the various measures of central tendency, dispersion, skewness and kurtosis can be used for the purposes of comparison and analysis. We may, however, come across certain series where each item of the series may assume the values of two or more variables. If we measure the heights and weights of a certain group of persons, we shall get a series in which each unit (individual) of the series assumes two values, one relating to height and the other relating to weight. Such a distribution, in which each unit of the series assumes two values, is called a bivariate distribution. Further, if we measure more than two variables on each unit of a distribution, it is called a multivariate distribution. In a bivariate series, the units on which different measurements are taken may be of almost any nature, such as different individuals, times, places, etc. For example, we may have:
(i) The series of marks of individuals in two subjects in an examination.
(ii) The series of sales revenue and advertising expenditure of different companies in a particular year.
(iii) The series of exports of raw cotton (in crores of rupees) and imports of manufactured goods over a number of years, say from 1989 to 1994.
(iv) The series of ages of husbands and wives in a sample of selected married couples, and so on.
Thus in a bivariate distribution we are given a set of pairs of observations, each pair consisting of one value of each of the two variables. In a bivariate distribution, we may be interested to find out whether there is any relationship between the two variables under study. Correlation is a statistical tool which studies the relationship between two variables, and correlation analysis involves the various methods and techniques used for studying and measuring the extent of the relationship between the two variables.
WHAT THEY SAY ABOUT CORRELATION: SOME DEFINITIONS AND USES
"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation." (Croxton and Cowden)
"Correlation is an analysis of the covariation between two or more variables." (A.M. Tuttle)
"Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilising forces may become effective." (W.A. Neiswanger)
"The effect of correlation is to reduce the range of uncertainty of our prediction." (Tippett)
Two variables are said to be correlated if the change in one variable results in a corresponding change in the other variable.
TYPES OF CORRELATION
(a) POSITIVE AND NEGATIVE CORRELATION
If the values of the two variables deviate in the same direction, i.e., if the increase in the values of one variable results, on an average, in a corresponding increase in the values of the other variable, or if a decrease in the values of one variable results, on an average, in a corresponding decrease in the values of the other variable, correlation is said to be positive or direct. Some examples of series of positive correlation are:
(i) Heights and weights.
(ii) The family income and expenditure on luxury items.
(iii) Amount of rainfall and yield of crop (up to a point).
(iv) Price and supply of a commodity, and so on.
On the other hand, correlation is said to be negative or inverse if the variables deviate in the opposite direction, i.e., if the increase (decrease) in the values of one variable results, on the average, in a corresponding decrease (increase) in the values of the other variable. Some examples of negative correlation are the series relating to:
(i) Price and demand of a commodity.
(ii) Volume and pressure of a perfect gas.
(iii) Sale of woollen garments and the day temperature, and so on.
(b) LINEAR AND NON-LINEAR CORRELATION
The correlation between two variables is said to be linear if, corresponding to a unit change in one variable, there is a constant change in the other variable over the entire range of the values. For example, let us consider the following data:
x   1   2   3   4   5
y   5   7   9   11  13
Thus for a unit change in the value of x, there is a constant change, viz., 2, in the corresponding values of y. Mathematically, the above data can be expressed by the relation y = 2x + 3.
In general, two variables x and y are said to be linearly related if there exists a relationship of the form
y = a + bx   ...(*)
between them. But we know that (*) is the equation of a straight line with slope b which makes an intercept a on the y-axis [cf. the y = mx + c form of the equation of a line]. Hence, if the values of the two variables are plotted as points in the xy-plane, we shall get a straight line. This can be easily checked for the example given above. Such phenomena occur frequently in physical sciences, but in economics and social sciences we very rarely come across data which give a straight-line graph.
The relationship between two variables is said to be non-linear or curvilinear if, corresponding to a unit change in one variable, the other variable does not change at a constant rate but at a fluctuating rate. In such cases, if the data are plotted on the xy-plane, we do not get a straight line. Mathematically speaking, the correlation is said to be non-linear if the slope of the plotted curve is not constant. Such phenomena are common in the data relating to economics and social sciences. Since the techniques for the analysis and measurement of non-linear relationships are far more complicated and tedious than the methods of studying and measuring linear relationships, we generally assume that the relationship between the two variables under study is linear.
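The distinction between linear and non-linear relationships can be checked numerically by looking at the change in y per unit change in x between consecutive points: if it is the same everywhere, the points lie on one straight line y = a + bx. The Python sketch below uses the data of the example above; the function names and the curvilinear series y = x² are hypothetical illustrations, not taken from the text, and the x-values are assumed to be distinct.

```python
import math

def slopes(xs, ys):
    """Change in y per unit change in x between consecutive points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

def is_linear(xs, ys, tol=1e-9):
    """True if every consecutive slope equals the first one (within tol)."""
    s = slopes(xs, ys)
    return all(math.isclose(v, s[0], rel_tol=tol, abs_tol=tol) for v in s)

def fit_line(xs, ys):
    """Return (a, b) in y = a + b*x for data that are exactly linear."""
    if not is_linear(xs, ys):
        raise ValueError("the points do not lie on one straight line")
    b = slopes(xs, ys)[0]
    a = ys[0] - b * xs[0]
    return a, b

x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]                 # data from the example in the text
print(slopes(x, y))                   # [2.0, 2.0, 2.0, 2.0]
print(fit_line(x, y))                 # (3.0, 2.0), i.e. y = 3 + 2x

y_curved = [xi ** 2 for xi in x]      # hypothetical curvilinear series y = x^2
print(slopes(x, y_curved))            # [3.0, 5.0, 7.0, 9.0]: slope keeps changing
print(is_linear(x, y_curved))         # False
```

For the linear data the slope stays at 2 and the intercept comes out as 3, matching y = 2x + 3; for the curved series the slope changes from one interval to the next, which is exactly the non-linear case described above.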
In this chapter, we shall confine ourselves to the measurement of linear relationship only; the measurement of non-linear relationship is beyond the scope of this book.
The study of correlation is easy in physical sciences since, on the basis of experimental results, we can establish a mathematical relationship between two or more variables under study. But in social sciences it is very difficult to establish a mathematical relationship between the variables under study, since in such phenomena the values of the variables are affected by a multiplicity of factors and it is extremely difficult, sometimes impossible, to study the effect of each factor separately. Hence, in the data relating to social and economic phenomena, the study of correlation cannot be as accurate and precise.
Correlation and Causation. Correlation analysis enables us to have an idea about the degree of the relationship between the two variables under study. However, it fails to reflect upon the cause and effect relationship between the variables. In a bivariate distribution, if the variables have a cause and effect relationship, they are bound to vary in sympathy with each other and, therefore, there is bound to be a high degree of correlation between them. In other words, causation always implies correlation. However, the converse is not true, i.e., even a fairly high degree of correlation between the two variables need not imply a cause and effect relationship between them. The high degree of correlation between the variables may be due to the following reasons:
1. Mutual dependence. The phenomena under study may inter-influence each other. Such situations are usually observed in data relating to economic and business situations.
For instance, it is a well-known principle in economics that the price of a commodity is influenced by the forces of supply and demand. If the price of a commodity increases, its demand generally decreases (other factors remaining constant). Here the increased price is the cause and the reduction in demand is the effect. However, a decrease in the demand for a commodity due to emigration of people, or due to fashion or some other factors like changes in the tastes and habits of people, may result in a decrease in its price. Here the cause is the reduced demand and the effect is the reduced price. Accordingly, the two variables may show a good degree of correlation due to the interaction of each on the other, yet it becomes very difficult to isolate the exact cause from the effect.
2. Both the variables being influenced by the same external factors. A high degree of correlation between the two variables may be due to the effect or interaction of a third variable, or a number of variables, on each of these two variables. For example, a fairly high degree of correlation may be observed between the yields per hectare of two crops, say rice and potato, due to the effect of a number of factors like favourable weather conditions, fertilizers used, irrigation facilities, etc., on each of them. But neither of the two is the cause of the other.
3. Pure chance. It may happen that a small randomly selected sample from a bivariate distribution shows a fairly high degree of correlation though, actually, the variables may not be correlated in the population. Such correlation may be attributed to chance fluctuations. Moreover, conscious or unconscious bias on the part of the investigator in the selection of the sample may also result in a high degree of correlation in the sample.
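The second and third reasons can be illustrated with a small simulation. The Python sketch below is not from the text: the "weather" model, the variable names, the sample sizes, the random seed and the 0.8 threshold are all hypothetical choices made only to show the two effects, and no real crop data are involved. It computes Karl Pearson's coefficient r directly from its definition.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for paired data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Reason 2 (common external factor): a hypothetical "weather" index z drives
# both yields; rice never appears in the formula for potato or vice versa.
z = [rng.gauss(0, 1) for _ in range(1000)]
rice = [zi + rng.gauss(0, 0.5) for zi in z]
potato = [zi + rng.gauss(0, 0.5) for zi in z]
r_common = pearson_r(rice, potato)   # near the model value 1 / 1.25 = 0.8

# Reason 3 (pure chance): x and y are generated independently, so their
# population correlation is zero, yet small samples often show a large r.
def share_with_large_r(n, trials, threshold=0.8):
    """Fraction of samples of size n whose |r| exceeds the threshold."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials

share_small = share_with_large_r(n=5, trials=10_000)   # roughly 1 in 10
share_large = share_with_large_r(n=50, trials=2_000)   # essentially never

print(f"common cause r = {r_common:.2f}")
print(f"|r| > 0.8 with n = 5:  {share_small:.3f}")
print(f"|r| > 0.8 with n = 50: {share_large:.3f}")
```

In the first part the two yields are strongly correlated although neither causes the other; in the second part, samples of only five pairs from uncorrelated variables show |r| above 0.8 in about one case in ten, while samples of fifty pairs practically never do.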
In this connection, it may be worthwhile to mention the phenomena where a fairly high degree of correlation may be observed though it is not possible to conceive of them as being causally related. For example, we may observe a high degree of correlation between the size of shoe and the intelligence of a group of persons. Such correlation is called spurious or nonsense correlation. [For details see § 84-2 (ii).]
METHODS OF STUDYING CORRELATION
We shall confine our discussion to the methods of ascertaining only linear relationship between two variables (series). The commonly used methods for studying the correlation between two variables are:
(i) Scatter diagram method.
(ii) Karl Pearson's coefficient of correlation (Covariance method).
(iii) Two-way frequency table (Bivariate correlation method).
(iv) Rank method.
(v) Concurrent deviations method.
8-3. SCATTER DIAGRAM METHOD
A scatter diagram is one of the simplest ways of diagrammatic representation of a bivariate distribution and provides us one of the simplest tools for ascertaining the correlation between two variables. Suppose we are given n pairs of values (x1, y1), (x2, y2), ..., (xn, yn) of two variables X and Y. For example, if the variables X and Y denote height and weight respectively, then the pairs (x1, y1), (x2, y2), ..., (xn, yn) may represent the heights and weights (in pairs) of n individuals. These n points may be plotted as dots (·) in the xy-plane, with the values of X along the x-axis and those of Y along the y-axis. (It is customary to take the dependent variable along the y-axis and the independent variable along the x-axis.) The diagram of dots so obtained is known as a scatter diagram. From the scatter diagram we can form a fairly good, though rough,
idea about the relationship between the two variables. The following points may be borne in mind in interpreting the scatter diagram regarding the correlation between the two variables:
(i) If the points are very dense, i.e., very close to each other, a fairly good amount of correlation may be expected between the two variables. On the other hand, if the points are widely scattered, a poor correlation may be expected between them.
(ii) If the points on the scatter diagram reveal any trend (either upward or downward), the variables are said to be correlated, and if no trend is revealed, the variables are uncorrelated.
(iii) If there is an upward trend rising from the lower left-hand corner and going up to the upper right-hand corner, the correlation is positive, since this reveals that the values of the two variables move in the same direction. If, on the other hand, the points depict a downward trend from the upper left-hand corner to the lower right-hand corner, the correlation is negative, since in this case the values of the two variables move in opposite directions.
(iv) In particular, if all the points lie on a straight line starting from the left bottom and going up to the right top, the correlation is perfect and positive, and if all the points lie on a straight line starting from the left top and coming down to the right bottom, the correlation is perfect and negative.
The various diagrams of the scattered data depict different forms of correlation.
[Figure: scatter diagrams illustrating perfect positive correlation, perfect negative correlation, and a low degree of positive correlation]
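The reading rules above can be mimicked roughly in code. The Python sketch below draws a crude text-mode scatter diagram and classifies the trend from Karl Pearson's coefficient of correlation r, one of the methods listed above, used here as a numeric stand-in for judging the diagram by eye. It is not the book's procedure: the function names, the grid size and the tolerance are hypothetical, and only the direction of the trend and the perfect straight-line case are classified; judging how dense the points are is left to the eye.

```python
import math
import statistics

def scatter_diagram(xs, ys, width=30, height=10):
    """Return rows of a text scatter diagram: x runs left to right,
    y runs bottom to top, one '*' per occupied cell."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = 0 if x_hi == x_lo else round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = 0 if y_hi == y_lo else round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"   # list index 0 is the top of the picture
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]

def describe(xs, ys, tol=1e-9):
    """Classify the trend of the scatter from the sign and size of r."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one variable is constant; r is undefined")
    r = sxy / math.sqrt(sxx * syy)
    if abs(r - 1) < tol:
        return "perfect positive"   # all points on a rising straight line
    if abs(r + 1) < tol:
        return "perfect negative"   # all points on a falling straight line
    if abs(r) < tol:
        return "no linear trend"
    return "positive" if r > 0 else "negative"

x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]        # the linear data used earlier in the chapter
for line in scatter_diagram(x, y):
    print(line)
print(describe(x, y))                                # perfect positive
print(describe(x, [13, 11, 9, 7, 5]))                # perfect negative
print(describe(x, [2, 1, 4, 3, 6]))                  # positive (hypothetical data)
print(describe([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # no linear trend
```

The last example is worth noting: there y = x² exactly, yet r is 0, because r (and this classification) measures only linear relationship, which is the kind of relationship this chapter confines itself to.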

