15 | CORRELATION

15.1 Correlation

In simple regression analysis we obtain a regression equation which can be used to estimate the values of the dependent variable on the basis of an independent variable whose values are known. We now discuss the closely related problem of correlation analysis, in which we study the degree of closeness of the relationship between the variables. If the values of two variables vary in such a way that movements (increases or decreases) in one variable are accompanied by movements (increases or decreases) in the other, the variables are said to be correlated. Thus correlation is the degree of covariation between the variables.

Many examples come to mind. An increase in the amount of rainfall is accompanied, to some extent, by an increase in the yield of wheat or rice; an increase in the heights of children is usually accompanied by an increase in their weights; an increase in the issue of television licences is accompanied by a decrease in the number of cinema-goers; an increase in the temperature during winter is accompanied by a decrease in the sale of warm clothes; etc. In each of the above examples, movements in one variable are accompanied by movements in the other. In some cases the movements of the variables are in the same direction, i.e. an increase in one variable is accompanied by an increase in the other; in such a case the correlation is said to be positive. If the movements of the variables are in opposite directions, the correlation is said to be negative or inverse. Thus in the first two examples the correlation is positive, while it is negative in the last two examples.

It is important to remember that in the case of regression the dependent variable is assumed to be random while the independent variable is assumed to be fixed or known. In the case of correlation, both variables are assumed to be random.
15.2 Correlation and Causation

It is important to note that correlation simply refers to the sympathy of movements in the variables. It does not mean that a movement or change in one variable causes a change in the other. The changes in the variables may be due to some common cause. For example, when we say that the heights and weights of children are correlated, it does not mean that an increase in height causes an increase in weight; the increase in both may be due to a common cause, namely age. As a child grows in age, his height and weight also increase.

15.3 Nature of Correlation

If X and Y denote the two variables under consideration, a scatter diagram shows the location of the points (X, Y) on a rectangular co-ordinate system. If all the points in the scatter diagram seem to lie near a line, as in (a) and (b) of Fig. 15.1, the correlation is said to be linear. If both variables tend to move in the same direction, i.e. Y tends to increase as X increases or Y tends to decrease as X decreases, as in Fig. 15.1(a), the correlation is called positive or direct correlation. On the other hand, if the variables tend to move in opposite directions, i.e. Y tends to decrease as X increases or vice versa, as in Fig. 15.1(b), the correlation is called negative or inverse correlation. If no linear relationship is indicated between the variables, as in Fig. 15.1(c), we say that there is no correlation between them, i.e. they are uncorrelated. If all the points on the scatter diagram tend to lie near a smooth curve, as in Fig. 15.1(d), the correlation is said to be non-linear or curvilinear correlation.

We shall consider linear correlation only. Unless otherwise specified, the term correlation is used to mean linear correlation.

[Fig. 15.1: scatter diagrams. (a) Positive Linear Correlation; (b) Negative Linear Correlation]
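The direction of correlation described above can be illustrated numerically with the product-moment coefficient r, which the chapter defines shortly. The sketch below is an illustration only, not part of the text; the helper name pearson_r and the three small data sets are our own.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation coefficient r (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 5, 7, 9]))   # close to +1: positive (direct) correlation
print(pearson_r(x, [9, 7, 5, 4, 2]))   # close to -1: negative (inverse) correlation
print(pearson_r(x, [3, 8, 1, 9, 4]))   # much nearer 0: little linear correlation
```

The sign of r matches the direction of the scatter: points rising together give a positive r, points moving oppositely give a negative r, and a shapeless scatter gives an r near zero.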
15.4 The Coefficient of Correlation

Just as for measuring the degree of variability a coefficient of variation is employed, which is independent of the absolute units of the given problem, we need a measure of the degree of relationship between two variables that is free from the particular units employed in a given case, i.e. an abstract coefficient. Such a measure is called a coefficient of correlation.

The possible values of the coefficient of correlation, denoted by r, range from −1 to +1. The sign of r is the same as the sign of b in the regression equation. For perfect positive correlation r = +1; for perfect negative correlation r = −1; lesser degrees of correlation yield intermediate values of r. The sample correlation coefficient is defined as

    r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²]

The population correlation coefficient for a bivariate distribution, denoted by ρ, has already been defined as

    ρ = Cov(X, Y) / √[Var(X) · Var(Y)]

For computational purposes we have an alternative form of r as

    r = [ΣXY − (ΣX)(ΣY)/n] / √{[ΣX² − (ΣX)²/n] [ΣY² − (ΣY)²/n]}
      = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²] [nΣY² − (ΣY)²]}

This form is more convenient for computation.

Example 10.5 Calculate the product-moment coefficient of correlation between X and Y from the data:

    X    1    2    3    4    5
    Y    2    5    3    8    7
                                        (P.U., B.A./B.Sc., 1973)

Solution The calculations needed to compute r are given below:

    X     Y     X − X̄   Y − Ȳ   (X − X̄)(Y − Ȳ)   (X − X̄)²   (Y − Ȳ)²
    1     2      −2       −3            6              4           9
    2     5      −1        0            0              1           0
    3     3       0       −2            0              0           4
    4     8       1        3            3              1           9
    5     7       2        2            4              4           4
    15    25      0        0           13             10          26

Here X̄ = 15/5 = 3 and Ȳ = 25/5 = 5. Hence

    r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²] = 13 / √(10 × 26) = 13/16.1 = 0.81

Alternatively, a table of X, Y, XY, X² and Y² is set up, giving ΣX = 15, ΣY = 25, ΣXY = 88, ΣX² = 55 and ΣY² = 151, and the computational form

    r = [ΣXY − (ΣX)(ΣY)/n] / √{[ΣX² − (ΣX)²/n] [ΣY² − (ΣY)²/n]} = 13 / √(10 × 26) = 0.81

gives the same result.

10.5.2 Correlation and Causation

The fact that correlation exists between two variables does not imply any cause-and-effect relationship. Two unrelated variables, such as the sale of bananas and the death rate from cancer in a city, may produce a high positive correlation which may be due to a third, unknown variable (namely, the city population). The larger the city, the more the consumption of bananas and the higher will be the death rate from cancer.
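As a check on Example 10.5, the two algebraically equal forms of r can be evaluated side by side. This is an illustrative Python sketch, not part of the original text.

```python
from math import sqrt

# Data of Example 10.5; the mean-deviation and computational forms should agree
X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]
n = len(X)

# Mean-deviation form: r = S_xy / sqrt(S_xx * S_yy)
mx, my = sum(X) / n, sum(Y) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
sxx = sum((x - mx) ** 2 for x in X)
syy = sum((y - my) ** 2 for y in Y)
r_def = sxy / sqrt(sxx * syy)

# Computational form: r = (nΣXY − ΣXΣY) / sqrt((nΣX² − (ΣX)²)(nΣY² − (ΣY)²))
sX, sY = sum(X), sum(Y)
sXY = sum(x * y for x, y in zip(X, Y))
sX2 = sum(x * x for x in X)
sY2 = sum(y * y for y in Y)
r_comp = (n * sXY - sX * sY) / sqrt((n * sX2 - sX ** 2) * (n * sY2 - sY ** 2))

print(round(r_def, 3), round(r_comp, 3))  # both 0.806, i.e. 0.81 to two places
```

Both forms reproduce r = 13/√260 ≈ 0.81; the computational form avoids working with deviations from the means.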
Clearly, this is a false or merely incidental connection which is the result of a third variable, the city size. Such a false correlation between two unconnected variables is called nonsense or spurious correlation. We should therefore be very careful in interpreting the correlation coefficient as a measure of relationship or interdependence between two variables.

10.5.3 Properties of r

The sample correlation coefficient r has the following properties:

i) The correlation coefficient r is symmetrical with respect to the variables X and Y, i.e. r_xy = r_yx.
ii) The correlation coefficient lies between −1 and +1, i.e. −1 ≤ r ≤ +1.
iii) The correlation coefficient is independent of the origin and scale.

Proof: Let u and v be two new variables defined by u = (X − a)/h and v = (Y − b)/k, so that X = a + hu and Y = b + kv; then r_uv = r_xy.

500 Caravan Elementary Statistics for B.S (Hons)

(ii) r = √((1.5)(0.56)) = √0.84 = 0.9165.

15.6 Correlation of Ranked Data

Sometimes sample data consist of items whose exact magnitude cannot be ascertained, the items being ranked according to size, importance or some other criterion using the numbers 1, 2, ..., n. If the two variables are ranked in such a manner, the coefficient of rank correlation is

    r_s = 1 − 6Σd_i² / [n(n² − 1)]                      (15.10)

where d_i = difference between the ranks of corresponding values of X and Y, and n = number of pairs of values (X, Y) in the data. Formula (15.10) is called Spearman's formula for rank correlation.

Example 15.10 Two judges in a contest, who were ranking the eight candidates A, B, C, D, E, F, G and H in order of their preference, submitted the rankings shown in the following table. Find the coefficient of rank correlation.

                    A   B   C   D   E   F   G   H
    First Judge     5   2   8   1   4   6   3   7
    Second Judge    4   5   7   3   2   8   1   6

Solution The difference of ranks d of the two judges for each candidate is given in the following table. The computations of d² and Σd² are also shown in the table.

    Difference of ranks, d    1   −3   1   −2   2   −2   2   1    Σd = 0
    d²                        1    9   1    4   4    4   4   1    Σd² = 28

    r_s = 1 − 6(28)/[8(8² − 1)] = 1 − 168/504 = 0.67,

which indicates that the judges agreed well in their choices.
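The arithmetic of Example 15.10 can be sketched as follows, applying Formula (15.10) directly to the two judges' rankings. This Python check is an illustration, not part of the text.

```python
judge1 = [5, 2, 8, 1, 4, 6, 3, 7]   # first judge's ranks for candidates A..H
judge2 = [4, 5, 7, 3, 2, 8, 1, 6]   # second judge's ranks
n = len(judge1)

d2 = sum((a - b) ** 2 for a, b in zip(judge1, judge2))   # Σd²
r_s = 1 - 6 * d2 / (n * (n * n - 1))                     # Spearman's formula (15.10)
print(d2, round(r_s, 2))  # 28 0.67
```

With Σd² = 28 and n = 8, the formula reproduces r_s = 1 − 168/504 = 0.67.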
Example 15.11 (a) Compute the coefficient of rank correlation by using Spearman's formula for the following data, giving ranks to the measured quantities. (b) Interpret the result of the Spearman rank coefficient of correlation.

    X    63    62    61    59    43    58    67    76    41
    Y    7.8   7.9   7.6   7.4   7.2   7.0   8.3   8.1   7.1

Solution (a) Ranking each variable from the largest value down gives ranks of X: 3, 4, 5, 6, 8, 7, 2, 1, 9 and ranks of Y: 4, 3, 5, 6, 7, 9, 1, 2, 8, so that d = −1, 1, 0, 0, 1, −2, 1, −1, 1, with Σd = 0 and Σd² = 10. Hence

    r_s = 1 − 6(10)/[9(9² − 1)] = 1 − 60/720 = 660/720 = 0.917.

(b) There is a strong positive correlation between the ranks of the two measured quantities.

15.6.1 Rank Correlation for Tied Ranks When there are ties in rank, we assign to each of the tied observations the mean of the ranks which they jointly occupy. For instance, if the third and fourth largest values of a variable are the same, we assign each the rank (3 + 4)/2 = 3.5; and if the fifth, sixth and seventh largest values of a variable are the same, we assign each the rank (5 + 6 + 7)/3 = 6. Then we apply Formula (15.10) to calculate r_s.

Alternatively, we adjust Formula (15.10) for tied ranks: for each tie we add (t³ − t)/12 to Σd², where t is the number of tied observations.

Example 15.12(a) Compute the coefficient of rank correlation from the data:

    X    8    5    13   11   10   5    15   18   2    8
    Y    56   44   79   72   70   54   94   85   33   65

Solution To calculate r_s, we rank the X's and Y's as follows. We rank the X's giving rank 1 to the highest value 18, rank 2 to 15, rank 3 to 13, rank 4 to 11, rank 5 to 10, rank 6.5 (mean of ranks 6 and 7) to both 8's, rank 8.5 (mean of ranks 8 and 9) to both 5's, and rank 10 to 2. Similarly, we rank the Y's giving rank 1 to the highest value 94, rank 2 to 85, rank 3 to 79, ..., and rank 10 to 33, which is the smallest. We observe that the first set of rankings contains ties. The coefficient of rank correlation is therefore computed as below:
    X     Y     Rank of X   Rank of Y    d       d²
    8     56    6.5         7            −0.5    0.25
    5     44    8.5         9            −0.5    0.25
    13    79    3           3             0      0.00
    11    72    4           4             0      0.00
    10    70    5           5             0      0.00
    5     54    8.5         8             0.5    0.25
    15    94    2           1             1      1.00
    18    85    1           2            −1      1.00
    2     33    10          10            0      0.00
    8     65    6.5         6             0.5    0.25
                                                 Σd² = 3.00

Since n = 10, we get

    r_s = 1 − 6Σd²/[n(n² − 1)] = 1 − 6(3)/[10(10² − 1)] = 1 − 18/990 = 0.982.

There are two ties, each with t = 2. Therefore we add 2(t³ − t)/12 = 2(2³ − 2)/12 = 1 to Σd². Thus

    r_s = 1 − 6(4)/[10(10² − 1)] = 1 − 24/990 = 0.976.

The correlation coefficient r calculated for the original X's and Y's is 0.96, as can easily be verified, and the difference between r_s = 0.98 or 0.976 and r = 0.96 is quite small.

When there are no ties, r_s equals the correlation coefficient r calculated for the two sets of ranks; when there are a few ties in the two sets of ranks, there may be a small but negligible difference. By using ranks we lose some information, but rank correlation methods have the advantage that r_s is usually easier to determine than r. These methods can be used in problems where items can be ranked even though they cannot be measured on a numerical scale. For example, we can use r_s to measure the correlation between the rankings given by two judges to various exhibits at an industrial fair, or to measure the relationship between two persons' preferences (rankings) for various kinds of foods. In either case, Formula (15.10) provides an easy way to measure the correlation although we are not dealing with numerical data.

Example 15.12(b) Compute the rank correlation coefficient from the data.
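The tied-rank procedure above can be sketched in Python as below. This is an illustration only: the data pairs are reconstructed from the worked solution (which fixes the rankings and Σd² = 3.00), and the helper names mean_ranks and tie_adjustment are our own.

```python
from collections import Counter

def mean_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean of the ranks they occupy."""
    order = sorted(values, reverse=True)
    # first 0-based position i with count c occupies ranks i+1 .. i+c, whose mean is i + (c+1)/2
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]

def tie_adjustment(values):
    """Sum of (t³ − t)/12 over each group of t tied observations."""
    return sum((t ** 3 - t) / 12 for t in Counter(values).values() if t > 1)

X = [8, 5, 13, 11, 10, 5, 15, 18, 2, 8]
Y = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]
n = len(X)

rx, ry = mean_ranks(X), mean_ranks(Y)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))        # Σd² = 3.0
d2_adj = d2 + tie_adjustment(X) + tie_adjustment(Y)   # two ties of t = 2 in X add 1.0
r_s = 1 - 6 * d2_adj / (n * (n * n - 1))              # adjusted Formula (15.10)
print(d2, d2_adj, round(r_s, 3))  # 3.0 4.0 0.976
```

The two tied pairs in X (the 8's and the 5's) each contribute (2³ − 2)/12 = 0.5, reproducing the adjusted Σd² = 4 and r_s = 0.976 of the example.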
