CH-5 Correlation and Regression
Unit 5 Correlation and Regression
❖ Bivariate Data:
When data are collected on two variables simultaneously, they are known as
bivariate data and the corresponding frequency distribution is known as Bivariate
Frequency Distribution.
Example:
Y ↓ / X →      10         11         12         13         Total (f_y)
21             | (1)      -          || (2)     | (1)      4
22             | (1)      ||| (3)    | (1)      -          5
23             || (2)     -          | (1)      | (1)      4
24             | (1)      || (2)     || (2)     || (2)     7
Total (f_x)    5          5          6          4          20
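As a quick check on the example table, the marginal totals f_x and f_y can be recomputed from the cell frequencies. This is a minimal Python sketch; the variable names are illustrative and are not part of the original notes.

```python
# Cell frequencies of the example table: one row per Y value, one column per X value.
x_values = [10, 11, 12, 13]
y_values = [21, 22, 23, 24]
freq = [
    [1, 0, 2, 1],  # Y = 21
    [1, 3, 1, 0],  # Y = 22
    [2, 0, 1, 1],  # Y = 23
    [1, 2, 2, 2],  # Y = 24
]

# Row sums give the marginal frequencies of Y, column sums those of X.
f_y = [sum(row) for row in freq]
f_x = [sum(col) for col in zip(*freq)]
total = sum(f_y)

print(dict(zip(y_values, f_y)))
print(dict(zip(x_values, f_x)))
print(total)
```

The row and column sums should reproduce the "Total" column and the "Total" row of the table, and both should add up to the same grand total.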
❖ Correlation:
If two variable quantities are interdependent, i.e. a change in one variable tends
to be accompanied by a corresponding change in the other, either directly or
inversely, then the two variables are said to be associated or correlated, and the
process of establishing a relation between them is known as correlation.
Types of correlation:
1. Positive correlation: Two variables are positively correlated if they move in
the same direction, i.e. when one variable increases the other also increases,
and when one decreases the other also decreases.
Example: Profit and investment; hair length and the amount of shampoo needed;
knowledge and study.
2. Negative correlation: Two variables are negatively correlated if they move in
opposite directions, i.e. when one variable increases the other decreases.
Example: Environmental temperature and sales of winter wear; hours of sleep and
hours of exam preparation; speed and the time required to cover a given
distance.
3. Zero correlation: When an increase or decrease in x is not accompanied by any
systematic change in y, x and y are said to be uncorrelated and the correlation
coefficient between them is zero. In this case the two variables are known as
dissociated or uncorrelated. Independent variables are always uncorrelated,
although zero correlation by itself only rules out a linear relationship.
Example: Shoe size and intelligence; playing mobile games at home and the rise
in the number of COVID cases; height and exam scores.
❖ Correlation coefficient:
It measures the degree or strength of linear relation between two variables. It lies
between -1 and +1, and is denoted by “r”. It is a unit free measure.
• If r = 0, it means no correlation
• If r = +1, it means perfect positive correlation
• If r = -1, it means perfect negative correlation
[Figure: scatter diagrams. Recoverable panel titles: e) Negative correlation with high degree; f) Negative correlation with low degree; g) No correlation]
Karl Pearson's product moment correlation coefficient is the most widely used
method for measuring the extent of the relationship between two variables. It
measures correlation only when the variables have a linear relationship.
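• $r_{xy} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x \times \sigma_y}$
• $r_{xy} = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
• $\mathrm{Cov}(x, y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n}$
• $\mathrm{Cov}(x, y) = \dfrac{\sum xy}{n} - \bar{x}\,\bar{y}$
• $S.D_x = \sigma_x = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$
• $S.D_y = \sigma_y = \sqrt{\dfrac{\sum y^2}{n} - \bar{y}^2}$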
where n is the number of pairs of observations and $\bar{x}$, $\bar{y}$ are the means of x and y.
          x      y      xy     x²     y²
          2      9      18      4     81
          3      8      24      9     64
          5      8      40     25     64
          5      6      30     25     36
          6      5      30     36     25
          8      3      24     64      9
Total    29     39     166    163    279
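Solution:
Here n = 6.
Now, $\bar{x} = \dfrac{29}{6} = 4.83$, $\bar{y} = \dfrac{39}{6} = 6.50$
$\mathrm{Cov}(x, y) = \dfrac{\sum xy}{n} - \bar{x}\,\bar{y} = \dfrac{166}{6} - \dfrac{29}{6} \times \dfrac{39}{6} = -3.75$
$S.D_x = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\dfrac{163}{6} - \left(\dfrac{29}{6}\right)^2} = 1.95$
$S.D_y = \sqrt{\dfrac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\dfrac{279}{6} - (6.50)^2} = 2.06$
$r_{xy} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x \times \sigma_y} = \dfrac{-3.75}{1.95 \times 2.06} = -0.93$
Alternate Method: $r_{xy} = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \dfrac{6 \times 166 - 29 \times 39}{\sqrt{6 \times 163 - 29^2}\,\sqrt{6 \times 279 - 39^2}} = \dfrac{-135}{\sqrt{137}\,\sqrt{153}} = -0.93$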
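The two routes to r used in this solution can be checked numerically. The sketch below uses only the Python standard library and the six (x, y) pairs from the table; the variable names are illustrative, and population formulas (division by n) are assumed throughout, as in the notes.

```python
import math

# The six (x, y) pairs from the worked table.
x = [2, 3, 5, 5, 6, 8]
y = [9, 8, 8, 6, 5, 3]
n = len(x)

# Method 1: covariance divided by the product of the standard deviations.
mean_x = sum(x) / n
mean_y = sum(y) / n
cov_xy = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
sd_x = math.sqrt(sum(a * a for a in x) / n - mean_x ** 2)
sd_y = math.sqrt(sum(b * b for b in y) / n - mean_y ** 2)
r = cov_xy / (sd_x * sd_y)

# Method 2: the raw-sums formula, which avoids computing the means.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
r_alt = numerator / denominator

print(round(cov_xy, 2), round(sd_x, 2), round(sd_y, 2))
print(round(r, 2), round(r_alt, 2))
```

Both methods are algebraically identical, so any difference between `r` and `r_alt` is floating-point noise. Working with unrounded means is what gives a covariance of -3.75; rounding the mean of x to 4.83 first shifts the second decimal place slightly.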
Problem No.2: Given that the correlation coefficient between x and y is 0.8,
write down the correlation coefficient between u and v where
where,
a) Based on Ranks
b) Not affected by change of origin and scale.
Solution:
Since n = 8 and $\sum D^2 = 4$,
$r = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 4}{8 \times 63} = 0.95$
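The rank correlation arithmetic can be sketched in Python as follows. The function names are illustrative, the formula assumes there are no tied ranks, and the two rank lists are hypothetical values chosen only so that the sum of squared differences equals 4 for n = 8; they do not come from the original problem.

```python
def spearman_from_sum_d2(sum_d2, n):
    """Rank correlation from the sum of squared rank differences (no ties assumed)."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def sum_squared_rank_differences(ranks_a, ranks_b):
    """Sum of D^2, where D is the difference between paired ranks."""
    return sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))


# Values used in the solution: n = 8 and sum of D^2 = 4.
r = spearman_from_sum_d2(4, 8)

# Hypothetical ranks (an assumption for illustration) that also give sum of D^2 = 4.
ranks_a = [1, 2, 3, 4, 5, 6, 7, 8]
ranks_b = [2, 1, 3, 4, 5, 6, 8, 7]
sum_d2 = sum_squared_rank_differences(ranks_a, ranks_b)
r_from_ranks = spearman_from_sum_d2(sum_d2, len(ranks_a))

print(round(r, 2), sum_d2, round(r_from_ranks, 2))
```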
thus, $r_c = \pm\sqrt{\pm\left(\dfrac{2C - N}{N}\right)} = -0.65$
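The sign convention in the concurrent deviation formula is easy to get wrong, so here is a small Python sketch of it. It assumes C is the number of concurrent deviations and N is the number of pairs of deviations (one less than the number of observation pairs). The function name and the sample inputs are illustrative and are not the data behind the value -0.65 above.

```python
import math


def concurrent_deviation_coefficient(c, n_dev):
    """r_c = +/- sqrt(+/- (2C - N) / N), with both signs taken from (2C - N) / N."""
    ratio = (2 * c - n_dev) / n_dev
    sign = 1 if ratio >= 0 else -1
    # The inner sign makes the quantity under the root non-negative;
    # the outer sign restores the direction of the association.
    return sign * math.sqrt(abs(ratio))


# Illustrative inputs: 10 pairs of observations give 9 pairs of deviations,
# of which 4 are assumed to be concurrent.
r_c = concurrent_deviation_coefficient(4, 9)
print(round(r_c, 4))
```

With 4 concurrent deviations out of 9, (2C - N)/N is negative, so the coefficient comes out negative as well.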
❖ Regression:
Example: One can estimate the profit for a given level of investment on the basis of
the past records.
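3. Regression coefficient of y on x: $b_{yx} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x^2} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
4. Regression coefficient of x on y: $b_{xy} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_y^2} = r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$
5. $r = \pm\sqrt{b_{yx} \times b_{xy}}$, where r takes the common sign of the two regression coefficients.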
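As a numerical illustration of the link between the two regression coefficients and r, the sketch below reuses the column totals from the earlier worked table (n = 6, Σx = 29, Σy = 39, Σxy = 166, Σx² = 163, Σy² = 279). The variable names are illustrative, and the raw-sums form of each coefficient is assumed, with no square root in the denominator.

```python
import math

# Column totals from the earlier worked table.
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 6, 29, 39, 166, 163, 279

numerator = n * sum_xy - sum_x * sum_y           # shared by both coefficients
b_yx = numerator / (n * sum_x2 - sum_x ** 2)     # regression coefficient of y on x
b_xy = numerator / (n * sum_y2 - sum_y ** 2)     # regression coefficient of x on y

# r is the geometric mean of the two coefficients, carrying their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(round(b_yx, 4), round(b_xy, 4), round(r, 2))
```

Both coefficients share the numerator nΣxy - ΣxΣy, which is why they always have the same sign and why their product can never be negative.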
Properties of regression:
Problem No. 5: The following table gives the mean and S.D. of the prices of two
shares. The coefficient of correlation between the share prices is 0.48. Find the
most likely price of share B when the price of share A is Rs. 50.
Share        Mean              SD
Company A    $\bar{x} = 44$    $\sigma_x = 5.60$
Company B    $\bar{y} = 58$    $\sigma_y = 6.30$
Solution:
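The regression line of y on x is given by: y = a + bx
where $b = b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.48 \times \dfrac{6.30}{5.60} = 0.54$
and $a = \bar{y} - b\,\bar{x} = 58 - 0.54 \times 44 = 34.24$
Thus, when x = 50, y = 34.24 + 0.54 × 50 = Rs. 61.24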
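The same prediction can be reproduced in a few lines of Python. This is a sketch under the problem's stated figures; the helper name `predict_y` is hypothetical.

```python
# Summary statistics given in the problem.
r = 0.48
mean_x, sd_x = 44, 5.60   # share A
mean_y, sd_y = 58, 6.30   # share B

# Regression line of y on x: y = a + b * x
b_yx = r * sd_y / sd_x
a = mean_y - b_yx * mean_x


def predict_y(x):
    """Most likely price of share B for a given price x of share A."""
    return a + b_yx * x


print(round(b_yx, 2), round(a, 2), round(predict_y(50), 2))
```

The regression line of y on x is used because y (share B) is being estimated from x (share A). Estimating the price of A from B would need the line of x on y instead.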
❖ Probable error:
❖ Spurious correlation:
There are cases where we find a correlation between two variables although the
two variables are not causally related. This is due to the existence of a third
variable which is related to both the variables under consideration. Such a
correlation is known as spurious correlation or nonsense correlation.
Example: There could be a positive correlation between the production of rice and
that of iron in India over the last 20 years due to the effect of a third variable,
time, on both these variables.
❖ Coefficient of Determination:
$r^2 = \dfrac{\text{Explained variance}}{\text{Total variance}}$
Thus, a value of 0.6 for r indicates that $(0.6)^2 \times 100\%$, or 36%, of the variation
has been accounted for by the factor under consideration, and the remaining 64% of
the variation is due to other factors.
Coefficient of non-determination = $(1 - r^2)$
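The split between explained and unexplained variation follows directly from r. A minimal Python sketch, with an illustrative function name:

```python
def determination(r):
    """Return (coefficient of determination, coefficient of non-determination)."""
    explained = r ** 2
    return explained, 1 - explained


explained, unexplained = determination(0.6)
print(f"{explained:.0%} explained, {unexplained:.0%} unexplained")
```

Because r is squared, correlations of +0.6 and -0.6 account for the same share of the variation.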
3. If the relation between two variables x and y is given by 2x + 3y + 4 = 0, then the value of
the correlation coefficient between x and y is {}
a) 0
b) 1
c) – 1
d) Negative
4. [Nov 07] In rank correlation, the association need not be linear: {}
a) True
b) False
c) Partly True
d) Partly False
5. [June 08] If the correlation coefficient between two variables is 1, then the two lines of
regression are: {}
a) Parallel
b) At right angles
c) Coincident
d) None of these
6. [June 10] ________ of the regression coefficients is greater than the correlation coefficient.
{}
a) Combined mean
b) Harmonic mean
c) Geometric mean
d) Arithmetic mean
d) None of these
8. [Jan 21] The point of intersection of the two regression lines, y on x and x on y, is {}
a) (0,0)
b) ($\bar{x}$, $\bar{y}$)
c) ($b_{yx}$, $b_{xy}$)
d) (1,1)
10. [July 21] If the sum of the product of the deviation of x and y from their mean is zero, the
correlation coefficient between x and y is {}
a) Zero
b) Positive
c) Negative
d) 10
11. [July 21] The straight-line graph of the linear equation y = a + bx is horizontal if
{}
a) b = 1
b) b ≠ 0
c) b = 0
d) a = b ≠ 0
12. [Dec 21] The regression coefficients remain unchanged due to {}
a) Shift of origin
b) Shift of scale
c) Always
d) Never
13. If high values of one variable tend to accompany low values of the other, they are said to be {}
a) Negatively correlated
b) Indirectly correlated
c) Both
d) None
15. Age of applicants for life insurance and premium of insurance – correlation is {}
a) Positive
b) Negative
c) Zero
d) None
16. Unemployment index and the purchasing power of common man – correlation is {}
a) Positive
b) Negative
c) Zero
d) None
17. Production of Pig iron and soot content in Durgapur – correlation is {}
a) Positive
b) Negative
c) Zero
d) None
20. [MTP: S1 Dec 21] For a p × q bivariate frequency table, the maximum number of
conditional distributions is {}
a) p
b) p + q
c) pq
d) p or q
21. [MTP: S1 Dec 21] For a p × q bivariate frequency table, the maximum number of marginal
distributions is {}
a) p
b) p + q
c) 1
d) 2
b) 0.625
c) 0.4
d) 0.5
23. If the sum of squares of the differences of ranks, given by two judges A and B, of 8 students is 21,
what is the value of the rank correlation coefficient? {}
a) 0.7
b) 0.65
c) 0.75
d) 0.8
24. If the rank correlation coefficient between marks in management and mathematics for a
group of students is 0.6 and the sum of squares of the differences in ranks is 66, what is the
number of students in the group? {}
a) 10
b) 9
c) 8
d) 11
25. For 10 pairs of observations, No. of concurrent deviations was found to be 4. What is the
value of the coefficient of concurrent deviation? {}
a) √0.2
b) √−0.2
c) 1/3
d) -1/3
26. The coefficient of concurrent deviation for p pairs of observations was found to be 1/√3. If
the number of concurrent deviations was found to be 6, then the value of p is {}
a) 10
b) 9
c) 8
d) None of these
27. What is the value of correlation coefficient due to Pearson on the basis of the following
data: {}
x: –5 –4 –3 –2 –1 0 1 2 3 4 5
y: 27 18 11 6 3 2 3 6 11 18 27
a) 1
b) -1
c) 0
d) -0.5
29. If the regression line of y on x and that of x on y are given by y = –2x + 3 and 8x = –y + 3
respectively, what is the coefficient of correlation between x and y? {}
a) 0.5
b) -1/√2
c) – 0.5
d) None of these
30. If 4y – 5x = 15 is the regression line of y on x and the coefficient of correlation between x and
y is 0.75, what is the value of the regression coefficient of x on y? {}
a) 0.45
b) 0.9375
c) 0.6
d) None of these
31. If the regression coefficient of y on x, the coefficient of correlation between x and y and the
variance of y are -3/4, $\sqrt{3}/2$ and 4 respectively, what is the variance of x? {}
a) 2/√3/2
b) 16/3
c) 4/3
d) 4
32. [Nov 06] Take 200 and 150 resp. as the assumed mean for X and Y series of 11 values, then
dx = X – 200, dy = Y – 150, $\sum dx = 13$, $\sum dx^2 = 2667$, $\sum dy = 42$, $\sum dy^2 = 6964$, $\sum dx\,dy = 3943$.
The value of r is: {}
a) 0.77
b) 0.98
c) 0.92
d) 0.82
33. [May 07] If the sum of squares of the rank difference in mathematics and physics marks of
10 students is 22, then the coefficient of rank correlation is: {}
a) 0.267
b) 0.867
c) 0.92
d) None
34. [Dec 10] If the sum of the product of deviations of x and y series from their means is zero,
then the coefficient of correlation will be {}
a) 1
b) – 1
c) 0
d) None of these
a) 0.03
b) 0.3
c) 0.2
d) 0.05
37. [May 19] Find the probable error if $r = 2/\sqrt{10}$ and n = 36 {}
a) 0.6745
b) 0.06745
c) 0.5287
d) None
38. [Jan 21] For the set of observations {(1,2), (2,5), (3,7), (4,8), (5,10)}, the value of Karl
Pearson’s coefficient of correlation is approx. given by {}
a) 0.755
b) 0.655
c) 0.525
d) 0.985
39. [Jan 21] The coefficient of correlation between x and y is 0.5, the covariance is 16 and the
standard deviation of x is 4. Then the standard deviation of y is {}
a) 4
b) 8
c) 16
d) 64
40. [July 21] If $b_{yx} = -1.6$ and $b_{xy} = -0.4$, then $r_{xy}$ will be {}
a) 0.4
b) – 0.8
c) 0.64
d) 0.8
y: 4 6 7 8 10
The coefficient of correlation was found to be 0.93. What is the correlation between u and v
as given below?
u: –3 –2 0 –1 2
v: –4 –2 –1 0 2
a) -0.93
b) 0.93
c) 0.57
d) -0.57
43. Referring to the data presented in Q. No. 42, what would be the correlation between u and v?
{}
u: 10 15 25 20 35
a) -0.6
b) 0.6
c) -0.93
d) 0.93
44. Given the regression equations as 3x + y = 13 and 2x + 5y = 20, which one is the regression
equation of y on x? {}
a) 1st equation
b) 2nd equation
c) Both (a) and (b)
d) None of these
45. Given the following equations: 2x – 3y = 10 and 3x + 4y = 15, which one is the regression
equation of x on y? {}
a) 1st equation
b) 2nd equation
c) Both equations
d) None of these
46. If u = 2x + 5 and v = –3y – 6 and regression coefficient of y on x is 2.4, what is the regression
coefficient of v on u? {}
a) 3.6
b) -3.6
c) 2.4
d) -2.4
48. If y = 3x + 4 is the regression line of y on x and the arithmetic mean of x is –1, what is the
arithmetic mean of y? {}
a) 1
b) -1
c) 7
d) None of these
49. If the coefficient of correlation between x and y is 0.46, find the coefficient of correlation between
x and $\dfrac{y}{2}$. {}
a) 0.46
b) 0.92
c) – 0.46
d) – 0.92
50. If the correlation coefficient between x and y is r, then between $U = \dfrac{x - 5}{10}$ and $V = \dfrac{y - 7}{2}$ it is
{}
a) r
b) – r
c) (r - 5)/2
d) (r - 7)/10
51. [Jan 21] The point of intersection of the two regression lines, y on x and x on y, is {}
a) (0,0)
b) ($\bar{x}$, $\bar{y}$)
c) ($b_{yx}$, $b_{xy}$)
d) (1,1)
52. [Dec 21] For any two variables x and y regression equations are given as 2x + 5y – 9 = 0 and
3x – y – 5 = 0. What is the A.M of x and y? {}
a) 2, 1
b) 1, 2
c) 4, 2
d) 2, 4
54. If the covariance between two variables is 20 and the variance of one of the variables is 16,
what would be the variance of the other variable? {}
a) More than 25
b) More than 10
c) Less than 10
d) More than 1.25
55. If y = a + bx, then what is the coefficient of correlation between x and y? {}
a) 1
b) -1
c) 1 or –1 according as b > 0 or b < 0
d) None of these
56. While computing rank correlation coefficient between profit and investment for the last 6
years of a company the difference in rank for a year was taken 3 instead of 4. What is the
rectified rank correlation coefficient if it is known that the original value of rank correlation
coefficient was 0.4? {}
a) 0.3
b) 0.2
c) 0.25
d) 0.28
57. Following are the two normal equations obtained for deriving the regression line of y on x:
5a + 10b = 40
10a + 25b = 95
a) 2x+3y=5
b) 2y+3x=5
c) Y=2+3x
d) Y=3+5x
59. [May 19] If the regression line of y on x is given by Y = x + 2 and Karl Pearson’s coefficient of
correlation is 0.5 then $\dfrac{\sigma_y^2}{\sigma_x^2}$ = ______. {}
a) 3
b) 2
c) 4
d) None
60. [Jan 19] Given that the variance of x is equal to the square of standard deviation of y and
the regression line of y on x is y = 40 + 0.5(x - 30). Then the regression line of x on y is………
{}
a) Y = 40 + 4(x - 30)
b) Y = 40 + (x - 30)
c) Y = 40 + 2(x - 30)
d) Y = 30 + 2(x - 40)
61. [July 21] If the slope of the regression line is calculated to be 5.5 and the intercept 15 then
the value of y when x is 6 is: {}
a) 88
b) 48
c) 18
d) 78
63. [Dec 21] The point of intersection of the two regression lines falls on the x-axis. If the mean of x values
is 16, and the standard deviations of x and y are resp. 3 and 4, then the mean of y is {}
a) 16/3
b) 4
c) 0
d) 1
64. [MTP − S1 Dec 21] If the coefficient of correlation between two variables is 0.7 then the
percentage of variation unaccounted for is {}
a) 70%
b) 30%
c) 51%
d) 49%