Corelation and Reg.-12-27
Perfect Correlation: If two variables vary in such a way that their ratio is always constant,
then the correlation is said to be perfect.
10.17 SCATTER OR DOT-DIAGRAM
When we plot the corresponding values of two variables, taking one along the x-axis and the
other along the y-axis, we obtain a collection of dots.
This collection of dots is called a dot diagram or a scatter diagram.
Example 15. Ten students got the following percentage of marks in Economics and
Statistics.
Roll No. 1 2 3 4 5 6 7 8 9 10
Marks in Economics 78 36 98 25 75 82 90 62 65 39
Marks in Statistics 84 51 91 60 68 62 86 58 53 47
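Solution. Let the sum assured be denoted by x and the age group by y. Replace x and y by the step deviations

$$x \to \frac{x - 30{,}000}{10{,}000}, \qquad y \to \frac{y - 45}{10}$$

so that the sums assured 10,000, 20,000, 30,000, 40,000, 50,000 become x = –2, –1, 0, 1, 2.

[Correlation table: of the body of the table only the row for the age group 50–60 (mid-point 55, y = 1) is legible: f = 8, 4, 2, –, – (row total 14), fy = 14, fy² = 14, fxy = –20.]

Column totals:

x        –2     –1      0      1      2     Total
f        17     27     32     20      4     N = 100
fx      –34    –27      0     20      8     Σfx = –33
fx²      68     27      0     20     16     Σfx² = 131
fxy       4     16      0    –21     –6     Σfxy = –7

Row totals: Σfy = –61, Σfy² = 131, Σfxy = –7.

$$r = \frac{N\sum fxy - \sum fx \sum fy}{\sqrt{N\sum fx^2 - \left(\sum fx\right)^2}\;\sqrt{N\sum fy^2 - \left(\sum fy\right)^2}}$$

$$= \frac{100(-7) - (-33)(-61)}{\sqrt{100(131) - (-33)^2}\;\sqrt{100(131) - (-61)^2}} = \frac{-700 - 2013}{\sqrt{13100 - 1089}\;\sqrt{13100 - 3721}}$$

$$= \frac{-2713}{\sqrt{12011}\;\sqrt{9379}} = \frac{-2713}{109.59 \times 96.85} = \frac{-2713}{10613.7915} = -0.2556$$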
Hence, the age and sum assured are negatively correlated, i.e., as age goes up the sum
assured comes down. Ans.
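The arithmetic of this example can be re-checked with a few lines of Python. This is only a sketch: the function name `grouped_r` and its parameter names are illustrative and not part of the text; the numbers are the table totals used in the working (N = 100, Σfx = –33, Σfy = –61, Σfx² = 131, Σfy² = 131, Σfxy = –7).

```python
from math import sqrt

def grouped_r(n, sum_fx, sum_fy, sum_fx2, sum_fy2, sum_fxy):
    """Coefficient of correlation from the totals of a bivariate frequency table."""
    numerator = n * sum_fxy - sum_fx * sum_fy
    denominator = sqrt(n * sum_fx2 - sum_fx ** 2) * sqrt(n * sum_fy2 - sum_fy ** 2)
    return numerator / denominator

# Totals taken from the working of the example (step deviations of x and y).
r = grouped_r(n=100, sum_fx=-33, sum_fy=-61, sum_fx2=131, sum_fy2=131, sum_fxy=-7)
print(round(r, 4))  # -0.2556
```

Because r is unchanged by a change of origin and scale, the value obtained from the step deviations is also the correlation between the original sums assured and ages.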
10.19 SHORT-CUT METHOD
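$$r = \frac{\dfrac{\sum XY}{N} - \dfrac{\sum X}{N}\cdot\dfrac{\sum Y}{N}}{\sqrt{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}\;\sqrt{\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2}}$$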
where r is the coefficient of correlation.
X = deviation from assumed mean of x = x – a
Y = deviation from assumed mean of y = y – b
N = Total number of items.
Example 17. Calculate the coefficient of correlation for the following table :
y (marks) \ x (age)    0–4    4–8    8–12    12–16
0–5                     7      —      —       —
5–10                    6      8      —       —
10–15                   —      5      3       —
15–20                   —      7      2       —
20–25                   —      —      —       9
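Solution. Replace the class-intervals for x and y by their mid-points and then let

$$X = \frac{x - 10}{4} \quad \text{and} \quad Y = \frac{y - 12.5}{5}$$

In the table below each cell shows the frequency f followed by fXY in brackets.

y class   y      Y    X = –2     X = –1     X = 0      X = 1      f     fY    fY²   fXY
                      (x = 2)    (x = 6)    (x = 10)   (x = 14)
0–5       2.5   –2    7 (28)     —          —          —          7    –14    28    28
5–10      7.5   –1    6 (12)     8 (8)      —          —         14    –14    14    20
10–15    12.5    0    —          5 (0)      3 (0)      —          8      0     0     0
15–20    17.5    1    —          7 (–7)     2 (0)      —          9      9     9    –7
20–25    22.5    2    —          —          —          9 (18)     9     18    36    18
f                     13         20         5          9         47     ΣfY = –1, ΣfY² = 87, ΣfXY = 59
fX                    –26        –20        0          9         ΣfX = –37
fX²                   52         20         0          9         ΣfX² = 81
fXY                   40         1          0          18        ΣfXY = 59

$$r = \frac{\dfrac{\sum fXY}{N} - \dfrac{\sum fX}{N}\cdot\dfrac{\sum fY}{N}}{\sqrt{\dfrac{\sum fX^2}{N} - \left(\dfrac{\sum fX}{N}\right)^2}\;\sqrt{\dfrac{\sum fY^2}{N} - \left(\dfrac{\sum fY}{N}\right)^2}}$$

$$= \frac{\dfrac{59}{47} - \left(\dfrac{-37}{47}\right)\left(\dfrac{-1}{47}\right)}{\sqrt{\dfrac{81}{47} - \left(\dfrac{37}{47}\right)^2}\;\sqrt{\dfrac{87}{47} - \left(\dfrac{1}{47}\right)^2}} = \frac{1.255 - 0.017}{\sqrt{1.723 - 0.620}\;\sqrt{1.851 - 0.0005}} \approx 0.87$$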
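The short-cut method applied to the table of Example 17 can be sketched in Python as below. The dictionary layout and the helper name `step` are illustrative choices, not from the text; the frequencies, mid-points and assumed means (10 for x, 12.5 for y, with class widths 4 and 5) are those of the example.

```python
from math import sqrt

# Frequencies from the table of Example 17, keyed by (x mid-point, y mid-point).
freq = {
    (2, 2.5): 7,
    (2, 7.5): 6, (6, 7.5): 8,
    (6, 12.5): 5, (10, 12.5): 3,
    (6, 17.5): 7, (10, 17.5): 2,
    (14, 22.5): 9,
}

def step(value, origin, width):
    """Step deviation of a mid-point from an assumed mean."""
    return (value - origin) / width

n = sum_x = sum_y = sum_x2 = sum_y2 = sum_xy = 0
for (x, y), f in freq.items():
    X = step(x, 10, 4)      # X = (x - 10) / 4
    Y = step(y, 12.5, 5)    # Y = (y - 12.5) / 5
    n += f
    sum_x += f * X
    sum_y += f * Y
    sum_x2 += f * X * X
    sum_y2 += f * Y * Y
    sum_xy += f * X * Y

# Short-cut formula of Section 10.19 with frequencies included in every sum.
numerator = sum_xy / n - (sum_x / n) * (sum_y / n)
denominator = sqrt(sum_x2 / n - (sum_x / n) ** 2) * sqrt(sum_y2 / n - (sum_y / n) ** 2)
r = numerator / denominator
print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 3))
```

The printed totals can be compared with the marginal totals of the working table before trusting the final value of r.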
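$$\sum XY = \frac{1}{2}\left[\frac{n(n^2-1)}{6} - \sum d^2\right] = \frac{1}{12}\,n(n^2-1) - \frac{1}{2}\sum d^2$$

Putting these values in $r = \dfrac{\sum XY}{\sqrt{\sum X^2}\,\sqrt{\sum Y^2}}$, we get

$$r = \frac{\dfrac{1}{12}\,n(n^2-1) - \dfrac{1}{2}\sum d^2}{\dfrac{n(n^2-1)}{12}} = 1 - \frac{6\sum d^2}{n(n^2-1)} \qquad \text{Ans.}$$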
10.21 SPEARMAN’S RANK CORRELATION COEFFICIENT
$$r = 1 - \frac{6\sum d^2}{n(n^2-1)}$$
where r denotes rank coefficient of correlation and d refers to the difference of ranks
between paired items in two series.
Example 18. Compute Spearman’s rank correlation coefficient r for the following data:
Person A B C D E F G H I J
Rank in statistics 9 10 6 5 7 2 4 8 1 3
Rank in income 1 2 3 4 5 6 7 8 9 10
Solution.
Person   Rank in statistics (R1)   Rank in income (R2)   d = R1 – R2   d²
A 9 1 8 64
B 10 2 8 64
C 6 3 3 9
D 5 4 1 1
E 7 5 2 4
F 2 6 –4 16
G 4 7 –3 9
H 8 8 0 0
I 1 9 –8 64
J 3 10 –7 49
Σd² = 280

$$r = 1 - \frac{6\sum d^2}{n(n^2-1)}$$

$$r = 1 - \frac{6 \times 280}{10(100 - 1)} = 1 - 1.697 = -0.697 \qquad \text{Ans.}$$
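As a quick check of Example 18, the rank formula can be coded directly. The function name `spearman_rank` is illustrative; the sketch assumes the ranks contain no ties, which is the case for which the formula above is stated.

```python
def spearman_rank(ranks_1, ranks_2):
    """Spearman's rank correlation, r = 1 - 6*sum(d^2) / (n*(n^2 - 1)), no ties."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks of persons A to J from Example 18.
rank_statistics = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]
rank_income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

r = spearman_rank(rank_statistics, rank_income)
print(round(r, 3))  # -0.697
```

The negative sign indicates that a high rank in statistics tends to go with a low rank in income for these ten persons.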
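Example 19. Establish the formula

$$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\,\sigma_x\sigma_y$$

where r is the correlation coefficient between x and y.

Solution. We know that

$$\sigma_x^2 = \frac{\sum (x - \bar x)^2}{n}$$

so that

$$\sigma_{x-y}^2 = \frac{\sum \left[(x - y) - \overline{(x - y)}\right]^2}{n}$$

Here $\overline{(x - y)}$ = mean of the (x – y) series = mean of x – mean of y = $\bar x - \bar y$. Hence

$$\sigma_{x-y}^2 = \frac{\sum \left[(x - y) - (\bar x - \bar y)\right]^2}{n} = \frac{\sum \left[(x - \bar x) - (y - \bar y)\right]^2}{n}$$

$$= \frac{\sum \left[(x - \bar x)^2 + (y - \bar y)^2 - 2(x - \bar x)(y - \bar y)\right]}{n}$$

$$= \frac{\sum (x - \bar x)^2}{n} + \frac{\sum (y - \bar y)^2}{n} - \frac{2\sum (x - \bar x)(y - \bar y)}{n}$$

$$= \sigma_x^2 + \sigma_y^2 - \frac{2\sum (x - \bar x)(y - \bar y)}{n} \qquad \ldots(1)$$

We know that

$$r = \frac{\sum (x - \bar x)(y - \bar y)}{n\,\sigma_x\sigma_y} \quad \text{or} \quad r\,\sigma_x\sigma_y = \frac{\sum (x - \bar x)(y - \bar y)}{n}$$

Putting this value in (1), we get

$$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\,\sigma_x\sigma_y \qquad \text{Proved.}$$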
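The variance identity of Example 19 holds with either sign, $\sigma_{x \pm y}^2 = \sigma_x^2 + \sigma_y^2 \pm 2r\,\sigma_x\sigma_y$, and can be confirmed numerically. The sketch below uses the marks of Example 15 purely as sample data; the helper names are illustrative, and variances are taken with divisor n, as in the text.

```python
from math import sqrt

# Marks of the ten students in Example 15, used here only as sample data.
economics = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
stat_marks = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance (divisor n), matching the sigma^2 of the text."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(variance(xs) * variance(ys))

r = correlation(economics, stat_marks)
sx, sy = sqrt(variance(economics)), sqrt(variance(stat_marks))

var_sum = variance([x + y for x, y in zip(economics, stat_marks)])
var_diff = variance([x - y for x, y in zip(economics, stat_marks)])

print(var_sum, sx ** 2 + sy ** 2 + 2 * r * sx * sy)
print(var_diff, sx ** 2 + sy ** 2 - 2 * r * sx * sy)
```

Each printed pair should agree up to floating-point rounding.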
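$$\sum xy = a\sum x + b\sum x^2 \qquad \ldots(3)$$

Dividing (2) by n we get

$$\frac{\sum y}{n} = a + b\,\frac{\sum x}{n}, \qquad \left[\bar y = \frac{\sum y}{n},\ \bar x = \frac{\sum x}{n}\right]$$

$$\bar y = a + b\,\bar x$$

where $\bar x$ and $\bar y$ are the means of the x series and the y series.
This shows that $(\bar x, \bar y)$ lies on the line of regression (1). Shifting the origin to $(\bar x, \bar y)$, the
equation (3) becomes

$$\sum (x - \bar x)(y - \bar y) = a\sum (x - \bar x) + b\sum (x - \bar x)^2$$

But $\sum (x - \bar x) = 0$, i.e.

$$\sum (x - \bar x)(y - \bar y) = b\sum (x - \bar x)^2$$

$$\text{or} \quad b = \frac{\sum (x - \bar x)(y - \bar y)}{\sum (x - \bar x)^2} = \frac{\sum XY}{\sum X^2} \qquad \ldots(4)$$

where $X = x - \bar x$ and $Y = y - \bar y$. We know

$$r = \frac{\sum XY}{\sqrt{\sum X^2}\,\sqrt{\sum Y^2}} = \frac{\sum XY}{n\sqrt{\dfrac{\sum X^2}{n}}\,\sqrt{\dfrac{\sum Y^2}{n}}} = \frac{\sum XY}{n\,\sigma_x\sigma_y}$$

$$\text{or} \quad \sum XY = n\,r\,\sigma_x\sigma_y$$

Putting the value of $\sum XY$ in (4) we get

$$b = \frac{n\,r\,\sigma_x\sigma_y}{\sum X^2} = \frac{r\,\sigma_x\sigma_y}{\dfrac{\sum X^2}{n}} = \frac{r\,\sigma_x\sigma_y}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x}$$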
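i.e. slope of the line of regression $= b = r\,\dfrac{\sigma_y}{\sigma_x}$.
The line of regression passes through $(\bar x, \bar y)$.
Hence the equation to the line of regression is

$$y - \bar y = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x)$$

Similarly the regression line of x on y is

$$x - \bar x = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar y)$$

Note. $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ are known as the coefficients of regression.

$$b_{yx} \cdot b_{xy} = r\,\frac{\sigma_y}{\sigma_x} \cdot r\,\frac{\sigma_x}{\sigma_y} = r^2$$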
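A small numerical illustration of the two regression coefficients, and of the relation $b_{yx} \cdot b_{xy} = r^2$, is sketched below. It uses the data of question 5 of Exercise 10.2 (x = 1, 2, 3, 4, 5; y = 2, 5, 3, 8, 7); the variable names are illustrative only.

```python
from math import sqrt

# Data of question 5, Exercise 10.2.
x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # sum of XY
sxx = sum((a - mean_x) ** 2 for a in x)                       # sum of X^2
syy = sum((b - mean_y) ** 2 for b in y)                       # sum of Y^2

r = sxy / sqrt(sxx * syy)
b_yx = sxy / sxx   # slope of the regression line of y on x, equal to r*sigma_y/sigma_x
b_xy = sxy / syy   # slope of the regression line of x on y, equal to r*sigma_x/sigma_y

print(round(r, 2), b_yx, b_xy)   # 0.81 1.3 0.5
print(b_yx * b_xy, r ** 2)       # both approximately 0.65

# Regression lines through (mean_x, mean_y):
#   y = b_yx * x + (mean_y - b_yx * mean_x)
#   x = b_xy * y + (mean_x - b_xy * mean_y)
print(mean_y - b_yx * mean_x, mean_x - b_xy * mean_y)
```

The slopes and intercepts reproduce the answers quoted for that exercise, y = 1.3x + 1.1 and x = 0.5y + 0.5.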
Example 21. If θ be the acute angle between the two regression lines in the case of
two variables x and y, show that

$$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$

where r, $\sigma_x$, $\sigma_y$ have their usual meanings. Explain the significance where r = 0 and
r = ±1. (A.M.I.E., Winter 2001)
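Solution. Lines of regression are

$$y - \bar y = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x) \quad \ldots(1) \qquad m_1 = r\,\frac{\sigma_y}{\sigma_x}$$

$$\text{and} \quad x - \bar x = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar y) \quad \ldots(2) \qquad m_2 = \frac{1}{r}\,\frac{\sigma_y}{\sigma_x}$$

$$\tan\theta = \frac{m_2 - m_1}{1 + m_1 m_2} = \frac{\dfrac{1}{r}\,\dfrac{\sigma_y}{\sigma_x} - r\,\dfrac{\sigma_y}{\sigma_x}}{1 + r\,\dfrac{\sigma_y}{\sigma_x} \cdot \dfrac{1}{r}\,\dfrac{\sigma_y}{\sigma_x}} = \frac{\dfrac{\sigma_y}{\sigma_x}\left(\dfrac{1}{r} - r\right)}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}}$$

$$= \frac{1 - r^2}{r} \cdot \frac{\sigma_y}{\sigma_x} \cdot \frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2} = \frac{1 - r^2}{r} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} \qquad \ldots(3) \quad \text{Proved}$$

(a) If r = 0, then there is no linear relationship between the two variables; they are uncorrelated.
On putting the value of r = 0 in (3) we get tan θ = ∞, θ = π/2. So the lines (1) and (2)
are perpendicular. (A.M.I.E., Summer 1998)
(b) If r = 1 or –1
On putting these values of r in (3) we get tan θ = 0 or θ = 0, i.e. the two lines of regression coincide and the correlation is perfect.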
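Result (3) can be checked against the slope formula it was derived from. In this sketch the function names are illustrative, and the sample values r = 0.6, σx = 15, σy = 2.5 are chosen only for illustration; the formula requires r ≠ 0.

```python
from math import atan, degrees, isclose

def tan_angle(r, sx, sy):
    """tan(theta) from result (3); requires r != 0."""
    return (1 - r ** 2) / r * (sx * sy) / (sx ** 2 + sy ** 2)

def tan_angle_from_slopes(r, sx, sy):
    """tan(theta) computed directly from the slopes m1 and m2 of the two lines."""
    m1 = r * sy / sx        # regression line of y on x
    m2 = sy / (r * sx)      # regression line of x on y, written as y in terms of x
    return (m2 - m1) / (1 + m1 * m2)

r, sx, sy = 0.6, 15.0, 2.5  # illustrative values
t = tan_angle(r, sx, sy)
print(t, tan_angle_from_slopes(r, sx, sy))
print(degrees(atan(t)))     # acute angle between the lines, in degrees

# Perfect correlation: the two regression lines coincide.
print(tan_angle(1.0, sx, sy))  # 0.0
```

As r approaches 0 the value of tan θ grows without bound, which corresponds to the perpendicular lines of case (a).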
r 0.49
To find $\bar x$ and $\bar y$, we solve the equations (1) and (2) simultaneously. Their point of intersection
is $(\bar x, \bar y)$.
x 6, y 1 Ans.
Example 26. Show that the geometric mean of the coefficients of regression is the
coefficient of correlation.
Solution. The coefficients of regression are $r\,\dfrac{\sigma_y}{\sigma_x}$ and $r\,\dfrac{\sigma_x}{\sigma_y}$.
Find the correlation coefficient between height and weight and state the equation of
regression of height on weight.
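Solution.

$$\bar x = \frac{\sum x}{n} = \frac{15000}{100} = 150, \qquad \bar y = \frac{\sum y}{n} = \frac{6800}{100} = 68$$

$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{2272500}{100} - \left(\frac{15000}{100}\right)^2} = \sqrt{22725 - 22500} = \sqrt{225} = 15$$

$$\sigma_y = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{463025}{100} - \left(\frac{6800}{100}\right)^2} = \sqrt{4630.25 - 4624} = \sqrt{6.25} = 2.5$$

$$r = \frac{\dfrac{\sum xy}{n} - \bar x\,\bar y}{\sigma_x\sigma_y} = \frac{\dfrac{1022250}{100} - 150 \times 68}{15 \times 2.5} = \frac{10222.5 - 10200}{15 \times 2.5} = \frac{22.5}{15 \times 2.5} = \frac{1.5}{2.5} = 0.6$$

Regression equation of y on x:

$$y - \bar y = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x), \qquad y - 68 = 0.6 \times \frac{2.5}{15}\,(x - 150)$$

$$y - 68 = \frac{1}{10}\,(x - 150) \quad \text{or} \quad 10y - 680 = x - 150$$

$$10y = x + 530 \qquad \text{Ans.}$$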
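The same working can be reproduced from the summary totals alone. The sketch below assumes only the sums quoted in the solution (n = 100, Σx = 15000, Σy = 6800, Σx² = 2272500, Σy² = 463025, Σxy = 1022250); the variable names are illustrative.

```python
from math import sqrt

# Summary totals quoted in the solution.
n = 100
sum_x, sum_y = 15000, 6800
sum_x2, sum_y2 = 2272500, 463025
sum_xy = 1022250

mean_x, mean_y = sum_x / n, sum_y / n
sd_x = sqrt(sum_x2 / n - mean_x ** 2)
sd_y = sqrt(sum_y2 / n - mean_y ** 2)
r = (sum_xy / n - mean_x * mean_y) / (sd_x * sd_y)

# Regression line of y on x: y - mean_y = r * (sd_y / sd_x) * (x - mean_x)
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

print(mean_x, mean_y, sd_x, sd_y)
print(round(r, 4), round(slope, 4), round(intercept, 4))
```

A slope of 0.1 and an intercept of 53 correspond to y = 0.1x + 53, i.e. 10y = x + 530.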
10.25 ERROR OF PREDICTION
The root-mean-square deviation of the predicted values from the observed values is known as the standard
error of prediction. It is given by

$$E_{yx} = \sqrt{\frac{\sum (y - y_r)^2}{n}}$$

where y is the actual value and $y_r$ the predicted value.
Example 31. Prove that
(i) $E_{yx} = \sigma_y\sqrt{1 - r^2}$
(ii) $E_{xy} = \sigma_x\sqrt{1 - r^2}$
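Solution. The equation of the line of regression of y on x is

$$y - \bar y = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x)$$

$$y_r = \bar y + r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x)$$

So,

$$E_{yx} = \left[\frac{1}{n}\sum (y - y_r)^2\right]^{1/2} = \left[\frac{1}{n}\sum \left\{(y - \bar y) - r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x)\right\}^2\right]^{1/2}$$

$$= \left[\frac{1}{n}\sum \left\{(y - \bar y)^2 + r^2\,\frac{\sigma_y^2}{\sigma_x^2}\,(x - \bar x)^2 - 2r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x)(y - \bar y)\right\}\right]^{1/2}$$

$$= \left[\frac{\sum (y - \bar y)^2}{n} + r^2\,\frac{\sigma_y^2}{\sigma_x^2}\,\frac{\sum (x - \bar x)^2}{n} - 2r\,\frac{\sigma_y}{\sigma_x}\,\frac{\sum (x - \bar x)(y - \bar y)}{n}\right]^{1/2}$$

$$= \left[\sigma_y^2 + r^2\,\frac{\sigma_y^2}{\sigma_x^2}\,\sigma_x^2 - 2r\,\frac{\sigma_y}{\sigma_x}\,r\,\sigma_x\sigma_y\right]^{1/2}$$

$$= \left[\sigma_y^2 + r^2\sigma_y^2 - 2r^2\sigma_y^2\right]^{1/2} = \left[\sigma_y^2 - r^2\sigma_y^2\right]^{1/2} = \sigma_y\sqrt{1 - r^2} \qquad \text{Proved.}$$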
(ii) Similarly (ii) may be proved.
Example 32. Find the standard error of estimate of y on x for the data given below:
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
$$E_{yx} = \sqrt{\frac{\sum (y - y_r)^2}{n}} = \sqrt{\frac{308}{121 \times 8}} = \sqrt{\frac{7}{22}} = 0.564 \qquad \text{Ans.}$$
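The value for Example 32 can be cross-checked in two ways: directly from the residuals of the fitted line, and through the result $E_{yx} = \sigma_y\sqrt{1 - r^2}$ of Example 31. The sketch below fits the regression line of y on x by the formulas of this chapter; the variable names are illustrative.

```python
from math import sqrt

# Data of Example 32.
x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)
slope = sxy / sxx                       # b_yx = r * sigma_y / sigma_x
sd_y = sqrt(syy / n)

# Predicted values y_r from the regression line of y on x.
predicted = [mean_y + slope * (a - mean_x) for a in x]

e_direct = sqrt(sum((b - p) ** 2 for b, p in zip(y, predicted)) / n)
e_formula = sd_y * sqrt(1 - r ** 2)

print(round(r, 3))                      # 0.977
print(round(e_direct, 3), round(e_formula, 3))  # 0.564 0.564
```

The correlation coefficient printed first also agrees with the answer to question 1 of Exercise 10.2, which uses the same data.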
Exercise 10.2
1. Find the coefficient of correlation between x and y from the table of their values :
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
Ans. 0.977.
2. Find the coefficient of correlation of the following data taking new origin of x at 70 and for y at
67.
x 67 68 64 68 72 70 69 70
y 65 66 67 67 68 69 71 73
(AMIE winter 2002 ) Ans. 0.472
3. x and y are two random variables with the same standard deviation and correlation coefficient r. Show
that the coefficient of correlation between x and x + y is $\sqrt{\dfrac{1 + r}{2}}$.
4. Find the regression line of y on x for the data :
x 1 4 2 3 5
y 3 1 2 5 4
Ans. y = 2.7 + 0.1x
5. Find the correlation coefficient and the equations of regression lines from the following data :
x 1 2 3 4 5
y 2 5 3 8 7
Ans. r = 0.81, x = 0.5y + 0.5, y = 1.3x + 1.1
6. Find the regression line of y on x if
x 40 70 50 60 80 50 90 40 60 60
y 2.5 6.0 4.5 5.0 4.5 2.0 5.5 3.0 4.5 3.0
Ans. y = 0.55 + 0.0583 x
7. The following marks have been obtained by a class of students in statistics.
Paper I 80 45 55 56 58 60 65 68 70 75 85
Paper II 81 56 50 48 60 62 64 65 70 74 90
Compute the coefficient of correlation for the above data. Find the lines of regression.
Ans. r = .918, y – 65.45 = 0.981 (x – 65.18)
x – 65.18 = 0.859 (y – 65.45)
8. Find the equations to the lines of regression and the coefficient of correlation for the following data:
x 2 4 5 6 8 11
y 18 12 10 8 7 5
Ans. y – 10 = – 1.34 (x – 6), x – 6 = – 0.632 (y – 10), r = – 0.92
9. Obtain normal equations for fitting a curve of the form $y = ax + \dfrac{b}{x}$
for n points $(x_r, y_r)$, r = 1, 2, ... n.
Ans. $\sum xy = nb + a\sum x^2$, $\quad \sum \dfrac{y}{x} = na + b\sum \dfrac{1}{x^2}$
10. The following results were obtained from lineups in Applied Mechanics and Engineering Mathematics
in an examination :
Applied Mechanics Engg. Maths.
(x) (y)
Mean 47.5 10.5
Standard deviation 16.8 10.8
r 0.95
Find both the regression equations. Also estimate the value of y for x = 30.
Ans. y 0.611x 10.5, x 1.478 y 1.143, y 28.83
11. The following results were obtained from records of age (x) and systolic blood pressure (y) of a group
of 10 men :