Quantitative Analysis and Business Development (UNIT-1)
1) Direct method.
2) In-direct method (or) Deviation method.
3) Step Deviation method.
1) Direct method :
2) Median:
Median is defined as the “middle-most” or “central” value of a set of observations when the
observations are arranged in ascending or descending order of magnitude.
It divides the arranged series into two equal parts. Median is also known as a “positional
average”, whereas the mean is known as a “calculated average”.
When a series consists of an even number of terms, the median is the arithmetic mean
of the two central items. It is denoted by $M_d$.
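Formulas:
Raw Data ----------- Arrange the given set of data in ascending or descending order.
Case – i) If n is odd, then the median is the value of the
$M_d = \left(\frac{n+1}{2}\right)^{\text{th}}$ term, where n = No. of observations
Case – ii) If n is even, then the median is the mean of the two central terms:
$M_d = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ term} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ term}}{2}$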
Discrete Data ------
STEP -1: Find the cumulative frequencies of the given data.
STEP -2: Find $N = \sum_{i=1}^{n} f_i$
STEP -3: Find the cumulative frequency just greater than $N/2$; the corresponding
value of X is the median.
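Continuous Data---
STEP -1: Find the cumulative frequencies of the given data.
STEP -2: Find $N = \sum_{i=1}^{n} f_i$
STEP -3: The value of the median is then given by
$M_d = L + \left[\frac{N/2 - m}{f}\right] \times C$
where L = lower limit of the median class, m = cumulative frequency of the class preceding the
median class, f = frequency of the median class and C = class width.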
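The median rules above can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function names `raw_median` and `grouped_quantile` are hypothetical, and the median class is taken as the first class whose cumulative frequency reaches N/2, which is an assumption about how ties are handled.

```python
def raw_median(values):
    """Median of raw data: middle term, or mean of the two central terms."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # ((n + 1) / 2)-th term
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)-th and (n/2 + 1)-th terms


def grouped_quantile(lower_limits, freqs, width, fraction=0.5):
    """Interpolated value L + ((fraction * N - m) / f) * C for continuous data."""
    total = sum(freqs)
    target = fraction * total
    cumulative = 0  # m: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


lower_limits = [0, 10, 20, 30, 40, 50]  # classes 0-10, 10-20, ..., 50-60
freqs = [4, 6, 10, 15, 8, 7]
print(raw_median([120, 170, 100, 110, 180, 220, 160]))
print(round(grouped_quantile(lower_limits, freqs, 10), 2))
```

Passing `fraction=0.25` or `fraction=0.75` gives the first and third quartiles by the same interpolation.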
If n = 2, the geometric mean is the square root of the product of the two observations.
EXAMPLE: The geometric mean of 4 and 16 is
G.M. = $\sqrt{(4)(16)} = \sqrt{64} = 8$
If there are more than 2 observations, computing the nth root directly is inconvenient; in that case we
take logarithms:
$\log(\text{G.M.}) = \log\{(X_1 X_2 X_3 \cdots X_n)^{1/n}\} = \frac{1}{n}\log(X_1 X_2 X_3 \cdots X_n) = \frac{1}{n}\sum_{i=1}^{n}\log X_i$
FORMULAS:
Raw Data ------------- G.M. = Antilog $\left\{\frac{1}{n}\sum_{i=1}^{n}\log X_i\right\}$
Discrete Data ------ G.M. = Antilog $\left\{\frac{1}{N}\sum_{i=1}^{n} f_i \log X_i\right\}$
Continuous Data--- G.M. = Antilog $\left\{\frac{1}{N}\sum_{i=1}^{n} f_i \log m_i\right\}$
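The logarithmic form of the geometric mean can be checked with a short Python sketch. The function name `geometric_mean` is hypothetical. Base-10 logarithms are used to mirror the log-table approach; for continuous data, pass the class mid-values as `values`.

```python
import math


def geometric_mean(values, freqs=None):
    """G.M. = antilog((1/N) * sum(f_i * log10(x_i))); all values must be positive."""
    if freqs is None:
        freqs = [1] * len(values)  # raw data: every observation counts once
    total = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / total)


print(round(geometric_mean([4, 16]), 4))
print(round(geometric_mean([10, 20, 30, 40, 50, 60], [15, 18, 22, 16, 12, 7]), 2))
```

Results computed this way may differ in the second decimal place from values obtained with four-figure log tables.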
5) Harmonic Mean:
If X1, X2, X3, ..., Xn are a given set of n observations, then the harmonic mean is given by
H.M. = $\frac{1}{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}}$
FORMULAS:
Raw Data ------------- H.M. = $\frac{1}{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}}$
Discrete Data ------ H.M. = $\frac{1}{\frac{1}{N}\sum_{i=1}^{n}\frac{f_i}{X_i}}$
Continuous Data--- H.M. = $\frac{1}{\frac{1}{N}\sum_{i=1}^{n}\frac{f_i}{m_i}}$
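A minimal Python sketch of the harmonic mean formulas follows. The function name `harmonic_mean` is hypothetical. The same function covers raw, discrete and continuous data when the class mid-values are passed as `values`.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = N / sum(f_i / x_i); every value must be non-zero."""
    if freqs is None:
        freqs = [1] * len(values)  # raw data: every observation counts once
    total = sum(freqs)
    reciprocal_sum = sum(f / x for x, f in zip(values, freqs))
    return total / reciprocal_sum


print(round(harmonic_mean([1, 2, 4]), 4))
print(round(harmonic_mean([24, 26, 30, 42, 17, 11], [2, 9, 7, 14, 24, 5]), 2))
```

Because the reciprocals are not rounded here, the result can differ slightly from a hand calculation that rounds each $f_i / X_i$ to three decimals.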
Measures of dispersion
Definition:
The meaning of dispersion is ‘scatteredness’. A measure of the scatter of the given data
about the average is said to be a measure of dispersion.
Characteristics of Good Measure of Dispersion
Types of Measures
1) Range.
2) Quartile Deviation (Q.D.)
3) Mean Deviation (M.D.)
4) Standard Deviation (S.D.)
Of the above, the first two measures are known as ‘positional’ measures and the remaining two
are known as ‘calculated’ measures.
Formulas:
1) Range :
Range is the difference between the two extreme values of the data. It is denoted by R.
Raw Data ----- ---- Range = R= (Largest value- Smallest value) = L-S
Discrete Data ----- Range = R= (Largest value- Smallest value) = L-S
Continuous Data - Range = R= (Largest value- Smallest value) = L-S
Coefficient of Range
Coefficient of range = $\frac{L - S}{L + S}$
2) Quartile deviation :
Quartile deviation is denoted by Q.D. If Q1 is the first quartile and Q3 is the third
quartile, then the quartile deviation is
Q.D. = $\frac{Q_3 - Q_1}{2}$
Raw Data ----- ---- Q.D. = $\frac{Q_3 - Q_1}{2}$
Discrete Data ----- Q.D. = $\frac{Q_3 - Q_1}{2}$
Continuous Data - Q.D. = $\frac{Q_3 - Q_1}{2}$
Coefficient of Q.D
Coefficient of Q.D = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
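For raw data, the quartile deviation can be sketched in Python as below. The helper `quartile` is hypothetical: it places the k-th quartile at the k(n+1)/4-th ordered term and, as an assumption not stated in these notes, interpolates linearly when that position is not a whole number.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at the k(n + 1)/4-th ordered term."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4               # 1-based position in the ordered data
    rank = int(position)
    fraction = position - rank               # part of the way toward the next term
    rank = min(max(rank, 1), n)              # keep the rank inside the data
    next_rank = min(rank + 1, n)
    lower, upper = ordered[rank - 1], ordered[next_rank - 1]
    return lower + fraction * (upper - lower)


marks = [25, 35, 45, 17, 35, 20, 55]
q1, q3 = quartile(marks, 1), quartile(marks, 3)
quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q3, quartile_deviation, round(coefficient, 4))
```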
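3) Mean Deviation :
If X1, X2, X3, ..., Xn are n observations and $d_i = X_i - a$, where a is the average about which the
deviations are taken (here the mean $\bar{X}$), then the mean deviation is denoted by M.D. and is given by
M.D. = $\frac{\sum_{i=1}^{n}|d_i|}{n}$ where $d_i = X_i - \bar{X}$, $\bar{X}$ = mean
Raw Data ----- ---- M.D. = $\frac{\sum_{i=1}^{n}|d_i|}{n}$ where $d_i = X_i - \bar{X}$, $\bar{X}$ = mean
Discrete Data ----- M.D. = $\frac{\sum_{i=1}^{n} f_i|d_i|}{N}$ where $d_i = X_i - \bar{X}$, $\bar{X}$ = mean
Continuous Data - M.D. = $\frac{\sum_{i=1}^{n} f_i|d_i|}{N}$ where $d_i = m_i - \bar{X}$, $\bar{X}$ = mean
Coefficient of Mean Deviation:
Coefficient of Mean Deviation = $\frac{\text{Mean Deviation}}{\text{Mean or Median or Mode}}$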
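The mean deviation about the mean can be sketched in Python as follows. The function name `mean_deviation` is hypothetical. For continuous data, the class mid-values are passed as `values`.

```python
def mean_deviation(values, freqs=None):
    """Return (mean, mean deviation about the mean, coefficient of M.D.)."""
    if freqs is None:
        freqs = [1] * len(values)  # raw data: every observation counts once
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    md = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / total
    return mean, md, md / mean


mid_values = [5, 15, 25, 35, 45, 55, 65, 75]  # classes 0-10, ..., 70-80
freqs = [5, 8, 7, 12, 28, 20, 10, 10]
mean, md, coefficient = mean_deviation(mid_values, freqs)
print(mean, md, round(coefficient, 4))
```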
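4) Standard Deviation :
If X1, X2, X3, ..., Xn are n observations and $d_i = X_i - A$, where A is an assumed mean, then the
standard deviation is denoted by S.D. (or $\sigma$) and is given by
$\sigma = \text{S.D.} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n} - \left(\frac{\sum_{i=1}^{n} d_i}{n}\right)^2}$
(If A is taken as the mean $\bar{X}$, the second term is zero.)
Raw Data ----- ---- $\sigma = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2}$ where $d_i = X_i - A$
Discrete Data ----- $\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$ where $d_i = X_i - A$
Continuous Data - $\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$ where $d_i = m_i - A$
Coefficient of Variation:
C.V. = $\frac{\sigma}{\bar{X}} \times 100$
where $\sigma$ = S.D. and $\bar{X}$ = Mean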
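The assumed-mean form of the standard deviation and the coefficient of variation can be sketched in Python as below. The function name `sd_and_cv` is hypothetical. The sketch recovers the actual mean as A plus the mean of the deviations, and it is this actual mean (not the mean of the deviations) that goes into the C.V.

```python
import math


def sd_and_cv(values, freqs=None, assumed_mean=0.0):
    """Return (mean, standard deviation, coefficient of variation in percent)."""
    if freqs is None:
        freqs = [1] * len(values)  # raw data: every observation counts once
    total = sum(freqs)
    deviations = [x - assumed_mean for x in values]           # d_i = X_i - A
    mean_dev = sum(f * d for f, d in zip(freqs, deviations)) / total
    mean_sq_dev = sum(f * d * d for f, d in zip(freqs, deviations)) / total
    sigma = math.sqrt(mean_sq_dev - mean_dev ** 2)
    mean = assumed_mean + mean_dev                            # actual mean of the data
    return mean, sigma, sigma / mean * 100


values = [8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
mean, sigma, cv = sd_and_cv(values, assumed_mean=16)
print(mean, round(sigma, 2), round(cv, 2))
```

The choice of A changes the intermediate sums but not the resulting mean or standard deviation.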
PROBLEMS ON MEASURES OF CENTRAL TENDENCY:
1) PROBLEMS ON ARITHMETIC MEAN:
a) Direct Method:
Raw Data:
1) Find the average for the following data
45, 22, 76, 82, 53, 79, 54, 69, 73, 67
Solution: $\bar{X} = \frac{\sum X}{n} = \frac{620}{10} = 62$
Discrete Data:
1) Find the Arithmetic mean for the following data
X 10 20 30 40 50 60
F 5 15 25 20 10 5
Solution:
X     f     Xf
10    5     50
20    15    300
30    25    750
40    20    800
50    10    500
60    5     300
Σf = 80     ΣXf = 2700
$\bar{X} = \frac{\sum fX}{\sum f} = \frac{2700}{80} = 33.75$
Solution:
Family Income(x) 𝑑𝑖 = 𝑋𝑖 – A
A 90 -10
B 75 -25
C 60 -40
D 100 (A) 0
E 125 25
F 50 -50
G 80 -20
H 120 20
I 500 400
J 400 300
$\sum d_i = 600$
$\bar{X} = A + \frac{\sum d_i}{n} = 100 + \frac{600}{10} = 100 + 60 = 160$
Discrete Data:
Problem -1 Calculate the average for the following data
X 10 20 30 40 50 60
F 5 15 25 20 10 5
Solution:
X        $f_i$   $d_i = X_i - A$   $f_i d_i$
10       5       -30               -150
20       15      -20               -300
30       25      -10               -250
40 (A)   20      0                 0
50       10      10                100
60       5       20                100
$\sum f_i = 80$, $\sum f_i d_i = -500$
$\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i} = 40 + \left[\frac{-500}{80}\right] = 40 - 6.25 = 33.75$
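The deviation (assumed mean) method can be verified with a short Python sketch. The function name `assumed_mean_average` is hypothetical. The point of the check is that any choice of A gives the same mean as the direct method.

```python
def assumed_mean_average(values, freqs, assumed_mean):
    """Mean by the deviation method: A + sum(f_i * d_i) / sum(f_i), d_i = X_i - A."""
    total = sum(freqs)
    weighted_deviations = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    return assumed_mean + weighted_deviations / total


values = [10, 20, 30, 40, 50, 60]
freqs = [5, 15, 25, 20, 10, 5]
direct = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
print(direct, assumed_mean_average(values, freqs, 40), assumed_mean_average(values, freqs, 30))
```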
Continuous Data :
1) Find the Arithmetic mean for the following data
C.I 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90
f 1 4 10 22 30 35 10 7 1
Solution:
C.I   $f_i$   $m_i$   $f_i m_i$   $d_i = m_i - A$   $f_i d_i$   $\bar{d}_i = d_i / c$   $f_i \bar{d}_i$
0-10 1 5 5 -50 -50 -5 -5
50-60 35 55 A 1925 0 0 0 0
80-90 1 85 85 30 30 3 3
1) PROBLEMS ON MEDIAN:
Raw Data :
Problem -1 Find the median for the following data; also calculate the $Q_1$ & $Q_3$ values.
X 120 170 100 110 180 220 160
Solution:
Arranging the data in ascending order (n = 7):
100
110 → $Q_1$
120
160 → $Q_2$
170
180 → $Q_3$
220
Problem -2 Obtain the value of the median from the following data. Also calculate the $Q_1$ & $Q_3$ values.
X 391 384 591 407 672 522 777 753 2,488 1,490
Solution:
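Arrange the given data in ascending order (n = 10):
S.NO.   X (ascending order)
1       384
2       391
3       407
4       522
5       591
6       672
7       753
8       777
9       1,490
10      2,488
$M_d = \frac{(n/2)^{\text{th}} \text{ term} + (n/2 + 1)^{\text{th}} \text{ term}}{2} = \frac{5^{\text{th}} \text{ term} + 6^{\text{th}} \text{ term}}{2} = \frac{591 + 672}{2} = 631.5 \implies Q_2 = 631.5$
$Q_1 = \frac{(n/4)^{\text{th}} \text{ term} + ((n+1)/4)^{\text{th}} \text{ term}}{2} = \frac{(10/4)^{\text{th}} \text{ term} + (11/4)^{\text{th}} \text{ term}}{2} = \frac{2.5^{\text{th}} \text{ term} + 2.75^{\text{th}} \text{ term}}{2}$
$\approx \frac{2^{\text{nd}} \text{ term} + 3^{\text{rd}} \text{ term}}{2} = \frac{391 + 407}{2} = 399 \implies Q_1 = 399$
$Q_3 = \frac{(3n/4)^{\text{th}} \text{ term} + (3(n+1)/4)^{\text{th}} \text{ term}}{2} = \frac{(3(10)/4)^{\text{th}} \text{ term} + (3(10+1)/4)^{\text{th}} \text{ term}}{2}$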
F 5 15 25 20 10 5
Solution:
X             f     C.f
10            5     5
20 → $Q_1$    15    20
30 → $Q_2$    25    45
40 → $Q_3$    20    65
50            10    75
60            5     80
N = 80
$\frac{N}{4} = \frac{80}{4} = 20 \implies Q_1 = 20$
$\frac{N}{2} = \frac{80}{2} = 40 \implies M_d \text{ or } Q_2 = 30$
$\frac{3N}{4} = \frac{3(80)}{4} = 60 \implies Q_3 = 40$
Continuous Data
Problem -1 Find the median for the following data; also calculate the $Q_1$ & $Q_3$ values.
C.I 0-10 10-20 20-30 30-40 40-50 50-60
F 4 6 10 15 8 7
Solution:
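C.I      f            C.f
0-10     4            4
10-20    6            10 → $m_1$
20-30    10 → $f_1$   20 → $m_2$
30-40    15 → $f_2$   35 → $m_3$
40-50    8 → $f_3$    43
50-60    7            50
N = 50
$\frac{N}{4} = \frac{50}{4} = 12.5$, $\frac{N}{2} = \frac{50}{2} = 25$, $\frac{3N}{4} = \frac{3(50)}{4} = 37.5$
$Q_1 = L_1 + \left[\frac{(N/4) - m_1}{f_1}\right] \times c = 20 + \left[\frac{12.5 - 10}{10}\right] \times 10 = 20 + 2.5 = 22.5$
$Q_2 = L_2 + \left[\frac{(N/2) - m_2}{f_2}\right] \times c = 30 + \left[\frac{25 - 20}{15}\right] \times 10 = 30 + 3.33 = 33.33$
$Q_3 = L_3 + \left[\frac{3(N/4) - m_3}{f_3}\right] \times c = 40 + \left[\frac{37.5 - 35}{8}\right] \times 10 = 40 + 3.125 = 43.125$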
2) PROBLEMS ON MODE:
Raw Data:
Problem -1 Find the mode for the following data
0,6,1,7,2,3,7,6,6,2,6,6,5,6,0
Solution:
X            Tally     f
0            II        2
1            I         1
2            II        2
3            I         1
5            I         1
6 → $M_O$    IIII I    6
7            II        2
∴ Mode = 6
Discrete Data :
Problem -1 Find the mode for the following data
Height (in inches)   57 59 61 62 63 64 65 66 67 69
f                    3 5 7 10 20 22 24 5 2 2
Solution:
Height F
(in inches)
57 3
59 5
61 7
62 10
63 20
64 22
65 Mo 24
66 5
67 2
69 2
Mode=65
Continuous Data:
Problem -1 Find the mode for the following data
C.I 0-400 400-800 800-1200 1200-1600 1600-2000 2000-2400 2400-2800 2800-3200
F 4 12 40 41 27 13 9 4
Solution:
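C.I               F
0-400             4
400-800           12
800-1200          40 → $f_0$
L → 1200-1600     41 → $f_1$
1600-2000         27 → $f_2$
2000-2400         13
2400-2800         9
2800-3200         4
$M_O = L + \left[\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right] \times C = 1200 + \left[\frac{41 - 40}{2(41) - 40 - 27}\right] \times 400 = 1200 + \frac{400}{15} = 1200 + 26.67 = 1226.67$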
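The mode formula for continuous data can be sketched in Python as follows. The function name `grouped_mode` is hypothetical. The sketch assumes equal class widths and a single class with the highest frequency, and it treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * C, using the modal class."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                    # frequency of the preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0       # frequency of the following class
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width


lower_limits = [0, 400, 800, 1200, 1600, 2000, 2400, 2800]
freqs = [4, 12, 40, 41, 27, 13, 9, 4]
print(round(grouped_mode(lower_limits, freqs, 400), 2))
```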
Problems on Geometric Mean:
Raw Data:
Problem -1 Find the Geometric mean for the following data
X 2000 200 20 12 8
Solution:
X       $\log X_i$
2000    3.3010
200     2.3010
20      1.3010
12      1.0792
8       0.9030
$\sum \log X_i = 8.8852$
G.M. = Antilog $\left[\frac{\sum \log X_i}{n}\right]$ = Antilog $\left[\frac{8.8852}{5}\right]$ = Antilog [1.7770] = 59.8411
Discrete Data:
Problem -1 Find the geometric mean for the following data
X 10 20 30 40 50 60
f 15 18 22 16 12 7
Solution:
X     f     $\log X_i$   $f(\log X_i)$
10    15    1            15
20    18    1.3010       23.418
30    22    1.4771       32.4962
40    16    1.6020       25.6336
50    12    1.6989       20.3868
60    7     1.7781       12.4467
G.M. = Antilog $\left[\frac{\sum f_i \log X_i}{N}\right]$ = Antilog $\left[\frac{129.3797}{90}\right]$ = Antilog [1.4375] = 27.384
Continuous Data:
Problem -1 Find the Geometric mean for the following data.
C.I 15-20 20-25 25-30 30-35 35-40 40-45
f 4 20 38 24 10 4
Solution:
C.I   f   $m_i$   $\log m_i$   $f(\log m_i)$
G.M. = Antilog $\left[\frac{\sum f(\log m_i)}{N}\right]$ = Antilog $\left[\frac{145.2324}{100}\right]$ = Antilog [1.4523] = 28.34
Solution:
X      $1/X_i$
200    0.005
300    0.003
20     0.05
12     0.0833
8      0.125
0.8    1.25
$\sum 1/X_i = 1.5163$
H.M. = $\frac{n}{\sum (1/X_i)} = \frac{6}{1.5163} = 3.96$
Discrete Data:
Problem -1 Calculate harmonic mean for the following data
X 24 26 30 42 17 11
f 2 9 7 14 24 5
Solution:
X     $f_i$   $f_i / X_i$
24    2       0.083
26    9       0.346
30    7       0.233
42    14      0.333
17    24      1.411
11    5       0.454
N = $\sum f_i = 61$, $\sum f_i / X_i = 2.86$
H.M. = $\frac{N}{\sum (f_i / X_i)} = \frac{61}{2.86} = 21.33$
Continuous Data:
Problem-1 Calculate the harmonic mean for the following data
C.I 100-110 110-120 120-130 130-140 140-150
F 12 18 25 22 18
Solution:
C.I        $f_i$   $m_i$   $f_i / m_i$
100-110    12      105     0.1142
110-120    18      115     0.1565
$\sum f_i = 95$, $\sum f_i / m_i = 0.7577$
H.M. = $\frac{N}{\sum (f_i / m_i)} = \frac{95}{0.7577} = 125.379$
F 6 14 10 7 5 3
Continuous Data
Problem-1: Find the range for the following data
C.I 0-10 10-20 20-30 30-40 40-50 50-60 60-70
F 5 8 12 20 15 7 3
Marks 25 35 45 17 35 20 55
Solution:
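S.NO.   Marks ($X_i$)   Ascending order
1       25              17
2       35              20 → $Q_1$
3       45              25
4       17              35
5       35              35
6       20              45 → $Q_3$
7       55              55
$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ term $= \left(\frac{7+1}{4}\right)^{\text{th}}$ term = 2nd term = 20
$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}}$ term $= \left(\frac{3(7+1)}{4}\right)^{\text{th}}$ term = 6th term = 45
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{45 - 20}{2} = 12.5$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45 - 20}{45 + 20} = \frac{25}{65} = 0.3846$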
Discrete Data:
Problem-1 Find the quartile deviation for the following data
X 30 20 40 50 10 60
F 15 7 8 7 4 2
Solution:
X     F     Ascending order (X)   f     Cumulative frequency (c.f.)
30    15    10                    4     4
20    7     20 → $Q_1$            7     11 → $Q_1$ class
40    8     30                    15    26
50    7     40 → $Q_3$            8     34 → $Q_3$ class
10    4     50                    7     41
60    2     60                    2     43
$\frac{N}{4} = \frac{43}{4} = 10.75 \approx 11 \implies Q_1 = 20$
$\frac{3N}{4} = \frac{3(43)}{4} = 32.25 \approx 32 \implies Q_3 = 40$
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = \frac{20}{60} = 0.3333$
Continuous Data:
Problem-1 Find the quartile deviation for the following data
C.I 0-10 10-20 20-30 30-40 40-50 50-60 60-70
f 4 8 10 16 11 7 3
Solution:
C.I F Cumulative frequency (c.f.)
0-10 4 4
10-20 8 12 → 𝑚1
𝐿1 →20-30 10 → 𝑓1 22
30-40 16 38 → 𝑚3
𝐿3 →40-50 11 → 𝑓3 49
50-60 7 56
60-70 3 59
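N = 59, $\frac{N}{4} = 14.75$, $\frac{3N}{4} = 44.25$
$Q_1 = L_1 + \left[\frac{(N/4) - m_1}{f_1}\right] \times C = 20 + \left[\frac{14.75 - 12}{10}\right] \times 10 = 20 + 2.75 = 22.75$
$Q_3 = L_3 + \left[\frac{(3N/4) - m_3}{f_3}\right] \times C = 40 + \left[\frac{44.25 - 38}{11}\right] \times 10 = 40 + 5.68 = 45.68$
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{45.68 - 22.75}{2} = 11.465$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45.68 - 22.75}{45.68 + 22.75} = \frac{22.93}{68.43} = 0.3351$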
3) Problems on Mean Deviation:
Raw Data:
Problem-1 Find the mean deviation for the following data
X 7 4 10 9 15 12 7 9 7
Solution:
X     Ascending order ($X_i$)   $|d_i| = |X_i - \bar{X}|$
7     4                         4.9
4     7                         1.9
10    7                         1.9
9     7                         1.9
15    9                         0.1
12    9                         0.1
7     10                        1.1
9     12                        3.1
7     15                        6.1
$\sum X_i = 80$, $\sum |d_i| = 21.1$
$\bar{X} = \frac{\sum X_i}{n} = \frac{80}{9} = 8.9$
M.D. = $\frac{\sum |d_i|}{n} = \frac{21.1}{9} = 2.344$
Coefficient of M.D. = $\frac{\text{M.D.}}{\text{Mean}} = \frac{2.34}{8.9} = 0.26$
Discrete Data:
Problem-1 Find the mean deviation for the following data
X 10 15 20 30 40 50
f 8 12 15 10 3 2
Solution:
10 8 80 11.6 92.8
20 15 300 1.6 24
30 10 300 8.4 84
f 5 8 7 12 28 20 10 10
Solution:
C.I f 𝑚𝑖 𝑓𝑖 𝑚𝑖 |𝑑𝑖 | =|𝑚 − 𝑋̅| 𝑓𝑖 |𝑑𝑖 |
0-10 5 5 25 40 200
40-50 28 45 1260 0 0
$\bar{X} = \frac{\sum f_i m_i}{N} = \frac{4500}{100} = 45$
M.D. = $\frac{\sum f_i |d_i|}{N} = \frac{1400}{100} = 14$
Coefficient of M.D. = $\frac{\text{M.D.}}{\text{Mean}} = \frac{14}{45} = 0.3111$
4) Problems on Standard Deviation:
Raw Data:
Problem-1 Find the Standard deviation for the following data
X 8 10 12 14 16 18 20 22 24 26
Solution:
X 𝑑𝑖 = 𝑋𝑖 – A 𝑑𝑖2
8 -8 64
10 -6 36
12 -4 16
14 -2 4
16 → A 0 0
18 2 4
20 4 16
22 6 36
24 8 64
26 10 100
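$\sum d_i = 10$, $\sum d_i^2 = 340$
Mean of the deviations: $\bar{d} = \frac{\sum d_i}{n} = \frac{10}{10} = 1$, so the mean is $\bar{X} = A + \bar{d} = 16 + 1 = 17$
S.D. ($\sigma$) = $\sqrt{\frac{\sum d_i^2}{n} - \bar{d}^2} = \sqrt{\frac{340}{10} - (1)^2} = \sqrt{34 - 1} = \sqrt{33} = 5.74$
C.V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{5.74}{17} \times 100 \approx 33.8$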
Discrete Data:
Problem-1 Find the Standard deviation for the following data
X 5 15 25 35 45 55 65
f 3 10 20 30 15 12 10
Solution:
X F 𝑑𝑖 = 𝑋𝑖 - A 𝑓𝑖 𝑑𝑖 𝑓𝑖 𝑑𝑖2
35→ A 30 0 0 0
45 15 10 150 1500
55 12 20 240 4800
65 10 30 300 9000
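Mean of the deviations: $\bar{d} = \frac{\sum f_i d_i}{N} = \frac{200}{100} = 2$, so the mean is $\bar{X} = A + \bar{d} = 35 + 2 = 37$
S.D. ($\sigma$) = $\sqrt{\frac{\sum f_i d_i^2}{N} - \bar{d}^2} = \sqrt{\frac{24000}{100} - (2)^2} = \sqrt{240 - 4} = \sqrt{236} = 15.36$
C.V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{15.36}{37} \times 100 \approx 41.5$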
Continuous Data:
Problem-1 Find the Standard deviation for the following data
C.I 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
f 5 8 7 12 28 20 10 10
Solution:
C.I f 𝑚𝑖 𝑑𝑖 = 𝑚𝑖 - A 𝑓𝑖 𝑑𝑖 𝑓𝑖 𝑑𝑖2
40-50 28 45 →A 0 0 0
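$\bar{X} = A + \frac{\sum f_i d_i}{N} = 45 + \frac{0}{100} = 45 + 0 = 45$
S.D. ($\sigma$) = $\sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2} = \sqrt{\frac{34{,}200}{100} - (0)^2} = \sqrt{342} = 18.49$
C.V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{18.49}{45} \times 100 \approx 41.1$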
UNIT -I
CORRELATION
Uni-variate Distribution
Bi-variate Distribution
Multi–variate Distribution
1. Uni-variate Distribution: A distribution involving only one variable is called a
“uni-variate distribution”.
Example: The heights of a certain group of persons.
2. Bi-variate Distribution: A distribution involving exactly 2 variables is called a
“bi-variate distribution”.
Example: The heights and weights of a certain group of persons.
3. Multi-variate Distribution: A distribution involving more than 2 variables is called a
“multi-variate distribution”.
Correlation:
Definition 1 If a change in one variable is accompanied by a change in the other variable, then the
variables are said to be “correlated variables”.
Definition 2 Correlation is an analysis of the ‘co-variation’ between 2 or more variables.
Types of Correlation:
Positive Correlation (or) Direct Correlation
Negative Correlation (or) Inverse Correlation
Perfect Correlation
1) Positive Correlation:
Definition 1 If the variables deviate in the same direction, then the variables are said to be
“positively correlated”.
Definition 2 In other words, if an increase in the value of one variable is accompanied
by an increase in the value of the other variable, or a decrease in the value of one variable is
accompanied by a decrease in the other variable, then the variables are said to be
“directly correlated variables”.
Examples: 1) Price & supply of goods. 2) Income & expenditure of a group of persons.
2) Negative Correlation:
Definition 1 If the variables deviate in opposite directions, then the variables are said to be
“negatively correlated”.
Definition 2 In other words, if an increase in the value of one variable is accompanied
by a decrease in the value of the other variable, or a decrease in the value of one variable is
accompanied by an increase in the other variable, then the variables are said to be “inversely
correlated variables”.
Examples: 1) Volume & pressure of a perfect gas. 2) Price & demand of goods.
3) Perfect Correlation:
Definition: If the deviation in one variable is followed by a corresponding and
proportional deviation in the other variable, then the variables are said to be “perfectly
correlated variables”.
Linear Correlation:
Definition: If the ‘ratio’ of the change in one variable to the change in the other is ‘uniform’, then
there is “linear correlation” between the variables. If we plot the values on a graph, we get a ‘straight line’.
Example: In the following data the ratio of the changes is constant: each increase of 5 in A is
accompanied by an increase of 6 in B.
A 2 7 12 17
B 3 9 15 21
Non-Linear Correlation:
Definition: If the amount of change in one variable does not bear a constant ratio to the
amount of change in the other variable, then the correlation is called “non-linear
correlation”. Non-linear correlation is also called ‘curvilinear correlation’.
Uses (or) Applications of Correlation:
1) Correlation is a measure of the extent of the relationship between 2 variables.
2) The correlation coefficient helps in predicting future values of one variable from another.
3) Correlation analysis contributes to the understanding of economic behaviour.
4) By using the correlation coefficient we can estimate the value of one variable when the value of
the other variable is given.
Perfect Linear Correlation:
Definition: If all the points lie exactly on a “straight line”, then the correlation is said
to be “perfect linear correlation”.
Perfect Positive Correlation:
Definition: If the correlation is linear and the line runs from the lower left-hand corner to the
upper right-hand corner, then the correlation is called “perfect positive correlation”. It is
denoted by r = +1.
Perfect Negative Correlation:
Definition: If the correlation is linear and the line runs from the upper left-hand corner to the
lower right-hand corner, then the correlation is called “perfect negative correlation”. It is
denoted by r = -1.
No Correlation:
If the plotted points lie scattered all over the graph paper, then there is no correlation
between the 2 variables.
If r = 0, the variables X & Y are said to be “uncorrelated”. Independent variables always have r = 0,
but r = 0 does not by itself imply independence.
[Figure: scatter diagrams illustrating perfect +ve correlation, perfect -ve correlation and no correlation]
Methods of Studying Correlation:
There are 2 different methods for finding out the relationship between the variables:
1) Graphical Method 2) Mathematical Method
1) Graphical Method:
a) Scatter Diagram b) Scatter gram
2) Mathematical Method:
a) Karl Pearson’s Correlation Coefficient.
b) Spearman’s Rank Correlation.
c) Coefficient of Concurrent Deviation.
d) Methods of Least Squares.
Mathematical Method:
a) Karl Pearson’s Correlation Coefficient:
As a measure of the ‘intensity’ or ‘degree’ of linear relationship between 2 variables,
Karl Pearson, a British biometrician, developed a formula called the “correlation coefficient”.
The correlation coefficient between 2 variables X & Y is usually denoted by r(X, Y) or $r_{XY}$ and is given by
$r(X, Y) = r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)} \cdot \sqrt{V(Y)}}$ → (1), where
$\operatorname{Cov}(X, Y) = E\{(X - E(X))(Y - E(Y))\} = E\{(X - \bar{X})(Y - \bar{Y})\} = E(XY) - \bar{X}\bar{Y} = \frac{1}{n}\sum XY - \bar{X}\bar{Y}$
$V(X) = E\{(X - E(X))^2\} = E\{(X - \bar{X})^2\} = E(X^2) - [E(X)]^2 = \frac{1}{n}\sum X^2 - \bar{X}^2$
$V(Y) = E\{(Y - E(Y))^2\} = E\{(Y - \bar{Y})^2\} = E(Y^2) - [E(Y)]^2 = \frac{1}{n}\sum Y^2 - \bar{Y}^2$
$r(X, Y) = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} \cdot \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}}$
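Karl Pearson's formula can be sketched directly in Python. The function name `pearson_r` is hypothetical. The sketch uses the population (divide by n) forms of covariance and variance, as in the formula above, and is illustrated with heights of fathers and sons.

```python
import math


def pearson_r(xs, ys):
    """r = Cov(X, Y) / (sqrt(V(X)) * sqrt(V(Y))) with 1/n moments."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)


fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]
print(round(pearson_r(fathers, sons), 4))

# Shifting the origin (U = X - 68, V = Y - 69) leaves r unchanged.
shifted = pearson_r([x - 68 for x in fathers], [y - 69 for y in sons])
print(round(shifted, 4))
```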
Properties of Correlation Coefficient:
1) The correlation coefficient lies between -1 & +1,
i.e. -1 ≤ r(x, y) ≤ +1.
2) The correlation coefficient is independent of change of origin & scale.
3) Two independent variables are uncorrelated. The converse need not be true.
Regression:
Definition: “Regression analysis” is a mathematical measure of the average relationship
between 2 or more variables in terms of the original units of the data.
In regression analysis there are 2 types of variables: the dependent variable & the independent variable.
The variable whose value is ‘influenced’ or is to be ‘predicted’ is called the “dependent variable”.
The variable which ‘influences’ or is used for ‘prediction’ is called the “independent variable”.
Lines of Regression:
The line of regression is the line which gives the best estimate of the value of one variable
for any specific value of the other variable. Thus the line of regression is the line of ‘best fit’,
which can be obtained by using the “principle of least squares”.
Linear Regression:
If the points in the scatter diagram lie along a straight line, then it is called “linear
regression”.
Non-Linear Regression:
If the points in the scatter diagram lie along a curve, then it is called “non-linear
regression” or “curvilinear regression”.
Curve of Regression:
If the variables in a bi-variate distribution are related, the points in the
scatter diagram will cluster round some curve, called the “curve of regression”.
Let us suppose that in the bi-variate distribution $(x_i, y_i)$, i = 1, 2, ..., n,
X = independent variable and Y = dependent variable. Let the line of regression of Y on X be
Y = a + bX → (1)
According to the principle of least squares, the normal equations for estimating
a & b are
$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i$ → (2),   $\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$ → (3)
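Solving the two normal equations for a and b can be sketched in Python as follows. The function name `fit_line` is hypothetical. The closed form used for b comes from eliminating a between equations (2) and (3).

```python
def fit_line(xs, ys):
    """Solve the normal equations of Y = a + bX and return (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eliminating a from (2) and (3) gives the slope b.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n        # from equation (2)
    return a, b


a, b = fit_line([2, 7, 12, 17], [3, 9, 15, 21])
print(round(a, 4), round(b, 4))
```

The denominator is zero only when all x values are equal, in which case no line of Y on X can be fitted.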
Regression Equations:
1) Regression Equation Y on X 2) Regression Equation X on Y
Regression Equation Y on X:
Since $b_{yx}$ is the ‘slope’ of the line of regression of Y on X, and since the line of
regression passes through the point $(\bar{x}, \bar{y})$, its equation is
$Y - \bar{y} = b_{yx}(X - \bar{x}) \implies Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(X - \bar{x})$
2) If one of the regression coefficients is greater than unity, then the other must be less
than unity, i.e. $b_{xy} > 1 \implies b_{yx} < 1$.
3) The arithmetic mean (A.M.) of the regression coefficients is greater than or equal to the correlation
coefficient: $\frac{1}{2}[b_{xy} + b_{yx}] \ge r$.
4) Regression coefficients are independent of change of origin but not of scale.
5) The angle between the 2 regression lines is
$\theta = \tan^{-1}\left\{\frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}\right\}$
PROBLEMS ON CORRELATION COEFFICIENT:
Problem -1: Calculate the correlation coefficient for the following heights (in inches) of fathers (X)
and their sons (Y)
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Solution:
X Y 𝑋2 𝑌2 XY
65 67 4225 4489 4355
66 68 4356 4624 4488
67 65 4489 4225 4355
67 68 4489 4624 4556
68 72 4624 5184 4896
69 72 4761 5184 4968
70 69 4900 4761 4830
72 71 5184 5041 5112
𝞢X= 544 𝞢Y= 552 𝞢𝑋2=37028 𝞢𝑌2=38132 𝞢XY= 37560
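From the above table we have
ΣX = 544, ΣY = 552, ΣX² = 37028, ΣY² = 38132, ΣXY = 37560
$\bar{X} = \frac{\sum X}{n} = \frac{544}{8} = 68$, $\bar{Y} = \frac{\sum Y}{n} = \frac{552}{8} = 69$
The correlation coefficient is given by
$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)} \cdot \sqrt{V(Y)}} = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} \cdot \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}} = \frac{\frac{37560}{8} - (68)(69)}{\sqrt{\frac{37028}{8} - 68^2} \cdot \sqrt{\frac{38132}{8} - 69^2}}$
$= \frac{4695 - 4692}{\sqrt{4628.5 - 4624} \cdot \sqrt{4766.5 - 4761}} = \frac{3}{\sqrt{(4.5)(5.5)}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$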
Problem-2:
Calculate the correlation coefficient for the following heights (in inches) of father(X) and their
sons (Y)
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Solution:
X Y U=X-68 V=Y-69 $U^2$ $V^2$ UV
65 67 -3 -2 9 4 6
66 68 -2 -1 4 1 2
67 65 -1 -4 1 16 4
67 68 -1 -1 1 1 1
68 72 0 3 0 9 0
69 72 1 3 1 9 3
70 69 2 0 4 0 0
72 71 4 2 16 4 8
𝞢X=544 𝞢Y=552 𝞢U=0 𝞢V=0 𝞢𝑈2=36 𝞢𝑉2=44 𝞢UV=24
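The correlation coefficient is
$r(U, V) = \frac{\operatorname{Cov}(U, V)}{\sigma_U \cdot \sigma_V}$ → (1)
$\bar{U} = \frac{\sum U}{n} = \frac{0}{8} = 0$, $\bar{V} = \frac{\sum V}{n} = \frac{0}{8} = 0$
$\operatorname{Cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\bar{V} = \frac{24}{8} - (0)(0) = 3 - 0 = 3$
$\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = \frac{36}{8} - 0 = 4.5$
$\sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = \frac{44}{8} - 0 = 5.5$
$r(U, V) = \frac{3}{\sqrt{4.5} \cdot \sqrt{5.5}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$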
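$\bar{X} = \frac{\sum X}{n} = \frac{1004}{12} = 83.67$, $\bar{Y} = \frac{\sum Y}{n} = \frac{1061}{12} = 88.42$
$\bar{U} = \frac{\sum U}{n} = \frac{4}{12} = 0.34$, $\bar{V} = \frac{\sum V}{n} = \frac{5}{12} = 0.42$
$r(U, V) = \frac{\operatorname{Cov}(U, V)}{\sigma_U \cdot \sigma_V}$ → (1)
$\operatorname{Cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\bar{V} = \frac{287}{12} - (0.34)(0.42) = 23.92 - 0.14 = 23.78$
$\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = \frac{488}{12} - (0.34)^2 = 40.67 - 0.110 = 40.56$
$\sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = \frac{365}{12} - (0.42)^2 = 30.42 - 0.18 = 30.24$
$r(U, V) = \frac{23.78}{\sqrt{40.56} \cdot \sqrt{30.24}} = \frac{23.78}{\sqrt{(40.56)(30.24)}} = \frac{23.78}{\sqrt{1226.53}} = \frac{23.78}{35.03} = 0.6788$
The regression equation Y on X is
$Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(X - \bar{x}) \implies (Y - 88.42) = 0.68\left(\frac{5.50}{6.37}\right)(X - 83.67) \implies (Y - 88.42) = 0.68(0.86)(X - 83.67)$
$\implies (Y - 88.42) = (0.59)(X - 83.67)$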
PROBLEM2:
Using the following data, find the 2 lines of regression and from them compute Karl
Pearson’s coefficient of correlation.
ΣX =250 ΣY=300 ΣXY=7900 ΣX2=6500 ΣY2=10,000 n= 10
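Solution:
$\bar{X} = \frac{\sum X}{n} = \frac{250}{10} = 25$, $\bar{Y} = \frac{\sum Y}{n} = \frac{300}{10} = 30$
We know that Karl Pearson’s correlation coefficient is given by
$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)} \cdot \sqrt{V(Y)}} = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} \cdot \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}} = \frac{\frac{7900}{10} - (25)(30)}{\sqrt{\frac{6500}{10} - 25^2} \cdot \sqrt{\frac{10000}{10} - 30^2}}$
$= \frac{790 - 750}{\sqrt{650 - 625} \cdot \sqrt{1000 - 900}} = \frac{40}{\sqrt{(25)(100)}} = \frac{40}{\sqrt{2500}} = \frac{40}{50} = 0.8$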
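REGRESSION EQUATION Y ON X:
The regression equation Y on X is given by
$Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(X - \bar{x}) \implies (Y - 30) = 0.8\left(\frac{10}{5}\right)(X - 25)$
$\implies (Y - 30) = 0.8(2)(X - 25) \implies (Y - 30) = 1.6(X - 25)$
REGRESSION EQUATION X ON Y:
The regression equation X on Y is given by
$X - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(Y - \bar{y}) \implies (X - 25) = 0.8\left(\frac{5}{10}\right)(Y - 30)$
$\implies (X - 25) = 0.8\left(\frac{1}{2}\right)(Y - 30) \implies (X - 25) = 0.4(Y - 30)$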
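The whole calculation from summary totals can be sketched in Python as below. The function name `regression_from_sums` is hypothetical. It uses the identities $b_{yx} = r\,\sigma_y/\sigma_x = \operatorname{Cov}(X,Y)/V(X)$ and $b_{xy} = r\,\sigma_x/\sigma_y = \operatorname{Cov}(X,Y)/V(Y)$.

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (mean_x, mean_y, r, b_yx, b_xy) from the summary totals."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    cov = sum_xy / n - mean_x * mean_y
    var_x = sum_x2 / n - mean_x ** 2
    var_y = sum_y2 / n - mean_y ** 2
    r = cov / math.sqrt(var_x * var_y)
    b_yx = cov / var_x          # slope of the line of Y on X
    b_xy = cov / var_y          # slope of the line of X on Y
    return mean_x, mean_y, r, b_yx, b_xy


mean_x, mean_y, r, b_yx, b_xy = regression_from_sums(10, 250, 300, 7900, 6500, 10000)
print(mean_x, mean_y, round(r, 4), round(b_yx, 4), round(b_xy, 4))
```

A useful check is that the product of the two regression coefficients equals $r^2$.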