Measures of Central Tendency


Definition:
➢ An average is a measure that condenses a large volume of data into a single numerical value.
➢ An average gives an idea of how the values concentrate in the central part of the distribution.
➢ Averages are the typical values around which the other values of the distribution cluster.
Types of Measures
1) Arithmetic Mean (or) Average
2) Median
3) Mode
4) Geometric Mean
5) Harmonic Mean

Characteristics of a Good Measure of Central Tendency

➢ It should be easy to understand and easy to calculate.
➢ It should be based on all items.
➢ It should be capable of further algebraic treatment.
➢ It should be rigidly defined.
➢ It should not be affected by extreme observations.
➢ It should not be affected by fluctuations of sampling.

Demerits of the Arithmetic Mean

➢ It cannot be determined by inspection, nor can it be located graphically.
➢ It cannot be used for qualitative characteristics that cannot be measured quantitatively, e.g. honesty, intelligence, beauty.
➢ It cannot be computed for open-ended class intervals, e.g. "below 90" or "above 100".
➢ It is affected by extreme values.
➢ It may lead to wrong conclusions if the details of the data from which it is computed are not given.
➢ It cannot be obtained if even a single observation is missing or lost.
➢ It is not a suitable measure for extremely asymmetric distributions.

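Methods of Calculating the Average

1) Direct method.
2) Indirect method (or) Deviation method.
3) Step-Deviation method.

In the formulas below, $f_i$ is the frequency, $m_i$ is the mid-value of a class, A is an assumed mean and C is the class width (or a common factor of the deviations).

1) Direct Method:
➢ Raw Data: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
➢ Discrete Data: $\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$
➢ Continuous Data: $\bar{X} = \frac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i}$
2) Deviation Method:
➢ Raw Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n}$, where $d_i = X_i - A$
➢ Discrete Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$, where $d_i = X_i - A$
➢ Continuous Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$, where $d_i = m_i - A$
3) Step-Deviation Method:
➢ Raw Data: $\bar{X} = A + \left(\frac{\sum_{i=1}^{n} \bar{d}_i}{n}\right) \times C$, where $\bar{d}_i = d_i / C$
➢ Discrete Data: $\bar{X} = A + \left(\frac{\sum_{i=1}^{n} f_i \bar{d}_i}{\sum_{i=1}^{n} f_i}\right) \times C$, where $\bar{d}_i = d_i / C$
➢ Continuous Data: $\bar{X} = A + \left(\frac{\sum_{i=1}^{n} f_i \bar{d}_i}{\sum_{i=1}^{n} f_i}\right) \times C$, where $\bar{d}_i = d_i / C$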

2) Median:

The median is the "middle-most" or "central" value of a set of observations when the observations are arranged in ascending or descending order of magnitude. It divides the arranged series into two equal parts. The median is also known as a "positional average", whereas the mean is known as a "calculated average". When a series has an even number of terms, the median is the arithmetic mean of the two central items. It is denoted by $M_d$.
Formulas:
➢ Raw Data: Arrange the given data in ascending or descending order.
Case i) If n is odd, the median is the value of the $\left(\frac{n+1}{2}\right)$th term, where n = number of observations.
Case ii) If n is even, the median is
$M_d = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2}+1\right)\text{th term}}{2}$
➢ Discrete Data:
STEP 1: Find the cumulative frequencies of the given data.
STEP 2: Find $N = \sum_{i=1}^{n} f_i$.
STEP 3: Find the cumulative frequency just greater than $N/2$; the corresponding value of X is the median.
➢ Continuous Data:
STEP 1: Find the cumulative frequencies of the given data.
STEP 2: Find $N = \sum_{i=1}^{n} f_i$.
STEP 3: Locate the median class (the class whose cumulative frequency is just greater than $N/2$). The median is then
$M_d = L + \left(\frac{N/2 - m}{f}\right) \times C$
where L = lower limit of the median class
f = frequency of the median class
m = cumulative frequency of the class preceding the median class
C = width of the class interval
$N = \sum_{i=1}^{n} f_i$ = sum of the frequencies.
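The continuous-data formula can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name `grouped_quantile` and its parameters are hypothetical, and the sample classes (0-10 to 50-60 with frequencies 4, 6, 10, 15, 8, 7) are taken from the continuous-data median problem later in these notes. Passing p = 0.5 gives the median; p = 0.25 and p = 0.75 give the quartiles by the same interpolation.

```python
def grouped_quantile(lower_limits, freqs, width, p):
    """Interpolated quantile L + ((p*N - m) / f) * C for grouped data.

    lower_limits: lower limit L of each class (hypothetical parameter name)
    freqs:        frequency f of each class
    width:        common class width C
    p:            0.5 for the median, 0.25 / 0.75 for Q1 / Q3
    """
    total = sum(freqs)              # N = sum of the frequencies
    target = p * total              # N/2 for the median
    cumulative = 0                  # m = c.f. preceding the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


lower_limits = [0, 10, 20, 30, 40, 50]
freqs = [4, 6, 10, 15, 8, 7]
median = grouped_quantile(lower_limits, freqs, width=10, p=0.5)
print(round(median, 2))
```

For these data the median class is 30-40 (N/2 = 25, preceding c.f. = 20, f = 15), so the function evaluates 30 + (5/15) × 10.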
3) Mode:
The mode is the value in a series that occurs most frequently. In a frequency distribution, the mode is the value with the maximum frequency; in other words, it is the value with the greatest frequency density in its neighbourhood. The mode is also known as the most frequent value, the predominant value, or the norm.
Formulas:
➢ Raw Data: The value that occurs with the maximum frequency is the mode.
➢ Discrete Data: The mode is the value of X corresponding to the maximum frequency.
➢ Continuous Data:
STEP 1: Locate the modal class, i.e. the class with the maximum frequency.
STEP 2: The mode is then given by
$M_o = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times C$
where L = lower limit of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_1$ = frequency of the modal class
$f_2$ = frequency of the class succeeding the modal class
C = width of the class interval
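A minimal Python sketch of the continuous-data mode formula is given below. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch rather than something stated in the notes. The sample data are those of the continuous-data mode problem later in these notes.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: L + (f1 - f0) / (2*f1 - f0 - f2) * C."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # assumption: 0 if no preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0       # assumption: 0 if no succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined when f1 = f0 = f2")
    return lower_limits[i] + (f1 - f0) / denominator * width


lower_limits = [0, 400, 800, 1200, 1600, 2000, 2400, 2800]
freqs = [4, 12, 40, 41, 27, 13, 9, 4]
mode = grouped_mode(lower_limits, freqs, width=400)
print(round(mode, 2))
```

Here the modal class is 1200-1600, so the function evaluates 1200 + (1/15) × 400.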
4) Geometric Mean:
The geometric mean of n observations is the nth root of the product of the observations. If $X_1, X_2, X_3, \dots, X_n$ are n observations, the geometric mean is given by
$G.M. = \sqrt[n]{X_1 X_2 X_3 \cdots X_n} = (X_1 X_2 X_3 \cdots X_n)^{1/n}$

If n = 2, the geometric mean is the square root of the product of the observations.
Example: The geometric mean of 4 and 16 is
$G.M. = \sqrt{4 \times 16} = \sqrt{64} = 8$

If the number of observations is greater than 2, computing the nth root directly is inconvenient; in that case we take logarithms:
$\log(G.M.) = \log (X_1 X_2 X_3 \cdots X_n)^{1/n} = \frac{1}{n} \log (X_1 X_2 X_3 \cdots X_n)$
$= \frac{1}{n} \{\log X_1 + \log X_2 + \log X_3 + \dots + \log X_n\}$
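The logarithmic route can be sketched in Python as below. The function name `geometric_mean` and the optional `freqs` argument are illustrative assumptions; base-10 logarithms are used only to mirror the log-table approach of these notes.

```python
import math


def geometric_mean(values, freqs=None):
    """G.M. = antilog( (1/N) * sum(f * log10(x)) ); freqs default to 1 (raw data)."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive observations")
    total = sum(freqs)
    mean_log = sum(f * math.log10(x) for f, x in zip(freqs, values)) / total
    return 10 ** mean_log           # taking the antilog


gm_pair = geometric_mean([4, 16])
print(round(gm_pair, 4))
```

For 4 and 16 this reproduces the square-root example, since (log 4 + log 16)/2 = log 8.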

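Formulas:
➢ Raw Data: $G.M. = \text{Antilog}\left\{\frac{1}{n} \sum_{i=1}^{n} \log X_i\right\}$
➢ Discrete Data: $G.M. = \text{Antilog}\left\{\frac{1}{N} \sum_{i=1}^{n} f_i \log X_i\right\}$, where $N = \sum f_i$
➢ Continuous Data: $G.M. = \text{Antilog}\left\{\frac{1}{N} \sum_{i=1}^{n} f_i \log m_i\right\}$, where $N = \sum f_i$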
5) Harmonic Mean:

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations.

If $X_1, X_2, X_3, \dots, X_n$ are n observations, the harmonic mean is given by
$H.M. = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{X_i}}$

Formulas:
➢ Raw Data: $H.M. = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{X_i}}$
➢ Discrete Data: $H.M. = \frac{1}{\frac{1}{N} \sum_{i=1}^{n} \frac{f_i}{X_i}}$
➢ Continuous Data: $H.M. = \frac{1}{\frac{1}{N} \sum_{i=1}^{n} \frac{f_i}{m_i}}$
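The three harmonic-mean formulas differ only in the weights, so a single Python sketch covers them. The function name `harmonic_mean` is hypothetical; the sample values are those of the raw-data harmonic mean problem later in these notes.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = N / sum(f / x); with freqs omitted every weight is 1 (raw data)."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when an observation is 0")
    total = sum(freqs)                                    # N
    reciprocal_sum = sum(f / x for f, x in zip(freqs, values))
    return total / reciprocal_sum


hm_raw = harmonic_mean([200, 300, 20, 12, 8, 0.8])
print(round(hm_raw, 2))
```

For continuous data, pass the class mid-values as `values` and the class frequencies as `freqs`.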

Measures of Dispersion
Definition:
Dispersion means "scatteredness". A measure of the scatter of the given data about the average is called a measure of dispersion.
Characteristics of a Good Measure of Dispersion

➢ It should be easy to understand.
➢ It should be based on all items.
➢ It should be readily comprehensible.
➢ Its procedure should be simple.
➢ It should be rigidly defined.
➢ It should be capable of further algebraic treatment.
➢ It should not be affected by extreme observations.
➢ It should not be affected by fluctuations of sampling.

Types of Measures
1) Range
2) Quartile Deviation (Q.D.)
3) Mean Deviation (M.D.)
4) Standard Deviation (S.D.)
The first two are known as "positional measures" and the remaining two as "calculated measures".
Formulas:

1) Range:
The range is the difference between the two extreme values. It is denoted by R.
➢ Raw Data: Range = R = Largest value - Smallest value = L - S
➢ Discrete Data: Range = R = Largest value - Smallest value = L - S
➢ Continuous Data: Range = R = L - S, where L is the upper limit of the highest class and S is the lower limit of the lowest class
Coefficient of Range
$\text{Coefficient of range} = \frac{L - S}{L + S}$

2) Quartile Deviation:
Quartile deviation is denoted by Q.D. If $Q_1$ is the first quartile and $Q_3$ is the third quartile, the quartile deviation is
$Q.D. = \frac{Q_3 - Q_1}{2}$
➢ Raw Data: $Q.D. = \frac{Q_3 - Q_1}{2}$
➢ Discrete Data: $Q.D. = \frac{Q_3 - Q_1}{2}$
➢ Continuous Data: $Q.D. = \frac{Q_3 - Q_1}{2}$
➢ Coefficient of Q.D.:
$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

3) Mean Deviation:
If $X_1, X_2, X_3, \dots, X_n$ are n observations and $d_i = X_i - \bar{X}$, where $\bar{X}$ is the mean, the mean deviation is denoted by M.D. and is given by
$M.D. = \frac{\sum_{i=1}^{n} |d_i|}{n}$
➢ Raw Data: $M.D. = \frac{\sum_{i=1}^{n} |d_i|}{n}$, where $d_i = X_i - \bar{X}$
➢ Discrete Data: $M.D. = \frac{\sum_{i=1}^{n} f_i |d_i|}{N}$, where $d_i = X_i - \bar{X}$
➢ Continuous Data: $M.D. = \frac{\sum_{i=1}^{n} f_i |d_i|}{N}$, where $d_i = m_i - \bar{X}$
Coefficient of Mean Deviation:
$\text{Coefficient of Mean Deviation} = \frac{\text{Mean Deviation}}{\text{Mean or Median or Mode}}$
(the divisor is the average from which the deviations were taken).
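4) Standard Deviation:
If $X_1, X_2, X_3, \dots, X_n$ are n observations and $d_i = X_i - A$ are their deviations from an assumed mean A, the standard deviation is denoted by S.D. (or $\sigma$) and is given by
$\sigma = S.D. = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n} - \left(\frac{\sum_{i=1}^{n} d_i}{n}\right)^2}$
If A is taken as the mean $\bar{X}$ itself, the second term is zero.

➢ Raw Data: $\sigma = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2}$, where $d_i = X_i - A$

➢ Discrete Data: $\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$, where $d_i = X_i - A$

➢ Continuous Data: $\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$, where $d_i = m_i - A$

Coefficient of Variation:
$C.V. = \frac{\sigma}{\bar{X}} \times 100$
where $\sigma$ = S.D. and $\bar{X}$ = mean.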
PROBLEMS ON MEASURES OF CENTRAL TENDENCY:
1) PROBLEMS ON ARITHMETIC MEAN:
a) Direct Method:
➢ Raw Data:
1) Find the average for the following data:
45, 22, 76, 82, 53, 79, 54, 69, 73, 67
Solution: $\bar{X} = \frac{\sum X}{n} = \frac{620}{10} = 62$

➢ Discrete Data:
1) Find the arithmetic mean for the following data:
X    10   20   30   40   50   60
f     5   15   25   20   10    5

Solution:
X        f        fX
10       5        50
20      15       300
30      25       750
40      20       800
50      10       500
60       5       300
Total   Σf = 80  ΣfX = 2700

$\bar{X} = \frac{\sum fX}{\sum f} = \frac{2700}{80} = 33.75$
b) Indirect Method or Deviation Method:

➢ Raw Data:
Problem 1: Calculate the average for the following data:
Family   A    B    C    D     E     F    G    H     I     J
Income   90   75   60   100   125   50   80   120   500   400

Solution: Take the assumed mean A = 100.
Family   Income (X)   d_i = X_i - A
A         90          -10
B         75          -25
C         60          -40
D        100 (A)        0
E        125           25
F         50          -50
G         80          -20
H        120           20
I        500          400
J        400          300
Total                 Σd_i = 600

$\bar{X} = A + \frac{\sum d_i}{n} = 100 + \frac{600}{10} = 100 + 60 = 160$
 Discrete Data:
Problem 1: Calculate the average for the following data:
X    10   20   30   40   50   60
f     5   15   25   20   10    5

Solution: Take the assumed mean A = 40.

X        f_i   d_i = X_i - A   f_i d_i
10        5    -30             -150
20       15    -20             -300
30       25    -10             -250
40 (A)   20      0                0
50       10     10              100
60        5     20              100
Total    Σf_i = 80             Σf_i d_i = -500

$\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i} = 40 + \frac{-500}{80} = 40 - 6.25 = 33.75$
 Continuous Data :
1) Find the arithmetic mean for the following data:
C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80   80-90
f      1      4      10      22      30      35      10       7       1

Solution: Take A = 55 (the mid-value of the class 50-60) and c = 10.
C.I      f_i   m_i      f_i m_i   d_i = m_i - A   f_i d_i   d̄_i = d_i / c   f_i d̄_i
0-10      1     5          5      -50              -50      -5               -5
10-20     4    15         60      -40             -160      -4              -16
20-30    10    25        250      -30             -300      -3              -30
30-40    22    35        770      -20             -440      -2              -44
40-50    30    45       1350      -10             -300      -1              -30
50-60    35    55 (A)   1925        0                0       0                0
60-70    10    65        650       10              100       1               10
70-80     7    75        525       20              140       2               14
80-90     1    85         85       30               30       3                3
Total   120             5620                      -980                      -98

Direct Method: $\bar{X} = \frac{\sum f_i m_i}{\sum f_i} = \frac{5620}{120} = 46.83$
Deviation Method: $\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i} = 55 + \frac{-980}{120} = 55 - 8.17 = 46.83$
Step-Deviation Method: $\bar{X} = A + \left(\frac{\sum f_i \bar{d}_i}{\sum f_i}\right) \times c = 55 + \left(\frac{-98}{120}\right) \times 10 = 55 - 8.17 = 46.83$
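The agreement of the three methods can be checked numerically. The Python sketch below is illustrative only: the function name `grouped_means` and its parameter names are assumptions, while the mid-values, frequencies, assumed mean A = 55 and class width c = 10 come from the problem above.

```python
def grouped_means(midpoints, freqs, assumed_mean, width):
    """Return the mean by the direct, deviation and step-deviation methods."""
    total = sum(freqs)                                        # N = sum of f
    direct = sum(f * m for f, m in zip(freqs, midpoints)) / total

    deviations = [m - assumed_mean for m in midpoints]        # d = m - A
    deviation = assumed_mean + sum(f * d for f, d in zip(freqs, deviations)) / total

    steps = [d / width for d in deviations]                   # step deviation d / c
    step = assumed_mean + width * sum(f * s for f, s in zip(freqs, steps)) / total
    return direct, deviation, step


midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85]
freqs = [1, 4, 10, 22, 30, 35, 10, 7, 1]
direct, deviation, step = grouped_means(midpoints, freqs, assumed_mean=55, width=10)
print(round(direct, 2), round(deviation, 2), round(step, 2))
```

The deviation and step-deviation methods only shift and rescale the data before averaging, so they must return the same value as the direct method for any choice of A and c.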

2) PROBLEMS ON MEDIAN:
➢ Raw Data:
Problem 1: Find the median for the following data and also calculate $Q_1$ and $Q_3$.
X   120   170   100   110   180   220   160

Solution: Arrange the given data in ascending order (n = 7):

100
110  → Q1
120
160  → Q2
170
180  → Q3
220

$Q_2$ or $M_d$ = value of the $\left(\frac{n+1}{2}\right)$th term = $\left(\frac{7+1}{2}\right)$th term = 4th term = 160 ⟹ $M_d = 160$
$Q_1$ = value of the $\left(\frac{n+1}{4}\right)$th term = $\left(\frac{7+1}{4}\right)$th term = 2nd term = 110 ⟹ $Q_1 = 110$
$Q_3$ = value of the $\left(\frac{3(n+1)}{4}\right)$th term = $\left(\frac{24}{4}\right)$th term = 6th term = 180 ⟹ $Q_3 = 180$

Problem 2: Obtain the median from the following data and also calculate $Q_1$ and $Q_3$.
X   391   384   591   407   672   522   777   753   2,488   1,490

Solution: Arrange the given data in ascending order (n = 10):
S.No.   X (ascending order)
1         384
2         391
3         407
4         522
5         591
6         672
7         753
8         777
9       1,490
10      2,488

Since n is even,
$M_d = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2}+1\right)\text{th term}}{2} = \frac{5\text{th term} + 6\text{th term}}{2} = \frac{591 + 672}{2} = 631.5$ ⟹ $Q_2 = 631.5$

$Q_1 = \frac{\left(\frac{n}{4}\right)\text{th term} + \left(\frac{n+1}{4}\right)\text{th term}}{2} = \frac{2.5\text{th term} + 2.75\text{th term}}{2}$
Taking these positions as the 2nd and 3rd terms,
$Q_1 = \frac{391 + 407}{2} = 399$ ⟹ $Q_1 = 399$

$Q_3 = \frac{\left(\frac{3n}{4}\right)\text{th term} + \left(\frac{3(n+1)}{4}\right)\text{th term}}{2} = \frac{7.5\text{th term} + 8.25\text{th term}}{2}$
Taking these positions as the 7th and 8th terms,
$Q_3 = \frac{753 + 777}{2} = 765$ ⟹ $Q_3 = 765$
 Discrete Data:
Problem 1: Find the median for the following data and also calculate $Q_1$ and $Q_3$.
X    10   20   30   40   50   60
f     5   15   25   20   10    5

Solution:

X          f     c.f.
10         5      5
20 → Q1   15     20
30 → Q2   25     45
40 → Q3   20     65
50        10     75
60         5     80

N = 80
$\frac{N}{4} = \frac{80}{4} = 20$ ⟹ $Q_1 = 20$
$\frac{N}{2} = \frac{80}{2} = 40$ ⟹ $M_d$ or $Q_2 = 30$
$\frac{3N}{4} = \frac{3(80)}{4} = 60$ ⟹ $Q_3 = 40$
 Continuous Data
Problem 1: Find the median for the following data and also calculate $Q_1$ and $Q_3$.
C.I   0-10   10-20   20-30   30-40   40-50   50-60
f      4      6      10      15       8       7

Solution:
C.I      f           c.f.
0-10      4           4
10-20     6          10 → m1
20-30    10 → f1     20 → m2
30-40    15 → f2     35 → m3
40-50     8 → f3     43
50-60     7          50

N = 50, $\frac{N}{4} = 12.5$, $\frac{N}{2} = 25$, $\frac{3N}{4} = 37.5$

$Q_1 = L_1 + \left(\frac{N/4 - m_1}{f_1}\right) \times c = 20 + \left(\frac{12.5 - 10}{10}\right) \times 10 = 20 + 2.5 = 22.5$

$Q_2 = L_2 + \left(\frac{N/2 - m_2}{f_2}\right) \times c = 30 + \left(\frac{25 - 20}{15}\right) \times 10 = 30 + 3.33 = 33.33$

$Q_3 = L_3 + \left(\frac{3N/4 - m_3}{f_3}\right) \times c = 40 + \left(\frac{37.5 - 35}{8}\right) \times 10 = 40 + 3.125 = 43.125$
3) PROBLEMS ON MODE:
➢ Raw Data:
Problem 1: Find the mode for the following data:
0, 6, 1, 7, 2, 3, 7, 6, 6, 2, 6, 6, 5, 6, 0
Solution:

X          Tally     f
0          II        2
1          I         1
2          II        2
3          I         1
5          I         1
6 → Mo     IIII I    6
7          II        2

∴ Mode = 6
 Discrete Data :
Problem 1: Find the mode for the following data:
Height (in inches)   57   59   61   62   63   64   65   66   67   69
f                     3    5    7   10   20   22   24    5    2    2

Solution:
Height (in inches)   f
57                    3
59                    5
61                    7
62                   10
63                   20
64                   22
65 → Mo              24
66                    5
67                    2
69                    2

The maximum frequency is 24, which corresponds to X = 65, so Mode = 65.

 Continuous Data:
Problem 1: Find the mode for the following data:

C.I   0-400   400-800   800-1200   1200-1600   1600-2000   2000-2400   2400-2800   2800-3200
f       4       12         40         41          27          13           9           4

Solution:
C.I              f
0-400             4
400-800          12
800-1200         40 → f0
L → 1200-1600    41 → f1
1600-2000        27 → f2
2000-2400        13
2400-2800         9
2800-3200         4

$M_o = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times C = 1200 + \left(\frac{41 - 40}{2(41) - 40 - 27}\right) \times 400 = 1200 + \frac{400}{15} = 1200 + 26.67 = 1226.67$
4) Problems on Geometric Mean:
➢ Raw Data:
Problem 1: Find the geometric mean for the following data:
X   2000   200   20   12   8

Solution:
X        log X_i
2000     3.3010
200      2.3010
20       1.3010
12       1.0792
8        0.9030
Total    Σ log X_i = 8.8852

$G.M. = \text{Antilog}\left[\frac{\sum \log X_i}{n}\right] = \text{Antilog}\left[\frac{8.8852}{5}\right] = \text{Antilog}[1.7770] = 59.8411$
 Discrete Data:
Problem 1: Find the geometric mean for the following data:
X    10   20   30   40   50   60
f    15   18   22   16   12    7

Solution:

X        f     log X_i    f (log X_i)
10       15    1.0000     15.0000
20       18    1.3010     23.4180
30       22    1.4771     32.4962
40       16    1.6020     25.6320
50       12    1.6989     20.3868
60        7    1.7781     12.4467
Total    N = Σf_i = 90    Σ f (log X_i) = 129.3797

$G.M. = \text{Antilog}\left[\frac{\sum f_i \log X_i}{N}\right] = \text{Antilog}\left[\frac{129.3797}{90}\right] = \text{Antilog}[1.4375] = 27.384$
 Continuous Data:
Problem 1: Find the geometric mean for the following data:
C.I   15-20   20-25   25-30   30-35   35-40   40-45
f       4      20      38      24      10       4

Solution:
C.I      f     m_i     log m_i    f (log m_i)
15-20     4    17.5    1.2430      4.9720
20-25    20    22.5    1.3521     27.0420
25-30    38    27.5    1.4390     54.6820
30-35    24    32.5    1.5118     36.2832
35-40    10    37.5    1.5740     15.7400
40-45     4    42.5    1.6283      6.5132
Total    Σf_i = 100               Σ f (log m_i) = 145.2324

$G.M. = \text{Antilog}\left[\frac{\sum f \log m_i}{N}\right] = \text{Antilog}\left[\frac{145.2324}{100}\right] = \text{Antilog}[1.4523] = 28.34$

5) Problems on Harmonic Mean:


 Raw Data:
Problem 1: Calculate the harmonic mean for the following data:
X   200   300   20   12   8   0.8

Solution:

X        1/X_i
200      0.005
300      0.003
20       0.05
12       0.0833
8        0.125
0.8      1.25
Total    Σ(1/X_i) = 1.5163

$H.M. = \frac{n}{\sum \left(\frac{1}{X_i}\right)} = \frac{6}{1.5163} = 3.96$

 Discrete Data:
Problem 1: Calculate the harmonic mean for the following data:
X    24   26   30   42   17   11
f     2    9    7   14   24    5

Solution:
X        f_i    f_i / X_i
24        2     0.083
26        9     0.346
30        7     0.233
42       14     0.333
17       24     1.411
11        5     0.454
Total    N = Σf_i = 61    Σ(f_i / X_i) = 2.86

$H.M. = \frac{N}{\sum \left(\frac{f_i}{X_i}\right)} = \frac{61}{2.86} = 21.33$

 Continuous Data:
Problem 1: Calculate the harmonic mean for the following data:
C.I   100-110   110-120   120-130   130-140   140-150
f       12        18        25        22        18

Solution:
C.I        f_i    m_i    f_i / m_i
100-110    12     105    0.1142
110-120    18     115    0.1565
120-130    25     125    0.2000
130-140    22     135    0.1629
140-150    18     145    0.1241
Total      Σf_i = 95     Σ(f_i / m_i) = 0.7577

$H.M. = \frac{N}{\sum \left(\frac{f_i}{m_i}\right)} = \frac{95}{0.7577} = 125.379$

 Problems on Measures of Dispersion:


1) Problems on Range
 Discrete Data:
Problem 1: Find the range for the following data:
X    12   13   14   15   16   17
f     6   14   10    7    5    3

Solution: Range = L - S = 17 - 12 = 5

$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{17 - 12}{17 + 12} = \frac{5}{29} = 0.1724$

➢ Continuous Data:
Problem 1: Find the range for the following data:
C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70
f      5      8      12      20      15       7       3

Solution: Range = L - S = 70 - 0 = 70

$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{70 - 0}{70 + 0} = \frac{70}{70} = 1$
2) Problems on Quartile Deviation:
➢ Raw Data:
Problem 1: Find the quartile deviation for the following data:
S.No.   1    2    3    4    5    6    7
Marks   25   35   45   17   35   20   55

Solution:
S.No.   Marks (X_i)   Ascending order
1       25            17
2       35            20 → Q1
3       45            25
4       17            35
5       35            35
6       20            45 → Q3
7       55            55

$Q_1$ = value of the $\left(\frac{n+1}{4}\right)$th term = $\left(\frac{7+1}{4}\right)$th term = 2nd term = 20
$Q_3$ = value of the $\left(\frac{3(n+1)}{4}\right)$th term = $\left(\frac{24}{4}\right)$th term = 6th term = 45
$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{45 - 20}{2} = 12.5$
$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45 - 20}{45 + 20} = \frac{25}{65} = 0.3846$
 Discrete Data:
Problem 1: Find the quartile deviation for the following data:
X    30   20   40   50   10   60
f    15    7    8    7    4    2

Solution: Arrange the values of X in ascending order and form the cumulative frequencies.
X (ascending order)   f     Cumulative frequency (c.f.)
10                     4     4
20 → Q1                7    11
30                    15    26
40 → Q3                8    34
50                     7    41
60                     2    43

N = 43
$\frac{N}{4} = \frac{43}{4} = 10.75 \approx 11$ ⟹ $Q_1 = 20$
$\frac{3N}{4} = \frac{3(43)}{4} = 32.25 \approx 32$ ⟹ $Q_3 = 40$
$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$
$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = \frac{20}{60} = 0.3333$

 Continuous Data:
Problem 1: Find the quartile deviation for the following data:
C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70
f      4      8      10      16      11       7       3

Solution:
C.I             f           Cumulative frequency (c.f.)
0-10             4           4
10-20            8          12 → m1
L1 → 20-30      10 → f1     22
30-40           16          38 → m3
L3 → 40-50      11 → f3     49
50-60            7          56
60-70            3          59

N = 59, $\frac{N}{4} = 14.75$, $\frac{3N}{4} = 44.25$

$Q_1 = L_1 + \left(\frac{N/4 - m_1}{f_1}\right) \times C = 20 + \left(\frac{14.75 - 12}{10}\right) \times 10 = 20 + 2.75 = 22.75$
$Q_3 = L_3 + \left(\frac{3N/4 - m_3}{f_3}\right) \times C = 40 + \left(\frac{44.25 - 38}{11}\right) \times 10 = 40 + 5.68 = 45.68$
$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{45.68 - 22.75}{2} = 11.465$
$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45.68 - 22.75}{45.68 + 22.75} = \frac{22.93}{68.43} = 0.3351$
3) Problems on Mean Deviation:

➢ Raw Data:
Problem 1: Find the mean deviation for the following data:
X   7   4   10   9   15   12   7   9   7

Solution:
X        Ascending order (X_i)   |d_i| = |X_i - X̄|
7         4                      4.9
4         7                      1.9
10        7                      1.9
9         7                      1.9
15        9                      0.1
12        9                      0.1
7        10                      1.1
9        12                      3.1
7        15                      6.1
Total    ΣX_i = 80               Σ|d_i| = 21.1

$\bar{X} = \frac{\sum X_i}{n} = \frac{80}{9} = 8.9$
$M.D. = \frac{\sum |d_i|}{n} = \frac{21.1}{9} = 2.344$
$\text{Coefficient of M.D.} = \frac{M.D.}{\text{Mean}} = \frac{2.34}{8.9} = 0.26$

 Discrete Data:
Problem 1: Find the mean deviation for the following data:
X    10   15   20   30   40   50
f     8   12   15   10    3    2

Solution:

X        f     X_i f_i   |d_i| = |X_i - X̄|   f_i |d_i|
10        8     80       11.6                92.8
15       12    180        6.6                79.2
20       15    300        1.6                24.0
30       10    300        8.4                84.0
40        3    120       18.4                55.2
50        2    100       28.4                56.8
Total    N = 50   ΣX_i f_i = 1080            Σf_i |d_i| = 392

$\bar{X} = \frac{\sum f_i X_i}{N} = \frac{1080}{50} = 21.6$
$M.D. = \frac{\sum f_i |d_i|}{N} = \frac{392}{50} = 7.84$
$\text{Coefficient of M.D.} = \frac{M.D.}{\text{Mean}} = \frac{7.84}{21.6} = 0.3630$
 Continuous Data:
Problem 1: Find the mean deviation for the following data:
C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
f      5      8       7      12      28      20      10      10

Solution:
C.I      f     m_i   f_i m_i   |d_i| = |m_i - X̄|   f_i |d_i|
0-10      5     5      25      40                  200
10-20     8    15     120      30                  240
20-30     7    25     175      20                  140
30-40    12    35     420      10                  120
40-50    28    45    1260       0                    0
50-60    20    55    1100      10                  200
60-70    10    65     650      20                  200
70-80    10    75     750      30                  300
Total    N = 100   Σf_i m_i = 4500                 Σf_i |d_i| = 1400

$\bar{X} = \frac{\sum f_i m_i}{N} = \frac{4500}{100} = 45$
$M.D. = \frac{\sum f_i |d_i|}{N} = \frac{1400}{100} = 14$
$\text{Coefficient of M.D.} = \frac{M.D.}{\text{Mean}} = \frac{14}{45} = 0.3111$
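The grouped mean-deviation calculation can be sketched in Python as follows. The function name `mean_deviation` is hypothetical; the mid-values and frequencies are those of the continuous-data problem above, and deviations are taken from the mean, as in these notes.

```python
def mean_deviation(values, freqs):
    """Return (mean, M.D. about the mean, coefficient of M.D.) for frequency data."""
    total = sum(freqs)                                             # N
    mean = sum(f * x for f, x in zip(freqs, values)) / total
    md = sum(f * abs(x - mean) for f, x in zip(freqs, values)) / total
    return mean, md, md / mean


midpoints = [5, 15, 25, 35, 45, 55, 65, 75]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]
mean, md, coefficient = mean_deviation(midpoints, freqs)
print(mean, md, round(coefficient, 4))
```

For raw data the same function can be used with every frequency set to 1.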
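4) Problems on Standard Deviation:
➢ Raw Data:
Problem 1: Find the standard deviation for the following data:
X   8   10   12   14   16   18   20   22   24   26

Solution: Take the assumed mean A = 16.
X          d_i = X_i - A   d_i²
8          -8               64
10         -6               36
12         -4               16
14         -2                4
16 → A      0                0
18          2                4
20          4               16
22          6               36
24          8               64
26         10              100
Total      Σd_i = 10       Σd_i² = 340

$\bar{d} = \frac{\sum d_i}{n} = \frac{10}{10} = 1$, so $\bar{X} = A + \bar{d} = 16 + 1 = 17$
$\sigma = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2} = \sqrt{\frac{340}{10} - 1^2} = \sqrt{34 - 1} = \sqrt{33} = 5.74$
$C.V. = \frac{\sigma}{\bar{X}} \times 100 = \frac{5.74}{17} \times 100 \approx 33.8$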

 Discrete Data:
Problem 1: Find the standard deviation for the following data:
X    5   15   25   35   45   55   65
f    3   10   20   30   15   12   10

Solution: Take the assumed mean A = 35.
X          f     d_i = X_i - A   f_i d_i   f_i d_i²
5           3    -30              -90      2700
15         10    -20             -200      4000
25         20    -10             -200      2000
35 → A     30      0                0         0
45         15     10              150      1500
55         12     20              240      4800
65         10     30              300      9000
Total      Σf = 100              Σf_i d_i = 200   Σf_i d_i² = 24,000

$\bar{d} = \frac{\sum f_i d_i}{N} = \frac{200}{100} = 2$, so $\bar{X} = A + \bar{d} = 35 + 2 = 37$
$\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2} = \sqrt{\frac{24000}{100} - 2^2} = \sqrt{240 - 4} = \sqrt{236} = 15.36$
$C.V. = \frac{\sigma}{\bar{X}} \times 100 = \frac{15.36}{37} \times 100 = 41.51$
 Continuous Data:
Problem 1: Find the standard deviation for the following data:
C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
f      5      8       7      12      28      20      10      10

Solution: Take the assumed mean A = 45.
C.I      f     m_i       d_i = m_i - A   f_i d_i   f_i d_i²
0-10      5     5        -40             -200      8000
10-20     8    15        -30             -240      7200
20-30     7    25        -20             -140      2800
30-40    12    35        -10             -120      1200
40-50    28    45 → A      0                0         0
50-60    20    55         10              200      2000
60-70    10    65         20              200      4000
70-80    10    75         30              300      9000
Total    Σf = 100                        Σf_i d_i = 0   Σf_i d_i² = 34,200

$\bar{X} = A + \frac{\sum f_i d_i}{N} = 45 + \frac{0}{100} = 45$
$\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2} = \sqrt{\frac{34200}{100} - 0^2} = \sqrt{342} = 18.49$
$C.V. = \frac{\sigma}{\bar{X}} \times 100 = \frac{18.49}{45} \times 100 \approx 41.1$
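The assumed-mean formula for the standard deviation can be sketched in Python as below. The function name `grouped_sd` is hypothetical; the mid-values and frequencies are those of the continuous-data problem above. The second term of the formula is the square of the mean deviation from A, so the mean of the data never has to be squared and the result does not depend on the choice of A.

```python
import math


def grouped_sd(values, freqs, assumed_mean):
    """Return (mean, S.D., C.V.) using deviations d = x - A from an assumed mean A."""
    total = sum(freqs)                                              # N
    deviations = [x - assumed_mean for x in values]
    mean_d = sum(f * d for f, d in zip(freqs, deviations)) / total        # sum(f d) / N
    mean_d2 = sum(f * d * d for f, d in zip(freqs, deviations)) / total   # sum(f d^2) / N
    sd = math.sqrt(mean_d2 - mean_d ** 2)
    mean = assumed_mean + mean_d
    return mean, sd, sd / mean * 100


midpoints = [5, 15, 25, 35, 45, 55, 65, 75]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]
mean, sd, cv = grouped_sd(midpoints, freqs, assumed_mean=45)
print(mean, round(sd, 2), round(cv, 1))
```

Calling the function again with a different assumed mean, for example 35, returns the same mean and standard deviation.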
UNIT -I
CORRELATION
• Uni-variate Distribution
• Bi-variate Distribution
• Multi-variate Distribution
1. Uni-variate Distribution: A distribution involving only one variable is called a "uni-variate distribution".
Example: The heights of a certain group of persons.
2. Bi-variate Distribution: A distribution involving exactly two variables is called a "bi-variate distribution".
Example: The heights and weights of a certain group of persons.
3. Multi-variate Distribution: A distribution involving more than two variables is called a "multi-variate distribution".
 Correlation:
 Definition 1 If the change in one variable effects a change in the other variable, then
Variables are said to be “correlated variables”.
 Definition 2 Correlation is an analysis of the ‘co-variation’ between 2 or more variables.
 Types of Correlation:
 Positive Correlation (or) Direct Correlation
 Negative Correlation (or) Inverse Correlation
 Perfect Correlation
1) Positive Correlation:
❖ Definition 1: If the variables deviate in the same direction, they are said to be "positively correlated".
❖ Definition 2: In other words, if an increase in the value of one variable is accompanied by an increase in the value of the other, or a decrease in one is accompanied by a decrease in the other, the variables are said to be "directly correlated variables".
Examples: 1) Price and supply of goods. 2) Income and expenditure of a group of persons.
2) Negative Correlation:
❖ Definition 1: If the variables deviate in opposite directions, they are said to be "negatively correlated".
❖ Definition 2: In other words, if an increase in the value of one variable is accompanied by a decrease in the value of the other, or a decrease in one is accompanied by an increase in the other, the variables are said to be "inversely correlated variables".
Examples: 1) Volume and pressure of a perfect gas. 2) Price and demand of goods.
3) Perfect Correlation:
❖ Definition: If a deviation in one variable is followed by a corresponding and proportional deviation in the other variable, the variables are said to be "perfectly correlated variables".
 Linear Correlation:

❖ Definition: If the ratio of change between the two variables is uniform, there is "linear correlation" between them. If we plot such data on a graph, we get a straight line.
Example: In the data below the ratio of change between the variables is the same throughout: every increase of 5 in A is accompanied by an increase of 6 in B.
A   2   7   12   17
B   3   9   15   21
• Non-Linear Correlation:
❖ Definition: If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is called "non-linear correlation". Non-linear correlation is also called "curvilinear correlation".
 Uses (or) Applications of Correlation:
1) Correlation measures the extent of the relationship between 2 variables.
2) The correlation coefficient helps in predicting future values of one variable from the other.
3) The correlation coefficient contributes to the understanding of economic behaviour.
4) Using the correlation coefficient, we can estimate the value of one variable when the value of the other variable is given.
 Perfect Linear Correlation:
Definition: If all the points lie exactly on a straight line, the correlation is said to be "perfect linear correlation".
• Perfect Positive Correlation:
Definition: If the correlation is linear and the line runs from the lower left-hand corner to the upper right-hand corner, the correlation is called "perfect positive correlation". It is denoted by r = +1.
• Perfect Negative Correlation:
Definition: If the correlation is linear and the line runs from the upper left-hand corner to the lower right-hand corner, the correlation is called "perfect negative correlation". It is denoted by r = -1.
• No Correlation:
If the plotted points lie scattered all over the graph paper, there is no correlation between the 2 variables, and r = 0.
If r = 0, the variables X and Y are said to be "uncorrelated". Independent variables are always uncorrelated, but uncorrelated variables need not be independent.
[Figure: scatter diagrams illustrating perfect positive correlation, perfect negative correlation, and no correlation]
 Methods of Studying Correlation:
There are 2 different methods for finding the relationship between the variables.
1) Graphical Method 2) Mathematical Method
1) Graphical Method:
a) Scatter diagram (also called a scattergram)
2) Mathematical Method:
a) Karl Pearson's Correlation Coefficient.
b) Spearman's Rank Correlation.
c) Coefficient of Concurrent Deviation.
d) Method of Least Squares.
 Mathematical Method:
a) Karl Pearson's Correlation Coefficient:
As a measure of the "intensity" or "degree" of linear relationship between 2 variables, Karl Pearson, a British biometrician, developed a formula called the "correlation coefficient".
The correlation coefficient between 2 variables X and Y is usually denoted by r(X, Y) or $r_{XY}$ and is given by
$r(X, Y) = r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)} \cdot \sqrt{V(Y)}}$ ... (1), where
$\mathrm{Cov}(X, Y) = E\{(X - E(X))(Y - E(Y))\} = E\{(X - \bar{X})(Y - \bar{Y})\} = E(XY) - \bar{X}\bar{Y} = \frac{1}{n}\sum XY - \bar{X}\bar{Y}$
$V(X) = E\{(X - E(X))^2\} = E(X^2) - [E(X)]^2 = \frac{1}{n}\sum X^2 - \bar{X}^2$
$V(Y) = E\{(Y - E(Y))^2\} = E(Y^2) - [E(Y)]^2 = \frac{1}{n}\sum Y^2 - \bar{Y}^2$
Hence
$r(X, Y) = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} \cdot \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}}$
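The computational form of Karl Pearson's formula can be sketched in Python as below. The function name `pearson_r` is hypothetical; the sample heights of fathers (X) and sons (Y) are taken from the correlation problems later in these notes.

```python
import math


def pearson_r(xs, ys):
    """r = ((1/n) sum(xy) - mean_x * mean_y) / (sqrt(V(x)) * sqrt(V(y)))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)


fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(fathers, sons)
print(round(r, 4))
```

For these data Cov(X, Y) = 3, V(X) = 4.5 and V(Y) = 5.5, so the function evaluates 3 / √24.75.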
 Properties of Correlation Coefficient:
1) The correlation coefficient lies between -1 and +1, i.e. -1 ≤ r(X, Y) ≤ +1.
2) The correlation coefficient is independent of change of origin and scale.
3) Two independent variables are uncorrelated. The converse need not be true.
 Regression:
Definition: "Regression analysis" is a mathematical measure of the average relationship between 2 or more variables in terms of the original units of the data.
In regression analysis there are 2 types of variables: the dependent variable and the independent variable.
The variable whose value is "influenced" or is to be "predicted" is called the "dependent variable".
The variable which "influences" or is used for "prediction" is called the "independent variable".
• Lines of Regression:
The line of regression is the line which gives the best estimate of the value of one variable for any specific value of the other variable. Thus the line of regression is the line of "best fit", which is obtained by using the "principle of least squares".
• Linear Regression:
If the points in the scatter diagram lie along a straight line, the regression is called "linear regression".
• Non-Linear Regression:
If the points in the scatter diagram lie along a curve, the regression is called "non-linear regression" or "curvilinear regression".
• Curve of Regression:
If the variables in a bi-variate distribution are related, the points in the scatter diagram cluster round some curve, which is called the "curve of regression".
Let $(x_i, y_i)$, i = 1, 2, ..., n, be a bi-variate distribution, where X = independent variable and Y = dependent variable. Let the line of regression of Y on X be
$Y = a + bX$ ... (1)
According to the principle of least squares, the normal equations for estimating a and b are
$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i$ ... (2)
$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$ ... (3)
 Regression Equations:
1) Regression Equation Y on X 2) Regression Equation X on Y
• Regression Equation Y on X:
b is the slope of the line of regression of Y on X, and the line of regression passes through the point $(\bar{x}, \bar{y})$, so its equation is

$Y - \bar{y} = b_{yx}(X - \bar{x}) \Rightarrow Y - \bar{y} = r\left[\frac{\sigma_y}{\sigma_x}\right](X - \bar{x})$

where $b_{yx} = r\left[\frac{\sigma_y}{\sigma_x}\right]$ = the regression coefficient of Y on X, and r = correlation coefficient.

• Regression Equation X on Y:
The regression equation of X on Y is given by
$X - \bar{x} = b_{xy}(Y - \bar{y}) \Rightarrow X - \bar{x} = r\left[\frac{\sigma_x}{\sigma_y}\right](Y - \bar{y})$
where $b_{xy} = r\left[\frac{\sigma_x}{\sigma_y}\right]$ = the regression coefficient of X on Y, and r = correlation coefficient.
Regression Coefficients:
The slope of a regression line is called the "coefficient of regression". The coefficient of regression of Y on X indicates the change in the value of variable Y corresponding to a unit change in the value of variable X and is given by
$b_{yx} = r\left[\frac{\sigma_y}{\sigma_x}\right]$ = the regression coefficient of Y on X.
Similarly, the coefficient of regression of X on Y indicates the change in the value of variable X corresponding to a unit change in the value of variable Y and is given by
$b_{xy} = r\left[\frac{\sigma_x}{\sigma_y}\right]$ = the regression coefficient of X on Y.
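The two regression coefficients can be computed directly from summary totals. The Python sketch below is illustrative only: the function name `regression_from_sums` and its parameter names are assumptions. It uses the identities $b_{yx} = r\sigma_y/\sigma_x = \mathrm{Cov}(X,Y)/\sigma_x^2$ and $b_{xy} = \mathrm{Cov}(X,Y)/\sigma_y^2$, which follow from substituting $r = \mathrm{Cov}(X,Y)/(\sigma_x\sigma_y)$. The totals are those of the second regression problem in these notes (ΣX = 250, ΣY = 300, ΣXY = 7900, ΣX² = 6500, ΣY² = 10,000, n = 10).

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (mean_x, mean_y, b_yx, b_xy, r) from the summary totals."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    cov = sum_xy / n - mean_x * mean_y        # Cov(X, Y)
    var_x = sum_x2 / n - mean_x ** 2          # sigma_x squared
    var_y = sum_y2 / n - mean_y ** 2          # sigma_y squared
    b_yx = cov / var_x                        # slope of the line of Y on X
    b_xy = cov / var_y                        # slope of the line of X on Y
    r = cov / math.sqrt(var_x * var_y)
    return mean_x, mean_y, b_yx, b_xy, r


mean_x, mean_y, b_yx, b_xy, r = regression_from_sums(10, 250, 300, 7900, 6500, 10000)
print(mean_x, mean_y, b_yx, b_xy, r)
# Line of Y on X:  Y - mean_y = b_yx * (X - mean_x)
# Line of X on Y:  X - mean_x = b_xy * (Y - mean_y)
```

The product of the two slopes equals r², which gives a quick consistency check on hand calculations.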
 Properties of Regression Coefficient:
1) The geometric mean (G.M.) of the regression coefficients is equal to the correlation coefficient:
$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$, where r takes the common sign of the two regression coefficients.
2) If one of the regression coefficients is greater than unity, the other must be less than unity, i.e. $b_{yx} > 1 \Rightarrow b_{xy} < 1$.
3) The arithmetic mean (A.M.) of the regression coefficients is greater than or equal to the correlation coefficient: $\frac{1}{2}[b_{xy} + b_{yx}] \geq r$.
4) The regression coefficients are independent of change of origin but not of scale.
5) The angle between the 2 regression lines is
$\theta = \tan^{-1}\left[\frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}\right]$
PROBLEMS ON CORRELATION COEFFICIENT:
Problem 1: Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y).
X   65   66   67   67   68   69   70   72
Y   67   68   65   68   72   72   69   71
Solution:
X     Y     X²     Y²     XY
65    67    4225   4489   4355
66    68    4356   4624   4488
67    65    4489   4225   4355
67    68    4489   4624   4556
68    72    4624   5184   4896
69    72    4761   5184   4968
70    69    4900   4761   4830
72    71    5184   5041   5112
ΣX = 544   ΣY = 552   ΣX² = 37028   ΣY² = 38132   ΣXY = 37560
From the above table we have
ΣX = 544, ΣY = 552, ΣX² = 37028, ΣY² = 38132, ΣXY = 37560
$\bar{X} = \frac{\sum X}{n} = \frac{544}{8} = 68$, $\bar{Y} = \frac{\sum Y}{n} = \frac{552}{8} = 69$
The correlation coefficient is given by
$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)} \cdot \sqrt{V(Y)}} = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} \cdot \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}} = \frac{\frac{37560}{8} - (68)(69)}{\sqrt{\frac{37028}{8} - 68^2} \cdot \sqrt{\frac{38132}{8} - 69^2}}$
$= \frac{4695 - 4692}{\sqrt{4628.5 - 4624} \cdot \sqrt{4766.5 - 4761}} = \frac{3}{\sqrt{(4.5)(5.5)}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$
Problem 2:
Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y), using deviations from assumed means.
X   65   66   67   67   68   69   70   72
Y   67   68   65   68   72   72   69   71
Solution:
X     Y     U = X - 68   V = Y - 69   U²    V²    UV
65    67    -3           -2            9     4     6
66    68    -2           -1            4     1     2
67    65    -1           -4            1    16     4
67    68    -1           -1            1     1     1
68    72     0            3            0     9     0
69    72     1            3            1     9     3
70    69     2            0            4     0     0
72    71     4            2           16     4     8
ΣX = 544   ΣY = 552   ΣU = 0   ΣV = 0   ΣU² = 36   ΣV² = 44   ΣUV = 24

The correlation coefficient is
$r(U, V) = \frac{\mathrm{Cov}(U, V)}{\sigma_U \cdot \sigma_V}$ ... (1)
$\bar{U} = \frac{\sum U}{n} = \frac{0}{8} = 0$, $\bar{V} = \frac{\sum V}{n} = \frac{0}{8} = 0$
$\mathrm{Cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\bar{V} = \frac{24}{8} - (0)(0) = 3$
$\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = \frac{36}{8} - 0 = 4.5$
$\sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = \frac{44}{8} - 0 = 5.5$
$r(U, V) = \frac{3}{\sqrt{4.5} \cdot \sqrt{5.5}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$
Since the correlation coefficient is independent of change of origin, r(X, Y) = r(U, V) = 0.6030.
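That the correlation coefficient is unaffected by a change of origin or scale can be checked numerically. In the Python sketch below the function name `pearson_r` is hypothetical, and the conversion of X from inches to centimetres (factor 2.54) is an illustrative assumption introduced only to demonstrate a change of scale; the shifts by 68 and 69 are the ones used in the solution above.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


x = [65, 66, 67, 67, 68, 69, 70, 72]
y = [67, 68, 65, 68, 72, 72, 69, 71]
u = [xi - 68 for xi in x]            # change of origin for X
v = [yi - 69 for yi in y]            # change of origin for Y
x_cm = [2.54 * xi for xi in x]       # change of scale (illustrative)

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
r_scaled = pearson_r(x_cm, y)
print(round(r_xy, 4), round(r_uv, 4), round(r_scaled, 4))
```

All three values agree, which is why the smaller numbers U and V can safely replace X and Y in hand calculation.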

PROBLEMS ON REGRESSION LINES

Problem 1: Price indices of cotton and wool are given below for the 12 months of a year. Obtain the equations of the lines of regression between the indices.
Price index of cotton (X)   78   77   85   88   87   82   81   77   76   83   97   93
Price index of wool (Y)     84   82   82   85   89   90   88   92   83   89   98   99
Solution:
X     Y     U = X - 84   V = Y - 88   U²     V²     UV
78    84    -6           -4            36     16     24
77    82    -7           -6            49     36     42
85    82    +1           -6             1     36     -6
88    85    +4           -3            16      9    -12
87    89    +3           +1             9      1      3
82    90    -2           +2             4      4     -4
81    88    -3            0             9      0      0
77    92    -7           +4            49     16    -28
76    83    -8           -5            64     25     40
83    89    -1           +1             1      1     -1
97    98    +13          +10          169    100    130
93    99    +9           +11           81    121     99
ΣX = 1004   ΣY = 1061   ΣU = -4   ΣV = +5   ΣU² = 488   ΣV² = 365   ΣUV = 287

$\bar{X} = \frac{\sum X}{n} = \frac{1004}{12} = 83.67$, $\bar{Y} = \frac{\sum Y}{n} = \frac{1061}{12} = 88.42$
$\bar{U} = \frac{\sum U}{n} = \frac{-4}{12} = -0.33$, $\bar{V} = \frac{\sum V}{n} = \frac{5}{12} = 0.42$
$r(U, V) = \frac{\mathrm{Cov}(U, V)}{\sigma_U \cdot \sigma_V}$ ... (1)
$\mathrm{Cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\bar{V} = \frac{287}{12} - (-0.33)(0.42) = 23.92 + 0.14 = 24.06$
$\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = \frac{488}{12} - (-0.33)^2 = 40.67 - 0.11 = 40.56$
$\sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = \frac{365}{12} - (0.42)^2 = 30.42 - 0.18 = 30.24$
$r(U, V) = \frac{24.06}{\sqrt{40.56} \cdot \sqrt{30.24}} = \frac{24.06}{\sqrt{1226.53}} = \frac{24.06}{35.02} = 0.69$
Since r, $\sigma_x$ and $\sigma_y$ are unaffected by a change of origin, r(X, Y) = 0.69, $\sigma_x = \sqrt{40.56} = 6.37$ and $\sigma_y = \sqrt{30.24} = 5.50$.
The regression equation of Y on X is
$Y - \bar{y} = r\left[\frac{\sigma_y}{\sigma_x}\right](X - \bar{x}) \Rightarrow (Y - 88.42) = 0.69\left(\frac{5.50}{6.37}\right)(X - 83.67) \Rightarrow (Y - 88.42) = 0.69(0.86)(X - 83.67)$
$\Rightarrow (Y - 88.42) = 0.59(X - 83.67)$

The regression equation of X on Y is
$X - \bar{x} = r\left[\frac{\sigma_x}{\sigma_y}\right](Y - \bar{y}) \Rightarrow (X - 83.67) = 0.69\left(\frac{6.37}{5.50}\right)(Y - 88.42) \Rightarrow (X - 83.67) = 0.69(1.16)(Y - 88.42)$
$\Rightarrow (X - 83.67) = 0.80(Y - 88.42)$

Problem 2:
Using the following data, find the 2 lines of regression and compute Karl Pearson's coefficient of correlation.
ΣX = 250   ΣY = 300   ΣXY = 7900   ΣX² = 6500   ΣY² = 10,000   n = 10
Solution:
$\bar{X} = \frac{\sum X}{n} = \frac{250}{10} = 25$, $\bar{Y} = \frac{\sum Y}{n} = \frac{300}{10} = 30$
Karl Pearson's correlation coefficient is given by

$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)} \cdot \sqrt{V(Y)}} = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} \cdot \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}} = \frac{\frac{7900}{10} - (25)(30)}{\sqrt{\frac{6500}{10} - 25^2} \cdot \sqrt{\frac{10000}{10} - 30^2}}$
$= \frac{790 - 750}{\sqrt{650 - 625} \cdot \sqrt{1000 - 900}} = \frac{40}{\sqrt{(25)(100)}} = \frac{40}{\sqrt{2500}} = \frac{40}{50} = 0.8$
Here $\sigma_x = \sqrt{25} = 5$ and $\sigma_y = \sqrt{100} = 10$.

REGRESSION EQUATION Y ON X:
The regression equation of Y on X is given by
$Y - \bar{y} = r\left[\frac{\sigma_y}{\sigma_x}\right](X - \bar{x}) \Rightarrow (Y - 30) = 0.8\left(\frac{10}{5}\right)(X - 25)$
$\Rightarrow (Y - 30) = 0.8(2)(X - 25) \Rightarrow (Y - 30) = 1.6(X - 25)$
REGRESSION EQUATION X ON Y:
The regression equation of X on Y is given by
$X - \bar{x} = r\left[\frac{\sigma_x}{\sigma_y}\right](Y - \bar{y}) \Rightarrow (X - 25) = 0.8\left(\frac{5}{10}\right)(Y - 30)$
$\Rightarrow (X - 25) = 0.8\left(\frac{1}{2}\right)(Y - 30) \Rightarrow (X - 25) = 0.4(Y - 30)$