
Dispersion

Averages are more or less typical values of a variable, designed to represent the whole distribution as far as
possible. However, it is important to know whether the individual values cluster closely around the average
or are widely scattered. In other words, we need to know to what extent the average is typical of the whole
set. This information is provided by a measure of dispersion. Dispersion is an important characteristic of a
frequency distribution, i.e., it tells us how compactly the individual values are distributed around the
average. This characteristic of a distribution is variously referred to as dispersion, scatter, variation,
variability or spread. A measure of dispersion serves two purposes. First, it is one of the most important
quantities used to characterize a frequency distribution. Second, it affords a basis of comparison between
two or more frequency distributions. For example, two sets of data may have the same mean and yet differ
markedly in their variability or dispersion. In that case the two sets of data are quite different in nature,
but the measure of location fails to bring out this difference.

There are four important measures of dispersion:


(i) Range
(ii) Quartile Deviation
(iii) Mean Deviation
(iv) Standard Deviation

(i) Range
The range of a set of values is the difference between the highest and the lowest values in the set. Thus if
$x_S$ and $x_L$ denote the smallest and largest values respectively in a set, the range $R$ is given by

\[ R = x_L - x_S \tag{1} \]

Example: Find the range of the series 3, 4, -2, 10, 7, 0, 4.

Here $x_S = -2$ and $x_L = 10$, so $R = 10 - (-2) = 10 + 2 = 12$.
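
The example can be checked with a few lines of Python. This is only a minimal sketch; the function name `value_range` is an illustrative choice and does not come from the text.

```python
def value_range(values):
    """Return the range R = largest value - smallest value."""
    return max(values) - min(values)


series = [3, 4, -2, 10, 7, 0, 4]   # the series from the example
print(value_range(series))          # 12
```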


For grouped data, the range is taken either as the difference between the highest and lowest mid-values or as
the difference between the upper boundary of the last class and the lower boundary of the first class.

(ii) Quartile Deviation (Q.D.)

If $Q_1$ and $Q_3$ denote the first and third quartiles respectively of a frequency distribution, the quartile
deviation of the distribution is defined by

\[ \text{Q.D.} = \frac{Q_3 - Q_1}{2} \tag{2} \]

The quartile deviation is also called the semi-interquartile range.


(iii) Mean Deviation (M.D.)
The mean deviation (M.D.) of a set of quantities $x_1, x_2, \ldots, x_n$ is defined by

\[ \text{M.D.} = \frac{1}{n}\sum |x_i - \bar{x}| \tag{3} \]

where $|x_i - \bar{x}|$ denotes the numerical value of $x_i - \bar{x}$, taken with positive sign. The mean
deviation of a frequency distribution is defined by

\[ \text{M.D.} = \frac{1}{n}\sum f_i |x_i - \bar{x}|, \qquad n = \sum f_i \tag{4} \]

Occasionally the mean deviation is defined in terms of absolute deviations from the median rather than
the mean. Mean deviation is sometimes called mean absolute deviation.

Example: Calculate (i) the quartile deviation (Q.D.) and (ii) the mean deviation (M.D.) from the mean for the
following data:

Marks:             0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of students:      6       5       8      15       7       6       3

Solution:
Table 1: Calculation for Q.D. and M.D. from Mean

Here $d = (x - 35)/10$ and $|x - \bar{x}| = |x - 33.4|$.

Marks    Mid-value   No. of students     d      fd    |x - 33.4|   f|x - 33.4|   Cumulative frequency
            (x)            (f)                                                          (c.f.)
0-10         5              6           -3     -18       28.4         170.4                6
10-20       15              5           -2     -10       18.4          92.0               11
20-30       25              8           -1      -8        8.4          67.2               19
30-40       35             15            0       0        1.6          24.0               34
40-50       45              7            1       7       11.6          81.2               41
50-60       55              6            2      12       21.6         129.6               47
60-70       65              3            3       9       31.6          94.8               50
Total                      50                   -8                    659.2

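(i) Here $N = 50$. The first quartile is located at rank 12.75 and the third quartile at rank 37.25.

The c.f. just greater than 12.75 is 19. Hence the corresponding class 20-30 contains $Q_1$. Then

\[ Q_1 = 20 + \frac{10}{8}(12.75 - 11) = 22.19 \]

The c.f. just greater than 37.25 is 41. Hence the corresponding class 40-50 contains $Q_3$. Then

\[ Q_3 = 40 + \frac{10}{7}(37.25 - 34) = 44.64 \]

Hence $\text{Q.D.} = \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(44.64 - 22.19) = 11.23$.

(ii) Mean $\bar{x} = 35 + 10 \times \dfrac{\sum fd}{N} = 35 + 10 \times \dfrac{-8}{50} = 33.4$ marks. Then

\[ \text{M.D. (from mean)} = \frac{1}{N}\sum f|x - \bar{x}| = \frac{659.2}{50} = 13.184. \]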
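
The worked example can be reproduced with a short script. This is a sketch under stated assumptions: the helper `quantile_by_rank` is a hypothetical name, linear interpolation within the quartile class is assumed, and the ranks 12.75 and 37.25 are simply those used in the solution above. The frequency of the last class is taken as 3, which is what the total of 50 requires.

```python
# Class lower boundaries, common width and frequencies from the marks example.
lower = [0, 10, 20, 30, 40, 50, 60]
width = 10
freq = [6, 5, 8, 15, 7, 6, 3]

n = sum(freq)
mid = [lo + width / 2 for lo in lower]

# Mean and mean deviation about the mean of the grouped data.
mean = sum(f * x for f, x in zip(freq, mid)) / n
mean_dev = sum(f * abs(x - mean) for f, x in zip(freq, mid)) / n


def quantile_by_rank(rank):
    """Interpolate linearly inside the class whose c.f. first exceeds rank."""
    cum = 0
    for lo, f in zip(lower, freq):
        if cum + f > rank:
            return lo + width * (rank - cum) / f
        cum += f
    raise ValueError("rank exceeds the total frequency")


q1 = quantile_by_rank(12.75)   # rank used in the worked solution
q3 = quantile_by_rank(37.25)   # rank used in the worked solution
quartile_dev = (q3 - q1) / 2

print(round(mean, 1), round(mean_dev, 3))
print(round(q1, 2), round(q3, 2), round(quartile_dev, 2))
```

Run as written, this gives a mean of 33.4, a mean deviation of 659.2/50 = 13.184, and quartiles 22.19 and 44.64 with Q.D. 11.23.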

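(iv) Standard Deviation

Let $x_1, x_2, \ldots, x_n$ denote $n$ values of a variable $x$. The standard deviation ($s_x$) of these values is
defined as

\[ s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \tag{5} \]

The quantity $\sum (x_i - \bar{x})^2$ is often referred to as the corrected sum of squares or simply sum of squares
(S.S.) of the values $x_1, x_2, \ldots, x_n$.

Now,

\[
\begin{aligned}
\sum (x_i - \bar{x})^2 &= \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) \\
&= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
&= \sum x_i^2 - 2\bar{x}\cdot n\bar{x} + n\bar{x}^2 \\
&= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
&= \sum x_i^2 - n\bar{x}^2 \\
&= \sum x_i^2 - \frac{(n\bar{x})(n\bar{x})}{n} \\
&= \sum x_i^2 - \frac{(\sum x_i)^2}{n}
\end{aligned}
\tag{6}
\]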

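If $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$ respectively, the standard deviation is
defined as

\[ s_x = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}} \tag{7} \]

Here $\sum f_i (x_i - \bar{x})^2 = \sum f_i x_i^2 - \dfrac{(\sum f_i x_i)^2}{n}$ and $n = \sum f_i$.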

The variance of a set of data is defined as the square of the standard deviation, $s_x^2$. The variance rather
than the standard deviation is often more useful in theoretical investigations.
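
The identity in (6) is easy to confirm numerically. The following is a minimal sketch; the function names and the sample values are illustrative assumptions, and `statistics.pstdev` from the Python standard library is used only as an independent check because it also divides by $n$.

```python
import math
import statistics


def sum_of_squares_direct(xs):
    """Sum of squared deviations from the mean, computed from the definition."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def sum_of_squares_shortcut(xs):
    """Same quantity via relation (6): sum(x^2) - (sum x)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


def std_dev(xs):
    """Standard deviation with divisor n, as in definition (5)."""
    return math.sqrt(sum_of_squares_shortcut(xs) / len(xs))


data = [3, 4, -2, 10, 7, 0, 4]   # illustrative values
print(sum_of_squares_direct(data), sum_of_squares_shortcut(data))
print(std_dev(data), statistics.pstdev(data))   # pstdev also divides by n
```

The shortcut form needs only the running totals of $x$ and $x^2$, which is why it is convenient for hand calculation.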

Standard deviation is independent of origin but depends on the scale of measurement

Let us consider a frequency distribution with equal class intervals, where the mid-values of the classes
are denoted by $x_1, x_2, \ldots, x_k$ with corresponding frequencies $f_1, f_2, \ldots, f_k$. Then we consider

\[ u_i = \frac{x_i - x_0}{c}, \quad \text{i.e.} \quad x_i = x_0 + c\,u_i, \qquad i = 1, 2, \ldots, k \tag{8} \]

where $x_0$ is any one of the above mid-values and $c$ is the width of the class interval. Then

\[ \bar{x} = x_0 + c\,\bar{u} \tag{9} \]

Subtracting (9) from (8), we have $x_i - \bar{x} = c(u_i - \bar{u})$, and hence

\[ \sum f_i (x_i - \bar{x})^2 = \sum f_i c^2 (u_i - \bar{u})^2 = c^2 \sum f_i (u_i - \bar{u})^2 \tag{10} \]

The variance $s_u^2$ of the variable $u$ is given by $n s_u^2 = \sum f_i (u_i - \bar{u})^2$. The relation (10) can
be written in terms of $s_x^2$ and $s_u^2$ as

\[ n s_x^2 = c^2 n s_u^2 \;\Rightarrow\; s_x^2 = c^2 s_u^2 \;\Rightarrow\; s_x = |c|\, s_u \tag{11} \]

This is an important relation. It shows that if two variables $x$ and $u$ are connected by a relation of the
type (8), then the standard deviation of $x$ is $|c|$ times the standard deviation of $u$. Note that $s_x$
does not depend on the constant $x_0$ in (8) although it depends on $c$. Hence we often say that the standard
deviation is independent of origin but depends on the scale of measurement.
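
The relation $s_x = |c|\,s_u$ can be illustrated numerically. In this sketch the coded values, the origins and the scale factors are made up for illustration, and `sd_after_transform` is a hypothetical helper name; `statistics.pstdev` is the standard-library standard deviation with divisor $n$.

```python
import statistics


def sd_after_transform(u, x0, c):
    """Standard deviation (divisor n) of x_i = x0 + c * u_i."""
    return statistics.pstdev([x0 + c * ui for ui in u])


u = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]   # made-up coded values
s_u = statistics.pstdev(u)

for c in (5, -5, 0.5):                    # hypothetical scale factors
    # Changing the scale multiplies the s.d. by |c| ...
    print(c, sd_after_transform(u, 70, c), abs(c) * s_u)

# ... while changing only the origin leaves it unchanged.
print(sd_after_transform(u, 70, 5), sd_after_transform(u, 1070, 5))
```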

Calculation of the Standard Deviation of a Frequency Distribution

x (mid-value)      f     u = (x - 70)/5     fu     fu^2
50                 4          -4           -16      64
55                 9          -3           -27      81
60                24          -2           -48      96
65                30          -1           -30      30
70                62           0             0       0
75                28           1            28      28
80                23           2            46      92
85                12           3            36     108
90                 6           4            24      96
95                 2           5            10      50
Total            200                        23     645

\[ \bar{u} = \frac{\sum fu}{n} = \frac{23}{200} = 0.115 \]

\[ \bar{x} = x_0 + c\,\bar{u} = 70 + 5 \times 0.115 = 70 + 0.575 = 70.575 \]

With $\sum fu^2 = 645$, $\sum fu = 23$ and $n = 200$,

\[ s_u^2 = \frac{645}{200} - \left(\frac{23}{200}\right)^2 = 3.225 - (0.115)^2 = 3.2118. \]

That is, $s_u = 1.79$ and $s_x = c\,s_u = 5 \times 1.79 = 8.95$.
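
The coded (step-deviation) calculation can be sketched in Python as follows. The frequencies are as read from the table, with 24 taken for the mid-value 60, which is the value the totals $n = 200$, $\sum fu = 23$ and $\sum fu^2 = 645$ require. Variable names are illustrative.

```python
import math

x0, c = 70, 5                                   # working origin and class width
mid = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
freq = [4, 9, 24, 30, 62, 28, 23, 12, 6, 2]

n = sum(freq)
u = [(x - x0) // c for x in mid]                # coded values, exact integers here
sum_fu = sum(f * ui for f, ui in zip(freq, u))
sum_fu2 = sum(f * ui * ui for f, ui in zip(freq, u))

u_bar = sum_fu / n
var_u = sum_fu2 / n - u_bar ** 2                # variance of u, divisor n
mean_x = x0 + c * u_bar                         # relation (9)
s_x = abs(c) * math.sqrt(var_u)                 # relation (11)

print(n, sum_fu, sum_fu2)
print(mean_x, round(var_u, 4), round(s_x, 2))
```

Without intermediate rounding the script gives $s_x \approx 8.96$; the hand calculation rounds $s_u$ to 1.79 first and so arrives at 8.95.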

The variance is a special case of a more general quantity known as the mean square deviation
(M.S.D.). If $a$ is an arbitrary value of a variable $x$, the mean square deviation of $x_1, x_2, \ldots, x_n$ as
measured from $a$ is defined as

\[ \text{M.S.D.} = \frac{1}{n}\sum (x_i - a)^2 \tag{12} \]

The variance is the value of the M.S.D. taken from the arithmetic mean.
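
A short numerical sketch shows how the M.S.D. behaves as the reference point $a$ moves. It relies on the standard identity that the M.S.D. from $a$ equals the variance plus $(\bar{x} - a)^2$, which is not stated in the text above; the function name and data are illustrative assumptions.

```python
def mean_square_deviation(xs, a):
    """M.S.D. of the values xs measured from the point a, as in (12)."""
    return sum((x - a) ** 2 for x in xs) / len(xs)


data = [3, 4, -2, 10, 7, 0, 4]            # illustrative values
mean = sum(data) / len(data)
variance = mean_square_deviation(data, mean)

for a in (0, 2, mean, 5):
    msd = mean_square_deviation(data, a)
    # Identity: M.S.D. from a = variance + (mean - a)^2
    print(a, msd, variance + (mean - a) ** 2)
```

Because the extra term $(\bar{x} - a)^2$ is never negative, the M.S.D. is smallest when $a$ is the arithmetic mean.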

Empirical Relations among Measures of Dispersion

For symmetrical and moderately skew distributions the following relations hold approximately:

Mean Deviation = (4/5) × Standard Deviation

Semi-interquartile Range = (2/3) × Standard Deviation

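Variance of the Combined Series

If $n_1, n_2$ are the sizes, $\bar{x}_1, \bar{x}_2$ the means, and $\sigma_1, \sigma_2$ the standard deviations of
two series, then the standard deviation $\sigma$ of the combined series of size $n_1 + n_2$ is given by

\[ \sigma^2 = \frac{1}{n_1 + n_2}\left[ n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2) \right], \]

where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$ and
$\bar{x} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ is the mean of the combined series.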

Covariance
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote $n$ pairs of values of the variables $x$ and $y$. Then the
covariance of $x$ and $y$ is defined as

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n} \]

where $s_{xy}$ denotes the covariance of $x$ and $y$.
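
As a small illustration, the definition translates directly into code. The pairs below are made up, and `covariance` is an illustrative helper that uses the divisor $n$ as in the definition above.

```python
def covariance(xs, ys):
    """Covariance of paired values with divisor n."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


xs = [1, 2, 3, 4, 5]     # made-up illustrative pairs
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys))
print(covariance(xs, xs))   # covariance of x with itself is the variance of x
```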
Coefficient of Variation
The measures of dispersion such as the range, mean deviation, standard deviation, etc., all have
the same units as the original variable. For example, if the variable is the amount of rainfall recorded in
inches, the s.d. (standard deviation) will also be in inches. It is therefore not possible to
compare the dispersions in different distributions unless the variables happen to be measured in the same
unit. To obviate this difficulty a number of measures of dispersion which are pure numbers have been
suggested. The coefficient of variation is the most important of these measures. The coefficient of
variation (C.V.) of a distribution is defined as

\[ \text{C.V.} = \frac{s}{\bar{x}} \times 100 \tag{13} \]

where $s$ and $\bar{x}$ are the standard deviation and mean of the distribution. Since $s$ and $\bar{x}$ have the
same unit, C.V. is a pure number and it is useful in comparing distributions whose units may be different.

Standardised Variable

The variable

\[ \frac{x - \bar{x}}{s_x} \]

is called a standardised variable and is a pure number. Its mean is zero and its standard deviation is one.
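
The two stated properties can be checked on any data set. In this sketch the data are illustrative, and the name `z` for the standardised values is an assumption of the example rather than notation from the text.

```python
import statistics

data = [3, 4, -2, 10, 7, 0, 4]           # illustrative values
mean = statistics.fmean(data)
s_x = statistics.pstdev(data)            # standard deviation with divisor n

# Standardise: subtract the mean, divide by the standard deviation.
z = [(x - mean) / s_x for x in data]

# Up to floating-point error, the mean is 0 and the s.d. is 1.
print(statistics.fmean(z), statistics.pstdev(z))
```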
Example: An analysis of the daily wages paid to the workers of two firms A and B belonging to the
same industry gives the following results:

Description                          Firm A        Firm B
Number of workers                    500           600
Average daily wage                   Tk. 186.00    Tk. 175.00
Variance of distribution of wages    81            100

(i) Which firm, A or B, has a larger wage bill?
(ii) In which firm, A or B, is there greater variability in individual wages?
(iii) Calculate (a) the average daily wage, and (b) the variance of the distribution of wages of
all the workers in firms A and B taken together.

Solution:

(i) Firm A:
No. of wage-earners (say) $n_1 = 500$; average daily wage (say) $\bar{x}_1 =$ Tk. 186.
Since

Average daily wage = Total wages paid / No. of workers,

the total daily wages paid to the workers $= n_1\bar{x}_1 = 500 \times 186 =$ Tk. 93,000.
Firm B:
No. of wage-earners (say) $n_2 = 600$; average daily wage (say) $\bar{x}_2 =$ Tk. 175.
Therefore, total daily wages paid to the workers $= n_2\bar{x}_2 = 600 \times 175 =$ Tk. 1,05,000.
Thus we see that firm B has the larger wage bill.

(ii) Variance of distribution of wages in firm A (say) $\sigma_1^2 = 81$.
Variance of distribution of wages in firm B (say) $\sigma_2^2 = 100$.

C.V. of distribution of wages for firm A $= 100 \times \dfrac{\sigma_1}{\bar{x}_1} = \dfrac{100 \times 9}{186} = 4.84$ percent.

C.V. of distribution of wages for firm B $= 100 \times \dfrac{\sigma_2}{\bar{x}_2} = \dfrac{100 \times 10}{175} = 5.71$ percent.

Since C.V. for firm B is greater than C.V. for firm A, firm B has greater variability in individual wages.

(iii) (a) The average daily wage (say) $\bar{x}$ of all the workers in the two firms A and B taken together is
given by

\[ \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{500 \times 186 + 600 \times 175}{500 + 600} = \frac{1{,}98{,}000}{1{,}100} = \text{Tk. } 180. \]

(b) The combined variance $\sigma^2$ is given by the formula

\[ \sigma^2 = \frac{1}{n_1 + n_2}\left[ n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2) \right], \]

where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.
Here $d_1 = 186 - 180 = 6$ and $d_2 = 175 - 180 = -5$. Hence

\[ \sigma^2 = \frac{500(81 + 36) + 600(100 + 25)}{500 + 600} = \frac{1{,}33{,}500}{1{,}100} = 121.36. \]
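
The figures in this example can be verified with a few lines of Python. This is only a sketch: the function names `coefficient_of_variation` and `combined_mean_variance` are illustrative, and the inputs are the values given for firms A and B.

```python
import math


def coefficient_of_variation(sd, mean):
    """C.V. = 100 * s / mean, a pure number (in percent)."""
    return 100 * sd / mean


def combined_mean_variance(n1, mean1, var1, n2, mean2, var2):
    """Mean and variance of two series pooled into one."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1 = mean1 - mean
    d2 = mean2 - mean
    var = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean, var


# (i) total daily wage bills
bill_a, bill_b = 500 * 186, 600 * 175

# (ii) coefficients of variation
cv_a = coefficient_of_variation(math.sqrt(81), 186)
cv_b = coefficient_of_variation(math.sqrt(100), 175)

# (iii) combined mean and variance
mean, var = combined_mean_variance(500, 186, 81, 600, 175, 100)

print(bill_a, bill_b)
print(round(cv_a, 2), round(cv_b, 2))
print(mean, round(var, 2))
```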

Skewness and Kurtosis
Skewness and kurtosis indicate the shape characteristics of a frequency distribution. Skewness is
the degree of asymmetry, or departure from symmetry, of a distribution. If the frequency curve of a
distribution has a longer tail to the right of the central maximum than to the left, then the distribution is
said to be skew to the right or to have positive skewness. If the longer tail lies to the left of the maximum
then the distribution is said to be skew to the left or to have negative skewness.

[Figure: mean, median and mode marked on a frequency curve]


There are several measures of skewness. As these measures are often used to compare frequency
distributions measured in different units, they should be pure numbers. These measures should also
reduce to zero when the distribution is symmetrical. Moreover, when the distribution is positively skew,
the measures should be positive. In a skew distribution, the mean tends to lie on the same side of the
mode as the longer tail. In a symmetrical distribution, the mean, the median and the mode coincide.

For a symmetrical distribution, $m_3$ and in fact any odd moment about the mean is zero. Hence a suitable
measure is given by

\[ b_1 = \frac{m_3}{m_2^{3/2}} \tag{14} \]

where $m_2 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^2$ and $m_3 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^3$.


This is the most important measure of skewness from a theoretical point of view. It may be noted in
passing that the fact that the mean, median and mode are coincident, or that $m_3 = 0$, does not necessarily
imply that the distribution is symmetrical. Two other suitable measures of skewness are shown
below:

\[ \text{Skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \tag{15} \]

\[ \text{Skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} \tag{16} \]
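
A minimal sketch of the moment measure $m_3 / m_2^{3/2}$ and of the median-based measure follows. The mid-values and frequencies are made up to show a symmetric, a right-skew and a left-skew case, and the function names are illustrative.

```python
def central_moment(values, freqs, r):
    """r-th moment about the mean of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n


def moment_skewness(values, freqs):
    """Moment measure of skewness: m3 / m2^(3/2)."""
    m2 = central_moment(values, freqs, 2)
    m3 = central_moment(values, freqs, 3)
    return m3 / m2 ** 1.5


def median_skewness(mean, median, sd):
    """Median-based measure: 3 (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


values = [1, 2, 3, 4, 5]                            # made-up mid-values
print(moment_skewness(values, [1, 2, 3, 2, 1]))     # symmetric: zero
print(moment_skewness(values, [5, 3, 2, 1, 1]))     # longer tail to the right: positive
print(moment_skewness(values, [1, 1, 2, 3, 5]))     # longer tail to the left: negative
```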

Kurtosis
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution. A
distribution (or frequency curve) having a relatively high peak is called leptokurtic, a curve which is
flat-topped is called platykurtic, and a curve which is neither too peaked nor too flat-topped is called
mesokurtic.

The most important measure of kurtosis is $b_2$, defined by

\[ b_2 = \frac{m_4}{m_2^2} \tag{17} \]

where $m_4$ and $m_2$ are the fourth and second moments about the mean of the distribution, and
$m_4 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^4$. This measure is a pure number and is always positive. For the normal
distribution, $b_2 = 3$. Thus the kurtosis is sometimes defined by $(b_2 - 3)$, which is positive for a leptokurtic
distribution, negative for a platykurtic distribution, and zero for a mesokurtic distribution.
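
The measure $m_4 / m_2^2$ can be computed in the same way as the other moments. In this sketch the two frequency distributions are made up, one flat-topped and one sharply peaked, and the function names are illustrative.

```python
def central_moment(values, freqs, r):
    """r-th moment about the mean of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n


def kurtosis_b2(values, freqs):
    """Moment measure of kurtosis: m4 / m2^2."""
    m2 = central_moment(values, freqs, 2)
    m4 = central_moment(values, freqs, 4)
    return m4 / m2 ** 2


values = [1, 2, 3, 4, 5]                       # made-up mid-values
flat = kurtosis_b2(values, [1, 1, 1, 1, 1])    # flat-topped
peaked = kurtosis_b2(values, [1, 2, 10, 2, 1]) # high central peak

# b2 - 3 is negative for the flat case and positive for the peaked one.
print(flat, flat - 3)
print(peaked, peaked - 3)
```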

Properties of Dispersion:

1. The variance of the first $n$ natural numbers is $\dfrac{n^2 - 1}{12}$.

2. The sum of squares of deviations about the mean $\bar{x}$ is the least.

3. For any set of values the mean deviation about the mean cannot exceed the standard deviation.
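
Properties 1 and 3 lend themselves to a quick numerical check. The sketch below assumes the standard result that the variance (divisor $n$) of $1, 2, \ldots, n$ is $(n^2 - 1)/12$, and uses illustrative data for the comparison of mean deviation with standard deviation.

```python
import statistics

# Property 1: variance (divisor n) of 1, 2, ..., n against (n^2 - 1) / 12.
for n in (2, 5, 10, 100):
    direct = statistics.pvariance(list(range(1, n + 1)))
    print(n, direct, (n * n - 1) / 12)

# Property 3: mean deviation about the mean against the standard deviation.
data = [3, 4, -2, 10, 7, 0, 4]           # illustrative values
mean = statistics.fmean(data)
mean_dev = sum(abs(x - mean) for x in data) / len(data)
std_dev = statistics.pstdev(data)
print(mean_dev, std_dev)                 # mean_dev does not exceed std_dev
```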
