Measures of Dispersion 2.
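Hence using (5), the standard deviation of x is

$s = \sqrt{\tfrac{1}{2}\left[(60-62)^2 + (64-62)^2\right]} = 2$ kg.

Note: If a variable assumes only two values a and b (a ≠ b), each having the same frequency, then

MD about mean = S.D. = $\tfrac{1}{2}|a - b|$.

(i) The standard deviation is zero if and only if all the values of the variable are equal.

Suppose all the values are equal. Then $x_i - \bar{x} = 0$, for each i, and

$s = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} = 0.$

On the contrary, suppose s = 0 or, $s^2 = 0$ or, $\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = 0$. This is possible only when $x_i - \bar{x} = 0$, for each i, i.e. when $x_i = \bar{x}$, for each i. Thus, if the standard deviation is zero, all the values of the variable are equal.

(ii) If y = a + bx is the relation between two variables x and y, then their respective standard deviations, denoted by $s_x$ and $s_y$, are related as $s_y = |b|\, s_x$.

Then $y_i = a + b x_i$, for each i. We have, $\bar{y} = a + b\bar{x}$.
So $y_i - \bar{y} = b(x_i - \bar{x})$, and

$s_y^2 = \tfrac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 = b^2 \cdot \tfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = b^2 s_x^2.$

Hence $s_y = |b|\, s_x$.

From this property, it follows that the standard deviation is independent of change of origin (here a), but depends on change of scale (here b).

(iii) Suppose two groups of values of a variable x are given. If $\bar{x}_1$ and $s_1$ respectively denote the mean and the standard deviation of $n_1$ values contained in one group and $\bar{x}_2$ and $s_2$, the mean and standard deviation of $n_2$ values of the other group, then the standard deviation (s) of the combined data is given by

$n s^2 = n_1 s_1^2 + n_2 s_2^2 + n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2$,   (5k)

where $n = n_1 + n_2$ and $\bar{x} = (n_1\bar{x}_1 + n_2\bar{x}_2)/n$ is the mean of the combined data.

The sum of squares of the deviations of the values of the two groups from $\bar{x}$ is

$n s^2 = \sum_{i=1}^{n_1}(x_{1i} - \bar{x})^2 + \sum_{i=1}^{n_2}(x_{2i} - \bar{x})^2.$

Now,

$\sum_{i=1}^{n_1}(x_{1i} - \bar{x})^2 = \sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2 + 2(\bar{x}_1 - \bar{x})\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1) + n_1(\bar{x}_1 - \bar{x})^2 = n_1 s_1^2 + n_1(\bar{x}_1 - \bar{x})^2$,

since $\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1) = 0$. Treating the second group in the same way and adding,

$n s^2 = n_1 s_1^2 + n_2 s_2^2 + n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2.$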
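The two properties above (the scale rule $s_y = |b| s_x$ and the combined standard deviation) can be checked numerically. The following is a minimal sketch, not part of the original text: the data values and the helper names `mean`, `sd` and `combined_sd` are hypothetical, chosen only for illustration, and the divisor n is assumed throughout, as in the text.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with divisor n, matching the text's definition."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined standard deviation of two groups from their summaries."""
    n = n1 + n2
    grand_mean = (n1 * mean1 + n2 * mean2) / n
    total = (n1 * sd1 ** 2 + n2 * sd2 ** 2
             + n1 * (mean1 - grand_mean) ** 2
             + n2 * (mean2 - grand_mean) ** 2)
    return math.sqrt(total / n)

# Hypothetical data: check s_y = |b| s_x for y = a + b x.
x = [60, 64, 58, 70, 63]
a, b = 5, -2
y = [a + b * xi for xi in x]
scale_check = math.isclose(sd(y), abs(b) * sd(x))

# Hypothetical groups: the summary formula should agree with pooling the raw values.
group1 = [12, 15, 11, 14]
group2 = [20, 22, 19, 25, 24, 21]
from_summaries = combined_sd(len(group1), mean(group1), sd(group1),
                             len(group2), mean(group2), sd(group2))
from_raw_data = sd(group1 + group2)

print(scale_check, from_summaries, from_raw_data)
```

The same function only needs the six summary figures, so it can be applied when the raw values of the two groups are not available.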
STATISTICAL TOOLS AND TECHNIQUES MEASURES OF DISPERSION
If there are r groups of values, the i-th group having $n_i$ values with mean $\bar{x}_i$ and standard deviation $s_i$, i = 1, 2, ..., r, then (5k) may be generalised as follows:

$n s^2 = \sum_{i=1}^{r} n_i s_i^2 + \sum_{i=1}^{r} n_i(\bar{x}_i - \bar{x})^2$, where $n = \sum_{i=1}^{r} n_i$.

(b) If $\bar{x}_1$ and $\bar{x}_2$ are not individually given but their difference is known, then (5) is used instead of (5k) for calculating s.

Income        No. of persons
150-300       232
300-450       128
450-600       60
600-750       40
750-900       28
900-1100      12
1100-1500     6
We prepare the following table for necessary calculations.

TABLE 5.4
NECESSARY CALCULATION FOR STANDARD DEVIATION

x     f     fx     fx²
1     10    10     10
2     13    26     52
3     21    63     189
4     23    92     368
5     20    100    500
6     18    108    648
7     9     63     441
8     …     …      128
…

Income       Class-mark (x)   No. of persons (f)   u = (x - 375)/25   fu       fu²
150-300      225              232                  -6                 -1392    8352
300-450      375              128                  0                  0        0
450-600      525              60                   6                  360      2160
600-750      675              40                   12                 480      5760
750-900      825              28                   18                 504      9072
900-1100     1000             12                   25                 300      7500
1100-1500    1300             6                    37                 222      8214
Total                         506                                     474      41058

Now $s_u^2 = \frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2 = \frac{41058}{506} - \left(\frac{474}{506}\right)^2 = 81.14 - 0.8774 = 80.2626.$
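The step-deviation calculation for the income data can be retraced in a few lines. This is a sketch under stated assumptions: the origin 375 and scale 25 are inferred from the tabulated products fu and fu² rather than quoted from the text, and the function name `grouped_sd` is hypothetical.

```python
import math

# Class-marks and frequencies of the income distribution.
class_marks = [225, 375, 525, 675, 825, 1000, 1300]
frequencies = [232, 128, 60, 40, 28, 12, 6]

ORIGIN, SCALE = 375, 25  # assumed: u = (x - 375) / 25

def grouped_sd(marks, freqs, origin, scale):
    """Standard deviation of grouped data via the step-deviation method."""
    n = sum(freqs)
    u = [(x - origin) / scale for x in marks]
    sum_fu = sum(f * ui for f, ui in zip(freqs, u))
    sum_fu2 = sum(f * ui ** 2 for f, ui in zip(freqs, u))
    var_u = sum_fu2 / n - (sum_fu / n) ** 2
    # s_x = |scale| * s_u, since x = origin + scale * u
    return abs(scale) * math.sqrt(var_u), sum_fu, sum_fu2, var_u

s_x, sum_fu, sum_fu2, var_u = grouped_sd(class_marks, frequencies, ORIGIN, SCALE)
print(sum_fu, sum_fu2, round(var_u, 4), round(s_x, 2))
```

Because the standard deviation depends on the change of scale but not on the change of origin, the result in the original units is 25 times the standard deviation of u, roughly Rs. 224.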
Illustration 5.17
For two values, say a and b, a < b, of a variable x, the mean and standard deviation are respectively 25 and 4. Find a and b.

From the given data we have,
$\frac{a + b}{2} = 25$ or, a + b = 50 ... (i)
and $\frac{|a - b|}{2} = 4$ or, b - a = 8 ... (ii)
Solving (i) and (ii), a = 21, b = 29.

Illustration 5.19
Show that the standard deviation of first n odd numbers is equal to the standard deviation of first n even numbers.

Let $x_i$ denote the ith odd number. Then the ith even number is given by $(x_i + 1) = y_i$, for each i. Hence $\bar{y} = \bar{x} + 1$ and $y_i - \bar{y} = x_i - \bar{x}$, for each i, so that $s_y = s_x$.

The mean and standard deviation of the weights of 10 boys are 50 kg and 5 kg respectively. On further verification it is detected that the weights of two boys have been wrongly included as 45 kg, 55 kg instead of the actual values 42 kg and 48 kg. Calculate the correct mean and correct standard deviation.

Let $\bar{x}$ and s respectively denote the given mean and standard deviation of weight (x), while the correct mean and standard deviation be $\bar{x}'$ and $s'$.

Incorrect $\sum x = 10 \times 50 = 500.$
Correct $\sum x = 500 - (45 + 55) + (42 + 48) = 500 - 100 + 90 = 490.$
Incorrect $\sum x^2 = n(s^2 + \bar{x}^2) = 10(25 + 2500) = 25250.$
Correct $\sum x^2 = 25250 - (2025 + 3025) + (1764 + 2304) = 24268.$

Hence, $\bar{x}' = \frac{490}{10} = 49$ kg, and $s'^2 = \frac{24268}{10} - 49^2 = 25.8$, so that $s' = \sqrt{25.8} = 5.08$ kg (nearly).

… Let $\bar{x}_1$ denote the average marks of 9 children and the variance of their marks be $s_1^2$. The marks of the dull boy is 25. If s is the standard deviation of marks for all the children, then we have, …
(iii) Find the standard deviation of the wages of all workers in two firms taken together.
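(i) We have, Average monthly wage = Amount of monthly wage / No. of earners.
So, in firm A, amount of monthly wage = 586 × 52.5 = 30765 (Rs) and in firm B, amount of monthly wage = 648 × 47.5 = 30780 (Rs).

(iii) The standard deviation (s) of wages of all workers in the two firms taken together is given by

$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2} = \frac{586 \times 100 + 648 \times 121}{586 + 648} + \frac{586 \times 648 \times (52.5 - 47.5)^2}{(586 + 648)^2} = 111.03 + 6.23 = 117.26.$

Hence $s = \sqrt{117.26} = 10.83.$

Illustration 5.24
The first of two subgroups has 100 items with mean 15 and standard deviation 3. If the whole group has 250 items with mean 15.6 and standard deviation $\sqrt{13.44}$, find the standard deviation of the second subgroup.

In usual symbols, from the given data, we have
$n_1$ = number of items in the first subgroup = 100,
$n_2$ = number of items in the second subgroup = 250 - 100 = 150,
$\bar{x}_1$ = mean of the first subgroup = 15.

We have, $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$
or, $15.6 = \frac{100 \times 15 + 150\,\bar{x}_2}{250}$
or, $1500 + 150\,\bar{x}_2 = 250 \times 15.6 = 3900$
or, $\bar{x}_2 = \frac{2400}{150} = 16.$

Again, the standard deviation (s) for the whole group is given by
$n s^2 = n_1 s_1^2 + n_2 s_2^2 + n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2$
or, $250 \times 13.44 = 100 \times 9 + 150\, s_2^2 + 100(15 - 15.6)^2 + 150(16 - 15.6)^2$
or, $150\, s_2^2 = 2400$ or, $s_2^2 = 16.$
Hence, $s_2 = \sqrt{16} = 4.$

Illustration 5.25
The mean and standard deviation of 20 observations are found to be 10 and 2 respectively. At the time of checking it was found that one observation 8 was incorrect. Calculate the mean and standard deviation if the wrong observation (i) is omitted (ii) is replaced by 12.

In usual symbols, from the given data, we have n = 20, $\bar{x}$ = 10, s = 2.
So, incorrect $\sum x_i = 20 \times 10 = 200.$
Again, $s^2 = \frac{\sum x_i^2}{n} - \bar{x}^2$ or, $\sum x_i^2 = n(s^2 + \bar{x}^2).$
So, incorrect $\sum x_i^2 = 20(4 + 100) = 2080.$

(i) Here correct $\sum x_i = 200 - 8 = 192$ and n = 19, so that correct $\bar{x} = \frac{192}{19} = 10.1$ (nearly).
Correct $\sum x_i^2 = 2080 - 8^2 = 2016.$
Hence, correct $s^2 = \frac{2016}{19} - \left(\frac{192}{19}\right)^2 = 106.11 - 102.12 = 3.99.$
So, correct $s = \sqrt{3.99} = 2.00$ (nearly).

(ii) Here correct $\sum x_i = 200 - 8 + 12 = 204$, so that correct $\bar{x} = \frac{204}{20} = 10.2.$
Correct $\sum x_i^2 = 2080 - 8^2 + 12^2 = 2160.$
Hence, correct $s^2 = \frac{2160}{20} - (10.2)^2 = 108 - 104.04 = 3.96.$
So, correct $s = \sqrt{3.96} = 1.99$ (nearly).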
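The correction procedure of Illustration 5.25 (recover $\sum x$ and $\sum x^2$ from the reported mean and standard deviation, adjust them, then recompute) can be sketched as follows. The helper names `sums_from_summary` and `summary_from_sums` are hypothetical, and the divisor n is assumed as in the text.

```python
import math

def sums_from_summary(n, mean, sd):
    """Recover sum(x) and sum(x^2) from n, the mean and the s.d. (divisor n)."""
    return n * mean, n * (sd ** 2 + mean ** 2)

def summary_from_sums(n, total, total_sq):
    """Mean and s.d. (divisor n) from n, sum(x) and sum(x^2)."""
    mean = total / n
    return mean, math.sqrt(total_sq / n - mean ** 2)

# Reported (incorrect) summary: 20 observations, mean 10, s.d. 2.
n, wrong_value = 20, 8
total, total_sq = sums_from_summary(n, 10, 2)

# (i) The wrong observation is omitted: one observation fewer.
mean_omitted, sd_omitted = summary_from_sums(
    n - 1, total - wrong_value, total_sq - wrong_value ** 2)

# (ii) The wrong observation is replaced by 12.
mean_replaced, sd_replaced = summary_from_sums(
    n, total - wrong_value + 12, total_sq - wrong_value ** 2 + 12 ** 2)

print(mean_omitted, sd_omitted, mean_replaced, sd_replaced)
```

Carrying the mean 192/19 unrounded in case (i) gives a standard deviation of about 2.00; rounding the mean to 10.1 before squaring would overstate it slightly.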
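where $g$, $h$ and $s$ respectively denote the geometric mean, the harmonic mean and the standard deviation.

Let x assume the values $x_1, x_2, \ldots, x_n$. We denote $d_i = x_i - \bar{x}$, so that $x_i = \bar{x} + d_i$,

or, $\log x_i = \log \bar{x} + \log\left(1 + \frac{d_i}{\bar{x}}\right) = \log \bar{x} + \frac{d_i}{\bar{x}} - \frac{d_i^2}{2\bar{x}^2}$ [neglecting higher powers of $d_i/\bar{x}$].

Averaging over i and using $\sum d_i = 0$, $\log g = \log \bar{x} - \frac{s^2}{2\bar{x}^2}$, i.e., $g = \bar{x}\, e^{-s^2/2\bar{x}^2}.$

An important property of the quartile deviation is discussed below:
If two variables x and y are related as y = a + bx, then their quartile deviations are related as
$Q(y) = |b|\, Q(x).$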
Again, when b < 0, $Q_1(y) = a + b\,Q_3(x)$, and $Q_3(y) = a + b\,Q_1(x)$.
So $Q_3(y) - Q_1(y) = -b\,[Q_3(x) - Q_1(x)]$ or, $Q(y) = -b\,Q(x)$,
i.e. $Q(y) = |b|\,Q(x)$.

The quartile deviation is the suitable measure of dispersion here. Now we form the following table for cumulative frequencies.

TABLE 5.6
CUMULATIVE FREQUENCY TABLE
The range is the simplest one to compute. Generally, the other measures involve almost … The standard deviation is amenable to algebraic treatment. But the other measures do not possess such properties. Thus, it is evident that, in general, the standard deviation is the best measure of dispersion. Of course, range is preferred when speed of computation is of prime importance (as in the case of statistical quality control). Again, quartile deviation is the suitable measure in case of frequency distributions with open end classes.

5.4 Some important relations

(A) Standard deviation is the least root-mean-square deviation.

Suppose a variable x takes n values $x_1, x_2, \ldots, x_n$. From (5d), the root-mean-square deviation of x about an arbitrary value c is $\sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - c)^2}$.

Now $\sum_{i=1}^{n}(x_i - c)^2 = \sum_{i=1}^{n}\left[(x_i - \bar{x}) + (\bar{x} - c)\right]^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + 2(\bar{x} - c)\sum_{i=1}^{n}(x_i - \bar{x}) + n(\bar{x} - c)^2 = n s^2 + n(\bar{x} - c)^2.$

Since $n(\bar{x} - c)^2 \ge 0$, the root-mean-square deviation is least, and equal to s, when $c = \bar{x}$.

(B) The standard deviation is not less than the mean deviation about the mean.

If we put $d_i = |x_i - \bar{x}|$, i = 1, 2, ..., n, then, the variance of the $d_i$ being non-negative, we get

$\frac{1}{n}\sum_{i=1}^{n} d_i^2 \ge \left(\frac{1}{n}\sum_{i=1}^{n} d_i\right)^2$, i.e. $s^2 \ge \mathrm{MD}^2.$

Taking positive square-root, we have, $s \ge \mathrm{MD}.$

The equality sign is valid either when all the values of the variable are equal or when the variable assumes only two distinct values with equal frequency.

(C) The difference between the arithmetic mean and the median cannot be greater than the standard deviation.

Let $x_1, x_2, \ldots, x_n$ be the given values of a variable x. Let $\bar{x}$ and $m_e$ be, respectively, the arithmetic mean and the median of the given values and s their standard deviation. We are to show that $|\bar{x} - m_e| \le s.$

Now $|\bar{x} - m_e| = \left|\frac{1}{n}\sum_{i=1}^{n}(x_i - m_e)\right| \le \frac{1}{n}\sum_{i=1}^{n}|x_i - m_e| \le \frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}| \le s$ [from 5.4 (B), MD ≤ s], the middle step holding because the mean deviation is least when taken about the median.

The equality sign holds either if all the values of the variable are equal or if all the values …

… Hence, $s^2 \le \frac{R^2}{4}$, i.e. $s \le \frac{R}{2}$, where R denotes the range. Here the equality sign holds either when all the values are equal or when the variable takes only two distinct values with equal frequency.

Again, we have

$n s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 = (a - \bar{x})^2 + (b - \bar{x})^2 + \sum{}'(x_i - \bar{x})^2$,

where a and b are the minimum and maximum values and $\sum{}'$ includes all values of x except the minimum and maximum values. So

$n s^2 \ge (a - \bar{x})^2 + (b - \bar{x})^2 = \tfrac{1}{2}\left[2(a - \bar{x})^2 + 2(b - \bar{x})^2\right] \ge \tfrac{1}{2}\left[(a - \bar{x}) - (b - \bar{x})\right]^2 = \tfrac{1}{2}R^2.$

(i) Coefficient of Variation: It is expressed as

C.V. = $\frac{s}{\bar{x}} \times 100\%.$

It is used primarily for comparing dispersions of variables having different units of measurement. For example, if we are to compare the dispersion of height (in cm.) of a group of students with the dispersion of their weights (in kg.), we cannot use standard deviation of the two sets of data as they have different units, but we can use coefficient of variation.

Again, the coefficient of variation is taken as the suitable measure for comparison of dispersion of variables having same unit of measurement when their means are wide apart. Suppose there are two groups of people, one rich and the other poor, and the standard deviation of income of each group is Rs. 100. Now, a difference of Rs. 100 in income does not have the same significance in the two cases; while for the first group it is negligible, for the second it is not so. In such cases, where means are quite different, one should use coefficient of variation, instead of standard deviation, for meaningful comparison.

The distribution having greater coefficient of variation is regarded as more variable.
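As a small numerical companion to the coefficient of variation, the sketch below computes it for the two batsmen of Illustration 5.31. The assignment of the scanned score lines to batsman A and batsman B is an assumption (the row for B reproduces the 48.86% quoted in that illustration), and the function name `coefficient_of_variation` is hypothetical.

```python
import math

def coefficient_of_variation(values):
    """C.V. = (s / mean) * 100, with s computed using divisor n."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return 100 * s / m

# Scores in ten innings (row assignment assumed, see the note above).
scores_a = [32, 28, 47, 63, 71, 39, 10, 60, 96, 14]
scores_b = [19, 31, 48, 53, 67, 90, 10, 62, 40, 80]

cv_a = coefficient_of_variation(scores_a)
cv_b = coefficient_of_variation(scores_b)

# The batsman with the smaller C.V. is the more consistent scorer.
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)
```

Because the coefficient of variation is a pure number, the same function can compare series measured in different units, such as heights in cm. and weights in kg.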
(ii) Coefficient of Quartile Deviation: It is expressed as …

… Since the c.v. of the weight readings is greater than the c.v. of the height readings, the weight readings are more variable than the height readings; in other words, the height readings are more stable than the weight readings.

Illustration 5.30
If A.M. = 10 and C.V. = 50%, find Var(5 - 2x).

C.V. = $\frac{\text{S.D.}}{\text{A.M.}} \times 100$ or, $50 = \frac{\text{S.D.}}{10} \times 100$ or, S.D. = $\frac{50 \times 10}{100} = 5.$
So, Var(5 - 2x) = $(-2)^2\,$Var(x) = 4 × (5)² = 100.

Illustration 5.31
The scores of two batsmen, A and B, in ten innings during a certain season, are given below:

A:  32  28  47  63  71  39  10  60  96  14
B:  19  31  48  53  67  90  10  62  40  80

Find which of the batsmen is more consistent in scoring.

First we find, taking scores of batsman A, $\bar{x}_A = 46$ and $s_A = \sqrt{\tfrac{6500}{10}} = 25.50$, so that c.v. of A = $\frac{s_A}{\bar{x}_A} \times 100\% = 55.42\%.$
Similarly, $\bar{x}_B = 50$ and $s_B = \sqrt{\tfrac{5968}{10}} = 24.43.$ Hence, c.v. of B = $\frac{s_B}{\bar{x}_B} \times 100\% = \frac{24.43}{50} \times 100\% = 48.86\%.$
We see that c.v. of B < c.v. of A. So, batsman B is more consistent in scoring.

Lorenz curve is a special type of cumulative frequency graph. It is useful in studying the concentration of wealth or income in relation to certain segment of a population.

Let F(x) denote the percent cumulative frequency for the variable up to the value x and Φ(x) denote the percent cumulative total for the variable up to the value x. Obviously, both F(x) and Φ(x) vary from 0 to 100. The curve obtained by plotting Φ(x) against F(x), for different fixed values of x, is known as the Lorenz curve or the curve of concentration. The curve is necessarily concave upwards. The line Φ(x) = F(x) is called the line of equal distribution. In case of an income distribution it would mean that 10% of persons would earn 10% of the total income, 50% of persons would earn 50% of the total income, and so on.

The more the departure of the Lorenz curve from the line of equal distribution, the more is the concentration of the total value in a few hands. The area between the line of equal
distribution and the Lorenz curve, called the area of concentration, indicates the degree of concentration; the larger the area the more is the concentration. Twice the area is Gini's coefficient of concentration.

[Figure: Lorenz curve plotted against PERCENTAGE CUMULATIVE FREQUENCY, both axes marked from 0 to 100.]
Fig. 5.1. The curve of concentration and the area of concentration.

5.7 Measures of inequality

Here we consider some measures which will be useful in comparing different distributions of income and wealth.

Let $Y_i$ denote the income of the ith person, Y the total income, $\bar{Y}$ the mean income and N the total number of persons. The common measures of inequality are given by,

(i) Relative range $R' = \frac{\text{Max } Y_i - \text{Min } Y_i}{\bar{Y}}$
(ii) Relative mean deviation $M = \frac{\sum_{i=1}^{N} |Y_i - \bar{Y}|}{N\bar{Y}}$
(iii) Standard deviation of logarithms $H = \left[\frac{1}{N}\sum_{i=1}^{N}(\log \bar{Y} - \log Y_i)^2\right]^{1/2}$
…
(vi) Gini's coefficient $G = \frac{1}{2N^2\bar{Y}}\sum_{i=1}^{N}\sum_{j=1}^{N} |Y_i - Y_j|$

Here R' means the difference between the highest and the lowest income relative to the mean income. Each of these measures is nonnegative. All the devices, excepting variance, can measure the relative inequality in distribution. If income is divided absolutely equally, then R' = 0. Again, R' = N when the entire income goes to a single person. With perfect equality, M = 0 and it becomes 2(N - 1)/N in the case of whole income accruing to a single person. It may be noted that Gini's coefficient G becomes zero in case of equal distribution and it is equal to (N - 1)/N in the context of extreme inequality. The coefficient of variation possesses the property of being sensitive to income transfers for all income levels and, unlike the variance, is independent of the mean income level.

It should be noted that Gini's coefficient is exactly one-half of the relative mean difference, where mean difference is defined as the arithmetic average of the absolute values of differences between all pairs of income. This measure is widely used and it is really a very direct measure of income difference, taking note of differences between every pair of incomes. An alternative representation of G is

$G = 1 + \frac{1}{N} - \frac{2}{N^2\bar{Y}}\left[N Y_1 + (N-1) Y_2 + \cdots + 2 Y_{N-1} + Y_N\right]$, for $Y_1 \le Y_2 \le \cdots \le Y_N.$

This expression shows that Gini's coefficient is a rank-order-weighted sum of different persons' income shares. Thus the poorest of the N individuals gets the weight equal to N, the next poor person the weight (N - 1), and so on till we arrive at the richest person who gets the weight unity. Unless the rank-order is changed, even if there is any alteration in income levels, the weighting pattern remains unchanged.

Of course, the proper choice of inequality measure is a major problem. These inequality measures mutually differ in the sense that they weigh individual incomes in the perspective of the overall distribution. The range concentrates on the extremes giving zero weight to the entire middle portion of the distribution, whereas Gini's coefficient considers the income of each individual by the rank-order weight.

EXERCISES

(iv) Two samples A and B have the same standard deviation but the mean of A is greater than that of B. The coefficient of variation of A is greater than that of B.
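Returning to Gini's coefficient of Section 5.7, the sketch below computes it in three ways for a small set of incomes: from all pairwise differences, from the rank-order-weighted representation, and as twice the area between the line of equal distribution and the Lorenz curve (with both axes rescaled to run from 0 to 1). The income figures and all function names are hypothetical, introduced only for illustration.

```python
def gini_pairwise(incomes):
    """One-half of the relative mean difference over all N^2 ordered pairs."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)

def gini_rank_weighted(incomes):
    """Rank-order form: the poorest gets weight N, the richest weight 1."""
    ys = sorted(incomes)
    n = len(ys)
    mean = sum(ys) / n
    weighted = sum((n - i) * y for i, y in enumerate(ys))
    return 1 + 1 / n - 2 * weighted / (n * n * mean)

def lorenz_points(incomes):
    """Points (percent cumulative frequency, percent cumulative total)."""
    ys = sorted(incomes)
    n, total = len(ys), sum(ys)
    points, running = [(0.0, 0.0)], 0
    for i, y in enumerate(ys, start=1):
        running += y
        points.append((100 * i / n, 100 * running / total))
    return points

def gini_from_lorenz(points):
    """Twice the area between the diagonal and the Lorenz polygon (unit square)."""
    area_under = 0.0
    for (f0, p0), (f1, p1) in zip(points, points[1:]):
        area_under += (f1 - f0) / 100 * (p0 + p1) / 200  # trapezium rule
    return 1 - 2 * area_under

incomes = [10, 20, 30, 40]  # hypothetical incomes
print(gini_pairwise(incomes), gini_rank_weighted(incomes),
      gini_from_lorenz(lorenz_points(incomes)))
```

For these four incomes all three routes give 0.25; giving the whole income to one of N persons yields (N - 1)/N, and equal incomes yield zero, in line with the limiting values stated in Section 5.7.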