CHAPTER 4 Measure of Dispersion
Measures of Dispersion
LEARNING OBJECTIVES
There never was in the world two opinions alike, no more than two hairs or two
grains; the most universal quality is diversity.
—Michel de Montaigne
4.1 INTRODUCTION
Just as central tendency can be measured by a number in the form of an average, the
amount of variation (dispersion, spread, or scatter) among the values in the data set
can also be measured. Measures of central tendency indicate that most of the
values in the data set concentrate (cluster) around a central value, called the
average, with the remaining values scattered (spread or distributed) on either
side of that value. But these measures do not reveal how the values are dispersed
(spread or scattered) on each side of the central value. The dispersion of values is
indicated by the extent to which these values tend to spread over an interval rather
than cluster closely around an average.
1. Techniques that are used to measure the extent of variation or the deviation (also called
degree of variation) of each value in the data set from a measure of central tendency
usually the mean or median. Such statistical techniques are called measures of
dispersion (or variation).
2. Techniques that are used to measure the direction (away from uniformity or symmetry) of
variation in the distribution of values in the data set. Such statistical techniques are
called measures of skewness, discussed in Chapter 5.
Measuring dispersion, understanding it, and identifying its causes are very important in
statistical inference (estimation of parameters, hypothesis testing, forecasting, and so
on). A small dispersion among values in the data set indicates that the data are clustered
closely around the mean. The mean is therefore considered representative of the
data, i.e. the mean is a reliable average. Conversely, a large dispersion among values in
the data set indicates that the mean is not reliable, i.e. it is not representative of the
data.
Symmetrical distributions of values in two or more sets of data may have the same
variation but differ greatly in terms of A.M. On the other hand, two or more sets of
data may have the same A.M. values but differ in variation as shown in Fig. 4.2.
Illustration Suppose that over a six-year period the net profits (in percentage) of two
firms are as follows:
Since the average profit is stated as 4.8 per cent for both firms, one might conclude
that the operating results of both firms are equally good and that a choice between
them for investment purposes must depend on other considerations. However, the
difference among the values is greater for Firm 2, whose profit varies from 5.3 to 16.1
per cent, while the net profit values of Firm 1 vary from 3.9 to 5.4 per cent. This
shows that the values in data set 2 are spread more than those in data set 1. This
implies that Firm 1 has a consistent performance while Firm 2 has a highly
inconsistent performance. Thus for investment purposes a comparison of the
average (mean) profit values alone is not sufficient.
Following are some of the purposes for which measures of variation are needed.
The various measures of dispersion (variation) can be classified into two categories:
4.4 DISTANCE MEASURES
The two distance measures discussed in this section are:
1. Range, and
2. Interquartile deviation
4.4.1 Range
The range is the simplest measure of dispersion and is based on the location of
the largest and the smallest values in the data. Thus the range is defined as the
difference between the largest and smallest observed values in a data set. In other
words, it is the length of the interval that covers the highest and lowest observed
values in a data set, and thus measures the dispersion or spread within that interval in
the most direct possible way.
For example, if the smallest value of an observation in the data set is 160 and largest
value is 250, then the range is 250 – 160 = 90.
For grouped frequency distributions of values in the data set, the range is the
difference between the upper class limit of the last class and the lower class limit of
the first class. In this case the range obtained may be higher than that for
ungrouped data because the class limits are extended slightly beyond
the extreme values in the data set.
Coefficient of Range
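The relative measure of range, called the coefficient of range, is obtained by applying
the following formula:
Coefficient of range = (H – L)/(H + L)
where H and L are the highest and lowest observed values in the data set.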
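As a rough illustration, the Python sketch below computes the range and the coefficient of range for the small example given earlier (smallest value 160, largest value 250). It assumes the usual definition of the coefficient of range, (H – L)/(H + L); the function names and the two middle values in the list are illustrative and are not taken from the text.

```python
def value_range(values):
    """Range: difference between the largest and smallest observed values."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range (H - L) / (H + L); assumes H + L is not zero."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)


# 160 and 250 are the extremes quoted in the text; 185 and 210 are made-up fillers.
data = [160, 185, 210, 250]
print(value_range(data))                      # 90
print(round(coefficient_of_range(data), 4))   # 0.2195
```

Because only the two extremes enter either function, replacing the filler values with any others between 160 and 250 leaves both results unchanged.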
Example 4.1: The following are the sales figures of a firm for the last 12 months
Example 4.2: The following data show the waiting time (to the nearest 100th of a
minute) of telephone calls to be matured:
Advantages
Disadvantages
1. The calculation of range is based on only two values, the largest and smallest in the data
set, and fails to take account of any other observations.
2. It is largely influenced by the two extreme values and completely independent of the other
values. For example, the range of each of the two data sets {1, 2, 3, 7, 12} and {1, 1, 1, 12, 12}
is 11, but the two data sets differ in terms of overall dispersion of values.
3. Its value is sensitive to changes in sampling, that is, different samples of the same size
from the same population may have widely different ranges.
4. It cannot be computed in case of open-end frequency distributions because no highest or
lowest value exists in open-ended class.
5. It does not describe the variation among values in the data between two extremes. For
example, each of the following sets of data
has a range of 21 – 9 = 12, but the variation of values between the
highest and lowest values is quite different in each case.
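The second disadvantage can be checked numerically. The sketch below takes the two data sets quoted in point 2 and compares their ranges with their standard deviations (a measure introduced later in this chapter); the choice of the population standard deviation as the comparison measure is an assumption made here for illustration only.

```python
from statistics import pstdev

set_a = [1, 2, 3, 7, 12]
set_b = [1, 1, 1, 12, 12]

for name, values in (("A", set_a), ("B", set_b)):
    spread = max(values) - min(values)          # range uses only the two extremes
    print(name, "range =", spread, "std dev =", round(pstdev(values), 2))
```

Both sets report a range of 11, while the standard deviation of the second set is noticeably larger, because its values pile up at the two extremes.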
Applications of Range
1. Fluctuation in share prices: The range is useful in the study of small variations among
values in a data set, such as variation in share prices and other commodities that are very
sensitive to price changes from one period to another.
2. Quality control: It is widely used in industrial quality control. Quality control is exercised
by preparing suitable control charts. These charts are based on setting an upper control
limit (range) and a lower control limit (range) within which produced items shall be
accepted. The variation in the quality beyond these ranges requires necessary correction
in the production process or system.
3. Weather forecasts: Meteorological departments use the range to report the difference
between maximum and minimum temperature or rainfall to the general public.
The median is not necessarily midway between Q1 and Q3, although this will be so for
a symmetrical distribution. The median and quartiles divide the data into equal
numbers of values but do not necessarily divide the data into equally wide intervals.
As shown above, the quartile deviation measures the average range of 25 per cent of
the values in the data set. It represents the spread of all observed values because its
value is computed by averaging over the middle 50 per cent of the observed
values, i.e. by halving Q3 – Q1, rather than from a single 25 per cent part of the values
in the data set.
Since quartile deviation is an absolute measure of variation, its value is
affected by the size and number of observed values in the data set. Thus, the Q.D. of
two or more sets of data may differ. For this reason, to compare the
degree of variation in different sets of data, we compute the relative measure
corresponding to Q.D., called the coefficient of Q.D., which is calculated as follows:
Coefficient of Q.D. = (Q3 – Q1)/(Q3 + Q1)
Example 4.3: Following are the responses from 55 students to the question about
how much money they spent every day.
Calculate the range and interquartile range and interpret your result.
Solution: The median of the given values in the data set is the (55 + 1)/2 = 28th
value, which is 105. Around this middle value of 105, there are 27 values at or below
105 and another 27 at or above 105.
The lower quartile is the (27 + 1)/2 = 14th value from the bottom of the data,
i.e. Q1 = 94, and the upper quartile is the 14th value from the top, i.e. Q3 = 120. The 55
values have been partitioned as follows:
The interquartile range is IQR = 120 – 94 = 26, while the range is R = 150 – 55 = 95.
The middle 50% of the data fall in a relatively narrow range of only Rs 26. This means
responses are more densely clustered near the centre of the data and more spread
out towards the extremes. For instance, the lowest 25% of the students had responses
ranging over 55 to 94, i.e. Rs 39, while the next 25% had responses ranging over 94
to 105, i.e. only Rs 11. Similarly, the third quarter had responses from 105 to 120, i.e.
only Rs 15, while the top 25% had responses in the interval 120 to 150, i.e. Rs 30.
The median and quartiles divide the data into equal numbers of values but do not
necessarily divide the data into equally wide intervals.
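The counting procedure used in this solution (median at the (n + 1)/2th position, quartiles at the middle position of each half) can be sketched in Python as follows. The function name and the small sample are hypothetical, since the 55 responses themselves are not reproduced here, and the sketch assumes an odd number of observations whose halves also have an odd length, as with n = 55.

```python
def positional_quartiles(values):
    """Median and quartiles by counting positions in the ordered data.

    Assumes an odd number of observations whose two halves (excluding
    the median itself) also have an odd length, as with n = 55.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = (n + 1) // 2            # 1-based position of the median
    half = mid - 1                # number of values on each side of the median
    q_pos = (half + 1) // 2       # 1-based position of Q1 within the lower half
    median = ordered[mid - 1]
    q1 = ordered[q_pos - 1]       # q_pos-th value from the bottom
    q3 = ordered[n - q_pos]       # q_pos-th value from the top
    return q1, median, q3


sample = [12, 5, 20, 8, 15, 3, 10]   # made-up values, not from the text
q1, median, q3 = positional_quartiles(sample)
print(q1, median, q3, q3 - q1)       # 5 10 15 10
```

For 55 observations the same function picks the 14th, 28th and 42nd ordered values, matching the positions used in the solution.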
Solution: Since the frequency distribution has open-end class intervals on the two
extreme sides, therefore Q.D. would be an appropriate measure of variation. The
computation of Q.D. is shown in Table 4.1.
1. It is not difficult to calculate but can only be used to evaluate variation among observed
values within the middle of the data set. Its value is not affected by the extreme (highest
and lowest) values in the data set.
2. It is an appropriate measure of variation for a data set summarized in open-end class
intervals.
3. Since it is a positional measure of variation, therefore it is useful in case of erratic or
highly skewed distributions, where other measures of variation get affected by extreme
values in the data set.
Disadvantages
1. The value of Q.D. is based on the middle 50 per cent observed values in the data set,
therefore it cannot be considered as a good measure of variation as it is not based on all
the observations.
2. The value of Q.D. is very much affected by sampling fluctuations.
3. The Q.D. has no relationship to any particular value or an average in the data set for
measuring the variation. Its value is not affected by the distribution of the individual
values within the interval of the middle 50 per cent observed values.
Conceptual Questions 4A
1. Explain the term variation. What does a measure of variation serve? In the light of these,
comment on some of the well-known measures of variation.
[Delhi Univ., MBA, 2001]
2. Explain and illustrate how the measures of variation afford a supplement to the
information about frequency distribution furnished by averages.
[Delhi Univ., MBA, 1999]
0–9 18
10–19 19
20–29 6
30–39 2
40–49 5
Calculate the quartile deviation and its coefficient from the above mentioned
data.
[Kurukshetra Univ., MBA, 1998]
4.5 You are given the data pertaining to kilowatt hours of electricity consumed by
100 persons in a city.
0–10 6
10–20 25
20–30 36
30–40 20
40–50 13
Calculate the range within which the middle 50 per cent of the consumers fall.
4.6 The following sample shows the weekly number of road accidents in a city
during a two-year period:
Below 69.44 19
69.44–104.15 25
104.16–208.32 42
208.33–312.49 12
312.50–416.65 5
Here x(11.75) is the value interpolated 75% of the distance between the 11th and
12th ordered sales amounts. Similarly, x(4.25) is the value interpolated 25% of
the distance between the 4th and 5th ordered sales amounts.
4.3 Coefficient of range = 1
4.4 Quartile deviation = 27.76; Coeff. of Q.D. = 0.020; Q1 = 1393.48; Q3 = 1449
4.5 Q3 – Q1 = 34 – 17.6 = 16.4
4.6 Q3 – Q1 = 30.06; Coefficient of Q.D. = 0.561
4.7 Q3 – Q1 = 540.26; Q.D. = 270.13
4.8 Q.D. = 10 years
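The answer to Problem 4.5 can be checked with a short interpolation sketch. It assumes the usual grouped-data rule, quantile = l + ((target – c.f.)/f) × h, applied to the class containing the target cumulative frequency; the function name is illustrative.

```python
def grouped_quantile(classes, fraction):
    """Interpolate a quantile from (lower, upper, frequency) classes."""
    total = sum(freq for _, _, freq in classes)
    target = fraction * total              # N/4 for Q1, 3N/4 for Q3
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # Assume observations are spread evenly inside the class.
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("fraction must lie between 0 and 1")


# Electricity consumption data of Problem 4.5: (lower, upper, number of persons)
consumption = [(0, 10, 6), (10, 20, 25), (20, 30, 36), (30, 40, 20), (40, 50, 13)]
q1 = grouped_quantile(consumption, 0.25)
q3 = grouped_quantile(consumption, 0.75)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))   # 17.6 34.0 16.4
```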
The range and quartile deviation indicate overall variation in a data set, but do not
indicate spread or scatter around the centre (i.e. mean, median or mode).
However, to understand the nature of distribution of values in the data set, we need
to measure the ‘spread’ of values around the mean to indicate how representative the
mean is.
In this section, we shall discuss two more measures of dispersion to measure the
mean (or average) amount by which all values in a data set (population or sample)
vary from their mean. These measures deal with the average deviation from some
measure of central tendency—usually mean or median. These measures are:
Since two measures of variation, range and quartile deviation, discussed earlier do
not show how values in a data set are scattered about a central value or disperse
themselves throughout the range, therefore it is quite reasonable to measure the
variation as a degree (amount) to which values within a data set deviate from either
mean or median.
The mean of deviations of individual values in the data set from their actual mean is
always zero so such a measure (zero) would be useless as an indicator of variation.
This problem can be solved in two ways:
Since the absolute difference between a value x of an observation and the A.M. is never
negative, whether x is less than or more than the A.M., we take
the absolute value of each such deviation from the A.M. (or median). Taking the
average of these absolute deviations, we get a measure of variation called
the mean absolute deviation (MAD). In general, the mean absolute deviation is
given by
MAD = (1/n) Σ |x – x̄| for a sample, or MAD = (1/N) Σ |x – μ| for a population,
where | | indicates the absolute value. That is, the signs of deviations from the mean
are disregarded.
Formulae (4–6) and (4–7), in different contexts, indicate that the MAD provides a
useful method of comparing the relative tendency of values in the distribution to
scatter around a central value or to disperse themselves throughout the range.
While calculating the mean absolute deviation, the median is also considered for
computing because the sum of the absolute values of the deviations from the median
is smaller than that from any other value. However, in general, arithmetic mean is
used for this purpose.
If a frequency distribution is symmetrical, then the A.M. and median values coincide and
the same MAD value is obtained. In such a case x̄ ± MAD provides a range in which
57.5 per cent of the observations are included. Even if the frequency distribution is
moderately skewed, the interval x̄ ± MAD includes about the same percentage of
observations. This shows that more than half of the observations are scattered
within one unit of the MAD around the arithmetic mean.
The MAD is useful in situations where occasional large and erratic deviations are
likely to occur. The standard deviation, which uses the squares of these large
deviations, tends to over-emphasize them.
Coefficient of MAD
The relative measure of mean absolute deviation (MAD), called the coefficient of
MAD, is obtained by dividing the MAD by the measure of central tendency (arithmetic
mean or median) used for calculating the MAD. Thus
Coefficient of MAD = MAD / x̄ (or MAD / Median)
Example 4.5: The number of patients seen in the emergency ward of a hospital for
a sample of 5 days in the last month were: 153, 147, 151, 156 and 153. Determine the
mean deviation and interpret.
Solution: The mean number of patients is x̄ = (153 + 147 + 151 + 156 + 153)/5 =
152. Below are the details of the calculations of MAD using formula (4–6).
The absolute deviations from the mean are 1, 5, 1, 4 and 1, so MAD = 12/5 = 2.4, or
about 3 patients per day when rounded up to a whole patient. The number of patients
deviates on average by about 3 patients from the mean of 152 patients per day.
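The arithmetic of Example 4.5 can be reproduced with the short sketch below. It also shows why signed deviations are of no use as a measure of variation: they sum to zero. The function name is illustrative; the data are the five daily counts given in the example.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


patients = [153, 147, 151, 156, 153]
mean = sum(patients) / len(patients)
signed_sum = sum(x - mean for x in patients)   # signed deviations cancel out
mad = mean_absolute_deviation(patients)
print(mean, signed_sum, mad)                   # 152.0 0.0 2.4
```

The unrounded value is 2.4 patients per day; the figure of 3 quoted in the text is this value rounded up to a whole patient.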
Example 4.6: Calculate the mean absolute deviation and its coefficient from
median for the following data
Example 4.7: Find the mean absolute deviation from mean for the following
frequency distribution of sales (Rs in thousand) in a co-operative store.
Solution: The mean absolute deviation can be calculated by using the formula (4–6)
for mean. The calculations for MAD are shown in Table 4.3. Let the assumed
mean be A = 175.
Thus the average sales is Rs 179.91 thousand per day and the mean absolute
deviation of sales is Rs 47.01 thousand per day.
The ages of 30 school children are noted as: 11, 8, 10, 5, 7, 12, 7, 17, 5, 13, 9, 8, 10, 15,
7, 12, 6, 7, 8, 11, 14, 18, 6, 13, 9, 10, 6, 15, 3, 5 years respectively. Calculate the mean and
standard deviation of the monthly scholarship. Find out the total monthly scholarship
amount being paid to the students.
Solution: The number of students in each age group from 5–7 to 17–19 is
calculated as shown in Table 4.4:
Table 4.4
The calculations for mean and standard deviation are shown in Table 4.5.
Calculations for monthly scholarship paid to 30 students are shown in Table 4.6.
Advantages
1. The calculation of MAD is based on all observations in the distribution and shows the
dispersion of values around the measure of central tendency.
2. The value of MAD is easy to compute and therefore makes it popular among those users
who are not even familiar with statistical methods.
3. While calculating MAD, equal weightage is given to each observed value and thus it
indicates how far each observation lies from either the mean or median.
4. Average deviation from mean is always zero in any data set. The MAD avoids this
problem by using absolute values to eliminate the negative signs.
Disadvantages
1. The algebraic signs are ignored while calculating MAD. If the signs are not ignored, then
the sum of the deviations taken from arithmetic mean will be zero and close to zero when
deviations are taken from median.
2. The value of MAD is considered to be best when deviations are taken from median.
However, median does not provide a satisfactory result in case of a high degree of
variability in a data set.
Moreover, the sum of the deviations from mean (ignoring signs) is
greater than the sum of the deviations from median (ignoring
signs). In such a situation, computations of MAD by taking
deviations from mean is also not desirable.
3. The MAD is generally unwieldy in mathematical discussions.
In spite of all these demerits, the knowledge of MAD would help the reader to
understand another important measure of dispersion called the standard deviation.
Another way to disregard the signs of negative deviations from the mean is to square
them. Instead of computing the absolute value of each deviation from the mean, we
square the deviations. Then the sum of all such squared deviations is
divided by the number of observations in the data set. This value is a measure
called population variance and is denoted by σ² (σ is the lower-case Greek letter sigma).
It is usually referred to as ‘sigma squared’. Symbolically, it is written as:
σ² = Σ (x – μ)² / N (4-9)
The population variance is basically used to measure variation among the values of
observations in a population. Thus for a population of N observations (elements)
and with μ denoting the population mean, the formula for population variance is
shown in Eqn. (4-9). However, in almost all applications of statistics, the data being
analyzed are sample data. As a result, the population variance is rarely determined.
Instead, we compute a sample variance to estimate the population variance, σ².
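A minimal sketch of the two computations follows. It assumes the usual divisor N for the population variance and n – 1 for the sample variance; the function names are illustrative, and the data are borrowed from Example 4.5 purely for demonstration.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed divisor)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


patients = [153, 147, 151, 156, 153]     # data of Example 4.5
print(population_variance(patients))     # 8.8
print(sample_variance(patients))         # 11.0
```

The squared deviations sum to 44 in both cases; only the divisor differs, so the sample figure is always the larger of the two.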
Standard Deviation
1. Ungrouped Data
2. Grouped Data
Remarks: 1. For any data set, MAD is never greater than σ, and MAD is less
sensitive to extreme observations. Thus when a data set contains a few very large
observations, the MAD provides a more realistic measure of variation than σ.
However, σ is often used in statistical applications because it is amenable to
mathematical development.
2. When the sample size (n) becomes very large, (n – 1) becomes practically
indistinguishable from n, and the choice between the two divisors becomes irrelevant.
Advantages and Disadvantages of Standard Deviation
The advantages and disadvantages of the standard deviation are summarized below:
Advantages
1. The value of standard deviation is based on every observation in a set of data. It is the
only measure of variation capable of algebraic treatment and less affected by fluctuations
of sampling as compared to other measures of variation.
2. It is possible to calculate the combined standard deviation of two or more sets of data.
3. Standard deviation has a definite relationship with the area under the symmetric curve of
a frequency distribution. Due to this reason, standard deviation is called
a standard measure of variation.
4. Standard deviation is useful in further statistical investigations. For example, standard
deviation plays a vital role in comparing skewness, correlation, and so on, and also widely
used in sampling theory.
Disadvantages
In this question, if we take deviations from an assumed A.M. = 255 instead of the actual
A.M. = 260, the calculations for standard deviation will be as shown in Table
4.8.
Remark: When actual A.M. is not a whole number, assumed A.M. method should
be used to reduce the computation time.
Solution: Let assumed mean, A be 35 and the value of h be 10. Calculations for
standard deviation are shown in Table 4.9.
Solution: To suggest to Mr. Gupta a proposal for high average net present value,
first calculate the expected (average) net present value for both the proposals.
Since the expected NPV in both the cases is same, he would like to choose the less
risky proposal. For this we have to calculate the standard deviation in both the cases.
Since sA < sB, proposal A has the more uniform net profit. Thus proposal A may be
chosen.
This formula for combined standard deviation of two sets of data
can be extended to compute the standard deviation of more than
two sets of data on the same lines.
2. Standard deviation of natural numbers: The standard deviation of the
first n natural numbers is given by σ = √{(n² – 1)/12}.
For example, the standard deviation of the first 100 natural numbers (i.e., from 1
to 100) will be σ = √{(100² – 1)/12} = √833.25 ≈ 28.87.
3. Standard deviation is independent of change of origin but not of scale.
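Properties 2 and 3 can be verified numerically. The sketch below assumes the standard closed form √((n² – 1)/12) for the first n natural numbers and uses an arbitrary shift of 50 and scale factor of 3, both chosen here only for illustration.

```python
from math import isclose, sqrt
from statistics import pstdev


def sd_first_n_naturals(n):
    """Closed-form standard deviation of 1, 2, ..., n."""
    return sqrt((n * n - 1) / 12)


n = 100
numbers = list(range(1, n + 1))
print(round(sd_first_n_naturals(n), 2))                     # 28.87
print(isclose(pstdev(numbers), sd_first_n_naturals(n)))     # True

shifted = [x + 50 for x in numbers]    # change of origin
scaled = [3 * x for x in numbers]      # change of scale
print(isclose(pstdev(shifted), pstdev(numbers)))            # True: unaffected
print(isclose(pstdev(scaled), 3 * pstdev(numbers)))         # True: multiplied by 3
```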
Example 4.12: For a group of 50 male workers, the mean and standard deviation
of their monthly wages are Rs 6300 and Rs 900 respectively. For a group of 40
female workers, these are Rs 5400 and Rs 600 respectively. Find the standard
deviation of monthly wages for the combined group of workers.
Solution: Given that
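The figures of Example 4.12 can be worked through with the sketch below. It assumes the usual combined-variance formula, σ12² = [n1(σ1² + d1²) + n2(σ2² + d2²)]/(n1 + n2), where d1 and d2 are the differences between each group mean and the combined mean; the function name is illustrative.

```python
from math import sqrt


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two groups."""
    total = n1 + n2
    mean12 = (n1 * mean1 + n2 * mean2) / total
    d1, d2 = mean1 - mean12, mean2 - mean12      # group mean minus combined mean
    var12 = (n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / total
    return mean12, sqrt(var12)


# 50 male workers (mean 6300, s.d. 900) and 40 female workers (mean 5400, s.d. 600)
mean12, sd12 = combined_mean_sd(50, 6300, 900, 40, 5400, 600)
print(mean12, sd12)   # 5900.0 900.0
```

Under these assumptions the combined group has a mean wage of Rs 5900 and a standard deviation of Rs 900.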
Example 4.13: A study of the age of 100 persons grouped into intervals 20–22,
22–24, 24–26,…revealed the mean age and standard deviation to be 32.02 and 13.18
respectively. While checking, it was discovered that the observation 57 was misread
as 27. Calculate the correct mean age and standard deviation.
Solution: From the data given in the problem, we have x̄ = 32.02, σ = 13.18 and N =
100. We know that
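One way to carry out this correction is sketched below. It assumes the variance was computed with divisor N, so that Σx = N·x̄ and Σx² = N(σ² + x̄²) can be recovered from the reported figures, and it treats the misread value as an individual observation; the function name is illustrative.

```python
from math import sqrt


def correct_observation(n, mean, sd, wrong, right):
    """Recompute mean and s.d. after replacing one misrecorded value."""
    total = n * mean                       # sum of x as originally computed
    total_sq = n * (sd**2 + mean**2)       # sum of x squared, from the variance identity
    total += right - wrong
    total_sq += right**2 - wrong**2
    new_mean = total / n
    new_sd = sqrt(total_sq / n - new_mean**2)
    return new_mean, new_sd


new_mean, new_sd = correct_observation(100, 32.02, 13.18, wrong=27, right=57)
print(round(new_mean, 2), round(new_sd, 2))   # 32.32 13.4
```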
Example 4.14: The mean of 5 observations is 15 and the variance is 9. If two more
observations having values –3 and 10 are combined with these 5 observations, what
will be the new mean and variance of the 7 observations?
If two more observations having values –3 and 10 are added to the existing 5
observations, then after adding these 6th and 7th observations, we get
Hence the new mean and variance of the 7 observations are 11.71 and 45.49 respectively.
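The calculation can be checked with a short sketch, assuming the variance is taken with divisor n so that Σx² = n(σ² + x̄²). Carried at full precision, the variance comes to about 45.49; rounding the mean to 11.71 before squaring shifts the second decimal place. The function name is illustrative.

```python
def add_observations(n, mean, variance, new_values):
    """Mean and variance (divisor n) after appending new observations."""
    total = n * mean                          # sum of x for the original data
    total_sq = n * (variance + mean**2)       # sum of x squared for the original data
    total += sum(new_values)
    total_sq += sum(v * v for v in new_values)
    m = n + len(new_values)
    new_mean = total / m
    new_var = total_sq / m - new_mean**2
    return new_mean, new_var


new_mean, new_var = add_observations(5, 15, 9, [-3, 10])
print(round(new_mean, 2), round(new_var, 2))   # 11.71 45.49
```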
For any set of data (population or sample) and any constant z greater than 1
(but need not be an integer), the proportion of the values that lie within z
standard deviations on either side of the mean is at least {1 – (1/z²)}. That is
The relationships involving the mean, standard deviation and the set of observations
are called the empirical rule, or normal rule.
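1. The proportion of all x-values in any set of data that fall within the range μ ± 2σ is at
least 1 – (1/2²) = 0.75, i.e. 75 per cent.
2. The proportion of all x-values in any set of data that fall within the range μ ± 3σ is at
least 1 – (1/3²) = 0.889. That is, at least eight of nine values or 88.9 per cent values must
lie within ±3 standard deviations from the mean.
3. The proportion of all x-values in any set of data that fall within the range μ ± 4σ is at
least 1 – (1/4²) = 0.9375, i.e. 93.75 per cent.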
This theorem has its own limitation as it emphasizes on the word, ‘at least’. For
The theorem is applicable to any data set regardless of the shape of the frequency
distribution of values. For example, assume that the marks obtained by 100 students
in business statistics had a mean of 70 per cent and standard deviation of 10 per
cent. Then number of students who obtained marks between 50 and 85 will be
determined as follows:
1. For 50 per cent marks, z = (50 – 70)/10 = – 2 indicates that 50 is 2 standard deviations
below the mean,
2. For 85 per cent marks, z = (85 – 70)/10 = 1.5 indicates that 85 is 1.5 standard deviations
above the mean.
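Since 85 is only 1.5 standard deviations above the mean, the symmetric interval that
fits inside 50 to 85 is μ ± 1.5σ, i.e. 55 to 85. Applying Chebyshev's theorem with
z = 1.5, we have 1 – (1/1.5²) = 0.556.
This indicates that at least 55.6 per cent of the students must have obtained marks
between 55 and 85, and hence between 50 and 85. (With z = 2.0 the theorem gives at
least 75 per cent, but for the wider interval 50 to 90.)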
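The lower bound 1 – (1/z²) is easy to tabulate. The sketch below prints it for a few values of z; in the marks example (mean 70, standard deviation 10), z = 2 corresponds to the symmetric interval 50 to 90 and z = 1.5 to the interval 55 to 85. The function name is illustrative.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("the bound is informative only for z > 1")
    return 1 - 1 / z**2


for z in (1.5, 2, 3, 4):
    # Percentages: about 55.56, 75.0, 88.89 and 93.75 respectively
    print(z, round(100 * chebyshev_lower_bound(z), 2))
```

These are guaranteed minimums for any distribution; the actual share of values inside the interval is often considerably higher.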
Empirical Rule
For symmetrical, bell-shaped frequency distribution (also called normal curve), the
range within which a given percentage of values of the distribution are likely to fall
within a specified number of standard deviations of the mean is determined as
follows:
μ ± σ covers approximately 68.27 per cent of values in the data set
μ ± 2σ covers approximately 95.45 per cent of values in the data set
μ ± 3σ covers approximately 99.73 per cent of values in the data set
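These three percentages follow from the normal distribution and can be reproduced with the error function in Python's standard library, using the identity P(|Z| < k) = erf(k/√2) for a standard normal variable Z. The function name is illustrative.

```python
from math import erf, sqrt


def normal_coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_coverage(k), 2))   # 68.27, 95.45, 99.73
```

Compared with Chebyshev's bounds of 75 and 88.9 per cent for 2 and 3 standard deviations, the normal curve concentrates far more of its values near the mean.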
Table 4.10 Relationship Among Measures of Variation
2. Quartile deviation =
Standard deviation =
Daily caloric value of food available per adult during current period:
Area Mean Standard Deviation
A 2500 400
B 2000 200
The estimated requirement of an adult is taken as 2800 calories daily and the
absolute minimum is 1350. Comment on the reported figures and determine which
area, in your opinion, needs more urgent attention.
Solution: Taking into consideration the entire population of the two areas, we have
Area A: μ – 3σ = 2500 – 3 × 400 = 1300 calories; Area B: μ – 3σ = 2000 – 3 × 200 = 1400 calories.
This shows that in area A there are adults who are taking even fewer calories, that is,
1300 calories, than the absolute minimum requirement of 1350 calories.
In area B the lower limit of 1400 calories still satisfies the absolute minimum
requirement. Hence, area A needs more urgent attention.
Calculate the mean and standard deviation and determine the percentage of class
that lie between (i) μ ± σ, (ii) μ ± 2σ, and (iii) μ ± 3σ. What percentage of cases lie
outside these limits?
Solution: The calculations for mean and standard deviation are shown in Table
4.11.
The percentage of cases that lie between a given limit are as follows:
Compute the standard deviation and use the criterion x̄ ± 3σ, where σ is the
standard deviation and x̄ is the arithmetic mean, to determine the largest and
smallest size of the collar he should make in order to meet the needs of practically all
the customers, bearing in mind that collars are worn on average half an inch longer than
neck size.
Since all the customers are to wear collars half an inch longer than their neck size, 0.5 is
to be added to the neck size range given above. The new range then becomes:
(11.666 + 0.5) and (15.944 + 0.5) or 12.166 and 16.444, i.e. 12.2 and 16.4 inches.
Breaking Strength Number of Pieces
44–46 3
46–48 24
48–50 27
50–52 21
52–54 5
Calculate the average breaking strength of the alloy and the standard deviation.
Calculate the percentage of observations lying within x̄ ± 2σ.
Solution: The calculations for mean and standard deviation are shown in Table
4.13.
To calculate the percentage of observations lying within x̄ ± 2σ, we assume that the
observations (pieces) are equally spread between the lower and upper
boundary of each class interval (breaking strength). Since 45 is the mid-point of the
class interval 44–46 with frequency 3, there are 1.5 frequencies above 45.
Similarly, below 53 the frequency in the class 52–54 would be 2.5. Hence the total
number of observations (frequencies) between 45 and 53 is 1.5 + 24 + 27 + 21 + 2.5 = 76.
So the percentage of observations lying within x̄ ± 2σ would be (76/80) × 100 = 95 per cent.
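The whole calculation can be sketched in a few lines, using the class mid-points for the mean and standard deviation (divisor N) and the even-spread assumption for the count. The limits x̄ ± 2σ work out to roughly 45.10 and 52.95, which the solution rounds to 45 and 53; the helper name `count_between` is illustrative.

```python
from math import sqrt

# (lower, upper, number of pieces) for the breaking-strength data
classes = [(44, 46, 3), (46, 48, 24), (48, 50, 27), (50, 52, 21), (52, 54, 5)]

n = sum(f for _, _, f in classes)
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
var = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes) / n
sd = sqrt(var)


def count_between(classes, low, high):
    """Observations in [low, high], assuming an even spread within each class."""
    count = 0.0
    for lo, hi, f in classes:
        overlap = max(0.0, min(hi, high) - max(lo, low))
        count += f * overlap / (hi - lo)
    return count


inside = count_between(classes, 45, 53)        # limits rounded as in the text
print(round(mean, 3), round(sd, 3))            # 49.025 1.962
print(inside, 100 * inside / n)                # 76.0 95.0
```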
The set of data for which the coefficient of variation is low is said to be more uniform
(consistent) or more homogeneous (stable).
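As a small illustration, the sketch below applies the usual definition, C.V. = (σ/x̄) × 100, to the wage figures quoted in Example 4.12 (male workers: mean Rs 6300, standard deviation Rs 900; female workers: mean Rs 5400, standard deviation Rs 600). The function name is illustrative, and the use of these particular figures is a choice made here for demonstration.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


male_cv = coefficient_of_variation(6300, 900)      # about 14.29 per cent
female_cv = coefficient_of_variation(5400, 600)    # about 11.11 per cent
more_uniform = "female" if female_cv < male_cv else "male"
print(round(male_cv, 2), round(female_cv, 2), more_uniform)
```

On this measure the wages of the female group are the more uniform, since the lower coefficient of variation indicates the more consistent data set.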
Example 4.19: The weekly sales of two products A and B were recorded as given
below:
Since the coefficient of variation for product A is more than that of product B,
the sales fluctuation in case of product A is higher.
Organization X Organization Y
(b) For calculating the combined variation, we will first calculate the combined mean
as follows:
Conceptual Questions 4B
8. What purpose does a measure of variation serve? In the light of these, comment
on some of the well-known measures of variation.
9. What do you understand by ‘coefficient of variation’? Discuss its importance in
business problems.
[UP Tech. Univ., MBA, 2000]
10. When is the variance equal to the standard deviation? Under what circumstances
can variance be less than the standard deviation? Explain.
11.
1. Explain and illustrate how the measures of variation afford a supplement to the
information about frequency distribution furnished by averages.
[Delhi Univ., MBA, 2001]
2. Describe various methods of measuring variation. Which of these do you consider as the
best and why?
15. Describe the various methods of measuring variation along with their respective
merits and demerits.
[Delhi Univ., MBA, 1998]
16. It has been said that the lesser the variability that exists, the more an average is
representative of a set of data. Comment.
17.
18. What advantages are associated with variance and standard deviation relative to
range as the measure of variability?
19. Suppose you read a published statement that the average amount of food
consumption in this country is adequate; the overall conclusion based upon the
statement is that everyone is properly fed. Criticize the conclusion in terms of the
concept of variability as it relates to the use of averages.
[Delhi Univ., MBA, 2000]
20. The Vice-President, Sales has been studying records regarding the performance
of his sales representatives. He has noticed that in the last 2 years, the average level
of sales per representative has remained the same, while the distribution of the sales
levels has widened. The sales levels from this period have significantly larger
variations from the mean than in any of the previous 2 year periods for which he has
records. What conclusions might be drawn from these observations?
[Delhi Univ., MBA, 1999]
Self-Practice Problems 4B
4.9 Find the average deviation from mean for the following distribution:
4.10 Find the average deviation from mean for the following distribution:
4.11 Find the average deviation from median for the following distribution:
Calculate the variance and standard deviation for the distribution.
4.13 A manufacturer of T-shirts approaches you with the following information
Calculate the standard deviation and advise the manufacturer as to the largest
and the smallest shoulder size T-shirts he should make in order to meet the
needs of his customers.
4.14 A charitable organization decided to give old-age pension to people over sixty
years of age. The scales of pension were fixed as follows:
60–65 200
65–70 250
70–75 300
75–80 350
80–85 400
The ages of 25 persons who secured the pension are as given below:
Calculate the monthly average pension payable per person and the standard
deviation.
4.15 Two automatic filling machines A and B are used to fill tea in 500 g cartons. A
random sample of 100 cartons on each machine showed the following:
Tea Contents (in g) Machine A Machine B
485–490 12 10
490–495 18 15
495–500 20 24
500–505 22 20
505–510 24 18
510–515 4 13
Comment on the performance of the two machines on the basis of average filling
and dispersion.
4.16 An analysis of production rejects resulted in the following observations
Obtain the average wages and the variability in individual wages of all the
workers in the two organizations taken together.
4.20 An analysis of the results of a budget survey of 150 families showed an average
monthly expenditure of Rs 120 on food items with a standard deviation of Rs 15.
After the analysis was completed it was noted that the figure recorded for one
household was wrongly taken as Rs 15 instead of Rs 105. Determine the correct value
of the average expenditure and its standard deviation.
4.21 The standard deviation of a distribution of 100 values was Rs 2. If the sum of
the squares of the actual values was Rs 3,600, what was the mean of this
distribution?
4.22 An air-charter company has been requested to quote a realistic turn-round
time for a contract to handle certain imports and exports of a fragile nature.
The contract manager has provided the management accountant with the
following analysis of turn-round times for similar goods over a given twelve-
monthly period.
Less than 2 25
2 and < 4 36
4 and < 6 66
6 and < 8 47
8 and < 10 26
10 and < 12 18
12 and < 14 2
1. Calculate mean and standard deviation.
2. Advise the contract manager about the turn-round time to be quoted using
If the variance of daily average temperature in a city throughout the year is 25 (°C)²,
what is the variance in °F for that year, and vice versa?
4.24 The hourly output of a new machine is four times that of the old machine. If
the variance of the hourly output of the old machine in a period of n hours is 16,
what is the variance of the hourly output of the new machine in the same period
of n hours?
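Both the temperature exercise and Exercise 4.24 rest on the same property: if y = a + bx, then Var(y) = b² Var(x), and the added constant a has no effect. The sketch below applies it to the figures given and then checks it on a small data set. The list temps_c is made up for illustration and does not come from the text.

```python
from statistics import pvariance

def scaled_variance(variance, scale):
    """Variance of a + scale * x, given the variance of x (the shift a drops out)."""
    return scale ** 2 * variance

# Celsius to Fahrenheit: F = 32 + (9/5) C
var_f = scaled_variance(25, 9 / 5)
# The reverse case: a variance of 25 measured in Fahrenheit, converted to Celsius
var_c = scaled_variance(25, 5 / 9)
# New machine output = 4 x old machine output
var_new_machine = scaled_variance(16, 4)

# Numerical check on hypothetical temperatures (not data from the text)
temps_c = [18.0, 22.5, 25.0, 30.0, 35.5]
temps_f = [32 + 9 / 5 * c for c in temps_c]
ratio = pvariance(temps_f) / pvariance(temps_c)

print(var_f, var_c, var_new_machine, ratio)
```

The ratio in the last line should equal (9/5)² = 3.24 up to rounding.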
4.25. The number of cheques cashed each day at the five branches of a bank during
the past month has the following frequency distribution:
0–199 10
200–399 13
400–599 17
600–799 42
800–999 18
The General Manager, Operations, for the bank knows that a standard deviation
in cheque cashing of more than 200 cheques per day creates staffing problems at
the branches because of the uneven workload. Should the manager worry about
staffing next month?
4.26. Mr. Gupta, owner of a bakery, said that the average weekly production level of
his company was 11,398 loaves, and the variance was 49,729. If the data used to
compute the results were collected for 32 weeks, during how many weeks was the
production level below 11,175, and during how many was it above 11,844?
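A hedged sketch for Exercise 4.26 follows. It assumes that weekly production is approximately normally distributed, which the exercise does not state. Under that assumption the two cut-offs lie one standard deviation below and two standard deviations above the mean.

```python
import math
from statistics import NormalDist

mean, variance, weeks = 11398, 49729, 32
sd = math.sqrt(variance)            # 223 loaves

# How many standard deviations each cut-off lies from the mean
z_low = (11175 - mean) / sd         # -1.0
z_high = (11844 - mean) / sd        # +2.0

# Assumption: production is roughly normal (bell-shaped)
dist = NormalDist(mean, sd)
p_below = dist.cdf(11175)
p_above = 1 - dist.cdf(11844)

weeks_below = weeks * p_below
weeks_above = weeks * p_above
print(f"z = {z_low:.1f}, {z_high:.1f}")
print(f"about {weeks_below:.1f} weeks below 11,175 and {weeks_above:.1f} weeks above 11,844")
```

Rounded to whole weeks, this gives roughly 5 weeks below 11,175 and about 1 week above 11,844. A different distributional assumption would give different counts.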
Coefficient of Variation
4.27 Two salesmen selling the same product show the following results over a long
period of time:
Salesman X Salesman Y
4.29 The number of employees, average daily wages per employee, and the variance
of daily wages per employee for two factories are given below:
1. In which factory is there greater variation in the distribution of daily wages per employee?
2. Suppose in factory B, the wages of an employee were wrongly noted as Rs 120 instead of
Rs 100. What would be the correct variance for factory B?
4.30 The share prices of a company in Mumbai and Kolkata markets during the last
ten months are recorded below:
4.31 A person owns two petrol filling stations A and B. At station A, a representative
sample of 200 consumers who purchase petrol was taken. The results were as
follows:
0 and < 2 15
2 and < 4 40
4 and < 6 65
6 and < 8 40
8 and < 10 30
10 and over 10
4.25
Since the standard deviation σ exceeds 200 cheques per day, the manager should worry.
Formulae Used
1. Range, R
Value of highest observation – Value of lowest observation = H – L
Coefficient of range = (H – L)/(H + L)
2. Interquartile range = Q3 – Q1
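3. Mean absolute deviation (MAD)
For ungrouped data: $\text{MAD} = \dfrac{1}{n}\sum |x - \bar{x}|$
For grouped data: $\text{MAD} = \dfrac{\sum f\,|m - \bar{x}|}{\sum f}$
4. Variance (grouped data, step-deviation method)
$\sigma^2 = h^2\left[\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2\right]$
where d = (m - A)/h; h is the class interval, m is the mid-value
of class intervals, A is the assumed mean, and N = Σf.
5. Standard deviation
$\sigma = \sqrt{\sigma^2}$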
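The step deviation d = (m - A)/h noted under Variance can be illustrated with the cheque data of Exercise 4.25. The sketch below is a minimal example, not the book's worked solution. The assumed mean A = 499.5 is a hypothetical choice (the mid-value of the middle class), and the variance uses divisor N.

```python
import math

# Cheque data from Exercise 4.25: inclusive classes 0-199, 200-399, ...
mids = [99.5, 299.5, 499.5, 699.5, 899.5]   # class mid-values m
freqs = [10, 13, 17, 42, 18]                # frequencies f
A = 499.5   # assumed mean (hypothetical choice: middle class mid-value)
h = 200     # class interval

N = sum(freqs)
d = [(m - A) / h for m in mids]             # step deviations: -2, -1, 0, 1, 2
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))

mean = A + h * sum_fd / N
variance = h ** 2 * (sum_fd2 / N - (sum_fd / N) ** 2)
sd = math.sqrt(variance)

print(f"mean = {mean:.1f}, variance = {variance:.0f}, sd = {sd:.2f}")
```

The resulting standard deviation is above 200, which agrees with the answer given for Exercise 4.25.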
True or False
1. Range is a measure of variation which gives us information about scatter of values around
a measure of central tendency.
(T/F)
4. Absolute measures of variation are used for comparing variability among observations in
a data set.
(T/F)
7. The standard deviation is measured in the same unit as the observations in the data set.
(T/F)
10. The inter-quartile range measures the average range of the lower fourth of a distribution.
(T/F)
11. For a symmetrical distribution, mean absolute deviation equals 4/5 of standard deviation.
(T/F)
12. Variance indicates the average distance of any observation in the data set from the mean.
(T/F)
13. Sample standard deviation provides an accurate estimate of the population standard
deviation.
(T/F)
15. Standard deviation can be calculated by taking deviation from any measure of central
tendency.
(T/F)
MULTIPLE CHOICE
16. The standard deviation of a set of 50 observations is 8. If each observation is
multiplied by 2, then the new value of standard deviation will be:
1. 4
2. 8
3. 16
4. none of the above
17. If the mean and coefficient of variation of a set of data are 10 and 5, respectively, then
the standard deviation is:
1. 10
2. 50
3. 5
4. none of the above
21. In a normal frequency distribution, the number of observations included in mean ±
MAD are:
1. 50 per cent
2. 57.51 per cent
3. 68.51 per cent
4. none of the above
22. If quartile deviation is 8, then value of the standard deviation will be:
1. 12
2. 16
3. 24
4. none of the above
23. If mean absolute deviation is 8, then value of the standard deviation will be:
1. 15
2. 12
3. 10
4. none of the above
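The rules of thumb behind Questions 21 to 23 (MAD ≈ 4/5 σ and Q.D. ≈ 2/3 σ) hold for a normal distribution and can be checked numerically. The sketch below uses only the Python standard library. It assumes a normal distribution throughout, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

z = NormalDist()                       # standard normal: mean 0, sd 1

# Exact ratios for a normal distribution
mad_ratio = math.sqrt(2 / math.pi)     # MAD / sigma, close to 4/5
qd_ratio = z.inv_cdf(0.75)             # Q.D. / sigma, close to 2/3

# Share of observations within mean ± MAD and mean ± Q.D.
cover_mad = z.cdf(mad_ratio) - z.cdf(-mad_ratio)
cover_qd = z.cdf(qd_ratio) - z.cdf(-qd_ratio)

# Standard deviation implied by Q.D. = 8 or MAD = 8
sd_from_qd = 8 / qd_ratio
sd_from_mad = 8 / mad_ratio

print(f"MAD/sigma = {mad_ratio:.4f}, QD/sigma = {qd_ratio:.4f}")
print(f"coverage: mean ± MAD = {100 * cover_mad:.2f}%, mean ± QD = {100 * cover_qd:.2f}%")
print(f"sd from QD = 8: {sd_from_qd:.2f}; sd from MAD = 8: {sd_from_mad:.2f}")
```

The exact ratios differ slightly from 4/5 and 2/3, so the implied standard deviations come out near, but not exactly at, the rounded values 10 and 12.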
24. If the first and third quartiles are 22.16 and 56.36, respectively, then the quartile
deviation is:
1. 17.1
2. 34.2
3. 51.3
4. none of the above.
25. The standard deviation of a set of 50 observations is 6.5. If value of each
observation is increased by 5, then the standard deviation is:
1. 2.5
2. 1.5
3. 3.5
4. none of the above
27. The number of observations in a set of data covered by the interval, ± Q.D. are:
1. 50 per cent
2. 57.73 per cent
3. 59.23 per cent
4. none of the above
29. The relationship between mean absolute deviation and the quartile deviation is:
1.
2.
3.
4.
1. range
2. quartile deviation
3. mean absolute deviation
4. standard deviation
31. Which of the following is not a valid reason for measuring the dispersion of
distribution?
33. Assume that a population has μ = 100 and σ = 10. If a particular observation has
a standard score of 1, it can be concluded that
34. How does the computation of a sample variance differ from the computation of
a population variance?
1. bell-shaped distributions.
2. positively skewed distributions.
3. negatively skewed distributions.
4. all distributions.
0–4 74
5–9 192
10–14 280
15–19 105
20–24 23
25–29 6
4–7 4
8–11 5
12–15 7
16–19 2
20–23 1
24–27 1
45–49 10
50–54 40
55–59 150
60–64 175
65–69 75
70–74 15
75–79 10
What is the mean, variance, and standard deviation of speed for the automobiles
travelling on the highway?
4.35 A work-standards expert observes the amount of time (in minutes) required to
prepare a sample of 10 business letters in the office with observations in ascending
order: 5, 5, 5, 7, 9, 14, 15, 15, 16, 18.
1. Determine the range and middle 70 per cent range for the sample.
2. If the sample mean of the data is 10.9, then calculate the mean absolute deviation and
variance.
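The quantities in Exercise 4.35 can be checked with the standard library, as sketched below. Because the ten letters are described as a sample, the variance is shown with divisor n - 1, alongside the divisor-n form for comparison. The middle 70 per cent range is left out because its value depends on the percentile convention adopted.

```python
import statistics

# Preparation times (in minutes) from the exercise
times = [5, 5, 5, 7, 9, 14, 15, 15, 16, 18]

data_range = max(times) - min(times)
mean = statistics.mean(times)

# Mean absolute deviation about the mean
mad = sum(abs(x - mean) for x in times) / len(times)

sample_var = statistics.variance(times)    # divisor n - 1
pop_var = statistics.pvariance(times)      # divisor n

print(f"range = {data_range}, mean = {mean}")
print(f"MAD = {mad:.2f}, sample variance = {sample_var:.2f}, population variance = {pop_var:.2f}")
```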
4.37 A purchasing agent obtained samples of 60 watt bulbs from two companies. He
had the samples tested in his own laboratory for length of life with the following
results:
1. Which company's bulbs do you think are better in terms of average life?
2. If prices of both the companies are same, which company's bulbs would you buy and
why?
1. Calculate the mean number of days patients stay in the hospital along with standard
deviation of the same.
2. How many patients are expected to stay between 0 and 17 days?
4.39 A nursing home is well known for its effective use of pain-killing drugs for
seriously ill patients. In order to know approximately how many nursing staff to
employ, the nursing home has begun to keep track of the number of patients that
come every week for a checkup. Each week the CMO records the number of seriously
ill patients and the number of routine patients. The data for the last 5 weeks are as
follows:
1. Find the limits within which the middle 75 per cent of seriously ill patients per week
should fall.
2. Find the limits within which the middle 68 per cent of routine patients per week should
fall.
What can Gupta say to his customers about the diameters of 95 per cent of the
gears they are receiving?
[Delhi Univ., MBA, 1998]
4.44 Public transportation and the automobile are two options an employee can
use to get to work each day. Samples of time (in minutes) recorded for each option
are shown below: Public transportation :
1. Compute the sample mean time to get to work for each option.
2. Compute the sample standard deviation for each option.
3. On the basis of your results from parts (a) and (b), which method of transportation
should be preferred? Explain.
4.45 The mean and standard deviation of a set of 100 observations were worked out
as 40 and 5 respectively by a computer which, by mistake, took the value 50 in place
of 40 for one observation. Find the correct mean and variance.
[Lucknow Univ., MBA, 1989]
4.46 The number of employees, wages per employee and the variance of the wages
per employee for two factories are given below:
1. In which factory is there greater variation in the distribution of wages per employee?
2. Suppose in factory B, the wages of an employee were wrongly noted as Rs 3050 instead of
Rs 3650, what would be the correct variance for factory B?
4.47 In two factories A and B engaged in the same industry, the average weekly
wages and standard deviations are as follows:
4.48 The mean of 5 observations is 4.4 and the variance is 8.24. If three of the five
observations are 1, 2 and 6, find the other two.
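Exercise 4.48 reduces to two equations in the two unknown observations a and b: their sum follows from the mean, and the sum of their squares follows from the variance. The sketch below assumes the stated variance uses divisor n. The variable names are illustrative.

```python
import math

n, mean, variance = 5, 4.4, 8.24
known = [1, 2, 6]

# a + b from the mean; a^2 + b^2 from the variance (divisor n assumed)
s = n * mean - sum(known)
q = n * (variance + mean ** 2) - sum(x * x for x in known)

# a and b are the roots of t^2 - s*t + p = 0, where p = ab
p = (s ** 2 - q) / 2
disc = s ** 2 - 4 * p
a = (s - math.sqrt(disc)) / 2
b = (s + math.sqrt(disc)) / 2

print(f"a + b = {s:.0f}, a^2 + b^2 = {q:.0f}, missing values = {a:.0f} and {b:.0f}")
```

Substituting the two values back into the data set reproduces the stated mean of 4.4 and variance of 8.24.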
4.49 The mean and standard deviation of a normal distribution are 60 and 5
respectively. Find the inter-quartile range and the mean deviation of the
distribution.
[Delhi Univ., BCom (H),1997]
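For Exercise 4.49, the quartiles of a normal distribution can be read directly from its inverse cumulative distribution function, and the mean deviation about the mean equals σ√(2/π). The sketch below is a minimal illustration using the standard library, not the book's worked solution.

```python
import math
from statistics import NormalDist

mu, sigma = 60, 5
dist = NormalDist(mu=mu, sigma=sigma)

# Quartiles from the inverse CDF
q1 = dist.inv_cdf(0.25)
q3 = dist.inv_cdf(0.75)
iqr = q3 - q1

# Mean (absolute) deviation about the mean for a normal distribution
mean_dev = sigma * math.sqrt(2 / math.pi)

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}, mean deviation = {mean_dev:.2f}")
```

The textbook approximations Q.D. ≈ 2/3 σ and M.D. ≈ 4/5 σ give the slightly rounded values 6.67 and 4.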
4.50 Mean and standard deviation of the following continuous series are 31 and 5.9
respectively. The distribution after taking step deviations is as follows:
4.51 The value of the arithmetic mean and standard deviation of the following
frequency distribution of a continuous variable derived from the use of working
origin and scale are Rs. 107 and 13.1 respectively. Determine the actual classes.
4.52 The mean and standard deviation of a set of 100 observations were found to be
40 and 5 respectively. But by mistake a value 50 was taken in place of 40 for one
observation. Re-calculate the correct mean and standard deviation.
[Lucknow Univ., MBA, 1999]
4.53 The mean and the standard deviation of a sample of 10 sizes were found to be
9.5 and 2.5 respectively. Later on, an additional observation became available.
This was 15.0 and was included in the original sample. Find the mean and the
standard deviation of 11 observations.
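Exercise 4.53 can be handled by rebuilding the sum and the sum of squares from the reported mean and standard deviation and then adding the new observation. The sketch below assumes the reported standard deviation used divisor n. A divisor of n - 1 would change the figures slightly. The function name add_observation is hypothetical.

```python
import math

def add_observation(n, mean, sd, new_value):
    """Mean and SD after adding one observation.

    Assumption: sd uses divisor n, so sum(x^2) = n * (sd^2 + mean^2).
    """
    total = n * mean + new_value
    sum_sq = n * (sd ** 2 + mean ** 2) + new_value ** 2
    new_n = n + 1
    new_mean = total / new_n
    new_var = sum_sq / new_n - new_mean ** 2
    return new_mean, math.sqrt(new_var)

new_mean, new_sd = add_observation(10, 9.5, 2.5, 15.0)
print(f"mean of 11 observations = {new_mean:.2f}, sd = {new_sd:.2f}")
```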
4.54 The Shareholders Research Centre of India has recently conducted a research
study on price behaviour of three leading industrial shares, A, B, and C for the period
1979 to 1985, the results of which are published as follows in its Quarterly Journal:
4.56 An analysis of the weekly wages paid to workers in two firms A and B
belonging to the same industry, gives the following results:
4.37 For company A:
4.38
This indicates that at least 75% of patients, i.e. 0.75 (200) = 150 patients, should
stay between 0 and 17 days.
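The "at least 75%" figure in this answer matches Chebyshev's inequality with k = 2, which states that at least 1 - 1/k² of the observations lie within k standard deviations of the mean, whatever the shape of the distribution. That reading is an assumption, since the intermediate steps of the solution are missing. The sketch below shows the bound with a hypothetical function name.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 1 - 1 / k ** 2

patients = 200                              # total number used in the answer above
min_fraction = chebyshev_min_fraction(2)    # interval: mean ± 2 standard deviations
min_patients = min_fraction * patients

print(f"at least {100 * min_fraction:.0f}% of patients, i.e. {min_patients:.0f} patients")
```

Because the bound holds for any distribution, it is conservative. For a bell-shaped distribution, the share within two standard deviations is closer to 95 per cent.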
4.39
4.41
If the distribution is bell-shaped, then 95% of the gears will have diameters in the
interval: x̄ ± 2s = 4.002 ± 2 (0.016) = (3.970, 4.034) inches.
4.44 (a) Public : 32; Auto : 32 (b) Public : 4.64; Auto : 1.83 (c) Auto has less
variability.
4.45
4.46
4.47
4.48
4.49
Since deviations are taken from A = 110 and class interval is, h = 10, therefore
the class corresponding to d = 0 will be 105–115. Other classes will be:
4.54 (a) CV(A) = 30, CV(B) = 20 and CV(C) = 25; Share B is more stable.
(b) Dispose share A because of high variability in its price.
4.55 Given N1 + N2 + N3 = 200, N1 = 50, N3 = 90, therefore N2 = 60
Case Studies
In special training sessions with physicians who were to use the system, the director
of the hospital observed that one of the key variables affecting the physicians was the
‘waiting time’ they experienced between inputting data or information requests at a
video matrix terminal and the response by the main-frame computer. One of the
doctors, a cardiologist, was particularly vocal in his complaints about the system:
‘Look, I can't wait all day for a machine. I need information that is accurate and in a
form I can use. You can't expect me to also spend time learning how to use your
machine—I have enough to do.’
To the physicians, sitting at a terminal and waiting for the computer to respond was
simply ‘intolerable.’ The director of the hospital was sympathetic to the physicians’
attitude and had negotiated a contract with the computer hardware vendor
specifying that the average waiting time should not exceed 10 seconds.
After the system had been operating for nearly 15 months, the director conducted a
full-scale evaluation. In general, all aspects of the system looked either good or excellent
with the exception that only about 60 per cent of the physicians were actually using
it, and over the past several months there had been a number of complaints about
excessive waiting times.
The director was considering the possibility of holding a new series of training
sessions for the physicians, but he decided to first review the data collected on actual
waiting times experienced by the physicians. Three sets of data were available: those
collected during the original training session in January
1. Calculate the mean waiting time for each of the three sets of data. Do the mean waiting
times appear to be in conformance with the established standard?
2. Calculate the median waiting times for each of the three sets of data. What general
conclusions can you draw?
3. Determine the range and standard deviation for each of the three sets of data and
consider the implications of the results