Lecture 4. Dispersion
Measures of Dispersion
The measures of central tendency, such as the mean, median, and mode, do not reveal the
whole picture of the distribution of a data set. Two data sets with the same mean may
have completely different spreads. The variation among the values of observations for
one data set may be much larger or smaller than for the other data set. Consider the
following two data sets on the ages (in years) of all workers working for each of two
small companies.
The mean age of workers in both these companies is the same, 40 years. If we do not
know the ages of individual workers at these two companies and are told only that the
mean age of the workers at both companies is the same, we may deduce that the workers
at these two companies have a similar age distribution. As we can observe, however, the
variation in the workers’ ages for each of these two companies is very different.
Company 1:  35  36  38  39  40  45  47
Company 2:  18  27  33  52  70
Thus, the mean, median, or mode by itself is usually not a sufficient measure to reveal the
shape of the distribution of a data set. We also need a measure that can provide some
information about the variation among data values. The measures that help us learn about
the spread of a data set are called the measures of dispersion. The measures of central
tendency and dispersion taken together give a better picture of a data set than the
measures of central tendency alone.
The measures of dispersion covered in this lecture are:
(i) Range
(ii) Variance
(iii) Standard deviation and
(iv) Coefficient of variation
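As a quick check of the company example, the following minimal Python sketch computes the mean age for each company and prints the youngest and oldest ages next to it. The variable names are illustrative only; the ages are the ones listed for the two companies.

```python
from statistics import mean

# Ages (in years) of all workers at the two companies in the example
company_1 = [35, 36, 38, 39, 40, 45, 47]
company_2 = [18, 27, 33, 52, 70]

for name, ages in (("Company 1", company_1), ("Company 2", company_2)):
    # Same mean, but very different youngest and oldest ages
    print(name, "mean:", mean(ages), "youngest:", min(ages), "oldest:", max(ages))
```

Both means come out as 40, while the ages at Company 2 span a much wider interval than those at Company 1.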
Range
The range is the simplest measure of dispersion to calculate. It is obtained by taking the
difference between the largest and the smallest values in a data set.
EXAMPLE
Table 1 gives the total areas in square miles of the four western South-Central states of
the United States.
Table 1
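Solution: The largest total area for a state in this data set is 267,277 square miles, and
the smallest is 49,651 square miles. Therefore,
\[ \text{Range} = \text{Largest value} - \text{Smallest value} = 267{,}277 - 49{,}651 = 217{,}626 \text{ square miles} \]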
Thus, the total areas of these four states are spread over a range of 217,626 square miles.
Disadvantages:
• The range, like the mean, has the disadvantage of being influenced by outliers.
Consequently, the range is not a good measure of dispersion to use for a data set
that contains outliers.
• Another disadvantage of using the range as a measure of dispersion is that its
calculation is based on two values only: the largest and the smallest. All other
values in a data set are ignored when calculating the range. Thus, the range is not
a very satisfactory measure of dispersion.
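The sensitivity of the range to outliers can be illustrated with a short Python sketch. It uses the Company 1 ages from the earlier example; the function name `data_range` and the extra age of 90 are assumptions made purely for illustration and do not come from the lecture data.

```python
def data_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)

company_1 = [35, 36, 38, 39, 40, 45, 47]

# Hypothetical outlier added for illustration only
with_outlier = company_1 + [90]

print(data_range(company_1))     # 47 - 35 = 12
print(data_range(with_outlier))  # 90 - 35 = 55
```

A single extreme value more than quadruples the range here, even though the other seven ages are unchanged, and none of the values between the smallest and the largest affect the result at all.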
Variance and Standard Deviation
The standard deviation is the most widely used measure of dispersion. The value of the
standard deviation tells how closely the values of a data set are clustered around the
mean. In general, a lower value of the standard deviation for a data set indicates that the
values of that data set are spread over a relatively smaller range around the mean. In
contrast, a larger value of the standard deviation for a data set indicates that the values of
that data set are spread over a relatively larger range around the mean.
The standard deviation is obtained by taking the positive square root of the variance.
The variance calculated for population data is denoted by \(\sigma^2\), and the variance calculated
for sample data is denoted by \(s^2\). Consequently, the standard deviation calculated for
population data is denoted by \(\sigma\), and the standard deviation calculated for sample data is
denoted by \(s\).
The following are what we will call the basic formulas used to calculate the variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N} \quad\text{and}\quad s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} \]
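The two basic formulas can be sketched directly in Python. The function names below are assumptions chosen for this illustration. The data are the Company 2 ages from the opening example; since they cover all workers, the population formula is the natural one, and the sample formula is applied to the same numbers only to show the effect of dividing by n − 1 instead of N.

```python
import statistics

def population_variance(values):
    """Population variance: squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

company_2 = [18, 27, 33, 52, 70]

print(population_variance(company_2))  # 1746 / 5 = 349.2
print(sample_variance(company_2))      # 1746 / 4 = 436.5

# The standard library offers the same two quantities
print(statistics.pvariance(company_2), statistics.variance(company_2))
```

The sum of squared deviations is the same in both cases (1746); only the divisor differs, which is why the sample variance is always the larger of the two for the same data.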
Standard Deviation
The population standard deviation, \(\sigma\), is the (positive)
square root of the population variance and is defined as
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}} \]
Example: A professor teaches two large sections of basic statistics and randomly selects
a sample of test scores from both sections. Find the range and standard deviation for each
sample:
Section 1 50 60 70 80 90
Section 2 72 68 70 74 66
Solution: The mean test score for Section 1 is
\[ \bar{x}_1 = \frac{\sum x_i}{n} = \frac{350}{5} = 70 \]
and the mean test score for Section 2 is
\[ \bar{x}_2 = \frac{\sum x_i}{n} = \frac{350}{5} = 70 \]
• Although the average grade for both sections is 70, we notice that the grades in
section 2 are closer to the mean, 70, than are grades in section 1.
• And just as we would expect, the range of section 1, 40, is larger than the range of
section 2, which is 8.
Similarly, we would expect the standard deviation for section 1 to be greater than the
standard deviation for section 2.
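\[ s_1 = \sqrt{s_1^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x}_1)^2}{n-1}} = \sqrt{\frac{1000}{4}} = \sqrt{250} \approx 15.8 \]
\[ s_2 = \sqrt{s_2^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x}_2)^2}{n-1}} = \sqrt{\frac{40}{4}} = \sqrt{10} \approx 3.16 \]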
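For reference, the sums of squared deviations used above (1000 for Section 1 and 40 for Section 2) can be worked out term by term from the test scores and the common mean of 70:

```latex
\begin{align*}
\text{Section 1: } \sum (x_i - 70)^2 &= (50-70)^2 + (60-70)^2 + (70-70)^2 + (80-70)^2 + (90-70)^2 \\
                                     &= 400 + 100 + 0 + 100 + 400 = 1000 \\
\text{Section 2: } \sum (x_i - 70)^2 &= (72-70)^2 + (68-70)^2 + (70-70)^2 + (74-70)^2 + (66-70)^2 \\
                                     &= 4 + 4 + 0 + 16 + 16 = 40
\end{align*}
```

Dividing each sum by n − 1 = 4 gives the sample variances 250 and 10.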
Two Observations
1. The values of the variance and the standard deviation are never negative.
2. The measurement units of variance are always the square of the measurement
units of the original data.
Calculation of Variance and Standard Deviation using Grouped Data:
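For population data:
\[ \text{Variance} = \sigma^2 = \frac{\sum f_i (x_i-\mu)^2}{N} \]
\[ \text{Standard deviation} = \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum f_i (x_i-\mu)^2}{N}}, \qquad N = \sum f_i \]
For sample data:
\[ \text{Sample standard deviation} = s = \sqrt{s^2} = \sqrt{\frac{\sum f_i (x_i-\bar{x})^2}{n-1}}, \qquad n = \sum f_i \]
Here \(x_i\) is the value (or class midpoint) of the ith class and \(f_i\) is its frequency; the sums run over the classes.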
Example: Calculate variance and standard deviation from the following table.
Direct Method
\[ s^2 = \frac{\sum f_i (x_i-\bar{x})^2}{n-1} = \frac{10510.00}{39} \approx 269.49 \]
\[ s = \sqrt{s^2} = \sqrt{269.49} \approx 16.42 \]
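The grouped-data calculation can be sketched in Python as follows. This is a minimal illustration of the sample formula with class midpoints and frequencies. The function name and the small frequency table are hypothetical, chosen only to keep the arithmetic easy to follow; they are not the table from the example above.

```python
import math

def grouped_sample_std(midpoints, frequencies):
    """Return (sample variance, sample standard deviation) for grouped data."""
    n = sum(frequencies)  # n is the sum of the frequencies
    x_bar = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    # Frequency-weighted sum of squared deviations from the mean
    ss = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, frequencies))
    variance = ss / (n - 1)
    return variance, math.sqrt(variance)

# Hypothetical frequency table (NOT the lecture's table)
midpoints = [10, 20, 30]
frequencies = [1, 2, 1]

variance, std_dev = grouped_sample_std(midpoints, frequencies)
print(round(variance, 2), round(std_dev, 2))
```

For this hypothetical table, n = 4 and the mean is 20, so the weighted sum of squared deviations is 100 + 0 + 100 = 200 and the sample variance is 200 / 3.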
Coefficient of Variation
The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean:
\[ CV = \frac{s}{\bar{x}} \times 100\% \]
If the standard deviations in sales for large and small stores selling similar goods are
compared, the standard deviation for large stores will almost always be greater. A simple
explanation is that a large store could be modeled as a number of small stores. Comparing
variation using the standard deviation alone would therefore be misleading. The coefficient of variation
overcomes this problem by adjusting for the scale of units in the population.
Example: The owners are considering purchasing shares of stock A or shares of stock B,
both listed on the New York Stock Exchange. From the closing prices of both stocks
over the last several months, the means and standard deviations were found to be
considerably different, with \(\bar{x}_A = \$4.00\), \(\bar{x}_B = \$80.00\), \(s_A = \$2.00\), and \(s_B = \$8.00\).
Should stock A be purchased, since the standard deviation of stock B is larger?
Solution: We might think that stock B is more volatile than stock A because its standard
deviation is larger. However, the mean closing prices for the two stocks are very different,
\(\bar{x}_A = \$4.00\) and \(\bar{x}_B = \$80.00\). The coefficients of
variation are therefore computed to measure and compare the risk of these competing investment
opportunities:
\[ CV_A = \frac{\$2.00}{\$4.00} \times 100\% = 50\% \quad\text{and}\quad CV_B = \frac{\$8.00}{\$80.00} \times 100\% = 10\% \]
Notice that, relative to its mean, the market value of stock A fluctuates more from period
to period than does that of stock B.
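The stock comparison can be reproduced with a few lines of Python. The function name is an assumption made for this sketch; the means and standard deviations are the ones given in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100.0 * std_dev / mean

cv_a = coefficient_of_variation(2.00, 4.00)   # stock A
cv_b = coefficient_of_variation(8.00, 80.00)  # stock B

print(f"CV of stock A: {cv_a:.0f}%")  # 50%
print(f"CV of stock B: {cv_b:.0f}%")  # 10%
```

Although stock B has the larger standard deviation in dollars, its coefficient of variation is only one fifth of that of stock A.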