
Lecture

Measures of Dispersion

The measures of central tendency, such as the mean, median, and mode, do not reveal the
whole picture of the distribution of a data set. Two data sets with the same mean may
have completely different spreads. The variation among the values of observations for
one data set may be much larger or smaller than for the other data set. Consider the
following two data sets on the ages (in years) of all workers working for each of two
small companies.

Company 1: 47, 38, 35, 40, 36, 45, 39
Mean = 280/7 = 40

Company 2: 70, 33, 18, 52, 27
Mean = 200/5 = 40

The mean age of workers in both these companies is the same, 40 years. If we do not
know the ages of individual workers at these two companies and are told only that the
mean age of the workers at both companies is the same, we may deduce that the workers
at these two companies have a similar age distribution. As we can observe, however, the
variation in the workers’ ages for each of these two companies is very different.

[Figure: dot plots of the workers' ages for the two companies]
Company 1: 35, 36, 38, 39, 40, 45, 47
Company 2: 18, 27, 33, 52, 70

Thus, the mean, median, or mode by itself is usually not a sufficient measure to reveal the
shape of the distribution of a data set. We also need a measure that can provide some
information about the variation among data values. The measures that help us learn about
the spread of a data set are called the measures of dispersion. The measures of central
tendency and dispersion taken together give a better picture of a data set than the
measures of central tendency alone.
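As a minimal sketch of this point (Python is an assumption here, since the lecture itself uses no code, and the helper name `summarize` is hypothetical), the following computes the mean together with the smallest and largest age for each company. The means agree while the extremes differ widely.

```python
from statistics import mean

# Ages (in years) of all workers at the two companies, as given above.
company_1 = [47, 38, 35, 40, 36, 45, 39]
company_2 = [70, 33, 18, 52, 27]


def summarize(ages):
    """Return (mean, smallest, largest) for a list of ages."""
    return mean(ages), min(ages), max(ages)


for name, ages in (("Company 1", company_1), ("Company 2", company_2)):
    avg, lowest, highest = summarize(ages)
    print(f"{name}: mean = {avg}, ages run from {lowest} to {highest}")
```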

Different Measures of Dispersion:

(i) Range
(ii) Variance
(iii) Standard deviation and
(iv) Coefficient of variation

Range
The range is the simplest measure of dispersion to calculate. It is obtained by taking the
difference between the largest and the smallest values in a data set.

Finding the Range for Ungrouped Data

Range = Largest value - Smallest value

EXAMPLE
Table 1 gives the total areas in square miles of the four western South-Central states of
the United States.

Table 1

State        Total Area (square miles)
Arkansas     53,182
Louisiana    49,651
Oklahoma     69,903
Texas        267,277

Find the range for this data set.

Solution: The maximum total area for a state in this data set is 267,277 square miles, and
the smallest area is 49,651 square miles. Therefore,

Range = Largest value - Smallest value
      = 267,277 - 49,651 = 217,626 square miles

Thus, the total areas of these four states are spread over a range of 217,626 square miles.
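The same calculation can be sketched in a few lines of Python. The function name `data_range` is a hypothetical helper chosen for illustration; the figures are those of Table 1.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    values = list(values)
    return max(values) - min(values)


# Total areas in square miles, from Table 1.
areas = {
    "Arkansas": 53182,
    "Louisiana": 49651,
    "Oklahoma": 69903,
    "Texas": 267277,
}

print(data_range(areas.values()), "square miles")
```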

Disadvantages:

• The range, like the mean, has the disadvantage of being influenced by outliers.
Consequently, the range is not a good measure of dispersion to use for a data set
that contains outliers.
• Another disadvantage of using the range as a measure of dispersion is that its
calculation is based on two values only: the largest and the smallest. All other
values in a data set are ignored when calculating the range. Thus, the range is not
a very satisfactory measure of dispersion.
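Both disadvantages can be illustrated with a short Python sketch. It reuses the Table 1 areas to show how one very large value (Texas, which is far larger than the other three states) drives the range, and then uses two small hypothetical data sets, invented purely for illustration, to show that the range ignores everything between the extremes. The helper `data_range` is likewise hypothetical.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    values = list(values)
    return max(values) - min(values)


# Total areas in square miles, from Table 1.
areas = {"Arkansas": 53182, "Louisiana": 49651, "Oklahoma": 69903, "Texas": 267277}

# Disadvantage 1: a single extreme value dominates the range.
with_texas = data_range(areas.values())
without_texas = data_range(a for state, a in areas.items() if state != "Texas")
print(with_texas, without_texas)

# Disadvantage 2: only the two extremes matter.
# Hypothetical data sets with the same extremes but different interior values.
clustered = [1, 5, 5, 5, 9]
polarized = [1, 1, 5, 9, 9]
print(data_range(clustered), data_range(polarized))
```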

Variance and Standard Deviation

The standard deviation is the most-used measure of dispersion. The value of the
standard deviation tells how closely the values of a data set are clustered around the
mean. In general, a lower value of the standard deviation for a data set indicates that the
values of that data set are spread over a relatively smaller range around the mean. In
contrast, a larger value of the standard deviation for a data set indicates that the values of
that data set are spread over a relatively larger range around the mean.

The standard deviation is obtained by taking the positive square root of the variance.
The variance calculated for population data is denoted by $\sigma^2$ and the variance calculated
for sample data is denoted by $s^2$. Consequently, the standard deviation calculated for
population data is denoted by $\sigma$ and the standard deviation calculated for sample data is
denoted by $s$.

Calculation of Variance and Standard Deviation for Individual Observation

Following are what we will call the basic formulas that are used to calculate the variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

and

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\sigma^2$ is the population variance and $s^2$ is the sample variance.
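The two basic formulas can be written out directly as a rough Python sketch. The function names and the small data set are hypothetical and chosen only for illustration; `pvariance` and `variance` from Python's standard `statistics` module are used as a cross-check. Taking the positive square root of either result gives the corresponding standard deviation.

```python
import math
from statistics import pvariance, variance


def population_variance(xs):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


# Hypothetical data, not taken from the lecture.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data), pvariance(data))
print(sample_variance(data), variance(data))
print(math.sqrt(population_variance(data)))  # population standard deviation
```

The only difference between the two functions is the divisor, $N$ versus $n - 1$, so the sample variance is always the larger of the two for the same data.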

Standard Deviation
With respect to standard deviation, the population standard deviation, $\sigma$, is the (positive)
square root of the population variance and is defined as

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

The sample standard deviation, $s$, is

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Example: A professor teaches two large sections of basic statistics and randomly selects
a sample of test scores from both sections. Find the range and standard deviation for each
sample:

Section 1: 50, 60, 70, 80, 90
Section 2: 72, 68, 70, 74, 66

Solution: The mean test score of section 1 is

$$\bar{x}_1 = \frac{\sum x_i}{n} = \frac{350}{5} = 70$$

and the mean test score of section 2 is

$$\bar{x}_2 = \frac{\sum x_i}{n} = \frac{350}{5} = 70$$

Range of test scores of section 1 = 90 – 50 = 40


Range of test scores of section 2 = 74 – 66 = 8

• Although the average grade for both sections is 70, we notice that the grades in
section 2 are closer to the mean, 70, than are grades in section 1.
• And just as we would expect, the range of section 1, 40, is larger than the range of
section 2, which is 8.

Similarly, we would expect the standard deviation for section 1 to be greater than the
standard deviation for section 2.

Section 1                        Section 2
$x_i$    $(x_i - \bar{x})^2$     $x_i$    $(x_i - \bar{x})^2$
50       400                     72       4
60       100                     68       4
70       0                       70       0
80       100                     74       16
90       400                     66       16
Total    1000                    Total    40

$$s_1 = \sqrt{s_1^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{1000}{4}} = \sqrt{250} \approx 15.8$$

$$s_2 = \sqrt{s_2^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{40}{4}} = \sqrt{10} \approx 3.16$$

Two Observations
1. The values of the variance and the standard deviation are never negative.
2. The measurement units of variance are always the square of the measurement
units of the original data.
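As a cross-check of the test-score example, the sample standard deviations of the two sections can be recomputed with a short Python sketch. It assumes the standard-library `statistics` module, whose `variance` and `stdev` functions use the $n - 1$ divisor; the variable names are illustrative only.

```python
from statistics import stdev, variance

# Sampled test scores from the example.
section_1 = [50, 60, 70, 80, 90]
section_2 = [72, 68, 70, 74, 66]

for name, scores in (("Section 1", section_1), ("Section 2", section_2)):
    s2 = variance(scores)  # sample variance, in squared score units
    s = stdev(scores)      # sample standard deviation, in score units
    print(f"{name}: s^2 = {s2}, s = {s:.2f}")
```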

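Calculation of Variance and Standard Deviation using Grouped Data:

For a frequency distribution with $k$ classes, class midpoints $x_i$, and class frequencies $f_i$:

$$\text{Variance} = \sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N}$$

$$\text{Standard Deviation} = \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N}}$$

where $N = \sum f_i$.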

Sample Variance and Standard Deviation for Frequency Distribution

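The sample variance, $s^2$, and the sample standard deviation, $s$, are

$$\text{Sample variance} = s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$$

$$\text{Sample standard deviation} = s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}}$$

where the sums run over the $k$ classes and $n = \sum f_i$.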

Example: Calculate the variance and standard deviation from the following table.

Table 3: Frequency distribution of days to maturity for 40 short-term investments.

Direct Method

The sample mean computed from the class midpoints is $\bar{x} = \sum f_i x_i / n = 2720/40 = 68$.

Class interval   Frequency $f_i$   Midpoint $x_i$   $(x_i - \bar{x})^2$   $f_i (x_i - \bar{x})^2$
30-39            3                 34.5             1122.25               3366.75
40-49            1                 44.5             552.25                552.25
50-59            8                 54.5             182.25                1458.00
60-69            10                64.5             12.25                 122.50
70-79            7                 74.5             42.25                 295.75
80-89            7                 84.5             272.25                1905.75
90-99            4                 94.5             702.25                2809.00
Total            40                                                       10510.00

$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{10510.00}{39} \approx 269.49$$

$$s = \sqrt{s^2} = \sqrt{269.49} \approx 16.42$$
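The direct method for grouped data can be sketched in Python as follows. The function name `grouped_sample_stats` and the tuple layout are hypothetical choices for illustration; the class limits and frequencies are those of Table 3, and each midpoint is taken as the average of the two class limits, which reproduces the midpoints listed in the table.

```python
import math

# (lower limit, upper limit, frequency) for each class in Table 3.
classes = [
    (30, 39, 3),
    (40, 49, 1),
    (50, 59, 8),
    (60, 69, 10),
    (70, 79, 7),
    (80, 89, 7),
    (90, 99, 4),
]


def grouped_sample_stats(classes):
    """Return (mean, sample variance) computed from class midpoints."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lo + hi) / 2, f) for lo, hi, f in classes]
    x_bar = sum(f * x for x, f in midpoints) / n
    squared_dev_sum = sum(f * (x - x_bar) ** 2 for x, f in midpoints)
    return x_bar, squared_dev_sum / (n - 1)


x_bar, s2 = grouped_sample_stats(classes)
print(f"mean = {x_bar}, s^2 = {s2:.2f}, s = {math.sqrt(s2):.2f}")
```

Because every observation in a class is represented by the class midpoint, the result is an approximation to the variance of the underlying raw data.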

Coefficient of Variation

The coefficient of variation expresses the standard deviation as a percentage of the mean.

The population coefficient of variation is

$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{if } \mu > 0$$

The sample coefficient of variation is

$$CV = \frac{s}{\bar{x}} \times 100\% \quad \text{if } \bar{x} > 0$$

If the standard deviations in sales for large and small stores selling similar goods are
compared, the standard deviation for large stores will almost always be greater. A simple
explanation is that a large store could be modeled as a number of small stores. Comparing
variation using the standard deviation would be misleading. The coefficient of variation
overcomes this problem by adjusting for the scale of units in the population.

Example: The owners are considering purchasing shares of stock A or shares of stock B,
both listed on the New York Stock Exchange. From the closing prices of both stocks
over the last several months, the means and standard deviations were found to be
considerably different, with $\bar{x}_A = \$4.00$, $\bar{x}_B = \$80.00$, $s_A = \$2.00$, and $s_B = \$8.00$.
Should stock A be purchased, since the standard deviation of stock B is larger?

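Solution: Because its standard deviation is larger, we might think that stock B is more
volatile than stock A. However, the mean closing prices for the two stocks are
$\bar{x}_A = \$4.00$ and $\bar{x}_B = \$80.00$, so the two standard deviations are measured
against very different price levels. The coefficients of variation are therefore computed to
measure and compare the risk of these competing investment opportunities:

$$CV_A = \frac{\$2.00}{\$4.00} \times 100\% = 50\% \quad \text{and} \quad CV_B = \frac{\$8.00}{\$80.00} \times 100\% = 10\%$$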

Notice that the market value of stock A fluctuates more from period to period than does
that of stock B.
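The comparison of the two stocks can be reproduced with a small Python sketch. The function `coefficient_of_variation` is a hypothetical helper written for this illustration; it rejects a non-positive mean, matching the condition attached to the definition.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean (mean must be positive)."""
    if mean <= 0:
        raise ValueError("the coefficient of variation requires a positive mean")
    return 100 * std_dev / mean


# Sample means and standard deviations of the closing prices, in dollars.
cv_a = coefficient_of_variation(std_dev=2.00, mean=4.00)
cv_b = coefficient_of_variation(std_dev=8.00, mean=80.00)

print(f"CV of stock A = {cv_a}%, CV of stock B = {cv_b}%")
```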
