
Measures of Dispersion
• The degree to which numerical data tend to spread about
an average value is called the dispersion, or variation of the
data.

• The measures of dispersion indicate the extent to which
observations in the data differ from the average value.

Measures of central tendency (MCTs) alone are not enough to describe the data.

Measures of Absolute Dispersion
• carry the unit of measure of the observations
• can be used to compare data sets with the same
means and the same units of measurement

Measures of Relative Dispersion
• are unitless, so they can be used to compare the dispersion of
two or more data sets with different means or different
units of measurement

BASIC STATISTICAL ANALYSIS 4


Measures of Absolute Dispersion
• Range
• Standard deviation

Measures of Relative Dispersion
• Coefficient of Variation
• Standard Score


Range
The range is the difference between the maximum and
minimum values of a data set.

Range = maximum – minimum

Properties of the Range
• It does not take into account middle observations.
• It is affected by outliers.
• It tends to be smaller for smaller samples than for
larger samples.
Mean Deviation
Class Interval   Class Midpoint (x)   Frequency (f)   |x − x̄|   f|x − x̄|
53-59            56                   3               19.78      59.34
60-66            63                   14              12.78      178.92
67-73            70                   15              5.78       86.70
74-80            77                   26              1.22       31.72
81-87            84                   10              8.22       82.20
88-94            91                   8               15.22      121.76
95-101           98                   4               22.22      88.88
Total                                 80                         649.52

Mean = 75.78

$$\text{MAD} = \frac{\sum f\,|x - \bar{x}|}{\sum f} = \frac{649.52}{80} = 8.12$$
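
The mean deviation for grouped data can be checked with a few lines of Python. This is a minimal sketch, assuming each class is represented by its midpoint; the function names `grouped_mean` and `grouped_mean_deviation` are illustrative and not part of the original material.

```python
# Class midpoints and frequencies taken from the table above.
midpoints = [56, 63, 70, 77, 84, 91, 98]
frequencies = [3, 14, 15, 26, 10, 8, 4]


def grouped_mean(midpoints, frequencies):
    """Frequency-weighted mean of the class midpoints."""
    total = sum(frequencies)
    return sum(f * x for x, f in zip(midpoints, frequencies)) / total


def grouped_mean_deviation(midpoints, frequencies):
    """Mean absolute deviation of the midpoints about the grouped mean."""
    total = sum(frequencies)
    mean = grouped_mean(midpoints, frequencies)
    return sum(f * abs(x - mean) for x, f in zip(midpoints, frequencies)) / total


mean = grouped_mean(midpoints, frequencies)
mad = grouped_mean_deviation(midpoints, frequencies)
print(f"mean = {mean:.3f}, mean deviation = {mad:.3f}")
```

The sum of the f|x − x̄| column is divided by the total frequency (80), so the result is on the same scale as the individual deviations in the table.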
Semi-Interquartile Range
• The semi-interquartile range, or quartile deviation, of a set
of data is denoted by Q and is defined by

$$Q = \frac{Q_3 - Q_1}{2}$$

where Q1 and Q3 are the first and third quartiles for the
data.
• The interquartile range Q3 – Q1 is sometimes used, but the
semi-interquartile range is more common as a measure of
dispersion.
Example:
Find the semi-interquartile range for the height distribution:
Lower quartile, Q1 = 65.64 in.
Upper quartile, Q3 = 69.61 in.
Semi-interquartile range, Q = (69.61 – 65.64)/2 = 1.98 in.

Note that 50% of the cases lie between Q1 and Q3, so 50
students have heights between 65.64 and 69.61 in. We can
consider ½ (Q1 + Q3) = 67.63 in. to be a measure of central
tendency. It follows that 50% of the heights lie in the range
67.63 ± 1.98 in.
The 10 – 90 Percentile Range / Interdecile Range
The 10 – 90 percentile range of a set of data is defined by

10 – 90 percentile range = P90 – P10

where P10 and P90 are the 10th and 90th percentiles for the
data. The semi-10 – 90 percentile range, ½ (P90 – P10), can
also be used but is not commonly employed.
Example:
Find the 10 – 90 percentile range of the heights of the
students.
10th percentile, P10 = 63.33 in.
90th percentile, P90 = 71.27 in.
10 – 90 percentile range = 71.27 – 63.33 = 7.94 in.
Since ½ (P90 + P10) = 67.30 in. and ½ (P90 – P10) = 3.97 in., we can
conclude that 80% of the students have heights in the range 67.30 ± 3.97 in.
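
Both the quartile and the percentile examples reduce to the same arithmetic on a pair of cut points. The sketch below reproduces it from the quoted quartiles and percentiles. The helper name `range_summary` is hypothetical, and the raw height data are not available here, so the cut points are taken as given.

```python
def range_summary(lower, upper):
    """Return (width, midpoint, half_width) of the interval [lower, upper]."""
    width = upper - lower
    return width, (lower + upper) / 2, width / 2


# Cut points quoted in the examples (heights in inches).
q1, q3 = 65.64, 69.61
p10, p90 = 63.33, 71.27

iqr, q_mid, semi_iqr = range_summary(q1, q3)
idr, p_mid, semi_idr = range_summary(p10, p90)

print(f"interquartile range = {iqr:.2f}, semi-interquartile range = {semi_iqr:.3f}")
print(f"10-90 percentile range = {idr:.2f}, midpoint = {p_mid:.2f}, half-width = {semi_idr:.2f}")
```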
Variance
▪ The variance of a set of data is the square of the standard
deviation. The sample variance is denoted by s² and the
population variance by σ². To distinguish the standard
deviation of a population from that of a sample drawn from it,
the symbol s is used for the sample and σ for the population.
▪ It describes how far the observations are from the mean. It
is expressed in the square of the unit of measure of the
observations.
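Population Variance (a parameter):

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample Variance (a statistic):

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where xi = ith observation of the variable X
μ = population mean
x̄ = sample mean
N = number of observations in the population
n = number of observations in the sample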
Variance for Grouped Data
Sheppard’s Correction for Variance
The computation of the standard deviation is somewhat in error
as a result of grouping the data into classes (grouping error). To
adjust for grouping error:

Corrected variance = variance from grouped data – c²/12

where c is the class interval size.

The correction is subtracted.

Statisticians differ as to when or whether Sheppard’s correction
should be applied. It should certainly not be applied before one
examines the situation thoroughly, for it often tends to
overcorrect, thus replacing an old error with a new one.
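
As a rough illustration, the correction can be applied to the frequency distribution used earlier for the mean deviation (class width c = 7). This sketch assumes the grouped variance is computed from class midpoints and divided by the total frequency N; a sample version would divide by N – 1 instead. The function names are illustrative only.

```python
midpoints = [56, 63, 70, 77, 84, 91, 98]
frequencies = [3, 14, 15, 26, 10, 8, 4]
class_width = 7  # distance between consecutive class midpoints


def grouped_variance(midpoints, frequencies):
    """Variance from class midpoints, dividing by the total frequency (assumed)."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / total


def sheppard_corrected(variance, class_width):
    """Subtract c^2 / 12 from a variance computed on grouped data."""
    return variance - class_width ** 2 / 12


raw_variance = grouped_variance(midpoints, frequencies)
corrected_variance = sheppard_corrected(raw_variance, class_width)
print(f"grouped variance   = {raw_variance:.4f}")
print(f"corrected variance = {corrected_variance:.4f}")
```

Here the correction term is 49/12, about 4.08, which is small relative to the grouped variance of roughly 106.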
Standard Deviation
• The standard deviation (s for a sample, σ for a population) is a
measure of how dispersed the data are in relation to the mean. A low
standard deviation means the data are clustered around the mean, and
a high standard deviation indicates the data are more spread out.
• It is the positive square root of the variance. Its unit is the
same as the unit of measurement of the observations.
Example:
Consider the pre-test scores of eight sampled
participants in a descriptive statistics training course: 10,
12, 14, 15, 17, 18, 18, 24

$$\bar{x} = \frac{10 + 12 + 14 + 15 + 17 + 18 + 18 + 24}{8} = \frac{128}{8} = 16$$

The sample standard deviation of the same eight pre-test scores
(10, 12, 14, 15, 17, 18, 18, 24) is

$$s = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = 4.3095$$

On average, the pre-test scores of the sampled participants
deviate from the mean of 16 by about 4.3095 points.
Example:
Site A: Heights (inches) of five trees

180” 180” 180” 180” 180”

Find the mean, range, and standard deviation.

Example:
Site B: Heights (inches) of five trees

130” 188” 170” 194” 120”

Find the mean, range, and standard deviation.
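
One way to work through the two sites is sketched below, assuming the five trees at each site are treated as a sample, so that the standard deviation uses n – 1. The helper `describe` is a hypothetical name introduced for illustration.

```python
import statistics


def describe(values):
    """Mean, range, and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }


site_a = [180, 180, 180, 180, 180]
site_b = [130, 188, 170, 194, 120]

summary_a = describe(site_a)
summary_b = describe(site_b)

for name, summary in (("Site A", summary_a), ("Site B", summary_b)):
    print(f"{name}: mean = {summary['mean']}, range = {summary['range']}, "
          f"sd = {summary['sd']:.3f}")
```

Site A has no spread at all, so both its range and standard deviation are zero, while Site B shows substantial variation in height.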

[Figure: three data sets, Data A, Data B, and Data C, each plotted on an axis running from 11 to 21. All three have mean = 15.5; the standard deviations are s = 3.338 (Data A), s = 0.9258 (Data B), and s = 4.57 (Data C).]


Remarks:

If there is a large amount of variation in the data set,
then on the average, the data values will be far from
the mean. Hence, the standard deviation will be large;
otherwise the standard deviation will be small.


Standard Deviation
▪ The standard deviation can never be a negative number.
▪ The smallest possible value for the standard deviation is 0,
and that happens only in situations where every single
number in the data set is exactly the same (no deviation).
▪ The standard deviation is affected by outliers (extremely low
or extremely high numbers in the data set). That’s because
the standard deviation is based on the distance from the
mean. The mean is also affected by outliers.
▪ The standard deviation has the same units of measure as the
original data.
Asynchronous Activity!
Class Limits   Class Midpoint   Frequency
53-59          56               3
60-66          63               14
67-73          70               15
74-80          77               26
81-87          84               10
88-94          91               8
95-101         98               4
Total                           80

Mean = 75.78
With the frequency distribution above, find the following:
a. Semi-interquartile Range
b. Interdecile Range
c. Variance (with and without Sheppard’s Correction)
d. Standard Deviation
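
For parts (a) and (b), quartiles and percentiles of grouped data are commonly estimated by linear interpolation inside the class that contains the required position. The sketch below is one possible approach, not necessarily the method intended for the activity. It assumes class boundaries lie half a unit below each lower class limit (52.5, 59.5, and so on) and that observations are spread evenly within a class. The function name `grouped_percentile` is hypothetical.

```python
lower_limits = [53, 60, 67, 74, 81, 88, 95]
frequencies = [3, 14, 15, 26, 10, 8, 4]
class_width = 7


def grouped_percentile(p, lower_limits, frequencies, class_width):
    """Estimate the p-th percentile by interpolating within its class."""
    total = sum(frequencies)
    target = p * total / 100  # position of the percentile in the ordered data
    cumulative = 0
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= target:
            boundary = lower - 0.5  # assumed lower class boundary
            return boundary + (target - cumulative) / f * class_width
        cumulative += f
    raise ValueError("p must be between 0 and 100")


q1 = grouped_percentile(25, lower_limits, frequencies, class_width)
q3 = grouped_percentile(75, lower_limits, frequencies, class_width)
p10 = grouped_percentile(10, lower_limits, frequencies, class_width)
p90 = grouped_percentile(90, lower_limits, frequencies, class_width)

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, semi-interquartile range = {(q3 - q1) / 2:.2f}")
print(f"P10 = {p10:.2f}, P90 = {p90:.2f}, interdecile range = {p90 - p10:.2f}")
```

Other interpolation conventions exist and can give slightly different values, so results should be compared against whichever definition the course uses.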
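Empirical Relations Between Measures of Dispersion
For moderately skewed distributions, we have the empirical
formulas

Mean deviation = 4/5 (standard deviation)

Semi-interquartile range = 2/3 (standard deviation)

These follow from the normal distribution, for which the mean
deviation and the semi-interquartile range are about 0.7979 and
0.6745 times the standard deviation, respectively.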
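
For an exactly normal distribution, the two ratios can be computed directly: the mean deviation is √(2/π) ≈ 0.80 times the standard deviation, and the semi-interquartile range is about 0.67 times the standard deviation. The sketch below checks both with the standard library's `statistics.NormalDist` (Python 3.8+). Using the standard normal distribution as the reference is an assumption made here for illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# E|Z| for a standard normal variable Z is sqrt(2 / pi).
mean_deviation_ratio = math.sqrt(2 / math.pi)

# Half the distance between the first and third quartiles of Z.
semi_iqr_ratio = (std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)) / 2

print(f"mean deviation / sd           = {mean_deviation_ratio:.4f}")
print(f"semi-interquartile range / sd = {semi_iqr_ratio:.4f}")
```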
Absolute and Relative Dispersion
• The actual variation, or dispersion, as determined from the
standard deviation or other measure of dispersion is called
the absolute dispersion.
• However, a variation (or dispersion) of 10 inches in
measuring a distance of 1000 feet is quite different in
effect from the same variation of 10 inches in a distance of
20 feet. A measure of this effect is supplied by the relative
dispersion, which is defined by

$$\text{Relative dispersion} = \frac{\text{absolute dispersion}}{\text{average}}$$
Coefficient of Variation
• If the absolute dispersion is the standard deviation s and the
average is the mean x̄, then the relative dispersion is called the
coefficient of variation, or coefficient of dispersion; it is denoted
by V, is generally expressed as a percentage, and is given by

$$V = \frac{s}{\bar{x}}$$

• Since the coefficient of variation is independent of the units
used, it is useful in comparing distributions when the units may
be different. A disadvantage of the coefficient of variation is
that it fails to be useful when the mean x̄ is close to zero.
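Population CV (a parameter):

$$\mathrm{CV} = \frac{\sigma}{\mu} \times 100\%$$

Sample CV (a statistic):

$$\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%$$

σ = population standard deviation
μ = population mean
s = sample standard deviation
x̄ = sample mean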
Example:
Suppose you have two options in buying a stock. Stock 1 is
currently priced at P2000 per share and Stock 2 is priced at
P550 per share. In buying stocks, risk is reduced by choosing
a stock with a stable price. However, one could take a chance
on a stock that shows greater variation in price, hoping the
price goes up rather than down. A sample of prices of Stock 1
and Stock 2 was collected at the close of trading for the
past months.

Stock   Mean Price   Std. Deviation
1       P1975        P578
2       P565         P85

$$\mathrm{CV}_{\text{stock 1}} = \frac{578}{1975} \times 100\% = 29.3\%$$

$$\mathrm{CV}_{\text{stock 2}} = \frac{85}{565} \times 100\% = 15.0\%$$

Relative to its mean, the price of Stock 1 is more variable than that of Stock 2.
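
The comparison of the two stocks can be reproduced with a small helper. This is a sketch using the summary figures from the table. The function name `coefficient_of_variation` is illustrative, and the guard against a zero mean reflects the caveat that the measure is not useful when the mean is near zero.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return sd / mean * 100


cv_stock1 = coefficient_of_variation(sd=578, mean=1975)
cv_stock2 = coefficient_of_variation(sd=85, mean=565)

print(f"Stock 1 CV = {cv_stock1:.1f}%")
print(f"Stock 2 CV = {cv_stock2:.1f}%")
print("More variable relative to its mean:",
      "Stock 1" if cv_stock1 > cv_stock2 else "Stock 2")
```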

Standard Score
• The standard score (z-score) helps determine the
relative position of an observed value in the
collection where the observed value came from.

• A positive z-score measures the number of standard
deviations an observation is above the mean, while a
negative z-score gives the number of standard
deviations an observation is below the mean.
Population z-score:

$$z = \frac{x - \mu}{\sigma}$$

Sample z-score:

$$z = \frac{x - \bar{x}}{s}$$

σ = population standard deviation
μ = population mean
s = sample standard deviation
x̄ = sample mean
Example:
The mean score of participants in Exercise 1 of the training
course is 70% with a standard deviation of 10%, while in
Exercise 2, the mean score is 80% with a standard
deviation of 10%.

If you got a score of 75% in Exercise 1 and a score of 85%
in Exercise 2, in which exercise did you perform better if we
consider the scores of the other participants in the two
exercises?


$$z_{\text{exer1}} = \frac{75 - 70}{10} = 0.5 \qquad z_{\text{exer2}} = \frac{85 - 80}{10} = 0.5$$

Considering the scores of the other participants in the two
exercises, your score in Exercise 1 is just as good as your
score in Exercise 2. Based on the z-scores, your scores in
both exercises are 0.5 standard deviations above
their respective mean scores.
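
The two standard scores can be verified in a couple of lines. This is a minimal sketch using the means and standard deviations stated in the example; `z_score` is an illustrative helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_exer1 = z_score(75, mean=70, sd=10)
z_exer2 = z_score(85, mean=80, sd=10)

print(f"Exercise 1: z = {z_exer1}")
print(f"Exercise 2: z = {z_exer2}")
```

Although the raw score in Exercise 2 is higher, the two z-scores are equal, so the relative standing is the same in both exercises.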

Remark on the Standard Score
• It can be used in identifying possible outliers in the
data set. By rule of thumb, if the absolute value of
the standard score is at least 3, then that observation is marked
as a possible outlier.


Measure of Dispersion
You want to determine the change in the physical
characteristics of janitor fish in Laguna de Bay. You
measured the lengths of seven fish samples that were
collected in each of 2016 and 2017. The data obtained are as
follows:
Year   Length (mm)
2016   82    82   83   83   83   84   91
2017   129   83   82   83   50   46   115

Compare the variability of the janitor fish in terms
of length between years.
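
One possible way to approach the comparison is to compute the mean, range, sample standard deviation, and coefficient of variation for each year. The sketch below assumes the seven fish per year are a sample (n – 1 divisor). The helper `summarize` is a hypothetical name.

```python
import statistics

lengths_mm = {
    2016: [82, 82, 83, 83, 83, 84, 91],
    2017: [129, 83, 82, 83, 50, 46, 115],
}


def summarize(values):
    """Mean, range, sample standard deviation, and CV (in percent)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {
        "mean": mean,
        "range": max(values) - min(values),
        "sd": sd,
        "cv": sd / mean * 100,
    }


summaries = {year: summarize(values) for year, values in lengths_mm.items()}

for year, s in summaries.items():
    print(f"{year}: mean = {s['mean']} mm, range = {s['range']} mm, "
          f"sd = {s['sd']:.2f} mm, CV = {s['cv']:.1f}%")
```

Both years have the same mean length, so the absolute and relative measures lead to the same conclusion here: the 2017 lengths are far more variable than the 2016 lengths.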
