Lec006 - Measures of Dispersion
Measures of Dispersion
(Figure: panel A shows data with variability; panel B shows data with no variability.)
Summary Definitions
▪ The measure of dispersion shows how the data is spread or scattered around the mean.
▪ Measures of variation give information on the spread, variability, or dispersion of the data values.
(Figure: two distributions with the same center but different variation.)
Measures of Dispersion
The Range
▪ Simplest measure of dispersion
▪ Difference between the largest and the smallest values:
Range = Xlargest - Xsmallest
Example (dot plot on an axis from 0 to 14; the smallest value is 1 and the largest is 13):
Range = 13 - 1 = 12
Measures of Dispersion:
Why The Range Can Be Misleading?
(Figure: two data sets plotted on the same axis from 7 to 12.)
Range = 12 - 7 = 5 for both data sets
▪ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
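The effect of a single outlier on the range can be checked directly. The following is a minimal Python sketch using the two data sets above; the helper name `data_range` is illustrative and not part of the lecture.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

# The two data sets from the slide differ only in the last value (5 vs 120).
without_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                   2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = without_outlier[:-1] + [120]

print(data_range(without_outlier))  # 4
print(data_range(with_outlier))     # 119
```

Changing one observation moves the range from 4 to 119, even though the other 24 values are identical.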
Notes about Range
Advantages
• Best for symmetric data with no outliers.
• Easy to compute and understand.
• Good option for ordinal data.
Disadvantages
• Doesn’t use all of the data, only the extremes.
• Very much affected if the extremes are outliers.
• Only shows maximum spread, does not show shape.
Quartile Measures
• Quartiles split the ranked data into 4 segments with an equal
number of values per segment:
25% 25% 25% 25%
Q1 Q2 Q3
◼ The first quartile, Q1, is the value for which 25% of the observations are
smaller and 75% are larger.
◼ Q2 is the same as the median (50% of the observations are smaller and 50%
are larger).
◼ Only 25% of the observations are greater than the third quartile Q3.
Quartile Measures: Locating Quartiles
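Find a quartile by determining the value in the appropriate position
in the ranked data, where:
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
Example: ranked data with n = 9 observations (positions 1 2 3 4 5 6 7 8 9), where the values in the 2nd and 3rd positions are 12 and 13.
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = (12+13)/2 = 12.5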
• If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.) then average
the two corresponding data values.
• The IQR is Q3 – Q1 and measures the spread in the middle 50% of the
data.
• Measures like Q1, Q3, and IQR that are not influenced by outliers are
called resistant measures.
Calculating The Interquartile Range (IQR)
Example
What is the IQR for the following data?
Ranked Data: 2 4 6 8 10 12 14 20 30 60 (n = 10)
Location of Q1 = (n+1)/4 = (10+1)/4 = 2.75, so round to location 3: Q1 = 6
Location of Q3 = 3(n+1)/4 = 3(10+1)/4 = 8.25, so round to location 8: Q3 = 20
IQR = Q3 - Q1 = 20 - 6 = 14
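The position rule used in this example can be sketched in Python. The function name `quartile` is hypothetical, and the "round to the nearest position" step for results such as 2.75 or 8.25 is inferred from the worked example rather than stated as a rule in the slides. The sketch assumes the data are already ranked and that there are at least two observations.

```python
def quartile(ranked, k):
    """Return the k-th quartile (k = 1, 2, 3) of already-ranked data,
    using the position rule k * (n + 1) / 4."""
    n = len(ranked)
    pos = k * (n + 1) / 4      # 1-based position in the ranked data
    whole = int(pos)           # integer part of the position
    frac = pos - whole
    if frac == 0:              # whole-number position: take that value
        return ranked[whole - 1]
    if frac == 0.5:            # fractional half: average the two neighbours
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Otherwise round to the nearest whole position (2.75 -> 3, 8.25 -> 8).
    return ranked[round(pos) - 1]

data = [2, 4, 6, 8, 10, 12, 14, 20, 30, 60]
q1, q3 = quartile(data, 1), quartile(data, 3)
print(q1, q3, q3 - q1)  # 6 20 14
```

Statistical software often interpolates between neighbouring values instead of rounding the position, so library results for the same data may differ slightly from this hand rule.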
Example: (figure: three boxplots, each marked with Q1, Q2, and Q3)
Boxplot Example
Sample Data: 0 1 2 2 3 3 4 5 5 9 27
Five-number summary: Xsmallest = 0, Q1 = 2, Q2 = 3, Q3 = 5, Xlargest = 27
(Figure: boxplot of the sample data on an axis from 0 to 30.)
The Variance
• Average (approximately) of squared deviations of values from the
mean.
• Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Where:
$\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
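As a check on the formula, the sample variance can be computed term by term. This is a small Python sketch; the data values are hypothetical (they do not come from the lecture) and the function name `sample_variance` is illustrative.

```python
import statistics

def sample_variance(values):
    """S^2 = sum((X_i - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 6, 8, 10]   # hypothetical values; mean = 6

# Squared deviations: 16 + 4 + 0 + 4 + 16 = 40, divided by n - 1 = 4.
print(sample_variance(data))  # 10.0

# The standard library's statistics.variance uses the same n - 1 divisor.
print(statistics.variance(data) == sample_variance(data))  # True
```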
For A Population: The Variance σ²
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Where:
μ = population mean
N = population size
$X_i$ = ith value of the variable X
The Standard Deviation S
• Most commonly used measure of variation.
• Shows variation about the mean.
• Is the square root of the variance.
• Has the same units as the original data.
• Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
The Standard Deviation σ
• Most commonly used measure of variation.
• Shows variation about the mean.
• Is the square root of the population variance.
• Has the same units as the original data.
• Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
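The only difference between the sample and population formulas is the divisor (n - 1 versus N). A short Python sketch on hypothetical data (not from the lecture) shows how that changes the result:

```python
import math
import statistics

data = [2, 4, 6, 8, 10]   # hypothetical values

mean = sum(data) / len(data)
squared_devs = sum((x - mean) ** 2 for x in data)   # 40.0

s = math.sqrt(squared_devs / (len(data) - 1))   # sample: divide by n - 1
sigma = math.sqrt(squared_devs / len(data))     # population: divide by N

print(round(s, 3), round(sigma, 3))  # 3.162 2.828

# statistics.stdev and statistics.pstdev implement the same two formulas.
print(math.isclose(s, statistics.stdev(data)))       # True
print(math.isclose(sigma, statistics.pstdev(data)))  # True
```

Treating the same five values as a sample gives a slightly larger standard deviation than treating them as a whole population.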
Approximating the Standard Deviation from a Frequency Distribution
• Assume that all values within each class interval are located at the
midpoint of the class.
$$s = \sqrt{\frac{\sum_{j} (x_j - \bar{x})^2 f_j}{n - 1}}$$
Where:
n = number of values or sample size
$\bar{x}$ = estimated mean
$x_j$ = midpoint of the jth class
$f_j$ = number of values in the jth class
= class width
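The midpoint approximation can be sketched in Python as follows. The frequency distribution is hypothetical, the function name `grouped_std` is illustrative, and the estimated mean is assumed to be the frequency-weighted average of the class midpoints (the slide only calls it the "estimated mean").

```python
import math

def grouped_std(midpoints, freqs):
    """Approximate sample standard deviation from class midpoints and frequencies."""
    n = sum(freqs)
    # Estimated mean: every value in a class is treated as sitting at its midpoint.
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    weighted_ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
    return math.sqrt(weighted_ss / (n - 1))

# Hypothetical classes 10-20, 20-30, 30-40 with frequencies 2, 6, 2.
midpoints = [15, 25, 35]
freqs = [2, 6, 2]
print(round(grouped_std(midpoints, freqs), 3))  # 6.667
```

Here the estimated mean is 25, the weighted sum of squared deviations is 400, and the result is the square root of 400/9.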
Summary of Measures
Data A (dot plot on an axis from 11 to 21)
Mean = 15.5
S = 3.338
Disadvantages of the variance:
• The variance is measured in the original units squared.
• Extreme values or outliers affect the variance considerably.
• Hard to calculate manually.
Standard Deviation
• Advantages:
  • Same units of measurement as the values.
  • Useful in theoretical work and statistical methods and inference (conclusion).
• Disadvantages:
  • Hard to calculate manually.
The Coefficient of Variation
• Measures relative variation.
• Always in percentage (%).
$$CV = \frac{S}{\bar{X}} \cdot 100\%$$
• This can be used to compare two distributions directly
to see which has more dispersion because it does not
depend on units of the distribution.
Comparing Coefficients of Variation
• Stock A:
  • Average price last year = $50
  • Standard deviation = $5
$$CV_A = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
• Stock B:
  • Average price last year = $100
  • Standard deviation = $5
$$CV_B = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
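The stock comparison above can be reproduced with a few lines of Python. The function name `coefficient_of_variation` is illustrative; the inputs are the figures from the example.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / mean) * 100, expressed as a percentage."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A: S = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: S = $5, mean = $100
print(cv_a, cv_b)  # 10.0 5.0
```

Because the dollar units cancel in S divided by the mean, the two percentages can be compared directly even though the stocks trade at different price levels.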
Sample statistics versus population parameters
Measure              Population Parameter   Sample Statistic
Mean                 μ                      X̄
Variance             σ²                     S²
Standard Deviation   σ                      S
Notes (some properties of X̄, S, and S²):