Descriptive Statistics - Note1
Descriptive
Measures
Definitions
Mode Variance
Standard Deviation
Coefficient of Variation
Measures of Central Tendency
Calculating the Mean, Median and Mode
Measures of Central Tendency
Purpose: To determine the “centre” of the data values.
The Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where n = sample size = number of observations
Example 1
The number of work days lost due to illness in a
business per week is given below
(for a 10 week period)
Sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{36 + 28 + 33 + \dots + 32}{10} = \frac{318}{10} = 31.8$$
Exercise 1
45.25 years
Properties of the Sample Mean
Uniqueness ‐‐ For a given set of data there is one
and only one mean.
Affected (distorted) by extreme values (outliers)
[Figure: two dot plots on a 0 to 10 scale.]
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
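As a quick check of the outlier effect, the sketch below computes the mean of the two small data sets from the slide (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10). The function name `sample_mean` is an illustrative choice, not something defined in these notes.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # the last value is replaced by an extreme value

print(sample_mean(without_outlier))  # 15 / 5 = 3.0
print(sample_mean(with_outlier))     # 20 / 5 = 4.0
```

Changing a single observation from 5 to 10 moves the mean from 3 to 4, because every value enters the sum.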
Properties of the Sample Mean
The median may be a better measure of the centre when
the distribution of the data is ‘skewed’.
An important property of the mean is that it
includes every value in your data set as part of
the calculation.
The Median
Observation 12 23 24 27 34 38 40 40 42
Rank 1 2 3 4 5 6 7 8 9
The median position is the (9 + 1)/2 = 5th rank (observation).
Therefore the median = 34
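The position rule above can be sketched in code. This is a minimal illustration using the nine observations from the slide; the name `sample_median` is hypothetical, and averaging the two middle values when n is even is the usual convention, assumed here because the slide only shows an odd-sized example.

```python
def sample_median(values):
    """Return the middle value of the ordered data.

    For an odd n this is the value at rank (n + 1) / 2.
    For an even n the two middle values are averaged (assumed convention).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    middle = n // 2  # 0-based index of the upper middle value
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


observations = [12, 23, 24, 27, 34, 38, 40, 40, 42]
print(sample_median(observations))  # rank 5 of 9, which is 34
```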
Exercise 1
Sambiri Silicon manufactures computer monitors.
The following data are numbers of computer
monitors produced at the company for a sample of
10 days. Find the median.
24 31 27 25 35 33 26 40 25 28
Properties of the Median
In an ordered array, the median is the “middle”
number (50% above, 50% below)
Uniqueness -- There is only one median for each
set of data.
Not affected by extreme values
[Figure: two dot plots on a 0 to 10 scale, one containing an extreme value. Median = 3 in both.]
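A small sketch contrasting the two measures, using Python's standard `statistics` module. The data sets are an assumption: the figure's values are not recoverable from the text, so the two sets from the sample-mean slide (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10) are reused here; both have a median of 3, matching the figure.

```python
import statistics

base = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 10]  # assumed data: the largest value becomes extreme

# How far each measure moves when the extreme value is introduced.
mean_shift = statistics.mean(with_extreme) - statistics.mean(base)
median_shift = statistics.median(with_extreme) - statistics.median(base)

print("shift in mean:", mean_shift)
print("shift in median:", median_shift)
```

The mean moves by 1 while the median does not move at all, since only the rank order of the values matters for the median.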
The Mode
[Figure: two dot plots, one with Mode = 9 and one with no mode.]
Properties of the Mode
There can be one mode
There can be several modes
With several modes, it is unclear which one best
describes the central tendency of the data.
This is particularly problematic with continuous data,
because it is likely that no single value is more
frequent than the others.
Properties of the Mode
For example, consider measuring the weight of 30
people (to the nearest 0.1 kg). How likely is it
that we will find two or more people with
exactly the same weight (e.g., 67.4 kg)? The
answer is: probably very unlikely. Many
people might be close, but with such a small
sample (30 people) and a large range of
possible weights, you are unlikely to find two
people with exactly the same weight to the
nearest 0.1 kg. This is why the mode is
very rarely used with continuous data.
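The three situations described for the mode (one mode, several modes, no repeated value) can be illustrated with `statistics.multimode` from the Python standard library (Python 3.8 or later). All three data sets below are hypothetical and chosen only for illustration.

```python
from statistics import multimode

hat_sizes = [56, 57, 57, 58, 58, 58, 59]      # hypothetical discrete data, one mode
two_modes = [1, 2, 2, 3, 3, 4]                # hypothetical data with two modes
weights_kg = [61.2, 67.4, 70.1, 72.8, 80.5]   # hypothetical continuous data, all distinct

print(multimode(hat_sizes))   # a single most frequent value
print(multimode(two_modes))   # both tied values are returned
print(multimode(weights_kg))  # every value ties with frequency 1
```

When every value occurs once, `multimode` returns the whole data set, which is another way of saying that the mode carries no useful information for such data.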
Question
When re‐ordering, the most common hat or
jeans size is what you would like to know, not
the average hat or jeans size.
The Shape: Skewness
Chap 3-31, Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc.
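Measures of Central Tendency: Summary
Central Tendency
Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: middle value in the ordered array
Mode: most frequently observed value
Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$, rate of change of a variable over time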
Measures of Dispersion
[Figure: dot plots of Dataset 1 and Dataset 2.]
Measures of Dispersion
Population 1: narrow range, smaller variation, smaller deviation, observations clustered
Population 2: wide range, larger variation, larger deviation, observations spread out
[Figure: distributions of Population 1 and Population 2. Same centre, different variation.]
Measures of Dispersion
The measures of central tendency, the mean, median
and mode, do not reveal the whole picture of the
distribution of the dataset.
Variation
Measures of variation give
information on the spread or
variability or dispersion of
the data values.
Measures of Dispersion:
The Range
Example:
[Figure: dot plot on a 0 to 14 scale.]
Range = 13 – 1 = 12
Measures of Dispersion:
Why The Range Can Be Misleading
Ignores the way in which data are distributed
[Figure: two dot plots on a 7 to 12 scale with different distributions.]
Range = 12 - 7 = 5 in both cases
Measures of Dispersion:
Why The Range Can Be Misleading
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
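The sensitivity of the range to a single outlier can be verified with the two data sets listed above. This is a minimal sketch; `sample_range` is an illustrative name. The two lists differ only in their last value (5 versus 120), so the second range works out to 120 - 1 = 119.

```python
def sample_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, one 4, then either 5 or 120.
typical = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(sample_range(typical))       # 5 - 1 = 4
print(sample_range(with_outlier))  # 120 - 1 = 119
```

Twenty-four of the twenty-five observations are identical in the two sets, yet the range changes from 4 to 119.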
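The Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$
Where
$\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith observation of the variable X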
The Sample Standard Deviation
Most commonly used measure of variation
Tells us how much observations in our sample
differ from the mean value within our sample.
Has the same units as the original data making
it easier to interpret.
$$s = \sqrt{s^2}$$
Example
For the sample data $x_i$: 2, 3, 5, 1, 4, 3, 2, 4, find:
1. Sample variance
2. Sample standard deviation
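One way to work this example is sketched below, computing the sample variance with the definitional form (sum of squared deviations over n - 1) and with the shortcut form (sum of squares minus n times the squared mean, over n - 1), then taking the square root. The function names are illustrative, not from the notes.

```python
import math


def sample_variance(values):
    """Definitional form: sum of squared deviations from the mean over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut form: (sum of x_i^2 - n * mean^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


data = [2, 3, 5, 1, 4, 3, 2, 4]  # n = 8, sum = 24, mean = 3

variance = sample_variance(data)  # squared deviations sum to 12, so 12 / 7
std_dev = math.sqrt(variance)

print(round(variance, 4))  # about 1.7143
print(round(std_dev, 4))   # about 1.3093
```

Both forms give 12/7 here: the squared deviations sum to 12, and the shortcut gives 84 - 8(3²) = 12 in the numerator.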
The variation or dispersion in a set of values refers to
how spread out the values are from each other.
[Figure: two dot plots, one showing smaller variation and one showing larger variation.]
The Coefficient of Variation
The variance and the standard deviation are useful
as measures of variation of the values of a single
variable for a single population (or sample).
Measures variation relative to the mean
Expressed as a percentage (%)
$$CV = \frac{s}{\bar{x}} \times 100\%$$
The Coefficient of Variation
The coefficient of variation compares the
variability of two different datasets even if they
have different units of measurement.
Example 1
Spot, the dog, weighs 65 pounds. Spot’s weight
fluctuates 5 pounds depending on Spot’s
exercise level.
Sea Biscuit, the horse, weighs 1200 pounds.
Sea Biscuit’s weight fluctuates 125 pounds
depending on the number of rides Sea
Biscuit goes on.
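A possible way to compare the two animals is sketched below. It assumes that each stated weight is the mean and that each stated fluctuation plays the role of the standard deviation; the slide does not say so explicitly. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return std_dev / mean * 100


# Assumption: weight = mean, fluctuation = standard deviation (both in pounds).
spot_cv = coefficient_of_variation(std_dev=5, mean=65)
sea_biscuit_cv = coefficient_of_variation(std_dev=125, mean=1200)

print(f"Spot: {spot_cv:.2f}%")               # 5 / 65, about 7.69%
print(f"Sea Biscuit: {sea_biscuit_cv:.2f}%")  # 125 / 1200, about 10.42%
```

Under these assumptions Sea Biscuit's weight varies more relative to its mean, even though the comparison in raw pounds (125 against 5) exaggerates the difference.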
Coefficient of Variation
Some financial investors use the
coefficient of variation as a measure of
risk.
What does the Coefficient of
Variation tell us about the risk of a
stock that the standard deviation
does not?
Relative to the amount invested in a
stock, the coefficient of variation reveals
the risk of a stock in terms of the size of
the standard deviation relative to the
size of the mean (in percentage).
Example 2
Relative to the amount of money invested in the
stock, which stock, A or B, is riskier?
                     Stock A   Stock B
Average price        $50       $100
Standard deviation   $5        $5
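Comparing Coefficients of Variation
$$CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{5}{50} \times 100\% = 10\%$$
$$CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{5}{100} \times 100\% = 5\%$$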
Comparing the coefficients of variation, the relative variation in
stock A (10%) is twice that in stock B (5%), so stock A is the riskier
stock relative to the amount invested.
Example 3
The yearly salaries of all employees who work
for a company have a mean of $62,350 and a
standard deviation of $6820.
The years of experience for the same
employees have a mean of 15 years and a
standard deviation of 2 years.
Is the relative variation in the salaries larger or
smaller than that in the years of experience for
these employees?
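The question can be answered by computing the coefficient of variation for each variable, since salaries (dollars) and experience (years) cannot be compared through their standard deviations directly. The sketch below uses only the figures stated in the example; the dictionary layout and variable names are illustrative choices.

```python
# (mean, standard deviation) for each variable, taken from the example.
variables = {
    "salary ($)": (62350, 6820),
    "experience (years)": (15, 2),
}

# CV = standard deviation / mean * 100, expressed as a percentage.
cv_percent = {
    name: std_dev / mean * 100
    for name, (mean, std_dev) in variables.items()
}

for name, cv in cv_percent.items():
    print(f"{name}: CV = {cv:.2f}%")

# The variable with the larger CV has the larger relative variation.
most_variable = max(cv_percent, key=cv_percent.get)
print("larger relative variation:", most_variable)
```

The salary CV comes to about 10.94% and the experience CV to about 13.33%, so the relative variation in salaries is smaller than that in years of experience.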
Interpretation
A low (%) value shows low variability
implying tight clustering of observations
about the mean.
A middle to high (%) value shows high
variability implying that observations are
widely spread.
Measures of Position for
ungrouped data
(Quartile Measures)
Quartile Measures
$Q_1$ position: $0.25(n + 1)$
$Q_2$ position: $0.5(n + 1)$
$Q_3$ position: $0.75(n + 1)$
Step 3: Determine the quartile values.
The Interquartile Range (IQR)
Remember that the range can be distorted by
outliers.
The IQR excludes these outliers and focuses on the
spread of the middle 50% of the data values.
The IQR is also called the 50% mid‐spread range.
$$IQR = Q_3 - Q_1$$
The Interquartile Range (IQR)
Weakness
The IQR, like the range, also provides no
information on the clustering of observations
within the dataset as it uses only two
observations in its computation.
Example 1
Find
1. Q1 and Q3
2. IQR
Locating First quartile, Q1
11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the 0.25(9 + 1) = 2.5th position of the ranked data,
so use the value halfway between the 2nd and 3rd values:
$$Q_1 = \frac{12 + 13}{2} = 12.5 \quad \text{or} \quad Q_1 = 12 + \frac{13 - 12}{2} = 12.5$$
Locating Third Quartile, Q3
11 12 13 16 16 17 18 21 22
(n = 9)
Q3 is in the 0.75(9 + 1) = 7.5th position of the ranked data,
so use the value halfway between the 7th and 8th values:
$$Q_3 = \frac{18 + 21}{2} = 19.5 \quad \text{or} \quad Q_3 = 18 + \frac{21 - 18}{2} = 19.5$$
The Interquartile Range (IQR)
$$IQR = Q_3 - Q_1 = 19.5 - 12.5 = 7.0$$
Example 2
Find
1. Q1 and Q3
2. IQR
Locating First quartile, Q1
7 8 9 10 11 12 13 13 14 17 17 45
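(n = 12)
Q1 is in the 0.25(12 + 1) = 3.25th position of the ranked data,
a quarter of the way from the 3rd value (9) to the 4th value (10):
$$Q_1 = \frac{9 + 9.5}{2} = 9.25 \quad \text{or} \quad Q_1 = 9 + \frac{10 - 9}{4} = 9.25$$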
Locating Third Quartile, Q3
7 8 9 10 11 12 13 13 14 17 17 45
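(n = 12)
Q3 is in the 0.75(12 + 1) = 9.75th position of the ranked data,
three quarters of the way from the 9th value (14) to the 10th value (17):
$$Q_3 = \frac{15.5 + 17}{2} = 16.25 \quad \text{or} \quad Q_3 = 17 - \frac{17 - 14}{4} = 16.25$$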
The Interquartile Range (IQR)
$$IQR = Q_3 - Q_1 = 16.25 - 9.25 = 7.0$$
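Both worked examples can be reproduced with a short sketch of the (n + 1) position rule, interpolating linearly between the two neighbouring ranked values when the position is fractional. The function names are illustrative, and clamping positions that fall outside the data to the smallest or largest value is an assumption the slides do not address.

```python
def quartile(values, fraction):
    """Locate a quartile at position fraction * (n + 1) in the ranked data.

    A fractional position is resolved by linear interpolation between
    the two neighbouring ranked values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty sample are undefined")
    position = fraction * (n + 1)   # 1-based rank, possibly fractional
    lower_rank = int(position)      # whole part of the rank
    weight = position - lower_rank  # fractional part of the rank
    if lower_rank < 1:              # assumed: clamp below the first value
        return ordered[0]
    if lower_rank >= n:             # assumed: clamp above the last value
        return ordered[-1]
    lower = ordered[lower_rank - 1]
    upper = ordered[lower_rank]
    return lower + weight * (upper - lower)


def interquartile_range(values):
    """Return Q3 - Q1 using the (n + 1) position rule."""
    return quartile(values, 0.75) - quartile(values, 0.25)


example_1 = [11, 12, 13, 16, 16, 17, 18, 21, 22]
example_2 = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]

for data in (example_1, example_2):
    q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
    print(q1, q3, interquartile_range(data))
```

Other quartile conventions exist (software packages often default to a different interpolation rule), so results from a library may differ slightly from the hand calculation unless the same (n + 1) rule is selected.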
End of Chapter
Grouped data
Mean
Variance
CV