
AST10113 Foundation Statistics

Lecture 2
Measures of Central Tendency and Variation
Kirk Chan & Charmaine Lau

Foundation Statistics – Measures of Central Tendency and Variation

Summary Measures

Describing Data Numerically

Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean
Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Shape: Asymmetry


Measures of Central Tendency

• Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
• Median: midpoint of ranked values
• Mode: most frequently observed value
• Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$


Arithmetic Mean
• The arithmetic mean (usually just called the mean) is the most
  common measure of central tendency
• Sample mean:
  $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
  where n is the sample size and X_i is the ith observation
• Population mean:
  $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
  where N is the population size and X_i is the ith observation


Arithmetic Mean (cont’d)
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Sensitive to extreme values (outliers)

Data 1, 2, 3, 4, 5 (plotted on an axis from 0 to 10):
Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3

Data 1, 2, 3, 4, 20 (plotted on an axis from 0 to 20):
Mean = (1 + 2 + 3 + 4 + 20) / 5 = 30 / 5 = 6
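The effect of a single outlier on the mean can be checked with a short Python sketch. The function name `arithmetic_mean` is only illustrative and is not part of the lecture.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("at least one value is required")
    return sum(values) / len(values)


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 20]   # the last value is replaced by an outlier

mean_a = arithmetic_mean(without_outlier)   # 15 / 5
mean_b = arithmetic_mean(with_outlier)      # 30 / 5
print(mean_a, mean_b)
```

Changing one value from 5 to 20 doubles the mean, which is what "sensitive to extreme values" means in practice.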


Median
• In an ordered array, the median is the “middle” number
  (50% above, 50% below)

[Figure: two dot plots, on axes from 0 to 10 and from 0 to 20; Median = 3 in both]

• Robust to extreme values


Finding the Median
• The location of the median:
  Median position = $\frac{n+1}{2}$ position in the ordered data
  – If the number of values is odd, the median is the middle number
  – If the number of values is even, the median is the average of the
    two middle numbers
• Note that $\frac{n+1}{2}$ is not the value of the median, only the
  position of the median in the ranked data
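The position rule can be written out as a small Python sketch. The helper name `median_by_position` is an assumption made for illustration; it distinguishes the position (n + 1)/2 from the value found at that position.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one value is required")
    position = (n + 1) / 2            # 1-based position, not the median itself
    lower = int(position)             # whole part of the position
    if position == lower:             # odd n: a single middle value
        return ordered[lower - 1]
    # even n: average of the two middle values
    return (ordered[lower - 1] + ordered[lower]) / 2


print(median_by_position([1, 2, 3, 4, 5]))
print(median_by_position([1, 2, 3, 4, 20]))
```

Both calls give the same median, although the second data set contains an outlier.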


Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical (nominal) data
  – There may be no mode
  – There may be several modes

[Figure: two dot plots, on axes from 0 to 14 and from 0 to 6; the first has Mode = 9]


Geometric Mean
• Formula for the geometric mean:
  $GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$
  – $X_1, \ldots, X_n$ is a set of positive numbers (e.g. rates of change in
    percentages, written as factors such as 1.05 for +5%, ratios, etc.)
  – n is the total number of values

• Widely used in business and economics to find the average of
  – Percentages
  – Ratios
  – Growth rates


Example for Geometric Mean
• Example:
  Suppose that Anne receives a 5% increase in salary this
  year and a 15% increase next year. Calculate the average
  annual percentage increase.

• Answer: $GM = \sqrt[2]{(1.05)(1.15)} = 1.098863$
  (i.e. average annual % increase = 1.098863 - 1 = 0.098863 = 9.8863%)
  This is the geometric mean rate of return (GMRR).

• Assume the original salary for Anne is USD 3000:
  – Salary after the two increases = 3000 * 1.05 * 1.15 = 3622.50
  – Using GM: 3000 * 1.098863 * 1.098863 = 3622.50
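The salary example can be reproduced with a minimal Python sketch. The function name `geometric_mean` and the variable names are illustrative assumptions; growth rates are passed as factors (1 + rate), as in the worked answer.

```python
import math


def geometric_mean(factors):
    """n-th root of the product of n positive factors."""
    if not factors or any(x <= 0 for x in factors):
        raise ValueError("factors must be a non-empty list of positive numbers")
    return math.prod(factors) ** (1 / len(factors))


gm = geometric_mean([1.05, 1.15])      # +5% then +15%
average_increase = gm - 1              # geometric mean rate of return

salary_direct = 3000 * 1.05 * 1.15     # apply each year's increase
salary_via_gm = 3000 * gm * gm         # apply the average factor twice
print(round(gm, 6), round(salary_direct, 2), round(salary_via_gm, 2))
```

Applying the geometric mean factor once per year reproduces the final salary, which the arithmetic mean of the two rates (10%) would not do exactly.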
Exercise: Real estate project
• Suppose a real estate project yields 6%, 12% and 10%
  increases in the first, second and third year respectively.
  Find the geometric mean rate of return (GMRR).

• Answer:


Quartiles
• Quartiles split the ranked data into 4 segments with an
  equal number of values per segment

  25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

◼ The first quartile, Q1, is the value for which 25% of the
  observations are smaller and 75% are larger
◼ The second quartile, Q2, is the same as the median (50%
  are smaller, 50% are larger)
◼ Only 25% of the observations are greater than the third
  quartile, Q3


Quartile Formulas
• Find a quartile by determining the value in the appropriate
  position in the ranked data, where
  – Q1, first quartile position: (n+1)/4
  – Q2, second quartile position: (n+1)/2
    (the median position)
  – Q3, third quartile position: 3(n+1)/4

where n is the number of observed values


• Example: Find the first quartile
  Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

• As n = 9,
  Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so
  we use the value halfway between the 2nd and the 3rd
  values (12 and 13), which yields Q1 = 12.5
• Q1 and Q3 are measures of non-central location, while Q2,
  i.e. the median, is a measure of central tendency


Quartiles (cont’d)
• Example:
  Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

  – Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
    so Q1 = 12.5
  – Q2 is in the (9+1)/2 = 5th position of the ranked data,
    so Q2 = median = 16
  – Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,
    so Q3 = 19.5


Exercise for Quartiles
• Data: 11 12 13 16 16 17 18 21 22 25 (n = 10)

• Find the quartiles:
  – Q1 is in the (10+1)/4 = 2.75 position, rounded to the 3rd
    ranked value, so Q1 = 13
  – Q2 is in the (10+1)/2 = 5.5 position, halfway between the 5th
    and 6th ranked values, so Q2 = median = 16.5
  – Q3 is in the 3(10+1)/4 = 8.25 position, rounded to the 8th
    ranked value, so Q3 = 21
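The positional rule used in the quartile examples can be sketched in Python. The convention below is inferred from the worked answers rather than stated as a rule on the slides: take the value at a whole-number position, average the two neighbours at a .5 position, and otherwise round to the nearest position. The function name `quartile` is illustrative.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the k(n + 1)/4 position rule."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two values are required")
    position = k * (n + 1) / 4        # 1-based position in the ranked data
    lower = int(position)             # whole part of the position
    fraction = position - lower       # 0, 0.25, 0.5 or 0.75
    if fraction == 0.5:               # halfway: average the two neighbours
        return (ordered[lower - 1] + ordered[lower]) / 2
    # otherwise round to the nearest whole position
    nearest = lower if fraction < 0.5 else lower + 1
    return ordered[nearest - 1]


nine = [11, 12, 13, 16, 16, 17, 18, 21, 22]
ten = nine + [25]
print([quartile(nine, k) for k in (1, 2, 3)])
print([quartile(ten, k) for k in (1, 2, 3)])
```

Other textbooks and software packages interpolate between neighbouring values instead of rounding, so their quartiles can differ slightly from these.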


Measures of Variation (Dispersion)

Variation: Range, Interquartile Range, *Variance, *Standard Deviation, *Coefficient of Variation

◼ Measures of variation give information on the spread or
  variability of the data values

[Figure: distributions with the same center but different variation (small variation vs. large variation)]
Range
• Simplest measure of variation
• Difference between the largest and the smallest values in
  a set of data:
  Range = X_largest – X_smallest

• Example:
  [Figure: dot plot on an axis from 0 to 14]
  Range: 14 – 1 = 13


Disadvantages of Range
• Ignores the way in which data are distributed

  [Figure: two differently distributed data sets on an axis from 7 to 12]
  Range = 12 - 7 = 5 in both cases

• Sensitive to outliers

  Range = 5 - 1 = 4
  Range = 120 - 1 = 119 (misleading statistic)


Interquartile Range
• Can eliminate some outlier problems by using the
  interquartile range
• Eliminate some high- and low-valued observations and
  calculate the range from the remaining values
• Interquartile range = 3rd quartile – 1st quartile
                      = Q3 – Q1


Interquartile Range
• Example

  X minimum   Q1    Median (Q2)   Q3    X maximum
  12          30    45            57    70
  (25% of the values lie in each of the four segments)

  Interquartile range = 57 – 30 = 27


Boxplot: Exploratory Data Analysis
• Boxplot: a graphical display of data using the 5-number summary:
  Minimum -- Q1 -- Median -- Q3 -- Maximum

[Figure: boxplot labeled Minimum, 1st Quartile, Median, 3rd Quartile, Maximum]


Boxplot (cont’d)
• A boxplot can be drawn either horizontally or vertically
• Outliers can be detected and shown


Shape of Distribution
• Describes how data are distributed
• Measures of shape
  – Symmetric or skewed

  Left-Skewed:   Mean < Median
  Symmetric:     Mean = Median
  Right-Skewed:  Median < Mean

• Skewness: whether the data are concentrated on one side.


Distribution Shape and Boxplot
[Figure: three boxplots marking Q1, Q2 and Q3 for left-skewed, symmetric and right-skewed distributions]

– Negative skew: The left tail is longer; the mass of the distribution is
concentrated on the right of the figure. It has relatively few low
values. The distribution is said to be left-skewed.
– Positive skew: The right tail is longer; the mass of the distribution is
concentrated on the left of the figure. It has relatively few high
values. The distribution is said to be right-skewed.


Example for Boxplot
• Below is a boxplot for the following data:

  0 2 2 2 3 3 4 5 5 10 27

  Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27
• The data are right-skewed, as the plot depicts
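The five-number summary behind this boxplot can be checked with Python's standard library. This is a sketch under one assumption: `statistics.quantiles` with `method="exclusive"` uses (n + 1)-based positions with linear interpolation, which matches the lecture's positions here only because all three fall on whole numbers (3, 6 and 9 for n = 11).

```python
from statistics import mean, median, quantiles

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]

# Quartiles at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

five_number = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1

# For right-skewed data the mean is pulled above the median
right_skewed = mean(data) > median(data)
print(five_number, iqr, right_skewed)
```

The long distance from Q3 to the maximum, compared with the short distance from the minimum to Q1, is what shows up as the long right whisker in the plot.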


In-class Exercise: Boxplot
• Data set:
  2, 3, 4, 4, 6, 7, 7, 9, 10, 11, 13, and 20
  Draw a boxplot and describe the shape of the above data



Variance
• Measures the data dispersion
• Variance measures the dispersion of a set of data points
  around their mean


• Average of squared differences of values from the mean
• Sample variance:
  $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
  where n is the sample size, X_i is the ith observation
  and $\bar{X}$ is the sample mean
• Population variance:
  $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
  where N is the population size, X_i is the ith observation
  and μ is the population mean


• Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
  • Dispersion is non-negative
  • Non-negative values don’t cancel out
  • Amplifies the effect of large differences

[Figure labels: "higher result", "lower result"]


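Example - Variance
• Population of 5 observations:
  1, 2, 3, 4, 5
• Task: Calculate the population variance
  $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

N = 5
Mean $\mu = \frac{1 + 2 + 3 + 4 + 5}{5} = 3.00$

$\sigma^2 = \frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5} = \frac{10}{5} = 2.00$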
Example - Variance (cont’d)
• What if they are a sample (1, 2, 3, 4, 5), n = 5?
• Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{10}{4} = 2.50$

• Why is the sample variance different from the population variance?
• Because the sample has uncertainty.
• Imaginary population: 1, 1, 1, 2, 3, 4, 5, 5, 5, 5
• $\sigma^2 = 2.96$
• Our sample variance has rightfully been corrected upwards in order to
  reflect the higher potential variability.
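The three variances in this example can be reproduced with a minimal Python sketch. The function names are illustrative; the only difference between them is the divisor, N for a population and n - 1 for a sample.

```python
def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [1, 2, 3, 4, 5]
imaginary_population = [1, 1, 1, 2, 3, 4, 5, 5, 5, 5]

var_as_population = population_variance(data)
var_as_sample = sample_variance(data)
var_imaginary = population_variance(imaginary_population)
print(var_as_population, var_as_sample, round(var_imaginary, 2))
```

Treating the same five numbers as a sample gives a larger value than treating them as the whole population, which moves the estimate towards the variance of the wider imaginary population.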


Standard Deviation (SD)
• Square root of the average of squared differences of values from the mean
• Most commonly used measure of variation
• Shows variation about the mean
• It’s the square root of the variance
• Has the same units as the original data
• Sample standard deviation:
  $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
  where n is the sample size, X_i is the ith observation
  and $\bar{X}$ is the sample mean
• Population standard deviation:
  $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
  where N is the population size, X_i is the ith observation
  and μ is the population mean


Example for sample S.D.
Data (Xi): 10 12 14 15 17 18 18 24

n = 8, sample mean $\bar{X} = 16$

$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} = 4.3095$

A measure of the “average” scatter around the mean
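The intermediate numbers in this calculation can be verified with Python's `statistics` module, where `stdev` is the sample standard deviation (divisor n - 1). The variable names are illustrative.

```python
from statistics import mean, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = mean(data)                                      # sample mean
sum_of_squares = sum((x - x_bar) ** 2 for x in data)    # numerator under the root
s = stdev(data)                                         # sqrt(sum_of_squares / (n - 1))
print(x_bar, sum_of_squares, round(s, 4))
```

`statistics.pstdev` would divide by N instead and is the counterpart for a full population.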


Example for Variance vs S.D.
• Pizza prices at 10 different places in New York and HK:

  New York (USD)   Hong Kong (HKD)
  $ 1.00           HK$ 75.00
  $ 2.00           HK$ 80.00
  $ 3.00           HK$ 90.00
  $ 3.00           HK$ 90.00
  $ 5.00           HK$ 95.00
  $ 6.00           HK$ 100.00
  $ 7.00           HK$ 100.00
  $ 8.00           HK$ 110.00
  $ 9.00           HK$ 130.00
  $ 11.00          HK$ 150.00

                              USD         HKD
  Mean                        $ 5.50      HK$ 102.00
  Sample variance             $² 10.72    HK$² 523.33
  Sample standard deviation   $ 3.27      HK$ 22.88

Image Credit: CC BY-NC-ND


Coefficient of Variation
• Relative measure of dispersion, used for comparing two or more
  data sets
• Expressed as a percentage rather than in the units of the data
• Useful for comparing data sets which are expressed in
  different units of measurement
• Also useful for data sets with the same unit of measurement
  whose means and/or SDs differ greatly

  $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$


Example: Compare prices of pizza
• Continue from previous example:

                        New York     Hong Kong
  Mean                  US$ 5.5      HK$ 102
  Standard Deviation    US$ 3.27     HK$ 22.88

  $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$

CV for New York: 3.27/5.5*100% = 60%

CV for Hong Kong: 22.88/102*100% = 22%
Interpret this result:
Although HK has the larger standard deviation, it has the lower
coefficient of variation
-> the prices of pizza in HK are relatively less volatile.
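The comparison can be recomputed from the raw pizza prices with a short Python sketch. The helper name `coefficient_of_variation` is an assumption for illustration; it uses the sample standard deviation, as in the table.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


new_york_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]
hong_kong_hkd = [75, 80, 90, 90, 95, 100, 100, 110, 130, 150]

cv_ny = coefficient_of_variation(new_york_usd)
cv_hk = coefficient_of_variation(hong_kong_hkd)
print(round(cv_ny), round(cv_hk))
```

Because the currency units cancel in S divided by the mean, the two percentages can be compared directly even though the prices are quoted in USD and HKD.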
Observations and Implications
• Observations of data dispersion:
  – The more spread out, or dispersed, the data are, the larger will be
    the range, the inter-quartile range, the variance, and the standard
    deviation
  – The more concentrated, or homogeneous, the data are, the
    smaller will be the range, the inter-quartile range, the variance,
    and the standard deviation
  – If the observations are all the same, the range, the inter-quartile
    range, the variance, and the standard deviation will all be zero
  – None of the measures of variation can ever be negative

• Implications:
  – Helps to know how a set of data clusters around its mean
  – In any data set, at least a certain proportion of the observed values
    lies within a given number of standard deviations above or below
    the mean (Chebyshev's Rule)


The Empirical Rule
• If the data distribution is approximately bell-shaped, then
  the interval:
  – μ ± 1σ contains about 68% of the values in the
    population or the sample

Consider the lifetime of a certain brand of battery:
µ = 100 hr
σ = 2 hr

• Therefore, about 68% of batteries have a lifetime between
  98 and 102 hours (μ ± 1σ)


The Empirical Rule (cont’d)
• μ ± 2σ contains about 95% of the values in the
  population or the sample
• μ ± 3σ contains about 99.7% of the values in the
  population or the sample


• Did you know that the average IQ score is 100?
  Consider IQ scores with
  µ = 100
  σ = 15

• Therefore, about 68% of people have an IQ score
  between 85 and 115 (μ ± 1σ).
• About 95% of people have an IQ score between 70 and 130 (μ ± 2σ).
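The intervals in the battery and IQ examples follow directly from μ ± kσ. A minimal Python sketch, with illustrative names, assuming an approximately bell-shaped distribution so that the stated percentages apply:

```python
# Approximate share of values within k standard deviations of the mean
EMPIRICAL_COVERAGE = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_interval(mu, sigma, k):
    """Interval mu ± k*sigma for k = 1, 2 or 3."""
    if k not in EMPIRICAL_COVERAGE:
        raise ValueError("the empirical rule is stated for k = 1, 2, 3")
    return (mu - k * sigma, mu + k * sigma)


for k, share in EMPIRICAL_COVERAGE.items():
    low, high = empirical_interval(100, 15, k)   # IQ scores
    print(f"about {share:.1%} of IQ scores lie between {low} and {high}")
```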


Chebyshev’s Rule
• Regardless of how the data are distributed, at least
  (1 - 1/k²) x 100% of the values will fall within k standard
  deviations of the mean (for k > 1)
• Example:
  At least                           within
  (1 - 1/1²) x 100% = 0% ……..... k=1 (μ ± 1σ)
  (1 - 1/2²) x 100% = 75% …........ k=2 (μ ± 2σ)
  (1 - 1/3²) x 100% = 89% …........ k=3 (μ ± 3σ)


Example for Chebyshev’s Rule
• Example: Consider the lifetime of a certain brand of battery
  with µ = 100 hr and σ = 2 hr
• Using Chebyshev’s theorem, between what values would
  you expect at least 80% of battery lifetimes to lie?
• Answer:
  $1 - \frac{1}{k^2} = 0.8$, so $k = 2.2361$
  between μ ± kσ
  between 100 ± 2.2361(2)
  i.e. between 95.5278 hr and 104.4722 hr
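Both directions of Chebyshev's rule, from k to a guaranteed share and from a required share back to k, can be sketched in Python. The function names are illustrative. The code keeps k at full precision, so the interval ends differ from the slide's figures in the fourth decimal place, because the slide rounds k to 2.2361 before multiplying.

```python
import math


def chebyshev_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k ** 2


def k_for_share(share):
    """Solve 1 - 1/k^2 = share for k."""
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    return 1 / math.sqrt(1 - share)


mu, sigma = 100, 2                 # battery lifetime in hours
k = k_for_share(0.8)
low, high = mu - k * sigma, mu + k * sigma
print(round(k, 4), round(low, 4), round(high, 4))
```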
