
Where are we?

Measure of central tendency


Competency to be gained
from this lecture
Calculate a measure of central tendency
that is suited to the sample studied
Key issues

• Measures of central tendency


 Mode
 Median
 Mean
 Geometric mean
• Appropriate applications
Summary statistics

• A single value that summarizes the observed values
of a variable
 Part of the data reduction process
• Two types:
 Measures of location/central tendency/average
 Measures of dispersion/variability/spread
• Describe the shape of the distribution of a set of
observations
• Necessary for precise and efficient comparisons of
different sets of data
 The location (average) and shape (variability) of different
distributions may be different
Different variability, same location
[Figure: number of people by Factor X for Population A and Population B; same location, different variability]
Different location, same variability
[Figure: number of people by Factor Y for Population A and Population B; same variability, different locations]
Quick definitions of
measures of central tendency
• Mode
 The most frequently occurring observation
• Median
 The mid-point of a set of ordered observations
• Arithmetic mean
 The sum of the observations divided by the
number of observations
The mode

• Definition
 The mode of a distribution is the value that is
observed most frequently in a given set of data
• How to obtain it?
 Arrange the data in sequence from low to high
 Count the number of times each value occurs
 The most frequently occurring value is the mode

Mode
The mode
[Figure: frequency distribution (vertical axis 0 to 20) with the mode labelled]
Examples of mode (1/2):
Annual salary (in 100,000 rupees)
• 4, 3, 3, 2, 3, 8, 4, 3, 7, 2
• Arranging the values in order:
 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
 The mode is 3 (it occurs four times)

Examples of mode (2/2):
Incubation period of persons affected by
hepatitis (in days)
• 29, 31, 24, 29, 30, 25
• Arranging the values in order:
 24, 25, 29, 29, 30, 31
 Mode is 29
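The counting procedure used in the two examples above can be sketched in Python as follows. This is a minimal illustration, not part of the original lecture; the helper name `mode_of` is an assumption, and it reports a single mode only (ties resolve to the value seen first).

```python
from collections import Counter


def mode_of(values):
    """Return (most frequent value, number of occurrences).

    Hypothetical helper: if several values share the top count,
    the one that appears first in the data is returned.
    """
    counts = Counter(values)              # tally how often each value occurs
    value, count = counts.most_common(1)[0]
    return value, count


# Data taken from the two slides above
salaries = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]       # annual salary, in 100,000 rupees
incubation_days = [29, 31, 24, 29, 30, 25]      # incubation period, in days

print(mode_of(salaries))          # (3, 4): the value 3 occurs four times
print(mode_of(incubation_days))   # (29, 2): the value 29 occurs twice
```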

The mode is the only location statistic
that can be used when the characteristic
itself cannot be measured (categorical data)
Colour preference of people for their cars

Colour preference    Number of people
Green                354
Blue                 852 (mode)
Gray                 310
Red                  474
Specific features of the mode

• There may be no mode


 When each value is unique
• There may be more than one mode
 When more than 1 peak occurs
 Bimodal distribution
• The mode can be misinterpreted
 Is the distribution skewed or bimodal?
• The mode is not amenable to statistical tests
• The mode is not based upon all observations
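A short Python sketch may help illustrate these features. It uses the car colour counts from the earlier slide plus two small data sets that are hypothetical (invented here only for illustration). `statistics.multimode` is part of the Python standard library (3.8 and later) and returns every value that shares the highest frequency.

```python
from statistics import multimode

# Categorical data from the car colour slide: the mode is the category
# with the highest count; no arithmetic on the categories is needed.
colour_counts = {"Green": 354, "Blue": 852, "Gray": 310, "Red": 474}
modal_colour = max(colour_counts, key=colour_counts.get)
print(modal_colour)                 # Blue

# Hypothetical data: every value is unique, so no value stands out.
unique_values = [24, 25, 29, 30, 31]
print(multimode(unique_values))     # all five values are returned

# Hypothetical data: two values are equally frequent (bimodal).
two_peaks = [2, 2, 2, 3, 7, 8, 8, 8]
print(multimode(two_peaks))         # [2, 8]
```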
The median

• The median describes literally the middle


value of the data
• It is defined as the value above and below
which half (50%) of the observations fall

Median
Computing the median

• Arrange the observations in order from
smallest to largest (ascending order) or vice versa
• Count the number of observations “n”
 If “n” is an odd number
• Median = value of the ((n + 1) / 2)th observation
 If “n” is an even number
• Median = the average of the (n / 2)th and ((n / 2) + 1)th
observations

Example of median calculation

• What is the median of the following values:


 10, 20, 12, 3, 18, 16, 14, 25, 2
 Arrange the numbers in increasing order
• 2, 3, 10, 12, 14, 16, 18, 20, 25
• Median = 14
• Suppose there is one more observation (8)
 2, 3, 8, 10, 12, 14, 16, 18, 20, 25
 Median = Mean of 12 & 14 = 13
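The odd/even rule and the worked example above can be checked with a small Python sketch. The function name `median_by_rank` is an assumption made for illustration; in practice the standard library function `statistics.median` gives the same result.

```python
def median_by_rank(values):
    """Median following the odd/even rule from the slide (hypothetical helper)."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th observation, index mid when counting from 0
        return ordered[mid]
    # even n: average of the (n / 2)th and ((n / 2) + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


observations = [10, 20, 12, 3, 18, 16, 14, 25, 2]   # data from the slide

print(median_by_rank(observations))         # 14   (n = 9, odd)
print(median_by_rank(observations + [8]))   # 13.0 (n = 10, even)
```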

Advantages and disadvantages
of the median
• Advantages
 The median is unaffected by extreme values
• Disadvantages
 The median does not contain information on the
other values of the distribution
• Only selected by its rank
• You can change 50% of the values without affecting the
median
 The median is less amenable to statistical tests
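To illustrate that the median depends only on rank, the sketch below takes the nine ordered values from the earlier median example and replaces every value except the middle one. The replacement values are hypothetical and chosen only to keep the middle observation in the same rank.

```python
from statistics import median

original = [2, 3, 10, 12, 14, 16, 18, 20, 25]

# Hypothetical alteration: the four values below the median and the four
# values above it are all changed, but 14 is still the 5th of 9 values.
altered = [0, 0, 0, 0, 14, 100, 200, 300, 400]

print(median(original))   # 14
print(median(altered))    # 14
```

The median is identical although eight of the nine observations differ, which is both its strength (robustness to extreme values) and its weakness (it ignores the magnitude of the other values).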

The median is not sensitive to
extreme values
[Figure: two frequency charts by class of the variable (vertical axis 0 to 14), annotated "Same median"]
Mean (Arithmetic mean / Average)

• Most commonly used measure of location


• Definition
 Calculated by adding all observed values and
dividing by the total number of observations
• Notations
 Each observation is denoted as $x_1, x_2, \ldots, x_n$
 The total number of observations: $n$
 Summation process = Sigma: $\Sigma$
 The mean: $\bar{x}$
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Mean
Computation of the mean

• Duration of stay in days in a hospital
 8, 25, 7, 5, 8, 3, 10, 12, 9
 9 observations (n = 9)
 Sum of all observations = 87
 Mean duration of stay = 87 / 9 = 9.67 days
• Incubation period in days of a disease
 8, 45, 7, 5, 8, 3, 10, 12, 9
 9 observations (n = 9)
 Sum of all observations = 107
 Mean incubation period = 107 / 9 = 11.89 days
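The two computations above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is an assumption, and `statistics.mean` from the standard library would serve equally well.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by their number (hypothetical helper)."""
    return sum(values) / len(values)


# Data from the slide
stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]
incubation_days = [8, 45, 7, 5, 8, 3, 10, 12, 9]

print(sum(stay_days), round(arithmetic_mean(stay_days), 2))              # 87 9.67
print(sum(incubation_days), round(arithmetic_mean(incubation_days), 2))  # 107 11.89
```

The two series differ in a single observation (25 versus 45), yet the mean moves by more than two days, which shows how one large value pulls the mean.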
Advantages and disadvantages
of the mean
• Advantages
 Has convenient mathematical properties (for example,
the deviations from the mean sum to zero)
 Used as the basis of many statistical tests
 Good summary statistic for a symmetrical
distribution
• Disadvantages
 Less useful for an asymmetric distribution
• Can be distorted by outliers, therefore giving a less
“typical” value

Mean of several groups combined

Group (i)    Size (n_i)    Mean (x̄_i)    Sum (n_i × x̄_i)
1            10            41            410
2            15            36            540
3            25            42            1050
Total        50            --            2000

Mean of all groups = 2000 / 50 = 40
Crude (unweighted) average of the three group means = (41 + 36 + 42) / 3 = 39.7

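The combined mean in the table above is a size-weighted average of the group means. A minimal Python sketch, using the table's figures and variable names chosen here for illustration:

```python
# (group size, group mean) for groups 1, 2 and 3, taken from the table
groups = [(10, 41), (15, 36), (25, 42)]

total_n = sum(n for n, _ in groups)                  # 50
total_sum = sum(n * mean for n, mean in groups)      # 410 + 540 + 1050 = 2000

combined_mean = total_sum / total_n                  # weighted by group size
crude_average = sum(mean for _, mean in groups) / len(groups)  # ignores group size

print(combined_mean)               # 40.0
print(round(crude_average, 1))     # 39.7
```

The crude average treats the three groups as equally large, so it differs from the true overall mean whenever group sizes differ.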


The geometric mean

• Background
 Some distributions appear symmetric after log
transformation
(e.g., Neutrophil counts)
 A log transformation may help to describe the
central tendency
• Definition
 The geometric mean is the antilog of the mean of
the log values

Geometric mean
Calculating a geometric mean

• Consider the set of observations
 5, 10, 20, 25, 40
• Take the logarithm (base 10) of these values
 0.70, 1.00, 1.30, 1.40 & 1.60
• Calculate the mean of the log values
 0.70 + 1.00 + 1.30 + 1.40 + 1.60 = 6.00
 6.00 / 5 = 1.20
• Take the antilog of the mean of the log values
 Antilog (1.20) = 15.85
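The same steps can be followed in Python. This is a sketch assuming base-10 logarithms, as in the worked example; the function name `geometric_mean_log10` is hypothetical, and the standard library also offers `statistics.geometric_mean` (Python 3.8 and later).

```python
import math


def geometric_mean_log10(values):
    """Antilog of the mean of the base-10 logs (all values must be > 0)."""
    logs = [math.log10(v) for v in values]    # step 1: log-transform
    mean_log = sum(logs) / len(logs)          # step 2: mean of the logs
    return 10 ** mean_log                     # step 3: antilog


observations = [5, 10, 20, 25, 40]            # data from the slide

print([round(math.log10(v), 2) for v in observations])  # [0.7, 1.0, 1.3, 1.4, 1.6]
print(round(geometric_mean_log10(observations), 2))     # 15.85
```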

Geometric mean of several groups combined

Group (i)    Number of patients (n_i)    Geometric mean (GM)    log GM    n_i × log GM
A            20                          8.5                    0.93      18.60
B            18                          10.2                   1.01      18.18
C            12                          9.4                    0.97      11.64
Total        50                          --                     --        48.42

Overall GM = antilog of (48.42 / 50) = antilog (0.9684) = 9.3
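The pooled geometric mean can be sketched in Python as a size-weighted mean on the log scale. The sketch uses the group figures from the table; it keeps the logs unrounded, so intermediate values differ slightly from the two-decimal figures on the slide while the final result agrees to one decimal place.

```python
import math

# group label -> (number of patients, geometric mean), taken from the table
groups = {"A": (20, 8.5), "B": (18, 10.2), "C": (12, 9.4)}

total_n = sum(n for n, _ in groups.values())                          # 50
weighted_log_sum = sum(n * math.log10(gm) for n, gm in groups.values())

overall_gm = 10 ** (weighted_log_sum / total_n)    # antilog of the weighted mean log

print(total_n)                  # 50
print(round(overall_gm, 1))     # 9.3
```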
[Figure: frequency distribution over values 0 to 19, annotated Mode = 13.5, Mean = 10.8, Median = 10]

Choosing
What measure of location to use?

• Consider the duration (days) of absence from


work of 21 labourers owing to sickness
 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10,
10, 59, 80
• Mean = 11 days
 Not typical of the series as 19 of the 21 labourers
were absent for less than 11 days
 Distorted by extreme values
• Median = 5 days
 Better measure
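The figures quoted above can be verified with the Python standard library. A minimal sketch using the 21 observations from the slide:

```python
from statistics import mean, median

# Days of absence for the 21 labourers, from the slide
absence_days = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5,
                6, 6, 6, 7, 8, 9, 10, 10, 59, 80]

avg = mean(absence_days)
mid = median(absence_days)
below_mean = sum(1 for d in absence_days if d < avg)   # labourers under the mean

print(round(avg, 1))    # 11.1 (about 11 days)
print(mid)              # 5
print(below_mean)       # 19 of the 21 labourers
```

The two extreme values (59 and 80) account for more than half of the total days of absence, which is why the mean lies above 19 of the 21 observations.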
Choice of measure of central tendency
for symmetric distributions
• Any one of the central/location measures
can be used
• The mean has definite advantages if
subsequent computations are needed

Choice of measure of central tendency
for asymmetric distributions
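• For skewed distributions, the mean is not
suitable
 Positively skewed: the mean is pulled towards the long
right tail and gives a higher value than the median
 Negatively skewed: the mean is pulled towards the long
left tail and gives a lower value than the median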
• If some observations deviate much more
than others in the series, then median is the
appropriate measure
• If the log-transformed distribution is
symmetric, the geometric mean may be used

Key messages

• The mode is the most common value


• The median is suited to data with
extreme values
• The mean is suited to symmetric
distributions
• The geometric mean may be useful when
log-transformed data are symmetric
• The type of the distribution determines the
measure of central tendency to use
