Measures of Central Tendencies
Program: PGPMex
Course Name: Data Analysis for Decision Making
Unit Name: Measures of Central Tendencies
Powered by Great Learning. Proprietary content. ©Great Learning. All Rights Reserved.
Unauthorized use or distribution prohibited.
Contents
Overview
Objectives
Introduction
Measures of Central Tendency
Mean
Geometric Mean and Harmonic Mean
Median
Mode
Measures of central tendency for grouped data
Choosing the ideal measure of central tendency
Conclusion
Glossary
Overview
Statistics and data-driven decision making are crucial pillars of modern managerial practice.
The first step in statistical analysis is the preparation of data summaries – both descriptive
and visual. In this chapter, we introduce the reader to the sources and types of data. We also
introduce several key visualization techniques that can be used to explore the patterns
between the variables captured in data.
Objectives
In this Unit you will learn –
Learning Outcomes
At the end of this Unit, you will:
Be able to compute measures of central tendency: the mean, median, mode, geometric mean and harmonic mean
Be able to understand when each of these metrics should be used
Unit Pre-requisites
This unit requires prior knowledge of data visualization.
Before studying this Unit, the student should have completed the Data Visualization unit.
Table of Topics
Introduction
In the Data Visualization unit, we looked at methods to visually summarize the contours of a data
set. While these methods provide a rich description of the variables in our data, often we need
a numerical summary of the variables of interest. An important place where such numerical
summaries get used is in hypothesis testing, a core part of data-driven decision making. In this
unit, we attempt the unimaginable and take a deep dive into a few key methods of
summarizing a large amount of data using a single number.
Measures of central tendency serve two main purposes. First, they provide context by
outlining where most of the data lies. Second, they serve as a single-point numerical summary
of the data, relieving the analyst from the burden of presenting the entirety of the data. Popular
measures of central tendency include the mean (arithmetic, geometric and harmonic), the median
and the mode. Let us now look at each of these in greater detail.
Mean
The mean (also referred to as the arithmetic mean) of a list of numbers is computed by adding up
the numbers and dividing by how many numbers there are. For example, consider the list of
numbers 10, 11, 11, 11, 10, 11, 9, 10, 9. The arithmetic mean, or average, of this list is

$$\frac{10+11+11+11+10+11+9+10+9}{9} = \frac{92}{9} \approx 10.22$$

There are several situations where the mean can be an important device of investigation. For
example, consider Table 1, where a sample of customer satisfaction scores for a new mobile phone
is presented for 2019 and 2020.
Table 1. Customer satisfaction scores (out of 100) for a mobile phone in 2019 and 2020 for a
sample of 15 customers
2019 2020
96 100
95 97
92 100
93 99
92 99
96 100
99 97
94 100
100 98
94 102
98 99
94 100
92 98
97 99
95 99
One good way to understand whether customer satisfaction increased or decreased from 2019
to 2020 is to compute the average score for each year. From Table 1, we note that
the average customer satisfaction score was approximately 95.1 for 2019 and approximately 99.1
for 2020. This indicates that, on average, customers were more satisfied with the phone in 2020
than in 2019.
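The averages quoted above can be checked with a few lines of code. The following is a minimal sketch in Python using only the standard library; the variable names are illustrative and are not part of the original unit.

```python
from statistics import mean

# The nine-number list used in the text
values = [10, 11, 11, 11, 10, 11, 9, 10, 9]
list_mean = sum(values) / len(values)  # add up the numbers, then divide by the count

# Customer satisfaction scores from Table 1
scores_2019 = [96, 95, 92, 93, 92, 96, 99, 94, 100, 94, 98, 94, 92, 97, 95]
scores_2020 = [100, 97, 100, 99, 99, 100, 97, 100, 98, 102, 99, 100, 98, 99, 99]

mean_2019 = mean(scores_2019)
mean_2020 = mean(scores_2020)

print(round(list_mean, 2))                         # about 10.22
print(round(mean_2019, 1), round(mean_2020, 1))    # about 95.1 and 99.1
```

Writing the first mean as `sum(values) / len(values)` mirrors the definition directly, while `statistics.mean` does the same job for the two columns of Table 1.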
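While the arithmetic mean is the most commonly used average in analysis, there are two other
methods of computing the average of a list of numbers: the geometric mean and the harmonic mean.
The geometric mean of a list of $n$ numbers is the $n$th root of the product of the numbers.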
To compute the harmonic mean of a list of numbers, we compute the reciprocal of the arithmetic
mean of the reciprocals of the numbers in the list. Digest that for a while! For instance, to
compute the harmonic mean of the list of numbers 10, 11, 11, 11, 10, 11, 9, 10, 9, we would
perform a computation like so:

$$\left(\frac{10^{-1}+11^{-1}+11^{-1}+11^{-1}+10^{-1}+11^{-1}+9^{-1}+10^{-1}+9^{-1}}{9}\right)^{-1} \approx 10.16$$
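To make these definitions concrete, here is a short Python sketch that computes the harmonic mean and the geometric mean (the $n$th root of the product of the numbers, as defined in the Glossary) of the same nine-number list. Both are written out by hand; the variable names are illustrative assumptions.

```python
from statistics import geometric_mean, harmonic_mean, mean

values = [10, 11, 11, 11, 10, 11, 9, 10, 9]

# Harmonic mean written out: reciprocal of the mean of the reciprocals
hm_manual = 1 / (sum(1 / v for v in values) / len(values))

# Geometric mean written out: n-th root of the product of the values
product = 1
for v in values:
    product *= v
gm_manual = product ** (1 / len(values))

print(round(mean(values), 2))   # arithmetic mean, about 10.22
print(round(gm_manual, 2))      # geometric mean, about 10.19
print(round(hm_manual, 2))      # harmonic mean, about 10.16
```

The standard library functions `statistics.geometric_mean` and `statistics.harmonic_mean` (Python 3.8 or later) should give the same results as the hand-written versions. For this list the three averages are close, with the arithmetic mean the largest and the harmonic mean the smallest.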
Why on earth would we then do these comparatively costlier computations instead of the arithmetic
mean? It turns out that the geometric and harmonic means are better suited to describe the
average of certain types of data. For example, the geometric mean is a natural way to describe
the growth rates of a stock (or any financial instrument). This is because the overall growth over
a period of time is the product of the growth factors in each intervening time period, so the
geometric mean of those factors is the constant per-period rate that would produce the same
overall growth. One example where the harmonic mean could be useful is in judging the performance
of equity portfolios using financial ratios (such as the PE ratio, price/earnings, or the EPS
ratio, earnings/total shares outstanding). Here, while the arithmetic mean over-estimates the
average financial ratio of a portfolio of securities, the harmonic mean is a natural estimate in
this situation.
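The two use cases above can be illustrated with small made-up numbers. The sketch below is in Python; the growth factors, the PE ratios and the equal-investment assumption are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import prod
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical yearly growth factors: +50% in year 1, then -50% in year 2
growth_factors = [1.5, 0.5]

total_growth = prod(growth_factors)          # 0.75: the investment lost 25% overall
am_factor = mean(growth_factors)             # 1.0: wrongly suggests no change
gm_factor = geometric_mean(growth_factors)   # about 0.866 per year

# Compounding the geometric mean over both years reproduces the total growth
assert abs(gm_factor ** len(growth_factors) - total_growth) < 1e-9

# Hypothetical portfolio: equal amounts invested in two stocks with these PE ratios
pe_ratios = [10, 40]
am_pe = mean(pe_ratios)            # 25
hm_pe = harmonic_mean(pe_ratios)   # 16

# Portfolio PE computed directly: total price paid / total earnings bought
invested = 100.0  # amount placed in each stock (hypothetical)
portfolio_pe = (invested * len(pe_ratios)) / sum(invested / pe for pe in pe_ratios)

print(total_growth, am_factor, round(gm_factor, 3))
print(am_pe, hm_pe, portfolio_pe)
```

Under the equal-investment assumption, the directly computed portfolio PE matches the harmonic mean (16), while the arithmetic mean (25) overstates it.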
Median
The median of a list of values is the middle number in the sorted list of these values. Going back
to our earlier list of numbers – 10, 11, 11, 11, 10, 11, 9, 10, 9 – to compute the median, we first
need to sort these numbers in ascending order. The sorted list is presented below:
9 9 10 10 10 11 11 11 11
The median of this list is the middle number, 10 (the fifth of the nine sorted values), that is,
the number that has an equal number of entries on either side. When a list has an even number of
values, the median is the average of the two middle values. Note that by definition, at least
half of the numbers in the list are less than or equal to the median and at least half are
greater than or equal to it. This is a neat property of the median that allows it to be less
influenced by extraordinarily large or small numbers (called outliers). This property also makes
the median a better choice than the mean when outliers are present in the data. For example,
consider the following sample of the number of subscribers of a set of YouTube channels as of
October 2020. This data is presented in Table 2.
Table 2. Number of subscribers for a sample of YouTube channels in October 2020
YouTube channel                   Number of subscribers in October 2020
TechKaboom                        578000
zollotech                         778000
AlexiBexi                         1430000
Canaltech                         2550000
TECH BOZZ                         63
Technix!                          543
TheMissingByte                    73
Yash Dua                          0
TechCareCenter                    688
Honest Straightforward Reviews    1270
UNIQLY TUT                        0
The mean number of subscribers in this list is 5,33,987.778 while the median is 979. Note that
since we are adding up all the numbers in the list, the mean gets influenced by the relatively
high values in the list (e.g., AlexiBexi, Jared Polin), skewing the estimate higher. Since the
median is concerned only with the ranking of the numbers and not their values, it provides a
more reasonable estimate of the average.
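The effect of an outlier on the mean and the median can be seen on the small nine-number list used earlier. The sketch below is in Python; the extra value of 1000 is a hypothetical outlier added purely for illustration.

```python
from statistics import mean, median

values = [10, 11, 11, 11, 10, 11, 9, 10, 9]
print(sorted(values))   # [9, 9, 10, 10, 10, 11, 11, 11, 11]
print(median(values))   # 10, the fifth of the nine sorted values

# Add one hypothetical outlier and compare how each measure reacts
with_outlier = values + [1000]
print(mean(with_outlier))     # 109.2
print(median(with_outlier))   # 10.5, the average of the two middle values (10 and 11)
```

A single extreme value moves the mean from about 10.22 to 109.2, while the median only moves from 10 to 10.5.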
Mode
The mode is the most frequently occurring value in a list of values. It is most commonly used for
categorical data to estimate the most common class in the data. For example, consider the
following extract of the subscription status of a set of 15 customers of an OTT player.
Customer id  Subscription status
1 Basic
2 Basic
3 Standard
4 Basic
5 Standard
6 Basic
7 Premium
8 Premium
9 Basic
10 Standard
11 Premium
12 Standard
13 Premium
14 Premium
15 Standard
The mode of the subscription status of the sample is “Basic”, indicating that Basic is the most
common plan among the OTT player's sampled subscribers.
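Finding the mode amounts to counting how often each value occurs and picking the most frequent one. The following Python sketch shows this for the nine-number list used earlier and for a short, hypothetical list of plan names (not the extract above).

```python
from collections import Counter
from statistics import mode, multimode

values = [10, 11, 11, 11, 10, 11, 9, 10, 9]
print(Counter(values).most_common())   # [(11, 4), (10, 3), (9, 2)]
print(mode(values))                    # 11

# Hypothetical categorical data
plans = ["Basic", "Standard", "Basic", "Premium", "Basic", "Standard"]
print(mode(plans))                     # Basic

# When several values tie for the highest count, multimode returns all of them
print(multimode([1, 1, 2, 2, 3]))      # [1, 2]
```

A list can have more than one mode. In Python 3.8 or later, `statistics.mode` returns the first of the tied values it encounters, so `multimode` is the safer choice when ties are possible.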
Measures of central tendency for grouped data
Consider that we have $N$ observations divided into a certain number of classes, where $f_i$ is
the frequency of class $i$. Given this grouped data, the mean can be computed like so:

$$\bar{x} = \frac{\sum f_i m_i}{\sum f_i} = \frac{\sum f_i m_i}{N}$$

Here, $m_i$ is the mid-point of each class $i$ (i.e., (lower limit + upper limit)/2).
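The grouped-data mean formula can be sketched in Python as follows. The function name `grouped_mean` and the example classes are hypothetical; the classes are not taken from Table 4.

```python
def grouped_mean(classes):
    """Mean of grouped data; classes is a list of (lower_limit, upper_limit, frequency)."""
    total_freq = sum(f for _, _, f in classes)   # N, the sum of the f_i
    if total_freq == 0:
        raise ValueError("grouped_mean requires at least one observation")
    # Sum of f_i * m_i, where m_i is the mid-point of class i
    weighted_sum = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return weighted_sum / total_freq

# Hypothetical grouped data: three classes with mid-points 5, 15 and 25
example_classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
print(grouped_mean(example_classes))   # 15.0
```

Because every observation in a class is treated as if it sat at the class mid-point, the result approximates the mean of the underlying ungrouped data.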
For example, consider the following data presented in Table 4 on the income distribution of
households in India.
Table 4. Distribution of household income in India in 2002 (source: Hammond, World Resources
Institute, http://pdf.wri.org/hammond_india_profile_xls.pdf)
0 - 1000 9593
The mean household income for those who earn less than $6000 can be computed using the
formula listed above as:
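The median for grouped data can be computed using the formula:

$$\text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times c$$

where $l$ is the lower limit of the median class, $c$ is the width of the median class, $f$ is the
frequency of the median class and $cf$ is the cumulative frequency of the previous class. Here,
the median class is the class that contains the $\frac{N}{2}$th data point. For the data in
Table 4, $N = 170759$, so the median class is the one containing the
$\frac{170759}{2} \approx 85380$th data point, and

$$\text{Median} = 1000 + \frac{85380 - 9593}{95932} \times 1000 \approx 1790.01$$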
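The grouped-median formula can be written as a small Python function. This is a minimal sketch: the function and parameter names are assumptions, and the inputs are the values that appear in the computation above (lower limit 1000, class width 1000, class frequency 95932, previous cumulative frequency 9593, and 170759 observations in total).

```python
def grouped_median(lower, width, freq, cum_freq_before, n):
    """Median of grouped data, given the details of the median class.

    lower: lower limit of the median class (l)
    width: width of the median class (c)
    freq: frequency of the median class (f)
    cum_freq_before: cumulative frequency of the previous class (cf)
    n: total number of observations (N)
    """
    return lower + (n / 2 - cum_freq_before) / freq * width

median_income = grouped_median(
    lower=1000, width=1000, freq=95932, cum_freq_before=9593, n=170759
)
print(round(median_income, 2))   # about 1790
```

This sketch uses $N/2 = 85379.5$ exactly, whereas the computation above rounds the position up to 85380. The two results differ only in the second decimal place.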
Conclusion
In this chapter, we have introduced several methods to summarize the central location of a list
of numbers. We have also highlighted the typical situations where these methods get used and
their merits and shortcomings. A properly constructed measure of central tendency can give the
analyst a quick sense of the values of the variable of interest and should be a crucial part of any
descriptive analysis.
Glossary
Arithmetic mean of a list of numbers is the sum of the numbers divided by the number of
numbers in the list. For a list of $n$ numbers, $a_1, a_2, \ldots, a_n$, the arithmetic mean =
$\frac{a_1 + a_2 + \cdots + a_n}{n}$
Geometric mean of a list of $n$ numbers is the $n$th root of the product of the numbers in the
list. For a list of $n$ numbers, $a_1, a_2, \ldots, a_n$, the geometric mean =
$(a_1 a_2 \cdots a_n)^{1/n}$
Harmonic mean of a list of numbers is the reciprocal of the arithmetic mean of the reciprocals
of the numbers in the list. For a list of $n$ numbers, $a_1, a_2, \ldots, a_n$, the harmonic mean =
$\left(\frac{a_1^{-1} + a_2^{-1} + \cdots + a_n^{-1}}{n}\right)^{-1}$
Median of a list of numbers is the value that is in the middle of the list when sorted.
Formulas used
For a list of $n$ numbers $a_1, a_2, \ldots, a_n$:
The arithmetic mean = $\frac{a_1 + a_2 + \cdots + a_n}{n}$, the geometric mean =
$(a_1 a_2 \cdots a_n)^{1/n}$ and the harmonic mean =
$\left(\frac{a_1^{-1} + a_2^{-1} + \cdots + a_n^{-1}}{n}\right)^{-1}$