Measures of Central Tendencies

Self-Learning Material
Program:PGPMex
Course Name: Data Analysis for Decision Making
Unit Name: Measures of Central Tendencies
Powered by Great Learning. Proprietary content. ©Great Learning. All Rights Reserved.
Unauthorized use or distribution prohibited.
Contents
Overview
Objectives
Learning Outcomes
Unit Pre-requisites
Pre-Unit Preparatory Material
Table of Topics
Introduction
Measures of Central Tendency
Mean
Geometric Mean and Harmonic Mean
Median
Mode
Measures of central tendency for grouped data
Choosing the ideal measure of central tendency
Conclusion
Glossary
Formulas used
Overview
Statistics and data-driven decision making are a crucial pillar of modern managerial practice.
The first step in the statistical analysis is the preparation of data summaries – both descriptive
and visual. In this chapter, we introduce the reader to the sources and types of data. We also
introduce several key visualization techniques that can be used to explore the patterns
between the variables captured in data.
Objectives
In this Unit you will learn –
• The computation and interpretation of measures of central tendency
• How to summarize different types of data using appropriate metrics
Learning Outcomes
At the end of this Unit, you would -
• Be able to compute measures of central tendency - mean, median and mode, geometric
mean and harmonic mean
• Be able to understand when each of these metrics should be used
Unit Pre-requisites
• This unit requires prior knowledge of Data Visualization.
• Before studying this Unit, the student should have completed the Data Visualization unit.
Pre-Unit Preparatory Material

None
Table of Topics
Introduction
In the Data Visualization unit, we looked at methods to visually summarize the contours of a
data set. While these methods provide a rich description of the variables in our data, we often
need a numerical summary of the variables of interest. An important place where such numerical
summaries get used is in hypothesis testing, a core part of data-driven decision making. In this
unit, we attempt the unimaginable and take a deep dive into a few key methods of
summarizing large amounts of data using a single number.
Measures of Central Tendency

An important task in summarizing the nature of data is to understand where the center of their
distribution lies. Since the center of the distribution indicates where most of the data lies,
methods to estimate this center are usually referred to as Measures of Central Tendency.
Measures of central tendency serve two main purposes. First, they provide context by
outlining where most of the data lies. Second, they serve as a single-point numerical summary
of the data, relieving the analyst of the burden of presenting the entirety of the data. Popular
measures of central tendency include the mean (Arithmetic, Geometric and Harmonic), the median
and the mode. Let us now look at each of these in greater detail.
Mean
The mean (also referred to as the arithmetic mean) of a list of numbers is computed by adding up
the numbers and dividing by how many numbers there are. For example, consider the list of
numbers – 10, 11, 11, 11, 10, 11, 9, 10, 9. The arithmetic mean or average of this list is
$$\frac{10+11+11+11+10+11+9+10+9}{9} = \frac{92}{9} \approx 10.22.$$
There are several situations where the mean can be an important device of investigation. For
example, consider Table 1, where a sample of customer satisfaction scores for a new mobile
phone is presented for 2019 and 2020.
Table 1. Customer satisfaction scores (out of 100) for a mobile phone in 2019 and 2020 for a
sample of 15 customers
2019 2020
96 100
95 97
92 100
93 99
92 99
96 100
99 97
94 100
100 98
94 102
98 99
94 100
92 98
97 99
95 99
One good way to understand if customer satisfaction increased or decreased from 2019
to 2020 would be to compute the average score in 2019 and 2020. From Table 1, we note that
the average customer satisfaction score for 2019 was about 95.1 and that for 2020 was about
99.1. This indicates that, on average, customers are more satisfied with the phone in 2020
compared to 2019.
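The comparison above can be reproduced in a few lines. The following is a minimal sketch in Python using the standard `statistics` module; the variable names are illustrative and the scores are copied from Table 1.

```python
from statistics import mean

# Customer satisfaction scores copied from Table 1.
scores_2019 = [96, 95, 92, 93, 92, 96, 99, 94, 100, 94, 98, 94, 92, 97, 95]
scores_2020 = [100, 97, 100, 99, 99, 100, 97, 100, 98, 102, 99, 100, 98, 99, 99]

# The arithmetic mean is the sum of the scores divided by their count.
mean_2019 = mean(scores_2019)
mean_2020 = mean(scores_2020)

print(f"2019 mean: {mean_2019:.2f}")
print(f"2020 mean: {mean_2020:.2f}")
print("Satisfaction rose" if mean_2020 > mean_2019 else "Satisfaction did not rise")
```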
While the arithmetic mean is the most used average in analysis, there are two other methods of
computing the average of a list of numbers – the geometric mean and the harmonic mean.
Geometric Mean and Harmonic Mean

To compute the geometric mean of a list of 𝑛 numbers, we multiply all the numbers in the list and
take the 𝑛th root of the product. For example, we can compute the geometric mean of the customer
satisfaction scores in 2019 and 2020 for the data presented in Table 1 by taking the 15th root of
the product of the numbers in the respective columns. Applying this method, the geometric
mean of the scores in 2019 is approximately 95.1 and that for 2020 is approximately 99.1, each
marginally below the corresponding arithmetic mean (the geometric mean of positive numbers
never exceeds their arithmetic mean). Looking at these scores, we would still come to the same
inference as we did by computing the arithmetic mean, that is, customers seem more satisfied
in 2020.
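As a rough illustration of the geometric mean, the sketch below computes it for the two columns of Table 1 in Python, once with `statistics.geometric_mean` (available from Python 3.8) and once directly from the definition. The helper name `nth_root_of_product` is hypothetical and introduced here only for illustration.

```python
from math import prod
from statistics import geometric_mean, mean  # geometric_mean needs Python 3.8+

# Customer satisfaction scores copied from Table 1.
scores_2019 = [96, 95, 92, 93, 92, 96, 99, 94, 100, 94, 98, 94, 92, 97, 95]
scores_2020 = [100, 97, 100, 99, 99, 100, 97, 100, 98, 102, 99, 100, 98, 99, 99]


def nth_root_of_product(values):
    """Geometric mean by the textbook definition: n-th root of the product."""
    return prod(values) ** (1 / len(values))


gm_2019 = geometric_mean(scores_2019)
gm_2020 = geometric_mean(scores_2020)

print(f"2019 geometric mean: {gm_2019:.2f}")
print(f"2020 geometric mean: {gm_2020:.2f}")

# The library routine and the textbook definition agree up to rounding error.
print(abs(gm_2019 - nth_root_of_product(scores_2019)) < 1e-9)

# For positive data the geometric mean never exceeds the arithmetic mean.
print(gm_2019 <= mean(scores_2019) and gm_2020 <= mean(scores_2020))
```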
To compute the harmonic mean of a list of numbers, we compute the reciprocal of the arithmetic
mean of the reciprocals of the numbers in the list. Digest that for a while! For instance, to
compute the harmonic mean of the list of numbers - 10, 11, 11, 11, 10, 11, 9, 10, 9, we would
perform a computation like so:
$$\left(\frac{10^{-1}+11^{-1}+11^{-1}+11^{-1}+10^{-1}+11^{-1}+9^{-1}+10^{-1}+9^{-1}}{9}\right)^{-1} \approx 10.16.$$
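The same harmonic-mean computation can be checked with a short Python sketch. It uses `statistics.harmonic_mean` from the standard library alongside a direct translation of the definition; the variable names are illustrative.

```python
from statistics import harmonic_mean

values = [10, 11, 11, 11, 10, 11, 9, 10, 9]

# Definition: reciprocal of the arithmetic mean of the reciprocals,
# which simplifies to n divided by the sum of the reciprocals.
manual_hm = len(values) / sum(1 / v for v in values)

# Standard-library equivalent.
library_hm = harmonic_mean(values)

print(f"harmonic mean (manual):  {manual_hm:.2f}")
print(f"harmonic mean (library): {library_hm:.2f}")
```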
Why on earth would we then do these comparatively costlier computations over the arithmetic
mean? It turns out that the geometric and harmonic means are better suited to describe the
average of certain types of data. For example, the geometric mean is a natural way to describe
the growth rates of a stock (or any financial instrument). This is because growth compounds: the
total growth over a period of time is the product of the growth factors (1 + growth rate) at each
time period in the interim. One example where the harmonic mean could be useful is in judging
the performance of equity portfolios using financial ratios (such as the PE ratio -
price/earnings or EPS ratio - earnings/total shares outstanding). Here, while the arithmetic
mean over-estimates the average financial ratio of a portfolio of securities, the harmonic mean
is a natural estimate in this situation.
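To make these two use cases concrete, here is a small Python sketch. All of the numbers in it (the yearly growth factors and the P/E ratios) are hypothetical values chosen only for illustration, and the portfolio is assumed to hold an equal amount of money in each stock.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical yearly returns of a stock: +10%, -5%, +20%,
# expressed as growth factors (1 + growth rate).
growth_factors = [1.10, 0.95, 1.20]
total_growth = 1.10 * 0.95 * 1.20

# Compounding the geometric-mean factor over three years reproduces the
# total growth, while the arithmetic mean overstates it.
avg_factor = geometric_mean(growth_factors)
print(abs(avg_factor ** 3 - total_growth) < 1e-9)
print(mean(growth_factors) ** 3 > total_growth)

# Hypothetical P/E ratios of three stocks, with the same amount invested in each.
pe_ratios = [10.0, 20.0, 40.0]
amount = 100.0
total_price = amount * len(pe_ratios)
total_earnings = sum(amount / pe for pe in pe_ratios)  # earnings bought per stock
portfolio_pe = total_price / total_earnings

# The portfolio P/E equals the harmonic mean of the individual ratios;
# the arithmetic mean is higher.
print(abs(portfolio_pe - harmonic_mean(pe_ratios)) < 1e-9)
print(mean(pe_ratios) > portfolio_pe)
```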
Median
The median of a list of values is the middle number in the sorted list of these values. Going back
to our earlier list of numbers – 10, 11, 11, 11, 10, 11, 9, 10, 9 – to compute the median, we first
need to sort these numbers in ascending order. The sorted list is presented below:
9 9 10 10 10 11 11 11 11
The median of this list is the middle (fifth) number, 10, that is, the number that has an equal
number of entries on either side. Note that by definition, at least half of the numbers in the
list are less than or equal to the median and at least half are greater than or equal to it. This
is a neat property of the median that allows it to be less influenced by extraordinarily large or
small numbers (called outliers). This important property of the median also makes it an ideal
choice over the mean when outliers are present in the data. For example, consider the following
sample of the number of subscribers of a set of YouTube channels as of October 2020. This data is
presented in Table 2.
Table 2. Number of subscribers for a sample of YouTube channels in October 2020
YouTube channel                   Number of subscribers in October 2020
TechKaboom                        578000
Brandon Butch                     591000
zollotech                         778000
AlexiBexi                         1430000
Canaltech                         2550000
Isa Marcial                       2420000
Jared Polin                       1260000
TECH BOZZ                         63
Bong Tech Talks                   0
Technix!                          543
TheMissingByte                    73
Manish Technician                 1460
Yash Dua                          0
snehrish comedy S.C               243
TechCareCenter                    688
Honest Straightforward Reviews    1270
Jass Tech                         440
UNIQLY TUT                        0
The mean number of subscribers in this list is 533,987.78 while the median is 979. (With 18
channels there is no single middle value, so the median is the average of the two middle values
in the sorted list, 688 and 1,270.) Note that since we are adding up all the numbers in the list,
the mean gets influenced by the relatively high values in the list (e.g., AlexiBexi, Jared
Polin), skewing the estimate higher. Since the median is concerned only with the ranking of the
numbers and not the values, it provides a more reasonable estimate of the average.
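The effect of the large channels on the mean can be seen in a short Python sketch that computes both statistics for the subscriber counts in Table 2. The dictionary and variable names are illustrative only.

```python
from statistics import mean, median

# Subscriber counts copied from Table 2 (October 2020).
subscribers = {
    "TechKaboom": 578000,
    "Brandon Butch": 591000,
    "zollotech": 778000,
    "AlexiBexi": 1430000,
    "Canaltech": 2550000,
    "Isa Marcial": 2420000,
    "Jared Polin": 1260000,
    "TECH BOZZ": 63,
    "Bong Tech Talks": 0,
    "Technix!": 543,
    "TheMissingByte": 73,
    "Manish Technician": 1460,
    "Yash Dua": 0,
    "snehrish comedy S.C": 243,
    "TechCareCenter": 688,
    "Honest Straightforward Reviews": 1270,
    "Jass Tech": 440,
    "UNIQLY TUT": 0,
}

counts = list(subscribers.values())
mean_subscribers = mean(counts)      # pulled upward by the few very large channels
median_subscribers = median(counts)  # average of the two middle values (18 channels)

print(f"mean:   {mean_subscribers:,.2f}")
print(f"median: {median_subscribers:,.0f}")

# How many channels fall below the mean? A mean that most observations sit
# below is a sign that it is being dragged by outliers.
below_mean = sum(1 for c in counts if c < mean_subscribers)
print(f"{below_mean} of {len(counts)} channels are below the mean")
```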
Mode
The mode is the most frequently occurring value in a list of values. It is most commonly used for
categorical data to estimate the most common class in the data. For example, consider the
following extract of the subscription status of a set of 15 customers of an OTT player.
Table 3. Subscription status of a sample of 15 customers of an OTT player
Customer id Subscription status
1 Basic
2 Basic
3 Standard
4 Basic
5 Standard
6 Basic
7 Premium
8 Premium
9 Basic
10 Standard
11 Premium
12 Standard
13 Premium
14 Premium
15 Standard
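The mode of the subscription status is the plan that occurs most often in the sample. In the
sample as listed in Table 3, "Basic", "Standard" and "Premium" each occur five times, so the
three plans are tied and the sample has no single mode. Had "Basic" occurred most often, the
mode would indicate that the OTT player has the largest share of its subscribers in the Basic
plan.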
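A frequency tally is the simplest way to find the mode of categorical data. The sketch below, in Python, counts the plans exactly as they are listed in Table 3 and reports every plan that attains the highest count, so that ties are shown rather than silently broken. The variable names are illustrative.

```python
from collections import Counter

# Subscription status of customers 1 to 15, copied in order from Table 3.
statuses = [
    "Basic", "Basic", "Standard", "Basic", "Standard",
    "Basic", "Premium", "Premium", "Basic", "Standard",
    "Premium", "Standard", "Premium", "Premium", "Standard",
]

counts = Counter(statuses)  # frequency of each plan
highest = max(counts.values())

# Every plan that reaches the highest frequency is a mode.
modes = [plan for plan, count in counts.items() if count == highest]

for plan, count in counts.items():
    print(f"{plan}: {count}")
print("Mode(s):", ", ".join(modes))
```

Reporting all tied values is a deliberate choice here: a routine that returns a single mode would hide the fact that more than one class is equally common.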
Measures of central tendency for grouped data

In case data is available only in a grouped format where the data classes and frequency of
observations belonging to each class are known, we can then compute the measures of central
tendency for such data using the following formulae.
Consider that we have $N$ observations divided into a certain number of classes, where $f_i$ is
the frequency of class $i$. Given this grouped data, the mean can be computed like so:
$$\bar{x} = \frac{\sum f_i m_i}{\sum f_i} = \frac{\sum f_i m_i}{N}$$
Here, $m_i$ is the mid-point of each class $i$ (i.e., (lower limit + upper limit)/2).
For example, consider the following data presented in Table 4 on the income distribution of
households in India.
Table 4. Distribution of household income in India in 2002 (source: Hammond, World Resources
Institute, http://pdf.wri.org/hammond_india_profile_xls.pdf)
Household income range (USD) Number of households ('000s)
0 - 1000 9593
1000 - 2000 86339
2000 - 3000 44129
3000 - 4000 12471
4000 - 5000 9593
5000 - 6000 8634
The mean household income for those who earn less than USD 6000 can be computed using the
formula listed above as:
$$\bar{x} = \frac{9593 \times 500 + 86339 \times 1500 + \cdots + 8634 \times 5500}{9593 + 86339 + \cdots + 8634} = \frac{378931500}{170759} \approx 2219.10$$
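The grouped-mean formula translates directly into code. The following Python sketch applies it to the frequencies as printed in Table 4; the tuple layout `(lower limit, upper limit, frequency)` and the variable names are assumptions made for illustration.

```python
# (lower limit, upper limit, number of households in '000s), from Table 4.
income_classes = [
    (0, 1000, 9593),
    (1000, 2000, 86339),
    (2000, 3000, 44129),
    (3000, 4000, 12471),
    (4000, 5000, 9593),
    (5000, 6000, 8634),
]

# N: total number of observations (sum of the class frequencies).
total_households = sum(freq for _, _, freq in income_classes)

# Sum of f_i * m_i, where m_i is the mid-point of class i.
weighted_sum = sum(freq * (lower + upper) / 2 for lower, upper, freq in income_classes)

grouped_mean = weighted_sum / total_households

print(f"N = {total_households}")
print(f"sum of f_i * m_i = {weighted_sum:.0f}")
print(f"grouped mean = {grouped_mean:.2f}")
```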
The median for grouped data can be computed using the formula:
$$\text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times c$$
where $l$ is the lower limit of the median class, $c$ is the width of the median class, $f$ is
the frequency of the median class and $cf$ is the cumulative frequency of all classes before the
median class. Here, the median class is the class that contains the $\frac{N}{2}$th data point.
In the example presented in Table 4, the $\frac{N}{2}$th data point is the
$\frac{170759}{2} \approx 85380$th point, which belongs to the class 1000 – 2000 (the cumulative
frequency reaches 9593 + 86339 = 95932 by the end of this class). So, with $l = 1000$,
$cf = 9593$, $f = 86339$ and $c = 1000$, the median can be computed as:
$$\text{Median} = 1000 + \frac{85380 - 9593}{86339} \times 1000 \approx 1877.78$$
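The grouped-median formula can be sketched in Python as follows. The function name `grouped_median` and the tuple layout are hypothetical; the sketch takes $f$ to be the frequency of the median class itself (86339 for Table 4), as the definition of the formula states, and uses $N/2 = 85379.5$ directly rather than rounding it to 85380.

```python
def grouped_median(classes):
    """Median of grouped data given (lower limit, upper limit, frequency) tuples.

    Implements l + ((N/2 - cf) / f) * c, where cf is the cumulative frequency
    of the classes before the median class and f is the median class frequency.
    """
    total = sum(freq for _, _, freq in classes)
    if total == 0:
        raise ValueError("classes must contain at least one observation")

    half = total / 2
    cumulative = 0  # cf: observations in the classes before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= half:
            # This is the median class: interpolate within it.
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("median class not found")


# (lower limit, upper limit, number of households in '000s), from Table 4.
income_classes = [
    (0, 1000, 9593),
    (1000, 2000, 86339),
    (2000, 3000, 44129),
    (3000, 4000, 12471),
    (4000, 5000, 9593),
    (5000, 6000, 8634),
]

median_income = grouped_median(income_classes)
print(f"grouped median = {median_income:.2f}")
```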
Choosing the ideal measure of central tendency

Now that we understand the different methods of estimating the average (or central tendency)
of a list of numbers, the question that remains is that of choosing among these different
methods. The answer, of course, lies in the nature of the data. It is usual practice to compute a
series of these statistics and then decide on the one best suited for the data at hand. For
example, the mode is ideal to summarize categorical data. For numeric data, we have two
options – mean or median. If we suspect that there are several outliers in the data, then the
median should be the go-to measure of central tendency, since it depends only on the ranking of
the values. If we are blessed with evenly distributed data without extreme values, then by all
means, choose the mean.
Conclusion
In this chapter, we have introduced several methods to summarize the central location of a list
of numbers. We have also highlighted the typical situations where these methods get used and
their merits and shortcomings. A properly constructed measure of central tendency can give the
analyst a quick sense of the values of the variable of interest and should be a crucial part of any
descriptive analysis.
Glossary
Arithmetic mean of a list of numbers is the sum of the numbers divided by the number of
numbers in the list. For a list of $n$ numbers, $a_1, a_2, \ldots, a_n$, the arithmetic mean =
$\frac{a_1 + a_2 + \cdots + a_n}{n}$
Geometric mean of a list of $n$ numbers is the $n$th root of the product of the numbers in the
list. For a list of $n$ numbers, $a_1, a_2, \ldots, a_n$, the geometric mean =
$(a_1 a_2 \cdots a_n)^{\frac{1}{n}}$
Harmonic mean of a list of numbers is the reciprocal of the arithmetic mean of the reciprocals
of the numbers in the list. For a list of $n$ numbers, $a_1, a_2, \ldots, a_n$, the harmonic mean =
$\left(\frac{a_1^{-1} + a_2^{-1} + \cdots + a_n^{-1}}{n}\right)^{-1}$
Median of a list of numbers is the value that is in the middle of the list when sorted
Mode is the most frequently occurring value in a list of values
Formulas used
For a list of $n$ numbers - $a_1, a_2, \ldots, a_n$:
The arithmetic mean = $\frac{a_1 + a_2 + \cdots + a_n}{n}$, the geometric mean =
$(a_1 a_2 \cdots a_n)^{\frac{1}{n}}$ and the harmonic mean =
$\left(\frac{a_1^{-1} + a_2^{-1} + \cdots + a_n^{-1}}{n}\right)^{-1}$
