Measures of Central Tendencies


Self-Learning Material

Program: PGPMex
Course Name: Data Analysis for Decision Making
Unit Name: Measures of Central Tendencies

Powered by Great Learning. Proprietary content. ©Great Learning. All Rights Reserved.
Unauthorized use or distribution prohibited.
Contents
Overview
Objectives
Learning Outcomes
Unit Pre-requisites
Pre-Unit Preparatory Material
Table of Topics
    Introduction
    Measures of Central Tendency
    Mean
    Geometric Mean and Harmonic Mean
    Median
    Mode
    Measures of central tendency for grouped data
    Choosing the ideal measure of central tendency
    Conclusion
Glossary
Formulas used

Overview
Statistics and data-driven decision making are crucial pillars of modern managerial practice.
The first step in statistical analysis is the preparation of data summaries, both descriptive
and visual. In this unit, we introduce the reader to numerical summaries that describe the
center of a data set: the mean (arithmetic, geometric and harmonic), the median and the mode.
We also discuss when each of these measures is appropriate.

Objectives
In this Unit you will learn:

• The computation and interpretation of measures of central tendency
• How to summarize different types of data using appropriate metrics

Learning Outcomes
At the end of this Unit, you will:

• Be able to compute measures of central tendency: the mean, median and mode, as well as the
geometric mean and harmonic mean
• Be able to understand when each of these metrics should be used

Unit Pre-requisites
• This unit requires prior knowledge of Data Visualization. Before studying this Unit, the
student should have completed the Data Visualization unit.

Pre-Unit Preparatory Material


None

Table of Topics

Introduction
In the Data Visualization unit, we looked at methods to visually summarize the contours of a data
set. While these methods provide a rich description of the variables in our data, we often need
a numerical summary of the variables of interest. An important place where such numerical
summaries get used is in hypothesis testing, a core part of data-driven decision making. In this
unit, we attempt the unimaginable and take a deep dive into a few key methods of
summarizing a large amount of data using a single number.

Measures of Central Tendency


An important task in summarizing the nature of data is to understand where the center of their
distribution lies. Since the center of the distribution indicates where most of the data lies,
methods to estimate this center are usually referred to as Measures of Central Tendency.

Measures of central tendency serve two main purposes. First, they provide useful context by
outlining where most of the data lies. Second, they serve as a single-point numerical summary
of the data, relieving the analyst of the burden of presenting the entirety of the data. Popular
measures of central tendency include the mean (arithmetic, geometric and harmonic), the median
and the mode. Let us now look at each of these in greater detail.

Mean
The mean (also referred to as the arithmetic mean) of a list of numbers is computed by adding up
the numbers and dividing the sum by how many numbers there are. For example, consider the list of
numbers 10, 11, 11, 11, 10, 11, 9, 10, 9. The arithmetic mean, or average, of this list is

$$\frac{10+11+11+11+10+11+9+10+9}{9} = \frac{92}{9} \approx 10.22.$$

There are several situations where the mean can be an important device of investigation. For
example, consider Table 1, which presents a sample of customer satisfaction scores for a new
mobile phone in 2019 and 2020.

Table 1. Customer satisfaction scores (out of 100) for a mobile phone in 2019 and 2020 for a
sample of 15 customers

2019    2020
96      100
95      97
92      100
93      99
92      99
96      100
99      97
94      100
100     98
94      102
98      99
94      100
92      98
97      99
95      99

One good way to understand whether customer satisfaction increased or decreased from 2019
to 2020 is to compute the average score for each year. From Table 1, we note that
the average customer satisfaction score was about 95 for 2019 and about 99 for 2020. This
indicates that, on average, customers were more satisfied with the phone in 2020 than in
2019.
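As a minimal sketch of this comparison, the averages can be computed with Python's standard `statistics` module. The variable names are illustrative, and the scores are copied from Table 1.

```python
from statistics import mean

# Customer satisfaction scores copied from Table 1.
scores_2019 = [96, 95, 92, 93, 92, 96, 99, 94, 100, 94, 98, 94, 92, 97, 95]
scores_2020 = [100, 97, 100, 99, 99, 100, 97, 100, 98, 102, 99, 100, 98, 99, 99]

# Arithmetic mean: sum of the scores divided by the number of scores (15).
mean_2019 = mean(scores_2019)
mean_2020 = mean(scores_2020)

print(f"2019 mean: {mean_2019:.2f}")
print(f"2020 mean: {mean_2020:.2f}")
```

Rounded to whole numbers, these are the averages of 95 and 99 quoted above.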

While the arithmetic mean is the most commonly used average in analysis, there are two other
methods of computing the average of a list of numbers: the geometric mean and the harmonic mean.

Geometric Mean and Harmonic Mean


To compute the geometric mean of a list of 𝑛 numbers, we multiply all the numbers in the list and
take the 𝑛th root of the product. For example, we can compute the geometric mean of the customer
satisfaction scores in 2019 and 2020 for the data presented in Table 1 by taking the 15th root of
the product of the numbers in the respective columns. Applying this method to the values listed
in Table 1, the geometric mean of the scores is about 95.10 for 2019 and about 99.13 for 2020.
Each is slightly below the corresponding arithmetic mean, as is always the case when the values
are not all equal. Looking at these scores, we would still come to the same inference as we did
by computing the arithmetic mean, that is, customers seem more satisfied in 2020.
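The 15th-root computation can be sketched as follows, using the Table 1 values as listed. The sketch computes the geometric mean directly from the definition and with `statistics.geometric_mean` (available from Python 3.8). The variable names are illustrative.

```python
from math import prod
from statistics import geometric_mean, mean

# Customer satisfaction scores copied from Table 1.
scores_2019 = [96, 95, 92, 93, 92, 96, 99, 94, 100, 94, 98, 94, 92, 97, 95]
scores_2020 = [100, 97, 100, 99, 99, 100, 97, 100, 98, 102, 99, 100, 98, 99, 99]

# Directly from the definition: n-th root of the product of the n values.
gm_2019_manual = prod(scores_2019) ** (1 / len(scores_2019))

# Library routine (computed via logarithms, which avoids huge products).
gm_2019 = geometric_mean(scores_2019)
gm_2020 = geometric_mean(scores_2020)

print(f"2019: geometric mean {gm_2019:.2f}, arithmetic mean {mean(scores_2019):.2f}")
print(f"2020: geometric mean {gm_2020:.2f}, arithmetic mean {mean(scores_2020):.2f}")
```

For the values as listed, this gives roughly 95.10 and 99.13, with the geometric mean never exceeding the arithmetic mean.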

To compute the harmonic mean of a list of numbers, we compute the reciprocal of the arithmetic
mean of the reciprocals of the numbers in the list. Digest that for a while! For instance, to
compute the harmonic mean of the list of numbers 10, 11, 11, 11, 10, 11, 9, 10, 9, we would
perform a computation like so:

$$\left(\frac{10^{-1}+11^{-1}+11^{-1}+11^{-1}+10^{-1}+11^{-1}+9^{-1}+10^{-1}+9^{-1}}{9}\right)^{-1} \approx 10.16.$$
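The same harmonic-mean computation can be sketched in Python, once directly from the definition and once with `statistics.harmonic_mean` from the standard library. The variable names are illustrative.

```python
from statistics import harmonic_mean

values = [10, 11, 11, 11, 10, 11, 9, 10, 9]

# Directly from the definition: reciprocal of the mean of the reciprocals.
mean_of_reciprocals = sum(1 / v for v in values) / len(values)
hm_manual = 1 / mean_of_reciprocals

# Library routine for comparison.
hm = harmonic_mean(values)

print(f"harmonic mean: {hm:.2f}")
```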

Why on earth would we then do these comparatively costlier computations over the arithmetic
mean? It turns out that the geometric and harmonic means are better suited to describe the
average of certain types of data. For example, the geometric mean is a natural way to describe
growth rates of a stock (or any financial instrument). This is because the cumulative growth over
a period of time is the product of the growth factors of each time period in the interim, so the
geometric mean of those factors is the constant per-period factor that produces the same
cumulative growth. One example where the harmonic mean could be useful is in judging the
performance of equity portfolios using financial ratios (such as the PE ratio, price/earnings, or
the EPS ratio, earnings/total shares outstanding). Here, while the arithmetic mean over-estimates
the average financial ratio of a portfolio of securities, the harmonic mean is a natural estimate
in this situation.
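To illustrate the growth-rate case, here is a small sketch with hypothetical yearly returns of +10%, -20% and +30%. These figures are made up for illustration and do not come from the unit's data. Compounding the geometric mean of the growth factors reproduces the cumulative growth, whereas compounding the arithmetic mean overstates it.

```python
from math import prod
from statistics import geometric_mean, mean

# Hypothetical yearly returns of a stock (illustrative values only).
yearly_returns = [0.10, -0.20, 0.30]
growth_factors = [1 + r for r in yearly_returns]  # 1.10, 0.80, 1.30

# Cumulative growth over the three years is the product of the yearly factors.
cumulative_growth = prod(growth_factors)

avg_factor_geometric = geometric_mean(growth_factors)
avg_factor_arithmetic = mean(growth_factors)

years = len(growth_factors)
print(f"cumulative growth factor:   {cumulative_growth:.4f}")
print(f"geometric mean compounded:  {avg_factor_geometric ** years:.4f}")
print(f"arithmetic mean compounded: {avg_factor_arithmetic ** years:.4f}")
```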

Median
The median of a list of values is the middle number in the sorted list of these values. Going back
to our earlier list of numbers (10, 11, 11, 11, 10, 11, 9, 10, 9), to compute the median we first
need to sort these numbers in ascending order. The sorted list is presented below:

9 9 10 10 10 11 11 11 11

The median of this list is the middle number, 10 (the fifth of the nine values), that is, the
number that has an equal number of entries on either side. When a list has an even number of
values, the median is taken as the average of the two middle values. Note that by definition, at
least half of the numbers in the list are less than or equal to the median and at least half are
greater than or equal to it. This is a neat property of the median that allows it to be less
influenced by extraordinarily large or small numbers (called outliers). This property also makes
the median a better choice than the mean when outliers are present in the data. For example,
consider the following sample of the number of subscribers of a set of YouTube channels as of
October 2020. This data is presented in Table 2.

Table 2. Number of subscribers for a sample of YouTube channels in October 2020

YouTube channel                   Number of subscribers in October 2020
TechKaboom                        578000
Brandon Butch                     591000
zollotech                         778000
AlexiBexi                         1430000
Canaltech                         2550000
Isa Marcial                       2420000
Jared Polin                       1260000
TECH BOZZ                         63
Bong Tech Talks                   0
Technix!                          543
TheMissingByte                    73
Manish Technician                 1460
Yash Dua                          0
snehrish comedy S.C               243
TechCareCenter                    688
Honest Straightforward Reviews    1270
Jass Tech                         440
UNIQLY TUT                        0

The mean number of subscribers in this list is about 533,987.78, while the median is 979. Because
the list has 18 entries, the median is the average of the two middle values in sorted order (688
and 1270). Note that since we are adding up all the numbers in the list, the mean gets pulled up
by the relatively high values in the list (e.g., AlexiBexi, Jared Polin). Since the
median depends only on the ranking of the numbers and not on their magnitudes, it provides a
more reasonable estimate of the typical channel.
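The contrast between the two measures can be checked with a short sketch using Python's `statistics` module. The subscriber counts are copied from Table 2, and the variable names are illustrative.

```python
from statistics import mean, median

# Subscriber counts copied from Table 2, in table order.
subscribers = [
    578000, 591000, 778000, 1430000, 2550000, 2420000, 1260000,
    63, 0, 543, 73, 1460, 0, 243, 688, 1270, 440, 0,
]

mean_subs = mean(subscribers)      # pulled up by the few very large channels
median_subs = median(subscribers)  # average of the 9th and 10th sorted values

print(f"mean:   {mean_subs:,.2f}")
print(f"median: {median_subs:,.0f}")
```

With an even number of values (18 here), `median` averages the two middle values of the sorted list, 688 and 1270.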

Mode
The mode is the most frequently occurring value in a list of values. It is most commonly used for
categorical data to estimate the most common class in the data. For example, consider the
following extract of the subscription status of a set of 15 customers of an OTT player.

Table 3. Subscription status of a sample of 15 customers of an OTT player

Customer id    Subscription status
1              Basic
2              Basic
3              Standard
4              Basic
5              Standard
6              Basic
7              Premium
8              Premium
9              Basic
10             Standard
11             Premium
12             Standard
13             Premium
14             Premium
15             Standard

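Counting the entries in Table 3 as listed gives five customers each on the Basic, Standard and
Premium plans, so this particular sample is a three-way tie and has no unique mode. If one plan,
say “Basic”, occurred more often than the others, the mode would be “Basic”, indicating that the
OTT player has more subscribers on the Basic plan than on any other plan.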
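A frequency count makes the mode easy to verify. The sketch below tallies the Table 3 entries as listed with `collections.Counter` and reports every most-frequent value with `statistics.multimode` (Python 3.8 or later). The variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

# Subscription status of customers 1 to 15, copied from Table 3 in order.
statuses = [
    "Basic", "Basic", "Standard", "Basic", "Standard",
    "Basic", "Premium", "Premium", "Basic", "Standard",
    "Premium", "Standard", "Premium", "Premium", "Standard",
]

counts = Counter(statuses)   # frequency of each subscription plan
modes = multimode(statuses)  # all values that share the highest frequency

print(counts)
print(modes)
```

`multimode` returns a list, so ties between categories are reported explicitly rather than hidden behind a single value.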

Measures of central tendency for grouped data


In case data is available only in a grouped format where the data classes and frequency of
observations belonging to each class are known, we can then compute the measures of central
tendency for such data using the following formulae.

Consider that we have 𝑁 observations divided into a certain number of classes, where 𝑓𝑖 is the
frequency of class 𝑖. Given this grouped data, the mean can be computed like so:

$$\bar{x} = \frac{\sum_i f_i m_i}{\sum_i f_i} = \frac{\sum_i f_i m_i}{N}$$

Here, 𝑚𝑖 is the mid-point of each class 𝑖 (i.e., (lower limit + upper limit)/2).

For example, consider the following data presented in Table 4 on the income distribution of
households in India.

Table 4. Distribution of household income in India in 2002 (source: Hammond, World Resources
Institute, http://pdf.wri.org/hammond_india_profile_xls.pdf)

Household income range (USD)    Number of households ('000s)
0 - 1000                        9593
1000 - 2000                     86339
2000 - 3000                     44129
3000 - 4000                     12471
4000 - 5000                     9593
5000 - 6000                     8634

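The mean household income (in USD) for those who earn less than USD 6000 can be computed using
the formula listed above, with the frequencies listed in Table 4, as:

$$\bar{x} = \frac{9593 \times 500 + 86339 \times 1500 + \cdots + 8634 \times 5500}{9593 + 86339 + \cdots + 8634} = \frac{378931500}{170759} \approx 2219.10$$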
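The grouped-mean formula can be sketched in a few lines of Python. The class limits and frequencies are copied from Table 4, and the representation of each class as a `(lower, upper, frequency)` tuple is an assumption made for this illustration.

```python
# Each class from Table 4 as (lower limit, upper limit, frequency in '000s).
income_classes = [
    (0, 1000, 9593),
    (1000, 2000, 86339),
    (2000, 3000, 44129),
    (3000, 4000, 12471),
    (4000, 5000, 9593),
    (5000, 6000, 8634),
]

# N: total number of observations (sum of the class frequencies).
total_frequency = sum(freq for _, _, freq in income_classes)

# Sum of f_i * m_i, where m_i is the mid-point of class i.
weighted_sum = sum(freq * (lower + upper) / 2 for lower, upper, freq in income_classes)

grouped_mean = weighted_sum / total_frequency
print(f"grouped mean income: {grouped_mean:.2f} USD")
```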

The median for grouped data can be computed using the formula:

$$\text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times c$$

Where 𝑙 is the lower limit of the median class, 𝑐 is the width of the median class, 𝑓 is the
frequency of the median class and 𝑐𝑓 is the cumulative frequency of the previous class. Here,
the median class is the class that contains the (𝑁/2)th data point.

In the example presented in Table 4, the (𝑁/2)th data point is the 170759/2 ≈ 85380th point. The
cumulative frequency is 9593 up to the class 0 – 1000 and 95932 up to the class 1000 – 2000, so
this point belongs to the class 1000 – 2000, whose frequency is 𝑓 = 86339. So, the median can be
computed as:

$$\text{Median} = 1000 + \frac{85380 - 9593}{86339} \times 1000 \approx 1877.78$$
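As a hedged sketch of the grouped-median formula, the hypothetical helper `grouped_median` below (not part of any library) walks through the classes, finds the median class, and interpolates within it. It uses 𝑓 as the frequency of the median class (86339 for Table 4), as in the definition above, and uses 𝑁/2 without rounding.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: list of (lower_limit, upper_limit, frequency), sorted by lower limit.
    """
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("classes must contain at least one observation")
    half = n / 2
    cumulative = 0  # cf: cumulative frequency of the classes before the current one
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            # Median class found: interpolate linearly within it.
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq


# Classes copied from Table 4 (frequencies in '000s of households).
income_classes = [
    (0, 1000, 9593),
    (1000, 2000, 86339),
    (2000, 3000, 44129),
    (3000, 4000, 12471),
    (4000, 5000, 9593),
    (5000, 6000, 8634),
]

median_income = grouped_median(income_classes)
print(f"grouped median income: {median_income:.2f} USD")
```

The formula assumes observations are spread evenly within the median class, which is why the result is an estimate rather than the exact median of the underlying data.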

Choosing the ideal measure of central tendency


Now that we understand the different methods of estimating the average (or central tendency)
of a list of numbers, the question that remains is that of choosing among these different
methods. The answer, of course, lies in the nature of the data. It is usual practice to compute a
series of these statistics and then decide on the one best suited for the data at hand. For
example, the mode is ideal to summarize categorical data. For numeric data, we have two
options: the mean or the median. If we suspect that there are several outliers in the data, then
the median should be the go-to measure of central tendency, since it is less influenced by
extreme values. If we are blessed with evenly distributed data, then by all means, choose the mean.

Conclusion
In this unit, we have introduced several methods to summarize the central location of a list
of numbers. We have also highlighted the typical situations where these methods get used and
their merits and shortcomings. A properly constructed measure of central tendency can give the
analyst a quick sense of the values of the variable of interest and should be a crucial part of any
descriptive analysis.

Glossary
Arithmetic mean of a list of numbers is the sum of the numbers divided by the number of
numbers in the list. For a list of 𝑛 numbers, 𝑎1, 𝑎2, … , 𝑎𝑛, the arithmetic mean =
$\frac{a_1 + a_2 + \cdots + a_n}{n}$

Geometric mean of a list of 𝑛 numbers is the 𝑛th root of the product of the numbers in the
list. For a list of 𝑛 numbers, 𝑎1, 𝑎2, … , 𝑎𝑛, the geometric mean =
$(a_1 a_2 \cdots a_n)^{1/n}$

Harmonic mean of a list of numbers is the reciprocal of the arithmetic mean of the reciprocals
of the numbers in the list. For a list of 𝑛 numbers, 𝑎1, 𝑎2, … , 𝑎𝑛, the harmonic mean =
$\left(\frac{a_1^{-1} + a_2^{-1} + \cdots + a_n^{-1}}{n}\right)^{-1}$

Median of a list of numbers is the value that is in the middle of the list when sorted

Mode is the most frequently occurring value in a list of values

Formulas used
For a list of 𝑛 numbers - 𝑎1, 𝑎2, … , 𝑎𝑛:

The arithmetic mean = $\frac{a_1 + a_2 + \cdots + a_n}{n}$, the geometric mean =
$(a_1 a_2 \cdots a_n)^{1/n}$ and the harmonic mean =
$\left(\frac{a_1^{-1} + a_2^{-1} + \cdots + a_n^{-1}}{n}\right)^{-1}$
