
MEASURE OF CENTRAL TENDENCY

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position
within that set of data. As such, measures of central tendency are sometimes called measures of central location. They
are also classed as summary statistics. The mean (often called the average) is most likely the measure of central
tendency that you are most familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures
of central tendency become more appropriate to use than others. In the following sections, we will look at the mean,
mode and median, and learn how to calculate them and under what conditions they are most appropriate to be used.

A measure of central tendency (also referred to as measures of centre or central location) is a summary measure that attempts to describe a whole set of
data with a single value that represents the middle or centre of its distribution.

There are three main measures of central tendency: the mode, the median and the mean. Each of these measures describes a different indication of the typical or
central value in the distribution.

What is the mode?


The mode is the most commonly occurring value in a distribution.

Consider this dataset showing the retirement age of 11 people, in whole years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

This table shows a simple frequency distribution of the retirement age data.

Age    Frequency
54     3
55     1
56     1
57     2
58     2
60     2

The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.
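
As a minimal sketch, the frequency table and the mode above can be reproduced with Python's standard library. The variable names are illustrative and are not part of the original notes.

```python
from collections import Counter
from statistics import multimode

# Retirement ages from the example above (whole years).
retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Frequency distribution: maps each age to the number of times it occurs.
frequency = Counter(retirement_ages)

# multimode returns every value tied for the highest frequency,
# so it returns a list even when there is only one mode.
modes = multimode(retirement_ages)

for age in sorted(frequency):
    print(age, frequency[age])
print("Mode(s):", modes)  # Mode(s): [54]
```

Using multimode rather than mode keeps the sketch usable for data with several equally common values.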

Advantage of the mode:

The mode has an advantage over the median and the mean as it can be found for both numerical and categorical (non-numerical) data.

Limitations of the mode:

There are some limitations to using the mode. In some distributions, the mode may not reflect the centre of the distribution very well. When the distribution of
retirement age is ordered from lowest to highest value, it is easy to see that the centre of the distribution is 57 years, but the mode is lower, at 54 years.

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

It is also possible for the same distribution of data to have more than one mode (bi-modal or multi-modal). The presence of more than one mode limits
the ability of the mode to describe the centre or typical value of the distribution, because no single value can be identified as the centre.

In some cases, particularly where the data are continuous, the distribution may have no mode at all (i.e. if all values are different).

In cases such as these, it may be better to use the median or mean, or to group the data into appropriate intervals and find the modal class.
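
Finding a modal class can be sketched as follows. The interval width of 2 years and the starting bound of 54 are assumptions chosen only for illustration; a different grouping can give a different modal class.

```python
from collections import Counter

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

CLASS_WIDTH = 2    # assumed interval width in years (illustrative choice)
FIRST_LOWER = 54   # assumed lower bound of the first interval

def class_interval(age):
    """Return the (lower, upper) bounds of the interval that contains age."""
    lower = FIRST_LOWER + ((age - FIRST_LOWER) // CLASS_WIDTH) * CLASS_WIDTH
    return (lower, lower + CLASS_WIDTH - 1)

# Count how many observations fall into each interval.
class_counts = Counter(class_interval(age) for age in retirement_ages)

# The modal class is the interval with the highest frequency.
modal_class, modal_frequency = class_counts.most_common(1)[0]
print("Modal class:", modal_class, "frequency:", modal_frequency)
# Modal class: (54, 55) frequency: 4
```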

What is the median?


The median is the middle value in a distribution when the values are arranged in ascending or descending order.

The median divides the distribution in half (there are 50% of observations on either side of the median value). In a distribution with an odd number of
observations, the median value is the middle value.
Looking at the retirement age distribution (which has 11 observations), the median is the middle value, which is 57 years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

When the distribution has an even number of observations, the median value is the mean of the two middle values. In the following distribution, the two middle
values are 56 and 57, therefore the median equals 56.5 years:

52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
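
The two cases above (odd and even numbers of observations) can be checked with a short sketch using Python's standard library; the variable names are illustrative.

```python
from statistics import median

odd_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]       # 11 observations
even_ages = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]  # 12 observations

# Odd count: the single middle value of the sorted data.
median_odd = median(odd_ages)

# Even count: the mean of the two middle values (56 and 57).
median_even = median(even_ages)

print(median_odd)   # 57
print(median_even)  # 56.5
```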

Advantage of the median:

The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not
symmetrical.

Limitation of the median:

The median cannot be identified for nominal categorical data, as the categories cannot be logically ordered.

What is the mean?


The mean is the sum of the value of each observation in a dataset divided by the number of observations. This is also known as the arithmetic average.

Looking at the retirement age distribution again:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the number of observations (11)
which equals 56.6 years.
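
The same arithmetic, written out as a small sketch (variable names are illustrative):

```python
retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(retirement_ages)   # sum of all observations
count = len(retirement_ages)   # number of observations
mean_age = total / count       # arithmetic mean

print(total, count, round(mean_age, 1))  # 623 11 56.6
```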

Advantage of the mean:

The mean can be used for both continuous and discrete numeric data.

Limitations of the mean:

The mean cannot be calculated for categorical data, as the values cannot be summed.

Because the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.
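
To illustrate this sensitivity, the sketch below adds one extreme value to the retirement age data. The extra observation of 95 years is hypothetical and is not part of the original dataset.

```python
from statistics import mean, median

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Hypothetical outlier added purely for illustration.
ages_with_outlier = retirement_ages + [95]

print(round(mean(retirement_ages), 1), median(retirement_ages))      # 56.6 57
print(round(mean(ages_with_outlier), 1), median(ages_with_outlier))  # 59.8 57.0
```

The mean moves by more than three years, while the median stays at 57.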

What else do I need to know about the mean?

The population mean is indicated by the Greek symbol µ (pronounced ‘mu’). When the mean is calculated on a distribution from a sample it is indicated by the
symbol x̅ (pronounced X-bar).

How does the shape of a distribution influence the Measures of Central Tendency?

Symmetrical distributions:

When a distribution is symmetrical, the mode, median and mean are all in the middle of the distribution. The following graph shows a larger retirement age
dataset with a distribution which is symmetrical. The mode, median and mean all equal 58 years.

DISPERSION IN STATISTICS

Dispersion in statistics is a way of describing how spread out a set of data is. Dispersion is the extent to which data
are scattered, stretched, or spread out. Measuring it means finding the range of values that can be expected
from the set of data for a specific variable. In other words, dispersion describes how much numeric
data are likely to vary around an average value.

Measuring dispersion makes a dataset easier to understand, because its spread can be summarised by specific
measures such as the variance, the standard deviation and the range.

Measures of dispersion help one to judge the quality of data in an objectively quantifiable
manner. Most data science courses start with the basics of statistics, and dispersion is one concept that you
cannot afford to skip.

Measures of Dispersion

Most measures of dispersion are expressed in the same units as the quantity being measured (the variance, which is in
squared units, is an exception). Several measures help us to get more insight into the data:

1. Range 
2. Variance 
3. Standard Deviation 
4. Skewness 
5. IQR  

Types of Measures of Dispersion

The measures of dispersion in statistics are divided into two main categories, which offer different ways of measuring
the diverse nature of data. They are widely used in fields such as biological statistics. We can classify them by
checking whether they carry units or not.

On this basis, the measures fall into two categories:

• Absolute Measures of Dispersion
• Relative Measures of Dispersion

Absolute Measures of Dispersion

An absolute measure of dispersion carries units; it has the same unit as the original dataset. It is typically
expressed as an average of the deviations, such as the standard deviation or the mean deviation. An
absolute measure can be expressed in units such as rupees, centimetres, marks, kilograms, or whatever
quantity is being measured.

Types of Absolute Measures of Dispersion in Statistics:

1. Range: The range is the difference between the largest and smallest values in the data. It is
the simplest measure of dispersion.

Example: 1, 2, 3, 4, 5, 6, 7

Range = Highest value – Lowest value
      = 7 – 1 = 6

2. Mean (μ): The mean is the average of the numbers. To calculate it, add all the values and then divide
by the total number of terms. The mean is not itself a measure of dispersion, but the variance, standard deviation
and mean deviation below are measured from it.

Example: 1, 2, 3, 4, 5, 6, 7, 8

Mean = (sum of all the terms) / (total number of terms)
     = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8) / 8
     = 36 / 8
     = 4.5

3. Variance (σ²): The variance is the sum of the squared distances of each term
in the distribution from the mean, divided by the total number of terms in the distribution.

It shows how far a value, for example a student's mark in an exam, typically lies from the mean of the entire class.

Formula:

σ² = ∑ (X − μ)² / N

4. Standard Deviation (σ): The standard deviation is the square root of the variance, so to find the standard deviation
of any data you need to find the variance first. Because it is expressed in the same units as the data, it is the most
commonly used measure of dispersion.

Formula:

σ = √(σ²) = √( ∑ (X − μ)² / N )

5. Quartile: Quartiles divide the ordered list of data into quarters. Q1 is the lower quartile, Q2 is the median and
Q3 is the upper quartile.

6. Quartile Deviation: The difference between the upper and lower quartiles is the interquartile range. The
quartile deviation, also known as the semi-interquartile range, is half of this difference.

Formula:

Interquartile Range = Q3 – Q1
Quartile Deviation = (Q3 – Q1) / 2

7. Mean deviation: The mean deviation, also known as the average deviation, is the arithmetic mean of the absolute
deviations of the items from a measure of central tendency. It can be computed about the mean or the median of the data.

Formula:

• Mean Deviation using Mean: ∑ | X – M | / N, where M is the mean
• Mean Deviation using Median: ∑ | X – X1 | / N, where X1 is the median
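
The absolute measures in this list can be sketched with Python's standard statistics module, using the example values 1 to 7 from the range example. This is an illustrative sketch: the variance and standard deviation use the population form (dividing by N), as in the formula given for the variance, and the quartiles use the default "exclusive" method of statistics.quantiles. Quartile conventions differ between textbooks and software, so other tools may report slightly different quartiles.

```python
from statistics import mean, median, pstdev, pvariance, quantiles

data = [1, 2, 3, 4, 5, 6, 7]
count = len(data)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)            # 6

# Population variance and standard deviation (divide by N).
mu = mean(data)                               # 4
variance = pvariance(data)                    # 4
std_dev = pstdev(data)                        # 2.0

# Quartiles (default "exclusive" method), interquartile range, quartile deviation.
q1, q2, q3 = quantiles(data, n=4)             # 2.0, 4.0, 6.0
iqr = q3 - q1                                 # 4.0
quartile_deviation = iqr / 2                  # 2.0

# Mean deviation about the mean and about the median.
md_about_mean = sum(abs(x - mu) for x in data) / count
med = median(data)
md_about_median = sum(abs(x - med) for x in data) / count

print(data_range, variance, std_dev, iqr, quartile_deviation)
print(round(md_about_mean, 3), round(md_about_median, 3))
```
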
Relative Measures of Dispersion

Relative measures of dispersion are values without units. A relative measure of dispersion is used to
compare the spread of two or more datasets.

A relative measure describes the same spread as the corresponding absolute measure; the only
difference is that it is expressed as a unit-free ratio rather than in the original units.

A relative measure of dispersion is calculated as a co-efficient of dispersion. It is used when two series
that differ widely in their averages are compared.

The main use of a co-efficient of dispersion is to compare two series with different measurement units.

Types of Relative Measures of Dispersion:

1. Co-efficient of Range: This is the ratio of the difference between the largest and smallest terms of the
distribution to the sum of the largest and smallest terms of the distribution.

Formula:

Co-efficient of Range = (L – S) / (L + S)
where L = largest value
      S = smallest value

2. Co-efficient of Variation: The coefficient of variation is used to compare two datasets with respect to homogeneity
or consistency.

Formula:

C.V. = (σ / x̅) × 100
where σ = standard deviation
      x̅ = mean

3. Co-efficient of Standard Deviation: The co-efficient of Standard Deviation is the ratio of the standard
deviation to the mean of the distribution of terms.

Formula:

Co-efficient of Standard Deviation = σ / x̅
where σ = √( ∑ (X – x̅)² / (N – 1) )
      (X – x̅) = deviation of each term from the mean
      σ = standard deviation (sample form)
      x̅ = mean
      N = total number of terms

4. Co-efficient of Quartile Deviation: The co-efficient of Quartile Deviation is the ratio of the difference between
the upper quartile and the lower quartile to the sum of the upper quartile and lower quartile.

Formula:

Co-efficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1)
where Q3 = upper quartile
      Q1 = lower quartile

5. Co-efficient of Mean Deviation: The co-efficient of Mean Deviation is the ratio of the mean deviation to the
average from which it is measured, so it can be computed using the mean or the median of the data.

Co-efficient of Mean Deviation using Mean: ( ∑ | X – M | / N ) / M, where M is the mean

Co-efficient of Mean Deviation using Median: ( ∑ | X – X1 | / N ) / X1, where X1 is the median
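
The relative measures in this list can be sketched in the same way. The example data (1 to 7) are reused from the range example. The sketch assumes the population standard deviation (dividing by N), and the quartiles come from the default "exclusive" method of statistics.quantiles, so other conventions may give slightly different figures.

```python
from statistics import mean, median, pstdev, quantiles

data = [1, 2, 3, 4, 5, 6, 7]
count = len(data)

# Co-efficient of range: (L - S) / (L + S).
largest, smallest = max(data), min(data)
coeff_range = (largest - smallest) / (largest + smallest)     # 0.75

# Co-efficient of standard deviation and co-efficient of variation.
mu = mean(data)
sigma = pstdev(data)             # population form (assumption of this sketch)
coeff_std_dev = sigma / mu                                    # 0.5
coeff_variation = coeff_std_dev * 100                         # 50.0

# Co-efficient of quartile deviation: (Q3 - Q1) / (Q3 + Q1).
q1, _, q3 = quantiles(data, n=4)
coeff_quartile_dev = (q3 - q1) / (q3 + q1)                    # 0.5

# Co-efficient of mean deviation about the mean and about the median.
med = median(data)
coeff_md_mean = (sum(abs(x - mu) for x in data) / count) / mu
coeff_md_median = (sum(abs(x - med) for x in data) / count) / med

print(coeff_range, coeff_std_dev, coeff_variation, coeff_quartile_dev)
print(round(coeff_md_mean, 4), round(coeff_md_median, 4))
```

All of these results are unit-free, which is what makes them comparable across datasets measured in different units.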

These formulas come in handy when calculating different aspects of data. When you use Python for data
science, this becomes easier, as the programming language offers various statistical packages for these calculations.

Why dispersion is important in statistics

Knowledge of dispersion is vital to understanding statistics. It helps to explain how diverse the data
are and how they are spread around the central value, or central tendency.

Moreover, dispersion in statistics provides us with a way to get better insights into data distribution. 

For example, three distinct samples can have the same mean, median, or range but completely different levels of variability.

How to Calculate Dispersion

Dispersion can be calculated using the various measures described under Types of
Measures of Dispersion above. Before measuring the data, it is important to understand how
the terms differ and vary.

One can use the following measures to calculate the dispersion:

• Mean
• Standard deviation
• Variance
• Quartile deviation

For example, let us consider two datasets:

• Data A: 97, 98, 99, 100, 101, 102, 103
• Data B: 70, 80, 90, 100, 110, 120, 130
On calculating the mean and median of the two datasets, both have the same value, which is 100. However, the rest
of the dispersion measures are totally different as measured by the above methods.  

For instance, the range of B (60) is 10 times the range of A (6).
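
This comparison can be reproduced with a short sketch. The summarize helper is a hypothetical name introduced here for illustration, and the standard deviation uses the population form.

```python
from statistics import mean, median, pstdev

data_a = [97, 98, 99, 100, 101, 102, 103]
data_b = [70, 80, 90, 100, 110, 120, 130]

def summarize(values):
    """Return the centre and spread of a dataset as a dictionary."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "std_dev": pstdev(values),  # population standard deviation
    }

summary_a = summarize(data_a)
summary_b = summarize(data_b)

print(summary_a)  # mean 100, median 100, range 6, std_dev 2.0
print(summary_b)  # mean 100, median 100, range 60, std_dev 20.0
```

Both datasets share the same centre, but every measure of spread for B is ten times that of A.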

How to represent Dispersion in Statistics 

Dispersion in statistics can be represented in the form of graphs and charts. Some of the different ways used
include:

• Dot plots
• Box plots
• Stem-and-leaf plots

Example: What is the variance of the values 3, 8, 6, 10, 12, 9, 11, 10, 12, 7?

The variance of the values can be calculated using the following formula:

μ = 88 / 10 = 8.8
σ² = ∑ (X − μ)² / N = 73.6 / 10 = 7.36
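
The worked variance example can be checked step by step with a small sketch (variable names are illustrative; the population form, dividing by N, is used as in the formula):

```python
from statistics import pvariance

values = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]
count = len(values)

# Step 1: the mean of the values.
mu = sum(values) / count                       # 8.8

# Step 2: squared distance of each value from the mean.
squared_deviations = [(x - mu) ** 2 for x in values]

# Step 3: average of the squared distances (population variance).
variance = sum(squared_deviations) / count

print(round(mu, 2), round(variance, 2))        # 8.8 7.36

# Cross-check against the standard library.
library_variance = pvariance(values)
```
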
What is an example of dispersion?

One example of dispersion outside the world of statistics is the rainbow, where white light is split into 7
different colours separated by wavelength.

Some statistical ways of measuring dispersion are:

• Standard deviation
• Range
• Mean absolute difference
• Median absolute deviation
• Interquartile range
• Average deviation

Conclusion:

Dispersion in statistics refers to the variability of data or terms. Some of this variability may come from random
measurement error, for example when instrument readings are imprecise.

It is a statistical way of describing how the terms are spread out in different data sets. The more widely the values
differ, the more scattered the data are. A data set may contain anything from 5 to 10
values to 1,000 to 10,000 values, and its spread is summarised by descriptive
statistics. Measures of dispersion in statistics can be represented using a dot plot, a box plot, and other kinds of
graph. Dispersion and other concepts in statistics are taught in the introductory course of the KnowledgeHut Python with data
science program.
