
Measure of Dispersion

How Are the Measures of Dispersion Different from the Measures of Central Tendency?
For any given data set, the mean, median, and mode represent the measures of central
tendency. Measures of central tendency tell us where the center of the distribution lies along the
horizontal axis of a histogram. For example, the relationship between the mean and the median of a
data set provides information about the skew of the distribution. While the measures of central
tendency locate the center of the data, the measures of dispersion tell us how widely the data are
spread around that center. They are concerned with how much discrepancy (distance or difference)
is present between the average value of the data set and each individual value. The measures of
dispersion also provide information about the clustering of data points.
Dispersion Definition
In statistics, dispersion refers to how the data is spread out, how widely or narrowly it is
scattered on a plot, or how much variability is present in the data points when compared to the
mean or average value of all data points. Dispersion is also referred to as the variation in the data.
Statistical analysis of data yields two distinct groups of summary information. One group is
called measures of central tendency and the other is called measures of dispersion.
What is Measure of Dispersion in Statistics?
Measures of dispersion help to describe the variability in data. Dispersion is a statistical
term that can be used to describe the extent to which data is scattered. Thus, measures of dispersion
are certain types of measures that are used to quantify the dispersion of data.
Measures of dispersion are non-negative real numbers that help to gauge the spread of data
about a central value. These measures help to determine how stretched or squeezed the given data
is. The five most commonly used measures of dispersion are range, variance, standard deviation,
mean deviation, and quartile deviation.
The most important use of measures of dispersion is that they help to get an understanding
of the distribution of data. As the data becomes more diverse, the value of the measure of
dispersion increases.
Types of Measures of Dispersion
The measures of dispersion can be classified into two broad categories. These are absolute
measures of dispersion and relative measures of dispersion. Range, variance, standard deviation,
mean deviation, and quartile deviation fall under the category of absolute measures of dispersion.
These measures are expressed in the units of the data being scrutinized (the variance, in squared
units). Coefficients of dispersion are relative measures of dispersion. Such dispersion measures
are always dimensionless. The upcoming sections will further elaborate on these measures.
Absolute Measures of Dispersion
If the dispersion of data within a single experiment has to be determined, then absolute
measures of dispersion should be used. These measures usually express variation in a data set in
terms of the average of the deviations of the observations from a central value. The most commonly
used absolute measures of dispersion are listed below:
1. Range: Given a data set, the range can be defined as the difference between the maximum
value and the minimum value.
2. Variance: The average squared deviation from the mean of the given data set is known as
the variance. This measure of dispersion checks the spread of the data about the mean.
3. Standard Deviation: The square root of the variance gives the standard deviation. Thus,
the standard deviation also measures the variation of the data about the mean.
4. Mean Deviation: The mean deviation gives the average of the data's absolute deviation
about the central points. These central points could be the mean, median, or mode.
5. Quartile Deviation: Quartile deviation can be defined as half of the difference between
the third quartile and the first quartile in a given data set.
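The five definitions above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the data set `scores` is made up, the variance is the population form (divide by n), the mean deviation is taken about the mean, and the quartiles use the "inclusive" interpolation method, which is one of several conventions.

```python
import statistics


def absolute_dispersion(data):
    """Return the five absolute measures of dispersion for a list of numbers."""
    mean = statistics.fmean(data)
    # Range: maximum value minus minimum value.
    value_range = max(data) - min(data)
    # Variance (population form): average squared deviation from the mean.
    variance = statistics.pvariance(data)
    # Standard deviation: square root of the variance.
    std_dev = statistics.pstdev(data)
    # Mean deviation about the mean: average absolute deviation.
    mean_dev = sum(abs(x - mean) for x in data) / len(data)
    # Quartile deviation: half the difference between Q3 and Q1.
    # Assumption: "inclusive" quartiles; other conventions give other values.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    quartile_dev = (q3 - q1) / 2
    return {
        "range": value_range,
        "variance": variance,
        "std_dev": std_dev,
        "mean_deviation": mean_dev,
        "quartile_deviation": quartile_dev,
    }


# Hypothetical data set used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
measures = absolute_dispersion(scores)
print(measures)
```

For this data the mean is 5, so the range is 7, the variance 4, the standard deviation 2, the mean deviation 1.5, and the quartile deviation 0.75.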
Relative Measures of Dispersion
If separate data sets have different units and need to be compared, then relative
measures of dispersion are used. These measures are expressed as ratios or percentages, which
makes them unitless. Some of the relative measures of dispersion are given below:
1. Coefficient of Range: It is the ratio of the difference between the highest and lowest value
in a data set to the sum of the highest and lowest value.
2. Coefficient of Variation: It is the ratio of the standard deviation to the mean of the data
set. It is expressed in the form of a percentage.
3. Coefficient of Mean Deviation: This can be defined as the ratio of the mean deviation to
the value of the central point from which it is calculated.
4. Coefficient of Quartile Deviation: It is the ratio of the difference between the third
quartile and the first quartile to the sum of the third and first quartiles.
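The four coefficients above can be sketched the same way. As before, this is only an illustration under stated assumptions: the data set is hypothetical, the population standard deviation is used, the mean deviation is taken about the mean, and the quartiles use the "inclusive" convention.

```python
import statistics


def relative_dispersion(data):
    """Return the four relative (unitless) measures of dispersion."""
    highest, lowest = max(data), min(data)
    mean = statistics.fmean(data)
    std_dev = statistics.pstdev(data)
    # Mean deviation about the mean (the chosen central point).
    mean_dev = sum(abs(x - mean) for x in data) / len(data)
    # Assumption: "inclusive" quartiles; other conventions give other values.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        # (highest - lowest) / (highest + lowest)
        "coefficient_of_range": (highest - lowest) / (highest + lowest),
        # Standard deviation over the mean, as a percentage.
        # Undefined when the mean is zero.
        "coefficient_of_variation_pct": 100 * std_dev / mean,
        # Mean deviation over the central point it was measured from.
        "coefficient_of_mean_deviation": mean_dev / mean,
        # (Q3 - Q1) / (Q3 + Q1)
        "coefficient_of_quartile_deviation": (q3 - q1) / (q3 + q1),
    }


# Hypothetical data set used only for illustration.
coefficients = relative_dispersion([2, 4, 4, 4, 5, 5, 7, 9])
print(coefficients)
```

Because every coefficient is a ratio of two quantities in the same unit, the results can be compared across data sets measured in different units.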

Measures of Dispersion Formula


Measures of dispersion are used when we want to find the scattering of data about a central point
such as the mean. The general formulas used to calculate the various measures of dispersion are
given below:
Absolute Measures of Dispersion
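The formulas for this heading did not survive in this copy. The block below restates the verbal definitions given earlier in symbols. The notation is an assumption: x_i are the n observations, \bar{x} is their mean, a is the chosen central point (mean, median, or mode), and Q_1 and Q_3 are the first and third quartiles. The variance is shown in its population form; the sample form divides by n - 1 instead of n.

```latex
\begin{align*}
\text{Range} &= x_{\max} - x_{\min} \\
\text{Variance: } \sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\text{Standard deviation: } \sigma &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\text{Mean deviation: } \mathrm{MD} &= \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - a \rvert \\
\text{Quartile deviation: } \mathrm{QD} &= \frac{Q_3 - Q_1}{2}
\end{align*}
```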
Relative Measures of Dispersion
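The formulas for this heading did not survive in this copy either. The block below restates the verbal definitions of the four coefficients in symbols. The notation is an assumption: x_max and x_min are the highest and lowest values, \sigma is the standard deviation, \bar{x} is the mean, MD is the mean deviation about a central point a, and Q_1 and Q_3 are the first and third quartiles.

```latex
\begin{align*}
\text{Coefficient of range} &= \frac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}} \\
\text{Coefficient of variation} &= \frac{\sigma}{\bar{x}} \times 100\% \\
\text{Coefficient of mean deviation} &= \frac{\mathrm{MD}}{a} \\
\text{Coefficient of quartile deviation} &= \frac{Q_3 - Q_1}{Q_3 + Q_1}
\end{align*}
```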

Examples
Measures of Dispersion for Grouped Data
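The worked example for this heading is missing from this copy. As a stand-in, here is a minimal sketch of the usual approach for grouped data: each class is represented by its midpoint, weighted by its frequency. The class intervals and frequencies are invented for illustration, and the variance is the population form.

```python
def grouped_dispersion(classes):
    """Dispersion for grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    Each class is represented by its midpoint (an approximation).
    """
    total = sum(freq for _, _, freq in classes)
    midpoints = [(low + high) / 2 for low, high, _ in classes]
    freqs = [freq for _, _, freq in classes]
    # Frequency-weighted mean of the class midpoints.
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / total
    # Frequency-weighted average squared deviation from that mean.
    variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / total
    # Range: upper limit of the highest class minus lower limit of the lowest.
    value_range = max(high for _, high, _ in classes) - min(low for low, _, _ in classes)
    return {
        "range": value_range,
        "mean": mean,
        "variance": variance,
        "std_dev": variance ** 0.5,
    }


# Hypothetical frequency table: (lower, upper, frequency).
table = [(0, 10, 2), (10, 20, 3), (20, 30, 3), (30, 40, 2)]
print(grouped_dispersion(table))
```

For this table the midpoints are 5, 15, 25, and 35, giving a mean of 20, a variance of 105, and a range of 40.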
Descriptive Statistics - Measures of Shape
Not everything that can be counted counts, and not everything that counts can be counted.
The moments of a function are quantitative measures related to the shape of the function's
graph. If the function is a probability distribution, then the zeroth moment is the total probability
(i.e. one), the first moment is the expected value or Mean, the second central moment is the
Variance, the third standardized moment is the Skewness, and the fourth standardized moment is
the Kurtosis.
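To make the relationship between moments and shape concrete, the following sketch computes the mean, variance, skewness, and kurtosis of a small data set directly from the moment definitions. The function names and the data are hypothetical, and the moments are the plain population versions with no sample-size correction.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean)**k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def standardized_moment(data, k):
    """k-th central moment divided by the standard deviation to the power k."""
    sigma = central_moment(data, 2) ** 0.5
    return central_moment(data, k) / sigma ** k


# Hypothetical data set used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)            # first moment
variance = central_moment(data, 2)      # second central moment
skewness = standardized_moment(data, 3)  # third standardized moment
kurtosis = standardized_moment(data, 4)  # fourth standardized moment

print(mean, variance, skewness, kurtosis)
```

For this data the mean is 5 and the variance is 4, the skewness is positive (about 0.66, a longer right tail), and the kurtosis is about 2.78, slightly below the value of 3 that a normal distribution has.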
The shape of a distribution can be easily observed through Histograms and Density plots.
Two important Measures of Shape are:
Skewness
Skewness is a measure of the asymmetry of a distribution. A distribution is asymmetrical
when its left and right side are not mirror images.
A distribution can have right (or positive), left (or negative), or zero skewness. A
right-skewed distribution is longer on the right side of its peak, and a left-skewed distribution
is longer on the left side of its peak.
What is zero skew?
When a distribution has zero skew, it is symmetrical. Its left and right sides are mirror images.
Normal distributions have zero skew, but they’re not the only distributions with zero skew.
Any symmetrical distribution, such as a uniform distribution or some bimodal (two-peak)
distributions, will also have zero skew.
The easiest way to check if a variable has a skewed distribution is to plot it in a histogram.
For example, the weights of six-week-old chicks are shown in the histogram below.
The distribution is approximately symmetrical, with the observations distributed similarly
on the left and right sides of its peak. Therefore, the distribution has approximately zero skew.
In a distribution with zero skew, the mean and median are equal.
Zero skew: mean = median
For example, the mean chick weight is 261.3 g, and the median is 258 g. The mean and
median are almost equal. They aren’t perfectly equal because the sample distribution has a very
small skew.
Although a theoretical distribution (e.g., the z distribution) can have zero skew, real data
almost always have at least a bit of skew. However, if a distribution is close to being symmetrical,
it usually is considered to have zero skew for practical purposes, such as verifying model
assumptions.
What is right skew (positive skew)?
A right-skewed distribution is longer on the right side of its peak than on its left. Right
skew is also referred to as positive skew.
You can think of skewness in terms of tails. A tail is a long, tapering end of a distribution.
It indicates that there are observations at one of the extreme ends of the distribution, but that they’re
relatively infrequent. A right-skewed distribution has a long tail on its right side.
The number of sunspots observed per year, shown in the histogram below, is an example
of a right-skewed distribution. The sunspots, which are dark, cooler areas on the surface of the sun,
were observed by astronomers between 1749 and 1983.
The distribution is right-skewed because it’s longer on the right side of its peak. There is a
long tail on the right, meaning that every few decades there is a year when the number of sunspots
observed is a lot higher than average.
The mean of a right-skewed distribution is almost always greater than its median. That’s
because extreme values (the values in the tail) affect the mean more than the median.
Right skew: mean > median
For example, the mean number of sunspots observed per year was 48.6, which is greater
than the median of 39.
What is left skew (negative skew)?
A left-skewed distribution is longer on the left side of its peak than on its right. In other
words, a left-skewed distribution has a long tail on its left side. Left skew is also referred to as
negative skew.
Test scores often follow a left-skewed distribution, with most students performing
relatively well and a few students performing far below average. The histogram below shows
scores for the zoology portion of a standardized test taken by Indian students at the end of high
school.
The distribution is left-skewed because it’s longer on the left side of its peak. The long tail
on its left represents the small proportion of students who received very low scores.

The mean of a left-skewed distribution is almost always less than its median.
Left skew: mean < median
For example, the mean zoology test score was 53.7, which is less than the median of 55.
How to calculate skewness:
There are several formulas to measure skewness. One of the simplest is Pearson’s median
skewness. It takes advantage of the fact that the mean and median are unequal in a skewed
distribution.
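The formula itself did not survive in this copy. Pearson's median skewness is commonly written as 3 × (mean − median) / standard deviation, and the sketch below implements that definition. The function name and the sample data are hypothetical, and the sample standard deviation is used as an assumption.

```python
import statistics


def pearson_median_skewness(data):
    """Pearson's median skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.fmean(data)
    median = statistics.median(data)
    # Assumption: sample standard deviation (divides by n - 1).
    std_dev = statistics.stdev(data)
    return 3 * (mean - median) / std_dev


# Hypothetical data sets used only for illustration.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # long tail on the right
left_skewed = [0, 6, 7, 7, 7, 8, 8, 9]     # long tail on the left
symmetric = [1, 2, 3, 4, 5]

print(pearson_median_skewness(right_skewed))  # positive: mean > median
print(pearson_median_skewness(left_skewed))   # negative: mean < median
print(pearson_median_skewness(symmetric))     # zero: mean == median
```

The sign of the result matches the rules stated earlier: positive when the mean exceeds the median, negative when it falls below, and zero when the two are equal.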

Kurtosis
Kurtosis is a measure of the tailedness of a distribution. Tailedness is how often outliers
occur. Excess kurtosis is the tailedness of a distribution relative to a normal distribution.
• Distributions with medium kurtosis (medium tails) are mesokurtic.
• Distributions with low kurtosis (thin tails) are platykurtic.
• Distributions with high kurtosis (fat tails) are leptokurtic.
Tails are the tapering ends on either side of a distribution. They represent the probability or
frequency of values that are extremely high or low compared to the mean. In other words, tails
represent how often outliers occur.
Types of kurtosis
Distributions can be categorized into three groups based on their kurtosis:
What is a mesokurtic distribution?
A mesokurtic distribution is medium-tailed, so outliers are neither highly frequent, nor highly
infrequent.
Kurtosis is measured in comparison to normal distributions.
Normal distributions have a kurtosis of 3, so any distribution with a kurtosis of approximately 3 is
mesokurtic.
Often, kurtosis is described in terms of excess kurtosis, which is kurtosis − 3. Since normal
distributions have a kurtosis of 3, excess kurtosis makes comparing a distribution’s kurtosis to a
normal distribution even easier:
Normal distributions have an excess kurtosis of 0, so any distribution with an excess kurtosis of
approximately 0 is mesokurtic.
Mesokurtic distribution example
On average, a female baby elephant weighs an impressive 210 lbs at birth. Suppose that a zoologist
is interested in the distribution of elephant birth weights, so she contacts zoos and sanctuaries
around the world and asks them to share their data. She collects birth weight data for 400 female
baby elephants:

From the graph, we can see that the frequency distribution (shown by the gray bars)
approximately follows a normal distribution (shown by the green curve). Normal distributions are
mesokurtic.
The zoologist calculates the kurtosis of the sample. She finds that the kurtosis is 3.09 and
the excess kurtosis is 0.09, and she concludes that the distribution is mesokurtic.
Note: Although a population’s probability distribution can have a kurtosis of exactly 3, real data
is almost always at least slightly platykurtic or leptokurtic. If a sample has a kurtosis of
approximately 3, you can assume it’s drawn from a mesokurtic population.
Mesokurtic distributions have outliers that are neither highly frequent, nor highly infrequent, and
this is true of the elephant birth weights. Occasionally, a female baby elephant will be born
weighing less than 180 or more than 240 lbs.
What is a platykurtic distribution?
A platykurtic distribution is thin-tailed, meaning that outliers are infrequent.
Platykurtic distributions have less kurtosis than a normal distribution. In other words,
platykurtic distributions have:
• A kurtosis of less than 3
• An excess kurtosis of less than 0
Platykurtosis is sometimes called negative kurtosis, since the excess kurtosis is negative.
Note: The “platy” in “platykurtosis” comes from the Greek word platús, which means flat.
Although many platykurtic distributions have a flattened peak, some platykurtic distributions have
a pointy peak. Statisticians now understand that kurtosis is a measure of tailedness, not
“peakedness.”
A trick to remember the meaning of “platykurtic” is to think of a platypus with a thin tail.
Platykurtic distribution example
A sociologist is studying the social media use of students at a small high school. There are
400 students at the school, ranging in age from 14 to 18 years old:

The frequency distribution (shown by the gray bars) doesn’t follow a normal distribution
(shown by the dotted green curve). Instead, it approximately follows a uniform distribution (shown
by the purple curve). Uniform distributions are platykurtic.
The sociologist calculates that the kurtosis of the sample is 1.78 and its excess kurtosis is
−1.22. He concludes that the distribution is platykurtic.
Platykurtic distributions have a low frequency of outliers. Uniform distributions, like the
distribution of students’ ages, are a pronounced example of platykurtic distributions because
outliers are not just rare but completely absent. There are no students younger than 14 or older
than 18 years.
Note: In the graph above, notice that on the far left and right sides of the distribution—the tails—
the space below the uniform distribution curve (purple) is thinner than the space below the normal
distribution curve (green). This is what is meant by “thin tails.”
What is a leptokurtic distribution?
A leptokurtic distribution is fat-tailed, meaning that there are a lot of outliers.
Leptokurtic distributions are more kurtotic than a normal distribution. They have:
• A kurtosis of more than 3
• An excess kurtosis of more than 0
Leptokurtosis is sometimes called positive kurtosis, since the excess kurtosis is positive.
Note: The “lepto” in “leptokurtosis” comes from the Greek word leptós, which means narrow.
Like platykurtosis, this is a misnomer because it defines kurtosis in terms of “peakedness” instead
of tailedness.
A trick to remember the meaning of “leptokurtic” is to think of a leaping kangaroo with a fat tail.
Leptokurtic distribution example
Imagine that four astronomers are all trying to measure the distance between the Earth and Nu2
Draconis A, a blue star that’s part of the Draco constellation. Each of the four astronomers
measures the distance 100 times, and they put their data together in the same dataset:
The frequency distribution (shown by the gray bars) doesn’t follow a normal distribution
(shown by the dotted green curve). Instead, it approximately follows a Laplace distribution (shown
by the blue curve). Laplace distributions are leptokurtic.
The astronomers calculate that the kurtosis of the sample is 6.54 and its excess kurtosis is
3.54. They conclude that the distribution is leptokurtic.
Leptokurtic distributions have frequent outliers. The distribution of the astronomers’
measurements has more outliers than you would expect if the distribution were normal, with
several extreme observations that are less than 50 or more than 150 light-years.
Note: If you look closely at the graph above, you’ll notice that on the far left and right sides of the
distribution—the tails—the space below the Laplace distribution curve (blue) is slightly thicker
than the space below the normal distribution curve (green). This is what is meant by “fat tails.”
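The three categories can be summarized as a small classification rule on excess kurtosis. The sketch below is illustrative only. The text says "approximately 3" without giving a cutoff, so the `tolerance` parameter and its default of 0.5 are assumptions and should be chosen to suit the data.

```python
def classify_kurtosis(kurtosis, tolerance=0.5):
    """Classify a distribution by its kurtosis.

    tolerance is a hypothetical cutoff for "approximately 3";
    the source text does not specify one.
    """
    excess = kurtosis - 3  # excess kurtosis relative to a normal distribution
    if abs(excess) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if excess > 0 else "platykurtic"


# Kurtosis values from the three worked examples in the text.
print(classify_kurtosis(3.09))  # elephant birth weights
print(classify_kurtosis(1.78))  # student ages
print(classify_kurtosis(6.54))  # astronomers' distance measurements
```

With the default tolerance, the three example values are classified as mesokurtic, platykurtic, and leptokurtic respectively, matching the conclusions drawn in the examples.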
How to calculate kurtosis:
Mathematically speaking, kurtosis is the standardized fourth moment of a distribution.
Moments are a set of measurements that tell you about the shape of a distribution.
Moments are standardized by dividing them by the standard deviation raised to the
appropriate power.
Kurtosis of a population
The following formula describes the kurtosis of a population:
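The formula is missing from this copy. Following the definition just given (the fourth moment about the mean, standardized by the fourth power of the standard deviation), it can be written as below. The symbols are assumptions: X is the random variable, \mu its mean, \sigma its standard deviation, and \mu_4 its fourth central moment.

```latex
\[
\text{Kurtosis} = \tilde{\mu}_4 = \frac{\mu_4}{\sigma^4}
= \frac{E\left[(X - \mu)^4\right]}{\left(E\left[(X - \mu)^2\right]\right)^2}
\]
```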

Kurtosis of a sample
The kurtosis of a sample is an estimate of the kurtosis of the population.
It might seem natural to calculate a sample’s kurtosis as the fourth moment of the sample
divided by its standard deviation to the fourth power. However, this leads to a biased estimate.

The formula for the unbiased estimate of excess kurtosis includes a lengthy correction based on
the sample size:
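The formula is missing from this copy. The commonly used sample-size-corrected estimator of excess kurtosis is sketched below; it is offered as a likely reconstruction rather than a confirmed copy of the original. The notation is an assumption: n is the sample size, \bar{x} the sample mean, m_k the k-th sample central moment, and g_2 the uncorrected (biased) excess kurtosis. The estimator requires n > 3.

```latex
\begin{align*}
m_k &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k \\
g_2 &= \frac{m_4}{m_2^{2}} - 3 \\
G_2 &= \frac{n - 1}{(n - 2)(n - 3)}\left[(n + 1)\, g_2 + 6\right]
\end{align*}
```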
It’s time-consuming to calculate kurtosis by hand. For this reason, most people use
computer software to calculate it. For example, the KURT() function in Excel calculates the excess
kurtosis of a sample using this sample-size-corrected formula.
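For readers working outside a spreadsheet, the same calculation can be sketched in plain Python. This is a minimal illustration of the standard sample-size-corrected estimator of excess kurtosis, which is the quantity Excel documents for KURT. The function name and the data are hypothetical.

```python
def sample_excess_kurtosis(data):
    """Sample-size-corrected estimate of excess kurtosis.

    Steps:
      m2, m4 = second and fourth sample central moments (divide by n)
      g2     = m4 / m2**2 - 3            (uncorrected excess kurtosis)
      G2     = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    """
    n = len(data)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


# Hypothetical data set used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_excess_kurtosis(sample))
```

For this sample the uncorrected excess kurtosis is slightly negative (about −0.22), while the corrected estimate is about 0.94, which shows how large the correction can be for a small sample.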

