
Segment: Descriptive Statistics

Topic 1: Measures of Central Tendency

Topic 2: Measures of Variation

Topic 3: Measures of the Shape of Distributions

DESCRIPTIVE STATISTICS

Hello, I am Dr. Ganesh C. B. Welcome to this video on descriptive statistics.

In this video, we will cover three topics. The first topic is measures of central tendency, where we will explore the mean, median, and mode. In the second topic, we will explore measures of variation: the range, interquartile range, variance, standard deviation, and coefficient of variation. In the third topic, we will explore measures of the shape of distributions: skewness and kurtosis. Let us look at these topics in depth to gain a better understanding of descriptive statistics.

©COPYRIGHT 2023 (Ver. 1.0), ALL RIGHTS RESERVED. MANIPAL ACADEMY OF HIGHER EDUCATION

Let us first look at the learning objectives of this session.

At the end of this video, you will be able to:

• Interpret the descriptive statistics of a data set
• Assess the pros and cons of various measures of central location and variation


Topic 1: Measures of Central Tendency

Let's begin our exploration of the first topic, which is measures of central tendency.

A measure of central tendency represents the centre or middle of a set of data values: it is a single value that summarises the whole data set. The most commonly used measures of central tendency are the mean, median, and mode.

Let us start with the mean, which is the most commonly used measure of central tendency. The mean is often called the average.

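The formula to calculate the mean is:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where $X_i$ is the i-th observation and $n$ is the number of observations.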

To find the mean, take a data set such as:
8, 3, 2, 1, 1

First, calculate the sum of all these numbers:

8 + 3 + 2 + 1 + 1 = 15


Then divide the sum by the number of observations, n = 5:

$$\bar{X} = \frac{15}{5} = 3$$

So, the mean of the given data set is 3.
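The worked example above can be reproduced in a few lines of Python. This is a minimal sketch: `arithmetic_mean` is a hypothetical helper that applies the formula directly (sum divided by n), and the standard-library `statistics.mean` is shown only as a cross-check.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of the observations divided by their count n (hypothetical helper name)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [8, 3, 2, 1, 1]
print(arithmetic_mean(data))  # 3.0  (15 / 5)
print(mean(data))             # standard-library equivalent, same value
```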

The disadvantage of the mean as a measure of central tendency is that it is influenced by outliers. A single very high value can increase the mean of a set of numbers considerably. For example, if one MBA candidate is offered a high-paying overseas job, it increases the mean salary of the entire batch.
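The effect of an outlier on the mean can be seen by changing a single value. The video gives no actual salary figures, so the sketch below uses the example data set 8, 3, 2, 1, 1 and a hypothetical extreme value of 80 in place of the 8.

```python
from statistics import mean

data = [8, 3, 2, 1, 1]
# Hypothetical: one extreme value (80 instead of 8), like one unusually high salary.
with_outlier = [80, 3, 2, 1, 1]

print(mean(data))          # 3
print(mean(with_outlier))  # 17.4  (87 / 5): one value moved the mean a long way
```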

The median is the middle value of the data set when the data are arranged in ascending or descending order. Equivalently, the median is the value at the 50th percentile of the data set.

To find the median, take the same data set:
8, 3, 2, 1, 1
Arrange these numbers in ascending order: 1, 1, 2, 3, 8. There are five observations, so the middle value is the third observation, and the median of this data set is 2. Unlike the mean, the median is resistant to outliers: replacing the 8 with a much larger value would leave the median at 2.
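The procedure above (sort, then take the middle observation) can be sketched as follows. The function name `median_of` is hypothetical, and the even-count rule (averaging the two middle values) is the usual convention rather than something stated in the video; the outlier value 80 and the four-value data set are also hypothetical.

```python
def median_of(values):
    """Return the middle value of the sorted data (hypothetical helper name)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle observation, e.g. the 3rd of 5.
        return ordered[mid]
    # Even n: the usual convention averages the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([8, 3, 2, 1, 1]))   # 2    (sorted: 1, 1, 2, 3, 8)
print(median_of([80, 3, 2, 1, 1]))  # 2    (hypothetical outlier 80: median unchanged)
print(median_of([8, 3, 2, 1]))      # 2.5  (sorted: 1, 2, 3, 8, so (2 + 3) / 2)
```

Python's `statistics.median` follows the same even-count convention.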

©COPYRIGHT 2023 (Ver. 1.0), ALL RIGHTS RESERVED. MANIPAL ACADEMY OF HIGHER EDUCATION 5/12
DESCRIPTIVE STATISTICS

The mode is the value that occurs most frequently and represents the highest peak of the
distribution. It is a good measure of location when the variable is inherently categorical or
has otherwise been grouped into categories.

To find the mode, take the same data set:
8, 3, 2, 1, 1

Here, the number 1 appears twice, whereas every other value appears only once. So, the value that occurs most frequently in this data set of five observations is 1, and the mode of this data set is 1.
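Counting frequencies makes the mode easy to compute, and the same code works for categorical data, as the text notes. This is a sketch: `modes_of` is a hypothetical helper that returns every value tied for the highest count (a data set can have more than one mode), `statistics.multimode` is the standard-library equivalent, and the letter labels are made-up categories.

```python
from collections import Counter
from statistics import multimode


def modes_of(values):
    """Return every value that occurs most often (hypothetical helper name)."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes_of([8, 3, 2, 1, 1]))       # [1]
print(multimode([8, 3, 2, 1, 1]))      # [1]  (standard-library equivalent)
print(modes_of(["A", "B", "A", "C"]))  # ['A']  (hypothetical categorical labels)
```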


Topic 2: Measures of Variation

The second topic is measures of variation, where we will discuss the different ways of measuring how spread out the data are.

Range: It measures the spread of the data and is the difference between the largest and smallest values in the data set. Its drawback is that it is influenced by outliers, because it depends only on the two most extreme values.

The formula for calculating the range is:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
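Applied to the example data set 8, 3, 2, 1, 1, the formula gives a range of 8 − 1 = 7. The sketch below (with a hypothetical helper name, `value_range`) also shows how a single hypothetical outlier of 80 inflates the range.

```python
def value_range(values):
    """Range = largest value - smallest value (hypothetical helper name)."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


print(value_range([8, 3, 2, 1, 1]))   # 7
print(value_range([80, 3, 2, 1, 1]))  # 79  (hypothetical outlier 80 inflates the range)
```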


Interquartile Range: To address the range's sensitivity to outliers, the interquartile range (IQR) can be used as a measure of variation. The interquartile range is the difference between the 75th and 25th percentiles of the data set. Because it ignores the lowest 25% and the highest 25% of the values, extreme values do not affect it. For a set of data points arranged in order of magnitude, the p-th percentile is the value that has p% of the data points below it and (100 − p)% above it.
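A minimal IQR sketch using Python's `statistics.quantiles`, which returns the three quartile cut points when `n=4`. Percentile conventions differ between textbooks and tools; this sketch assumes `method="inclusive"` (linear interpolation between ordered values). With the module's default `method="exclusive"`, the same data give Q1 = 1.0 and Q3 = 5.5, so small data sets can produce noticeably different quartiles. The helper name `iqr` and the outlier value 80 are hypothetical.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile (hypothetical helper)."""
    # quantiles() sorts the data itself; n=4 returns [Q1, Q2, Q3].
    # "inclusive" is one of several percentile conventions (an assumption here).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


print(iqr([8, 3, 2, 1, 1]))   # 2.0  (Q1 = 1.0, Q3 = 3.0)
print(iqr([80, 3, 2, 1, 1]))  # 2.0  (hypothetical outlier 80: IQR unchanged)
```

Comparing this with the range example, the outlier changes the range from 7 to 79 but leaves the IQR unchanged.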


Variance and Standard Deviation: The most commonly used measure of the variation in a data set is the variance. It is defined as the mean of the squared deviations from the mean, so its value is never negative (it is zero only when all values are equal). The standard deviation is defined as the square root of the variance. The accompanying graph shows the distributions of two data sets: the green curve has a smaller variance, whereas the yellow curve has a larger variance, so the yellow curve is spread more widely than the green curve.
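Written as formulas in the same notation as the mean formula, the definitions read as follows. The symbols σ² and σ are introduced here for convenience and are not taken from the video; this is the population form, which divides by n. For the example data set 8, 3, 2, 1, 1 with mean 3:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2, \qquad \sigma = \sqrt{\sigma^2}$$

$$\sigma^2 = \frac{(8-3)^2 + (3-3)^2 + (2-3)^2 + (1-3)^2 + (1-3)^2}{5} = \frac{25 + 0 + 1 + 4 + 4}{5} = \frac{34}{5} = 6.8, \qquad \sigma = \sqrt{6.8} \approx 2.61$$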

The two measures differ in their units of measurement:
• The standard deviation is expressed in the same unit as the data.
• The variance is expressed in the square of that unit (for example, square kilograms if the data are weights in kilograms).

As a result, the standard deviation is a more convenient measure of variation than the variance, because it can be read directly in the units of the data.
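A short sketch computing both measures for the example data set 8, 3, 2, 1, 1, first directly from the definition and then with Python's `statistics` module as a cross-check. The video defines the variance as a mean of squared deviations (dividing by n), which matches `pvariance`/`pstdev`. Many texts instead divide by n − 1 when the data are a sample used to estimate a population; that version is `variance`/`stdev`, included here only for comparison.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [8, 3, 2, 1, 1]

# Population variance: mean of the squared deviations from the mean (divide by n).
mu = sum(data) / len(data)
pop_var = sum((x - mu) ** 2 for x in data) / len(data)
pop_sd = math.sqrt(pop_var)  # back in the same unit as the data

print(pop_var, pvariance(data))                   # 6.8 6.8
print(round(pop_sd, 3), round(pstdev(data), 3))   # 2.608 2.608

# Sample variance divides by n - 1 instead of n.
print(variance(data), round(stdev(data), 3))      # 8.5 2.915
```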

Coefficient of Variation: It is defined as the ratio of the standard deviation to the mean
expressed as a percentage and is a unitless measure of relative variability.

The coefficient of variation has the following advantages:

• It is unitless and is expressed as a percentage.


• It is a good measure for comparing the relative variation of two different data sets, even when they are measured in different units or have very different means.
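A minimal sketch of the coefficient of variation, CV = (standard deviation / mean) × 100. It assumes the population standard deviation, and it assumes a positive mean, since the ratio is not meaningful otherwise. To show that the CV is unitless, the example data set is rescaled by 1000 (a hypothetical unit change, such as recording kilograms as grams); the CV stays the same. The helper name is hypothetical.

```python
import math


def coefficient_of_variation(values):
    """CV = (population standard deviation / mean) * 100 (hypothetical helper)."""
    n = len(values)
    mu = sum(values) / n
    if mu <= 0:
        raise ValueError("this sketch assumes data with a positive mean")
    sd = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sd / mu * 100


data = [8, 3, 2, 1, 1]
scaled = [x * 1000 for x in data]  # hypothetical: same data in a 1000x smaller unit

print(round(coefficient_of_variation(data), 1))    # 86.9
print(round(coefficient_of_variation(scaled), 1))  # 86.9 (unit change does not matter)
```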

Topic 3: Measures of the Shape of Distributions

The third topic is measures of the shape of distributions, where we will discuss skewness and kurtosis.

Skewness: Skewness measures the asymmetry of a distribution: the deviations from the mean tend to be larger in one direction than in the other. It can be interpreted as the tendency for one tail of the distribution to be longer or heavier than the other.

Let us look at the different types of skewness:

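• Positive skew: The right (upper) tail is longer or heavier than the left, so the mean is typically pulled above the median.
• Symmetric distribution: Both tails are balanced and the skewness is zero. Real-life data sets are rarely perfectly symmetric, so their skewness is usually only close to zero.
• Negative skew: The left (lower) tail is longer or heavier than the right, so the mean is typically pulled below the median.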
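The video does not give a skewness formula. One common choice, assumed here, is the moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the mean squared and mean cubed deviations from the mean. SciPy's `scipy.stats.skew` with its default `bias=True` computes this same estimator. The mirrored and symmetric data sets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (an assumed formula)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


right_tailed = [8, 3, 2, 1, 1]            # the 8 stretches the right tail
left_tailed = [-x for x in right_tailed]  # hypothetical mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]               # hypothetical symmetric data

print(round(skewness(right_tailed), 2))  # 1.22
print(round(skewness(left_tailed), 2))   # -1.22
print(skewness(symmetric))               # 0.0
```

For the right-tailed example, the mean (3) lies above the median (2), which is consistent with the positive skewness value.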


Kurtosis: This is the final measure of the shape of distributions. It measures the relative peakedness or flatness of the curve defined by the frequency distribution, compared with a normal distribution.

Let us look at the different types of kurtosis:

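• Positive kurtosis: A curve that is more sharply peaked than the normal curve, usually with heavier tails, is said to have positive (excess) kurtosis.
• Normal kurtosis: The excess kurtosis is zero for a perfectly normal distribution.
• Negative kurtosis: A curve that is flatter than the normal curve, usually with lighter tails, is said to have negative (excess) kurtosis.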
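A sketch of excess kurtosis, assuming the common moment formula g2 = m4 / m2² − 3. Subtracting 3 makes a normal distribution score zero, matching the "normal kurtosis" case above. SciPy's `scipy.stats.kurtosis` with its defaults (`fisher=True`, `bias=True`) computes this same quantity. Both data sets are hypothetical: one is spread evenly (flat), and the other piles values at the centre with two extreme tails (peaked).

```python
def excess_kurtosis(values):
    """Excess kurtosis g2 = m4 / m2**2 - 3 (an assumed formula); normal is about 0."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


flat = list(range(1, 10))        # hypothetical: values 1..9 spread evenly
peaked = [-10] + [0] * 8 + [10]  # hypothetical: tall centre, two extreme tails

print(round(excess_kurtosis(flat), 2))    # -1.23  (negative kurtosis)
print(round(excess_kurtosis(peaked), 2))  # 2.0    (positive kurtosis)
```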


Summary

In this video, we discussed:

• Mean, median, and mode are the various measures of central location.
• Range, interquartile range, variance, standard deviation, and coefficient of variation are
the various measures of variation.
• Skewness and kurtosis are two measures of the shape of the data distribution.
