RM-Topic 1-Descriptive Statistics
In this video, we will cover three topics. The first topic is measures of central tendency. In
this topic, we will explore the concepts of mean, median and mode as measures of central
tendency. In the second topic, we will explore the measures of variation. This will involve the
©COPYRIGHT 2023 (Ver. 1.0), ALL RIGHTS RESERVED. MANIPAL ACADEMY OF HIGHER EDUCATION 2/12
DESCRIPTIVE STATISTICS
range, interquartile range, variance, standard deviation, and coefficient of variation. In the
third topic, we will explore the measures of the shape of distributions. This will involve
skewness and kurtosis. Let us look at these topics in-depth to gain a better understanding
of descriptive statistics.
Let's begin our exploration of the first topic, which is measures of central tendency.
A measure of central tendency represents the centre or middle of a set of data values.
It is a single value that summarizes the whole data set. The commonly used
measures of central tendency are Mean, Median, and Mode.
Let us start with the mean, which is the most commonly used measure of central
tendency. The mean is often called the average.
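To find the mean, first take a data set with numbers such as:
8, 3, 2, 1, 1
Now add the values and divide the total by the number of observations:
(8 + 3 + 2 + 1 + 1) / 5 = 15 / 5 = 3. The mean of this data set is 3.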
The disadvantage of using the mean as a measure of central tendency is that it is
influenced by outliers. A single very high number can increase the mean of a set of
numbers. For example, if one MBA candidate is offered a high-paying overseas job,
it increases the mean salary of the entire batch.
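To see how strongly one outlier can pull the mean, here is a minimal Python sketch. The salary figures are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
# Hypothetical salary offers for a small batch (units are arbitrary).
salaries = [10, 12, 11, 13, 12]

# Mean = sum of the values divided by the number of values.
mean_before = sum(salaries) / len(salaries)

# One candidate receives a much higher overseas offer (an outlier).
salaries_with_outlier = salaries + [80]
mean_after = sum(salaries_with_outlier) / len(salaries_with_outlier)

print(f"Mean without the outlier: {mean_before:.1f}")  # 11.6
print(f"Mean with the outlier:    {mean_after:.1f}")   # 23.0
```

In this made-up batch, a single offer roughly doubles the mean, even though the other five salaries are unchanged.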
Median is the middle value of the data set when the data is arranged in ascending or
descending order. In other words, the median can also be defined as the value at the 50th
percentile of the data set.
To find the median, first take a data set with numbers such as:
8, 3, 2, 1, 1
Now arrange these numbers in ascending order: 1, 1, 2, 3, 8. There are five
observations, so the middle value is the third observation. The median of this data
set is therefore 2. Unlike the mean, the median is hardly influenced by the presence
of outliers.
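The steps above can be sketched in Python with the standard `statistics` module. The data set with the value 800 is an assumption added for illustration, to compare how the mean and the median react to an outlier.

```python
from statistics import mean, median

data = [8, 3, 2, 1, 1]

# Arrange the values in ascending order and pick the middle one.
# This index trick is valid only for an odd number of observations.
ordered = sorted(data)                      # [1, 1, 2, 3, 8]
middle_value = ordered[len(ordered) // 2]

# Replace the largest value with an extreme one (illustrative outlier).
data_with_outlier = [800, 3, 2, 1, 1]

print(middle_value, median(data))           # 2 2
print(median(data_with_outlier))            # 2
print(mean(data), mean(data_with_outlier))  # 3 161.4
```

The median stays at 2 in both cases, while the mean jumps from 3 to 161.4.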
The mode is the value that occurs most frequently and represents the highest peak of the
distribution. It is a good measure of location when the variable is inherently categorical or
has otherwise been grouped into categories.
To find the mode, first take a data set with numbers such as:
8, 3, 2, 1, 1
Here, you can observe that the number 1 appears twice, whereas every other data point
appears only once. So, the value that occurs most frequently in this data set of five
observations is 1, and the mode of this data set is equal to 1.
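Counting how often each value occurs can be sketched in Python as follows. This is only one possible way to do it; the variable names are illustrative.

```python
from collections import Counter
from statistics import mode

data = [8, 3, 2, 1, 1]

# Count the occurrences of each value, then take the most frequent one.
counts = Counter(data)
most_common_value, frequency = counts.most_common(1)[0]

print(most_common_value, frequency)  # 1 2
print(mode(data))                    # 1
```

The same approach works for categorical values such as strings, which is where the mode is most useful.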
The second topic is measures of variation, where we will discuss the different measures
of variation.
Range: It measures the spread of the data. It is the difference between the largest and
smallest values in the data set. Its drawback is that it is influenced by outliers.
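As a small Python sketch, the range of the data set 8, 3, 2, 1, 1 used earlier can be computed as shown here. The second data set, with 80 in place of 8, is an assumption added to illustrate the effect of an outlier.

```python
data = [8, 3, 2, 1, 1]

# Range = largest value minus smallest value.
data_range = max(data) - min(data)

# Replace the largest value with an outlier (illustrative).
data_with_outlier = [80, 3, 2, 1, 1]
range_with_outlier = max(data_with_outlier) - min(data_with_outlier)

print(data_range)          # 7
print(range_with_outlier)  # 79
```

Because the range depends only on the two extreme values, a single outlier changes it completely.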
Interquartile Range: To reduce the influence of outliers, the interquartile range can be
used as a measure of variation. The interquartile range is the difference between the 75th
and 25th percentiles of the data set.
in the data set. For a set of data points arranged in order of magnitude, the 𝑝th percentile is
the value that has 𝑝% of the data points below it and (100 − 𝑝) % above it.
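A minimal Python sketch of the interquartile range is given here. Several conventions exist for computing percentiles on small data sets; this sketch assumes the "inclusive" method of the standard `statistics.quantiles` function, and other conventions can give different quartiles for the same data. The data set with the value 80 is an illustrative assumption.

```python
from statistics import quantiles


def interquartile_range(values):
    """Return Q3 - Q1 using the 'inclusive' quartile convention."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


data = [8, 3, 2, 1, 1]
data_with_outlier = [80, 3, 2, 1, 1]  # largest value replaced by an outlier

iqr = interquartile_range(data)
iqr_with_outlier = interquartile_range(data_with_outlier)

print(iqr)               # 2.0
print(iqr_with_outlier)  # 2.0
```

Under this convention the interquartile range is the same with and without the outlier, because it ignores the lowest and highest quarters of the data.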
Variance and Standard Deviation: To capture the variation in a data set, the most commonly
used measure is the variance. It is defined as the mean of the squared deviations from the
mean. Its value can never be negative. The standard deviation is defined as the square root
of the variance. The accompanying graph shows the distributions of two data sets. The
green curve has the smaller variance, whereas the yellow curve has the higher variance,
so the yellow curve is spread more widely than the green curve.
The two measures differ in their units of measurement:
• The standard deviation is expressed in the same unit of measurement as the data.
• The variance is expressed in squared units, not in the unit of measurement of the data.
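The definition can be followed step by step in a short Python sketch, using the data set 8, 3, 2, 1, 1 from the earlier examples. It divides by the number of observations, as in the definition given here; note that the sample variance, which divides by one less than the number of observations, is also widely used.

```python
from math import sqrt

data = [8, 3, 2, 1, 1]
n = len(data)

# Step 1: the mean of the data.
mean_value = sum(data) / n                                # 3.0

# Step 2: squared deviation of each value from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in data]

# Step 3: variance = mean of the squared deviations.
variance = sum(squared_deviations) / n

# Step 4: standard deviation = square root of the variance.
std_dev = sqrt(variance)

print(squared_deviations)  # [25.0, 0.0, 1.0, 4.0, 4.0]
print(f"{variance:.2f}")   # 6.80
print(f"{std_dev:.2f}")    # 2.61
```

If the data were in rupees, the variance would be in squared rupees, while the standard deviation would again be in rupees.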
Coefficient of Variation: It is defined as the ratio of the standard deviation to the mean
expressed as a percentage and is a unitless measure of relative variability.
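A minimal Python sketch of the coefficient of variation is given here. It assumes the population standard deviation (`statistics.pstdev`), and the rescaled data set is an illustrative assumption that shows why the measure is unitless.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return pstdev(values) / mean(values) * 100


data = [8, 3, 2, 1, 1]

# The same data expressed in a unit 100 times smaller,
# for example metres converted to centimetres.
data_rescaled = [x * 100 for x in data]

cv = coefficient_of_variation(data)
cv_rescaled = coefficient_of_variation(data_rescaled)

print(f"{cv:.2f}%")           # 86.92%
print(f"{cv_rescaled:.2f}%")  # 86.92%
```

Changing the unit changes both the standard deviation and the mean by the same factor, so their ratio stays the same.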
The third topic is measures of the shape of distributions, where we will discuss
skewness and kurtosis.
Skewness: It measures the asymmetry of a distribution, that is, the tendency of the
deviations from the mean to be larger in one direction than in the other. It can be
interpreted as the tendency for one tail of the distribution to be heavier than the
other.
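The lecture does not give a formula for skewness. One common choice, assumed in this Python sketch, is the moment coefficient of skewness: the third central moment divided by the second central moment raised to the power 1.5. The helper name `central_moment` is illustrative.

```python
def central_moment(values, k):
    """Mean of the k-th powers of the deviations from the mean."""
    mean_value = sum(values) / len(values)
    return sum((x - mean_value) ** k for x in values) / len(values)


data = [8, 3, 2, 1, 1]

# Moment coefficient of skewness: m3 / m2^(3/2).
skewness = central_moment(data, 3) / central_moment(data, 2) ** 1.5

print(f"{skewness:.2f}")  # 1.22
```

The positive value reflects the single large value 8, which makes the right tail heavier than the left. A perfectly symmetric data set gives a skewness of zero under this formula.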
Kurtosis: It is the final measure of the shape of distributions. It measures the
relative peakedness or flatness of the curve defined by the frequency distribution.
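The lecture does not give a formula for kurtosis either. A common choice, assumed in this Python sketch, is the fourth central moment divided by the square of the second central moment. Under this formula a normal distribution has a kurtosis of 3, so the difference from 3 is often reported as excess kurtosis. The helper name `central_moment` is illustrative.

```python
def central_moment(values, k):
    """Mean of the k-th powers of the deviations from the mean."""
    mean_value = sum(values) / len(values)
    return sum((x - mean_value) ** k for x in values) / len(values)


data = [8, 3, 2, 1, 1]

# Moment-based kurtosis: m4 / m2^2.
kurtosis = central_moment(data, 4) / central_moment(data, 2) ** 2

# Excess kurtosis compares the result with the normal distribution's value of 3.
excess_kurtosis = kurtosis - 3

print(f"{kurtosis:.2f}")         # 2.85
print(f"{excess_kurtosis:.2f}")  # -0.15
```

With only five observations the value is not very informative; the sketch is meant only to show how the calculation works.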
Summary
• Mean, median, and mode are the various measures of central location.
• Range, interquartile range, variance, standard deviation, and coefficient of variation are
the various measures of variation.
• Skewness and kurtosis are two measures of the shape of the data distribution.