STATISTICS
EXPERIMENT 1
1.1 Definition of Mean:
The mean, often referred to as the average, is a measure of central tendency used in
statistics to describe the average value of a set of numbers. It is calculated by adding up all
the values in a dataset and then dividing the sum by the total number of values in the
dataset. The formula for calculating the mean for a dataset of "n" values is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
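As a minimal sketch of the definition above, the mean can be computed by summing the values and dividing by their count. The function name `mean` and the values in `sample` are hypothetical and chosen only for illustration; they are not part of the experiment's data.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical dataset, used only to illustrate the calculation.
sample = [4, 8, 15, 16, 23, 42]
sample_mean = mean(sample)
print(sample_mean)  # 18.0
```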
Teacher’s Signature-
By – Sumit Kumar
1.2 Merits and Demerits of Mean
Merits:
Sensitive to All Data Points: It considers all values in the dataset, providing a
comprehensive summary.
Useful in Further Analysis: Mean is often used as a basis for more advanced statistical
analyses.
Demerits:
Sensitivity to Outliers: Outliers (extreme values) can significantly affect the mean,
making it less representative of the typical values in the dataset.
Not Suitable for Skewed Data: In skewed distributions, the mean may not accurately
represent the central value.
Data Distribution Matters: The mean may not be appropriate for datasets with
non-normally distributed data.
Let's say you have a dataset of ages grouped into intervals. Here's an example of how to
calculate the mean for this grouped data:
Age group    Frequency
0-10         6
11-20        12
21-30        8
31-40        5
41-50        4
Steps for Calculating the Mean
Step 1: Calculate the midpoint of each interval. The midpoint is the average of the lower and
upper bounds of the interval. For the five groups above, the midpoints are 5, 15.5, 25.5, 35.5
and 45.5.
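Step 2: Multiply the frequency of each group by its midpoint and calculate the sum.
(6 * 5) + (12 * 15.5) + (8 * 25.5) + (5 * 35.5) + (4 * 45.5) = 30 + 186 + 204 + 177.5 + 182 = 779.5
Step 3: Calculate the total frequency.
6 + 12 + 8 + 5 + 4 = 35
Step 4: Divide the sum from Step 2 by the total frequency.
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{779.5}{35} \approx 22.27$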
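The grouped-mean steps can be sketched in Python as follows. The interval bounds and frequencies are the ones from the age table; the function name `grouped_mean` and the tuple layout are assumptions made for this illustration.

```python
# Grouped age data from the example: (lower bound, upper bound, frequency).
groups = [(0, 10, 6), (11, 20, 12), (21, 30, 8), (31, 40, 5), (41, 50, 4)]


def grouped_mean(groups):
    """Estimate the mean of grouped data using the midpoint of each interval."""
    weighted_sum = 0.0
    total_frequency = 0
    for lower, upper, frequency in groups:
        midpoint = (lower + upper) / 2           # Step 1: interval midpoint
        weighted_sum += frequency * midpoint     # Step 2: frequency * midpoint
        total_frequency += frequency             # running total of frequencies
    if total_frequency == 0:
        raise ValueError("total frequency must be positive")
    return weighted_sum / total_frequency        # weighted sum / total frequency


mean_age = grouped_mean(groups)
print(round(mean_age, 2))  # 22.27
```

Because the midpoints stand in for the unknown individual ages, the result is an estimate of the mean rather than its exact value.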
EXPERIMENT 2
2. Definition of Median:
The median is another measure of central tendency in statistics. It is the middle value of a
dataset when the values are arranged in ascending or descending order. If there is an even
number of values, the median is the average of the two middle values. The median is often
used to represent the central value that separates the higher half from the lower half of a
dataset.
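A short sketch of this definition for ungrouped data is given below, covering both the odd and the even case. The function name `median` and the sample lists are hypothetical and used only for illustration.

```python
def median(values):
    """Return the middle value of the sorted data (average of the two middle values if n is even)."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ordered[middle]
    # Even number of values: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical datasets, used only to illustrate the two cases.
odd_case = median([7, 1, 3])       # sorted: 1, 3, 7
even_case = median([4, 1, 3, 2])   # sorted: 1, 2, 3, 4
print(odd_case, even_case)  # 3 2.5
```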
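In symbols, with the "n" values sorted in ascending order as $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$:
$\text{Median} = x_{\left(\frac{n+1}{2}\right)}$ (when n is odd)
$\text{Median} = \frac{1}{2}\left[ x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)} \right]$ (when n is even)
2.1 Merits and Demerits of Median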
Merits:
Robust to Outliers: The median is not influenced by extreme values or outliers, making
it a robust measure of central tendency.
Appropriate for Skewed Data: It is suitable for datasets with skewed or non-normally
distributed data, as it focuses on the middle value.
Easy to Understand: Like the mean, the median is relatively easy to understand and
calculate.
Demerits:
Complex for Grouped Data: Calculating the median for grouped data can be more
complex than for ungrouped data.
Less Sensitive to All Data Points: The median does not consider all values in the
dataset, so it may not provide a complete picture of the data distribution.
May Not Be Unique: In some cases, there might be multiple values that could be
considered as the median, especially for discrete data.
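Example of Median Calculation for Grouped Data:
Consider the same dataset of ages grouped into intervals:
Age group    Frequency
0-10         6
11-20        12
21-30        8
31-40        5
41-50        4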
Step 1: Calculate the cumulative frequency. This is the sum of frequencies up to a particular
group.
Age group    Frequency    Cumulative frequency
0-10         6            6
11-20        12           6 + 12 = 18
21-30        8            18 + 8 = 26
31-40        5            26 + 5 = 31
41-50        4            31 + 4 = 35
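Step 2: Find the group in which the median lies. This is the first group whose cumulative
frequency is greater than or equal to half of the total frequency (i.e., 35 / 2 = 17.5).
The first cumulative frequency that reaches 17.5 is 18, so the median lies in the group 11-20.
Step 3: Apply the formula for the median of grouped data:
$\text{Median} = L + \left( \frac{\frac{N}{2} - CF}{f} \right) \times h$
L: Lower boundary of the median group
N: Total frequency (35)
CF: Cumulative frequency of the group before the median group (6)
f: Frequency of the median group (12)
h: Width of the median group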
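The grouped-median calculation can be sketched as below, using the standard interpolation formula Median = L + ((N/2 - CF) / f) * h. For the age table, the first group whose cumulative frequency reaches 17.5 is 11-20 (cumulative frequency 18), so CF = 6 and f = 12. The choice of class boundaries is an assumption of this sketch: since the intervals leave a gap of 1 between 10 and 11, the lower boundary is taken as 10.5 and the width as 10. The function name `grouped_median` and the `gap` parameter are hypothetical.

```python
# Grouped age data from the example: (lower limit, upper limit, frequency).
groups = [(0, 10, 6), (11, 20, 12), (21, 30, 8), (31, 40, 5), (41, 50, 4)]


def grouped_median(groups, gap=1.0):
    """Estimate the median of grouped data by interpolating inside the median group.

    `gap` is the distance between one group's upper limit and the next group's
    lower limit (assumed to be 1 here); boundaries are shifted by gap / 2.
    """
    total = sum(frequency for _, _, frequency in groups)
    if total == 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative = 0  # cumulative frequency of the groups before the current one (CF)
    for lower, upper, frequency in groups:
        if cumulative + frequency >= half:
            lower_boundary = lower - gap / 2                 # L (assumed boundary)
            width = (upper + gap / 2) - lower_boundary       # h
            return lower_boundary + (half - cumulative) / frequency * width
        cumulative += frequency
    raise ValueError("median group not found")


median_age = grouped_median(groups)
print(round(median_age, 2))  # 20.08
```

If the class limits themselves were used as boundaries (L = 11), the estimate would shift by 0.5, so the boundary convention should be stated alongside the result.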
EXPERIMENT 3
3.1 Definition of Mode:
The mode is a measure of central tendency that represents the value(s) in a dataset that
occur with the highest frequency. In other words, the mode is the value that appears most
frequently in a dataset. Unlike the mean and median, which aim to find a central value, the
mode identifies the most common value(s).
3.2 Merits and Demerits of Mode
Merits:
Useful for Categorical Data: The mode is particularly useful for categorical or discrete
data where you can easily count the frequency of each category.
Demerits:
May Not Represent Central Tendency: In cases where multiple values have the same
highest frequency, there may be multiple modes, making it less representative of
central tendency.
Not Sensitive to All Data Points: Like the median, the mode doesn't consider all values
in the dataset, which can be a limitation.
Example of Mode Calculation for Grouped Data:
Let's consider a different set of grouped data, where you have data on the number of pets
owned by households:
Number of pets    Number of households (frequency)
0                 12
1                 20
2                 18
3                 10
4                 5
Step 1: Identify the group(s) with the highest frequency. In this case, the group with the
highest frequency is "1 pet" (with a frequency of 20).
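Step 2: Read off the mode. Because each group here is a single discrete value rather than a
class interval, no interpolation is needed: the mode is simply the value with the highest
frequency. (For data grouped into class intervals, the mode is instead estimated within the
group with the highest frequency, for example by interpolation or, more roughly, by its midpoint.)
So, the mode for this grouped data is 1, meaning that "1 pet" is the most common number of
pets in these households.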
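The mode lookup for this frequency table can be sketched as follows. The frequencies are taken from the pets example; the function name `modes` is hypothetical, and it returns a list so that ties (multiple modes) are reported rather than hidden.

```python
# Frequency table from the example: number of pets -> number of households.
pet_counts = {0: 12, 1: 20, 2: 18, 3: 10, 4: 5}


def modes(frequency_table):
    """Return every value whose frequency equals the highest frequency."""
    if not frequency_table:
        raise ValueError("frequency table must not be empty")
    highest = max(frequency_table.values())
    return [value for value, frequency in frequency_table.items() if frequency == highest]


pet_modes = modes(pet_counts)
print(pet_modes)  # [1]
```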
EXPERIMENT 4
4.1 Definition of Harmonic Mean:
The harmonic mean is a measure of central tendency that is calculated by taking the
reciprocal of the arithmetic mean of the reciprocals of a set of values. In simple terms, it is
the reciprocal of the average of the reciprocals of the data points. The formula for
calculating the harmonic mean for a dataset of "n" values is:
$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
4.2 Merits and Demerits of Harmonic Mean
Merits:
Balances Out Extreme Values: It gives less weight to large values, which can be
advantageous in certain situations, as it prevents extreme values from dominating the
result.
Demerits:
Not Suitable for All Data Types: It cannot be used when the data contain zero values,
as the reciprocal of zero is undefined.
Not as Intuitive as the Arithmetic Mean: The harmonic mean is not as intuitive as the
arithmetic mean, and its interpretation can be less straightforward.
Step 1: Calculate the reciprocal of each value.
Step 2: Calculate the arithmetic mean of the reciprocals.
Step 3: Calculate the harmonic mean by taking the reciprocal of the arithmetic mean of the
reciprocals.
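The steps for the harmonic mean can be sketched in Python as below. The dataset is hypothetical (two speeds, 40 and 60, over equal distances) and is not taken from the experiment; the function name `harmonic_mean_manual` is also an assumption. The standard library's `statistics.harmonic_mean` is used only as a cross-check.

```python
from statistics import harmonic_mean


def harmonic_mean_manual(values):
    """Reciprocal of the arithmetic mean of the reciprocals (positive values only)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean requires a non-empty list of positive values")
    reciprocals = [1 / v for v in values]                      # Step 1
    mean_of_reciprocals = sum(reciprocals) / len(reciprocals)  # Step 2
    return 1 / mean_of_reciprocals                             # Step 3


# Hypothetical data: speeds over two equal distances.
speeds = [40, 60]
hm = harmonic_mean_manual(speeds)
print(round(hm, 2))  # 48.0
```

The result (48) is lower than the arithmetic mean (50), which illustrates how the harmonic mean gives less weight to the larger value.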
EXPERIMENT 5
5.1 Definition of Geometric Mean:
The geometric mean is a measure of central tendency that is calculated by taking the "n"th
root of the product of "n" values. The formula for calculating the geometric mean for a
dataset of "n" values is:
$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$
5.2 Merits and Demerits of Geometric Mean
Merits:
Balances Out Extremes: It reduces the impact of extreme values, making it less
sensitive to outliers and skewed data.
Statistical Properties: The geometric mean is the natural average for ratios and growth
rates, and is commonly used in fields like finance, biology, and economics.
Demerits:
Not Suitable for Zero or Negative Values: A single zero makes the product, and therefore
the geometric mean, zero regardless of the other values, and negative values can make the
product negative, so that its "n"th root may not be a real number.
Less Intuitive: It may not be as intuitive as the arithmetic mean, making its
interpretation less straightforward for some individuals.
Not Suitable for Additive Data: The geometric mean is not appropriate for data where
values represent additive changes, like income or distance, as it is designed for
multiplicative relationships.
Let's say you have a dataset of three values representing investment returns over three
years: 5%, 8%, and 12%.
Step 1: Convert each return to a growth factor and calculate the product:
1.05 * 1.08 * 1.12 = 1.27008
Step 2: Calculate the geometric mean by taking the cube root of the product:
$GM = \sqrt[3]{1.27008} \approx 1.083$
So, the geometric mean of the investment returns (5%, 8%, 12%) is approximately 1.083, or
8.3%. This represents the average annual return over the three-year period, taking into
account the multiplicative nature of returns.
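The investment-return example can be checked with a short Python sketch. The returns (5%, 8%, 12%) come from the example; the function name `geometric_mean` and the conversion to growth factors (1 + return) are assumptions of this illustration. `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))


# Annual returns from the example, converted to growth factors (1 + return).
returns = [0.05, 0.08, 0.12]
growth_factors = [1 + r for r in returns]

gm = geometric_mean(growth_factors)
print(f"{gm:.3f}")        # 1.083
print(f"{gm - 1:.1%}")    # 8.3%
```

Working with growth factors rather than the raw percentages keeps every value positive and reflects that returns compound multiplicatively from year to year.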