BS Unit 2
Syllabus:
Measures of Central tendency – A.M, Median, Quartiles & Mode (without Grouping), G.M, H.M.
Summarisation of data is a necessary function of any statistical analysis. As a first step in this direction, the
huge mass of unwieldy data is summarized in the form of tables and frequency distributions. In order to
bring the characteristics of the data into sharp focus, these tables and frequency distributions need to be
summarized further. A measure of central tendency, or an average, is an essential summary measure in any
statistical analysis. An average is a single value which can be taken as representative of the whole
distribution.
“An average is an attempt to find one single figure to describe the whole of figures”
“A measure of central tendency is a typical value around which other figures congregate”.
Functions of an Average
1. To present a huge mass of data in a summarized form. An average is used to summarize
a large body of numerical figures into a single figure, which makes it easier to understand and
remember.
2. To facilitate comparison. Different sets of data can be compared by comparing their averages.
3. To help in decision making. Most of the decisions to be taken in research, planning etc. are based
on the average values of certain variables.
According to Prof. Yule, an ideal measure of average must possess the following characteristics:
1. It should be rigidly defined, preferably by an algebraic formula, so that different persons obtain the
same value for a given set of data.
2. It should be easy to compute and understand.
3. It should be based on all observations. Thus, in the computation of an ideal average the entire set
of data at our disposal should be used and there should not be any loss of information resulting
from not using the available data. Obviously if the whole data is not used in computing the average,
it will be unrepresentative of the distribution.
4. It should be suitable for further mathematical treatment. In other words, the average should possess
some important and interesting mathematical properties so that its use in further statistical theory
is enhanced.
5. It should be affected as little as possible by fluctuations in sampling. By this we mean that if we
take independent random samples of the same size from a given population and compute the
average for each of these samples then, for an ideal average, the values so obtained from different
samples should not vary much from one another. The difference in the values of the average for
different samples is attributed to the so-called fluctuations of sampling. This property is also
explained by saying that an ideal average should possess sampling stability.
6. It should not be affected much by extreme observations. By extreme observations we mean very
small or very large observations. Thus a few very small or very large observations should not
unduly affect the value of a good average.
Various measures of average can be classified into the following three categories:
a) Mathematical Averages
i) Arithmetic Mean
ii) Geometric Mean
iii) Harmonic mean
iv) Quadratic Mean
b) Positional Averages
i) Median
ii) Mode
c) Commercial Averages
i) Moving Average
ii) Progressive Average
iii) Composite Average
Out of these, A.M, Median, Mode, G.M & H.M are commonly used in practice.
I. Arithmetic Mean
Arithmetic Mean of a given set of observations is their sum divided by the number of observations.
(i) Direct Method:
Here each frequency is multiplied by the corresponding value of the variable; the products are totalled and
the total is divided by the total frequency to give X̄.
Symbolically,
X̄ = ∑fx/N
where f = frequency, x = value of the variable and N = ∑f = total frequency.
(ii) Short Cut Method:
Here an Assumed Mean (A) is chosen and the deviations of the variable from it are taken. We obtain X̄ by
using the following formula:
X̄ = A + ∑f dx/N, where dx = (X − A).
(Note: This formula is often used when the values of the variable are large or in fractions and the direct
formula is not easy to use.)
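The direct and short-cut formulas always give the same mean, whatever assumed mean is chosen. A minimal Python sketch of that check follows; the function names and the frequencies are illustrative assumptions, not part of the notes.

```python
def mean_direct(values, freqs):
    """Direct method: sum(f * x) / N."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n


def mean_short_cut(values, freqs, assumed_mean):
    """Short-cut method: A + sum(f * dx) / N, where dx = x - A."""
    n = sum(freqs)
    total_dev = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    return assumed_mean + total_dev / n


# Hypothetical discrete series (frequencies are made up for illustration).
values = [4, 7, 12, 17, 19]
freqs = [2, 3, 5, 3, 2]

print(mean_direct(values, freqs))
print(mean_short_cut(values, freqs, assumed_mean=12))
```

Choosing A close to the centre of the data keeps the deviations small, which is the only reason the short-cut method saves work by hand.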
Solution:
Important:
The step deviation formula cannot be applied to every set of data. For example, if in the given example the
values of X were 4, 7, 12, 17, 19, no common factor could be found for the deviations. A problem of this
kind is solved by the Direct or Short Cut method.
A continuous series is one in which the frequencies are given against class intervals of the variable
rather than against single values. For example:
Here:
(iii) In 20-30, 30-40, etc., 20 is the lower limit and 30 the upper limit of the 20-30 class interval.
(iv) Adding both limits and taking their average gives the midpoint of the class interval. The mid-value
of 20-30 is (20+30)/2 = 25.
It is often denoted by m or X.
When we take the midpoints of class intervals, they can be denoted by X, m or M. X̄ can be found by three
methods.
Important: i is the magnitude of the class intervals, and d'x = (X − A)/i.
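The step-deviation idea for a continuous series can be sketched in Python as follows. This assumes equal class widths and uses the standard form X̄ = A + (∑f d'x / N) × i; the class intervals and frequencies are hypothetical.

```python
def mean_step_deviation(classes, freqs, assumed_mean):
    """Mean of a continuous series by step deviations d' = (m - A) / i."""
    width = classes[0][1] - classes[0][0]           # i, assumed equal for all classes
    mids = [(lo + hi) / 2 for lo, hi in classes]    # mid-values m
    n = sum(freqs)
    total = sum(f * (m - assumed_mean) / width for m, f in zip(mids, freqs))
    return assumed_mean + (total / n) * width


# Hypothetical exclusive series.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 10, 15, 10]

print(mean_step_deviation(classes, freqs, assumed_mean=35))
```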
Other Special Cases of Continuous Series:
A series such as the one in the last example, i.e. 10-20, 20-30, 30-40, ..., is known as an Exclusive Series.
All other types of continuous series, as discussed below, are first converted into exclusive
series and then solved as above.
Important: It is essential to convert all other types of series into the exclusive type; otherwise we will
arrive at a wrong result.
Properties of A.M
1. The sum of the deviations of all the values of x from their arithmetic mean is zero.
Justification: ∑(x − x̄) = ∑x − n x̄.
Since x̄ is a constant and ∑x = n x̄, it follows that ∑(x − x̄) = 0.
2. The product of the arithmetic mean and the number of items gives the total of all items.
Justification: x̄ = ∑x/n
or ∑x = n x̄
3. If x̄1 and x̄2 are the arithmetic means of two samples of sizes n1 and n2 respectively, then the
arithmetic mean of the distribution combining the two can be calculated as
x̄ = (n1 x̄1 + n2 x̄2)/(n1 + n2)
4. Wrong Observations: To correct an incorrect value of the mean, first find the corrected ∑X (or ∑fX
in the case of a discrete or continuous frequency distribution). For this, subtract the wrong
items from the incorrect ∑X or ∑fX and add the correct observations to it. Finally, dividing the
corrected ∑X or ∑fX by N gives the correct mean.
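Properties 3 and 4 lend themselves to a short numerical check. The sketch below uses hypothetical helper names and made-up figures; it is meant only to illustrate the arithmetic described above.

```python
def combined_mean(mean1, n1, mean2, n2):
    """Mean of two groups combined: (n1*m1 + n2*m2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def corrected_mean(wrong_mean, n, wrong_values, correct_values):
    """Rebuild the total, swap the wrong items for the right ones, re-divide."""
    total = wrong_mean * n                       # incorrect sum of X
    total = total - sum(wrong_values) + sum(correct_values)
    return total / n


# Two hypothetical groups: 40 items averaging 50 and 60 items averaging 60.
print(combined_mean(50, 40, 60, 60))

# A mean of 40 over 20 items in which 35 was misread as 53.
print(corrected_mean(40, 20, wrong_values=[53], correct_values=[35]))
```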
(A) Merits:
1. It can be easily calculated and easily understood. This is the reason that it is the most used measure
of central tendency.
3. As it is rigidly defined by a mathematical formula, the result remains the same whoever computes it.
4. Fluctuations are minimum for this measure of central tendency when repeated samples are taken from
one and the same population.
5. It can be subjected to further algebraic treatment, unlike other measures such as the mode and median.
6. A.M. also has the advantage of being a calculated quantity that is not based on the position of terms in a
series.
(B) Demerits:
2. A single item can bring a big change in the result. For example, if there are three terms 4, 7, 10, X̄ is 7
in this case. If we add a new term 95, the new X̄ is (4+7+10+95)/4 = 116/4 = 29. This is a big change
compared with the A.M. of the first three terms.
3. It is a representative value only when the distribution is roughly symmetrical. If the distribution is
highly skewed, the mean can be misleading.
4. In case of open end class intervals we have to assume the limits of such intervals, so a little variation in
X̄ can take place. Such is not the case with median and mode, as the open end intervals are not used
in their calculation.
5. Qualitative characteristics such as cleverness, riches etc. cannot give X̄, as the data can't be expressed
numerically.
In case of simple arithmetic mean, we give equal importance to all the observations, but in practice we
might come across situations where the relative importance of all the items of distribution is not same. In
such cases proper weightage is to be given to various items – the weights attached to each item being
proportional to the importance of the item in the distribution.
We have to provide different weights according to their importance and the mean calculated so is known
as Weighted Arithmetic Mean.
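As a minimal sketch of the weighted arithmetic mean, ∑wx/∑w, consider the following Python snippet. The marks and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical marks in three papers weighted 1 : 2 : 3.
marks = [70, 80, 90]
weights = [1, 2, 3]

print(weighted_mean(marks, weights))        # weighted mean
print(weighted_mean(marks, [1, 1, 1]))      # equal weights reduce to the simple mean
```

With equal weights the result coincides with the simple arithmetic mean, which is a quick sanity check on any weighting scheme.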
MEDIAN
Median is the value which divides the series into two equal parts. It is the value at the position exactly in
the centre, with an equal number of terms on either side of it when the terms are arranged in ascending or
descending order.
Definition:
“The median is that value of the variable which divides the group into two equal parts, one part comprising
all values greater and the other all values less than median”.
“Median of a series is the value of the item actual or estimated when a series is arranged in order of
magnitude which divides the distribution into two parts.” —Horace Secrist
Determination of Median
A) Individual Series:
To find the value of Median, in this case, the terms are arranged in ascending or descending order first; and
then the middle term taken is called Median.
i) If there is an odd number of elements, the median is the ((N+1)/2)th element.
ii) If there is an even number of elements, the median is the average of the (N/2)th and
(N/2 + 1)th elements.
Md = 19
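The odd and even rules for an individual series can be sketched in Python as below. The function name is an assumption; the first data set is the one quoted under the merits of the median in these notes.

```python
def median_individual(values):
    """Median of raw observations: middle item, or mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # ((N+1)/2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of (N/2)th and (N/2 + 1)th


print(median_individual([19, 4, 12, 7, 18]))   # odd N
print(median_individual([4, 7, 12, 18]))       # even N
```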
Example 2. From the following figures of ages of some students, calculate the median age:
After arranging the terms, take cumulative frequencies; then find the ((N+1)/2)th item and read off the median.
Steps to Calculate:
(1) Arrange the data in ascending or descending order.
(2) Find the cumulative frequencies.
(3) Find the position of the middle item by using the formula (N+1)/2.
(4) Find that total in the cumulative frequency column which is equal to (N+1)/2 or next higher than that value.
(5) Locate the value of the variable corresponding to that cumulative frequency. This is the value of the Median.
Example: Locate the median of the following frequency distribution:
Variable (X) : 10 11 12 13 14 15 16
Frequency (f) : 8 15 25 20 12 10 5
Solution
X : 10 11 12 13 14 15 16
f : 8 15 25 20 12 10 5
c.f. : 8 23 48 68 80 90 95
Here, N = 95, which is odd. Thus, median is the size of the ((95+1)/2)th term = 48th observation.
Md = 12
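The cumulative-frequency steps can be sketched in Python using the distribution from the example above. The function name is an assumption, and the sketch presumes the values are already in ascending order.

```python
def median_discrete(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    n = sum(freqs)
    position = (n + 1) / 2          # position of the middle item
    running = 0                     # less-than cumulative frequency so far
    for x, f in zip(values, freqs):
        running += f
        if running >= position:     # first c.f. equal to or just above the position
            return x


values = [10, 11, 12, 13, 14, 15, 16]
freqs = [8, 15, 25, 20, 12, 10, 5]

print(median_discrete(values, freqs))   # 12, matching Md above
```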
Alternative Method
N/2 = 95/2 = 47.5th term.
In this case, the less than cumulative frequencies are taken, the class interval in which the (N/2)th term
lies is identified as the median class, and the median is obtained from the interpolation formula:
Median = l + (h/f) (N/2 − C)
Where, l is the lower limit of the median class
f is the frequency of the median class
h is the magnitude or width of the median class,
N = ∑f, is the total frequency,
C is the cumulative frequency of the class preceding the median class.
Remarks: The interpolation formula above assumes that:
1. The distribution of the variable under consideration is continuous with exclusive type classes
without any gaps.
2. There is an orderly and even distribution of observations within each class.
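A minimal Python sketch of the interpolation formula follows. For illustration it borrows the marks distribution of 60 students used in the mode example of these notes, already converted to exclusive class boundaries; the function name is an assumption.

```python
def median_grouped(classes, freqs):
    """Median = l + (h / f) * (N/2 - C) for exclusive classes given as (lower, upper)."""
    n = sum(freqs)
    half = n / 2
    cum_before = 0                              # C: c.f. of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cum_before + f >= half:              # this is the median class
            h = upper - lower
            return lower + (h / f) * (half - cum_before)
        cum_before += f


classes = [(29.5, 34.5), (34.5, 39.5), (39.5, 44.5), (44.5, 49.5),
           (49.5, 54.5), (54.5, 59.5), (59.5, 64.5)]
freqs = [3, 5, 12, 18, 14, 6, 2]

print(round(median_grouped(classes, freqs), 2))
```

Here N/2 = 30 falls in the class 44.5 to 49.5, with C = 20 and f = 18, so the median is 44.5 + (5/18)(10), or about 47.28.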
Merits
• It is not much affected by extreme items, even when their values differ greatly from the other
values; e.g. the median of 4, 7, 12, 18, 19 is 12, and if we add two values, 450 and 10000,
the new median is 18.
• It can also be used for quantities that cannot give an A.M., as in the case of intelligence etc. Such
items can be arranged in order and the middle value located. For such cases it is the best measure.
• Median is also used for other statistical devices such as Mean Deviation and skewness.
Demerits or Limitations
• Because even very large extreme items have little effect on it, the median sometimes does not
remain representative of the series.
• Median cannot be used for further algebraic treatment. Unlike the A.M., we can neither find the total
of the terms from it nor the median of several groups when combined.
• In a continuous series it has to be interpolated. We can find its true value only if the frequencies
are uniformly spread over the whole class interval in which the median lies.
• If the number of terms in the series is even, we can only estimate it, as the A.M. of the two middle
terms is taken as the Median.
QUARTILES
The values which divide the given data into four equal parts are known as quartiles. Obviously there will
be three such points Q1, Q2, and Q3 such that Q1≤Q2≤Q3, termed as the quartiles. Q1, known as the lower
or first quartile is the value which has 25% of the items of the distribution below it and consequently 75%
of the items are greater than it. Incidentally Q2, the second quartile, coincides with the median and has an
equal number of observations above it and below it. Q3, known as the upper or third quartile, has 75% of
the observations below it and consequently 25% of the observations above it.
Determination of Quartiles
The working principle for computing the quartiles is basically the same as that of computing the median.
To compute Q1, see the less than c.f. just greater than N/4; the corresponding value of X gives Q1.
Similarly, to compute Q3, see the less than c.f. just greater than 3N/4. The corresponding value of X gives
Q3. In case of continuous frequency distribution, the corresponding class contains Q3 and the value of Q3 is
given by the formula:
Q3 = l + (h/f) (3N/4 − C)
Where, l is the lower limit, f is the frequency and h is the magnitude of the class containing Q3.
C is the cumulative frequency of the class preceding the class containing Q3
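Since the quartile formula differs from the median formula only in the target position kN/4, one function can cover Q1, Q2 and Q3. The Python sketch below is an illustration under that assumption; it reuses the marks distribution of 60 students from the mode example in these notes, and the function name is hypothetical.

```python
def quartile_grouped(classes, freqs, k):
    """k-th quartile (k = 1, 2, 3): l + (h / f) * (k*N/4 - C)."""
    n = sum(freqs)
    target = k * n / 4
    cum_before = 0                              # C: c.f. before the quartile class
    for (lower, upper), f in zip(classes, freqs):
        if cum_before + f >= target:            # class containing the quartile
            h = upper - lower
            return lower + (h / f) * (target - cum_before)
        cum_before += f


classes = [(29.5, 34.5), (34.5, 39.5), (39.5, 44.5), (44.5, 49.5),
           (49.5, 54.5), (54.5, 59.5), (59.5, 64.5)]
freqs = [3, 5, 12, 18, 14, 6, 2]

for k in (1, 2, 3):
    print(k, round(quartile_grouped(classes, freqs, k), 2))
```

For Q3 the target is 3N/4 = 45, which lies in the class 49.5 to 54.5 with C = 38 and f = 14, giving 49.5 + (5/14)(7) = 52.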
The various partition values viz., quartiles, deciles and percentiles can be easily located graphically with
the help of a curve called the cumulative frequency curve or ogive. The procedure involves the following
steps.
Steps:
1. Represent the given distribution in the form of a less than cumulative frequency distribution.
2. Take the values of the variable (in the case of frequency distribution) and the class intervals (in the case
of continuous frequency distribution) along the X axis and the cumulative frequency along the vertical axis.
3. Plot the c.f against the corresponding value of the variable (in the case of frequency distribution) and
against the upper limit of the corresponding class (in the case of continuous frequency distribution).
4. The smooth curve obtained by joining the points so obtained by means of free-hand drawing is called
‘less than’ ogive or less than c.f. curve
In this case we form the more than cumulative frequency distribution and plot it against the corresponding
value of the variable or against the lower limit of the corresponding class (in case of continuous frequency
distribution). The curve obtained on joining the points so obtained by smooth free-hand drawing is called
more than cumulative frequency curve or more than ogive.
Remark. If we draw a perpendicular from the point of intersection of the two ogives on the x-axis, the foot
of the perpendicular gives the value of the median.
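The two cumulative frequency columns that the ogives are plotted from can be built as in the sketch below. The frequencies are those of the marks example used elsewhere in these notes; the helper names are assumptions.

```python
from itertools import accumulate


def less_than_cf(freqs):
    """Running total from the first class; plotted against upper class limits."""
    return list(accumulate(freqs))


def more_than_cf(freqs):
    """Running total from the last class; plotted against lower class limits."""
    return list(accumulate(freqs[::-1]))[::-1]


freqs = [3, 5, 12, 18, 14, 6, 2]

print(less_than_cf(freqs))   # [3, 8, 20, 38, 52, 58, 60]
print(more_than_cf(freqs))   # [60, 57, 52, 40, 22, 8, 2]
```

The 'less than' column ends at N and the 'more than' column starts at N, which is why the two curves cross at a height of N/2.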
410-419 409.5-419.5 14 14
420-429 419.5-429.5 20 34
430-439 429.5-439.5 42 76
The median value of a series may be determined through the graphic presentation of data in the form of
Ogives. This can be done in 2 ways.
1. Presenting the data graphically in the form of 'less than' ogive or 'more than' ogive.
2. Presenting the data graphically and simultaneously in the form of 'less than' and 'more than' ogives. The
two ogives are drawn together.
1. Convert the series into a 'less than ' cumulative frequency distribution as shown above.
2. Let N be the total number of students whose data is given. N will also be the cumulative frequency of
the last interval. Find the (N/2)th item(student) and mark it on the y-axis. In this case the (N/2)th item
(student) is 200/2 = 100th student.
3. Draw a perpendicular from 100 to the right to cut the Ogive curve at point A.
4. From point A where the Ogive curve is cut, draw a perpendicular on the x-axis. The point at which it
touches the x-axis will be the median value of the series as shown in the graph.
1. Convert the series into a 'more than' cumulative frequency distribution as shown above.
2. Let N be the total number of students whose data is given. In a 'more than' distribution, N is the
cumulative frequency of the first interval. Find the (N/2)th item (student) and mark it on the y-axis. In this
case the (N/2)th item (student) is 200/2 = 100th student.
3. Draw a perpendicular from 100 to the right to cut the Ogive curve at point A.
4. From point A where the Ogive curve is cut, draw a perpendicular on the x-axis. The point at which it
touches the x-axis will be the median value of the series as shown in the graph.
Another way of graphical determination of median is through simultaneous graphic presentation of both
the less than and more than Ogives.
1. Mark the point A where the Ogive curves cut each other.
2. Draw a perpendicular from A on the x-axis. The corresponding value on the x-axis would be the median
value.
MODE
Mode is the value which occurs most frequently in a set of observations and around which the other items
of the set cluster densely. In other words, mode is the value of a series which is predominant in it. In the
words of Croxton and Cowden, “the mode of a distribution is the value at the point around which the items
tend to be most heavily concentrated. It may be regarded as the most typical of a series of values.”
The concept of mode, as a measure of central tendency, is preferable to mean and median when it is desired
to know the most typical value, e.g., the most common size of shoes, the most common size of ready-made
garment, the most common amount of pocket expenditure of a college student etc.
Determination of mode
a) When data are either in the form of individual observations or in the form of ungrouped frequency
distribution.
Given individual observations, these are first transformed into an ungrouped frequency distribution. The
mode of an ungrouped frequency distribution can be determined in two ways:
i. By inspection
ii. By method of grouping
i. By inspection
When a frequency distribution is fairly regular, the mode is often determined by inspection. It is that value
of the variate for which the frequency is the maximum. By a fairly regular frequency distribution we mean
that, as the values of the variable increase, the corresponding frequencies first increase gradually, reach
a peak at a certain value and then decline gradually in approximately the same manner.
Example
3, 4, 5, 10, 15, 3, 6, 7, 9, 12, 10, 16, 18, 20, 10, 9, 8, 19, 11, 14, 10, 13, 17, 9, 11
Solution
X : 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
f : 2 1 1 1 1 1 3 4 2 1 1 1 1 1 1 1 1 1
Therefore, mode = 10, as this value has the maximum frequency (4).
Remarks: 1. If the frequency of each possible value of the variable is the same, then there is no mode.
2. If there are two values having the maximum frequency, then the distribution is said to be bi-modal.
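Determining the mode by inspection, together with the two remarks, can be sketched in Python as follows. The function name is an assumption; the first data set is the one from the example above.

```python
from collections import Counter


def modes(values):
    """Return the value(s) with the maximum frequency; [] if every value ties."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                                   # Remark 1: no mode
    return sorted(x for x, c in counts.items() if c == top)


data = [3, 4, 5, 10, 15, 3, 6, 7, 9, 12, 10, 16, 18, 20, 10, 9, 8, 19,
        11, 14, 10, 13, 17, 9, 11]

print(modes(data))              # [10]
print(modes([1, 1, 2, 2, 3]))   # two values tie: bi-modal
print(modes([1, 1, 2, 2]))      # all frequencies equal: no mode
```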
In the case of continuous frequency distribution, the class corresponding to the maximum frequency is
called the modal class and the value of mode is obtained by the interpolation formula:
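Mode = l + h(f1 − f0) / [(f1 − f0) − (f2 − f1)] = l + h(f1 − f0) / (2f1 − f0 − f2)
where l is the lower limit and h the width of the modal class, f1 is the frequency of the modal class, and
f0 and f2 are the frequencies of the classes preceding and succeeding the modal class.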
Remarks: The above formula for computing mode is based on the following assumptions:
1. The frequency distribution must be continuous with exclusive type classes without any gaps. If the
data are not given in exclusive form, they must first be converted into exclusive class intervals.
2. The class intervals must be uniform throughout, i.e., the width of all the class intervals must be the
same. In case of a distribution with unequal class intervals, they should be made equal under the
assumption that the frequencies are uniformly distributed over all the classes; otherwise the value
of mode computed will be misleading.
Example
The frequency distribution of marks obtained by 60 students of a class in a college is given below:
Marks:      30-34  35-39  40-44  45-49  50-54  55-59  60-64
Frequency:    3      5     12     18     14      6      2
Solution
The classes are of the inclusive type, so they are first converted into exclusive class boundaries:
Marks Frequency
29.5 – 34.5 3
34.5 – 39.5 5
39.5 – 44.5 12
44.5 – 49.5 18
49.5 – 54.5 14
54.5 – 59.5 6
59.5 – 64.5 2
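Applying the interpolation formula to the table above can be sketched in Python as below. The function name is an assumption, and treating a missing neighbouring class as frequency 0 is a simplification of this sketch, not something the notes prescribe.

```python
def mode_grouped(classes, freqs):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for equal-width exclusive classes."""
    i = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal class
    lower, upper = classes[i]
    h = upper - lower
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # assumption: 0 if no class before
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0       # assumption: 0 if no class after
    return lower + h * (f1 - f0) / (2 * f1 - f0 - f2)


classes = [(29.5, 34.5), (34.5, 39.5), (39.5, 44.5), (44.5, 49.5),
           (49.5, 54.5), (54.5, 59.5), (59.5, 64.5)]
freqs = [3, 5, 12, 18, 14, 6, 2]

print(mode_grouped(classes, freqs))   # 47.5
```

The modal class is 44.5 to 49.5 with f1 = 18, f0 = 12 and f2 = 14, so the mode is 44.5 + 5(6)/(36 − 26) = 47.5 marks.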
Mode can be located graphically from the histogram of frequency distribution by making use of rectangles
erected on the modal, pre-modal and post-modal classes. The method involves the following steps:
i. Join the top right corner of the rectangle erected on the modal class with the top right corner of
the rectangle erected on the preceding class by means of a straight line.
ii. Join the top left corner of the rectangle erected on the modal class with the top left corner of
the rectangle erected on the succeeding class by a straight line.
iii. From the point of intersection of the lines in steps (i) and (ii) above, draw a perpendicular to
the X-axis. The X- coordinate or the abscissa of the point where the perpendicular meets the X
axis gives the modal value.
Example
Class interval (x)   Class boundaries   Frequency (f)
10-19 9.5-19.5 10
20-29 19.5-29.5 12
30-39 29.5-39.5 18
40-49 39.5-49.5 30
50-59 49.5-59.5 16
60-69 59.5-69.5 6
70-79 69.5-79.5 8
1. Represent the given data in the form of a Histogram. The height of each rectangle in the histogram
is given by the frequency of the class interval, as shown in the graph. Identify the highest
rectangle. This corresponds to the modal class of the series.
2. Join the top corners of the modal rectangle with the nearest top corners of the adjacent
rectangles. The two lines must cut each other. This might be difficult to visualise, so look at
the graph given below.
3. Let the point where the joining lines cut each other be 'A'. Draw a perpendicular line from point A
onto the x-axis. The point 'P' where the perpendicular meets the x-axis gives the mode.
The Histogram
In this case the value of point P turns out to be 44.12
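The graphical reading can be cross-checked against the interpolation formula for the mode. Taking the modal class 39.5 to 49.5 from the table above, with f1 = 30, f0 = 18, f2 = 16 and h = 10, a worked version of the arithmetic is:

```latex
\text{Mode} = l + \frac{h\,(f_1 - f_0)}{2f_1 - f_0 - f_2}
            = 39.5 + \frac{10\,(30 - 18)}{2(30) - 18 - 16}
            = 39.5 + \frac{120}{26}
            \approx 44.12
```

This agrees with the value read off the histogram.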
Merits
1. Mode is the term that occurs most often in the series; hence it is neither an isolated value like the
median nor a value like the mean that may not be present in the series.
4. For open end intervals it is not necessary to know the length of the open intervals.
6. Its value can be found with just a single glance at the data. It is the simplest average.
7. It is the most used average in day-to-day life, such as average marks of a class, average number of students
in a section, average size of shoes, etc.
Demerits
2. Mode is based only on the most concentrated values; other values are not taken into account in spite of
their big difference from the mode. In continuous series only the lengths of class intervals are considered.
3. Mode is the average most affected by fluctuations of sampling.
4. Mode is not so rigidly defined. Solving the problem by different methods may not give the same result,
as it does in the case of the mean.
5. It is not capable of further algebraic treatment. It is impossible to find the combined mode of several
series, as can be done for the mean.
6. Also, we cannot find the total of the whole series from the value of the mode, as we can from the mean.
7. It can be called a representative value only if the number of terms is large.
8. It is also said that the mode is sometimes ill-defined, indefinite and indeterminate.
In case of a normal or a symmetrical distribution, mean = median = mode. When the frequencies are not
symmetrically distributed, the distribution is called asymmetrical or skewed. For a moderately asymmetrical
distribution the following empirical relationship holds good:
Mean − Mode = 3 (Mean − Median), i.e. Mode = 3 Median − 2 Mean
GEOMETRIC MEAN
The geometric mean, usually abbreviated as G.M of a set of n observations is the nth root of their product.
Thus, if X1, X2, X3, …….., Xn are the given n observations then their G.M is given by
G.M = ⁿ√(X1 · X2 · X3 · … · Xn) = (X1 · X2 · X3 · … · Xn)^(1/n) …….(1)
If n is 2, then G.M can be computed by taking the square root of their product.
But if n, the number of observations is greater than 2, then the computation of the nth root is very tedious.
In such a case the calculations are facilitated by making use of the logarithms. Taking logarithm on both
sides of (1), we get
log G.M = (1/n) (log X1 + log X2 + log X3 + …… + log Xn) = (1/n) ∑ log X ………(2)
In case of a frequency distribution (Xi, fi); i = 1, 2, ……, n, where the total number of observations is N = ∑f,
G.M = Antilog[(1/N) ∑ f log X]
In the case of grouped or continuous frequency distributions, the values of X are the mid-values of the
corresponding classes.
The geometric mean, like the arithmetic mean, has a number of advantages and disadvantages.
Advantages:
(c) It is very convenient for calculating averages of ratios, rates and percentages.
(d) It is less affected by exceptionally large or small values of a variable than the arithmetic mean.
(e) It gives more weight to small observations and less weight to large observations than the arithmetic
mean does.
(g) It helps in determining rates of exchange among the currencies of various countries.
Disadvantages:
a. It is difficult to calculate when the data are given as a grouped frequency distribution with many
large frequencies.
c. The result finally obtained may not be equal to any of the observations given in the series.
e. In some cases it is not a true representative of the data.
f. It reflects ratios of change rather than differences of change.
Computation of GM:
Example 1:
Find the GM of the observations 12, 18, 48 and 61 of a variable having their frequencies 5, 3, 2 and 8
respectively
Solution:
Let us prepare the data in the form of a table so as to calculate GM.
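The logarithmic method, G.M = Antilog[(1/N) ∑ f log X], can be sketched in Python for the data of this example. The function name is an assumption and base-10 logarithms are used to mirror log tables.

```python
import math


def geometric_mean(values, freqs=None):
    """G.M = antilog( (1/N) * sum(f * log10(x)) ); all values must be positive."""
    if freqs is None:
        freqs = [1] * len(values)          # individual observations
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)


values = [12, 18, 48, 61]
freqs = [5, 3, 2, 8]

print(round(geometric_mean(values, freqs), 2))
```

Here ∑ f log X is about 26.8068 and N = 18, so log G.M is about 1.4893 and the G.M comes out at roughly 30.85.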
HARMONIC MEAN
The harmonic mean of a set of observations on a variable is defined as the reciprocal of the arithmetic
average of the reciprocals of the given observations (none of the observations may be zero).
If the variable X takes n values x1, x2, x3, …, xn, their reciprocals are 1/x1, 1/x2, 1/x3, …, 1/xn, and
HM = n / (1/x1 + 1/x2 + … + 1/xn) = n / ∑(1/x)
For observations having respective frequencies f1, f2, …, fn, the weighted HM can be computed as:
HM = N / ∑(f/x), where N = ∑f
It is a special kind of average used in some selected situations.
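A minimal Python sketch of the simple and weighted harmonic mean follows. The function name and the figures are illustrative assumptions; the classic use is averaging speeds over equal distances.

```python
def harmonic_mean(values, freqs=None):
    """HM = N / sum(f / x); undefined if any observation is zero."""
    if freqs is None:
        freqs = [1] * len(values)          # individual observations
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when an observation is zero")
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))


# Hypothetical: equal distances covered at 40 km/h and 60 km/h.
print(harmonic_mean([40, 60]))

# Hypothetical weighted case: value 2 once, value 4 twice.
print(harmonic_mean([2, 4], freqs=[1, 2]))
```

The average speed for the round trip is 48 km/h, not the arithmetic mean of 50 km/h, because more time is spent at the lower speed.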
(a) If the given values of a variable are all equal (but ≠ 0), then their harmonic mean will be equal to their
common value:
HM = n / (n/c) = c
Here, n is the total number of observations of the variable and c is the common value.
(b) If a variable y is related to another variable x in the form y = ax, then the harmonic mean of y is related
to that of x in the similar form:
HM(y) = a · HM(x)
(c) If n1 and n2 are the sizes of two sets of values of a variable x and their respective harmonic means are
H1 and H2, then the harmonic mean of the combined set (H) is given by:
H = (n1 + n2) / (n1/H1 + n2/H2)
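The three properties can be checked numerically with a short Python sketch. The helper names and the two sets of values are hypothetical, and the combined formula is the standard one, H = (n1 + n2) / (n1/H1 + n2/H2).

```python
def hm(values):
    """Simple harmonic mean of non-zero observations."""
    return len(values) / sum(1 / x for x in values)


def combined_hm(n1, h1, n2, h2):
    """Harmonic mean of two sets combined, from their sizes and harmonic means."""
    return (n1 + n2) / (n1 / h1 + n2 / h2)


a = [2, 4, 8]      # hypothetical first set
b = [5, 10]        # hypothetical second set

print(hm([7, 7, 7]))                                   # (a) equal values
print(hm([3 * x for x in a]), 3 * hm(a))               # (b) y = ax
print(combined_hm(len(a), hm(a), len(b), hm(b)), hm(a + b))   # (c) combined sets
```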
Example 1:
Determine the weighted HM for the observations of the variable X from the following:
Solution:
Like all the devices of central tendency mentioned earlier, harmonic mean also has a number of merits and
demerits.
These are:
Merits:
(b) It is calculated on the basis of all the information available on the variable.
(h) As it measures relative changes in the given observations of a variable, it is well suited to
finding averages of certain ratios and rates.
Demerits:
(a) The result usually does not exist in the given series of observations on the variable.
(c) It is restrictive in the sense that it cannot be calculated if any of the observations is zero.
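Let us consider the simplest example of a variable X having only two positive observations x1 and x2.
AM = (x1 + x2)/2, GM = √(x1 x2), HM = 2 x1 x2/(x1 + x2)
Since (√x1 − √x2)² ≥ 0, we have AM ≥ GM; applying the same argument to the reciprocals gives GM ≥ HM.
Hence AM ≥ GM ≥ HM.
The same analysis can be extended to any number of observations of a variable and the same result can
easily be established.
Moreover, for any two positive numbers the three means are related by:
AM × HM = (GM)²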
All the averages will become equal with each other when the variable assumes identical observations.
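The relation between the three averages for two positive observations can be verified with a small Python sketch. The function name and the sample values are illustrative assumptions.

```python
import math


def three_means(x1, x2):
    """Return (AM, GM, HM) of two positive numbers."""
    am = (x1 + x2) / 2
    gm = math.sqrt(x1 * x2)
    hm = 2 * x1 * x2 / (x1 + x2)
    return am, gm, hm


am, gm, hm = three_means(4, 9)
print(am, gm, hm)                 # AM > GM > HM for unequal values
print(am * hm, gm ** 2)           # AM x HM equals GM squared

print(three_means(5, 5))          # identical observations: all three coincide
```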