Statistics Made Easy Volume 1 Descriptive Statistics by Pritish Ranjan Gayali
P.R. Gayali
ISBN: 978-93-5636-587-2
Year of Publication: 2022
Country of Publication: India
Published By: Pritish Ranjan Gayali
Dedicated To
www.gayali.in
CONTENTS
[6] MOMENTS, SKEWNESS & KURTOSIS
[10] INTERPOLATION
PREFACE
Distinguishing Features:
Statistics Made Easy | 1
What is Statistics
Statistics, as a plural noun, means numerical data arising in any sphere of human
experience; to be precise, numerical data which arise from a host of uncontrolled and
mostly unknown causes acting together. It is in this sense that the term is used when
our daily newspapers give vital statistics, crime statistics or statistics of rainfall,
temperature, accidents, etc. Used as a singular noun, statistics is the name for the
body of scientific methods which are meant for the collection, analysis and
interpretation of numerical data.
Primary and Secondary data
The data may be of two broad types: primary and secondary. The ordinary user
of economic and social statistics will find that the data have been already collected by
some other agency – government or private. These may exist either in a published or an
unpublished form. His job will then be simply to have access to the source and get hold
of the data. Such data will be called secondary data. Government departments collect
data on diverse topics that touch the life of the people as a matter of routine and as an
essential basis of administration. Private agencies like banks and industrial concerns
regularly compile figures on their assets and liabilities, number of employees, income of
employees, etc. The enquirer may get his material readymade from such agencies or he
may get the data in rough form and adapt them to his needs. In some cases, the enquirer
will find that the relevant data have been collected by some research organization as
part of an investigation similar to his own.
In making use of secondary data, the enquirer has to be particularly careful about
the nature of the data and their coverage, the definitions on which they are based and
their degree of reliability. He may find that the available data are more extensive
than is required for the purpose of his enquiry. In such a case, he will naturally discard
the part of the data that is redundant. Sometimes he may find that the available
information is inadequate for the purpose of his enquiry. He will then have to decide
whether to collect his own data, either to base his enquiry solely on them or to plug
lacunae in the secondary data.
Data collected primarily for the purpose of the given enquiry are called primary
data. These are collected by the enquirer, either on his own or through some agency set
up for this purpose, directly from the field of enquiry. It goes without saying that this
type of data may be used with greater confidence, because the enquirer will himself
decide upon the coverage of the data and the definitions to be used and, as such, will
have a measure of control on the reliability of the data.
disadvantages of the methods are – the low degree of reliability of collected data and a
large number of non-respondents.
“Schedules sent through investigators” is the most widely used method of
collection of primary data. Here, paid investigators are employed for data collection.
The investigators carry with them printed “schedules” specially designed for the
purpose, interview people concerned, and fill up the schedules on the spot, based
on answers received from the informant. The method is very popular and yields
satisfactory results. Most of the accuracy of the collected data however depends on
the ability and tactfulness of investigators, who are given special training as to how
they should elicit the correct information through friendly discussions. The method is
adopted during the decennial census of population in this country.
Classification of Data
Classification is the process of arranging data collected under different
categories.
Types of classification
Broadly, there are four types of classifications:-
1. On qualitative basis – Classification of the total population according to sex,
religion, occupation, etc.
Tabulation
Tabulation may be defined as the logical and systematic organization of
statistical data in rows and columns, designed to simplify the presentation and facilitate
comparison.
Different parts of a table
(i) Title – This is a brief description of the contents and is shown at the top of the
table.
(ii) Stub – The extreme left part of the table where descriptions of rows are shown is
called stub.
(iii) Caption and Box-head – The upper part of the table which shows the description
of columns and sub-columns is called caption. The whole upper part including caption,
unit of measurement and column numbers, if any, is called the box-head.
(iv) Body – It is that part of the table which shows the figures.
Table 1.1 – Different parts of a table
TITLE (at the top of the table)
BOX-HEAD: CAPTION (column headings) and column numbers (1) (2) (3) (4) (5) (6)
STUB (row descriptions, at the extreme left) | BODY (the figures)
Source :..............
Footnote :..........
(v) Footnote – This is the part below the Body where the source of data and
explanation are shown.
Problem 1 – Draw up a blank table in which could be shown the number of
persons employed in six industries on two different dates distinguishing males from
females and among the latter, singles, married and widows.
[I.C.W.A. Jan. 1973]
Solution :
Table 1.2 : Number of persons employed in six industries
            As on 01.01.2016                     As on 01.01.2017
Industry    Male   Female                        Male   Female
                   Single  Married  Widow               Single  Married  Widow
(1)         (2)    (3)     (4)      (5)          (6)    (7)     (8)      (9)
A
B
C
D
E
F
Source : Industrial Statistics
Footnote : Data are in lakhs
Problem 3 – Draw up a blank table showing the Exports and Imports during the
years 1960, 1961, 1962, 1963 and 1964 relating to the ports Bombay, Calcutta, Madras
and other ports. The table should provide for the values and the balance of trade and
the totals for each year.
[C. A. ‘63]
Solution :
Table 1.4 : Value of Exports and Imports and balance of trade during 1960 to
1964 for Bombay, Calcutta, Madras and other Ports
Value in crores INR
Items                       1960   1961   1962   1963   1964
(1)                         (2)    (3)    (4)    (5)    (6)
Exports From
  Mumbai
  Kolkata
  Chennai
  Others
Total of Exports (A)
Imports From
  Mumbai
  Kolkata
  Chennai
  Others
Total of Imports (B)
Balance of Trade (A – B)
Problem 4 – You are given data on exports (both quantity and value) of Indian
jute to U.K., U.S.A., Russia, Japan and Canada for 5 consecutive years. Suggest a suitable
tabular representation by drawing a blank table.
Solution –
Table 1.5 : Exports of Indian Jute to Different countries during 1990 to 1994
               1990              1991              1992              1993              1994
Items          Quantity  Value   Quantity  Value   Quantity  Value   Quantity  Value   Quantity  Value
Exports to
  U.K.
  U.S.A.
  Russia
  Japan
  Canada
Total of Exports (A)
CHARTS AND DIAGRAMS
presentation of data
In the usual type of graph paper, all rulings are equally spaced both
horizontally and vertically. This is known as natural scale or arithmetic scale graph
paper. There is a special type of graph paper in which the distances of rulings from
the initial line are proportional to the logarithms of numbers, and hence the distances
between consecutive rulings are not equal. Such a scale is known as logarithmic scale
or ratio scale.
The natural scale is used for showing absolute amount of change.
Example:
Table 2.1 : Cheque clearance (crores of Rs)
Year   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1958   832  765  873  792  791  663  834  754  806  799  773  887
1959   894  828  946  923  849  –    –    –    –    –    –    –
In Figure 1.5, an increase of 50 (Rs. crores), either from 650 to 700 or
from 900 to 950, is represented by the same distance in the vertical direction.
Fig. 1.5 : Line diagram showing cheque clearance (vertical axis: 0 and 650 to 1000
crores of Rs; horizontal axis: months, Jan 1958 to May 1959)
since the ratio scale shows proportionate changes, there is no zero point. The base line
must show either 1 or 10 or 100 or 1000 etc.
Some graph papers have a ratio scale in the vertical direction, but natural scale
in the horizontal direction. These are known as semi-logarithmic graph papers.
Semi-logarithmic Graph
Semi-logarithmic graph or Ratio Chart is a line diagram drawn on a special
type of graph paper which shows the natural scale in the horizontal direction and
the logarithmic or ratio scale in the vertical direction. In the semi-logarithmic graph
paper, the vertical rulings are equispaced, but the horizontal rulings are not, their
distances from the base line being proportional to the logarithms of the numbers
represented. If a semi-logarithmic graph paper is not available, the ratio chart may
be drawn on a natural scale graph paper by plotting the logarithms of values of the
dependent variable y against the corresponding values of the independent variable x.
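The difference between the two scales can be checked numerically. The short Python
sketch below is only an illustration: the function name vertical_distance is an
assumption, and the values 650, 700, 900 and 950 are those quoted for the cheque
clearance diagram. It shows that equal absolute changes cover equal distances on the
natural scale only, while equal proportionate changes (here a doubling) cover equal
distances on the ratio scale.

```python
import math

def vertical_distance(start, end, scale="natural"):
    """Distance between two plotted values on a natural or logarithmic scale."""
    if scale == "natural":
        return end - start
    if scale == "log":
        # On a ratio scale the plotted height is log10(value)
        return math.log10(end) - math.log10(start)
    raise ValueError("scale must be 'natural' or 'log'")

# Equal absolute changes of 50: equal distances on the natural scale only
nat_a = vertical_distance(650, 700, "natural")
nat_b = vertical_distance(900, 950, "natural")
log_a = vertical_distance(650, 700, "log")
log_b = vertical_distance(900, 950, "log")

# Equal proportionate changes (a doubling): equal distances on the log scale
dbl_a = vertical_distance(100, 200, "log")
dbl_b = vertical_distance(400, 800, "log")

print(nat_a, nat_b)
print(round(log_a, 4), round(log_b, 4))
print(round(dbl_a, 4), round(dbl_b, 4))
```

Plotting log10(y) against x on ordinary graph paper therefore gives the same picture
as plotting y on semi-logarithmic paper.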
[2] Bar Diagram
A bar diagram consists of a group of equispaced rectangular bars, one for each
category (or class) of given statistical data. The bars, starting from a common base line,
must be of equal width and their lengths represent the values of statistical data.
There are two types of bar diagrams – Vertical Bar Diagram and Horizontal Bar
Diagram. Vertical bars are used to represent time series data or data classified by the
values of a variable. Horizontal bars are used to depict data classified by attributes only.
For each of these types, we have again grouped bar diagram, sub-divided (or
component) bar diagram, paired bar diagram, etc.
[3] Pie Diagram
A pie diagram is a circle whose area is divided proportionately among the different
components by straight lines drawn from the centre to the circumference of the circle.
When statistical data are given for a number of categories, and we are interested in the
comparison of the various categories or between a part and the whole, such a diagram
is very helpful in effectively displaying the data.
For drawing a pie diagram, it is necessary to express the value of each category
as a percentage of the total. Since the full angle of 360° around the centre of the circle
represents the whole, i.e. 100%, the percentage figure of each component is multiplied by
3.6 degrees to find the angle of the corresponding sector at the centre of the circle.
[4] Pictogram
A pictogram consists of rows of picture symbols of equal size. Each symbol
represents a definite numerical value. If a fraction of this value occurs, then the
proportionate part of the picture from the left is shown. Pictograms are used for
representing time series data, one row of pictures for each time period. They may also be
used for displaying statistical data classified by attributes.
[5] Histogram, Frequency Polygon and Ogive
These diagrams are used to depict statistical data given in the form of frequency
distributions.
Exercise–2
[1] Represent the following statistical information graphically :
Year 1924 1925 1926 1927 1928 1929 1930
Monthly Average Production 609 522 205 608 551 632 516
[C.U. B.com.(Hons.)'65]
Solution :
Figure – Line Diagram showing Monthly Average Production (vertical axis: monthly
average production, 200 to 650; horizontal axis: year, 1924 to 1930)
[2] Plot the following data relating to population of India so as to indicate the
proportionate increase in population from one period to another :–
Year 1872 1881 1891 1901 1911 1921 1931 1941
Population (in millions) 210 250 290 295 315 320 350 390
[C.U. , B.A. (Econ) ‘62]
Solution:
We draw the semi logarithmic graphs on an ordinary (arithmetic scale) graph
paper. For this purpose, the logarithms of population data should be plotted against
the corresponding years.
Table – Calculation of logarithms
Year   Population y (in millions)   Log y    Log y (to 2 decimals)
1872   210                          2.3222   2.32
1881   250                          2.3979   2.40
1891   290                          2.4624   2.46
1901   295                          2.4698   2.47
1911   315                          2.4983   2.50
1921   320                          2.5051   2.51
1931   350                          2.5441   2.54
1941   390                          2.5911   2.59
Figure – Semi-logarithmic graph (or Ratio Chart), drawn on ordinary graph paper
(vertical axis: log y, 2 to 2.6; horizontal axis: year, 1872 to 1941)
[3] Represent the information contained in the following table in a component part chart.
Commodity pattern of India’s exports (percentage)
1956–57 1957–58 1958–59
Capital goods 0.29 0.31 0.30
Intermediate goods 45.82 46.87 44.19
Consumer goods 50.50 47.32 48.19
Unclassified 3.39 5.50 7.32
Total 100.00 100.00 100.00
[C. U. B. Com (Hons) ‘67]
Solution: Figure – Component bar chart showing Indian Exports during
1956-57, 1957-58, 1958-59
[4] The following table shows the values of a variable y corresponding to some given
equidistant values of the independent variable x :–
x 7 8 9 10 11 12
y 132 214 330 486 688 942
Draw a semi-logarithmic chart and find by graphical interpolation the value of
y, when x = 10.5
[ I.C.W.A. ‘71]
Solution :
Table – Calculation of logarithms for Ratio Chart
x    y     Log x           Log y
7    132   0.8451 = 0.85   2.1206 = 2.12
8    214   0.9031 = 0.90   2.3304 = 2.33
9    330   0.9542 = 0.95   2.5185 = 2.52
10   486   1.0000 = 1.00   2.6866 = 2.69
11   688   1.0414 = 1.04   2.8376 = 2.84
12   942   1.0792 = 1.08   2.9741 = 2.97
Figure – Ratio chart of log y against x (vertical axis: log y, 2 to 2.9, with 2.75
marked; horizontal axis: x, 6 to 12, with 10.5 marked)
From the graph, log y = 2.75 when x = 10.5, so that y = antilog 2.75 = 562.3
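The graphical reading can be cross-checked numerically. The Python sketch below is
an illustration under stated assumptions: it takes 2.75 as the value of log y read from
the chart at x = 10.5, and the helper interpolate_log_y (a hypothetical name) joins
neighbouring points of log y by straight lines. The straight-line value comes out a
little higher than the graph reading, which is to be expected when a value is read
off a hand-drawn chart to two decimal places.

```python
import math

x = [7, 8, 9, 10, 11, 12]
y = [132, 214, 330, 486, 688, 942]
log_y = [math.log10(v) for v in y]

def interpolate_log_y(x_new):
    """Linear interpolation of log10(y) between the two neighbouring x values."""
    for i in range(len(x) - 1):
        if x[i] <= x_new <= x[i + 1]:
            t = (x_new - x[i]) / (x[i + 1] - x[i])
            return log_y[i] + t * (log_y[i + 1] - log_y[i])
    raise ValueError("x_new lies outside the tabulated range")

log_est = interpolate_log_y(10.5)
y_est = 10 ** log_est      # antilogarithm of the interpolated log y
y_graph = 10 ** 2.75       # antilogarithm of the assumed graph reading 2.75

print(round(log_est, 4), round(y_est, 1), round(y_graph, 1))
```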
[5] The following table shows the number of bushels of wheat and corn produced in
a farm during the years 1950 to 1960.
Express the yearly number of bushels of wheat and corn as percentages of total
annual production. Graph the percentages by component bar charts.
Year 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
No. of Bushels of wheat 200 185 225 250 240 195 210 225 250 230 235
No. of bushels of Corn 75 90 100 85 80 100 110 105 95 110 100
[Dip. Soc. Welfare, ’68]
Solution :
Year 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
No. of Bushels of wheat 200 185 225 250 240 195 210 225 250 230 235
No. of bushels of Corn 75 90 100 85 80 100 110 105 95 110 100
Totals 275 275 325 335 320 295 320 330 345 340 335
Figure – Component bar chart showing no. of bushels of wheat and corn (1950 – 1960)
(axes as labelled in the figure: Frequency, Year)
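The percentages needed for the component bars can be worked out as follows. This is
a minimal sketch using the production figures given in the problem; the variable
names are illustrative. The share of corn is taken as 100 minus the share of wheat so
that the two components of each bar add up to exactly 100.

```python
years = list(range(1950, 1961))
wheat = [200, 185, 225, 250, 240, 195, 210, 225, 250, 230, 235]
corn = [75, 90, 100, 85, 80, 100, 110, 105, 95, 110, 100]

rows = []
for year, w, c in zip(years, wheat, corn):
    total = w + c                            # total annual production
    wheat_pct = round(100 * w / total, 2)    # wheat as a percentage of the total
    corn_pct = round(100 - wheat_pct, 2)     # corn takes the remainder of the bar
    rows.append((year, total, wheat_pct, corn_pct))

for row in rows:
    print(row)
```

For 1951, for example, corn works out to 32.73% of the total of 275 bushels.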
[6] Of the life insurance policy dividends paid in the United States, 21% were taken
in cash, 31% were used to pay premiums, 18% were used to purchase additional paid-up
life insurance, 30% were left with life insurance companies to earn interest. Construct
a pie diagram showing these different uses of policy dividends.
[C.U. M Com. ‘62]
Solution:
Table – Calculations for pie chart
Mode of payment or                            Percent    Angle (degrees) at the centre   Cumulative
distribution of dividend                      of total   of pie chart, col. (2) × 3.6    total
(1)                                           (2)        (3)                             (4)
Cash                                          21         75.6                            75.6
Premiums                                      31         111.6                           187.2
Purchase of additional paid-up insurance      18         64.8                            252.0
Left with life insurance companies            30         108.0                           360.0
Total                                         100        360.0
Fig. Pie diagram showing Life Insurance Policy dividend distributed to different heads.
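The angles in columns (3) and (4) of the table follow directly from the rule that each
1% corresponds to 3.6 degrees. A minimal Python sketch, with the hypothetical helper
name pie_angles and the percentages taken from column (2):

```python
shares = {
    "Cash": 21,
    "Premiums": 31,
    "Purchase of additional paid-up insurance": 18,
    "Left with life insurance companies": 30,
}

def pie_angles(percentages):
    """Return (label, angle, cumulative angle) per sector; 1% = 3.6 degrees."""
    result = []
    running = 0.0
    for label, pct in percentages.items():
        angle = round(pct * 3.6, 1)
        running = round(running + angle, 1)
        result.append((label, angle, running))
    return result

table = pie_angles(shares)
for label, angle, cumulative in table:
    print(f"{label:45s} {angle:6.1f} {cumulative:6.1f}")
```

The cumulative column is convenient when drawing, since each radius can be measured
from the same starting line with a protractor.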
[7] A summary of the estimated receipts and expenditures of Government of India
for a particular year is given below :
Receipts                 Amount (millions of rupees)   Expenditure               Amount (millions of rupees)
Direct Taxes on Income   2076.0                        Interest on Public Debt   1143.3
Total Expenditure 16162.1 100.00
Figure: Sub-divided bar chart showing receipts and expenditures of India for a
particular year.
[8] Use a suitable diagram to represent the following data relating to the Post and
Telegraph Department, Govt. of India
Solution :
Fig: Bar diagram showing data of Post & Telegraph Department during 1955–56
to 1966–67 (vertical axis: net receipts in lakhs of Rs.; horizontal axis: year).
[9] The actual outlay on the public sector in the First and Third Five – year plans of
India is shown below by head of development:
Head of Development                    First Plan outlay (Rs. Cr)   Third Plan outlay (Rs. Cr)
Agricultural & Community Development   290                          1096
Irrigation and Power                   583                          1927
Industries and Mining                  97                           1965
Transport and Communications           518                          2113
Social Services                        412                          1422
Miscellaneous                          60                           85
Total                                  1960                         8608
Draw suitable diagrams to show the relative importance attached to the various
heads in each plan. Hence, make a comparison between the First and Third Plan.
Solution:
Table – Calculations for sub-divided bar chart
Head of Development                    % of total outlay: Plan I   % of total outlay: Plan III
Agricultural & Community Development   14.8                        12.7
Irrigation and Power                   29.8                        22.5
Industries and Mining                  4.9                         22.8
Transport and Communications           26.4                        24.5
Social Services                        21.0                        16.5
Miscellaneous                          3.1                         1.0
Total                                  100.0                       100.0
Attribute and Variable
The character of statistical information collected from a group of individuals or
objects, is of two types – quantitative and qualitative. Information about the ages of a
group of men is quantitative, because age is expressed in numbers, say 29 years, 43.5
years, etc. Religion of a group of men is qualitative, because religion cannot be stated
in numerical terms, e.g., either Hindu or Buddhist or Christian, etc. The quantitative
character is technically called variable and the qualitative character is called attribute.
A variable takes different ‘values’ and these values can be measured numerically in
suitable units. An attribute cannot be measured but can only be classified under
different heads or categories.
Discrete and continuous variables
When we pass on to the study of data regarding quantitative characters, it is
immediately found that these may be of two principal types. In the first place, the
character may take only some isolated values, like the number of letters in a word,
number of petals in a flower, number of members in a family and so forth. Alternatively,
it may conceivably take any value within its range of variation. The height, weight
or age of a man, the diameter of a bobbin, the temperature, rainfall or humidity in a
region, etc. are variables of this type. Variables of the first type are called discontinuous
35 – 44 9
45 - 59 6
Total 200
Useful terms associated with grouped frequency distributions:
[a] Class interval or class
[b] Class frequency and Total Frequency
[c] Class Limits – Lower class limit, upper class limit
[d] Class boundaries – Lower class boundary, upper class boundary
[e] Class mark, or Mid – Value, or Mid – point of class interval
[f] Width, or size of class interval
[g] Frequency density
We shall explain these terms with reference to table below
Class      Class       Class Limits     Class Boundaries   Class   Width      Frequency   Relative
Interval   Frequency   Lower   Upper    Lower    Upper     Mark    of Class   Density     Frequency
(1)        (2)         (3)     (4)      (5)      (6)       (7)     (8)        (9)         (10)
20 – 29    2           20      29       19.5     29.5      24.5    10         0.2         0.029
30 – 39    4           30      39       29.5     39.5      34.5    10         0.4         0.057
40 – 49    7           40      49       39.5     49.5      44.5    10         0.7         0.100
etc.
In the construction of grouped frequency distribution, the class intervals must
therefore be defined by the pairs of numbers such that the upper end of one class does
not coincide with the lower end of the immediately following class. The two numbers
used to specify the limits of a class interval for the purpose of tallying the original
observations in the various classes, are called ‘Class Limits’. The smaller of the pair is
known as Lower class limit and the larger as Upper Class Limit with reference to the
particular class.
[d] Class boundaries
When measurements are taken on a continuous variable, all data are recorded
nearest to a certain unit. Thus, if ages are recorded to the nearest whole number of
years, any age between 19.5 years and 20.5 years is recorded as 20 years. Similarly,
29 years denotes an age between 28.5 years and 29.5 years. Hence, the class interval
20 – 29 actually includes all ages between 19.5 and 29.5 years. These most extreme
values which would ever be included in a class interval are called ‘class boundaries’.
The lower extreme point is called lower class boundary, and the upper extreme point is
called upper class boundary with reference to any particular class. See Table columns
(5)&(6).
Class boundaries may be calculated from class limits by applying the following
rule:
Lower class boundary = Lower class limit – ½ d and
Upper class boundary = Upper class limit + ½ d
Where d is the common difference between the upper class limit of any class
interval and the lower class limit of the next class interval. In Table, d = 1.
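The rule can be sketched in a few lines of Python. The function name
class_boundaries is an assumption made for illustration, and d is taken from the
first two classes on the assumption that it is the same throughout. The age classes
are those of the table above; the height classes are those of solved problem [1]
later in this chapter, where d = 0.1.

```python
def class_boundaries(limits):
    """Convert (lower, upper) class limits into class boundaries.

    d is the gap between the upper limit of one class and the lower
    limit of the next; it is assumed to be the same for every class.
    """
    d = limits[1][0] - limits[0][1]
    return [(lower - d / 2, upper + d / 2) for lower, upper in limits]

ages = [(20, 29), (30, 39), (40, 49)]
age_boundaries = class_boundaries(ages)
print(age_boundaries)

heights = [(145.0, 149.9), (150.0, 154.9), (155.0, 159.9)]
height_boundaries = [(round(lo, 2), round(hi, 2))
                     for lo, hi in class_boundaries(heights)]
print(height_boundaries)
```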
[g] Frequency Density
Frequency density of a class is its frequency per unit of width. It shows the
concentration of frequency in a class and is given by the formula
Frequency Density = Class Frequency ÷ Width of the Class
[4] A table is now prepared showing the class intervals in the first column and the
corresponding class frequencies in the second column. This is the required frequency
distribution.
Cumulative Frequency Distribution
Cumulative Frequency corresponding to a specified value of the variable may be
defined as the number of observations smaller than (or greater than) that value.
The number of observations up to a given value is called the ‘less-than’ cumulative
frequency, and the number of observations greater than a value is called the
‘more-than’ cumulative frequency. When a grouped frequency distribution relates to a
variable of continuous type, the cumulative frequencies calculated therefrom must be shown
against the class boundary points (i.e. end points of classes). Cumulative frequency
expressed as a percentage of total frequency is known as cumulative percentage.
A table showing the cumulative frequencies against values of the variable
systematically arranged in increasing (or decreasing) order is known as a cumulative
frequency distribution. If cumulative percentages are shown instead of cumulative
frequencies, the table is called a Cumulative Percentage Distribution.
Relative frequency distribution
Relative Frequency denotes the class frequency expressed as a fraction of the
total frequency.
Relative Frequency = Class Frequency ÷ Total Frequency
The sum of all the relative frequencies is equal to 1. See Table, column (10).
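Relative and cumulative frequencies are easily computed together. The Python sketch
below is an illustration only; it borrows the wage distribution of solved problem [7]
later in this chapter (classes 0 – 10, 10 – 20, ..., 70 – 80), and the variable names
are assumptions.

```python
from itertools import accumulate

boundaries = [0, 10, 20, 30, 40, 50, 60, 70, 80]
frequencies = [20, 45, 85, 160, 70, 55, 35, 30]
total = sum(frequencies)

# Relative frequency = class frequency / total frequency
relative = [f / total for f in frequencies]

# Less-than cumulative frequencies, shown against the upper class boundaries
less_than = list(accumulate(frequencies))

# More-than cumulative frequencies, shown against the lower class boundaries
more_than = [total - c for c in [0] + less_than[:-1]]

# Cumulative percentages
cumulative_pct = [100 * c / total for c in less_than]

for upper, c, p in zip(boundaries[1:], less_than, cumulative_pct):
    print(f"less than {upper}: {c} ({p:.1f}%)")
```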
Diagrammatic Representation of Frequency Distribution
The diagrams commonly used to depict statistical data given in the form of a
frequency distribution are:-
[1] Histogram
[2] Frequency Polygon
[3] Ogive (or cumulative frequency polygon)
[1] Histogram
Histogram is the most common form of diagrammatic representation of a
grouped frequency distribution. It consists of a set of adjoining rectangles drawn
on a horizontal baseline, with areas proportional to the class frequencies. The width
of rectangles, one for each class, extends over the class boundaries (not class limits)
shown on the horizontal scale. When all classes have equal widths, the heights of
rectangles will be proportional to the class frequencies and it is then customary to take
the heights numerically equal to the class frequencies. If, however, the classes are of
unequal width, the rectangles will also be of unequal width, and therefore the heights
must be proportional to the frequency density. Because then,
Ogive is the graphical representation of the cumulative frequency distribution,
and hence is also called cumulative frequency polygon. When cumulative frequencies
are plotted against the corresponding class boundaries and the successive points are
joined by straight lines, the line diagram obtained is known as Ogive or Cumulative
Frequency Polygon. The ogive is of the “less-than” or “more-than” type according as
the cumulative frequencies used are of the “less-than” or “more-than” type. The
“less-than” ogive starts from the lowest class boundary on the horizontal axis and,
gradually rising upward, ends at the highest class boundary corresponding to the
cumulative frequency N, i.e., the total frequency. It looks like an elongated letter S.
The “more-than” ogive has the appearance of an elongated S turned upside down. Unequal
widths of classes in the frequency distribution do not cause any difficulty in the
construction of an ogive.
Frequency Curve
If the widths of classes are made smaller and smaller and at the same time the
total frequency is increased indefinitely, then the histogram and the frequency
polygon will closely approach a smooth curve known as the frequency curve.
The frequency curve shows the probability distribution of the variable in the
population, and the area under it bounded by the ordinates at two specified points on the
horizontal axis represents the probability that a value of the variable lies between those
two limits. Like the histogram, the frequency curve is therefore an area diagram.
Generally, there are 4 types of frequency curves –
i) Symmetrical bell – shaped
ii) Asymmetrical single humped
iii) J – Shaped
iv) U – Shaped
For most of the distributions met with in practice, the frequency curve is
bell-shaped, and in such cases three important characteristics are immediately apparent
from the frequency curve:
[1] The first characteristic is a measure of central tendency. In particular, the ‘mode’
of the distribution is given by the abscissa of the highest point of the frequency curve.
[2] The second characteristic is a measure of dispersion. In particular, the ‘range’,
i.e., the maximum possible discrepancy between any two values, is given by the
distance between the two points at which the frequency curve meets the horizontal
axis.
[3] The third characteristic is the shape of the frequency curve, i.e., whether the
curve is symmetrical or not; and if not, a measure of the degree of ‘skewness’. A
symmetrical curve indicates that mean, median and mode are equal. For asymmetrical
curves this is not true, the mean being greater or less than the mode according as the
longer tail of the curve lies to the right or to the left.
Solved Problems
[1] Below is given the distribution of heights of a group of 60 students: -
Height (in cm) 145.0–149.9 150.0–154.9 155.0–159.9 160.0–164.9 165.0–169.9 170.0–174.9 175.0–179.9 180.0–184.9
No. of Students 2 5 9 15 16 7 5 1
Explain the terms ‘class limits’ and ‘class boundaries’ with reference to this
distribution.
[I.C.W.A., ‘75]
Solution:
Class Limits are 145.0 – 149.9, 150.0 – 154.9, 155.0 – 159.9, etc.
Class boundaries are 144.95 – 149.95, 149.95 – 154.95, 154.95 – 159.95, etc.
7 | 1
Total 30
Value 1 2 3 4 5 6 7 Total
Frequency 1 4 12 9 2 1 1 30
[3] The following are the monthly salaries of 20 employees:
(Rs.) 130, 62, 145, 118, 125; 76, 151, 142, 110, 98;
95, 116, 100, 103, 71; 85, 80, 122, 132, 95;
Form a frequency distribution with class intervals Rs. 61 – 80, 81 – 100, 101 –
120, 121 – 140 and 141 – 160.
[C.U., B. Com, ‘74]
Solution:
Table Tally Sheet
Class Limits Tally Marks Frequency
61 – 80 |||| 4
81 – 100 |||| 5
Salary (Rs.) 61 – 80 81 – 100 101 – 120 121 – 140 141 – 160 Total
Frequency 4 5 4 4 3 20
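The tallying can be verified mechanically. The Python sketch below is a minimal
illustration (the function name frequency_distribution is an assumption); it counts
each salary against the inclusive class limits given in the problem.

```python
salaries = [130, 62, 145, 118, 125, 76, 151, 142, 110, 98,
            95, 116, 100, 103, 71, 85, 80, 122, 132, 95]
classes = [(61, 80), (81, 100), (101, 120), (121, 140), (141, 160)]

def frequency_distribution(values, class_limits):
    """Count how many values fall within each pair of inclusive class limits."""
    counts = {limits: 0 for limits in class_limits}
    for v in values:
        for lower, upper in class_limits:
            if lower <= v <= upper:
                counts[(lower, upper)] += 1
                break
        else:
            # No class contained the value
            raise ValueError(f"{v} does not fall in any class")
    return counts

table = frequency_distribution(salaries, classes)
for (lower, upper), f in table.items():
    print(f"{lower} – {upper}: {f}")
print("Total:", sum(table.values()))
```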
[4] The data below give the marks secured by 70 candidates in a certain examination:
21 31 35 52 64 74 89 53 42 7
22 35 43 67 76 35 46 26 32 40
72 43 38 41 63 71 28 32 45 54
15 18 52 73 86 50 39 55 47 12
44 58 67 85 39 40 50 65 72 69
57 63 5 56 79 37 24 54 82 49
51 54 68 29 34 44 58 62 59 65
Construct a frequency distribution of the marks, taking classes of uniform
width of 10 marks and 0 as the lower limit of the lower-most class.
[I.C.W.A. ‘74]
Solution:
Maximum Value = 89
Minimum Value = 5
Table: Tally Sheet
Class Marks Tally Marks Frequency
0–9 || 2
10 – 19 ||| 3
20 – 29 |||| | 6
30 – 39 |||| |||| | 11
40 – 49 |||| |||| || 12
50 – 59 |||| |||| |||| 15
60 – 69 |||| |||| 10
70 – 79 |||| || 7
80 – 89 |||| 4
Total 70
Arrange the data in a frequency distribution in 10 class intervals and obtain the
percentage frequency in each class interval.
[C. U., B. Com ‘72]
Solution:
Maximum Value = 60
Minimum Value = 31
49 – 51   ||||   (4 ÷ 50) × 100 = 8    4
52 – 54   ||||   (5 ÷ 50) × 100 = 10   5
55 – 57   ||     (2 ÷ 50) × 100 = 4    2
58 – 60   ||||   (5 ÷ 50) × 100 = 10   5
Total            100                   50
Class Marks            31–33  34–36  37–39  40–42  43–45  46–48  49–51  52–54  55–57  58–60  Total
Frequency              6      4      7      7      5      5      4      5      2      5      50
Percentage frequency   12     8      14     14     10     10     8      10     4      10     100
[6] Form an ordinary frequency table from the following cumulative frequency
distribution of marks obtained by 22 students:
Solution:
Table: Frequency Distribution
Class Marks Frequency
0–9 3
10 – 19 5
20 – 29 9
30 – 39 3
40 – 49 2
Total 22
[7] From the following data, calculate the “percentage” of workers getting wages :–
(a) more than Rs. 44, (b) between Rs. 22 and Rs. 58
Wages (Rs) 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80 Total
No. of Workers 20 45 85 160 70 55 35 30 500
[C.A. ‘76]
Solution:
Table: Cumulative Frequency Distribution
Class Boundary   Cumulative Frequency (less than)
10               20
20               65
22               x
30               150
40               310
44               y
50               380
58               z
60               435
70               470
80               500
[a] Number greater than Rs. 44
= Total frequency – Cumulative frequency (less than) corresponding to Rs. 44
= 500 – 338 = 162
Percentage of workers getting wages more than Rs. 44 = (162 ÷ 500) × 100 = 32.4
To find the cumulative frequency x, we have
(22 − 20) ÷ (30 − 20) = (x − 65) ÷ (150 − 65)
or, 2 ÷ 10 = (x − 65) ÷ 85
or, x − 65 = 17 or, x = 82
To find the cumulative frequency (less than) y, we have
(44 − 40) ÷ (50 − 40) = (y − 310) ÷ (380 − 310)
or, 4 ÷ 10 = (y − 310) ÷ 70
or, y − 310 = 28 or, y = 338
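The interpolation used for x and y is the same simple proportion each time, so it can
be written once and reused. The Python sketch below is an illustration; the function
name cumulative_at is an assumption. The value z at Rs. 58 and the percentage for
part (b) are obtained here by applying the same rule; the working for them is not
shown in the text above.

```python
boundaries = [0, 10, 20, 30, 40, 50, 60, 70, 80]
less_than = [0, 20, 65, 150, 310, 380, 435, 470, 500]

def cumulative_at(value):
    """Less-than cumulative frequency at `value` by simple (linear) interpolation."""
    for i in range(len(boundaries) - 1):
        lo, hi = boundaries[i], boundaries[i + 1]
        if lo <= value <= hi:
            share = (value - lo) / (hi - lo)
            return less_than[i] + share * (less_than[i + 1] - less_than[i])
    raise ValueError("value lies outside the range of the distribution")

total = less_than[-1]
x = cumulative_at(22)
y = cumulative_at(44)
z = cumulative_at(58)   # same rule applied at Rs. 58 (not worked in the text)

pct_more_than_44 = 100 * (total - y) / total
pct_between_22_and_58 = 100 * (z - x) / total

print(round(x), round(y), round(z))
print(round(pct_more_than_44, 1), round(pct_between_22_and_58, 1))
```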
[8] Draw the histogram of the following frequency distribution of heights of 100
college students :
Height (cm) 141 – 150 151 – 160 161 – 170 171 – 180 181 -190 Total
Frequency 5 16 56 19 4 100
[W.B.H.S. ‘78]
Solution:
Table: Calculations for Drawing Histogram
Class Limits Class Boundaries Frequency
141 – 150 140.5 – 150.5 5
151 – 160 150.5 – 160.5 16
161 – 170 160.5 – 170.5 56
171 – 180 170.5 – 180.5 19
181 – 190 180.5 – 190.5 4
Figure: Histogram (vertical axis: frequency (no. of students), 0 to 60; horizontal
axis: height (cm), class boundaries 140.5 to 190.5)
[9] Draw histogram and frequency polygon to present the following data :–
Income (Rs) 100-149 150-199 200-249 250-299 300-349 350-399 400-449 450-499 Total
No. of Individuals 21 32 52 105 62 43 18 9 342
[I.C.W.A. ‘78]
Solution:
Table: Calculations for Drawing Histogram
Class Limits Class Boundaries Frequency (f) Width of Class (w)
100 – 149 99.5 – 149.5 21 50
150 – 199 149.5 – 199.5 32 50
200 – 249 199.5 – 249.5 52 50
250 – 299 249.5 – 299.5 105 50
300 – 349 299.5 – 349.5 62 50
350 – 399 349.5 – 399.5 43 50
400 – 449 399.5 – 449.5 18 50
450 – 499 449.5 – 499.5 9 50
Figure: Histogram and Frequency Polygon (horizontal axis: Class boundaries; vertical axis: Class frequency)
[10] Draw the histogram of the distribution given below and obtain the number of
firms whose sales lie between Rs. 12,00,000 and Rs. 26,00,000.
Value of Sales (Rs. 1000) No. of Firms
0 – 500 3
500 – 1000 42
1000 – 2500 288
2500 – 3500 150
3500 – 4500 51
Also draw the cumulative frequency polygon.
[C.U.B.A. (Econ.) ‘71]
Solution:
Table: Calculations for drawing histogram
Class Interval (Rs. 1000) Frequency (f) Width of Class (w) Frequency density (f ÷ w)
0 – 500 3 500 0.006
500 – 1000 42 500 0.084
1000 – 2500 288 1500 0.192
2500 – 3500 150 1000 0.150
3500 – 4500 51 1000 0.051
Figure – Histogram for the Distribution of Sales (horizontal axis: Sales (Rs. '000); vertical axis: Frequency Density)
The number of firms with annual sales between Rs. 12,00,000 and Rs. 26,00,000 is given by the area under the histogram between the vertical lines at 1200 and 2600 on the horizontal axis, i.e. the area from 1200 to 2500 in the class 1000 – 2500 together with the area from 2500 to 2600 in the class 2500 – 3500. Assuming that the frequency 288 in the class interval 1000 – 2500 is uniformly distributed over the whole interval, the part of the area (i.e. frequency) between 1200 and 2500 is 1300 × 0.192 = 249.6, and similarly the part of the area (i.e. frequency) between 2500 and 2600 is 100 × 0.150 = 15.0.
So, the frequency between 1200 and 2600 is 249.6 + 15.0 = 264.6 ≈ 265
No. of firms = 265
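The area argument above can be sketched as follows. The helper `frequency_between` is a hypothetical name introduced for illustration; it assumes, as the solution does, that each class frequency is spread uniformly over its class.

```python
# (lower boundary, upper boundary, frequency); sales in Rs. '000
classes = [(0, 500, 3), (500, 1000, 42), (1000, 2500, 288),
           (2500, 3500, 150), (3500, 4500, 51)]

def frequency_between(lo, hi, classes):
    """Area under the histogram between lo and hi (uniform spread assumed)."""
    total = 0.0
    for lower, upper, freq in classes:
        overlap = min(hi, upper) - max(lo, lower)
        if overlap > 0:
            # frequency density (freq / width) times the overlapping width
            total += freq * overlap / (upper - lower)
    return total

firms = frequency_between(1200, 2600, classes)
```

Rounding `firms` to the nearest whole number gives the count of firms quoted in the solution.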
Table: Cumulative Frequency Distribution
Class Boundary (Rs. 1000)    Cumulative Frequency (less than)
0 0
500 3
1000 45
2500 333
3500 483
4500 534
Figure: Cumulative Frequency Polygon (horizontal axis: Values of Sales (Rs.'000); vertical axis: Cumulative frequency)
[11] Draw a cumulative frequency graph and estimate the number of persons
between the ages 30 – 32 in the following table:
Age 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50 50 – 55 55 – 60 Total
No of persons 50 70 100 180 150 120 70 59 799
[C. U., M. Com. ‘68]
Solution:
Table – Cumulative Frequency Distribution
Class Boundary Cumulative Frequency (less – than)
20 0
25 50
30 120
35 220
40 400
45 550
50 670
55 740
60 799
Figure: Cumulative Frequency Graph, less-than type (horizontal axis: Age; vertical axis: Cumulative Frequency; the ordinates at ages 30 and 32 are 120 and 160)
No. of persons between the ages 30 and 32 = 160 – 120 = 40.
[12] Draw an ogive from the following data and find graphically the number of
observations lying between 360 and 440 :
Value Number of Observations
More than 200 400
More than 250 370
More than 300 315
More than 350 220
More than 400 115
More than 500 45
More than 600 15
More than 700 0
[I.C.W.A. ‘72]
Solution:
Table: Cumulative Frequency Distribution
Value Cumulative Frequency (more – than)
200 400
250 370
300 315
350 220
400 115
500 45
600 15
700 0
Figure: Cumulative Frequency Polygon (more – than); the ordinates read from the graph at the values 360 and 440 are 210 and 98 respectively
Number of observations lying between 360 and 440 = 210 – 98 = 112
[13] Draw less-than Ogive based on the data given below. (N = 146)
Mid-Point 18 25 32 39 46 53 60
Frequency 10 15 32 42 26 12 9
[C.A. ‘74]
Solution:
Class Boundary Frequency Cumulative Frequency (less – than)
14.5 0 0
21.5 10 10
28.5 15 25
35.5 32 57
42.5 42 99
49.5 26 125
56.5 12 137
63.5 9 146
Figure: Ogive (less – than) (horizontal axis: Value; vertical axis: Cumulative Frequency)
[14] The word – length for each of 90 words in a poem by Tagore is shown below:
5 4 3 5 8 6 6 3 4
3 4 4 5 8 2 6 7 6
4 5 6 4 9 6 4 2 2
2 9 2 3 3 3 2 4 7
7 2 4 4 4 3 4 4 2
4 4 9 3 7 4 5 12 6
3 5 2 5 10 3 5 7 3
3 3 6 2 5 3 3 3 2
4 5 8 5 3 4 4 6 7
2 3 5 5 5 3 2 4 5
Construct column diagram, frequency polygon, and cumulative frequency
polygon (less than).
Solution:
Word Length Tally Marks Frequency
2 |||| |||| ||| 13
3 |||| |||| |||| |||| |||| 19
4 |||| |||| |||| |||| |||| 20
5 |||| |||| |||| 15
6 |||| |||| 9
7 |||| | 6
8 ||| 3
9 ||| 3
10 | 1
12 | 1
Total 90
Word – length Frequency (f) Cumulative Frequency (less – than) Relative Frequency
2 13 13 0.1445
3 19 32 0.2111
4 20 52 0.2222
5 15 67 0.1667
6 9 76 0.1000
7 6 82 0.0667
8 3 85 0.0333
9 3 88 0.0333
10 1 89 0.0111
12 1 90 0.0111
Total 90 1.0000
Figure: Column diagram for the frequency distribution of word lengths (horizontal axis: Word-length; vertical axis: Frequency)
Figure: Frequency polygon for the frequency distribution of word lengths (horizontal axis: Word-length, 1 to 13; vertical axis: Frequency)
Figure: Cumulative Frequency diagram (less than type) for the data on number of different word – lengths.
[15] On the basis of the table constructed in Exercise 14, answer the following
questions:
[a] What is the proportion of words with 9 letters?
[b] What is the number of words with 3 letters or less, and what is the number
of words with 5 letters or more?
[c] What is the number of words with not less than 4 and not more than 6
letters?
Answer:
[a] 0.0333 [b] (13+19)=32, (15+9+6+3+3+1+1)=38 [c] 44=(76–32)
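These answers can be read off the cumulative frequencies mechanically. A minimal sketch, using the frequency table of Exercise 14; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Word-length frequency table from Exercise 14
freq = {2: 13, 3: 19, 4: 20, 5: 15, 6: 9, 7: 6, 8: 3, 9: 3, 10: 1, 12: 1}
lengths = sorted(freq)
total = sum(freq.values())

# Less-than-or-equal cumulative frequency for each word length
cum_less = dict(zip(lengths, accumulate(freq[k] for k in lengths)))

prop_9 = freq[9] / total                      # [a] proportion with 9 letters
at_most_3 = cum_less[3]                       # [b] 3 letters or less
at_least_5 = total - cum_less[4]              # [b] 5 letters or more
between_4_and_6 = cum_less[6] - cum_less[3]   # [c] from 4 to 6 letters
```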
[16] With the data shown below, form a frequency distribution with six classes. Show
the frequencies, the relative frequencies and the cumulative frequencies (of both the
less – than and the greater – than type). Finally, represent the distribution by means of
a suitable diagram.
Class Limits Tally Marks Frequency
501 – 650 ||| 3
651 – 800 |||| ||| 8
801 – 950 |||| |||| |||| |||| |||| |||| 29
951 – 1100 |||| |||| |||| |||| |||| || 27
1101 – 1250 |||| |||| |||| |||| |||| 25
1251 – 1400 |||| ||| 8
Total 100
The required frequency distribution is shown below:
Table: Frequency Distribution of life of bulbs
Life (in hours)    Relative Frequency    Frequency    Class Boundaries
501 – 650 0.030 3 500.5 – 650.5
651 – 800 0.080 8 650.5 – 800.5
801 – 950 0.290 29 800.5 – 950.5
951 – 1100 0.270 27 950.5 – 1100.5
1101 – 1250 0.250 25 1100.5 – 1250.5
1251 – 1400 0.080 8 1250.5 – 1400.5
Total 1.000 100
Table: Cumulative Frequency Distribution
Class Boundaries    Cumulative Frequency (Less than)    Cumulative Frequency (More than)
500.5 0 100
650.5 3 97
800.5 11 89
950.5 40 60
1100.5 67 33
1250.5 92 8
1400.5 100 0
Figure: Histogram (horizontal axis: Hours, class boundaries 500.5 to 1400.5; vertical axis: Frequency)
Figure: Cumulative frequency diagram (horizontal axis: Hours, class boundaries 500.5 to 1400.5; vertical axis: Cumulative Frequency)
Central Tendency
Quite often there will be found in the data a tendency, notwithstanding their
variability, to cluster around a central value. In such a case, it would be legitimate
to use a single value, the central value, to represent the whole set of figures. Such a
representative or typical value of a variable is called a measure of central tendency or
an average.
There are three measures of central tendency – Mean, Median and Mode.
Again, Mean is of three types – Arithmetic Mean (A.M.), Geometric Mean (G.M.),
and Harmonic Mean (H.M.).
Arithmetic Mean (A.M.)
Arithmetic mean of a set of observations is defined to be their sum, divided by
the number of observations.
Given n observations $x_1, x_2, \ldots, x_n$, their A.M., denoted by the symbol $\bar{x}$, is
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\Sigma x \]
If $x_1, x_2, \ldots, x_n$ have frequencies $f_1, f_2, \ldots, f_n$ respectively, i.e. $x_1$ occurs $f_1$ times, $x_2$ occurs $f_2$ times and so on, then the sum of all the $f_1 + f_2 + \cdots + f_n$ observations is
\[ \underbrace{x_1 + x_1 + \cdots + x_1}_{f_1 \text{ terms}} + \underbrace{x_2 + x_2 + \cdots + x_2}_{f_2 \text{ terms}} + \cdots + \underbrace{x_n + x_n + \cdots + x_n}_{f_n \text{ terms}} = f_1x_1 + f_2x_2 + \cdots + f_nx_n \]
Hence, the arithmetic mean is
\[ \bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{f_1 + f_2 + \cdots + f_n} = \frac{\Sigma fx}{N} \]
where $N = f_1 + f_2 + \cdots + f_n$ is the total frequency.
In particular, if $y = \frac{x - c}{d}$, where c and d are constants, then $\bar{x} = c + d\bar{y}$
[d] If a group of $n_1$ observations has A.M. $\bar{x}_1$, and another group of $n_2$ observations has A.M. $\bar{x}_2$, then the A.M. ($\bar{x}$) of the composite group of $n_1 + n_2$ (= N, say) observations is given by
\[ N\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2 \]
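\[ \begin{aligned} &= (x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_n - \bar{x}) \\ &= x_1 + x_2 + \cdots + x_n - \bar{x} - \bar{x} - \cdots - \bar{x} \quad (n \text{ times}) \\ &= (x_1 + x_2 + \cdots + x_n) - n\bar{x} \\ &= n\bar{x} - n\bar{x} \\ &= 0 \end{aligned} \]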
Example–2 : Show that if $\bar{x}$ be the arithmetic mean of the values $x_i$, weighted by $f_i$ (i = 1, 2, ––––, n), then
\[ \sum_{i=1}^{n} f_i (x_i - \bar{x}) = 0 \]
Solution : The arithmetic mean of the values $x_i$ weighted by $f_i$ (i = 1, 2, ––––, n) is, by definition,
\[ \bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{N} \]
where $N = f_1 + f_2 + \cdots + f_n$. Therefore,
\[ \begin{aligned} \sum_{i=1}^{n} f_i (x_i - \bar{x}) &= f_1(x_1 - \bar{x}) + f_2(x_2 - \bar{x}) + \cdots + f_n(x_n - \bar{x}) \\ &= f_1x_1 - f_1\bar{x} + f_2x_2 - f_2\bar{x} + \cdots + f_nx_n - f_n\bar{x} \\ &= f_1x_1 + f_2x_2 + \cdots + f_nx_n - f_1\bar{x} - f_2\bar{x} - \cdots - f_n\bar{x} \\ &= (f_1x_1 + f_2x_2 + \cdots + f_nx_n) - \bar{x}(f_1 + f_2 + \cdots + f_n) \\ &= N\bar{x} - N\bar{x} \\ &= 0 \end{aligned} \]
Example–3 : If $y_i = x_i - c$ (i = 1, 2, ––––, n), where c is a constant, prove that $\bar{x} = c + \bar{y}$
Solution : Since $y_i = x_i - c$, therefore $x_i = c + y_i$
Multiplying both sides by $f_i$ and then summing over all values of i = 1, 2, ––––, n we have
\[ \begin{aligned} \Sigma f_i x_i &= \Sigma f_i (c + y_i) \\ &= \Sigma (f_i c + f_i y_i) \\ &= \Sigma f_i c + \Sigma f_i y_i \\ &= c\,\Sigma f_i + \Sigma f_i y_i \\ &= cN + \Sigma f_i y_i, \text{ since } \Sigma f_i = N \end{aligned} \]
Hence,
\[ \bar{x} = \frac{1}{N}\Sigma f_i x_i = \frac{1}{N}(cN + \Sigma f_i y_i) = c + \frac{1}{N}\Sigma f_i y_i = c + \bar{y} \]
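Example–4 : If $y_i = \frac{x_i - c}{d}$ (i = 1, 2, ––––, n), where c and d are constants, prove that $\bar{x} = c + d\bar{y}$.
Solution : Since $y_i = \frac{x_i - c}{d}$, we have $dy_i = x_i - c$ or, $x_i = dy_i + c$.
Multiplying both sides by $f_i$, we get $f_i x_i = f_i (c + dy_i)$. Now, summing over all values of i = 1, 2, ––––, n
\[ \begin{aligned} \Sigma f_i x_i &= \Sigma f_i (c + dy_i) = \Sigma (f_i c + d f_i y_i) \\ &= \Sigma f_i c + \Sigma d f_i y_i \\ &= c\,\Sigma f_i + d\,\Sigma f_i y_i \\ &= cN + d\,\Sigma f_i y_i, \text{ where } N = \Sigma f_i \end{aligned} \]
Hence,
\[ \bar{x} = \frac{1}{N}\Sigma f_i x_i = \frac{1}{N}(cN + d\,\Sigma f_i y_i) = c + d \cdot \frac{1}{N}\Sigma f_i y_i = c + d\bar{y} \]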
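The result of Example–4 can be checked numerically. The sketch below uses small made-up values (the lists `x` and `f` and the constants `c` and `d` are illustrative assumptions, not data from the text) and compares the mean computed directly with the mean recovered from the transformed variable.

```python
def weighted_mean(values, freqs):
    """Arithmetic mean of `values` weighted by the frequencies `freqs`."""
    return sum(f * v for v, f in zip(values, freqs)) / sum(freqs)

# Hypothetical illustration data
x = [10, 20, 30, 40]
f = [1, 2, 3, 4]
c, d = 20, 10          # change of origin (c) and scale (d)

y = [(xi - c) / d for xi in x]           # y = (x - c) / d
direct = weighted_mean(x, f)             # mean of x computed directly
via_y = c + d * weighted_mean(y, f)      # c + d * (mean of y)
```

Both routes give the same value, which is why a convenient origin and scale can be chosen freely when computing the mean by hand.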
Example–5 : Prove that $\frac{1}{n}\sum_{i=1}^{n}(x_i - A)^2$ is the least when $A = \bar{x}$, where $x_1, x_2, \ldots, x_n$ are the observations, A is any arbitrary constant and $\bar{x}$ the arithmetic mean.
Solution : $\frac{1}{n}\sum_{i=1}^{n}(x_i - A)^2$ will be the least, when $\sum_{i=1}^{n}(x_i - A)^2$ is so.
Now, we can write
\[ x_i - A = (x_i - \bar{x}) + (\bar{x} - A) \]
Therefore,
\[ \begin{aligned} \Sigma (x_i - A)^2 &= \Sigma \{(x_i - \bar{x}) + (\bar{x} - A)\}^2 \\ &= \Sigma \{(x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - A) + (\bar{x} - A)^2\} \\ &= \Sigma (x_i - \bar{x})^2 + \Sigma 2(x_i - \bar{x})(\bar{x} - A) + \Sigma (\bar{x} - A)^2 \\ &= \Sigma (x_i - \bar{x})^2 + 2(\bar{x} - A)\,\Sigma (x_i - \bar{x}) + n(\bar{x} - A)^2 \\ &= \Sigma (x_i - \bar{x})^2 + 2(\bar{x} - A) \cdot 0 + n(\bar{x} - A)^2 \\ &= \Sigma (x_i - \bar{x})^2 + n(\bar{x} - A)^2 \end{aligned} \]
Both the terms on the right are non-negative; because, the first is the sum of n squares $(x_i - \bar{x})^2$, and the second is the product of n (a positive integer) and a square. The first term does not depend on A, so we have only to choose the value of A which makes the second term the minimum possible, viz. 0, i.e.
$n(\bar{x} - A)^2 = 0$
Or, $(\bar{x} - A)^2 = 0$
Or, $\bar{x} - A = 0$
∴ $\bar{x} = A$
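A quick numerical check of Example–5 is sketched below. The ten values are illustrative; the grid of candidate values for A is an assumption made only to have something to search over.

```python
data = [88, 72, 33, 29, 70, 54, 86, 91, 57, 61]
n = len(data)
mean = sum(data) / n

def sum_sq(a):
    """Sum of squared deviations of the data from the constant a."""
    return sum((x - a) ** 2 for x in data)

# Candidate values of A on a grid of width 0.1 centred on the mean
candidates = [mean + step / 10 for step in range(-50, 51)]
best = min(candidates, key=sum_sq)

# Identity from the solution: S(A) = S(mean) + n * (mean - A)^2, here with A = 60
lhs = sum_sq(60)
rhs = sum_sq(mean) + n * (mean - 60) ** 2
```

Among the candidates the smallest sum of squares occurs at the mean itself, and the two sides of the identity agree up to rounding.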
Geometric Mean (G.M.)
Geometric mean of a group of n observations is the n-th root of their product. It
is defined only when all observations have the same sign, and none of them is zero.
Given n observations $x_1, x_2, \ldots, x_n$,
\[ \text{G.M.} = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} \]
This may also be written as $(x_1 x_2 \cdots x_n)^{\frac{1}{n}}$
If, however, $x_1, x_2, \ldots, x_n$ have frequencies $f_1, f_2, \ldots, f_n$ respectively, the product of all the N (= $f_1 + f_2 + \cdots + f_n$) observations is
\[ x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n} \]
So that
\[ \text{G.M.} = \sqrt[N]{x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}} \]
This may also be written in the form $\left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{\frac{1}{N}}$, where $N = \Sigma f_i$ is the total frequency.
We have simple geometric mean and weighted geometric mean, given by the formulae
\[ \text{Simple G.M. } (g) = (x_1 x_2 \cdots x_n)^{\frac{1}{n}} \]
\[ \text{Weighted G.M. } (G) = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{\frac{1}{N}} \]
They are equal, only when all weights are equal. For practical calculations, these formulae cannot be applied directly. Taking logarithms of both sides, we have
\[ \log g = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{1}{n}\Sigma \log x_i \]
\[ \log G = \frac{1}{N}\{f_1(\log x_1) + f_2(\log x_2) + \cdots + f_n(\log x_n)\} = \frac{1}{N}\Sigma f_i(\log x_i) \]
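The logarithmic form lends itself directly to computation. A minimal sketch, assuming positive observations only; the function name `geometric_mean` and the sample values are illustrative assumptions.

```python
import math

def geometric_mean(values, freqs=None):
    """Simple (freqs omitted) or weighted G.M., computed through logarithms."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(v <= 0 for v in values):
        raise ValueError("this sketch handles positive observations only")
    n = sum(freqs)
    # log G = (1/N) * sum of f_i * log x_i
    log_g = sum(f * math.log10(v) for v, f in zip(values, freqs)) / n
    return 10 ** log_g

g = geometric_mean([2, 4, 8])        # cube root of 2 * 4 * 8 = 64
G = geometric_mean([2, 8], [3, 1])   # fourth root of 2 * 2 * 2 * 8 = 64
```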
Properties of G.M.
[a] The product of a group of n observations is equal to the n-th power of their G.M.
\[ x_1 \cdot x_2 \cdots x_n = g^n \]
[b] The logarithm of G.M. of a set of observations is equal to the A.M. of their logarithms.
\[ \log g = \frac{1}{n}\Sigma \log x_i \; ; \qquad \log G = \frac{1}{N}\Sigma f_i(\log x_i) \]
[c] If $G_1, G_2, \ldots$ be the geometric means of several groups having $n_1, n_2, \ldots$ observations respectively, then G.M. (G) of the composite group is given by their weighted geometric mean.
\[ G = \sqrt[N]{G_1^{n_1} G_2^{n_2} \cdots} \]
\[ \text{i.e. } \log G = \frac{1}{N}\Sigma n_i(\log G_i) \]
where $N = n_1 + n_2 + \cdots$
Harmonic Mean (H.M.)
Harmonic mean of a set of observations is the reciprocal of the arithmetic mean
of their reciprocals. Like G.M., H.M. is defined only when no observation is zero.
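\[ \text{Simple H.M.} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\Sigma \frac{1}{x_i}} \]
\[ \text{Weighted H.M.} = \frac{N}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}} = \frac{N}{\Sigma \frac{f_i}{x_i}} \]
They are equal only when all weights are equal.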
Relations between A.M., G.M., H.M.
[1] For any given set of observations, A.M. is greater than or equal to G.M., and G.M.
is greater than or equal to H.M.
A.M. ≥ G.M. ≥ H.M.
They are equal, only when all observations are equal.
[2] For two observations only,
\[ \frac{\text{A.M.}}{\text{G.M.}} = \frac{\text{G.M.}}{\text{H.M.}} \]
This means that G.M. not only lies between A.M. and H.M., but $(\text{G.M.})^2 = \text{A.M.} \times \text{H.M.}$
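Both relations can be checked with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The sample values are illustrative assumptions.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Two observations: relation [2] should hold as well as relation [1]
pair = [2, 8]
a, g, h = mean(pair), geometric_mean(pair), harmonic_mean(pair)

# Three observations: only the ordering of relation [1] is expected
triple = [1, 4, 16]
a3, g3, h3 = mean(triple), geometric_mean(triple), harmonic_mean(triple)
```

For the pair the three means come out as 5, 4 and 3.2 (up to rounding), so the square of the G.M. equals the product of the A.M. and the H.M.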
Solution : Let $x_1, x_2, \ldots, x_n$ be a set of n observations (all positive). Their A.M., G.M., and H.M. (denoted by A, G and H respectively) are
\[ A = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad G = (x_1 x_2 \cdots x_n)^{\frac{1}{n}}, \qquad H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} \]
Considering only two observations $x_1$ and $x_2$, we see that $(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0$, because the left side is a square quantity.
Or, $x_1 + x_2 - 2\sqrt{x_1 x_2} \ge 0$
Or, $x_1 + x_2 \ge 2\sqrt{x_1 x_2}$
Or, $\frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2}$ –––– (i)
i.e. A.M. ≥ G.M., when n = 2 –––– (ii)
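Similarly, considering only the observations $x_3$ and $x_4$, we have
\[ \frac{x_3 + x_4}{2} \ge \sqrt{x_3 x_4} \quad \text{–––– (iii)} \]
If we now consider the two quantities $\frac{x_1 + x_2}{2}$ and $\frac{x_3 + x_4}{2}$, we must have, by (ii),
\[ \frac{\frac{x_1 + x_2}{2} + \frac{x_3 + x_4}{2}}{2} \ge \sqrt{\frac{x_1 + x_2}{2} \cdot \frac{x_3 + x_4}{2}} \]
\[ \text{Or, } \frac{x_1 + x_2 + x_3 + x_4}{4} \ge \sqrt{\frac{x_1 + x_2}{2} \cdot \frac{x_3 + x_4}{2}} \quad \text{–––– (iv)} \]
\[ \text{But, } \sqrt{\frac{x_1 + x_2}{2} \cdot \frac{x_3 + x_4}{2}} \ge \sqrt{\sqrt{x_1 x_2} \cdot \sqrt{x_3 x_4}} \quad \text{–––– (v)} \]
because each of the two factors under the root on the left is greater than or equal to the corresponding factor on the right, by (i) and (iii).
Substituting from (v) in (iv),
\[ \frac{x_1 + x_2 + x_3 + x_4}{4} \ge \sqrt{\sqrt{x_1 x_2} \cdot \sqrt{x_3 x_4}} \]
\[ \text{Or, } \frac{x_1 + x_2 + x_3 + x_4}{4} \ge \sqrt[4]{x_1 x_2 x_3 x_4} \]
i.e. A.M. ≥ G.M., when n = 4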
Proceeding this way, it can be shown that A.M. ≥ G.M., whenever n = 2 or 4 or 8 or 16 etc., i.e. of the form $2^m$, where m is a positive integer. But we have to prove the result for any value of n. For this purpose, let us suppose that the given value of n lies between two such values $2^{m-1}$ and $2^m$, i.e. $2^{m-1} < n < 2^m$.
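We now consider $2^m$ (= N, say) values, consisting of the n given observations $x_1, x_2, \ldots, x_n$, and (N – n) further values each equal to A, i.e. $(x_1 + x_2 + \cdots + x_n)/n$:
\[ \underbrace{x_1, x_2, \ldots, x_n}_{n \text{ terms}}, \; \underbrace{A, A, \ldots, A}_{(N - n) \text{ terms}} \]
A.M. of these N values is
\[ \frac{x_1 + x_2 + \cdots + x_n + A + A + \cdots + A}{N} = \frac{nA + (N - n)A}{N} = A \]
Also G.M. of these N values is
\[ (x_1 \cdot x_2 \cdots x_n \cdot A \cdot A \cdots A)^{\frac{1}{N}} = \left(G^n A^{N-n}\right)^{\frac{1}{N}} \]
Since A.M. ≥ G.M. for $N = 2^m$ values, therefore, in the present case
\[ A \ge \left(G^n A^{N-n}\right)^{\frac{1}{N}} \]
Or, $A^N \ge G^n A^{N-n}$, or, $A^n \ge G^n$, i.e. A ≥ G.
To prove that G.M. ≥ H.M., we consider the n values $\frac{1}{x_1}, \frac{1}{x_2}, \ldots, \frac{1}{x_n}$.
The A.M. of these values is
\[ \frac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n} = \frac{1}{H} \]
and their G.M. is
\[ \left(\frac{1}{x_1} \cdot \frac{1}{x_2} \cdots \frac{1}{x_n}\right)^{\frac{1}{n}} = \left(\frac{1}{x_1 x_2 \cdots x_n}\right)^{\frac{1}{n}} = \frac{1}{(x_1 x_2 \cdots x_n)^{\frac{1}{n}}} = \frac{1}{G} \]
Since we have proved that A.M. ≥ G.M., in the present case $\frac{1}{H} \ge \frac{1}{G}$, i.e. G ≥ H
Combining the two results A ≥ G and G ≥ H, we have A ≥ G ≥ H; i.e. in general
A.M. ≥ G.M. ≥ H.M.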
We shall now prove that A.M. = G.M. = H.M., only when all the observations have the same value, i.e. $x_1 = k, x_2 = k, \ldots, x_n = k$.
In such a situation
\[ \text{A.M.} = \frac{k + k + \cdots + k}{n} = \frac{nk}{n} = k \]
\[ \text{G.M.} = (k \cdot k \cdots k)^{\frac{1}{n}} = \left(k^n\right)^{\frac{1}{n}} = k \]
\[ \text{H.M.} = \frac{n}{\frac{1}{k} + \frac{1}{k} + \cdots + \frac{1}{k}} = \frac{n}{\frac{n}{k}} = k \]
and hence A.M. = G.M. = H.M., when all the observations are equal.
Median
Median of a set of observations is the middle – most value when the observations
are arranged in order of magnitude. The number of observations smaller than Median is
the same as the number greater than it. Thus, Median divides the observations into two
equal parts. It is unaffected by the presence of extremely large or small observations and
can be calculated from frequency distributions with open-end classes.
An important property of Median is that for any given set of observations the sum
of absolute deviations from median is the least.
Calculation of Median
The median is calculated as follows :
[a] From simple series – The given data are arranged in order of magnitude. If
the number of observations be odd, the value of the middle-most item is the median.
However, if the number be even, the arithmetic mean of the two middle-most items is
taken as median.
[b] From simple frequency distribution – The cumulative frequency ("Less than" type) corresponding to each distinct value of the variable is calculated. If the total frequency be N, the value of the variable corresponding to cumulative frequency (N+1)/2 gives the median.
[c] From grouped frequency distribution – Median from a grouped frequency distribution is that value which corresponds to cumulative frequency $\frac{N}{2}$. Median from a grouped frequency distribution can be calculated by any of the following methods :–
[i] By the application of formula for median :
The cumulative frequencies are calculated. The class in which cumulative frequency $\frac{N}{2}$ lies, is called the median class. Now we apply the formula :
\[ \text{Median} = l_1 + \frac{\frac{N}{2} - F}{f_m} \times c \]
where $l_1$ = lower boundary of the median class, F = cumulative frequency (less than) of the class just preceding the median class, $f_m$ = frequency of the median class, and c = width of the median class.
occurs the maximum number of times.
From a simple frequency distribution, mode can be determined by inspection
only. It is that value of the variable which corresponds to the largest frequency.
From a grouped frequency distribution it is very difficult to find the mode accurately. However, if all classes are of equal width, mode is usually calculated by the formula
\[ \text{Mode} = l_1 + \frac{d_1}{d_1 + d_2} \times c \]
where
$l_1$ = lower boundary of the modal class;
$d_1$ = difference of the largest frequency and the frequency of class just preceding the modal class;
$d_2$ = difference of the largest frequency and the frequency of class just following the modal class;
c = common width of classes.
If $f_0$, $f_{-1}$, $f_1$ represent the frequencies of the modal class, the class just preceding and
[ii] Second quartile : Q2 (or middle quartile)
[iii] Third quartile (or upper quartile) : Q3
For data of continuous type, one – quarter of the observations is smaller than $Q_1$, two – quarters are smaller than $Q_2$ and three – quarters are smaller than $Q_3$. This means that $Q_1, Q_2, Q_3$ are values of the variable corresponding to "less-than" cumulative frequencies $\frac{N}{4}, \frac{2N}{4}, \frac{3N}{4}$ respectively. Since $\frac{2N}{4} = \frac{N}{2}$, it is evident that the second quartile $Q_2$ is the same as median.
$Q_1 < Q_2 < Q_3$ ; $Q_2$ = Median
In Bowley's formula for skewness, all the three quartiles are used.
\[ \text{Median} = Q_2 \]
\[ \text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} \]
\[ \text{Skewness} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} \]
Deciles are such values which divide the total number of observations into 10 equal parts. There are 9 deciles $D_1, D_2, \ldots, D_9$ called the first decile, the second decile, etc. The number of observations smaller than $D_1$, or between two successive deciles, or larger than $D_9$ is the same. For data of continuous type, $D_1, D_2, \ldots, D_9$ correspond to cumulative frequencies $\frac{N}{10}, \frac{2N}{10}, \ldots, \frac{9N}{10}$ respectively.
$D_1 < D_2 < \cdots < D_9$; $D_5 = Q_2$ = Median
Percentiles are such values which divide the total number of observations into 100 equal parts. There are 99 percentiles $P_1, P_2, \ldots, P_{99}$, called the first percentile, the second percentile, and so on. The k-th percentile ($P_k$) is, therefore, that value of the variable up to which lie exactly k% of the total number of observations.
In particular,
$P_{10} = D_1$, $P_{20} = D_2$, ––––, $P_{90} = D_9$.
$P_{25} = Q_1$, $P_{50} = D_5 = Q_2$ = Median, $P_{75} = Q_3$
$P_1 < P_2 < \cdots < P_{99}$
Calculation of partition values
[a] From simple series – The given data are arranged in increasing order of magnitude and a number showing the rank is attached to each observation. The smallest value is given rank 1, the next higher value rank 2, etc. and the largest value is given rank n. The ranks of partition values are as follows :
Rank of Median = $\frac{1}{2}(n + 1)$
Rank of $Q_1$ = $\frac{1}{4}(n + 1)$
Rank of $Q_3$ = $\frac{3}{4}(n + 1)$
Rank of $D_k$ = $\frac{k}{10}(n + 1)$
Rank of $P_k$ = $\frac{k}{100}(n + 1)$
Using simple interpolation, the value of the variable corresponding to the appropriate rank is determined, giving the partition values.
Median = Value corresponding to rank $\frac{1}{2}(n + 1)$
$Q_1$ = Value corresponding to rank $\frac{1}{4}(n + 1)$
$Q_3$ = Value corresponding to rank $\frac{3}{4}(n + 1)$
$D_k$ = Value corresponding to rank $\frac{k}{10}(n + 1)$
$P_k$ = Value corresponding to rank $\frac{k}{100}(n + 1)$
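The rank rule for a simple series can be sketched as below. The function name `partition_value` and the ten sample values are illustrative assumptions; ranks falling outside 1 to n are clamped to the nearest end, which is a choice made for this sketch and not something stated in the text.

```python
def partition_value(data, fraction):
    """Value at rank fraction * (n + 1), interpolating between neighbours."""
    ordered = sorted(data)
    n = len(ordered)
    rank = min(max(fraction * (n + 1), 1), n)   # clamp to the range 1..n
    lower = int(rank)                           # rank of the item just below
    part = rank - lower                         # fractional part of the rank
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + part * (ordered[lower] - ordered[lower - 1])

values = [88, 72, 33, 29, 70, 54, 86, 91, 57, 61]
median = partition_value(values, 1 / 2)    # rank 5.5
q1 = partition_value(values, 1 / 4)        # rank 2.75
q3 = partition_value(values, 3 / 4)        # rank 8.25
d9 = partition_value(values, 9 / 10)       # rank 9.9
```

Other software may use different rank conventions for quartiles and percentiles, so results from a library routine need not match this rule exactly.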
[b] From simple frequency distribution – The cumulative frequency corresponding to each distinct value of the variable is calculated. If the total frequency be N,
Median = Value corresponding to cumulative frequency $\frac{1}{2}(N + 1)$
$Q_1$ = Value corresponding to cumulative frequency $\frac{1}{4}(N + 1)$
$Q_3$ = Value corresponding to cumulative frequency $\frac{3}{4}(N + 1)$
$D_k$ = Value corresponding to cumulative frequency $\frac{k}{10}(N + 1)$
$P_k$ = Value corresponding to cumulative frequency $\frac{k}{100}(N + 1)$
[c] From grouped frequency distribution –
(i) By application of interpolation – A cumulative frequency distribution is constructed showing the class boundaries and the corresponding cumulative frequencies ("less-than" type). Using simple interpolation, we now find
Median = Value corresponding to cumulative frequency $\frac{1}{2}N$
$Q_1$ = Value corresponding to cumulative frequency $\frac{1}{4}N$
$Q_3$ = Value corresponding to cumulative frequency $\frac{3}{4}N$
$D_k$ = Value corresponding to cumulative frequency $\frac{k}{10}N$
$P_k$ = Value corresponding to cumulative frequency $\frac{k}{100}N$
(ii) Graphical method – An ogive ("less-than" type) is drawn. From this ogive
Median = Abscissa corresponding to ordinate $\frac{1}{2}N$
$Q_1$ = Abscissa corresponding to ordinate $\frac{1}{4}N$
$Q_3$ = Abscissa corresponding to ordinate $\frac{3}{4}N$
$D_k$ = Abscissa corresponding to ordinate $\frac{k}{10}N$
$P_k$ = Abscissa corresponding to ordinate $\frac{k}{100}N$
Exercise
[1] Find the mean and the median of :
88, 72, 33, 29, 70, 54, 86, 91, 57, 61
[C.U. B.Com.'73]
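Solution : Mean = $\frac{1}{10}(88 + 72 + 33 + 29 + 70 + 54 + 86 + 91 + 57 + 61) = \frac{641}{10} = 64.1$
For median, the data are arranged in order of magnitude: 29, 33, 54, 57, (61), (70), 72, 86, 88, 91
The number of observations (10) is even, so the median is the arithmetic mean of the two middle-most items: Median = $\frac{61 + 70}{2} = 65.5$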
Hence, Median is 3.
Mode = 3
[3] Evaluate the arithmetic mean, median and mode for the following distribution of
number of telephone calls received per one minute "interval".
No. of calls 0 1 2 3 4 5 6 7 8
Frequency 5 22 31 43 51 40 35 15 3
[B.U., B.Com.'71]
Solution :
Table : Calculations for A.M., median and mode
No. of calls (x) Frequency (f) Cumulative frequency y=x–4 fy
0 5 5 –4 –20
1 22 27 –3 –66
2 31 58 –2 –62
3 43 101 –1 –43
4 51 152 0 0
5 40 192 1 40
6 35 227 2 70
7 15 242 3 45
8 3 245=N 4 12
Total 245 –24
\[ \text{Mean } (\bar{x}) = 4 + \left(-\frac{24}{245}\right) = 4 - 0.098 = 3.902 \approx 3.90 \]
Mode = 4 (the value with the largest frequency, 51)
Median is the value corresponding to cumulative frequency $\frac{N + 1}{2} = \frac{245 + 1}{2} = \frac{246}{2} = 123$
Value corresponding to cumulative frequency 123 = 4
Median = 4
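The three measures for this distribution can be recomputed in a few lines. This is a sketch only; the variable names are illustrative assumptions.

```python
calls = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freq = [5, 22, 31, 43, 51, 40, 35, 15, 3]

N = sum(freq)
mean = sum(x * f for x, f in zip(calls, freq)) / N

# Mode: the value carrying the largest frequency
mode = calls[freq.index(max(freq))]

# Median: first value whose cumulative frequency reaches (N + 1) / 2
target = (N + 1) / 2
running = 0
for x, f in zip(calls, freq):
    running += f
    if running >= target:
        median = x
        break
```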
[4] Calculate the simple and weighted average from the following and account for the
difference between the two :
Price per ton (Rs./P.) 45.60 50.70 42.45
Tons purchased 135 40 25
[C.A. '72]
Solution :
\[ \text{Simple A.M.} = \frac{1}{3}(45.60 + 50.70 + 42.45) = \frac{138.75}{3} = 46.25 \]
\[ \text{Weighted A.M.} = \frac{45.60 \times 135 + 50.70 \times 40 + 42.45 \times 25}{135 + 40 + 25} = \frac{6156 + 2028 + 1061.25}{200} = \frac{9245.25}{200} = 46.23 \]
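A short sketch of the two averages, with the quantities purchased acting as weights; the variable names are illustrative assumptions.

```python
prices = [45.60, 50.70, 42.45]   # price per ton (Rs.)
tons = [135, 40, 25]             # tons purchased at each price

simple_am = sum(prices) / len(prices)
weighted_am = sum(p * t for p, t in zip(prices, tons)) / sum(tons)
```

The weighted figure gives each price an importance proportional to the tonnage bought at that price, whereas the simple figure treats the three prices alike.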
[5] The numbers 3.2, 5.8, 7.9 and 4.5 have frequencies x, (x+2), (x–3) and (x+6)
respectively. If the arithmetic mean is 4.876, find the value of x.
[C.U., M.Com., '73]
Solution :
As per given condition,
\[ \frac{3.2 \times x + 5.8 \times (x + 2) + 7.9(x - 3) + 4.5(x + 6)}{x + x + 2 + x - 3 + x + 6} = 4.876 \]
\[ \text{Or, } \frac{3.2x + 5.8x + 11.6 + 7.9x - 23.7 + 4.5x + 27}{4x + 5} = 4.876 \]
Or, 21.4x + 14.9 = 4.876 (4x + 5)
Or, 21.4x + 14.9 = 19.504x + 24.38
Or, 21.4x – 19.504x = 24.38 – 14.9
Or, 1.896x = 9.48
∴ $x = \frac{9.48}{1.896} = 5$
[6] Calculate the arithmetic mean from the following data :
[i] Class interval 50–59 60–69 70–79 80–89 90–99 100–109 110–119
Frequency 14 38 44 54 45 30 25
[ii] Height in inches 57.5 - 60.0 - 62.5 - 65.0 - 67.5 - 70.0 - 72.5
Number of men 6 26 190 281 412 127 38
[iii] Weight in lbs. 137.5–147.5 147.5–157.5 157.5–167.5 167.5–177.5 177.5–187.5 187.5–197.5 197.5–217.5 217.5–247.5
Number of Men 2 5 4 5 7 5 3 1
[iv] x 20–30 30–50 50–100 100–200 200–350 350–550
Frequency 2 9 11 52 10 3
Solution :
[i] Table : Calculation of Arithmetic Mean
Class interval    Frequency (f)    Mid-Value (x)    y = (x − 84.5)/10    fy
50–59 14 54.5 –3 –42
60–69 38 64.5 –2 –76
70–79 44 74.5 –1 –44
80–89 54 84.5 0 0
90–99 45 94.5 1 45
100–109 30 104.5 2 60
110–119 25 114.5 3 75
Total 250 – – 18
\[ \text{Arithmetic Mean } (\bar{x}) = 84.5 + \frac{18}{250} \times 10 = 84.5 + 0.72 = 85.22 \]
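The tabular working above follows the change of origin and scale, $\bar{x} = c + d\bar{y}$. A minimal sketch of that method; the function name `step_deviation_mean` is an illustrative assumption.

```python
def step_deviation_mean(mid_values, freqs, c, d):
    """Mean of a grouped distribution via y = (x - c) / d."""
    n = sum(freqs)
    sum_fy = sum(f * (x - c) / d for x, f in zip(mid_values, freqs))
    return c + d * sum_fy / n

# Distribution [i]: class limits and frequencies
limits = [(50, 59), (60, 69), (70, 79), (80, 89),
          (90, 99), (100, 109), (110, 119)]
freqs = [14, 38, 44, 54, 45, 30, 25]
mids = [(lo + hi) / 2 for lo, hi in limits]   # mid-values 54.5, 64.5, ...

mean_i = step_deviation_mean(mids, freqs, c=84.5, d=10)
```

Any other choice of c and d gives the same mean; a central mid-value and the class width merely keep the hand arithmetic small.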
[ii] Table : Calculation for Arithmetic Mean
Class interval    Frequency (f)    Mid-Value (x)    y = (x − 66.25)/2.50    fy
57.5–60.0 6 58.75 –3 –18
60.0–62.5 26 61.25 –2 –52
62.5–65.0 190 63.75 –1 –190
65.0–67.5 281 66.25 0 0
67.5–70.0 412 68.75 1 412
= 172.5 +
32
= 172.5 + 4.22
= 176.72 lbs.
[iv] Table : Calculation for Arithmetic Mean
x    Frequency (f)    Mid-Value (x)    y = (x − 150)/5    fy
20–30 2 25 –25 –50
30–50 9 40 –22 –198
50–100 11 75 –15 –165
100–200 52 150 0 0
200–350 10 275 25 250
350–550 3 450 60 180
Total 87 – – 17
\[ \text{Arithmetic Mean } (\bar{x}) = 150 + \frac{17}{87} \times 5 \]
Solution :
Table : Calculations for Arithmetic Mean
Class interval    Frequency (f)    Mid-Value (x)    y = (x − 64.5)/10    fy
30–39 2 34.5 –3 –6
40–49 3 44.5 –2 –6
50–59 11 54.5 –1 –11
60–69 20 64.5 0 0
70–79 32 74.5 1 32
80–89 25 84.5 2 50
90–99 7 94.5 3 21
Total 100 – – 80
\[ \text{Arithmetic Mean } (\bar{x}) = 64.5 + \frac{80}{100} \times 10 = 64.5 + 8 = 72.5 \]
[8] The following table gives the rise in prices of 300 commodities between two dates.
Calculate the mean rise in price :
% increase 0 - 5 - 10 - 15 - 25 - 35 - 45 - 60-80
Frequency 12 30 51 84 66 35 15 7
[Dip. in Social Welfare '71]
Solution :
Table : Calculations for Arithmetic Mean
Class boundaries    Frequency (f)    Mid-Value (x)    y = (x − 30)/2.5    fy
0–5 12 2.5 –11 –132
5–10 30 7.5 –9 –270
10–15 51 12.5 –7 –357
15–25 84 20 –4 –336
25–35 66 30 0 0
35–45 35 40 4 140
[9] The following are the monthly salaries (in Rs.) of 30 employees in a firm :
140 139 126 114 100 88 62 77 99 103
108 129 144 148 134 63 69 148 132 118
142 116 123 104 95 80 85 106 123 133
The firm gave bonus of Rs.10, 15, 20, 25, 30, 35 for individuals in the respective
salary groups : 'exceeding Rs.60 but not exceeding Rs.75'; 'exceeding Rs.75 but not
exceeding Rs.90'; and so on upto 'exceeding Rs.135 but not exceeding Rs.150'. Find the
average bonus per worker.
[I.C.W.A. '76 - old]
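Solution : As per data given
Table : Calculation for A.M.
Salary group (Rs.)    Bonus paid (x) (Rs.)    Frequency (f)    fx
Exceeding 60 but not exceeding 75 10 3 30
Exceeding 75 but not exceeding 90 15 4 60
Exceeding 90 but not exceeding 105 20 5 100
Exceeding 105 but not exceeding 120 25 5 125
Exceeding 120 but not exceeding 135 30 7 210
Exceeding 135 but not exceeding 150 35 6 210
Total 30 735
\[ \text{Arithmetic Mean } (\bar{x}) = \frac{735}{30} = \text{Rs. } 24.50 \]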
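The classification of salaries into bonus groups can be sketched as follows. The helper `bonus_for` is a hypothetical name; it encodes the rule stated in the question (groups of width Rs. 15, each exceeding its lower limit but not exceeding its upper limit).

```python
import math

salaries = [140, 139, 126, 114, 100, 88, 62, 77, 99, 103,
            108, 129, 144, 148, 134, 63, 69, 148, 132, 118,
            142, 116, 123, 104, 95, 80, 85, 106, 123, 133]

def bonus_for(salary):
    """Bonus (Rs.) for a salary in the groups (60, 75], (75, 90], ..., (135, 150]."""
    if not 60 < salary <= 150:
        raise ValueError("salary lies outside the stated groups")
    group = math.ceil((salary - 60) / 15)   # group number 1 to 6
    return 5 + 5 * group                    # Rs. 10, 15, ..., 35

bonuses = [bonus_for(s) for s in salaries]
average_bonus = sum(bonuses) / len(bonuses)
```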
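[10] For the variable x, taking the values 0, 1, 2, ––––, k, the cumulative frequencies of more-than type are $F_0, F_1, F_2, \ldots, F_k$. Show that $\bar{x} = \sum_{i=1}^{k} \frac{F_i}{n}$, where n is the total frequency.
Solution : Since $F_i$ is the number of observations not less than i, the frequency of the value i is $F_i - F_{i+1}$ for i < k, and $F_k$ for i = k.
x    Frequency (f)    fx
0    $F_0 - F_1$    0
1    $F_1 - F_2$    $F_1 - F_2$
2    $F_2 - F_3$    $2[F_2 - F_3]$
––––
k – 1    $F_{k-1} - F_k$    $(k - 1)[F_{k-1} - F_k]$
k    $F_k$    $kF_k$
Total    $F_0 = n$    $F_1 + F_2 + F_3 + \cdots + F_k$
\[ \therefore \bar{x} = \frac{F_1 + F_2 + F_3 + \cdots + F_k}{n} = \sum_{i=1}^{k} \frac{F_i}{n} \]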
[11] [a] The arithmetic mean calculated from the following frequency distribution
Class limits    Frequency (f)    Mid-value (x)    y = (x − 67)/3    fy
60–62 15 61 –2 –30
63–65 54 64 –1 –54
66–68 f3 67 0 0
69–71 81 70 1 81
72–74 24 73 2 48
Total 174+f3 – – 45
\[ \bar{x} = 67 + 3 \times \frac{45}{174 + f_3} \]
\[ \text{Or, } 67.45 = 67 + \frac{135}{174 + f_3} \]
\[ \text{Or, } 0.45 = \frac{135}{174 + f_3} \]
Or, $0.45f_3 + 78.3 = 135$
Or, $0.45f_3 = 135 - 78.3 = 56.7$
∴ $f_3 = \frac{56.7}{0.45} = 126$
[b] The expenditure of 1000 families is given below :
Expenditure (Rs.) 40–59 60–79 80–99 100–119 120–139
No. of families 50 ? 500 ? 50
The median and mean for the distribution are both Rs.87.50P. Calculate the
missing frequencies.
[I.C.W.A. '78]
Solution : Let the missing frequencies be f2 and f4 respectively.
Table : Calculations for Missing Frequencies
Class limits    Frequency (f)    Mid-Value (x)    y = (x − 89.5)/20    fy
40–59 50 49.5 –2 –100
60–79 f2 69.5 –1 –f2
80–99 500 89.5 0 0
100–119 f4 109.5 1 f4
120–139 50 129.5 2 100
Total 600+f2+f4 – – f4–f2
[12] Find out the missing frequencies of the following data, given the A.M. is 67.45
inches.
Height (inches) 60–62 63–65 66–68 69–71 72–74 Total
No. of students 5 18 f3 f4 8 100
[Dip. Management '72]
Solution :
Table : Calculation for Missing Entries
Class limit    Frequency (f)    Mid-Value (x)    y = (x − 67)/3    fy
60–62 5 61 –2 –10
63–65 18 64 –1 –18
66–68 f3 67 0 0
69–71 f4 70 1 f4
72–74 8 73 2 16
Total 31+f3+f4 = 100 – – f4 – 12
As per given condition,
\[ 67.45 = 67 + \frac{f_4 - 12}{100} \times 3 \]
\[ \text{Or, } 0.45 = \frac{f_4 - 12}{100} \times 3 \]
Or, $45 = 3f_4 - 36$
∴ $3f_4 = 45 + 36 = 81$
$f_4 = \frac{81}{3} = 27$
Since the total frequency is 100, $31 + f_3 + f_4 = 100$, so $f_3 = 100 - 31 - 27 = 42$
Mean marks $\bar{x}_1 = 75$, $\bar{x}_2 = 85$, $\bar{x} = 80$
and $n_1 + n_2 = N$.
Table : Mean of composite group
Characteristics    Group I    Group II    Composite group
No. of observations    $n_1$    $n_2$    $N = n_1 + n_2$
Mean of grade    $\bar{x}_1 = 68.4$    $\bar{x}_2 = 71.2$    $\bar{x} = 70.0$
Applying formula,
$N\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2$
Or, $(n_1 + n_2)\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2$
Or, $(n_1 + n_2) \times 70 = n_1 \times 68.4 + n_2 \times 71.2$
Or, $70n_1 + 70n_2 = 68.4n_1 + 71.2n_2$
Or, $70n_1 - 68.4n_1 = 71.2n_2 - 70n_2$
Or, $1.6n_1 = 1.2n_2$
Or, $\frac{n_1}{n_2} = \frac{1.2}{1.6} = \frac{3}{4}$
∴ $n_1 : n_2 = 3 : 4$
[iii] The mean age of a combined group of men and women is 30 years. If the
mean age of the group of men is 32 and that of the group of women is 27, find
out the percentage of men and women in the group.
[C.A. '65]
Solution : Let the number of men and number of women be $n_1$ and $n_2$, and $n_1 + n_2 = N$.
Table : Mean of composite group
Characteristics    Men (Group I)    Women (Group II)    Composite group
No. of observations    $n_1$    $n_2$    $n_1 + n_2 = N$
Mean age (years)    32    27    30
Applying formula,
$N\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2$
Or, $(n_1 + n_2) \times 30 = 32n_1 + 27n_2$
Or, $30n_1 + 30n_2 = 32n_1 + 27n_2$
Or, $30n_2 - 27n_2 = 32n_1 - 30n_1$
Or, $3n_2 = 2n_1$
Or, $\frac{n_1}{n_2} = \frac{3}{2}$
Or, $n_1 : n_2 = 3 : 2$
Or, $n_1 = \frac{3}{3 + 2} \times 100 = \frac{3}{5} \times 100 = 60\%$ and $n_2 = 40\%$
∴ Men = 60%
Women = 40%
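Writing $p = n_1/N$ in the composite-mean formula gives $\bar{x} = p\bar{x}_1 + (1 - p)\bar{x}_2$, so $p = (\bar{x} - \bar{x}_2)/(\bar{x}_1 - \bar{x}_2)$. A minimal sketch of this rearrangement; the function name `share_of_first_group` is an illustrative assumption.

```python
def share_of_first_group(mean1, mean2, combined):
    """Proportion n1 / N implied by N * xbar = n1 * xbar1 + n2 * xbar2."""
    if mean1 == mean2:
        raise ValueError("the group means must differ")
    return (combined - mean2) / (mean1 - mean2)

men_share = share_of_first_group(32, 27, 30)          # mean ages in years
women_share = 1 - men_share

first_share = share_of_first_group(68.4, 71.2, 70.0)  # mean grades 68.4 and 71.2
```

The second call reproduces the ratio 3 : 4 found earlier, since a share of 3/7 for the first group leaves 4/7 for the second.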
[14] Out of the total population in a certain town in South Africa, 60% belonged to
the Black Race and the rest belonged to the White Race. It was estimated that their
mean incomes were respectively 2000 and 5000 pounds. Find the average income of
the entire town.
[C.A. '68]
worker of the whole factory.
[Dip. Management '70]
Solution : Let the mean earning per worker be $\bar{x}$.
Table : Mean of composite group
Characteristics    Group I    Group II    Group III    Group IV    Group V    Composite group
No. of observations    105    184    130    93    125    637
Mean of earning    13.80    15.00    15.20    18.20    14.20    $\bar{x}$
Applying formula,
$N\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3 + n_4\bar{x}_4 + n_5\bar{x}_5$
$637\bar{x} = 105 \times 13.80 + 184 \times 15.00 + 130 \times 15.20 + 93 \times 18.20 + 125 \times 14.20$
= 1449 + 2760 + 1976 + 1692.6 + 1775
= 9652.6
∴ $\bar{x} = \frac{9652.6}{637}$ = Rs. 15.15
[ii] In a survey of locality the following figures regarding the income of the people in different occupations were received. Find out the average per capita income :
Occupation Average income (in Rs.) Number of people
Business 500 700
Labour 300 300
Craftsmanship 200 200
Others 400 100
[D.S.W. '70]
[16] The following shows some data collected for three regions of a country :
Region  No. of inhabitants (Millions)  Percentage of Literates  Average annual income per person (Rs.)
A 10 52 850
B 5 68 620
C 18 39 730
Obtain the overall figures for the three regions taken together.
[C.U., B.A. (Eco.) '77]
Solution : Let the average annual income of entire group be x .
Table : Mean of composite group
Groups
Characteristics Composite group
A B C
No. of observations 10 5 18 33
Mean of income (Rs.) 850 620 730 x
Applying formula,
Nx = n1 x1 + n2 x 2 + n3 x3
33 x = 10 × 850 + 5 × 620 + 18 × 730
= 8500 + 3100 + 13140
= 24740
∴ x = 24740/33 = Rs.749.70
Or, 439 = 361 (1 + i)^10
Taking logarithms on both sides
log 439 = log 361 + 10 log (1 + i)
2.6425 = 2.5575 + 10 log (1 + i)
10 log (1 + i) = 2.6425 – 2.5575
= 0.085
log (1 + i) = 0.0085
1 + i = Antilog .0085
= 1.020
i = .020 = 2%
[ii] Let A be the population in 1971
∴ A = 439 (1 + .02)^10
= 439 (1.02)^10
log A = log 439 + 10 log 1.02
= 2.6425 + 10 × 0.0085
= 2.7275
A = Antilog 2.7275
= (5333 + 6) = 533.9 million.
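The logarithm-table steps above can be cross-checked by direct computation. The sketch below is illustrative; the function names are assumptions. It projects with the unrounded rate, so the result may differ slightly from the four-figure log-table value.

```python
def annual_growth_rate(start, end, years):
    """Constant annual rate i such that end = start * (1 + i) ** years."""
    return (end / start) ** (1 / years) - 1


def project(value, rate, years):
    """Value after compounding at `rate` for `years` periods."""
    return value * (1 + rate) ** years


rate = annual_growth_rate(361, 439, 10)       # growth over the first decade
projected = project(439, rate, 10)            # same rate applied for ten more years
print(round(rate * 100, 2), round(projected, 1))
```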
[18] A man gets three successive annual increments in salary of 20%, 30% and 25%,
each percentage being reckoned on his salary at the end of the previous year. How
much better or worse off would he have been if he had been given 3 annual increments
of 25% each, reckoned in the same way?
[I.C.W.A. '74]
Solution : Let R be his starting salary.
A1 be his salary when the annual increments are 20%, 30% and 25%
successively, and A2 be his salary when the annual increments are 25% each year.
∴ A1 = R (1 + 20/100)(1 + 30/100)(1 + 25/100)
= R (1.2)(1.3)(1.25)
= R × 1.95
A2 = R (1 + 25/100)^3 = R × 1.953125
∴ A2–A1 = R (1.953125 – 1.95)
= R × 0.003125
= R/320
∴ In the second case he would have received R/320 more.
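The comparison can be verified with exact rational arithmetic. This is a small illustrative sketch; the helper name `compound` is an assumption.

```python
from fractions import Fraction


def compound(*percent_increments):
    """Overall growth factor after successive percentage increments."""
    factor = Fraction(1)
    for p in percent_increments:
        factor *= 1 + Fraction(p, 100)
    return factor


a1 = compound(20, 30, 25)   # factor applied to the starting salary R
a2 = compound(25, 25, 25)
print(a1, a2, a2 - a1)
```

Using `Fraction` avoids floating-point rounding, so the difference comes out as an exact fraction of R.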
[19] A machine is assumed to depreciate 40% in value in the first year, 25% in the
second year and 10% per annum for the next 3 years, each percentage being calculated
on the diminishing value. What is the average percentage depreciation, reckoned on
the diminishing value, for the 5 years?
[B.U., B.A. (Eco.) '66, C.U. M.Com. '73]
Solution : Let P be the original value of the machine and i the average rate of
depreciation.
Let i1, i2, i3 be the successive rates of depreciation (40%, 25% and 10%).
As per given condition,
P (1 − i)^5 = P (1 − i1)(1 − i2)(1 − i3)^3
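The condition above says that the average rate is obtained from the geometric mean of the yearly "remaining value" factors. A hedged numerical sketch follows; the function name is an assumption, and the code only evaluates the stated condition.

```python
def average_depreciation_rate(rates):
    """Constant rate i with (1 - i) ** n equal to the product of (1 - r) over all years."""
    remaining = 1.0
    for r in rates:
        remaining *= 1 - r
    return 1 - remaining ** (1 / len(rates))


rates = [0.40, 0.25, 0.10, 0.10, 0.10]
avg_rate = average_depreciation_rate(rates)
print(round(avg_rate * 100, 2))
```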
[20] The G.M. of 4 observations is 47, and the G.M. of 6 others is 40. Find the G.M.
of all the 10 observations.
Solution : Formula states that
If G1, G2 –––– be the G.M. of several groups having n1, n2 –––– observations
respectively, then G.M. (G) of the composite group is given by their weighted Geometric
Mean.
G = (G1^n1 × G2^n2 × ––––)^(1/N)
log G = (1/N) Σ ni (log Gi)
where N = n1 + n2 + ––––
Here, n1 = 4, n2 = 6, N = 10
G1 = 47 G2 = 40 G = ?
Substituting the values in the formula,
log G = (1/10) [4 log 47 + 6 log 40]
= (1/10) [4 × 1.6721 + 6 × 1.6021]
= (1/10) [6.6884 + 9.6126]
= 16.3010/10 = 1.6301
Taking Antilog, G = [4266 + 1] = 42.67
Therefore, G.M. of all the 10 observations = 42.67
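The weighted geometric mean formula can be checked without log tables. The sketch below is illustrative; the function name `combined_gm` is an assumption.

```python
import math


def combined_gm(sizes, gms):
    """G.M. of the composite group: log G = (1/N) * sum(n_i * log G_i)."""
    total = sum(sizes)
    log_g = sum(n * math.log10(g) for n, g in zip(sizes, gms)) / total
    return 10 ** log_g


g = combined_gm([4, 6], [47, 40])
print(round(g, 2))
```

Small differences from the text in the last digit are possible, since the text rounds to four-figure logarithms.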
[21] The geometric mean of six numbers is 75. If the geometric mean of four of them
is 67, what is the geometric mean of the other two?
[B.U., B.A. (Eco.) '71]
Solution : As per given condition,
log 75 = (1/6) [4 log 67 + 2 log G2]
Or, 1.8751 = (1/6) [4 × 1.8261 + 2 log G2]
= (1/6) [7.3044 + 2 log G2]
= 1.2174 + (1/3) log G2
Or, (1/3) log G2 = 1.8751 – 1.2174 = 0.6577
log G2 = 1.9731
Taking Antilog, G2 = (9397 + 2) = 93.99
= 94
∴ Geometric Mean of other two = 94
[22] You fly to a place X in a Boeing at a speed of 500 miles per hour and come back
from X, following the same route, at a speed of 160 m.p.h. What is your average
speed for the to-and-fro journey?
[C.U., M.Com. '72]
Solution : Average speed = 2 / (1/500 + 1/160)
= 2 / ((8 + 25)/4000)
= 2 / (33/4000) = 8000/33 = 242.4 m.p.h.
[23] An aeroplane flies around a square the sides of which measure 100 Kms each.
The aeroplane covers the first side at a speed of 100 Kms. per hour, the second side at
200 Kms. per hour, the third side at 300 Kms. per hour and the fourth side at 400
Kms. per hour. Use the correct mean to find the average speed round the square.
[I.C.W.A. '78]
Solution : Average speed = 4 / (1/100 + 1/200 + 1/300 + 1/400)
= 4 / ((12 + 6 + 4 + 3)/1200) = 4 / (25/1200) = 192 k.p.h.
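Both speed problems use the harmonic mean, which is the appropriate average when equal distances are covered at different speeds. A minimal sketch, with an illustrative function name:

```python
def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return len(values) / sum(1 / v for v in values)


round_trip = harmonic_mean([500, 160])              # to-and-fro journey
square_route = harmonic_mean([100, 200, 300, 400])  # four equal sides
print(round(round_trip, 1), round(square_route, 1))
```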
[24] If two grades of oranges sell at 10 for Rs.1 and 20 for Rs.1 respectively, calculate
the average price per orange, stating your assumptions explicitly.
[B.U., B.A. (Eco.) '69]
1 orange for Rs. 1/20
X oranges for Rs. X/20
Average price = (X/10 + X/20) / 2X = (3X/20) / 2X = (3X/20) × (1/2X)
= (3/40) × 100
= 7.5 P.
[25] The weights (in lbs.) of 8 persons are 138, 143, 141, 139, 152, 148, 160 and 267.
Find the average weight using a suitable form of average. Give reasons for your choice.
Solution :
No.  Weight (arranged in order of magnitude)
1.  138
2.  139
3.  141
4.  (143)
5.  (148)
6.  152
7.  160
8.  267
Average Weight = (143 + 148)/2 = 291/2 = 145.5 lbs.
Since there is one extremely large value, A.M. will not be suitable. Mode does not exist.
Hence Median is the suitable average, which is 145.5 lbs.
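The effect of the extreme value 267 on the arithmetic mean can be seen by computing both averages. This sketch uses Python's standard `statistics` module and is only an illustration of the argument above.

```python
import statistics

weights = [138, 143, 141, 139, 152, 148, 160, 267]

mean_weight = statistics.mean(weights)      # pulled upward by the value 267
median_weight = statistics.median(weights)  # average of the two middle values
print(mean_weight, median_weight)
```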
[26] Find the mean and the median for the following data and comment on the shape
of the distribution :
Weight in Kg. 36–40 41–45 46–50 51–55 56–60 61–65 66–70
No. of persons 14 26 40 53 50 37 25
[I.C.W.A., '75 - old]
Solution :
Table : Calculations for Mean and Median
Weight (Kg.)  Frequency (f)  Mid-Value (x)  y = (x − 53)/5  fy  Class Boundary  Cumulative frequency (less-than)
36–40  14  38  –3  –42  35.5–40.5  14
41–45  26  43  –2  –52  40.5–45.5  40
46–50  40  48  –1  –40  45.5–50.5  80
← N/2 = 122.5
51–55  53  53  0  0  50.5–55.5  133
56–60  50  58  1  50  55.5–60.5  183
61–65  37  63  2  74  60.5–65.5  220
66–70  25  68  3  75  65.5–70.5  245=N
Total  245  –  –  65
x̄ = 53 + (65/245) × 5 = 53 + 1.33 = 54.33
Median = l1 + ((N/2 − F)/fm) × c = 50.5 + ((122.5 − 80)/53) × 5 = 50.5 + (42.5/53) × 5 = 50.5 + 4.01 = 54.5 Kg.
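The step-deviation mean and the interpolated median used above can be sketched as follows. The function names and argument layout are assumptions made for illustration; equal class widths are assumed.

```python
def grouped_mean(mid_values, freqs, origin, width):
    """Step-deviation method: mean = origin + (sum(f*y) / N) * width, y = (x - origin) / width."""
    n = sum(freqs)
    sum_fy = sum(f * (x - origin) / width for x, f in zip(mid_values, freqs))
    return origin + sum_fy / n * width


def grouped_median(lower_boundaries, freqs, width):
    """Median = l1 + ((N/2 - F) / fm) * c, located in the first class whose cumulative frequency reaches N/2."""
    n = sum(freqs)
    cumulative = 0
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("distribution has no observations")


mids = [38, 43, 48, 53, 58, 63, 68]
lowers = [35.5, 40.5, 45.5, 50.5, 55.5, 60.5, 65.5]
freqs = [14, 26, 40, 53, 50, 37, 25]
print(round(grouped_mean(mids, freqs, 53, 5), 2), round(grouped_median(lowers, freqs, 5), 2))
```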
[27] The G.M., H.M. and A.M. of three observations are 3.63, 3.27 and 4 respectively.
Find the observations.
[C.U., M.Com. '75]
Solution : Let the observations be x, y and z
As per given condition,
∛(xyz) = 3.63
∴ xyz = 47.83 –––– (1)
3 / (1/x + 1/y + 1/z) = 3.27
or, 3 / ((yz + xz + xy)/xyz) = 3.27 or, 3xyz/(xy + yz + zx) = 3.27 or, (3 × 47.83)/(xy + yz + zx) = 3.27
∴ xy + yz + zx = 43.88 –––– (2)
(x + y + z)/3 = 4
or, x + y + z = 12 –––– (3)
Also, taking y as the A.M. of x and z (i.e. assuming the three observations are in arithmetic progression),
y = (x + z)/2 or, 2y = x + z –––– (4)
Putting the value of x + z in (3) we get
2y + y = 12
or, y = 4 and x + z = 2 × 4 = 8
Putting the value (x + z) y = 8 × 4 = 32, i.e.
xy + zy = 32, in (2)
zx = 43.88 – 32 = 11.88 = 12 (approx.)
z (8 – z) = 12 or, z^2 – 8z + 12 = 0
or, (z – 6)(z – 2) = 0
z = 6 or 2
when z = 6, y = 4
x + y + z = 12
x=2
∴ the observations are 2, 4 and 6
[28] Using a suitable formula calculate the median value from the following data :
Mid Value 115 125 135 145 155 165 175 185 195 Total
Frequency 6 25 48 72 116 60 38 22 3 390
[C.A., '66]
Solution :
Table : Calculation for Median.
Mid Value Class boundaries Frequency (f) Cumulative frequency (less-than)
115 110–120 6 6
125 120–130 25 31
135 130–140 48 79
145 140–150 72 151
← N/2 = 195
155 150–160 116 267
165 160–170 60 327
175 170–180 38 365
185 180–190 22 387
195 190–200 3 390 = N
Total – 390 –
Median = l1 + ((N/2 − F)/fm) × c
= 150 + ((195 − 151)/116) × 10
= 150 + (44/116) × 10
= 150 + 3.79
= 153.79
∴ Median = 153.79
[29] In a group of 1000 wage earners the monthly wages of 4% are below Rs.60 and
those of 15% are under Rs.62.50. 15% earned Rs.95 and over, and 5% got Rs.100 and
over. Find the median wage.
[B.U., B.A. (Eco.) '70]
Solution :
4% i.e. 40 wage earners earn below Rs.60
15% i.e. 150 wage earners earn below Rs.62.50
= 62.50 + 16.25
= 78.75
∴Median = Rs.78.75
[30] The table below gives the frequency distribution of weights of 80 apples :
Weight (gms.) 110–119 120–129 130–139 140–149 150–159 160–169 170–179 180–189
Frequency 5 7 12 20 16 10 7 3
Draw the cumulative frequency diagram and hence determine the median
weight of an apple.
[I.C.W.A. '76]
Solution :
Table : Calculation of median weight
Weight (gms) Frequency (f) Class boundary Cumulative frequency (less-than)
110–119 5 109.5 0
120–129 7 119.5 5
130–139 12 129.5 12
140–149 20 139.5 24
← N/2 = 40
150–159 16 149.5 44
160–169 10 159.5 60
170–179 7 169.5 70
180–189 3 179.5 77
Total 80 189.5 80 = N
[Figure: cumulative frequency diagram (less-than ogive). Horizontal axis: Class boundary (109.5 to 189.5); vertical axis: Cumulative Frequency (0 to 80). The median is read off at cumulative frequency 40.]
∴ Median = 147.5
[31] Draw the less-than Ogive and estimate the value of median on the basis of the
data given below :
Mid-Point 18 25 32 39 46 53 60
Frequency 10 15 32 42 26 12 9 N=146
[C.A. ’74]
Solution:
Table: Ogive (less-than) for data given
Mid-point  Class boundary  Frequency  Cumulative frequency (less than)
18  14.5  0  0
25  21.5  10  10
32  28.5  15  25
39  35.5  32  57
← N/2 = 73
46  42.5  42  99
53  49.5  26  125
60  56.5  12  137
    63.5  9  146=N
Total  -  146  -
[Figure: Ogive (less than)]
Median = 38.2
[32] An incomplete frequency distribution is given below :
Height (inches) 5.1-6.0 6.1-7.0 7.1-8.0 8.1-9.0 9.1-10.0 10.1-11.0 11.1-12.0
No of Plants 3 8 27 ? 17 11 9
It is known that the median height of the plants is 8.53 inches. Calculate the
missing frequency.
[I.C.W.A. '72]
Solution: Let the missing frequency be f4
Table: Calculation for missing frequency
Height (class boundary) Frequency Cumulative Frequency (less-than)
5.05-6.05 3 3
6.05-7.05 8 11
7.05-8.05 27 38
← N/2 = (75 + f4)/2
8.05-9.05 f4 38+f4
9.05-10.05 17 55+f4
10.05-11.05 11 66+f4
11.05-12.05 9 75+f4 = N
Total 75+f4
Clearly, the median lies in the class 8.05-9.05, where the cumulative frequency is more
than 38 but less than 38 + f4.
As per given condition,
8.53 = 8.05 + (((75 + f4)/2 − 38)/f4) × 1 Or, 0.48 = (75 + f4 − 76)/(2 f4)
100-110 4 4
110-120 7 11
120-130 15 26
130-140 a 26+a
← N/2 = 75
140-150 40 66+a
150-160 b 66+a+b
160-170 16 82+a+b
170-180 10 92+a+b
180-190 6 98+a+b
190-200 3 101+a+b=N
Total 101+a+b
Median = 140 + ((75 − (26 + a))/40) × 10
146.25 = 140 + (49 − a)/4
Or, 6.25 = (49 − a)/4 Or, 25 = 49 – a
a = 24
a + b = 49
Or, 24 + b = 49 ∴b = 25
Therefore, missing entries are
a = 24, b = 25
[34] Calculate the value of the mode by the usual formula (after grouping if necessary) :
x 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100 100-110
f 4 6 5 10 20 22 24 6 2 1
[C.A. '74]
Solution : Table – Calculation for Mode
Class boundaries Frequency (f) Regrouped Class Frequency
(x) boundary
10-20 4 10-30 10
20-30 6 30-50 15
30-40 5 50-70 42
40-50 10 70-90 30
50-60 20 90-110 9
60-70 22
70-80 24
80-90 6
90-100 2
100-110 1
Mode lies in the class boundary 50–70
Mode = l1 + ((f0 − f−1)/(2f0 − f−1 − f1)) × c = 50 + ((42 − 15)/(2 × 42 − 15 − 30)) × 20 = 50 + (27/(84 − 45)) × 20
= 50 + (27/39) × 20 = 50 + 13.85 = 63.85
∴ Mode = 63.85
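The mode formula for grouped data can be written as a small helper. This is an illustrative sketch; the function and parameter names are assumptions.

```python
def grouped_mode(lower, f_prev, f_modal, f_next, width):
    """Mode = l1 + ((f0 - f_prev) / (2*f0 - f_prev - f_next)) * c for the modal class."""
    denominator = 2 * f_modal - f_prev - f_next
    if denominator <= 0:
        raise ValueError("modal class frequency must exceed its neighbours on average")
    return lower + (f_modal - f_prev) / denominator * width


# Regrouped classes of width 20: modal class 50-70 with neighbours 30-50 and 70-90
mode = grouped_mode(50, 15, 42, 30, 20)
print(round(mode, 2))
```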
[35] From the following distribution of weekly earnings, calculate (i) the most usual
wage, and (ii) the percentage earning more than Rs. 31.50
Weekly earning (Rs.) 25 - 26 - 27 - 28 - 29 - 30 - 31 - 32 - 33 - 34 - 35-36 Total
No. of persons 25 70 210 275 430 550 340 130 90 55 25 2200
[ I. C. W. A. ‘73]
Solution : Table : Calculation for Mode
Weekly earnings (Rs.)(x) Frequency (f) Class boundary Cumulative frequency (less than)
25-26 25 25 0
26-27 70 26 25
27-28 210 27 95
28-29 275 28 305
29-30 430 29 580
30-31 550 30 1010
31-32 340 31 1560
←31.5 ←x
32-33 130 32 1900
33-34 90 33 2030
34-35 55 34 2120
35-36 25 35 2175
36 2200
(i) Mode lies in the class 30-31
Mode = 30 + ((550 − 430)/(2 × 550 − 430 − 340)) × 1
= 30 + 120/(1100 − 770) = 30 + 120/330 = 30 + 0.36 = 30.36
20-30 25 51 19 0 0
30-40 35 78 27 1 27
40-50 45 97 19 2 38
50-60 55 109 12 3 36
Total 109 - 54
Mean (x̄) = 25 + (54/109) × 10 = 25 + 4.95 = 29.95 years
Mode = 30 + ((27 − 19)/(2 × 27 − 19 − 19)) × 10 = 30 + (8/(54 − 38)) × 10 = 30 + (8/16) × 10 = 35 years
[37] From the following cumulative frequency distribution of marks obtained by 22
students, calculate (a) arithmetic mean, (b) Median and (c) Mode
Marks No. of students
Below 10 3
Below 20 8
Below 30 17
Below 40 20
Below 50 22
[I.C.W.A ‘77]
Solution: Table : Calculation for A.M.
Class limits  Cumulative frequency  Frequency (f)  Mid-value (x)  y = (x − 24.5)/10  fy  Class boundary
0-9 3 3 4.5 -2 -6 (-0.5)-9.5
10-19 8 5 14.5 -1 -5 9.5-19.5
20-29 17 9 24.5 0 0 19.5-29.5
30-39 20 3 34.5 1 3 29.5-39.5
40-49 22 2 44.5 2 4 39.5-49.5
Total 22 - -4
Mean (x̄) = 24.5 + (−4/22) × 10 = 24.5 − 1.82 = 22.68 = 22.7
Median = 19.5 + ((11 − 8)/9) × 10 = 19.5 + 3.33 = 22.83 = 22.8
Mode = 19.5 + ((9 − 5)/(18 − 5 − 3)) × 10 = 19.5 + 4 = 23.5
[38] The table below gives the numbers (f) of candidates obtaining marks (x) or
higher in a certain examination (all marks are given in whole number).
x 10 20 30 40 50 60 70 80 90 100
f 140 133 118 100 75 45 25 9 2 0
Calculate the mean and the median marks obtained by the candidates.
[ I.C.W.A.’75]
Solution: Table: Calculation of A.M. and median
Marks (x) or higher  Cumulative frequency (more than)  Class interval  Frequency (f)  Cumulative frequency (less than)
10  140  10-20  7  7
20  133  20-30  15  22
30  118  30-40  18  40
40  100  40-50  25  65
← N/2 = 70
50  75  50-60  30  95
60  45  60-70  20  115
70  25  70-80  16  131
80  9  80-90  7  138
90  2  90-100  2  140=N
100  0
Total  140
The median corresponds to cumulative frequency N/2 = 70, which lies in the class interval 50-60.
Median = 50 + ((70 − 65)/30) × 10 = 50 + (5/30) × 10 = 50 + 1.67 = 51.67 = 51.7 (rounded)
Mid-value (x)  y = (x − 55)/10  Frequency (f)  fy
15 -4 7 -28
25 -3 15 -45
35 -2 18 -36
45 -1 25 -25
55 0 30 0
65 1 20 20
75 2 16 32
85 3 7 21
95 4 2 8
Total – 140 -53
x̄ = 55 + (−53/140) × 10
= 55 – 3.79
= 51.21
∴ Arithmetic Mean = 51.21
[39] Calculate the values of (i) mean (ii) median and (iii) the two quartiles:
Income (Rs.1000) Under 1 1-2 2-3 3-5 5-10 10-25 25-50 50-100 100-1000
No. of persons 13 90 81 117 66 27 6 2 2
[C.U.M.Com.’74]
Solution: Table : Calculation for mean, median and quartiles
Income (Rs.1000) (class marks)  Frequency (f)  Mid-value (x)  fx  Cumulative frequency (less-than)
0-1  13  0.5  6.5  13
← Q1 = N/4 = 404/4 = 101
1-2  90  1.5  135.0  103
2-3  81  2.5  202.5  184
← Q2 = N/2 = 404/2 = 202
3-5  117  4  468.0  301
← Q3 = 3N/4 = (3 × 404)/4 = 303
5-10  66  7.5  495.0  367
10-25  27  17.5  472.5  394
25-50  6  37.5  225.0  400
50-100  2  75  150.0  402
100-1000  2  550  1100.0  404=N
Total  404  3254.5
[i] Arithmetic Mean (x̄) = 3254.5/404 = 8.06
[ii] Median = 3 + ((202 − 184)/117) × 2 = 3 + (18/117) × 2 = 3.31
[iii] First Quartile (Q1) = 1 + ((101 − 13)/90) × 1 = 1 + 88/90 = 1.98
Third Quartile (Q3) = 5 + ((303 − 301)/66) × 5 = 5 + (2/66) × 5 = 5 + 10/66 = 5.15
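Because the classes here have unequal widths, the interpolation must use the width of the class in which the quantile falls. The following is a minimal sketch under that reading; the function name `grouped_quantile` and its argument layout are assumptions.

```python
def grouped_quantile(boundaries, freqs, p):
    """Interpolated quantile of rank p*N; boundaries has one more entry than freqs."""
    n = sum(freqs)
    rank = p * n
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= rank:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (rank - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("distribution has no observations")


boundaries = [0, 1, 2, 3, 5, 10, 25, 50, 100, 1000]   # income in Rs.1000
freqs = [13, 90, 81, 117, 66, 27, 6, 2, 2]

q1 = grouped_quantile(boundaries, freqs, 0.25)
q2 = grouped_quantile(boundaries, freqs, 0.50)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(round(q1, 2), round(q2, 2), round(q3, 2))
```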
[40] In a moderately asymmetrical distribution, the mean and the median are
respectively 25.6 and 26.1 inches. What is the mode of the distribution?
[I.C.W.A.’71]
Solution:
Mean – Mode = 3 (Mean – Median)
25.6 – Mode = 3 (25.6 – 26.1) = 3 × –0.5 = –1.5
Mode = 25.6 + 1.5 = 27.1 inches
[41] In a moderately skewed distribution, Arithmetic Mean = 24.6, and the Mode = 26.1.
Find the value of the Median and explain the reason for the method employed.
[C.A.’67]
Solution:
Mean – Mode = 3 (Mean – Median)
24.6 – 26.1 = 3 (24.6 – Median)
– 1.5 = 73.8 – 3 Median
3 Median = 73.8 + 1.5 = 75.3
Median = 75.3/3 = 25.1
For unimodal distributions of moderate skewness the following approximate
relation has been found to hold :
Mean – Mode = 3 (Mean – Median)
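The empirical relation can be rearranged for whichever measure is unknown. The sketch below is illustrative only; the function names are assumptions, and the relation itself is approximate, as stated above.

```python
def mode_from_mean_median(mean, median):
    """From Mean - Mode = 3 (Mean - Median): Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


def median_from_mean_mode(mean, mode):
    """From the same relation: Median = (2 Mean + Mode) / 3."""
    return (2 * mean + mode) / 3


print(round(mode_from_mean_median(25.6, 26.1), 1))
print(round(median_from_mean_mode(24.6, 26.1), 1))
```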
[42] Calculate arithmetic mean, and median of the frequency distribution given
below. Hence calculate the mode using the empirical relation between the three.
Class limits 130-134 135-139 140-144 145-149 150-154 155-159 160-164
Frequency 5 15 28 24 17 10 1
[I.C.W.A. '74]
Solution: Table: Calculation for A .M., Median
Class limit  Mid-value (x)  Frequency (f)  y = (x − 147)/5  fy  Class boundary  Cumulative frequency
130-134 132 5 -3 -15 129.5-134.5 5
135-139 137 15 -2 -30 134.5-139.5 20
140-144 142 28 -1 -28 139.5-144.5 48
← N/2 = 50
145-149 147 24 0 0 144.5-149.5 72
150-154 152 17 1 17 149.5-154.5 89
155-159 157 10 2 20 154.5-159.5 99
160-164 162 1 3 3 159.5-164.5 100=N
Total 100 -33
Arithmetic Mean (x̄) = 147 + (−33/100) × 5 = 147 – 1.65 = 145.35
Median = 144.5 + ((50 − 48)/24) × 5 = 144.5 + (2/24) × 5 = 144.5 + 0.42 = 144.92
following table :–
Weekly wages (Rs.) 12.5-17.5 17.5-22.5 22.5-27.5 27.5-32.5 32.5-37.5 37.5-42.5 42.5-47.5 47.5-52.5 52.5-57.5
No. of workers 12 16 25 14 13 10 6 3 1
Calculate the three quartiles of the above distribution taking n/4, 2n/4 and 3n/4 as
their respective ranks.
[C.A.’63]
Solution : Table : Calculations for Quartiles
Class Boundaries  Frequency (f)  Cumulative frequencies (less-than)
12.5-17.5  12  12
← n/4 = 25
17.5-22.5  16  28
← n/2 = 50
22.5-27.5  25  53
27.5-32.5  14  67
← 3n/4 = 75
32.5-37.5  13  80
37.5-42.5  10  90
42.5-47.5  6  96
47.5-52.5  3  99
52.5-57.5  1  100=N
Total  100
First Quartile (Q1) = 17.5 + ((25 − 12)/16) × 5 = 17.5 + (13/16) × 5
= 17.5 + 65/16 = 17.5 + 4.06 = 21.56 (Rs.)
2nd Quartile (Q2) = 22.50 + ((50 − 28)/25) × 5 = 22.50 + (22/25) × 5
= 22.50 + 110/25 = 22.5 + 4.4 = 26.90 (Rs.)
3rd Quartile (Q3) = 32.50 + ((75 − 67)/13) × 5 = 32.50 + (8/13) × 5
= 32.50 + 3.08 = 35.58 (Rs.)
[45] The following table shows the age distribution of heads of families in a certain
country during the year 1957. Find the median, the third quartile and the second
decile of the distribution. Check your results by the graphical method:
Age of head of family (Yrs.) Under 25 25–29 30–34 35–44 45–54 55–64 65–74 Above 74 Total
Number (Million) 2.3 4.1 5.3 10.6 9.7 6.8 4.4 1.8 45.0
[I.C.W.A. '73]
Class boundary  Frequency  Cumulative frequency (less-than)
0.5–24.5  2.3  2.3
24.5–29.5  4.1  6.4
← 2N/10 = 9
29.5–34.5  5.3  11.7
34.5–44.5  10.6  22.3
← N/2 = 22.5
44.5–54.5  9.7  32.0
← 3N/4 = 33.75
54.5–64.5  6.8  38.8
64.5–74.5  4.4  43.2
74.5–above  1.8  45.0=N
Total  45.0
Median = 44.5 + ((22.5 − 22.3)/9.7) × 10 = 44.5 + (0.2/9.7) × 10 = 44.5 + 2/9.7 = 44.5 + 0.21 = 44.7 years.
Third Quartile = 54.5 + ((33.75 − 32.0)/6.8) × 10 = 54.5 + (1.75/6.8) × 10
= 54.5 + 17.5/6.8 = 54.5 + 2.57 = 57.07 = 57.1 years
Second Decile = 29.5 + ((9 − 6.4)/5.3) × 5 = 29.5 + (2.6/5.3) × 5 = 29.5 + 2.45 = 31.95 = 32 years.
[46] For an income distribution of a group of men, 20 percent of men have income
below Rs. 30, 35 percent below Rs.70, 60 percent below Rs.150 and 80 percent below
Rs.250, The first and third quartiles are Rs.50 and Rs.170
Put the above information in a cumulative frequency distribution and find the median.
[C.U. M.Com. ‘66]
Solution:
Income below Rs.30 the frequency is 20%
Income below Rs.70 the frequency is 35%
Income below Rs.150 the frequency is 60%
Income below Rs.250 the frequency is 80%
Income below Rs. 50 the frequency is 25%
Income below Rs.170 the frequency is 75%
Table : Calculations for Median
Income (Rs.) Cumulative frequency in % (less than)
30 20
50 25
70 35
Median → ← N/2 = 50
150 60
170 75
250 80
Above 250 100=N
By interpolation,
(Median − 70)/(150 − 70) = (50 − 35)/(60 − 35) Or, (Median − 70)/80 = 15/25
Or, Median − 70 = (15/25) × 80 = 48
Median = 70 + 48 = 118
∴Median = Rs.118
[47] For a group of 5000 workers the weekly wages vary from Rs.20 to Rs.80. The wages
of 4 percent of the workers are under Rs.25 and those of 10 percent are under Rs.30; 15
percent of the workers earn Rs.60 and over, and 5 percent of them get Rs.70 and over. The
quartile wages are Rs.40 and Rs.54, and the sixth decile is Rs.50. Put the above information
in the form of a frequency distribution and find the mean wage there from.
[I.C.W.A. ‘71]
Solution : Number of workers = 5000
Under Rs.25 the frequency is 4% i.e. 200
Under Rs.30 the frequency is 10% i.e. 500
Rs.60 & over the frequency is 15% i.e. 750
Rs.70 & over the frequency is 5% i.e. 250
Under Rs.40 the frequency is 25% i.e. 1250
Under Rs.54 the frequency is 75% i.e. 3750
Under Rs.50 the frequency is 60% i.e. 3000
Class boundary  Frequency (f)  Mid-value (x)  y = (x − 45)/2.5  fy
Arithmetic Mean (x̄) = 45 + (4600/5000) × 2.5 = 45 + 11500/5000 = 45 + 2.30 = 47.30
∴ Arithmetic Mean = Rs.47.30
[48] For a certain group of ‘Saree’ weavers of Varanasi, the median and quartiles of
earnings per week are Rs.44.30, Rs.43.00 and Rs.45.90 respectively. Ten percent of the
group earn under Rs.42 per week and 13% earn Rs.47 and over, and 6% Rs.48 and
over. The range of earnings per week is Rs.40 – Rs.50. Put the data into a frequency
distribution.
[C.U., B.A. (Eco.) ‘70]
Solution:
As per condition given.
Earning per week under 42.00 cumulative frequency 10%
Earning per week under 43.00 cumulative frequency 25%
Earning per week under 44.30 cumulative frequency 50%
Earning per week under 45.90 cumulative frequency 75%
Earning per week under 47.00 cumulative frequency 87%
Earning per week under 48.00 cumulative frequency 94%
Table: Frequency Distribution of Wages
Weekly Wages (Rs.) Cumulative Frequency (%)(less-than) Frequency (%)
40-42 10 10
42-43 25 15
43-44.30 50 25
44.30-45.90 75 25
45.90-47.00 87 12
47.00-48.00 94 7
48.00-50.00 100 6
[50] Given below is the frequency distribution of carbon content (percent) in 150
determinations of a certain mixed powder.
Percent carbon 4.0-4.1 4.2-4.3 4.4-4.5 4.6-4.7 4.8-4.9 5.0-5.1 5.2-5.3 5.4-5.5 5.6-5.7
Frequency 1 2 7 20 25 30 10 25 30
Compute the arithmetic mean and median.
[I.C.W.A. ‘78]
Solution : Table : Calculations for A.M. and Median
Class limit (%)  Frequency (f)  Mid-value (x)  y = (x − 4.85)/0.1  fy  Class boundary  Cumulative frequency
4.0 – 4.1 1 4.05 -8 -8 3.95 – 4.15 1
4.2 – 4.3 2 4.25 -6 -12 4.15 – 4.35 3
4.4 – 4.5 7 4.45 -4 -28 4.35 – 4.55 10
4.6 – 4.7 20 4.65 -2 -40 4.55 – 4.75 30
Arithmetic Mean (x̄) = 4.85 + (402/150) × 0.1 = 4.85 + 40.2/150 = 4.85 + 0.268 = 5.118
Arithmetic Mean = 5.118%
Median = 4.95 + ((75 − 55)/30) × 0.20 = 4.95 + 0.133 = 5.083
Median = 5.083%
[52] Compute the arithmetic mean, median and mode of the following distribution
and explain their relationships :
Monthly income (Rs.) 0-75 75-150 150-225 225-300 300-375 375-450
Frequency 15 200 250 225 10 5
[C.U., M.Com.'76]
Solution : Calculations for A.M., Median and Mode
Class boundary  Frequency (f)  Mid-value (x)  y = (x − 187.5)/75  fy  Cumulative Frequency (less-than)
0-75 15 37.5 -2 -30 15
75-150 200 112.5 -1 -200 215
150-225 250 187.5 0 0 465
225-300 225 262.5 1 225 690
300-375 10 337.5 2 20 700
375-450 5 412.5 3 15 705=N
Total 705 - - 30
Arithmetic Mean (x̄) = 187.5 + (30/705) × 75
= 187.5 + 3.19 = 190.69
Arithmetic Mean = Rs.190.69
Median = 150 + ((352.5 − 215)/250) × 75
= 150 + (137.5/250) × 75 = 150 + 10312.5/250 = 150 + 41.25 = 191.25
∴ Median = Rs.191.25
Mode = 150 + ((250 − 200)/(2 × 250 − 200 − 225)) × 75
= 150 + (50/(500 − 425)) × 75 = 150 + (50/75) × 75 = 150 + 50 = 200
∴ Mode = Rs.200
The distribution is very skewed. Hence, the relation, Mean – Mode = 3(Mean –
Median) does not hold good.
[53] [a] z1=x1+y1, z2=x2+y2,----zn=xn+yn, Then prove that z = x + y , where the
symbols have their usual meaning.
[b] Prove that the logarithm of geometric mean of observations is the
In case of weighted G.M., (G) = (x1^f1 · x2^f2 - - - xn^fn)^(1/N); taking logarithms of
both sides, we have
log G = (1/N) [f1 log x1 + f2 log x2 + - - - + fn log xn]
= (1/N) Σ fi log xi
[54] The following are the population figures (in thousand) of 10 cities. Find the
median: 2488, 1490, 777, 733, 522, 672, 591, 407, 387 and 391.
[D.M.’78]
Solution : The figures are arranged in order of magnitude:
387, 391, 407, 522, (591,) (672), 733, 777, 1490, 2488
There are 10 observations, i.e. an even number of observations. Hence, median is the
arithmetic mean of the two middle most observations.
Median = (591 + 672)/2 = 1263/2 = 631.5
∴ Median = 631.5 thousands.
[55] The Mean weight per student in a group of 6 students is 119 lbs. The individual
weights of 5 of them are 115 lbs.,109 lbs., 129 lbs., 117 lbs, and 114 lbs. what is the
weight of the other student of the group?
Solution: Let the weight of the remaining student be x.
As per condition given,
119 = (115 + 109 + 129 + 117 + 114 + x)/6
or, 119 = (584 + x)/6
Mean monthly saving = 312/12 = Rs.26
[58] The following data show the length of ear – head (in cm.) for 24 ears of a variety
of wheat. Compute the mean and the median.
11.5 8.8 10.1
8.2 9.3 10.0
9.7 10.1 10.3
10.3 11.3 9.8
10.7 9.8 9.3
8.6 10.4 9.8
11.3 8.4 9.0
10.7 9.6 11.2
Solution : Table: Calculations for A.M. and Median
Class limit Frequency (f) Mid-value (x) fx Class boundary Cumulative frequency
8.0-8.70 3 8.35 25.05 7.995-8.705 3
8.71-9.41 4 9.06 36.24 8.705-9.415 7
← N/2 = 12
9.42-10.12 8 9.77 78.16 9.415-10.125 15
[59] For a certain frequency table with total frequency 150, the mean was found to be
Rs.76.47. But while copying out the table, a typist left out two of the class frequencies,
say f * and f **, so that the table is given to you in the following form :
Weekly wages in Rs. (mid-value) 65 70 75 80 85 90 95 Total
Frequency 5 48 f* 30 f** 8 6 150
Determine f * and f **
Solution: Calculation for Arithmetic Mean
Mid-value (x) Frequency (f) fx
65 5 325
70 48 3360
75 f* 75 f*
80 30 2400
85 f** 85 f**
90 8 720
95 6 570
Total 97 + f* + f** 7375 +75 f* +85 f**
97 + f* + f** = 150
f* + f** = 53 ––––(1)
A.M. (x̄) = (7375 + 75f* + 85f**)/150 Or, 76.47 = (7375 + 75f* + 85f**)/150
Or, 11470.5=7375+75f*+85f** Or, 75f*+85f**=4095.5 Or, 15f*+17f**=819.1---(2)
Equation (1)×15, 15f*+15f**=795.0 ----- (3)
Equation (2) – (3), we get 2f**=24.1
∴ f** = 24.1/2 = 12.05 = 12 (approx.)
Putting the value of f** in equation (1), we get
f* + 12 = 53
f* = 53–12 = 41
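The two conditions (total frequency and mean) form a pair of linear equations in the two missing frequencies. The following hedged sketch solves them directly; the function name and argument layout are assumptions made for illustration.

```python
def missing_two_frequencies(total_n, mean, known, x_a, x_b):
    """Solve f_a + f_b = N - sum(f) and x_a*f_a + x_b*f_b = N*mean - sum(f*x).

    known is a list of (mid_value, frequency) pairs for the classes that survived.
    """
    remaining_n = total_n - sum(f for _, f in known)
    remaining_fx = mean * total_n - sum(x * f for x, f in known)
    f_b = (remaining_fx - x_a * remaining_n) / (x_b - x_a)
    f_a = remaining_n - f_b
    return f_a, f_b


known = [(65, 5), (70, 48), (80, 30), (90, 8), (95, 6)]
f_star, f_double_star = missing_two_frequencies(150, 76.47, known, 75, 85)
print(round(f_star), round(f_double_star))
```

The unrounded solutions are not whole numbers because the given mean is itself rounded, so the final rounding step mirrors the approximation made in the text.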
[60] The number of telephone calls received at an exchange per interval for 245
successive one-minute intervals are shown in the following frequency distribution :
Number of calls Frequency
0 14
1 21
2 25
3 43
4 51
5 40
6 39
7 12
Total 245
Evaluate the mean, median and mode
Arithmetic Mean (x̄) = 4 + (−58/245) = 4 – 0.24 = 3.76
Mode = 4 (maximum frequency)
Median = value of x at rank (N + 1)/2 = (245 + 1)/2 = 123
The value of x corresponding to cumulative frequency 123 is 4.
Median = 4.
[61] Compute the mean, median and mode for the following frequency distribution:
Frequency distribution of I.Q. for 309 six-year-old children
I.Q. Frequency
160 – 169 2
150 – 159 3
140 – 149 7
130 – 139 19
120 – 129 37
110 – 119 79
100 – 109 69
90 – 99 65
80 – 89 17
70 – 79 5
60 – 69 3
50 – 59 2
40 – 49 1
Total 309
Mean = 104.5 + (123/309) × 10 = 104.5 + 3.98 = 108.48
Median = 99.5 + ((154.5 − 93)/69) × 10 = 99.5 + (61.5/69) × 10 = 99.5 + 8.91 = 108.41
Mode = 109.5 + ((79 − 69)/(2 × 79 − 69 − 37)) × 10
= 109.5 + (10/(158 − 106)) × 10 = 109.5 + 100/52 = 109.5 + 1.92 = 111.42
[62] Determine the median and mode for the following distribution of monthly
income for 580 middle–class people :
Monthly income (Rs.) Frequency
–300 53
300–350 81
350–400 114
400–450 195
450–500 63
500–550 32
550–600 20
600–650 11
650–700 8
700– 3
Total 580
∴ Median = 400 + ((290 − 248)/195) × 50 = 400 + (42/195) × 50 = 400 + 10.77 = Rs.410.77
Mode lies in the class 400 – 450 i.e. maximum frequency of 195
∴ Mode = 400 + ((195 − 114)/(2 × 195 − 114 − 63)) × 50 = 400 + (81/(390 − 177)) × 50 = 400 + (81 × 50)/213
= 400 + 4050/213 = 400 + 19.01 = Rs.419.01
[63] The age-distribution of 4488 Bengali males is given below
Age last birth day Frequency
0 156
1 121
2 111
3 106
4 103
5–9 472
10–14 434
15–19 407
20–24 383
25–29 357
30–34 335
35–39 306
40–49 522
50–59 370
60–69 213
70–79 80
80–89 11
90–99 1
Total 4488
20 – 24 383 22 0 0
25 – 29 357 27 1 357
30 – 34 335 32 2 670
35 – 39 306 37 3 918
Total 2694 - - -746
x̄2 = 22 + (−746/2694) × 5 = 22 − 1.38 = 20.62
Class Marks  Frequency (f3)  Mid-value (x3)  y3 = (x3 − 64.5)/10  f3 y3
40 – 49 522 44.5 -2 -1044
50 – 59 370 54.5 -1 -370
60 – 69 213 64.5 0 0
70 – 79 80 74.5 1 80
80 – 89 11 84.5 2 22
90 – 99 1 94.5 3 3
Total 1197 -1309
x̄3 = 64.5 + (−1309/1197) × 10 = 64.5 – 10.94 = 53.56
Therefore, according to the formula for composite Mean,
Arithmetic Mean (x̄) = (n1 x̄1 + n2 x̄2 + n3 x̄3)/(n1 + n2 + n3)
= (597 × 1.80 + 2694 × 20.62 + 1197 × 53.56)/(597 + 2694 + 1197) = (1074.6 + 55550.28 + 64111.32)/4488
= 120736.2/4488 = 26.90
MEASURES OF DISPERSION
Meaning :
The word dispersion is used to denote the ‘degree of heterogeneity’ in the data. It
is an important characteristic indicating the extent to which observations vary among
themselves. The dispersion of a given set of observations will be zero, only when all of
them are equal. The wider the discrepancy from one observation to another, the larger
will be the dispersion.
A measure of dispersion is designed to state numerically the extent to which
individual observations vary on the average. There are several measures of dispersion.
Measures of Dispersion
Absolute measures : Range, Quartile Deviation, Mean Deviation, Standard Deviation
Relative measures : Coefficient of variation, Coefficient of Quartile Deviation, Coefficient of Mean Deviation
Range
Range of a set of observations is the difference between the maximum and the
minimum value.
Range = Maximum value – Minimum value
we first obtain the deviations (x1 – A), (x2 – A), - - -, (xn - A). Some of these deviations
may be positive and some negative. If we write |xi - A| to denote the positive value of
(xi –A), whatever be the actual sign, the sum of these ‘absolute deviations’ is |x1 - A| +
|x2 - A| + - - - + |xn - A| = Σ |xi - A| and A.M. of the absolute deviations is
Mean Deviation about A = (1/n) Σ |xi − A|. Mean Deviation (M.D.) is usually
calculated about arithmetic mean (x̄), and hence ‘Mean Deviation’ alone refers to M.D.
about mean.
Root Mean Square – Deviation from mean, i.e.
Standard Deviation (σ) = √[(1/n) Σ (xi − x̄)²]
The square of standard deviation is known as variance.
Variance = (S.D.)²
For simple series, σ² = (1/n) Σ (xi − x̄)²
For frequency distribution, σ² = (1/N) Σ fi (xi − x̄)², where N = Σ fi
S.D. is always considered as positive.
Important properties of S.D.
[a] S.D. is independent of the change of origin;
i.e. if y = x - c, where c is a constant, then S.D. of x = S.D. of y
In symbol, σx = σy
This implies that the same S.D. will be obtained if each of the observations
is increased or decreased by a constant.
[b] If two variables x and z are so related that z = ax + b for each x = xi where
a and b are constant, then σz =| a | σx.
Where | a | denotes the positive value of a
In particular, if y = (x – c)/d, where c and d are constants (d positive), then σx = d. σy
This implies that S.D. does not depend on origin, but depends on scale
of measurement. If each observation is multiplied or divided by a constant, S.D.
will also be similarly affected.
[c] If a group of n1 observations has mean x̄1 and S.D. σ1 and another group
of n2 observations has mean x̄2 and S.D. σ2, then S.D. (σ) of the composite group of n1
+ n2 (= N, say) observations can be obtained by the formula
Nσ² = (n1 σ1² + n2 σ2²) + (n1 d1² + n2 d2²) - - - (i)
Where d1 = x̄1 − x̄, d2 = x̄2 − x̄
and N x̄ = n1 x̄1 + n2 x̄2
Relation (i) may be extended to any number of groups :
Nσ² = Σ ni σi² + Σ ni di²
Where di = x̄i − x̄, N = Σ ni and x̄ is the mean of composite group, given by
N x̄ = Σ ni x̄i
[d] S.D. is the minimum root-mean-square deviation, i.e.
$\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2} \le \sqrt{\frac{1}{n}\sum (x_i - A)^2}$
whatever be the value of A.
Calculation of Standard Deviation (σ)
If the observations are small in magnitude, S.D. can be calculated by using the following
relations :
For simple series,
$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
For frequency distributions,
$\sigma^2 = \frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2$
The calculations can, however, be simplified based on the following results.
[I] If y1, y2, ..., yn represent the deviations of x1, x2, ..., xn from an
arbitrary constant c, then S.D. of x = S.D. of y.
In symbols, if y = x – c, then σx = σy.
[II] If y1, y2, ..., yn represent the deviations of x1, x2, ..., xn from an
arbitrary constant c, in units of another constant d, then
S.D. of x = d (S.D. of y)
In symbols, if $y = \frac{x - c}{d}$, then σx = d·σy.
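The two results above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `sd_direct` and `sd_shortcut` are illustrative, and the data are the five values used in exercise [23] with the same origin c = 60 and scale d = 5.

```python
import math

def sd_direct(xs):
    """S.D. as the root-mean-square deviation from the arithmetic mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_shortcut(xs, c, d):
    """S.D. via y = (x - c) / d, using sigma_x = |d| * sigma_y."""
    ys = [(x - c) / d for x in xs]
    n = len(ys)
    var_y = sum(y * y for y in ys) / n - (sum(ys) / n) ** 2
    return abs(d) * math.sqrt(var_y)

data = [20, 85, 120, 60, 40]
print(round(sd_direct(data), 2))
print(round(sd_shortcut(data, c=60, d=5), 2))
```

Both calls should print the same value, since a change of origin leaves the S.D. unchanged and a change of scale multiplies it by d.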
Relative measures of dispersion
Lorenz curve
Lorenz curve is a diagram for showing the dispersion of a group. It is, in effect,
a cumulative percentage curve, combining the percentage of items under review with
the percentage of the factor (Say, wealth distribution) among the items. If wealth
were equally distributed among the people, the curve would be the straight line ACB,
connecting the two extremes of the scales. In practice, however, curves like ADB are
obtained. The less the area between the Lorenz curve ADB and the diagonal straight
line ACB, the greater is the homogeneity in the distribution of wealth, i.e. less is the
dispersion.
FIGURE – LORENZ CURVE (horizontal axis: Percentage of Population, 0 to 100; vertical axis: Percentage of Wealth, 0 to 100; showing the diagonal line ACB and the Lorenz curve ADB)
On the other hand, the larger the area, the larger is the percentage of poor
people and the greater is the concentration of wealth in the hands of a few. The Lorenz curve
does not yield a numerical measure. It is, in this respect, inferior to the familiar
measures of dispersion, e.g. Range, Standard Deviation, etc. But the advantage is that it
affords a picture of the dispersion at a glance. Lorenz curve is useful in such studies as
the distribution of land, wages and income among the population of a country or the
distribution of profits over different groups in business.
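The construction of the curve can be sketched in a few lines. This is an illustrative example only: the function names and the sample wealth figures are assumptions, and the area calculation is added merely to show the "less area, less dispersion" idea described above, not as a measure defined in the text.

```python
def lorenz_points(values):
    """Points (cumulative % of items, cumulative % of total), smallest first."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((100.0 * i / n, 100.0 * running / total))
    return points

def area_from_diagonal(values):
    """Area between the diagonal and the curve, as a fraction of the unit square."""
    pts = lorenz_points(values)
    under_curve = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Trapezoidal rule on axes rescaled from percentages to 0..1
        under_curve += (x1 - x0) / 100.0 * (y0 + y1) / 200.0
    return 0.5 - under_curve

equal = [5, 5, 5, 5]      # hypothetical: wealth equally distributed
unequal = [1, 1, 2, 6]    # hypothetical: wealth concentrated in one holder
print(lorenz_points(unequal))
print(area_from_diagonal(equal), area_from_diagonal(unequal))
```

With equal values the points fall on the diagonal and the area is zero; the more concentrated the values, the larger the area.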
Exercise :
[1] If each item is reduced by 10, what effect would this have on (i) the arithmetic
mean, (ii) the range, and (iii) the standard deviation?
[CA 1964]
Ans. (i) A.M. is reduced by 10
(ii) & (iii) Range and S.D. unchanged.
[2] If the variables are increased or decreased, (i) by the same amount, (ii) by the
same proportion, what will be the effect on standard deviation?
Ans. (i) The values of standard deviation will be the same as before i.e. unchanged.
(ii) S.D. will be changed in the same proportion.
[3] (i) If the first quartile is 142 and the semi-interquartile range is 18, what is
Or, Q3 – 142 = 36
Q3 = 142 + 36 = 178
(ii) Coefficient of Variation (C.V.) = $\frac{\text{S.D.}}{\text{Mean}} \times 100$
∴ $40 = \frac{\text{S.D.}}{30} \times 100$
Or, S.D. = $\frac{40 \times 30}{100} = 12$
[4] Find out the range of the following data
Height (inches) 60–62 63–65 66–68 69–71 72–74
No of students 8 27 42 18 5
[D.S.W. 1973]
Solution: Table: Calculation for Range
Class limits Class boundary Frequency
60–62 59.5–62.5 8
63–65 62.5–65.5 27
66–68 65.5–68.5 42
69–71 68.5–71.5 18
72–74 71.5–74.5 5
Total 100
Highest Value = 74.5
Lowest Value = 59.5
Range = 74.5 – 59.5 = 15 inches
[5] Calculate the quartile deviation and its coefficient from the following :
Cl. Interval 10 - 15 15 - 20 20 - 25 25 - 30 30 - 40 40 – 50 50 – 60 60 - 70 Total
Frequency 4 12 16 22 10 8 6 4 82
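Class interval Frequency Cumulative frequency
10–15 4 4
15–20 12 16
20–25 16 32 ← Q1 class (N/4 = 20.5)
25–30 22 54 ← Q2 class (N/2 = 41)
30–40 10 64 ← Q3 class (3N/4 = 61.5)
40–50 8 72
50–60 6 78
60–70 4 82 = N
Total 82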
$Q_1 = 20 + \frac{20.5 - 16}{16} \times 5 = 20 + \frac{4.5 \times 5}{16} = 20 + 1.4 = 21.4$
$Q_2 = 25 + \frac{41 - 32}{22} \times 5 = 25 + \frac{9}{22} \times 5 = 25 + 2.05 = 27.05$
$Q_3 = 30 + \frac{61.5 - 54}{10} \times 10 = 30 + 7.5 = 37.5$
Quartile Deviation = $\frac{Q_3 - Q_1}{2} = \frac{37.5 - 21.4}{2} = \frac{16.1}{2} = 8.05$
Coefficient of Quartile Deviation = $100 \times \frac{\text{Quartile Deviation}}{\text{Median}} = 100 \times \frac{8.05}{27.05} = 29.76 = 30$ (approx.)
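The interpolation used for Q1, Q2 and Q3 above can be sketched as one small function. This is an illustrative sketch under stated assumptions: the name `grouped_quantile` is hypothetical, classes are given by their boundaries in ascending order, and items are assumed to be spread evenly within each class.

```python
def grouped_quantile(boundaries, freqs, p):
    """Value below which a fraction p of the N items lie, by linear interpolation.

    boundaries: list of (lower, upper) class boundaries in ascending order.
    freqs: class frequencies in the same order.
    """
    n = sum(freqs)
    target = p * n          # e.g. N/4 for Q1, N/2 for Q2, 3N/4 for Q3
    cum = 0                 # cumulative frequency below the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("p must lie in (0, 1]")

classes = [(10, 15), (15, 20), (20, 25), (25, 30),
           (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [4, 12, 16, 22, 10, 8, 6, 4]   # data of exercise [5]

q1 = grouped_quantile(classes, freqs, 0.25)
q2 = grouped_quantile(classes, freqs, 0.50)
q3 = grouped_quantile(classes, freqs, 0.75)
print(q1, q2, q3, (q3 - q1) / 2)
```

Small differences from the worked figures come only from rounding the intermediate steps by hand.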
[6] The following table shows the distribution of the maximum loads supported by
certain cables produced by a company :–
Maximum load (short tons) 9.3–9.7 9.8–10.2 10.3–10.7 10.8–11.2 11.3–11.7 11.8–12.2 12.3–12.7 12.8–13.2
No. of cables 2 5 12 17 14 6 3 1
Find the semi – inter quartile range
[D.S.W. 1968]
Solution: Calculation for Semi-interquartile Range
Class limits Class boundary Frequency Cumulative frequency
9.3–9.7 9.25–9.75 2 2
9.8–10.2 9.75–10.25 5 7
10.3–10.7 10.25–10.75 12 19 ← Q1 class (N/4 = 15)
10.8–11.2 10.75–11.25 17 36
11.3–11.7 11.25–11.75 14 50 ← Q3 class (3N/4 = 45)
11.8–12.2 11.75–12.25 6 56
12.3–12.7 12.25–12.75 3 59
12.8–13.2 12.75–13.25 1 60 = N
Total 60
$Q_1 = 10.25 + \frac{15 - 7}{12} \times 0.50 = 10.25 + \frac{8}{12} \times 0.5 = 10.25 + 0.33 = 10.58$
$Q_3 = 11.25 + \frac{45 - 36}{14} \times 0.50 = 11.25 + \frac{9}{14} \times 0.5 = 11.25 + 0.32 = 11.57$
Semi-interquartile range = $\frac{Q_3 - Q_1}{2} = \frac{11.57 - 10.58}{2} = \frac{0.99}{2} = 0.49$ short tons
[7] Find the mean deviation about the arithmetic mean of the number 31, 35, 29,
63, 55, 72, 37.
[B.U.B.com, 1976]
Solution :
Arithmetic Mean ($\bar{x}$) = $\frac{1}{7}(29 + 31 + 35 + 37 + 55 + 63 + 72) = \frac{322}{7} = 46$
Table : Calculation of Mean Deviation
x $|x - \bar{x}| = |x - 46|$
29 17
31 15
35 11
37 9
55 9
63 17
72 26
Total 104
Mean Deviation about Mean = $\frac{1}{n}\sum |x - \bar{x}| = \frac{1}{7} \times 104 = 14.86 = 14.9$ Ans.
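As a quick check of this kind of calculation, the following minimal sketch computes the mean deviation about the mean or about any other chosen value. The function name `mean_deviation` and its `about` parameter are illustrative; the data are those of exercises [7] and [8].

```python
def mean_deviation(xs, about=None):
    """Mean of absolute deviations; about the arithmetic mean unless 'about' is given."""
    n = len(xs)
    centre = sum(xs) / n if about is None else about
    return sum(abs(x - centre) for x in xs) / n

ex7 = [31, 35, 29, 63, 55, 72, 37]
ex8 = [13, 84, 68, 24, 96, 139, 84, 27]

print(round(mean_deviation(ex7), 2))   # about the mean (46)
print(mean_deviation(ex8, about=76))   # about the median (76)
```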
[8] Calculate the mean deviation of the following: 13, 84, 68, 24, 96, 139, 84, 27,
about the median.
[B.U.B.com. 1977]
Solution :
Since there is an even number of observations, viz. 8, the median is the average of
the two middle-most observations, when arranged in order of magnitude: 13, 24, 27,
(68, 84), 84, 96, 139
∴ Median = (68 + 84)/2 = 152/2 = 76
Table : Calculation for Mean Deviation
x |x–Median| i.e. difference from median
13 63
24 52
27 49
68 8
84 8
84 8
96 20
139 63
Total 271
Mean Deviation about median = $\frac{1}{n}\sum |x - \text{median}| = \frac{1}{8} \times 271 = 33.9$
[9] Find the mean deviation about median from the following data: 46, 79, 26, 85,
39, 65, 99, 29, 56, 72
[C.U.B.com. 1977]
Solution :
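Since there is an even number of observations, viz. 10, the median is the average
of the two middle-most observations, when arranged in order of magnitude: 26, 29, 39, 46, (56, 65), 72, 79, 85, 99
∴ Median = $\frac{56 + 65}{2} = \frac{121}{2} = 60.5$
The absolute deviations from the median are 34.5, 31.5, 21.5, 14.5, 4.5, 4.5, 11.5, 18.5, 24.5 and 38.5, which total 204.
Mean Deviation about median = $\frac{1}{n}\sum |x - \text{median}| = \frac{1}{10} \times 204 = 20.4$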
[10] Find mean deviation for the following frequency distribution:
Variable 3 5 7 9 11 13
Frequency 2 7 10 9 5 1
[D.M. (Suppl.), 1977]
Solution: Table: Calculations for Mean Deviation
x f fx $|x - \bar{x}|$ $f|x - \bar{x}|$
3 2 6 4.65 9.30
5 7 35 2.65 18.55
7 10 70 0.65 6.50
9 9 81 1.35 12.15
11 5 55 3.35 16.75
13 1 13 5.35 5.35
Total 34 260 - 68.60
$\bar{x} = \frac{260}{34} = 7.65$, Mean Deviation = $\frac{1}{N}\sum f|x - \bar{x}| = \frac{1}{34} \times 68.60 = 2.02$
[11] Calculate the mean deviation from the following data, relating to heights (to the
nearest inches) of 100 children :
Height (inches) 60 61 62 63 64 65 66 67 68
No. of children 2 0 15 29 25 12 10 4 3
[I.C.W.A. 1973]
Solution :
Table : Calculations for Mean Deviation
x f y = x – 64 fy $|x - \bar{x}|$ $f|x - \bar{x}|$
60 2 -4 -8 3.89 7.78
61 0 -3 0 2.89 0
62 15 -2 -30 1.89 28.35
63 29 -1 -29 0.89 25.81
64 25 0 0 0.11 2.75
65 12 1 12 1.11 13.32
66 10 2 20 2.11 21.10
67 4 3 12 3.11 12.44
68 3 4 12 4.11 12.33
Total 100 - -11 - 123.88
$\bar{x} = 64 - \frac{11}{100} = 64 - 0.11 = 63.89$
Mean Deviation = $\frac{123.88}{100} = 1.24$ inches
[12] Calculate mean deviation from median from the following :
Class interval 2–4 4–6 6–8 8 – 10
Frequency 3 4 2 1
[I.C.W.A. 1977]
Solution :
Table : Calculations for Mean Deviation
Class interval Frequency (f) Cumulative frequency Mid-value (x) |x – Median| f|x – Median|
2–4 3 3 3 2 6
4–6 4 7 5 0 0 ← Median class (N/2 = 5)
6–8 2 9 7 2 4
8–10 1 10 = N 9 4 4
Total 10 - - - 14
Median = $4 + \frac{5 - 3}{4} \times 2 = 4 + \frac{2}{4} \times 2 = 5$
Mean Deviation = $\frac{1}{10} \times 14 = 1.4$
[13] In a certain distribution of N = 25 measurements it was found that $\bar{x}$ = 56 inches
and S.D. = 2 inches. After these results were computed it was discovered that a mistake
had been made in one of the measurements which was recorded as 64 inches. Find the
mean and standard deviation, if the incorrect measurement is omitted.
[C.U.M.com, 1962]
Solution:
Here, N = 25, $\bar{x}$ = 56, S.D. = 2
Σx = 25 × 56 = 1400
(–) mistaken record = 64
New Σx = 1336
New Mean = $\frac{1336}{24} = 55.67$
$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
$4 = \frac{\sum x^2}{25} - \left(\frac{1400}{25}\right)^2 = \frac{\sum x^2}{25} - 56^2$
∴ $\frac{\sum x^2}{25} = 3136 + 4 = 3140$
Σx² = 3140 × 25 = 78500
After excluding the mistaken record,
Σx² = 78500 – 64² = 78500 – 4096 = 74404
New $\sigma^2 = \frac{74404}{24} - \left(\frac{1336}{24}\right)^2 = 3100.17 - 3098.78 = 1.39$
$\sigma = \sqrt{1.39} = 1.18$ inches
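The same bookkeeping (recover Σx and Σx² from the mean and S.D., adjust them, then recompute) can be sketched as a small helper. This is an illustrative sketch; the function name `drop_observations` is hypothetical and the figures are those of exercise [13].

```python
import math

def drop_observations(n, mean, sd, dropped):
    """Mean and S.D. after omitting the listed observations from a group
    summarised only by its size, mean and S.D."""
    sum_x = n * mean
    sum_x2 = n * (sd ** 2 + mean ** 2)   # from sigma^2 = sum(x^2)/n - mean^2
    for value in dropped:
        sum_x -= value
        sum_x2 -= value ** 2
    m = n - len(dropped)
    new_mean = sum_x / m
    new_var = sum_x2 / m - new_mean ** 2
    return new_mean, math.sqrt(new_var)

new_mean, new_sd = drop_observations(25, 56, 2, [64])
print(round(new_mean, 2), round(new_sd, 2))
```

Replacing a wrong value by a correct one works the same way, except that the correct value and its square are added back before dividing by the unchanged n.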
[14] The mean and S.D. of a group of 25 observations were found to be 30 and 3 respectively.
After the calculations were made, it was found that two of the observations were incorrect, which
were recorded as 29 and 31. Find the mean and S.D. if the incorrect observations are excluded.
[C.U., B.com. (Hons.)1968]
Solution :
Σx = 25 × 30 = 750
$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
$3^2 = \frac{\sum x^2}{25} - \left(\frac{750}{25}\right)^2 = \frac{\sum x^2}{25} - (30)^2$
$9 = \frac{\sum x^2}{25} - 900$
$\frac{\sum x^2}{25} = 909$
Σx² = 909 × 25 = 22725
When the incorrect items are omitted, we have for the remaining 23 items,
Σx = 750 – 29 – 31 = 690
Σx² = 22725 – 29² – 31² = 22725 – 841 – 961 = 22725 – 1802 = 20923
Mean ($\bar{x}$) = $\frac{690}{23} = 30$
S.D.² = $\frac{20923}{23} - \left(\frac{690}{23}\right)^2 = 909.70 - 900 = 9.70$
S.D. = $\sqrt{9.70} = 3.1$
[15] The mean and the standard deviation of a group of 100 observations were found
to be 20 and 3 respectively. After the calculations were made it was found that three of
the observations were incorrect which were recorded as 21, 21 and 18. Find the mean
and s.d. if the incorrect observations are omitted.
[C.U., B.A.(Econ.) 1965]
Solution :
Σx = 100 × 20 = 2000 (since Σx = n$\bar{x}$, n = 100, $\bar{x}$ = 20)
$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{\sum x^2}{100} - \left(\frac{2000}{100}\right)^2$
$3^2 = \frac{\sum x^2}{100} - (20)^2$, or $\frac{\sum x^2}{100} = 400 + 9 = 409$
Σx² = 40900
When the incorrect items are omitted,
Σx = 2000 – 21 – 21 – 18 = 1940
Now, n = 100 – 3 = 97
New mean = $\frac{\sum x}{n} = \frac{1940}{97} = 20$
New Σx² = 40900 – 21² – 21² – 18² = 40900 – 441 – 441 – 324 = 40900 – 1206 = 39694
$\sigma^2 = \frac{39694}{97} - (20)^2 = 409.22 - 400 = 9.22$
∴ $\sigma = \sqrt{9.22} = 3.04$
[16] The mean and the standard deviations of a sample of size 10 were found to be
9.5 and 2.5 respectively. Later on, an additional observation became available. This
was 15.0 and was included in the original sample. Find the mean and the standard
deviations of the 11 observations.
[I.C.W.A. 1975]
Solution :
Σx = 10 × 9.5 = 95, where n = 10, $\bar{x}$ = 9.5
As per the given condition, $2.5^2 = \frac{\sum x^2}{10} - (9.5)^2 = \frac{\sum x^2}{10} - 90.25$
$\frac{\sum x^2}{10} = 6.25 + 90.25 = 96.50$
Σx² = 965
When the additional observation became available,
Σx = 95 + 15 = 110
Σx² = 965 + 15² = 965 + 225 = 1190
$\bar{x} = \frac{110}{11} = 10$
$\sigma^2 = \frac{1190}{11} - \left(\frac{110}{11}\right)^2 = \frac{1190}{11} - 100 = 108.18 - 100 = 8.18$
∴ $\sigma = \sqrt{8.18} = 2.86$
[17] The mean and the standard deviation of a sample of 100 observations were
calculated as 40 and 5.1 respectively, by a student who by mistake took one observation
as 50 instead of 40. Calculate the correct S.D.
[I.C.W.A. 1976]
Solution :
Here, $\bar{x}$ = 40, n = 100, σ = 5.1
Σx = 100 × 40 = 4000
$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{\sum x^2}{100} - \left(\frac{4000}{100}\right)^2$
$5.1^2 = \frac{\sum x^2}{100} - 1600$
Or, $\frac{\sum x^2}{100} = 26.01 + 1600 = 1626.01$
Σx² = 162601
When 50 is replaced by 40, the correct sums are
Σx = 100 × 40 – 50 + 40 = 4000 – 10 = 3990
Σx² = 162601 – 50² + 40² = 162601 – 2500 + 1600 = 162601 – 900 = 161701
Using these in the formulae for mean and variance,
Mean = $\frac{3990}{100} = 39.90$
S.D.² = $\frac{161701}{100} - (39.90)^2 = 1617.01 - 1592.01 = 25$
∴ S.D. = $\sqrt{25} = 5$
[18] For a distribution of 280 observations mean and standard deviations were
found to be 54 and 3 respectively. On checking it was discovered that two observations
which should correctly read as 62 and 82, had been wrongly recorded as 64 and 80
respectively. Calculate the correct values of mean and S.D.
[C.U., B.A.(Econ.) 1969]
Solution :
Σx = 280 × 54 = 15120
$3^2 = \frac{\sum x^2}{280} - 54^2$
$9 = \frac{\sum x^2}{280} - 2916$
Or, $\frac{\sum x^2}{280} = 2916 + 9 = 2925$
Or, Σx² = 2925 × 280 = 819000
When 64 and 80 are replaced by 62 and 82,
Σx = 15120 – 64 – 80 + 62 + 82 = 15120 – 144 + 144 = 15120
Σx² = 819000 – 64² – 80² + 62² + 82² = 819000 + (62² – 64²) + (82² – 80²)
= 819000 + (–2)(126) + 2(162) = 819000 – 252 + 324 = 819000 + 72 = 819072
Mean = $\frac{\sum x}{n} = \frac{15120}{280} = 54$
S.D.² = $\frac{819072}{280} - \left(\frac{\sum x}{n}\right)^2 = \frac{819072}{280} - 54^2 = 2925.26 - 2916 = 9.26$
∴ S.D. = $\sqrt{9.26} = 3.04$
[19] $\bar{X}$ is the mean of X1, X2 and X3. If x1, x2, x3 are the deviations of X1, X2, X3 from
$\bar{X}$ respectively, prove that $x_1^2 + x_2^2 + x_3^2 = X_1^2 + X_2^2 + X_3^2 - 3\bar{X}^2$.
[C.U., B.A. (Econ.) 1969]
Solution :
$\bar{X} = \frac{X_1 + X_2 + X_3}{3}$
Or, $X_1 + X_2 + X_3 = 3\bar{X}$
L.H.S. = $(\bar{X} - X_1)^2 + (\bar{X} - X_2)^2 + (\bar{X} - X_3)^2$
= $\bar{X}^2 + X_1^2 - 2\bar{X}X_1 + \bar{X}^2 + X_2^2 - 2\bar{X}X_2 + \bar{X}^2 + X_3^2 - 2\bar{X}X_3$
= $X_1^2 + X_2^2 + X_3^2 - 2\bar{X}(X_1 + X_2 + X_3) + 3\bar{X}^2 = X_1^2 + X_2^2 + X_3^2 - 2\bar{X} \cdot 3\bar{X} + 3\bar{X}^2$
= $X_1^2 + X_2^2 + X_3^2 - 6\bar{X}^2 + 3\bar{X}^2 = X_1^2 + X_2^2 + X_3^2 - 3\bar{X}^2$
= R.H.S. Proved.
[20] Let x1, x2, ..., xn be a set of observations. Suppose we compute yi = a + b xi (i =
1, 2, ..., n), where a and b are constants. Express the s.d. of the y's in terms of the s.d.
of the x's and comment on the relation between the two.
[C.U.,B.A.(Econ.) 1978]
Solution :
$y_i = a + b x_i$, so $\bar{y} = a + b\bar{x}$
Or, $x_i = \frac{y_i - a}{b}$
$(y_i - \bar{y}) = (a + b x_i) - (a + b\bar{x}) = b(x_i - \bar{x})$
$\sigma_y^2 = \frac{\sum (y_i - \bar{y})^2}{n} = \frac{\sum \{b(x_i - \bar{x})\}^2}{n} = \frac{b^2 \sum (x_i - \bar{x})^2}{n} = b^2\sigma_x^2$
∴ $\sigma_y = |b|\,\sigma_x$
It is observed from the result that on the right hand side, the new origin ‘a’ is
absent but the scale ‘b’ is present. This proves that S.D. is unaffected by any change of origin,
but depends on scale.
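A short numerical check of this result, using Python's standard `statistics.pstdev` (the population standard deviation, i.e. with divisor n). The sample values and the constants a = 3, b = –2 are arbitrary illustrations; a negative b is chosen to show that the S.D. is multiplied by the positive value of b.

```python
import statistics

xs = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative observations
a, b = 3, -2                     # arbitrary origin and (negative) scale
ys = [a + b * x for x in xs]

sd_x = statistics.pstdev(xs)
sd_y = statistics.pstdev(ys)
print(sd_x, sd_y, abs(b) * sd_x)
```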
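[21] If the mean and the standard deviation of n observations x1, x2, ..., xn be $\bar{x}$
and σ respectively, then the mean and the standard deviation of –x1, –x2, ..., –xn will
be $-\bar{x}$ and –σ respectively. Comment.
[I.C.W.A., 1975]
Solution :
Mean of –x1, –x2, ..., –xn = $\frac{-(x_1 + x_2 + \dots + x_n)}{n} = \frac{-\sum x}{n} = -\bar{x}$
Deviations from the mean: $-(x_1 - \bar{x}), -(x_2 - \bar{x}), \dots, -(x_n - \bar{x})$
Squared deviations from the mean: $(x_1 - \bar{x})^2, (x_2 - \bar{x})^2, \dots, (x_n - \bar{x})^2$
S.D. = $\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2} = \sigma$
Thus the statement about the mean is correct, but the statement about the S.D. is not: S.D. is always considered as positive, so the S.D. of the new observations is σ, not –σ.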
[22] If d² = mean square deviation about x, σ = standard deviation, and $x - \bar{x} = a$,
then show that d² = σ² + a²
[D.S.W., 1971]
Solution :
$d^2 = \frac{\sum (x_i - x)^2}{n}$
$(x_i - x) = (x_i - \bar{x}) + (\bar{x} - x)$
Therefore,
$d^2 = \frac{\sum (x_i - \bar{x})^2}{n} + 2(\bar{x} - x)\frac{\sum (x_i - \bar{x})}{n} + (\bar{x} - x)^2$
Since $\sum (x_i - \bar{x}) = 0$, the middle term vanishes, and
$d^2 = \sigma^2 + (-a)^2 = \sigma^2 + a^2$
[23] Calculate the standard deviation from the following Series: 20, 85, 120, 60, 40
[B.U., B. Com. 1971]
Solution :
Table : Calculations for standard Deviation
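x $y = \frac{x - 60}{5}$ y²
20 -8 64
40 -4 16
60 0 0
85 5 25
120 12 144
Total 5 249
$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{249}{5} - \left(\frac{5}{5}\right)^2 = 49.8 - 1 = 48.8$
$\sigma_y = \sqrt{48.8} = 6.98$
$\sigma_x = d \cdot \sigma_y = 5 \times 6.98 = 34.9$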
[24] Find the standard deviation of weights (to the nearest pound) of 15 students
given below: 138, 156, 147, 115, 145, 132, 163, 158, 130, 123, 103, 109, 100, 105, 106.
[B.U., B.A. (Econ.) 1972]
Solution: Table: Calculation for S.D.
x y = x – 130 y2
100 -30 900
103 -27 729
105 -25 625
106 -24 576
109 -21 441
115 -15 225
123 -7 49
130 0 0
132 2 4
138 8 64
145 15 225
147 17 289
156 26 676
158 28 784
163 33 1089
Total -20 6676
$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{6676}{15} - \left(-\frac{20}{15}\right)^2 = 445.07 - 1.78 = 443.29$
$\sigma_y = \sqrt{443.29} = 21$ lbs.
$\sigma_x = \sigma_y = 21$ lbs.
[25] Calculate the s.d. of the following observations: 240.12, 240.13, 240.15, 240.12,
240.17, 240.15, 240.17, 240.16, 240.22, 240.21.
[I.C.W.A. 1976]
Solution :
Table : Calculations for S.D.
x $y = \frac{x - 240}{0.01}$ y²
240.12 12 144
240.12 12 144
240.13 13 169
240.15 15 225
240.15 15 225
240.16 16 256
240.17 17 289
240.17 17 289
240.21 21 441
240.22 22 484
Total 160 2666
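$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{2666}{10} - \left(\frac{160}{10}\right)^2 = 266.6 - 256 = 10.6$
$\sigma_y = \sqrt{10.6} = 3.256$
$\sigma_x = d \cdot \sigma_y = 0.01 \times 3.256 = 0.033$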
[26] Find the standard deviation for the distribution given below :
x 1 2 3 4 5 6 7
Frequency 10 20 30 35 14 10 2
[Dip. Management, 1967]
Solution :
Calculation for S.D.
x f fx f x2
1 10 10 10
2 20 40 80
3 30 90 270
4 35 140 560
5 14 70 350
6 10 60 360
7 2 14 98
of a Parliament:
Age in years 30 40 50 60 70
No. of members 64 132 153 140 51
[C.U.,B.Com. 1978]
Solution :
Table : Calculation for S.D.
x (Age in years) f (No. of members) $y = \frac{x - 50}{10}$ fy fy²
30 64 -2 -128 256
40 132 -1 -132 132
50 153 0 0 0
60 140 1 140 140
70 51 2 102 204
Total 540 - -18 732
$\sigma_y^2 = \frac{732}{540} - \left(-\frac{18}{540}\right)^2 = 1.3556 - 0.0011 = 1.3545$
$\sigma_y = \sqrt{1.3545} = 1.164$
$\sigma_x = d \cdot \sigma_y = 10 \times 1.164 = 11.64$ years
[28] Find the s.d. from the following frequency distribution:
Wt. (lbs.) 120–124 125–129 130–134 135–139 140–144 145–149 Total
No. of boys 12 25 28 15 12 8 100
[B.U., B.Com. 1974]
Class interval Frequency (f) Mid-value (x) $y = \frac{x - 132}{5}$ fy fy²
120–124 12 122 -2 -24 48
125–129 25 127 -1 -25 25
130–134 28 132 0 0 0
135–139 15 137 1 15 15
140–144 12 142 2 24 48
145–149 8 147 3 24 72
Total 100 - - 14 208
$\sigma_y^2 = \frac{208}{100} - \left(\frac{14}{100}\right)^2 = 2.08 - 0.0196 = 2.0604$
Weight – (OZ) 3.0–3.1 3.1–3.2 3.2–3.3 3.3–3.4 3.4–3.5 3.5–3.6 3.6–3.7 3.7–3.8 3.8–3.9 3.9–4.0
Frequency 5 10 12 20 25 18 10 8 8 5
Calculate the arithmetic mean and standard deviation.
[C.A. 1972]
Solution:
Class interval Frequency (f) Mid-value (x) $y = \frac{x - 3.45}{0.10}$ fy fy²
3.0–3.1 5 3.05 -4 -20 80
3.1–3.2 10 3.15 -3 -30 90
3.2–3.3 12 3.25 -2 -24 48
3.3–3.4 20 3.35 -1 -20 20
3.4–3.5 25 3.45 0 0 0
3.5–3.6 18 3.55 1 18 18
3.6–3.7 10 3.65 2 20 40
3.7–3.8 8 3.75 3 24 72
3.8–3.9 8 3.85 4 32 128
3.9–4.0 5 3.95 5 25 125
Total 121 - - 25 621
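$\bar{x} = 3.45 + \frac{25}{121} \times 0.10 = 3.45 + 0.02 = 3.47$
$\sigma_y^2 = \frac{\sum f y^2}{n} - \left(\frac{\sum f y}{n}\right)^2 = \frac{621}{121} - \left(\frac{25}{121}\right)^2 = 5.13 - 0.04 = 5.09$
$\sigma_y = \sqrt{5.09} = 2.3$
$\sigma_x = 0.10 \times 2.3 = 0.23$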
[30] Compute the standard deviation of the following data:
Weekly wages in Rs. Number of Men
30 and under 40 8
40 and under 50 12
50 and under 60 6
60 and under 70 4
70 and under 80 10
[B.U.,B.Com. 1973]
Solution :
Calculation for S.D.
Class interval Frequency (f) Mid-value (x) $y = \frac{x - 55}{10}$ fy fy²
30 – 40 8 35 -2 -16 32
40 – 50 12 45 -1 -12 12
50 – 60 6 55 0 0 0
60 – 70 4 65 1 4 4
70 – 80 10 75 2 20 40
Total 40 - - -4 88
$\sigma_y^2 = \frac{\sum f y^2}{n} - \left(\frac{\sum f y}{n}\right)^2 = \frac{88}{40} - \left(-\frac{4}{40}\right)^2 = 2.20 - 0.01 = 2.19$
$\sigma_y^2 = \frac{349}{92} - \left(\frac{75}{92}\right)^2 = 3.79 - 0.66 = 3.13$
$\sigma_y = \sqrt{3.13} = 1.768$
$\sigma_x = d \cdot \sigma_y = 50 \times 1.768 = 88.4$
[32] Compute the arithmetic mean, standard deviation and mean deviation about
the mean for the following data :
Scores 4–5 6–7 8–9 10–11 12–13 14–15 Total
f 4 10 20 15 8 3 60
[I.C.W.A., 1978]
Solution :
Table : Calculations for A.M. and S.D.
Class interval f Mid-value (x) $y = \frac{x - 8.5}{2}$ fy fy²
4–5 4 4.5 -2 -8 16
6–7 10 6.5 -1 -10 10
8–9 20 8.5 0 0 0
10–11 15 10.5 1 15 15
12–13 8 12.5 2 16 32
14–15 3 14.5 3 9 27
Total 60 - - 22 100
$\bar{x} = 8.5 + \frac{22}{60} \times 2 = 8.5 + 0.73 = 9.23$
$\sigma_y^2 = \frac{100}{60} - \left(\frac{22}{60}\right)^2 = 1.67 - 0.13 = 1.54$
Income (Rs.) Below 200 200–399 400–599 600–799 800–999 1000–1199
No. of earners 25 72 47 22 13 7
[C.U., B.A.(Econ.), 1978]
Solution : Calculations for S.D.
σx = d.σy= 200×1.257=251.40
[34] Find the mean and the s.d. from the following frequency distribution:
Weight (lb.) 131–140 141–150 151–160 161–170 171–180 181–190 191–210 211–240
No. of person 2 5 4 9 7 5 3 1
[I.C.W.A. 1971]
Solution :
Table : calculations for Mean and S.D.
Weight (lb.) f Mid-value (x) $y = \frac{x - 165.5}{10}$ fy fy²
131–140 2 135.5 -3 -6 18
141–150 5 145.5 -2 -10 20
151–160 4 155.5 -1 -4 4
161–170 9 165.5 0 0 0
171–180 7 175.5 1 7 7
181–190 5 185.5 2 10 20
191–210 3 200.5 3.5 10.5 36.75
211–240 1 225.5 6 6 36
Total 36 - - 13.5 141.75
$\bar{x} = 165.5 + \frac{13.5}{36} \times 10 = 165.5 + 3.75 = 169.25$
$\sigma_y^2 = \frac{141.75}{36} - \left(\frac{13.5}{36}\right)^2 = 3.938 - 0.1406 = 3.797$
∴ $\sigma_y = \sqrt{3.797} = 1.949$
$\sigma_x = 10 \times 1.949 = 19.49 = 19.5$ Ans.
[35] Calculate the proportion of firms in which costs of production are within the
range A.M. ± S.D. in the following distribution:
Costs of production (Rs. per 5 litres) 4–6 6–8 8–10 10–12 12–14 14–16 Total
No. of dairy farms 13 111 182 105 19 7 437
[I.C.W.A. 1973]
Solution:
Table: Calculations for A.M. & S.D.
Costs of production (class boundary) f Cumulative frequency Mid-value (x) $y = \frac{x - 9}{2}$ fy fy²
4–6 13 13 5 -2 -26 52
6–8 111 124 7 -1 -111 111
8–10 182 306 9 0 0 0
10–12 105 411 11 1 105 105
12–14 19 430 13 2 38 76
14–16 7 437 = N 15 3 21 63
Total 437 - - - 27 407
$\bar{x} = 9 + \frac{27}{437} \times 2 = 9 + 0.124 = 9.124$
$\sigma_y^2 = \frac{407}{437} - \left(\frac{27}{437}\right)^2 = \frac{177859 - 729}{437^2} = \frac{177130}{437^2}$
∴ $\sigma_y = \frac{\sqrt{177130}}{437} = \frac{420.87}{437} = 0.963$
$\sigma_x = 2 \times 0.963 = 1.926$
A.M. + S.D. = 9.124 + 1.926 = 11.05
observations are zero. Find the mean and s.d. of 400 observations together.
[B.U., B.A.(Econ.),1966]
Solution :
Table : Calculations for A.M. and S.D.
x f fx f x2
0 300 0 0
1 100 100 100
Total 400 100 100
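A.M. ($\bar{x}$) = $\frac{\sum f x}{\sum f} = \frac{100}{400} = \frac{1}{4}$
$\sigma_x^2 = \frac{\sum f x^2}{n} - \left(\frac{\sum f x}{n}\right)^2 = \frac{100}{400} - \left(\frac{100}{400}\right)^2 = \frac{1}{4} - \frac{1}{16} = \frac{4 - 1}{16} = \frac{3}{16}$
∴ $\sigma_x = \frac{\sqrt{3}}{4}$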
[37] Two samples of sizes 60 and 90 have 52 and 48 as the respective arithmetic means,
and 9 and 12 as the respective standard deviation. Find the arithmetic mean and standard
deviation of the combined sample of size 150.
[I.C.W.A.,1970]
Solution :
(1) $N\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2$
(2) $N\sigma^2 = (n_1\sigma_1^2 + n_2\sigma_2^2) + (n_1 d_1^2 + n_2 d_2^2)$, where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$
Using (1), $150\bar{x} = 60 \times 52 + 90 \times 48 = 3120 + 4320 = 7440$, so $\bar{x} = \frac{7440}{150} = 49.6$
∴ d1 = 52 – 49.6 = 2.4, d2 = 48 – 49.6 = –1.6
Using (2), 150σ² = 60 × 9² + 90 × 12² + 60 × 2.4² + 90 × (–1.6)² = 60 × 81 + 90 × 144 + 60 × 5.76 + 90 × 2.56
= 4860 + 12960 + 345.60 + 230.40 = 18396
$\sigma^2 = \frac{18396}{150} = 122.64$
$\sigma = \sqrt{122.64} = 11.07 = 11.1$
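The composite-group formula extends to any number of groups, so it is convenient to sketch it once as a function. This is an illustrative sketch; the function name `combine_groups` is hypothetical, and each group is described only by its size, mean and S.D.

```python
import math

def combine_groups(groups):
    """Mean and S.D. of the composite group.

    groups: list of (n_i, mean_i, sd_i).
    Uses N*mean = sum(n_i*mean_i) and
    N*sigma^2 = sum(n_i*sd_i^2) + sum(n_i*d_i^2), with d_i = mean_i - mean.
    """
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    var = sum(n * (s ** 2 + (m - mean) ** 2) for n, m, s in groups) / total_n
    return mean, math.sqrt(var)

# Two samples of sizes 60 and 90 (exercise [37])
mean37, sd37 = combine_groups([(60, 52, 9), (90, 48, 12)])
print(round(mean37, 1), round(sd37, 1))

# Three establishments (exercise [40])
mean40, sd40 = combine_groups([(20, 305, 50), (25, 300, 40), (40, 340, 45)])
print(round(mean40, 2), round(sd40, 2))
```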
[38] The mean of two samples of sizes 50 and 100 respectively are 54.4 & 50.3 and the
standard deviations are 8 and 7. Obtain the mean and standard deviation of the sample of size
150 obtained by combining the two samples. (Give answers correct to one decimal place.)
[I.C.W.A.,1978]
Solution :
Table : Mean and S.D. of composite Group
Characteristics Group I Group II Composite Group
No. of observations 50 100 150
Mean 54.4 50.3 $\bar{x}$
Standard Deviation 8 7 σ
$150\bar{x} = 50 \times 54.4 + 100 \times 50.3 = 2720 + 5030 = 7750$
$\bar{x} = \frac{7750}{150} = 51.67 = 51.7$
d1 = 54.4 – 51.7 = 2.7
d2 = 50.3 – 51.7 = –1.4
150σ² = 50 × 8² + 100 × 7² + 50 × 2.7² + 100 × (–1.4)² = 50 × 64 + 100 × 49 + 50 × 7.29 + 100 × 1.96
= 3200 + 4900 + 364.50 + 196 = 8660.50
$\sigma^2 = \frac{8660.5}{150} = 57.74$
$\sigma = \sqrt{57.74} = 7.6$
[39] An analysis of monthly wages paid to workers in two firms A and B, belonging
to same industry, gives the following results :
Firm A Firm B
Number of wage earners 550 650
Average monthly wages Rs. 50 Rs. 45
S.D. of the distribution of wages Rs. (√90) Rs. (√120)
d2 = 45 – 47.29 = –2.29
$1200\sigma^2 = 550 \times (\sqrt{90})^2 + 650 \times (\sqrt{120})^2 + 550 \times 2.71^2 + 650 \times (-2.29)^2$
= 49500 + 78000 + 4039.26 + 3408.67 = 1,34,947.93
$\sigma^2 = \frac{134947.93}{1200} = 112.46$
$\sigma = \sqrt{112.46} = 10.60$
[40] A company has three establishments E1, E2 and E3 in three cities. Analysis of the
monthly salaries paid to the employees in the three establishments is given below :
E1 E2 E3
Number of employees 20 25 40
Average monthly salary (Rs.) 305 300 340
S.D. of monthly salaries (Rs.) 50 40 45
Find the average and the standard deviation of the monthly salaries of all 85
employees in the company.
[I.C.W.A., 1976]
Solution :
$85\bar{x} = 20 \times 305 + 25 \times 300 + 40 \times 340 = 6100 + 7500 + 13600 = 27{,}200$
$\bar{x} = \frac{27200}{85} = 320$
d1 = 305 – 320 = –15
d2 = 300 – 320 = –20
d3 = 340 – 320 = 20
85σ² = 20 × 50² + 25 × 40² + 40 × 45² + 20 × (–15)² + 25 × (–20)² + 40 × (20)²
= 50000 + 40000 + 81000 + 4500 + 10000 + 16000 = 201500
σ² = 201500/85 = 2370.59
$\sigma = \sqrt{2370.59} = 48.69$
[41] Three sets of values of the variable x have means 26.3, 27.0 and 28.5 and standard
deviations 4.5, 3.9 and 4.8. If the three sets have respectively 50, 60 and 55 values. What
would be the mean and variance of x, if the three sets are taken together?
Solution :
n1 + n2 + n3 = 50 + 60 +55 = 165
$165\bar{x} = 50 \times 26.3 + 60 \times 27.0 + 55 \times 28.5 = 1315 + 1620 + 1567.5 = 4502.50$
$\bar{x} = 4502.5/165 = 27.29$
d1 = 26.3 – 27.29 = –0.99, d2 = 27.0 – 27.29 = –0.29
d3 = 28.5 – 27.29 = 1.21
165σ² = 50 × 4.5² + 60 × 3.9² + 55 × 4.8² + 50 × (–0.99)² + 60 × (–0.29)² + 55 × 1.21²
= 50 × 20.25 + 60 × 15.21 + 55 × 23.04 + 50 × 0.9801 + 60 × 0.0841 + 55 × 1.4641
= 1012.50 + 912.60 + 1267.20 + 49 + 5.05 + 80.53 = 3326.88
∴ σ² = 3326.88/165 = 20.16
∴ Variance = 20.16
[42] The mean and the variance calculated from a group of 80 observations are 63.2
and 25.93 respectively. If 60 of these observations have mean 64.8 and s.d. 4, find the mean
and the s.d. of the remaining 20 observations.
[I.C.W.A., 1971]
Solution :
When n = 80,
Σx = 80 × 63.2 = 5056 ... (i)
When n = 60, Σx1 = 60 × 64.8 = 3888 ... (ii)
Sum of the remaining 20 observations = (i) – (ii) = 1168 = Σx2 (say)
∴ $\bar{x}_2 = \frac{1168}{20} = 58.4$
Here, n1 = 60, σ1 = 4, $\bar{x}_1$ = 64.8
n2 = 20, σ2 is to be found, $\bar{x}_2$ = 58.4
d1 = $\bar{x}_1 - \bar{x}$ = 64.8 – 63.2 = 1.6
d2 = $\bar{x}_2 - \bar{x}$ = 58.4 – 63.2 = –4.8
By the formula,
$N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2$
80 × 25.93 = 60 × 4² + 20 × σ2² + 60 × 1.6² + 20 × (–4.8)²
2074.4 = 60 × 16 + 20σ2² + 60 × 2.56 + 20 × 23.04 = 960 + 20σ2² + 153.6 + 460.80
= 1574.40 + 20σ2²
Or, 20σ2² = 2074.4 – 1574.4 = 500
σ2² = 500/20 = 25
$\sigma_2 = \sqrt{25} = 5$
Calculate the coefficients of variation and comment.
[I.C.W.A. 1975]
Solution : Coefficient of Variation = $\frac{\text{Standard Deviation}}{\text{Mean}} \times 100$
Table : Calculations for Mean and S.D.
For A: x, y = x – 10, y² (first three columns). For B: x, y = x – 8, y² (last three columns).
x y = x – 10 y² x y = x – 8 y²
4 -6 36 12 4 16
8 -2 4 8 0 0
4 -6 36 3 –5 25
15 5 25 15 7 49
10 0 0 6 –2 4
11 1 1 4 –4 16
9 -1 1 10 2 4
Total -9 103 Total 2 90
For A : $\bar{x} = 10 + \left(-\frac{9}{7}\right) = 10 - 1.29 = 8.71$
$\sigma^2 = \frac{103}{7} - \left(-\frac{9}{7}\right)^2 = 14.71 - 1.66 = 13.05$
$\sigma = \sqrt{13.05} = 3.61$
For B : $\bar{x} = 8 + \frac{2}{7} = 8 + 0.29 = 8.29$
$\sigma^2 = \frac{90}{7} - \left(\frac{2}{7}\right)^2 = 12.86 - 0.08 = 12.78$
$\sigma = \sqrt{12.78} = 3.57$
C.V. (for A) = $\frac{3.61}{8.71} \times 100 = 41.45 = 41.5$
C.V. (for B) = $\frac{3.57}{8.29} \times 100 = 43.06 = 43.1$
The percentage of dividend is higher in B than in A. Hence shares of company B are
preferable to those of A.
[45] From the prices of shares x and y below find out which is more stable in value:
x: 35 54 52 53 56 58 52 50 51 49
y: 108 107 105 105 106 107 104 103 104 101
[I.C.W.A., 1976]
Solution :
Table : Calculations for Mean and S.D.
For x: x, y = x – 52, y² (first three columns). For y: y, z = y – 105, z² (last three columns).
x y = x – 52 y² y z = y – 105 z²
35 –17 289 108 3 9
54 2 4 107 2 4
52 0 0 105 0 0
53 1 1 105 0 0
56 4 16 106 1 1
58 6 36 107 2 4
52 0 0 104 –1 1
50 –2 4 103 –2 4
51 –1 1 104 –1 1
49 –3 9 101 –4 16
Total –10 360 Total 0 40
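For x : $\bar{x} = 52 + \left(-\frac{10}{10}\right) = 52 - 1 = 51$
$\sigma_x^2 = \frac{360}{10} - \left(-\frac{10}{10}\right)^2 = 36 - 1 = 35$, $\sigma_x = \sqrt{35} = 5.92$
For y : $\bar{y} = 105 + \frac{0}{10} = 105$
$\sigma_y^2 = \frac{40}{10} - \left(\frac{0}{10}\right)^2 = 4$
$\sigma_y = \sqrt{4} = 2$
C.V. (for x) = $\frac{5.92}{51} \times 100 = 11.61$
C.V. (for y) = $\frac{2}{105} \times 100 = 1.9$
Share y is more stable.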
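The comparison can be reproduced with a small helper built on Python's standard `statistics` module. This is an illustrative sketch; the function name `coefficient_of_variation` is hypothetical, and `pstdev` is used because the text divides by n rather than n – 1.

```python
import statistics

def coefficient_of_variation(xs):
    """C.V. = (S.D. / Mean) x 100, with the S.D. computed using divisor n."""
    return 100 * statistics.pstdev(xs) / statistics.mean(xs)

share_x = [35, 54, 52, 53, 56, 58, 52, 50, 51, 49]
share_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coefficient_of_variation(share_x)
cv_y = coefficient_of_variation(share_y)
print(round(cv_x, 1), round(cv_y, 1))
print("more stable:", "x" if cv_x < cv_y else "y")
```

The series with the smaller C.V. is the more stable one, whatever the units or the level of the two series.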
[46] Calculate the coefficient of variation from the following data, showing Grades of
100 students in M.A. Mathematics :
Grades 30–39 40–49 50–59 60–69 70–79 80–89 90–99
Frequency 2 3 11 20 32 25 7
[C.U.,M com. 1973]
Solution:
Table : Calculations for Mean and S.D.
Total 100 - - 80 236
$\bar{x} = 64.5 + \frac{80}{100} \times 10 = 64.5 + 8 = 72.5$
$\sigma_y^2 = \frac{236}{100} - \left(\frac{80}{100}\right)^2 = 2.36 - 0.64 = 1.72$
$\sigma_y = \sqrt{1.72} = 1.31$
$\sigma_x = 1.31 \times 10 = 13.1$
C.V. = $\frac{13.1}{72.5} \times 100 = 18.07 = 18.1$
[47] The mean life in days and standard deviation for two types of electric bulbs are
given below :
Mean life in days Standard Deviation in days
Type I 310 9
Type II 260 14
Compare the relative variability of life of the two types of bulbs.
[B.U., B.A. (Econ.), 1965]
Solution:
C.V. (type I bulb) = $\frac{9}{310} \times 100 = 2.90 = 2.9$
C.V. (type II bulb) = $\frac{14}{260} \times 100 = 5.38 = 5.4$
The life of electric bulbs of Type II is more variable.
[48] You are given the distribution of wages in two factories X and Y.
Wages (Rs.) 50–100 100–150 150–200 200–250 250–300 300–350
No. of workers (X) 2 9 29 54 11 5
No. of workers (Y) 6 11 18 32 27 11
State in which factory the wages are more variable (Use Standard Deviation and Mean.)
[C.A., 1975]
Solution :
Table : Calculations for Mean and S.D. for X
Wages Frequency (f) Mid-value (x) $y = \frac{x - 175}{50}$ fy fy²
50–100 2 75 -2 -4 8
100–150 9 125 -1 -9 9
150–200 29 175 0 0 0
200–250 54 225 1 54 54
250–300 11 275 2 22 44
300–350 5 325 3 15 45
Total 110 - - 78 160
Table : Calculations for Mean and S.D. for Y
Wages Frequency (f) Mid-value (y) $z = \frac{y - 175}{50}$ fz fz²
50–100 6 75 –2 –12 24
100–150 11 125 –1 –11 11
150–200 18 175 0 0 0
200–250 32 225 1 32 32
250–300 27 275 2 54 108
300–350 11 325 3 33 99
Total 105 – – 96 274
For X : $\bar{x} = 175 + \frac{78}{110} \times 50 = 175 + 35.45 = 210.45$
$\sigma_y^2 = \frac{160}{110} - \left(\frac{78}{110}\right)^2 = 1.45 - 0.50 = 0.95$
$\sigma_y = \sqrt{0.95} = 0.97$, $\sigma_x = 50 \times 0.97 = 48.5$
C.V. = $\frac{48.5}{210.45} \times 100 = 23$
For Y : $\bar{y} = 175 + \frac{96}{105} \times 50 = 175 + 45.71 = 220.71$
$\sigma_z^2 = \frac{274}{105} - \left(\frac{96}{105}\right)^2 = 2.61 - 0.84 = 1.77$
$\sigma_z = \sqrt{1.77} = 1.33$
$\sigma_y = 50 \times 1.33 = 66.5$
C.V. = $\frac{66.5}{220.71} \times 100 = 30.13 = 30$
Wages in factory Y are more variable.
12–14 1 13 1 1 1
14–16 3 15 2 6 12
16–18 1 17 3 3 9
18–20 1 19 4 4 16
20–22 2 21 5 10 50
Total 61 - - -119 581
$\bar{x} = 11 + \left(-\frac{119}{61}\right) \times 2 = 11 - 3.90 = 7.10$
$\sigma_y^2 = \frac{581}{61} - \left(-\frac{119}{61}\right)^2 = 9.52 - 3.80 = 5.72$
$\sigma_y = \sqrt{5.72} = 2.39$
$\sigma_x = 2 \times 2.39 = 4.78$
C.V. (for cotton consumed) = $\frac{4.78}{7.10} \times 100 = 67.32$
C.V. for spindle mill = $\frac{3}{20} \times 100 = 15$
Dispersion for cotton consumption is more.
[50] In a small town, a survey was conducted in respect of profit made by retail
shops. The following results were obtained :
Profit or loss (Rs. '000) -4 to -3 -3 to -2 -2 to -1 -1 to 0 0 to 1 1 to 2 2 to 3 3 to 4 4 to 5 5 to 6
No. of shops 4 10 22 28 38 56 40 24 18 10
Calculate : (i) the average profit made by a retail shop;
(ii) total profit made by all the shops;
(iii) the coefficient of variation of earnings.
[C.A., 1977]
Solution :
Class interval Frequency (f) Mid-value (x) y = x – 0.5 fy f y2
–4 to –3 4 -3.5 -4 -16 64
–3 to –2 10 -2.5 -3 -30 90
–2 to –1 22 -1.5 -2 -44 88
–1 to 0 28 -0.5 -1 -28 28
0 to 1 38 0.5 0 0 0
1 to 2 56 1.5 1 56 56
2 to 3 40 2.5 2 80 160
3 to 4 24 3.5 3 72 216
4 to 5 18 4.5 4 72 288
5 to 6 10 5.5 5 50 250
Total 250 - - 212 1240
$\bar{x} = 0.5 + \frac{212}{250} = 0.5 + 0.848 = 1.348$
$\sigma_y^2 = \frac{1240}{250} - \left(\frac{212}{250}\right)^2 = 4.96 - 0.719 = 4.24$
$\sigma_y = \sqrt{4.24} = 2.06$
∴ (i) Average profit = 1.348 (Rs. '000)
(ii) Total profit = 250 × 1.348 = 337 (Rs. '000)
(iii) Coefficient of variation = $\frac{2.06}{1.348} \times 100 = 153$
[51] The following data show the length of ear-head (in cm) for 24 ears of a variety of wheat
11.5 8.8 10.1
8.2 9.3 10.0
9.7 10.1 10.3
10.3 11.3 9.8
10.7 9.8 9.3
8.6 10.4 9.8
11.3 8.4 9.0
10.7 9.6 11.2
Determine the range, the mean deviation about mean and the standard deviation
for the data
Solution : Range = Maximum value – Minimum value
= 11.5 – 8.2 = 3.3 cm.
Table : Calculations for Mean Deviation about mean and s.d.
$\bar{x} = \frac{236.61}{24} = 9.9$ cm
S.D.² = $\frac{2351.13}{24} - \left(\frac{236.61}{24}\right)^2 = 97.96 - 97.19 = 0.77$
S.D. = $\sqrt{0.77} = 0.88 \approx 0.90$
Mean Deviation about Mean = $\frac{\sum f|x - \bar{x}|}{n} = \frac{17.11}{24} = 0.71$
[52] The number of telephone calls received at an exchange per interval for 245
successive one-minute intervals is shown in the following frequency distribution :
Number of calls Frequency
0 14
1 21
2 25
3 43
4 51
5 40
6 39
7 12
Total 245
Compute the mean deviation about median and the standard deviation.
Solution :
Table : Calculations for Mean Deviation and S.D.
x   f   Cumulative frequency   fx   fx²   |x − Median|, i.e. |x − 4|   f|x − Median|
0 14 14 0 0 4 56
1 21 35 21 21 3 63
2 25 60 50 100 2 50
3 43 103 129 387 1 43
4 51 154 204 816 0 0
5 40 194 200 1000 1 40
6 39 233 234 1404 2 78
7 12 245=N 84 588 3 36
Total 245 - 922 4316 16 366
Median = value corresponding to the cumulative frequency (N + 1)/2, i.e. the $\frac{245 + 1}{2}$ = 123rd term = 4
\[ \text{Mean deviation about median} = \frac{366}{245} = 1.494 \]
\[ \text{S.D.}^2 = \frac{4316}{245} - \left(\frac{922}{245}\right)^2 = \frac{1057420 - 850084}{(245)^2} = \frac{207336}{(245)^2} \]
\[ \therefore \text{Standard deviation} = \frac{\sqrt{207336}}{245} = \frac{455.34}{245} = 1.858 \]
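A minimal sketch of the same computation for a discrete frequency distribution follows. The helper name `weighted_median` is hypothetical, and it assumes an odd total frequency, as in this example, so that the (N + 1)/2-th term is a single observation.

```python
from math import sqrt

def weighted_median(values, freqs):
    """Value of the (N + 1)/2-th term; assumes values are sorted and N is odd."""
    target = (sum(freqs) + 1) / 2
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative >= target:
            return x
    raise ValueError("empty distribution")

# Telephone-call data: number of calls and number of one-minute intervals
calls = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [14, 21, 25, 43, 51, 40, 39, 12]

n = sum(freqs)
median = weighted_median(calls, freqs)
mean_dev = sum(f * abs(x - median) for x, f in zip(calls, freqs)) / n
mean = sum(f * x for x, f in zip(calls, freqs)) / n
sd = sqrt(sum(f * x * x for x, f in zip(calls, freqs)) / n - mean ** 2)
print(median, round(mean_dev, 3), round(sd, 3))
```

The three printed values correspond to the median, the mean deviation about the median and the standard deviation worked out in the solution.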
[53] Evaluate the three quartiles for the following frequency distribution of I.Q.
for 309 six-year-old children :
I.Q. Frequency
160–169 2
150–159 3
140–149 7
130–139 19
120–129 37
110–119 79
100–109 69
90–99 65
80–89 17
70–79 5
60–69 3
50–59 2
40–49 1
Total 309
Next, determine the mean deviation about the median, the standard deviation and
the quartile deviation.
Solution : The frequency distribution is rearranged in ascending order.
Class limits   Frequency (f)   Class boundary   Cumulative frequency
40–49            1             39.5–49.5          1
50–59            2             49.5–59.5          3
60–69            3             59.5–69.5          6
70–79            5             69.5–79.5         11
80–89           17             79.5–89.5         28
                                                 ← N/4 = 77.25 (Q1 lies in the next class)
90–99           65             89.5–99.5         93
                                                 ← N/2 = 154.5 (Q2 lies in the next class)
100–109         69             99.5–109.5       162
                                                 ← 3N/4 = 231.75 (Q3 lies in the next class)
110–119         79             109.5–119.5      241
120–129         37             119.5–129.5      278
130–139         19             129.5–139.5      297
140–149          7             139.5–149.5      304
150–159          3             149.5–159.5      307
160–169          2             159.5–169.5      309 = N
Total          309             -
\[ Q_1 = 89.5 + \frac{77.25 - 28}{65} \times 10 = 89.5 + \frac{49.25}{65} \times 10 = 89.5 + 7.58 = 97.08 \]
\[ Q_2 = 99.5 + \frac{154.5 - 93}{69} \times 10 = 99.5 + \frac{61.5}{69} \times 10 = 99.5 + \frac{615}{69} = 99.5 + 8.91 = 108.41 \]
\[ Q_3 = 109.5 + \frac{231.75 - 162}{79} \times 10 = 109.5 + \frac{69.75}{79} \times 10 = 109.5 + 8.83 = 118.33 \]
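Mid-value (x)   y = (x − 104.5)/10   f     fy    fy²   |x − 108.41|   f|x − 108.41|
44.5            -6                   1     -6    36    63.91          63.91
54.5            -5                   2     -10   50    53.91          107.82
64.5            -4                   3     -12   48    43.91          131.73
74.5            -3                   5     -15   45    33.91          169.55
84.5            -2                   17    -34   68    23.91          406.47
94.5            -1                   65    -65   65    13.91          904.15
104.5           0                    69    0     0     3.91           269.79
114.5           1                    79    79    79    6.09           481.11
124.5           2                    37    74    148   16.09          595.33
134.5           3                    19    57    171   26.09          495.71
144.5           4                    7     28    112   36.09          252.63
154.5           5                    3     15    75    46.09          138.27
164.5           6                    2     12    72    56.09          112.18
Total           -                    309   123   969   -              4128.65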
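\[ \text{S.D.}_y^2 = \frac{969}{309} - \left(\frac{123}{309}\right)^2 = \frac{299421 - 15129}{(309)^2} = \frac{284292}{(309)^2} \]
\[ \therefore \text{S.D.}_y = \frac{\sqrt{284292}}{309} = \frac{533.19}{309} = 1.726 \]
\[ \therefore \text{S.D.}_x = 1.726 \times 10 = 17.26 \]
\[ \text{Mean deviation about median} = \frac{4128.65}{309} = 13.36 \]
\[ \text{Quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{118.33 - 97.08}{2} = \frac{21.25}{2} = 10.63 \]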
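The interpolation used for the quartiles can be sketched as below. The function `grouped_quantile` is an illustrative helper, not something defined in the text; it assumes continuous class boundaries listed in ascending order.

```python
def grouped_quantile(boundaries, freqs, p):
    """Quantile of order p by linear interpolation inside the class that contains p*N.

    boundaries holds the class boundaries in ascending order, one more than freqs.
    """
    target = p * sum(freqs)
    cumulative = 0
    for i, f in enumerate(freqs):
        if cumulative + f >= target:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("p must lie between 0 and 1")

# I.Q. distribution of 309 children, classes 39.5-49.5 up to 159.5-169.5
boundaries = [39.5 + 10 * i for i in range(14)]
freqs = [1, 2, 3, 5, 17, 65, 69, 79, 37, 19, 7, 3, 2]

q1 = grouped_quantile(boundaries, freqs, 0.25)
q2 = grouped_quantile(boundaries, freqs, 0.50)
q3 = grouped_quantile(boundaries, freqs, 0.75)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 2), round(q2, 2), round(q3, 2), round(quartile_deviation, 2))
```

Because the script carries full precision, the last decimal place may differ slightly from hand working that rounds at each step.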
[54] Compute the standard deviation of the age – distribution of Bengali males as
given below:
Age last birthday Frequency
0 156
1 121
2 111
3 106
4 103
5-9 472
10-14 434
15-19 407
20-24 383
25-29 357
30-34 335
35-39 306
40-49 522
50-59 370
60-69 213
70-79 80
80-89 11
90-99 1
Total 4488
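Solution : From the solution on page 86, Example-63,
$\bar{x} = 26.90$, $n_1 = 597$, $n_2 = 2694$, $n_3 = 1197$
$\bar{x}_1 = 1.80$, $\bar{x}_2 = 20.62$, $\bar{x}_3 = 53.56$
\[ s_1^2 = \frac{1263}{597} - \left(\frac{-121}{597}\right)^2 = 2.12 - 0.04 = 2.08 \]
\[ s_2^2 = \frac{10847}{2694} - \left(\frac{-746}{2694}\right)^2 = 4.02 - 0.08 = 3.94 \]
\[ s_3^2 = \frac{2591}{1197} - \left(\frac{-1309}{1197}\right)^2 = 2.16 - 1.20 = 0.96 \]
\[ \sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2 + n_3 s_3^2}{n_1 + n_2 + n_3} + \frac{n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 + n_3 (\bar{x}_3 - \bar{x})^2}{n_1 + n_2 + n_3} \]
\[ \text{1st part} = \frac{597 \times 2.08 + 2694 \times 3.94 + 1197 \times 0.96}{4488} = \frac{13005.24}{4488} = 2.90 \]
\[ \text{2nd part} = \frac{597 (1.80 - 26.90)^2 + 2694 (20.62 - 26.90)^2 + 1197 (53.56 - 26.90)^2}{597 + 2694 + 1197} \]
\[ = \frac{597 \times (-25.10)^2 + 2694 \times (-6.28)^2 + 1197 \times (26.66)^2}{4488} \]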
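We have $65.7 = \frac{\sum x}{250}$, so that $\sum x = 65.7 \times 250 = 16425$
\[ 4.4^2 = \frac{\sum x^2}{250} - \left(\frac{16425}{250}\right)^2 \]
\[ \frac{\sum x^2}{250} = 4316.49 + 19.36 = 4335.85 \]
\[ \sum x^2 = 1083962.50 \]
When 91 and 80 are replaced by 71 and 83, the correct values are
\[ \sum x = 16425 - 91 - 80 + 71 + 83 = 16408 \]
Each observation is squared separately, so the squares of the wrong values are removed and those of the correct values are added:
\[ \sum x^2 = 1083962.50 - (91^2 + 80^2) + (71^2 + 83^2) = 1083962.50 - 14681 + 11930 = 1081211.50 \]
Using these in the formulae for mean and variance,
\[ \text{Mean} = \frac{16408}{250} = 65.63 \]
\[ (\text{S.D.})^2 = \frac{1081211.50}{250} - (65.632)^2 = 4324.85 - 4307.56 = 17.29 \]
\[ \text{S.D.} = \sqrt{17.29} = 4.16 \]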
[56] The number of runs scored by cricketers A and B during a test series consisting
of 5 test matches is shown below for each of the 10 innings :
Cricketers A – 5, 26, 97, 76, 112, 89, 6, 108, 24, 16.
Cricketers B – 51, 47, 36, 60, 58, 39, 44, 42, 71, 50.
Make a comparative study of their batting performance.
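Cricketer A                        Cricketer B
x       y = x − 55   y²            y       z = y − 50   z²
5       -50          2500          51      1            1
26      -29          841           47      -3           9
97      42           1764          36      -14          196
76      21           441           60      10           100
112     57           3249          58      8            64
89      34           1156          39      -11          121
6       -49          2401          44      -6           36
108     53           2809          42      -8           64
24      -31          961           71      21           441
16      -39          1521          50      0            0
Total   9            17643         Total   -2           1032
For cricketer A,
\[ \bar{x} = 55 + \frac{9}{10} = 55 + 0.9 = 55.9 \]
\[ \sigma^2 = \frac{17643}{10} - \left(\frac{9}{10}\right)^2 = 1764.3 - 0.81 = 1763.49 \]
\[ \sigma = \sqrt{1763.49} = 41.99 \]
For cricketer B,
\[ \bar{y} = 50 - \frac{2}{10} = 50 - 0.2 = 49.8 \]
\[ \sigma^2 = \frac{1032}{10} - \left(-\frac{2}{10}\right)^2 = 103.2 - 0.04 = 103.16 \]
\[ \sigma = \sqrt{103.16} = 10.16 \]
\[ \text{C.V. for A} = \frac{41.99}{55.9} \times 100 = 75.1, \qquad \text{C.V. for B} = \frac{10.16}{49.8} \times 100 = 20.4 \]
Cricketer A has the higher average score, but cricketer B is far more consistent, since his coefficient of variation is much smaller.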
Moments
Given n observations $x_1, x_2, \ldots, x_n$ and an arbitrary constant A,
$\frac{1}{n}\sum (x - A)$ is called the 1st moment about A,
$\frac{1}{n}\sum (x - A)^2$ is called the 2nd moment about A,
$\frac{1}{n}\sum (x - A)^3$ is called the 3rd moment about A,
and so on. Let us denote these moments successively by $m'_1, m'_2, m'_3$, etc.
Then $m'_1 = \sum (x - A)/n = (\sum x - \sum A)/n = (\sum x - nA)/n = \bar{x} - A$, i.e. the 1st moment about A
equals $(\bar{x} - A)$.
(a) Moments about zero (i.e. when A = 0) or raw moments
1st moment about zero = $\frac{1}{n}\sum x = \bar{x}$
2nd moment about zero = $\frac{1}{n}\sum x^2$
3rd moment about zero = $\frac{1}{n}\sum x^3$
and so on. Note that the 1st moment about zero is the mean $\bar{x}$:
$m'_1 = \bar{x}$
(b) Moments about mean (or central moments)
1st moment about mean = $\frac{1}{n}\sum (x - \bar{x}) = 0$
2nd moment about mean = $\frac{1}{n}\sum (x - \bar{x})^2 = \sigma^2$
3rd moment about mean = $\frac{1}{n}\sum (x - \bar{x})^3$
4th moment about mean = $\frac{1}{n}\sum (x - \bar{x})^4$
and so on.
These are usually denoted by $m_1, m_2, m_3, m_4$, etc. Note that the 1st central moment
is always zero, and the 2nd central moment is the variance $\sigma^2$. Hence, $m_1 = 0$, $m_2 = \sigma^2$.
From the second relation, we find that the standard deviation is the square – root
of the second central moment m2.
The 3rd central moment m3 is used to measure skewness and the 4th central
moment m4 to measure kurtosis.
In general, given n observations $x_1, x_2, \ldots, x_n$, the r-th order moments (r = 0, 1, 2, ...)
are defined as follows:
r-th moment about A : $m'_r = \frac{1}{n}\sum (x - A)^r$
r-th raw moment : $m'_r = \frac{1}{n}\sum x^r$
r-th central moment : $m_r = \frac{1}{n}\sum (x - \bar{x})^r$
For a frequency distribution,
r-th moment about A : $m'_r = \frac{1}{N}\sum f(x - A)^r$
r-th raw moment : $m'_r = \frac{1}{N}\sum f x^r$
r-th central moment : $m_r = \frac{1}{N}\sum f(x - \bar{x})^r$
where $N = \sum f$.
There are important relations between central and non-central moments. For
example, if the non-central moments ($m'_1, m'_2, m'_3$, etc.) about any arbitrary origin A
are known, the central moments can be obtained by using the relations
\[ m_2 = m'_2 - {m'_1}^2 \]
\[ m_3 = m'_3 - 3m'_2 m'_1 + 2{m'_1}^3 \]
\[ m_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 {m'_1}^2 - 3{m'_1}^4 \]
In particular, using the first two moments, $m'_1$ and $m'_2$, about an arbitrary origin
A, the mean and the variance may be obtained:
\[ \bar{x} = m'_1 + A, \qquad \sigma^2 = m'_2 - {m'_1}^2 \]
Relation between central and non-central moment
[I] Formula for $m_r$ in terms of $m'_r$ and moments of lower order:
\[ m_r = \sum (x_i - \bar{x})^r / n, \qquad m'_r = \sum (x_i - A)^r / n \]
Let us write
$x_i - \bar{x} = (x_i - A) - (\bar{x} - A) = (x_i - A) - d$, where $d = \bar{x} - A = m'_1$.
Using the binomial expansion,
\[ (x_i - \bar{x})^r = (x_i - A)^r - {}^rC_1 (x_i - A)^{r-1} d + {}^rC_2 (x_i - A)^{r-2} d^2 - \cdots + (-1)^r d^r \]
Writing $d = m'_1$ and simplifying, the central moments ($m_r$), when expressed in
terms of the moments ($m'_r$) about any origin, are
\[ m_1 = 0 \]
\[ m_2 = m'_2 - {m'_1}^2 \]
\[ m_3 = m'_3 - 3m'_2 m'_1 + 2{m'_1}^3 \]
\[ m_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 {m'_1}^2 - 3{m'_1}^4 \]
Conversely, the moments about A can be expressed in terms of the central moments:
\[ m'_1 = m_1 + d \]
\[ m'_2 = m_2 + 2m_1 d + d^2 \]
\[ m'_3 = m_3 + 3m_2 d + 3m_1 d^2 + d^3 \]
\[ m'_4 = m_4 + 4m_3 d + 6m_2 d^2 + 4m_1 d^3 + d^4 \]
Since $m_1 = 0$ and $d = \bar{x} - A = m'_1$, we have
\[ m'_1 = m'_1 \text{ (as expected)} \]
\[ m'_2 = m_2 + {m'_1}^2 \]
\[ m'_3 = m_3 + 3m_2 m'_1 + {m'_1}^3 \]
\[ m'_4 = m_4 + 4m_3 m'_1 + 6m_2 {m'_1}^2 + {m'_1}^4 \]
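The relations between central moments and moments about an arbitrary origin can be verified numerically. The sketch below uses a small made-up data set and the origin A = 3; the function names are illustrative only.

```python
from math import isclose

def moment_about(data, origin, r):
    """r-th moment of ungrouped data about the given origin."""
    return sum((x - origin) ** r for x in data) / len(data)

def central_from_about(m1, m2, m3, m4):
    """Central moments m2, m3, m4 from the first four moments about any origin A."""
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    c4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return c2, c3, c4

def about_from_central(d, c2, c3, c4):
    """Moments about A from the central moments, with d = mean - A."""
    return d, c2 + d ** 2, c3 + 3 * c2 * d + d ** 3, c4 + 4 * c3 * d + 6 * c2 * d ** 2 + d ** 4

data = [2, 4, 4, 5, 10]   # hypothetical observations
A = 3                     # arbitrary origin
mean = sum(data) / len(data)

about_a = [moment_about(data, A, r) for r in (1, 2, 3, 4)]
central = central_from_about(*about_a)
direct = tuple(moment_about(data, mean, r) for r in (2, 3, 4))
round_trip = about_from_central(about_a[0], *central)

print(central, direct)
```

Both routes should give the same central moments, and converting back should reproduce the moments about A.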
moments are zero, m3=0; consequently β1=0, when the distribution is symmetrical.
Frequency distributions are classified as Leptokurtic, Platykurtic or mesokurtic,
according as the value of β2 is greater than, less than, or equal to 3. Beta – coefficients
are used for measuring skewness and kurtosis.
'Gamma-coefficients' are defined as follows:
\[ \gamma_1 = \sqrt{\beta_1}; \qquad \gamma_2 = \beta_2 - 3 \]
$\gamma_1$ must have the same sign as $m_3$. The gamma-coefficients may be positive,
negative or zero, but are pure numbers like the beta-coefficients. These are used as
measures of 'skewness' and 'kurtosis':
\[ \text{Skewness } (\gamma_1) = \sqrt{\beta_1} = \frac{m_3}{\sigma^3} \]
\[ \text{Kurtosis } (\gamma_2) = \beta_2 - 3 = \frac{m_4}{\sigma^4} - 3 \]
Distributions are said to be 'positively skew', 'negatively skew' or 'symmetrical'
according as $\gamma_1$ is positive, negative or zero. Similarly, positive, negative or zero values of
$\gamma_2$ are associated with 'leptokurtic', 'platykurtic' or 'mesokurtic' distributions.
Moments of frequency distributions
If $x_1, x_2, \ldots, x_n$ have frequencies $f_1, f_2, \ldots, f_n$ respectively, the r-th moment about
A is defined as
\[ m'_r = \frac{1}{N}\sum f (x - A)^r \]
where $N = \sum f$. The r-th central moment is similarly defined as
\[ m_r = \frac{1}{N}\sum f (x - \bar{x})^r \]
where $\bar{x} = \sum fx / N$.
In the case of grouped frequency distributions, the mid-values are taken as
representatives of the respective classes, and $x_1, x_2, \ldots, x_n$ denote these mid-values. If the
successive mid-values have a common difference, the calculation of central moments
can be simplified.
If y = (x − c)/d, where c and d are constants, the r-th central moment of x is equal
to $d^r$ times the r-th central moment of y.
all observations falling within that class interval. This, however, introduces some error
in the calculated values known as ‘error due to grouping’.
So far as the first four moments are concerned:
[i] No correction is necessary for the mean $m'_1$ and the third central moment $m_3$.
[ii] Corrections for the 2nd and the 4th central moments are
\[ m_2 \text{ (corrected)} = m_2 \text{ (uncorrected)} - \frac{c^2}{12} \]
\[ m_4 \text{ (corrected)} = m_4 \text{ (uncorrected)} - \frac{c^2}{2}\, m_2 \text{ (uncorrected)} + \frac{7c^4}{240} \]
where c is the common width of the class intervals.
Sheppard’s corrections should be applied when –
[a] The distribution relates to a continuous variable, and is of moderate
symmetry;
[b] The frequencies become smaller and smaller, approaching
zero at each end of the distribution.
These corrections are not applicable to J- or U- shaped distributions or to very
skew distributions. Moreover, unless the total frequency is fairly large, the corrections
will be of little practical importance.
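A minimal sketch of Sheppard's corrections as stated above is given here. The function name and the sample figures (uncorrected moments 23.04 and 1496.68 with class width 4) are illustrative assumptions, chosen only to show the arithmetic.

```python
def sheppard_corrections(m2, m4, c):
    """Corrected 2nd and 4th central moments for grouping with common class width c."""
    m2_corrected = m2 - c ** 2 / 12
    m4_corrected = m4 - (c ** 2 / 2) * m2 + 7 * c ** 4 / 240
    return m2_corrected, m4_corrected

# Hypothetical uncorrected central moments from a grouped table of class width 4
m2_corr, m4_corr = sheppard_corrections(m2=23.04, m4=1496.68, c=4)
print(round(m2_corr, 2), round(m4_corr, 2))
```

The mean and the third central moment need no adjustment, so the function deliberately leaves them out.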
Skewness
A frequency distribution is said to be ‘symmetrical,’ if the frequencies are
symmetrically distributed about mean, i. e. when values of the variable equidistant
from mean have equal frequencies.
Illustration–I : Symmetrical distribution:
[i] x : 10 15 20 25 30
ƒ: 3 7 16 7 3
[ii] x : 10 15 20 25 30 35
ƒ: 3 7 16 16 7 3
Note that in the above distributions, the means are respectively 20, 22.5. The
median and mode for each also have the same value. In fact, for any symmetrical
distribution mean, median and mode are equal.
The word 'skewness' is used to denote the 'extent of asymmetry' in the
data. When the frequency distribution is not symmetrical, it is said to be 'skew'. The
word 'skewness' literally denotes 'asymmetry' or 'lack of symmetry' and skew denotes
Where m2 and m3 are second and third central moments, and σ denotes the S.D.
It should be noted that all the measures of skewness are pure numbers and have
the value zero when the distribution is symmetrical.
Figures – Position of Mean (M), Median (Me), Mode (Mo) for Different Types of
skewness
Exercise
[1] The first two moments of a distribution about the value 4 are −1.5 and 2.7. (a) Find
the moments about zero. (b) Also calculate the mean and S.D.
Solution: $\sum (x - 4)/n = -1.5$ .... (1)
$\sum (x - 4)^2/n = 2.7$ .... (2)
From (1) we get $\frac{\sum x - 4n}{n} = -1.5$
Or, $\frac{\sum x}{n} - 4 = -1.5$ Or, $\frac{\sum x}{n} = -1.5 + 4 = 2.5$
From (2) we get $\sum (x^2 - 8x + 16)/n = 2.7$
Or, $\frac{\sum x^2}{n} - \frac{8\sum x}{n} + \frac{16n}{n} = 2.7$
Or, $\frac{\sum x^2}{n} - 8 \times 2.5 + 16 = 2.7$
Or, $\frac{\sum x^2}{n} = 2.7 - 16 + 20 = 6.7$
(a) Moments about 0: 1st moment $= \frac{\sum x}{n} = 2.5$; 2nd moment $= \frac{\sum (x - 0)^2}{n} = \frac{\sum x^2}{n} = 6.7$
(b) ∴ Mean $= \frac{\sum x}{n} = 2.5$
\[ \text{S.D.}^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = 6.7 - 2.5^2 = 6.7 - 6.25 = 0.45 \]
∴ S.D. $= \sqrt{0.45} = 0.671$
[2] The first three moments of a distribution about the value 3 of the variable are 2,
10 and 30 respectively. Obtain the first three moments about zero. Show also that the
variance of the distribution is 6.
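Solution : Here,
\[ \frac{\sum (x - 3)}{n} = 2 \quad \text{Or,} \quad \frac{\sum x}{n} - \frac{3n}{n} = 2 \quad \text{Or,} \quad \frac{\sum x}{n} = 2 + 3 = 5 \]
\[ \frac{\sum (x - 3)^2}{n} = 10 \quad \text{Or,} \quad \frac{\sum (x^2 - 6x + 9)}{n} = 10 \quad \text{Or,} \quad \frac{\sum x^2}{n} - \frac{6\sum x}{n} + 9 = 10 \]
\[ \text{Or,} \quad \frac{\sum x^2}{n} - 6 \times 5 + 9 = 10 \quad \text{Or,} \quad \frac{\sum x^2}{n} = 10 + 30 - 9 = 31 \]
\[ \frac{\sum (x - 3)^3}{n} = 30 \quad \text{Or,} \quad \frac{\sum (x^3 - 9x^2 + 27x - 27)}{n} = 30 \quad \text{Or,} \quad \frac{\sum x^3}{n} - \frac{9\sum x^2}{n} + \frac{27\sum x}{n} - 27 = 30 \]
\[ \text{Or,} \quad \frac{\sum x^3}{n} - 9 \times 31 + 27 \times 5 - 27 = 30 \quad \text{Or,} \quad \frac{\sum x^3}{n} = 279 - 135 + 27 + 30 = 336 - 135 = 201 \]
1st moment about zero = $\frac{1}{n}\sum x = \bar{x} = 5$
2nd moment about zero = $\frac{1}{n}\sum x^2 = 31$
3rd moment about zero = $\frac{1}{n}\sum x^3 = 201$
\[ \text{Variance} = \sigma^2 = \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\sum (x^2 - 2x\bar{x} + \bar{x}^2) = \frac{\sum x^2}{n} - 2\bar{x}\,\frac{\sum x}{n} + \frac{n\bar{x}^2}{n} = \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \]
= 31 − 2 × 5² + 5²
= 31 − 50 + 25 = 6. Proved.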
[3] The first four moments about the value 1 are 2.6, 10.2, 43.4 and 192.6 respectively.
Find the A.M. and the first four moments about 4.
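Solution : As per the conditions given,
\[ \frac{\sum (x - 1)}{n} = 2.6 \quad \text{Or,} \quad \frac{\sum x}{n} - 1 = 2.6 \quad \text{Or,} \quad \frac{\sum x}{n} = 3.6 = \bar{x} = \text{A.M.} \]
\[ \frac{\sum (x - 1)^2}{n} = 10.2 \quad \text{Or,} \quad \frac{\sum x^2}{n} - \frac{2\sum x}{n} + 1 = 10.2 \quad \text{Or,} \quad \frac{\sum x^2}{n} - 2 \times 3.6 + 1 = 10.2 \]
\[ \text{Or,} \quad \frac{\sum x^2}{n} = 10.2 + 7.2 - 1 = 16.4 \]
\[ \frac{\sum (x - 1)^3}{n} = 43.4 \quad \text{Or,} \quad \frac{\sum x^3}{n} - \frac{3\sum x^2}{n} + \frac{3\sum x}{n} - 1 = 43.4 \]
\[ \text{Or,} \quad \frac{\sum x^3}{n} - 3 \times 16.4 + 3 \times 3.6 - 1 = 43.4 \quad \text{Or,} \quad \frac{\sum x^3}{n} = 43.4 + 49.2 - 10.8 + 1 = 93.6 - 10.8 = 82.8 \]
\[ \frac{\sum (x - 1)^4}{n} = \frac{\sum x^4}{n} - \frac{4\sum x^3}{n} + \frac{6\sum x^2}{n} - \frac{4\sum x}{n} + 1 = 192.6 \]
\[ \frac{\sum x^4}{n} = 192.6 + 4 \times 82.8 - 6 \times 16.4 + 4 \times 3.6 - 1 = 192.6 + 331.2 - 98.4 + 14.4 - 1 = 538.2 - 99.4 = 438.8 \]
Moments about 4:
\[ \frac{\sum (x - 4)}{n} = \frac{\sum x}{n} - 4 = 3.6 - 4 = -0.4 \]
\[ \frac{\sum (x - 4)^2}{n} = \frac{\sum x^2}{n} - \frac{8\sum x}{n} + 16 = 16.4 - 8 \times 3.6 + 16 = 32.4 - 28.8 = 3.6 \]
\[ \frac{\sum (x - 4)^3}{n} = \frac{\sum x^3}{n} - \frac{12\sum x^2}{n} + \frac{48\sum x}{n} - 64 = 82.8 - 196.8 + 172.8 - 64 = 255.6 - 260.8 = -5.2 \]
\[ \frac{\sum (x - 4)^4}{n} = \frac{\sum x^4}{n} - \frac{16\sum x^3}{n} + \frac{96\sum x^2}{n} - \frac{256\sum x}{n} + 256 \]
= 438.8 − 16 × 82.8 + 96 × 16.4 − 256 × 3.6 + 256 = 438.8 − 1324.8 + 1574.4 − 921.6 + 256
= 2269.20 − 2246.40 = 22.80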
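The change of origin carried out in this solution follows one pattern: since x − B = (x − A) + (A − B), each moment about B is a binomial combination of the moments about A. The sketch below applies that idea; `shift_moments` is a hypothetical helper name, not notation from the text.

```python
from math import comb

def shift_moments(moments_about_a, a, b):
    """Moments about b from moments about a (orders 1, 2, ... in that order).

    Uses (x - b)^r = sum over k of C(r, k) * (a - b)^(r - k) * (x - a)^k.
    """
    m = [1.0] + list(moments_about_a)   # m[0] = 1 is the zeroth moment
    h = a - b
    return [
        sum(comb(r, k) * h ** (r - k) * m[k] for k in range(r + 1))
        for r in range(1, len(m))
    ]

# First four moments about the value 1, as given in the exercise
about_1 = [2.6, 10.2, 43.4, 192.6]

about_0 = shift_moments(about_1, a=1, b=0)
about_4 = shift_moments(about_1, a=1, b=4)
print([round(v, 2) for v in about_0])
print([round(v, 2) for v in about_4])
```

The first list gives the moments about zero (so its first entry is the A.M.) and the second gives the moments about 4, which can be compared with the hand computation.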
[4] The first three moments of a distribution about the value 7, calculated from a set of
9 observations, are 0.2, 19.4 and −41.0. Find the measures of central tendency and dispersion,
and also the third moment about the origin.
[I.C.W.A 1975]
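Solution :
\[ \frac{\sum (x - 7)}{n} = 0.2 \quad \text{Or,} \quad \frac{\sum x}{n} - 7 = 0.2 \quad \text{Or,} \quad \frac{\sum x}{n} = 7.2 = \bar{x} \]
\[ \frac{\sum (x - 7)^2}{n} = 19.4 \quad \text{Or,} \quad \frac{\sum (x^2 - 14x + 49)}{n} = 19.4 \quad \text{Or,} \quad \frac{\sum x^2}{n} - \frac{14\sum x}{n} + 49 = 19.4 \]
\[ \text{Or,} \quad \frac{\sum x^2}{n} - 14 \times 7.2 + 49 = 19.4 \quad \text{Or,} \quad \frac{\sum x^2}{n} = 19.4 + 100.8 - 49 = 120.2 - 49 = 71.2 \]
\[ \sigma^2 = \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\sum (x^2 - 2x\bar{x} + \bar{x}^2) = \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 = \frac{\sum x^2}{n} - \bar{x}^2 = 71.2 - 7.2^2 = 71.2 - 51.84 = 19.36 \]
\[ \therefore \sigma = \sqrt{19.36} = 4.4 \]
\[ \frac{\sum (x - 7)^3}{n} = -41 \quad \text{Or,} \quad \frac{\sum (x^3 - 21x^2 + 147x - 343)}{n} = -41 \]
\[ \text{Or,} \quad \frac{\sum x^3}{n} - \frac{21\sum x^2}{n} + \frac{147\sum x}{n} - 343 = -41 \quad \text{Or,} \quad \frac{\sum x^3}{n} - 21 \times 71.2 + 147 \times 7.2 - 343 = -41 \]
\[ \text{Or,} \quad \frac{\sum x^3}{n} = -41 + 1495.2 - 1058.4 + 343 = 1838.20 - 1099.40 = 738.80 \]
Hence, A.M. = 7.2
S.D. = 4.4
Third moment about origin = 738.80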
[5] Find the first, the second and the third central moments of the frequency
distribution of expenditure (Rs. Per month) given below:
Expenditure 3-6 6-9 9-12 12-15 15-18 18-21 21-24 Total
No. of families 28 292 389 212 59 18 2 1000
[I.C.W.A.1978]
Solution :
Table : Calculations for moment
Mid-value x   f      y = (x − 13.5)/3   fy     fy²    fy³     fy⁴    f(y+1)⁴
4.5           28     -3                 -84    252    -756    2268   448
7.5           292    -2                 -584   1168   -2336   4672   292
10.5          389    -1                 -389   389    -389    389    0
13.5          212    0                  0      0      0       0      212
16.5          59     1                  59     59     59      59     944
19.5          18     2                  36     72     144     288    1458
22.5          2      3                  6      18     54      162    512
Total         1000   -                  -956   1958   -3224   7838   3866
Charlier's check :–
\[ \sum f(y+1)^4 = \sum fy^4 + 4\sum fy^3 + 6\sum fy^2 + 4\sum fy + N \]
= 7838 + 4 × (−3224) + 6 × 1958 + 4 × (−956) + 1000
= 7838 − 12896 + 11748 − 3824 + 1000 = 20586 − 16720 = 3866 = L.H.S.
Raw moments of y :–
\[ m'_1 = \sum fy / N = \frac{-956}{1000} = -0.956 \]
\[ m'_2 = \sum fy^2 / N = \frac{1958}{1000} = 1.958 \]
\[ m'_3 = \sum fy^3 / N = -\frac{3224}{1000} = -3.224 \]
\[ m'_4 = \sum fy^4 / N = \frac{7838}{1000} = 7.838 \]
Central moments of y :–
\[ m_2 = m'_2 - (m'_1)^2 = 1.958 - (-0.956)^2 = 1.958 - 0.914 = 1.044 \]
\[ m_3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3 = -3.224 - 3 \times 1.958 \times (-0.956) + 2(-0.956)^3 = -3.224 + 5.616 - 1.747 = 0.645 \]
\[ m_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4 = 7.838 - 4 \times (-3.224) \times (-0.956) + 6 \times 1.958 \times 0.914 - 3 \times (0.956)^4 \]
= 7.838 − 12.33 + 10.74 − 2.51 = 3.74
Central moments of x :–
\[ m_1 = 0 \]
\[ m_2(x) = d^2 m_2(y) = 3^2 \times 1.044 = 9 \times 1.044 = 9.40 \]
\[ m_3(x) = d^3 m_3(y) = 3^3 \times 0.645 = 27 \times 0.645 = 17.4 \]
Therefore,
1st central moment = 0
2nd central moment = 9.40
3rd central moment = 17.4 Ans.
[6] Find the first four moments and the value of β1 and β2 from the following
frequency distribution:
x 21–24 25–28 29–32 33–36 37–40 41–44
f 40 90 190 110 50 20
Also, find the measures of skewness and kurtosis
Solution :
Table : Calculations for moments
Mid-value x   f     y = (x − 30.5)/4   fy    fy²   fy³    fy⁴
22.5          40    -2                 -80   160   -320   640
26.5          90    -1                 -90   90    -90    90
30.5          190   0                  0     0     0      0
34.5          110   1                  110   110   110    110
38.5          50    2                  100   200   400    800
42.5          20    3                  60    180   540    1620
Total         500   -                  100   740   640    3260
Raw moments of y :–
\[ m'_1 = \sum fy / N = 100 / 500 = 0.2 \]
\[ m'_2 = \sum fy^2 / N = 740 / 500 = 1.48 \]
\[ m'_3 = \sum fy^3 / N = 640 / 500 = 1.28 \]
\[ m'_4 = \sum fy^4 / N = 3260 / 500 = 6.52 \]
Central moments of y :–
\[ m_2 = m'_2 - (m'_1)^2 = 1.48 - (0.2)^2 = 1.48 - 0.04 = 1.44 \]
\[ m_3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3 = 1.28 - 3 \times 1.48 \times 0.2 + 2(0.2)^3 = 1.28 - 0.888 + 0.016 = 0.408 \]
\[ m_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4 = 6.52 - 4 \times 1.28 \times 0.2 + 6 \times 1.48 \times (0.2)^2 - 3(0.2)^4 \]
= 6.52 − 1.024 + 0.3552 − 0.0048 = 6.8752 − 1.0288 = 5.8464
Central moments of x :–
\[ m_2(x) = d^2 m_2(y) = 4^2 \times 1.44 = 23.04 \]
\[ m_3(x) = d^3 m_3(y) = 4^3 \times 0.408 = 26.11 \]
\[ m_4(x) = d^4 m_4(y) = 4^4 \times 5.8464 = 1496.68 \]
\[ \bar{x} = c + d\bar{y} = 30.5 + 4 \times \frac{100}{500} = 30.5 + 0.80 = 31.30 \]
\[ \beta_1 = \frac{m_3^2}{m_2^3} = \frac{0.408^2}{1.44^3} = \frac{0.1665}{2.99} = 0.056 \]
\[ \beta_2 = \frac{m_4}{m_2^2} = \frac{5.8464}{1.44^2} = 2.82 \]
\[ \text{Skewness } (\gamma_1) = \sqrt{\beta_1} = \sqrt{0.056} = +0.24 \]
\[ \text{Kurtosis } (\gamma_2) = \beta_2 - 3 = 2.82 - 3 = -0.18 \]
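The beta and gamma coefficients of this exercise can be reproduced directly from the mid-values, without the change of scale. This is a sketch only; `central_moment` is an illustrative helper, and the sign of γ1 is taken from m3 as the text requires.

```python
from math import sqrt, copysign

def central_moment(mids, freqs, r):
    """r-th central moment of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(mids, freqs)) / n

# Mid-values and frequencies of the classes 21-24, 25-28, ..., 41-44
mids = [22.5, 26.5, 30.5, 34.5, 38.5, 42.5]
freqs = [40, 90, 190, 110, 50, 20]

m2 = central_moment(mids, freqs, 2)
m3 = central_moment(mids, freqs, 3)
m4 = central_moment(mids, freqs, 4)

beta1 = m3 ** 2 / m2 ** 3
beta2 = m4 / m2 ** 2
gamma1 = copysign(sqrt(beta1), m3)   # same sign as m3
gamma2 = beta2 - 3
print(round(m2, 2), round(m3, 2), round(m4, 2))
print(round(beta1, 3), round(beta2, 2), round(gamma1, 2), round(gamma2, 2))
```

Working in the original units gives the same β1 and β2 as the scaled variable y, because both coefficients are pure numbers.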
Class marks   Mid-value x   f    y = (x − 62.50)/3   fy    fy²
55 – 58       56.50         12   -2                  -24   48
58 – 61       59.50         17   -1                  -17   17
61 – 64       62.50         23   0                   0     0
64 – 67       65.50         18   1                   18    18
67 – 70       68.50         11   2                   22    44
Total                       81   -                   -1    127
\[ \text{Mean } (\bar{x}) = c + d\bar{y} = 62.50 + 3 \times \frac{-1}{81} = 62.5 - \frac{3}{81} = 62.50 - 0.04 = 62.46 \]
\[ \text{S.D. } (\sigma) = d\sqrt{\frac{\sum fy^2}{n} - \left(\frac{\sum fy}{n}\right)^2} = 3\sqrt{\frac{127}{81} - \left(\frac{-1}{81}\right)^2} = 3\sqrt{1.57} = 3.76 \]
\[ \text{Mode} = l_1 + \frac{f_0 - f_{-1}}{2f_0 - f_{-1} - f_1} \times c = 61 + \frac{23 - 17}{2 \times 23 - 17 - 18} \times 3 = 61 + \frac{6}{46 - 17 - 18} \times 3 = 61 + \frac{18}{11} = 62.64 \]
Coefficient of skewness = (Mean – Mode)/S.D.
\[ = \frac{62.46 - 62.64}{3.76} = \frac{-0.18}{3.76} = -0.048 \]
[8] Calculate Pearson's measure of skewness on the basis of Mean, Mode and
Standard deviation :–
x 14.5 15.5 16.5 17.5 18.5 19.5 20.5 21.5
f 35 40 48 100 125 87 43 22
[C.A. 1975]
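\[ \text{Mode} = 18 + \frac{125 - 100}{250 - 187} \times 1 = 18 + \frac{25}{63} = 18 + 0.40 = 18.40 \]
\[ \text{S.D.} = \sqrt{\frac{\sum fy^2}{n} - \left(\frac{\sum fy}{n}\right)^2} = \sqrt{\frac{1669}{500} - \left(\frac{-217}{500}\right)^2} = \sqrt{3.34 - 0.19} = \sqrt{3.15} = 1.77 \]
\[ \text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} = \frac{18.07 - 18.40}{1.77} = -0.19 \]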
[9] Calculate from the undernoted table the measure of skewness based on Mean,
Median and Standard Deviation :
x 100–200 200–300 300–400 400–500 500–600 600–700 700–800 800–900
f 45 88 146 206 79 52 30 14
[C.A. 1973]
Solution :
Table : Calculations for Mean, Median and S.D.
x          f     Mid-value x   y = (x − 450)/100   fy     fy²    Cumulative frequency
100–200    45    150           -3                  -135   405    45
200–300    88    250           -2                  -176   352    133
300–400    146   350           -1                  -146   146    279
                                                                 ← N/2 = 330
400–500    206   450           0                   0      0      485
500–600    79    550           1                   79     79     564
600–700    52    650           2                   104    208    616
700–800    30    750           3                   90     270    646
800–900    14    850           4                   56     224    660
Total      660   -             -                   -128   1684   -
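\[ \text{Mean } (\bar{x}) = c + d\bar{y} = 450 + 100 \times \frac{-128}{660} = 450 - 19.39 = 430.61 \]
\[ \text{Median} = l_1 + \frac{\frac{N}{2} - F}{f_m} \times c = 400 + \frac{330 - 279}{206} \times 100 = 400 + \frac{51}{206} \times 100 = 400 + 24.76 = 424.76 \]
\[ \text{S.D.} = d\sqrt{\frac{\sum fy^2}{n} - \left(\frac{\sum fy}{n}\right)^2} = 100\sqrt{\frac{1684}{660} - \left(\frac{-128}{660}\right)^2} = 100\sqrt{2.55 - 0.04} = 100\sqrt{2.51} = 100 \times 1.58 = 158 \]
\[ \text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}} = \frac{3(430.61 - 424.76)}{158} = \frac{3 \times 5.85}{158} = 0.11 \]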
[10] Calculate the measure of skewness based on quartiles and median from the
following data :
Variable 10–20 20–30 30–40 40–50 50–60 60–70 70–80
Frequency 358 2417 976 129 62 18 10
[C.A. 1972]
Solution :
Table : Calculation for Quartiles and Median
Class boundary   Frequency (f)   Cumulative frequency
10–20            358             358
                                 ← N/4 = 992.5 (Q1)
                                 ← N/2 = 1985 (Q2)
20–30            2417            2775
                                 ← 3N/4 = 2977.50 (Q3)
30–40            976             3751
40–50            129             3880
50–60            62              3942
60–70            18              3960
70–80            10              3970 = N
Total            3970            –
N
−F
39.5–49.5   20   48
                 ← N/2 = 50 (Q2)
49.5–59.5   25   73
                 ← 3N/4 = 75 (Q3)
59.5–69.5   15   88
69.5–79.5   8    96
79.5–89.5   4    100 = N
Total       100
\[ Q_1 = 29.5 + \frac{25 - 14}{14} \times 10 = 29.5 + \frac{11}{14} \times 10 = 29.5 + 7.86 = 37.36 \]
\[ \text{Median } (Q_2) = 49.5 + \frac{50 - 48}{25} \times 10 = 49.5 + \frac{2}{25} \times 10 = 49.5 + 0.80 = 50.30 \]
\[ Q_3 = 59.5 + \frac{75 - 73}{15} \times 10 = 59.5 + \frac{2}{15} \times 10 = 59.5 + 1.33 = 60.83 \]
\[ \text{Skewness} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{60.83 - 2 \times 50.30 + 37.36}{60.83 - 37.36} = \frac{60.83 - 100.60 + 37.36}{23.47} = \frac{98.19 - 100.60}{23.47} = -\frac{2.41}{23.47} = -0.103 \]
Solution :
Table : Calculation for Q1, Q2, and Q3
Class boundary   Frequency (f)   Cumulative frequency
0–9.5            0               0
                                 ← N/4 = 582.50 (Q1)
9.5–14.5         786             786
                                 ← N/2 = 1165 (Q2)
14.5–19.5        924             1710
                                 ← 3N/4 = 1747.50 (Q3)
19.5–29.5        320             2030
29.5–39.5        172             2202
39.5–49.5        96              2298
49.5–59.5        32              2330 = N
Total            2330
\[ Q_1 = 9.5 + \frac{582.5 - 0}{786} \times 5 = 9.5 + 3.71 = 13.21 \]
\[ Q_2 = 14.5 + \frac{1165 - 786}{924} \times 5 = 14.5 + \frac{379}{924} \times 5 = 14.5 + 2.05 = 16.55 \]
\[ Q_3 = 19.5 + \frac{1747.5 - 1710}{320} \times 10 = 19.5 + \frac{37.5}{320} \times 10 = 19.5 + 1.17 = 20.67 \]
\[ \text{Skewness} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{20.67 - 2 \times 16.55 + 13.21}{20.67 - 13.21} = \frac{20.67 - 33.10 + 13.21}{7.46} = \frac{33.88 - 33.10}{7.46} = \frac{0.78}{7.46} = 0.105 \]
[13] Compute Bowley’s measure and Pearson’s measures of skewness:
Monthly income (Rs.) 0–75 75–150 150–225 225–300 300–375 375–450
Frequency 15 200 250 225 10 5
[C.U., M.Com. '76]
Solution :
Table : Calculation for Bowley’s and Pearson's measure of skewness
Class boundary   Frequency (f)   Mid-value (x)   c.f.   y = (x − 262.5)/75   fy    fy²
0 – 75           15              37.50           15     -3                   -45   135
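\[ \bar{x} = 262.5 + \left(-\frac{675}{705}\right) \times 75 = 262.5 - 71.81 = 190.69 \]
\[ \text{Median} = 150 + \frac{352.5 - 215}{250} \times 75 = 150 + \frac{137.50}{250} \times 75 = 150 + 41.25 = 191.25 \]
\[ \text{Mode} = 150 + \frac{250 - 200}{500 - 200 - 225} \times 75 = 150 + \frac{50 \times 75}{75} = 200 \]
\[ \sigma = 75\sqrt{\frac{1215}{705} - \left(-\frac{675}{705}\right)^2} = 75\sqrt{1.7234 - 0.9167} = 75\sqrt{0.8067} = 0.8982 \times 75 = 67.36 \]
\[ Q_1 = 75 + \frac{176.25 - 15}{200} \times 75 = 75 + \frac{161.25}{200} \times 75 = 75 + 60.47 = 135.47 \]
\[ Q_3 = 225 + \frac{528.75 - 465}{225} \times 75 = 225 + \frac{63.75}{3} = 225 + 21.25 = 246.25 \]
\[ \text{Bowley's measure of skewness} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{246.25 - 2 \times 191.25 + 135.47}{246.25 - 135.47} = \frac{381.72 - 382.50}{110.78} = -\frac{0.78}{110.78} = -0.0070 \]
\[ \text{Pearson's measure of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} = \frac{190.69 - 200}{67.36} = -\frac{9.31}{67.36} = -0.14 \]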
[14] Calculate the quartile measure of skewness for the distribution of the time taken
by 100 workers to complete a job:
Time (Seconds) -12 13-15 16-18 19-21 22-24 25-27 28-
No. of workers 4 16 22 28 15 9 6
[I.C.W.A 1974]
Solution: Table : Calculation for skewness
Class boundary   Frequency   Cumulative frequency
– 12.5           4           4
12.5 – 15.5      16          20
15.5 – 18.5      22          42
18.5 – 21.5      28          70
21.5 – 24.5      15          85
24.5 – 27.5      9           94
27.5 –           6           100 = N
Total            100
\[ \text{First Quartile } (Q_1) = 15.5 + \frac{25 - 20}{22} \times 3 = 15.5 + \frac{5}{22} \times 3 = 15.5 + 0.68 = 16.18 \]
\[ \text{Second Quartile } (Q_2) = 18.5 + \frac{50 - 42}{28} \times 3 = 18.5 + \frac{8}{28} \times 3 = 18.5 + 0.86 = 19.36 \]
\[ \text{Third Quartile } (Q_3) = 21.5 + \frac{75 - 70}{15} \times 3 = 21.5 + \frac{5}{15} \times 3 = 21.5 + 1 = 22.5 \]
\[ \text{Skewness (Bowley's measure)} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{22.5 - 2 \times 19.36 + 16.18}{22.5 - 16.18} = \frac{22.5 - 38.72 + 16.18}{6.32} = \frac{38.68 - 38.72}{6.32} = -\frac{0.04}{6.32} = -0.0063 \]
Class boundary   Frequency   Cumulative frequency
Below 20         13          13
20 – 25          29          42
25 – 30          46          88
                             ← N/4 = 105 (Q1)
30 – 35          60          148
                             ← N/2 = 210 (Q2)
35 – 40          112         260
                             ← 3N/4 = 315 (Q3)
40 – 45          94          354
45 – 55          45          399
55 & above       21          420 = N
Total            420
\[ Q_1 = 30 + \frac{105 - 88}{60} \times 5 = 30 + \frac{17}{60} \times 5 = 30 + 1.42 = 31.42 \]
\[ Q_2 = 35 + \frac{210 - 148}{112} \times 5 = 35 + \frac{62}{112} \times 5 = 35 + 2.77 = 37.77 \]
\[ Q_3 = 40 + \frac{315 - 260}{94} \times 5 = 40 + \frac{55 \times 5}{94} = 40 + 2.93 = 42.93 \]
\[ \text{Bowley's measure of skewness} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{42.93 - 2 \times 37.77 + 31.42}{42.93 - 31.42} = -\frac{1.19}{11.51} = -0.10 \]
          ← N/2 = 67.50 (Q2)
30 – 40   40    85
          ← 3N/4 = 101.25 (Q3)
40 – 50   24    109
50 – 60   12    121
60 – 70   9     130
70 – 80   3     133
80 – 90   2     135 = N
Total     135
\[ Q_1 = 0 + \frac{33.75 - 0}{45} \times 30 = \frac{33.75}{45} \times 30 = 22.50 \]
\[ Q_2 = 30 + \frac{67.5 - 45}{40} \times 10 = 30 + \frac{22.5}{4} = 35.63 \]
\[ Q_3 = 40 + \frac{101.25 - 85}{24} \times 10 = 40 + \frac{16.25}{24} \times 10 = 40 + 6.77 = 46.77 \]
[17] Calculate with the use of quartiles the coefficient of skewness for the following
frequency distribution:-
Under years 10 20 30 40 50 60
No. of persons 15 32 51 78 97 109
[C.U., M.Com 1961]
Solution:
Table: Calculation for skewness
Class boundary   Cumulative frequency   Frequency
0 – 10           15                     15
                 ← N/4 = 27.25 (Q1)
10 – 20          32                     17
20 – 30          51                     19
                 ← N/2 = 54.50 (Q2)
30 – 40          78                     27
                 ← 3N/4 = 81.75 (Q3)
40 – 50          97                     19
50 – 60          109 = N                12
Total                                   109
\[ \text{First Quartile } (Q_1) = 10 + \frac{27.25 - 15}{17} \times 10 = 10 + \frac{12.25 \times 10}{17} = 17.21 \]
\[ \text{Second Quartile } (Q_2) = 30 + \frac{54.50 - 51}{27} \times 10 = 30 + \frac{3.5 \times 10}{27} = 30 + 1.3 = 31.3 \]
\[ \text{Third Quartile } (Q_3) = 40 + \frac{81.75 - 78}{19} \times 10 = 40 + \frac{3.75}{19} \times 10 = 40 + 1.97 = 41.97 \]
\[ \text{Skewness (Bowley's measure)} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{41.97 - 2 \times 31.3 + 17.21}{41.97 - 17.21} = \frac{41.97 - 62.6 + 17.21}{24.76} = \frac{59.18 - 62.6}{24.76} = -\frac{3.42}{24.76} = -0.14 \]
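When the data arrive as "less than" cumulative frequencies, as in this exercise, the quartiles can be interpolated on the cumulative table itself. The sketch below assumes the first class starts at 0; the helper names are illustrative, not from the text.

```python
def quantile_from_cumulative(upper_limits, cumulative, p, start=0.0):
    """Quantile of order p from 'less than' cumulative frequencies.

    upper_limits[i] is the upper boundary whose cumulative count is cumulative[i];
    start is the assumed lower boundary of the first class.
    """
    target = p * cumulative[-1]
    prev_limit, prev_cum = start, 0
    for upper, cum in zip(upper_limits, cumulative):
        if cum >= target:
            return prev_limit + (target - prev_cum) / (cum - prev_cum) * (upper - prev_limit)
        prev_limit, prev_cum = upper, cum
    raise ValueError("p must lie between 0 and 1")

def bowley_skewness(q1, q2, q3):
    """Quartile (Bowley) coefficient of skewness."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)

# "Under ... years" table: persons below each age
ages = [10, 20, 30, 40, 50, 60]
persons = [15, 32, 51, 78, 97, 109]

q1, q2, q3 = (quantile_from_cumulative(ages, persons, p) for p in (0.25, 0.5, 0.75))
skewness = bowley_skewness(q1, q2, q3)
print(round(q1, 2), round(q2, 2), round(q3, 2), round(skewness, 2))
```

Interpolating on the cumulative counts avoids first converting them back to class frequencies, though both routes lead to the same quartiles.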
[18] Coefficient of skewness = -0.375, Mean = 62, Median = 65 Find the value of
standard deviation.
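[C.A. 1972]
\[ \text{Solution: Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}} \]
\[ \text{Or,} \quad -0.375 = \frac{3(62 - 65)}{\text{S.D.}} \]
\[ \text{Or,} \quad \text{S.D.} = \frac{-3 \times 3}{-0.375} = \frac{9000}{375} = 24 \]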
[19] The measure of skewness for a certain distribution is -0.8.If the lower and the
upper quartiles are 44.1 and 56.6 respectively. Find the median.
[I.C.W.A. 1971]
Solution:
Here, skewness = −0.8, $Q_1 = 44.1$, $Q_3 = 56.6$
\[ -0.8 = \frac{56.6 - 2Q_2 + 44.1}{56.6 - 44.1} = \frac{100.7 - 2Q_2}{12.5} \]
Or, $100.7 - 2Q_2 = -0.8 \times 12.5$
Or, $2Q_2 = 100.7 + 10 = 110.7$
\[ Q_2 = \frac{110.7}{2} = 55.35 \]
[20] The median, mode and coefficient of skewness for a certain distribution are respectively
17.4, 15.3 and 0.35. Calculate the coefficient of variation.
[I.C.W.A. 1973]
\[ \text{Solution: Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} \]
We know that, approximately, Mean – Mode = 3(Mean – Median)
Mean – 15.3 = 3 (Mean – 17.4)
= 3 Mean – 52.2
Or, 2 Mean = 52.2 – 15.3 = 36.9
Mean = 18.45
\[ \text{Again,} \quad 0.35 = \frac{18.45 - 15.3}{\text{S.D.}} = \frac{3.15}{\text{S.D.}} \]
\[ \text{Or,} \quad \text{S.D.} = \frac{3.15}{0.35} = 9 \]
\[ \text{Coefficient of variation} = \frac{\text{S.D.}}{\text{Mean}} \times 100 = \frac{9}{18.45} \times 100 = 49 \text{ (approx.)} \]
[21] The mean, median and the coefficient of variation of the weekly wages of a
group of workers are respectively Rs. 45, Rs. 42 and 40%. Find the (i) mode, (ii)
variance, and (iii) coefficient of skewness, for the distribution of wages.
Solution:
Here given, Mean = 45, Median = 42, C.V. = 40
\[ \text{C.V.} = \frac{\text{S.D.}}{\text{Mean}} \times 100 \]
\[ \text{Or,} \quad 40 = \frac{\text{S.D.}}{45} \times 100 \]
\[ \text{Or,} \quad \text{S.D.} = \frac{40 \times 45}{100} = 18 \]
\[ \text{(iii) Coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}} = \frac{3(45 - 42)}{18} = \frac{9}{18} = 0.5 \]
\[ -0.4 = \frac{50 - \text{Mode}}{20} \]
Or, 50 – Mode = –8
Or, Mode = 58
\[ \text{Again,} \quad -0.4 = \frac{3(50 - \text{Median})}{20} \]
–8 = 150 – 3 Median
Or, 3 Median = 150 + 8 = 158
\[ \text{Median} = \frac{158}{3} = 52.67 \]
[23] The first two moments of a distribution about the value 4 of the variable are –1.5
and 2.7. It is also known that the median of the distribution is 2.1. Comment on the
shape of the distribution.
[I.C.W.A. 1974]
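Solution:
\[ \sum (x - 4)/n = -1.5 \]
\[ \text{or,} \quad \frac{\sum x}{n} - 4 = -1.5 \quad \text{or,} \quad \frac{\sum x}{n} = 4 - 1.5 = 2.5 = \bar{x} \]
2nd moment about A = $\sum (x - A)^2/n$
\[ \sum (x - 4)^2/n = 2.7 \]
\[ \text{or,} \quad \sum (x^2 - 8x + 16)/n = 2.7 \quad \text{or,} \quad \frac{\sum x^2}{n} - \frac{8\sum x}{n} + 16 = 2.7 \]
\[ \frac{\sum x^2}{n} = 2.7 - 16 + 8 \times 2.5 = 2.7 - 16 + 20 = 22.7 - 16 = 6.7 \]
\[ \sigma^2 = \sum (x^2 - 2x\bar{x} + \bar{x}^2)/n = \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 = \frac{\sum x^2}{n} - \bar{x}^2 = 6.7 - (2.5)^2 = 0.45 \]
\[ \text{S.D. } (\sigma) = \sqrt{0.45} = 0.67 \]
\[ \text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}} = \frac{3(2.5 - 2.1)}{0.67} = \frac{3 \times 0.4}{0.67} = \frac{1.2}{0.67} = 1.79 \]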
Comment: Frequency curve is asymmetrical and has the longer tail on the right hand side.
[24] The following facts were gathered before and after an industrial dispute:
Before dispute After dispute
No. of workers Employed 516 508
Mean wages (Rs.) 49.50 51.75
Median wages (Rs.) 52.70 50.00
Variance of wages (Rs.) 100.00 121.00
Compare the position before and after the dispute in respect of (a) total wages, (b) Modal
wages, (c) Standard deviation (d) Coefficient of variation (e) Skewness.
Solution:
[a] Before the dispute, total wages = 49.5 × 516 = Rs. 25542
After the dispute, total wages = 51.75 × 508 = Rs. 26289
Total wages increased by Rs. 747.00, an increase of about 2.9%.
[c] Before dispute: S.D. = $\sqrt{100}$ = 10
After dispute: S.D. = $\sqrt{121}$ = 11
Hence, S.D. has increased.
\[ \text{[e] Before dispute, skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}} = \frac{3(49.50 - 52.70)}{10} = \frac{3 \times (-3.2)}{10} = \frac{-9.6}{10} = -0.96 \]
\[ \text{After dispute, skewness} = \frac{3(51.75 - 50)}{11} = \frac{3 \times 1.75}{11} = 0.48 \]
\[ \text{[b] Before dispute: skewness} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} \]
\[ -0.96 = \frac{49.5 - \text{Mode}}{10} \]
or, –9.6 = 49.5 – Mode
Mode = 49.5 + 9.6 = 59.1
\[ \text{After dispute:} \quad 0.48 = \frac{51.75 - \text{Mode}}{11} \]
Curve fitting
When observations in respect of two variables are available, very often a
relation is found to exist between them. For example, height and weight of persons
are interdependent, expenditure depends on income, yield of a crop depends on the
amount of rainfall etc. Frequently, it is found desirable to express this relationship
between variables by means of some mathematical equation, representing a certain
geometrical curve. The process of finding such a curve or its equation on the basis of
a given set of observations is called curve fitting.
We list below the equations of some common types of curves :
[1] Y = a + bx Straight line
[2] Y = a + bx + cx2 Parabola
[3] Y = a + bx + cx2 + dx3 Cubic curve
The variables x and y are often referred to as independent variable and
dependent variable respectively. (All letters a, b, c, d except x and y, appearing in the
above equations represent constants.).
Straight line
A straight line is the geometrical representation of an equation of the form
Ax + By + C = 0
where A, B, C are constants. When B is not zero, i.e. the equation contains a
term in y, the equation of the straight line can be solved for y, giving
$y = \left(-\frac{A}{B}\right)x + \left(-\frac{C}{B}\right)$, which is of the form y = a + bx,
where a and b are constants. This is the form we shall generally use to represent
a straight line. Sometimes, the form y = mx + c is also used.
[Figures: straight lines AB drawn against the X and Y axes, with slope (c) zero and (d) infinity]
When the slope is positive, y increases as x increases; when the slope is negative,
y decreases as x increases; when the slope is zero, y remains constant whatever be
the value of x. The slope represents the amount of change (increase or decrease) in the
value of y for a unit increase in the value of x.
Geometrically, the slope depends on the inclination of the straight line with the
x-axis. Two parallel straight lines have the same slope. When the slope is zero, the
straight line is parallel to the x-axis. As the slope increases, the inclination also increases.
When the slope is positive, the straight line is inclined towards the right; when the
slope is negative, the line is inclined towards the left.
Parabola
A parabola is the geometrical representation of an equation of the form
y = a + bx + cx^2
where a, b, c are constants.
Free-hand method of curve fitting
When the given data are plotted as points on graph paper, it is often possible to
draw a smooth curve through the cluster of points,
which appears best to represent their pattern. The smooth curve so drawn is called a
free-hand curve. It may be noted that the free-hand curve depends entirely on individual
judgement, and may either be a straight line or a curved line. If the pattern of points
is linear, the equation of a straight line of the form y = a + bx is obtained by choosing
two points on the line.
Method of least squares is a device for finding the equation of a specified type
of curve, which best fits a given set of observations. The method depends upon the
principle of least squares, which suggests that for the “best-fitting” curve, the sum
of the squares of differences between the observed and the corresponding estimated
values should be the minimum possible.
Suppose we are given n pairs of observations (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
and it is required to fit a straight line to these data. The general equation of a straight
line, y = a + bx, is taken, where a and b are constants. Any values for a and b would give
a straight line, and once these values are obtained, an estimate of y can be had by
substituting the value of x. That is to say, the estimated values of y when x = x_1, x_2, ...,
x_n would be a + bx_1, a + bx_2, ..., a + bx_n respectively. In order that the equation y
= a + bx gives a good representation of the relationship between x and y, it is desirable
that the estimated values a + bx_1, a + bx_2, ..., a + bx_n are, on the whole, close enough
to the corresponding observed values y_1, y_2, ..., y_n.
Principle of Fitting Straight Line by Least Square Method
x      Observed y   Estimated y = a + bx   Difference = (2) – (3)   (Difference)^2
(1)    (2)          (3)                    (4)                      (5)
x_1    y_1          a + bx_1               y_1 – a – bx_1           (y_1 – a – bx_1)^2
x_2    y_2          a + bx_2               y_2 – a – bx_2           (y_2 – a – bx_2)^2
For the best fitting straight line, therefore, our problem is only to choose such
values of a and b for the equation y = a + bx as will provide estimates of y as close
as possible to the observed values. According to the principle of least squares, the
“best-fitting” equation is interpreted as that which minimizes the sum of the squares
of differences,
Σ(y_i – a – bx_i)^2, i.e. (y_1 – a – bx_1)^2 + (y_2 – a – bx_2)^2 + ... + (y_n – a – bx_n)^2
Figure – Method of least Squares (Geometrical Interpretation)
[Figure: a plotted point P(x_i, y_i), the fitted line AB with equation y = a + bx, and the vertical distance PM from P to the line]
If the pairs of observations (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) are plotted as points
on graph paper, and all possible straight lines are drawn on it, that straight line will be
considered to be the “best-fitting” for which the sum of the squares of the vertical distances
PM between the plotted points P and the line AB is the least.
Fitting Straight Line
Let y = a + bx
be the equation of the straight line to be fitted to a given set of n pairs of observations
(x_1, y_1), (x_2, y_2), ..., (x_n, y_n). Applying the method of least squares, the values of a
and b are so determined as to minimize
Σ(y_i − a − bx_i)^2, the sum being taken over i = 1, 2, ..., n.
Taking partial derivatives with respect to a and b, and equating them to zero, we get
∂/∂a Σ(y − a − bx)^2 = 0 and ∂/∂b Σ(y − a − bx)^2 = 0
i.e. Σ(y − a − bx)(−2) = 0 or, Σ(y − a − bx) = 0
and Σ(y − a − bx)(−2x) = 0 or, Σx(y − a − bx) = 0
Or, Σy = an + bΣx
Σxy = aΣx + bΣx^2
These two equations are called the normal equations.
Here, the values of n, Σx, Σy, Σx^2 and Σxy are substituted on the basis of the given data.
We have then two equations involving a and b, solving which the values of a and b are
obtained.
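The solution of the two normal equations can be sketched in a few lines of Python. This is only an illustrative sketch: the function name fit_line is an assumption made here, not part of the text. It forms n, Σx, Σy, Σx^2 and Σxy and eliminates a between the two equations. The data are those of Exercise [1](i) in this chapter, for which the worked solution gives a = 1 and b = 2.9.

```python
def fit_line(xs, ys):
    """Solve the normal equations for the straight line y = a + b*x.

    Normal equations:  sum(y)  = a*n      + b*sum(x)
                       sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a between the two equations gives b; a then follows
    # from the first normal equation.
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


# Data of Exercise [1](i)
a, b = fit_line([2, 5, 6, 8, 9], [8, 14, 19, 20, 31])
print(round(a, 3), round(b, 3))
```

The denominator n Σx^2 − (Σx)^2 is zero only when all the x values are equal, in which case no unique line can be fitted.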
Simplified Calculations
[1] If we change the origin of x only, i.e. write X = x – c, where c is an arbitrary
constant, then x is replaced by X in the equations, giving
Σy = an + bΣX
ΣXy = aΣX + bΣX^2
[2] If we change the origins of both x and y, i.e. write X = x – c and Y = y – c′, where
c and c′ are arbitrary constants, then x and y are replaced by X and Y in the equations,
so that
ΣY = an + bΣX
ΣXY = aΣX + bΣX^2
In each case a and b are the constants of the line in the new variables; the fitted
equation is converted back to x and y at the end.
[3] In case the successive values of the independent variable x are found to have
a common difference, two special transformations are available for the cases (i) n is
odd and (ii) n is even.
[i] When n is odd, write u = (x − central value of x) / (common difference)
[ii] When n is even, write u = (x − mean of two central values of x) / (½ × common difference)
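As a rough illustration of these simplifications, the sketch below fits the line in coded variables X = (x − origin)/scale and Y = y − origin, and then converts the constants back to the original variables. The function name and parameter names are assumptions chosen for this example only. The data are those of Exercise [3], where X = (x − 8)/2 and Y = y − 16 are used and the fitted line is y = 10.72 + 0.536x after rounding.

```python
def fit_line_coded(xs, ys, x_origin, x_scale, y_origin=0.0):
    """Fit Y = A + B*X with X = (x - x_origin)/x_scale and Y = y - y_origin,
    then express the result as y = a + b*x in the original variables."""
    n = len(xs)
    X = [(x - x_origin) / x_scale for x in xs]
    Y = [y - y_origin for y in ys]
    sX, sY = sum(X), sum(Y)
    sXX = sum(v * v for v in X)
    sXY = sum(p * q for p, q in zip(X, Y))
    # Normal equations in the coded variables
    B = (n * sXY - sX * sY) / (n * sXX - sX * sX)
    A = (sY - B * sX) / n
    # Convert back: y - y_origin = A + B*(x - x_origin)/x_scale
    b = B / x_scale
    a = y_origin + A - b * x_origin
    return a, b


# Data of Exercise [3]
a, b = fit_line_coded([2, 4, 6, 8, 10, 12, 14],
                      [10, 14, 15, 16, 15, 17, 18],
                      x_origin=8, x_scale=2, y_origin=16)
print(round(a, 2), round(b, 3))
```

Changing the origin or scale changes only the arithmetic, not the fitted line: the same a and b result from fitting directly in x and y.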
Fitting Parabola
Let y = a + bx + cx^2
be the equation of the parabola to be fitted to a given set of n pairs of observations
(x_1, y_1), (x_2, y_2), ..., (x_n, y_n). Using the method of least squares, the constants a, b, c of
the best fitting parabola are obtained by solving the normal equations
Σy = an + bΣx + cΣx^2
Σxy = aΣx + bΣx^2 + cΣx^3
Σx^2y = aΣx^2 + bΣx^3 + cΣx^4
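The three normal equations form a system of three linear equations in a, b and c. The sketch below solves them by Cramer's rule; this choice, and the names det3 and fit_parabola, are assumptions made for illustration (the worked exercises in this chapter use elimination instead). The data are those of Exercise [13], whose worked solution gives a = 1.42, b = –1.07, c = 0.55.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Solve the three normal equations for y = a + b*x + c*x^2."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    coeffs = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [t0, t1, t2]
    d = det3(coeffs)
    solution = []
    for k in range(3):
        # Cramer's rule: replace column k by the right-hand side
        mk = [row[:] for row in coeffs]
        for i in range(3):
            mk[i][k] = rhs[i]
        solution.append(det3(mk) / d)
    return tuple(solution)


# Data of Exercise [13]
a, b, c = fit_parabola([0, 1, 2, 3, 4], [1, 1.8, 1.3, 2.5, 6.3])
print(round(a, 2), round(b, 2), round(c, 2))
```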
Fitting Exponential and Geometrical Curves
In order to fit curves with equations of the form y = ab^x and y = ax^b, the procedure
is to take logarithms of both sides and then form normal equations. For example, to fit
the exponential curve y = ab^x, we take logarithms of both sides, obtaining
log y = (log a) + x (log b)
This can be written as Y = A + Bx
where Y = log y, A = log a, B = log b. The normal equations are then
ΣY = An + BΣx
ΣxY = AΣx + BΣx^2
These equations are solved for A and B, and then taking antilogs, we find the
values of a and b.
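The logarithmic procedure can be sketched as follows. The helper names are assumptions made here, and the data are synthetic values generated from y = 1000(0.8)^x purely to show that the procedure recovers the constants; they are not taken from any exercise. Note that this minimizes the squared differences in log y, not in y itself.

```python
import math


def _fit_line(xs, ys):
    """Least-squares straight line ys = A + B*xs via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    B = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    A = (sy - B * sx) / n
    return A, B


def fit_exponential(xs, ys):
    """Fit y = a * b**x by fitting log y = log a + x log b."""
    A, B = _fit_line(xs, [math.log10(y) for y in ys])
    return 10 ** A, 10 ** B          # antilogs give a and b


def fit_power(xs, ys):
    """Fit y = a * x**b by fitting log y = log a + b log x."""
    A, b = _fit_line([math.log10(x) for x in xs],
                     [math.log10(y) for y in ys])
    return 10 ** A, b                # only A needs an antilog


# Synthetic data lying exactly on y = 1000 * 0.8**x (illustration only)
xs = [1, 2, 3, 4, 5]
ys = [1000 * 0.8 ** x for x in xs]
a, b = fit_exponential(xs, ys)
print(round(a, 3), round(b, 3))
```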
Exercises
[1] Fit a straight line of the form y = a + bx to each of the following sets of data:
(i)
x 2 5 6 8 9
y 8 14 19 20 31
Solution: Let y = a + bx be the equation of the best fitting straight line by the
method of least squares. The constants a and b are obtained by solving the
normal equations.
Σy = an + bΣx
Σxy = aΣx + bΣx2
Where n is the number of pairs of observations.
Table : Calculations for fitting straight line
x y x2 xy
2 8 4 16
5 14 25 70
6 19 36 114
8 20 64 160
9 31 81 279
Total 30 92 210 639
Putting Σy = 92, n = 5, Σx = 30, Σxy = 639 and
Σx^2 = 210 in the normal equations, we have
92 = 5a + 30b - - - (i)
639 = 30a + 210b - - - (ii)
Multiplying (i) by 30 and (ii) by 5, and subtracting,
2760 = 150a + 900b
3195 = 150a + 1050b
–435 = –150b
or, b = 435/150 = 2.9
Putting the value of b in (i), we have
5a = 92 – 30 × 2.9 = 92 – 87 = 5
∴ a = 1
Now, substituting the values of a and b in y = a + bx, the equation of the fitted
straight line is y = 1.0 + 2.9x
(ii)
x 1 3 5 7 9 11
y 16 12 10 7 5 4
Solution:
Let y = a + bx be the equation of the best fitting straight line by the method of
least squares. The constants a and b are obtained by solving the normal equations.
Σy = an + bΣx
Σxy = aΣx + bΣx2
504 = −420b
∴ b = –1.2
Putting the value of b in (i), we have
6a = 54 – 36 × (–1.2)
= 54 + 43.2 = 97.2 ∴ a = 16.2
Now, substituting the values of a and b in y = a + bx, the equation of the fitted
straight line is y = 16.2 – 1.2x
(iii)
x 4 6 8 12 15 17 22
y 46 42 40 36 30 25 19
Solution:
Let Y = a + bX be the equation of the best fitting straight line, where we write
X = x – 12 and Y = y – 36. The normal equations for determining the values of a and
b are ΣY = an + bΣX, ΣXY = aΣX + bΣX^2
Table - Calculations for fitting straight line
x y X = x –12 Y = y – 36 X2 XY
4 46 -8 10 64 -80
6 42 -6 6 36 -36
8 40 -4 4 16 -16
12 36 0 0 0 0
15 30 3 -6 9 -18
17 25 5 -11 25 -55
22 19 10 -17 100 -170
Total - - 0 -14 250 -375
Substituting the values ΣY = –14, ΣX = 0, ΣX^2 = 250, ΣXY = –375 and n = 7 in the
where X = (x − 2.5)/0.1 and Y = (y − 7.2)/0.1.
Using the method of least squares, the normal equations for determining the values of
a and b are ΣY = an + bΣX, ΣXY = aΣX + bΣX^2
Table: Calculations for fitting straight line
x    y    X = (x − 2.5)/0.1    Y = (y − 7.2)/0.1    X^2    XY
1.0 5.3 -15 -19 225 285
1.5 5.7 -10 -15 100 150
2.0 6.3 -5 -9 25 45
2.5 7.2 0 0 0 0
3.0 8.2 +5 10 25 50
3.5 8.7 10 15 100 150
4.0 8.4 15 12 225 180
Total - - 0 -6 700 860
x    y    X = (x − 100)/10    Y = y – 533    X^2    XY
70 553 -3 20 9 -60
80 547 -2 14 4 -28
90 539 -1 6 1 -6
100 533 0 0 0 0
110 527 1 -6 1 -6
120 520 2 -13 4 -26
Total - - -3 21 19 -126
Substituting the values in the normal equations we get
21 = 6a – 3b - - - (i)
–126 = –3a + 19b - - - (ii)
Multiplying (ii) by 2 and adding to (i),
21 = 6a – 3b
−252 = −6a + 38b
−231 = 35b
∴ b = −231/35 = −6.6
Putting the value of b in equation (i), we get
21 = 6a – 3 × (–6.6)
or, 6a = 21 – 19.8 = 1.2
∴ a = 1.2/6 = 0.2
x    y    X = (x − 9)/2    Y = (y − 2.8)/0.1    X^2    XY
5 1.7 -2 -11 4 22
7 2.4 -1 -4 1 4
9 2.8 0 0 0 0
11 3.4 1 6 1 6
13 3.7 2 9 4 18
15 4.4 3 16 9 48
Total - - 3 16 19 98
Substituting the values in the normal equations, we get
16 = 6a + 3b - - - (i)
98 = 3a + 19b - - - (ii)
Multiplying (ii) by 2 and subtracting (i) from it, we get
196 = 6a + 38b
16 = 6a + 3b
180 = 35b
∴ b = 180/35 = 5.14
Putting the value of b in equation (i), we get
6a + 3 × 5.14 = 16
6a = 16 – 15.42 = 0.58
a = 0.58/6 = 0.097
16 70 7 -45 49 -315
10 85 1 -30 1 -30
8 100 -1 -15 1 +15
9 115 0 0 0 0
5 120 -4 5 16 -20
4 124 -5 9 25 -45
3 130 -6 15 36 -90
Total - - -8 -61 128 -485
Substituting the values ΣY = –61, ΣX = –8, ΣX^2 = 128, ΣXY = –485 and n = 7 in
the normal equations, we get
–61 = 7a – 8b - - - (i)
–485 = –8a + 128b - - - (ii)
Multiplying equation (i) by –8 and equation (ii) by 7, and subtracting (iv) from
(iii), we get
488 = –56a + 64b - - - (iii)
−3395 = −56a + 896b - - - (iv)
3883 = −832b
∴ b = −3883/832 = −4.667
[3] Apply the principle of least squares to fit a straight line y = a + bx to the following
data:
x 2 4 6 8 10 12 14
y 10 14 15 16 15 17 18
[C.U., B.Sc. (Math. Hons.) 1968]
Solution :
The normal equations for determining the values of a and b are
ΣY = an + bΣX
ΣXY = aΣX + bΣX^2
where X = (x − 8)/2, Y = y − 16.
Table : Calculations for Fitting Straight Line
x    y    X = (x − 8)/2    Y = y – 16    X^2    XY
2 10 -3 -6 9 18
4 14 -2 -2 4 4
6 15 -1 -1 1 1
8 16 0 0 0 0
10 15 1 -1 1 -1
12 17 2 1 4 2
14 18 3 2 9 6
Total - - 0 -7 28 30
Substituting the values ΣY = –7, ΣX = 0, ΣX^2 = 28, ΣXY = 30 and n = 7 in the
normal equations, we get
–7 = 7a + b × 0 or, a = –1
30 = a × 0 + 28b or, b = 30/28 = 1.071
Hence, the equation of the best fitting straight line is
y − 16 = −1 + 1.071 × (x − 8)/2
or, y = 16 – 1 + 0.536x – 4.28
= 15 – 4.28 + 0.536x
= 10.72 + 0.536x
[4] Fit a straight line to the following data and estimate the most probable yield of
rice for 40 inches of water.
Water x (inches) 12 18 24 30 36 42 48
Yield y (tons) 5.27 5.68 6.25 7.21 8.02 8.71 8.42
[C.U., M.com 1964]
Solution:
Let Y = a + bX be the equation of the best fitting straight line, where we write
X = (x − 30)/6 and Y = (y − 7.21)/0.01. The normal equations for determining the values of a and b are
ΣY = an + bΣX
ΣXY = aΣX + bΣX^2
Table: Calculations for Fitting Straight Line
x    y    X = (x − 30)/6    Y = (y − 7.21)/0.01    X^2    XY
12 5.27 -3 -194 9 582
18 5.68 -2 -153 4 306
24 6.25 -1 -96 1 96
30 7.21 0 0 0 0
36 8.02 1 81 1 81
42 8.71 2 150 4 300
48 8.42 3 121 9 363
Total - - 0 -91 28 1728
Substituting the values ΣY = –91, ΣX = 0, ΣX^2 = 28, ΣXY = 1728 and n = 7 in the
normal equations, we get
–91 = 7a + b × 0
or, a = –91/7 = –13
1728 = a × 0 + 28b
or, b = 1728/28 = 61.71
Hence, the equation of the best fitting straight line is
(y − 7.21)/0.01 = −13 + 61.71 × (x − 30)/6
= –13 + 10.29x – 308.56
= –321.56 + 10.29x
or, y – 7.21 = –3.2156 + 0.1029x
y = 7.21 – 3.2156 + 0.1029x = 3.99 + 0.103x
When x = 40, y = 3.99 + 0.103 × 40
= 3.99 + 4.120
= 8.11 tons.
[5] Calculate the values of m and k for the equation y = mx + k to show the regression
of profit per unit of output on output.
Output x (000) 5 7 9 11 13 15
Profit per unit of output y (Rs.) 1.7 2.4 2.8 3.4 3.7 4.4
Estimate the profit per unit of output when there is an output of 10,500.
[I.C.W.A. 1973]
Solution:
Let the equation of the straight line be y = mX + k, where
X = (x − (9 + 11)/2) / (½ × 2) = x − 10. The normal equations will be
Σy = kn + mΣX
ΣXy = kΣX + mΣX^2
Table: Calculations for Fitting Straight Line
x    y    X = x – 10    X^2    Xy
5 1.7 -5 25 -8.5
7 2.4 -3 9 -7.2
9 2.8 -1 1 -2.8
11 3.4 1 1 3.4
13 3.7 3 9 11.1
15 4.4 5 25 22.0
Total - 18.4 0 70 18.0
Substituting the values in the normal equations,
18.4 = 6k + m × 0
or, k = 18.4/6 = 3.067
18 = k × 0 + m × 70
or, m = 18/70 = 0.257
Putting the values of k and m, we get
y = 0.257 × (x – 10) + 3.067
y = 3.067 – 2.57 + 0.257x
y = 0.50 + 0.257x, which is the best fitting straight line.
When the output is 10,500, x = 10,500/1000 = 10.5 and
y = 0.50 + 0.257 × 10.5 = 0.50 + 2.6985
= Rs. 3.20 (approx.)
[6] The following data relate to results of a fertiliser experiment on crop yields:
Units of fertiliser used(x) 0 2 4 6 8 10
Units of yield (y) 110 113 118 119 120 118
Fit a straight line to the above data and estimate the amounts of yield when units
of fertiliser used are 3 and 7 respectively.
[C.U. M.com 1969]
Solution:
Let the equation of the straight line be y = a + bu, where u = (x − 5)/(½ × 2) = x − 5.
The normal equations are
Σy = na + bΣu
Σuy = aΣu + bΣu^2
When x = 3, y = 111.90 + 0.886 × 3 = 114.6 (approx.)
When x = 7, y = 111.90 + 0.886 × 7 = 118.1 (approx.)
[7] The weights (in lbs) of a calf taken at weekly intervals are given below. Fit a
straight line, and calculate the average rate of growth per week.
Age (x) 1 2 3 4 5 6 7 8 9 10
Weight (y) 52.5 58.7 65.0 70.2 75.4 81.1 87.2 95.5 102.2 106.4
Solution:
Let the equation of the straight line be y = a + bu, where u = (x − 5.5)/(½ × 1) = 2x − 11.
The normal equations are
Σy = an + bΣu
Σuy = aΣu + bΣu^2
Table: Calculations for Fitting Straight Line
x    y    u = 2x – 11    u^2    uy
1 52.5 -9 81 -472.5
2 58.7 -7 49 -410.9
3 65.0 -5 25 -325.0
4 70.2 -3 9 -210.6
5 75.4 -1 1 -75.4
6 81.1 1 1 81.1
7 87.2 3 9 261.6
8 95.5 5 25 477.5
9 102.2 7 49 715.4
10 106.4 9 81 957.6
Total - 794.2 0 330 998.8
u = (T – 40)/20
y = S – 75
The normal equations are
Σy = bn + mΣu
Σuy = bΣu + mΣu2
Table :
T    S    u = (T − 40)/20    y = S – 75    u^2    uy
0 54 -2 -21 4 42
20 65 -1 -10 1 10
40 75 0 0 0 0
60 85 1 10 1 10
80 96 2 21 4 42
Total - - 0 0 10 104
Substituting the values from the table in the normal equations:
0 = 5b + m×0 or, b = 0
104 = b×0+10m or, m = 10.4
Putting the values of b and m and rewriting u and y in terms of T and S,
S − 75 = 0 + 10.4 × (T − 40)/20
or, S = 75 + 0.52T – 20.8
or, S = 0.52T + 54.2
When T = 50,
S = 0.52 × 50 + 54.2
= 26 + 54.2 = 80.2 units
4 410 2.6128 16 10.4512
5 328 2.5159 25 12.5795
6 262 2.4183 36 14.5098
7 210 2.3222 49 16.2554
Total 27 - 15.3847 139 67.5362
Substituting the values in the normal equations:
15.3847 = 6A +27B
67.5362 = 27A + 139B
Solving these equations we get
A = 2.999, B= -0.0968
i.e. log a = 2.999 and log b = –0.0968
or, a = antilog 2.999 = 1000 (approx.) and b = antilog (–0.0968) = 0.8
Therefore, the equation of the fitted curve is y = 1000 (0.8)^x
[10] Fit a curve of the form y = ax^b to the following data:
x 1 2 3 4 5
y 5.0 6.3 7.2 7.9 8.5
Solution:
Taking logarithms of both sides in the equation y = ax^b, we have log y = log a + b
log x, i.e. Y = A + bX ------- (i)
where Y = log y, A = log a, and X = log x. The normal equations for determining
the constants A and b in (i) are
ΣY = An + bΣX
ΣXY = AΣX + bΣX^2
Therefore, the equation of the fitted curve is y = 5x^0.33
[11] Estimate the constants of the Pareto curve n = Ax^(–a) which fits the data below:
1945-46 : Number of net incomes more than Rs.x after tax
Income (Rs.x) Number (n)
150 14,000,000
500 825,000
1000 173,000
2000 35,500
[I.C.W.A. 1973]
Solution:
Taking logarithms of both sides in the equation n = Ax^(–a), we have
log n = log A – a log x, i.e. Y = a′ + b′X - - - (i)
where Y = log n, a′ = log A, b′ = –a, X = log x.
The normal equations for determining the constants a′ and b′ in (i) are
ΣY = Na′ + b′ΣX - - - (ii)
ΣXY = a′ΣX + b′ΣX^2 - - - (iii)
where N = 4 is the number of pairs of observations.
Table - Fitting Pareto curve
x    n    X = log x (from log table)    X (rounded)    Y = log n (from log table)    Y (rounded)    X^2    XY
150 14,000,000 2.1761 2.18 7.1461 7.15 4.7524 15.587
500 825,000 2.6990 2.70 5.9165 5.92 7.2900 15.984
1000 173,000 3.0000 3.00 5.2380 5.24 9.0000 15.720
2000 35,500 3.3010 3.30 4.5502 4.55 10.8900 15.015
Total - - 11.18 - 22.86 31.9324 62.306
Σy = an + bΣx + cΣx^2
Σxy = aΣx + bΣx^2 + cΣx^3
Σx^2y = aΣx^2 + bΣx^3 + cΣx^4
Table: Calculations for Fitting Parabola
x    y    x^2    x^3    x^4    xy    x^2y
0 1 0 0 0 0 0
1 5 1 1 1 5 5
2 10 4 8 16 20 40
3 22 9 27 81 66 198
4 38 16 64 256 152 608
Total 10 76 30 100 354 243 851
Substituting the values from the table (here n = 5),
76 = 5a + 10b + 30c - - - (ii)
243 = 10a + 30b + 100c - - - (iii)
851 = 30a + 100b + 354c - - - (iv)
Multiplying (ii) by 2 and subtracting from (iii),
243 = 10a + 30b + 100c
152 = 10a + 20b + 60c
91 = 10b + 40c - - - (v)
[13] Fit a parabola of the second degree to the following data taking x as the
independent variable (y = a + bx + cx2), by the method of least squares.
x 0 1 2 3 4
y 1 1.8 1.3 2.5 6.3
Find out the difference between the actual value of y and the value of y obtained
from the fitted curve when x=2.
[I.C.W.A. 1965]
Solution:
The constants a, b, c appearing in the equation y = a + bx + cx^2 - - - (i) are obtained
by solving the normal equations
Σy = an + bΣx + cΣx^2
Σxy = aΣx + bΣx^2 + cΣx^3
Σx^2y = aΣx^2 + bΣx^3 + cΣx^4
Table - Calculations for Fitting Parabola
x    y    x^2    x^3    x^4    xy    x^2y
0 1 0 0 0 0 0
1 1.8 1 1 1 1.8 1.8
2 1.3 4 8 16 2.6 5.2
3 2.5 9 27 81 7.5 22.5
4 6.3 16 64 256 25.2 100.8
Total 10 12.9 30 100 354 37.1 130.3
Substituting the values from the table (here n = 5)
12.9 = 5a + 10b + 30c- - - (ii)
37.1 = 10a + 30b + 100c- - - (iii)
130.3 = 30a + 100b + 354c- - -(iv)
Multiplying (ii) by 2 and subtracting from (iii)
37.1 = 10a + 30b + 100c
25.8 = 10a + 20b + 60c
11.3 = 10b + 40c −−−−−( v )
Again, multiplying (ii) by 6 and subtracting from (iv),
130.3 = 30a + 100b + 354c
77.4 = 30a + 60b + 180c
52.9 = 40b + 174c −−−−−(vi))
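Solving (v) and (vi), we get b = –1.07, c = 0.55
Putting these values in (ii), we have a = 1.42. Now putting the values of a, b and
c in (i), the required equation of the parabola is y = 1.42 – 1.07x + 0.55x^2
When x = 2, the value of y obtained from the fitted curve is
1.42 – 1.07 × 2 + 0.55 × 4 = 1.42 – 2.14 + 2.20 = 1.48
The actual value of y is 1.3, so the difference (actual – fitted) is 1.3 – 1.48 = –0.18.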
[14] The profits (in 1000 Rs.) of a company in the x-th year of its life are given below. Fit
a parabola and estimate its profit in the sixth year.
x 1 2 3 4 5
y 1250 1400 1650 1950 2300
Solution:
Let v = a + bu + cu^2 - - - (i) be the equation of the parabola, where u = x – 3 and
v = (y – 1650)/50. Applying the method of least squares, the normal equations for
determining the constants a, b, c are
Σv = an + bΣu + cΣu^2
Σuv = aΣu + bΣu^2 + cΣu^3
Σu^2v = aΣu^2 + bΣu^3 + cΣu^4
Table: Calculations for Fitting Parabola
x    y    u = x – 3    v = (y − 1650)/50    u^2    u^3    u^4    uv    u^2v
1 1250 -2 -8 4 -8 16 16 -32
2 1400 -1 -5 1 -1 1 5 -5
3 1650 0 0 0 0 0 0 0
4 1950 1 6 1 1 1 6 6
5 2300 2 13 4 8 16 26 52
Total - - 0 6 10 0 34 53 21
Putting the values from the table in the normal equations,
6 = 5a + b × 0 + 10c - - - (ii)
53 = a × 0 + 10b + c × 0 - - - (iii)
21 = 10a + b × 0 + 34c - - - (iv)
From (ii) 5a + 10c = 6- - - (v)
From (iii) 10b = 53 or, b = 5.3
From (iv) 10a + 34c = 21- - - (vi)
Multiplying (v) by 2 and subtracting from (vi)
10a + 34c = 21
10a + 20c = 12
14c = 9 or, c = 9/14 = 0.643
Putting the value of c in (v), we get
5a + 10 × 0.643 = 6
or, 5a = 6 – 6.43 = –0.43
a = –0.43/5 = –0.086
Now, putting the values of a, b, c in equation (i) and rewriting u and v in terms of x and y,
(y − 1650)/50 = –0.086 + 5.3 (x – 3) + 0.643 (x – 3)^2
or, (y − 1650)/50 = –0.086 + 5.3x – 15.9 + 0.643x^2 – 3.858x + 5.787
= –15.986 + 5.787 + 1.442x + 0.643x^2 = –10.199 + 1.442x + 0.643x^2
or, y – 1650 = –509.95 + 72.1x + 32.2x^2
or, y = 1140 + 72.1x + 32.2x^2
When x = 6, y = 1140 + 72.1 × 6 + 32.2 × 36 = 1140 + 432.6 + 1159.2 = 2732 (approx.),
i.e. an estimated profit of about Rs. 2732 thousand in the sixth year.
Time Series
Meaning
A series of observations recorded in accordance with the time of occurrence is
called a “Time Series”. Production, consumption, sales and profits during successive periods
of time, and population, price etc. at successive points of time, are examples of time
series.
Components of time series
The four components of time series are
1. Secular trend or Trend ( T )
2. Seasonal Variation ( S )
3. Cyclical Fluctuation ( C )
4. Irregular or Random movement ( I )
It is assumed that there is a multiplicative relationship between the four
components, i.e. any particular observation is considered to be the product of the
effects of the four components:
Y_t = T × S × C × I
Secular trend (or simply trend) of time series is the smooth, regular and long-
term movement exhibiting the tendency of growth or decline over a period of time. The
trend is that part which the series would have exhibited, had there been no other factors
affecting the values. The population growth together with advances in technology and
methods of business organization are the main factors for the growth or upward trend
in most of the economic and business data. The decline and downward trend may
be due to the decreasing demand of the product, or a substitute taking its place, or
difficulty in obtaining raw materials etc. Many industries, however, initially show a
steady growth until a saturation point is reached, and then the trend declines steadily.
But sudden or frequent changes are incompatible with the idea of trend.
Seasonal variation represents a type of periodic movement, where the period is
not longer than one year. Business activities are found to have brisk and slack periods
at different parts of the year. This up-and-down movement of time series, recurring
with remarkable regularity year after year, is attributable to the presence of seasonal
variations. The factors which cause this type of variation are the climatic changes of
the different seasons, such as changes in rainfall, temperature, humidity etc., and the
customs and habits which people follow at different parts of the year.
Cyclical fluctuation is another type of periodic movement, where the period is
more than a year. Such movements are fairly regular and oscillatory in nature. One
complete period is called a cycle. Cyclical fluctuation is found to exist in most
business and economic time series, where it is known as the business cycle. Business cycles
are caused by a complex combination of forces affecting the equilibrium of demand
and supply. Prosperity, decline, depression and recovery are usually considered to be
the four phases of business cycles. The swing from prosperity to recovery and back
again to prosperity varies both in time span and intensity.
Irregular or random movements are variations caused by factors
of an erratic nature. These are completely unpredictable, being caused by such unforeseen
events as war, flood, earthquake, strike and lockout etc., and may sometimes be the
result of many small forces, each of which has a negligible effect, but whose combined
effect is not negligible. Random movements do not reveal any pattern or repetitive
tendency and may be considered as residual variation.
Measurement of trend
There are four methods of isolating secular trend in time series :-
1. Free-hand Method
2. Semi-average Method
3. Moving average Method and
4. Fitting mathematical curves.
[1] Free-hand method : The given data are plotted as points on graph paper
against time. The time series data (Y_t) are shown along the vertical axis and time (t)
along the horizontal axis. Then a smooth free-hand curve is drawn through the scatter
of the plotted points, which appears to represent their pattern of movement over
time. The distance of this line, known as the trend line, from the horizontal axis gives
the trend value for each time
period. The advantages of the method are that a quick estimate of the trend is obtained
and that the method can be used to obtain a preliminary knowledge of the nature of
trend with a view to applying more refined methods.
[2] Semi-average Method : Semi average method consists in dividing the data into
two parts, and then finding an average for each part. These averages are plotted as
points on a graph paper against the mid-point of the time interval covered by each
part. The straight line joining these two points gives the trend line. As before the
distances of trend line from the horizontal axis give the trend values. If the actual trend
is a straight line, the method will give quite satisfactory results.
[3] Moving Average Method : Moving average method is very commonly used
for the isolation of trend and in smoothing out fluctuations in time series. In this
method, a series of arithmetic means of successive observations, known as moving
averages, are calculated from the given data, and these moving averages are used as
trend values. Precisely, moving averages of period n are a series of arithmetic means
of groups of successive n observations, and are shown against the mid-points of time
intervals covered by the respective groups. If the period of the moving average is odd, the
moving averages fall on the given time points and serve directly as trend values. If the
period is even, the moving averages fall midway between two time points, and a
two-point moving average of the moving averages so obtained has to be found for
‘centering’ them.
[4] Fitting Mathematical Curves : In this method, an appropriate type of
mathematical equation is selected for trend, and the constants appearing in the trend
equation are determined on the basis of the given time series data.
[i] If the plotted data show approximately a straight line tendency on
ordinary graph paper, the equation used is :
y = a + bx (Straight Line)
[ii] If they show a straight line on semi-logarithmic graph paper, the
equation used is :
log y = a + bx (Exponential Curve)
[iii] Sometimes a parabola or higher order polynomial may also be fitted.
[iv] Special types of curves are used in certain cases:
y = a + bc^x (Modified Exponential Curve)
1/y = a + bc^x (Logistic Curve)
log y = a + bc^x (Gompertz Curve)
The constants appearing in the equations referred to at (i) to (iii) are obtained
by applying the principle of least squares.
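The moving average method described at [3] above can be sketched as follows. The function names are assumptions made for this illustration. For an odd period the averages are used directly; for an even period a further two-point moving average centres them. In both cases the first trend value belongs to the observation at position period // 2 (counting from 0). The two data sets are those of Exercises [1] and [4] of the Time Series exercises in this chapter.

```python
def moving_average(values, period):
    """Arithmetic means of successive groups of `period` observations."""
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]


def centred_moving_average(values, period):
    """Trend values by moving averages.

    Odd period: the plain moving averages already fall on time points.
    Even period: a two-point moving average of them is taken to centre them.
    The first value returned corresponds to values[period // 2].
    """
    averages = moving_average(values, period)
    if period % 2 == 1:
        return averages
    return moving_average(averages, 2)


# Exercise [1]: production for 1968 to 1977, 3-year moving average
production = [21, 22, 23, 25, 24, 22, 25, 26, 27, 26]
print([round(v, 2) for v in centred_moving_average(production, 3)])

# Exercise [4]: values for 1958 to 1967, centred 4-year moving average
values = [50.0, 36.5, 43.0, 44.5, 38.9, 38.1, 32.6, 41.7, 41.1, 33.8]
print([round(v, 1) for v in centred_moving_average(values, 4)])
```

Short-term fluctuations are then the differences between each original value and the trend value for the same period.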
Measurement of seasonal variation
There are four methods of measuring seasonal fluctuation:
1. Method of (Monthly or Quarterly) Averages
2. Moving Average Method
3. Trend-ratio Method
4. Link Relative Method
[1] Method of (Monthly or Quarterly) Averages :
This method is applied when the given time series data do not contain trend
or cyclical fluctuations to any appreciable extent. From the quarterly data, the totals
for each quarter and the averages A1, A2, A3, A4 for the 4 quarters Q1, Q2, Q3 and
Q4 are found. The grand average G = ¼ (A1 + A2 + A3 + A4) is also calculated. If the
additive model is used, the deviations of the quarterly averages from the grand average
give the seasonal variations:
S1 = A1 – G, S2 = A2 – G, S3 = A3 – G, S4 = A4 – G
If the multiplicative model is used, each quarterly average is expressed as a
percentage of the grand average, giving the seasonal indices:
S1 = (A1/G) × 100, S2 = (A2/G) × 100, S3 = (A3/G) × 100, S4 = (A4/G) × 100
If monthly figures are given, we find 12 averages A1, A2, ..., A12 for the
months January, February, ..., December respectively, and then proceeding
the same way as before, the seasonal index for each month is obtained. The total (or
average) seasonal variation (in the additive model) is 0 and the average seasonal index
(in the multiplicative model) is 100.
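A minimal sketch of the method of averages is given below. The function name, the layout of the input (one row per year, one column per quarter or month) and the figures themselves are assumptions made for illustration; the figures are hypothetical and not taken from any exercise.

```python
def seasonal_by_averages(data, multiplicative=True):
    """Seasonal measures by the method of (monthly or quarterly) averages.

    data: one row per year, each row holding the values for the seasons
          (4 quarters or 12 months) in order.
    Returns seasonal indices (multiplicative model, averaging 100) or
    seasonal variations (additive model, totalling 0).
    """
    n_years = len(data)
    n_seasons = len(data[0])
    # Average A_j for each season over the years
    averages = [sum(row[j] for row in data) / n_years
                for j in range(n_seasons)]
    # Grand average G of the seasonal averages
    grand = sum(averages) / n_seasons
    if multiplicative:
        return [a / grand * 100 for a in averages]
    return [a - grand for a in averages]


# Hypothetical quarterly figures for two years (illustration only)
quarterly = [[10, 20, 30, 40],
             [30, 40, 50, 60]]
print(seasonal_by_averages(quarterly, multiplicative=False))
print([round(s, 1) for s in seasonal_by_averages(quarterly)])
```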
S1 = (P1/P) × 100, S2 = (P2/P) × 100, S3 = (P3/P) × 100, S4 = (P4/P) × 100
corresponding to the quarters Q1, Q2, Q3, Q4 respectively, where P = ¼ (P1 + P2 + P3 + P4).
The total of the 4 seasonal indices will be 400.
[4] Link Relative Method :
If quarterly data are given, each value is expressed as a percentage of the value for
the immediate preceding period. These are known as Link Relatives (L.R). Of course,
the link relative for the first quarter (Q1) of the first year cannot be obtained. The L.R.s
are arranged by quarters and the average L.R for each quarter is found, either by using
the arithmetic mean or median. The average link relatives show the average relation of
each quarterly value to the value of the previous quarter.
From these average L.R.s we find chain relatives (C.R.) by relating them to a
common base, e.g. the first quarter, for which the C.R. is taken as 100. The C.R. for any
quarter is obtained by multiplying the L.R. for that quarter by the C.R. for the
immediately preceding quarter and dividing by 100. Proceeding this way, we find a
second C.R. for the first quarter (Q1) by the relation
Second C.R. for Q1 = (C.R. for Q4) × (L.R. for Q1) / 100
Usually, the second C.R. for Q1 will differ from the originally assumed C.R. of 100,
owing to the presence of trend. Some adjustments to the C.R.s are therefore necessary.
Let C be the average quarterly deviation of the second C.R. from 100, i.e.
C = ¼ (Second C.R. for Q1 − 100)
Subtracting C, 2C, 3C and 4C from the C.R.s for Q2, Q3, Q4 and the second C.R.
for Q1 respectively, we find that both the C.R.s for Q1 are now equal to 100. The adjusted C.R.s for
Q1, Q2, Q3, Q4 are now expressed as percentages of their A.M. to give the seasonal
indices. The total of these seasonal indices will be 400.
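The chain-relative steps above can be sketched as follows, starting from the average link relatives for the quarters. The function name is an assumption, and the link relatives used are hypothetical figures chosen only to illustrate the steps.

```python
def seasonal_from_link_relatives(avg_lr):
    """Seasonal indices from average link relatives (in percent).

    avg_lr[0] is the average L.R. for Q1 (relative to the preceding Q4),
    avg_lr[1] for Q2, and so on.
    """
    k = len(avg_lr)
    # Chain relatives, taking the C.R. for Q1 as 100
    chain = [100.0]
    for j in range(1, k):
        chain.append(avg_lr[j] * chain[j - 1] / 100)
    # Second C.R. for Q1, obtained from the C.R. for the last quarter
    second_first = avg_lr[0] * chain[-1] / 100
    # Average deviation per quarter caused by trend
    c = (second_first - 100) / k
    # Subtract 0, C, 2C, ... from the chain relatives
    adjusted = [chain[j] - j * c for j in range(k)]
    # Express the adjusted C.R.s as percentages of their arithmetic mean
    mean = sum(adjusted) / k
    return [v / mean * 100 for v in adjusted]


# Hypothetical average link relatives for Q1..Q4 (illustration only)
indices = seasonal_from_link_relatives([110, 90, 105, 98])
print([round(s, 1) for s in indices], round(sum(indices), 1))
```

Because the indices are percentages of their own mean, their total is always 100 times the number of seasons, i.e. 400 for quarterly data.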
Exercise
[1] Using 3-year moving averages, determine the trend and short-term fluctuations.
Plot the original and the trend values on the same graph paper:
Year 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977
Production (‘000 tons) 21 22 23 25 24 22 25 26 27 26
[C.A. 1981]
Solution:
Calculations for 3-yearly Moving Average
Year    Value (‘000 tons)    3-year moving total (‘000 tons)    3-year moving average (‘000 tons)
1968 21 - -
1969 22 66 22.00
1970 23 70 23.33
1971 25 72 24.00
1972 24 71 23.67
1973 22 71 23.67
1974 25 73 24.33
1975 26 78 26.00
1976 27 79 26.33
1977 26 - -
∴ Trend: 22.00, 23.33, 24.00, 23.67, 23.67, 24.33, 26.00, 26.33 for 1969-76 (in ‘000 tons)
Short-term fluctuations:
22 – 22 = 0, 23 – 23.33 = –0.33, 25 – 24 = 1.00, 24 – 23.67 = 0.33, 22 – 23.67 = –1.67,
25 – 24.33 = 0.67, 26 – 26 = 0, 27 – 26.33 = 0.67 (in ‘000 tons)
[Figure: original data and trend values plotted against year (1968 to 1977), with value on the vertical axis]
[2] The net profits of a company for eleven successive years are given below. Find
the three – year moving averages:-
Year 1956 ‘57 ‘58 ‘59 ‘60 ‘61 ‘62 ‘63 ‘64 ‘65 ‘66
Profit in lakh of Rs. 2.7 2.9 3.4 5.2 5.8 6.4 9.3 9.2 9.8 10.2 11.0
[I.C.W.A. 1969]
Day 11 12 13 14 15 16 17 18 19 20
Sales 28 30 36 46 54 28 31 36 46 54
[C.A. 1974]
Solution:
The data show a regular cycle of 5 days, because every 5th figure is the highest
after which there is slump, followed by gradual recovery.
Table: Calculation for 5 – Day Moving Average
Day Value 5 – day moving total 5 – day moving average
1 26 - -
2 29 - -
3 35 188 37.6
4 47 188 37.6
5 51 191 38.2
6 26 193 38.6
7 32 192 38.4
8 37 194 38.8
9 46 196 39.2
10 53 194 38.8
11 28 193 38.6
12 30 193 38.6
13 36 194 38.8
14 46 194 38.8
15 54 195 39.0
16 28 195 39.0
17 31 195 39.0
18 36 195 39.0
19 46 - -
20 54 - -
5 – day moving averages are 37.6, 37.6, 38.2, 38.6, 38.4, 38.8, 39.2, 38.8, 38.6,
38.6, 38.8, 38.8, 39.0, 39.0, 39.0, 39.0 for days 3 to 18
[4] From the following data calculate the 4-yearly moving average and determine
the trend values. Find the short-term fluctuations. Plot the original values and trend on
a graph paper.
Year 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967
Value 50.0 36.5 43.0 44.5 38.9 38.1 32.6 41.7 41.1 33.8
[C.A. 1980]
Solution: Table: Calculations for 4 – yearly Moving Average
Year; Value; 4-year moving total (not centered); 4-year moving average (not centered); 2-item moving total (centered); 4-year moving average (centered)
(1) (2) (3) (4) (5) (6)
1958 50.0 - - - -
1959 36.5 - - - -
1960 43.0 174.0 43.5 84.23 42.1
1961 44.5 162.9 40.73 81.86 40.9
1962 38.9 164.5 41.13 79.66 39.8
1963 38.1 154.1 38.53 76.36 38.2
1964 32.6 151.3 37.83 76.21 38.1
1965 41.7 153.5 38.38 75.68 37.8
1966 41.1 149.2 37.30
1967 33.8 - - - -
Note: Col (4) = Col (3)/ 4, Col (6) = Col (5)/2
Trend values are: 42.1, 40.9, 39.8, 38.2, 38.1, 37.8 for 1960 to 1965.
Short-term fluctuations: 43.0-42.1=0.9, 44.5-40.9=3.6, 38.9-39.8=-0.9, 38.1-38.2=-0.1,
32.6-38.1=-5.5, 41.7-37.8=3.9 for 1960 to 1965.
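When the period is even, the moving average has to be centred by averaging two successive moving totals. A minimal Python sketch of this step is given below; the function name `centered_moving_average` is an assumption, and the sketch works from the unrounded totals rather than from the rounded column (4).

```python
def centered_moving_average(values, k):
    """Centred moving average for an even period k."""
    # Successive k-term moving totals (not centred).
    totals = [sum(values[i:i + k]) for i in range(len(values) - k + 1)]
    # Each centred average combines two adjacent totals, i.e. 2k values.
    return [(a + b) / (2 * k) for a, b in zip(totals, totals[1:])]

values = [50.0, 36.5, 43.0, 44.5, 38.9, 38.1, 32.6, 41.7, 41.1, 33.8]  # 1958..1967
trend = centered_moving_average(values, 4)                              # 1960..1965

for year, t in zip(range(1960, 1966), trend):
    print(year, round(t, 1))
```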
Figure: Trend by 4-year Moving Average (original values and moving average trend line; value plotted against year, 1958 to 1968)
[5] Determine trend by the method of moving averages from the figures of quarterly
production of a commodity:
II 209 656 164.00
III 179 682 170.50
IV 145 - -
Trend: 126.0, 127.6, 133.9, 141.5, 147.8, 160.2, 167.2, for 1975-III to 1977-II
[6] Find the quarterly trend value from the following data by the moving average
method, using an appropriate period:
Quarterly output (million tons)
Quarter/Year 1964 1965 1966
I 52 59 57
II 54 63 61
III 67 75 72
IV 55 65 60
[I.C.W.A. 1971]
Solution: Table: Calculations for Moving Average Trend
Year/Quarter; Output (million tons); 4-quarter moving total; 2-period moving total; 4-quarter moving average
1964 I 52 - - -
II 54 - - -
III 67 228 463 57.9
IV 55 235 479 59.9
1965 I 59 244 496 62.0
[7] Assuming a four-yearly cycle, calculate the trend by the method of moving
averages from the following data relating to the production of tea in India:
Year 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950
Production (mn.lbs.) 464 515 518 467 502 540 557 571 586 612
[I.C.W.A. 1968]
Solution: Table: Calculation of Trend by Moving Averages
Year; Production (mn.lbs.); 4-year moving total; 2-item moving total of col. (3) (centered); 4-year moving average (centered)
(1) (2) (3) (4) (5)
1941 464 - - -
1942 515 - - -
1943 518 1964 3966 495.8
1944 467 2002 4029 503.6
1945 502 2027 4093 511.6
1946 540 2066 4236 529.5
1947 557 2170 4424 553.0
1948 571 2254 4580 572.5
1949 586 2326 - -
1950 612 - - -
Moving averages are 495.8, 503.6, 511.6, 529.5, 553.0, 572.5 (mn.lbs.) for 1943–1948.
[8] For the following series of observations verify that the 4-year centered moving
average is equivalent to a 5-year weighted moving average with weights 1, 2, 2, 2, 1
respectively:
Year 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974
Sales (Rs.’000) 2 6 1 5 3 7 2 6 4 8 3
[C.A. 1979]
Solution:
Table: Calculations for Moving Average
Year; Sales (Rs.’000); 4-year moving total; 2-item moving total of col. (3); 4-year moving average
(1) (2) (3) (4) (5)
1964 2 - - -
1965 6 - - -
1966 1 14 29 29/8
1967 5 15 31 31/8
1968 3 16 33 33/8
1969 7 17 35 35/8
1970 2 18 37 37/8
1971 6 19 39 39/8
1972 4 20 41 41/8
1973 8 21 - -
1974 3 - - -
Instead of taking simple averages of the values for 4 consecutive years, the
weighted averages of the values for 5 consecutive years, with weights 1, 2, 2, 2, 1, are calculated.
Table: Calculations for Weighted Moving Average
Year; Sales (Rs.’000); Weighted moving total (x); Weighted moving average (a)
(1) (2) (3) (4)
1964 2 - -
1965 6 - -
1966 1 29 29/8
1967 5 31 31/8
1968 3 33 33/8
1969 7 35 35/8
1970 2 37 37/8
1971 6 39 39/8
1972 4 41 41/8
1973 8 - -
1974 3 - -
(x) 2 × 1 + 6 × 2 + 1 × 2 + 5 × 2 + 3 × 1 = 29
6 × 1 + 1 × 2 + 5 × 2 + 3 × 2 + 7 × 1 =31
(a) col (4) = col(3) ÷ sum of weights i. e. 1+2+2 +2+1=8
Hence, the result.
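The equivalence can also be verified mechanically. The following Python sketch is an illustration only; the variable names are assumptions. It builds both series from the given sales figures and compares them term by term.

```python
sales = [2, 6, 1, 5, 3, 7, 2, 6, 4, 8, 3]      # 1964..1974, Rs.'000
weights = [1, 2, 2, 2, 1]

# 4-year centred moving average: mean of two successive 4-year totals.
totals = [sum(sales[i:i + 4]) for i in range(len(sales) - 3)]
centered = [(a + b) / 8 for a, b in zip(totals, totals[1:])]

# 5-year weighted moving average with weights 1, 2, 2, 2, 1.
weighted = [
    sum(w * y for w, y in zip(weights, sales[i:i + 5])) / sum(weights)
    for i in range(len(sales) - 4)
]

print(centered == weighted)
```

The two lists agree because the sum of two adjacent 4-year totals counts the three middle years twice and the two end years once.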
[9] Fit a suitable straight line to the following data by the method of least squares:
Year 1959 1960 1961 1962 1963
% of insured people 11.3 13.0 9.7 10.6 10.7
[Dip. Management 1972]
Solution: Let y = a + bx - - - (i)
be the equation of the straight line trend with origin at the year 1961 and x unit = 1 year.
By the least square method, the normal equations for finding the constants ‘a’ and ‘b’ are
Σy = an + bΣx - - - (ii)
Σxy = aΣx + bΣx² - - - (iii)
Table: Fitting straight Line Trend
Year % of insured people (y) x x2 xy
1959 11.3 -2 4 -22.6
1960 13.0 -1 1 -13.0
1961 9.7 0 0 0
1962 10.6 1 1 10.6
1963 10.7 2 4 21.4
Total 55.3 0 10 -3.6
Number of observations n = 5. Substituting the value from the table in equations
(ii) and (iii)
55.3 = 5a + b × 0, or a = 55.3/5 = 11.06
-3.6 = a × 0 + 10b, or b = -3.6/10 = –0.36
Putting the value of a and b in equation (i), the trend equation is
y = 11.06 – 0.36x
(origin = 1961, unit of x = 1 year)
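Because the origin is placed at the middle year, Σx = 0 and each normal equation gives one constant directly. A minimal Python sketch of this shortcut follows; the function name `fit_line_centered` is an assumption, and the sketch presumes an odd number of consecutive years.

```python
def fit_line_centered(years, y):
    """Least-squares line y = a + b*x with x = year - middle year (odd n)."""
    n = len(years)
    origin = years[n // 2]
    x = [yr - origin for yr in years]        # sum(x) is 0 by construction
    a = sum(y) / n                           # from  Σy  = a*n
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)  # Σxy = b*Σx²
    return origin, a, b

origin, a, b = fit_line_centered(
    [1959, 1960, 1961, 1962, 1963],
    [11.3, 13.0, 9.7, 10.6, 10.7],
)
print(origin, round(a, 2), round(b, 2))
```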
[10] Fit a straight line trend to the following data, and show the original observations
and trend values on graph paper.
Year Value (y) (Rs.crores) x x2 xy
1965 672 -3 9 -2016
1966 824 -2 4 -1648
1967 967 -1 1 -967
1968 1204 0 0 0
1969 1464 1 1 1464
1970 1758 2 4 3516
1971 2057 3 9 6171
Total 8946 0 28 6520
These values are plotted on the graph paper and a straight line is drawn through
the points, giving the trend line.
Figure: Original data and trend line; value of output (Rs. crores) plotted against year, 1965 to 1971.
[11] Find the value of the trend ordinates by the method of least squares from the
data given below.
Year 1971 1972 1973 1974 1975 1976 1977
Sales (Rs.’000) 125 128 133 135 140 141 143
[I.C.W.A. 1980]
Solution:
Let y = a + bx - - - (i)
be the equation of the straight line trend with origin at the year 1974 and x unit = 1 year.
By the least square method, the normal equations for finding the constants ‘a’ and ‘b’ are
Σy = an + bΣx - - - (ii)
Σxy = aΣx + bΣx² - - - (iii)
Table: Fitting straight Line Trend
Year Sales (y) x x2 xy
1971 125 -3 9 -375
1972 128 -2 4 -256
1973 133 -1 1 -133
1974 135 0 0 0
1975 140 1 1 140
1976 141 2 4 282
1977 143 3 9 429
Total 945 0 28 87
Number of observations n = 7. Substituting the value from the table in equations
(ii) and (iii)
945 = 7a + b × 0, or a = 945/7 = 135
87 = a × 0 + 28b, or b = 87/28 = 3.1
Σy = an + bΣx - - - (ii)
Σxy = aΣx + bΣx² - - - (iii)
Table: Fitting straight Line Trend
Year Sales (y) x x2 xy
1968 50.0 -5 25 -250.0
1969 36.5 -4 16 -146.0
1970 43.0 -3 9 -129.0
1971 44.5 -2 4 -89.0
1972 38.9 -1 1 -38.9
1973 38.1 0 0 0
1974 32.6 1 1 32.6
1975 38.7 2 4 77.4
1976 41.7 3 9 125.1
1977 41.1 4 16 164.4
1978 33.8 5 25 169.0
Total 438.9 0 110 -84.4
Number of observations n = 11. Substituting the value from the table in
equations (ii) and (iii)
438.9 = 11a + b(0), or a = 438.9/11 = 39.9
-84.4 = a(0) + b(110), or b = -84.4/110 = –0.77
Putting these values of a and b in equation (i), the trend equation is
y = 39.9 – 0.77x - - - (iv)
With origin at 1973 and x unit = 1year
The value of x for the year 1971 and 1976 are respectively -2, 3
Hence, Putting x=–2, 3 in equation (iv), the estimates for 1971 and 1976 are respectively
y = 39.9 – 0.77 × (–2)
= 39.9 + 1.54 = 41.44
and y = 39.9 – 0.77 × 3
= 39.9 – 2.31 = 37.59
Table: Calculations for 5-yearly Moving Average
Year Value 5-year moving total 5-year moving average
1968 50.0 - -
1969 36.5 - -
1970 43.0 212.9 42.58
1971 44.5 201.0 40.20
1972 38.9 197.1 39.42
1973 38.1 192.8 38.56
1974 32.6 190.0 38.00
1975 38.7 192.2 38.44
1976 41.7 187.9 37.58
1977 41.1 - -
1978 33.8 - -
Trend values (in Rs.’000) for 1971 and 1976: by least squares, 41.44 and 37.59; by moving averages, 40.20 and 37.58.
[13] Fit a linear trend equation to the following series on production:
Year 1961 1962 1963 1964 1965 1966
Production (tons) 21 37 48 56 62 69
[M.B.A. 1979]
Solution:
Let y = a + bx - - - (i)
be the equation of the straight line trend with origin at the mid-point of 1963 and 1964
and x unit = 6 months (since the data are given for an even number of years, i.e. n = 6 is even,
the origin and unit of x have been chosen so as to make Σx = 0). By the least square method, the
normal equations for finding the constants a and b are
Σy = an + bΣx - - - (ii)
Σxy = aΣx + bΣx² - - - (iii)
Table: Fitting straight Line Trend
Year; Value (y), i.e. Production; x = (year − 1963.5) ÷ ½; x²; xy
1961 21 -5 25 -105
1962 37 -3 9 -111
1963 48 -1 1 -48
1964 56 1 1 56
1965 62 3 9 186
1966 69 5 25 345
Total 293 0 70 323
Putting the value in the normal equations (ii) and (iii)
293 = 6a + b(0), or a = 293/6 = 48.83
323 = a(0) + 70b, or b = 323/70 = 4.61
Substituting the value of 'a' and 'b' in equation (i), the trend equation is
y = 48.83 + 4.61x
(origin: mid – point of 1963-64 unit of x = 6 months.)
[14] Fit a straight line trend to the following series of production data:
Electricity Generated (monthly average) in West Bengal
Year 1951 1952 1953 1954 1955 1956
Electricity Generated (million KW) 101 107 113 121 136 148
[C.U.M.Com 1980]
Solution:
Let y = a + bx - - - (i)
be the equation of the straight line trend with origin at the mid-point of 1953 and 1954
and x unit = 6 months.
By the least square method, the normal equations for finding the constants ‘a’ and ‘b’ are
Σy = an + bΣx - - - (ii)
Σxy = aΣx + bΣx² - - - (iii)
Table: Fitting straight Line Trend
Year; Value (y) (million KW); x = (year − 1953.5) ÷ ½; x²; xy
1951 101 -5 25 -505
1952 107 -3 9 -321
1953 113 -1 1 -113
1954 121 1 1 121
1955 136 3 9 408
1956 148 5 25 740
Total 726 0 70 330
Putting the value in the normal equations (ii) and (iii)
726 = a(6) + b(0), or a = 726/6 = 121
330 = a × 0 + b(70), or b = 330/70 = 4.71
Substituting the value of a and b in equation (i), the trend equation is
y = 121 + 4.71x
With origin at mid – point of 1953-54 and x unit = 6 months.
[15] The annual revenue expenditure (in Rs.crores) of Govt. of India is given below
for 6 successive years:
Year; Value (y); x = (year − 1956) ÷ ½; x²; xy
1953 – 54 225 -5 25 -1125
1954 – 55 238 -3 9 -714
1955 – 56 262 -1 1 -262
1956 – 57 293 1 1 293
1957 – 58 399 3 9 1197
1958 – 59 520 5 25 2600
Total 1937 0 70 1989
Putting the value in the normal equations (ii) and (iii)
1937 = 6a + b × 0, or a = 1937/6 = 322.8
1989 = a × 0 + b(70), or b = 1989/70 = 28.41
Substituting the value of a and b in equation (i), the trend equation is
y = 322.8 + 28.41x
With origin at mid – point of 1955-56 and 1956-57 and x unit = 6 months.
[16] Fit a straight line trend equation by the method of least squares and estimate the
value for 1969.
Year 1960 1961 1962 1963 1964 1965 1966 1967
Value 380 400 650 720 690 600 870 930
[C.A. 1978]
Solution:
Let y = a + bx - - - (i)
be the equation of the straight line trend with origin at the mid-point of 1963 and 1964
and x unit = 6 months.
By the least square method, the normal equations for finding the constants ‘a’ and ‘b’ are
Σy = an + bΣx - - - (ii)
Σxy = aΣx + bΣx² - - - (iii)
Substituting the value of a and b in equation (i), the trend equation is
y = 655 + 35.83x
With origin at mid – point of 1963-64 and x unit = 6 months.
For the year 1969, the value of x = (1969 − 1963.5) ÷ ½ = 5.5 ÷ ½ = 11
The trend value is y = 655 + 35.83 × 11
= 655 + 394.13
= 1049.13
= 1049
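With an even number of years the same shortcut works once x is measured in half-year units from the mid-point of the series. The Python sketch below is illustrative only; the helper name `to_x` is an assumption. It refits the line from the given data and evaluates it for 1969; because it keeps b unrounded, the estimate differs from 1049.13 in the second decimal but rounds to the same figure.

```python
years = list(range(1960, 1968))
values = [380, 400, 650, 720, 690, 600, 870, 930]

mid = (years[0] + years[-1]) / 2             # 1963.5, mid-point of 1963 and 1964

def to_x(year):
    """Convert a calendar year to x in half-year units measured from `mid`."""
    return (year - mid) / 0.5

x = [to_x(yr) for yr in years]               # -7, -5, -3, -1, 1, 3, 5, 7
a = sum(values) / len(values)                # Σy = a*n, since Σx = 0
b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)

estimate_1969 = a + b * to_x(1969)
print(a, round(b, 2), round(estimate_1969))
```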
[17] Fit a parabolic curve of second degree (y = a + bx + cx²) to the data given below
by the method of least squares:
Year 1973 1974 1975 1976 1977
Import(y) in ‘000 bales 10 12 13 10 8
(Take 1975 as origin and unit of x as 1 year)
[I.C.W.A. 1981]
Solution:
Let y = a + bx + cx² - - - (i)
be the equation of the second degree polynomial i. e. parabola with origin of x
at the year 1975 and unit of x = 1 year. Using the method of least squares, the normal
equations for determining the constants a, b, c are
Σy = an + bΣx + cΣx² - - - (ii)
Σxy = aΣx + bΣx² + cΣx³ - - - (iii)
Σx²y = aΣx² + bΣx³ + cΣx⁴ - - - (iv)
or, c = −12/14 = −6/7 = −0.86
Putting the value of c in equation (v), we get
5a + 10 × (−6/7) = 53
Or, 35a – 60 = 371
Or, 35a = 431 or, a = 431/35 = 12.31
Solving these three equations, we find a = 12.3, b = –0.60, c = –0.86
The equation of the fitted second degree polynomial is therefore y = 12.3 – 0.60x – 0.86x²
Where the origin of x is at the year 1975 and unit of x = 1 year.
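As a check on the elimination, the three normal equations can be solved numerically. The Python sketch below is a minimal illustration under the same choice of origin (1975), so that Σx = Σx³ = 0; the variable names are assumptions.

```python
x = [-2, -1, 0, 1, 2]            # year - 1975
y = [10, 12, 13, 10, 8]          # imports, '000 bales
n = len(x)

sx2 = sum(v ** 2 for v in x)                       # Σx²
sx4 = sum(v ** 4 for v in x)                       # Σx⁴
sy = sum(y)                                        # Σy
sxy = sum(xi * yi for xi, yi in zip(x, y))         # Σxy
sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))   # Σx²y

# With Σx = Σx³ = 0, equation (iii) gives b on its own.
b = sxy / sx2

# Equations (ii) and (iv) form a 2x2 system in a and c:
#   n*a   + sx2*c = sy
#   sx2*a + sx4*c = sx2y
det = n * sx4 - sx2 * sx2
a = (sy * sx4 - sx2 * sx2y) / det
c = (n * sx2y - sx2 * sy) / det

print(round(a, 2), round(b, 2), round(c, 2))
```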
−28a + 196c = 1193 ---- (viii)
−84c = −41
∴ c = 41/84 = 0.49
Putting the value of c in equation (v), we get
41
7a = 288 − 28 × (41/84) = 288 − 13.67 = 274.33; ∴ a = 39.2
Solving these three equations, we find a = 39.2, b = 2.04, c = 0.49
The equation of the fitted second degree polynomial is therefore y = 39.2 + 2.04x +
0.49x², where the origin of x is at the year 1963, and unit of x = 1 year.
[19] Fit a trend equation log y = A + Bx to the series of sales data given below:
Year (x) 1943 ‘44 ‘45 ‘46 ‘47 ‘48 ‘49 ‘50 ‘51
Sales (y) 97 113 129 202 195 193 192 237 235
[C.U., B.A.(Econ.) 1971]
Solution:
Let us take the origin of x at the year 1947. The original exponential function
60
Putting the values of A = log a and B = log b, the trend equation is
Log y = 2.2290 + .0471x
(origin = 1947, unit x = 1 year)
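The fit of log y on x can be reproduced from the data of the exercise. The following Python sketch is an illustration only: it uses common (base-10) logarithms from the standard `math` module, whereas the printed solution presumably relied on four-figure log tables, so the last decimal may differ slightly.

```python
import math

years = list(range(1943, 1952))
sales = [97, 113, 129, 202, 195, 193, 192, 237, 235]

x = [yr - 1947 for yr in years]          # origin at 1947, so sum(x) is 0
Y = [math.log10(v) for v in sales]       # Y = log y

n = len(x)
A = sum(Y) / n                                                        # ΣY = A*n
B = sum(xi * Yi for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)   # ΣxY = B*Σx²

print(round(A, 4), round(B, 4))
```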
[20] The following data relate to the average monthly number of tourists coming to India
in different years. Fit an exponential trend by the method of least squares:
Year 1971 1972 1973 1974 1975
Number of tourists 25,083 28,579 34,157 35,267 38,773
[W.B.H.S. 1981]
Solution:
Let us take the origin of x at the year 1973 and fit the exponential trend
y = ab^x to the given data, where the year is taken as x and the number of tourists as y.
Taking logarithms of both sides of the equation y = ab^x, we have
log y = log a + x log b
This can be written in the form of a straight line
Y = A + Bx, - - - (i)
where Y = log y, A = log a and B = log b. Using the method of least squares, the
normal equations for determining A and B are
ΣY = An + BΣx - - - (ii)
ΣxY = AΣx + BΣx² - - - (iii)
Year 1960 1961 1962 1963 1964 1965 1966
Y 37 38 37 40 41 45 50
Y Values being the average production in thousand tons, what is the monthly
trend increment ? Find the monthly trend values from the fitted equation for January,
March and December of 1961.
Solution: Let y=a+bx ----- (i) be the equation of straight line trend fitted to the
given yearly data (origin 1963; x unit = 1 year). The normal equations for finding the
constants a and b are
Σy = an + bΣx - - - (ii)
Σxy = aΣx + bΣx² - - - (iii)
Table: Fitting Straight Line Trend
Year y x x2 xy
1960 37 -3 9 -111
1961 38 -2 4 -76
1962 37 -1 1 -37
1963 40 0 0 0
1964 41 1 1 41
1965 45 2 4 90
1966 50 3 9 150
Total 288 0 28 57
Using the results from the table in the normal equations,
288 = 7a + b(0), or a = 288/7 = 41.14
57 = a(0) + 28b, or b = 57/28 = 2.04
Since March 1961 is 28 months earlier than the origin, viz. July 1963, putting x = –28
in the trend equation, y = 41.23 + 0.17 × (–28)
= 41.23 – 4.76 = 36.47
Since December 1961 is 19 months earlier than the origin, viz. July 1963, putting x = –19
in the trend equation, y = 41.23 + 0.17 × (–19)
= 41.23 – 3.23 = 38
Ans: [a] y = 41.14 + 2.04x (origin 1963, unit of x = 1 year); monthly trend increment = 2.04 ÷ 12 = 0.17
[b] Trend values for January 1961 = 36.13
Trend values for March 1961 = 36.47
Trend values for December 1961 = 38.00
[22] Determine the linear trend equation that fits the following figures on quarterly
consumption of raw material in some factory. Given that the seasonal index for the
third quarter of a year is 117%. What is the estimated consumption for the third
quarter of 1982 ?
Consumption (in tons)
Year / Quarter 1 2 3 4
1976 28 25 31 39
1977 42 44 48 51
1978 55
[D.S.W : 1978]
Solution:
Let y = a + bx be the equation of trend (origin: 1st quarter of 1977; unit of x = 1
quarter). By the method of least squares, the values of a and b are obtained from the
normal equations.
∑y = an + b∑x
∑xy = a∑x + b∑x²
of x = 1 quarter)
For third quarter of 1982, x = 22
Trend Value, y = 40.33 + 3.75 × 22 = 40.33 + 82.5 = 122.83
Since the seasonal index for the third quarter is 117%,
∴ Estimated consumption = 122.83 × 1.17 = 143.7 tons.
[23] Suppose we have a series of quarterly production figures (in thousand tons) in
an industry for the years 1970 to 1976, and the equation of the linear trend fitted to the
annual data is
xt = 107.2 + 2.93t
Where t = year – 1973 and xt = annual production in time period t.
Use this equation to estimate the annual production for the year 1977, and for
the year 1971.
Suppose now the quarterly indices of seasonal variations are:
January – March 125, April-June 105, July-September 87, October – December 83.
(The multiplicative model for the time series is assumed.) Use these indices to
estimate the production during the first quarter of 1977.
[C.U., B.A (Econ.) 1978]
Solution:
Here : t1977 = 1977 – 1973 = 4
Using this information, draw up a monthly sales budget for the company.
(Assume that there is no trend).
[C.A. , 1978]
Solution:
Seasonal indices are usually expressed as percentages, their average being 100.
Hence, seasonal indices must be divided by 100 to obtain seasonal effects. The average
monthly sales being Rs.2,00,000, the estimated sales for a month = Rs.2,00,000 × (seasonal effect),
where Seasonal Effect = Seasonal Index ÷ 100
Table: Budget Estimates of Monthly Sales
Month Seasonal Index Seasonal Effect Estimated Sales (Rs.’000)
(1) (2) (3) (4)
Jan. 76 .76 152
Feb. 77 .77 154
Mar. 98 .98 196
Apr. 128 1.28 256
May. 137 1.37 274
Jun. 122 1.22 244
Jul. 101 1.01 202
[25] Deseasonalise the following data with the help of seasonal index given against:
Month January February March April May June
Cash Balance(Rs.’000) 360 400 550 360 350 550
Seasonal Index 120 80 110 90 70 100
[C.A. May]
Solution:
The seasonal indices are given here as percentages; dividing each by 100 gives the
seasonal effect. Since seasonal indices give ratio changes over the normal value, a
multiplicative model is to be assumed for the data, Yt = T × S × C × I. In order to
deseasonalise (i.e. eliminate the seasonal effect) it is therefore necessary to divide
the data by the seasonal effects.
Deseasonalised data = Yt ÷ S = (T × S × C × I) ÷ S = T × C × I.
Table: Deseasonalised Time Series Data
Month; Cash Balance (Rs.’000) yt; Seasonal Index; Seasonal effect S; Deseasonalised data = yt ÷ S
January 360 120 1.20 300
February 400 80 0.80 500
March 550 110 1.10 500
April 360 90 0.90 400
May 350 70 0.70 500
June 550 100 1.00 550
Deseasonalised data are 300,500,500,400,500 and 550 (Rs.’000)
Note: Seasonal Effect = Seasonal Index ÷ 100
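The division by the seasonal effect is easily mechanised. A minimal Python sketch follows, under the multiplicative model stated above; the variable names are assumptions.

```python
months = ["January", "February", "March", "April", "May", "June"]
cash_balance = [360, 400, 550, 360, 350, 550]      # Rs.'000
seasonal_index = [120, 80, 110, 90, 70, 100]       # percentages

# Deseasonalised value = y / (index / 100), written as y * 100 / index.
deseasonalised = [y * 100 / s for y, s in zip(cash_balance, seasonal_index)]

for month, value in zip(months, deseasonalised):
    print(month, value)
```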
[26] The following table gives the cash receipts and the seasonal indices for 12 months:
Months Jan. Feb. Mar. Apr. May. Jun. Jul. Aug. Sep. Oct. Nov. Dec.
Cash Receipts (millions of Rs.) 35.1 23.7 20.8 21.1 28.3 22.5 23.1 24.3 41.3 62.1 65.4 71.7
Seasonal Index 1.30 0.67 0.57 0.57 0.71 0.63 0.71 0.71 1.37 1.82 1.45 1.49
Eliminate the seasonal variations in the cash-receipts and discuss the significance
of such data.
Solution:
‘Seasonal Index’ shown here actually refers to seasonal effect. (Note that ‘seasonal
index’ is generally used to show the ‘percentage’ position and should be distinguished from
‘seasonal effect’, indicating the effect ‘per unit’. The average seasonal index is 100, but
the average seasonal effect is 1.) Since seasonal indices give ratio changes over the normal
value, a multiplicative model is to be assumed for the data, Yt = T × S × C × I. In order
to deseasonalise (i.e. eliminate the seasonal effect) it is therefore necessary to divide
the data by the ‘seasonal effects’.
Deseasonalised data = Yt ÷ S = (T × S × C × I) ÷ S = T × C × I.
not affected by seasonal fluctuations.
[27] The sales of a company rose from Rs.60,000 in the month of August to Rs.69,000
in the month of September. The seasonal indices for these two months are 105
and 140 respectively. The owner of the company was not at all satisfied with the rise
of sales in the month of September by Rs.9,000. He expected much more because
of the seasonal index for that month. What was his estimate of sales for the month of
September?
[I.C.W.A. 1977]
Solution:
The seasonal index for August is 105 and the actual sales were Rs.60,000. On
this basis the normal monthly sales would be 60000 ÷ 1.05 = Rs.57,143 and the expected sales
during September, when the seasonal index is 140, would be (60000 ÷ 1.05) × 1.40 = Rs.80,000.
But the actual sales during September, viz. Rs.69,000, are less than this expected figure of Rs.80,000.
Hence, the company is a losing concern, which justifies the company owner’s dissatisfaction.
[28] Suppose that the secular trend of sales of a company is accurately described by
the equation Ye = 120,000 + 1000x where x represents a period of one month and has a
value 0 in December 1981. The seasonal indices for the company’s sales are as follows:
January 100, February 80, March 90, April 120, May 115, June 95, July 75, August
70, September 90, October 95, November 120, December 150.
Ignoring cyclical and random influences, forecast sales for (i) February 1983,
(ii) May 1986, (iii) December 1994.
[C.U., B.Sc. (Econ.) 1982]
Solution:
[i] From December 1981 to February 1983 = 14 months
[ii] From December 1981 to May 1986 = 53 months
[iii] From December 1981 to December 1994 = 156 months
Putting the values x = 14, 53 and 156 in the equation ye = 120,000 + 1,000x we
get trend values as
[i] ye = 120,000 + 1000 × 14 = 120,000 + 14000 = Rs.134,000
[ii] ye = 120,000 + 1000 × 53 = 120,000 + 53000 = Rs.173,000
[iii] ye = 120,000 + 1000 × 156 = 120,000 + 156000 = Rs.276,000
Now, multiplying the trend values Rs.134,000, Rs.173,000 and Rs.276,000 by the
seasonal effects 0.80, 1.15 and 1.50 respectively, we get the forecast sales for
[i] February 1983 as Rs.134000 × .80 = Rs. 107,200
[ii] May 1986 as Rs.173,000 × 1.15 = Rs.198,950
[iii] December 1994 as Rs.276000 × 1.50 = Rs.414,000.
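The two steps, trend value and seasonal adjustment, can be combined in one small routine. The Python sketch below is only an illustration of the multiplicative forecast; the function name `forecast` and the constant name `SEASONAL_INDEX` are assumptions.

```python
# Seasonal indices for January..December, as given in the exercise.
SEASONAL_INDEX = [100, 80, 90, 120, 115, 95, 75, 70, 90, 95, 120, 150]

def forecast(year, month):
    """Forecast sales (Rs.) = trend value x seasonal effect."""
    x = (year - 1981) * 12 + (month - 12)     # months elapsed since December 1981
    trend = 120_000 + 1_000 * x               # secular trend Ye
    return trend * SEASONAL_INDEX[month - 1] / 100

for year, month in [(1983, 2), (1986, 5), (1994, 12)]:
    print(year, month, forecast(year, month))
```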
[29] Calculate the seasonal index from the following data using the average method:
Year 1st Qr 2nd Qr 3rd Qr 4th Qr
1974 72 68 80 70
1975 76 70 82 74
1976 74 66 84 80
1977 76 74 84 78
1978 78 74 86 82
[C.A. 1979]
Solution:
Method of quarterly averages seems appropriate here, since no appreciable
trend is noticed in the given data (note that the values in any quarter do not reveal any
definite tendency to change). The calculations are shown below, using the multiplicative
model.
Table: Calculation for Seasonal Index
Year Q1 Q2 Q3 Q4 Total
1974 72 68 80 70 -
1975 76 70 82 74 -
1976 74 66 84 80 -
1977 76 74 84 78 -
1978 78 74 86 82 -
Total 376 352 416 384 1528
A.M 75.2 70.4 83.2 76.8 305.6
Seasonal Index 98 92 109 101 400
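The method of quarterly averages amounts to expressing each quarter's mean as a percentage of the grand mean. A minimal Python sketch is given below; the variable names are assumptions made for illustration.

```python
quarterly = {
    1974: [72, 68, 80, 70],
    1975: [76, 70, 82, 74],
    1976: [74, 66, 84, 80],
    1977: [76, 74, 84, 78],
    1978: [78, 74, 86, 82],
}

# Arithmetic mean of each quarter over the five years.
quarter_means = [sum(col) / len(col) for col in zip(*quarterly.values())]
grand_mean = sum(quarter_means) / len(quarter_means)

# Seasonal index = quarter mean as a percentage of the grand mean.
seasonal_index = [100 * m / grand_mean for m in quarter_means]
print([round(s) for s in seasonal_index])
```

By construction the four unrounded indices add up to 400.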
[30] Using the method of exponential smoothing, find forecasts for the following
sales data, taking an initial forecast 25 and a smoothing coefficient 0.4.
Day 1 2 3 4 5 6 7 8
Sales 26 28 23 27 24 30 26 27
[C.A. 1974]
Solution:
The exponentially smoothed average at time t is Ut = Ut–1 + αet,
where et = yt – Ut–1 is the “error”. Here we are given α = 0.4.
Table: Calculations for Exponential Smoothing
Day (t); Sales yt; Previous forecast Ut–1; Error et = yt – Ut–1; αet; Current forecast Ut
(1) (2) (3) (4) (5) (6)
1 26 25.00 1 0.4 25.40
2 28 25.40 2.6 1.04 26.44
3 23 26.44 -3.44 -1.38 25.06
4 27 25.06 1.94 0.78 25.84
5 24 25.84 -1.84 -0.74 25.10
6 30 25.10 4.90 1.96 27.06
7 26 27.06 -1.06 -0.42 26.64
8 27 26.64 0.36 0.14 26.78
Forecasted sales data are:
25.40, 26.44, 25.06, 25.84, 25.10, 27.06, 26.64, 26.78.
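The recurrence used in the table can be written as a short loop. The Python sketch below is illustrative only; the function name `exponential_smoothing` is an assumption. It carries the forecasts forward unrounded, whereas the table rounds at each step, and for this data the two agree to two decimal places.

```python
def exponential_smoothing(sales, initial, alpha):
    """Return the successive forecasts U_t = U_(t-1) + alpha * (y_t - U_(t-1))."""
    forecasts = []
    u = initial
    for y in sales:
        error = y - u                 # e_t = y_t - U_(t-1)
        u = u + alpha * error         # current forecast U_t
        forecasts.append(u)
    return forecasts

sales = [26, 28, 23, 27, 24, 30, 26, 27]
forecasts = exponential_smoothing(sales, initial=25, alpha=0.4)
print([round(u, 2) for u in forecasts])
```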
A scatter diagram indicates the nature of association between the two variables i.e.
the type of correlation between them. If the pattern of points (or dots) on the scatter
diagram shows a linear path diagonally across the graph paper from the bottom left-
hand corner to the top right, correlation will be positive. In other words, association
between the variables is direct, indicating thereby that high values of one variable are in
general, associated with high values of the other variable, and low values are associated
with low values.
Figure: Scatter diagrams of y against x showing correlation that is (a) positive, (b) negative, (c) zero, (d) zero, (e) +1, (f) –1.
On the other hand, if the pattern of dots be such as to indicate a straight line path
from the upper left-hand corner to the bottom right, correlation is negative, i.e. the
association is inverse, high values of one variable being associated with low values of
the other (fig. b).
When the dots do not indicate any straight line tendency, but form a swarm (fig. c) or
a concentration around a curved line (fig. d), correlation is small. In fact, if no straight
line tendency is noticed, correlation will be zero.
When the dots lie exactly on a straight line, correlation is perfect- the correlation
coefficient being +1 or -1, according as the slope of the straight line is positive or
negative (figs. e, f).
The scatter diagram also gives an indication of the degree of linear correlation between
the variables, i.e. whether correlation is high or low. If the plotted points on the scatter
diagram lie approximately on, or near about, a straight line (figs. e, f), the correlation
coefficient will be nearly one, numerically. The more scattered the points are around a
straight line, the less is the correlation coefficient (figs. a, b).
Correlation:
The word ‘correlation’ is used to denote the degree of association between variables.
If two variables x and y are so related that variations in the magnitude of one variable
tend to be accompanied by variations in the magnitude of the other variable, they are
said to be correlated. If y tends to increase as x increases, the variables are said to be
positively correlated. If y tends to decrease as x increases, the variables are negatively
correlated. If the values of y are not affected by changes in the values of x, the variables
are said to be uncorrelated.
Covariance:
Given a set of n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn) relating to two
variables x and y, the covariance of x and y, usually represented by cov (x, y), is defined as
\[ \operatorname{cov}(x, y) = \frac{1}{n} \sum (x - \bar{x})(y - \bar{y}) \]
moment formula and is used as a measure of linear correlation between x and y.
The formula for r may be written in various other forms:
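\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \]
\[ r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}} \]
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left\{n\sum x^2 - (\sum x)^2\right\}\left\{n\sum y^2 - (\sum y)^2\right\}}} \]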
Properties of correlation coefficient:
[i] The correlation coefficient r is independent of the choice of both origin and
scale of observations. This means that if
\[ u = \frac{x - c}{d} \quad \text{and} \quad v = \frac{y - c'}{d'} \]
where c, c′, d, d′ are arbitrary constants (d, d′ positive), then rxy = ruv
[ii] The correlation coefficient r is a pure number and is independent of the units
of measurement. This means that if, for example, x represents heights in inches and y
weights in lbs, then the correlation coefficient between x and y will be neither in inches
nor in lbs, nor in any other unit, but only a number.
[iii] The correlation coefficient r lies between -1 and +1 i.e. r cannot exceed 1
numerically.
–1 ≤ r ≤ + 1
Calculation of r
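The correlation coefficient (r) is calculated from a given set of n pairs of observations (x1, y1),
(x2, y2), ..., (xn, yn) as follows:
[i] If X = x – c and Y = y – c′ (here c, c′ are constants) then
\[ r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \]
where
\[ \sigma_X^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2, \qquad \sigma_Y^2 = \frac{\sum Y^2}{n} - \left(\frac{\sum Y}{n}\right)^2 \]
\[ \operatorname{cov}(X, Y) = \frac{\sum XY}{n} - \frac{\sum X}{n} \cdot \frac{\sum Y}{n} \]
[ii] If u = (x – c)/d and v = (y – c′)/d′ (here c, c′, d, d′ are constants and d, d′ are positive), then
\[ r_{xy} = r_{uv} = \frac{\operatorname{cov}(u, v)}{\sigma_u \sigma_v} \]
where
\[ \sigma_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2, \qquad \sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 \]
\[ \operatorname{cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n} \cdot \frac{\sum v}{n} \]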
Regression:
The word "regression" is used to denote estimation or prediction of the average value
of one variable for a specified value of the other variable. The estimation is done by
means of suitable equations, derived on the basis of available bivariate data. Such an
equation is known as Regression equation and its geometrical representation is called
a Regression curve.
In linear regression (or simple regression) the relationship between the variables is
assumed to be linear. The estimate of y (say y′) is obtained from an equation of the form
\[ y' - \bar{y} = b_{yx}(x - \bar{x}) \qquad \text{(i)} \]
and the estimate of x (say x′) from another equation of the form
\[ x' - \bar{x} = b_{xy}(y - \bar{y}) \qquad \text{(ii)} \]
Equation (i) is known as the Regression equation of y on x, and equation (ii) as the Regression
equation of x on y. The coefficient byx appearing in the regression equation of y on x
is known as the regression coefficient of y on x. Similarly bxy is called the Regression
coefficient of x on y.
[ii] Regression Equation of x on y
\[ x - \bar{x} = b_{xy}(y - \bar{y}), \quad \text{where } b_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_y^2} = r\,\frac{\sigma_x}{\sigma_y} \]
2. The product of the two regression coefficients is equal to the square of the correlation
coefficient: byx × bxy = r²
3. r, byx and bxy, all have the same sign. If the correlation coefficient r is zero, the
regression coefficients byx and bxy are also zero.
4. The regression lines always intersect at the point ( x , y ) . The slopes of the regression
line of y on x and the regression line of x on y are respectively byx and 1/bxy.
5. The angle between the two regression lines depends on the correlation coefficient r.
When r = 0 the two lines are perpendicular to each other. When r = +1 or r = –1, they
coincide. As r increases numerically from 0 to 1, the angle between the regression lines
diminishes from 90° to 0°.
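Properties 2 and 3 can be checked numerically on any small set of pairs. The Python sketch below uses five illustrative pairs of values (an assumption made for the purpose of the example) and computes r, byx and bxy from the covariance and variances defined earlier.

```python
from math import sqrt

x = [6, 2, 10, 4, 8]      # illustrative pairs of observations
y = [9, 11, 5, 8, 7]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
var_x = sum((a - mean_x) ** 2 for a in x) / n
var_y = sum((b - mean_y) ** 2 for b in y) / n

r = cov / sqrt(var_x * var_y)
b_yx = cov / var_x        # regression coefficient of y on x
b_xy = cov / var_y        # regression coefficient of x on y

print(round(r, 2), b_yx, b_xy, round(b_yx * b_xy, 3), round(r * r, 3))
```

All three quantities share the sign of the covariance, which is why r, byx and bxy always have the same sign.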
Rank Correlation:
The correlation coefficient between the two series of ranks is called ‘Rank Correlation
Coefficient’ . It is given by the formula,
\[ R = 1 - \frac{6 \sum d^2}{n^3 - n} \qquad (1) \]
where d represents the difference of the ranks of an individual in the two characters
and n is the number of individuals. This formula is also known as Spearman’s formula
for the rank correlation coefficient.
The rank correlation coefficient lies between –1 and +1.
–1 ≤ R ≤ + 1
It has the maximum value +1, when the ranks in the two characters are equal. Again R
has the minimum value -1, when the ranks are just the opposite.
In the calculation of the rank correlation coefficient from given scores, if several individuals have the same score in any character, they must be allotted the same rank, and we are then concerned with what are known as ‘tied ranks’. In dealing with such cases, the usual way is to allot the average rank to each of these individuals, and then calculate the product-moment correlation coefficient from these ranks. However, in such cases one way of correcting formula (1) is to increase $\sum d^2$ by $(t^3 - t)/12$ in respect of each tie, where t denotes the number of individuals involved in a tie, whether in the first or second series. The modified formula for the rank correlation coefficient, when there are ties, is then
$$R' = 1 - \frac{6\left\{\sum d^2 + \sum (t^3 - t)/12\right\}}{n^3 - n}$$
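The tie-corrected formula can be sketched in code as follows. This is a minimal illustration, not a reference implementation: the helper names `average_ranks` and `rank_correlation` are assumptions of this sketch, and rank 1 is given to the largest score, as in the worked exercises of this chapter.

```python
from collections import Counter

def average_ranks(values):
    """Rank values in descending order (largest = rank 1); tied values
    receive the average of the ranks they would jointly occupy."""
    first_position = {}
    for position, value in enumerate(sorted(values, reverse=True), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def rank_correlation(series_a, series_b):
    """Spearman's R with (t^3 - t)/12 added to sum(d^2) for every tie."""
    n = len(series_a)
    ranks_a, ranks_b = average_ranks(series_a), average_ranks(series_b)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_a, ranks_b))
    correction = sum((t ** 3 - t) / 12
                     for series in (series_a, series_b)
                     for t in Counter(series).values() if t > 1)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)
```

When there are no ties the correction term is zero and the function reduces to formula (1).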
Exercise
[1] The data given below relate to the heights and weights of 20 persons. You are
required to form a two-way frequency table with class-intervals 62" to 64", 64" to 66"
and so on, and 115 to 125 lbs, 125 to 135 lbs, and so on.
Sl no. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Height 70 65 65 64 69 63 65 70 71 62 70 67 63 68 67 69 66 68 67 67
weight 170 135 136 137 148 124 117 128 143 129 163 139 122 134 140 132 120 148 129 152
[C.A. 1966]
Solution:
Table: Two-way frequency table showing height and weight of 20 persons.
                      Height (inches)
Weight (lbs.)   62-64  64-66  66-68  68-70  70-72  Total
115-125           2      1      1      -      -      4
125-135           1      -      1      2      1      5
135-145           -      3      2      -      1      6
145-155           -      -      1      2      -      3
155-165           -      -      -      -      1      1
165-175           -      -      -      -      1      1
Total             3      4      5      4      4     20
Note: the class intervals 62-64, 64-66, etc. represent “62 and above but below 64”, “64 and above but below 66”, etc. A dash marks an empty cell.
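The tallying behind such a two-way table can be sketched as below, using the data of exercise [1]. The helper name `class_start` and the use of `collections.Counter` are choices made for this illustration only.

```python
from collections import Counter

heights = [70, 65, 65, 64, 69, 63, 65, 70, 71, 62,
           70, 67, 63, 68, 67, 69, 66, 68, 67, 67]
weights = [170, 135, 136, 137, 148, 124, 117, 128, 143, 129,
           163, 139, 122, 134, 140, 132, 120, 148, 129, 152]

def class_start(value, lowest, width):
    """Lower limit of the class containing value, where each class means
    'lower limit and above but below the next lower limit'."""
    return lowest + ((value - lowest) // width) * width

# Each cell is keyed by (weight class start, height class start).
cells = Counter(
    (class_start(w, 115, 10), class_start(h, 62, 2))
    for h, w in zip(heights, weights)
)

# Marginal totals for the height classes.
height_totals = Counter()
for (_, h_class), count in cells.items():
    height_totals[h_class] += count
```

For instance, `cells[(135, 64)]` counts the persons weighing 135 to below 145 lbs whose height is 64 to below 66 inches.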
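[2] Calculate r from the following given results:
n = 10; Σx = 125; Σx² = 1585; Σy = 80; Σy² = 650; Σxy = 1007.
[C.A. - 1966]
Solution:
$$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \frac{\sum x}{n}\cdot\frac{\sum y}{n} = \frac{1007}{10} - \frac{125}{10}\times\frac{80}{10} = \frac{10070 - 10000}{100} = \frac{70}{100} = \frac{7}{10}$$
$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{1585}{10} - \left(\frac{125}{10}\right)^2} = \sqrt{\frac{15850 - 15625}{100}} = \sqrt{\frac{225}{100}} = \frac{15}{10}$$
$$\sigma_y = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{650}{10} - \left(\frac{80}{10}\right)^2} = \sqrt{\frac{6500 - 6400}{100}} = \sqrt{\frac{100}{100}} = 1$$
$$\therefore r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{7/10}{(15/10) \times 1} = \frac{7}{15} = 0.47$$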
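The same calculation from totals can be sketched in a few lines. The function name `r_from_sums` and its argument names are assumptions of this illustration; it simply applies the covariance and standard-deviation formulas used in the solution above.

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Correlation coefficient from the totals of n pairs of observations."""
    cov = sum_xy / n - (sum_x / n) * (sum_y / n)
    sd_x = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)
    sd_y = math.sqrt(sum_y2 / n - (sum_y / n) ** 2)
    return cov / (sd_x * sd_y)

# Totals of exercise [2].
r = r_from_sums(10, 125, 80, 1585, 650, 1007)
print(round(r, 2))
```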
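Solution :
$$\mathrm{Cov}(X, Y) = \frac{\sum XY}{N} - \frac{\sum X}{N}\cdot\frac{\sum Y}{N} = \frac{230.42}{8} - \frac{42.2}{8}\times\frac{46.4}{8} = \frac{1843.36 - 1958.08}{64} = -\frac{114.72}{64}$$
$$\sigma_x = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{291.20}{8} - \left(\frac{42.2}{8}\right)^2} = \sqrt{\frac{2329.60 - 1780.84}{64}} = \sqrt{\frac{548.76}{64}} = \frac{23.43}{8}$$
$$\sigma_y = \sqrt{\frac{\sum Y^2}{N} - \left(\frac{\sum Y}{N}\right)^2} = \sqrt{\frac{290.52}{8} - \left(\frac{46.4}{8}\right)^2} = \sqrt{\frac{2324.16 - 2152.96}{64}} = \sqrt{\frac{171.20}{64}} = \frac{13.08}{8}$$
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{-114.72/64}{\dfrac{23.43}{8}\times\dfrac{13.08}{8}} = -\frac{114.72}{306.46} = -0.37$$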
[4] Obtain the correlation coefficient from the following:
x 6 2 10 4 8
y 9 11 5 8 7
[D.S.W. 1977]
Solution:
Table - Calculations for correlation coefficient
x y X=x–6 Y=y–8 X2 Y2 XY
6 9 0 1 0 1 0
2 11 -4 3 16 9 -12
10 5 4 -3 16 9 -12
4 8 -2 0 4 0 0
8 7 2 -1 4 1 -2
Total 30 40 0 0 40 20 -26
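$$\mathrm{Cov}(X, Y) = \frac{\sum XY}{N} - \frac{\sum X}{N}\cdot\frac{\sum Y}{N} = \frac{-26}{5} - 0 \times 0 = -5.2$$
$$\sigma_X = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{40}{5} - \left(\frac{0}{5}\right)^2} = \sqrt{8}$$
$$\sigma_Y = \sqrt{\frac{\sum Y^2}{N} - \left(\frac{\sum Y}{N}\right)^2} = \sqrt{\frac{20}{5} - \left(\frac{0}{5}\right)^2} = \sqrt{4} = 2$$
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{-5.2}{\sqrt{8} \times 2} = -\frac{5.2}{5.66} = -0.92$$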
[5] Calculate the coefficient of correlation for the ages of husband and wife:
Age of husband 23 27 28 29 30 31 33 35 36 39
Age of wife 18 22 23 24 25 26 28 29 30 32
[I.C.W.A. 1970]
Solution:
Table - Calculations for Correlation Coefficient
x y X = x – 31 Y = y – 26 X2 Y2 XY
23 18 -8 -8 64 64 64
27 22 -4 -4 16 16 16
28 23 -3 -3 9 9 9
29 24 -2 -2 4 4 4
30 25 -1 -1 1 1 1
31 26 0 0 0 0 0
33 28 2 2 4 4 4
35 29 4 3 16 9 12
36 30 5 4 25 16 20
39 32 8 6 64 36 48
Total - - 1 -3 203 159 178
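$$\mathrm{Cov}(X, Y) = \frac{\sum XY}{N} - \frac{\sum X}{N}\cdot\frac{\sum Y}{N} = \frac{178}{10} - \frac{1}{10}\times\frac{-3}{10} = \frac{1780 + 3}{100} = \frac{1783}{100}$$
$$\sigma_X^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2 = \frac{203}{10} - \left(\frac{1}{10}\right)^2 = \frac{2030 - 1}{100} = \frac{2029}{100}$$
$$\sigma_Y^2 = \frac{\sum Y^2}{N} - \left(\frac{\sum Y}{N}\right)^2 = \frac{159}{10} - \left(\frac{-3}{10}\right)^2 = \frac{1590 - 9}{100} = \frac{1581}{100}$$
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{1783/100}{\sqrt{\dfrac{2029}{100}}\times\sqrt{\dfrac{1581}{100}}} = \frac{1783}{\sqrt{2029}\times\sqrt{1581}} = \frac{1783}{45.04 \times 39.76} = \frac{1783}{1790.79} = 0.996$$
∴ Correlation coefficient for ages of husband and wife (r) = 0.996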
[6] Calculate the correlation coefficient rxy from the following:
x 65 66 67 67 68 69 70 72
y 67 68 65 68 72 72 69 71
[D.M., 1978]
Solution :
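Table : Calculations for correlation coefficient
x y X = x – 67 Y = y – 72 X2 Y2 XY
65 67 -2 -5 4 25 10
66 68 -1 -4 1 16 4
67 65 0 -7 0 49 0
67 68 0 -4 0 16 0
68 72 1 0 1 0 0
69 72 2 0 4 0 0
70 69 3 -3 9 9 -9
72 71 5 -1 25 1 -5
Total - - 8 -24 44 116 0
$$\mathrm{Cov}(X, Y) = \frac{\sum XY}{N} - \frac{\sum X}{N}\cdot\frac{\sum Y}{N} = \frac{0}{8} - \frac{8}{8}\times\frac{-24}{8} = 3$$
$$\sigma_X^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2 = \frac{44}{8} - \left(\frac{8}{8}\right)^2 = \frac{352 - 64}{64} = \frac{288}{64} = 4.50, \qquad \sigma_X = \sqrt{4.50} = 2.12$$
$$\sigma_Y^2 = \frac{\sum Y^2}{N} - \left(\frac{\sum Y}{N}\right)^2 = \frac{116}{8} - \left(\frac{-24}{8}\right)^2 = \frac{928 - 576}{64} = \frac{352}{64} = 5.50, \qquad \sigma_Y = \sqrt{5.50} = 2.35$$
$$r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{3}{2.12 \times 2.35} = \frac{3}{4.98} = 0.60$$
Since the correlation coefficient is unaffected by a change of origin, $r_{xy} = r_{XY} = 0.60$.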
[7] Calculate the coefficient of correlation between x and y :
x 155 157 153 151 159 162 158
y 118 129 125 124 129 133 127
[C.U.B.A.(Econ), 1975]
Solution :
Table : Calculation for Correlation Coefficient
x y u = x – 155 v = y – 124 u2 v2 uv
155 118 0 -6 0 36 0
157 129 2 5 4 25 10
153 125 -2 1 4 1 -2
151 124 -4 0 16 0 0
159 129 4 5 16 25 20
162 133 7 9 49 81 63
158 127 3 3 9 9 9
Total - - 10 17 98 177 100
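$$\mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n}\cdot\frac{\sum v}{n} = \frac{100}{7} - \frac{10}{7}\times\frac{17}{7} = \frac{700 - 170}{49} = \frac{530}{49}$$
$$\sigma_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2 = \frac{98}{7} - \left(\frac{10}{7}\right)^2 = \frac{686 - 100}{49} = \frac{586}{49}, \qquad \sigma_u = \sqrt{\frac{586}{49}} = \frac{\sqrt{586}}{7}$$
$$\sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{177}{7} - \left(\frac{17}{7}\right)^2 = \frac{1239 - 289}{49} = \frac{950}{49}, \qquad \sigma_v = \sqrt{\frac{950}{49}} = \frac{\sqrt{950}}{7}$$
$$r = \frac{\mathrm{Cov}(u, v)}{\sigma_u \sigma_v} = \frac{530/49}{\dfrac{\sqrt{586}}{7}\times\dfrac{\sqrt{950}}{7}} = \frac{530}{\sqrt{586}\times\sqrt{950}} = \frac{530}{24.21 \times 30.82} = \frac{530}{746.15} = 0.71$$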
[8] Calculate Pearson’s coefficient of correlation from the following data using 44
and 26 as the origins of X and Y respectively.
X 43 44 46 40 44 42 45 42 38 40 42 57
Y 29 31 19 18 19 27 27 29 41 30 26 10
[C.A. 1978]
Solution:
Table - Calculations for correlation Coefficient
X Y x = X – 44 y = Y – 26 x2 y2 xy
43 29 -1 3 1 9 -3
44 31 0 5 0 25 0
46 19 2 -7 4 49 -14
40 18 -4 -8 16 64 32
44 19 0 -7 0 49 0
42 27 -2 1 4 1 -2
45 27 1 1 1 1 1
42 29 -2 3 4 9 -6
38 41 -6 15 36 225 -90
40 30 -4 4 16 16 -16
42 26 -2 0 4 0 0
57 10 13 -16 169 256 -208
Total - - -5 -6 255 704 -306
Solution:
Table - Calculations for Correlation Coefficient
x y u = x – 58 v = y – 3.2 u2 v2 uv
47 2.7 -11 -0.5 121 0.25 5.5
50 2.7 -8 -0.5 64 0.25 4.0
52 2.8 -6 -0.4 36 0.16 2.4
52 2.8 -6 -0.4 36 0.16 2.4
54 2.9 -4 -0.3 16 .09 1.2
56 3.2 -2 0 4 0 0
58 3.2 0 0 0 0 0
59 3.3 1 0.1 1 0.01 0.1
60 3.4 2 0.2 4 0.04 0.4
60 3.5 2 0.3 4 0.09 0.6
62 3.5 4 0.3 16 0.09 1.2
64 3.6 6 0.4 36 0.16 2.4
65 3.7 7 0.5 49 0.25 3.5
66 3.8 8 0.6 64 0.36 4.8
Total - - -7 0.3 451 1.91 28.5
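$$\mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n}\cdot\frac{\sum v}{n} = \frac{28.5}{14} - \frac{-7}{14}\times\frac{0.3}{14} = \frac{399 + 2.1}{196} = \frac{401.1}{196}$$
$$\sigma_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2 = \frac{451}{14} - \left(\frac{-7}{14}\right)^2 = \frac{6314 - 49}{196} = \frac{6265}{196}, \qquad \sigma_u = \frac{\sqrt{6265}}{14}$$
$$\sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{1.91}{14} - \left(\frac{0.3}{14}\right)^2 = \frac{26.74 - 0.09}{196} = \frac{26.65}{196}, \qquad \sigma_v = \frac{\sqrt{26.65}}{14}$$
$$\therefore r = \frac{\mathrm{Cov}(u, v)}{\sigma_u \sigma_v} = \frac{401.1/196}{\dfrac{\sqrt{6265}}{14}\times\dfrac{\sqrt{26.65}}{14}} = \frac{401.1}{\sqrt{6265}\times\sqrt{26.65}} = \frac{401.1}{79.15 \times 5.16} = 0.98$$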
Solution:
Table - Calculations for Correlation Coefficient
x y u = x – 55 v = y – 58 u2 v2 uv
42 56 -13 -2 169 4 26
44 49 -11 -9 121 81 99
58 53 3 -5 9 25 -15
55 58 0 0 0 0 0
89 65 34 7 1156 49 238
98 76 43 18 1849 324 774
66 58 11 0 121 0 0
Total - - 67 9 3425 483 1122
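$$\mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n}\cdot\frac{\sum v}{n} = \frac{1122}{7} - \frac{67}{7}\times\frac{9}{7} = \frac{7854 - 603}{49} = \frac{7251}{49}$$
$$\sigma_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2 = \frac{3425}{7} - \left(\frac{67}{7}\right)^2 = \frac{23975 - 4489}{49} = \frac{19486}{49}, \qquad \sigma_u = \frac{\sqrt{19486}}{7}$$
$$\sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{483}{7} - \left(\frac{9}{7}\right)^2 = \frac{3381 - 81}{49} = \frac{3300}{49}, \qquad \sigma_v = \frac{\sqrt{3300}}{7}$$
$$\therefore r = \frac{\mathrm{Cov}(u, v)}{\sigma_u \sigma_v} = \frac{7251/49}{\dfrac{\sqrt{19486}}{7}\times\dfrac{\sqrt{3300}}{7}} = \frac{7251}{139.59 \times 57.45} = \frac{7251}{8019.45} = 0.90$$
$$\text{Standard Error} = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - 0.90^2}{\sqrt{7}} = \frac{1 - 0.81}{2.65} = \frac{0.19}{2.65} = 0.07$$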
[11] Determine the correlation coefficient between x and y
x 5 7 9 11 13 15
y 1.7 2.4 2.8 3.4 3.7 4.4
[Dip Management 1967]
Solution:
Table - Calculations for Correlation coefficient
x y u=x–9 v = y – 2.8 u2 v2 uv
5 1.7 -4 -1.1 16 1.21 4.4
7 2.4 -2 -0.4 4 0.16 0.8
9 2.8 0 0 0 0 0
11 3.4 2 0.6 4 0.36 1.2
13 3.7 4 0.9 16 0.81 3.6
15 4.4 6 1.6 36 2.56 9.6
Total - - 6 1.6 76 5.10 19.6
income and the general level of prices:
Income (X) 360 420 500 550 600 640 680 720 750
General Level of Prices (Y) 100 104 115 160 180 290 300 320 330
[C.U., M.com 1968]
Solution:
Table - Calculations for correlation coefficient
X Y u = (X − 600)/10 v = Y – 180 u2 v2 uv
360 100 -24 -80 576 6400 1920
420 104 -18 -74 324 5476 1332
500 115 -10 -65 100 4225 650
550 160 -5 -20 25 400 100
600 180 0 0 0 0 0
640 290 4 110 16 12100 440
680 300 8 120 64 14400 960
720 320 12 140 144 19600 1680
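$$\sigma_u = \sqrt{\frac{12942}{81}} = \frac{\sqrt{12942}}{9}$$
$$\sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{85101}{9} - \left(\frac{281}{9}\right)^2 = \frac{765909 - 78961}{81} = \frac{686948}{81}, \qquad \sigma_v = \frac{\sqrt{686948}}{9}$$
$$r = \frac{\mathrm{Cov}(u, v)}{\sigma_u \sigma_v} = \frac{89046/81}{\dfrac{\sqrt{12942}}{9}\times\dfrac{\sqrt{686948}}{9}} = \frac{89046}{\sqrt{12942}\times\sqrt{686948}} = \frac{89046}{113.76 \times 828.82} = \frac{89046}{94286.56} = 0.94$$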
[13] The following data give the hardness (x) and tensile strength (y) for some specimens of a material in certain units. Find the correlation coefficient and calculate its probable error:
x 23.3 17.5 17.8 20.7 18.1 20.9 22.9 20.8
y 4.2 3.8 4.6 3.2 5.2 4.7 4.4 5.6
[I.C.W.A., 1972]
Solution :
Table - calculations for Correlation coefficient
x y u = (x − 20.7)/0.1 v = (y − 3.2)/0.1 u2 v2 uv
23.3 4.2 26 10 676 100 260
17.5 3.8 -32 6 1024 36 -192
17.8 4.6 -29 14 841 196 -406
20.7 3.2 0 0 0 0 0
18.1 5.2 -26 20 676 400 -520
20.9 4.7 2 15 4 225 30
22.9 4.4 22 12 484 144 264
20.8 5.6 1 24 1 576 24
Total - - -36 101 3706 1677 -540
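$$\sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{1677}{8} - \left(\frac{101}{8}\right)^2 = \frac{13416 - 10201}{64} = \frac{3215}{64}, \qquad \sigma_v = \frac{\sqrt{3215}}{8}$$
$$r = \frac{\mathrm{Cov}(u, v)}{\sigma_u \sigma_v} = \frac{-684/64}{\dfrac{\sqrt{28352}}{8}\times\dfrac{\sqrt{3215}}{8}} = -\frac{684}{\sqrt{28352}\times\sqrt{3215}} = -\frac{684}{168.38 \times 56.70} = -\frac{684}{9547.15} = -0.072$$
$$\text{Probable Error} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (-0.072)^2}{\sqrt{8}} = 0.6745 \times \frac{1 - 5184/1000000}{2.83} = 0.6745 \times \frac{0.994816}{2.83} = 0.237$$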
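The standard error and probable error used in the exercises above follow directly from r and n. The sketch below assumes the formulas as they appear in the solutions, S.E. = (1 − r²)/√n and P.E. = 0.6745 × S.E.; the function names are chosen for this illustration.

```python
import math

def standard_error(r, n):
    """Standard error of r for n pairs: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """Probable error of r: 0.6745 times the standard error."""
    return 0.6745 * standard_error(r, n)

print(round(standard_error(0.90, 7), 2))     # r = 0.90, n = 7
print(round(probable_error(-0.072, 8), 3))   # exercise [13]: r = -0.072, n = 8
```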
[14] The following table gives the savings bank deposits in billions of dollars and strikes and lock-outs in thousands over a number of years. Compute the correlation coefficient and comment on the result.
Saving deposits 5.1 5.4 5.5 5.9 6.5 6.0 7.2
Strikes and lock-outs 3.8 4.4 3.3 3.6 3.3 2.3 1.0
[I.C.W.A., 1964]
Solution:
Table - Calculations for correlation coefficient
x y u = (x − 5.9)/0.1 v = (y − 3.6)/0.1 u2 v2 uv
5.1 3.8 –8 2 64 4 –16
5.4 4.4 –5 8 25 64 –40
5.5 3.3 –4 –3 16 9 12
5.9 3.6 0 0 0 0 0
6.5 3.3 6 –3 36 9 –18
6.0 2.3 1 –13 1 169 –13
7.2 1.0 13 –26 169 676 –338
Total – – 3 –35 311 931 –413
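$$\mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n}\cdot\frac{\sum v}{n} = \frac{-413}{7} - \frac{3}{7}\times\frac{-35}{7} = \frac{-2891 + 105}{49} = \frac{-2786}{49}$$
$$\sigma_v = \sqrt{\frac{5292}{49}} = \frac{72.75}{7}$$
$$r = \frac{\mathrm{Cov}(u, v)}{\sigma_u \sigma_v} = \frac{-2786/49}{\dfrac{46.56}{7}\times\dfrac{72.75}{7}} = \frac{-2786}{46.56 \times 72.75} = \frac{-2786}{3387.24} = -0.82$$
Savings deposits and strikes and lock-outs are strongly negatively correlated: years with higher deposits tend to have fewer strikes and lock-outs.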
[15] Two positively correlated variables $x_1$ and $x_2$ have variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Determine the value of the constant ‘a’ such that $x_1 + a x_2$ and $x_1 + \dfrac{\sigma_1}{\sigma_2} x_2$ are uncorrelated.
[B.U., B.A (Econ), 1972]
Solution :
Here, $u = x_1 + a x_2$ and $v = x_1 + \dfrac{\sigma_1}{\sigma_2} x_2$.
If u, v are uncorrelated then
$$r = \frac{\mathrm{Cov}(u, v)}{\sigma_u \sigma_v} = 0 \quad \text{or,} \quad \mathrm{Cov}(u, v) = \frac{1}{n}\sum (u - \bar{u})(v - \bar{v}) = 0$$
$$\text{or,} \quad \frac{1}{n}\sum \left\{(x_1 - \bar{x}_1) + a(x_2 - \bar{x}_2)\right\}\left\{(x_1 - \bar{x}_1) + \frac{\sigma_1}{\sigma_2}(x_2 - \bar{x}_2)\right\} = 0$$
$$\text{or,} \quad \sigma_1^2 + \frac{\sigma_1}{\sigma_2}\,\mathrm{cov}(x_1, x_2) + a\,\mathrm{cov}(x_1, x_2) + a\,\frac{\sigma_1}{\sigma_2}\,\sigma_2^2 = 0$$
Writing $\mathrm{cov}(x_1, x_2) = \rho\,\sigma_1\sigma_2$, where $\rho > 0$ is the correlation coefficient between $x_1$ and $x_2$,
$$\sigma_1^2 + \rho\,\sigma_1^2 + a\,\rho\,\sigma_1\sigma_2 + a\,\sigma_1\sigma_2 = 0 \quad \text{or,} \quad \sigma_1 (1 + \rho)(\sigma_1 + a\,\sigma_2) = 0$$
Since $\sigma_1 > 0$ and $1 + \rho > 0$, we must have $\sigma_1 + a\,\sigma_2 = 0$,
$$\text{or,} \quad a = -\frac{\sigma_1}{\sigma_2}$$
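[16] Given Σx = 56, Σy = 40, Σx² = 524, Σy² = 256, Σxy = 364, n = 8.
Find (i) the correlation coefficient and (ii) the regression equation of x on y.
[I.C.W.A, 1967]
Solution:
$$\text{(i)} \quad \mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \frac{\sum x}{n}\cdot\frac{\sum y}{n} = \frac{364}{8} - \frac{56}{8}\times\frac{40}{8} = \frac{2912 - 2240}{64} = \frac{672}{64}$$
$$\sigma_x^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{524}{8} - \left(\frac{56}{8}\right)^2 = \frac{4192 - 3136}{64} = \frac{1056}{64}, \qquad \therefore \sigma_x = \sqrt{\frac{1056}{64}} = \frac{32.50}{8}$$
$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{256}{8} - \left(\frac{40}{8}\right)^2 = \frac{2048 - 1600}{64} = \frac{448}{64}, \qquad \therefore \sigma_y = \sqrt{\frac{448}{64}} = \frac{21.17}{8}$$
$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{672/64}{\dfrac{32.50}{8}\times\dfrac{21.17}{8}} = \frac{672}{688.03} = 0.98$$
(ii) The regression equation of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$, where $b_{xy} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_y^2}$.
$$\text{Here,} \quad \bar{x} = \frac{\sum x}{n} = \frac{56}{8} = 7, \qquad \bar{y} = \frac{\sum y}{n} = \frac{40}{8} = 5$$
$$\mathrm{Cov}(x, y) = \frac{672}{64} \quad \text{and} \quad \sigma_y^2 = \frac{448}{64}, \quad \text{as in (i) above}$$
$$\therefore b_{xy} = \frac{672/64}{448/64} = \frac{672}{448} = 1.5$$
∴ The regression equation of x on y is then x – 7 = 1.5(y – 5)
or, x = 1.5y – 7.5 + 7 ∴ x = 1.5y – 0.5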
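Part (ii) can be reproduced from the totals alone. The following sketch is illustrative; the function name `regression_x_on_y` and its argument names are assumptions, and it returns the slope $b_{xy}$ together with the intercept of the line x = slope × y + intercept.

```python
def regression_x_on_y(n, sum_x, sum_y, sum_y2, sum_xy):
    """Slope b_xy and intercept of the regression line of x on y."""
    mean_x, mean_y = sum_x / n, sum_y / n
    cov = sum_xy / n - mean_x * mean_y
    var_y = sum_y2 / n - mean_y ** 2
    slope = cov / var_y
    # The line passes through the point of means (mean_x, mean_y).
    intercept = mean_x - slope * mean_y
    return slope, intercept

# Totals of exercise [16].
slope, intercept = regression_x_on_y(8, 56, 40, 256, 364)
print(slope, intercept)
```

The regression of y on x is obtained in the same way by interchanging the roles of x and y and dividing the covariance by the variance of x.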
[17] The following sums have been obtained from 100 observations - pairs:
∑x = 12,500, ∑y = 8,000, ∑x2 = 1,585,000, ∑y2 = 648,100, ∑xy = 1,007,425
(i) Find the regression of y on x, and estimate the value of y when x=130
(ii) Compute the correlation coefficient (r) between x and y and state what you
learn from the value of r obtained by you,
[C.U., B.A.(Econ), 1976]
Solution:
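[i] The regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$, where $b_{yx} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x^2}$.
$$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \frac{\sum x}{n}\cdot\frac{\sum y}{n} = \frac{1{,}007{,}425}{100} - \frac{12{,}500}{100}\times\frac{8{,}000}{100} = \frac{1{,}007{,}425 - 1{,}000{,}000}{100} = \frac{7425}{100} = 74.25$$
$$\sigma_x^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{1{,}585{,}000}{100} - \left(\frac{12{,}500}{100}\right)^2 = 15850 - 15625 = 225$$
$$b_{yx} = \frac{74.25}{225} = 0.33$$
$$\bar{x} = \frac{\sum x}{n} = \frac{12{,}500}{100} = 125, \qquad \bar{y} = \frac{\sum y}{n} = \frac{8{,}000}{100} = 80$$
∴ The regression equation of y on x is y – 80 = 0.33(x – 125)
Or, y = 0.33x – 41.25 + 80
∴ y = 0.33x + 38.75
when x = 130, y = 0.33 × 130 + 38.75 = 42.9 + 38.75 = 81.65
$$\text{(ii)} \quad \sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{648{,}100}{100} - \left(\frac{8{,}000}{100}\right)^2 = 6481 - 6400 = 81$$
∴ $\sigma_y = \sqrt{81} = 9$, and $\sigma_x^2 = 225$ from (i), so $\sigma_x = 15$.
$$\text{We know,} \quad b_{yx} = r \times \frac{\sigma_y}{\sigma_x} \quad \therefore 0.33 = r \times \frac{9}{15} \quad \text{or,} \quad r = \frac{0.33 \times 15}{9} = 0.55$$
The variables x and y are moderately positively correlated.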
[18] Given the following totals for 10 pairs of observations on two characters x and y, obtain the two regression equations and hence calculate the correlation coefficient: ∑x = 12, ∑y = 4, ∑x² = 16.20, ∑y² = 1.96, ∑xy = 5.2
[M.B.A. 1979]
Solution:
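The regression equation of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$, where $b_{yx} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x^2}$.
$$\bar{y} = \frac{\sum y}{n} = \frac{4}{10} = 0.4, \qquad \bar{x} = \frac{\sum x}{n} = \frac{12}{10} = 1.2$$
$$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \frac{\sum x}{n}\times\frac{\sum y}{n} = \frac{5.2}{10} - \frac{12}{10}\times\frac{4}{10} = \frac{52 - 48}{100} = \frac{4}{100}$$
$$\sigma_x^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{16.20}{10} - \left(\frac{12}{10}\right)^2 = \frac{162 - 144}{100} = \frac{18}{100}$$
$$\therefore b_{yx} = \frac{4/100}{18/100} = \frac{4}{18} = 0.222$$
∴ y – 0.4 = 0.222(x – 1.2)
or, y = 0.222x – 0.267 + 0.4 = 0.222x + 0.133
The regression equation of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$, where $b_{xy} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_y^2}$.
$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{1.96}{10} - \left(\frac{4}{10}\right)^2 = \frac{19.6 - 16}{100} = \frac{3.6}{100}$$
$$\therefore b_{xy} = \frac{4/100}{3.6/100} = \frac{4}{3.6} = 1.11$$
The regression equation of x on y is then x – 1.2 = 1.11(y – 0.4)
Or, x = 1.11y – 0.444 + 1.2 = 1.11y + 0.756
$$\text{We know,} \quad b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad \therefore \frac{4}{18} = r \times \frac{\sqrt{3.6/100}}{\sqrt{18/100}} = r \times \sqrt{\frac{3.6}{18}}$$
$$\text{or,} \quad \frac{2}{9} = r\,\sqrt{\frac{3.6}{18}} \quad \text{or,} \quad r^2 \times \frac{3.6}{18} = \frac{4}{81} \quad \text{or,} \quad r^2 = \frac{72}{291.6} = 0.247$$
∴ r = 0.50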
[19] Estimate from the information given below, the probable crop yield, when
rainfall is 29 inches:-
Mean S.D
Rain fall in inches 25 3
$$\therefore b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.65 \times \frac{6}{3} = 1.30$$
The regression equation of y on x is
y − y = byx ( x − x )
y – 40 = 1.30 (x – 25)
y = 1.30x – 32.50 + 40
= 1.30x + 7.50
When x=29'', y=1.30×29+7.50=37.7+7.50=45.2 unit per acre.
[20] The correlation coefficient between two variates x and y is r = 0.60. If $\sigma_x = 1.50$, $\sigma_y = 2.00$, $\bar{x} = 10$, $\bar{y} = 20$, find the equations of the regression lines of (i) y on x (ii) x on y.
[I.C.W.A. 1977]
Solution:
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.60 \times \frac{2}{1.50} = \frac{1.20}{1.50} = 0.8$$
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.60 \times \frac{1.5}{2} = \frac{0.9}{2} = 0.45$$
Therefore, the regression equation of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
y – 20 = 0.8(x – 10) = 0.8x – 8
∴ y = 0.8x – 8 + 20 = 0.8x + 12
The regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y}) = 0.45(y - 20)$
x – 10 = 0.45y – 9 or, x = 0.45y + 1
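When only the means, standard deviations and r are known, both regression lines follow mechanically. The sketch below is an illustration with an assumed function name, `regression_lines`; each line is returned as a (slope, intercept) pair.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((b_yx, intercept), (b_xy, intercept)) for the regression
    lines of y on x and of x on y respectively."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    y_on_x = (b_yx, mean_y - b_yx * mean_x)
    x_on_y = (b_xy, mean_x - b_xy * mean_y)
    return y_on_x, x_on_y

# Values of exercise [20].
y_on_x, x_on_y = regression_lines(10, 20, 1.50, 2.00, 0.60)
print(y_on_x)  # slope and intercept of y on x
print(x_on_y)  # slope and intercept of x on y
```

An estimate is then obtained by substituting into the appropriate line: y on x for estimating y from x, and x on y for estimating x from y.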
[21] The following data pertain to the marks in two subject, say A and B.
Mean marks in A = 39.5, Mean marks in B = 47.5
S.D. of marks in A = 10.8, S.D. of marks in B = 16.8
Coefficient of correlation between marks in A and B = 0.42. Obtain the equations
of two regression lines and then estimate the marks in B for candidates who secured 50
marks in A.
[I.C.W.A. 1978]
Solution:
Let the marks in A be denoted by x and the marks in B be denoted by y.
∴ $\bar{x} = 39.5$, $\bar{y} = 47.5$, $\sigma_x = 10.8$, $\sigma_y = 16.8$, r = 0.42.
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.42 \times \frac{16.8}{10.8} = \frac{7.056}{10.8} = 0.65$$
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.42 \times \frac{10.8}{16.8} = \frac{4.536}{16.8} = 0.27$$
Therefore, the regression equation of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$
y – 47.5 = 0.65 (x – 39.5)
y = 0.65x – 25.68 + 47.5 = 0.65x + 21.82
The regression equation of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$
x – 39.5 = 0.27 (y – 47.5)
x = 0.27y – 12.82 + 39.5 = 0.27y + 26.68
Here the marks in B (y) are to be estimated for x = 50, so the regression equation of y on x is used:
∴ y = 0.65 × 50 + 21.82 = 32.5 + 21.82 = 54.32 = 54 (approx.)
∴ Marks in B = 54
[22] Given the following results of the height and weight of 1000 men students: $\bar{x} = 68$ inches, $\bar{y} = 150$ lbs, r = 0.60, $\sigma_x = 2.50$ inches, $\sigma_y = 20.00$ lbs. John Doe weighs 200 lbs, Richard Roe is five feet tall. Estimate the height of Doe from his weight and the weight of Roe from his height.
[C.U.M.Com, 1976]
Solution:
Let height of students = x inches
weight of students = y lbs
Therefore, $\bar{x} = 68$ inches, $\bar{y} = 150$ lbs,
$\sigma_x = 2.5$ inches, $\sigma_y = 20.00$ lbs.
The regression equation of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$ - - - - (i)
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.60 \times \frac{20}{2.5} = \frac{12}{2.5} = 4.8$$
Putting the value of $b_{yx}$ in (i) we get
y – 150 = 4.8(x – 68)
y = 4.8x – 326.40 + 150 = 4.8x – 176.40
The regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$ - - - - (ii)
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.60 \times \frac{2.5}{20} = \frac{1.5}{20} = 0.075$$
Putting the value of $b_{xy}$ in equation (ii) we get,
x – 68 = 0.075(y – 150)
x = 0.075y – 11.25 + 68 = 0.075y + 56.75
when weight of John Doe is 200 lbs. i. e. y = 200 lbs.
8 5 64 25 40
9 7 81 49 63
11 8 121 64 88
14 9 196 81 126
Total 56 40 524 256 364
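$$\bar{X} = \frac{\sum X}{n} = \frac{56}{8} = 7, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{40}{8} = 5$$
$$\sigma_X^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{524}{8} - \left(\frac{56}{8}\right)^2 = \frac{4192 - 3136}{64} = \frac{1056}{64}$$
$$\sigma_Y^2 = \frac{\sum Y^2}{n} - \left(\frac{\sum Y}{n}\right)^2 = \frac{256}{8} - \left(\frac{40}{8}\right)^2 = \frac{2048 - 1600}{64} = \frac{448}{64}$$
$$\mathrm{Cov}(X, Y) = \frac{\sum XY}{n} - \frac{\sum X}{n}\times\frac{\sum Y}{n} = \frac{364}{8} - \frac{56}{8}\times\frac{40}{8} = \frac{2912 - 2240}{64} = \frac{672}{64}$$
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{672/64}{\sqrt{\dfrac{1056}{64}}\times\sqrt{\dfrac{448}{64}}} = \frac{672}{\sqrt{1056}\times\sqrt{448}} = \frac{672}{32.50 \times 21.17} = \frac{672}{688.03} = 0.98$$
$$b_{YX} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2} = \frac{672/64}{1056/64} = \frac{672}{1056} = 0.64$$
The regression equation of Y on X is $Y - \bar{Y} = b_{YX}(X - \bar{X})$
or, Y – 5 = 0.64(X – 7)
or, Y = 0.64X – 4.48 + 5 = 0.64X + 0.52
when X = 12, Y = 0.64 × 12 + 0.52 = 7.68 + 0.52 = 8.20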
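$$\sigma_x^2 = \sigma_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2 = \frac{60}{9} - \left(\frac{0}{9}\right)^2 = \frac{60}{9}$$
$$\sigma_y^2 = \sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{60}{9} - \left(\frac{0}{9}\right)^2 = \frac{60}{9}$$
$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n}\cdot\frac{\sum v}{n} = \frac{57}{9} - \frac{0}{9}\times\frac{0}{9} = \frac{57}{9}$$
$$b_{yx} = b_{vu} = \frac{\mathrm{Cov}(u, v)}{\sigma_u^2} = \frac{57/9}{60/9} = \frac{57}{60} = 0.95$$
$$\bar{X} = \frac{45}{9} = 5, \qquad \bar{Y} = \frac{108}{9} = 12$$
$$b_{xy} = b_{uv} = \frac{\mathrm{Cov}(u, v)}{\sigma_v^2} = \frac{57/9}{60/9} = \frac{57}{60} = 0.95$$
The regression equation of Y on X is $Y - \bar{Y} = b_{yx}(X - \bar{X})$
Y – 12 = 0.95(X – 5)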
[25] Find the two lines of regression from the following data:
Age of husband(x) 25 22 28 26 35 20 22 40 20 18
Age of wife (y) 18 15 20 17 22 14 16 21 15 14
Hence, estimate (i) the age of husband when the age of wife is 19, (ii) the age of
wife when the age of husband is 30.
[C.U.M.Com 1970]
Solution:
Table: Calculations for Regression
x y u = x – 25 v = y – 20 u2 v2 uv
25 18 0 -2 0 4 0
22 15 -3 -5 9 25 15
28 20 3 0 9 0 0
26 17 1 -3 1 9 -3
35 22 10 2 100 4 20
20 14 -5 -6 25 36 30
22 16 -3 -4 9 16 12
40 21 15 1 225 1 15
20 15 -5 -5 25 25 25
18 14 -7 -6 49 36 42
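Total - - 6 -28 452 156 156
$$\bar{x} = 25 + \frac{6}{10} = 25.6, \qquad \bar{y} = 20 - \frac{28}{10} = 17.2$$
$$\sigma_x^2 = \sigma_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2 = \frac{452}{10} - \left(\frac{6}{10}\right)^2 = \frac{4520 - 36}{100} = \frac{4484}{100}$$
$$\sigma_y^2 = \sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{156}{10} - \left(\frac{-28}{10}\right)^2 = \frac{1560 - 784}{100} = \frac{776}{100}$$
$$\mathrm{Cov}(x, y) = \mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n}\cdot\frac{\sum v}{n} = \frac{156}{10} - \frac{6}{10}\times\frac{-28}{10} = \frac{1560 + 168}{100} = \frac{1728}{100}$$
$$b_{yx} = b_{vu} = \frac{\mathrm{Cov}(u, v)}{\sigma_u^2} = \frac{1728/100}{4484/100} = \frac{1728}{4484} = 0.39$$
$$b_{xy} = b_{uv} = \frac{\mathrm{Cov}(u, v)}{\sigma_v^2} = \frac{1728/100}{776/100} = \frac{1728}{776} = 2.23$$
(i) The regression equation of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$
x – 25.6 = 2.23(y – 17.2)
or, x = 2.23y – 38.36 + 25.6
or, x = 2.23y – 12.76
when y = 19, x = 2.23 × 19 – 12.76 = 42.37 – 12.76
∴ x = 29.61 = 30 (approx.)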
121 97 31 27 961 729 837
67 70 -23 0 529 0 0
124 91 34 21 1156 441 714
51 39 -39 -31 1521 961 1209
73 61 -17 -9 289 81 153
111 80 21 10 441 100 210
57 47 -33 -23 1089 529 759
Total 900 700 0 0 6360 2868 3900
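$$\bar{X} = \frac{900}{10} = 90, \qquad \bar{Y} = \frac{700}{10} = 70$$
$$\sigma_x^2 = \sigma_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2 = \frac{6360}{10} - \left(\frac{0}{10}\right)^2 = 636$$
$$\sigma_y^2 = \sigma_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{2868}{10} - \left(\frac{0}{10}\right)^2 = 286.8$$
$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n}\cdot\frac{\sum v}{n} = \frac{3900}{10} - \frac{0}{10}\times\frac{0}{10} = 390$$
$$b_{yx} = b_{vu} = \frac{\mathrm{Cov}(u, v)}{\sigma_u^2} = \frac{390}{636} = 0.613$$
$$b_{xy} = b_{uv} = \frac{\mathrm{Cov}(u, v)}{\sigma_v^2} = \frac{390}{286.8} = 1.36$$
The regression equation of X on Y is $X - \bar{X} = b_{xy}(Y - \bar{Y})$
X – 90 = 1.36(Y – 70)
or, X = 1.36Y – 95.20 + 90
X = 1.36Y – 5.20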
12 5.27 -3 -1.73 9 2.99 5.19
18 5.68 -2 -1.32 4 1.74 2.64
24 6.25 -1 -0.75 1 0.56 0.75
30 7.21 0 0.21 0 0.04 0
36 8.02 1 1.02 1 1.04 1.02
42 8.71 2 1.71 4 2.92 3.42
48 8.42 3 1.42 9 2.02 4.26
Total 210 49.56 0 0.56 28 11.31 17.28
$$\bar{x} = \frac{210}{7} = 30, \qquad \bar{y} = \frac{49.56}{7} = 7.08$$
$$\sigma_x^2 = d^2 \sigma_u^2 = 6^2\left\{\frac{28}{7} - \left(\frac{0}{7}\right)^2\right\} = 36 \times 4 = 144$$
$$\mathrm{Cov}(x, y) = d\,\mathrm{Cov}(u, v) = 6 \times \frac{17.28}{7} = 14.81 \quad (\text{since } \textstyle\sum u = 0)$$
$$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{14.81}{144} = 0.103$$
Therefore, the regression equation of y on x is y – 7.08 = $b_{yx}$(x – 30)
Or, y = 0.103x – 3.09 + 7.08
y = 0.103x + 3.99
when x = 40, y = 0.103 × 40 + 3.99 = 4.12 + 3.99 = 8.11 tons.
[28] If the regression equation of y on x be y = 0.57x + 6.93 and the regression equation of x on y be x = 1.12y – 2.46, find the correlation coefficient between x and y.
[B.U., B.A.(Econ) 1972]
Solution:
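From the equation y = 0.57x + 6.93,
∴ $b_{yx}$ = 0.57
From the equation x = 1.12y – 2.46,
∴ $b_{xy}$ = 1.12
We know $r^2 = b_{yx} \cdot b_{xy} = 0.57 \times 1.12 = 0.64$
∴ r = $\sqrt{0.64}$ = +0.80, the positive root being taken because both regression coefficients are positive.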
[29] For some bivariate data the following results were obtained. The mean value of
X = 53.2, the mean value Y = 27.9, the regression coefficient of y on X = –1.5, and the
regression coefficient of X on Y = –0.2. Find the (i) most probable value of Y when
X = 60,(ii) r the coefficient of correlation between X and Y.
[C.U., M.Com., 1974]
Solution:
The regression equation of Y on X is $Y - \bar{Y} = b_{YX}(X - \bar{X})$
or, Y – 27.9 = –1.5 (X – 53.2)
or, Y = 27.9 + 79.80 – 1.5X
Y = 107.7 – 1.5X
when X = 60, Y = 107.7 – 1.5 × 60 = 107.7 – 90 = 17.7
We know the relation
$r^2 = b_{YX} \cdot b_{XY} = (-1.5) \times (-0.2) = 0.30$
∴ r = ± $\sqrt{0.30}$ = ± 0.55
But since the regression coefficients are negative, the correlation coefficient also must be negative, i.e. r = –0.55.
[30] The regression equations calculated from a given set of observations are x = –0.2y
+ 4.2, y = –0.8x + 8.4. Calculate (i) x and y (ii) r, (iii) the estimated value of y when x = 4.
[I.C.W.A., 1986]
Solution:
Here, (ii) byx = –0.8
bxy = –0.2
∴ $r^2 = b_{yx} \times b_{xy} = (-0.8) \times (-0.2) = 0.16$
r = ± $\sqrt{0.16}$ = ± 0.4
But since the regression coefficients are negative the correlation coefficient also
must be negative i. e. r = –0.4.
(i) The regression equations are
y = –0.8x + 8.4 - - - (i)
x = –0.2y + 4.2 - - - (ii)
X = 12.5 + 0.6Y - - - (ii)
Solving equations (i) and (ii):
Multiplying both equations by 10,
10Y = 56 + 12X - - - (iii)
10X = 125 + 6Y - - - (iv)
Multiplying (iii) by 6 and (iv) by 10 we get
60Y − 72X = 336
–60Y + 100X = 1250
(Adding) 28X = 1586
$$X = \frac{1586}{28} = 56.64$$
Putting the value of X in equation (i) we get
Y = 5.6 + 1.2 × 56.64 = 5.6 + 67.97 = 73.57
Since the two regression lines intersect at $(\bar{x}, \bar{y})$, therefore $\bar{x} = 56.64$, $\bar{y} = 73.57$.
Here, from equation (i), $b_{yx}$ = 1.2
From equation (ii), $b_{xy}$ = 0.6
$r^2 = b_{yx} \cdot b_{xy} = 1.2 \times 0.6 = 0.72$
∴ r = $\sqrt{0.72}$ = +0.85
Ans. $\bar{X}$ = 56.64, $\bar{Y}$ = 73.57, r = +0.85
[32] Two variates have the least squares regression lines x + 4y + 3 = 0 and 4x + 9y + 5 = 0.
Find their mean values and the correlation coefficient.
[W.B.H.S., 1978]
Solution:
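The regression lines are
x + 4y + 3 = 0 - - - (i)
4x + 9y + 5 = 0 - - - (ii)
Solving equations (i) and (ii): multiplying (i) by 4 and (ii) by 1 we get
4x + 16y + 12 = 0
4x + 9y + 5 = 0
(Subtracting) 7y + 7 = 0
or, 7y = –7
y = –1
Putting y = –1 in equation (i) we get x + 4 × (–1) + 3 = 0 or, x = 1
Since the two regression lines intersect at the point $(\bar{x}, \bar{y})$, therefore $\bar{x} = 1$, $\bar{y} = -1$.
From equation (i), taken as the regression equation of y on x,
4y = –x – 3
$$y = -\frac{1}{4}x - \frac{3}{4} \qquad \therefore b_{yx} = -\frac{1}{4}$$
From equation (ii), taken as the regression equation of x on y,
4x = –9y – 5
$$x = -\frac{9}{4}y - \frac{5}{4} \qquad \therefore b_{xy} = -\frac{9}{4}$$
$$r^2 = b_{yx} \times b_{xy} = \left(-\frac{1}{4}\right)\times\left(-\frac{9}{4}\right) = \frac{9}{16} \qquad \therefore r = \pm\sqrt{\frac{9}{16}} = \pm\frac{3}{4} = \pm 0.75$$
As $b_{yx}$ and $b_{xy}$ are negative, therefore r = –0.75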
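The procedure used in these exercises (solve the two lines for the means, read off the regression coefficients, take the square root of their product with the common sign) can be sketched as follows. The function name `means_and_r` and the convention of passing each line as coefficients (a, b, c) of ax + by + c = 0 are assumptions of this illustration.

```python
import math

def means_and_r(line_y_on_x, line_x_on_y):
    """Each line is (a, b, c) for a*x + b*y + c = 0.
    Returns (mean_x, mean_y, r); raises ValueError if the assumed roles
    of the two lines give r^2 outside [0, 1]."""
    a1, b1, c1 = line_y_on_x
    a2, b2, c2 = line_x_on_y
    # The lines intersect at the point of means (Cramer's rule).
    det = a1 * b2 - a2 * b1
    mean_x = (b1 * c2 - b2 * c1) / det
    mean_y = (a2 * c1 - a1 * c2) / det
    b_yx = -a1 / b1   # slope when the first line is solved for y
    b_xy = -b2 / a2   # slope when the second line is solved for x
    r_squared = b_yx * b_xy
    if not 0 <= r_squared <= 1:
        raise ValueError("roles of the two lines should be interchanged")
    # r takes the common sign of the regression coefficients.
    return mean_x, mean_y, math.copysign(math.sqrt(r_squared), b_yx)

# Exercise [32]: x + 4y + 3 = 0 as y on x, 4x + 9y + 5 = 0 as x on y.
print(means_and_r((1, 4, 3), (4, 9, 5)))
```

If the function raises `ValueError`, the lines were assigned the wrong way round and the call should be repeated with the two arguments swapped.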
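[33] Two lines of regression are given by x + 2y = 5 and 2x + 3y = 8, and $\sigma_x^2 = 12$. Calculate the values of $\bar{x}$, $\bar{y}$, $\sigma_y$ and r.
[I.C.W.A. 1976]
Solution:
Solving the two equations gives the point of intersection $\bar{x} = 1$, $\bar{y} = 2$. Taking x + 2y = 5 as the regression equation of y on x gives $b_{yx} = -\frac{1}{2}$, and taking 2x + 3y = 8 as the regression equation of x on y gives $b_{xy} = -\frac{3}{2}$.
$$\text{We know,} \quad r^2 = b_{yx} \cdot b_{xy} = \left(-\frac{1}{2}\right)\times\left(-\frac{3}{2}\right) = \frac{3}{4} \qquad \therefore r = \pm\sqrt{\frac{3}{4}} = \pm\frac{\sqrt{3}}{2}$$
As $b_{yx}$ and $b_{xy}$ are both negative, therefore $r = -\dfrac{\sqrt{3}}{2}$.
$$\text{We know,} \quad b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad \therefore -\frac{1}{2} = -\frac{\sqrt{3}}{2}\times\frac{\sigma_y}{\sqrt{12}} = -\frac{\sqrt{3}}{2}\times\frac{\sigma_y}{2\sqrt{3}} \quad \therefore \sigma_y = 2$$
Ans. $\bar{x} = 1$, $\bar{y} = 2$, $\sigma_y = 2$ and $r = -\dfrac{\sqrt{3}}{2}$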
[34] In order to find the correlation coefficient between two variates x and y from 12 pairs of observations, the following calculations were made.
∑x = 30, ∑y = 5, ∑x2 = 670, ∑y2 = 285, ∑xy = 334
On subsequent verification it was found that the pair (x = 11, y = 4) was copied
wrongly, the correct value being (x = 10, y = 14). Find the correct value of correlation
coefficient.
[I.C.W.A. 1975]
Solution:
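Here given, ∑x = 30, ∑y = 5
Correct ∑x = 30 – 11 + 10 = 29
Correct ∑y = 5 – 4 + 14 = 15
∑x² = 670, ∑y² = 285, ∑xy = 334
Correct ∑x² = 670 – 11² + 10² = 670 – 121 + 100 = 649
Correct ∑y² = 285 – 4² + 14² = 285 – 16 + 196 = 465
Correct ∑xy = 334 – 11 × 4 + 10 × 14 = 334 – 44 + 140 = 430
$$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \frac{\sum x}{n}\cdot\frac{\sum y}{n} = \frac{430}{12} - \frac{29}{12}\times\frac{15}{12} = \frac{5160 - 435}{144} = \frac{4725}{144}$$
$$\sigma_x^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{649}{12} - \left(\frac{29}{12}\right)^2 = \frac{7788 - 841}{144} = \frac{6947}{144} \qquad \therefore \sigma_x = \frac{\sqrt{6947}}{12}$$
$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{465}{12} - \left(\frac{15}{12}\right)^2 = \frac{5580 - 225}{144} = \frac{5355}{144} \qquad \therefore \sigma_y = \frac{\sqrt{5355}}{12}$$
$$\therefore r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{4725}{144} \div \left(\frac{\sqrt{6947}}{12}\times\frac{\sqrt{5355}}{12}\right) = \frac{4725}{83.35 \times 73.18} = 0.77$$
Ans. r = +0.77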
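The correction of totals for a wrongly copied pair is easy to mechanise. The following sketch, with assumed helper names `replace_pair` and `r_from_sums` and a dictionary layout chosen only for this illustration, repeats the steps of exercise [34].

```python
import math

def replace_pair(sums, wrong, right):
    """Return corrected totals after one wrongly copied (x, y) pair
    is replaced by the right pair."""
    (xw, yw), (xr, yr) = wrong, right
    return {
        "sum_x": sums["sum_x"] - xw + xr,
        "sum_y": sums["sum_y"] - yw + yr,
        "sum_x2": sums["sum_x2"] - xw ** 2 + xr ** 2,
        "sum_y2": sums["sum_y2"] - yw ** 2 + yr ** 2,
        "sum_xy": sums["sum_xy"] - xw * yw + xr * yr,
    }

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Correlation coefficient from totals of n pairs."""
    cov = sum_xy / n - (sum_x / n) * (sum_y / n)
    var_x = sum_x2 / n - (sum_x / n) ** 2
    var_y = sum_y2 / n - (sum_y / n) ** 2
    return cov / math.sqrt(var_x * var_y)

given = {"sum_x": 30, "sum_y": 5, "sum_x2": 670, "sum_y2": 285, "sum_xy": 334}
fixed = replace_pair(given, wrong=(11, 4), right=(10, 14))
r = r_from_sums(12, **fixed)
print(fixed, round(r, 2))
```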
[35] Obtain the linear regression equation that you consider more relevant for the following set of paired observations and give reasons why you consider it to be so:
Age 56 42 72 36 63 47 55 49 38 42 68 60
Blood Pressure 147 125 160 118 149 128 150 145 115 140 152 155
Also estimate the blood pressure of a person whose age is 45.
[C.U.M.Com. 1973]
Solution:
Let Age be x and blood pressure be y.
Table - Calculations for Regression
x y u = x – 50 v = y – 140 u2 v2 uv
56 147 6 7 36 49 42
42 125 -8 -15 64 225 120
72 160 22 20 484 400 440
36 118 -14 -22 196 484 308
63 149 13 9 169 81 117
47 128 -3 -12 9 144 36
55 150 5 10 25 100 50
49 145 -1 5 1 25 -5
value of x, when y = yo.
[C.U., B.A.(Econ.)]
Solution:
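Let us take the equation 4x – 5y + 33 = 0 as the regression equation of y on x
or, 5y = 4x + 33
$$\text{or,} \quad y = \frac{4}{5}x + \frac{33}{5}, \qquad b_{yx} = \frac{4}{5}$$
Let us take the equation 20x – 9y = 107 as the regression equation of x on y
or, 20x = 9y + 107
$$\text{or,} \quad x = \frac{9}{20}y + \frac{107}{20}, \qquad b_{xy} = \frac{9}{20}$$
$$\therefore r^2 = b_{yx} \times b_{xy} = \frac{4}{5}\times\frac{9}{20} = \frac{9}{25} \quad \text{or,} \quad r = \sqrt{\frac{9}{25}} = \frac{3}{5} = 0.60 < 1$$
If instead 4x – 5y + 33 = 0 is taken as the regression equation of x on y, then $x = \frac{5}{4}y - \frac{33}{4}$ and $b_{xy} = \frac{5}{4}$, while 20x – 9y = 107 taken as the regression equation of y on x gives
$$y = \frac{20}{9}x - \frac{107}{9}, \qquad b_{yx} = \frac{20}{9}$$
$$r^2 = \frac{20}{9}\times\frac{5}{4} = \frac{25}{9} > 1$$
As $r^2$ cannot be more than 1, the first equation is the regression equation of y on x and the second equation is the regression equation of x on y.
$$\text{When } x = 10, \quad y = \frac{4}{5}x + \frac{33}{5} = \frac{4}{5}\times 10 + \frac{33}{5} = 14.6 \qquad \therefore y_0 = 14.6$$
$$x = \frac{9}{20}y + \frac{107}{20}$$
$$\text{When } y = y_0 = 14.6, \quad x = \frac{9}{20}\times 14.6 + \frac{107}{20} = 6.57 + 5.35 = 11.92$$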
[37] State the meaning of the terms explained variation and unexplained variation, used in the theory of regression. If the coefficient of correlation between two variables X and Y be 0.83, what percentage of total variation remains unexplained by the regression equation?
[I.C.W.A. 1975]
Solution:
If $y_i'$ represents the estimated value of y from the regression equation of y on x when $x = x_i$, i.e. $y_i' - \bar{y} = b_{yx}(x_i - \bar{x})$, then it can be shown that
H 7 6 1 1
Total - - 0 28
Here, n = 8, ∑d² = 28,
$$R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 28}{8^3 - 8} = 1 - \frac{168}{504} = \frac{504 - 168}{504} = \frac{336}{504} = \frac{2}{3} = 0.67$$
Ans. R = 2/3
[40] Compute the correlation coefficient of the following ranks of a group of students in two examinations. What conclusion do you draw from the result?
Roll Nos. 1 2 3 4 5 6 7 8 9 10
Ranks in B.com Exam 1 5 8 6 7 4 2 3 9 10
Ranks in M.com Exam 2 1 5 7 6 3 4 8 10 9
[C.U., M.com 1975]
Solution:
Table - Calculations for Rank correlation coefficient
Roll Nos. Ranks in B.com Exam (x) Ranks in M.com Exam. (y) d = x – y d2
1 1 2 -1 1
2 5 1 4 16
3 8 5 3 9
4 6 7 -1 1
5 7 6 1 1
6 4 3 1 1
7 2 4 -2 4
8 3 8 -5 25
9 9 10 -1 1
10 10 9 1 1
Total - - - 0 60
Judge A Judge B Judge C d12 = R1 − R2 d13 = R1 − R3 d23 = R2 − R3 d12² d13² d23²
Solution:
Table - Calculations for Rank Correlation Coefficient
Roll No. Mathematics Marks (X) Mathematics Rank (x) Physics Marks (Y) Physics Rank (y) d = x – y d2
1 78 4 84 3 1 1
2 36 9 51 9 0 0
3 98 1 91 1 0 0
4 25 10 60 6 4 16
5 75 5 68 4 1 1
6 82 3 62 5 -2 4
7 90 2 86 2 0 0
8 62 7 58 7 0 0
9 65 6 53 8 -2 4
10 39 8 47 10 -2 4
Total - - - - - 0 30
$$R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 30}{10^3 - 10} = 1 - \frac{180}{990} = \frac{990 - 180}{990} = \frac{810}{990} = 0.82$$
[43] Compute the rank correlation coefficient from the following data:
Series A 115 109 112 87 98 98 120 100 98 118
Series B 75 73 85 70 76 65 82 73 68 80
Solution:
Table - calculations for Rank Correlation coefficient
Series A Series B Rank of A (x) Rank of B (y) d = x – y d2
115 75 3 5 -2 4
109 73 5 6.5 -1.5 2.25
112 85 4 1 3 9
87 70 10 8 2 4
98 76 8 4 4 16
98 65 8 10 -2 4
120 82 1 2 -1 1
100 73 6 6.5 -0.5 0.25
98 68 8 9 -1 1
118 80 2 3 -1 1
Total - - - - 0 42.50
There are two ties, one of them containing 3 entries and the other 2 entries.
$$\therefore \sum\frac{t^3 - t}{12} = \frac{3^3 - 3}{12} + \frac{2^3 - 2}{12} = \frac{24}{12} + \frac{6}{12} = 2.5$$
$$R' = 1 - \frac{6\left\{\sum d^2 + \sum\dfrac{t^3 - t}{12}\right\}}{n^3 - n} = 1 - \frac{6(42.5 + 2.5)}{10^3 - 10} = 1 - \frac{6 \times 45}{990} = 1 - \frac{270}{990} = \frac{990 - 270}{990} = \frac{720}{990} = 0.73$$
[44] Twelve salesmen are ranked in order of merit of efficiency by their manager. They are also ranked in accordance with their length of service. What indication is
4 9 -5 25
6 8 -2 4
9 5 4 16
1 2 -1 1
11.5 10 1.5 2.25
5 3 2 4
7.5 7 0.5 0.25
3 4 -1 1
10 11 -1 1
Total - - 0 58.00
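There are 2 ties with 2 entries each.
$$\sum\frac{t^3 - t}{12} = \frac{2(2^3 - 2)}{12} = \frac{2 \times 6}{12} = 1$$
$$R' = 1 - \frac{6\left\{\sum d^2 + \sum\dfrac{t^3 - t}{12}\right\}}{n^3 - n} = 1 - \frac{6(58 + 1)}{12^3 - 12} = 1 - \frac{6 \times 59}{143 \times 12} = \frac{227}{286} = 0.79$$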
[45] Given the following coefficients: $r_{12} = 0.41$, $r_{13} = 0.71$, $r_{23} = 0.5$. Find $r_{12.3}$, $r_{13.2}$ and $r_{1.23}$, where the symbols have their usual significance.
[C.U., M.Com., 1974]
Solution:
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.41 - 0.71 \times 0.5}{\sqrt{(1 - 0.71^2)(1 - 0.5^2)}} = \frac{0.41 - 0.355}{\sqrt{0.4959 \times 0.75}} = \frac{0.055}{\sqrt{0.3719}} = \frac{0.055}{0.61} = 0.09$$
$$r_{13.2} = \frac{r_{13} - r_{12} r_{32}}{\sqrt{(1 - r_{12}^2)(1 - r_{32}^2)}} = \frac{0.71 - 0.41 \times 0.5}{\sqrt{(1 - 0.41^2)(1 - 0.5^2)}} = \frac{0.71 - 0.205}{\sqrt{0.8319 \times 0.75}} = \frac{0.505}{\sqrt{0.6239}} = \frac{0.505}{0.79} = 0.64$$
$$r_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{23} r_{13}}{1 - r_{23}^2}} = \sqrt{\frac{(0.41)^2 + (0.71)^2 - 2 \times 0.41 \times 0.5 \times 0.71}{1 - (0.5)^2}} = \sqrt{\frac{0.1681 + 0.5041 - 0.2911}{0.75}} = \sqrt{\frac{0.3811}{0.75}} = \sqrt{0.5081} = 0.71$$
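The partial and multiple correlation formulas for three variables can be sketched as two small functions. The names `partial_r` and `multiple_r` are assumptions of this illustration; `partial_r(r_ab, r_ac, r_bc)` gives the correlation between a and b with c held constant.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation r_ab.c from the three total correlations."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def multiple_r(r12, r13, r23):
    """Multiple correlation R_1.23 of variable 1 on variables 2 and 3."""
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23)
                     / (1 - r23 ** 2))

r12, r13, r23 = 0.41, 0.71, 0.5
print(round(partial_r(r12, r13, r23), 2))   # r_12.3
print(round(partial_r(r13, r12, r23), 2))   # r_13.2
print(round(multiple_r(r12, r13, r23), 2))  # R_1.23
```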
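[46] In a three-variate multiple correlation analysis, the following results were found:
$\bar{x}_1 = 60$, $\bar{x}_2 = 70$, $\bar{x}_3 = 100$
$\sigma_1 = 3$, $\sigma_2 = 4$, $\sigma_3 = 5$
$r_{12} = 0.7$, $r_{13} = 0.6$, $r_{23} = 0.4$
the symbols having their usual significance. Find the regression of $x_1$ on $x_2$ and $x_3$, and the multiple correlation coefficient $R_{1.23}$.
[B.U., M.A.(Econ.), 1968]
Solution:
$$x_1 - \bar{x}_1 = b_{12.3}(x_2 - \bar{x}_2) + b_{13.2}(x_3 - \bar{x}_3)$$
$$b_{12.3} = \frac{\sigma_1}{\sigma_2}\cdot\frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} = \frac{3}{4}\times\frac{0.7 - 0.6 \times 0.4}{1 - 0.4^2} = \frac{3}{4}\times\frac{0.7 - 0.24}{1 - 0.16} = \frac{3}{4}\times\frac{0.46}{0.84} = 0.41$$
$$\text{Again,} \quad b_{13.2} = \frac{\sigma_1}{\sigma_3}\cdot\frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} = \frac{3}{5}\times\frac{0.6 - 0.7 \times 0.4}{1 - 0.4^2} = \frac{3}{5}\times\frac{0.6 - 0.28}{1 - 0.16} = \frac{3}{5}\times\frac{0.32}{0.84} = 0.23$$
Hence the regression of $x_1$ on $x_2$ and $x_3$ is $x_1 - 60 = 0.41(x_2 - 70) + 0.23(x_3 - 100)$.
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{23} r_{13}}{1 - r_{23}^2}} = \sqrt{\frac{(0.7)^2 + (0.6)^2 - 2 \times 0.7 \times 0.4 \times 0.6}{1 - (0.4)^2}} = \sqrt{\frac{0.85 - 0.336}{0.84}} = \sqrt{\frac{0.514}{0.84}} = 0.78$$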
INTERPOLATION
Interpolation has been defined as the ‘art of reading between the lines of a table’, and
the term usually denotes the process of finding the intermediate value of a function
from a set of given values of that function.
Finite Differences: ∆ and E operators
In problems of interpolation, the independent variable x is often known as the ‘argument’, and
the dependent variable or the function y = f(x) is known as the ‘entry’. Let x0, x1, x2, ..., xn
denote a set of equidistant values of the argument, i.e. x1 – x0 = x2 – x1 = ... = xn – xn–1 = h,
where h is a constant, and let y0, y1, y2, ..., yn denote the corresponding values of the entry.
Differences of the successive values of y, viz. (y1 – y0), (y2 – y1), (y3 – y2), ..., (yn – yn–1),
are called finite differences of the first order and are denoted by ∆y0, ∆y1, ∆y2, ..., ∆yn–1
respectively.
The differences of the successive first order differences, namely
(∆y1 – ∆y0), (∆y2 – ∆y1), ..., (∆yn–1 – ∆yn–2), are known as finite differences of the
second order and are denoted by ∆²y0, ∆²y1, ..., ∆²yn–2 respectively. Similarly the
third differences ∆³y, the fourth differences ∆⁴y and differences of higher order may
be defined.
Argument (x)   Entry (y)   First differences (∆y)   Second differences (∆²y)
x0             y0
                           y1 – y0 = ∆y0
x1             y1                                   ∆y1 – ∆y0 = ∆²y0
                           y2 – y1 = ∆y1
x2             y2                                   ∆y2 – ∆y1 = ∆²y1
                           y3 – y2 = ∆y2
x3             y3                                   ∆y3 – ∆y2 = ∆²y2
                           y4 – y3 = ∆y3
x4             y4
A table which shows the finite differences is known as Difference Table.
Table - Difference Table
Argument x   Entry y     ∆y     ∆²y    ∆³y    ∆⁴y
x0 = 1       y0 = 1
                         14
x1 = 3       y1 = 15            36
                         50            24
x2 = 5       y2 = 65            60            0
                         110           24
x3 = 7       y3 = 175           84            0
                         194           24
x4 = 9       y4 = 369           108
                         302
x5 = 11      y5 = 671
The initial term y0 of the entry is called the leading term and the initial terms in the
difference columns, viz.∆y0,∆2y0, ∆3y0 etc. are called leading differences.
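A difference table of this kind can be built mechanically by repeated subtraction. The following is a minimal Python sketch, not part of the original text; the function name `difference_table` is an illustrative choice. It uses the entries 1, 15, 65, 175, 369, 671 tabulated above.

```python
def difference_table(ys):
    """Return [entries, first differences, second differences, ...]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        # Each new column holds differences of successive values of the previous one.
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

ys = [1, 15, 65, 175, 369, 671]
table = difference_table(ys)
for order, column in enumerate(table):
    print(order, column)

# Leading term and leading differences: the first entry of every column.
leading = [column[0] for column in table]
print(leading)
```

The first element of each column gives the leading term and the leading differences described above.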
Both the operators ∆ and E can be applied repeatedly, the repeated operations being
indicated by ∆², ∆³, ... and E², E³, etc. Thus
∆²y0 = ∆(∆y0) = ∆y1 – ∆y0 = (y2 – y1) – (y1 – y0) = y2 – 2y1 + y0
∆³y0 = ∆(∆²y0) = ∆²y1 – ∆²y0 = (y3 – 2y2 + y1) – (y2 – 2y1 + y0) = y3 – 3y2 + 3y1 – y0
Similarly, E²y0 = E(Ey0) = E(y1) = y2
E³y0 = E(E²y0) = E(y2) = y3
From the definitions, we may in general write
∆yr = yr+1 – yr
Eyr = yr+1
These operators may thus be interpreted in the following manner:
(a) ∆ when prefixed to yr implies that yr is to be subtracted from the next value of the
entry yr+1
(b) E when prefixed to yr denotes the next value of the entry yr+1
From (a) and (b) we find that
E yr = yr + ∆yr
or, E yr = (1 + ∆) yr
Omitting yr from both sides, we find that the operators E and ∆ are connected by the
symbolic relation E ≡ 1 + ∆
This does not mean that 1 when added to ∆ gives E, but that the operation by E is
equivalent to the operation by (1 + ∆). It may be shown that the above relation follows
certain algebraic rules.
As shown earlier, we have
∆y0 = y1 – y0
∆²y0 = y2 – 2y1 + y0
∆³y0 = y3 – 3y2 + 3y1 – y0
∆⁴y0 = y4 – 4y3 + 6y2 – 4y1 + y0
Alternatively, we may write
∆y0 = Ey0 – y0
∆²y0 = E²y0 – 2Ey0 + y0
∆³y0 = E³y0 – 3E²y0 + 3Ey0 – y0
∆⁴y0 = E⁴y0 – 4E³y0 + 6E²y0 – 4Ey0 + y0
With the operators only (removing the y0's from both sides),
∆ = E – 1
∆² = E² – 2E + 1 = (E – 1)²
∆³ = E³ – 3E² + 3E – 1 = (E – 1)³
∆⁴ = E⁴ – 4E³ + 6E² – 4E + 1 = (E – 1)⁴
Thus, we have developed a convenient method of expressing the finite difference of any
order in terms of the entries.
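The expansion of (E – 1)ⁿ can be checked numerically. The sketch below is an illustration rather than part of the original text: the helper name `delta_n` is hypothetical, and it computes ∆ⁿy0 directly from the entries using binomial coefficients with alternating signs.

```python
from math import comb

def delta_n(ys, n):
    """n-th leading difference from the expansion of (E - 1)^n applied to y0."""
    # The coefficient of y_(n-k) is (-1)^k * C(n, k).
    return sum((-1) ** k * comb(n, k) * ys[n - k] for k in range(n + 1))

ys = [1, 15, 65, 175, 369, 671]
print(delta_n(ys, 1), delta_n(ys, 2), delta_n(ys, 3), delta_n(ys, 4))
```

For these entries the results agree with the leading differences obtained by repeated subtraction (14, 36, 24, 0).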
Newton’s Forward Interpolation Formula
Let y0, y1, y2-----, yn be some tabulated values of a function y=f(x) corresponding to the
equidistant values x=x0, x1, x2, ----, xn.
x1–x0=x2–x1=x3–x2=----=xn–xn–1=h (say).
the argument are equidistant, Newton’s forward and backward formulae are generally
applied in all cases of interpolation. However, for interpolation near the middle of the
set of tabulated values, Central Difference formulae, which utilize differences near the
central part of the difference table, are found more useful, because the successive terms
converge more rapidly than in Newton’s forward and backward formulae.
Let the function y = f(x) be tabulated for some equidistant values of the argument,
(x0 – nh), ..., (x0 – 2h), (x0 – h), x0, (x0 + h), (x0 + 2h), ..., (x0 + nh), the common
difference being h, and let the corresponding entries be denoted by y–n, ..., y–2, y–1, y0, y1,
y2, ..., yn.
It is required to interpolate the value of the function y = f(x) for a value of x in the interval
(x0 – h) to (x0 + h).
I. Stirling’s Interpolation Formula
$$y = y_0 + u \cdot \frac{\Delta y_0 + \Delta y_{-1}}{2} + \frac{u^2}{2!} \Delta^2 y_{-1} + \frac{u(u^2 - 1^2)}{3!} \cdot \frac{\Delta^3 y_{-1} + \Delta^3 y_{-2}}{2} + \frac{u^2(u^2 - 1^2)}{4!} \Delta^4 y_{-2} + \frac{u(u^2 - 1^2)(u^2 - 2^2)}{5!} \cdot \frac{\Delta^5 y_{-2} + \Delta^5 y_{-3}}{2} + \cdots$$
where u = (x – x0)/h
Stirling’s formula is appropriate when the value of u lies in the interval -0.25 to +0.25.
It uses y0 and even order differences on the horizontal line through y0, but the average
of odd order differences above and below that line.
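II. Bessel’s Interpolation Formula
$$y = \frac{y_0 + y_1}{2} + v\,\Delta y_0 + \frac{v^2 - \frac{1}{4}}{2!} \cdot \frac{\Delta^2 y_0 + \Delta^2 y_{-1}}{2} + \frac{v\left(v^2 - \frac{1}{4}\right)}{3!} \Delta^3 y_{-1} + \frac{\left(v^2 - \frac{1}{4}\right)\left(v^2 - \frac{9}{4}\right)}{4!} \cdot \frac{\Delta^4 y_{-1} + \Delta^4 y_{-2}}{2} + \frac{v\left(v^2 - \frac{1}{4}\right)\left(v^2 - \frac{9}{4}\right)}{5!} \Delta^5 y_{-2} + \cdots$$
where u = (x – x0)/h and v = u – 1/2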
Bessel’s formula is appropriate for values of u lying in the interval +0.25 to +0.75, i.e.
for interpolation in the middle half of the interval between two tabulated values.
Bessel’s formula is found especially useful for interpolating the value of the
function exactly at the middle of two given values, i.e. when u = 1/2 so that v = 0. Every
alternate term in Bessel’s formula then vanishes. This special case of Bessel’s formula is
known as the “Formula for Interpolating to Halves”.
Lagrange’s Interpolation Formula
Let y0, y1, y2, ..., yn denote the tabulated values of a function y = f(x) corresponding
to the values of the argument x0, x1, x2, ..., xn (which may not be equidistant). It is
required to find the value of y corresponding to a specified value of x lying in between
the given values. This is obtained by using Lagrange’s Interpolation Formula:
$$y = \frac{(x - x_1)(x - x_2) \cdots (x - x_n)}{(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n)}\, y_0 + \frac{(x - x_0)(x - x_2) \cdots (x - x_n)}{(x_1 - x_0)(x_1 - x_2) \cdots (x_1 - x_n)}\, y_1 + \cdots + \frac{(x - x_0)(x - x_1) \cdots (x - x_{n-1})}{(x_n - x_0)(x_n - x_1) \cdots (x_n - x_{n-1})}\, y_n$$
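Lagrange’s formula translates almost term by term into code. The following Python sketch is illustrative only; the function name `lagrange` is an assumption, and the sample data (U1 = 10, U2 = 15, U5 = 42, interpolating at x = 4) are taken from Exercise 33 of this chapter.

```python
def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Factor (x - xj) / (xi - xj) for every other tabulated argument.
                term *= (x - xj) / (xi - xj)
        total += term
    return total

u4 = lagrange([1, 2, 5], [10, 15, 42], 4)
print(round(u4, 6))
```

The arguments need not be equidistant, which is the main reason for preferring this formula over the difference formulae in such cases.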
Inverse Interpolation
Given a set of tabulated values of a function y = f(x) corresponding to some values of
the argument x, the process of finding the value of the argument for an intermediate
value of the function is called ‘Inverse Interpolation’. Interchanging the roles of the
argument x and the function y in Lagrange’s formula gives
$$x = \frac{(y - y_1)(y - y_2)(y - y_3) \cdots (y - y_n)}{(y_0 - y_1)(y_0 - y_2)(y_0 - y_3) \cdots (y_0 - y_n)}\, x_0 + \frac{(y - y_0)(y - y_2)(y - y_3) \cdots (y - y_n)}{(y_1 - y_0)(y_1 - y_2)(y_1 - y_3) \cdots (y_1 - y_n)}\, x_1 + \frac{(y - y_0)(y - y_1)(y - y_3) \cdots (y - y_n)}{(y_2 - y_0)(y_2 - y_1)(y_2 - y_3) \cdots (y_2 - y_n)}\, x_2 + \cdots + \frac{(y - y_0)(y - y_1)(y - y_2) \cdots (y - y_{n-1})}{(y_n - y_0)(y_n - y_1)(y_n - y_2) \cdots (y_n - y_{n-1})}\, x_n$$
Exercise
[1] The following data show the monthly average number of deaths under one year
in a certain large city. Find the missing term.
Year                                  1960   1961   1962   1963   1964
Number of deaths (monthly average)    940    ?      907    843    798
[I.C.W.A. 1972]
Solution:
Since only 4 values are given, we assume a third degree polynomial, so that
the fourth differences are zero. Denoting the entries corresponding to the years 1960,
1961, 1962, etc. by y0, y1, y2, ..., we have then
∆⁴y0 = 0
or, (E – 1)⁴y0 = 0
Expanding we get
y4 – 4y3 + 6y2 – 4y1 + y0 = 0
Putting y0 = 940, y2 = 907, y3 = 843, y4 = 798,
798 – 4×843 + 6×907 – 4y1 + 940 = 0
or, 7180 – 3372 = 4y1
or, 3808 = 4y1 ∴ y1 = 3808/4 = 952
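The same reasoning works for any table with one missing entry: with n known values, set the n-th leading difference to zero and solve the resulting linear equation. The Python sketch below is an illustration under that assumption; `missing_term` is a hypothetical helper name, and `None` marks the missing entry.

```python
from math import comb

def missing_term(ys):
    """Solve (E - 1)^n y0 = 0 for the single entry of ys that is None."""
    n = len(ys) - 1
    m = ys.index(None)

    def coef(j):
        # Coefficient of y_j in the expansion of (E - 1)^n y0.
        return (-1) ** (n - j) * comb(n, j)

    known = sum(coef(j) * y for j, y in enumerate(ys) if y is not None)
    return -known / coef(m)

deaths_1961 = missing_term([940, None, 907, 843, 798])
print(deaths_1961)
```

Applied to the census table of Exercise 3 (entries 2384, 2522, 2514, 2791, missing, 3613) the same function gives 3321.8, which the text rounds to 3322 lakh.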
[2] The following gives the amount y of cement in thousands of tons manufactured
∆⁵y0 = 0
or, (E – 1)⁵y0 = y5 – 5y4 + 10y3 – 10y2 + 5y1 – y0 = 0
Putting y0 = 39, y1 = 85, y3 = 151, y4 = 264, y5 = 388 in the above equation,
388 – 5×264 + 10×151 – 10y2 + 5×85 – 39 = 0
or, 388 – 1320 + 1510 – 10y2 + 425 – 39 = 0
or, 2323 – 1359 – 10y2 = 0
or, 10y2 = 964
∴ y2 = 964/10 = 96.4
[3] The growth of population in India, according to the decennial census, is shown
below:
Year                1901   1911   1921   1931   1941   1951
Population (Lakh)   2384   2522   2514   2791   ?      3613
The census figure for 1941 is not given here. Give an estimate of the actual
population for 1941.
[C.U., B.Sc., 71]
Solution:
Since only 5 values are known, we assume a 4th degree polynomial for f(x),
so that 5th order differences are zero. In particular ∆⁵y0 = 0. Denoting the entries
corresponding to the years 1901, 1911, 1921, 1931, 1941 etc. by y0, y1, y2, ..., we have
then
∆⁵y0 = (E – 1)⁵y0
= y5 – 5y4 + 10y3 – 10y2 + 5y1 – y0 = 0
Putting y0 = 2384, y1 = 2522, y2 = 2514, y3 = 2791, y5 = 3613 in the above equation,
3613 – 5y4 + 10×2791 – 10×2514 + 5×2522 – 2384 = 0
or, 3613 – 5y4 + 27910 – 25140 + 12610 – 2384 = 0
or, 44133 – 27524 = 5y4
or, 5y4 = 16609
or, y4 = 16609/5 = 3322 lakh (approx).
[4] Below are given the values of a function Ux for certain values of x:
x    0   1   2   3    4
Ux   1   0   5   22   57
Construct the table of differences. What does the table suggest? Use this table to
find U5.
[I.C.W.A., 1976]
Solution:
Since the given values of x are equidistant and the value U5 can be found by
Difference Table
x   Ux   ∆Ux   ∆²Ux   ∆³Ux   ∆⁴Ux
0   1
         –1
1   0          6
         5            6
2   5          12            0
         17           6
3   22         18
         35
4   57
The third differences are constant, which suggests that Ux is a 3rd degree polynomial in x.
So the 4th difference may be regarded as zero, i.e. ∆⁴U0 = 0
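One way to use a difference table for U5 is to extend every column by one entry, holding the highest order difference fixed. The following Python sketch shows that idea; it is an illustration, not the book's own working, and the function names are hypothetical.

```python
def difference_table(ys):
    """Return [entries, first differences, second differences, ...]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

def next_value(ys):
    """Extend the table by one row, keeping the highest order difference unchanged."""
    lasts = [column[-1] for column in difference_table(ys)]
    carry = lasts[-1]
    # Work back from the highest order column to the entries themselves.
    for last in reversed(lasts[:-1]):
        carry = last + carry
    return carry

u = [1, 0, 5, 22, 57]
print(difference_table(u))
print(next_value(u))
```

For the tabulated values this gives U5 = 116, which agrees with the cubic x³ – 2x + 1 that reproduces all five given entries.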
[5] Form a difference table and find the values of y3 and y9 from the following:
y4 = 135, y5 = 432, y6 = 1015, y7 = 2016, y8 = 3591
Solution:
Since only 5 values are known, we assume a 4th degree polynomial for f(x), so
that 5th and higher order differences are zero. In particular ∆⁵y0 = 0
or, (E – 1)⁵y0 = 0
or, E⁵y0 – 5E⁴y0 + 10E³y0 – 10E²y0 + 5Ey0 – y0 = 0
or, y5 – 5y4 + 10y3 – 10y2 + 5y1 – y0 = 0
Replacing y5, y4, y3, y2, y1, y0 by y8, y7, y6, y5, y4, y3 we get
y8 – 5y7 + 10y6 – 10y5 + 5y4 – y3 = 0
or, y3 = 3591 – 5×2016 + 10×1015 – 10×432 + 5×135
= 3591 – 10080 + 10150 – 4320 + 675 = 14416 – 14400 = 16
Difference Table
x   f(x) = yx    ∆yx    ∆²yx   ∆³yx   ∆⁴yx
4   y4 = 135
                 297
5   y5 = 432            286
                 583           132
6   y6 = 1015           418           24
                 1001          156
7   y7 = 2016           574
                 1575
8   y8 = 3591
that 4th and higher order differences are zero.
In particular ∆⁴y0 = 0
or, (E – 1)⁴y0 = 0
Expanding we get (E⁴ – 4E³ + 6E² – 4E + 1)y0 = 0
or, y4 – 4y3 + 6y2 – 4y1 + y0 = 0
or, 24 – 4y3 + 6×4 – 4×6 + 8 = 0
or, 4y3 = 24 + 24 – 24 + 8 = 32
∴ y3 = 32/4 = 8
[b] Since 3 values are given, we assume a 2nd degree polynomial for yx so
that 3rd and higher order differences are zero. In particular, ∆³y0 = 0 and ∆³y1 = 0.
∆³y0 = 0
or, (E–1)3 y0=0
or, (E3–3E2+3E–1)y0=0
or, y3–3y2+3y1–y0=0
Replacing y0, y1, y2, y3, y4 by y2, y3, y4, y5, y6
y5–3y4+3y3–y2=0
or, 122–3y4+3y3–5=0
or, 3y3–3y4+117=0 ------ (1)
y4–3y3+3y2–y1=0
or, y6–3y5+3y4–y3=0
or, 193–3×122+3y4–y3=0
or, 3y4–y3–173=0 ------ (2)
10 0 -21
f(x) 1 3 9 ? 81
Solution:
Since 4 values are given, we assume a 3rd degree polynomial, so that the 4th
differences are zero. Denoting the entries corresponding to x = 0, 1, 2, 3, 4 by y0, y1, y2, y3,
y4 we have then
∆⁴y0 = 0
or, (E – 1)⁴y0 = 0
Expanding we get
y4 – 4y3 + 6y2 – 4y1 + y0 = 0
or, 81 – 4y3 + 6×9 – 4×3 + 1 = 0
or, 4y3 = 81 + 54 – 12 + 1 = 136 – 12 = 124
∴ y3 = 124/4 = 31
[8] Find y for x = 2, from the following table:
x 0 1 3 4 5
y 39 85 151 264 388
[I.C.W.A., 1969]
Solution:
Since 5 values are given, we assume a 4th degree polynomial, so that the 5th differences
are zero. Denoting the entries corresponding to x = 0, 1, 2, 3, 4, 5 by y0, y1, y2, y3, y4, y5 we
have then
∆⁵y0 = 0
or, (E – 1)⁵y0 = 0
or, y5 – 5y4 + 10y3 – 10y2 + 5y1 – y0 = 0
or, 388 – 5×264 + 10×151 – 10y2 + 5×85 – 39 = 0
or, 10y2 = 388 – 1320 + 1510 + 425 – 39
= 2323 – 1359 = 964
∴ y2 = 964/10 = 96.4
[9] Find f(5) from the following data :
f(3)=4, f(4)=13, f(6)=43
Solution:
Since only 3 values are known, we assume a 2nd degree polynomial for f(x), so
that 3rd and higher order differences are zero. In particular ∆³y0 = 0, where the entries
f(3), f(4), f(5), f(6) are denoted by y0, y1, y2, y3. We have then
(E – 1)³y0 = 0
Expanding we get,
y3 – 3y2 + 3y1 – y0 = 0
or, 43 – 3y2 + 3×13 – 4 = 0
or, 3y2 = 43 + 39 – 4 = 82 – 4 = 78
or, y2 = 78/3 = 26
∴ f(5) = 26
[10] Find the polynomial function f(x) from the following values f(3)=–1, f(4)=5, f(5)=15
Solution:
Since only 3 values of the function are known, we may assume that the
b + 9c = 10
(Subtracting) –2c = –4
or, c = 2
Putting the value of c = 2 in equation (v) we get
b + 9×2 = 10
or, b = –8
Putting the values b = –8 and c = 2 in equation (ii) we get
a + 3×(–8) + 9×2 = –1
or, a = 24 – 18 – 1 = 5
Hence the required polynomial is
f(x) = 5 – 8x + 2x²
or, f(x) = 2x² – 8x + 5
[11] Given the following table, find the function f(x), assuming it to be a polynomial
of the 3rd degree in x
x      0   1   2    3
f(x)   1   2   11   34
[I.C.W.A., 1975]
Solution:
We assume the polynomial as f(x) = a + bx + cx² + dx³
where a, b, c, d are certain constants to be determined.
Putting x = 0, 1, 2, 3 successively
f(0) = a = 1
f(1) = a + b + c + d = 2
f(2) = a + 2b + 4c + 8d = 11
f(3) = a + 3b + 9c + 27d = 34
a = 1 ------(i)
or, b + c + d = 1 ------(ii)
2b + 4c + 8d = 10 ------(iii)
3b + 9c + 27d = 33 ------(iv)
From (ii) & (iii) b + c + d = 1
b + 2c + 4d = 5
(Subtracting) –c – 3d = –4
or, c + 3d = 4 ------(v)
From (iii) & (iv) b + 2c + 4d = 5
b + 3c + 9d = 11
(Subtracting) –c – 5d = –6
or, c + 5d = 6 ------(vi)
From (v) & (vi) c + 3d = 4
c + 5d = 6
(Subtracting) –2d = –2
or, d = 1
Putting the value d = 1 in equation (v), we get
c + 3×1 = 4
or, c = 1
Putting c = 1 and d = 1 in equation (ii), b + c + d = 1
or, b + 1 + 1 = 1
∴ b = –1
The required polynomial is
1 + (–1)x + 1x² + 1x³
= 1 – x + x² + x³
f(x) = x³ + x² – x + 1
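The elimination carried out by hand above can also be done mechanically. The sketch below is illustrative only: it solves the same simultaneous equations by Gauss-Jordan elimination in exact rational arithmetic, and the function name `fit_polynomial` is an assumption rather than anything from the text.

```python
from fractions import Fraction

def fit_polynomial(xs, ys):
    """Coefficients [a, b, c, ...] of a + b*x + c*x^2 + ... through the given points."""
    n = len(xs)
    # One equation per tabulated point: a + b*x + c*x^2 + ... = y.
    rows = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)]
            for x, y in zip(xs, ys)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

coeffs = fit_polynomial([0, 1, 2, 3], [1, 2, 11, 34])
print(coeffs)
```

The coefficients are returned in the order a, b, c, d, so the result corresponds to f(x) = 1 – x + x² + x³. The arguments must be distinct for the system to have a unique solution.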
or, b + 2c + 4d = 5 ------(iii)
5b + 25c + 125d = 145
or, b + 5c + 25d = 29 ------(iv)
From (ii) & (iii) b + c + d = 1
b + 2c + 4d = 5
(Subtracting) –c – 3d = –4
or, c + 3d = 4 ------(v)
From (iii) & (iv) b + 2c + 4d = 5
b + 5c + 25d = 29
(Subtracting) –3c – 21d = –24
or, c + 7d = 8 ------(vi)
From (v) & (vi) c + 3d = 4
c + 7d = 8
(Subtracting) –4d = –4
or, d = 1
Putting the value of d in (vi) we get
c + 7×1 = 8 or, c = 1
Putting the values of c, d in (ii) we get
b + 1 + 1 = 1 or, b = –1
Therefore, the polynomial is 2 + (–1)x + 1x² + 1x³
Ux = x³ + x² – x + 2
[13] Below are given the values of a function f(x) for certain values of x. Find f(2),
stating your assumption.
x 0 1 3 4
f(x) 5 6 50 105
[I.C.W.A, 1975]
Solution:
Since 4 values are given, we assume a polynomial of degree 3, hence the
4th differences are zero: ∆⁴y0 = 0
Solution:
Since only 4 values of the function are known we may assume that the polynomial
is of 3rd degree,
∴ f(x) = a + bx + cx² + dx³
where a, b, c, d are certain constants to be determined.
Putting x = 0, 1, 2, 3 in the polynomial
a = –3 ------(i)
a + b + c + d = 6 ------(ii)
a + 2b + 4c + 8d = 8 ------(iii)
a + 3b + 9c + 27d = 12 ------(iv)
Substituting a = –3,
b + c + d = 9 ------(ii)
2b + 4c + 8d = 11 ------(iii)
3b + 9c + 27d = 15
or, b + 3c + 9d = 5 ------(iv)
From (ii) & (iii) 2b + 4c + 8d = 11
2b + 2c + 2d = 18
(Subtracting) 2c + 6d = –7 ------(v)
From (iii) & (iv) 2b + 6c + 18d = 10
2b + 4c + 8d = 11
(Subtracting) 2c + 10d = –1 ------(vi)
From (v) & (vi) 2c + 10d = –1
2c + 6d = –7
(Subtracting) 4d = 6
d = 6/4 = 3/2
Putting the value of d in equation (vi) we get
2c + 10 × 3/2 = –1
or, 2c = –1 – 15 = –16
c = –8
Putting the values of c, d in equation (ii)
b + (–8) + 3/2 = 9
b = 9 – 3/2 + 8 = 17 – 3/2 = 31/2
f(6) = –3 + (31/2)×6 + (–8)×36 + (3/2)×216
= –3 + 31×3 – 288 + 3×108
= –3 + 93 – 288 + 324
= 417 – 291 = 126
or, 12C = 7.2 – 9.6 = –2.4
∴ C = –2.4/12 = –0.2
f(10) = A + 7B + C×7×3 + D×7×3×1 = 6.3
or, A + 7B + 21C + 21D = 6.3
or, 16.8 + 7×(–1.2) + 21×(–0.2) + 21D = 6.3
or, 16.8 – 8.4 – 4.2 + 21D = 6.3
or, 21D = 6.3 – 4.2 = 2.1
∴ D = 2.1/21 = 0.1
f(x) = 16.8 – 1.2(x–3) – 0.2(x–3)(x–7) + 0.1(x–3)(x–7)(x–9)
f(6) = 16.8 – 1.2(6–3) – 0.2(6–3)(6–7) + 0.1(6–3)(6–7)(6–9)
= 16.8 – 1.2×3 – 0.2×3×(–1) + 0.1×3×(–1)×(–3)
= 16.8 – 3.6 + 0.6 + 0.9
= 18.3 – 3.6 = 14.7
[16] For a certain polynomial function yx it is known that y1 = –1, y2+y3=–1, y4+y5+y6 = 61.
Find yx and hence value of y3.
Solution:
Since only 3 values are available, we assume a 2nd degree polynomial for yx and
we write yx = a + bx + cx2
Putting x=0, y0=a
Putting x=1, y1=a+b+c=–1------(1)
Putting x=2&3, y2=a+2b+4c
and adding y3=a+3b+9c
y2 + y3 = 2a + 5b + 13c = –1 ------(2)
Putting the value of c in equation (4)
3b+11×2=1
3b=1–22=–21
b=–7
Putting the value of b & c in the equation
a+b+c=–1
or, a–7+2=–1
or, a–5=–1
or, a=4
∴ yx = 4 + (–7)x + 2x²
yx = 4 – 7x + 2x²
When x = 3, y3 = 4 – 21 + 18 = 22 – 21 = 1
[17] Given U0 + U6 = –107, U1 + U5 = –36, U2 + U4 = –3, find the value of U3.
Solution:
Since 3 values are given, we assume a 2nd degree polynomial for U, so that 3rd and
higher order differences are zero. In particular,
∆⁶U0 = 0
or, (E – 1)⁶U0 = 0. Expanding,
U6 – 6U5 + 15U4 – 20U3 + 15U2 – 6U1 + U0 = 0
or, (U0 + U6) – 6(U1 + U5) + 15(U2 + U4) – 20U3 = 0
or, –107 – 6×(–36) + 15×(–3) – 20U3 = 0
or, –107 + 216 – 45 – 20U3 = 0
x   y      ∆y    ∆²y   ∆³y   ∆⁴y
1   256
           369
3   625          –59
           310         15
5   935          –44         –5
           266         10
7   1201         –34
           232
9   1433
u = (x – x0)/h = (4 – 1)/2 = 3/2 = 1.5
$$y = y_0 + u\,\Delta y_0 + \frac{u(u-1)}{1 \times 2}\Delta^2 y_0 + \frac{u(u-1)(u-2)}{1 \times 2 \times 3}\Delta^3 y_0 + \frac{u(u-1)(u-2)(u-3)}{1 \times 2 \times 3 \times 4}\Delta^4 y_0$$
= 256 + 1.5×369 + (–59)×1.5(1.5–1)/2 + 15×1.5(1.5–1)(1.5–2)/6 + (–5)×1.5(1.5–1)(1.5–2)(1.5–3)/24
= 256 + 553.5 – (59×1.5×0.5)/2 + (15×1.5×0.5×(–0.5))/6 + ((–5)×1.5×0.5×(–0.5)×(–1.5))/24
= 256 + 553.5 – 22.13 – 0.94 – 0.12 = 809.5 – 23.19 = 786.31 = 786 (approx)
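The computation above follows a fixed pattern: take the leading differences and multiply them by u, u(u–1)/2!, u(u–1)(u–2)/3! and so on. A minimal Python sketch of Newton’s forward formula is given below as an illustration; the function name `newton_forward` is an assumption, and it uses every available order of difference.

```python
def newton_forward(xs, ys, x):
    """Newton's forward interpolation for equidistant arguments xs."""
    h = xs[1] - xs[0]
    u = (x - xs[0]) / h
    # Collect the leading term and leading differences y0, Δy0, Δ²y0, ...
    leading, column = [], list(ys)
    while column:
        leading.append(column[0])
        column = [b - a for a, b in zip(column, column[1:])]
    total, coeff = 0.0, 1.0
    for k, d in enumerate(leading):
        total += coeff * d
        # Next coefficient: u(u-1)...(u-k) / (k+1)!
        coeff *= (u - k) / (k + 1)
    return total

estimate = newton_forward([1, 3, 5, 7, 9], [256, 625, 935, 1201, 1433], 4)
print(round(estimate, 2))
```

With the table of this exercise the sketch returns about 786.32, in line with the hand computation. When a table has more entries than the hand solution uses differences for, the two results can differ slightly because the code does not stop at the fourth difference.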
[19] The following table gives the expectation of life e0x at age x. Calculate the
expectation of life at age 12 by Newton’s forward interpolation formula.
x     10     15     20     25     30     35
e0x   35.4   32.2   29.1   26.0   23.1   20.4
[I.C.W.A. 1977]
Solution:
Difference Table
x    e0x    ∆e0x   ∆²e0x   ∆³e0x   ∆⁴e0x
10   35.4
            –3.2
15   32.2          0.1
            –3.1           –0.1
20   29.1          0               0.3
            –3.1           0.2
25   26.0          0.2             –0.2
            –2.9           0
30   23.1          0.2
            –2.7
35   20.4
u = (x – x0)/h = (12 – 10)/5 = 2/5 = 0.4
$$y = y_0 + u\,\Delta y_0 + \frac{u(u-1)}{1 \times 2}\Delta^2 y_0 + \frac{u(u-1)(u-2)}{1 \times 2 \times 3}\Delta^3 y_0 + \frac{u(u-1)(u-2)(u-3)}{1 \times 2 \times 3 \times 4}\Delta^4 y_0$$
= 35.4 – 3.2×0.4 + 0.1×0.4(0.4–1)/2 + (–0.1)×0.4(0.4–1)(0.4–2)/6 + 0.3×0.4(0.4–1)(0.4–2)(0.4–3)/24
= 35.4 – 1.28 – 0.012 – 0.0064 – 0.0125 = 35.4 – 1.3109 = 34.0891 = 34.1 (approx)
[20] The following shows the values of the function y = f(x) for a number of values of x:
x   0.5       0.6       0.7       0.8       0.9
y   0.35207   0.33322   0.31225   0.28969   0.26609
Obtain the value of y when x = 0.58, using a suitable interpolation formula.
[C.U., B.A.(Econ.), 1976]
Solution:
Difference Table
x     y         ∆y         ∆²y        ∆³y       ∆⁴y
0.5   0.35207
                –0.01885
0.6   0.33322              –0.00212
                –0.02097              0.00053
0.7   0.31225              –0.00159             0.00002
                –0.02256              0.00055
0.8   0.28969              –0.00104
                –0.02360
0.9   0.26609
u = (x – x0)/h = (0.58 – 0.5)/0.1 = 0.08/0.1 = 0.8
$$y = y_0 + u\,\Delta y_0 + \frac{u(u-1)}{1 \times 2}\Delta^2 y_0 + \frac{u(u-1)(u-2)}{1 \times 2 \times 3}\Delta^3 y_0 + \frac{u(u-1)(u-2)(u-3)}{1 \times 2 \times 3 \times 4}\Delta^4 y_0$$
= 0.35207 + (–0.01885×0.8) + (–0.00212)×(0.8)(–0.2)/2 + 0.00053×0.8×(–0.2)×(–1.2)/6 + 0.00002×0.8×(–0.2)×(–1.2)×(–2.2)/24
= 0.35207 – 0.01508 + 0.00017 + 0.00002 – 0.0000004 = 0.33718
[21] The table below gives the average number of years of life remaining to persons
who survive to exact age x, for the male African population of the Belgian Congo.
x     0       5       10      15      20
e0x   37.64   44.04   41.40   37.78   34.41
Obtain e0x for x = 2 approximately.
[I.C.W.A. 1973]
Solution:
Difference Table
x    yx = e0x   ∆y      ∆²y     ∆³y    ∆⁴y
0    37.64
                6.40
5    44.04              –9.04
                –2.64           8.06
10   41.40              –0.98          –6.83
                –3.62           1.23
15   37.78              0.25
                –3.37
20   34.41
u = (x – x0)/h = (2 – 0)/5 = 2/5 = 0.4
$$y = y_0 + u\,\Delta y_0 + \frac{u(u-1)}{1 \times 2}\Delta^2 y_0 + \frac{u(u-1)(u-2)}{1 \times 2 \times 3}\Delta^3 y_0 + \frac{u(u-1)(u-2)(u-3)}{1 \times 2 \times 3 \times 4}\Delta^4 y_0$$
√5   2.236
             0.213
√6   2.449           –0.016
             0.197            0.001
√7   2.646           –0.015
             0.182
√8   2.828
[C.U., B.A.(Econ.), 1977]
Solution:
Since there are 5 values, we assume a 4th degree polynomial, so that ∆⁵y0 = 0. Let income = x and
number of earners = y.
(E – 1)⁵y0 = 0
or, y5 – 5y4 + 10y3 – 10y2 + 5y1 – y0 = 0
or, 88 – 5×147 + 10×225 – 10×304 + 5y1 – 412 = 0
or, 88 – 735 + 2250 – 3040 + 5y1 – 412 = 0
or, 2338 – 4187 + 5y1 = 0
or, 5y1 = 1849
or, y1 = 1849/5 = 369.8 = 370 (approx)
Difference Table
Income (Rs.) x   No. of earners (y)   ∆y     ∆²y   ∆³y   ∆⁴y
50,000           412
                                      –108
75,000           304                         29
                                      –79          –28
1,00,000         225                         1           46
                                      –78          18
1,25,000         147                         19
                                      –59
1,50,000         88
u = (x – x0)/h = (60,000 – 50,000)/25,000 = 10,000/25,000 = 0.4
$$y = y_0 + u\,\Delta y_0 + \frac{u(u-1)}{1 \times 2}\Delta^2 y_0 + \frac{u(u-1)(u-2)}{1 \times 2 \times 3}\Delta^3 y_0 + \frac{u(u-1)(u-2)(u-3)}{1 \times 2 \times 3 \times 4}\Delta^4 y_0$$
2413
27 26,511
(i) u = (10 – 7)/4 = 3/4 = 0.75
By Newton’s forward formula,
$$y = y_0 + u\,\Delta y_0 + \frac{u(u-1)}{1 \times 2}\Delta^2 y_0 + \frac{u(u-1)(u-2)}{1 \times 2 \times 3}\Delta^3 y_0 + \frac{u(u-1)(u-2)(u-3)}{1 \times 2 \times 3 \times 4}\Delta^4 y_0$$
= 20256 + 369×0.75 + 302×0.75×(0.75–1)/2 + 138×0.75×(0.75–1)(0.75–2)/6 + 2×0.75×(0.75–1)(0.75–2)(0.75–3)/24
= 20256 + 276.75 – 28.31 + 5.39 – 0.04 = 20509.79 = 20510 (approx)
(ii) By Newton’s Backward Formula,
v = (x – xn)/h = (25 – 27)/4 = –2/4 = –0.5
$$y = y_n + v\,\Delta y_{n-1} + \frac{v(v+1)}{1 \times 2}\Delta^2 y_{n-2} + \frac{v(v+1)(v+2)}{1 \times 2 \times 3}\Delta^3 y_{n-3} + \frac{v(v+1)(v+2)(v+3)}{1 \times 2 \times 3 \times 4}\Delta^4 y_{n-4}$$
= 26511 + 2413×(–0.5) + 722×(–0.5)(–0.5+1)/2 + 142×(–0.5)(–0.5+1)(–0.5+2)/6 + 2×(–0.5)(–0.5+1)(–0.5+2)(–0.5+3)/24
= 26511 – 1206.5 – 90.25 – 8.88 – 0.08 = 26511 – 1305.71 = 26205.29
y = 26205
Ans. 20510, 26205.
Solution:
Since the given values of x are equidistant and the value x = 48° lies near the end
of these values, we use Newton’s backward interpolation formula:
Difference Table
x (degrees)   y = f(x) = sin x   ∆y      ∆²y       ∆³y       ∆⁴y
30            .5000
                                 .0736
35            .5736                      –0.0044
                                 .0692             –0.0005
40            .6428                      –0.0049             0
                                 .0643             –0.0005
45            .7071                      –0.0054
                                 .0589
50            .7660
v = (x – xn)/h = (48 – 50)/5 = –2/5 = –0.4
$$y = y_n + v\,\Delta y_{n-1} + \frac{v(v+1)}{1 \times 2}\Delta^2 y_{n-2} + \frac{v(v+1)(v+2)}{1 \times 2 \times 3}\Delta^3 y_{n-3}$$
= 0.7660 + 0.0589×(–0.4) + (–0.0054)×(–0.4)(–0.4+1)/2 + (–0.0005)×(–0.4)(–0.4+1)(–0.4+2)/6
= 0.7660 – 0.0236 + (0.0054×0.4×0.6)/2 + (0.0005×0.4×0.6×1.6)/6
= 0.7660 – 0.0236 + 0.00065 + 0.00003 = 0.7667 – 0.0236 = 0.7431
Ans. 0.7431
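Newton’s backward formula mirrors the forward one, starting from the last entry and using the last difference in each column. The following Python sketch is an illustration only; `newton_backward` is a hypothetical name, and the data are the sine values tabulated in this exercise.

```python
def newton_backward(xs, ys, x):
    """Newton's backward interpolation for equidistant arguments xs."""
    h = xs[1] - xs[0]
    v = (x - xs[-1]) / h
    # Collect yn and the last difference of every order: Δy(n-1), Δ²y(n-2), ...
    trailing, column = [], list(ys)
    while column:
        trailing.append(column[-1])
        column = [b - a for a, b in zip(column, column[1:])]
    total, coeff = 0.0, 1.0
    for k, d in enumerate(trailing):
        total += coeff * d
        # Next coefficient: v(v+1)...(v+k) / (k+1)!
        coeff *= (v + k) / (k + 1)
    return total

sin48 = newton_backward([30, 35, 40, 45, 50],
                        [0.5000, 0.5736, 0.6428, 0.7071, 0.7660], 48)
print(round(sin48, 4))
```

The result agrees with the worked answer to four decimal places.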
[26] Using Newton’s interpolation formula, find the number of factories earning less
than Rs.65,000 as profits, from the following data:
Profits (Rs.’000) 30 - 40 40 - 50 50 - 60 60 - 70 70 - 80
No. of factories 34 43 56 39 29
[I.C.W.A. 1975]
Solution:
Let yx denote the number of factories earning profits less than x thousand rupees, i.e. yx
gives the “less than” cumulative frequency (c.f.). The less than cumulative frequencies
and the difference table are given below:
Difference Table
x         c.f. = y   ∆y   ∆²y   ∆³y   ∆⁴y   ∆⁵y
30 = x0   0 = y0
                     34
40        34              9
                     43         4
50        77              13          –34
                     56         –30         71
60        133             –17         37
                     39         7
70        172             –10
                     29
80        201
Since we have to find the value of y at x = 65, near the end of the tabulated values, and
the x values are equidistant, we may use Newton’s backward interpolation formula.
Here, v = (x – xn)/h = (65 – 80)/10 = –15/10 = –1.5
( Find correct up to 7 decimal places)
[I.C.W.A., 1978]
Solution:
Since the tabulated values of the argument x are equidistant and 3.146 lies near
the end of the tabulated values of x, Newton’s backward interpolation formula seems
to be appropriate here.
Difference Table
x       y           ∆y          ∆²y          ∆³y   ∆⁴y
3.141   0.4970679
                    0.0001383
3.142   0.4972062               –0.0000001
                    0.0001382                0
3.143   0.4973444               –0.0000001         0
                    0.0001381                0
3.144   0.4974825               –0.0000001
                    0.0001380
3.145   0.4976205
Here, v = (x – xn)/h = (3.146 – 3.145)/0.001 = 1
f(3.146) = log 3.146 = 0.4976205 + 0.0001380×1 + (–0.0000001)×(1×2)/(1×2)
= 0.4976205 + 0.0001380 – 0.0000001
= 0.4977584
[28] Use Stirling’s interpolation formula to find the value of the probability integral
(P) when X = 1.52 :-
X 1.3 1.4 1.5 1.6 1.7
P = prob. integral .90320 .91924 .93319 .94520 .95543
Solution:
Difference Table
x           y              ∆y               ∆²y                ∆³y
1.3 = x–2   .90320 = y–2
                           .01604 = ∆y–2
1.4 = x–1   .91924 = y–1                    –.00209 = ∆²y–2
                           .01395 = ∆y–1                       .00015 = ∆³y–2
1.5 = x0    .93319 = y0                     –.00194 = ∆²y–1
                           .01201 = ∆y0                        .00016 = ∆³y–1
1.6 = x1    .94520 = y1                     –.00178 = ∆²y0
                           .01023 = ∆y1
1.7 = x2    .95543 = y2
u = (x – x0)/h = (1.52 – 1.50)/0.1 = 0.02/0.1 = 0.2
Stirling’s Interpolation Formula
$$y = y_0 + u \cdot \frac{\Delta y_0 + \Delta y_{-1}}{2} + \frac{u^2}{2!}\Delta^2 y_{-1} + \frac{u(u^2 - 1^2)}{3!} \cdot \frac{\Delta^3 y_{-1} + \Delta^3 y_{-2}}{2}$$
= 0.93319 + 0.2×(0.01201 + 0.01395)/2 + (0.2²/2!)×(–0.00194) + [0.2(0.2² – 1²)/3!]×(0.00016 + 0.00015)/2
= 0.93319 + 0.00260 – 0.00004 – 0.032×0.00016 = 0.93319 + 0.00260 – 0.00004 – 0.00001
= 0.93579 – 0.00005 = 0.93574
Ans. 0.93574
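Stirling’s formula can be coded by walking along the horizontal line through y0: odd-order terms use the mean of the two differences just above and below the line, even-order terms use the difference on the line. The Python sketch below is illustrative; the names `stirling` and `i0` (the position of y0 in the list) are assumptions, and the loop stops as soon as the table runs out of differences.

```python
from math import factorial

def difference_table(ys):
    """Return [entries, first differences, second differences, ...]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

def stirling(ys, i0, u):
    """Stirling's central difference formula; ys[i0] is y0, u = (x - x0)/h."""
    table, n = difference_table(ys), len(ys)
    total, prod, m = ys[i0], 1.0, 1   # prod = (u²-1²)(u²-2²)...(u²-(m-1)²)
    while True:
        lo, k = i0 - m, 2 * m - 1
        if lo < 0 or lo + 1 > n - k - 1:
            break
        # Odd order: mean of the differences above and below the central line.
        total += u * prod / factorial(k) * (table[k][lo] + table[k][lo + 1]) / 2
        k = 2 * m
        if lo > n - k - 1:
            break
        # Even order: the difference lying on the central line.
        total += u * u * prod / factorial(k) * table[k][lo]
        prod *= u * u - m * m
        m += 1
    return total

p = stirling([0.90320, 0.91924, 0.93319, 0.94520, 0.95543], 2, 0.2)
print(round(p, 5))
```

For this exercise the sketch also includes the small fourth-difference term, which does not change the answer at five decimal places.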
[29] Given the following cube - roots, find by Bessel’s interpolation formula, the cube
- root of 102.5:
Number 101 102 103 104
Cube - root 4.657,0095 4.672,3287 4.687,5481 4.702,6694
[B.U., B.A.(Econ.), 1965]
Solution:
Since the values of the argument are equidistant, and 102.5 lies near the middle
of the tabulated values, we use a central difference formula.
u = (x – x0)/h = (102.5 – 102)/1 = 0.5
v = u – 1/2 = 0.5 – 1/2 = 0
Difference Table
x           y                 ∆y                ∆²y                  ∆³y
x–1 = 101   y–1 = 4.6570095
                              ∆y–1 = .0153192
x0 = 102    y0 = 4.6723287                      ∆²y–1 = –.0000998
                              ∆y0 = .0152194                         ∆³y–1 = .0000017
x1 = 103    y1 = 4.6875481                      ∆²y0 = –.0000981
                              ∆y1 = .0151213
x2 = 104    y2 = 4.7026694
$$y = \frac{y_0 + y_1}{2} + v\,\Delta y_0 + \frac{v^2 - \frac{1}{4}}{2!} \cdot \frac{\Delta^2 y_0 + \Delta^2 y_{-1}}{2}$$
= (4.6723287 + 4.6875481)/2 + 0 + [(–1/4)/2]×[–.0000981 + (–.0000998)]/2
= 4.6799384 + .0000124 = 4.6799508
Ans. 4.6799508
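Bessel’s formula can be sketched in the same way as Stirling’s, with the roles of odd and even orders exchanged: even-order terms use the mean of two adjacent differences and odd-order terms use a single difference. The Python below is an illustration under that reading; `bessel` and `i0` (the position of y0 in the list) are hypothetical names.

```python
from math import factorial

def difference_table(ys):
    """Return [entries, first differences, second differences, ...]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

def bessel(ys, i0, u):
    """Bessel's formula; ys[i0] is y0, u = (x - x0)/h and v = u - 1/2."""
    table, n = difference_table(ys), len(ys)
    v = u - 0.5
    total, prod, m = 0.0, 1.0, 0   # prod = (v²-1/4)(v²-9/4)... with m factors
    while True:
        lo, k = i0 - m, 2 * m
        if lo < 0 or lo + 1 > n - k - 1:
            break
        # Even order: mean of two adjacent differences (order 0 gives (y0+y1)/2).
        total += prod / factorial(k) * (table[k][lo] + table[k][lo + 1]) / 2
        k += 1
        if lo > n - k - 1:
            break
        # Odd order: a single difference, multiplied by v.
        total += v * prod / factorial(k) * table[k][lo]
        m += 1
        prod *= v * v - (2 * m - 1) ** 2 / 4
    return total

root = bessel([4.6570095, 4.6723287, 4.6875481, 4.7026694], 1, 0.5)
print(round(root, 7))
```

With v = 0 every odd-order term drops out, which is the "interpolating to halves" case discussed earlier in the chapter.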
[30] Find by using Bessel’s interpolation formula, the expectation of life at age 22
from the following data:
Age (x) 10 15 20 25 30 35
Exp. of life (y) 35.4 32.2 29.1 26.0 23.1 20.4
Solution:
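Here, x0 = 20
u = (x – x0)/h = (22 – 20)/5 = 2/5 = 0.4
v = u – 1/2 = 0.4 – 0.5 = –0.1
Difference Table
x          y            ∆y             ∆²y            ∆³y
x–2 = 10   y–2 = 35.4
                        ∆y–2 = –3.2
x–1 = 15   y–1 = 32.2                  ∆²y–2 = 0.1
                        ∆y–1 = –3.1                   ∆³y–2 = –0.1
x0 = 20    y0 = 29.1                   ∆²y–1 = 0
                        ∆y0 = –3.1                    ∆³y–1 = 0.2
x1 = 25    y1 = 26.0                   ∆²y0 = 0.2
                        ∆y1 = –2.9                    ∆³y0 = 0
x2 = 30    y2 = 23.1                   ∆²y1 = 0.2
                        ∆y2 = –2.7
x3 = 35    y3 = 20.4
By Bessel’s formula, up to third differences,
$$y = \frac{y_0 + y_1}{2} + v\,\Delta y_0 + \frac{v^2 - \frac{1}{4}}{2!} \cdot \frac{\Delta^2 y_0 + \Delta^2 y_{-1}}{2} + \frac{v\left(v^2 - \frac{1}{4}\right)}{3!}\Delta^3 y_{-1}$$
= (29.1 + 26.0)/2 + (–0.1)(–3.1) + [(0.01 – 0.25)/2]×(0.2 + 0)/2 + [(–0.1)(0.01 – 0.25)/6]×0.2
= 27.55 + 0.31 – 0.012 + 0.0008 = 27.8488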
Ans. = 27.85
[31] The following table gives the normal weight of a baby during the first six months of life:
Age in months 0 2 3 5 6
Weight in lbs. 5 7 8 10 12
Estimate the weight of a baby at the age of 4 months.
[I.C.W.A., 1970]
Solution:
Since the successive values of x for the tabulated function f(x) are not equidistant,
Newton’s forward or backward formula cannot be applied. Applying Lagrange’s formula
$$y = \frac{(x - x_1)(x - x_2) \cdots (x - x_n)}{(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n)}\, y_0 + \frac{(x - x_0)(x - x_2) \cdots (x - x_n)}{(x_1 - x_0)(x_1 - x_2) \cdots (x_1 - x_n)}\, y_1 + \cdots + \frac{(x - x_0)(x - x_1) \cdots (x - x_{n-1})}{(x_n - x_0)(x_n - x_1) \cdots (x_n - x_{n-1})}\, y_n$$
Let age in months = x
Weight in lbs = y = f(x) corresponding to the values of the argument x0, x1, x2, ..., xn
∴ y = [(4–2)(4–3)(4–5)(4–6)] / [(0–2)(0–3)(0–5)(0–6)] × 5
+ [(4–0)(4–3)(4–5)(4–6)] / [(2–0)(2–3)(2–5)(2–6)] × 7
+ [(4–0)(4–2)(4–5)(4–6)] / [(3–0)(3–2)(3–5)(3–6)] × 8
+ [(4–0)(4–2)(4–3)(4–6)] / [(5–0)(5–2)(5–3)(5–6)] × 10
+ [(4–0)(4–2)(4–3)(4–5)] / [(6–0)(6–2)(6–3)(6–5)] × 12
= (4/180)×5 + (8/–24)×7 + (16/18)×8 + (–16/–30)×10 + (–8/72)×12
= 1/9 – 7/3 + 64/9 + 16/3 – 4/3
= (1 – 21 + 64 + 48 – 12)/9 = (113 – 33)/9 = 80/9 = 8 8/9 lbs.
Ans. 8 8/9 lbs.
[32] State Lagrange’s interpolation formula. Use it to find f(x) when x = 0, given:
x      –1   –2   2    4
f(x)   –1   –9   11   69
[I.C.W.A. 1974]
Solution:
Let y0, y1, y2, ..., yn denote the tabulated values of a function y = f(x)
corresponding to the values of the argument x0, x1, x2, ..., xn (which may not be
equidistant). It is required to find the value of y corresponding to a specified value of
x lying in between the given values. This is obtained by using Lagrange’s Interpolation
Formula:
$$y = \frac{(x - x_1)(x - x_2) \cdots (x - x_n)}{(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n)}\, y_0 + \frac{(x - x_0)(x - x_2) \cdots (x - x_n)}{(x_1 - x_0)(x_1 - x_2) \cdots (x_1 - x_n)}\, y_1 + \cdots + \frac{(x - x_0)(x - x_1) \cdots (x - x_{n-1})}{(x_n - x_0)(x_n - x_1) \cdots (x_n - x_{n-1})}\, y_n$$
y = [(0+2)(0–2)(0–4)] / [(–1+2)(–1–2)(–1–4)] × (–1)
+ [(0+1)(0–2)(0–4)] / [(–2+1)(–2–2)(–2–4)] × (–9)
+ [(0+1)(0+2)(0–4)] / [(2+1)(2+2)(2–4)] × 11
+ [(0+1)(0+2)(0–2)] / [(4+1)(4+2)(4–2)] × 69
= (16/15)×(–1) + (8/–24)×(–9) + (–8/–24)×11 + (–4/60)×69
= –16/15 + 3 + 11/3 – 23/5
= (–16 + 45 + 55 – 69)/15 = (100 – 85)/15 = 15/15 = 1
Ans. 1
[33] State Lagrange’s interpolation formula. Use it to find the value of U4 of a function
Ux, given that U1 = 10, U2 = 15, U5 = 42.
[I.C.W.A., 1974]
Solution:
Lagrange’s interpolation formula as in Q.32.
Here, x      :  1    2    5
      y = Ux :  10   15   42
y = U4 = [(4–2)(4–5)] / [(1–2)(1–5)] × 10 + [(4–1)(4–5)] / [(2–1)(2–5)] × 15 + [(4–1)(4–2)] / [(5–1)(5–2)] × 42
= (–2/4)×10 + (–3/–3)×15 + (6/12)×42
= –5 + 15 + 21 = 31
Ans. 31
[34] Using Lagrange’s formula or otherwise, obtain the value of log 96 approximately
from the following table:
x       95          97          98          99
log x   1.9777236   1.9867717   1.9912261   1.9956352
[C.U., B.Com.(Hons.), 1969]
Solution:
Using Lagrange’s Interpolation Formula,
y = log 96 = [(96–97)(96–98)(96–99)] / [(95–97)(95–98)(95–99)] × 1.9777236
+ [(96–95)(96–98)(96–99)] / [(97–95)(97–98)(97–99)] × 1.9867717
+ [(96–95)(96–97)(96–99)] / [(98–95)(98–97)(98–99)] × 1.9912261
+ [(96–95)(96–97)(96–98)] / [(99–95)(99–97)(99–98)] × 1.9956352
= (–6/–24)×1.9777236 + (6/4)×1.9867717 + (3/–3)×1.9912261 + (2/8)×1.9956352
= 0.4944309 + 2.98015755 – 1.9912261 + 0.4989088
= 3.97349725 – 1.9912261 = 1.98227115
Ans. 1.98227115
[35] Given log10 654=2.8156, log10 658=2.8182, log10 659=2.8189, log10 661=2.8202. Find
by Lagrange’s Interpolation formula log10 656 (Retain 4 decimal places).
[I.C.W.A., 1978]
Solution:
x                  654      658      659      661
y = f(x) = log10 x 2.8156   2.8182   2.8189   2.8202
y = [(656–658)(656–659)(656–661)] / [(654–658)(654–659)(654–661)] × 2.8156
+ [(656–654)(656–659)(656–661)] / [(658–654)(658–659)(658–661)] × 2.8182
+ [(656–654)(656–658)(656–661)] / [(659–654)(659–658)(659–661)] × 2.8189
+ [(656–654)(656–658)(656–659)] / [(661–654)(661–658)(661–659)] × 2.8202
= (–30/–140)×2.8156 + (30/12)×2.8182 + (20/–10)×2.8189 + (12/42)×2.8202
= 0.6033 + 7.0455 – 5.6378 + 0.8058 = 2.8168
Ans. 2.8168
[36] Find the value of x for which y = 40
x 10 12 15 20
y 25 32 35 45
Solution:
The process of finding the value of x that corresponds to a given value of y is known as
‘Inverse Interpolation’. Since Lagrange’s Interpolation formula is applicable for unequal
intervals, we can use this formula for inverse interpolation by interchanging the roles of
the argument x and the function y.
\[ x = \frac{(y-y_1)(y-y_2)(y-y_3)\cdots(y-y_n)}{(y_0-y_1)(y_0-y_2)(y_0-y_3)\cdots(y_0-y_n)}\, x_0 + \frac{(y-y_0)(y-y_2)(y-y_3)\cdots(y-y_n)}{(y_1-y_0)(y_1-y_2)(y_1-y_3)\cdots(y_1-y_n)}\, x_1 \]
\[ + \frac{(y-y_0)(y-y_1)(y-y_3)\cdots(y-y_n)}{(y_2-y_0)(y_2-y_1)(y_2-y_3)\cdots(y_2-y_n)}\, x_2 + \cdots + \frac{(y-y_0)(y-y_1)(y-y_2)\cdots(y-y_{n-1})}{(y_n-y_0)(y_n-y_1)(y_n-y_2)\cdots(y_n-y_{n-1})}\, x_n \]
Applying the above formula with y = 40,
\[ x = \frac{(40-32)(40-35)(40-45)}{(25-32)(25-35)(25-45)} \times 10 + \frac{(40-25)(40-35)(40-45)}{(32-25)(32-35)(32-45)} \times 12 \]
\[ + \frac{(40-25)(40-32)(40-45)}{(35-25)(35-32)(35-45)} \times 15 + \frac{(40-25)(40-32)(40-35)}{(45-25)(45-32)(45-35)} \times 20 \]
\[ = \frac{-200}{-1400} \times 10 + \frac{-375}{273} \times 12 + \frac{-600}{-300} \times 15 + \frac{600}{2600} \times 20 = \frac{10}{7} - \frac{125}{91} \times 12 + 30 + \frac{60}{13} \]
\[ = \frac{130 - 1500 + 2730 + 420}{91} = \frac{3280 - 1500}{91} = \frac{1780}{91} = 19.56 \]
Ans. 19.56
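Inverse interpolation needs no new machinery: the same Lagrange routine is simply called with the two columns swapped. The sketch below is in Python, under the assumption that a helper named lagrange_interpolate (a hypothetical name, not from the text) is acceptable for illustration; the data are those of Sum [36].

```python
def lagrange_interpolate(args, values, target):
    """Lagrange interpolation of `values` as a function of `args`, evaluated at `target`.

    Hypothetical helper written for illustration only.
    """
    total = 0.0
    for i, (ai, vi) in enumerate(zip(args, values)):
        coeff = 1.0
        for j, aj in enumerate(args):
            if j != i:
                coeff *= (target - aj) / (ai - aj)
        total += coeff * vi
    return total


x_vals = [10, 12, 15, 20]
y_vals = [25, 32, 35, 45]

# Inverse interpolation: treat y as the argument and x as the function
x_at_40 = lagrange_interpolate(y_vals, x_vals, 40)
print(round(x_at_40, 2))
```

Passing y_vals first and x_vals second is the whole of the "interchange of roles" described in the solution; the result agrees with the fraction 1780/91 obtained by hand.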
INDEX NUMBERS
Index Numbers are numerical figures which indicate the relative position in respect of
price or quantity or value of a group of articles at certain periods of time as compared
with another period called the base period. The index number for the base period is
always taken as 100.
Method of Construction of Index Numbers
Index Number Construction
[I] Aggregative Method
    Simple Aggregative Formula
    Weighted Aggregative Formula
        Laspeyre's Formula
        Paasche's Formula
        Edgeworth-Marshall's Formula
        Fisher's Ideal Formula
[II] Relative Method
    Simple Average of Relatives
    Weighted Average of Relatives
[I] Aggregative Method
In this method, the aggregate price of all items in the given year is expressed as
a percentage of the same in the base year, giving the index number.
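w = (q0 + qn), we get
\[ \text{Edgeworth-Marshall's Index } (I_{0n}) = \frac{\Sigma P_n (q_0 + q_n)}{\Sigma P_0 (q_0 + q_n)} \times 100 \]
[iv] The geometric mean (i.e. the square root of the product) of Laspeyre’s Index
and Paasche’s Index gives
\[ \text{Fisher's Ideal Index } (I_{0n}) = \sqrt{\text{Laspeyre's Index} \times \text{Paasche's Index}} = \sqrt{\frac{\Sigma P_n q_0}{\Sigma P_0 q_0} \times \frac{\Sigma P_n q_n}{\Sigma P_0 q_n}} \times 100 \]
[v] The arithmetic mean of Laspeyre’s Index and Paasche’s Index gives
\[ \text{Bowley's Index } (I_{0n}) = \frac{1}{2} (\text{Laspeyre's Index} + \text{Paasche's Index}) = \frac{1}{2} \left[ \frac{\Sigma P_n q_0}{\Sigma P_0 q_0} + \frac{\Sigma P_n q_n}{\Sigma P_0 q_n} \right] \times 100 \]
[vi] If the geometric mean of base year and current year quantities is used as
weight, i.e. \( W = \sqrt{q_0 q_n} \), we get
\[ \text{Walsh's Index } (I_{0n}) = \frac{\Sigma P_n \sqrt{q_0 q_n}}{\Sigma P_0 \sqrt{q_0 q_n}} \times 100 \]
[vii] If the weights used are kept fixed for all periods, i.e. the weights are constant
quantities (q) without any reference to the base or current period, we get
\[ \text{Kelly's Index } (I_{0n}) = \frac{\Sigma P_n q}{\Sigma P_0 q} \times 100 \]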
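All the weighted aggregative formulas listed above differ only in the weight attached to each price. The sketch below, in Python, makes that explicit; the function names are illustrative assumptions rather than anything prescribed by the text, and the sample figures are the four-commodity prices and quantities of Sum [3] in the Sums of this chapter.

```python
from math import sqrt


def weighted_aggregative(p0, pn, w):
    """Sum(Pn * w) / Sum(P0 * w) * 100 for an arbitrary list of weights w."""
    return sum(p * wi for p, wi in zip(pn, w)) / sum(p * wi for p, wi in zip(p0, w)) * 100


def laspeyres(p0, pn, q0, qn):
    return weighted_aggregative(p0, pn, q0)          # base year quantities as weights


def paasche(p0, pn, q0, qn):
    return weighted_aggregative(p0, pn, qn)          # current year quantities as weights


def marshall_edgeworth(p0, pn, q0, qn):
    return weighted_aggregative(p0, pn, [a + b for a, b in zip(q0, qn)])


def walsh(p0, pn, q0, qn):
    return weighted_aggregative(p0, pn, [sqrt(a * b) for a, b in zip(q0, qn)])


def fisher(p0, pn, q0, qn):
    return sqrt(laspeyres(p0, pn, q0, qn) * paasche(p0, pn, q0, qn))


def bowley(p0, pn, q0, qn):
    return (laspeyres(p0, pn, q0, qn) + paasche(p0, pn, q0, qn)) / 2


# Data of Sum [3]: commodities A, B, C, D for 1979 (base) and 1980 (current)
p0, pn = [20, 50, 40, 20], [40, 60, 50, 20]
q0, qn = [8, 10, 15, 20], [6, 5, 10, 15]

for name, formula in [("Laspeyre", laspeyres), ("Paasche", paasche),
                      ("Marshall-Edgeworth", marshall_edgeworth),
                      ("Walsh", walsh), ("Fisher", fisher), ("Bowley", bowley)]:
    print(name, round(formula(p0, pn, q0, qn), 1))
```

Kelly's index is the same weighted_aggregative call with any fixed list of quantities supplied as w.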
[II] Relative Method
In this method, the price of each item in the current year is expressed as a percentage
of the price of the base year. This is called price relative and is given by the formula,
\[ I_{0n} = \frac{\Sigma \left( \frac{P_n}{P_0} \times 100 \right) P_0 q_0}{\Sigma P_0 q_0} = \frac{\Sigma P_n q_0}{\Sigma P_0 q_0} \times 100 = \text{Laspeyre's Index} \]
[2] The A.M. of relatives formula weighted by the values of current year quantities at
base year prices (P0 qn) gives Paasche’s formula:
\[ I_{0n} = \frac{\Sigma \left( \frac{P_n}{P_0} \times 100 \right) P_0 q_n}{\Sigma P_0 q_n} = \frac{\Sigma P_n q_n}{\Sigma P_0 q_n} \times 100 = \text{Paasche's Index} \]
[3] The H.M. of relatives formula weighted by current year values (Pn qn) gives the
same formula as Paasche’s:
\[ I_{0n} = \frac{\Sigma P_n q_n}{\Sigma \dfrac{P_n q_n}{\frac{P_n}{P_0} \times 100}} = \frac{\Sigma P_n q_n}{\Sigma P_0 q_n} \times 100 = \text{Paasche's Index} \]
Construction of General Index from Group Index
In the construction of any index number, the items included are usually classified under
some broad categories called Groups, with similar or related items coming under each
group. A separate index number is constructed for each group, and is called Group Index.
The weighted average (usually A.M.) of group index numbers gives the General Index.
\[ \text{General Index} = \frac{\Sigma IW}{\Sigma W} \]
where I represents the Group Index and W is the Group Weight.
Quantity Index Number
Just as price index numbers measure and permit comparison of the price of a group
of related items, quantity index numbers similarly measure and permit comparison
\[ \text{Fisher's Ideal Index} = \sqrt{\frac{\Sigma q_n P_0}{\Sigma q_0 P_0} \times \frac{\Sigma q_n P_n}{\Sigma q_0 P_n}} \times 100 \]
\[ \text{Quantity Relative} = \frac{q_n}{q_0} \times 100 \]
Simple A.M. of Quantity Relatives Index = Σ(Quantity Relatives) ÷ K
\[ \text{Weighted A.M. of Quantity Relatives Index} = \frac{\Sigma (\text{Quantity Relative} \times \text{Weight})}{\Sigma \text{Weight}} \]
Tests of Index Numbers
In order to judge the efficiency of an index number formula as a measure of the level of a
phenomenon from one period to another, the noted economist Irving Fisher suggested
certain tests. The three most important tests of index numbers are (1) Time Reversal
Test, (2) Factor Reversal Test, and (3) Circular Test.
[1] Time Reversal Test
According to this test, a good index number formula should work both ways,
forward and backward, with respect to time. In other words, we should get the same
picture of change between two points of time, no matter which of the two is taken as
base. Consequently, the index number (Ion) for period n with base period 0 should be the
reciprocal of the index number (Ino) for period 0 with the base period n. Symbolically
Ion × Ino = 1
An index number formula which obeys this relation is said to satisfy the time
reversal test.
Time reversal test is satisfied by simple aggregative formula, Marshall –
Edgeworth’s formula, Fisher’s ideal index formula and simple geometric mean of
relatives formula. Weighted aggregative formula and weighted geometric mean of
relatives formula also satisfy this test, if constant weights are used which do not depend
upon the base or current period.
Fisher’s ideal index is the only formula which satisfies this test.
[3] Circular Test
This is an extension of the time reversal test. An index number formula is said
to satisfy the circular test if the time reversal test is satisfied through a number of
intermediate years. Symbolically,
\[ I_{01} \times I_{12} \times I_{23} \times \cdots \times I_{n-1,n} \times I_{n0} = 1 \]
This means that the relation is satisfied in a circular fashion through several
years, 0 to 1, 1 to 2, 2 to 3, ..., (n – 1) to n, and finally from n back to 0. Simple
aggregative formula and the simple geometric mean of relatives formula satisfy this
test. Weighted aggregative formula and weighted geometric mean of relatives formula
satisfy this test, if constant weights are used for all time periods.
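The time reversal and factor reversal tests can be checked numerically for any formula. The following Python sketch is offered only as an illustration: the helper names are assumptions, every index is computed without the factor 100, and the figures are the clothing data of Sum [22] in the Sums of this chapter.

```python
from math import sqrt, isclose


def laspeyres(p_base, p_cur, q_base, q_cur):
    return sum(p * q for p, q in zip(p_cur, q_base)) / sum(p * q for p, q in zip(p_base, q_base))


def paasche(p_base, p_cur, q_base, q_cur):
    return sum(p * q for p, q in zip(p_cur, q_cur)) / sum(p * q for p, q in zip(p_base, q_cur))


def fisher(p_base, p_cur, q_base, q_cur):
    return sqrt(laspeyres(p_base, p_cur, q_base, q_cur) * paasche(p_base, p_cur, q_base, q_cur))


def satisfies_time_reversal(formula, p0, pn, q0, qn):
    forward = formula(p0, pn, q0, qn)     # I_0n
    backward = formula(pn, p0, qn, q0)    # I_n0: suffixes 0 and n interchanged
    return isclose(forward * backward, 1.0)


def satisfies_factor_reversal(formula, p0, pn, q0, qn):
    price_index = formula(p0, pn, q0, qn)
    quantity_index = formula(q0, qn, p0, pn)   # P and q interchanged
    value_ratio = sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, q0))
    return isclose(price_index * quantity_index, value_ratio)


# Data of Sum [22]: Men's, Women's, Children's
p0, pn = [7, 5, 4], [10, 9, 6]
q0, qn = [36, 50, 18], [48, 80, 26]

for name, formula in [("Laspeyre", laspeyres), ("Paasche", paasche), ("Fisher", fisher)]:
    print(name,
          satisfies_time_reversal(formula, p0, pn, q0, qn),
          satisfies_factor_reversal(formula, p0, pn, q0, qn))
```

A single data set can only show that a formula fails a test; that Fisher's formula passes both for all data follows from the algebra, since the four sums cancel in pairs under the square root.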
Cost of Living Index Numbers
Cost of living index numbers are special-purpose index numbers which are designed to
measure the relative change in the cost level for maintaining similar standard of living
in two different situations. These are generally intended to represent the average changes
in prices over time, paid by the ultimate consumer for a specified group of goods and
services and hence are also called Consumer Price Index Numbers.
The steps in the construction of a Cost of Living Index are as follows:
[1] The first step is to decide on the class of people for whom the index number is
intended.
[2] The next step is to conduct a ‘family budget enquiry’ in the base period relating
to the class of people concerned, by the process of random sampling. Only important
items among those which are used by the majority of the class of people are included
in the construction of a cost of living index.
[3] The items of expenditure are classified into certain major groups, e.g. (i) Food,
(ii) Clothing, (iii) Fuel and light, (iv) Housing, and (v) Miscellaneous. These major
groups are further divided into smaller groups and sub-groups, so that the items are
individually mentioned.
[4] Arrangements should be made to collect retail prices of the items at regular
intervals of time from important local markets. Price quotations are taken at least once
a week.
[5] For each item there will be a number of price quotations covering different
qualities and markets. The simple average of price relatives of the different quotations
is taken as the price relative for the particular item.
[6] A separate index number is then computed for each group, using Laspeyre’s
formula in the form of a weighted average of price relatives.
\[ \text{Group Index } (I) = \frac{\Sigma W \left( \frac{P_n}{P_0} \times 100 \right)}{100}, \quad \text{where } W = \frac{P_0 q_0}{\Sigma P_0 q_0} \times 100 \]
Thus, in the construction of a Group Index, the weight (W) of an item is the
percentage expenditure of an ‘average family’ on that item in relation to the total
expenditure in the Group, as obtained from the family budget enquiry.
[7] The weighted average of group index numbers gives the final Cost of Living
Index number.
\[ \text{Cost of Living Index} = \frac{\Sigma IW}{100} \]
The weight (W) of a group index is the percentage of total expenditure of an
average family spent on that group, as shown by the family budget enquiry.
[8] Cost of living index numbers are generally constructed for each week. The
average of the weekly index numbers is taken as the index number for a month. The
average of monthly index numbers gives the cost of living index for the whole year.
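Steps [6] and [7] are both weighted averages, so one small routine covers them. The Python sketch below is an illustration only; the function name weighted_index is an assumption, and the group figures are those of Sum [28] in the Sums of this chapter, where the weights are percentages adding up to 100.

```python
def weighted_index(indices, weights):
    """Weighted arithmetic mean Sum(I * W) / Sum(W) of index numbers."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)


# Group index numbers and percentage weights from Sum [28]
groups = ["Food", "Clothing", "Fuel & Light", "House Rent", "Others"]
group_index = [428, 250, 220, 125, 175]
group_weight = [45, 15, 8, 20, 12]

cost_of_living_index = weighted_index(group_index, group_weight)
print(round(cost_of_living_index, 2))
```

Dividing by the sum of the weights rather than by a fixed 100 keeps the routine usable when the weights do not total 100, as in Sum [29].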
Chain Base Method
There are two methods of constructing index numbers, depending on the nature of the
base period employed: (i) Fixed Base Method and (ii) Chain Base Method. Most of
the index numbers in common use are of the fixed base type, where a fixed period
is chosen as base and the index number for any given year is calculated by direct
reference to this fixed base period. The fixed base index for any year is not, therefore,
affected by changes in price or quantity in any other year. It is, however, considered
that the net changes in any given year are the result of gradual changes that have taken
place during the past years. This idea is reflected in “Chain Base Index” numbers.
For the construction of index numbers by the chain base method, using an appropriate
index number formula (say Laspeyre’s formula), it is first necessary to compute index
numbers for all the years, always using the preceding year as base. These are known
as Link Indices.
Link index = Index number with preceding period as base. For example, using
Laspeyre’s formula,
\[ \text{Link index for year 1 } (I_{01}) = \frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times 100 \]
\[ \text{Link index for year 2 } (I_{12}) = \frac{\Sigma p_2 q_1}{\Sigma p_1 q_1} \times 100 \]
\[ \text{Link index for year 3 } (I_{23}) = \frac{\Sigma p_3 q_2}{\Sigma p_2 q_2} \times 100 \]
\[ \text{Link index for year 4 } (I_{34}) = \frac{\Sigma p_4 q_3}{\Sigma p_3 q_3} \times 100, \text{ etc.} \]
The link indices I01, I12, I23, I34, ------ are then multiplied successively (called chaining
process) in order to relate them to a common base. The progressive products, expressed
as percentages, give the required index numbers by the chain base method. These are
called Chain Index Number or Chain Base Index Number. Thus, a chain index number
is the product of several index numbers, each calculated with the preceding period as
base.
The chain index numbers with reference to year 0 are (omitting the factor 100 from
each index)
I′01 = I01
I′02 = I01×I12
I′03 = I01×I12×I23
I′04 = I01×I12×I23×I34
(Here I′ is used for chain index and I for index of the fixed base type).
The chain index number I′on will not in general be equal to the corresponding fixed
base index number Ion unless the formula employed satisfies the circular test of index
numbers.
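The chaining process is a running product of the link indices. A minimal Python sketch follows; the function name chain_indices is an illustrative assumption, and the link indices are those given in Sum [26] of this chapter.

```python
def chain_indices(link_indices, base=100.0):
    """Chain link indices (each given as a percentage of the preceding period)."""
    chain = []
    running = base
    for link in link_indices:
        running = running * link / 100   # multiply by the link expressed as a ratio
        chain.append(running)
    return chain


# Link indices for 1961, 1962, 1963 (base year 1960 = 100), from Sum [26]
links = [110, 95.5, 109.5]
chain = chain_indices(links)
print([round(c, 2) for c in chain])
```

Each entry of chain is the chain index for the corresponding year with reference to the base year, i.e. I′01, I′02, I′03 in the notation above.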
Sums
[1] Find the simple Aggregative index number from the following data:-
Commodity Base Price Current Price
Rice 140 180
Sugar 100 300
Oil 400 550
Wheat 125 150
Pulse 160 200
Solution:
Commodity Base Price(Rs.) Current Price(Rs.)
Rice 140 180
Sugar 100 300
[2] Find by the weighted aggregative method, the index number of the following
data:-
Commodity Base Price Current Price Weight
Rice 140 180 10
Oil 400 550 7
Sugar 100 250 6
Wheat 125 150 8
Fish 200 300 4
Solution:
Commodity Base Price (P0) Current Price (Pn) Weight (Wi) Pn w P0 w
Rice 140 180 10 1800 1400
Oil 400 550 7 3850 2800
Sugar 100 250 6 1500 600
Wheat 125 150 8 1200 1000
Fish 200 300 4 1200 800
∑Pnw = 9550 ∑P0w = 6600
\[ \text{Weighted Aggregative Index } (I_{0n}) = \frac{\Sigma P_n w}{\Sigma P_0 w} \times 100 = \frac{9550}{6600} \times 100 = 145 \]
[3] Calculate the price index numbers by (a) Paasche’s Method, (b) Laspeyre’s Method,
(c) Bowley’s Method, (d) Fisher’s ideal formula.
1979 1980
Commodities
Price (Rs.) Quantity (Kgs) Price (Rs.) Quantity (Kgs)
A 20 8 40 6
B 50 10 60 5
C 40 15 50 10
D 20 20 20 15
Solution:
Table : Calculations for Price Index
Commodity P0 Pn q0 qn p0 q0 pn qn p0 qn pn q0
A 20 40 8 6 160 240 120 320
B 50 60 10 5 500 300 250 600
C 40 50 15 10 600 500 400 750
D 20 20 20 15 400 300 300 400
\[ \text{[b] Laspeyre's Index} = \frac{\Sigma P_n q_0}{\Sigma P_0 q_0} \times 100 = \frac{2070}{1660} \times 100 = 124.7 \]
\[ \text{[c] Bowley's Index} = \frac{1}{2} [\text{Laspeyre's Index} + \text{Paasche's Index}] = \frac{1}{2} [125.2 + 124.7] = \frac{1}{2} \times 249.9 = 125 \]
\[ \text{[d] Fisher's Ideal Index} = \sqrt{\text{Laspeyre's Index} \times \text{Paasche's Index}} = \sqrt{124.7 \times 125.2} = 125 \]
[4] Prepare price index numbers for 1977 with 1975 as base year from the following
data, using (i) Laspeyre’s, (ii) Paasche’s, (iii) Fisher’s method.
Commodity Unit Quantity (1975) Price in Rs. (1975) Quantity (1977) Price in Rs. (1977)
A Kg. 5 2.00 7 4.50
B Quintal 7 2.50 10 3.20
C Dozen 6 3.00 6 4.50
D Kg. 2 1.00 9 1.80
Solution:
Table : Calculations for Price Index
Commodity P0 Pn q0 qn p 0 q0 p 0 qn pn q0 pn qn
A 2.00 4.50 5 7 10.00 14.00 22.50 31.50
B 2.50 3.20 7 10 17.50 25.00 22.40 32.00
C 3.00 4.50 6 6 18.00 18.00 27.00 27.00
D 1.00 1.80 2 9 2.00 9.00 3.60 16.20
Total 47.50 66.00 75.50 106.70
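\[ \text{[i] Laspeyre's Index} = \frac{\Sigma p_n q_0}{\Sigma p_0 q_0} \times 100 = \frac{75.50}{47.50} \times 100 = 159 \]
\[ \text{[ii] Paasche's Index} = \frac{\Sigma p_n q_n}{\Sigma p_0 q_n} \times 100 = \frac{106.70}{66.00} \times 100 = 162 \]
\[ \text{[iii] Fisher's Index} = \sqrt{\text{Laspeyre's Index} \times \text{Paasche's Index}} = \sqrt{159 \times 162} = 160 \]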
[5] Using the data given below, calculate price index numbers for the year 1958 by i)
Laspeyre’s formula, ii) Paasche’s Formula, iii) Fisher’s Formula, with the year 1949 as base :
Price (Rs.) Quantity (1000 kg.)
Commodity
1949 1958 1949 1958
Rice 9.3 4.5 100 90
Wheat 6.4 3.7 11 10
Pulses 5.1 2.7 5 3
State with reasons one advantage of Laspeyre’s index over the Paasche’s index in
case revisions of an index number are to be made from year to year.
Solution:
Table: Calculations for Index Number
Commodity p0 pn q0 qn p0 q0 p0 qn pn q0 pn qn
Rice 9.3 4.5 100 90 930 837.0 450 405
Wheat 6.4 3.7 11 10 70.4 64.0 40.7 37
Pulses 5.1 2.7 5 3 25.5 15.3 13.5 8.1
Total 1025.9 916.3 504.2 450.1
\[ \text{[i] Laspeyre's Index} = \frac{\Sigma p_n q_0}{\Sigma p_0 q_0} \times 100 = \frac{504.2}{1025.9} \times 100 = 49.15 \]
\[ \text{[ii] Paasche's Index} = \frac{\Sigma p_n q_n}{\Sigma p_0 q_n} \times 100 = \frac{450.1}{916.3} \times 100 = 49.12 \]
\[ \text{[iii] Fisher's Index} = \sqrt{\text{Laspeyre's Index} \times \text{Paasche's Index}} = \sqrt{49.15 \times 49.12} = 49.13 \]
[6] Given the following data, calculate price index numbers by i) Laspeyre’s Formula,
and ii) Fisher’s Formula with 1927 as base:
Rice Wheat Jowar
Year
Price Qty Price Qty Price Qty
1927 9.3 100 6.4 11 5.1 5
1934 4.5 90 3.7 10 2.7 3
Solution:
Table: Calculations for Index Number
Commodity p0 pn q0 qn p0 q0 p0 qn pn q0 pn qn
Rice 9.3 4.5 100 90 930 837 450 405
Wheat 6.4 3.7 11 10 70.4 64 40.7 37
Jowar 5.1 2.7 5 3 25.5 15.3 13.5 8.1
Total 1025.9 916.3 504.2 450.1
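\[ \text{[i] Laspeyre's Index} = \frac{\Sigma p_n q_0}{\Sigma p_0 q_0} \times 100 = \frac{504.2}{1025.9} \times 100 = 49.15 \]
\[ \text{[ii] Paasche's Index} = \frac{\Sigma p_n q_n}{\Sigma p_0 q_n} \times 100 = \frac{450.1}{916.3} \times 100 = 49.12 \]
\[ \text{[iii] Fisher's Index} = \sqrt{\text{Laspeyre's Index} \times \text{Paasche's Index}} = \sqrt{49.15 \times 49.12} = 49.13 \]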
[7] Calculate the price index number for 1940 with 1937 as base year by the
aggregative method, using (a) Base year quantities as weight, and (b) given year
quantities as weights, from the following data:
1937 1940
Commodity
Quantity (‘000 tons) Price per ton (Rs.) Quantity (‘000 tons) Price per ton(Rs.)
A 350 100 400 120
B 200 130 180 200
C 140 50 200 110
D 80 125 100 140
Solution:
Table: Calculations for Index Numbers
Commodity p0 pn q0 qn p0 q0 p0 qn pn q0 p n qn
A 100 120 350 400 35,000 40,000 42,000 48,000
B 130 200 200 180 26,000 23,400 40,000 36,000
C 50 110 140 200 7,000 10,000 15,400 22,000
D 125 140 80 100 10,000 12,500 11,200 14,000
Total 78,000 85,900 1,08,600 1,20,000
\[ \text{[A] Price Index} = \frac{\Sigma p_n q_0}{\Sigma p_0 q_0} \times 100 = \frac{1,08,600}{78,000} \times 100 = 139.2 \]
\[ \text{[B] Price Index} = \frac{\Sigma p_n q_n}{\Sigma p_0 q_n} \times 100 = \frac{1,20,000}{85,900} \times 100 = 139.7 \]
[8] The following table gives the change in the price and consumption of three
commodities in the workers consumption basket. Compute Fisher’s ideal index
number from the data given in the table:
1950 1960
Commodity
Price (Rs.) Consumption (units) Price (Rs.) Consumption (units)
Wheat 100 10 110 6
Rice 150 15 170 18
Cloth 5 50 4 30
Solution:
Table: Calculations for Index Numbers
Commodity p0 pn q0 qn p0 q0 p0 qn pn q0 pn qn
Wheat 100 110 10 6 1000 600 1100 660
Rice 150 170 15 18 2250 2700 2550 3060
Cloth 5 4 50 30 250 150 200 120
Total 3500 3450 3850 3840
\[ \text{[I] Laspeyre's Index} = \frac{\Sigma p_n q_0}{\Sigma p_0 q_0} \times 100 = \frac{3850}{3500} \times 100 = 110 \]
\[ \text{[II] Paasche's Index} = \frac{\Sigma p_n q_n}{\Sigma p_0 q_n} \times 100 = \frac{3840}{3450} \times 100 = 111.30 \]
Solution:
Table: Calculations for Price Index Number
Commodity p0 pn q0 qn p0q0 p0qn pnq0 pnqn
a 4.3 5.2 20 16 86.00 68.8 104 83.2
b 2.1 3.9 5 4 10.50 8.4 19.5 15.6
c 0.8 1.6 11 8 8.80 6.4 17.6 12.8
d 3.2 4.8 8 6 25.60 19.2 38.4 28.8
Total 130.90 102.8 179.5 140.4
\[ \text{Laspeyre's Index} = \frac{\Sigma P_n q_0}{\Sigma P_0 q_0} \times 100 = \frac{179.5}{130.90} \times 100 = 137.13 \]
\[ \text{Paasche's Index} = \frac{\Sigma P_n q_n}{\Sigma P_0 q_n} \times 100 = \frac{140.4}{102.8} \times 100 = 136.58 \]
Commodity Base Price Current Price
Rice 140 180
Sugar 100 300
Oil 400 550
Wheat 125 150
Pulse 160 200
Solution:
Table: Calculations for Index Numbers
Commodity P0 Pn Price Relative = Pn/P0 × 100
Rice 140 180 180/140 × 100 = 128.6
Sugar 100 300 300/100 × 100 = 300.0
Oil 400 550 550/400 × 100 = 137.5
Wheat 125 150 150/125 × 100 = 120.0
Pulse 160 200 200/160 × 100 = 125.0
Total 811.10
\[ \text{Simple Arithmetic Mean of Price Relatives Index} = \frac{\Sigma (\text{Price Relatives})}{n} = \frac{\Sigma \left( \frac{P_n}{P_0} \times 100 \right)}{n} = \frac{811.10}{5} = 162 \]
[11] Find the index number from the following data by the method of Relatives (use A.M.):
Commodity Rice Wheat Fish Potato Coal Pulse
Base Price 30 22 54 20 15 4
Current Price 33 25 64 23 16 5
Solution:
Table: Calculations for Index Numbers
Commodity p0 pn Price Relative = pn/p0 × 100
Rice 30 33 33/30 × 100 = 110.00
Wheat 22 25 25/22 × 100 = 113.64
Fish 54 64 64/54 × 100 = 118.52
Potato 20 23 23/20 × 100 = 115.00
Coal 15 16 16/15 × 100 = 106.67
Pulse 4 5 5/4 × 100 = 125.00
Total = 688.83
\[ \text{Simple A.M. of Price Relatives Index} = \frac{\Sigma (\text{Price Relatives})}{n} = \frac{688.83}{6} = 115 \]
where n = number of items
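The relative method reduces to two lines of arithmetic once the price relatives are formed. The Python sketch below reproduces Sum [11]; the function names are illustrative assumptions, not notation from the text.

```python
def price_relatives(p0, pn):
    """Price relative of each item: current price as a percentage of base price."""
    return [current / base * 100 for base, current in zip(p0, pn)]


def simple_mean_of_relatives(p0, pn):
    """Simple arithmetic mean of price relatives."""
    relatives = price_relatives(p0, pn)
    return sum(relatives) / len(relatives)


# Data of Sum [11]: Rice, Wheat, Fish, Potato, Coal, Pulse
base_price = [30, 22, 54, 20, 15, 4]
current_price = [33, 25, 64, 23, 16, 5]

index = simple_mean_of_relatives(base_price, current_price)
print(round(index, 2))
```

Because every relative carries equal weight here, a cheap item such as Pulse influences the index as much as Fish; the weighted average of relatives used in the later Sums addresses this.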
[12] Calculate a suitable index number from the data given below:
Commodity Price Relative Weight
A 125 5
B 67 2
C 250 3
Solution:
Commodity Price Relative (I) Weight (W) IW
A 125 5 625
B 67 2 134
C 250 3 750
Total - 10 1509
\[ \text{Weighted A.M. of Price Relatives Index} = \frac{\Sigma IW}{\Sigma W} = \frac{1509}{10} = 150.9 \]
[13] Find by Arithmetic Mean method the index number from the following:
Solution:
Commodity Base Price (P0) Current Price (Pn) Weight (W) Price Relative (I) IW
Rice 30 52 8 52/30 × 100 = 173.33 1386.64
Wheat 25 30 6 30/25 × 100 = 120.00 720.00
Fish 130 150 3 150/130 × 100 = 115.38 346.14
Potato 35 49 5 49/35 × 100 = 140.00 700.00
Oil 70 105 7 105/70 × 100 = 150.00 1050.00
Total - - 29 - 4202.78
\[ \text{Weighted A.M. of Price Relatives Index} = \frac{\Sigma IW}{\Sigma W} = \frac{4202.78}{29} = 145 \]
[14] The price quotations of 4 different commodities for 1951 and 1965 are given
below. Calculate the index number for 1965 with 1951 as base, by using i). Simple
Average of price relatives, ii). The Weighted Average of price relatives.
Commodity Unit Weight (Rs. 1000) Price (1951) Price (1965)
A Seer 5 2.00 4.50
B Mound 7 2.50 3.20
C Dozen 6 3.00 4.50
D Seer 2 1.00 1.80
Solution:
Table: Calculations for Index Numbers
Commodity Base Price (P0) Current Price (Pn) Weight (W) Price Relative (I) = Pn/P0 × 100 IW
A 2.00 4.50 5 4.50/2.00 × 100 = 225 1125
B 2.50 3.20 7 3.20/2.50 × 100 = 128 896
C 3.00 4.50 6 4.50/3.00 × 100 = 150 900
D 1.00 1.80 2 1.80/1.00 × 100 = 180 360
Total 20 683 3281
\[ \text{Simple Average of Price Relatives} = \frac{683}{4} = 171 \]
\[ \text{Weighted A.M. of Price Relatives Index} = \frac{3281}{20} = 164 \]
[15] An index number of wholesale prices, based on the simple arithmetic mean
of price relatives comprises 40 items. They are divided into seven groups. A separate
index is published for each group. Find the index number for all the group combined
for 1968, from the following data:
Group A B C D E F G
No. of items 10 5 8 4 3 4 6
Group Index for 1968 120 95 115 142 86 100 105
Solution:
Table: Calculations for Index Number
Group Index (I) Weight (W) IW
A 120 10 1200
B 95 5 475
C 115 8 920
D 142 4 568
E 86 3 258
F 100 4 400
G 105 6 630
Total - 40 4451
\[ \text{Index Number of the groups} = \frac{\Sigma IW}{\Sigma W} = \frac{4451}{40} = 111 \]
[16] In 1976 the average price of a commodity was 20% more than in 1975, but 20%
less than in 1974; and moreover it was 50% more than in 1977. Reduce the data to price
relatives using 1975 as base (1975 Price Relative = 100).
Solution:
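Assume the 1976 price to be 100.
\[ \text{Then the 1975 price is } 100 \times \frac{100}{120}, \text{ the 1974 price is } 100 \times \frac{100}{80} \text{ and the 1977 price is } 100 \times \frac{100}{150}. \]
If the price relative for 1975 is taken as 100, each of these prices is multiplied by 120/100:
\[ \text{Price relative for 1974} = \frac{100 \times 100}{80} \times \frac{120}{100} = 150 \]
\[ \text{Price relative for 1976} = \frac{100 \times 120}{100} = 120 \]
\[ \text{Price relative for 1977} = \frac{100 \times 100}{150} \times \frac{120}{100} = 80 \]
∴ The price relatives for 1974, 1975, 1976 and 1977 are 150, 100, 120 and 80 respectively.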
[17] Using Paasche's formula, compute the quantity index and price index numbers
for 1970 with 1966 as base year.
Quantity Units Value (Rs.)
Commodity
1966 1970 1966 1970
A. 100 150 500 900
B. 80 100 320 500
C. 60 72 150 360
D. 30 33 360 297
Solution :
Table : Calculations for Quantity Index Number
Commodity q0 p0 qn pn q0p0 q0pn qnp0 qnpn
A. 100 5 150 6 500 600 750 900
B. 80 4 100 5 320 400 400 500
C. 60 2.5 72 5 150 300 180 360
D. 30 12 33 9 360 270 396 297
Total 1330 1570 1726 2057
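\[ \text{Paasche's Quantity Index} = \frac{\Sigma q_n p_n}{\Sigma q_0 p_n} \times 100 = \frac{2057}{1570} \times 100 = 131 \]
\[ \text{Paasche's Price Index} = \frac{\Sigma p_n q_n}{\Sigma p_0 q_n} \times 100 = \frac{2057}{1726} \times 100 = 119 \]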
[18] Using Fisher’s ‘Ideal’ formula, calculate the quantity index number from the
following data:
Commodity Base year Price (Rs.) Base year Quantity (Kg.) Current year Price (Rs.) Current year Quantity (Kg.)
A 5 50 10 56
B 3 100 4 120
C 4 60 6 60
D 11 30 14 24
E 7 40 10 36
Solution:
Table: Calculations for Quantity Index Number
Commodity q0 p0 qn pn q0 p0 q0 pn qn p0 qn pn
A 50 5 56 10 250 500 280 560
B 100 3 120 4 300 400 360 480
C 60 4 60 6 240 360 240 360
D 30 11 24 14 330 420 264 336
E 40 7 36 10 280 400 252 360
Total - - - - 1400 2080 1396 2096
\[ \text{Laspeyre's Quantity Index} = \frac{\Sigma q_n P_0}{\Sigma q_0 P_0} \times 100 = \frac{1396}{1400} \times 100 = 99.71 \]
\[ \text{Paasche's Quantity Index} = \frac{\Sigma q_n P_n}{\Sigma q_0 P_n} \times 100 = \frac{2096}{2080} \times 100 = 100.77 \]
\[ \text{Fisher's Ideal Index} = \sqrt{\text{Laspeyre's Index} \times \text{Paasche's Index}} = \sqrt{99.71 \times 100.77} = 100.2 \]
Solution:
Table: Calculations for Index Numbers
Commodity q0 qn W Quantity Relative (I) IW
A 160 200 13 200/160 × 100 = 125 1625
B 10 12 21 12/10 × 100 = 120 2520
C 80 100 35 100/80 × 100 = 125 4375
Total 69 370 8520
\[ \text{Arithmetic mean of quantity relatives} = \frac{370}{3} = 123.3 \]
\[ \text{Weighted arithmetic mean of quantity relatives} = \frac{8520}{69} = 123.5 \]
[20] In 1970 the price of a commodity increased by 50% over that in 1952 while the
production of the quantity decreased by 30%. By what percentage did the total rupee
value of the commodity in 1970 increase or decrease with respect to the 1952 value?
Solution:
Let, Base year Price (P0 ) = 100
Current year Price (Pn ) = 100 × 150% =150
Base year Quantity (qo) = 100
Current year Quantity (qn) = 100 × 70% = 70
\[ \therefore \text{Value Ratio} = \frac{150}{100} \times \frac{70}{100} = 1.50 \times 0.70 = 1.05 \]
∴ Total rupee value of the commodity increased by 5%.
[21] Using the following data, show that Laspeyres price index formula does not
satisfy the time reversal test:
Commodity Base Year Price Base Year Quantity Current Year Price Current Year Quantity
A 6 50 10 56
B 2 100 2 120
C 4 60 6 60
D 10 30 12 24
E 8 40 12 36
Solution:
Using Laspeyre’s Price Index formula and omitting the factor 100,
\[ \text{Index number for current year (n) with base year (0): } I_{0n} = \frac{\Sigma P_n q_0}{\Sigma P_0 q_0} \]
Interchanging the suffixes 0 and n,
\[ \text{Index number for base year (0) with current year (n) as base: } I_{n0} = \frac{\Sigma P_0 q_n}{\Sigma P_n q_n} \]
Men’s £7 36 £10 48
Women’s 5 50 9 80
Children’s 4 18 6 26
Solution:
The factor reversal test may be represented in symbols as
\[ P_{0n} \cdot Q_{0n} = \text{Value Ratio} = \frac{\Sigma p_n q_n}{\Sigma p_0 q_0} \]
Table: Calculations for Factor Reversal Test
Commodity P0 Pn q0 qn P0 q0 P0 qn Pn q0 Pn qn
Men’s 7 10 36 48 252 336 360 480
Women’s 5 9 50 80 250 400 450 720
Children’s 4 6 18 26 72 104 108 156
Total - - - - 574 840 918 1356
Using Laspeyre’s Price Index and omitting the factor 100,
\[ \text{Price Index } (P_{0n}) = \frac{\Sigma P_n q_0}{\Sigma P_0 q_0} = \frac{918}{574} \]
Interchanging P and q,
\[ \text{Quantity Index } (Q_{0n}) = \frac{\Sigma q_n P_0}{\Sigma q_0 P_0} = \frac{840}{574} \]
\[ \text{Value Ratio} = \frac{\Sigma P_n q_n}{\Sigma P_0 q_0} = \frac{1356}{574} = 2.36 \]
\[ P_{0n} \cdot Q_{0n} = \frac{918}{574} \times \frac{840}{574} = 2.34 \]
∴ Pon.Qon ≠ Value Ratio
This shows that Laspeyre’s Index formula does not satisfy the Factor Reversal Test.
[23] Prove using the following data that the factor reversal test and time reversal test
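\[ \text{Paasche's Index} = \frac{\Sigma P_n q_n}{\Sigma P_0 q_n} = \frac{480}{192} \]
\[ \text{Fisher's Index } (I_{0n}) = \sqrt{\text{Laspeyre's Index} \times \text{Paasche's Index}} = \sqrt{\frac{600}{240} \times \frac{480}{192}} \quad \text{------- (i)} \]
Interchanging the suffixes 0 and n in the above formula, Price Index Number for
year 1949 with base 1959:
\[ \text{Laspeyre's Index} = \frac{\Sigma P_0 q_n}{\Sigma P_n q_n} = \frac{192}{480} \]
\[ \text{Paasche's Index} = \frac{\Sigma P_0 q_0}{\Sigma P_n q_0} = \frac{240}{600} \]
\[ \text{Fisher's Index } (I_{n0}) = \sqrt{\frac{192}{480} \times \frac{240}{600}} \quad \text{------- (ii)} \]
Multiplying (i) and (ii), we find that Ion × Ino = 1. This verifies that Fisher’s formula
satisfies the Time Reversal Test.
The factor reversal test may be represented in symbols as Pon.Qon = Value Ratio.
Using Fisher’s Ideal Formula and omitting the factor 100,
\[ \text{Price Index } (P_{0n}) = \sqrt{\frac{\Sigma P_n q_0}{\Sigma P_0 q_0} \times \frac{\Sigma P_n q_n}{\Sigma P_0 q_n}} = \sqrt{\frac{600}{240} \times \frac{480}{192}} \]
Interchanging P and q,
\[ \text{Quantity Index } (Q_{0n}) = \sqrt{\frac{\Sigma q_n P_0}{\Sigma q_0 P_0} \times \frac{\Sigma q_n P_n}{\Sigma q_0 P_n}} = \sqrt{\frac{192}{240} \times \frac{480}{600}} \]
\[ \text{Value Ratio} = \frac{\Sigma P_n q_n}{\Sigma P_0 q_0} = \frac{480}{240} \]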
C 4 60 6 60
D 10 30 12 24
E 8 40 12 36
Solution:
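Table: Calculations for Time and Factor Reversal Test
Commodity P0 q0 Pn qn P0q0 P0qn Pnq0 Pnqn
A 2 50 10 56 100 112 500 560
B 2 100 2 120 200 240 200 240
C 4 60 6 60 240 240 360 360
D 10 30 12 24 300 240 360 288
E 8 40 12 36 320 288 480 432
Total - - - - 1160 1120 1900 1880
[I] Price Index Number for 1972 with base 1970, omitting the factor 100:
\[ \text{Laspeyre's Index} = \frac{\Sigma P_n q_0}{\Sigma P_0 q_0} = \frac{1900}{1160} \]
\[ \text{Paasche's Index} = \frac{\Sigma P_n q_n}{\Sigma P_0 q_n} = \frac{1880}{1120} \]
\[ \text{Fisher's Index } (I_{0n}) = \sqrt{\frac{1900}{1160} \times \frac{1880}{1120}} \quad \text{-------- (i)} \]
[II] Interchanging the suffixes 0 and n in the above formula, Price Index Number for
year 1970 with base 1972:
\[ \text{Laspeyre's Index} = \frac{\Sigma P_0 q_n}{\Sigma P_n q_n} = \frac{1120}{1880} \]
\[ \text{Paasche's Index} = \frac{\Sigma P_0 q_0}{\Sigma P_n q_0} = \frac{1160}{1900} \]
\[ \text{Fisher's Index } (I_{n0}) = \sqrt{\frac{1120}{1880} \times \frac{1160}{1900}} \quad \text{-------- (ii)} \]
Multiplying (i) and (ii), we find that Ion × Ino = 1. This verifies that Fisher’s formula
satisfies the Time Reversal Test.
Using Fisher’s Ideal Formula and omitting the factor 100,
\[ \text{Price Index } (P_{0n}) = \sqrt{\frac{\Sigma P_n q_0}{\Sigma P_0 q_0} \times \frac{\Sigma P_n q_n}{\Sigma P_0 q_n}} = \sqrt{\frac{1900}{1160} \times \frac{1880}{1120}} \]
Interchanging P and q,
\[ \text{Quantity Index } (Q_{0n}) = \sqrt{\frac{\Sigma q_n P_0}{\Sigma q_0 P_0} \times \frac{\Sigma q_n P_n}{\Sigma q_0 P_n}} = \sqrt{\frac{1120}{1160} \times \frac{1880}{1900}} \]
\[ \text{Value Ratio} = \frac{\Sigma P_n q_n}{\Sigma P_0 q_0} = \frac{1880}{1160} \]
\[ P_{0n} \cdot Q_{0n} = \sqrt{\frac{1900 \times 1880 \times 1120 \times 1880}{1160 \times 1120 \times 1160 \times 1900}} = \frac{1880}{1160} = \text{Value Ratio} \]
This shows that Fisher’s Ideal Index formula satisfies the Factor Reversal Test.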
[25] Calculate a simple price index number for the year 1934 from the following data and
verify numerically whether the formula employed satisfies the appropriate test or not.
Commodity A B C D E
Price (Rs.) in 1927 6 2 4 10 8
Price (Rs.) in 1934 10 2 6 12 12
Solution:
Commodity P0 Pn Price Relative = Pn/P0 × 100
A 6 10 10/6 × 100 = 166.67
B 2 2 2/2 × 100 = 100.00
C 4 6 6/4 × 100 = 150.00
D 10 12 12/10 × 100 = 120.00
E 8 12 12/8 × 100 = 150.00
Total 30 42 686.67
\[ \text{Simple A.M. of Price Relatives Index} = \frac{686.67}{5} = 137.33 \]
\[ \text{Simple Aggregative Index} = \frac{42}{30} \times 100 = 140 \]
Verifying the Time Reversal Test for the simple aggregative formula, i.e. Ion × Ino = 1:
\[ I_{0n} = \frac{\Sigma P_n}{\Sigma P_0} = \frac{42}{30} = \frac{7}{5} \]
\[ I_{n0} = \frac{\Sigma P_0}{\Sigma P_n} = \frac{30}{42} = \frac{5}{7} \]
\[ I_{0n} \times I_{n0} = \frac{7}{5} \times \frac{5}{7} = 1 \]
∴ The simple aggregative formula satisfies the Time Reversal Test.
[26] Find index numbers for the years 1961, 1962, 1963 by the chain base method,
with base-year 1960. From the following table:
Year 1960 1961 1962 1963
Link Index 100 110 95.5 109.5
Solution:
Table: Calculations for chain Base Index
Year Link Index Chain Index (Base 1960=100)
Y0 = 1960 100 100
Y1 = 1961 I01 = 110 I′01 = 100 ×1.10 = 110
Y2 = 1962 I12 = 95.5 I′02 = 100 × (1.10 × 0.955) = 105.05 = 105
Y3 = 1963 I23 = 109.5 I′03 = 100 × (1.10 × 0.955 × 1.095) = 115
Ans: Y1 = 110
Y2 = 105
Y3 = 115
[27] Compute chain index numbers with 1970 prices as base, from the following
table giving the average wholesale prices for the years 1970-74.
Average Wholesale Price (Rs.)
Commodity
1970 1971 1972 1973 1974
A 20 16 28 35 21
B 25 30 24 36 45
C 20 25 30 24 30
Solution:
Using the simple A.M. of price relatives, with the preceding year as base each time:
Commodity Link relative for 1971 Link relative for 1972
A 16/20 × 100 = 80 28/16 × 100 = 175
B 30/25 × 100 = 120 24/30 × 100 = 80
C 25/20 × 100 = 125 30/25 × 100 = 120
Total 325 375
Link Index for 1971 (I01) = 325 ÷ 3 = 108
Link Index for 1972 (I12) = 375 ÷ 3 = 125
Commodity Link relative for 1973 Link relative for 1974
A 35/28 × 100 = 125 21/35 × 100 = 60
B 36/24 × 100 = 150 45/36 × 100 = 125
C 24/30 × 100 = 80 30/24 × 100 = 125
Total 355 310
Link Index for 1973 (I23) = 355 ÷ 3 = 118.3
Link Index for 1974 (I34) = 310 ÷ 3 = 103.3
Table: Calculations for Chain Index
Year Link Index Chain Index (Base 1970=100)
Y0= 1970 100
Y1 = 1971 I01 = 108 I′01 = 100 × 1.08 = 108
Y2 = 1972 I12 = 125 I′02 = 100 × 1.08 × 1.25 = 135
Y3 = 1973 I23 = 118.3 I′03 = 100 × 1.08 × 1.25 × 1.183 = 160
Y4 = 1974 I34 = 103.3 I′04 = 100 × 1.08 × 1.25 × 1.183 × 1.033 = 166
Ans: 100, 108, 135, 160, 166 (Using A.M. of relatives)
[28] From the table of group index numbers and group expenditures given below
calculate the cost of living index number:
Group Index Number Percentage of Total Expenditure
Food 428 45
Clothing 250 15
Fuel & Light 220 8
House Rent 125 20
Others 175 12
Solution:
Table: Calculations for Cost of Living Index
Group Group Index (I) Weight (W) IW
Food 428 45 19260
Clothing 250 15 3750
Fuel & Light 220 8 1760
House Rent 125 20 2500
Other 175 12 2100
Total - 100 29370
\[ \text{Cost of Living Index} = \frac{\Sigma IW}{\Sigma W} = \frac{29370}{100} = 293.70 \]
[29] The following are the group index numbers and corresponding group weights of
an average working class family’s budget. Construct the cost of living index number.
Group Food Fuel & Lighting Clothing Rent Miscellaneous
Index No. 352 220 230 160 190
Weight 48 10 8 12 15
Solution:
Table: Calculations for Cost of Living Index Number
Group Index No. (I) Weight (W) WI
Food 352 48 16896
Fuel & Lighting 220 10 2200
Clothing 230 8 1840
Rent 160 12 1920
Miscellaneous 190 15 2850
Total 93 25706
\[ \text{Cost of Living Index} = \frac{\Sigma WI}{\Sigma W} = \frac{25706}{93} = 276.41 \]
[30] The following table gives group index numbers and corresponding group
weights with regard to cost of living for a given year. Construct the overall cost of
living index for the year.
i) Cost of Living Index will be changed in the same ratio.
ii) Cost of Living Index will be changed in the same ratio as group index numbers.
iii) Cost of Living index will be increased by 10.
[31] The percentage increases in price in 1971 over 1960 in the following groups for
middle class people in Calcutta and the percentages of total expenditure spent on those
groups are shown below. Calculate the cost of living index number for 1971 with 1960
as base.
Group Percentage Increase in price Percentage of total expenditure
Food 125 45
Clothing 66 6
Fuel & Lighting 112 5
House Rent & Tax 90 10
Miscellaneous 105 34
Solution:
Table: Calculations for Cost of Living Index
Group   Percentage increase in price (I) = (Pn / P0) × 100 − 100   Percentage of total expenditure (W)   IW
Food 125 45 5625
Clothing 66 6 396
Fuel & Lighting 112 5 560
Rent 90 10 900
Miscellaneous 105 34 3570
Total 100 11051
Price increase over 1960 = ∑IW / ∑W = 11051 / 100 = 110.51
Cost of Living Index = 100 + 110.51 = 210.51
[32] Determine the relative importance of the food group, given that the cost of
living index number for 1975 with 1970 as base is 175, from the following figures:
Group % increase in expenditure Weight
Food 65 -
Clothing 90 12
Fuel etc. 20 18
Miscellaneous 70 10
Rent etc. 150 20
Solution:
Let the weight of food group be x.
Table: Calculations for Cost of Living Index
Group Index (I) Weight (W) IW
Food 165 x 165x
Clothing 190 12 2280
Fuel etc. 120 18 2160
Miscellaneous 170 10 1700
Rent etc. 250 20 5000
Total - 60 + x 11140 + 165x
As per the given condition,
175 = (11140 + 165x) / (60 + x)
or, 10500 + 175x = 11140 + 165x
or, 10x = 640
∴ x = 64
∴ The relative importance of the food group is 64.
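Note: The same equation can be solved for the unknown weight in general form. If S is the sum of IW over the known groups, W the sum of the known weights, T the target index and I the index of the group with the unknown weight x, then T(W + x) = S + Ix, so x = (S − TW) / (T − I). A minimal Python sketch follows; the function name missing_weight is illustrative and not from the text.

```python
def missing_weight(target_index, unknown_group_index, known_indices, known_weights):
    """Solve target = (S + I*x) / (W + x) for the unknown weight x."""
    s = sum(i * w for i, w in zip(known_indices, known_weights))
    w = sum(known_weights)
    denominator = target_index - unknown_group_index
    if denominator == 0:
        raise ValueError("weight is indeterminate when target equals the group index")
    return (s - target_index * w) / denominator


# Problem [32]: Clothing, Fuel etc., Miscellaneous, Rent etc. (indices = 100 + % increase).
x = missing_weight(175, 165, [190, 120, 170, 250], [12, 18, 10, 20])
print(x)  # 64.0
```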
[33] The group indices and the corresponding weights for the working class cost of
living index numbers in an industrial city for the years 1976 and 1980 are given below:
Group   Weight   Group index (1976)   Group index (1980)
Food 71 370 380
Clothing 3 423 504
Solution:
Table: Calculations for cost of Living Index
Group Weight (W) Group index for 1976 (I1) Group index for 1980 (I2) I1 W I2 W
Food 71 370 380 26270 26980
Clothing 3 423 504 1269 1512
Fuel etc. 9 469 336 4221 3024
House Rent 7 110 116 770 812
Miscellaneous 10 279 283 2790 2830
Total 100 - - 35320 35158
Cost of living index for 1976 = 35320 / 100 = 353.20
Cost of living index for 1980 = 35158 / 100 = 351.58
Since the index number for 1980 is smaller, no extra allowance need be given.
[34] An enquiry into the budgets of the middle class families of a certain city revealed
that, on an average, the percentages expended on the different groups were: Food 45,
Rent 15, Clothing 12, Fuel and Light 8, and Miscellaneous 20. The group index numbers for
the current year as compared with a fixed base period were respectively 410, 150, 343,
248 and 285. Calculate the consumer price index number for the current year. Mr. X
was getting Rs. 240 in the base period and Rs. 430 in the current year. State how much
he ought to have received as extra allowance to maintain his former standard of living.
Solution:
Table: Calculations for Cost of Living Index
Group Percentage Expenses (W) Group Index (I) IW
Food 45 410 18450
Rent 15 150 2250
Clothing 12 343 4116
Fuel and Light 8 248 1984
Miscellaneous 20 285 5700
Total 100 - 32500
Cost of Living Index = 32500 / 100 = 325
When the base period index is 100, the current period index is 325.
When the base period index is 1, the current period index is 325 / 100.
So a base period salary of Rs. 240 corresponds to a current period salary of (325 / 100) × 240 = Rs. 780.
∴ He has to receive a sum of Rs. (780 − 430) = Rs. 350 to maintain his former standard of living.
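Note: The allowance calculation can be sketched in Python. The function name extra_allowance and its parameters are assumptions for illustration. The salary needed to keep the base-period standard of living is the base salary scaled by the ratio of the current index to the base index.

```python
def extra_allowance(base_pay, current_pay, current_index, base_index=100.0):
    """Extra amount needed so that current pay buys what base_pay bought."""
    required_pay = base_pay * current_index / base_index
    return required_pay - current_pay


# Problem [34]: Rs. 240 in the base period, Rs. 430 now, index 325.
print(extra_allowance(240, 430, 325))  # 350.0
```

A negative result would mean the current pay already exceeds what is needed, so no allowance is due.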
[35] During a certain period the cost of living index number goes up from 110 to 200
and the salary of a worker is also raised from Rs. 325 to Rs. 500. Does the worker really
gain and if so, by how much in real terms?
Solution:
Let the years be Y1 and Y2.
Year   Cost of living index (CLI)   Wages (Rs.)   Real wages = (Actual Wages / CLI) × 100
Y1     110                          325           (325 / 110) × 100 = 295
Y2     200                          500           (500 / 200) × 100 = 250
Hence the worker does not gain: the real wage in year Y2 has fallen by Rs. (295 − 250) = Rs. 45.
[36] The average weekly wages for all manufacturing industries for a number of months
in 1960 are Rs. 78.52, 79.71, 78.55, 78.17, 78.99; the corresponding consumer price
index numbers are 115, 116, 118, 117, 120. Find the real wages for the different months
and calculate the percentage change in the real wages during the period.
Solution:
Table: Calculations for Real Wages
Month in 1960   Weekly wages (Rs.)   Consumer Price Index (Base = 100)   Real Wage = (Actual Wages / CPI) × 100
1 78.52 115 68.28
2 79.71 116 68.72
3 78.55 118 66.57
4 78.17 117 66.81
5 78.99 120 65.82
Note: Real wage for month 1 = (78.52 / 115) × 100 = 68.28
Real wage for month 2 = (79.71 / 116) × 100 = 68.72, and so on.
Fall in real wage from month 1 to month 5 = Rs. (68.28 − 65.82) = Rs. 2.46
% change in real wage during the period = (2.46 / 68.28) × 100 = 3.6% (a decrease)
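Note: Deflating money wages by a price index can be sketched in Python as below. The function name real_wage is illustrative; the data are those of problem [36]. The percentage change is measured against the real wage of the first month.

```python
def real_wage(money_wage, price_index):
    """Real wage = (actual wage / price index) * 100."""
    return money_wage / price_index * 100.0


wages = [78.52, 79.71, 78.55, 78.17, 78.99]   # weekly wages (Rs.)
cpi = [115, 116, 118, 117, 120]               # consumer price index numbers

real = [real_wage(w, p) for w, p in zip(wages, cpi)]
# Percentage change from the first month to the last month.
pct_change = (real[-1] - real[0]) / real[0] * 100.0

print([round(r, 2) for r in real])
print(round(pct_change, 1))
```

The negative sign of pct_change shows that the real wage fell over the period.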
Solution:
Table: Calculations for Real Wages
230 200
1972   250   380   (380 / 250) × 100 = 152.00   (152 / 200) × 100 = 76
1973   250   400   (400 / 250) × 100 = 160.00   (160 / 200) × 100 = 80
Real Wage Index is: 100, 80, 88, 82, 78, 76 and 80 (base 1967=100)
[38] Given the following table, calculate the real wage rates and the purchasing power
of the rupee for the years 1947-1954, taking 1947 as the base year:
Year 1947 1948 1949 1950 1951 1952 1953 1954
Wage rate per day (Rs.) 1.19 1.33 1.44 1.57 1.75 1.84 1.89 1.94
Consumer price index 95.5 102.8 101.8 102.8 111.0 114.4 114.4 114.8
(1947-49=100)
Solution:
Table: Calculations for Real Wages & Purchasing Power of Rupee
Year (1)   Consumer Price Index Number, Base 1947 = 100 (2)   Actual Wage, Rs. per day (3)   Real Wages = (Col.3 / Col.2) × 100 (4)   Purchasing Power of Rupee for 1947-54 (5)
Real Wage Rates are: 1.25, 1.29, 1.41, 1.53, 1.58, 1.61, 1.65, 1.69
Purchasing Power of Rupee: 1.00, 0.93, 0.94, 0.93, 0.86, 0.83, 0.83, 0.83
[39] Given below are the average wages in rupees per hour of unskilled workers of
a factory during the years 1975-1980. Also shown is the consumer price index for these
years (taking 1975 as base year with Price Index 100). Determine the real wages of the
workers during 1975-1980 compared with their wages in 1975.
Year 1975 1976 1977 1978 1979 1980
Consumer Price Index 100 120.2 121.7 125.9 129.3 140
Average Wage (Rs./hour) 1.19 1.94 2.13 2.28 2.45 3.10
How much is one rupee of 1975 worth in subsequent years?
Solution:
Table: Calculations for Real Wages & Purchasing Power of Money
Year (1)   Consumer Price Index number (2)   Actual Wage, Rs./hour (3)   Real Wages = (Col.3 / Col.2) × 100 (4)   Rupees equivalent to one 1975 rupee (consumer price index ÷ 100) (5)
1975       100                               1.19                        1.19                                     1.00
1976       120.2                             1.94                        (1.94 / 120.2) × 100 = 1.61              1.20
1977       121.7                             2.13                        (2.13 / 121.7) × 100 = 1.75              1.22
1978       125.9                             2.28                        (2.28 / 125.9) × 100 = 1.81              1.26
1979       129.3                             2.45                        (2.45 / 129.3) × 100 = 1.89              1.29
1980       140                               3.10                        (3.10 / 140) × 100 = 2.21                1.40
Note: Purchasing power of money = 100 / Price Index Number (with reference to the base period).
Its reciprocal gives the amount needed in a subsequent year to buy what one rupee bought in the base year:
Index number for the subsequent year / Index number in the base year = 120.2 / 100 = 1.20 for 1976.
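Note: The two ratios in the note above are reciprocals of each other, which a short Python sketch makes explicit. The function names purchasing_power and equivalent_amount are illustrative only. The first gives what one current rupee buys in base-year terms; the second gives how many current rupees match one base-year rupee.

```python
def purchasing_power(price_index, base_index=100.0):
    """Value of one current rupee measured in base-year rupees."""
    return base_index / price_index


def equivalent_amount(price_index, base_index=100.0):
    """Current rupees needed to buy what one base-year rupee bought."""
    return price_index / base_index


cpi_1976 = 120.2
print(round(purchasing_power(cpi_1976), 2))   # 0.83
print(round(equivalent_amount(cpi_1976), 2))  # 1.2
```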
Solution:
Table: Base shifting from 1969 to 1975
Year   Index Number (Base 1969 = 100)   Index Number (Base 1975 = 100)
1969   100                              (100 / 300) × 100 = 33
1970   120                              (120 / 300) × 100 = 40
1971   180                              (180 / 300) × 100 = 60
1972   207                              (207 / 300) × 100 = 69
1973   243                              (243 / 300) × 100 = 81
1974   270                              (270 / 300) × 100 = 90
1975   300                              (300 / 300) × 100 = 100
1976   360                              (360 / 300) × 100 = 120
1977   400                              (400 / 300) × 100 = 133
1978   420                              (420 / 300) × 100 = 140
Base shifted to 1975 = 100
Index Numbers are: 33, 40, 60, 69, 81, 90, 100, 120, 133, 140.
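Note: Base shifting divides every index by the index of the new base year and multiplies by 100. A minimal Python sketch, with shift_base as an illustrative function name and the series taken from the table above:

```python
def shift_base(series, new_base_year):
    """Re-express an index series so that new_base_year equals 100."""
    base_value = series[new_base_year]
    return {year: value / base_value * 100.0 for year, value in series.items()}


# Index numbers with base 1969 = 100.
old = {1969: 100, 1970: 120, 1971: 180, 1972: 207, 1973: 243,
       1974: 270, 1975: 300, 1976: 360, 1977: 400, 1978: 420}

new = shift_base(old, 1975)
print({year: round(value) for year, value in new.items()})
```

The ratios between any two years are unchanged by the shift; only the reference level moves.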
[41] The following table shows the index number of wholesale prices in India
(Revised Series) with base 1970-71 = 100:-
Year 1971 1973 1974 1975 1976 1977
Index Number 105 132 169 176 172 185
Find the index numbers for these years with base 1973 = 100.
Solution:
Table: Index Numbers with base shifted to 1973
Year   Index Number (1970-71 = 100)   Index Number (1973 = 100)
1971   105                            (105 / 132) × 100 = 80
1973   132                            100
1974   169                            (169 / 132) × 100 = 128
[42] An index number is at 100 in 1971. It rises 10% in 1972, falls 4% in 1973, falls
2% in 1974, and again rises 10% in 1975, over the preceding year. Calculate the index
number for the 5 years with 1974 as base.
Solution:
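Year   Index Number (1971 = 100)     Index Number (1974 = 100)
1971   100                           (100 / 103.5) × 100 = 96.6
1972   (110 / 100) × 100 = 110       (110 / 103.5) × 100 = 106.3
1973   (96 / 100) × 110 = 105.6      (105.6 / 103.5) × 100 = 102.0
1974   (98 / 100) × 105.6 = 103.5    (103.5 / 103.5) × 100 = 100
1975   (110 / 100) × 103.5 = 113.9   (113.9 / 103.5) × 100 = 110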
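Note: The two steps of this solution (compounding the yearly percentage changes, then shifting the base to 1974) can be sketched in Python. The function name index_from_changes is illustrative. Intermediate values are kept unrounded here, so a hand calculation that rounds at each step may differ slightly in the last place.

```python
def index_from_changes(start, pct_changes):
    """Build an index series from successive year-on-year percentage changes."""
    series = [start]
    for pct in pct_changes:
        series.append(series[-1] * (1 + pct / 100.0))
    return series


years = [1971, 1972, 1973, 1974, 1975]
# Rises 10%, falls 4%, falls 2%, rises 10%, each over the preceding year.
levels = index_from_changes(100.0, [10, -4, -2, 10])

base = levels[years.index(1974)]
rebased = [round(v / base * 100.0, 1) for v in levels]
print(dict(zip(years, rebased)))
```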
[43] Given below are two series of index numbers, one with 1961 as base and the
other with 1970 as base:
The index number series (a) was discontinued in 1971. Splice the series (a) to
the series (b) with 1970 as base.
Solution:
Table: Splicing Two Series of Index Numbers
Year (1)   ‘a’ series index, 1961 = 100 (2)   ‘b’ series index, 1970 = 100 (3)   New continuous series index, 1970 = 100 (4)
1965       180                                -                                  (180 / 250) × 100 = 72
1966       192                                -                                  (192 / 250) × 100 = 76.8
1967       208                                -                                  (208 / 250) × 100 = 83.2
1968       220                                -                                  (220 / 250) × 100 = 88
1969       232                                -                                  (232 / 250) × 100 = 92.8
1970       250                                100                                100
1971       -                                  108                                108
1972       -                                  112                                112
1973       -                                  125                                125
1974       -                                  130                                130
1975       -                                  150                                150
Ans: 72, 76.8, 83.2, 88, 92.8, 100, 108, 112, 125, 130, 150
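Note: Splicing an old series to a new one rescales the old figures by the ratio of the two series in the overlapping year. A minimal Python sketch of this backward splicing follows; the function name splice_backward is an assumption for illustration, and the data are those of the table above.

```python
def splice_backward(old_series, new_series, overlap_year):
    """Express old_series on the base of new_series using the overlap year."""
    factor = new_series[overlap_year] / old_series[overlap_year]
    # Rescale only the years before the overlap; later years come from new_series.
    spliced = {year: value * factor
               for year, value in old_series.items() if year < overlap_year}
    spliced.update(new_series)
    return dict(sorted(spliced.items()))


series_a = {1965: 180, 1966: 192, 1967: 208, 1968: 220, 1969: 232, 1970: 250}
series_b = {1970: 100, 1971: 108, 1972: 112, 1973: 125, 1974: 130, 1975: 150}

continuous = splice_backward(series_a, series_b, 1970)
print({year: round(value, 1) for year, value in continuous.items()})
```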
[44] In 1950, a statistical Bureau started constructing an index number series with 1950 as base.
Year 1950 (Base) 1956 1960
Index 100 140 200
In 1961, the Bureau reconstructed the index number series on a plan with base 1960.
Year 1960 (Base) 1965 1970
Index 100 150 210
In 1971 the Bureau again reconstructed the series on yet another plan with base year 1970.
Year 1970 (Base) 1975 1981
Index 100 180 240
Obtain a Continuous series with base 1970, by splicing the three series.
Solution:
Table: Splicing of Three Series of Index Numbers
Year (1)   A series index, 1950 = 100 (2)   B series index, 1960 = 100 (3)   C series index, 1970 = 100 (4)   New continuous series, 1960 = 100 (5)   New continuous series, 1970 = 100 (6)
1950       100                              -                                -                                (100 / 200) × 100 = 50                  (50 / 210) × 100 = 23.8
1956       140                              -                                -                                (140 / 200) × 100 = 70                  (70 / 210) × 100 = 33.3
1960       200                              100                              -                                (200 / 200) × 100 = 100                 (100 / 210) × 100 = 47.6
1965       -                                150                              -                                -                                       (150 / 210) × 100 = 71.4
Year (1)   Cost of Living Index, 1965 = 100 (2)   Actual Per Capita Income, Rs. (3)   Real Income, Rs. (4)
1965       100                                    65                                  65.00
1966       110                                    70                                  (70 / 110) × 100 = 63.64
1967       120                                    75                                  (75 / 120) × 100 = 62.50
1968       130                                    80                                  (80 / 130) × 100 = 61.54
1969       150                                    90                                  (90 / 150) × 100 = 60.00
1970       200                                    100                                 (100 / 200) × 100 = 50.00
1971       250                                    110                                 (110 / 250) × 100 = 44.00
1972       350                                    130                                 (130 / 350) × 100 = 37.14
Comments: It is observed from column (3) that although actual income has
gradually increased from Rs. 65 in 1965 to double that, i.e. Rs. 130, in 1972, the “real income”
has gone down considerably, from Rs. 65.00 to Rs. 37.14. This indicates that people of the
particular category have been hard hit by the substantial rise in the cost of living index.