Statistics For Economics For Class 11 N M Shah
Periodic updating and review accommodate new knowledge and add freshness while allowing
for continuity. This third revised edition is an effort to that end.
Statistics for Economics for Class XI, in its revised format, reflects the changes introduced by the Central
Board of Secondary Education, New Delhi, in 2005-06. The revision includes sufficient exercises, keeping
in mind the learning tools of Statistics in the context of the study of Economics.
This volume incorporates extensive and colourful diagrams and illustrations to give a better and
friendlier understanding of the concepts of Statistics. Answers to the numerical questions in the exercises of
Unit 3 are also provided so that students can verify their solutions.
It is hoped that this revised edition will be of great help to both teachers and students.
— N.M. SHAH
PREFACE
Statistics has become an anchor for social, economic and scientific studies. Statistical methods are widely
used in several disciplines, be it planning, business management, psephology (study of voting patterns),
psychology or advertising ...
... steps. A list of formulae has been provided at the end of each chapter of Unit 3 ...
... long years of teaching the subject ...
I have to acknowledge that in the writing of this volume I have got immense help from my friends and
relations. The publishers have been very cooperative ...
... me and looking after the household, has done indispensable work for the volume ...
M.M. Shah, himself a scholar and teacher of economics, and who retired as Dean of the Faculty of Commerce,
Nagpur University and Principal, G.S. College of ...
—N.M. SHAH
SYLLABUS
One Paper
3 Hrs.
100 marks
104 Periods/50 Marks 5 Periods/3 Marks 25 Periods/12 Marks 64 Periods/30 Marks 10 Periods/5 Marks
PART-A
1. Introduction
Unit 1: Introduction
What is Economics?
... of the examples of the projects are as follows (they are not mandatory but suggestive) : (i) A report on
demographic ... amongst households;
CONTENTS
UNIT 1 : Introduction
1. What is Economics
4. Organisation of Data
Presentation of Data
5. Tabular Presentation
6. Diagrammatic Presentation
7. Graphic Presentation
WHAT IS STATISTICS?
S:T:A:T:I:S:T:I:C:S:
Scientific Methodology
Theory of Figures
Aggregate of Facts
Investigation
Systematic Collection
Interpretation
Comparison
Systematic Presentation
UNIT 1 : INTRODUCTION
Chapter 1
What is Economics?
1. Introduction
2. Activity
3. Definition of Economics
4. Nature of Economics
INTRODUCTION
If each of us possessed Aladdin's magic lamp, which we had merely to rub in order to get our desires
fulfilled immediately, there would be no economic problem and no need for a science of economics. In
real life we are not as lucky as Aladdin; we have to work to earn our livelihood. All people in this world
work to satisfy their unlimited wants and desires. Everyone requires food to eat, clothes to wear and
a house to live in. Besides these, in daily life they need a television, mobile phone, motorbike, car, etc., to
lead a comfortable life. A person visits the market and enquires about the varieties and prices of the
item which he wants to purchase. Thinking about his resources and alternative choices, he uses his sense of
economy and decides to buy that item. This is economics.
So,
A service holder is a person who is in a job to earn either wages or a salary to buy goods.
A service provider is a person who provides services to society to earn money, e.g., doctors, scooter
drivers, lawyers, bankers, transporters, etc.
All the above persons are busy in different activities to earn a living; such activity is called economic
activity in the ordinary business of life. In their lives they face the problem of scarcity of income.
Thus,
... knowledge with economic activities relating to earning and spending the wealth and
income. Economics is the study of how human beings ...
... unlimited wants in such a ...
... maximise their satisfaction, producers can maximise their profits and society can maximise its
social welfare'.
... into the Nature and Causes of Wealth of Nations", in the year 1776. At its birth the
name of economics was 'Political Economy'. Some of the suggested names ...
... English has its origin in two Greek words : Oikos
(household) and nomos (to manage). Thus, the word economics was used to mean home management
with limited funds available in the most economical manner possible.
ACTIVITY
1. Non-economic Activities
2. Economic Activities
- Political activities, such as the various activities performed by different political parties, namely the
Bharatiya Janata Party (BJP), Congress Party, etc.
... or helping ...
... living. Everyone is concerned with one or the other type of activity to earn money or wealth to meet
their wants. An economic activity means that activity which is based on or related to the use of scarce
resources for the satisfaction of human wants. Economic activities are classified as under :
[Figure: ECONOMIC ACTIVITIES (Production, Consumption, Investment, Exchange, Distribution)]
Production : Production is that economic activity which is concerned with increasing the utility or value
of goods and services. Manufacturing a shirt with the help of cloth (raw material) and tailoring (labour)
is an act of production. Transporting sand from a river bank to a town, where it is needed, is also an
act of production. Here utility is created by transporting goods to the person who needs them.
Consumption : Consumption is that economic activity which is concerned with the use of goods and
services for the direct satisfaction of individual and collective wants. Consumption activity is the base of
all production activities. There would be no production if there were no
consumption. For example, eating bread, drinking water or milk, wearing a shirt and using the services of a lawyer or
doctor are consumption activities.
Investment : Investment is that economic activity which is concerned with production of capital goods
for further production of goods and services. Investment indirectly satisfies human wants. For example,
the production of printing press machines to print newspapers, books, magazines etc. or investment in
computers to provide Internet, banking and related services.
Exchange : Exchange is that economic activity which is concerned with the sale and purchase of
commodities. This buying and selling is mostly done in terms of money or price. So, it is also called
'Product Pricing', which relates to the determination of the price of the product under different conditions
of the market, viz., perfect competition, imperfect competition, monopoly, etc.
Distribution : Distribution is that economic activity which deals with determination of price of factors of
production (land, labour, capital and enterprise). This is known as the 'Factor Pricing', e.g., price of land
is rent, that of labour is wage, that of capital is interest and price of entrepreneur is profit. Distribution is
the study to know how the national income or total income arising from what has been produced in the
country (called Gross Domestic Product or GDP) is distributed through salaries, wages, profits and
interest.
"Economics is that branch of knowledge that studies consumption, production, exchange and
distribution of wealth".
—Chapman
DEFINITION OF ECONOMICS
Economics has been defined by many economists in different ways. The definitions can be grouped
into the following categories :
1. Wealth Definition
(i) Adam Smith, the father of modern economics, in his book 'An Inquiry into the Nature and Causes of
Wealth of Nations' in 1776 defined that ...
(ii) According to J.B. Say, Economics is "the science which deals with wealth".
Criticism : This definition is not a precise definition. It gives importance to wealth rather than to the promotion
of human and social welfare.
The wealth definition of economics was discarded towards the end of the 19th century.
2. Material Welfare Definition
Criticism :
The basic difference between Adam Smith's and Marshall's definitions is that Adam ...
3. Scarcity Definition
... fact of life. When one want gets satisfied, another want crops up.
... coal is used in factories, in running railways, in thermal stations for electricity generation and by
households, etc.
In short, according to Robbins, Economics is a science of choice. It deals with how ...
Criticism : The scarcity definition of economics has been criticised on the following grounds : (i) The definition is
impractical and difficult. It is narrow and restricted in scope. It ...
The definition combines the essential elements of the definitions by Marshall and Robbins. Accordingly,
economics is concerned with the efficient allocation and use of scarce means as a result of which
economic growth is increased and social welfare is promoted. The definition has been accepted
universally. In short, the growth definition of economics is the most comprehensive of all the earlier
definitions.
NATURE OF ECONOMICS
A. ECONOMICS AS A SCIENCE
Science can be divided into : (a) Natural science, and (b) Social science. Sciences like Physics, Biology and
Chemistry are natural or physical sciences, where experiments can be conducted in the laboratory under
controlled conditions. Relationships between cause and effect can be established on the basis of facts.
Observations can be made and used to prove or disprove theories. The results apply universally.
Economics is a social science because it is a systematic study of the economic activities of human beings.
Economics is a science as it is a branch of knowledge where various facts have been systematically
collected, classified and analysed.
... have their own laws and theories. Economics as a social science, which is a systematic ...
... maximum profit to producers and maximum social welfare to the society as a whole.
(ii) Scientific Laws : Economics is a science because its laws are universally true. Different laws in
economics, namely, the law of demand, law of supply, law of diminishing marginal utility, law of returns,
Gresham's law, etc., are applicable to all types of economies, whether capitalistic, socialistic or mixed.
(iii) Cause and Effect Relationship : Economic laws establish a cause and effect relationship like the laws in
other sciences. For example, the law of demand shows the relationship between change in price and
change in demand. It shows that an increase in the price of a commodity (the cause) will decrease its demand
(the effect), establishing the negative or inverse relationship between price and quantity demanded. The
law of supply shows that an increase in the price of a commodity (the cause) will increase its supply (the effect),
establishing the positive relationship between price and quantity supplied.
(iv) Verification of Laws : Like the laws of other sciences, economic laws are also open to verification. These
economic laws can be verified through empirical investigation.
On the basis of the arguments given above, we can say economics is a science: not exactly a natural or
physical science but a social science that studies economic problems and policies in a scientific manner.
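The cause and effect relationships described under the law of demand and the law of supply can be sketched with a small numerical example. The linear schedules and the price list below are hypothetical, chosen only to illustrate the direction of each relationship; they are not taken from the text.

```python
# Hypothetical linear schedules, assumed only for illustration.
def quantity_demanded(price):
    """Demand falls as price rises (inverse relationship)."""
    return max(0, 100 - 5 * price)


def quantity_supplied(price):
    """Supply rises as price rises (positive relationship)."""
    return 10 + 4 * price


prices = [2, 4, 6, 8]
demand = [quantity_demanded(p) for p in prices]
supply = [quantity_supplied(p) for p in prices]

# Print a small schedule so the two relationships can be compared row by row.
for p, d, s in zip(prices, demand, supply):
    print(f"price={p}  quantity demanded={d}  quantity supplied={s}")
```

Reading down the schedule, each rise in price (the cause) is accompanied by a lower quantity demanded and a higher quantity supplied (the effects).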
A positive science is one which gives a real description of an activity. It only answers: what is? what
was? It has nothing to suggest about the facts. Positive economics deals with what is, or how the economic
problems facing a society are actually solved. Prof. Robbins held that economics was purely a positive
science. According to him, economics should be neutral or silent between ends; i.e., there should be no
desire to learn about the ethics of economic decisions. Thus, in positive economics we study human
decisions as facts which can be verified with actual data.
Some examples of Economics as a positive science are :
(i) India is the second most populous country of the world.
(ii) Prices have been rising in India.
(iii) Increase in real per capita income increases the standard of living of people.
(iv) The targeted growth rate of the tenth five-year plan is 8 per cent per annum.
(v) Fall in the price of a commodity leads to a rise in its quantity demanded.
(vii) The share of the primary sector in the national income of India has been declining.
(viii) Ordinary business of life is affected enormously by tsunamis, earthquakes, bird flu, droughts,
etc.
(b) Economics as a Normative Science
A normative science is that science which refers to what ought to be? what ought to have happened?
Normative economics deals with what ought to be or how the economic problems should be solved.
Alfred Marshall and Pigou have considered the normative aspect of economics, as it prescribes that
course of action which is desirable and necessary to achieve social goals. It makes an assessment of an
activity and offers suggestions for it. The statements which make an assessment of an activity and offer
suggestions are called
normative statements. The normative statements, in fact, are the opinions of different ...
... statements cannot be empirically verified. That part of economics which deals with
normative statements is called Normative Economics. Thus, economics is both a positive and a normative
science.
(i) Minimum wages should be guaranteed by the government in all economic activities.
(ii) India should not take loans from foreign countries.
(iii) Rich people should be taxed more.
(iv) Free education should be given to the poor.
(v) Effective steps should be taken to reduce income inequalities in India.
(vi) India should spend more money on defence.
(vii) Government should stop the minimum support price to the farmers.
(viii) Our education system should produce sufficient qualified and trained persons for the economy.
Economics as a positive science and as a normative science is inseparable. In reality economics has developed
along both positive and normative lines. The role of the economist is not only to explain and explore, as the
positive aspect, but also to admire and condemn, as the normative aspect, which is essential for healthy and
rapid growth of the economy.
In the following examples the first part of the statement is positive, giving facts, and the second part is normative,
based on value judgements.
(i) The Indian economy is a developing economy; the government should bring about development through
correct and proper planning.
(ii) A rise in the price of a commodity leads to a fall in the quantity demanded of the commodity; therefore the
government should check the rise in prices.
(iii) The Rent Control Act provides accommodation to needy people; therefore, the act should be
honestly implemented.
B. ECONOMICS AS AN ART
Art is the practical application of knowledge for achieving some definite aim. It helps in the solution of practical
problems. Art is the practical application of scientific principles. Science lays down principles while art
puts these principles into practice. Economics is an art as it gives us practical guidance in the solution of
various economic problems.
We all know that there is an oil shortage in India. This information given by economics is positive science. We
also know the government aims at removing the oil shortage. This information supplied by economics is
normative science. In order to achieve the objective of full availability of oil in India, the government has
followed the path of oil planning. The path of planning is an art as it implies practical application of
knowledge with a view to achieving some specific objectives. So, we can say that economics is an art.
Economics is, thus, a science as well as an art.
EXERCISES
Make a list of economic activities that constitute the ordinary business of life.
Which is the most accepted definition of economics? Give the definition.
Explain the welfare definition of economics.
"Economics is about making choices in the presence of scarcity." Explain.
How do scarcity and choice go together?
What is meant by economics?
Is economics a science? Give reasons.
Discuss the nature of economics as a science.
Give arguments in favour of economics as a science.
Chapter 2
INTRODUCTION
... thinking and reasoning ... is gifted with ...
... and scientific manner. ... The empirical methodology consists of ...
... Whether a common man knows it or not, he uses this method to ... which shop. A shopkeeper
observes from his daily experience ... A manufacturer also observes
the pattern of demand and manufactures ...
demand and supply, he collects data (information) systematically, gets it organised in some logical or
systematic way, analyses this data according to certain principles and draws conclusions. He has to do it
carefully since a wrong judgement can completely ruin him.
Quantitative Data and Qualitative Data : An empirical investigation is an investigation where facts are
collected through observation. In Physics, Chemistry and Botany, only those things that can be observed
by our senses—seeing, hearing, touching, tasting and smelling—are taken to be reliable and then
recorded (noted).
We all agree that the rose is beautiful. How do we reach that conclusion? We all like its colour, shape
and above all its smell. In this respect, it is not a subjective or personal conclusion. But if I say that I like
the rose most of all the flowers, this would be a subjective statement. A scientist, however, makes a very
precise statement: he would say that roses have a sweet smell. Similarly, people might say that theft
and robbery have increased these days. This might be a conclusion based on impression people get from
the newspaper reports of cases of theft and robbery. This impression may or may not be true. We can
find out whether it is true or not only by comparing the number of cases of theft and robbery reported
during one year with the number of cases reported in other years. An investigator would collect such
information from police records.
When information or observations are recorded in numbers or quantity, we say we have quantified
information. For example, the number of people in a state who are strict vegetarians, heights or weights
of students, everyday temperature, income of individuals, prices of wheat during this week, the number of
people in a country who are poor, middle class or rich, the number of illiterate people who will not get jobs,
the number of highly educated people who will have the best job opportunities, etc. are known as 'Quantitative data'.
However, not all information can be numerically expressed. It is not possible in certain cases to measure
or quantify information, e.g., preference of people viewing TV channels, intelligence of students,
appreciation of art, beauty, music etc. Supposing a selection for a post is to be made, candidates are
interviewed, some questions are put to them and their qualifications are taken into consideration. The
interview board discusses the comparative merit of the candidates and ranks them for final selection.
This judgement is not quantifiable, it is based on impression.
Social sciences, such as economics, sociology, management etc., do not always deal with what we call
inherently measurable or quantifiable facts.
WHAT IS STATISTICS?
It is necessary to have quantitative measurements even for things which are not basically quantifiable.
This is necessary for preciseness of statement. The systematic
treatment of quantitative expression is known as 'Statistics'. Not all quantitative expressions are
statistics; we will see that certain conditions must be fulfilled for a quantitative statement to be called
statistics. We will also consider later the functions and limitations of statistics. First, let us understand
what comes under the name Statistics. Statistics can be defined in two ways :
Are 1600, 400, 80, 20, 700, 300, 70 and 30 statistics? Figures are innocent and do not
say anything. But when they refer to some place, person, time, etc., they are called statistics. Let us
look at the table given below :
... Govt Senior Secondary School. Students are grouped as boys and girls and the percentage is
calculated for each group. Now, in this context the figures 1600, 400, 700, etc. have a ...
... statistics of scores in a cricket match, statistics of price, statistics of agricultural production, statistics
of export and import, etc.
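The eight figures quoted earlier (1600, 400, 80, 20, 700, 300, 70 and 30) are consistent with counts of boys and girls in two groups together with their percentage shares. A minimal sketch of that calculation follows; the group labels "Group A" and "Group B" are assumptions, since the original table is not available here.

```python
# Counts quoted in the text; the group labels are assumed for illustration.
counts = {
    "Group A": {"boys": 1600, "girls": 400},
    "Group B": {"boys": 700, "girls": 300},
}


def percentage_shares(group):
    """Return each category's share of the group total, in per cent."""
    total = sum(group.values())
    return {category: 100 * n / total for category, n in group.items()}


shares = {name: percentage_shares(group) for name, group in counts.items()}

for name, group_shares in shares.items():
    print(name, group_shares)
```

Taken alone, a figure such as 80 says nothing; attached to a group and a category it becomes a statement of fact, which is the point the passage makes.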
The above definition covers the following main points about statistics as numerical presentation of facts
(Plural sense).
(a) Statistics are aggregates of facts : A single observation is not statistics; statistics is a group of observations, e.g.,
"pocket expenses of Anil during a month is Rs 50" is not statistics. But "pocket expenses of Anil, Prakash,
Sunil and Suresh during a month are Rs 50, 55, 80 and 70 respectively" are statistics.
(b) Statistics are affected to a marked extent by multiplicity of causes : Statistics are generally not
isolated facts; they are dependent on, or influenced by, a number of phenomena, e.g., electricity bills are
affected by consumption and the rate of electricity.
(c) Statistics are numerically expressed : Qualitative
statements are not statistics unless they are supported by numbers. For example, if we say that the
students of a class
are very good in studies, it is not a statistical statement. But when a statement reads as 40 students got
first division, 30 second division, 20 third division and 10 failed out of 100 students, it is a statistical
statement expressed numerically.
(e) Statistics are collected in a systematic manner : Statistics collected without any order and
system are unreliable and inaccurate. They must be collected in a systematic manner.
(f) Statistics are collected for a pre-determined purpose : Unless statistics are collected for a specific
purpose they would be more or less useless. For example, if we want to collect statistics of agricultural
production, we must decide beforehand the regions, commodities and periods for which they are
required.
(g) Statistics are placed in relation to each other : Statistical data are often required for comparisons.
Therefore, they should be comparable periodwise, regionwise, commoditywise, etc.
When the above characteristics are not present, numerical data cannot be called statistics. Thus, "all
statistics are numerical statements of facts but all numerical statements of facts are not statistics."
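Two of the examples in the list above can be worked through numerically. The sketch below uses only the figures given in the text (the pocket expenses of four students and the results of 100 students); the variable names are illustrative.

```python
from statistics import mean

# Pocket expenses (Rs per month) of the four students named in the text.
pocket_expenses = {"Anil": 50, "Prakash": 55, "Sunil": 80, "Suresh": 70}

# An aggregate of facts supports a summary that a single figure cannot.
average_expense = mean(list(pocket_expenses.values()))

# Examination results of 100 students, as quoted in the text.
results = {
    "first division": 40,
    "second division": 30,
    "third division": 20,
    "failed": 10,
}
total_students = sum(results.values())
result_shares = {k: 100 * v / total_students for k, v in results.items()}

print(f"Average pocket expense: Rs {average_expense}")
print(f"Total students: {total_students}")
print(result_shares)
```

The single figure for Anil supports no such summary, whereas the group of four observations does, which is why only the latter counts as statistics in the plural sense.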
Statistics defined in singular sense (as a statistical method) : Statistics in its second, singular sense, refers
to the methods adopted for scientific empirical studies. Whenever a large amount of numerical data are
collected, there arises a need to organise, present, analyse and interpret them. Statistical methods deal
with these stages :
[Figure: Statistics as Methodology (Collection, Organisation, Presentation, Analysis, Interpretation)]
According to Croxton and Cowden, "Statistics may be defined as a science of collection, presentation,
analysis and interpretation of numerical data."
The above definition covers the following statistical tools :
(a) Collection of data : This is the first step in a statistical study and is the foundation of statistical
analysis. Therefore, data should be gathered with maximum care by the investigator himself or obtained
from reliable published or unpublished sources.
(b) Organisation of data : Figures that are collected by an investigator need to be organised by
editing, classifying and tabulating.
(c) Presentation of data : Data collected and organised are presented in some systematic manner to
make statistical analysis easier. The organised data can be presented with the help of tables, graphs,
diagrams, etc.
(d) Analysis of data : The next stage is the analysis of the presented data. There are
a large number of methods used for analysing the data, such as averages, dispersion, correlation, etc.
(e) Interpretation of data : Interpretation of data implies the drawing of conclusions
on the basis of the data analysed in the earlier stage. On the basis of this conclusion certain decisions
can be taken.
According to the figure, interpretation of data is the last stage, in which some conclusion is drawn. One
has to go through four stages to arrive at the final stage: collection, organisation,
presentation and analysis. The first stage, collection of data, refers to gathering statistical facts by
different methods. The second stage is to organise the data so that the collected information is easily
intelligible. This is the arrangement of data in a systematic order after editing. The third stage of statistical
study is the presentation
of data. After collection and organisation the data are to be reproduced by various ...
... characteristics of data can easily be understood on the basis of their quality and uniformity.
The fourth stage of statistical study is the analysis of data. Calculation of a value by different
methods and tools for various purposes is made to arrive at the last stage of study, viz., interpretation of
data.
In brief statistics is a method of taking decisions on the basis of numerical data properly collected,
organised, presented, analysed and interpreted.
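The five stages can be traced on a very small data set. In the sketch below the marks and the division cut-offs are hypothetical, chosen only to make each stage visible; they are not taken from the text.

```python
from collections import Counter
from statistics import mean

# 1. Collection: hypothetical marks (out of 100) of ten students.
raw_marks = [72, 45, 38, 64, 55, 81, 29, 47, 66, 58]


# 2. Organisation: classify each mark into a division (cut-offs are assumed).
def division(mark):
    if mark >= 60:
        return "first"
    if mark >= 45:
        return "second"
    if mark >= 33:
        return "third"
    return "failed"


frequency = Counter(division(m) for m in raw_marks)

# 3. Presentation: a simple frequency table.
for name in ("first", "second", "third", "failed"):
    print(f"{name:<8}{frequency[name]:>3}")

# 4. Analysis: one summary measure, the average mark.
average_mark = mean(raw_marks)

# 5. Interpretation: a conclusion drawn from the analysed data.
pass_rate = 100 * (len(raw_marks) - frequency["failed"]) / len(raw_marks)
print(f"Average mark {average_mark}; {pass_rate:.0f}% of students passed.")
```

Each stage consumes the output of the one before it, which is why interpretation comes last in the sequence.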
FUNCTIONS OF STATISTICS
1. Statistics simplifies complex data : With the help of statistical methods a mass of data can be
presented in such a manner that they become easy to understand. For example, the complex data may
be presented as totals, averages, percentages, etc.
2. Statistics presents the facts in a definite form : This definiteness is achieved by stating
conclusions in a numerical or quantitative form.
5. Statistics helps in formulating policies : Many policies, such as those on import, export, wages,
production, etc., are formed on the basis of statistics. Some laws, such as Malthus' theory of population
and Engel's law of family expenditure, are based on statistics.
6. Statistics helps in forecasting : Statistics also helps to predict the future behaviour of
phenomena; for instance, the market situation for the future is predicted on the basis of available statistics of
past and present. An economist might be interested in predicting the changes in one economic factor due
to the changes in another factor. For example, he might be interested to know the impact of today's
investment on the national income in future, which is possible with the knowledge of statistics.
7. Statistics helps to test and formulate theories : When some theory is to be tested, statistical
data and techniques are useful. For example, whether cigarette smoking causes cancer, or whether
an increase in demand affects the price, can be tested by collecting and comparing the relevant data.
IMPORTANCE OF STATISTICS
The use of statistical methods is so widespread that it has become a very important tool in the affairs of the
world. It is indispensable to fields of investigation, especially in the sciences, such as Botany, Sociology,
Economics, Medicine, etc. It helps particularly in drawing research conclusions. Let us examine the
importance of statistics in some fields relating to economics and business :
... quantitative and mathematical studies. In India, Prof. P.C. Mahalanobis, Dr V.K.R.V. Rao, R.C. Desai, etc.
have contributed a lot to the development of ...
of statistics.
... population, etc. New things are being invented today in all the sciences because of the ...
... development of the empirical side of economics; the inductive method of economics is dependent upon
statistical methods. ...
... appliances of his laboratory, in the same way as the doctor uses a stethoscope for diagnosis ...
... depend on his income; but there is no end to his desires and demands. No sooner does he consume one
thing than he desires to obtain another. We discover how ...
(b) Statistics and the study of production : The progress of production every year can ...
... various elements of production (e.g., land, labour, capital and enterprise) is also
done with the help of statistics. The statistics of production are very helpful ...
... production with a view to making a comparative study of various fields of production and economic
planning.
... commodity in a market. The law of price determination and cost price which are : ...
(d) Statistics and the study of distribution : Statistics are helpful in the calculation of national income in the
field of distribution. Statistical methods are used in solving the problem of the distribution of national
income. Various problems arise ...
Thus, statistics is useful in the various fields of economics. It gives statement of facts, direction to solve
problems, evolution of economic laws and helps in economic planning. Economic laws in the modern
economic world are based on mathematics and statistics which help to form econometric models; these
models are helpful in solving economic problems. On this basis we can call economics a Science of
Human welfare and statistics as an Arithmetic of Human welfare.
The comparison of the stage of development of one country with other is possible only with the
availability of statistical data. There are a number of problems of underdeveloped countries, e.g.,
overpopulation, lack of industries, lack of agricultural development, lack of education, etc. These problems
can be fully viewed and understood only by getting the actual figures for different areas. Similarly,
general review of progress in all fields of economic development needs the help of statistical data and
statistical methods. Priorities of expenditure of a national budget can be determined through the
comparative study of past performances with the present. Thus, planning without statistics is a ship
without radar and compass.
[Figure: BUSINESS (International, Import, Export, ... of Trade, Advertisement)]
... methods of statistics.
... profitable trade he must know what the ... selling activities. For ...
LIMITATIONS OF STATISTICS
Statistics is very widely used in all sciences but it is not without limitations. It is necessary to know the
misuses and limitations of statistics. The following are the limitations of statistics.
1. It does not study the qualitative aspect of a problem : The most important condition of
statistical study is that the subject of investigation and inquiry should be capable of being quantitatively
measured. Qualitative phenomena, e.g., honesty, intelligence, poverty, etc., cannot be studied in
statistics unless these attributes are expressed in terms of numerals.
2. It does not study individuals : Statistics is the study of mass data and deals with aggregates of
facts which are ultimately reduced to a single value for analysis. Individual values of the observation
have no specific importance. For example, the statement that the income of a family is, say, Rs 1,000 does not convey
statistical meaning, while the statement that the average income of 100 families is, say, Rs 400 is a statistical statement.
3. Statistical laws are true only on an average : Laws of statistics are not universally applicable like
the laws of chemistry, physics and mathematics. They are true on an average because the results are
affected by a large number of causes. The ultimate results obtained by statistical analysis are true under
certain circumstances only.
4. Statistics can be misused : Statistics is liable to be misused. The results obtained can be
manipulated according to one's own interests and such manipulated results can mislead the community.
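Limitations 2 and 3 can be seen in a small sketch. The five family incomes below are hypothetical, chosen so that their average equals the Rs 400 used in the text's example; nothing else about them is taken from the source.

```python
from statistics import mean

# Hypothetical monthly incomes (Rs) of five families, assumed for illustration.
incomes = [1000, 250, 300, 200, 250]

# The average summarises the group as a whole.
average_income = mean(incomes)

# No individual family need earn the average: the summary is true of the
# aggregate, not of any single observation.
families_at_average = [x for x in incomes if x == average_income]

print(f"Average income: Rs {average_income}")
print(f"Families earning exactly the average: {len(families_at_average)}")
```

The average describes the five families taken together, yet it matches none of them individually, so conclusions about a particular family cannot be read off from it.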
MISUSE OF STATISTICS
... knowledge of statistics, ... the truth with the help of his ...
EXERCISES
Discuss with illustration the importance of Statistics in the solution of social and economic problems.
"Statistical Analysis is of vital importance for successful businessmen, economists, administrators and
educationists." Discuss with illustrations.
Write notes on : (a) Importance of statistics in modern economic set up, {b) Statistics in economic
analysis.
Define Statistics. Explain its utiHty in the field of economic planning. "Statistical thinking is as necessary
for efficient citizenship as the ability to read and write." Explain this statement in about 200 words.
"Statistics in these days is indispensable for dealing with socio-economic problems". How far is this
statement true?
What is the importance of Statistics in modern economic set up? Explain giving examples.
"Planning without Statistics is a ship without radar and compass." In the light of this statement explain
the importance of Statistics as an effective aid to national planning.
Explain the relationship between Economics and Statistics and discuss how far it is correct to say that
the science of economics is becoming statistical in its method.
Explain briefly :
UNIT 2
Chapter 3
What is a Statistical Enquiry?
Sources of Data
Primary and Secondary Data
Drafting the Questionnaire
Methods of Collecting Primary Data
Census and Sample Surveys
Sample Surveys
Methods of Sampling
Random Sampling
Non-Random Sampling
Advantages of Sampling
Reliability of Sample Data
How Secondary Data is Collected?
Some Important Sources of Secondary Data
Census of India
Enquiry means a search for truth, knowledge or information. Statistical enquiry therefore means a
search conducted by statistical methods. There are different subjects on this earth; some are described
by the degree of expression (quality) and some by the degree of figures or magnitudes (quantity). The
application of a statistical technique is possible when the questions are answerable in figures (quantity),
in other words the first and the foremost condition for the answer to the questions in statistical enquiry
should be quantitative, for instance :
But, there are questions like—How great was Jawaharlal Nehru? How brave was Bhagat Singh? etc.,
which cannot be answered through statistical methods. Questions that can be answered in quantity lie
within the purview of statistics, viz., What is the average production of rice per acre in India? What is the
total population of India? How many students are there in a class?
Thus, statistical enquiry means statistical investigation or statistical survey. One who conducts this type
of enquiry is called an investigator. The investigator needs the help of certain persons to collect
information; they are known as enumerators. Respondents are those from whom the statistical
information is collected. Survey is a method of collecting information from individuals.
TABLE 1
Year    Production
1950-51 1.0
1980-81 6.8
1990-91 13.5
2000-01 30.3
2001-02 31.1
2002-03 34.5
2003-04 36.9
SOURCES OF DATA
But you may have another choice: visiting the factory accounts department and recording the
information from the salary register, or gathering this information from the published report of the
factory about the payment of wages. This is a secondary source for an investigator but, for the factory, it is
a primary source.
Thus, primary data is collected originally and secondary data is collected through other sources. Primary
data is first-hand information for a particular statistical enquiry, while the same data is second-hand
information for another enquiry. The same data is primary in one hand and secondary in the other,
e.g., any Government publication is first hand (Primary) for the Government and second hand (Secondary)
for a research worker. Thus, secondary data can be obtained either from published sources or from any
other source, for example, a website, which saves time and cost.
The most popular and common tool to collect primary data is the questionnaire/interview schedule. The
questionnaire is managed by the enumerator, researcher or trained investigator. Sometimes the
questionnaire is managed by the respondents also.
(1) Covering letter : The person conducting the survey must introduce himself and make the aims
and objectives of the enquiry clear to the informant. A personal letter can be enclosed indicating the
purposes and aims of enquiry. The informant should be taken into confidence. He should be assured
that his answers will be kept confidential and he will not be solicited after he fills up the questionnaire. A
self-addressed and stamped envelope should be enclosed for the convenience of the informant to
return the questionnaire.
(2) Number of questions : The informant should be made comfortable by asking minimum number
of questions based on the objectives and scope of enquiry. The more questions there are, the lower the
possibility of response. Therefore, normally
the mformant should ^ abrtr^ve Ae aZ^f "" 'TT' ■"."''ject.ve. For this the blank space, e.g., ®
^y "smg a tick-mark in
Which of the following languages do you use most for writing? (Put a cross) (i) English □
M Punjabi □ iit,)Vrd>x n
or 'Right' or
These questionsleXle aS In which class do you read? In which subject you are more interested?
A SPECIMEN QUESTIONNAIRE
Following are the methods of primary data collection which are in common use :
COLLECTION OF DATA
PRIMARY
► Direct Personal Interview
► Indirect Personal Interview
► Telephone Interview
► Information from Correspondents
► Mailed Questionnaires
► Questionnaires Filled by Enumerators
SECONDARY
► Published Sources
► Unpublished Sources
etc. and collect the desired information. In the same way one can think of personal enquiry for collection of
information regarding family budget and living conditions of a group or area. The investigator must be skilled,
tactful, accurate, pleasing and should
5. Information can be obtained easily from the informants by a personal interview.
6. Since the enquiry is intensive and in person, the results obtained are normally reliable and accurate.
7. Informants' reactions to questions can be properly studied.
8. Investigators can use the language of communication according to the educational standard and
attitude of the informant.
Limitations :
1. This method can be used if the field of enquiry is small. It cannot be used when
Merits :
Kiai^ obtained from the third party, it is more or less free froJ
Limitations :
Thus, we find that both the above methods—direct and indirect personal interviews— have certain plus
and minus points. For this reason the choice of the method depends on the nature of enquiry and
sometimes we balance the demerits of one method by using the other method also for the same
investigation. This way we can counter-check the data collected by one method with the other.
(III) Telephone interview : The investigator asks questions over landline telephone, mobile telephone
and even through websites. Various researchers, newspapers, television channels, mobile service
providers, banks etc., use telephone service to get information from different people, e.g., exit polls,
political or economic opinions, opinions on music or dance performances etc. Sometimes websites or
the internet are also used for obtaining statistical data. These days online surveys through Short Message
Service, i.e., SMS, have become popular.
Merits :
Limitations :
1. Information cannot be obtained from people who do not have their own telephones.
2. Reactions of respondents on certain issues cannot be judged; but it sometimes becomes helpful
in obtaining information from respondents.
(IV) Information from correspondents : In this method, local agents or correspondents are appointed in
different parts of the investigation area. These agents regularly supply the information to the central
office or investigator. They collect the information according to their own judgements and own
methods. Radio and newspaper agencies generally obtain information about strikes, thefts, accidents
etc. by this method. It is adopted by Government departments to get estimates of agricultural crops and
the wholesale price index number. It is suitable when the information is to be obtained from a wide
area and where a high degree of accuracy is not required.
Merits :
Limitations :
3. As the correspondent uses his own judgement, his personal bias may affect the accuracy of the
information sent.
Limitations :
4. This method can be used only when the informants are educated or literate so that they return
the questionnaires duly read, understood and answered.
5. ... possibility of getting wrong results due to partial responses, and those
6. There may be loss of questionnaires in mail.
This method is suitable for the following situations :
(a) ... compel banks and companies etc., to supply information regularly to the Government in a prescribed
form.
(b) This method can be successful when the informants are educated.
Following are some suggestions for making this method more effective and successful.
(a) Questions should be simple and easy so that the informants may not find it a
(b) Informants should not be required to spend for posting the questionnaires back;
therefore, a prepaid postage stamp should be affixed.
(c) This method should be used in a large sample or wide universe.
(d) This method is preferred in such enquiries where it is compulsory by law to fill the schedule.
Thus, there is little risk of non-response.
(e) The language of the schedule should be polite and should not hurt the sentiments
of the informants.
(VI) Questionnaire filled by enumerators : The mailed questionnaire method poses a number of difficulties in
collection of data. Generally, the filled questionnaires received are incomplete, inadequate and
unrepresentative.
The second alternative approach is to send trained investigators or enumerators to the informants with
standardised questionnaires which are to be filled in. The investigator helps the informants in recording
their answers. The investigators should be honest, tactful and painstaking. This is the most common
method used by research organisations. They train investigators properly, specifically for the purpose of
an enquiry, and also train them in dealing with different persons tactfully, to get proper answers to the
questions put to them. The statistical information collected under this method is highly reliable.
Merits :
3. True and reliable answers to difficult questions can be obtained through
establishment of personal contact between the enumerator and the informant.
4. As the information is collected by trained and experienced enumerators, it is
reasonably accurate and reliable.
5. This method can be adopted in those cases also where the informants are illiterate.
6. Personal presence of the investigator assures complete response and respondents can
Limitations :
2. This method is time consuming since the enumerator is required to visit people
spread out over a wide area.
3. This method needs the supervision of investigators and enumerators.
4. Enumerators need to be trained. Without good interview and proper training, most
of the collected information is vague and may lead to wrong conclusions.
5. It needs a good battery of investigators to cover the wide area of universe and
therefore it can be used by bigger organisations.
mam survey. This is done to try out the auetlr before starting th,
the general mformation about L po^Ja"^^^^ -thods for obtaL the pilot survey helps in : ^ ^e sampled.
The information supplied b, [i) Estimating the eosr of ^ ■
W'll get the mformation abont all the fLrhnnd f each girl and
by taking only 50 girls out of 500 and obtain the average of this part of the total population. The average
of 50 girls can reasonably be representative of the average weight of 500 girls. In this case the weights of 50 girls are
the sample.
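The idea of estimating the average weight of 500 girls from a sample of 50 can be sketched in a few lines of Python. This is only an illustration: the weights are made-up numbers generated for the example, and the function name `sample_vs_census` is hypothetical.

```python
import random
import statistics

def sample_vs_census(weights, sample_size, seed=0):
    """Return (census mean, sample mean) for a list of weights."""
    rng = random.Random(seed)
    sample = rng.sample(weights, sample_size)  # random sample without replacement
    return statistics.mean(weights), statistics.mean(sample)

# Hypothetical weights (in kg) for 500 girls; not data from the book.
rng = random.Random(42)
population = [round(rng.gauss(45, 5), 1) for _ in range(500)]

census_mean, sample_mean = sample_vs_census(population, 50)
print("Average of all 500 girls:", round(census_mean, 2))
print("Average of 50 sampled girls:", round(sample_mean, 2))
```

The two averages will normally be close but not identical; the small gap is the price paid for measuring 50 girls instead of 500.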
Census Surveys
The objective of a census method or complete enumeration is to collect information for each and every
unit of the population/universe. In this method every element of population is included in the
investigation. Thus, when we make a complete enumeration of all items in population, it is known as
'Census Method' or 'Method of Complete Enumeration'. In the above example, collecting weights of all the
500 girls in Senior Secondary School is the census method of collection, where no student is left out, as each
student is a unit.
2. Demographic data obtained by census method on death rates and birth rates, literacy, work
force, life expectancy and composition of population etc. are published by Registrar General of India.
3. The data relating to estimation of the total area under principal crops in India are obtained by
using village records maintained regularly by Patwari.
Let us review the census data in Table 2 regarding the relative growth of Urban
and Rural Population in India, obtained from Reports and Economic Survey 2002-2003.
TABLE 2
Year    Urban Population (in crores)    Rural Population (in crores)    Total Population (in crores)    As Percentage of Total Population
74 2 crore persons, out of about 102.7 crore total population lived in around 5 5 lakhs'
urban areas The urban population formed about 11 per cent and rural population 89 per
in 2001 while still over 72 per cent people lived in rural areas. The above table shows the relative growth
of rural and urban population in India since 1901.
The net addition to rural population between 1991-2001 was 11.33 crore while urban population
increased by 6.74 crore persons. The decadal growth at frru^d
in the growth rate of urban population m the decade ending 2001 over the decade^nding
SAMPLE SURVEYS
We may study a sample drawn from the large population and if that sample is adequately representative
of the population, we should be able to arrive at valid conclusions.
Method of collecting of data. In the above example, collecting the weights of 50 girls out of
500 girls in Senior Secondary School is the sample method of collection. In this method a few students are
considered as a sample for our study.
(a) We look at a handful of grain to evaluate the quality of wheat, rice or pulses, etc.
(c) A drop of blood is tested for diseases like malaria or typhoid etc.
In statistical terminology population or universe does not mean the total number of people in
an area; it means the total number of observations or items in
METHODS OF SAMPLING
Broadly speaking, various methods of sampling can be grouped under two main heads: (a) Random Sampling,
and (b) Non-Random Sampling.
Let us discuss now the various sampling methods which are popularly used in practice.
METHODS OF SAMPLING
(a) Random Sampling
    Simple or Unrestricted Random Sampling
    Restricted Random Sampling
        (i) Stratified Sampling
        (ii) Systematic Sampling
(b) Non-Random Sampling
RANDOM SAMPLING
Random Sampling is one where the individual units (samples) are selected at random.
Random sampling does not mean unsystematic selection of units. It means the chances of each item of
the universe being included in the sample is equal. The term 'Random Sampling' here is not used to
describe the data in the sample but it refers to the process used for selecting the sample. Following are
the methods of random sampling.
Simple or Unrestricted Random Sampling
In this method the selection of an item is not determined by the investigator; the process used to
select the items of the sample decides the
chances of selection. Each item of the universe has an equal chance of being included in the sample. It is
free from discrimination and human judgement. Random sampling is the scientific procedure of
obtaining a sample from the given population. It depends on the law of probability which decides the
inclusion of items in a sample. To ensure randomness, mechanical devices are used. There are two
methods of obtaining the simple random sample. They are :
(a) Lottery Method : A random sample can generally be selected by this simple and popular method. All
the items of the universe are numbered and these numbers are written on identical pieces of paper
(slips). They are mixed in a bowl and then selected one by one, the bowl being shaken before every
draw. The numbers are picked out blindfolded. All slips must be identical in size,
shape and colour to avoid biased selection.
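The lottery method described above can be imitated on a computer. The sketch below is an illustration only; the function name `lottery_draw` and the figures 500 and 50 are assumptions chosen for the example.

```python
import random

def lottery_draw(universe_size, sample_size, seed=None):
    """Draw slips one by one, shaking the bowl before every draw."""
    rng = random.Random(seed)
    slips = list(range(1, universe_size + 1))  # one numbered slip per item
    drawn = []
    for _ in range(sample_size):
        rng.shuffle(slips)          # shake the bowl
        drawn.append(slips.pop())   # pick one slip; it is not put back
    return drawn

sample = lottery_draw(500, 50, seed=1)
print(sorted(sample))
```

Because a drawn slip is not returned to the bowl, no item can be selected twice, and every item has the same chance of being picked.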
-ry large the above procedures if the disks, balls or slips L not XrouThnf' ^^ T^^^^^ " been a marked
tendency to usetSroTrindT^^^^ > T T' " ^^^^ ^as
by a random process. The following are some random number tables :
(a) Tippett's Random Sampling Numbers. There are 10,400 numbers arranged in 4 digits.
(b) M.G. Kendall and Babington Smith's Random Sampling Numbers, having 1 lakh
(c) Rand Corporation's a million random digits.
(d) Snedecor's 10,000 random numbers.
(e) Fisher and Yates Table having 15,000 digits.
Tippett Numbers
7969 5911
1545 1396
2370 7483
5913 7691
6608 8126
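A table of random numbers is normally used by reading the numbers in order and keeping only those that fall within the range of the numbered universe. The sketch below applies this to the ten four-digit numbers printed above. The row-by-row reading order, the universe size of 6,000 and the function name `select_from_table` are assumptions made for the illustration.

```python
# The ten numbers printed above, read row by row (assumed reading order).
TIPPETT_EXTRACT = [7969, 5911, 1545, 1396, 2370, 7483, 5913, 7691, 6608, 8126]

def select_from_table(table, universe_size, sample_size):
    """Keep table numbers that match a unit number, skipping repeats."""
    chosen = []
    for number in table:
        if 1 <= number <= universe_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Hypothetical universe of 6,000 numbered units; a sample of 5 is wanted.
print(select_from_table(TIPPETT_EXTRACT, 6000, 5))
# [5911, 1545, 1396, 2370, 5913]
```

Numbers larger than the size of the universe (such as 7969) are simply passed over.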
students, now we will cons J a pag^of ?fp i'" ^^^ ""'"''"ing the
Merits
3. This method is economical as it saves time, money and labour in investigating a population.
Demerits
1. This requires complete list of population but up-to-date lists are not available in many enquiries.
2. If the size of the sample is small, then it will not be a representative of a population.
3. When the distribution between items is very large, this method cannot be used.
4. The numbering of units and the preparation of the slips is quite time consuming and not
economical particularly if the population is large.
(i) Stratified random sampling : In this method the universe is divided into strata or homogeneous
groups and an equal sample is drawn from each stratum or layer at random. This method is therefore
useful when the population of the universe is not fully homogeneous. For example, suppose we want to
know how much pocket money an average university student gets every month. An equal sample will be
taken from various strata, namely : B.A. students, M.A. students and Ph.D. students etc. Stratified
random sampling is widely used in market research and opinion polls, as it is fairly easy to classify people
into occupational, economic, social, religious and other strata. There are different types of stratified
sampling :
(a) Proportional stratified sampling is one in which the items are taken from each stratum in the
proportion of the units of the stratum to the total population.
(b) Disproportionate stratified sampling is one in which units in equal numbers are taken from each
stratum irrespective of its size.
(c) Stratified weighted sampling is one where units are taken in equal number from each stratum,
but weights are given to different strata on the basis of their size.
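The three types of stratified sampling listed above differ only in how many units are taken from each stratum and how the results are combined. The sketch below shows the arithmetic. The strata sizes, the pocket-money averages and the function names are all hypothetical figures invented for the illustration.

```python
def proportional_allocation(strata_sizes, total_sample):
    """Units per stratum in proportion to the size of the stratum."""
    population = sum(strata_sizes.values())
    # Rounding may make the parts differ slightly from total_sample.
    return {name: round(total_sample * size / population)
            for name, size in strata_sizes.items()}

def equal_allocation(strata_sizes, total_sample):
    """Same number of units from every stratum, whatever its size."""
    per_stratum = total_sample // len(strata_sizes)
    return {name: per_stratum for name in strata_sizes}

def weighted_mean(stratum_means, strata_sizes):
    """Combine stratum averages, weighting each by the stratum size."""
    population = sum(strata_sizes.values())
    return sum(stratum_means[n] * strata_sizes[n] for n in strata_sizes) / population

# Hypothetical university with three strata of students.
strata = {"B.A.": 6000, "M.A.": 3000, "Ph.D.": 1000}
print(proportional_allocation(strata, 100))  # {'B.A.': 60, 'M.A.': 30, 'Ph.D.': 10}
print(equal_allocation(strata, 90))          # {'B.A.': 30, 'M.A.': 30, 'Ph.D.': 30}

# Hypothetical average monthly pocket money (Rs) found in each stratum.
means = {"B.A.": 500, "M.A.": 800, "Ph.D.": 1500}
print(weighted_mean(means, strata))          # 690.0
```

With equal allocation the simple average of the three stratum means would be about Rs 933; weighting by stratum size gives Rs 690, because B.A. students form the bulk of the universe.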
Merits
1. The sample taken under this method is more representative of the universe as it has been taken
from different groups of universe.
2. It ensures greater accuracy as each group (stratum) is so formed that it consists of uniform or
homogeneous items.
Demerits
1. Stratified sampling is not possible unless some information concerning the population and its
strata is available.
2. If proper stratification is not done the sample will have an effect of bias. If different strata of
population overlap such a sample will not be a representative one.
by preparing this list in some random order, for example, alphabetical order
« - 10 The method of selecting the first item from the list is to decide at random f^^t
Then the other items will be 15th, 25th, 35th, and so on until we have got our full sample
fully random and that there are no inherent periodicities in the list.
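Systematic sampling, in which every k-th item of a list is taken after a random start, can be sketched as follows. The sequence 15th, 25th, 35th mentioned above corresponds to an interval of 10 and a first item at position 5; the universe size of 100 and the function names are assumptions made for the illustration.

```python
import random

def systematic_sample(universe_size, interval, start):
    """Positions start, start + interval, start + 2 * interval, ..."""
    return list(range(start, universe_size + 1, interval))

def random_start(interval, seed=None):
    """Choose the first item at random from the first `interval` positions."""
    return random.Random(seed).randint(1, interval)

# Hypothetical list of 100 units, interval 10, first item at position 5.
print(systematic_sample(100, 10, 5))
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
```

Only the first position is left to chance; once it is fixed, the rest of the sample follows automatically, which is why the list itself must have no periodic pattern.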
Merits
1. It is systematic, very simple, convenient and checking can also be done quickly.
Demerits
to divide and sub-divide a universe according to its characteristics. Thus if a survey is to be conducted in a
country it will first be divided into zones or states or regions, then into smaller units, cities, towns and villages
and then into localities and households.
NON-RANDOM SAMPLING
Judgement Sampling
This is also called purposive or deliberate sampling. In this method individual items of sampling are
selected by the investigator consciously using his judgement. Therefore, it requires that the investigator
should have a good knowledge of the universe and some experience in the field of investigation.
Obviously, the choice of samples will vary from one investigator to another. For example, from a
universe of 10,000 ladies who use a particular brand of hair dye, the investigator will select a sample of,
say, 1,000. His choice of this sample will be such that it is representative of the universe. For this an
exercise of judgement is required.
In order for the judgement sampling to be reliable, it should be free from individual bias or prejudice.
Since the choice of sample is not based on probability, it does not guarantee accuracy and it makes
detection of sampling errors difficult. However, this method is useful in solving a number of kinds of
problems in business and economics.
(a) When the number of items in the universe is small, so that some items with important characteristics
are likely to be left out.
(c) When some known characteristics of the universe are to be intensively studied.
Quota Sampling
It is a method of sampling that saves time and cost and is commonly used in surveys of political,
religious and social opinion. Interviewers are allotted definite quotas of the universe and they are
required to interview a certain number from their quota. Quotas are decided on the basis of the
proportion of persons in various categories. In other words, the investigator is given instructions about
how many interviews should be taken, say, in a given locality and what proportion should be from, say,
upper, middle and lower income groups, or by some other classification which is predetermined. For
example, for a study of truancy (running away) from school in Delhi the investigators are allotted quotas
of say 10 schools each out of which two should be public schools (Boys), one public school (Girls), three
Boys' Senior Secondary Schools, two Girls' Senior Secondary Schools, two Co-education Schools and
from each school he is asked to interview 50 students, taking 10 students each from Classes VIII, IX, X, XI
and XII. The interviewer can select any 10 students according to his own judgement.
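The quota in the truancy example can be checked with simple arithmetic. The figures below are taken from the example above; the variable names are chosen only for the illustration.

```python
# Quota of schools allotted to one investigator, as given in the example.
SCHOOL_QUOTA = {
    "Public schools (Boys)": 2,
    "Public school (Girls)": 1,
    "Boys' Senior Secondary Schools": 3,
    "Girls' Senior Secondary Schools": 2,
    "Co-education Schools": 2,
}
CLASSES = ["VIII", "IX", "X", "XI", "XII"]
STUDENTS_PER_CLASS = 10

schools = sum(SCHOOL_QUOTA.values())                 # schools per investigator
per_school = len(CLASSES) * STUDENTS_PER_CLASS       # interviews in each school
total_interviews = schools * per_school              # interviews per investigator

print(schools, per_school, total_interviews)  # 10 50 500
```

The quota fixes how many students come from each category; which particular students are interviewed is still left to the investigator's judgement.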
It is a kind of judgement sampling and provides satisfactory results only when interviewers are carefully
trained and personal prejudice is kept out of the process of selection.
schools. This method is used wLn^e " .V ^^ ^^ convenient for hL to g^trthe not clear or complete source
hst is t^a^lbl" T ^^e sample unit i
^^HIAGESOF
The main purpose of sampling is to collect maximum information with minimum expenditure of money,
time and labour and yet achieve a high degree of accuracy and reliability. For ensuring reliability certain
principles must be followed. In the sampling method it is presumed that whatever conclusions are drawn
from a sample are also true for the whole population. This presumption is based mainly on the following
two laws :
(a) The Law of Statistical Regularity.
(b) The Law of Inertia of Large Numbers.
(a) Law of statistical regularity : The law of statistical regularity is derived from the
mathematical theory of probability. It says that a comparatively small group of items chosen at random
from a very large group will, on the average, possess the
characteristics of the large group. Basically, it applies to random selection. Thus, in the process of
random sampling each unit of the universe has an equal chance of being selected. Therefore, the selected items
can be said to be representative of the universe. Although the law is not as accurate as a scientific law,
it does ensure a reasonable degree of accuracy. Since there is a certain regularity in natural phenomena,
we assume a certain uniformity in nature. A random sample is said to follow the law of statistical
regularity because of this basic uniformity in a universe.
(b) Law of inertia of large numbers : This law is also called the law of stability of mass
data. It is based on the law of statistical regularity. Basically, it states that if the
numbers involved are very large, the change in a sample is likely to be very small.
In other words, the individual units of a universe vary continually but the total
universe changes slowly. That is, large aggregates are more stable than small ones.
Because of the slow change in the nature of the total universe this law is called the law
of inertia (laziness) of large numbers. For example, the sugar production of a factory will vary significantly
from year to year but the sugar production of a country as a whole will remain comparatively stable. Or
the male-female ratio of a family may appreciably change over a short period,
but the male-female ratio of a country as a whole will remain comparatively stable over
the period. To take another example, if a coin is tossed 6 times we may get heads four times and tails two
times. But if a coin is tossed 5,000 times, there is a high possibility of getting heads and tails close to 2,500 times each.
This happens due to the operation of this law. That is, when one part of a large group is changing in one direction
Thus, reliability of sampling depends mainly on randomness of selection of data and the large size of
universe, expressed by the above two laws.
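The coin-tossing example can be tried out by simulation. The sketch below is illustrative only; the function name `heads_share` and the seed values are assumptions, and a different seed will give slightly different figures.

```python
import random

def heads_share(tosses, seed):
    """Toss a fair coin `tosses` times and return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

for n in (6, 50, 500, 5000):
    print(n, "tosses -> share of heads:", round(heads_share(n, seed=1), 3))
```

With only 6 tosses the share of heads can be far from one half, but with 5,000 tosses it settles close to 0.5, which is what the law of inertia of large numbers leads us to expect.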
Statistical Errors
There is a great difference in the meaning of mistake and error in statistics. Mistake means a
wrong calculation or use of inappropriate method in the collection or analysis
other words, the difference between the approximated value (estimated value) and the actual value (true
value) is called statistical error in a technical sense. For example, we make an estimation that in a particular
meeting, 1,000 persons are there. But when we count the persons, the count may come to 1,030. There is a
difference of 30 between the estimated value and the counted value. This difference is called 'error' in
statistics. But when we make
They are known as mistakes. For example, there is a meeting, we sent a person to count the
audience
Sources of Errors
unit scale, or defective questionnaire etc. For example, wrong scale to measun
meLl -'I height to nearest of inch or approximatrTh
differences may also occur due to differences in measuring tapes due to manufacturing defect. In Physics
or Chemistry such errors of measurement will occur while taking readings on various instruments.
due to clerical errors, arithmetic slips etc. by omitting some figure consideri
wrong value, making wrong totals etc. by respondent L investigator thrjlta^"''''^'"''^''''" ' statisticians for
misinterpret!
Types of Errors
(a) Absolute and relative errors : Absolute error is the difference between the actual (true) value
and the estimated (approximate) value, while relative error is the ratio of the absolute error to the
approximated value.
$$U_e = U' - U$$
$$e = \frac{U' - U}{U}$$
where $U_e$ = Absolute error, $e$ = Relative error, $U'$ = Actual value, $U$ = Approximate (estimated) value.
Illustration. Sales of a commodity are approximated at Rs 497 and the actual sale is Rs 500.
Absolute error: $U_e = 500 - 497 = 3$
Relative error: $e = \frac{500 - 497}{497} = \frac{3}{497} \approx 0.006$
In percentage terms: $0.006 \times 100 = 0.6\%$
Relative error is generally used in statistical calculations because absolute error, taken alone, can give
a misleading impression.
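The two measures can be computed as in the following sketch, which follows the definition given above (relative error as the ratio of absolute error to the approximated value). The function names are chosen only for the illustration.

```python
def absolute_error(actual, approximate):
    """Difference between the actual value and the approximate value."""
    return actual - approximate

def relative_error(actual, approximate):
    """Absolute error as a ratio of the approximate value."""
    return absolute_error(actual, approximate) / approximate

# Illustration from the text: approximated sales Rs 497, actual sales Rs 500.
print(absolute_error(500, 497))                  # 3
print(round(relative_error(500, 497), 3))        # 0.006
print(round(relative_error(500, 497) * 100, 1))  # 0.6 (per cent)
```

An absolute error of 3 would be serious if the actual sale were Rs 10 and negligible if it were Rs 10,000; the relative error brings this out.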
(b) Biased and unbiased errors : Biased errors arise due to some prejudice or bias in the mind of
the investigator or the informant or in any measurement instrument. Suppose the enumerator used the
deliberate sampling method in place of simple random sampling method; then it is called biased error.
These errors are cumulative in nature and increase when the sample size also increases. Biased errors
arise due to faulty process of selection, faulty work during the collection of information and faulty
method of analysis.
Unbiased errors are not the result of any prejudice or bias. They are those which arise accidentally just
on account of chance in the normal course of investigation. Unbiased errors are generally compensating.
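The difference between cumulative (biased) and compensating (unbiased) errors can be seen in a small simulation. This is a sketch under assumed figures: each measurement carries a chance error between -2 and +2 units, and the biased instrument additionally reads 1 unit too high every time. The function name `total_error` is hypothetical.

```python
import random

def total_error(measurements, bias, spread, seed=0):
    """Sum of errors: a constant bias plus a chance error in [-spread, spread]."""
    rng = random.Random(seed)
    return sum(bias + rng.uniform(-spread, spread) for _ in range(measurements))

n = 10_000
biased_total = total_error(n, bias=1.0, spread=2.0)
unbiased_total = total_error(n, bias=0.0, spread=2.0)

print("Total error with a biased instrument:", round(biased_total))
print("Total error with chance errors only:", round(unbiased_total))
```

The biased total grows roughly in step with the number of measurements, while the chance errors largely cancel one another and leave a comparatively small total.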
(c) Sampling and non-sampling errors : The errors arising on account of drawing inferences about the
population on the basis of few observations (sampling) are called sampling errors. The errors mainly
arising at the stages of ascertainment and processing of data are called non-sampling errors. They are
common both in census enumeration and sample surveys.
To avoid these errors, the statistician must take proper precaution and care in using the correct
measuring instrument. He must see that the enumerators are also not biased. Unbiased errors can be
removed with proper planning of statistical investigations.
Secondary data are those which are collected by some other agency and are used for other studies. It is
not necessary to conduct special surveys and investigations. We can obtain the required statistical
information from other institutions, or reports which are already published by them as a part of their
routine work. It saves cost and time which are involved in collection of primary data. Secondary data
may be either (a) published or (b) unpublished.
Published Sources
(i) Government publications : Different ministries and departments of Central and State Governments
publish regularly current information along with statistical data on a number of subjects. This information
is quite reliable for related studies. Examples of such publications are : Annual Survey of Industries,
Labour Gazette, Agriculture Statistics of India, Indian Trade Journal, etc.
(ii) Publications of international organisations : We can obtain valuable international statistics from official
publications of different international organisations, like the United Nations Organisation (UNO),
International Labour Organisation (ILO), International Monetary Fund (IMF), World Bank, etc.
(iii) Semi-official publications : Local bodies such as Municipal Corporations, District Boards etc. publish
periodical reports which give factual information about health, sanitation, births, deaths etc.
(iv) Reports of committees and commissions : Various Committees and Commissions are appointed
by the Central and State Governments for some special study and recommendations. The reports of such
committees and commissions contain valuable data. Some of the reports are : Report of National
Agriculture Commission, Report of the Tariff Commission, the Patel Committee Report etc.
(v) Journals and newspapers : Journals like Eastern Economist, Journal of Industry and Trade, Monthly
Statistics of Trade; and newspapers like Financial Express, Economic Times, collect and regularly publish
the data on different fields of economics, commerce and trade.
(vi) Research institutions : There are a number of institutions doing research on allied subjects. This is
the most important source of obtaining secondary data. The National Council of Applied Economic
Research and Foundation of Scientific and Economic Research are such institutions. Research scholars at the
university level also contribute significantly to the availability of secondary data.
(vii) Professional trade bodies : Chambers of Commerce and Trade Associations publish statistics
relating to trade and commerce. Federation of Indian Chambers of Commerce, Institute of Chartered
Accountants, Sugar Mills Association, Bombay Mill Owners Association, Stock Exchanges, Banks and
Cooperative Societies, Trade Unions, etc. publish statistical data.
(viii) Annual reports of joint stock companies are also useful for obtaining statistical information. These
are published by companies every year.
Unpublished Data
Research institutions, trade associations, universities, labour bureaus, research workers and scholars do
collect data but they normally do not publish it. Apart from the above sources we can get the
information from records and files of government and private offices.
One should use the secondary data with care and full precaution and should not accept them at their
face value as they may be suffering from the following limitations:
2. They may not be suitable for a required purpose. The information which was collected on a
particular base may not be suitable and relevant to an enquiry.
3. They may have been influenced by the biased investigation or personal prejudices.
4. They may be out of date and not suitable to the present period.
The investigator should consider the following points before using the secondary data :
(a) Are the data reliable?
(e) From which source were the data collected?
(f) Who has collected the data?
Thus, the secondary data should not be used at its face value. It is risky to use such statistics collected
by others unless they have been properly scrutinised and found reliable, suitable and adequate.
Important sources of secondary data (Census of India and National Sample Survey Organisation)
There are various sources and organisations through which statistical data are being compiled in India.
Since India achieved Independence, great and rapid strides have been made in the field of collection of
data. In the context of economic planning, importance of statistics (data) in the country has become
great. Statistics are necessary for framing and judging the progress of economic planning. The study of
Indian statistics is made under following heads :
There are some agencies, both at the national and state level, which collect, process and tabulate
statistical data. Some important agencies at the national level are Census of India, National Sample
Survey Organisation (NSSO), Labour Bureau, Central Statistical Organisation (CSO), Registrar General of
India (RGI), Director General of Commercial Intelligence and Statistics (DGCIS), etc.
census of india
unique experience of undertaking the biggest census in the world in 1981 and has also an unbroken
record of more than a hundred years of decadal censuses. The Indian census is universally acknowledged as
the most authentic and comprehensive source of information about our land and people. In 1869 Hunter
was appointed Director General of Statistical Surveys. He not only elaborated the statistical system but
also assisted the statistical surveys of districts and provinces, which later developed into the famous
Gazetteers. He advised on the conducting of the census of India, which undertook explanatory surveys from 1869
to 1872 and thereafter matured into a decennial census which has ever since continued without
interruption. After 1872 the next census was taken in 1881, and since then a census has been held
regularly every ten years without interruption. The Census of India provides the most
complete and continuous demographic record of
The data generated by the Census of India 2001 provide benchmark statistics on the
people of India at the beginning of the next millennium. This is a mirror of a fair
constitute about one-sixth of the human population on this planet. The census statistics
are useful for assessing the impact of the developmental programmes and identifying new
thrust areas for focussing the efforts on improving the quality of life in our country. Basic
population data from the Primary Census Abstract, Census of India 2001 gives information of population in
India as :
TABLE 3
The National Sample Survey (NSS), initiated in the year 1950, is a nationwide, large scale continuous
survey operation conducted in the form of successive rounds. It was established on the basis of a
proposal from Prof. P.C. Mahalanobis to fill data gaps for socio-economic planning and policy making
through sample surveys. In March 1970, the NSS was reorganised and all aspects of its work were
brought under a single Government organisation, namely the National Sample Survey Organisation
(NSSO), under the overall direction of a Governing Council to impart objectivity and autonomy in the
matter of collection, processing and publication of the NSS data.
The Governing Council consists of 18 experts from within and outside Government and is headed by an
eminent economist/statistician and the member-secretary of the council is Director General and Chief
Executive Officer of NSSO. The Governing Council is empowered to take all technical decisions in respect
of survey work, from planning of survey to release of survey results. The NSSO, headed by a Director
General and Chief Executive Officer, has four divisions, namely, Survey Design and Research Division
(SDRD), Field Operation Division (FOD), Data Processing Division (DPD) and Coordination and Publication
Division (CPD). A Deputy Director General heads each division except FOD. An Additional Director
General heads FOD.
Functions of NSSO
(ii) Collection of data relating to the organised industrial sector of the country.
(iii) Supervision of surveys conducted by states in agricultural sector through their own
agencies and also giving guidance to them for analysing and coordinating the results of these surveys.
The NSSO took a forward view of the data requirements of planners, research workers and other users
and drew up a long term programme. The programme conducts periodical surveys on :
(a) Demography, health and family planning;
(b) Assets, debt and investment;
(c) Land holdings and livestock enterprises;
(d) Employment and unemployment, rural labour and consumer expenditure; and
(e) Self employment in non-agricultural enterprises.
The data collected by NSSO surveys on different socio-economic subjects are released through reports
and its quarterly journal 'Sarvekshana'. The data cover different socio-economic subjects like
employment, unemployment, literacy, maternity and child care. Apart from collection of rural and urban
retail prices for compilation of consumer price index numbers, NSSO also undertakes field work of the
Annual Survey of Industries and conducts crop estimation surveys.
exercises
What are the similarities and dissimilarities between the two methods: questionnaires to be filled in by
informants and schedules to be filled in by enumerators? Explain with examples.
Describe the questionnaire method of collecting primary data. What precautions must be taken while
preparing a questionnaire?
(b) National Sample Survey Organisation (NSSO)
What is secondary data? Discuss the various sources of collecting secondary data. What precautions
should be taken before using secondary data? Explain.
(iv) Which of the following is most important when you buy a new dress?
Frame two-way questions (with 'Yes' or 'No').
(i) Data collected by investigator is called secondary data.
(ii) There are many sources of data.
(iii) Telephone survey is the most suitable method of collection of data when the population is literate
and spread over a large area.
17. Distinguish between census and sample surveys. List four important types of sampling
methods. Explain the reasons for preferring sample surveys in the collection of data.
Name the methods of selecting a sample. Describe the method of stratified sampling with merits and
demerits.
19. The Education Ministry is interested in determining the level of education of unmarried girls in
the country. How would you organise a survey for this purpose?
20. Does the lottery method always give you random sample? Explain.
21. Do samples provide better results than surveys? Give reasons for your answer.
23. Distinguish between random sampling and systematic sampling. Give suitable examples.
25. What do you understand by 'Census' investigation? Explain its suitability with illustrations.
26. What do you mean by 'Sample' investigation? Explain its suitability with illustrations.
30. How would you distinguish convenience sampling with judgement (deliberate) sampling?
Explain.
(b) Absolute errors and relative errors
(c) Sampling and non-sampling errors
Give two examples each of sample, population and variable. Which of the following methods gives
better result and why? (a) Census (b) Sample
Chapter 4
ORGANISATION OF DATA
(a) Classification
1. Definition
2. Objects of Classification
3. Characteristics of Classification
(b) Statistical Series
1. Definition
2. Types of Series
3. Frequency Distribution
(a) CLASSIFICATION
The quantitative information collected in any field of society or science is never uniform. The values always
differ from one to another, e.g., prices of vegetables, students in different sections, income of families,
time in different watches, height or weight of students. A single item out of all the observations of a group,
expressed numerically, may be called a variate or variable, e.g., the price of potato is Rs 10.00 per kg,
in a group of vegetable prices.
Variate can also be called 'variable' or 'magnitude' or 'observation' or 'item' or 'measure' or 'value'.
The characteristics which are not capable of being measured quantitatively are called attributes. For
example, blindness, deafness, literacy, sickness, tall and short, black and blue eyed, intelligence,
aptitude for art and music, etc. They cannot be measured numerically in the same way as heights and
weights, or, price and incomes. Individuals may be ranked according to quality of attributes. The ranks
are sometimes used as their numerical values for purposes of statistical analysis.
The collected data (either by primary or secondary method) are always in an unorganised form in
schedules or questionnaires or another written form. The collected data in unorganised form is called
RAW DATA. Because of the limitation of the human mind
to understand such complex, varied and unorganised data, it is necessary to make them available for
comparison, analysis and appreciation by proper and suitable grouping and arrangement in condensed
form. The process of grouping into different classes or subclasses according to characteristics is called
classification. The classified information arranged in a logical and systematic order in a particular
sequence is called seriation or statistical series. The classified information presented in precise and
systematic tables is called tabulation. In other words, classification is for division of data, seriation is for
arrangement of data in a systematic order and tabulation is for presentation of data in a table.
DEFINITION
According to Professor Connor, "Classification is the process of arranging things (either actually or
notionally) in the groups according to their resemblances and affinities, and giving expression to the unity
of attributes that may subsist amongst a diversity of individuals."
(1) The facts are classified into homogeneous groups by the process of classification. All the units
having similar characteristics are placed in one class or group.
Classification is grouping of data according to their identity, similarity, or resemblances. For example,
letters in the post office are sorted out in groups of cities and towns of destination, viz., Delhi, Chennai,
Agra, Chandigarh etc. Similarly, students in a school may be grouped as boys and girls, or according to
age; in a library the books and periodicals are classified and arranged according to subjects; students are
classified according to the division they secured in a certain examination; animals or plants may be grouped
according to origin or structure etc.
OBJECTS OF CLASSIFICATION
1. To present the facts in a simple form : Classification process eliminates unnecessary details and
makes the mass of complex data, simple, brief, logical and understandable. For example, the data
collected in a population census is so huge and fragmented that it is not possible to draw any conclusion
from them. When these massive figures are classified according to sex, education, marital status,
occupation etc., then the structure and nature of the population can easily be understood.
2. To bring out clearly points of similarity and dissimilarity : Classification brings out clearly the
points of similarity and dissimilarity of the data so that they can be
easily grasped. Facts having similar characteristics are placed in a class, such as educated, uneducated,
employed, unemployed etc.
4. To bring out relationship : Classification helps in finding out cause-effect relationship, if there is
any in the data. For example, data of small-pox patients can help in finding out whether small-pox cases
occurred more in the vaccinated or the unvaccinated population.
5. To present a mental picture : The process of classification enables one to form a mental picture
of objects of perception and conception. Summarised data can easily be understood and remembered.
6. To prepare the basis for tabulation : Classification prepares the basis for tabulation and
statistical analysis of the data. Unclassified data cannot be presented in tables.
CHARACTERISTICS OF CLASSIFICATION
It is important that the classification should possess following characteristics :
2. The classes must not overlap : Each item of data must find its place in one class and one class
only. There must be no item which can find its way into more than one class.
3. Classification should be stable : If classification is not stable and has to be changed each time an
enquiry is conducted, the data would not be fit for comparison. Therefore, the classification
must proceed at every stage in accordance with one principle, and that principle should be maintained
throughout.
4. Classification should be flexible : It should be flexible and should have the capacity of
adjustment to new situations and circumstances. With change in time, some classes become obsolete
and have to be dropped and fresh classes have also to be added.
6. Classification should have arithmetical accuracy : The total of items included in different classes
should tally with the total of the universe.
For example,
Population of India
Year    Population
1951    35.7
1961    43.8
1971    54.6
1981    68.4
1991    81.8
2001    102.7
OR
Year    Population
2001    102.7
1991    81.8
1981    68.4
1971    54.6
1961    43.8
1951    35.7
Two-fold classification
[Figure: POPULATION divided into Males and Females, each divided into Employed and Unemployed, and each of these into Married and Unmarried, giving classes numbered (1) to (8).]
Thus, there are 15 workers in the income group of Rs 100 to 199, and so on for the income group of
Rs 200-299 and the remaining groups.
DEFINITION
TYPES OF SERIES
[Chart: STATISTICAL SERIES. On the basis of character: 1. Time Series, 2. Spatial Series, 3. Condition Series. Other branch: 1. Series of Individual Observations, 2. Discrete Series, 3. Continuous Series (frequency distribution).]
1. Time series : A series of values of some variable according to successive points in time is called time
series. Data are presented with reference to some time unit, viz., year, month, week, or day. For
example :
Year    Production (in '000 tons)
1999    78
2000    75
2001    94
2002    86
2003    89
2004    92
2005    95

Day       Sale (Rs)
Mon.      1,892
Tues.     2,757
Wednes.   3,090
Thurs.    2,650
Fri.      2,592
Satur.    3,822
2. Spatial series : A series of values of some variable according to geographical division of the universe
under study is called a spatial series or geographical series. Data are presented with reference to some
geographical division, viz., country, state, city, town.
Country   Per Capita
USA       5,100
France    3,900
Japan     2,800
Canada    2,100
India     500

City        Number of Schools
Delhi       792
Mumbai      649
Chennai     573
Kolkata     532
Bangalore   459
3. Condition series : A series of values of some variable made according to a condition is called condition
series. Data are presented with reference to some condition, viz., height, age, weight, income etc. For
example :
500-999      35
1000-1499    25
1500-1999    15
2000-2499    20
2500-2999    5
After collection and classification of data, the most important job is to arrange the data in order,
that is, to form a series for further study, presentation, analysis and
interpretation. This arrangement can be done in three ways :
(a) Series of Individual Observation, (b) Discrete Series, (c) Continuous Series.
Mass data in its original form is called raw data or unorganised data which can be arranged in any of the
following ways :
(i) Serial order or alphabetical order, (ii) Ascending order, (iii) Descending order.
The mass data when put in ascending or descending order of magnitude is called an array. A series of
individual observations is a series where items are listed singly after collection. They are not listed in
groups.
Suppose an investigator has obtained the following information from a factory about the payment of
daily wages of 30 workers, which is in unorganised form (Raw Data) as shown in Table 1.
TABLE 1
60 102 61 101 92 80
87 72 86 73 96 101
92 56 90 58 85 74
83 63 84 62 92 100
56 84 90 86 67 72
The above raw data can be arranged either in serial order (Table 2) or ascending order (Table 3) or
descending order (Table 4) as given below :
TABLE 2
(Wages in Rupees, serial order)
S.No.  Wage    S.No.  Wage    S.No.  Wage
1      60      11     61      21     92
2      87      12     86      22     96
3      92      13     90      23     85
4      83      14     84      24     92
5      56      15     90      25     67
6      102     16     101     26     80
7      72      17     73      27     101
8      56      18     58      28     74
9      63      19     62      29     100
10     84      20     86      30     72
TABLE 3
(Wages in Rupees, ascending order)
56   62   73   84   90   96
56   63   74   85   90   100
58   67   80   86   92   101
60   72   83   86   92   101
61   72   84   87   92   102
TABLE 4
(Wages in Rupees, descending order)
102  92   87   84   72   61
101  92   86   83   72   60
101  92   86   80   67   58
100  90   85   74   63   56
96   90   84   73   62   56
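The arrangement of raw data into an array can be checked mechanically. The following is a minimal Python sketch, assuming the 30 wage values of Table 1 are typed in as a list; the variable names are illustrative and not part of the text.

```python
# Daily wages (in rupees) of 30 workers, copied row by row from Table 1 (raw data).
raw_wages = [
    60, 102, 61, 101, 92, 80,
    87, 72, 86, 73, 96, 101,
    92, 56, 90, 58, 85, 74,
    83, 63, 84, 62, 92, 100,
    56, 84, 90, 86, 67, 72,
]

# An array is the same data put in ascending or descending order of magnitude.
ascending_array = sorted(raw_wages)
descending_array = sorted(raw_wages, reverse=True)

print("Ascending :", ascending_array)
print("Descending:", descending_array)
```

Read column by column, the ascending and descending tables above list the values in the same order as these two sorted lists.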
FREQUENCY DISTRIBUTION
Before discussing anything about frequency distribution it is advisable to know the following important
terms of frequency distribution under which the two types of distributions are grouped. The two types
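are :
Examine the following two sets of illustrations to clearly understand the basic terminology of frequency
distribution.
Set I (Children in Families)
Number of children   Number of families
0                    25
1                    45
2                    37
3                    15
4                    8
Total                130
Set II (Height of Students)
Height (inches)      Number of students
56-58                12
58-60                16
60-62                15
62-64                4
64-66                10
Total                57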
Series is a systematic arrangement of items into a particular order or sequence in the different
classified categories, as Set I for Children in Families and Set II for Height of
Students.
Frequency : The number of times a given value appears in the observations is its frequency. For example, in
the above sets there are 25 families having no child and 8 families having four children; and 10 students
in the group of 64" to 66" and 16 students in the group of 58" to 60" etc. So the frequency of families having
no child is 25, frequency of families having 4 children is 8; frequency of students in the group of 64" to
66" is 10, and frequency in the group of 58" to 60" is 16.
Class frequency : The number of values in each of the quantitative classes is called the class frequency,
e.g., out of the five classes of Set II, students in the group of 58" to 60" are 16 and students in the group of
62" to 64" are 4, so the class frequency of the class 58" to 60" is 16 and of 62" to 64" is 4. There is no
instance of a class in Set I.
Total frequency : The sum (total) of the frequencies is known as the total frequency, e.g., the totals 130
and 57 in our Set I and Set II.
Frequency distribution : The distribution of observations over the several values is called frequency
distribution. For example, Set I is the frequency distribution of children in families, and Set II is the
frequency distribution of heights of students.
Class : It is a decided group of magnitudes, e.g., 56"-58", 100-200, 10-19, 4-8, 7-13 etc.
Upper and lower limits of the classes : The lowest and the highest magnitudes, which form the
boundaries of a class, are known as the lower and upper limits, respectively. For example, for a class of
62-64, 62 is the lower limit and 64 is the upper limit. Thus in the first column of Set II, left hand side magnitudes
56, 58, 60, 62, and 64 are the lower limits and right hand side magnitudes 58, 60, 62, 64 and 66 are the
upper limits of their respective classes.
Class interval : The magnitude spread between the lower and upper class limits is called class interval. It
is the span or width of a class which can be obtained by finding the difference between the upper and
lower limits of the class. For example, for class 64"-66", the class interval is upper limit ($l_2$) - lower limit
($l_1$), i.e., $l_2 - l_1 = 66 - 64 = 2$. The class interval in this case is 2, where $l_1$ is the lower limit and $l_2$ is the upper limit.
Mid-point : The mid-value which lies half way between the lower and upper class limits is known as
mid-point. Thus, in a class of 62"-64" the mid-point is (upper limit + lower limit)/2, or
$\frac{l_2 + l_1}{2} = \frac{64 + 62}{2} = 63$
Calculated mid-points are the most important values, as being the representatives of the classes, and
are taken for use in further statistical calculations.
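The two calculations above can be written as a short Python sketch. The function names class_interval and mid_point are illustrative assumptions, and the classes are the height groups of Set II.

```python
def class_interval(lower_limit, upper_limit):
    """Width of a class: upper limit minus lower limit."""
    return upper_limit - lower_limit


def mid_point(lower_limit, upper_limit):
    """Value half way between the lower and upper limits of a class."""
    return (lower_limit + upper_limit) / 2


# Height classes (in inches) of Set II, written as (lower limit, upper limit).
height_classes = [(56, 58), (58, 60), (60, 62), (62, 64), (64, 66)]

for lower, upper in height_classes:
    print(f"{lower}-{upper}: interval = {class_interval(lower, upper)}, "
          f"mid-point = {mid_point(lower, upper)}")
```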
Variable : A quantity which varies from one individual to another is known as a variable or variate.
Quantitative characteristics such as income, height, weight, number of units sold etc., are variables. A
variable may be either discrete or continuous.
Discrete and Continuous Variables
Discrete or discontinuous variables are those which are exact or finite and are not normally fractions.
They cannot manifest every conceivable fractional value, but appear by limited gradations. For example,
children in a family can be either 2 or 3, but cannot be 2.2, 2.8 or 2.7. It is a discrete variable which is
not expressed in a fraction. In the same way test scores of a cricket match, rooms in a house, workers in
a factory, fans installed in an auditorium, students in a class are all examples of discrete variables.
The observations will be integers, i.e., 1, 2, 3, 4, 5, 6, ... and so on. Thus the variable is
said to be of a discrete type when there are gaps between one value and the next. For example, in Set I,
0, 1, 2, 3, 4 are values of a discrete variable.
Even fractional values are discrete or discontinuous variables provided there is a uniform difference
from one value to the next. For example, if wage rate per unit is 50 paise then workers of a
factory may get wages in rupees as : 0.50, 1, 1.50, 2, 2.50, 3, 3.50, and so on.
Continuous variables are those that are in units of measurement which can be broken down into infinite
gradations, e.g., weights, heights, incomes, rainfall etc. They are capable of manifesting every
conceivable fractional value (i.e., in decimals) within the range of possibilities. They fall in any numerical
value within a certain range. For instance, covering a distance on a road by car, say from 0 kilometre to 5
kilometres, one never jumps from 0 to 1 km, 1 to 2 km, or 2 to 3 km, but every fraction of distance from
0 km to 5 km is touched. In other words, the car must pass through all the infinitely small gradations of
distance between 0 km and 5 km. All the fractional values are continuous variables. Heights of students
from 56" to 58" in our Set II, for example, cover all the fractional values falling within the limits of 56 and
58.
Discrete series : Any series represented by discrete variables is called a discrete series, e.g., Set I of the
distribution of children in families is a discrete series.
Continuous series : Any series described by continuous variables is called continuous series, e.g., Set II of
the distribution of heights of students is a continuous series.
It is to be noted that a discrete variable series can be presented in a continuous type of series also, but
continuous variables cannot be presented in a discrete series. Whenever the range of values in a
discrete series is too wide, one can have the choice of a continuous frequency distribution.
Considering discrete and continuous series, now individual observations can be constructed and
condensed in two ways :
1. Prepare a table with three columns: first for the variable under study, second for 'Tally bars' and
the third for the total, representing the frequency corresponding to each value or size of the variable.
2. Place all the values of the variable in the first column in ascending order, beginning with the
lowest and going to the highest. The gap between one magnitude and another may preferably be the
same.
3. Put bars (vertical lines) in front of the values accordingly in the second column, keeping in view
the number of times a particular value repeats itself. This column is for facility in counting. Blocks of five
bars are prepared and some space is left between each block of bars.
4. Count the number of bars in respect of each value of the variable and place it in the third
column made for total or frequency.
Solution.
Variable   Tally bars     Total
15         ||||| ||       7
16         ||||| ||||     9
17         |||||          5
18         |||            3
19         |              1
Total                     25
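The four steps for preparing a frequency array can be sketched in Python as follows. The 25 observations below are hypothetical, made up only so that their counts agree with the solution table; the original observations are not reproduced here. The helper name tally_marks is likewise an illustrative assumption.

```python
from collections import Counter

# Hypothetical observations, chosen only so that the counts match the table above.
observations = [
    16, 15, 17, 16, 18, 15, 16, 19, 17, 15,
    16, 16, 15, 17, 18, 16, 15, 16, 17, 15,
    16, 18, 15, 17, 16,
]


def tally_marks(count):
    """Return tally bars in blocks of five, with a space between blocks."""
    blocks, remainder = divmod(count, 5)
    parts = ["|||||"] * blocks
    if remainder:
        parts.append("|" * remainder)
    return " ".join(parts)


# Count how many times each value occurs (steps 3 and 4).
frequency = Counter(observations)

# List the values in ascending order (step 2) with their tally bars and totals.
for value in sorted(frequency):
    print(f"{value:<10}{tally_marks(frequency[value]):<15}{frequency[value]}")
print(f"{'Total':<25}{sum(frequency.values())}")
```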
Sometimes lower limits are excluded from their respective classes. For example, if the students'
obtained marks are grouped as 5-10, 10-15, 15-20, 20-25, 25-30 etc., we include in the first group the
students whose marks are above 5 and up to 10. If the marks of a student are 10, he is included in the
first group. But if a student gets 5 marks, we will have to prepare a group 0-5 to include him.
There are various methods by which class intervals can be designated. They are:
(a) By Inclusive method :
Marks : 5-9, etc.
or Prices (in Rs) : 5-9.99, etc.
(b) By Exclusive method :
(i) Lower limit excluded :
Marks : 5-10, 10-15, 15-20, 20-25, 25-30
Lower limits 5, 10, 15, 20, 25, 30 of their respective groups are excluded.
(ii) Upper limit excluded :
Upper limits 10, 15, 20, 25, 30 of their respective groups are excluded. However, if the class intervals are
given as 5-10, 10-15, 15-20, 20-25 etc., it is always presumed that upper limits are excluded in absence
of any specific instructions.
(c) By mentioning lower limits (followed by a dash) : Marks : 5-, 10-, 15-, 20-, 25-
(d) By mentioning upper limits (preceded by a dash) : Marks : -10, -15, -20, -25, -30
These mid-points are required to be converted into class intervals. Say, for the first mid-point, take the
difference between the second and first mid-points (12.5 - 7.5 = 5) and divide the difference by 2, i.e.,
5/2 = 2.5. When this quotient is subtracted from and added to the first mid-point we get
(7.5 - 2.5 = 5) and (7.5 + 2.5 = 10). We thus get the class interval 5-10. In the same way intervals of all the
mid-points can be obtained, i.e., 10-15, 15-20, 20-25, 25-30.
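The conversion of mid-points into class intervals can be sketched in Python as below. This is a minimal sketch that assumes the mid-points are equally spaced, as they are here; the mid-point list 7.5 to 27.5 is inferred from the intervals 5-10 to 25-30 given above, and the function name is illustrative.

```python
def intervals_from_mid_points(mid_points):
    """Convert equally spaced mid-points into (lower, upper) class limits.

    Half of the difference between two neighbouring mid-points is subtracted
    from and added to each mid-point.
    """
    half_width = (mid_points[1] - mid_points[0]) / 2
    return [(m - half_width, m + half_width) for m in mid_points]


mid_points = [7.5, 12.5, 17.5, 22.5, 27.5]

for lower, upper in intervals_from_mid_points(mid_points):
    print(f"{lower:g}-{upper:g}")
```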
In certain frequency distributions 'open-end' class intervals are given as we find in the example given
below :
Marks           Frequency (f)
Below 10        7
10-15           10
15-20           13
20-25           18
25-30           8
30-35           5
35 and above    3
Total           64
In such cases, values are put on the basis of construction of series. In the above series '5' in place of
'below' and '40' in place of 'above' may be put, thus making the classes as :
Marks : 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40
Organisation of Data 63
Illustration 2. In a city 45 families were surveyed for the number of domestic appliances they used.
Prepare a frequency array based on their replies as recorded below.
Solution.
Frequency Array of Domestic Appliances Used by 45 Families

Number of Appliances    Tally bars              Frequency (f)
0                       |                        1
1                       ||||| ||                 7
2                       ||||| ||||| |||||       15
3                       ||||| ||||| ||          12
4                       |||||                    5
5                       ||                       2
6                       ||                       2
7                       |                        1
Total                                           45
Thus, from the above table it is clear that out of 45 families 1 is not using any domestic appliance, 7
using 1 appliance, 15 using 2 appliances, 12 using 3 appliances, 5 using 4 appliances, 2 using 5, 2 using 6
appliances and only 1 family using 7 domestic appliances.
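The counting behind a frequency array can be sketched in a few lines of Python. This is only an illustration: the list `replies` is rebuilt from the frequencies in the table above (the 45 original replies are not reproduced here), and the function name `frequency_array` is a hypothetical helper, not something from the text.

```python
from collections import Counter

# Assumption: the replies are rebuilt from the tabulated frequencies,
# because the original list of 45 replies is not available.
table = {0: 1, 1: 7, 2: 15, 3: 12, 4: 5, 5: 2, 6: 2, 7: 1}
replies = [value for value, count in table.items() for _ in range(count)]

def frequency_array(observations):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(observations)
    return sorted(counts.items())

array = frequency_array(replies)
for value, freq in array:
    # One bar per observation stands in for the tally column.
    print(f"{value:>2}  {'|' * freq:<15}  {freq:>2}")
print("Total", sum(freq for _, freq in array))
```

Each distinct value of the discrete variable gets one row, and the frequencies add up to the number of families surveyed.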
Observations are divided into groups having class intervals. There are two methods of classifying the data according to class intervals.
(a) Inclusive method : Under this method the upper class limits are included in their respective classes. For example, if the marks obtained by students are grouped as 5-9, 10-14, 15-19, 20-24, 25-29 etc., the first group 5-9 includes the students whose marks are between 5 and 9, both inclusive. If the marks of a student are 10, he is included in the next class, i.e., 10 to 14. If the values are not whole numbers, the classes can be made 5-9.9, 10-14.9, 15-19.9 and so on.
(b) Exclusive method : Under this method the upper limits are excluded. The upper limit of a class interval is the lower limit of the next class. For example, if the marks obtained by the students are grouped as 5-10, 10-15, 15-20, 20-25, 25-30 etc., we include in the first group the students whose marks are 5 or more but under 10. If the marks of a student are 10, he is not included in the first group but in the second, i.e., 10 to 15.
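The difference between the two methods shows up only at the class limits. The sketch below assumes whole-number marks, a first lower limit of 5 and a width of 5, as in the examples above; the function names are hypothetical.

```python
def inclusive_class(value, lower=5, width=5):
    """Inclusive classes 5-9, 10-14, ...: both limits belong to the class."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width - 1)

def exclusive_class(value, lower=5, width=5):
    """Exclusive classes 5-10, 10-15, ...: the upper limit is excluded."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width)

for marks in (9, 10):
    print(marks, inclusive_class(marks), exclusive_class(marks))
```

A student with 10 marks falls in 10-14 under the inclusive method and in 10-15 under the exclusive method; in neither case does he fall in the first class.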
There are various methods by which class intervals can be designated. They are :
(a) By Inclusive method :
    Marks : 5-9, 10-14, 15-19, 20-24, 25-29
    or Prices (in Rs) : 5-9.99, 10-14.99, 15-19.99, 20-24.99, 25-29.99
(b) By Exclusive method :
    (i) Lower limit excluded :
    Marks : 5-10, 10-15, 15-20, 20-25, 25-30
    Lower limits 5, 10, 15, 20, 25 of their respective groups are excluded.
    (ii) Upper limit excluded :
    Marks : 5-10, 10-15, 15-20, 20-25, 25-30
    Upper limits 10, 15, 20, 25, 30 of their respective groups are excluded. However, if the class intervals are given as 5-10, 10-15, 15-20, 20-25 etc., it is always presumed that upper limits are excluded in the absence of any specific instructions.
(c) By mentioning lower limits (followed by a dash) : Marks : 5-, 10-, 15-, 20-, 25-
(d) By mentioning upper limits (preceded by a dash) : Marks : -10, -15, -20, -25, -30
(e) By mentioning mid-points : Marks : 7.5, 12.5, 17.5, 22.5, 27.5
These mid-points are required to be converted into class intervals. Take the difference between the first two mid-points (12.5 - 7.5 = 5) and divide it by 2, i.e., 5/2 = 2.5. The quotient is subtracted from and added to the first mid-point, giving 7.5 - 2.5 = 5 and 7.5 + 2.5 = 10. We thus get the class interval 5-10. In the same way the intervals for all the mid-points can be obtained, i.e., 10-15, 15-20, 20-25, 25-30.
In certain frequency distributions 'open-end' class intervals are given, as in the example below :

Marks           Below 10   10-15   15-20   20-25   25-30   30-35   35 and above   Total
Frequency (f)       7        10      13      18       8       5          3          64

In such cases the missing limits are supplied on the basis of the construction of the series. In the above series '5' may be put in place of 'below' and '40' in place of 'above', thus making the classes :
Marks : 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40
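The conversion of mid-points into class intervals described above can be sketched as follows. The sketch assumes equally spaced mid-points, and the function name `intervals_from_midpoints` is hypothetical.

```python
def intervals_from_midpoints(midpoints):
    """Convert equally spaced mid-points into (lower, upper) class limits."""
    # Half the gap between two neighbouring mid-points, e.g. (12.5 - 7.5) / 2.
    half_width = (midpoints[1] - midpoints[0]) / 2
    return [(m - half_width, m + half_width) for m in midpoints]

intervals = intervals_from_midpoints([7.5, 12.5, 17.5, 22.5, 27.5])
for lower, upper in intervals:
    print(f"{lower:g}-{upper:g}")
```

If the mid-points were not equally spaced, a single half-width would not apply and each class would need its own adjustment.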
Principles of Grouping
There is no hard and fast rule for grouping the data, but the following general principles may be kept in mind for satisfactory and meaningful classification of data :
(a) It is advisable to have the total number of classes between 5 and 15. The preference for the total number of classes depends on the numbers and figures to be grouped, the magnitude of the figures and the possibility of simplified calculations in further statistical studies.
(b) Odd figures, for example 3, 7, 9, 11, 27, 33 etc., should be avoided for class intervals. The choice for the class interval should be either 5 or a multiple of 5. It simplifies our further statistical calculations.
(c) The lower limit of the class, as far as possible, should be 0 or a multiple of 5.
(d) For maintaining continuity and correct classes, the exclusive method of preparing classes is adopted.
The first and the last classes are open-end classes; the first is open at the lower end and the last at the upper end. For statistical calculations the open ends should be closed. Maintaining the regularity of the class intervals we can close these groups as 0-5 and 20-25.
(g) For a frequency distribution, we prepare a table having three columns: the first for the variable, the second for 'Tally bars' and the third for the total, representing the corresponding frequency of each class.
Simple Series and Cumulative Series : We have seen in the above illustrations the patterns of simple series of discrete type and continuous type (using inclusive and exclusive methods of class intervals). In simple series the frequency is shown against each value or class; in cumulative series the frequencies are progressively totalled. See the following illustration :

Simple Series
Discrete Series                    Continuous Series
Marks   No. of Students (f)        Marks    No. of Students (f)
10       4                         0-10      4
20       8                         10-20     8
30      15                         20-30    15
40      20                         30-40    20
50      13                         40-50    13

Cumulative Series
'Less than'                            'More than'
Marks          No. of Students (f)     Marks          No. of Students (f)
Less than 10    4                      More than 0    60
Less than 20   12 (4 + 8)              More than 10   56 (60 - 4)
Less than 30   27 (12 + 15)            More than 20   48 (60 - 12)
Less than 40   47 (27 + 20)            More than 30   33 (60 - 27)
Less than 50   60 (47 + 13)            More than 40   13 (60 - 47)

Now, we can read that the students getting less than 10 marks are 4, less than 20 marks are 12, less than 30 marks are 27 and so on.
In the same way, the students getting more than 0 marks are 60, more than 10 marks are 56, more than 20 marks are 48 and so on.
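The two cumulative series can be derived mechanically from the simple series. The following is a minimal sketch using the frequencies 4, 8, 15, 20, 13 of the continuous series above; the variable names are illustrative only.

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freq = [4, 8, 15, 20, 13]

# 'Less than' series: running totals of the frequencies.
less_than = list(accumulate(freq))
total = less_than[-1]
# 'More than' series: the total minus everything below the lower limit.
more_than = [total - below for below in [0] + less_than[:-1]]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"Less than {upper}: {lt:>2}    More than {lower}: {mt:>2}")
```

The last 'less than' figure and the first 'more than' figure both equal the total number of students, which is a quick check on the arithmetic.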
Illustration 3. From the following table of monthly household expenditure (in Rs) on food of 50 households :
(a) Obtain the range of monthly household expenditure on food.
(b) Divide the range into appropriate number of class intervals and obtain the frequency distribution of expenditure.
(c) Find the number of households whose monthly expenditure on food is (i) less than Rs 2000 (ii) more than Rs 3000
Solution.
(a) The range is found from the highest (L) and the lowest (S) expenditure on food of the 50 households by the following formula :
Range = L - S
Range = 5090 - 1007 = Rs 4083
(b) Dividing the range by a class interval of Rs 500, we get
$\dfrac{4083}{500} = 8.166$
Now, we decide on 9 classes to include all the given values, preparing a continuous frequency distribution by the exclusive method (excluding the upper limit).

Class interval (Rs)    Tally bars                  Frequency (f)
1000-1500              ||||| ||||| ||||| |||||     20
1500-2000              ||||| ||||| |||             13
2000-2500              ||||| |                      6
2500-3000              |||||                        5
3000-3500              ||                           2
3500-4000              |                            1
4000-4500              ||                           2
4500-5000                                           0
5000-5500              |                            1
Total                                              50

(c) (i) Number of households whose monthly expenditure is less than Rs 2000 (i.e., 1000 - 2000)
= 20 + 13 = 33 households
(ii) Number of households whose monthly expenditure is more than Rs 3000 (i.e., 3000 - 5500)
= 2 + 1 + 2 + 0 + 1 = 6 households
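The arithmetic of parts (a) and (b) can be sketched as below. Only the two extreme values (5090 and 1007) and the class width (500) come from the solution; rounding the first lower limit down to a multiple of the width is an assumption that happens to reproduce the classes used above.

```python
largest, smallest, width = 5090, 1007, 500

value_range = largest - smallest          # Range = L - S
quotient = value_range / width            # 8.166, so 9 classes are needed

# Assumption: start at the multiple of the width just below the smallest
# value and stop at the multiple just above the largest value.
start = (smallest // width) * width
end = (largest // width + 1) * width
classes = [(lower, lower + width) for lower in range(start, end, width)]

print(value_range, round(quotient, 3), len(classes))
print(classes[0], classes[-1])
```

Building the classes from the rounded limits, rather than from the quotient alone, ensures that both the smallest and the largest observation fall inside some class.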
Illustration 4. Prepare a frequency distribution by the inclusive method from the following data :
31 23 19 29 22 20 16 10 13 34
38 33 28 21 15 18 36 24 18 15
12 30 27 23 20 17 14 32 26 25
18 29 24 19 16 11 22 15 17 10
Solution.
Frequency Distribution

Class interval    Tally bars     Frequency (f)
10-13             |||||          5
14-17             ||||| |||      8
18-21             ||||| |||      8
22-25             ||||| ||       7
26-29             |||||          5
30-33             ||||           4
34-37             ||             2
38-41             |              1
Total                           40
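The grouping above can be checked with a short script. This is a sketch only: `group_inclusive` is a hypothetical helper, and it assumes whole-number observations with the first class starting at 10 and each class covering 4 values (10-13, 14-17 and so on).

```python
marks = [
    31, 23, 19, 29, 22, 20, 16, 10, 13, 34,
    38, 33, 28, 21, 15, 18, 36, 24, 18, 15,
    12, 30, 27, 23, 20, 17, 14, 32, 26, 25,
    18, 29, 24, 19, 16, 11, 22, 15, 17, 10,
]

def group_inclusive(values, first_lower, size):
    """Count values in inclusive classes of `size` whole numbers each."""
    counts = {}
    for v in values:
        lower = first_lower + ((v - first_lower) // size) * size
        key = (lower, lower + size - 1)
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))

distribution = group_inclusive(marks, first_lower=10, size=4)
for (lower, upper), f in distribution.items():
    print(f"{lower}-{upper}  {f}")
print("Total", sum(distribution.values()))
```

A class with no observations would simply be absent from the result, so empty classes have to be added by hand if they are to appear in the table.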
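Class Boundaries
In the above illustration 10-13, 14-17, 18-21, 22-25, 26-29 and so on are the class limits of the inclusive method of construction of a continuous frequency distribution. We find a gap or discontinuity between the upper limit of a class and the lower limit of the next class. For example, there is a gap of 1 between the upper limit 13 of the first class and the lower limit 14 of the second class.
Steps
1. Find the difference between the lower limit of the second class and the upper limit of the first class : 14 - 13 = 1
2. Divide the difference by two : $\dfrac{1}{2} = 0.5$
3. Subtract the value obtained from the lower limits of all the classes (- 0.5)
4. Add the value obtained to the upper limits of all the classes (+ 0.5)
The mid-point or mid-value (m.v.) of a class is
$\text{m.v.} = \dfrac{l_1 + l_2}{2}$
where $l_1$ is the lower limit and $l_2$ the upper limit of the class.

Class boundaries    Frequency (f)    Mid-value (m.v.)
9.5-13.5            5                11.5
13.5-17.5           8                15.5
17.5-21.5           8                19.5
21.5-25.5           7                23.5
25.5-29.5           5                27.5
29.5-33.5           4                31.5
33.5-37.5           2                35.5
37.5-41.5           1                39.5
Total              40

Illustration 5. Prepare a frequency distribution by inclusive method taking class interval of 7 from the following data :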
28 17 15 22 29 21 23 27 18 12 7 2
9 4 6 1 8 3 10 5 20 16 12 8
4 33 27 21 15 9 3 36 27 18 9 2
4 6 32 31 29 18 14 13 15 11 9 7
1 5 37 32 28 26 24 20 19 25 19 20
Solution.
Frequency Distribution (Inclusive Method)

Class interval    Tally bars              Frequency (f)
0-7               ||||| ||||| |||||       15
8-15              ||||| ||||| |||||       15
16-23             ||||| ||||| ||||        14
24-31             ||||| ||||| |           11
32-39             |||||                    5
Total                                     60
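The steps given earlier under Class Boundaries can be sketched in code, here applied to the inclusive classes 10-13, 14-17, ..., 38-41 of Illustration 4. The variable names are illustrative, and the sketch assumes the same gap between every pair of neighbouring classes.

```python
limits = [(10, 13), (14, 17), (18, 21), (22, 25),
          (26, 29), (30, 33), (34, 37), (38, 41)]

# Steps 1 and 2: gap between neighbouring classes, then half of it.
gap = limits[1][0] - limits[0][1]      # 14 - 13 = 1
adjustment = gap / 2                   # 0.5

# Steps 3 and 4: lower limits move down, upper limits move up.
boundaries = [(lower - adjustment, upper + adjustment) for lower, upper in limits]
# Mid-value: half the sum of the two limits of each class.
mid_values = [(lower + upper) / 2 for lower, upper in limits]

for (lb, ub), mv in zip(boundaries, mid_values):
    print(f"{lb}-{ub}  {mv}")
```

The mid-value is the same whether it is computed from the class limits or from the class boundaries, because the adjustment is subtracted at one end and added at the other.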
Relative Frequency Distribution
It is sometimes required to show the relative frequency of occurrences rather than the actual number of occurrences in each class of a frequency distribution. If the actual frequencies are expressed as per cent of the total number of observations, relative frequencies are obtained.
Illustration 6. In a hypothetical sample of 20 individuals the amounts of money with them were found to be :

Individual   Money (Rs)      Individual   Money (Rs)
 1           ..              11           131
 2           ..              12           136
 3           ..              13           143
 4           ..              14           156
 5           ..              15           169
 6           109             16           ..
 7           117             17           ..
 8           119             18           ..
 9           121             19           ..
10           126             20           ..

[...] frequencies.
Solution.

Money (Rs)    Tally bars    Frequency (f)    Relative frequency (%)
75-100        ||             2                10
100-125       ||||| ||       7                35
125-150       ||||           4                20
150-175       ||             2                10
175-200       ||             2                10
200-225       ||             2                10
225-250       |              1                 5
Total                       20               100

Money (Rs)    Tally bars         Frequency (f)    Relative frequency (%)
50-100        |                   1                 5
100-150       ||||| ||||| ||     12                60
150-200       ||||                4                20
200-250       |||                 3                15
Total                            20               100
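Relative frequencies follow directly from the definition: each class frequency is expressed as a percentage of the total. A minimal sketch, using the frequencies of the first solution table of Illustration 6 (the dictionary layout is an illustrative choice):

```python
frequency = {
    "75-100": 2, "100-125": 7, "125-150": 4, "150-175": 2,
    "175-200": 2, "200-225": 2, "225-250": 1,
}

total = sum(frequency.values())
# Relative frequency (%) = class frequency / total number of observations * 100
relative = {cls: 100 * f / total for cls, f in frequency.items()}

for cls, f in frequency.items():
    print(f"{cls:<8} {f:>2} {relative[cls]:>6.1f}")
print(f"{'Total':<8} {total:>2} {sum(relative.values()):>6.1f}")
```

The relative frequencies always add up to 100 per cent, whatever the number of observations.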
Data are sometimes given in unequal class intervals. Such series are used when there is great fluctuation in data. For example :

0-5      X        1    X        2-4    X
5-10     Y        5    Y        2-6    X + Y
10-20    Z        7    Z        2-8    X + Y + Z

Loss of Information
Raw data is grouped by making equal or unequal class frequency distribution, say 0-5, 5-10, 10-15 or 0-5, 5-7, 7-12, 12-20 and so on. By making such classes there is loss of information of individual observations. Further, the statistical analysis is based on the mid-points of these classes without giving any importance to individual observations. As such, the significance of individual observations is lost.
Bivariate Frequency Distribution
We have so far studied frequency distributions involving a single variable only. Such frequency distributions are called univariate frequency distributions. Often we come across data composed of measurements made on two variables for each individual item. For example, we may study the weights and heights of a group of individuals, the marks obtained by a group of students in two different subjects, ages of husbands and wives for a group of couples, etc. A frequency table where two variables have been measured in the same set of items through cross classification is known as a 'bivariate frequency distribution' or 'two-way frequency distribution'. Various values of each variable are grouped into various classes (not necessarily the same for each variable).
Illustration 7. Following figures give the ages of 20 newly married couples in years. Represent the data by a frequency distribution.

Age of husband :  24 26 27 25 28 24 27 28 25 26
Age of wife :     17 18 19 17 20 18 18 19 18 19
Age of husband :  25 26 27 25 27 26 25 26 26 26
Age of wife :     17 18 19 19 20 19 17 20 17 18

Solution. We are given two variables : (i) age of husbands, and (ii) age of wives. We should represent the data in the form of a two-way frequency distribution so that we are able to show the ages of husbands and wives simultaneously. This is also called bivariate frequency distribution.

Bivariate Frequency Distribution
Age of husband        Age of wife (years)
(years)               17     18     19     20     Total (f)
24                     1      1      -      -      2
25                     3      1      1      -      5
26                     1      3      2      1      7
27                     -      1      2      1      4
28                     -      -      1      1      2
Total (f)              5      6      6      3     20
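The cross classification of Illustration 7 amounts to counting pairs. The sketch below pairs each husband's age with the corresponding wife's age, in the order the figures are listed; the variable names are illustrative.

```python
from collections import Counter

husband = [24, 26, 27, 25, 28, 24, 27, 28, 25, 26,
           25, 26, 27, 25, 27, 26, 25, 26, 26, 26]
wife = [17, 18, 19, 17, 20, 18, 18, 19, 18, 19,
        17, 18, 19, 19, 20, 19, 17, 20, 17, 18]

# Each couple contributes one (husband's age, wife's age) pair.
pairs = Counter(zip(husband, wife))
rows = sorted(set(husband))
cols = sorted(set(wife))
table = [[pairs[(h, w)] for w in cols] for h in rows]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

print("      " + "  ".join(f"{w:>2}" for w in cols) + "  Total")
for h, row, rt in zip(rows, table, row_totals):
    print(f"{h:>4}  " + "  ".join(f"{c:>2}" for c in row) + f"  {rt:>5}")
print("Total " + "  ".join(f"{c:>2}" for c in col_totals) + f"  {sum(row_totals):>5}")
```

The row totals give the univariate distribution of husbands' ages and the column totals that of wives' ages; both add up to the 20 couples.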
Illustration 8. The data given below relate to the heights and weights of 20 persons. Construct a two-way frequency table with class intervals 62"-64", 64"-66" and so on for heights, and 115 to 125 lbs., 125 to 135 lbs. and so on for weights.

S.N.   Weight   Height      S.N.   Weight   Height
 1     170      70          11     163      70
 2     135      65          12     139      67
 3     136      65          13     122      63
 4     137      64          14     134      68
 5     148      69          15     140      67
 6     124      63          16     132      69
 7     117      65          17     120      66
 8     128      70          18     148      68
 9     143      71          19     129      67
10     129      62          20     152      67

Solution.

Weight          Height (inches)
(lbs.)          62-64    64-66    66-68    68-70    70-72    Total (f)
115-125          2        1        1        -        -        4
125-135          1        -        1        2        1        5
135-145          -        3        2        -        1        6
145-155          -        -        1        2        -        3
155-165          -        -        -        -        1        1
165-175          -        -        -        -        1        1
Total (f)        3        4        5        4        4       20
EXERCISES
Questions :
1. Distinguish between variable and attribute. Explain with examples.
2. Define classification. Explain the objects and characteristics of classification.
3. What do you understand by classification? Explain the methods of classification of data giving suitable examples.
4. Is there any use in classifying things? Explain with illustrations.
5. Explain discrete and continuous variables with examples.
6. Define series and explain the different types of series.
7. Define Frequency Distribution. State the principles required to be observed in its formation.
8. Explain with illustration the 'inclusive' and 'exclusive' methods used in classification of data.
12. Do you agree that classified data is better than raw data?
13. What is a relative frequency distribution? Illustrate.
Write short notes on the following :
(c) Exclusive and inclusive class-intervals.
(d) Discrete and continuous series.
(e) Simple and cumulative frequency.
(f) Equal and unequal class frequency.
Problems :
1. Prepare a statistical table from the following data taking the class width as 7 by inclusive method :
28 17 15 22 29 21 23 27 18 12
7 2 9 4 6 1 8 3 10 5
20 16 12 8 4 33 27 21 15 9
3 36 27 18 9 2 4 6 32 31
29 18 14 13 15 11 9 7 1 5
37 32 28 26 24
2. [...]
50 57 58 51 53 62 64 60 61
51 64 55 55 52 60 65 58 60
52 63 56 56 58 64 63 62 60
54 62 54 54 60 65 60 62 59
56 63 52 53 62 53 61 61 59
3. [...]
69 33 91 53 63 69
70 36 80 78 52 51
73 73 92 64 55 49
74 57 95 70 64 57
75 80 42 85 43 29
77 65 73 95 76 53
86 73 40 83 43 76
84 72 75 57 58 59
62 65 67 87 81 84
61 75 85 81 58 81
47 69 78 62 72 43 87 61 84 23
4. Change the following into continuous series and convert the series into 'less than' and 'more than' cumulative series :
5 15 25 35 45 55
8 12 15 9 4 2
5. Marks obtained by 24 students in English and Statistics in a class are given below :

S.N.   English   Statistics      S.N.   English   Statistics
 1     22        16              13     23        16
 2     23        16              14     25        17
 3     23        18              15     23        17
 4     23        16              16     22        17
 5     23        16              17     27        15
 6     24        17              18     27        16
 7     23        16              19     26        18
 8     25        19              20     28        19
 9     22        16              21     25        19
10     23        18              22     24        16
11     24        18              23     23        17
12     24        17              24     25        19
6. In a survey it was found that 64 families bought milk in the following quantities in a particular month.
Quantity of milk (in litres) bought by 64 families in a month :
.O 99 9 22 12 39 19 14 23 6 24 16 18 7
Convert the above data into a frequency distribution making classes of 5-9, 10-14 and so on.
7. The marks obtained by 20 students in Statistics and Economics are given below.
10 11 10 11 11 14 12 12 13 10
20 13 24
21 12 23
22 11 22
21 12 23
23 10 22
23 14 22
22 14 24
21 12 20
24 13 24
25 10 23
8. Prepare 'less than' and 'more than' cumulative frequency distributions of the [...]
9. Find out the frequency distribution and 'more than' cumulative frequency table [...] below :
10 - 30 40 50 60
Quantity (kg) : 17 22 [...]
[...] 139, 146, 153, 160, 167, 174, 181 pounds, find (a) size of the class intervals, and (b) the class boundaries.
PRESENTATION OF DATA
Chapter 5
TABULAR PRESENTATION
Data can be presented in the following forms : (i) Text presentation, (ii) Semi-tabular presentation, (iii) Tabular presentation, and (iv) Pictorial presentation.
[...] increased from an extremely low figure of less than 2 lakhs in 1950-51 to over 46 lakhs in 1990-91. There was around a ten-fold increase in this sphere between 1991 and 2004-05 as the number of landline connections increased to 4.42 crore besides 4.5 crore mobile phones. Thus the number of telephones stood at 9.7 crore in March 2005. With the manifold increase in telephone connections, the teledensity (viz., the number of telephone connections per hundred persons) has increased from 3.6 in 2001 to 6.7 in 2005.
Semi-tabular presentation is both through tables and paragraphs. This method is not often used, but is useful when figures are required to be compared along with one or two sentences of explanation.
Tabular presentation is a systematic presentation of numerical data in columns and rows in accordance with some important features or characteristics.
Systematic presentation of data is one of the most important considerations in statistical work and it is done through the use of tables. A statistical table is a systematic arrangement of data in columns and rows. Tabulation is the process of presenting data in tables, and statistical tables are its outcome. In brief, tabulation is a scientific process involving the presentation of classified data in an orderly manner so as to bring out their essential features and chief characteristics.
According to H. Secrist, "Tables are a means of recording in permanent form the analysis that is made through classification and of placing in juxtaposition things that are similar and should be compared".
According to Tuttle, "A statistical table is the logical listing of related quantitative data in vertical columns and horizontal rows of numbers, with sufficient explanatory and qualifying words, phrases and statements in the form of titles, headings and notes to make clear the full meaning of the data and their origin."
Objectives of Tabulation
Statistical data arranged in a tabulated form have the following important objectives :
1. They simplify complex data and the data presented are easily understood.
2. They facilitate comparison due to proper systematic arrangement of statistical data in different columns.
5. They present facts in minimum space; unnecessary repetition and explanation are avoided and required figures can be located more quickly.
6. Tabulated data make summation of various items easy, and errors and omissions can easily be detected.
7. Tabulated data are good for reference and they make it easy to present information in graphs and diagrams.
[...] care should be taken in determining its size, proportion of columns and rows, writing of figures, etc.
2. Manageable size : The size of the table should be neither too big nor too small. Too much detail should not be given in a table. If the table is too large it becomes confusing to the eyes and there is great difficulty in following the lines and columns at a glance. If more details are to be given, then a number of small tables should be preferred to one big table. So, it should be simple and compact.
3. Comparable : The facts should be arranged in a table so as to make comparison between them easy, because comparison is one of the chief objectives of tabulation. Whenever it is necessary, average, percentage, proportion, etc., should be given in the table to facilitate comparison.
[...] easily understandable. It should be complete within itself, containing all the explanations necessary to make clear the meaning of the items. Units of measurement must be clearly stated, such as "price in rupees" or "weight in kilograms". Columns and rows should be numbered when it is desired to facilitate reference to specific parts of a table.
[...] title, proper captions and stubs, source, footnotes etc. Certain figures which are [...] thick lines. A table should have miscellaneous columns for the data which cannot be grouped in the classification made. Large numbers are hard to read and difficult to compare; therefore, they should be approximated, e.g., up to the nearest [...]
1. Table number : [...] (1, 2, 3, 4 etc.) whenever more than one table is prepared [...] Numbers like 1.2 and 2.4 are also used. [...] and the second digit to its order. Table 1.2 would mean the second table in the first chapter or section and Table 2.4 would mean the fourth table in the second chapter or section.
2. Title : There may be a [...] or a catch title written [...] be brief, clear and self explanatory. The lettering of the title should be the most prominent [...]
[Figure: structure of a table, showing the table number, title, footnote and source.]

TABLE 1
Literacy Rates in India
(Per cent)
         Males                      Females                    Persons
Year     Rural    Urban    Total    Rural    Urban    Total    Rural    Urban    Total
1951     19.02    ..       27.16    ..       ..       ..       12.10    ..       18.33
1961     34.30    ..       40.40    ..       ..       ..       22.50    ..       28.30
1971     48.60    ..       45.96    15.50    48.80    ..       27.90    ..       34.45
1981     49.60    ..       56.38    21.70    56.30    ..       36.00    ..       43.57
1991     57.90    ..       64.13    30.60    64.00    ..       44.70    ..       52.21
2001     71.40    ..       75.85    46.70    73.20    ..       59.40    ..       65.38
Source : Economic
[...] against female education and in our conservative society, girls still get discriminated against in matters like health, nutrition, education, etc.
(iii) Literacy rate in urban areas was higher, at 80 per cent in 2001, than in rural areas where it is less than 60 per cent. This clearly speaks of inadequate facilities of education available in the rural areas as well as comparatively lower willingness of the conservative rural folk to go to schools for education.
Illustration 1. In a sample study about coffee drinking habits in two towns, the [...] were 55%, males non-coffee drinkers were 30% and females coffee drinkers were 15%.
Solution. Let us calculate the missing percentages of the above information before [...]

TABLE 2
(in percentages)
                        Town A                       Town B
                        Males   Females   Total      Males   Females   Total
Coffee Drinkers         40       5         45        25      15         40
Non-Coffee Drinkers     20      35         55        30      30         60
Total                   60      40        100        55      45        100

Alternative Solution

TABLE 3
(in percentages)
            Town A                                      Town B
            Coffee     Non-Coffee                       Coffee     Non-Coffee
            Drinkers   Drinkers      Total              Drinkers   Drinkers      Total
Males       40         20             60                25         30             55
Females      5         35             40                15         30             45
Total       45         55            100                40         60            100
Illustration 2. Of the 1,125 students studying in a school during 2005-2006, 720 are Hindus, 628 are boys and 440 are science students. The number of Hindu boys is 392, that of boys studying science 205 and that of Hindu students studying science 262; finally, the number of science students among the Hindu boys was 148. Enter these frequencies in a table and complete the table by obtaining the frequencies of the remaining cells.
Solution.

TABLE 4
           Boys                            Girls                           Total
           Hindus  Non-Hindus  Total       Hindus  Non-Hindus  Total       Hindus  Non-Hindus  Total
Science    148      57         205         114     121         235         262     178          440
Arts       244     179         423         214      48         262         458     227          685
Total      392     236         628         328     169         497         720     405         1125
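The remaining cells of Table 4 follow from the eight given frequencies by repeated subtraction. The sketch below shows one possible order of working; the dictionary keys ("boy", "girl", "hindu", "other", "science", "arts") are illustrative labels, not terms from the text.

```python
total, hindus, boys, science = 1125, 720, 628, 440
hindu_boys, boys_science, hindu_science, hindu_boys_science = 392, 205, 262, 148

cells = {}
cells["boy", "hindu", "science"] = hindu_boys_science
cells["boy", "other", "science"] = boys_science - hindu_boys_science
cells["girl", "hindu", "science"] = hindu_science - hindu_boys_science
cells["girl", "other", "science"] = (science - boys_science) - cells["girl", "hindu", "science"]
cells["boy", "hindu", "arts"] = hindu_boys - hindu_boys_science
cells["boy", "other", "arts"] = (boys - hindu_boys) - cells["boy", "other", "science"]
cells["girl", "hindu", "arts"] = (hindus - hindu_boys) - cells["girl", "hindu", "science"]
# The last cell is whatever remains of the 1,125 students.
cells["girl", "other", "arts"] = total - sum(cells.values())

for key, count in cells.items():
    print(key, count)
```

Summing the eight cells in different groupings reproduces every marginal total in the table, for example 497 girls and 685 arts students.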
Illustration 3. Census of India 2001 reported that Indian population had risen to 102 crore of which only 49 crore were females against 53 crore males. 74 crore people resided in rural India and only 28 crore lived in towns or cities. While there were 62 crore non-workers against 40 crore workers in the entire country, urban population had an even higher share of non-workers (19 crore) against the workers (9 crore) as compared to the rural population where there were 31 crore workers out of 74 crore population. Represent the above information in a tabular form.
Solution.

TABLE 5
(figures in crores)
Location    Workers    Non-workers    Total
Rural       31         43              74
Urban        9         19              28
Total       40         62             102

Sex         Population
Males        53
Females      49
Total       102
fs of tables
library
wMcH co„.ai„ —. inro™a.o„ i. .^e «ae fom, in wUch they are origi^lly collected
«J,„
SsfT
84
. table 6
of Students in a School
Marks
Total
15 12 28 5
60
,, , , table 7
I---i" a School
Mirt,
r^- • , table 8
The above table can even be called a manifold table or higher order table [...] we can increase the number of characteristics [...]
Such a table is needed when a number of characteristics are to be shown simultaneously. But as more characteristics are included, the table becomes more complex, and may be confusing to the reader. If the field of investigation is not big, the data have not too many [...] future use and thirdly when the table requirements are varying. [...] tabulation will be more accurate than the manual process.
EXERCISES
Questions :
Describe the major functional parts of statistical tables. Draw a structure of a table.
Explain briefly the main characteristics of a good statistical table.
What are the points to be taken into account while preparing a table?
Explain and discuss the various types of tables used in a survey after the data have [...]
[...] following industries : [...]
Prepare a blank table to show the distribution of population according to sex and four religions in three age groups in Delhi and Mumbai.
[...] 2006.
(a) Faculty : Social Sciences, Commercial Sciences.
(b) Class : Under-graduate and Post-graduate classes.
[...] Male and Female.
[...] 2005 and 2006.
Tabulate the following :
[...] of the total sales during the year [...] respectively. Textiles accounted for 30% [...]
Town A : 60% people were males, 40% were coffee drinkers, and 26% were male coffee drinkers.
Town B : 55% people were males, 30% were coffee drinkers, and 20% were male coffee drinkers.
Chapter 6
DIAGRAMMATIC PRESENTATION
Introduction
Importance and Uses of Graphs and Diagrams
General Rules for Constructing Diagrams
Types of Diagrams
A. One-dimensional Diagrams
B. Pie Diagrams

PICTORIAL PRESENTATION
Diagrammatic Presentation
  ONE-DIMENSIONAL DIAGRAMS
    (i) Simple Bar Diagram (ii) Sub-divided Bar Diagram (iii) Multiple Bar Diagram (iv) Percentage Bar Diagram (v) Broken Bar Diagram (vi) Deviation Bar Diagram
  TWO-DIMENSIONAL DIAGRAMS
    (i) Rectangles (ii) Squares (iii) Circles and Pie-diagrams
  THREE-DIMENSIONAL DIAGRAMS
    (i) Cubes (ii) Cylinders (iii) Blocks etc.
  PICTOGRAM
  CARTOGRAMS OR MAPS
Graphic Presentation
  [...] (ii) Histogram (iii) Frequency Polygon (iv) Smoothed Frequency Curve (Frequency Curve) (v) 'Ogive' or Cumulative Frequency Curve
  GRAPHS OF TIME SERIES
    (i) One Variable Graphs (ii) Two or more than two Variable Graphs (iii) Graphs of Different Units
[...] diagrams in this chapter and some important [...] are commonly used in presenting statistical information.
[...] fluctuations of the statistical values [...] who is not interested in going through figures. Diagrams are used for publicity [...] say without any strain on mind and knowledge [...] exhibitions, fairs, journals, newspapers, board meetings etc. [...] particularly to give information to the common man. They are widely [...] campaigns and other fields. Diagrams play an important role in modern advertising.
[...] The size of the diagram should be neither too big nor too [...] paper. It should be attractive, neat and appealing to the eyes, so that people's attention is automatically drawn towards it.
[...] scale should be selected to suit [...] far as possible be in even numbers or multiples of 5, 10, [...]
[...] through different colours, shades, dotting, crossing, etc., an index must [...] for identifying and understanding the diagram.
[...] the source from which data have been obtained. [...] more effective than a complex one.
Types of Diagrams
There are various types of geometric forms of diagrams used in practice as shown on [...]
A. One-dimensional Diagrams
B. Pie Diagram
A. ONE-DIMENSIONAL DIAGRAMS
[...] used in practice. They are called one-dimensional because [...] height of the bar [...]
(a) Simple bar diagram (b) Sub-divided bar diagram (c) Multiple bar diagram (d) Percentage bar diagram (e) Broken bar diagram (f) Deviation bar diagram
(a) Simple Bar Diagrams : [...] variable can be presented. A simple bar diagram can be [...]
Year       Value (Rupees in crores)
1997-98    6,500
1998-99    10,940
1999-00    17,150
2000-01    28,350
2001-02    36,500
Fig. 1: Simple bar diagram. Years on X-axis; Value (Rupees in crores) on Y-axis.
Fig. 2: Simple bar diagram. Years on Y-axis; Value (Rupees in crores) on X-axis.
Fig. 3: 52-Week Average Inflation (price changes). (Provisional) Average up to Jan. 14, 2006. First advance estimates (Kharif only).
proportion to the values given in the data. Different shades, dotting or designs can be used to distinguish the components.
Year
2001 2002
2003
2004
2005
Trains
Murder
Solution.
Robbery
82 115 144 70 68
Loot
Total
Fig. 4: Sub-divided bar diagram (components: loot, robbery, murder)
(c)M between t inter-relaC of drawin; In this cai spacing isi in a set, d be given. '
Fig. 5: Multiple bar diagram (X-axis: years, 2003-04 and 2004-05)
Solution.
2005 75 68 245
(Scale ; 1 cm = 100)
500
4001
2 300H
tr o
200
loo-
ses
321
352
285
245
95
Fortnightly Sugar Production, Off-take for Internal Consumption, Export and Stock
Production    Off-take from Mills    Export    Stock
378           154                    Nil       224
387           283                    41        63
export and stock which we have calculated. (Hi) Diagrammatic presentation of above data by
(Fortnightly sugar production, off-take for internal consumption, export and stock in sugar mills in India. Scale: 1 cm = 50,000 tons.)
Fig. 13
Fig. 8: Percentage bar diagram of cost for Factory A and Factory B (Y-axis: % cost; scale: 1 cm = 20%)
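(e) Broken Bar Diagram: Sometimes we may come across a series in which some values are very small while others are very large. In such cases the longer bars are broken so that the diagram keeps a reasonable shape.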
Fig. 9: Broken bar diagram (X-axis: years; Y-axis: number of students; scale: 1 cm = 25 students)
(Rs in Lacs)
Year    Export    Import    Balance of Trade
1998    47        30        +17
1999    125       115       +10
2000    20        39        -19
2001    94        110       -16
2002    120       125       -5
Fig. 10: Deviation bar diagram of the balance of trade, 1998 to 2002 (surplus and deficit; X-axis: years; Y-axis: Rupees in lacs; scale: 1 cm = 5 lacs)
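The balance-of-trade column that a deviation bar diagram plots can be reproduced with a few lines of Python. This is only a sketch using the export and import figures from the table above; the function name is illustrative and not from the book.

```python
# Export and import figures (Rs in lacs) taken from the table above.
years = [1998, 1999, 2000, 2001, 2002]
exports = [47, 125, 20, 94, 120]
imports = [30, 115, 39, 110, 125]

def balance_of_trade(exports, imports):
    """Return export minus import for each year (positive means surplus)."""
    return [e - m for e, m in zip(exports, imports)]

balances = balance_of_trade(exports, imports)
for year, b in zip(years, balances):
    label = "Surplus" if b >= 0 else "Deficit"
    print(year, b, label)
```

Positive values are drawn above the base line and negative values below it.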
With circles and sectors, totals as well as component parts can be shown, e.g., the expenditure over different heads, namely food, clothing, rent, education, etc. If the series has a large number of components, or the difference among the components is very small, then pie diagrams are less effective than bar diagrams.
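A circle has 360 degrees. Since 1 per cent of the total value is equal to 360/100 = 3.6 degrees, the percentages of the component parts are converted into degrees by multiplying each by 3.6. The angles are marked at the centre with the help of a protractor; for each succeeding component, the angle is measured from the line last drawn. The sectors should be distinguished by different shades or colours.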
Item Expenditure
Labour
Bricks
Cement
Steel
Timber
Supervision
Fig. 11
Items                 2003-04: Degree of angle    2004-05: Degree of angle
Readymade Garments    188                         150
Cotton Textile        69                          84
Woollen Textile       103                         126
Total (100.0%)        360                         360
Fig. 12: Pie diagrams for 2003-04 and 2004-05
Angles calculated on the basis of 360 degrees
Items of Expenditure      Family X (total Rs 1000)     Family Y (total Rs 1600)
1. Food                   400/1000 x 360 = 144         640/1600 x 360 = 144
2. Clothing               250/1000 x 360 = 90          480/1600 x 360 = 108
3. Rent                   150/1000 x 360 = 54          320/1600 x 360 = 72
4. Education              40/1000 x 360 = 14.4         100/1600 x 360 = 22.50
5. (label not legible)    160/1000 x 360 = 57.6        60/1600 x 360 = 13.50
Total                     360                          360
Square root of total      31.6                         40
Radii of the circles are determined in the proportion 3.2 : 4 (31.6 : 40). We may reduce the radii of the circles according to the availability of space. The radii taken are:
Family X : Radius = 3.2/2 = 1.6 cm
Family Y : Radius = 4/2 = 2 cm
FAMILY X
FAMILY Y
Fig. 24
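The two calculations behind these pie diagrams, the angle of each sector and radii taken in proportion to the square roots of the totals, can be sketched in Python. The Family Y amounts below are back-calculated from the angles in the text (only 640 and 480 are legible in the source), so treat them as illustrative; the function name is hypothetical.

```python
import math

def pie_angles(values):
    """Convert component values into sector angles (degrees) of a pie diagram."""
    total = sum(values)
    return [v / total * 360 for v in values]

# Family Y components (Rs); 320, 100 and 60 are inferred from the angles.
family_y = [640, 480, 320, 100, 60]
angles_y = [round(a, 2) for a in pie_angles(family_y)]
print(angles_y)  # [144.0, 108.0, 72.0, 22.5, 13.5]

# Radii are taken in proportion to the square roots of the two totals.
radius_x = math.sqrt(1000)  # about 31.6
radius_y = math.sqrt(1600)  # 40.0
print(round(radius_x, 1), radius_y)
```

Using square roots keeps the areas of the two circles, not their radii, in the ratio of the totals.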
104
A well constructed simple and attractive Ltam sho ^are and caution,
2.
3.
4.
5.
6.
Questions :
• a-:: —
TpLin JT "ftheir utility, txplam M bar diagram, and (b) pie diagram
Diagrams are less accurate but more effective than tables in presenting the data.
(a) Composition of the population of Delhi by religion (b) Agriculture production of five states of India.
Explain the following with illustration: (a) Sub-divided bar diagrams, and (b) Multiple bar diagrams.
9. Write short notes on the following: (a) Percentage bar diagram, (c) Deviation bar diagram, (e) Multiple bar diagram.
bar diagrams
total import
Total
2001-02
2002-03 2003-04
474 795
125 298
341 1,113
1,789 1,951
2,729 4,167
(Rupees in crores)
2004-05
4,744 /
Yi^f ■ ■■ ■ .. ■ . ■:--—---
' 7.
73 80 85
70 72 74
ZOOS
8.
Education Miscellaneous
Farntly A (Rs)
Family B (Rs)
P'Xpettdiiure
9 TU --——i—^ 1440
' ^'iree year's result of XTT ri T _
B (Rs)
3 1
75 100
175 150
30 25
20 25
Other Expenses
________
chart :
(a) Wages
(c) Polishing
Total Cost
graphic presentation
as a tool of analysis.
109
Fig. 1
.ea. - - - -
110
OF FREQOEIICr
scale wid, d,e difference of lO^wS™ T ' " "" wasting too much of space of ^a7h pSr «« ^ S'^Ph
frequency graph
S'graphs..
(b) Histogram
(d) Frequency Curve or Smoothed Frequency Curve (e) Cumulative Frequency Curve or 'Ogive'
fluency array, on graph by which the line is drawn. represents the frequency of that variable on
60" 90
61" 80
62" 120
63" 140
64" 132
65" 70
66" 40
Method
3. Draw a vertical line on each value, with its length equal to the frequency of that value.
4. Both the axes must be clearly labelled and the scale of measurement clearly shown. The X-axis can conveniently be determined according to the need of the problem. We can have three varieties of X-axis. Taking the above illustration, they are:
(c) Starting from 60" (use a thick line to read the data properly). See the graphs given
(d) Both axes must be clearly labelled and the scale of measurement should be clearly shown.
Fig. 13: Line frequency graph
(b) Histogram
A histogram is a two-dimensional diagram, also called a frequency histogram.
(i) Histogram of Equal Class Intervals (ii) Histogram when Mid-points are given (iii) Histogram of Unequal Class Intervals
Method
are
freq
Thus is pn
(«)K
obtai
113
3. oe. reW. —
Solution. histogram
1 cm = frequency 4 on Y-axis
frequency)
Class (Marks)
4 10 16 22 18 2
10 X 4 = 40 10 xlO = 100
114
Method
Graf
2. X-axis for variables under study (Marks). 4 iTZ r fr^q^es (No. of Students).
5: with frequency
Thus, the class decided is 145 ~to 155 ^ ^ = ^^^ - "PP- H.^"
^ ^hri ways : ^
(iii) I
ni
Solution.
histogram x-axis-starting
marks Fig. 7
Sd Nc histogi
Metho 1. 2.
3.
4.
115
^Graphic Presentation
Scale
UJ
u. O
dZ
Fig. 9
205 215
S.
of Workers
When the class intervals are unequal, frequencies must be adjusted, otherwise the histogram would give a misleading picture. The adjusted frequencies give the height of each rectangle of the histogram, but the widths will be according to class limits.
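One common way to make the adjustment is to scale each frequency by the ratio of the smallest class width to the width of its own class. The sketch below assumes that rule; the classes and frequencies are hypothetical and are not taken from the book's illustration.

```python
def adjusted_frequencies(classes, freqs):
    """Scale frequencies so that rectangle heights are comparable
    when class widths differ (base = smallest class width)."""
    widths = [upper - lower for lower, upper in classes]
    base = min(widths)
    return [f * base / w for f, w in zip(freqs, widths)]

# Hypothetical data: daily wages (Rs) with unequal class intervals.
classes = [(10, 15), (15, 20), (20, 30), (30, 50)]
freqs = [7, 19, 28, 16]

heights = adjusted_frequencies(classes, freqs)
print(heights)  # [7.0, 19.0, 14.0, 4.0]
```

The class 20-30 is twice as wide as the base width of 5, so its frequency is halved; the class 30-50 is four times as wide, so its frequency is divided by four.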
116
histogram
1 cm = 5 Workers on V-axis
daily wages in rs
Fig. 10
japhic Presentation
r -Jji^s ^ Students W
117
17 25 32 13 6
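Note : Since the class intervals are given in inclusive form, they are first converted into exclusive form by adjusting the lower and upper limits of each class: 5-9 becomes 4.5-9.5, 10-14 becomes 9.5-14.5, and so on.
Class        Frequency
4.5-9.5      5
9.5-14.5     17
14.5-19.5    25
19.5-24.5    32
24.5-29.5    13
29.5-34.5    6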
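The conversion from inclusive to exclusive class limits can be sketched as follows. The inclusive limits (5-9, 10-14, ...) are inferred from the adjusted table above, and the function name is illustrative.

```python
def to_exclusive(classes):
    """Convert inclusive class limits to exclusive ones by splitting
    the gap between consecutive classes equally."""
    gap = classes[1][0] - classes[0][1]  # e.g. 10 - 9 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

inclusive = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
exclusive = to_exclusive(inclusive)
print(exclusive[0], exclusive[-1])  # (4.5, 9.5) (29.5, 34.5)
```

After the adjustment, the upper limit of each class equals the lower limit of the next, so the rectangles of the histogram touch each other.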
histogram
1 cm = 10 Students on V-axis
Fig. 24
118
Method
Marks    No. of Students
10-20    5
20-30    12
30-40    15
40-50    22
50-60    14
60-70    4
Both axes must be clearly labelled and the scale of measurement should be clearly shown.
Solution.
Fig. 24: Histogram and Frequency Polygon (Y-axis: number of students)
While drawing the frequency polygon, we observe that some area which was under the histogram has been excluded, and some area which was not under the histogram has been included under the frequency polygon. The dotted area was under the histogram but is not under the frequency polygon; it is excluded from the area of the frequency polygon. The shaded area, which was not under the histogram, has been included under the polygon. Thus there is always some area included under the frequency polygon in place of the area excluded from the histogram. Therefore, the total area excluded from the histogram is equal to the area included under the frequency polygon.
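The equality of the two areas can be checked numerically. The sketch below assumes equal class widths and a polygon closed on the X-axis at the mid-points of the empty classes on either side (the usual construction); it uses the frequencies 5, 12, 15, 22, 14, 4 with a class width of 10.

```python
def histogram_area(freqs, width):
    """Total area of the rectangles of a histogram with equal class widths."""
    return sum(f * width for f in freqs)

def polygon_area(freqs, width):
    """Area under the frequency polygon, closed at zero frequency
    one class before the first and one class after the last."""
    heights = [0] + list(freqs) + [0]
    # Each pair of neighbouring mid-points forms a trapezium of base `width`.
    return sum(width * (a + b) / 2 for a, b in zip(heights, heights[1:]))

freqs = [5, 12, 15, 22, 14, 4]
print(histogram_area(freqs, 10))  # 720
print(polygon_area(freqs, 10))    # 720.0
```

Each frequency appears in two neighbouring trapeziums with weight one half each, which is why the totals coincide.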
Using the same illustration, we can get the frequency polygon without the histogram as under:
Method
3. Join the points plotted for the mid-points corresponding to their frequencies by straight lines.
We will get the same figure as obtained by the first method (i.e., with histogram).
15 5
25 12
35 15
45 22
55 14
65 4
Solution.
frequency polygon
Fig. 13
120
2.1 in -
Gi foi
Marks
m(
Total
58
64
25
J--'-WUTtlUN
» i*
marks Fig. 14
121
Graphic Fresentation
Illustration 8. We have the following data on the daily expenditure on food (in rupees) for 30 households in a locality:
(a) Obtain a frequency distribution using class intervals : 100-150, 150-200, 200-250, 250-300 and 300-350
(b) Draw a frequency polygon.
(c) What per cent of the households spend less than Rs 250 per day, and what per cent
Class      Frequency
100-150    4
150-200    6
200-250    13
250-300    5
300-350    2
Total      30
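Part (c) can be read straight off the frequency distribution. A minimal sketch, assuming the class frequencies 4, 6, 13, 5 and 2 (the last frequency follows from the total of 30):

```python
classes = ["100-150", "150-200", "200-250", "250-300", "300-350"]
freqs = [4, 6, 13, 5, 2]

def percent_in_first(freqs, n_classes):
    """Percentage of all observations falling in the first n_classes classes."""
    return sum(freqs[:n_classes]) / sum(freqs) * 100

below_250 = percent_in_first(freqs, 3)   # classes up to 200-250
print(round(below_250, 2))               # 76.67
print(round(100 - below_250, 2))         # 23.33 spend Rs 250 or more
```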
(b) Fig. 15: Frequency polygon (scale: 1 cm = 2 households on Y-axis)
Smoothing of the frequency polygon is required to be done carefully to get correct results, so that the area included is just the same as that of the polygon. A frequency curve drawn with care shows neither more nor less area than the rectangles of the histogram.
Frequency curve (scale: 1 cm = 2 households on Y-axis)
We observe that :
123
Graphic Presentation
_____1—------ I _____.ri^ol
124
(e) U-Shaped Curve (Curve E) : In this case, maximum frequency is at the ends of the distribution.
(e) Cumulative Frequency Curve (Ogive)
Marks
Ma. of StHdents
44
7 10
Marks
No. of Students
12
of each class, e.g., in the above illustration, the number of students obtaining marks more than 0 is 50; more than 10 is 46; more than 20 is 42; and so on.
Marks           No. of Students (c.f.)
Less than 10    4
Less than 20    8
Less than 30    15
Less than 40    25
Less than 50    37
Less than 60    45
Less than 70    50
Marks           No. of Students (c.f.)
More than 0     50
More than 10    46
More than 20    42
More than 30    35
More than 40    25
More than 50    13
More than 60    5
We get a rising curve in case of 'less than method' and a declining curve in case of 'more than method' when cumulative frequencies are plotted on the graph paper.
Get the cumulative frequencies of the given frequencies either by 'less than method' or by 'more than method'.
4. Plot the various points and join them to get a curve (i.e., ogive).
5. Both axes must be clearly labelled and the scale of measurement should be clearly shown.
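Both sets of cumulative frequencies can be generated from the class frequencies. The sketch below assumes the class frequencies 4, 4, 7, 10, 12, 8 and 5 for the classes 0-10 to 60-70, which are the values implied by the 'more than' series quoted above; the function names are illustrative.

```python
from itertools import accumulate

freqs = [4, 4, 7, 10, 12, 8, 5]  # classes 0-10, 10-20, ..., 60-70

def less_than_cf(freqs):
    """Cumulative frequencies against the upper limits ('less than' method)."""
    return list(accumulate(freqs))

def more_than_cf(freqs):
    """Cumulative frequencies against the lower limits ('more than' method)."""
    total = sum(freqs)
    running = [0] + list(accumulate(freqs))[:-1]
    return [total - r for r in running]

print(less_than_cf(freqs))  # [4, 8, 15, 25, 37, 45, 50]
print(more_than_cf(freqs))  # [50, 46, 42, 35, 25, 13, 5]
```

The first list is plotted against the upper class limits and rises; the second is plotted against the lower class limits and falls.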
Fig. 18 and Fig. 19: Ogives by the 'less than' and 'more than' methods (X-axis: marks; scale: 1 cm = 10 students on Y-axis)
100-109    7
110-119    13
120-129    15
130-139    32
140-149    20
150-159    8
Method
and
Frequencies: 7 13 15 32 20 8
Cumulative frequencies: 7 20 35 67 87 95
Fig. 21: Ogive (scale: 1 cm = 20 workers on Y-axis)
Solution.
It I
Marks r\ r- Number of Students Cumulative Frequency (Less than) c.f Cumulative Frequency
(More than) c.f
than' ogive
1 cm = 20 Students on /-axis
or h
Kg. 22
Time series can be shown on graph paper. Information arranged over a period of time (e.g., years, months, weeks, days, etc.) is termed a time series. Presentation of this type of information by a line or curve on graph paper is of great use in economic statistics. These graphs are known as line graphs, historigrams or arithmetic line graphs.
(a) General Rules to Construct a Line Graph
1. As time (year, month, week) is never negative (i.e., in minus figures), there is no need to use Quadrants II and III.
2. Year, month or week, according to the problem, is taken on the X-axis. Give titles to the X-axis and Y-axis.
3. Start the Y-axis with zero and decide the scales for both the axes. For example, every 1 cm on the Y-axis may represent an equal gap of 50 students, and 1 cm on the X-axis the gap between 2000 and 2001. The X-axis can start either from 1999 or 2000 (See Fig. 23).
4. The paired values will give different dots on the graph paper. For example, values corresponding to the time factor are :
Years Students
2000 50
2001 150
2002 100
2003 150
2004 200
2005 225
2006 200
These dots, obtained from the paired values, are joined by straight lines; the result is called a line graph or historigram (See Fig. 23).
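The scale rule in step 3 amounts to converting each (year, students) pair into a position on the graph paper. A minimal sketch, assuming 1 cm = 50 students on the Y-axis, 1 cm = 1 year on the X-axis and an X-axis that starts at 1999; the function and parameter names are illustrative.

```python
students = {2000: 50, 2001: 150, 2002: 100, 2003: 150,
            2004: 200, 2005: 225, 2006: 200}

def to_graph_points(series, start_year, students_per_cm=50, years_per_cm=1):
    """Return (x_cm, y_cm) positions for each (year, value) pair."""
    return [((year - start_year) / years_per_cm, value / students_per_cm)
            for year, value in sorted(series.items())]

points = to_graph_points(students, start_year=1999)
print(points[0])  # (1.0, 1.0)  -> year 2000, 50 students
print(points[5])  # (6.0, 4.5)  -> year 2005, 225 students
```

Joining consecutive points with straight lines gives the line graph.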
Fig. 23: Line graph of the number of students (2000-06)
5. It is not advisable to ■
Kendriya Vidyalaya
Method
1998-99    120
1999-00    400
2000-01    567
2001-02    490
2002-03    760
2003-04    834
2004-05    750
Fig. 24
't rlvldTuse faUe base U„e according to ne^ of tbe ptoblent. Keeping
doln^—^ .o out tequiretnents by using False ^se Line) mustrafon No. n-bet of stude^ ts ^ one
t^usaud .n e.b
wmmmmM
P" : . . C T in crranh r nresentation See Fig.
£ that is the use of False Base Line in graphic presentation (See Fig. 25).
Year    Students
2000    1120
2001    1380
2002    1587
2003    1490
2004    1760
2005    1734
2006    1675
Fig. 25
132
1994-95 1995-96 1996-97 1997-98 1998-99 1999-00 5.0 - 0.9 9.6 - 1.9 7.2 0.8 9.1 11.8
6.0 5.9 4.0 6.9
Services (4)
7.0 10.3
8.2
Gr (d)
Wt
as
axi
data as * rime series graph. estimated sectoral growth rate ,n gdp at factor cost
— -----Services
YEARS
Fig. 26
(d) Graphs of Different Units
When two values are given in two different units, we will have two different scales.
Year
1997-98
1998-99
1999-00
9 10 12 11
14
15
Quantity
-S-Rupees
1997-98
134
10000
exports (Provisional)
(US $ IMillicn)
(US $ Million)
5000
Fig. 28
Questions :
13. What is a false base line? Under what conditions would its use be desirable?
14. What is meant by (a) Histogram, and (b) Ogive? Explain their construction with the help of sketches.
15. Distinguish Histogram and Historigram clearly with illustrations.
16. What is a smoothed frequency curve?
17. Discuss briefly various types of frequency curves.
18. Explain the importance of graphic presentation of data.
19. Describe the procedure of drawing histogram when class intervals are (i) equal, and (ii) unequal.
Problems :
40-50 3
below:
No. of Students : 3 10 14 10
Dtaw a histogram to represent the frequency distribution of marks. Comment on the shape of the
histogram.
What is histogram? Present the data given in the table below in the form of a Histogram:
Mid-points : 115 125 135 145 155 165 175 Frequency : 6 25 48 72 116 60 38 3/ Make a frequency
Polygon and Histogram using the given data / Marks Obtained : 10-20 20-30 30-40 40-50
/Number of Students : 5 12
4. Draw Histogram from the following data : r Marks Obtained : 10-20 20-30
Number of Students : 6 10
In a certain colony a'sample of 40 households was selected. The data on daily income for this sample are
given as follows :
200
15
30-40 15
22
40-50 10
185 22
50-60 14
50-70 6
195
70-80
70-100 3
5.
(b) Show that the area under the polygon is equal to the area under the histogram. (Hint. Get a
frequency distribution table to obtain a continuous series).
136
Frequency - f s '''''
15-19, 20-24 TQ
Size of classes
Students ^ 10 15
"'Z; ^
Workers : 9 12 15
cZ''^""
u. I
Companies : 2 3 j
^—
a-OOO.o^, , 35 3. 3. .0 « ^ ^
iwi
■ 137
Waphic Presentation
\ Profit (Rs in ^^ 65 80 95
Year
'T99"o-91
1991-92
1992-93
1993-94
1994-95
1995-96
1996-97
,, 12 » 25 31 29 27 35
; company.
2001 8500 24
29 34 45 49
<53t
STATISTICAL TOOLS AND INTERPRETATION
■ Average p^^
••—ye, of Correla«p„
Chapter 8
As satisfy
fl
139
value." £ 1
According to KeUoy and Smith : "An average is sometimes called a measure of central
tendency because individual values of the variable usually cluster around it."
1. To represent the salient features of a mass of complex data : An average determines a single figure for the whole series. It is a tool to represent the salient features of a mass of complex data. It is helpful in reducing the mass of information into a single value for drawing general conclusions. It is difficult to generalise anything from the ages of crores of Indian people. But if it is said that the average age of an Indian is 55 years, one can draw conclusions about the health conditions of the people. Thus the purpose of an average is to represent a group of individual values in a simple manner, so that the mind can get a quick understanding of the general size of the individuals in the group.
2. To facilitate comparison : Averages are useful for comparison. The average of one group can be compared with the averages of other groups. For example, the average marks of students in section A can be compared with the average marks of students in section B easily, at a glance, or the average monthly sales of Department A can be compared with the average monthly sales of Department B.
3. To know about the universe from a sample : Averages also help to obtain a picture of the complete group by means of sample data. In statistical enquiries, the sample method is very often used. The mean of a sample gives a good idea about the mean of the population.
4. To help in decision making : Averages are helpful for making decisions in planning in various fields. For example, a sales manager may need to know the average number of calls made per day by salesmen in the field. A railway officer will require information regarding the average number of passengers carried by rail on the various passenger runs. Averages are valuable in setting standards, estimating and planning and other managerial decision areas.
As the average represents statistical information and is used for comparison, it must satisfy the following conditions :
1. It should be simple to calculate and easy to understand : An average should be calculable with reasonable ease and rapidity; only then can it be widely used. It should not involve heavy arithmetical calculations. If the calculation of the average
are not separated the aver^cotton cloth per mill, if big and small mills cotton mill industry fnTdfa s^pUt'
u ^^ ^^^^ ^
OF MEASUREMENT
141
students).
♦►Moving Average
.....^.........^
'"M^lirt cl of qualitative data which caunot be measured quantitatively for rdatiou to all the values,
naturally, median should be the choice.
1. Meaning
4. Miscellaneous Problems
(b) Short-Cut Method (Assumed Mean Method) (c) Step Deviation Method
of v^updLtS^^^^^^^ we .e
$\frac{1010 + 1020 + 1030}{3} = \frac{3060}{3}$ = Rs 1020
i.e., the average wage of the workers is Rs 1020.
Direct Method : Symbolically,
Worker    Wages (Rs) X
A         1010
B         1020
C         1030
N = 3     ΣX = 3060
$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{N} = \frac{\Sigma X}{N} = \frac{3060}{3}$ = Rs 1020
N = Number of observations
Alternative equation
$\bar{X} = \frac{1}{n}\Sigma x_i$
where the symbol Σ is the Greek letter called sigma and is used in mathematics to denote the sum of values.
n = total number of observations
$\bar{X} = \frac{1}{3} \times 3060$ = Rs 1020
1. If we replace each item of observation by the calculated mean, then the total of these replaced values
will be equal to the sum of the given observations.
Worker    X       Replaced by X̄
A         1010    1020
B         1020    1020
C         1030    1020
$N\bar{X} = \Sigma X$ : 3 × 1020 = 3060
Worker    Wages (Rs) X    X - X̄
A         1010            -10
B         1020            0
C         1030            +10
N = 3     ΣX = 3060       Σ(X - X̄) = 0
Symbolically, $\Sigma(X - \bar{X}) = 0$
get the anthmetic. mean the total of tt deZl-? ^^^^^ « calculated. To total ,s divided by the number of
^^^^"I-ted. This
b]
r Worker X ^ — A (d)
N=3 4 Id =30
- Steps :
assumed mean.
N = number of observations ^
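We can further simplify the short-cut or assumed mean method. All deviations taken from the assumed mean are divided by a common factor.
Worker    Wages (Rs) X    d = X - A (A = 1010)    d' = d/C (C = 10)
A         1010            0                       0
B         1020            10                      1
C         1030            20                      2
N = 3                                             Σd' = 3
$\bar{X} = A + \frac{\Sigma d'}{N} \times C = 1010 + \frac{3}{3} \times 10$ = Rs 1020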
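The three methods described for an individual series (direct, assumed mean and step deviation) always give the same answer; they differ only in the size of the numbers handled. A small sketch with the three wages used above, where the function names are illustrative:

```python
wages = [1010, 1020, 1030]

def mean_direct(xs):
    """Direct method: sum of the values divided by their number."""
    return sum(xs) / len(xs)

def mean_assumed(xs, a):
    """Short-cut method: assumed mean A plus the mean of deviations d = X - A."""
    return a + sum(x - a for x in xs) / len(xs)

def mean_step_deviation(xs, a, c):
    """Step deviation method: deviations are first divided by common factor C."""
    return a + sum((x - a) / c for x in xs) / len(xs) * c

print(mean_direct(wages))                    # 1020.0
print(mean_assumed(wages, 1010))             # 1020.0
print(mean_step_deviation(wages, 1010, 10))  # 1020.0
```

Any value may be taken as the assumed mean; a value near the middle of the series keeps the deviations small.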
146
d' =
brcom'r flz"" ~ - -
Solution.
in a factory.
Worker : A B C D E F G H I J
X : 120 150 180 200 250 300 220 350 370 260
N = 10, ΣX = 2400
$\bar{X} = \frac{\Sigma X}{N} = \frac{2400}{10}$ = 240 rupees.
St
1. 2.
3.
4.
147
Illustration 2. Calculate the arithmetic mean of the marks given in Illustration 1 by the short-cut method (Assumed Mean Method).
Worker    X      d = X - A (A = 200)
A         120    -80
B         150    -50
C         180    -20
D         200    0
E         250    +50
F         300    +100
G         220    +20
H         350    +150
I         370    +170
J         260    +60
N = 10           Σd = +400
Steps :
3. Get the total of the deviations calculated from the assumed mean (Σd).
$\bar{X} = A + \frac{\Sigma d}{N} = 200 + \frac{400}{10}$ = 240
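Worker    Marks X    d = X - A (A = 200)    d' = d/C (C = 10)
A         120        -80                    -8
B         150        -50                    -5
C         180        -20                    -2
D         200        0                      0
E         250        +50                    +5
F         300        +100                   +10
G         220        +20                    +2
H         350        +150                   +15
I         370        +170                   +17
J         260        +60                    +6
N = 10                                      Σd' = 40
Steps :
$\bar{X} = A + \frac{\Sigma d'}{N} \times C = 200 + \frac{40}{10} \times 10$ = 240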
B. Discrete Series
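Students : A B C D E F G H I J
Marks : 50 100 50 150 100 50 150 100 50 100
Solution.
Marks (X)    No. of Students (f)    fX
50           4                      200
100          4                      400
150          2                      300
             Σf = 10                ΣfX = 900
$\bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + \dots + f_nX_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\Sigma fX}{\Sigma f}$ or $\frac{\Sigma fX}{N} = \frac{900}{10}$ = 90 marks
where,
ΣfX = sum of the products of variables and their frequencies
f = Frequency
Direct method :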
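Illustration 5. The following table gives the marks obtained by 100 students in a class.
Marks : 10 20 30 40 50
No. of Students : 5 10 40 20 25
Solution.
Marks (X)    No. of Students (f)    fX
10           5                      50
20           10                     200
30           40                     1200
40           20                     800
50           25                     1250
             N = 100                ΣfX = 3500
Steps :
$\bar{X} = \frac{\Sigma fX}{\Sigma f}$ or $\frac{\Sigma fX}{N} = \frac{3500}{100}$ = 35
Alternatively, $\bar{X} = \frac{1}{n}\Sigma fx = \frac{1}{100} \times 3500$ = 35
$N\bar{X} = \Sigma fX$ : 100 × 35 = 3500
$\Sigma f(X - \bar{X})$ = -475 + 475 = 0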
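The direct method for a discrete series, together with the check that the frequency-weighted deviations from the mean add up to zero, can be sketched as follows using the marks and frequencies of Illustration 5. The function name is illustrative.

```python
marks = [10, 20, 30, 40, 50]
freqs = [5, 10, 40, 20, 25]

def grouped_mean(values, freqs):
    """Arithmetic mean of a discrete series: sum of fX divided by sum of f."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

mean = grouped_mean(marks, freqs)
print(mean)  # 35.0

# Property: the sum of f(X - mean) is zero.
weighted_deviations = [f * (x - mean) for x, f in zip(marks, freqs)]
print(sum(weighted_deviations))  # 0.0
```

The negative deviations total -475 and the positive ones +475, matching the figures in the text.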
Short-Cut Method (Assumed Mean Method) : We can use this method to calculate the arithmetic mean in order to simplify arithmetic calculations. The following formula is used :
$\bar{X} = A + \frac{\Sigma fd}{N}$
Here,
d = X - A, i.e., deviations of variables taken from the assumed mean
Σfd = Sum of the products of frequencies and their respective deviations
Illustration 6. Calculate the average marks of students given in Illustration 5 by the short-cut method.
Marks (X)    f      d = X - A (A = 30)    fd
10           5      -20                   -100
20           10     -10                   -100
30           40     0                     0
40           20     +10                   +200
50           25     +20                   +500
N = 100                                   Σfd = 500
Steps :
$\bar{X} = A + \frac{\Sigma fd}{N} = 30 + \frac{500}{100}$ = 30 + 5 = 35
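Step Deviation Method : We can further simplify the short-cut method. All deviations taken from the assumed mean are divided by a common factor. The following formula is used to calculate the arithmetic mean by the step deviation method.
$\bar{X} = A + \frac{\Sigma fd'}{N} \times C$
Here,
$\bar{X}$ = arithmetic mean (A.M.), A = assumed mean, C = common factor, d' = (X - A)/C
Steps :
$\bar{X} = 30 + \frac{50}{100} \times 10$ = 30 + 5 = 35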
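For a frequency distribution, the short-cut and step deviation formulas can be checked against each other. The sketch below reuses the data of Illustration 5 with A = 30 and C = 10; the function names are illustrative.

```python
marks = [10, 20, 30, 40, 50]
freqs = [5, 10, 40, 20, 25]

def mean_short_cut(values, freqs, a):
    """X-bar = A + (sum of f*d) / N, with d = X - A."""
    n = sum(freqs)
    return a + sum(f * (x - a) for x, f in zip(values, freqs)) / n

def mean_step_deviation(values, freqs, a, c):
    """X-bar = A + (sum of f*d') / N * C, with d' = (X - A) / C."""
    n = sum(freqs)
    return a + sum(f * (x - a) / c for x, f in zip(values, freqs)) / n * c

print(mean_short_cut(marks, freqs, 30))           # 35.0
print(mean_step_deviation(marks, freqs, 30, 10))  # 35.0
```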
C. Continuous Series
In continuous series, the method of calculation of the arithmetic mean is the same as in the case of discrete series. The only difference is that in continuous series the mid-points of the class intervals are required to be obtained. The following equation can be used :
Mid-point = (lower limit + upper limit) / 2, e.g., the mid-point of the class 0-4 is (0 + 4)/2 = 2.
After obtaining the mid-points, we can use all the three methods of calculation of the arithmetic mean in the same way as we used in discrete series. These methods are :
(i) Direct method, (ii) Short-cut method, and (iii) Step deviation method.
Direct Method
Marks    No. of Students (f)    Mid-points (m)    fm
0-4      4                      2                 8
4-8      8                      6                 48
8-12     2                      10                20
12-16    1                      14                14
         Σf = N = 15                              Σfm = 90
Steps :
Symbolically,
$\bar{X} = \frac{f_1m_1 + f_2m_2 + f_3m_3 + \dots + f_nm_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\Sigma fm}{N}$
$\bar{X} = \frac{1}{15} \times [8 + 48 + 20 + 14] = \frac{1}{15} \times 90 = \frac{90}{15}$ = 6 Marks.
1. The total of frequencies multiplied by the arithmetic mean is always equal to the sum of the products of the mid-points of the various classes and their respective frequencies.
$N\bar{X} = \Sigma fm$
15 × 6 = 90
$\Sigma f(m - \bar{X}) = 0$
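Short-cut method (A = 2, d = m - A) : N = 15, Σfd = 60
Steps :
1. Obtain mid-points.
$\bar{X} = A + \frac{\Sigma fd}{N} = 2 + \frac{60}{15}$ = 2 + 4 = 6
Step deviation method (A = 2, C = 4, d' = (m - A)/C) : N = 15, Σfd' = 15
Solution.
1. Obtain mid-points.
$\bar{X} = A + \frac{\Sigma fd'}{N} \times C = 2 + \frac{15}{15} \times 4$ = 6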
No. of Workers : 10 20 30 15 5 (N = 80)
$\bar{X} = \frac{\Sigma fm}{\Sigma f}$ or $\frac{\Sigma fm}{N} = \frac{11700}{80}$ = 146.25
Illustration 12. The following information pertains to the daily income of 150 families. Calculate the arithmetic mean.
Income (Rs) more than    No. of families
75                       150
85                       140
95                       115
105                      95
115                      70
125                      60
135                      40
145                      25
Solution. First, get the class frequencies from the given 'more than' cumulative frequencies.
Income (Rs)    f      Mid-points (m)    m - 100    d' = (m - 100)/10    fd'
75-85          10     80                -20        -2                   -20
85-95          25     90                -10        -1                   -25
95-105         20     100               0          0                    0
105-115        25     110               +10        +1                   +25
115-125        10     120               +20        +2                   +20
125-135        20     130               +30        +3                   +60
135-145        15     140               +40        +4                   +60
145-155        25     150               +50        +5                   +125
N = 150                                                                 Σfd' = +245
Applying formula,
$\bar{X} = A + \frac{\Sigma fd'}{N} \times C = 100 + \frac{245}{150} \times 10$ = 100 + 16.33 = Rs 116.33
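The two stages of this solution, recovering class frequencies from 'more than' cumulative frequencies and then applying the step deviation formula, can be sketched as below. The code assumes equal class widths of 10 and that the open top class is 145-155, which matches the mid-point of 150 used in the text; the function names are illustrative.

```python
lower_limits = [75, 85, 95, 105, 115, 125, 135, 145]
more_than = [150, 140, 115, 95, 70, 60, 40, 25]

def freqs_from_more_than(cf):
    """Class frequency = this cumulative figure minus the next one;
    the last class keeps its own cumulative figure."""
    return [a - b for a, b in zip(cf, cf[1:])] + [cf[-1]]

def step_deviation_mean(mids, freqs, a, c):
    """X-bar = A + (sum of f*d') / N * C, with d' = (m - A) / C."""
    n = sum(freqs)
    return a + sum(f * (m - a) / c for m, f in zip(mids, freqs)) / n * c

freqs = freqs_from_more_than(more_than)
mids = [low + 10 / 2 for low in lower_limits]  # assumes class width 10
mean = step_deviation_mean(mids, freqs, a=100, c=10)

print(freqs)           # [10, 25, 20, 25, 10, 20, 15, 25]
print(round(mean, 2))  # 116.33
```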
Charlier's Accuracy Check : This check is used for verifying the calculations made while computing the arithmetic mean by the short-cut method and the step deviation method (in a frequency distribution in discrete and continuous series). The formula is :
$\Sigma f(d' + 1) = \Sigma fd' + \Sigma f$
Equal values on both sides of the above formula are a proof of correct calculations. We add one more column to the table of calculations prepared in discrete and continuous series.
Solution.
X        f     m     d = m - 15    d' = d/10    fd'    f(d' + 1)
0-10     4     5     -10           -1           -4     0
10-20    6     15    0             0            0      6
20-30    20    25    +10           +1           +20    40
30-40    10    35    +20           +2           +20    30
40-50    7     45    +30           +3           +21    28
50-60    3     55    +40           +4           +12    15
Σf = 50                                         Σfd' = 69    Σf(d' + 1) = 119
Arithmetic Mean, $\bar{X} = A + \frac{\Sigma fd'}{N} \times C = 15 + \frac{69}{50} \times 10$ = 15 + 13.8 = 28.8
Hence mean marks are 28.8 or 29 approx.
Applying Charlier's test :
$\Sigma f(d' + 1) = \Sigma fd' + \Sigma f$ : 119 = 69 + 50
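Charlier's check is easy to automate once the step deviations are available. The sketch below uses the frequencies and mid-points of this illustration, with A = 15 and C = 10; the variable names are illustrative.

```python
freqs = [4, 6, 20, 10, 7, 3]
mids = [5, 15, 25, 35, 45, 55]
A, C = 15, 10

# Step deviations d' = (m - A) / C; exact here because every
# deviation is a multiple of C.
d_prime = [(m - A) // C for m in mids]

sum_fd = sum(f * d for f, d in zip(freqs, d_prime))             # 69
sum_f = sum(freqs)                                              # 50
sum_f_d_plus_1 = sum(f * (d + 1) for f, d in zip(freqs, d_prime))  # 119

# Charlier's check: both sides must agree.
print(sum_f_d_plus_1, sum_fd + sum_f)  # 119 119

mean = A + sum_fd / sum_f * C
print(round(mean, 1))  # 28.8
```

If the two sides differ, there is an arithmetical slip somewhere in the fd' column.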
The arithmetic mean has the following important mathematical properties :
1. The sum of the deviations of the items from the arithmetic mean is always equal to zero. The mean is a point of balance: the sum of the positive deviations is equal to the sum of the negative deviations.
Marks (X)    x = X - X̄
5            -10
10           -5
15           0
20           +5
25           +10
ΣX = 75      Σ(X - X̄) = 0
$\bar{X} = \frac{\Sigma X}{N} = \frac{75}{5}$ = 15
$\Sigma(X - \bar{X}) = 0$, i.e., Σx = 0
Here, Σx or Σ(X - X̄) = Total of the deviations from the arithmetic mean. In case of discrete and continuous series, Σfx or Σf(X - X̄) = 0.
2. We can calculate the combined arithmetic mean from the means and the number of observations of two or more related groups. The combined mean formula is as under :
$\bar{X}_{1,2} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$
Here,
$\bar{X}_{1,2}$ = combined mean of the two groups
$\bar{X}_1$, $\bar{X}_2$ = arithmetic means of the first and second groups
$N_1$, $N_2$ = number of observations in the first and second groups
For three groups,
$\bar{X}_{1,2,3} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3}$
In general,
Combined Mean $\bar{X}_{1,2,3,\dots,n} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3 + \dots + N_n\bar{X}_n}{N_1 + N_2 + N_3 + \dots + N_n}$
B ■ 35 X2 40 N,
Here,
^ ^ (60x40)+ (40x35) 60 + 40 ~
38 the
3800 100
= 38 marks.
" ^ ^ marks.
A 40 Xi 60 Nj
B ? X2 ■ 40.N,
161
-N2X2
where.
V-
- N1 + N2
60 + 40
2400 + 40X2
40X2 = 1400
Illustration 16. The mean marks of 100 students of combined sections A and B are 38 marks. If the mean
marks of section A are 40 and that of section B are 35. Find out the number of students in sections A and
B. Solution.
100
A 40X1 ? N,
B 35X2 ■ ?N,
(Nix40) + (100-Ni)x35
=-Too
3800 = 40 Nj + 3500 - 35 N^
2. 40 Nj - 35 Nj = 3800 - 3500 -
5 Nj = 300
Nj = 60
Hence, the students in section A are 60 and in Section B are (100 - 60) = 40.
162
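The combined mean formula, and the rearrangements used when one mean or one group size is unknown, can be sketched in Python. The function names are illustrative; the figures are those of sections A and B above.

```python
def combined_mean(groups):
    """groups is a list of (number_of_observations, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

def missing_group_mean(combined, n1, mean1, n2):
    """Mean of the second group, given the combined mean."""
    return (combined * (n1 + n2) - n1 * mean1) / n2

def first_group_size(combined, total_n, mean1, mean2):
    """Size of the first group, given both means and the combined mean."""
    return total_n * (combined - mean2) / (mean1 - mean2)

print(combined_mean([(60, 40), (40, 35)]))  # 38.0
print(missing_group_mean(38, 60, 40, 40))   # 35.0
print(first_group_size(38, 100, 40, 35))    # 60.0
```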
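3. The sum of the squares of the deviations of the items from the arithmetic mean is the least, i.e., it is less than the sum of the squares of the deviations taken from any other value.
X          Set I: x = (X - 3)    x²    Set II: x' = (X - 2)    x'²
1          -2                    4     -1                      1
2          -1                    1     0                       0
3          0                     0     +1                      1
4          +1                    1     +2                      4
5          +2                    4     +3                      9
ΣX = 15                          Σx² = 10                      Σx'² = 15
$\bar{X} = \frac{\Sigma X}{N} = \frac{15}{5}$ = 3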
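4. If any two of the three values of the formula $\bar{X} = \frac{\Sigma X}{N}$ are known, the third can be calculated, since $\Sigma X = N\bar{X}$.
X      Replaced by X̄
10     30
20     30
30     30
40     30
50     30
ΣX = 150
$\bar{X} = \frac{150}{5}$ = 30
$N\bar{X} = \Sigma X$ : 5 × 30 = 150
150 = 150.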
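This property has great utility in the calculation of wage bills, e.g., when the average wage is Rs 120, the total wage bill is Rs 120 multiplied by the number of workers.
The relation $N\bar{X} = \Sigma X$ can be easily used for correcting the value of a mean that was calculated from wrongly read items.
Illustration 17. The arithmetic mean of a series of 40 items was calculated as Rs 265. But while calculating it, an item Rs 115 was misread as Rs 150. Find the correct arithmetic mean.
Solution.
Since $\bar{X} = \frac{\Sigma X}{N}$, we have $\Sigma X = N\bar{X}$
Here, X̄ = 265, N = 40
ΣX = 40 × 265 = 10600
The calculated ΣX, i.e., 10600, is wrong as an item was misread. Let us get the correct ΣX by subtracting the incorrect item and adding the correct item.
Incorrect ΣX = 10600
Less incorrect item : 10600 - 150 = 10450
Add correct item : 10450 + 115 = 10565
Correct $\bar{X} = \frac{10565}{40}$ = Rs 264.12.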
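The correction procedure is the same whatever the figures: rebuild the total from the wrong mean, swap the misread item for the correct one, and divide again. A minimal sketch, where the function name is illustrative:

```python
def corrected_mean(n, wrong_mean, wrong_item, correct_item):
    """Correct a mean that was computed with one misread item."""
    wrong_total = n * wrong_mean
    correct_total = wrong_total - wrong_item + correct_item
    return correct_total / n

print(corrected_mean(40, 265, 150, 115))  # 264.125
```

The exact value is 264.125, which the text reports as Rs 264.12.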
Solution. Since
Here,
N NX
-■"''e. of observations,
Calculated ZX i c Tnn values. Let us cor^t M u "" "" observations are Rs 5 less ,1, ^
Incorrect ^
es .
N " - — = Rs 105
1+2+3+4+S EX IS
Alisi
ll
requ ]
__^
IX = IS X = 3
X+2
25 5
X~2
-1 0 +1 +2 +3
Xx2
(X) Column = 2) = 1
^ = 6 Multiplied 2 = (3 X 2) = 6
51
2 4 6 8 10
30 6
We a
JVIaifc Studo
Stud^
165
Marks
No. of Students
58
5
Lower limit of the first class and upp. W o^^ j^^r^^L^niS^^ to be defined by marking an —and last
classes. Thus, first
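Example :
Marks           Class    Cumulative frequency    Class    Frequency
Less than 10    5-10     5                       5-10     5
Less than 15    5-15     13                      10-15    (13 - 5) = 8
Less than 20    5-20     16                      15-20    (16 - 13) = 3
Less than 25    5-25     20                      20-25    (20 - 16) = 4
Less than 30    5-30     25                      25-30    (25 - 20) = 5
Example :
Class    Cumulative frequency    Class    Frequency
5-30     25                      5-10     (25 - 20) = 5
10-30    20                      10-15    (20 - 12) = 8
15-30    12                      15-20    (12 - 9) = 3
20-30    9                       20-25    (9 - 5) = 4
25-30    5                       25-30    5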
Merits : The arithmetic mean is the most popularly used average because of the following merits :
1. It is simple to understand and easy to calculate.
2. It is based on all the observations of the series. Therefore, it is the most representative measure.
3. Its value is always definite. It is rigidly defined and not affected by personal bias.
4. The calculation of the arithmetic mean does not require any specific arrangement of data.
The average salary will be (Rs 20,000 + Rs 5,500 + Rs 4,500 + Rs 2,000)/4 = Rs 8,000 per month. The average so calculated is not representative. It is affected by an extreme value of Rs 20,000 paid to the General Manager.
4. The arithmetic mean can be a value that does not exist in the series at all, e.g., the average of 4, 8 and 9 is (4 + 8 + 9)/3 = 7, which is not an item of the series.
5. The arithmetic mean gives more importance to the bigger items and less importance to the smaller items.
1. Meaning
arithmetic mean gives equal importance tt/aif the ^^^^^ fact, thL are number
2. WeStled mean is used for comparison of the results of two or more un,verstt,es
or boards. ,.u^
IWX
Xw =
where.
Xw = Weighted Arithmetic Mean W= Weights X = The variables Steps. (/) Multiply weights by X and
obtain WX
Solution.
Workers : Man, Woman, Child
Wages per hour (Rs) X : 8, 6, 4
ΣX = 18
$\bar{X} = \frac{\Sigma X}{N} = \frac{8 + 6 + 4}{3} = \frac{18}{3}$ = Rs 6 per hour.
terSn' - - — -- ana
•n
-—-—--
169
$\bar{X}_w = \frac{\Sigma WX}{\Sigma W} = \frac{900}{150}$ = Rs 6
Weighted Mean is Rs 6.
Thus, the weighted arithmetic mean will be equal to the simple arithmetic mean when all items are given equal weights.
X̄w = X̄ : Rs 6 = Rs 6.
Suppose men, women and child workers are 10, 20 and 50 respectively; then our answer would be different.
Type     Wages (Rs) X    Workers W    WX
Man      8               10           80
Woman    6               20           120
Child    4               50           200
                         ΣW = 80      ΣWX = 400
$\bar{X}_w = \frac{\Sigma WX}{\Sigma W} = \frac{400}{80}$ = Rs 5
Thus, the weighted arithmetic mean will be less than the simple arithmetic mean when items of small values are given greater weights and items of big values are given less weights.
X̄w < X̄ : Rs 5 < Rs 6
However, in the absence of given weights, assumed weights can be assigned to the
But, normally they are not equal. Suppose men, women and child workers are 50, 20 and 10 respectively, then our answer would be different.
Type     Wages (Rs) X    Workers W    WX
Man      8               50           400
Woman    6               20           120
Child    4               10           40
                         ΣW = 80      ΣWX = 560
$\bar{X}_w = \frac{\Sigma WX}{\Sigma W} = \frac{560}{80}$ = Rs 7
Weighted Mean is Rs 7.
Thus, the weighted arithmetic mean will be greater than the simple arithmetic mean when items of small values are given less weights and items of big values are given more weights.
X̄w > X̄ : Rs 7 > Rs 6
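The three cases (equal weights, larger weights on small values, larger weights on big values) can be reproduced with one small function. This is a sketch using the wage rates of Rs 8, 6 and 4 from the example; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of WX divided by sum of W."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

wages = [8, 6, 4]  # man, woman, child (Rs per hour)

print(weighted_mean(wages, [50, 50, 50]))  # 6.0 -> equals the simple mean
print(weighted_mean(wages, [10, 20, 50]))  # 5.0 -> less than the simple mean
print(weighted_mean(wages, [50, 20, 10]))  # 7.0 -> greater than the simple mean
```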
Illustration 20. Calculate Weighted Mean by weighting each price by the quantity consumed.
Solution.
ΣW = 17.89
ΣWX = 403.436
$\bar{X}_w = \frac{\Sigma WX}{\Sigma W} = \frac{403.436}{17.89}$ = 22.55
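Illustration 21. From the results of the two schools A and B given below, state which of them is better.
Class    School A: Appeared    Passed    School B: Appeared    Passed
IX       30                    25        100                   80
X        50                    45        120                   95
XI       200                   150       100                   70
XII      120                   75        80                    50
Total    400                   295       400                   295
Solution. Use the Weighted Arithmetic Mean after obtaining homogeneous figures by converting into percentages.
School A
Class    Appeared (W)    Passed    Pass % (X)    WX
IX       30              25        83.33         2499
X        50              45        90            4500
XI       200             150       75            15000
XII      120             75        62.50         7500
         ΣW = 400                                ΣWX = 29499
School A : $\bar{X}_w = \frac{\Sigma WX}{\Sigma W} = \frac{29499}{400}$ = 73.75
School B
Class    Appeared (W)    Passed    Pass % (X)    WX
IX       100             80        80            8000
X        120             95        79.2          9504
XI       100             70        70            7000
XII      80              50        62.5          5000
         ΣW = 400                                ΣWX = 29504
School B : $\bar{X}_w = \frac{\Sigma WX}{\Sigma W} = \frac{29504}{400}$ = 73.76
On these rounded figures, School B is marginally better.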
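A quick cross-check, offered as a sketch rather than as part of the book's solution: when the weights are the numbers appeared and the values are the pass percentages, the weighted mean reduces to total passed divided by total appeared. Computed without rounding the class percentages, both schools come out at 73.75 per cent, which suggests that the gap of 0.01 arises from rounding.

```python
def weighted_pass_rate(appeared, passed):
    """Weighted mean of class pass percentages, weights = numbers appeared."""
    percentages = [p / a * 100 for a, p in zip(appeared, passed)]
    return sum(a * x for a, x in zip(appeared, percentages)) / sum(appeared)

school_a = weighted_pass_rate([30, 50, 200, 120], [25, 45, 150, 75])
school_b = weighted_pass_rate([100, 120, 100, 80], [80, 95, 70, 50])

print(round(school_a, 2), round(school_b, 2))  # 73.75 73.75
print(295 / 400 * 100)                          # 73.75 (total passed / total appeared)
```

On the unrounded figures the two schools perform equally well overall, although their class-wise results differ.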
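Subject weights (W) : 4 3 2 1 (the subject with weight 1 is Business Studies)
Marks of A (X1) : 63 65 58 70
Marks of B (X2) : 60 64 56 80
Marks of C (X3) : 65 70 63 52
If the candidate getting the highest marks is to be awarded the scholarship, who should get it?
Solution.
ΣW = 10, ΣWX1 = 633, ΣWX2 = 624, ΣWX3 = 648
Weighted mean of A = 633/10 = 63.3
Weighted mean of B = 624/10 = 62.4
Weighted mean of C = 648/10 = 64.8
C has the highest weighted mean and should get the scholarship.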
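Mathematical Properties of Arithmetic Mean
Here, X - X̄ = x. Mathematically :
(1) Σ(X - X̄) = 0, i.e., Σx = 0; Σf(X - X̄) = 0, i.e., Σfx = 0
(2) Combined mean : $\bar{X}_{1,2} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$. Similarly, $\bar{X}_{1,2,3} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3}$
(3) Σ(X - X̄)² is the least, i.e., Σx² is minimum
(4) $N\bar{X} = \Sigma X$
Weighted Mean : $\bar{X}_w = \frac{\Sigma WX}{\Sigma W}$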
Abbreviations
variable from
zx- Sum of all the items of the U= Sum of the deviations of X variabic i
E/X = Sum of the product of variable (X) X-variable from assumed mean
lfm = Sum of the products of mid-points Lfd = Sum of the product of frequencies
= Combined mean of two groups. (X- Xf = X', i.e., square of the deviations
second group.
EXERCISES
Questions :
2. What are the functions of an average? Discuss the characteristics of a good average. Which of the averages possesses most of these characteristics?
3. What is meant by 'Central Tendency'? Discuss the essentials of a measure of central tendency.
6. Why is arithmetic mean the most commonly used measure of central tendency?
(a) Marks obtained by 10 students : 30, 62, 47, 25, 52, 39, 56, 66, 12, 24
(b) Income of 7 families (in Rs) : 550, 490, 670, 890, 435, 590, 575. Also show Σ(X - X̄) = 0.
5 6 7 8 9
Frequency : 6 12 15 28 20 14
irmx ..^,. ^
65 15
10 5
[X= 7.06]
1 50 11
2 50 13
3 55 14
60 16
5 65 16
7 65 15
8 60 14
9 60 13
10 50 13
Values Frequency
60 54
77 115
Calculate arithmetic mean of the followmg data Profit (in Rs) : 0-10 10-20 20-30
No of shops : 12 18 2/
30-40 20
175
40-50 50-60
16
[ X = Rs 30.45]
population of U.K.
Age Group
0-5 5-10 10-15 15-20 20-25 25-30 30^0 40-50 50-60 60-65
(in lakhs)
214 258 222 157 145 161 267 184 120 100
18
19
20 18 16 14 27 25 19 17
8.
[Average Age India = 25.25 years and UK = 29.404 years]
Calculate simple and weighted arithmetic averages of the following items :
68 1
124 9
85 46 128 14
101 31 143 2
102 1
146 4
108 11 151
110
153 5
112 23 172 2
113 17
Less than 10 Less than 20 Less than 30 Lesi than 40 Less than 50 -:- 5 15 55 75 100
_ A = OU iViaiivai
Also get 5:f(X-X) = 0
176
11.
12.
13.
14.
15.
are Rs 27^ and 225 respectively, find out the arithmetic mean of the salaries of the employees of the establishment as a whole.
of 200 students were 52.32. Find out the mean of marks obtained by both the groups of students taken together.
The mean marks of 1 >0 students were found to be 40. Later on it was discovered
The mean weight of 25 boys in group A of a class is 61 kg and the mean weight of 35 boys in group B of the same class is 58 kg. Find the mean weight of 60 boys.
Marks Below : 10 20 30
No. of Students : 5 9 17
40
29
50 45
60 60
A 75 50
B 60 60
C 55 50
average of 31 marks. What were the average marks of the other students?
taken as 297 and 165 instead of 197 and 185. Find the correct mean.
19. Find the average wage of a worker from the following data :
Above : 300 310 320 330 340 350 360 370
No. of workers : 650 500 425 375 300 275 250 100
[$\bar{X}$ = Rs 339.23]
-40 to -30 10
-30 to -20 28
-20 to -10 30
-10 to 0 42
0 to 10 65
10 to 20 180
20 to 30 10
[$\bar{X}$ = 4.29 °C]
21. A candidate obtains the following percentage of marks : Sanskrit 75, Mathematics 84, Economics 56, English 78, Politics 57, History 54, Geography 47. It is agreed to give double weights to marks in English, Mathematics and Sanskrit. What is the weighted and simple arithmetic mean? [$\bar{X}_w$ = 68.8, $\bar{X}$ = 64.43 Marks]
22. Calculate weighted mean by weighting each price by the quantity consumed:
Food items
Flour
Ghee
Sugar
Potato
Oil
Quantity Consumed
500 kg 200 kg 30 kg 15 kg 40 kg
[Xw = Rs 6.35]
23. Comment on the performance of the students of three universities given below using weighted
mean :
[Weighted Mean
24. A distribution consists of three components with total frequencies of 200, 250 and 300 having means of 25, 10 and 15 respectively. Find out the mean of the combined distribution.
Chapter 9
(a) Median,
(c) Mode.
median
1. Definition
2. Calculation of Median
Definition
According to A.L. Bowley, "If the number of the group are ranked in order according to the measurement under consideration then the measurement of the number most nearly one half is the median."
According to Secrist, "Median of a series is the value of the item actual or estimated when a series is arranged in order of magnitude which divides the distribution into two parts."
For example, we have the following data of heights of 7 students in a class.
Name of students : Anurag Deven Suresh Mayoor Atul Satish Himankar
Height (cm) : 151 140 149 142 147 144 145
The first and most important rule for obtaining the median is that the data should be arranged in an ascending (increasing) or descending (decreasing) order. This arrangement facilitates locating the central position so that the series may be divided into two parts, one less than the central value and the other more than the central value.
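Arranged in ascending order :
Students : Deven Mayoor Satish Himankar Atul Suresh Anurag
Height (cm) : 140 142 144 145 147 149 151
The height of the 4th student, 145 cm, is the central value.
Arranged in descending order :
Students : Anurag Suresh Atul Himankar Satish Mayoor Deven
Height (cm) : 151 149 147 145 144 142 140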
From this ordering also we observe that 145 cm or value of the 4th item is the median.
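The positional rule can be checked with a short Python sketch. It assumes the seven heights quoted in this section (plus Rajesh's 152 cm from the even-numbered case); the function name `median_individual` is illustrative and not from the text.

```python
def median_individual(values):
    """Median of an individual series via the ((N + 1) / 2)th item rule."""
    ordered = sorted(values)            # step 1: arrange in ascending order
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the central item
    lower = ordered[int(position) - 1]  # item at the whole-number part
    if position.is_integer():           # odd N: a single central item
        return lower
    upper = ordered[int(position)]      # even N: average the two middle items
    return (lower + upper) / 2


heights = [151, 140, 149, 142, 147, 144, 145]
print(median_individual(heights))           # 145
print(median_individual(heights + [152]))   # 146.0
```

With an odd number of items the median is an actual item of the series; with an even number it is an estimate lying between the two middle items.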
Calculation of median
Median is the central positional average of given data. That is, median has a position more or less at the
centre of the values and it divides the series roughly into equal parts.
Solution.
Name of students : Anurag Deven Suresh Mayoor Atul Satish Himankar
Height (cm) : 151 140 149 142 147 144 145
Arranged in ascending order :
Name of students : Deven Mayoor Satish Himankar Atul Suresh Anurag
Height (cm) : 140 142 144 145 147 149 151
Steps :
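1. The above data must be arranged in either ascending or descending order to get the value of the median. Here the data are arranged in ascending order.
2. Locate the central item :
Me = Size of $\left(\frac{N+1}{2}\right)^{th}$ item = Size of $\left(\frac{7+1}{2}\right)^{th}$ item = Size of 4th item = 145 cm
When the number of items in a series is an odd number, the central item, i.e., the $\left(\frac{N+1}{2}\right)^{th}$ item, is a single item of the series. But when the number of items in a series is even (2, 4, 6, 8, 10, etc.), the central item, i.e., the $\left(\frac{N+1}{2}\right)^{th}$ item, falls between two middle items.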
Arranging the data in ascending order including the height of Rajesh, we get
Name of students : Deven Mayoor Satish Himankar Atul Suresh Anurag Rajesh
Height (cm) : 140 142 144 145 147 149 151 152
Me = Size of $\left(\frac{N+1}{2}\right)^{th}$ item = Size of $\left(\frac{8+1}{2}\right)^{th}$ item = Size of 4.5th item
Median is estimated by finding the arithmetic mean of the two middle values, i.e., adding the heights of Himankar and Atul and dividing by two.
Size of 4.5th item = $\frac{\text{4th item} + \text{5th item}}{2} = \frac{145 + 147}{2} = \frac{292}{2} = 146$ cm
1 17 7 41 13 11
2 32 8 32 14 15
3 35 9 11 15 35
4 33 10 18 16 23
5 15 11 20 17 38
6 21 12 22 18 12
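Me = Size of $\left(\frac{N+1}{2}\right)^{th}$ item = Size of $\left(\frac{18+1}{2}\right)^{th}$ item = 9.5th item
The value of 9.5th item = $\frac{\text{Value of the 9th item} + \text{Value of the 10th item}}{2} = \frac{21 + 22}{2} = 21.5$
Hence Median = 21.5
(b) Discrete Series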
Marks : 10 20 30 40 50 60 70 80
No. of Students : 2 8 16 26 20 16 7 4
Solution.
Marks No. of Students (f) Cumulative frequency (c.f.)
10 2 2
20 8 10
30 16 26
40 26 52
50 20 72
60 16 88
70 7 95
80 4 99
N = 99
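Steps :
4. Median is located at the size of the item in whose cumulative frequency the value of the $\left(\frac{N+1}{2}\right)^{th}$ item falls.
Median = Size of $\left(\frac{N+1}{2}\right)^{th}$ item = Size of $\left(\frac{99+1}{2}\right)^{th}$ item = 50th item
The 50th item falls in the cumulative frequency 52, against which the value is 40. Hence Median = 40 marks.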
Illustration 5. Find out the value of median from the following data :
Daily wages (in Rs) : 100 50 70 110 80
Number of Workers : 15 20 15 18 12
Solution.
Daily wages (in Rs) Number of Workers (f) Cumulative frequency (c.f.)
50 20 20
70 15 35
80 12 47
100 15 62
110 18 80
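The cumulative-frequency lookup for a discrete series can be sketched in Python as follows. This is a minimal sketch, not the book's procedure verbatim: the helper names are illustrative, and averaging the two neighbouring items for a fractional position is an assumption carried over from the individual-series rule.

```python
import math
from itertools import accumulate


def kth_item(pairs, cum, k):
    """Value of the k-th item: first value whose cumulative frequency reaches k."""
    for (value, _), cf in zip(pairs, cum):
        if cf >= k:
            return value
    raise ValueError("k exceeds the number of items")


def discrete_median(values, freqs):
    pairs = sorted(zip(values, freqs))            # arrange values in ascending order
    cum = list(accumulate(f for _, f in pairs))   # 'less than' cumulative frequencies
    position = (cum[-1] + 1) / 2                  # the ((N + 1) / 2)th item
    low = kth_item(pairs, cum, math.floor(position))
    high = kth_item(pairs, cum, math.ceil(position))
    return (low + high) / 2


wages = [100, 50, 70, 110, 80]
workers = [15, 20, 15, 18, 12]
print(discrete_median(wages, workers))   # 80.0
```

Here N = 80, so the 40.5th item is wanted; both the 40th and the 41st items fall within the cumulative frequency 47, whose value is Rs 80.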
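Me = Size of $\left(\frac{N+1}{2}\right)^{th}$ item = Size of $\left(\frac{80+1}{2}\right)^{th}$ or 40.5th item
All items from the 36th up to the 47th have a value of 80. Thus the median value would be Rs 80.
(c) Continuous Series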
Illustration 6. The size of land holdings of 380 families in a village is given below. Find the median size of
land holdings.
Size of Land Holdings (in acres) No. of families
0-100 40
100-200 89
200-300 148
300-400 64
400-500 39
Solution.
Size of Land Holdings (in acres) No. of families (f) Less than cumulative frequencies
0-100 40 40
100-200 89 129
200-300 148 277
300-400 64 341
400-500 39 380
Steps :
3. Locate the median group in the cumulative frequency column where the size of the $\left(\frac{N}{2}\right)^{th}$ item falls.
4. Apply the following formula to calculate the median from the located group :
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
where $l_1$ = lower limit of the median class, c.f. = cumulative frequency of the class preceding the median class, f = frequency of the median group, i = class interval of the median group.
Me = size of $\left(\frac{N}{2}\right)^{th}$ item = size of $\left(\frac{380}{2}\right)^{th}$ item = 190th item, which lies in the group 200-300.
$Me = 200 + \dfrac{190 - 129}{148} \times 100 = 200 + \dfrac{61 \times 100}{148} = 200 + 41.216 = 241.216$
∴ Median size of land holding = 241.22 acres (i.e., 50% of the families have less than or equal to 241.22 acres of land holdings and 50% of the families have more than or equal to 241.22 acres of land holdings).
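The interpolation formula for a continuous series can be sketched as a small Python function. The name `grouped_median` and the representation of classes as (lower, upper) pairs are assumptions made for illustration; the data are the land holdings of Illustration 6.

```python
def grouped_median(classes, freqs):
    """Median of a continuous series; classes are (lower, upper) limits in ascending order."""
    target = sum(freqs) / 2          # the (N / 2)th item
    cf_before = 0                    # c.f. of the classes preceding the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= target:  # the median class has been reached
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("empty distribution")


classes = [(0, 100), (100, 200), (200, 300), (300, 400), (400, 500)]
families = [40, 89, 148, 64, 39]
print(round(grouped_median(classes, families), 2))   # 241.22
```

The function assumes, as the formula does, that the items of the median class are spread evenly over the class interval.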
Age (in years) : 55-60 50-55 45-50 40-45 35-40 30-35 25-30 20-25 Total
Number of Persons (f) : 7 13 15 20 30 33 28 14 160
Note : If the given question is in descending order of values, then before solving the question the data are required to be arranged in ascending order to calculate less than cumulative frequencies.
Solution. This question has been solved below after arranging the series in ascending order.
Age in years (Ascending order) Number of Persons (f) Cumulative frequency (c.f.)
20-25 14 14
25-30 28 42
30-35 33 75
35-40 30 105
40-45 20 125
45-50 15 140
50-55 13 153
55-60 7 160
In the above example median is the value of the $\left(\frac{N}{2}\right)^{th}$ or $\left(\frac{160}{2}\right)^{th}$ or 80th item, which lies in the 35-40 class interval.
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$= 35 + \dfrac{80 - 75}{30} \times 5 = 35 + \dfrac{5 \times 5}{30} = 35 + 0.83 = 35.83$ years
Solution. If the data are given in the form of a cumulative series, they have to be converted into a simple series in order to find out the frequency of the median class, which is needed in the calculation of the median. Once that is done, the rest of the procedure is the same as in any other continuous series.
Class Frequency (f) Cumulative frequency (c.f.)
0-10 4 4
10-20 12 16
20-30 24 40
30-40 36 76
40-50 20 96
50-60 16 112
60-70 8 120
70-80 5 125
Middle item is the $\left(\frac{125}{2}\right)^{th}$ or 62.5th item, which lies in the group 30-40.
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$= 30 + \dfrac{62.5 - 40}{36} \times 10 = 30 + \dfrac{22.5 \times 10}{36} = 30 + 6.25 = 36.25$
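Converting a cumulative series into a simple series is only a matter of successive differences, as the following Python sketch shows. The function names are illustrative. The 'less than' figures are those of the table above; the 'more than' figures (165, 123, 98, 40 for the lower limits 10, 20, 30, 40) are an assumption, rebuilt from the frequency table of the next illustration.

```python
def less_than_to_frequencies(cum):
    """Class frequencies from 'less than' cumulative frequencies (ascending limits)."""
    previous = [0] + cum[:-1]
    return [c - p for c, p in zip(cum, previous)]


def more_than_to_frequencies(cum):
    """Class frequencies from 'more than' cumulative frequencies (ascending limits)."""
    following = cum[1:] + [0]
    return [c - n for c, n in zip(cum, following)]


print(less_than_to_frequencies([4, 16, 40, 76, 96, 112, 120, 125]))
# [4, 12, 24, 36, 20, 16, 8, 5]
print(more_than_to_frequencies([165, 123, 98, 40]))
# [42, 25, 58, 40]
```

Once the class frequencies are recovered, the median is found from the ascending series exactly as in any other continuous series.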
More than 50 0
More than 40 40
More than 30 98
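Solution. The cumulative frequency table is of 'more than' type. In such cases the data have to be converted into a simple continuous series and the median is calculated from the series in ascending order.
Class Frequency (f) Cumulative frequency (c.f.)
10-20 42 42
20-30 25 67
30-40 58 125
40-50 40 165
Middle item is the $\left(\frac{165}{2}\right)^{th}$ or 82.5th item, which lies in the group 30-40.
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$= 30 + \dfrac{82.5 - 67}{58} \times 10 = 30 + \dfrac{15.5 \times 10}{58} = 30 + 2.67 = 32.67$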
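Illustration 10. Compute median from the following data :
Mid-values : 115 125 135 145 155 165 175 185 195
Frequency : 6 25 48 72 116 60 38 22 3
Solution. The mid-values are converted into class intervals by finding the lower and upper limit of a class. The classes are thus 110-120, 120-130, ..., 190-200.
Class-intervals Frequency (f) Cumulative frequency (c.f.)
110-120 6 6
120-130 25 31
130-140 48 79
140-150 72 151
150-160 116 267
160-170 60 327
170-180 38 365
180-190 22 387
190-200 3 390
Total 390
Median is the size of the $\left(\frac{390}{2}\right)^{th}$ or 195th item, which lies in the class 150-160.
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i = 150 + \dfrac{195 - 151}{116} \times 10 = 150 + \dfrac{44}{116} \times 10 = 150 + 3.79 = 153.79$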
Illustration 11. If the arithmetic mean of the data given is 28, find (a) the missing frequency, and (b) the median of the series.
Profit per Retail shop : 0-10 10-20 20-30 30-40 40-50 50-60
Number of retail shops : 12 18 27 ? 17 6
Positional Average and Partition Values
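Solution.
(a) Calculation of missing frequency. Let the missing frequency of group 30-40 be X.
Profit per Retail shop Number of retail shops (f) Mid-point (m) fm
0-10 12 5 60
10-20 18 15 270
20-30 27 25 675
30-40 X 35 35X
40-50 17 45 765
50-60 6 55 330
$\Sigma f = 80 + X$, $\Sigma fm = 2100 + 35X$
$\bar{X} = \dfrac{\Sigma fm}{\Sigma f}$
or $28 = \dfrac{2100 + 35X}{80 + X}$
$2240 + 28X = 2100 + 35X$
$7X = 140$
$X = \dfrac{140}{7} = 20$
(b) Calculation of median.
Profit per Retail shop Number of retail shops (f) Cumulative frequency (c.f.)
0-10 12 12
10-20 18 30
20-30 27 57
30-40 20 77
40-50 17 94
50-60 6 100
N = 100
Median is the size of the $\left(\frac{100}{2}\right)^{th}$ or 50th item, which lies in the group 20-30.
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$= 20 + \dfrac{50 - 30}{27} \times 10 = 20 + \dfrac{200}{27} = 20 + 7.41 = 27.41$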
Illustration 12. In the frequency distribution of 100 families given below, the number of families corresponding to expenditure groups 20-40 and 60-80 are missing from the table. However, the median is known to be 50. Find the missing frequencies.
Expenditure : 0-20 20-40 40-60 60-80 80-100
No. of families : 14 ? 27 ? 15
Solution. Let the missing frequency of the group 20-40 be X and the missing frequency of the 60-80 group be Y.
14 + X + 27 + Y + 15 = 100
or X + Y = 100 - 14 - 27 - 15 or X + Y = 44
0-20 14 14
20-40 X 14 + X
40-60 27 41 + X
60-80 Y 41 + X + Y
80-100 15 100
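The middle item of the series, the $\left(\frac{100}{2}\right)^{th}$ or 50th item, lies in the interval 40-60 (given median = 50).
Now,
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$50 = 40 + \dfrac{50 - (14 + X)}{27} \times 20$
$50 - 40 = \dfrac{36 - X}{27} \times 20$
$36 - X = \dfrac{10 \times 27}{20} = 13.5$, so X = 22.5, i.e., 23 families in whole numbers.
X + Y = 44 or 23 + Y = 44, so Y = 21.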
2. The sum of the deviations of the items about the median, ignoring ± signs, will be less than the sum about any other point. For example :
X : 10 11 12 13 14
Deviations from median (12), ignoring signs : 2 1 0 1 2 = 6
The sum of the deviations taken from the median (12) is less than the sum of the deviations taken from any other point.
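This property is easy to verify numerically. The sketch below assumes the five values 10 to 14 (the last two are partly lost in this copy of the example, so they are an assumption) and compares the sum of absolute deviations taken from each of them.

```python
def sum_abs_dev(values, point):
    """Sum of deviations from `point`, ignoring signs."""
    return sum(abs(x - point) for x in values)


x = [10, 11, 12, 13, 14]
totals = {p: sum_abs_dev(x, p) for p in x}
print(totals)                        # {10: 10, 11: 7, 12: 6, 13: 7, 14: 10}
print(min(totals, key=totals.get))   # 12, the median
```

The smallest total, 6, is obtained when the deviations are taken from the median.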
Merits
2. It is well defined, as an ideal average should be, and it indicates the value of the middle item in the distribution.
3. It can be determined graphically, mean cannot be graphically determined.
4. It is proper average for qualitative data where items are not converted or measured but are
scored.
6. In the case of open-end distribution it is specially useful since only the position is to be known. It
is useful in a distribution of unequal classes.
Demerits
6. Interpolation by a formula is required to calculate median in continuous series. This requires the assumption that all the frequencies of the class interval are uniformly spread, which is not always true.
1. Definition
Definition
When we are required to divide a series into more than two parts, the dividing places are known as partition values. Suppose we have a piece of cloth 100 metres long and we have to cut it into 4 equal pieces; we will have to cut it at three places. Quartiles are those values which divide the series into four equal parts. For getting partition values the most important rule is that the values must be arranged in ascending order only. In the case of finding out the median, we can arrange the data either in ascending or in descending order, but here there is no choice: only ascending order is possible for calculating partition values (Quartiles).
For example, we have the following data of heights of 7 students in a class :
Name of students : Anurag Deven Suresh Mayoor Atul Satish Himankar
Height (cm) : 151 140 149 142 147 144 145
Therefore, for getting correct results, the data must be arranged in ascending order in all the cases.
While an average is representative of the whole series, quartiles are averages of parts of the series. For example, the first quartile is the average of the first half of the series and the third quartile is the average of the second half of the series.
Thus, quartiles are not averages like mean and median. They help us in understanding how various items are spread around the median. Therefore, the special use of partition values is to study the dispersion of items in relation to the median, that is, in understanding the composition of a series.
Name of students : Deven Mayoor Satish Himankar Atul Suresh Anurag
Height (cm) : 140 142 144 145 147 149 151
As we know, the median is the height of the fourth student, i.e., 145 cm. Now, suppose we have to calculate quartiles. By definition quartiles will divide a series into four equal parts and so the number of quartiles will be three. They are known as lower quartile, middle quartile and upper quartile. These are also called first, second and third quartiles.
The middle or second quartile ($Q_2$) is the central positional value of the data, i.e., median. The first or lower quartile ($Q_1$) is the central positional value of the lower half, and the third or upper quartile ($Q_3$) is the central positional value of the upper half of the data. In the above data, $Q_1$ = 142, $Q_2$ = 145 and $Q_3$ = 149.
It must be remembered that $Q_1$ is always less than $Q_2$ and $Q_3$ ($Q_1 < Q_2$ and $Q_3$) and the median falls between $Q_1$ and $Q_3$.
Illustration 13. From the following information of wages of 30 workers in a factory calculate median,
lower and upper quartile.
1 330 16 240
2 320 17 330
3 550 18 420
4 470 19 380
5 210 20 450
6 500 21 260
7 270 22 330
8 120 23 440
9 680 24 480
10 490 25 520 .
11 400 26 300
12 170 27 580
13 440 28 370
14 480 29 380
15 620 30 350
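Solution.
The wages are first arranged in ascending order :
S.No. Wages S.No. Wages
1 120 16 400
2 170 17 420
3 210 18 440
4 240 19 440
5 260 20 450
6 270 21 470
7 300 22 480
8 320 23 480
9 330 24 490
10 330 25 500
11 330 26 520
12 350 27 550
13 370 28 580
14 380 29 620
15 380 30 680
Median, lower quartile and upper quartile are the sizes of the $\left(\frac{N+1}{2}\right)^{th}$, $\left(\frac{N+1}{4}\right)^{th}$ and $3\left(\frac{N+1}{4}\right)^{th}$ items.
Median
Me = size of $\left(\frac{N+1}{2}\right)^{th}$ item = size of $\left(\frac{30+1}{2}\right)^{th}$ item = 15.5th item
$= \dfrac{380 + 400}{2} = 390$
∴ Median is Rs 390.
Lower Quartile
$Q_1$ = size of $\left(\frac{N+1}{4}\right)^{th}$ item = size of $\left(\frac{30+1}{4}\right)^{th}$ item = 7.75th item
= size of 7th item + .75 (size of 8th item - size of 7th item)
= 300 + .75 (320 - 300) = 300 + 15 = 315
∴ Lower Quartile is Rs 315.
Upper Quartile
$Q_3$ = size of $3\left(\frac{N+1}{4}\right)^{th}$ item = size of $3\left(\frac{30+1}{4}\right)^{th}$ item = 23.25th item
= size of 23rd item + .25 (size of 24th item - size of 23rd item)
= 480 + .25 (490 - 480) = 480 + 2.5 = 482.5
∴ Upper Quartile is Rs 482.5.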
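The quartile positions of Illustration 13 can be reproduced with a short Python sketch. The helper names are illustrative; the interpolation between neighbouring items follows the 7.75th-item calculation shown for the lower quartile.

```python
def positional_value(ordered, position):
    """Size of the item at a 1-based, possibly fractional, position."""
    whole = int(position)
    frac = position - whole
    value = ordered[whole - 1]
    if frac:                                   # interpolate towards the next item
        value += frac * (ordered[whole] - ordered[whole - 1])
    return value


def quartiles(values):
    """(Q1, median, Q3) as the sizes of the k(N + 1)/4th items, k = 1, 2, 3."""
    ordered = sorted(values)                   # ascending order is compulsory
    n = len(ordered)
    return tuple(positional_value(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


wages = [330, 320, 550, 470, 210, 500, 270, 120, 680, 490,
         400, 170, 440, 480, 620, 240, 330, 420, 380, 450,
         260, 330, 440, 480, 520, 300, 580, 370, 380, 350]
print(quartiles(wages))                                   # (315.0, 390.0, 482.5)
print(quartiles([151, 140, 149, 142, 147, 144, 145]))     # (142, 145, 149)
```

The second call uses the seven heights from the definition, where all three positions are whole numbers and no interpolation is needed.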
Illustration 14. Following are the different sizes and number of shoes in a shoe shop.
Size of Shoes : 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 10.5 11
No. of Shoes (f) : 4 8 12 15 20 35 50 40 20 15 24 12 5 3
Solution.
Steps
4.5 4 4
5 8 12
5.5 12 24
6 15 39
6.5 20 59
7 35 94
7.5 50 144
8 40 184
8.5 20 204
9 15 219
9.5 24 243
10 12 255
10.5 5 260
11 3 263
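The cumulative table above is all that is needed to read off the median and quartiles of a discrete series. A minimal Python sketch follows; the helper names are illustrative, and for a fractional position it averages the two neighbouring items, which is a simplifying assumption (in a discrete series both usually carry the same value).

```python
import math
from itertools import accumulate


def item_size(values, cum, k):
    """Size of the k-th item: first value whose cumulative frequency reaches k."""
    for value, cf in zip(values, cum):
        if cf >= k:
            return value
    raise ValueError("k exceeds the number of items")


def discrete_partition(values, freqs, numerator, denominator):
    """Size of the numerator * (N + 1) / denominator-th item; values must be ascending."""
    cum = list(accumulate(freqs))
    position = numerator * (cum[-1] + 1) / denominator
    low = item_size(values, cum, math.floor(position))
    high = item_size(values, cum, math.ceil(position))
    return (low + high) / 2


sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
shoes = [4, 8, 12, 15, 20, 35, 50, 40, 20, 15, 24, 12, 5, 3]
print(discrete_partition(sizes, shoes, 1, 4))   # 7.0  (66th item)
print(discrete_partition(sizes, shoes, 1, 2))   # 7.5  (132nd item)
print(discrete_partition(sizes, shoes, 3, 4))   # 8.5  (198th item)
```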
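Median, first quartile and third quartile are the sizes of the $\left(\frac{N+1}{2}\right)^{th}$, $\left(\frac{N+1}{4}\right)^{th}$ and $3\left(\frac{N+1}{4}\right)^{th}$ items.
Median
Me = size of $\left(\frac{N+1}{2}\right)^{th}$ item = size of $\left(\frac{263+1}{2}\right)^{th}$ item = size of 132nd item
Median = 7.5 size of shoes.
First Quartile
$Q_1$ = size of $\left(\frac{N+1}{4}\right)^{th}$ item = size of $\left(\frac{263+1}{4}\right)^{th}$ item = size of 66th item
First Quartile = 7 size of shoes.
Third Quartile
$Q_3$ = size of $3\left(\frac{N+1}{4}\right)^{th}$ item = size of $3\left(\frac{263+1}{4}\right)^{th}$ item = size of 198th item
Third Quartile = 8.5 size of shoes.
Illustration 15. Calculate Median, First Quartile and Third Quartile from the following data :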
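Income (in Rupees) : 800 1000 1200 1400 1600 1800
Frequency (f) : 16 24 26 30 20 5
Solution.
Income (in Rs) Frequency (f) Cumulative frequency (c.f.)
800 16 16
1000 24 40
1200 26 66
1400 30 96
1600 20 116
1800 5 121
Median :
Me = size of $\left(\frac{N+1}{2}\right)^{th}$ item = size of $\left(\frac{121+1}{2}\right)^{th}$ item = size of 61st item
Median = Rs 1200
First Quartile
$Q_1$ = size of $\left(\frac{N+1}{4}\right)^{th}$ item = size of $\left(\frac{121+1}{4}\right)^{th}$ item = size of 30.5th item
First Quartile = Rs 1000
Third Quartile
$Q_3$ = size of $3\left(\frac{N+1}{4}\right)^{th}$ item = size of $3\left(\frac{121+1}{4}\right)^{th}$ item = size of 91.5th item
Third Quartile = Rs 1400
Thus, Median = Rs 1200, $Q_1$ = Rs 1000 and $Q_3$ = Rs 1400.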
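Marks : 30-35 35-40 40-45 45-50 50-55 55-60 60-65
Students : 14 16 18 23 18 8 3
Solution.
Marks Students (f) c.f.
30-35 14 14
35-40 16 30
40-45 18 48
45-50 23 71
50-55 18 89
55-60 8 97
60-65 3 100
Steps :
2. Median, first quartile and third quartile items are located by finding out the $\left(\frac{N}{2}\right)^{th}$, $\left(\frac{N}{4}\right)^{th}$ and $3\left(\frac{N}{4}\right)^{th}$ items.
3. Locate the median group, first quartile and third quartile group by cumulative frequency, i.e., the classes in which the $\left(\frac{N}{2}\right)^{th}$, $\left(\frac{N}{4}\right)^{th}$ and $3\left(\frac{N}{4}\right)^{th}$ items fall.
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$Q_1 = l_1 + \dfrac{\frac{N}{4} - c.f.}{f} \times i$
$Q_3 = l_1 + \dfrac{3\left(\frac{N}{4}\right) - c.f.}{f} \times i$
Median
$\left(\frac{N}{2}\right)^{th}$ item = $\left(\frac{100}{2}\right)^{th}$ item = 50th item, which lies in the group 45-50.
$Me = 45 + \dfrac{50 - 48}{23} \times 5 = 45 + \dfrac{2 \times 5}{23} = 45.43$
First Quartile
$\left(\frac{N}{4}\right)^{th}$ item = $\left(\frac{100}{4}\right)^{th}$ item = 25th item, which lies in the group 35-40.
$Q_1 = 35 + \dfrac{25 - 14}{16} \times 5 = 35 + \dfrac{11 \times 5}{16} = 38.44$
Third Quartile
$3\left(\frac{N}{4}\right)^{th}$ item = $3\left(\frac{100}{4}\right)^{th}$ item = 75th item, which lies in the group 50-55.
$Q_3 = 50 + \dfrac{75 - 71}{18} \times 5 = 50 + \dfrac{4 \times 5}{18} = 51.11$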
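The three interpolation formulas differ only in the fraction of N that is located, so one function can serve for the median and both quartiles. This is a sketch under that observation; `grouped_partition` and its `fraction` parameter are illustrative names, and the data are the marks of 100 students used above.

```python
def grouped_partition(classes, freqs, fraction):
    """Partition value of a continuous series.

    classes: (lower, upper) limits in ascending order.
    fraction: 0.5 for the median, 0.25 for Q1, 0.75 for Q3.
    """
    target = fraction * sum(freqs)   # the (N/2)th, (N/4)th or 3(N/4)th item
    cf_before = 0                    # c.f. of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= target:
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("empty distribution")


classes = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60), (60, 65)]
students = [14, 16, 18, 23, 18, 8, 3]
for name, fraction in (("Q1", 0.25), ("Me", 0.5), ("Q3", 0.75)):
    print(name, round(grouped_partition(classes, students, fraction), 2))
# Q1 38.44
# Me 45.43
# Q3 51.11
```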
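Illustration 17. Calculate the Median and $Q_3$ using the following data :
Mid-points (marks) : 5 15 25 35 45 55 65 75
No. of students : 3 10 17 7 6 4 2 1
Solution. The mid-points are first converted into class intervals.
Marks f c.f.
0-10 3 3
10-20 10 13
20-30 17 30
30-40 7 37
40-50 6 43
50-60 4 47
60-70 2 49
70-80 1 50
Median
Applying formula, we get
Median = size of $\left(\frac{N}{2}\right)^{th}$ item = $\left(\frac{50}{2}\right)^{th}$ item = 25th item, which lies in the class 20-30.
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$Me = 20 + \dfrac{25 - 13}{17} \times 10 = 20 + \dfrac{12 \times 10}{17} = 27.06$
Third Quartile
$Q_3$ = size of $3\left(\frac{N}{4}\right)^{th}$ item = $3\left(\frac{50}{4}\right)^{th}$ item = 37.5th item, which lies in the class 40-50.
$Q_3 = l_1 + \dfrac{3\left(\frac{N}{4}\right) - c.f.}{f} \times i$, where $l_1$ = 40,
$Q_3 = 40 + \dfrac{37.5 - 37}{6} \times 10 = 40 + \dfrac{5}{6} = 40.83$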
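Illustration 18. Calculate the Median and Quartiles for the following :
Marks (below) : 10 20 30 40 50 60 70 80
No. of Students : 15 35 60 84 96 127 198 250
Solution. Before calculating Median and Quartiles, first we convert the given cumulative frequencies into class frequencies :
Marks No. of students (f) c.f.
0-10 15 15
10-20 20 35
20-30 25 60
30-40 24 84
40-50 12 96
50-60 31 127
60-70 71 198
70-80 52 250
Total 250
Median = size of $\left(\frac{N}{2}\right)^{th}$ item = $\left(\frac{250}{2}\right)^{th}$ item = 125th item
Hence, median lies in class 50-60. Applying suitable formula to get median :
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$, where $l_1$ = 50,
$Me = 50 + \dfrac{125 - 96}{31} \times 10 = 50 + \dfrac{29 \times 10}{31} = 59.35$
$Q_1$ = size of $\left(\frac{N}{4}\right)^{th}$ item = $\left(\frac{250}{4}\right)^{th}$ item = 62.5th item, which lies in class 30-40.
$Q_1 = l_1 + \dfrac{\frac{N}{4} - c.f.}{f} \times i = 30 + \dfrac{62.5 - 60}{24} \times 10 = 30 + \dfrac{25}{24} = 31.04$
$Q_3$ = size of $3\left(\frac{N}{4}\right)^{th}$ item = $3\left(\frac{250}{4}\right)^{th}$ item = 187.5th item, which lies in class 60-70.
$Q_3 = l_1 + \dfrac{3\left(\frac{N}{4}\right) - c.f.}{f} \times i = 60 + \dfrac{187.5 - 127}{71} \times 10 = 60 + 8.52 = 68.52$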
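Illustration 19. The following series relates to the daily income of workers employed in a firm. Compute (a) the highest daily income of the lowest 50% of workers, (b) the minimum daily income earned by the top 25% of workers and (c) the maximum daily income earned by the lowest 25% of workers.
1. As the data are of inclusive class intervals, we are required to convert the classes into class boundaries.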
[Figure: the distribution of workers by daily income, marked at three points: $Q_1$, the maximum daily income of the lowest 25% of workers; the median, the highest daily income of the lowest 50% of workers; and $Q_3$, the minimum daily income of the top 25% of workers.]
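f : 5 10 15 20 10 5
c.f. : 5 15 30 50 60 65
(a) Computation of highest daily income of lowest 50% workers (Median)
Me = size of $\left(\frac{N}{2}\right)^{th}$ item or $\left(\frac{65}{2}\right)^{th}$ item = 32.5th item
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$= 24.5 + \dfrac{32.5 - 30}{20} \times 5 = 24.5 + \dfrac{2.5 \times 5}{20} = 24.5 + 0.625 = 25.125$
∴ Highest daily income of lowest 50% workers is Rs 25.13.
(b) Computation of minimum daily income earned by top 25% workers ($Q_3$)
$Q_3$ = Value of $3\left(\frac{N}{4}\right)^{th}$ item = $\left(\frac{3 \times 65}{4}\right)^{th}$ item = 48.75th value
$Q_3 = l_1 + \dfrac{3\left(\frac{N}{4}\right) - c.f.}{f} \times i$
$= 24.5 + \dfrac{48.75 - 30}{20} \times 5 = 24.5 + \dfrac{18.75 \times 5}{20} = 24.5 + 4.687 = 29.187$
Minimum daily income earned by top 25% workers is Rs 29.19.
(c) Computation of maximum daily income earned by lowest 25% workers ($Q_1$)
$Q_1$ = Value of $\left(\frac{N}{4}\right)^{th}$ item = $\left(\frac{65}{4}\right)^{th}$ item = 16.25th item
$Q_1 = l_1 + \dfrac{\frac{N}{4} - c.f.}{f} \times i$
$= 19.5 + \dfrac{16.25 - 15}{15} \times 5 = 19.5 + \dfrac{1.25 \times 5}{15} = 19.5 + 0.416 = 19.916$
Maximum daily income earned by lowest 25% workers is Rs 19.92.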
Illustration 20. Determine median and quartiles graphically from the following data :
Marks : 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40
Students : 7 10 20 13 17 10 14 9
Solution.
Marks f Marks less than Less than cumulative frequency Marks more than More than cumulative frequency
0-5 7 5 7 0 100
5-10 10 10 17 5 93
10-15 20 15 37 10 83
15-20 13 20 50 15 63
20-25 17 25 67 20 50
25-30 10 30 77 25 33
30-35 14 35 91 30 23
35-40 9 40 100 35 9
N = 100
1. Calculate ascending cumulative frequencies (less than) and descending cumulative frequencies
(more than).
2. Draw two ogives—one by 'less than' and other by 'more than' methods.
Me = size of $\left(\frac{N}{2}\right)^{th}$ item, i.e., $\frac{100}{2}$ = 50th item
$Q_1$ = size of $\left(\frac{N}{4}\right)^{th}$ item, i.e., $\frac{100}{4}$ = 25th item
$Q_3$ = size of $3\left(\frac{N}{4}\right)^{th}$ item, i.e., $3\left(\frac{100}{4}\right)$ = 75th item
3. Locate 50, 25, 75 values on Y-axis and from them draw perpendiculars on the cumulative frequency curve (ogive).
4. From these points where they meet the ogive draw another perpendicular touching X-axis.
5. The points where the perpendiculars touch the X-axis locate $Q_1$, Me and $Q_3$.
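Verification
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i = 15 + \dfrac{50 - 37}{13} \times 5 = 15 + \dfrac{13 \times 5}{13} = 20$
Me = 20 Marks.
$Q_1 = l_1 + \dfrac{\frac{N}{4} - c.f.}{f} \times i = 10 + \dfrac{25 - 17}{20} \times 5 = 10 + \dfrac{8 \times 5}{20} = 12$
$Q_1$ = 12 Marks.
$Q_3 = l_1 + \dfrac{3\left(\frac{N}{4}\right) - c.f.}{f} \times i = 25 + \dfrac{75 - 67}{10} \times 5 = 25 + \dfrac{8 \times 5}{10} = 29$
$Q_3$ = 29 Marks.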
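Reading a value off the 'less than' ogive amounts to linear interpolation between the plotted points, which can be sketched in Python without drawing the graph. The function name `read_ogive` and the `start` parameter (the lower limit of the first class) are assumptions made for illustration.

```python
def read_ogive(upper_limits, cum_freqs, level, start=0):
    """X value at which a 'less than' ogive reaches the cumulative level on the Y-axis."""
    x_prev, cf_prev = start, 0
    for x, cf in zip(upper_limits, cum_freqs):
        if cf >= level:   # the horizontal line meets this segment of the ogive
            return x_prev + (level - cf_prev) / (cf - cf_prev) * (x - x_prev)
        x_prev, cf_prev = x, cf
    raise ValueError("level exceeds the total frequency")


marks_less_than = [5, 10, 15, 20, 25, 30, 35, 40]
less_than_cf = [7, 17, 37, 50, 67, 77, 91, 100]
print(read_ogive(marks_less_than, less_than_cf, 25))   # 12.0  (Q1)
print(read_ogive(marks_less_than, less_than_cf, 50))   # 20.0  (Me)
print(read_ogive(marks_less_than, less_than_cf, 75))   # 29.0  (Q3)
```

The results agree with the verification by formula, because the ogive joins the plotted points with straight lines and the formula assumes the same even spread within each class.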
The 'less than' method cumulative frequency curve is a reminder of the rule that at the first step of calculation of quartiles, the data is arranged in ascending order. However, the median can be located on a graph even by a 'more than' ogive or calculated by arranging the data in descending order.
1. Definition
2. Determination of Mode
According to Croxton and Cowden, "the mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the
The word mode comes from the French la mode, which means the fashion. Mode in statistical language is that value which occurs most often in a series, that is, the value which is most typical. If garment manufacturers say that short collars are now in fashion, the statement implies that the maximum number of people nowadays wear short collar shirts. If we say the mode is size No. 7 shoe, it means that in the given data the maximum number of people wear size No. 7. Thus, mode is that value of observations which occurs the greatest number of times or with the greatest frequency.
For a better understanding of mode let us look at the following information about marks obtained by students :
Marks : 5 10 15 20 25 30 35 40 45 50
No. of students : 2 3 25 2 1 18 20 24 14 10
According to the explanation of mode given above, the modal marks will be 15 because the maximum number of students (25) have obtained 15 marks each. Although 15 has the highest frequency, a more careful examination of the information shows that the highest concentration of the frequency is around 40 marks. That is, in the neighbourhood of 40 marks there are more frequencies (18, 20, 14, 10) as compared to the neighbourhood of 15 marks (2, 3, 2, 1). Thus 15 marks are not typical of the series of values. For the reasons given above, 40 marks is the mode and not 15. Therefore, to define accurately, mode is that value of observations around which items are most densely or heavily concentrated.
The mode is defined as the most frequently occurring value. If each observation occurs the same
number of times, then there is no mode in that distribution. If two or more observations occur the same
number of times (and more frequently than any other observation) then there is more than one mode
and the distribution is multi-modal, as against uni-modal, where there is one mode. If two values occur
most frequently then the series is bi-modal, in case of three values occurring most frequently then the
series is called tri-modal. The mode as a measure of central tendency has little significance for a bi- or multi-modal distribution.
"Mode is that value of the graded quantity at which the instances are most numerous." -A.L. Bowley
"The value occurring most frequently in a series (or group) of items and around which the other items are distributed most densely."
2. Determination of Mode
(a) Series of Individual Observations and Discrete Series
" -- ^^st
Marks : 4 6 5 '
in
98
Solution.
10 4 7 6 5 Modal value
7 8 8 9 9 10.
(«) Discrete Series. Converting the above data into discrete series, we get
Mode = 7 Marks
(b) Discrete Series. In discrete series the mode can be located by two ways :
(i) By Inspection.
(ii) By Grouping.
(i) By Inspection. The mode can be determined just by inspection in discrete series, the size around
which the items are most heavily concentrated will be decided as mode. Illustration 22. Find out mode
from the following data :
125 3-
175 8
225 21
275 6
325 4
375 2
Solution. By inspection, we can determine that the modal wage is Rs 225 because this value occurred
the maximum number of times, i.e., 21 times.
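Locating the mode by inspection is simply a search for the highest frequency, as this Python sketch shows. The function name is illustrative. The second call uses the marks data from the definition of mode, to show why inspection alone can mislead when the frequencies are concentrated elsewhere.

```python
def modes_by_inspection(values, freqs):
    """All values carrying the highest frequency (more than one means multi-modal)."""
    highest = max(freqs)
    return [v for v, f in zip(values, freqs) if f == highest]


wages = [125, 175, 225, 275, 325, 375]
workers = [3, 8, 21, 6, 4, 2]
print(modes_by_inspection(wages, workers))    # [225]

marks = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
students = [2, 3, 25, 2, 1, 18, 20, 24, 14, 10]
print(modes_by_inspection(marks, students))   # [15], although items cluster around 40
```

When the result looks untypical of the series, as with 15 marks here, the grouping method is the safer way to find the point of concentration.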
(ii) By Grouping. In discrete and continuous series, if the items are concentrated at more than one value, an attempt is made to find out the item of concentration with the help of the grouping method. In such situations it is desirable to prepare a grouping table and an analysis table for ascertaining the modal class.
In the grouping method, values are first arranged in ascending order and the frequencies against each item are properly written. A grouping table normally consists of six columns. Frequencies are added in twos and threes and the totals are written between the values. If necessary, they can be added in fours and fives also.
Column 1. The original frequencies are written.
Column 2. Frequencies are grouped in twos.
Column 3. Leaving the first frequency, other frequencies are grouped in twos.
Column 4. Frequencies are grouped in threes.
Column 5. Leaving the first frequency, other frequencies are grouped in threes.
Column 6. Leaving the first two frequencies, other frequencies are grouped in threes.
After observing maximum total in each of these cases, put a mark or circle on every total. An analysis
table is prepared after completing grouping table in order to find out the item which is repeated the
highest number of times. If the same procedure is adopted in continuous series, we shall be in a position
to determine the modal class.
We shall now see how mode is determined by grouping method in a discrete series.
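Illustration 23. Find out the mode of the data given in Illustration 22 by grouping.
Grouping Table
Column 1 (frequencies of 125, 175, 225, 275, 325, 375) : 3 8 21 6 4 2
Column 2 (in twos) : 11 27 6
Column 3 (in twos, leaving the first) : 29 10
Column 4 (in threes) : 32 12
Column 5 (in threes, leaving the first) : 35
Column 6 (in threes, leaving the first two) : 31
Analysis Table
Column 1 : maximum 21, value 225
Column 2 : maximum 27, values 225 and 275
Column 3 : maximum 29, values 175 and 225
Column 4 : maximum 32, values 125, 175 and 225
Column 5 : maximum 35, values 175, 225 and 275
Column 6 : maximum 31, values 225, 275 and 325
Value : 125 175 225 275 325
Total : 1 3 6 3 1
Since the value 225 has occurred the largest number of times, i.e., 6 times, the modal wage is Rs 225.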
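The grouping and analysis tables can be built mechanically. The Python sketch below is one way to do it, assuming the six-column layout described for the grouping method; the function name `grouping_mode` and the `(group size, frequencies left out)` encoding of the columns are illustrative choices, not the book's notation.

```python
def grouping_mode(values, freqs):
    """Analysis-table totals: how often each value falls in a column's maximum group."""
    n = len(freqs)
    counts = {v: 0 for v in values}
    # (group size, leading frequencies left out) for columns 1 to 6
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, skip in columns:
        groups = [(sum(freqs[i:i + size]), range(i, i + size))
                  for i in range(skip, n - size + 1, size)]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:               # ties are all marked, as in the table
                for i in members:
                    counts[values[i]] += 1
    return counts


wages = [125, 175, 225, 275, 325, 375]
workers = [3, 8, 21, 6, 4, 2]
totals = grouping_mode(wages, workers)
print(totals)                        # {125: 1, 175: 3, 225: 6, 275: 3, 325: 1, 375: 0}
print(max(totals, key=totals.get))   # 225
```

Applied to the marks data used in the definition of mode (frequencies 2, 3, 25, 2, 1, 18, 20, 24, 14, 10), the same function gives 40 the highest total, in line with the argument that 40 and not 15 is the mode.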
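Values : 2 3 4 5 6 7 8 9 10 11 12 13
Frequency : 3 8 10 12 16 14 10 8 17 5 4 1
Grouping Table
Column 1 (frequencies) : 3 8 10 12 16 14 10 8 17 5 4 1
Column 2 (in twos) : 11 22 30 18 22 5
Column 3 (in twos, leaving the first) : 18 28 24 25 9
Column 4 (in threes) : 21 42 35 10
Column 5 (in threes, leaving the first) : 30 40 30
Column 6 (in threes, leaving the first two) : 38 32 26
Analysis Table
Column 1 : maximum 17, value 10
Column 2 : maximum 30, values 6 and 7
Column 3 : maximum 28, values 5 and 6
Column 4 : maximum 42, values 5, 6 and 7
Column 5 : maximum 40, values 6, 7 and 8
Column 6 : maximum 38, values 4, 5 and 6
Value : 4 5 6 7 8 10
Total : 1 3 5 3 1 1
The value 6 has occurred the largest number of times (5), hence mode is 6.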
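$Mo = l_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
or
$Mo = l_1 + \dfrac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times i$
where, Mo = Mode
$l_1$ = lower limit of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
i = class interval of the modal class
or
$Mo = l_1 + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times i$, with $\Delta_1 = |f_1 - f_0|$ and $\Delta_2 = |f_1 - f_2|$
where, Mo = Mode
$\Delta_1$ (read delta 1), i.e., $|f_1 - f_0|$ = the difference between the frequency of the modal class and the frequency of the class before the modal class, i.e., the preceding class (ignoring signs)
$\Delta_2$ (read delta 2), i.e., $|f_1 - f_2|$ = the difference between the frequency of the modal class and the frequency of the class after the modal class, i.e., the succeeding class (ignoring signs)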
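The modal-class formula translates directly into a one-line Python function. The name `grouped_mode` and its argument order are assumptions for illustration; the two calls use modal classes that appear in the illustrations of this section (4.5-5.5 with frequencies 12, 20, 12, and 20-25 with frequencies 7, 16, 12).

```python
def grouped_mode(lower, f1, f0, f2, width):
    """Mode from the modal class: l1 + (f1 - f0) / (2*f1 - f0 - f2) * i."""
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width


print(grouped_mode(4.5, f1=20, f0=12, f2=12, width=1))             # 5.0
print(round(grouped_mode(20, f1=16, f0=7, f2=12, width=5), 2))     # 23.46
```

The formula presumes equal class intervals and a modal class whose frequency exceeds at least one neighbour; otherwise the denominator can be zero.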
Illustration 25. Find out the mode from the following frequency distribution :
Central sizes : 1 2 3 4 5 6 7 8 9 10
Frequency ; g ^
10
12 20
12
Solution. Since the central sizes are given, we must convert them into class intervals.
Grouping Table
Qass Imervai
0.5-1.5
1.5-2.5
2.5-.3.5
3.5^.5
4.5-5.5
5.5-6.5 6.5-7.5
7.5-8.5 8.5-9.5
9.5-10.5
(V
6
10
12
20
12 5
32
14
22
32
16
32
17
24
44
10
28
37
Analysis Table
42
20
Total 1 3 6 3 1
By inspection, the mode lies in the group 4.5-5.5. To determine the value of Mode, we should apply the following formula.
$Mo = l_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
where, $l_1$ = lower limit of the modal class (4.5)
$f_1$ = frequency of the modal class (20)
$f_0$ = frequency of the class preceding the modal class (12)
$f_2$ = frequency of the class succeeding the modal class (12)
i = class interval of the modal group (1)
$Mo = 4.5 + \dfrac{20 - 12}{2 \times 20 - 12 - 12} \times 1 = 4.5 + \dfrac{8}{40 - 12 - 12} \times 1 = 4.5 + \dfrac{8}{16} \times 1$
Mode = 4.5 + 0.5 = 5.
Illustration 26. Find the mode of the distribution from the following data :
Below 15 : 3
Below 20 : 10
Below 25 : 26
Below 30 : 38
Below 35 : 47
Below 40 : 52
Below 45 : 55
Solution. To calculate the mode of the given distribution, first convert the given data into class intervals.
Grouping Table
10
7 -
23
16 - -
28
12 -
9 21
14 -
5 -
3 -
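Analysis Table
Total 1 3 6 4 2 1
The mode lies in the class 20-25. Applying the formula, we get
$Mo = l_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
where, $l_1$ = 20, $f_1$ = 16, $f_0$ = 7, $f_2$ = 12, i = 5
$Mo = 20 + \dfrac{16 - 7}{2 \times 16 - 7 - 12} \times 5 = 20 + \dfrac{9}{13} \times 5 = 20 + 3.46 = 23.46$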
The formula to calculate the mode from the modal class, discussed above, is applicable in a series where there are equal class intervals. When the class intervals are not equal, before calculating the value of the mode we must make them equal, and the given frequencies should be adjusted presuming that they are equally distributed throughout the class.
Class Frequency
Class Frequency
4 8 10 14 16 20
24 14 16 11 10 6
Solution. The class intervals are not equal. They are made equal by combining two or more classes.
Grouping Table
Class Frequency
0-6 4+8 = 12 -]
36
6-12 10 + 14 = 24 - -1 72
60
• 12-18 16 + 20 . = 36 - - -
74 98
18-24 24 + 14 = 38 - _ -
75 Ill
24-30 16 + 11 + 10 = 37 - _ 81
43
30-36 =6 - -
Analysis Table
Total 1 3 6 3 1
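The mode lies in the class 18-24. Applying the formula, we get
$Mo = l_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
where $l_1$ = 18, $f_1$ = 38, $f_0$ = 36, $f_2$ = 37, i = 6
$Mo = 18 + \dfrac{38 - 36}{2 \times 38 - 36 - 37} \times 6 = 18 + \dfrac{2}{3} \times 6 = 22$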
3. Draw two lines diagonally inside the modal class rectangle to the upper corner of the adjacent
bar.
4. From the point of intersection of these lines, draw a perpendicular of X-axis which gives the
modal value.
Illustration 28. Determine the value of mode of the following distribution graphically and verify the
results.
Marks
.0-10 5
30 40 MARKS
Verification :
$Mo = l_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
$Mo = 20 + \dfrac{14 - 12}{2 \times 14 - 12 - 10} \times 10 = 20 + \dfrac{2}{6} \times 10 = 20 + 3.33 = 23.33$
touches the X-axis, gives the modal value. Mode cannot be determined graphicSSy if two
value and below the mean value are equal. This relationship does not exist in moderately
[Figure: three frequency curves: negatively skewed ($\bar{X}$ < Me < Mo), symmetrical ($\bar{X}$ = Me = Mo) and positively skewed (Mo < Me < $\bar{X}$).]
In the case of an asymmetrical distribution, if the distribution tails off towards higher values of the data and has greater concentration in lower values (i.e., positively skewed), mean and median will be more than the mode ($\bar{X}$ and Me > Mo). In other words, mode is lowest, i.e., $\bar{X}$ > Me > Mo.
In the case of an asymmetrical distribution, if the distribution tails off towards lower values of the data and has greater concentration in higher values (i.e., negatively skewed), mean and median are less than the mode ($\bar{X}$ and Me < Mo). In other words, mode is highest, i.e., $\bar{X}$ < Me < Mo.
Mo = 3 Med - 2X
In most of the cases, if the distribution is moderately asymmetrical, the value of mode calculated from mean and median would not differ significantly from the value calculated by other methods. There may be two values in a series which occur with equal frequency; this is called a bi-modal series. In the case of a bi-modal distribution, or when the mode is ill-defined, its value may be determined by the above formula, which is based upon the relationship of mean, median and mode. If we know any two values out of the three, we can calculate the third value from the above relationship.
Illustration 29. (a) In an asymmetrical distribution mean is 58 and the median is 61. Calculate mode.
(b) If mode in a tolerably asymmetrical distribution is 12 and median is 16, what would be the most probable mean?
Solution.
(a) Mode = 3 Median - 2 Mean = 3 × 61 - 2 × 58 = 183 - 116 = 67
Mode = 67.
(b) Mean = $\dfrac{3\ \text{Median} - \text{Mode}}{2} = \dfrac{3 \times 16 - 12}{2} = \dfrac{36}{2} = 18$
Mean = 18.
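The empirical relationship Mo = 3 Me - 2 X̄ can be rearranged to give any one of the three averages from the other two. A minimal Python sketch, with illustrative function names, checked against Illustration 29:

```python
def mode_from(mean, median):
    """Mo = 3 Me - 2 X-bar (moderately asymmetrical distributions only)."""
    return 3 * median - 2 * mean


def mean_from(mode, median):
    """X-bar = (3 Me - Mo) / 2, the same relationship rearranged."""
    return (3 * median - mode) / 2


def median_from(mean, mode):
    """Me = (Mo + 2 X-bar) / 3, the same relationship rearranged."""
    return (mode + 2 * mean) / 3


print(mode_from(mean=58, median=61))    # 67
print(mean_from(mode=12, median=16))    # 18.0
```

The relationship is empirical, so the value it gives is an estimate and need not equal the mode found by the grouping method or the modal-class formula.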
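Illustration 30. The following table gives production yield in kg per hectare of wheat of 150 farms in a village. Calculate the mean, median and mode production yield.
Production (in kg) : 50-53 53-56 56-59 59-62 62-65 65-68 68-71 71-74 74-77
No. of farms : 3 8 14 30 36 28 16 10 5
Solution.
Production (in kg) f Mid-point (m) d = m - 63.5 d' = d/3 fd' c.f.
50-53 3 51.5 -12 -4 -12 3
53-56 8 54.5 -9 -3 -24 11
56-59 14 57.5 -6 -2 -28 25
59-62 30 60.5 -3 -1 -30 55
62-65 36 63.5 0 0 0 91
65-68 28 66.5 +3 +1 +28 119
68-71 16 69.5 +6 +2 +32 135
71-74 10 72.5 +9 +3 +30 145
74-77 5 75.5 +12 +4 +20 150
N = 150, i = 3, $\Sigma fd'$ = 16
Mean :
$\bar{X} = A + \dfrac{\Sigma fd'}{N} \times i = 63.5 + \dfrac{16}{150} \times 3 = 63.5 + 0.32 = 63.82$
Median :
Median = the size of the $\left(\frac{N}{2}\right)^{th}$ item = $\left(\frac{150}{2}\right)^{th}$ item = 75th item, which lies in the group 62-65.
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
where, $l_1$ = 62, c.f. = 55, f = 36, i = 3
$Me = 62 + \dfrac{75 - 55}{36} \times 3 = 62 + 1.666 = 63.67$
Mode :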
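Grouping Table
Column 1 (frequencies of 50-53 to 74-77) : 3 8 14 30 36 28 16 10 5
Column 2 (in twos) : 11 44 64 26
Column 3 (in twos, leaving the first) : 22 66 44 15
Column 4 (in threes) : 25 94 31
Column 5 (in threes, leaving the first) : 52 80
Column 6 (in threes, leaving the first two) : 80 54
Analysis Table
Column 1 : maximum 36, class 62-65
Column 2 : maximum 64, classes 62-65 and 65-68
Column 3 : maximum 66, classes 59-62 and 62-65
Column 4 : maximum 94, classes 59-62, 62-65 and 65-68
Column 5 : maximum 80, classes 62-65, 65-68 and 68-71
Column 6 : maximum 80, classes 56-59, 59-62 and 62-65
Class : 56-59 59-62 62-65 65-68 68-71
Total : 1 3 6 3 1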
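By inspection the mode lies in the group 62-65. Applying the following formula, we get
$Mo = l_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
Here, $l_1$ = 62, $f_1$ = 36, $f_0$ = 30, $f_2$ = 28, i = 3
$Mo = 62 + \dfrac{36 - 30}{2 \times 36 - 30 - 28} \times 3 = 62 + \dfrac{6}{14} \times 3 = 62 + 1.285 = 63.285$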
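The mean of Illustration 30 by the step-deviation method, and a cross-check of the mode against the empirical relationship, can be sketched in Python as follows. The function name is illustrative; the assumed mean A = 63.5 and class interval i = 3 are taken from the solution table, and the median 62 + (75 - 55)/36 × 3 is the value found above.

```python
def step_deviation_mean(mid_points, freqs, assumed_mean, width):
    """X-bar = A + (sum of f*d') / N * i, where d' = (m - A) / i."""
    n = sum(freqs)
    sum_fd = sum(f * (m - assumed_mean) / width for m, f in zip(mid_points, freqs))
    return assumed_mean + sum_fd / n * width


mids = [51.5, 54.5, 57.5, 60.5, 63.5, 66.5, 69.5, 72.5, 75.5]
farms = [3, 8, 14, 30, 36, 28, 16, 10, 5]

mean = step_deviation_mean(mids, farms, assumed_mean=63.5, width=3)
median = 62 + (75 - 55) / 36 * 3                    # from the median class 62-65
mode = 62 + (36 - 30) / (2 * 36 - 30 - 28) * 3      # from the modal class 62-65

print(round(mean, 2), round(median, 2), round(mode, 2))   # 63.82 63.67 63.29
print(round(3 * median - 2 * mean, 2))                    # about 63.36, close to the mode
```

The empirical estimate 3 Me - 2 X̄ differs from the modal-class value only in the first decimal place, as expected for a moderately asymmetrical distribution.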
compared to mean LSian Inf^o f A?""' " most typical and cogent ues orr ^rm^^
2. Mode is not A i. ' ' of shoes etc. vduts L nottt™."'' " if the extreme
We find that as compared to mean and median, mode is less suitable. Mean is simple to calculate, its value is definite, it can be given algebraic treatment and is not affected by fluctuations of sampling.
Median is even more simple to calculate and is almost as stable as mean, although it is influenced by
fluctuations and cannot be given algebraic treatment. Mode is the most popular item of a series and is
also easy to calculate and simple to understand. But it is not suitable for most elementary studies
because it is not based on all the observations of the series and is unrepresentative. Mode has its own
uses and advantages as we have seen, but as compared to mean and median, it is not so precise and
accurate.
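LIST OF FORMULAE
1. Median
Individual and discrete series :
Me = Size of $\left(\frac{N+1}{2}\right)^{th}$ item
$Q_1$ = Size of $\left(\frac{N+1}{4}\right)^{th}$ item
$Q_3$ = Size of $3\left(\frac{N+1}{4}\right)^{th}$ item
Continuous series :
Me = Size of $\left(\frac{N}{2}\right)^{th}$ item
$Me = l_1 + \dfrac{\frac{N}{2} - c.f.}{f} \times i$
$Q_1$ = Size of $\left(\frac{N}{4}\right)^{th}$ item
$Q_1 = l_1 + \dfrac{\frac{N}{4} - c.f.}{f} \times i$
$Q_3$ = Size of $3\left(\frac{N}{4}\right)^{th}$ item
$Q_3 = l_1 + \dfrac{3\left(\frac{N}{4}\right) - c.f.}{f} \times i$
2. Mode :
$Mo = l_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$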
EXERCISES
Questions :
4 De&rr"'?' of a dismbution.
7. Deftne mode. Explain how mode can be read on graph paper> • nr^a—ft -.an and
tendency
<b) Average inrelligence of srudents in a 'class, and (c) Average production per shift in a factory .
Problems :
145
257
130 260
200 300
210 345
198 360
234 390
159
160
178
Fmd out median of the following information : Marks : 10, 70, 50, 20, 95, 55,
42,
[Me = 210]
'e have the following frequency distribution of the size of 51 households. Calculate the arithmetic mean
and the median.
21
11
75
Total
51
1 2 3 4 5
2 4 10 8 15
5-- 10 - 15 20 25
2 •4 6 8 10
[X = 5, Me = 5]
6 20
7 12
25
9 30
out median, first quartile and third quartile of the following series :
No. of Persons 2 3 6 15 10 .5 4 3 1
6. /The percentage of marks obtained by 68 students in an examination are given below - '
Compute the median.
_ ^ , , , [Me = 65.6]
7. Calculate the mean of the following distribution of daily wages of workers in a factory:
No. of Workers : 10 30 15 5 80
Also, calculate the median for the distribution of wages given above.
./ [X = 146.75, Me = 146.67]
8./The following table gives the marks obtained by 65 students in statistics in a certain examination.
Calculate the median.
60% 18
50% 40
40% 45
20% 63,)
10% 65
■ 26 8 ^ 2 50
1 ^^ 46 49 32 28 14
30-35 14
35-40 16.
40-45 18
1 3 8 10
3898
4 10 10 17
5 12 11 5
6 16 12 4
13.
7 14 13 1
[Mo = 6]
1 26
2 113
3 120
4 95
5 60
6 42
7 21
8 14
9 5
10 4
[Mo = 3]
Find out the Mode from any of the following two distributions :
f ■■ 6 10 16 14 10 5. 2
And
is/uk of electric lamps is given in the following table. Calculate the median and the mode.
Below 400 4
400-800 12
800-1200 40
1200-1600 41
1600-2000 27
2000-2400 13
2400-2800 9
Above 2800 4
[Mo
59 1
61 2
63 9
65 48
67 131
69 102
71 40
73 17
Total 350
50 \
Calculate the Median Marks. If 60% of students pass this examination, find out the minimum
marks obtained by a pass candidate. [Me = 27.5, 25.5%]
46
40
20
10
32 20 43 11
61 31 47 .15
52 56 64 20 35 21 50
22 10 43 42
49 62
75 77
' 97 35 30 30 95
60 27 53 31 9
45 22 36 13 46
73 81 40 40 55
67 54 23
42 25 51
modal age.
21.
Ma No. of Students
Less than 10     5
Less than 20     15
Less than 30     98
Less than 40     242
Less than 50     367
Less than 60     405
Less than 70     425
Less than 80     438
Less than 90     439
20.
c1, LA = jy.j:). Me = SX 44
For the data given below, find graphically the following :
Mo = 36.22]
iVo. of workers : 5
No. of workers : 23
25-29 10 55-59 10
30-34
15 60-64 5
35-39
25 65-69 2
40-44 65
45-49 40
Draw a 'less than' ogive from the following data and hence find out the value of
Class      Frequency
20-25      6
25-30      9
30-35      13
35-40      23
40-45      19
45-50      15
50-55      9
55-60      6
22. The following table gives the distribution of the wages of 65 employees in a factory.
23. Draw the histogram and estimate the value of mode from the following data :
0-10 0
10-20 2
20-30 3
30-40 7
40-50 13
50-60 11
60-70 9
70-80 2
80-90 1
24. Represent the following data by means of a histogram and find out mode. Weekly wages : No. of
workers :
10-15
15-20 19
20-25 27
40-45
Chapter 10
measures of dispersion
Introduction
magnitude of the distriLion but ZrS tfir ^^put the general level of Measures of central tendency a^e
som^i^r^^^^^
r-^ -7 ^^ there may be great may be below poverty line. TTiere ifnTed to nt ^^^^of a majority of the
people
Definitions
According to D.C. Brooks and W.F.L. Dick. "Dispersion or spread is the degree of the scatter or variation
of variables about a central value."
According to Prof. L.R. Connor. "Dispersion is a measure of the extent to which the individual items
vary."
Now, look at the following data about salaries paid to employees of three different departments of an
organisation.
                   Dept. A      Dept. B      Dept. C
Total :            25000        25000        25000
Mean $\bar{X}$ :   Rs 5000      Rs 5000      Rs 5000
We find from the above table that the average salary paid to employees in each department is the same,
i.e., Rs 5000. In department 'A' the salary paid to each employee is the same, i.e., Rs 5000, hence the mean
is fully representative of the values of the items in the series. In department 'B' the mean is also Rs 5000,
but the constitution of the series is quite different. In this case the lowest value is Rs 4000 and the highest
value is Rs 6000, the difference between the highest and the lowest value is Rs 2000, and the largest
deviations from the mean are -1000 and +1000. The mean, in this case, does not adequately represent
the values of the items in the series of department 'B'. In department 'C', though the mean is the same,
there is a wide gap between the values of items. The lowest value is Rs 2000 and the highest value is
Rs 10,000, which deviate from the mean by -3000 and +5000 respectively. The difference between the
highest and the lowest value is Rs 8000. Not a single item in the series is represented by its mean.
From the above illustration we observe that some deviations are positive and some are negative.
Similarly, some deviations are large and others are small. Therefore, we are required to make an overall
summary of these differences (scatteredness) in all values about the central value. This summary is
called the measure of dispersion or measure of variation. It is clear that we must not only know the
composition of a series but also observe how the composition of one series differs from another. For such
a study we have a statistical tool called measures of dispersion or measures of variation.
Before we go on to describe the specific methods of studying variability, we must clearly define the
objectives.
(a) To Test the Reliability of an Average : Measures of dispersion enable us to know whether an average
is really representative of the series. If the dispersion or variability in the values of various items in a
series is large, the average may be unrepresentative of the series. If, on the other hand, the variability is
small, the average would be a representative value. This point has already been made clear in the above
illustration, wherein for the series of three departments the mean was a common value and the
variations differed.
(b) To Serve as a Basis for Control of Variability : The study of variation is done also for the purpose of
analysing why large variations occur, and this may help to control the variation itself.
For example, in some major human health problems the blood pressure, the heart beat and the pulse
rate are recorded and an attempt is made by the doctors to control these through provision of medicines.
Similarly, in industrial production, the causes of variations in the product are identified by inspection
and quality control programmes in order to control the quality of the product. In social sciences, where
we have to study problems relating to inequality in income and wealth, measures of dispersion are of
great help.
(c) To Make a Comparative Study of Two or More Series : Measures of variability are also useful in
comparing two or more series with regard to disparities or differences. A greater degree of dispersion or
variability would mean lack of uniformity or consistency or homogeneity of the data, while a low degree
of variability would indicate high uniformity or consistency or stability. Comparative studies of
variability are very useful in many fields like profit of companies, share values, performance of
individuals and studies relating to demand, supply and prices, etc.
(d) To Serve as a Basis for Further Statistical Analysis : A measure of variability, which is a measure of
the second order, is very useful in the use of higher measures such as skewness, kurtosis, correlation,
regression, etc.
Note: Characteristics of a representative average are explained on Pages 139 and 140 of this Book. The
same points apply to the characteristics of a good measure of dispersion.
The first two measures, viz., Range and Quartile Deviation, are measures of the spread of values and are
termed positional measures. They are calculated from the values of the variable at a particular position
of the distribution. They are not based on deviations from any particular value. The mean deviation and
the standard deviation, on the other hand, are defined in terms of deviations from a central value
(an average). Lorenz curve is a graphic method of studying dispersion/variability.
(a) Range
1. Meaning
2. Calculation of Range
4. Uses of Range
1. Meaning
Range is the simplest measure of dispersion. Range is the difference between the largest and the
smallest value in the distribution. It is determined by the two extreme values of the observations. In case
of a grouped frequency distribution, range is defined as the difference between the upper limit of the
highest class and the lower limit of the lowest class. In case of a frequency distribution, the frequencies of
the various classes are immaterial since range depends only on the two extreme observations. Range as
defined is an absolute measure of dispersion and is expressed in the units of measurement of the given
data. Thus, if we want to compare the variability of two or more distributions with the same units of
measurement, we may use the absolute measure. Symbolically, range is located by the following formula :
Range = L - S
where L = Largest item and S = Smallest item
Relative Measure
To compare the variability of two or more distributions given in different units of measurement, we
cannot use an absolute measure; we need a relative measure which is independent of the units of
measurement. This relative measure is called the coefficient of range. It is common practice to use the
coefficient of range even for the comparison of variability of distributions given in the same units of
measurement. It is obtained by applying the following formula :
Coefficient of Range = $\dfrac{L - S}{L + S}$
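The two formulae above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the sample heights are the eleven values used in the solved example on heights later in this section. It assumes that L + S is not zero.

```python
def value_range(values):
    """Absolute measure: Range = L - S (largest minus smallest item)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure: (L - S) / (L + S); it carries no units."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Heights in inches, taken from the solved example in this section.
heights = [61, 64, 65, 66, 67, 67, 68, 68, 69, 70, 72]
print("Range:", value_range(heights))
print("Coefficient of range:", round(coefficient_of_range(heights), 3))
```

Because only the two extreme items enter the calculation, dropping the shortest man (61 inches) changes the range from 11 to 8 inches, which is the sensitivity to extreme values discussed under the demerits of range.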
Coefficient of Range = $\dfrac{L - S}{L + S} = \dfrac{30 - 5}{30 + 5} = \dfrac{25}{35} = 0.714$
Coefficient of Range = $\dfrac{L - S}{L + S} = \dfrac{60 - 0}{60 + 0} = 1$
No. of Persons : 10 15 17
Coefficient of range = $\dfrac{L - S}{L + S} = \dfrac{35.5 - 15.5}{35.5 + 15.5} = \dfrac{20}{51} = 0.39$
Illustration 3. The following are the marks obtained by 50 students in Statistics. Calculate the range of
marks obtained by middle 50% of the students.
Less than 10 4
Less than 20 10
Less than 30 30
Less than 40 40
Less than 60 50
Marks       No. of Students (f)    c.f.
0 - 10      4                      4
10 - 20     6                      10
20 - 30     20                     30
30 - 40     10                     40
40 - 50     7                      47
50 - 60     3                      50
The middle 50% of the students lie between the 12.5th and 37.5th student (i.e., the 1st and 3rd
Quartiles, $Q_1$ and $Q_3$). Marks of the 12.5th student lie in class 20 - 30 and marks of the 37.5th
student in class 30 - 40.
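$Q_1 = l_1 + \dfrac{12.5\text{th student} - c.f.}{f} \times i = 20 + \dfrac{12.5 - 10}{20} \times 10 = 20 + \dfrac{2.5}{20} \times 10 = 21.25$ Marks
$Q_3 = l_1 + \dfrac{37.5\text{th student} - c.f.}{f} \times i = 30 + \dfrac{37.5 - 30}{10} \times 10 = 30 + \dfrac{7.5}{10} \times 10 = 37.5$ Marks
$Q_3 - Q_1$ = 37.5 - 21.25 = 16.25 Marks
Thus, the range of marks obtained by the middle 50% of the students is 16.25 marks.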
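The interpolation used for $Q_1$ and $Q_3$ in a continuous series can be sketched as follows. The function name `grouped_quantile` and its parameters are hypothetical and chosen only for illustration; equal class width is assumed, and the data are the marks of the 50 students from Illustration 3.

```python
def grouped_quantile(lower_limits, width, freqs, position):
    """Value of the item at `position` (e.g. N/4 or 3N/4) in a continuous
    series, using l1 + (position - c.f.) / f * i."""
    cumulative = 0  # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position is larger than the total frequency")


lower_limits = [0, 10, 20, 30, 40, 50]   # classes 0-10, 10-20, ..., 50-60
freqs = [4, 6, 20, 10, 7, 3]
n = sum(freqs)                            # 50 students

q1 = grouped_quantile(lower_limits, 10, freqs, n / 4)
q3 = grouped_quantile(lower_limits, 10, freqs, 3 * n / 4)
print("Q1 =", q1, "Q3 =", q3, "middle-50% range =", q3 - q1)
```

The same function can be reused for any continuous series with equal class intervals; only the lower limits, the class width and the frequencies change.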
Solution. We arrange the data in ascending order :
61, 64, 65, 66, 67, 67, 68, 68, 69, 70, 72
Range of height = L - S = 72 - 61 = 11 inches.
When the shortest man (61 inches) is omitted, the range will be = L - S = 72 - 64 = 8 inches.
Change in the range = 11 - 8 = 3 inches.
11
ban a very accurate picture of variability one may compute'rangl ' "
3. It is rigidly defined.
160 to 180 centimetres, if a dwarf (shortest) student whose height is 100 centimetres is admitted in our
data, the range would shoot up from 20 to 80 centimetres. Thus, a single variation in the value of an
extreme item affects the value of the range.
3. It is influenced very much by fluctuations of sampling. Range is subject to fluctuations of values from
sample to sample. However, in small samples, it is useful in certain circumstances.
5. It does not tell anything about the distribution of items in the series relative to a
measure of central tendency. Thus, the range is a very unsatisfactory measure of dispersion and should be
used with caution.
Despite various limitations, the range is useful in the following areas:
(a) Quality control : Range is used to study the variation in the quality of the items produced by a
manufacturing concern. Range has a great significance in quality control measures.
(b) Measure of fluctuations : It is a very useful measure to study fluctuations of series. Variations in the
prices of shares and other commodities, money rates and rates of exchange can easily be studied with the
help of range.
(c) Use in day-to-day life : Range is by far the most widely used measure of variability in our day-to-day
life. For example, the answer to problems like 'daily sales in a departmental store', 'monthly wages of
workers in a factory' or 'the expected return of fruits from an orchard' is usually provided by the
probable limits in the form of range.
(d) Use in meteorological department : Range is also used as a very convenient measure by the
meteorological department for weather forecasts, since the general public is interested to know the limits
within which the temperature is likely to vary on a particular day.
1. Meaning
If the difference between the two values of the quartiles is calculated, it gives us what is called the
'Interquartile Range'. It is also a measure of dispersion. It has an advantage over range inasmuch as it is
not affected by the values of the extreme items. In fact, 50% of the values of a variable lie between the
quartiles (i.e., $Q_1$ and $Q_3$) and as such the interquartile range gives a fair measure of variability.
Q.D. = $\dfrac{Q_3 - Q_1}{2}$
Symbolically,
Coefficient of Q.D. = $\dfrac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
2. It is rigidly defined.
3. It does not depend on all the values of the data v^rLblf ^he quartile deviation are the same as
those of the
1 130 9 234
2 145 10 257
3 159 11 260
4 160 12 300
5 178 13 345
6 198 14 360
7 200 15 390
8 210
Steps
1. Arrange the data in ascending order to get the value of lower and upper quartiles.
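2. $Q_1$ is the size of the $\left(\dfrac{N+1}{4}\right)$th item and $Q_3$ is the size of the $3\left(\dfrac{N+1}{4}\right)$th item.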
3. Apply the formulae to get interquartile range, quartile deviation and coefficient of quartile deviation.
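Thus, we get
$Q_1$ = Size of $\left(\dfrac{15+1}{4}\right)$th item = 4th item = Rs 160
$Q_3$ = Size of $3\left(\dfrac{15+1}{4}\right)$th item = 12th item = Rs 300
Interquartile Range = $Q_3 - Q_1$ = 300 - 160 = Rs 140
Quartile Deviation, Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{300 - 160}{2}$ = Rs 70
Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{300 - 160}{300 + 160} = \dfrac{140}{460}$ = 0.304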
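The steps for an individual series can be sketched in Python as below. The helper names `item_at` and `quartile_measures` are hypothetical; the sketch assumes at least three observations and uses the (N + 1)/4 positions described in the steps, interpolating when the position is fractional.

```python
def item_at(sorted_values, position):
    """Size of the item at a 1-based position; a fractional position is
    interpolated between the two neighbouring items."""
    whole = int(position)
    fraction = position - whole
    value = sorted_values[whole - 1]
    if fraction > 0:
        value += fraction * (sorted_values[whole] - value)
    return value


def quartile_measures(values):
    """Return Q1, Q3, quartile deviation and its coefficient."""
    data = sorted(values)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


# The 15 values (in Rs) from the illustration above.
incomes = [130, 145, 159, 160, 178, 198, 200, 210,
           234, 257, 260, 300, 345, 360, 390]
q1, q3, qd, coeff = quartile_measures(incomes)
print(q1, q3, qd, round(coeff, 3))
```

Sorting inside the function mirrors Step 1, so the data may be supplied in any order.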
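Q.D. = $\dfrac{Q_3 - Q_1}{2}$ = 15 Marks, so $Q_3 - Q_1 = 30$ ...(1)
Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$ = 0.6
$\dfrac{30}{Q_3 + Q_1} = 0.6$, so $Q_3 + Q_1 = \dfrac{30}{0.6} = 50$ ...(2)
Adding (1) and (2), $2Q_3 = 80$
$Q_3 = \dfrac{80}{2}$ = 40 Marks
Substituting in (1), $Q_3 - Q_1 = 30$, i.e., $40 - Q_1 = 30$
$Q_1$ = 40 - 30 = 10 Marks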
Heights (in inches) :  58  59  60  61  62  63  64  65  66
No. of Persons :        2   3   6  15  10   5   4   3   1
Solution.
$Q_1$ and $Q_3$ are the sizes of the $\left(\dfrac{N+1}{4}\right)$th and $3\left(\dfrac{N+1}{4}\right)$th items.
4. Values are located at the size of item in whose cumulative frequency the value of item falls.
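Height (in inches)    No. of Persons (f)    c.f.
58                    2                     2
59                    3                     5
60                    6                     11
61                    15                    26
62                    10                    36
63                    5                     41
64                    4                     45
65                    3                     48
66                    1                     49
$Q_1$ = size of $\left(\dfrac{N+1}{4}\right)$th item = $\left(\dfrac{49+1}{4}\right)$th item = 12.5th item = 61 inches
$Q_3$ = size of $3\left(\dfrac{N+1}{4}\right)$th item = $3\left(\dfrac{49+1}{4}\right)$th item = 37.5th item = 63 inches
Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{63 - 61}{2}$ = 1 inch
Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{63 - 61}{63 + 61} = \dfrac{2}{124}$ = 0.016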
Illustration 8. Calculate range and quartile deviation and compare them. Also calculate coefficient of
quartile deviation of the following data.
Age (years) :      20-30  30-40  40-50  50-60  60-70  70-80  80-90
No. of members :   3      61     132    154    140    51     3
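Solution. Range = L - S = 90 - 20 = 70 years
$Q_1$ is the size of the $\left(\dfrac{N}{4}\right)$th item and $Q_3$ is the size of the $3\left(\dfrac{N}{4}\right)$th item
in continuous series.
3. Locate the first quartile and third quartile group in the cumulative frequency column where the
$\left(\dfrac{N}{4}\right)$th and $3\left(\dfrac{N}{4}\right)$th item falls.
$Q_1 = l_1 + \dfrac{N/4 - c.f.}{f} \times i$ and $Q_3 = l_1 + \dfrac{3(N/4) - c.f.}{f} \times i$

Age (years)    No. of members (f)    c.f.
20-30          3                     3
30-40          61                    64
40-50          132                   196
50-60          154                   350
60-70          140                   490
70-80          51                    541
80-90          3                     544

$Q_1$ = Size of $\left(\dfrac{544}{4}\right)$th item = 136th item, which lies in class 40-50
where $l_1$ = 40, c.f. = 64, f = 132, i = 10
$Q_1 = 40 + \dfrac{136 - 64}{132} \times 10 = 40 + \dfrac{72 \times 10}{132}$ = 45.45 years
$Q_3$ = Size of $3\left(\dfrac{544}{4}\right)$th item = 408th item, which lies in class 60-70
where $l_1$ = 60, c.f. = 350, f = 140, i = 10
$Q_3 = 60 + \dfrac{408 - 350}{140} \times 10 = 60 + \dfrac{58 \times 10}{140}$ = 64.14 years
Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{64.14 - 45.45}{2}$ = 9.345 years
Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{64.14 - 45.45}{64.14 + 45.45} = \dfrac{18.69}{109.59}$ = 0.17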
$Q_3$ = Rs 45,000
Coefficient of Q.D. = 27,000 / 63,000 = 0.428
3. It is also useful where extreme values are likely to affect the results.
Demerits :
The range, the interquartile range and the quartile deviation suffer from a common defect. They are
calculated from only two values of a series: either the extreme values, as in case of range, or the values of
the two quartiles, as in case of quartile deviation. This method of studying dispersion by location of limits
is also called the 'Method of Limits'.
It is, therefore, always better to have a measure of dispersion which is based on all the observations
of a series and is calculated in relation to a central value. Range and Quartile Deviation are not
calculated in relation to any average. If the variations of items are calculated from an average, such a
measure of dispersion throws light on the formation of the series and the scatteredness of items around
a central value. This method of calculating dispersion is called the 'method of averaging deviations'.
Let us examine the following illustration about the salaries paid to employees of a departmental
store :
Arithmetic Mean = Rs 5,400
We observe that the salary of A (Rs 10,000) is much more than the arithmetic mean (Rs 5,400) and the
salary of B (Rs 2,000) is much less than the arithmetic mean. In general, some deviations are positive and
some are negative. Similarly, some are large and some are small.
If we consider an average of these deviations calculated from the arithmetic mean, we can get an idea of a
measure of dispersion. As we know, the sum of the deviations calculated from the arithmetic mean is
always zero. Here, positive deviations and negative deviations cancel each other out. Therefore, adding
these deviations directly does not help us. Alternatively, we may consider either the 'absolute deviations'
or the 'squared deviations'. Thus, the measures of dispersion in terms of deviations from a central value
(average) are as under :
Where absolute deviations are obtained from an average (ignoring plus and minus signs).
$\Sigma|D|$ (read: sigma D modulus) = sum of the deviations taken from mean or median ignoring ± signs
N = Number of observations
f = frequency
$\bar{X}$ = Mean
Me = Median
Relative Measure of Mean Deviation
Coefficient of M.D. = $\dfrac{\text{M.D.}}{\bar{X} \text{ or } Me}$
Illustration 10. Calculate mean deviation and its coefficient from median and mean from the following
yield of rice per acre for 10 districts of a state as under:
District    Yield (tons)
1           22
2           29
3           12
4           23
5           18
6           15
7           12
8           34
9           18
10          12
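Calculation of Mean Deviation from Median (N = 10)
Yield in ascending order :           12  12  12  15  18  18  22  23  29  34
Deviation from median 18, |D| :       6   6   6   3   0   0   4   5  11  16
$\Sigma|D|$ = 57

Calculation of Mean Deviation from Mean
Yield (X)    Deviation from mean 19.5, |D|
22           2.5
29           9.5
12           7.5
23           3.5
18           1.5
15           4.5
12           7.5
34           14.5
18           1.5
12           7.5
$\Sigma X$ = 195    $\Sigma|D|$ = 60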
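Me = size of $\left(\dfrac{N+1}{2}\right)$th item.
3. Take deviations of items from median ignoring ± signs and denote the column as |D|.
M.D. = $\dfrac{\Sigma|D|}{N}$
Coefficient of M.D. = $\dfrac{\text{M.D.}}{\text{Median}}$ or $\dfrac{\text{M.D.}}{\text{Mean}}$
Now, we get
Mean = $\dfrac{\Sigma X}{N} = \dfrac{195}{10}$ = 19.5 tons
Median = Size of $\left(\dfrac{10+1}{2}\right)$th item = 5.5th item = $\dfrac{18 + 18}{2}$ = 18 tons
Absolute Measure :
M.D. from median = $\dfrac{\Sigma|D|}{N} = \dfrac{57}{10}$ = 5.7 tons
M.D. from mean = $\dfrac{\Sigma|D|}{N} = \dfrac{60}{10}$ = 6 tons
Relative Measure :
Coefficient of M.D. from median = $\dfrac{\text{M.D.}}{\text{Median}} = \dfrac{5.7}{18}$ = 0.316
Coefficient of M.D. from mean = $\dfrac{\text{M.D.}}{\text{Mean}} = \dfrac{6}{19.5}$ = 0.307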
Note It is better to calculate M.D. from median than that from mean because the sum of the deviations
taken from median ignoring ± signs is less than sum of deviations taken
from mean.
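The comparison made in the note can be checked with a short Python sketch. The function name `mean_deviation` is hypothetical; `mean` and `median` come from Python's standard `statistics` module, and the data are the rice yields of Illustration 10.

```python
from statistics import mean, median


def mean_deviation(values, centre):
    """M.D. = sum of |X - centre| / N, ignoring the signs of the deviations."""
    return sum(abs(x - centre) for x in values) / len(values)


rice = [22, 29, 12, 23, 18, 15, 12, 34, 18, 12]   # tons per acre

md_from_median = mean_deviation(rice, median(rice))
md_from_mean = mean_deviation(rice, mean(rice))

print("M.D. from median:", md_from_median,
      "coefficient:", round(md_from_median / median(rice), 3))
print("M.D. from mean:", md_from_mean,
      "coefficient:", round(md_from_mean / mean(rice), 3))
```

For these data the mean deviation taken from the median comes out smaller than the one taken from the mean, as the note states.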
Illustration 11. The yield of wheat per acre for 10 districts of a state is as under:
District : 1 2 3 4 5 6 7 8 9 10
Calculate :
(i) Range and coefficient of range. (ii) Quartile Deviation and its coefficient. (iii) Mean Deviation about
Mean and coefficient. (iv) Mean Deviation about Median and coefficient.
Solution. In order to calculate the quartile and median we arrange the yield of wheat in the ascending
order of magnitude.
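9  10  10  12  15  16  18  19  21  25
(i) Range = L - S = 25 - 9 = 16 tons
Coefficient of Range = $\dfrac{L - S}{L + S} = \dfrac{25 - 9}{25 + 9} = \dfrac{16}{34}$ = 0.47
(ii) $Q_1$ = size of $\left(\dfrac{N+1}{4}\right)$th item = size of $\left(\dfrac{10+1}{4}\right)$th item = 2.75th item
= Value of 2nd item + $\frac{3}{4}$ (value of 3rd item - value of 2nd item) = 10 + 0.75 (10 - 10) = 10 + 0 = 10 tons.
$Q_3$ = size of $3\left(\dfrac{N+1}{4}\right)$th item = size of $3\left(\dfrac{10+1}{4}\right)$th item = 8.25th item
= Value of 8th item + $\frac{1}{4}$ (value of 9th item - value of 8th item) = 19 + 0.25 (21 - 19) = 19.5 tons.
Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{19.5 - 10}{2}$ = 4.75 tons
Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{19.5 - 10}{19.5 + 10} = \dfrac{9.5}{29.5}$ = 0.322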
Absolute Measure :
Arithmetic Mean, =
N ~ IF tons
M.D. =
= 4.2
Me = Size of
= Size of
10^ y''
item
item
Relative Measure :
Coeff. of M.D. =
M.D.=
Relative Measure :
M.D. 4.3
Coeff. of M.D. =
Median = 0.277
15.5
Measures of Dispersion
, 253.
Comparing the mean deviation (from median) of the two crops :
Rice : M.D. = 5.7 tons, Coeff. of M.D. = 0.316
Wheat : M.D. = 4.3 tons, Coeff. of M.D. = 0.277
The crop which has lesser variation is more reliable. Therefore, the yield of wheat is more reliable than
the yield of rice.
Also"tS:^^^^ lismbution.
Solution.
Accidents (X)    f     c.f.    |D|    f|D|
0                15    15      2      30
1                16    31      1      16
2                21    52      0      0
3                10    62      1      10
4                16    78      2      32
5                8     86      3      24
6                4     90      4      16
7                2     92      5      10
8                1     93      6      6
9                2     95      7      14
10               2     97      8      16
11               0     97      9      0
12               2     99      10     20
Total            N = 99                $\Sigma f|D|$ = 194
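Steps :
Me = size of $\left(\dfrac{N+1}{2}\right)$th item.
3. Value is located at the size of the item in whose cumulative frequency the value of the item falls.
5. Take deviations of items from median ignoring ± signs and denote the column as |D|.
7. After getting the total of the f|D| column, apply the following formula :
M.D. = $\dfrac{\Sigma f|D|}{N}$
Median
Me = Size of $\left(\dfrac{N+1}{2}\right)$th item = Size of $\left(\dfrac{99+1}{2}\right)$th item = 50th item
Median = 2 Accidents
M.D. = $\dfrac{\Sigma f|D|}{N} = \dfrac{194}{99}$ = 1.96
Coefficient of M.D. = $\dfrac{\text{M.D.}}{\text{Median}} = \dfrac{1.96}{2}$ = 0.98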
Illustration 13. Calculate Mean Deviation from mean and its coefficient of the following data :
Marks : 0-10 10-20 20-30 30-40 40-50
No. of Students : 5 8 15 16 6
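Solution.
Marks (X)    f     m     d' = (m - 25)/10    fd'     |D| = |m - 27|    f|D|
0-10         5     5     -2                  -10     22                110
10-20        8     15    -1                  -8      12                96
20-30        15    25    0                   0       2                 30
30-40        16    35    +1                  +16     8                 128
40-50        6     45    +2                  +12     18                108
N = 50, $\Sigma fd'$ = 10, $\Sigma f|D|$ = 472
Steps :
2. Take the deviations of mid-points from mean ignoring ± signs and denote them by |D|.
4. After getting the total of the f|D| column, apply the following formula :
M.D. = $\dfrac{\Sigma f|D|}{N}$
$\bar{X} = A + \dfrac{\Sigma fd'}{N} \times C = 25 + \dfrac{10}{50} \times 10$ = 25 + 2 = 27 Marks
M.D. = $\dfrac{\Sigma f|D|}{N} = \dfrac{472}{50}$ = 9.44 Marks
Coefficient of M.D. = $\dfrac{\text{M.D.}}{\bar{X}} = \dfrac{9.44}{27}$ = 0.349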
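For a continuous series the same calculation can be sketched with mid-points and frequencies. The function name `grouped_mean_deviation` is hypothetical; when no centre is passed, the sketch assumes the deviations are to be taken from the arithmetic mean of the mid-points.

```python
def grouped_mean_deviation(midpoints, freqs, centre=None):
    """Return (centre, M.D., coefficient of M.D.) for a frequency
    distribution, where M.D. = sum of f * |m - centre| / N."""
    n = sum(freqs)
    if centre is None:
        # Default centre: arithmetic mean, sum of f*m divided by N.
        centre = sum(f * m for m, f in zip(midpoints, freqs)) / n
    md = sum(f * abs(m - centre) for m, f in zip(midpoints, freqs)) / n
    return centre, md, md / centre


midpoints = [5, 15, 25, 35, 45]            # classes 0-10, ..., 40-50
students = [5, 8, 15, 16, 6]               # data of Illustration 13

mean_marks, md, coeff = grouped_mean_deviation(midpoints, students)
print(mean_marks, round(md, 2), round(coeff, 3))
```

Passing a median as `centre` gives the mean deviation from median with the same function.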
0-10 5
10-20 10
20-30 20
30-40 5
40-50 10
Marks :
= 50 - 0 = 50 Marks
Relative Measure :
Coefficient of Range =
L-5 L + S
50-0 50 + 0
=1
Qj = size of = size of
V4y (50
item
th
Q,-I,*
N■4
-cf
■XI
Here,
= 10 + ^^xio
Q = size of
N 14
Item
= size of
'50
.4,
th
Here, 1, = 30,
Vn)
v4y
Q, = 30+37.5-35^^^
= 30 +
2.5x10 10
= 32.5 Marks
Absolute Measure :
32.5-17.5
= 7.5 Marks
Relative Measure :
Q3+Q1
Marks     f     c.f.    m     |D|    f|D|
0-10      5     5       5     20     100
10-20     10    15      15    10     100
20-30     20    35      25    0      0
30-40     5     40      35    10     50
40-50     10    50      45    20     200
N = 50, $\Sigma f|D|$ = 450
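Steps :
2. Take the deviations of mid-points from median ignoring ± signs and denote them by |D|.
4. After obtaining the total of the f|D| column, apply the following formula :
M.D. = $\dfrac{\Sigma f|D|}{N}$
Me = Size of $\left(\dfrac{N}{2}\right)$th item = Size of $\left(\dfrac{50}{2}\right)$th item = 25th item
Here, $Me = l_1 + \dfrac{N/2 - c.f.}{f} \times i = 20 + \dfrac{25 - 15}{20} \times 10 = 20 + \dfrac{10 \times 10}{20}$ = 25 Marks.
M.D. = $\dfrac{\Sigma f|D|}{N} = \dfrac{450}{50}$ = 9 Marks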
Illustration 15. Calculate the mean deviation from mean for the following marks obtained by 10
students.
Student : 3 4 21
2. Take deviations of mid-points from mean ignoring ± signs and denote them by IDI.
4. After getting the total of /IDI column apply the following formula:
S^IDl N
M.D. =
X=A+
where, A = 5, = 2 and N = 10 2
=5+
10
= 5.2
Mean Deviation:
Here, where.
M.D. =
If\D\ N
X/IDI = 14.8, N = 10
14.8
= =1.48 Marks
Alternatively: (Short-cut Method) apply the follo„i°; ZnZ, ' ob«,ni„g Xf W,,
where.
Now, we get
M.D. = l/l^^llP^z^KI/B-^ N
^f\d\=14,A = 5,N=10 •
class (the class m which mean hes), i.e., 4 + 3=7 If A = Sum of all class frequencies after the mean' class, 2
+ 1=3
MD =
10
10
10
Note : Take care that assumed mean is close to the true mean.
3. Based on all items. It is based on all the items of the series, hence it is affected by every value of the
distribution. Thus mean deviation is a better measure of dispersion than range and quartile deviation.
4. Less affected by extreme values. It is not affected very much by the value of extreme items.
5. Absolute measure. The averaging of absolute deviations from an average takes out the
irregularities in the distribution and thus mean deviation provides an accurate and true measure of
dispersion.
6. Calculated value. Mean deviation is not based on limits like range and quartile deviation. It is a
calculated value based on the deviations about an average. It provides a better measure for comparison
of the formation of different distributions.
Demerits :
1. Ignoring the signs. The strongest objection against mean deviation is that while calculating its value
we take the absolute value of the deviations about an average and ignore the ± signs of the deviations.
The step of ignoring the signs of the deviations is mathematically unsound and illogical. Therefore this
method is non-algebraic, and for this reason it is not used in further statistical calculations.
2. Not well defined. Mean deviation is not a well-defined measure since it is calculated from different
averages (mean, median and mode). Mean deviation calculated from various averages will not be the
same.
3. Harder calculations. Mean deviation involves harder calculation than the range and quartile
deviation. Its calculation from an arbitrary origin makes the calculation tedious.
open-end classes.
Uses. Despite so many demerits, mean deviation is not a totally useless measure. In spite of its
mathematical drawbacks, it has found favour with economists and business statisticians because of its
simplicity and accuracy, and also on account of the fact that standard deviation gives greater importance
to deviations of extreme values. For forecasting business cycles, this measure has been found more useful
than others. It is also good for small sample studies where elaborate statistical analysis is not required.
1. Meaning
Standard deviation was introduced by Karl Pearson in the year 1893. It is the most commonly used
measure of dispersion. It satisfies most of the properties laid down for an ideal measure of dispersion.
In the calculation of mean deviation, signs are ignored and absolute deviations are taken. This drawback
is removed in the calculation of standard deviation. One of the easiest ways of doing away with the signs
is to square the deviations.
Standard deviation is also known as root mean square deviation because it is the square root of the
mean of squared deviations from the arithmetic mean. The deviations of various items from the
arithmetic average are squared. The squared deviations are totalled, the sum is divided by the number of
items and the square root is taken. Symbolically,
$\sigma = \sqrt{\dfrac{\Sigma x^2}{N}}$
where $\sigma$ = standard deviation, $x = X - \bar{X}$ and N = number of items.
(a) Actual mean method (b) Direct method (c) Assumed mean method
Illustration 16. Calculate Standard Deviation of the following data : 25, 50, 45, 30, 70, 42, 36, 48, 34, 60
2. Obtain deviations of the values from the mean, i.e., calculate $(X - \bar{X})$. Denote these deviations by
$x$.
4. Divide $\Sigma x^2$ by the number of observations and find out the square root.
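Values (X)    $x = X - \bar{X}$    $x^2$
25            -19                  361
50            +6                   36
45            +1                   1
30            -14                  196
70            +26                  676
42            -2                   4
36            -8                   64
48            +4                   16
34            -10                  100
60            +16                  256
$\Sigma X$ = 440                   $\Sigma x^2$ = 1710
Here, $\bar{X} = \dfrac{\Sigma X}{N} = \dfrac{440}{10}$ = 44
Now we get, with $\Sigma x^2$ = 1710 and N = 10,
$\sigma = \sqrt{\dfrac{\Sigma x^2}{N}} = \sqrt{\dfrac{1710}{10}} = \sqrt{171}$ = 13.076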
Illustration 17- Calculate the standard deviation of data given in Illustration 16 by direct method.
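Solution.
$\sigma = \sqrt{\dfrac{\Sigma X^2}{N} - (\bar{X})^2}$
Values (X)    $X^2$
25            625
50            2500
45            2025
30            900
70            4900
42            1764
36            1296
48            2304
34            1156
60            3600
$\Sigma X$ = 440    $\Sigma X^2$ = 21070
Now we get,
$\bar{X} = \dfrac{\Sigma X}{N} = \dfrac{440}{10}$ = 44
$\sigma = \sqrt{\dfrac{21070}{10} - (44)^2} = \sqrt{2107 - 1936} = \sqrt{171}$ = 13.076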
Solution.
1. Calculate the deviations of the observations from an assumed mean (X - A). Denote these
deviations by d and make the total of deviations, $\Sigma d$.
2. Square the deviations and denote the total $\Sigma d^2$.
$\sigma = \sqrt{\dfrac{\Sigma d^2}{N} - \left(\dfrac{\Sigma d}{N}\right)^2}$
Here, d = X - A
When the mean is in fraction, this method is used to simplify the calculations.
Values (X)    d = X - 45    $d^2$
25            -20           400
50            +5            25
45            0             0
30            -15           225
70            +25           625
42            -3            9
36            -9            81
48            +3            9
34            -11           121
60            +15           225
N = 10        $\Sigma d$ = -10    $\Sigma d^2$ = 1720
Here, $\Sigma d^2$ = 1720, N = 10, $\Sigma d$ = -10
$\sigma = \sqrt{\dfrac{1720}{10} - \left(\dfrac{-10}{10}\right)^2} = \sqrt{172 - (-1)^2} = \sqrt{171}$ = 13.076
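That the three methods give the same answer can be checked with a short Python sketch. The function names are hypothetical and the data are those of Illustration 16; the assumed mean of 45 is the one implied by the deviations tabulated above.

```python
from math import sqrt


def sd_actual_mean(values):
    """sigma = sqrt(sum of x^2 / N), where x = X - mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_direct(values):
    """sigma = sqrt(sum of X^2 / N - mean^2)."""
    n = len(values)
    return sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


def sd_assumed_mean(values, assumed):
    """sigma = sqrt(sum of d^2 / N - (sum of d / N)^2), where d = X - A."""
    n = len(values)
    d = [x - assumed for x in values]
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


data = [25, 50, 45, 30, 70, 42, 36, 48, 34, 60]
print(round(sd_actual_mean(data), 3),
      round(sd_direct(data), 3),
      round(sd_assumed_mean(data, 45), 3))
```

Any other assumed mean gives the same result, because the correction term $(\Sigma d / N)^2$ removes the effect of the chosen origin.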
Illustration 19. From the following information, find standard deviation of x and y variables :
Ix = 235, Ey = 250
N = 10
Solution.
::VX
ox = y N v m) ay = = \ N InJ
Size : 4 5 6 7 8 9 10
Frequency : 6 12 15 28 20 14 5
Solution.
(x'ltrr„
a=
N IfX
Here,
X=
X=
100
= 7.06
c=
Here,
= 1.541
Solution.
(c) ass
X f X-2
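2. Obtain the sum after multiplying f and X (frequency and size), i.e., $\Sigma fX$.
$\sigma = \sqrt{\dfrac{\Sigma fX^2}{N} - (\bar{X})^2}$
Here, $\bar{X} = \dfrac{\Sigma fX}{N} = \dfrac{706}{100}$ = 7.06
Now we get,
$\sigma = \sqrt{\dfrac{\Sigma fX^2}{N} - (\bar{X})^2} = \sqrt{\dfrac{5222}{100} - (7.06)^2} = \sqrt{52.22 - 49.8436} = \sqrt{2.3764}$ = 1.541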
Illustration 22. Calculate the standard deviation of the data given in Illustration 20 by
assumed mean method.
Size (X)    f     d = X - 7    fd      $fd^2$
4           6     -3           -18     54
5           12    -2           -24     48
6           15    -1           -15     15
7           28    0            0       0
8           20    +1           +20     20
9           14    +2           +28     56
10          5     +3           +15     45
N = 100, $\Sigma fd$ = 6, $\Sigma fd^2$ = 238
Steps :
1. Take the deviations of size from an assumed mean and denote these deviations by d.
2. Multiply these deviations by the respective frequencies and calculate the total $\Sigma fd$.
$\sigma = \sqrt{\dfrac{\Sigma fd^2}{N} - \left(\dfrac{\Sigma fd}{N}\right)^2}$
where, d = (X - A)
Now, we get
$\sigma = \sqrt{\dfrac{238}{100} - \left(\dfrac{6}{100}\right)^2} = \sqrt{2.38 - (0.06)^2} = \sqrt{2.38 - 0.0036} = \sqrt{2.3764}$
$\sigma$ = 1.541
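Values :      140  145  150  155  160  165  170  175
Frequency :   1    4    15   30   36   24   8    2
Values (X)    Frequency (f)    d = X - 155    d' = d/5    fd'     $fd'^2$
140           1                -15            -3          -3      9
145           4                -10            -2          -8      16
150           15               -5             -1          -15     15
155           30               0              0           0       0
160           36               +5             +1          +36     36
165           24               +10            +2          +48     96
170           8                +15            +3          +24     72
175           2                +20            +4          +8      32
N = 120, $\Sigma fd'$ = 90, $\Sigma fd'^2$ = 276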
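1. Take the deviations of values from an assumed mean and denote these deviations
by (d).
2. Divide these deviations by the common factor and obtain step deviations, i.e., d'.
3. Multiply step deviations by the respective frequencies and calculate the total $\Sigma fd'$.
4. Calculate the squares of the step deviations ($d'^2$), multiply these squared deviations by respective
frequencies (in other words $fd' \times d' = fd'^2$) and obtain the total $\Sigma fd'^2$.
$\sigma = \sqrt{\dfrac{\Sigma fd'^2}{N} - \left(\dfrac{\Sigma fd'}{N}\right)^2} \times C$
where, $d' = \dfrac{X - A}{C}$ and C = Common factor
Mean
$\bar{X} = A + \dfrac{\Sigma fd'}{N} \times C = 155 + \dfrac{90}{120} \times 5 = 155 + 0.75 \times 5$ = 158.75
Standard Deviation
Here, $\Sigma fd'^2$ = 276, $\Sigma fd'$ = 90, N = 120 and C = 5
$\sigma = \sqrt{\dfrac{276}{120} - \left(\dfrac{90}{120}\right)^2} \times 5 = \sqrt{2.3 - (0.75)^2} \times 5 = \sqrt{2.3 - 0.5625} \times 5 = \sqrt{1.7375} \times 5$ = 6.59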
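The step deviation method can be sketched in Python as follows. The function name `step_deviation_stats` is hypothetical. The data are those of the illustration above, with the frequency of the value 170 taken as 8 (an assumption, chosen so that the frequencies add up to N = 120).

```python
from math import sqrt


def step_deviation_stats(values, freqs, assumed, factor):
    """Mean and standard deviation of a frequency distribution by the step
    deviation method, with d' = (X - A) / C."""
    n = sum(freqs)
    steps = [(x - assumed) / factor for x in values]
    sum_fd = sum(f * d for f, d in zip(freqs, steps))
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))
    mean = assumed + sum_fd / n * factor
    sd = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * factor
    return mean, sd


values = [140, 145, 150, 155, 160, 165, 170, 175]
freqs = [1, 4, 15, 30, 36, 24, 8, 2]       # frequency 8 for 170 is assumed

mean, sd = step_deviation_stats(values, freqs, assumed=155, factor=5)
print(mean, round(sd, 2))
```

With `factor=1` the same function reduces to the assumed mean method for a discrete series.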
For calculating standard deviation in continuous series any of the following methods may be applied :
VN
where, x = {X - X)
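Illustration 24. Find the Mean and Standard deviation from the following distribution :
Marks :             0-4   4-8   8-12   12-16
No. of Students :   4     8     2      1
Solution.
Marks (X)    f     Midpoints (m)    fm     $x = (m - \bar{X})$    $x^2$    $fx^2$
0-4          4     2                8      -4                     16       64
4-8          8     6                48     0                      0        0
8-12         2     10               20     +4                     16       32
12-16        1     14               14     +8                     64       64
N = 15, $\Sigma fm$ = 90, $\Sigma fx^2$ = 160
Mean
$\bar{X} = \dfrac{\Sigma fm}{N} = \dfrac{90}{15}$ = 6 Marks
Standard Deviation
$\sigma = \sqrt{\dfrac{\Sigma fx^2}{N}}$
Here, $\Sigma fx^2$ = 160, N = 15
$\sigma = \sqrt{\dfrac{160}{15}} = \sqrt{10.666}$ = 3.265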
Note. This method is rarely applied in practice because, in case the actual mean is in fraction, the
calculations become complicated and take a lot of time.
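Marks (X)    f     m     fm     $m^2$    $fm^2$
0-4          4     2     8      4        16
4-8          8     6     48     36       288
8-12         2     10    20     100      200
12-16        1     14    14     196      196
N = 15, $\Sigma fm$ = 90, $\Sigma fm^2$ = 700
Steps :
$\sigma = \sqrt{\dfrac{\Sigma fm^2}{N} - \left(\dfrac{\Sigma fm}{N}\right)^2} = \sqrt{\dfrac{\Sigma fm^2}{N} - (\bar{X})^2}$
$\sigma = \sqrt{\dfrac{700}{15} - \left(\dfrac{90}{15}\right)^2} = \sqrt{46.666 - 36} = \sqrt{10.666}$ = 3.265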
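Class     f     Midpoints (m)    d = m - 42.5    fd       $fd^2$
50-55     22    52.5             +10             220      2200
45-50     29    47.5             +5              145      725
40-45     31    42.5             0               0        0
35-40     47    37.5             -5              -235     1175
30-35     51    32.5             -10             -510     5100
25-30     70    27.5             -15             -1050    15750
N = 250, $\Sigma fd$ = -1430, $\Sigma fd^2$ = 24950
2. Multiply these deviations by the respective frequencies and calculate the total, $\Sigma fd$.
$\sigma = \sqrt{\dfrac{\Sigma fd^2}{N} - \left(\dfrac{\Sigma fd}{N}\right)^2}$
where, d = (m - A)
$\sigma = \sqrt{\dfrac{24950}{250} - \left(\dfrac{-1430}{250}\right)^2} = \sqrt{99.8 - (-5.72)^2} = \sqrt{99.8 - 32.7184} = \sqrt{67.0816}$ = 8.19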
Illustration 27. Find out the Standard Deviation of the frequency distribution given in Illustration 26 by
step deviation method.
Solution.
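Class     f     m       d = m - 42.5    d' = d/5    fd'      $fd'^2$
50-55     22    52.5    +10             +2          +44      88
45-50     29    47.5    +5              +1          +29      29
40-45     31    42.5    0               0           0        0
35-40     47    37.5    -5              -1          -47      47
30-35     51    32.5    -10             -2          -102     204
25-30     70    27.5    -15             -3          -210     630
N = 250, $\Sigma fd'$ = -286, $\Sigma fd'^2$ = 998
Steps
1. Take the deviations of mid-points from an assumed mean and denote these deviations by d.
2. Divide these deviations by the common factor and obtain step deviations, i.e., d'.
3. Multiply step deviations by the respective frequencies and calculate the total $\Sigma fd'$.
4. Calculate the squares of the step deviations ($d'^2$), multiply these squared deviations by
respective frequencies (in other words $fd' \times d' = fd'^2$) and obtain the total $\Sigma fd'^2$.
$\sigma = \sqrt{\dfrac{\Sigma fd'^2}{N} - \left(\dfrac{\Sigma fd'}{N}\right)^2} \times C$
where, $d' = \dfrac{m - A}{C}$ and C = Common factor
Now we get,
$\sigma = \sqrt{\dfrac{998}{250} - \left(\dfrac{-286}{250}\right)^2} \times 5 = \sqrt{3.992 - (-1.144)^2} \times 5 = \sqrt{2.6833} \times 5$
Standard Deviation = 8.19 years.
Illustration 28. Find the Standard Deviation of the height of 100 students.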
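Height (in inches) :   Less than 62.5   Less than 65.5   Less than 68.5   Less than 71.5   Less than 74.5
No. of Students :      5                23               65               92               100
Height (in inches) X    Frequency (f)    Midpoints (m)    d' = (m - 67)/3    fd'     $fd'^2$
59.5-62.5               5                61               -2                 -10     20
62.5-65.5               18               64               -1                 -18     18
65.5-68.5               42               67               0                  0       0
68.5-71.5               27               70               +1                 +27     27
71.5-74.5               8                73               +2                 +16     32
N = 100, $\Sigma fd'$ = 15, $\Sigma fd'^2$ = 97
$\sigma = \sqrt{\dfrac{\Sigma fd'^2}{N} - \left(\dfrac{\Sigma fd'}{N}\right)^2} \times C$
$\sigma = \sqrt{\dfrac{97}{100} - \left(\dfrac{15}{100}\right)^2} \times 3 = \sqrt{0.97 - 0.0225} \times 3$
Standard Deviation = 2.92 inches.
Illustration 29. Calculate Mean, Standard Deviation and mean deviation about mean.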
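Marks            Students
More than 20     50
More than 40     47
More than 80     41
More than 100    21
More than 120    9
Solution.
Marks (X)    Frequency (f)     Mid-points (m)    d' = (m - 90)/10    fd'     $fd'^2$
20-40        50 - 47 = 3       30                -6                  -18     108
40-80        47 - 41 = 6       60                -3                  -18     54
80-100       41 - 21 = 20      90                0                   0       0
100-120      21 - 9 = 12       110               +2                  24      48
120-140      9                 130               +4                  36      144
N = 50, $\Sigma fd'$ = 24, $\Sigma fd'^2$ = 354
Mean :
Applying formula, we get
$\bar{X} = A + \dfrac{\Sigma fd'}{N} \times C = 90 + \dfrac{24}{50} \times 10$ = 94.8 Marks
Standard Deviation
$\sigma = \sqrt{\dfrac{\Sigma fd'^2}{N} - \left(\dfrac{\Sigma fd'}{N}\right)^2} \times C = \sqrt{\dfrac{354}{50} - \left(\dfrac{24}{50}\right)^2} \times 10 = \sqrt{7.08 - 0.2304} \times 10$ = 26.17 Marks
Mean Deviation
Marks (X)    Frequency (f)    Mid-points (m)    |D| = |m - 94.8|    f|D|
20-40        3                30                64.8                194.4
40-80        6                60                34.8                208.8
80-100       20               90                4.8                 96.0
100-120      12               110               15.2                182.4
120-140      9                130               35.2                316.8
N = 50, $\Sigma f|D|$ = 998.4
M.D. = $\dfrac{\Sigma f|D|}{N} = \dfrac{998.4}{50}$ = 19.968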
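Let us try the same question by assumed mean method (Assumed Mean = 90)
Marks      f     m      |d| = |m - 90|    f|d|
20-40      3     30     60                180
40-80      6     60     30                180
80-100     20    90     0                 0
100-120    12    110    20                240
120-140    9     130    40                360
$\Sigma f|d|$ = 960
M.D. = $\dfrac{\Sigma f|d| + (\bar{X} - A)(\Sigma f_B - \Sigma f_A)}{N}$
where $\Sigma f_B$ = sum of all class frequencies up to and including the mean class = 3 + 6 + 20 = 29
and $\Sigma f_A$ = sum of all class frequencies after the mean class = 12 + 9 = 21
M.D. = $\dfrac{960 + (94.8 - 90)(29 - 21)}{50} = \dfrac{960 + 38.4}{50} = \dfrac{998.4}{50}$ = 19.968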
Various measures are calculated from standard deviation. Some of the important measures are as under
:
Symbolically,
Coefficient of S.D. = $\dfrac{\sigma}{\bar{X}}$
(b) Coefficient of Variation : This relative measure was developed by Karl Pearson and is most
popularly used to measure relative variation of two or more series. It shows the relationship
between the standard deviation and the arithmetic mean expressed in terms of percentage. This
measure is used to compare uniformity, consistency and variability in two different series. The series
having a greater coefficient of variation is said to be less uniform, less homogeneous, less
consistent or less stable (in other words, it has a higher degree of variability). In the same way, the series
having a lesser coefficient of variation is said to be more uniform, more homogeneous, more consistent
or more stable (in other words, it has a lesser degree of variability).
Symbolically,
C.V. = $\dfrac{\sigma}{\bar{X}} \times 100$
(c) Variance : Variance is the square of standard deviation. Standard deviation and variance are
measures of variability and they are closely related. The only difference between the two measurements
is that the variance is the average squared deviation from mean and standard deviation is the square
root of variance. Symbolically,
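Calculation of Variance
Variance ($\sigma^2$) = $\dfrac{\Sigma (X - \bar{X})^2}{N} = \dfrac{\Sigma x^2}{N}$
Here, $x = X - \bar{X}$
In Frequency Distribution :
Variance ($\sigma^2$) = $\left[\dfrac{\Sigma fd'^2}{N} - \left(\dfrac{\Sigma fd'}{N}\right)^2\right] \times C^2$
Here, $d' = \dfrac{X - A}{C}$, and C = Common factor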
Individual Observations
Illustration 30. A batsman is to be selected for a cricket team. The choice is between X and Y on the basis
of their five previous scores which are :
X : 25 85 40 80 120
Y : 50 70 65 45 80
(i) a higher run scorer (ii) a more reliable batsman in the team.
Batsman X
Arithmetic Mean
JX
X=
EX = 350, N = 5 350
Standard Deviation
a=
N
a=
5750
= V1150
= 33.91 Runs
Coefficient of S.D. = ^
Coeff. of S.D. =
70 = 0.484
Variance
E(X-X)2 Ijc^
Batsman Y
Y=
EY = 310, N = 5 310
Y=
= 62
a=
N
oy
= 12.88 Runs
Coefficient of S.D. = ^
279
X * Y
85 +15 225 70 +8 64
40 -30 900 65 +3 9
Coeff. of S.D. =
62 = 0.207
_ E(Y-Y)2 ^^ ^ NN
E (Y- Yf = ly = 830
280
Here, X - X = x
5750
=1150 Runs
Coefficient of Variation
„ - ax
c.v.^ =• Y 100
o = 33.91 and X = 70
= 48.44%
Here, y - y ^ ^
, 830 ay- = —-
= 166 Runs
C.V.^ = ^ X 100
= 20.77%
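The comparison of the two batsmen can be reproduced with a few lines of Python. This is an illustrative sketch only: the helper name `coefficient_of_variation` is an assumption, and `statistics.pstdev` is used because the text divides by N (population standard deviation). Results may differ from the hand calculation in the last decimal place because the text rounds σ before dividing.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = (population S.D. / arithmetic mean) x 100."""
    return pstdev(values) / mean(values) * 100

scores = {"X": [25, 85, 40, 80, 120], "Y": [50, 70, 65, 45, 80]}

# For each batsman store (mean, S.D., C.V.).
summary = {
    name: (mean(runs), pstdev(runs), coefficient_of_variation(runs))
    for name, runs in scores.items()
}
for name, (avg, sd, cv) in summary.items():
    print(f"Batsman {name}: mean={avg:.2f}, S.D.={sd:.2f}, C.V.={cv:.2f}%")
```

The batsman with the larger mean is the higher scorer, and the one with the smaller C.V. is the more consistent.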
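Illustration 31. Calculate the coefficient of variation of the following series :
Size      : 4  5  6   7   8   9   10
Frequency : 3  7  22  60  85  32  8
Solution.
Calculation of Coefficient of Variation
Size (X)   Frequency (f)   d = X - 7    fd     fd²
4               3             -3        -9      27
5               7             -2       -14      28
6              22             -1       -22      22
7              60              0         0       0
8              85             +1       +85      85
9              32             +2       +64     128
10              8             +3       +24      72
N = 217, Σfd = 128, Σfd² = 362
Arithmetic Mean : X̄ = A + Σfd / N = 7 + 128 / 217 = 7 + 0.59 = 7.59
Standard Deviation : σ = √[Σfd² / N - (Σfd / N)²] = √[362 / 217 - (128 / 217)²] = √(1.668 - 0.348) = √1.32 = 1.15
Coefficient of Variation : C.V. = (σ / X̄) × 100 = (1.15 / 7.59) × 100 = 15.1% (approx.)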
Continuous Series
Illustration 32. To check the quality of two brands of bulbs, their life in burning hours was estimated as under :
Life (in hrs)   Brand A   Brand B
0-50              15         2
50-100            20         8
100-150           18        60
150-200           25        25
200-250           22         5
(i) Which brand gives higher life? (ii) Which brand is more dependable?
Solution.
Taking A = 125 and C = 50, so that d' = (m - 125) / 50 for each mid-point m :
Brand A : Σfd'² = 193, Σfd' = 19, N = 100 and C = 50
Brand B : Σfd'² = 61, Σfd' = 23, N = 100 and C = 50
Coefficient of Variation (Brand A)
σ = √[Σfd'² / N - (Σfd' / N)²] × C
  = √[193 / 100 - (19 / 100)²] × 50
  = √[1.93 - (0.19)²] × 50
  = √(1.93 - 0.0361) × 50
  = √1.8939 × 50 = 68.8 hrs
Arithmetic Mean
X̄ = A + (Σfd' / N) × C = 125 + (19 / 100) × 50 = 125 + (0.19) × 50 = 125 + 9.5 = 134.5 hrs
Applying the formula, now we get
C.V. = (σ / X̄) × 100 = (68.8 / 134.5) × 100 = 51.15%
Coefficient of Variation (Brand B)
σ = √[Σfd'² / N - (Σfd' / N)²] × C
  = √[61 / 100 - (23 / 100)²] × 50
  = √[0.61 - (0.23)²] × 50
  = √(0.61 - 0.0529) × 50
  = √0.5571 × 50 = 37.32 hrs
Arithmetic Mean
X̄ = A + (Σfd' / N) × C = 125 + (23 / 100) × 50 = 125 + (0.23) × 50 = 125 + 11.5 = 136.5 hrs
Applying the formula, now we get
C.V. = (σ / X̄) × 100 = (37.32 / 136.5) × 100 = 27.34%
(i) Since the average life of bulbs of brand B (136.5 hrs) is greater than that of brand A (134.5 hrs), the bulbs of brand B give a higher life.
(ii) Since the C.V. of bulbs of brand B (27.34%) is less than that of brand A (51.15%), the bulbs of brand B are more dependable.
Illustration 33. The number of employees, wages per
employee and the variance of wages per employee for two factories are given below :
                                                Factory A   Factory B
No. of Employees                                    50         100
Average wage per employee per day (Rs)             120          85
Variance of wages per employee per day (Rs)          9          16
(a) In which factory is there greater variation in the distribution of wages per employee?
(b) Suppose in factory B, the wages of an employee are wrongly noted as Rs 120 instead of Rs 100. What would be the corrected variance for factory B?
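Solution.
(a) C.V. = (σ / X̄) × 100
Factory A : Here X̄ = 120 and σ = √9 = 3, so C.V. = (3 / 120) × 100 = 2.5%
Factory B : Here X̄ = 85 and σ = √16 = 4, so C.V. = (4 / 85) × 100 = 4.71%
Factory B has the greater variation in the distribution of wages per employee.
(b) For Factory B : N = 100 and X̄ = 85
X̄ = ΣX / N, so ΣX = N × X̄ = 100 × 85 = 8500
This ΣX is not correct.
Corrected ΣX = 8500 - 120 + 100 = 8480
Corrected X̄ = 8480 / 100 = 84.8
Variance = σ² = ΣX² / N - (X̄)²
Here, 16 = ΣX² / 100 - (85)²
ΣX² = (16 + 7225) × 100 = 724100
Corrected ΣX² = 724100 - (120)² + (100)² = 719700
Corrected variance = Corrected ΣX² / N - (Corrected X̄)² = 719700 / 100 - (84.8)² = 7197 - 7191.04 = 5.96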
Illustration 34. The sum of 10 values is 100 and the sum of their squares is 1090. Find the coefficient of variation.
Solution.
X̄ = ΣX / N = 100 / 10 = 10
Apply the following formula to obtain standard deviation (σ) by direct method.
σ² = ΣX² / N - (X̄)² = 1090 / 10 - (10)² = 109 - 100 = 9
σ = √9 = 3
Therefore,
C.V. = (σ / X̄) × 100 = (3 / 10) × 100 = 30%
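Illustration 35. The means and standard deviations of two brands of bulbs are given below :
                          Brand I   Brand II
Mean (X̄)                    800       770
Standard deviation (σ)      100        60
Calculate a measure of relative dispersion for two brands and interpret the result.
Solution.
Brand I
Given X̄ = 800, σ = 100
C.V. = (σ / X̄) × 100 = (100 / 800) × 100 = 12.5%
Brand II
Given X̄ = 770, σ = 60
C.V. = (σ / X̄) × 100 = (60 / 770) × 100 = 7.79%
Brand II has the lower coefficient of variation, so its bulbs are more uniform than those of Brand I.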
The sum of the squares of deviations taken from the arithmetic mean is less than the sum of the squares of deviations calculated from any other value; this is why the mean is used to calculate standard deviation.
Symbolically,
Σx² < Σd²
Here, x = X - X̄, i.e., deviations taken from the mean, and d = X - A, i.e., deviations taken from any other value A.
But the sum of deviations calculated from the median (ignoring ± signs) is never more than the sum of deviations calculated from the mean (ignoring ± signs); this is why the median is used to calculate mean deviation.
Symbolically,
Σ|X - Me| ≤ Σ|X - X̄|
(b) Standard Deviation and Normal Curve : In a normal or symmetrical distribution, mean, median and mode are identical, and a large proportion of the items is concentrated around the mean. The following relationships (i.e., range of spread of items) can be determined on the basis of mean and standard deviation.
Mean ± 1σ covers 68.27% of the items
Mean ± 2σ covers 95.45% of the items
Mean ± 3σ covers 99.73% of the items
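Illustration 36. For the following distribution, find the percentage of cases lying within X̄ ± 2σ and X̄ ± 3σ.
Size      : 1  2  3  4  5  6  7  8  9  10
Frequency : 8 12 10 28 16 12 10  2  0   2
Solution.
Taking A = 5 and d = X - 5, we get Σfd = -68, Σfd² = 424 and N = 100.
Here,
X̄ = 5 + (-68 / 100) = 4.32
Standard Deviation
σ = √[Σfd² / N - (Σfd / N)²]
  = √[424 / 100 - (-68 / 100)²]
  = √[4.24 - (0.68)²]
  = √(4.24 - 0.4624) = √3.7776 = 1.943
Calculation of percentage of cases :
X̄ ± 2σ = 4.32 ± 2 × 1.94 = 4.32 ± 3.88 = 8.20 and 0.44. The cases between 1 and 8 are (100 - 2) 98 out of 100, i.e., 98%.
X̄ ± 3σ = 4.32 ± 3 × 1.94 = 4.32 ± 5.82 = -1.5 and 10.14. There is no negative value. All the cases lie between 0 and 10, i.e., 100%.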
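The counting of cases within Mean ± kσ can be sketched in Python as below. The helper name `percent_within` is an assumption made for this example; the data are the size and frequency values of the illustration above, and the standard deviation is computed directly from the mean rather than from an assumed mean.

```python
from math import sqrt

sizes = list(range(1, 11))
freqs = [8, 12, 10, 28, 16, 12, 10, 2, 0, 2]

n = sum(freqs)
mean = sum(f * x for x, f in zip(sizes, freqs)) / n
sd = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(sizes, freqs)) / n)

def percent_within(k):
    """Percentage of cases whose size lies between mean - k*sd and mean + k*sd."""
    lo, hi = mean - k * sd, mean + k * sd
    inside = sum(f for x, f in zip(sizes, freqs) if lo <= x <= hi)
    return 100 * inside / n

for k in (1, 2, 3):
    print(k, percent_within(k))
```

Because the distribution is not exactly normal, the observed percentages need not match the theoretical normal-curve figures.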
(c) Combined Standard Deviation : Just as a combined arithmetic mean can be calculated if the means and number of items in different groups are given, a combined standard deviation can be calculated if the standard deviations, means and number of items in different groups are given. Combined standard deviation is obtained as follows :
(a) Two related groups :
σ1,2 = √[(N1σ1² + N2σ2² + N1d1² + N2d2²) / (N1 + N2)]
Here, d1 = X̄1 - X̄1,2 and d2 = X̄2 - X̄1,2
(b) Three related groups :
σ1,2,3 = √[(N1σ1² + N2σ2² + N3σ3² + N1d1² + N2d2² + N3d3²) / (N1 + N2 + N3)]
Here, d1 = X̄1 - X̄1,2,3, d2 = X̄2 - X̄1,2,3 and d3 = X̄3 - X̄1,2,3
The above formula can be extended to calculate the combined standard deviation of even more groups.
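Illustration 37. In sample A, N = 150, X̄ = 120 and S.D. = 20; in sample B, N = 75, X̄ = 126 and S.D. = 22. Calculate Combined Mean and Combined Standard Deviation.
Solution.
Combined Mean :
X̄1,2 = (N1X̄1 + N2X̄2) / (N1 + N2)
     = (150 × 120 + 75 × 126) / (150 + 75)
     = (18000 + 9450) / 225
     = 27450 / 225
     = 122
Combined Standard Deviation :
d1 = 120 - 122 = -2 and d2 = 126 - 122 = 4
σ1,2 = √[(N1σ1² + N2σ2² + N1d1² + N2d2²) / (N1 + N2)]
     = √[(150 × 400 + 75 × 484 + 150 × 4 + 75 × 16) / (150 + 75)]
     = √[(60000 + 36300 + 600 + 1200) / 225]
     = √(98100 / 225) = √436 = 20.88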
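The combined mean and combined standard deviation can be computed for any number of groups with one small function. This is a sketch under stated assumptions: the function name `combined_mean_sd` and the tuple layout (N, mean, S.D.) are choices made for this example, and the inputs are the two samples of Illustration 37.

```python
from math import sqrt

def combined_mean_sd(groups):
    """groups: list of (n, mean, sd) tuples; returns (combined mean, combined S.D.)."""
    total_n = sum(n for n, _, _ in groups)
    combined_mean = sum(n * m for n, m, _ in groups) / total_n
    # Each group contributes N * (sd^2 + d^2), where d = group mean - combined mean.
    pooled = sum(n * (sd ** 2 + (m - combined_mean) ** 2) for n, m, sd in groups)
    return combined_mean, sqrt(pooled / total_n)

combined_mean, combined_sd = combined_mean_sd([(150, 120, 20), (75, 126, 22)])
print(round(combined_mean, 2), round(combined_sd, 2))
```

Passing three tuples instead of two applies the same formula to three related groups.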
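Illustration 38. Find the combined standard deviation of the following three distributions :
Distribution    N     S.D.
A               20      8
B              120     20
C               60     12
Solution.
Combined Mean :
X̄1,2,3 = (N1X̄1 + N2X̄2 + N3X̄3) / (N1 + N2 + N3), where N1 + N2 + N3 = 20 + 120 + 60 = 200, which gives X̄1,2,3 = 48
Combined Standard Deviation :
σ1,2,3 = √[(N1σ1² + N2σ2² + N3σ3² + N1d1² + N2d2² + N3d3²) / (N1 + N2 + N3)]
       = √(65120 / 200)
       = √325.6 = 18.04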
(d) Change of origin and change of scale : If any constant is added to or subtracted from every item (change of origin), the standard deviation of the changed data is the same as that of the original data, but the mean of the new data changes by that constant.
If every item is multiplied or divided by any constant (change of scale), the mean, standard deviation and variance of the new data all change: the mean and standard deviation are multiplied or divided by the constant, and the variance by its square.
Illustration 39. Average daily wage of 50 workers of a factory was Rs 200 with standard deviation of Rs 40. Each worker is given a rise of Rs 20.
(i) What is the new average daily wage and standard deviation?
(ii) Have the wages become more or less uniform?
(iii) If each worker is given a hike of 10% in wages, how are the Mean and Standard Deviation values affected?
Solution. We are given
N = 50, X̄ = 200, σ = 40
(i) Old Series
Since X̄ = ΣX / N, we have ΣX = N × X̄ = 50 × 200 = 10,000
New Series
A rise of Rs 20 to each worker adds 20 × 50 workers = Rs 1,000 to the total.
New ΣX = 10,000 + 1,000 = Rs 11,000
New Mean : X̄ = ΣX / N = 11,000 / 50 = Rs 220
A rise of the same amount for every worker is a change of origin. Each wage and the mean rise by Rs 20, so the deviations from the mean are unchanged and the standard deviation remains Rs 40.
(ii) Coefficient of Variation
Old Series : C.V. = (σ / X̄) × 100 = (40 / 200) × 100 = 20%
New Series : C.V. = (σ / X̄) × 100 = (40 / 220) × 100 = 18.18%
The coefficient of variation has fallen, so the wages have become more uniform.
(iii) A hike of 10% is a change of scale: every wage is multiplied by 1.10.
Mean affected : New Mean = Rs 200 × 1.10 = Rs 220
Standard deviation affected : New S.D. = Rs 40 × 1.10 = Rs 44
The coefficient of variation, (44 / 220) × 100 = 20%, remains unchanged.
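The effect of a change of origin and a change of scale can be verified on any small data set. In the sketch below the five wage values are hypothetical (they are not the factory's actual data, which the illustration does not give), and the helper name `summary` is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical daily wages (Rs), chosen only to illustrate the two properties.
wages = [150, 180, 200, 220, 250]

def summary(values):
    """Return (mean, population S.D., C.V. in percent)."""
    avg, sd = mean(values), pstdev(values)
    return avg, sd, sd / avg * 100

original = summary(wages)
shifted = summary([w + 20 for w in wages])    # change of origin: flat rise of Rs 20
scaled = summary([w * 1.10 for w in wages])   # change of scale: 10% hike

print("original:", original)
print("shifted :", shifted)
print("scaled  :", scaled)
```

Adding a constant moves the mean but leaves the S.D. alone, so the C.V. falls; multiplying by a constant scales mean and S.D. together, so the C.V. stays the same.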
Standard deviation is the most satisfactory and widely used measure of dispersion because of the following merits :
Merits
1. Based on every item. Unlike the range and location-based measures of dispersion, the standard deviation makes use of all the observations in the series. That is, it includes every item of the distribution.
2. Correct mathematical process. The standard deviation is the easiest measure of dispersion to handle algebraically and it is the result of a correct mathematical process. The deviations are calculated from the arithmetic mean, which is an ideal average. The deviations are squared, so that they automatically become positive. Being based on a correct mathematical process, it is amenable to further statistical analysis.
4. Sampling fluctuations. Standard deviation is less affected by the fluctuations of sampling than most other measures of dispersion.
Demerits
1. Complex in calculation. Standard deviation is not easy to calculate, nor is it easily understood. In many cases it is more cumbersome in its calculation than either quartile deviation or mean deviation.
2. More weight to extreme items. It gives more weight to extreme items and less weight to those which are near the mean, because the squares of the deviations which are big in size are proportionately greater than the squares of the deviations which are comparatively small. Thus, deviations of 2 and 8 are in the ratio of 1 : 4, but their squares, i.e., 4 and 64, are in the ratio of 1 : 16. However, since standard deviation gives greater weight to extreme items, it does not find much favour with economists and businessmen, who are more interested in the results of the modal class.
Uses
Despite these drawbacks, the standard deviation is the best measure of dispersion and should be used whenever possible. It is widely used in statistics because it possesses most of the characteristics of an ideal measure of dispersion. It is a significant measure for making comparisons between the variability of two sets of observations, for testing the significance of various statistical measures of random samples, and in correlation and regression analysis, etc. We may regard standard deviation as the best and the most powerful measure of dispersion.
[Chart : Types of Measures of Dispersion/Variation]
The coefficient of variation is obtained as a percentage and is used to compare the variability of two or more series.
Thus, C.V. = (σ / X̄) × 100
Lorenz Curve
The graphic method of studying dispersion is known as the Lorenz Curve Method. It is named after Dr Max O. Lorenz, who used it for the first time to measure the distribution of wealth and income. Now it is also used for the study of the distribution of profits, wages, turnover, etc. In this method the values and the frequencies are cumulated and their percentages are calculated. These values are plotted on a graph, and the curve that is obtained is called the Lorenz Curve. The greatest defect of this curve is that it does not give a quantitative measure of dispersion. Let us look at the following illustration.
Illustration 40. Draw a Lorenz Curve from the following data :
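Income (Rs)   Group A   Group B
20              10        16
40              20        14
60              40        10
100             50         6
180             80         4
Solution.
Income   Cumulative   Cumulative   Group A   Cumulative   Cumulative   Group B   Cumulative   Cumulative
(Rs)     Income       Percent                Total        Percent                Total        Percent
20          20            5           10        10            5           16        16           32
40          60           15           20        30           15           14        30           60
60         120           30           40        70           35           10        40           80
100        220           55           50       120           60            6        46           92
180        400          100           80       200          100            4        50          100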
Steps
1. The sizes of items (or, if classes are given, their mid-points) are made cumulative. Taking the last cumulative total as equal to 100, the different cumulative totals are converted into percentages.
2. In the same way, frequencies are made cumulative. Taking the last cumulative frequency as equal to 100, all the different cumulative frequencies are converted into percentages.
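The two cumulation steps above can be sketched in Python. The function name `cumulative_percent` is an assumption made for this example; the data are those of Illustration 40. The resulting percentage pairs are the points that would be plotted to draw the Lorenz curves.

```python
from itertools import accumulate

def cumulative_percent(values):
    """Cumulate the values and express each running total as a % of the final total."""
    running = list(accumulate(values))
    return [100 * r / running[-1] for r in running]

income = [20, 40, 60, 100, 180]
group_a = [10, 20, 40, 50, 80]
group_b = [16, 14, 10, 6, 4]

income_pct = cumulative_percent(income)
a_pct = cumulative_percent(group_a)
b_pct = cumulative_percent(group_b)

# Each row: (cumulative % of income, cumulative % of Group A, cumulative % of Group B)
for row in zip(income_pct, a_pct, b_pct):
    print(row)
```

The further a group's points lie from the diagonal line of equal distribution, the greater the inequality in that group.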
[Figure : Lorenz Curve; axis label : Percentage of Income]
Illustration 41.
Income   Cumulative   Cumulative   Area A :   Cumulative   Cumulative   Area B :   Cumulative   Cumulative
         Income       %            Persons    Total        %            Persons    Total        %
6            6           0.6           6          6            6            2          2            1
25          31           3.1          11         17           17           38         40           20
60          91           9.1          13         30           30           52         92           46
Since curve B is farthest from the line of equal distribution, it represents greater inequality in area B as
compared to area A.
Illustration 42. 9400 Indian households are classified according to their after-tax income as follows :
After-tax Income   Cumulative % of Income   No. of Households   Cumulative No. of Households   Cumulative % of Households
Below 1000                  2.5                   1348                    1348                          14.34
Below 5000                 12.5                   4210                    5558                          59.13
Below 10000                25.0                   1892                    7450                          79.25
Below 20000                50.0                   1460                    8910                          94.79
Below 40000               100.0                    490                    9400                         100.00
Comparison of Measures of Dispersion
A comparative study of these measures would help us in the selection of an appropriate measure of dispersion, which depends on (a) the nature of data, (b) the purpose, and (c) the object of an investigation.
1. Definite value point of view : All the four measures of dispersion are rigidly defined.
3. Based on every item point of view : Range and quartile deviation are not based on all the items of the series, while mean deviation and standard deviation make use of all the items; they are based on every item of the distribution. Range is highly affected by extreme items.
4. Interpretation and application point of view : All the four measures of dispersion are easy to interpret. Range and quartile deviation are useful for a general study of variability. Range is useful for quality control, weather forecasting, etc. Quartile deviation is useful when the influence of extreme items is to be minimised, as in the study of social problems. Mean deviation is used by economists and business statisticians. It is useful in forecasting business cycles and in small sample studies. Standard deviation possesses most of the good characteristics of a measure of dispersion. Therefore, in sampling and other areas of statistical analysis, it is the most favoured and indispensable measure.
5. Algebraic treatment point of view : Standard deviation is the best measure of dispersion because of its correct mathematical process as compared to range, quartile deviation and mean deviation. It is widely used in statistics, i.e., in making comparisons between the variability of two or more sets of data, in testing the significance of random samples, in correlation and regression analysis, etc.
Thus, standard deviation satisfies most of the essentials of a good measure of dispersion.
(vi) It should be usable for statistical calculations for further or higher order analysis.
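List of Formulae
Range
Absolute Measure : Range = L - S
Relative Measure : Coefficient of Range = (L - S) / (L + S)
Quartile Deviation
Absolute Measure : Q.D. = (Q3 - Q1) / 2
Relative Measure : Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1)
Mean Deviation
Absolute Measure :
M.D. = Σ|D| / N (individual observations)
M.D. = Σf|D| / N (frequency distribution)
M.D. = [Σf|d| + (X̄ - A)(Σf_B - Σf_A)] / N (assumed mean method)
Relative Measure : Coeff. of M.D. = M.D. / (Mean or Median) = M.D. / (X̄ or Me)
Standard Deviation
Individual observations :
Direct Method : σ = √[ΣX² / N - (X̄)²]
Actual Mean Method : σ = √(Σx² / N), where x = X - X̄
Assumed Mean Method : σ = √[Σd² / N - (Σd / N)²], where d = X - A
Step Deviation Method : σ = √[Σd'² / N - (Σd' / N)²] × C
Frequency distribution :
Direct Method : σ = √[ΣfX² / N - (X̄)²]
Actual Mean Method : σ = √(Σfx² / N)
Assumed Mean Method : σ = √[Σfd² / N - (Σfd / N)²]
Step Deviation Method : σ = √[Σfd'² / N - (Σfd' / N)²] × C
Here, d' = (X - A) / C and C = Common factor
Relative Measure : C.V. = (σ / X̄) × 100
Combined Standard Deviation
(a) Two related groups :
σ1,2 = √[(N1σ1² + N2σ2² + N1d1² + N2d2²) / (N1 + N2)]
Here, d1 = X̄1 - X̄1,2 and d2 = X̄2 - X̄1,2
(b) Three related groups :
σ1,2,3 = √[(N1σ1² + N2σ2² + N3σ3² + N1d1² + N2d2² + N3d3²) / (N1 + N2 + N3)]
Here, d1 = X̄1 - X̄1,2,3, d2 = X̄2 - X̄1,2,3 and d3 = X̄3 - X̄1,2,3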
EXERCISES
Questions :
(b) Name the most commonly used measure of relative dispersion. Give the formula for calculating it.
4. Why should we measure dispersion? Do the range and quartile deviation measure dispersion about the same value?
7. Some measures of dispersion depend upon the spread of values whereas some calculate the variation of values from a central value. Do you agree?
8. Define the first and third quartiles. Explain how the quartiles are used to calculate dispersion values.
10. 'Coefficient of variation is a relative measure of dispersion.' Explain.
12. In what way is standard deviation a better measure of dispersion than mean deviation?
14. Why is standard deviation considered to be the most popular measure of dispersion? Explain.
15. What is coefficient of variation? What purpose does it serve? Also distinguish
16. Define coefficient of variation. In what situation would you prefer this as a measure of dispersion?
"The standard deviation of heights measured in inches will be larger than the standard deviation of heights measured in feet for the same group of individuals." Comment on the validity of the above statement. Otherwise give appropriate explanation of the statement given above.
Problems : Range
1. The daily wages of ten workers are given below. Find out range and its coefficient.
2. Following are the marks obtained by students in Sec. A and Sec. B. Compare the range of marks of students in the two sections.
Marks (Section A) : 20 25 28 45 15 30
Marks (Section B) : 45 52 36 42 28 25
4.
[(a) Range = 50, Coeff. of Range L 0 42
No. of Workers :
50 2
70 8
80 12
90
100 4
120 3
130 8
150 6
5.
If o .
Xf
10 4
15 12
6.
Find the range and coefficient of range of the following • Age m years : cm . &•
Frequency :
20 30 40 50
7 3 5 2
5-10 10
10-15 15
7.
15-20 20-25
Frequency :
1-5 2
6-10 8
11-15 15
8.
:e
Below 162 Below 163 Below 164 Below 165 Below 166 Below 167 Below 168 Below 169 Below 170
No. of persons
1 8 19 32 45 58 85 93 100
03]
Quartile Deviation
Months      : 1 2 3 4 5 6 7 8 9 10 11 12
Income (Rs) : 239 250 251 251 257 258 260 261 262 262 273 275
[Q.D. = Rs 5.5, Coeff. of Q.D. = 0.0214]
10. Find the Quartile Deviation and its Coefficient from the following data relating to the daily wages of seven workers :
Daily Wages (in Rs) : 50 90 70 40 80 65 60
[Q.D. = Rs 15, Coefficient of Q.D. = 0.23]
11. Find out Quartile Deviation and Coefficient of Quartile Deviation of the following items :
12. Find out Quartile Deviation, Interquartile Range and Coefficient of Quartile Deviation of the
following series :
13. Find out Coefficient of Quartile Deviation from the following data :
X : 10 15 20 25 30 35 40 45
f: 6 17 29 38 25 14 9 1
14. Calculate Quartile and Coefficient of Quartile Deviation of the following data : Marks : 5-9 10-14
15-19 20-24 25-29 30-34 35-39 Students : 1 3 8 5 4 2 2
15. Calculate lower and upper Quartiles, Quartile Deviation and Coefficient of Quartile Deviation of
the following series :
Frequency : 5 8 12 15 6 2
16. Calculate the Semi-interquartile Range and its Coefficient of the following data : Marks : 0-10
10-20 20-30 30-40 40-50 50-60 60-70 No. of Students : 48 11 15 12 6 3
68
M.D. = 12.77] 50 50
18. Calculate the Interquartile Range for the data given below :
Zency : T T T ^^ ^
6 5 4
[I.Q.R. = 12.9]
Mean Deviation
P''-^ ms) : 25 28 32 32 36 48 44 45
^^ 52 49 45 72 57 47
= ^ 4 10 9 15 12 7 9 7
n ■ 12 18 24 30 36 42
Frequency : 4 7 9 18 15 jo 5
No. of plants . -y . ^ , , ^ ^ ^ 7 8 9 10
• 2 ^ 7 11 18 24 12 8 6 4 3
Size of Item : 4 ^ „
Frequency =2 4 I ^^ ^^ 16
_ 3 2 14
s- ^^Mrr r ? r
[M.D. from X = 28.56, Coeffidem of M D = 0 228 M.D. from Me = 28, Coefficient of M.D. = 0.233]
[M.D. = 0.915]
28. Calculate Mean and Mean Deviation and coefficient of M.D. for the following distribution :
Workers : 20 40 30 10
mlrTstVtrthday : 17-19 20-25 26-35 36-40 41-50 51-55 56-60 61-70 Number :9 16 12 26 14 12
^_^
30. Find Mean Deviation from median of the marks secured by 100 students in a class-test as given
below:
[M.D. = 2.26]
31. Using Mean Deviation from median of the income group of 5 and 7 members given below,
compare which of the group has more variability?
Group A : Group B :
4000 3000
4200 4400 4600 4800 4000 4200 4400 4600 4800 5800
[Group A : M.D. = 1240, Coeff M.D. = 0.054 Group B : M.D. = 571.4, Coeff. M.D. = 0.13 Group B has
greater variation]
Standard Deviation
Week
54
2 62
63
65
5 68
71
78 9 10
33. Calculate Standard Deviation of the following two series. Which series has more variability:
A : 58 59 60 65 66 B : 56 87 89 46 93
52 75 31 46 48 65 44 54 78 68 [A: X = 56, S.D. = 11.7, C.V. = 20.89% B: X = 68, S.D. = 17.1, C.V = 25.14%
Series B has more variability]
35.
36.
(«■) the arithmetic mean («) the standard deviation («/') the mean deviation Also calculate 0) Z(X - 55)^
(«) 2 IX - Median! (c) Examine, if
(«)Z|X-X|>z IX-Median!
No. of families
78
37.
9 10 11 12 77 41 20 8 6 1
25
4 16
5 21
6 18
7 13
8 10
94
38.
10
[X = 5.5; a^ = 3.99]
25-30 16
30-35 8
39.
WKMM wm&m
Ml
35^0 3
"^^^-o Vanance ..
Frequency 2 , ^^-25
■ 7 13 21
r T
4:
43
44.
40. The following are the scores made by two batsmen A and B in a series of innings:
[A : X = 50, S.D. = 41.83, C.V. = 83.66% ■ B : X = 33, S.D. = 23.37, C.V. = 70.82%]
41. The index number of prices of cotton and coal shares in 1998 were as under:
Month : Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec. Index Number of Prices :
Cotton : 188 178 173 164 172 183 184 185 211 217 232 240 Coal : 131 130 130 129 129 129 127 127 130
137 140 142 Which of these two shares do you consider more variable in prices :
Coal : X = 131.75, S.D. = 4.815, C.V. = 3.65% Cotton shares are more variable in prices]
42. Calculate the arithmetic mean, standard deviation and variance from the following distribution :
Frequency : 2 5 7 13 21 16 8 3
43. Calculate arithmetic mean, standard deviation and variance from the following series :
No. of Students : 7 11 22 0 15 5
[X = 51.67, S.D. = 15.13, a^ = 228.92] ,44. The following tables gives the age distribution of students in a
school in 2001 and 2002. Calculate Coefficient of Variation for both the groups.
Age :. 17- 18- 19- 20- 21- 22- 23- 24- 25-
2001 :1 3 8 12 14 14 5 3 2
2002 :6 22 34 40 32 20 16 9 3
You are given the following data about height of boys and girls : •
Boys Girls
72 , 38
Number
■tl
68 9
61 4
310
46.
10-12
^ 16 13 7 s 4
of Wh,chide, has
No of Students : 5 20 30 43 60 56 37 16 .
I ^^^ --Its : ^
orand X o c
» 15 12 IS
Brand Y 6 in
20 32 30
40-45 13 12
45-50 9 0
22 40_ 32 18 10
50. The following table gives the distribution of wages in the two branches of a factory :
:: — 300-350
gr Find the mean and standard deviation for the two branches for the wages separately. F [a) Which
branch pays higher average wages?
(b) Which branch has greater variability in wages in relation to the average wages?
(c) What is the average monthly wage for the factory as a whole?
id) What is the variance of wages of all the workers in the two branches—A and B taken together?
[Branch A : Mean = Rs 225, S.D. = Rs 66.20, C.V. = 29.42% Branch B : Mean = Rs 230, S.D. = Rs 62.15, C.V.
= 27.02% {a) Branch B pays higher average monthly wages.
X : 60.5 70.5 80.5 90.5 100.5 110.5 120.5 130.5 140.5 f ■ 3 21 -78 . 182 305 209 81 21 5
52. Goals scored by two teams A and B in football matches were as follows : No. of Goals in a
match : 0 1 2 3 4 No. of matches : A : 17 9 8 5 4
B : 17 9 6 5 3
[Coeff. of variation Team A = 123.6% Coeff. of Variation Team B = 109.0% Thus, Team B is more
consistent]
53. Find mean and the standard deviation of the following two groups taken together:
54.
iW
Medical Exami?ier iiilS^^ MiMm Examined IS^^WMlilli ' y Mean : ^ Weight Standard
Deviation
312
No. of Men
AB
{in Rs)
50 100 120
c^ A fCombined : X, , , - 7:5 „
Class A
Class B ]' 20 40 50 80
^ - li It ll - « « .5 85 „ ' ^ - - « 40
61.
xi/St------y^PC-
58.
59.
60.
Chapter 11
MEASURES OF CORRELATION
1. Introduction
2. Correlation and Causation
3. Kinds of Correlation
4. Degree of Correlation
5. Methods of Studying Correlation
   (a) Scatter Diagram
   (b) Karl Pearson's Coefficient of Correlation
   (c) Spearman's Rank Correlation
6. List of Formulae
Introduction
In the previous chapters we have discussed measures of central tendency (Mean, Median and Mode), partition values (Quartiles) and measures of dispersion (Range, Quartile Deviation, Mean Deviation, Standard Deviation and Lorenz Curve). These all relate to the description and analysis of a single variable only. This type of statistical analysis is called 'univariate analysis'. Now we will deal with problems involving association between two variables. We find that in social as well as natural sciences, where more than one inter-dependent variable is involved, a change in one variable brings a change in the others. For instance, in Biology we know that the weight of a person increases with height; in Geometry we know that the circumference of a circle depends on the radius; in Economics prices vary with supply, the cost of industrial production varies with the cost of raw materials, agricultural production depends on the rainfall, etc. The relationship between variables is measured by correlation analysis. Thus, 'the term correlation (or covariation) indicates the relationship between two such variables in which, with a change in the values of one variable, the values of the other variable also change.' The statistical analysis of such data is called 'bivariate analysis'.
Other Definitions
According to Croxton and Cowden, "When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation."
According to L.R. Connor, "If two or more quantities vary in sympathy so that movements in one tend to be accompanied by corresponding movements in other(s) then they are said to be correlated."
1. Cause and effect : There is a cause and effect relationship between two variables shon w,ves and
many .h„„ starred husbands may havt K' ^ r;
be correlation between price and demand so that, in general, whenever there is an increase in price the demand falls, and vice versa. But this does not mean that whenever there is a rise in price the demand must fall. It is possible that with the rise in price the demand may also go up. This is on account of the fact that in economic and social sciences various factors affect the data simultaneously and it is difficult, almost impossible, to study the effects of these factors separately. Thus, correlation measures co-variation, not causation. It measures the direction and intensity of relationship among variables.
Kinds of Correlation
When both the variables change in one direction, that is, when both increase or both decrease, the relationship between the two variables is called positive or direct. But when the changes are in opposite directions, that is, one is increasing and the other is decreasing, the correlation is negative or inverse. For determining the direction of change, average values are taken. For example :
   I (a)         I (b)        II (a)        II (b)
  X     Y       X     Y      X     Y       X     Y
 60   255      10    90     50    60      10   250
We find that in I (a) the values of the X series are increasing and so are those of the Y series. In I (b) the values of X and Y are decreasing. Thus, they are both instances of positive correlation. On the other hand, in II (a) the values of X are increasing and the values of Y are decreasing; similarly, in II (b) the values of X are decreasing and the values of Y are increasing. Thus, they are both examples of negative correlation.
--easeinheatlllT^^^nofr,.
Increase in the number tfTe ^ "": " 3- Sale of woollen garments ant ly ir ""
of^rrotstr -ear
bear a constant rari^ ('""'''near), the amount of chale 3. Staple, Multiple and Partial Correlation
Degree of Correlation
The relationship between two variables can be determined by the quantitative value of the coefficient of correlation, which is obtained by calculation.
Perfect Correlation : Perfect correlation is that where changes in two related variables are exactly proportional. If equal proportional changes are in the same direction, there is perfect positive correlation between the two variables, described as +1; and if equal proportional changes are in the reverse direction, there is perfect negative correlation, described as -1. For example, the circumference of a circle increases in exact proportion to the length of its diameter; the amount of an electricity bill increases in a perfectly definite ratio with an increase in the number of units consumed; the volume of a gas varies inversely with the pressure at constant temperature, etc.
Zero Correlation : The value of the coefficient of correlation may be zero. This means that the two variables are uncorrelated, i.e., there is no linear relationship between them. It does not mean the absence of any type of relation between the two variables; some other type of relation may still be there.
Limited Degree of Correlation : In social science, the variables may be correlated, but an increase in one variable need not always be accompanied by a corresponding or equal increase (or decrease) in the other variable. Correlation is said to be limited positive when there are unequal changes in the two variables in the same direction, and limited negative when there are unequal changes in the reverse direction. The limited degree of correlation can be high (between ± .75 and ± 1), moderate (± .25 to ± .75) or low.
Degree of Correlation        Positive               Negative
Perfect Correlation          +1                     -1
                             + .9 or more           - .9 or more
                             from + .75 to + .9     from - .75 to - .9
                             from + .6 to + .75     from - .6 to - .75
                             from + .3 to + .6      from - .3 to - .6
Possibly no Correlation      Less than + .3         Less than - .3
                             0                      0
Methods of Studying Correlation
Scatter Diagram
A scatter diagram is drawn on graph paper. The chart is prepared by measuring the X-variable on the horizontal axis and the Y-variable on the vertical axis and plotting a point for each pair of observations of X and Y. In this way the whole set of data is plotted in the shape of points. The cluster of points plotted on the graph paper is called the scatter diagram. When the plotted points show some trend, upward or downward, we know that there is some correlation between the variables. The correlation is positive when the trend is upward and negative when it is downward.
[Fig. 1 : Scatter diagrams (a) to (g), ranging from r = +1 to r = 0 (No Correlation)]
Figures (a), (b) and (c) show an upward trend; they show positive correlation. Figures (d), (e) and (f) show a downward trend; they show negative correlation. However, there are differences among (a), (b) and (c) and similar differences among (d), (e) and (f).
We find from the plottings on the scatter diagrams that there is a certain similarity between (a) and (d), (b) and (e), and (c) and (f). In (a) and (d) the plotted points are almost in a straight line; this indicates perfect correlation. In (b) and (e) the plotted points are not in a straight line, but if we draw a straight line through the middle of the points (regression line) we will find that the points are near the line. This kind of scatter diagram shows high degree correlation. In (c) and (f), if we draw a similar line (regression line), we will find that the plotted points are very much scattered around the line, not as near as in the case of (b) and (e). This kind of scatter diagram shows low degree correlation. Finally, diagram (g) shows such a vast scatter of points that it is impossible to see any trend; this shows no correlation or zero correlation.
Illustration 1. From the following pairs of values of variables X and Y draw a scatter diagram and interpret the result.
X :  4  5  6  7  8  9 10 11 12 13 14 15
Y : 78 72 66 60 54 48 42 36 30 24 18 12
Solution.
We note that X = 4 and Y = 78 are given as the first X and Y values. We may plot this as the point (X, Y) on graph paper, where X = 4 and Y = 78: we measure 4 along the X-axis and 78 along the Y-axis. For the coordinates of the next point we measure 5 along the X-axis and 72 along the Y-axis, and so on for all the given X and Y values.
[Figure : Scatter diagram of the given values of X and Y]
From the above scatter diagram we can decide that the variables X and Y are correlated. The points take the shape of a line.
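Whether plotted points fall exactly on a straight line can also be checked without drawing them, by comparing the rate of change between successive points. The sketch below uses the X and Y values of Illustration 1; the variable names are assumptions made for this example.

```python
X = list(range(4, 16))
Y = [78, 72, 66, 60, 54, 48, 42, 36, 30, 24, 18, 12]

points = list(zip(X, Y))

# Rate of change of Y per unit change in X between each pair of successive points.
rates = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]

is_linear = len(set(rates)) == 1            # same rate everywhere means a straight line
direction = "negative" if rates[0] < 0 else "positive"
print(rates[0], is_linear, direction)
```

A constant rate of change with a downward direction corresponds to points lying on a downward-sloping straight line.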
Rate of Change
The rate of change is the slope of the straight line, which depends on the angle that the straight line makes with the X-axis.
[Fig. 3 : Rate of change, panels (a) to (h), including a line showing no change and a non-linear relationship]
We know that when the plotted points show some upward trend the correlation is positive, and when there is a downward trend the correlation is negative.
(i) If the straight line makes an angle of 45° with the X-axis, the change in the value of Y is exactly in the same proportion as the change in the value of X [Fig. (a) and (b)].
(ii) If the angle that the straight line makes with the X-axis is greater than 45°, the change in the value of Y is more than proportionate to the change in the value of X [Fig. (c) and (d)].
(iii) If the angle that the straight line makes with the X-axis is less than 45°, the change in the value of Y is less than proportionate to the change in the value of X [Fig. (e) and (f)].
(iv) If there is no angle and it is a straight line parallel to the X-axis, it shows that the value of Y does not change at all [Fig. (g)].
(v) Linear correlation exists when the ratio of change between two variables is uniform. In non-linear (curvilinear) correlation the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. Such a relationship will form a curve on the graph [Fig. (h)].
Ir
»f
5. fa case of linear relationship between x lni y lT T ^ donate change in th^ .a,„e t Jcha^^r,: t'^nTT" "
X fgr^BS variable
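$$r = \frac{\sum xy}{N \cdot \sigma_x \cdot \sigma_y}$$
Here, $x = (X - \bar{X})$; $y = (Y - \bar{Y})$; $\sigma_x$ = Standard deviation of X series; $\sigma_y$ = Standard deviation of Y series; N = Number of pairs of observations; r = Coefficient of correlation.
The above formula is based on the study of covariance between two series. The covariance between two series is written as follows :
$$\text{Covariance of X and Y} = \frac{\sum xy}{N} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$$
The above formula can be rewritten as under :
$$r = \frac{\sum xy}{N \cdot \sigma_x \cdot \sigma_y} = \frac{\sum xy}{N} \times \frac{1}{\sigma_x} \times \frac{1}{\sigma_y}$$
Since $\sigma_x = \sqrt{\sum x^2 / N}$ and $\sigma_y = \sqrt{\sum y^2 / N}$, this becomes
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} \quad \text{or} \quad r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2} \, \sqrt{\sum (Y - \bar{Y})^2}}$$
Applying Karl Pearson's formula, the coefficient of correlation is calculated by the following methods :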
Illustration 2. Calculate Product moment of correlation from the following data and interpret the result.
Marks in Mathematics : 15 18 21 24 27 30 36 39 42 48
Marks in Statistics : 25 25 27 27 31 33 35 41 41 45
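Solution. Karl Pearson's coefficient of correlation is also called Product Moment of Correlation.
  X     x = X - 30    x²      Y     y = Y - 33    y²     xy
 15       -15        225     25        -8         64    120
 18       -12        144     25        -8         64     96
 21        -9         81     27        -6         36     54
 24        -6         36     27        -6         36     36
 27        -3          9     31        -2          4      6
 30         0          0     33         0          0      0
 36        +6         36     35        +2          4     12
 39        +9         81     41        +8         64     72
 42       +12        144     41        +8         64     96
 48       +18        324     45       +12        144    216
ΣX = 300, Σx² = 1080, ΣY = 330, Σy² = 480, Σxy = 708
Steps :
1. Calculate arithmetic means of X and Y series.
7. Apply the following formula :
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
Here, $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$
$$\bar{X} = \frac{\sum X}{N} = \frac{300}{10} = 30, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{330}{10} = 33$$
Now we get,
$$r = \frac{708}{\sqrt{1080 \times 480}} = \frac{708}{720} = 0.98$$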
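The actual mean method of Illustration 2 can be checked with a short Python sketch. The function name `pearson_actual_mean` is an assumption made for this example; the data are the marks given in the illustration.

```python
from math import sqrt

def pearson_actual_mean(xs, ys):
    """Karl Pearson's r using deviations taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]          # x = X - mean of X
    dev_y = [y - mean_y for y in ys]          # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / sqrt(sum_x2 * sum_y2)

maths = [15, 18, 21, 24, 27, 30, 36, 39, 42, 48]
stats = [25, 25, 27, 27, 31, 33, 35, 41, 41, 45]
r_marks = pearson_actual_mean(maths, stats)
print(round(r_marks, 2))  # 0.98
```

The three sums inside the function correspond to Σxy, Σx² and Σy² in the formula.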
Illustration 3. Calculate Karl Pearson's coefficient of correlation between birth rate and death rate from
the following data :
1941 26 20
1951 32 22
1961 33 24
1971 35 27
1981 30 24
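Solution.
  X     Y     x = X - 30    x²    y = Y - 22    y²    xy
 24    15        -6         36       -7         49    42
 26    20        -4         16       -2          4     8
 32    22        +2          4        0          0     0
 33    24        +3          9       +2          4     6
 35    27        +5         25       +5         25    25
 30    24         0          0       +2          4     0
ΣX = 180, ΣY = 132, Σx² = 90, Σy² = 86, Σxy = 81
$$\bar{X} = \frac{\sum X}{N} = \frac{180}{6} = 30; \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{132}{6} = 22$$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
Here,
$$r = \frac{81}{\sqrt{90 \times 86}} = \frac{81}{87.98} = 0.920$$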
X series 15 25 136
Y series 15 18 138
Summation of product of deviations of X and Y series from their respective means = 122.
Solution. Regarding deviations of the values in X and Y series from their respective means, we are given the following information :
Σx² = 136, Σy² = 138 and Σxy = 122
Applying formula,
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
Now, we get
$$r = \frac{122}{\sqrt{136 \times 138}} = \frac{122}{\sqrt{18768}} = \frac{122}{136.996} = 0.891$$
Hence, there is high degree positive correlation between X and Y.
Solution. Given N = 50, σx = 4.5, σy = 3.5 and Σxy = 420
Applying formula,
$$r = \frac{\sum xy}{N \times \sigma_x \times \sigma_y}$$
Now, we get
$$r = \frac{420}{50 \times 4.5 \times 3.5} = \frac{420}{787.5} = 0.533$$
σy = 4.05
$$r = \frac{\sum xy}{N \cdot \sigma_x \cdot \sigma_y} = \frac{\sum xy}{N} \times \frac{1}{\sigma_x} \times \frac{1}{\sigma_y}$$
$$r = 12.3 \times \frac{1}{3.71} \times \frac{1}{4.05} = 12.3 \times 0.27 \times 0.25 = 0.83$$
Hence, there is high degree of positive correlation between X and Y.
Illustration 7. Find the standard deviation of X series if coefficient of correlation between two series X
and Y is = 0.28 and their covariance is 7.6 and variance of Y series is 81.90.
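$$\text{Covariance of X and Y} = \frac{\sum xy}{N} = 7.6$$
$$\sigma_y = \sqrt{81.90} = 9.05$$
Applying formula,
$$r = \frac{\sum xy}{N \cdot \sigma_x \cdot \sigma_y} = \frac{\sum xy}{N} \times \frac{1}{\sigma_x} \times \frac{1}{\sigma_y}$$
Now, we get
$$0.28 = 7.6 \times \frac{1}{\sigma_x} \times \frac{1}{9.05}$$
or
$$\sigma_x = \frac{7.6}{0.28 \times 9.05} = \frac{7.6}{2.534} = 2.99, \text{ approx. } 3$$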
Illustration 8. Calculate the number of items for which r = + 0.8, Ixy = 200, standard deviation of Y = 5;
and Ix^ = 100, where x and y denotes deviation of items from actual mean.
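$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} \quad \text{or} \quad 0.8 = \frac{200}{\sqrt{100 \times \sum y^2}}$$
Squaring both sides, we get
$$(0.8)^2 = \frac{(200)^2}{100 \times \sum y^2} \quad \text{or} \quad 0.64 = \frac{40000}{100 \times \sum y^2}$$
$$\sum y^2 = \frac{40000}{64} = 625$$
Now,
$$\sigma_y = \sqrt{\frac{\sum y^2}{N}} \quad \text{or} \quad 5 = \sqrt{\frac{625}{N}}$$
Squaring both sides, 25 = 625/N, so that
$$N = \frac{625}{25} = 25$$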
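$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\sum X^2 - \dfrac{(\sum X)^2}{N}} \times \sqrt{\sum Y^2 - \dfrac{(\sum Y)^2}{N}}}$$
or
$$r = \frac{N \sum XY - \sum X \cdot \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2} \times \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
or
$$r = \frac{\sum XY - N \cdot \bar{X} \cdot \bar{Y}}{\sqrt{\sum X^2 - N (\bar{X})^2} \times \sqrt{\sum Y^2 - N (\bar{Y})^2}}$$
where,
$$\bar{X} = \frac{\sum X}{N}, \qquad \bar{Y} = \frac{\sum Y}{N}$$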
Illustration 9. The data of price and quantity purchased relating to a commodity for 5 months are given
below. Calculate the product moment correlation (Karl Pearson's coefficient of correlation) between
price and quantity and comment on its sign and magnitude.
Months   Price (in Rs)   Quantity (in kg)
  1          10                5
  2          10                6
  3          11                4
  4          12                3
  5          12                2
Solution.
  X     Y     X²     Y²     XY
 10     5    100     25     50
 10     6    100     36     60
 11     4    121     16     44
 12     3    144      9     36
 12     2    144      4     24
ΣX = 55, ΣY = 20, ΣX² = 609, ΣY² = 90, ΣXY = 214
Steps :
2. Square the values of X series and obtain the total, i.e., ΣX².
3. Square the values of Y series and obtain the total, i.e., ΣY².
4. Multiply X and Y values and find out the total, i.e., ΣXY.
$$r = \frac{\sum XY - N \cdot \bar{X} \cdot \bar{Y}}{\sqrt{\sum X^2 - N (\bar{X})^2} \times \sqrt{\sum Y^2 - N (\bar{Y})^2}}$$
$$\bar{X} = \frac{\sum X}{N} = \frac{55}{5} = 11, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{20}{5} = 4$$
Here, ΣXY = 214, ΣX² = 609, ΣY² = 90, X̄ = 11, Ȳ = 4 and N = 5. Now, we get
$$r = \frac{214 - 5 \times 11 \times 4}{\sqrt{609 - 5 (11)^2} \times \sqrt{90 - 5 (4)^2}} = \frac{214 - 220}{\sqrt{609 - 605} \times \sqrt{90 - 80}} = \frac{-6}{2 \times 3.162} = \frac{-6}{6.324} = -0.949$$
Hence, there is high degree of negative correlation between price and quantity purchased of the commodity over the 5 months.
In other words, the quantity purchased (demand) decreased as the price of the commodity increased.
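The direct method of Illustration 9 works on the original values without taking deviations. A minimal Python sketch follows; the function name `pearson_direct` is an assumption made for this example, and the data are the price and quantity figures of the illustration.

```python
from math import sqrt

def pearson_direct(xs, ys):
    """Karl Pearson's r by the direct method (no deviations taken)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

price = [10, 10, 11, 12, 12]
quantity = [5, 6, 4, 3, 2]
r_price_qty = pearson_direct(price, quantity)
print(round(r_price_qty, 3))  # -0.949
```

The negative sign shows that price and quantity move in opposite directions.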
Illustration 10. Draw a scatter diagram and calculate Karl Pearson's coefficient of correlation between X and Y. Interpret the result and comment on their relationship.
X : 1 3 4 5 7 8
Y : 2 6 8 10 14 16
Solution.
[Fig. 4: Scatter diagram]
From the above scatter diagram we can decide that variables X and Y are correlated. The points take the shape of a line that goes up from left bottom to right top, so there is perfect positive correlation between X and Y.
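  X     Y     X²     Y²     XY
  1     2      1      4      2
  3     6      9     36     18
  4     8     16     64     32
  5    10     25    100     50
  7    14     49    196     98
  8    16     64    256    128
ΣX = 28, ΣY = 56, ΣX² = 164, ΣY² = 656, ΣXY = 328
$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\sum X^2 - \dfrac{(\sum X)^2}{N}} \times \sqrt{\sum Y^2 - \dfrac{(\sum Y)^2}{N}}}$$
Here, ΣXY = 328, ΣX = 28, ΣY = 56, ΣX² = 164, ΣY² = 656 and N = 6
$$r = \frac{328 - \dfrac{28 \times 56}{6}}{\sqrt{164 - \dfrac{(28)^2}{6}} \times \sqrt{656 - \dfrac{(56)^2}{6}}} = \frac{328 - 261.33}{\sqrt{33.33} \times \sqrt{133.33}} = \frac{66.67}{5.774 \times 11.547} = \frac{66.67}{66.67}$$
r = +1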
There is perfect positive correlation both by the scatter diagram and by Karl Pearson's formula, which gives r = +1.
We observe from the illustration that the changes in the two values X and Y are exactly in equal proportion. Y values are exactly double the corresponding values of X, moving in the same direction (upward). In such a situation, the result is perfect positive correlation. If equal proportional changes are in the reverse direction, there is perfect negative correlation (r = -1).
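Illustration 11. Calculate the coefficient of correlation between X and Y and comment on their relationship.
X : -3 -2 -1 1 2 3
Y : 9 4 1 1 4 9
Solution.
  X     Y     X²     Y²     XY
 -3     9      9     81    -27
 -2     4      4     16     -8
 -1     1      1      1     -1
  1     1      1      1      1
  2     4      4     16      8
  3     9      9     81     27
ΣX = 0, ΣY = 28, ΣX² = 28, ΣY² = 196, ΣXY = 0
$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\sum X^2 - \dfrac{(\sum X)^2}{N}} \times \sqrt{\sum Y^2 - \dfrac{(\sum Y)^2}{N}}} = \frac{0 - \dfrac{0 \times 28}{6}}{\sqrt{28 - 0} \times \sqrt{196 - \dfrac{(28)^2}{6}}} = \frac{0}{\sqrt{28} \times \sqrt{65.334}} = \frac{0}{5.291 \times 8.083} = 0$$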
When the actual mean is not a whole number but a fraction, or the series is large, the calculation by the actual mean method and the direct method will involve a lot of calculations and time. To avoid such tedious calculations, we can use the assumed mean method. The correlation coefficient can be obtained by the following formula.
$$r = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{N}} \times \sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}}$$
Illustration 12. Calculate Karl Pearson's coefficient of correlation of the following data of height of
fathers in inches (X) and their sons (Y). Interpret the result.
  X     dx = X - 68    dx²     Y     dy = Y - 65    dy²    dxdy
 65        -3            9     67        +2           4      -6
 66        -2            4     56        -9          81     +18
 57       -11          121     65         0           0       0
 67        -1            1     68        +3           9      -3
 68         0            0     72        +7          49       0
 69        +1            1     72        +7          49      +7
 70        +2            4     69        +4          16      +8
 72        +4           16     71        +6          36     +24
Σdx = -10, Σdx² = 156, Σdy = 20, Σdy² = 244, Σdxdy = 48
Steps :
1. Calculate the deviations of X series from an assumed mean (68), denote them by dx and find out the total, i.e., Σdx.
2. Calculate the deviations of Y series from an assumed mean (65), denote them by dy and find out the total, i.e., Σdy.
3. Square the deviations of X series and obtain the total, i.e., Σdx².
4. Square the deviations of Y series and obtain the total, i.e., Σdy².
5. Multiply dx and dy and find out the total, i.e., Σdxdy.
Here, Σdxdy = 48, Σdx = -10, Σdy = 20, N = 8, Σdx² = 156, Σdy² = 244. Now, we get
$$r = \frac{48 - \dfrac{(-10)(20)}{8}}{\sqrt{156 - \dfrac{(-10)^2}{8}} \times \sqrt{244 - \dfrac{(20)^2}{8}}} = \frac{48 + 25}{\sqrt{156 - 12.5} \times \sqrt{244 - 50}} = \frac{73}{\sqrt{143.5} \times \sqrt{194}} = \frac{73}{11.97 \times 13.92}$$
We can simplify the above calculations by using log tables. Taking logarithms,
$$\log r = \log 73 - \tfrac{1}{2}\,(\log 143.5 + \log 194) = 1.8633 - \tfrac{1}{2}\,[4.4441]$$
= 1.8633 - 2.2220 = -0.3587 = -1 + (1 - 0.3587) = $\bar{1}.6413$
Hence, r = Antilog $\bar{1}.6413$ = 0.4378 = 0.438
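The assumed mean method of Illustration 12 can be sketched in Python as follows. The function name `pearson_assumed_mean` and its parameter names are assumptions made for this example; the heights and the assumed means (68 and 65) are those of the illustration.

```python
from math import sqrt

def pearson_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Karl Pearson's r using deviations from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
    return numerator / denominator

fathers = [65, 66, 57, 67, 68, 69, 70, 72]
sons = [67, 56, 65, 68, 72, 72, 69, 71]
r_heights = pearson_assumed_mean(fathers, sons, 68, 65)
print(round(r_heights, 3))  # 0.438
```

Any other pair of assumed means gives the same r, because the correction terms in the numerator and denominator remove the effect of the chosen origin.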
The coefficient of correlation is unaffected by the change of origin and change of scale of X and Y. After changing these deviations, we apply the same formula of assumed mean method.
Illustration 13. The data on price and supply relating to a commodity for 7 months are given below :
^ ^
".s) : 40 lo^ .o .o
Supply (in kg) : 400 200 500 1000 400 1100 1200
Calculate product moment of correlation between price and quantity and comment on its sign and
magnitude. Solution.
60 0 0 0 500 -200 -2 4 o;
Steps :
1. Calculate the deviations of X series from an assumed mean and divide them by common factor. Denote them by dx and find out the total, i.e., Σdx.
2. Calculate the deviations of Y series from an assumed mean and divide them by common factor. Denote them by dy and find out the total, i.e., Σdy.
3. Square the step deviations of X series and obtain the total, i.e., Σdx².
4. Square the step deviations of Y series and obtain the total, i.e., Σdy².
$$r = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{N}} \times \sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}}$$
Now, we get
$$r = \frac{41}{\sqrt{28} \times \sqrt{96.857}} = \frac{41}{5.29 \times 9.84} = \frac{41}{52.05} = +0.787$$
valu" of ~ ff ^^ ^^^
SolL^W^rmultSTLes^lOa^r."^^^ ^^^^^
X:1 ■
Applying formula
337
"Ldxdy —
r=
(ldx)(Uy) N
Now, we get
+46-
r=
(OHO) .7
f^xf^
+ 46
= 0.997
V28x47
If the original values of X and Y were used the result would still be the same and r would be +0.997.
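That r does not change when the origin and scale of the series are changed can be checked numerically. The sketch below uses hypothetical data chosen only for illustration (they are not the figures of the illustration above); the function name `pearson_r`, the assumed means (400 and 80) and the common factors (100 and 10) are likewise assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r by the direct method."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / (sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2))

# Hypothetical series, for illustration only.
x = [100, 200, 300, 400, 500, 600, 700]
y = [30, 50, 60, 80, 100, 110, 130]

# Step deviations: subtract an assumed mean, then divide by a common factor.
dx = [(v - 400) / 100 for v in x]
dy = [(v - 80) / 10 for v in y]

r_original = pearson_r(x, y)
r_step = pearson_r(dx, dy)
print(abs(r_original - r_step) < 1e-9)  # True
```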
2. Normality : The correlated variables are affected by a large number of independent causes, which form a normal distribution. Variables like indices of price and supply, ages of husbands and wives, heights of fathers and sons, price and demand are affected by such forces that the normal distribution is formed.
3. Causal relationship : Correlation is only meaningful if there is a cause and effect relationship between the forces affecting the distribution of items in two series. It is meaningless if there is no such relationship. There is no relationship between rice and wheat, because the factors that affect these variables are not common. Similarly, the weight of an individual during the last ten years may show an upward trend and his income during this period may also show a similar tendency, but there cannot be any correlation between the two series because the forces affecting the two series are entirely unconnected with each other. The calculated coefficient of correlation of such series is usually termed "non-sense" or "spurious" correlation.
4. Proper grouping : It will be a better correlation analysis if there is an equal number of pairs.
5. Error of measurement : If the error of measurement is reduced to the minimum, the coefficient of correlation is more reliable.
are muIdpUed or divided by so™ ^rant ^ f'"" * ^-ri' consr^r . subtracted or added frorivl®'f^^'/v""®" ""
'^at a " and 14). of X and Y series. (See Illustration
and need not necessarily be indepe^detit Uncor^L ^ """elated variables and y Stnyly implies the absence
of Itoear rZI^ \ ™™bles X
^Rww^ ____________
under consideration
Charles Edward Spearman, a British psychologist, developed a formula in 1904 which consists in obtaining the correlation coefficient between the ranks of N individuals in the two attributes under study, called the coefficient of correlation by rank differences. It is the Product Moment Correlation between the ranks.
This method is applicable only to individual observations rather than frequency distributions. The result we get from this method is only an approximate one, because under the ranking method original values are not taken into account.
After assigning ranks to the various items, the differences of corresponding rank values are calculated and the following formula is used :
$$r_k = 1 - \frac{6 \sum D^2}{N^3 - N}$$
where,
$r_k$ = Coefficient of rank correlation
$\sum D^2$ = the total of squares of the differences of corresponding ranks
N = the number of pairs of observations
Like Karl Pearson's coefficient, the value of $r_k$ lies between +1 and -1. If $r_k$ = +1, there is complete agreement in the order of ranks and the direction of the ranks is also the same. When $r_k$ = -1, there is complete disagreement in the order of ranks and they are in opposite directions. Let us examine this by the following example :
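 R1    R2     D     D²
  1     1     0     0
  2     2     0     0
  3     3     0     0
ΣD² = 0
$$r_k = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6 \times 0}{3^3 - 3} = 1 - 0 = 1$$
 R1    R2     D     D²
  1     3    -2     4
  2     2     0     0
  3     1     2     4
ΣD² = 8
$$r_k = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6 \times 8}{3^3 - 3} = 1 - 2 = -1$$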
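The two extreme cases above can be reproduced with a few lines of Python. The function name `rank_correlation` is an assumption made for this sketch; it takes two lists of ranks that are already assigned and contain no ties.

```python
def rank_correlation(ranks_1, ranks_2):
    """Spearman's coefficient of rank correlation from two lists of ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))  # total of D squared
    return 1 - 6 * sum_d2 / (n ** 3 - n)

print(rank_correlation([1, 2, 3], [1, 2, 3]))  # 1.0
print(rank_correlation([1, 2, 3], [3, 2, 1]))  # -1.0
```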
(a) When ranks are given, (b) When ranks are not given, (c) When ranks are equal or repeated.
 Baby    R1     R2     D = R1 - R2    D²
  A       1      2        -1           1
  B       2      3        -1           1
  C       3      1        +2           4
  D       4      6        -2           4
  E       5      4        +1           1
  F       6      5        +1           1
  G       7      8        -1           1
  H       8      7        +1           1
  I       9     10        -1           1
  J      10     11        -1           1
  K      11      9        +2           4
N = 11, ΣD² = 20
$$r_k = 1 - \frac{6 \sum D^2}{N^3 - N}$$
Now we get,
$$r_k = 1 - \frac{6 \times 20}{11^3 - 11} = 1 - \frac{120}{1320} = 1 - 0.091 = 0.909$$
Hence, there is high degree positive correlation, i.e., two judges are agreeing to the degree of 0.909. It
indicates that judges have fairly strong likes and dislikes so far as ranking of the babies are concerned.
Illustration 16. From the following marks obtained by 10 students in Statistics and Economics, calculate
Spearman's coefficient of rank correlation.
 Marks in      Rank    Marks in      Rank    D = R1 - R2    D²
 Statistics    (R1)    Economics     (R2)
    36           4        50           5         -1           1
    56           8        35           2         +6          36
    20           2        70           8         -6          36
    65          10        25           1         +9          81
    42           5        58           6         -1           1
    33           3        75           9         -6          36
    44           6        60           7         -1           1
    53           7        45           4         +3           9
    15           1        89          10         -9          81
    60           9        38           3         +6          36
N = 10, ΣD² = 318
Steps :
1. Assign ranks to the given data. Ranks can be given by allotting the first rank to the biggest item, the second rank to the next biggest and so on, or by allotting the first rank to the smallest item, the second rank to the next smallest and so on. The same method of ranking must be followed for both the variables.
2. Find the difference of the two ranks (i.e., R1 - R2) and denote these differences by D.
$$r_k = 1 - \frac{6 \sum D^2}{N^3 - N}$$
Here,
$$r_k = 1 - \frac{6 \times 318}{10^3 - 10} = 1 - \frac{1908}{990} = 1 - 1.927 = -0.927$$
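The same result is obtained by applying the product moment formula to the ranks :
$$r = \frac{N \sum R_1 R_2 - \sum R_1 \cdot \sum R_2}{\sqrt{N \sum R_1^2 - (\sum R_1)^2} \times \sqrt{N \sum R_2^2 - (\sum R_2)^2}} = \frac{10 \times 226 - (55)(55)}{10 \times 385 - (55)^2} = \frac{2260 - 3025}{3850 - 3025} = \frac{-765}{825} = -0.927$$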
Illustration 17. Ten entries are submitted for a competition. Three judges study each entry and rank them as given below. Calculate the appropriate rank correlations to help you to answer the following questions :
Ranks given by :
Entry No. : 1 2 3 4 5 6 7 8 9 10
Judge A : 9 3 7 5 1 6 2 4 10 8
Judge B : 9 1 10 4 3 8 5 2 7 6
Judge C : 6 3 8 7 2 4 1 5 9 10
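 Entry   RA    RB    RC    D = RA - RB   D²    D = RA - RC   D²    D = RB - RC   D²
   1      9     9     6        0          0       +3          9       +3          9
   2      3     1     3       +2          4        0          0       -2          4
   3      7    10     8       -3          9       -1          1       +2          4
   4      5     4     7       +1          1       -2          4       -3          9
   5      1     3     2       -2          4       -1          1       +1          1
   6      6     8     4       -2          4       +2          4       +4         16
   7      2     5     1       -3          9       +1          1       +4         16
   8      4     2     5       +2          4       -1          1       -3          9
   9     10     7     9       +3          9       +1          1       -2          4
  10      8     6    10       +2          4       -2          4       -4         16
N = 10; ΣD² = 48 (A and B), 26 (A and C), 88 (B and C)
$r_k$ (between Judges A and B)
$$r_k = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6 \times 48}{10^3 - 10} = 1 - \frac{288}{990} = 1 - 0.29 = +0.71$$
$r_k$ (between Judges A and C)
$$r_k = 1 - \frac{6 \times 26}{10^3 - 10} = 1 - \frac{156}{990} = 1 - 0.1576 = +0.8424$$
$r_k$ (between Judges B and C)
$$r_k = 1 - \frac{6 \times 88}{10^3 - 10} = 1 - \frac{528}{990} = 1 - 0.5333 = +0.4667$$
Since $r_k$ is minimum for the pair of judges B and C, they disagree the most.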
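The three pairwise coefficients of Illustration 17 can be computed in one pass. This is a minimal sketch; the names `rank_correlation` and `coefficients` are assumptions made for the example, and the ranks are those given by Judges A, B and C.

```python
from itertools import combinations

def rank_correlation(ranks_1, ranks_2):
    """Spearman's coefficient of rank correlation from two lists of ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

judges = {
    "A": [9, 3, 7, 5, 1, 6, 2, 4, 10, 8],
    "B": [9, 1, 10, 4, 3, 8, 5, 2, 7, 6],
    "C": [6, 3, 8, 7, 2, 4, 1, 5, 9, 10],
}

# One coefficient for every pair of judges.
coefficients = {
    (first, second): rank_correlation(judges[first], judges[second])
    for first, second in combinations(judges, 2)
}
for pair, rk in coefficients.items():
    print(pair, round(rk, 4))

closest = max(coefficients, key=coefficients.get)    # pair with the nearest approach
farthest = min(coefficients, key=coefficients.get)   # pair that disagrees the most
print(closest, farthest)
```

The pair with the largest coefficient has the nearest approach to common tastes, and the pair with the smallest coefficient disagrees the most.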
in and" ^^rr/t^S-"^nts
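$$r_k = 1 - \frac{6 \sum D^2}{N^3 - N}$$
$$0.5 = 1 - \frac{6 \sum D^2}{990}$$
$$\sum D^2 = \frac{0.5 \times 990}{6} = 82.5$$
$$\text{Corrected } r_k = 1 - \frac{735}{(10)^3 - 10} = 1 - \frac{735}{990}$$
$r_k$ = +0.2576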
13
13
24
15
20
19
345
  X     Y     R1     R2      D      D²
 40    24      7     10    -3.0    9.00
  9     6      1     2.5   -1.5    2.25
 65    20     10      9    +1.0    1.00
 25     9      5      4    +1.0    1.00
 57    19      9      8    +1.0    1.00
N = 10, ΣD² = 39.5
Steps :
1. Assign the ranks to the given data. When two or more items are of equal value, they are assigned average ranks. For example, in X series the value 16 is repeated twice and each is ranked (3 + 4)/2 = 3.5, and in Y series the two values of 13 are given the rank (5 + 6)/2 = 5.5, and so on.
2. After obtaining ΣD², apply the formula. When equal ranks are assigned to some of the entries, an adjustment is made in the formula of rank correlation, i.e., adding $\frac{1}{12}(m^3 - m)$ to the value of ΣD². Here, m stands for the number of items whose ranks are repeated. In case there is more than one such group of values with a common rank, $\frac{1}{12}(m^3 - m)$ is added as many times as the number of such groups. The adjusted formula is as under :
$$r_k = 1 - \frac{6 \left[ \sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots \right]}{N^3 - N}$$
Now we get,
$$r_k = 1 - \frac{6\,(39.5 + 1.5)}{10^3 - 10} = 1 - \frac{6 \times 41.0}{990} = 1 - \frac{246}{990} = 1 - 0.2485 = 0.7515$$
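The adjustment for repeated ranks can be sketched in Python as below. The function names `average_ranks` and `rank_correlation_with_ties` are assumptions made for this example, and the two short series are hypothetical, chosen only so that each contains one pair of tied values.

```python
def average_ranks(values):
    """Rank from the smallest value (rank 1); tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for value in set(values):
        first_position = ordered.index(value) + 1   # lowest rank held by this value
        count = ordered.count(value)                # how many times it is repeated
        rank_of[value] = first_position + (count - 1) / 2
    return [rank_of[value] for value in values]

def rank_correlation_with_ties(xs, ys):
    """Spearman's coefficient with (m^3 - m)/12 added for each tied group."""
    n = len(xs)
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    correction = 0.0
    for series in (xs, ys):
        for value in set(series):
            m = series.count(value)
            if m > 1:
                correction += (m ** 3 - m) / 12
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Hypothetical data: 20 is repeated in the first series, 4 in the second.
series_x = [10, 20, 20, 30, 40]
series_y = [1, 2, 3, 4, 4]
print(average_ranks(series_x))                                   # [1.0, 2.5, 2.5, 4.0, 5.0]
print(round(rank_correlation_with_ties(series_x, series_y), 2))  # 0.9
```

Here ΣD² = 1.0 and each of the two tied groups adds (2³ - 2)/12 = 0.5, so the coefficient is 1 - 6(1.0 + 1.0)/120 = 0.9.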
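LIST OF FORMULAE
Karl Pearson's coefficient of correlation
Actual mean method :
$$r = \frac{\sum xy}{N \cdot \sigma_x \cdot \sigma_y} = \frac{\sum xy}{N} \times \frac{1}{\sigma_x} \times \frac{1}{\sigma_y}$$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
Direct method :
$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\sum X^2 - \dfrac{(\sum X)^2}{N}} \times \sqrt{\sum Y^2 - \dfrac{(\sum Y)^2}{N}}}$$
$$r = \frac{N \sum XY - \sum X \cdot \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2} \times \sqrt{N \sum Y^2 - (\sum Y)^2}}$$
$$r = \frac{\sum XY - N \cdot \bar{X} \cdot \bar{Y}}{\sqrt{\sum X^2 - N (\bar{X})^2} \times \sqrt{\sum Y^2 - N (\bar{Y})^2}}$$
Assumed mean method :
$$r = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{N}} \times \sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}}$$
Explanation of Symbols
r = Karl Pearson's Coefficient of Correlation.
x = (X - X̄), deviations taken from actual mean of X series.
y = (Y - Ȳ), deviations taken from actual mean of Y series.
σx = Standard deviation of X series.
σy = Standard deviation of Y series.
ΣX = Sum of the values of X series
ΣY = Sum of the values of Y series
ΣX² = Sum of square of the values of X series
ΣY² = Sum of square of the values of Y series
ΣXY = Multiplying X and Y values and obtaining the total
N = No. of pairs of observations
dxdy = Multiplying the deviations taken from assumed mean of X series with the deviations taken from assumed mean of Y series.
Spearman's coefficient of rank correlation
$$r_k = 1 - \frac{6 \sum D^2}{N^3 - N}$$
$$r_k = 1 - \frac{6 \left[ \sum D^2 + \frac{1}{12}(m^3 - m) + \dots \right]}{N^3 - N}$$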
348
exercises
4.
5.
6. 7.
10.
11
12.
13.
14.
Questions ; ~ "
ic) Simple, partial and multiple correlation Give three examples of perfect correlation. '
8.
9.
26
35
57
68
8 12
9 11
15. What are the advantages of Spearman's rank correlation over Karl Pearson's correlation coefficient? Explain the method of calculating Spearman's rank correlation coefficient.
Problems :
1. Give the following pairs of value of variables of capital em^ployed and^profit^. Capital employed (in
crores of Rs) (X) Profit (in lacs of Rs) (Y)
ib) Do you think that there is any correlation between profit and capital employed.
Is it positive or negative? Is it high or low? (c) By graphic inspection, draw an estimating hne.
2. Plot the following data as a scatter diagram and comment on the result :
X : 11 10 15 13 10 16 13 8 17 14
Y : 6 7 9 9 7 11 9 6 12 11
3. Following are the heights and weights of 10 students in a class. Draw a scatter diagram and indicate whether the correlation is positive or negative.
Weight (in kg) : 65 54 55 61 60 54 50 63 65 50
4. Construct the scatter diagram of the data given below and interpret it
{expuri/.
Draw a scatter diagram for the data given below and interpret it
X: 10 20 30 40
y. 32 20 " 24 36
50 40
60 28
70 48
80 44
XY
15
18
10
30
17
27 16
25 12
23 13
30 9
f!
350 •
Y Series
■ -------J__
15 25
18 25
39 41
24 27 30 36 27 31 33 35
- ^^ - S 2 2 - -
42 41
48 45
[r = 0.98]
91 95 49 40
;- « .^o 550
16 17 20 19 19 20 25 27
Firms 12 3
■™---JJ
Expenses . n 13 14 ..
(/« Rs '000) ^ 15 14 13 j3
12. Ten students got the following percentage of b ■ c • f'" = Ser,al No. ; 1 ^ 'J' ^^ Statistics and
Mathematics -
Statistics : gO 60 51 76 58 J J ' ^ 10
Mathematics 45 yj ^^ 62 64 72 56 58
13. Calculate correlation coeffident between X the nnn,K . • ^^ = " 3ri Y, the number of rain coats
solrl in
^ =Vl4 8 18 10 22 9 3 ,
^ 11 20 12 15 73'' ^^
^ 3 4 7 10 11 29
[r = - 0.67]
Measures of Correlation
351
^ f The deviations from their means of two senes (X and Y) are given below . Ky . 4 -3 -2 -1
0+1+2+3+4
: j -3 -4 0+4+1 .2 -2 -1
< Calculate Karl Pearson's coefficient of correlation and interpret the resu.t. ^^ ^ ^^
Y ; 9 8 10 12 11 13 14
8 16
16.
9 15
[r = +0.931
Calculate the correlation coefficient of the marks obtained by 12 students m Mathematics and Statistics
and interpret it.
Students
A 50 22
B 54 25
C 56 34
D 59 28
£ 60 26
F 62 30
G 61 32
H 65 30
7 7 K ^ 67 71 71 74 28 34 36 40
[r = + 0.783]
67
68
68 72
69 VO
71 73 69 70 [r = 0.47]
17. The height of fathers and sons are given below : Height of fathers (in inches) : 65 66 67 Height of
sons (in inches) : 67 68 64
18 Find Karl Pearson's coefficient of correlation from the following index numbers and
Costofltvtng : 98 99 [r = +0.85]
19. Find the product moment correlation between sales, and expenses of the following 10 firms.
Firms Sales Expenses
1 50 11
2. 50 13
55 14
60 16
65 16
65 15
7 65 15
8 60 14
9 60 13
20.
10 50 13
[r = +0.797]
Calculate the coefficient of correlation for the following ages of husbands and wives in years at the time
of their marriage.
33 35 36 29 28 29 [r = +0.82]
21 Find suitable coefficient of correlation for the following data : .^iFertUizers used ^ tons) : 15 18
20 24 30^ 35^
23.
Annual maintenance ■. I6OO 1 Ton lonn ^ ^ 10 12 Co^MmRs) 1800 1900 1700 2100 2000
C 400 14
A 200 10
B 500 16
700 20
E 600 17
24.
F 300 13
[r = +0.988]
Total of the deviation of X = -170 Total of the deviation of Y = -20 Total of squares of deviation of X =
2264 lotal of the squares of deviation of Y - 8288
^ 25 16 12 8
27. The following are the marks obtained fout nf mm u t'" = +0-143] emp oyment interview
held by two Ldependem - ^ '
Candidates ■■ A n n r^
'' - " n
Judge X Judge Y
20 22
20 15
14
10
11
12
13 9 [r = 0.721]
353
Measures of Correlation
X: 1 2 3 4 5 6 7 8 9 10
y. 12 96 10 354782
Y : 12 13 14 14 14 16 15 ^^ _ ^^^^
30. Twelve entries were submitted in a flower show competition. They were ranked by
11 12 11 1 [r = -0.454]
59 17
12 5 8
[r = +0.86]
57 19
[r = +0.73]
32. Calculate rank coefficient of correlation between years of service and efficiency rating.
Persons
A 24 66
B 30 51
C 12 84
25 66
E 29 45
F 19 81
G 16 72
H 10
97
IJ
11 7 92 70
[r = -0.78]
33. From the following data calculate coefficient of correlation by the method of rank
95 70 60 80 81 150 115 110 140 142
XY
75 120
68 134
50 100
[r = +0.93]
1. 2.
3.
4.
5.
6.
7.
8. 9.
10. 11.
Chapter 12
Introduction Definition
Types of Index Numbers Problems in Construction of Index Numbere Methods of Constructing Index
Numbers Consumer Price Index (CPI) Index of industrial Production (IIP) General Uses of Index Numbere
Inflation and Index Numbers Limitations of Index Numbers List of Formulae
INTRODUCTION
Commodity                     Price in 2000 (Rs)    Price in 2005 (Rs)
Vegetable oil (per litre)            40                    80
Tea (per kg)                        100                   150
We can measure the change in the prices of vegetable oil and tea in two ways :
(a) Actual difference. The actual difference in price is the difference between the current year price and the base year price.
For vegetable oil : 80 - 40 = Rs 40
For tea : 150 - 100 = Rs 50
We find that the price of vegetable oil increased by Rs 40 and that of tea by Rs 50 from the year 2000 to 2005. From this, it appears that the increase in the price of tea is more than the increase in the price of vegetable oil.
(b) Relative change (price relative). The relative change in prices is the actual difference in prices relative to the original price. From the above example :
$$\text{Relative change} = \frac{\text{Actual difference}}{\text{Original price}}$$
$$\text{For vegetable oil : } \frac{80 - 40}{40} = 1 \quad \text{or} \quad \frac{80}{40} - 1 = 1$$
$$\text{For tea : } \frac{150 - 100}{100} = 0.5 \quad \text{or} \quad \frac{150}{100} - 1 = 0.5$$
This change can also be expressed in percentage :
For vegetable oil : 1 x 100 = 100%
And for tea : 0.5 x 100 = 50%
The ratio of prices in two years is called a price relative. It is a pure number, and the price relative for a single commodity may itself be called an index number of that commodity.
Thus, taking 2000 as the base year, the rise is 100% in the case of vegetable oil and 50% in the case of tea.
Symbolically,
$$P_{01} = \frac{p_1}{p_0} \times 100$$
$p_1$ = price of the current year (2005)
$p_0$ = price of the base year (2000)
$$\text{Vegetable oil : } \frac{80}{40} \times 100 = 200; \qquad \text{Tea : } \frac{150}{100} \times 100 = 150$$
Thus, the change in the price of vegetable oil is more than that of tea. The relative change is more important than just the actual difference in prices.
As vegetable oil and tea are measured in different units, their absolute differences are not directly comparable, whereas the price relatives of Rs 80 and Rs 150 over their base year prices can be averaged, i.e.,
$$\frac{\dfrac{80}{40} + \dfrac{150}{100}}{2} = \frac{2 + 1.5}{2} = 1.75$$
".dex goes up. Similarly, we come acroTs rdex ^b '"T"''-production, sales, export, prices, wages «c Thev
ar7 f , "Sncultural and industrial economy. Index numbers are the bafo^rlX^^VrrnX:'™''''''"-
DEFINITION
According to Spiegel, "An index number is a statistical measure designed to show changes in variables or a group of related variables with respect to time, geographic location or other characteristics."
According to Croxton and Cowden, "Index numbers are devices for measuring differences in the magnitude of a group of related variables".
1. Index numbers are expressed in terms of percentages so as to show the extent of relative change. However, the percentage sign (%) is never used.
2. Index numbers are relative or comparative measurements of a group of items. They compare changes taking place over time or between places or like categories such as schools, persons, hospitals etc.
3. Index numbers are called a specialised type of average in the sense that they help us in comparing changes in series which are in different units. Averages like mean, median and mode can be used to compare only those series which are expressed in the same unit.
4. The technique of index numbers is utilised in measuring changes in magnitudes which are not capable of direct measurement due to the composite and complex character of the phenomenon. Examples of such phenomena or magnitudes are price level, 'cost of living', prices of a specified list of commodities, volume of production in different sectors of an industry, production of various agricultural crops, 'business or economic activity' etc. Changes in business activity in a country are not capable of direct measurement, but it is possible to study relative changes in business activity by studying the variation in the values of some such factors which affect business activity and which are capable of direct measurement.
There are various kinds of index numbers. In economics and business, they can be broadly classified as under :
5. Sensex
1. Wholesale price index (WPI) is used to measure the general price level where we are required to
obtain the wholesale prices of industrial, agricultural and other products from wholesale market. It does
not include the items pertaining to services like repairing charges, barber charges etc. WPI is used to
eliminate the effect of
358
4. Index number of Agricultural Production (IAP) is used to study the rise and fall of the yield of principal crops from one period to another period.
5. Sensex is a useful guide for the investors in the stock marter If rh. •■
appropnate time for mvestment. The rise in sensex at the highest level reflects the base''??
S'valut'oft°hT'°™ Bombay Stock Exchange Sensitive todex with 1978-79 as
iiuinoer will replace wholesale price index. Producers Price Index /PPT^
359
Following are the important problems which must be well defined for the construction of index numbers :
1. Purpose : Every index number has its own particular uses and limitations. The first and foremost problem in the construction of index numbers is in regard to the objective or the purpose for which they are required. It is important to know what is to be measured and how these measures are used. If the purpose is to measure the general price level, then the wholesale price index number is used. If the purpose is to measure the cost of living of middle class families, working class (labour) or agricultural workers in a particular region or city, then the consumer price index number is used. If the object is to measure relative change in industrial production, then the index number of industrial production is to be used.
2. Selection of base period : When comparison is to be made between different time periods or different places, some point of reference is to be decided. This is called the base. In the above illustration about prices of vegetable oil and tea, we have taken the year 2000 as the base year and 2005 as the current year for our calculations of index numbers. The base is assigned the value of 100.
(a) The base period should not be either too short or too long : It should be neither less than a month nor more than a year from the point of view of calculations.
(b) The base period should not be too near or too far : This is because people usually prefer to compare present conditions with conditions in a base or reference period that is not too far back in time. If the base period is too far, the comparison becomes meaningless. Due to the introduction of new commodities and changes in habits, tastes and fashions in the economy, many commodities may go out of use. In such a situation it becomes necessary to shift the base period.
(c) The base period should be a normal and representative period : The base period should be free from all sorts of abnormalities and random or irregular fluctuations like earthquakes, wars, floods, famines, labour strikes, lockouts, economic boom and depression.
Fixed base and chain base : If the period of comparison is kept fixed for all current years, it is called the fixed base period. However, sometimes the chain base method is used, in which the changes in the prices for any given year are compared with prices in the preceding year and not with the fixed year. Naturally, the chain base method gives a better picture than what is obtained by the fixed base method. However, much would depend upon the purpose of constructing the index.
3. Selection of items : Collection of data is a special problem in constructing index numbers, since there
is a large variety of goods and prices. Care also must be taken that data from unrelated commodities or
periods are not grouped together for the calculation of price index. If the number of the commodities is
too large, a choice
360
of some representative items has to be made. On the other hand, inclusion of too few Items would
make the index number unrepresentative of he ™
mcrdfal, '' ^^^^ calculations, it is nof^"s bk «
be considered :
(«) Commodities selected should be relevant and representative of the group according to the purpose
of mdex number. For example, in const uction of wholesale price mdex number to know the general
price level, we sZw nclude wholesale prices of some major mdustrial and agricultural ^olmoS and Other
goods and services. In the same way for coLtruction ofZsle
we must collect the prices relating to production of various goods of factories For
361
6. Choice of an average : For constructing an index number any average such as mean, median, mode, geometric mean and harmonic mean can be used. From the practical point of view, median and mode are unsuitable because of their being erratic. The geometric mean and harmonic mean are difficult to calculate; hence, the arithmetic mean is used. With the development of the use of electronic computers, however, the use of the geometric mean is also becoming popular.
7. System of weighting : In order to allow each commodity to have a reasonable influence on the index, it is advisable to use a suitable weighting system. Unweighted index numbers are those where all commodities are given equal importance. But in most cases different commodities are given different degrees of importance; therefore, weights are assigned to the various items.
The method of weighting used would depend on the purpose of the index. Weighting may be done according to : (a) Value or quantity produced, (b) Value or quantity consumed, and (c) Value or quantity sold. When the quantity is the basis of weight it is called quantity weighting, and when the value is the basis it is called value weighting. Weights may be either implicit (arbitrary) or explicit (actual).
8. Choice of method : There are various methods of calculating index numbers such as the aggregative method or the price relative method. Various methods have been proposed for calculation of weighted index numbers, such as Laspeyre's method, Paasche's method, Dorbish and Bowley's method, Marshall Edgeworth's method, Kelley's method and Fisher's method. Fisher's method is considered ideal for constructing index numbers. No single formula can be said to be appropriate for all types of index numbers, and as such the choice of a formula will have to be made taking into account the object of the index number, the data available and the resources at the disposal of the person or organisation constructing the index number.
This is the simplest method of calculating index numbers. In this method, the total of the current year prices for the various commodities is divided by the total of base year prices and the quotient is multiplied by 100.
$$\text{Price Index : } P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$
$$\text{Quantity Index : } Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100$$
where,
$P_{01}$ = Current year price index number
$Q_{01}$ = Current year quantity index number
$\sum p_1$ = Total of current year prices for various commodities
$\sum p_0$ = Total of base year prices for various commodities
$\sum q_1$ = Total of current year quantities for various commodities
$\sum q_0$ = Total of base year quantities for various commodities
Illustration 1. Calculate price index number for 2005 taking 1995 as the base year from the following data by simple aggregative method.
Solution.
 Commodities    Price in 1995 (p0)    Price in 2005 (p1)
     A               100                   140
     B                80                   120
     C               160                   180
     D               220                   240
     E                40                    40
   Total          Σp0 = 600             Σp1 = 720
Steps :
$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$
Here,
$P_{01}$ = Price index number of the current year (2005)
$\sum p_1$ = Total of current year prices for various commodities
$\sum p_0$ = Total of base year prices for various commodities
Now, we get
$$P_{01} = \frac{720}{600} \times 100 = 120$$
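The simple aggregative method amounts to one division. The Python sketch below applies it to the prices of Illustration 1; the function name `simple_aggregative_index` is an assumption made for the example, and the first price column is assumed to belong to the base year 1995.

```python
def simple_aggregative_index(base_values, current_values):
    """Total of current year values as a percentage of the total of base year values."""
    return 100 * sum(current_values) / sum(base_values)

prices_1995 = [100, 80, 160, 220, 40]    # base year prices (p0), commodities A to E
prices_2005 = [140, 120, 180, 240, 40]   # current year prices (p1)

p01 = simple_aggregative_index(prices_1995, prices_2005)
print(p01)  # 120.0
```

The same function gives a quantity index when it is passed base year and current year quantities instead of prices.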
Commodities ■■ ^ ^^^ jj 5 2
80 60 20 10 6
Commodities
ABCDE
Total
(qp)
___ _
40 10
52
■Quantity in 2004 (in kg)
Iq^}
Uo = 107
80 60 20 10 6
Σq1 = 176
$$Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100$$
Here,
$$Q_{01} = \frac{176}{107} \times 100 = 164.49$$
Illustration 3. Compute index numbers for the years 1996 to 2000 from the following data (Base Year 1995).
1996 14
1997 16
1998 20
1999 22
2000 24
Limitations
X 100. Index number by this method is the arithmetie mean or median or geometric'
365
Illustration 4. Construct price index number for 2005 taking 2000 as the base year from the following data by simple average of price relative method.
Solution. 'A price relative is the price of the current period expressed as a percentage of the price at the base period'.
C 160 180
Steps :
|LxlOO
>1
Po^ =
Po
xlOO
366
^xlOO {Po
611.6
= 122.32
N~s
m the prices of commodities m the year 2005 to the extent of 22.32% as compared to
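The simple average of price relatives can be sketched in Python as follows. The five price pairs are an assumption: they are the A to E figures printed earlier in the chapter, whose price relatives add up to the 611.6 quoted in this solution. The variable names are illustrative.

```python
# Simple average of price relatives: P01 = sum(p1/p0 * 100) / N

# Assumption: base year and current year prices for commodities A to E.
base_prices = [100, 80, 160, 220, 40]
current_prices = [140, 120, 180, 240, 40]

# One price relative per commodity: current price as a percentage of base price.
relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]

# Arithmetic mean of the relatives gives the index number.
index = sum(relatives) / len(relatives)

print([round(r, 2) for r in relatives], round(index, 2))
```

Each commodity counts equally here, whatever its share in actual spending, which is the limitation the text goes on to note.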
Merits
1. Index number is not influenced by extreme items. Equal importance is given to all
items.
Limitations
1. The price relatives are assumed to have equal importance. This assumption may not
always be correct.
In weighted index numbers, weights are assigned to the various commodities to reflect their relative
importance. There are various methods of assigning weights, such as Laspeyre's method, Paasche's
method, Marshall Edgeworth's method, Kelley's method and Fisher's method. Fisher's method is
considered ideal for constructing index numbers. According to the syllabus of Class XI, we are discussing
here Laspeyre's and Paasche's methods of constructing index numbers.
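Laspeyre's Method (base year quantities as weights) :
Price Index $P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
Quantity Index $Q_{01} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$
Paasche's Method (current year quantities as weights) :
Price Index $P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$
Quantity Index $Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100$
Laspeyre's method is very widely used.
Value Index $V_{01} = \frac{V_1}{V_0} \times 100$
or
$V_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0} \times 100$
where
$P_{01}$ = Price index number
$Q_{01}$ = Quantity index number
$V_{01}$ = Value index number
$p_1$ = Current year price
$p_0$ = Base year price
$q_1$ = Current year quantity
$q_0$ = Base year quantity
$V_1$ = Current year value ($\sum p_1 q_1$)
$V_0$ = Base year value ($\sum p_0 q_0$)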
Commodities   Base year (1996)              Current year (2005)
              Price (p0)   Quantity (q0)    Price (p1)   Quantity (q1)
A             10           30               12           50
B             8            15               10           25
C             6            20               6            30
D             4            10               6            20
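Solution.
Commodities   p0   q0   p1   q1   p0q0   p0q1   p1q0   p1q1
A             10   30   12   50   300    500    360    600
B             8    15   10   25   120    200    150    250
C             6    20   6    30   120    180    120    180
D             4    10   6    20   40     80     60     120
Total                             580    960    690    1150
Laspeyre's Price Index :
$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{690}{580} \times 100 = 118.97$
Thus, the price index number of 2005 is 118.97. In other words, there is a net increase in the prices of
commodities in the year 2005 to the extent of 18.97% as compared to 1996.
Laspeyre's Quantity Index :
$Q_{01} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \frac{960}{580} \times 100 = 165.52$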
(B) Paasche's Method : In this method current year quantities are taken as weights.
Steps (Price Index Number) :
1. Multiply current year prices of various commodities with current year weights and obtain $\sum p_1 q_1$.
2. Multiply the base year prices of various commodities with the current year weights and obtain $\sum p_0 q_1$.
Paasche's Price Index :
$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{1150}{960} \times 100 = 119.79$
Thus, the price index number of 2005 is 119.79. In other words, there is a net increase in prices of
commodities in the year 2005 to the extent of 19.79% as compared to 1996.
Paasche's Quantity Index :
$Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \frac{1150}{690} \times 100 = 166.67$
Thus, the quantity index number of 2005 is 166.67. In other words, there is a net increase in quantity of
commodities in the year 2005 to the extent of 66.67% as compared to 1996.
Value Index :
$V_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0} \times 100 = \frac{1150}{580} \times 100 = 198.28$
Thus, the value index number of 2005 is 198.28. In other words, there is a net increase in value of
commodities in the year 2005 to the extent of 98.28% as compared to 1996.
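The Laspeyre's, Paasche's and value indices of this illustration can be recomputed with a short Python sketch. The prices and quantities are those of commodities A to D above; the helper name `aggregate` and the variable names are illustrative only.

```python
# Weighted aggregative index numbers for commodities A, B, C, D.
p0 = [10, 8, 6, 4]      # base year (1996) prices
q0 = [30, 15, 20, 10]   # base year quantities
p1 = [12, 10, 6, 6]     # current year (2005) prices
q1 = [50, 25, 30, 20]   # current year quantities


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


# Laspeyre's method: base year quantities (or prices) act as weights.
laspeyres_price = aggregate(p1, q0) / aggregate(p0, q0) * 100
laspeyres_quantity = aggregate(p0, q1) / aggregate(p0, q0) * 100

# Paasche's method: current year quantities (or prices) act as weights.
paasche_price = aggregate(p1, q1) / aggregate(p0, q1) * 100
paasche_quantity = aggregate(p1, q1) / aggregate(p1, q0) * 100

# Value index: current year value over base year value.
value_index = aggregate(p1, q1) / aggregate(p0, q0) * 100

for name, value in [
    ("Laspeyre's price", laspeyres_price),
    ("Laspeyre's quantity", laspeyres_quantity),
    ("Paasche's price", paasche_price),
    ("Paasche's quantity", paasche_quantity),
    ("Value", value_index),
]:
    print(f"{name} index: {value:.2f}")
```

The four aggregates it relies on are 580, 960, 690 and 1150, the column totals of the solution table.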
Illustration 6. Calculate weighted average of price relative index number of prices for 2005 on the basis
of 2004 from the following data :
Commodities   Weights   Price 2004   Price 2005
A             20        20           35
B             12        15           18
C             8         10           11
D             4         5            5
E             6         4            5
Solution.
Commodities   Weights (w)   Price 2004 (p0)   Price 2005 (p1)   Value weights V = p0 × w   P = p1/p0 × 100   PV
A             20            20                35                400                        175              70000
B             12            15                18                180                        120              21600
C             8             10                11                80                         110              8800
D             4             5                 5                 20                         100              2000
E             6             4                 5                 24                         125              3000
Total                                                           $\sum V$ = 704                              $\sum PV$ = 105400
Steps :
$P_{01} = \frac{\sum PV}{\sum V}$, where $P = \frac{p_1}{p_0} \times 100$ and $V = p_0 w$
$P_{01} = \frac{105400}{704} = 149.72$
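The weighted average of price relatives in Illustration 6 can be reproduced with the sketch below. It takes the value weights to be base year price times weight, as the column totals of the illustration suggest; the variable names are illustrative.

```python
# Weighted average of price relatives: P01 = sum(P * V) / sum(V)
weights = [20, 12, 8, 4, 6]   # weights of commodities A to E
p0 = [20, 15, 10, 5, 4]       # prices in 2004 (base)
p1 = [35, 18, 11, 5, 5]       # prices in 2005 (current)

# Value weights V = p0 * w (assumed reading of the "Value weights" column).
value_weights = [p * w for p, w in zip(p0, weights)]

# Price relatives P = p1 / p0 * 100.
relatives = [cur / base * 100 for base, cur in zip(p0, p1)]

# Each relative is weighted by its value weight.
weighted_total = sum(r * v for r, v in zip(relatives, value_weights))
index = weighted_total / sum(value_weights)

print(sum(value_weights), round(weighted_total), round(index, 2))
```

Commodity A carries the largest value weight (400 out of 704), so its 75% price rise dominates the result.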
The wholesale price index numbers measure the changes in the general level of prices, and they fail to
reflect the effect of the increase or decrease of prices on the cost of living of different classes or groups
of people in a society. Consumer price index numbers are also called (i) Cost of living index numbers, or
(ii) Retail price index numbers, or (iii) Price of living index numbers. Consumer price index numbers are
designed to measure the average change over time in the price paid by the ultimate consumer for a
specified quantity of goods and services. They measure the change in the cost of living of a particular
section of society due to change in retail prices. A change in the price level affects the cost of living of
different classes of people differently. The general index number fails to reveal this. So there is the need
to construct a consumer price index. People consume different types of commodities. People's
consumption habits also differ from person to person, place to place and class to class, i.e., richer
class, middle class and poor class.
(2) Conducting family budget enquiry : A family budget enquiry is held with a view to finding out how much
an average family of this group spends on different items of consumption. The quantity of the
commodities consumed, as also the prices at which they are purchased, are noted down. The enquiry is done
on a random sample basis. Some families are selected from the total number by lottery method, and
their family budgets are
(v) Miscellaneous
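$CPI = \frac{\sum WR}{\sum W}$
Here,
$R = \frac{p_1}{p_0} \times 100$
W = Weights
$CPI = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$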
This is based upon Laspeyre's method. According to this method, the various items are given weights on
the basis of quantity consumed in the base year.
If the calculated cost of living index number is more than 100, it means a higher cost of living,
necessitating an upward adjustment in the wages and salaries of employees. The rise of wages or salaries
is equal to the percentage by which the index exceeds 100. If the calculated index number is less than 100, it
means the cost of living has declined by the percentage difference between 100 and the calculated index
number.
Illustration 7. An enquiry into the budgets of the middle class families in a certain city gave following
information. What is the cost of living index of 2004 as compared with 1995. Calculate by :
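Solution.
Expenses on items   Weights (%) (W)   Price (in Rs) 1995 (p0)   Price (in Rs) 2004 (p1)   Price Relative R = p1/p0 × 100   Weighted Relatives (WR)
Food                35                1400                      1500                      107.14                           3749.9
Fuel                10                200                       250                       125                              1250
Clothing            20                500                       750                       150                              3000
Rent                15                200                       300                       150                              2250
Misc.               20                250                       400                       160                              3200
Total               100                                                                                                    13449.9
$CPI = \frac{\sum WR}{\sum W} = \frac{13449.9}{100} = 134.499$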
Expenses on items   Weights (W)   Price 2004 (p1)   W × p0   W × p1
Food                35            1500              49000    52500
Fuel                10            250               2000     2500
Clothing            20            750               10000    15000
Rent                15            300               3000     4500
Misc.               20            400               5000     8000
Total               100
Illustration 8. Construct the consumer price index number for cost of living for 2005 with 1980 as the base
from the following data.
Items               Price 1980   Price 2005
Food                100          200
Clothing            20           25
Fuel and lighting   15           20
House rent          30           40
Misc.               35           65
Solution.
Items               Weights (W)   Price 1980 (p0)   Price 2005 (p1)   Price Relative R = p1/p0 × 100   Weighted Relatives (WR)
Food                75            100               200               200                              15000
Clothing            10            20                25                125                              1250
Fuel and lighting   5             15                20                133.33                           666.65
House rent          6             30                40                133.33                           799.98
Misc.               4             35                65                185.71                           742.84
Total               100                                                                                18459.47
$CPI = \frac{\sum WR}{\sum W} = \frac{18459.47}{100} = 184.594$
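The family budget method used here, CPI = ΣWR / ΣW, is easy to check in Python. The sketch below uses the weights and the 1980 and 2005 prices of the five items of this illustration; the names are illustrative. Because it does not round each price relative to two decimals first, its total differs from the printed 18459.47 only in the last digits.

```python
# Consumer price index by the family budget method: CPI = sum(W * R) / sum(W)
items = ["Food", "Clothing", "Fuel and lighting", "House rent", "Misc."]
weights = [75, 10, 5, 6, 4]        # W
p0 = [100, 20, 15, 30, 35]         # prices in 1980 (base)
p1 = [200, 25, 20, 40, 65]         # prices in 2005 (current)

# Price relative of each item, R = p1 / p0 * 100.
relatives = [cur / base * 100 for base, cur in zip(p0, p1)]

# Weighted relatives WR and the resulting index.
weighted_relatives = [w * r for w, r in zip(weights, relatives)]
cpi = sum(weighted_relatives) / sum(weights)

for item, r, wr in zip(items, relatives, weighted_relatives):
    print(f"{item:18s} R = {r:7.2f}  WR = {wr:9.2f}")
print(f"CPI = {cpi:.2f}")
```

Food alone has 75 of the 100 weight points, so the doubling of its price pulls the index close to 200.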
Thus, there is an increase of 84.6% in prices in 2005 as compared with 1980.
Illustration 9. The consumer price index for June 2005 was 125. The food index was 120 and that of other
items 135. What is the percentage of the total weight given to food?
Solution.
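Let the total weight = 100, $W_1$ = weight of food and $W_2$ = weight of other items.
Hence, $100 = W_1 + W_2$ ...(1)
$CPI = \frac{\sum IW}{\sum W}$
$125 = \frac{120 W_1 + 135 W_2}{W_1 + W_2}$
or $12500 = 120 W_1 + 135 W_2$ ...(2)
Multiplying equation (1) by 135, we get
$13500 = 135 W_1 + 135 W_2$ ...(3)
Subtracting equation (2) from equation (3),
$1000 = 15 W_1$, so $W_1 = 66.67$
$W_2 = 100 - 66.67 = 33.33$
Hence, percentage of total weight given to food = 66.67% and for other items = 33.33%.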
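The same two-equation problem can be solved directly in Python. This sketch rearranges CPI × total = food index × W1 + other index × (total − W1) for W1; the function name `food_weight` and its parameters are illustrative.

```python
# Solve for the weight of food when only the group indices and the overall CPI are known.


def food_weight(cpi, food_index, other_index, total_weight=100.0):
    """Weight of food such that the weighted average of the two indices equals the CPI."""
    # cpi * total = food_index * w + other_index * (total - w), solved for w
    return (other_index - cpi) * total_weight / (other_index - food_index)


w_food = food_weight(cpi=125, food_index=120, other_index=135)
w_other = 100.0 - w_food

# Verification: recombine the two weights into the overall index.
check_cpi = (120 * w_food + 135 * w_other) / 100.0

print(round(w_food, 2), round(w_other, 2), round(check_cpi, 2))
```

The closer the overall index lies to one group's index, the larger that group's weight must be; 125 is nearer to 120 than to 135, so food takes two thirds of the weight.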
Verification
Items         Index (I)   Weights (W)       IW
Food          120         66.67             8000
Other items   135         33.33             4500
Total                     $\sum W$ = 100    $\sum IW$ = 12500
$CPI = \frac{\sum IW}{\sum W} = \frac{12500}{100} = 125$
Consumer Price Index No. is 125 as given in question.
Uses of Consumer Price Index
Suppose the consumer price index was 400 in 2004-05 and 100 in 2000-01. Then a rupee in 2004-05 would
be equal to
$\frac{100}{400} = 0.25$
Suppose the consumer price index for 2000-01 was 100 and for 2004-05 was 400.
Although the monthly money wage was raised from Rs 3250 to Rs 5000 in 2004-05, the worker has not
gained. In fact his real wage has gone down. The real wage of the worker is Rs 1250 in 2004-05
(5000 × 100/400) as compared to Rs 3250 in 2000-01.
Example 12. If the salary of a person in the base year is Rs 4000 per annum and the current year salary is
Rs 6000, by how much should his salary rise to maintain the same standard of living if the CPI is 400?
When the CPI of the base year is 100, his salary is Rs 4000. When the CPI of the current year is 400, his
salary should be
$\frac{400 \times 4000}{100}$ = Rs 16,000
Hence, his salary should rise by Rs 10,000 (16,000 - 6000 = Rs 10,000) in the current year to maintain the
same standard of living.
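Both calculations above, the real wage and the salary needed to keep the same standard of living, are simple applications of the CPI ratio. A minimal Python sketch follows, using the figures from the text; the function names `real_value` and `required_income` are illustrative.

```python
# Deflating and inflating money incomes with the consumer price index.


def real_value(money_value, cpi, base_cpi=100):
    """Money income expressed at base year prices."""
    return money_value * base_cpi / cpi


def required_income(base_income, cpi, base_cpi=100):
    """Income needed in the current year to match the base year standard of living."""
    return base_income * cpi / base_cpi


# Real wage: money wage of Rs 5000 when the CPI has risen from 100 to 400.
real_wage = real_value(5000, cpi=400)

# Example 12: base year salary Rs 4000, current salary Rs 6000, CPI = 400.
needed_salary = required_income(4000, cpi=400)
needed_rise = needed_salary - 6000

print(real_wage, needed_salary, needed_rise)
```

Dividing by the index moves a current rupee amount back to base year prices; multiplying by it moves a base year amount forward.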
2. The government (central or state) and many big industrial and business units use consumer
price index numbers to regulate the Dearness Allowance (D.A.) or grant of bonus to employees. This
compensates them for increased cost of living due to price rise. They are used by the government for
the formulation of price policy, wage policy and general economic policies.
3. If the prices of some important essential commodities (like wheat, rice, sugar, cloth, etc.)
increase, due to shortages, the government may decide to provide them through fair price shops or
rationing.
4. Costs of living index numbers are used for deflating value series in national accounts.
5. Consumer price index numbers are used widely in wage negotiations and wage contracts. They
are used for automatic adjustment (increase) of wages corresponding to a unit increase in the consumer
price index.
Index numbers of industrial production are fairly common these days. They tell about the relative
increase or decrease in the level of industrial production in a country in relation to the level of
production in the base year. They are the best measures of economic progress in any country. These
indices can be constructed by studying variations in the level of industrial output. As such the first step
in the construction of such index numbers is to find the level of output of various industries of the
country. It should be remembered that these index numbers throw light on changes in the quantum of
production, not in
A number of index numbers of industrial production are compiled in India by official and non-official
agencies. The general index of industrial production is the most popular among these. In India, Index of
Industrial Production is published by Central Statistical Organisation (CSO), Industrial Statistics Wing. The
old series of the Index of Industrial Production ... but now a new series is published with the base year ...
Usually important data about production are collected under following major heads:
I. Mining Industries : Coal (inc. lignite), petroleum crude (off-shore and on-shore), iron ore.
II. Metallurgical Industries : Hot metal (inc. pig iron), crude steel, semi-finished steel, steel castings,
aluminium, blister copper.
III. Mechanical Engineering Industries : Machine tools, cotton textile machinery, cement machinery,
railway wagons, automobiles (commercial vehicles, cars, jeeps, land rovers), power driven pumps,
diesel engines, earth moving equipment, bicycles, sewing machines, agricultural tractors.
IV. Electrical Engineering Industries : Power transformers, electric motors, electric fans, electrical
lamps, radio receivers, aluminium conductors.
V. Chemical and Allied Industries : Nitrogenous fertilizer (N), phosphatic fertilizer (P2O5), soda ash,
caustic soda, paper and paper board, automobile tyres, bicycle tyres, cement, petroleum refinery
products, penicillin, streptomycin, chloramphenicol powder, vitamin A.
VI. Textile Industries : Jute textiles, cloth (cotton cloth), mixed/blended cloth, spun and filament yarn,
staple fibre etc.
The data relating to the production of the above mentioned industries are collected either monthly,
quarterly or yearly. The production of the base year is taken as 100 and the current year's production is
expressed as a percentage of the base year's production. These percentages are multiplied by the
relative weights assigned to various industries. Weights are usually assigned on the basis of the relative
importance of different industries. The relative importance of industries is usually decided on the basis
of capital invested, the gross value of productions, turnover, net output etc. Many other criteria of
relative importance can also be laid down. Usually weights in an index number of industrial production
are based on the values of net output of different industries. The weighted arithmetic average or
geometric mean of the relatives give the index number of industrial production. Such index numbers can
be constructed both for gross output as well as net output.
The following table shows broad industrial grouping and their weights.
Electricity 10.17
From the above table, we find that the growth performances of broad Industrial categories differ.
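$\text{Index of Industrial Production} = \frac{\sum \left( \frac{q_1}{q_0} \times 100 \right) W}{\sum W}$
Here, $q_1$ = Current year output, $q_0$ = Base year output,
W = Relative importance of different outputs.
Illustration 13. Construct Index of Industrial Production for
2004 from the following information.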
Industry                     Output (Units)          Weights
                             Base year    2004
1. Mining                    ...          ...        ...
2. Textile                   80           110        25
3. Mechanical Engineering    70           90         15
4. Chemical                  80           70         25
5. Electrical                90           120        15
Solution.
$\text{Index of Industrial Production} = \frac{\sum \left( \frac{q_1}{q_0} \times 100 \right) W}{\sum W} = \frac{12220.20}{100} = 122.20$
The main purpose of index numbers is to measure relative temporal or cross-sectional changes in a variable
or a set of variables as compared with some base figure. Index numbers thus enable comparison of
changes from time to time and among different places.
3. They help in framing suitable policies : Index numbers are indispensable tools for the
management of any government organisation or an individual business concern for efficient planning
and formulation of business policies. For example, index numbers of wholesale and retail prices and of
output (volume of trade, industrial and agricultural production etc.) help in economic and business
policy making.
It is not only in the field of business and economics that index numbers are used as a basis for policy framing;
even in disciplines like Sociology and Psychology their utility is immense. For example, sociologists
may speak of population indices, and psychologists measure intelligence quotients, which are essentially index
numbers comparing a person's intelligence score with that of an average for his or her age. Health
authorities prepare indices to display changes in the adequacy of hospital facilities and educational
research organisations have devised formulae to measure changes in effectiveness of school systems.
4. To measure the purchasing power of money : Index numbers are helpful in finding out the
intrinsic worth of money as contrasted with its nominal worth. Very often statements are made that
purchasing power of the Indian rupee in 2000 is only 20 paise as compared to its purchasing power in
1990. It means that a person who was having an income of Rs 1000 per month in 1990 should have an
income of Rs 5000 to maintain the same standard which he was maintaining in 1990. This helps in
determining the wage policy of a country.
5. To help in study of trend : Index numbers are very useful in studying the trend or tendency of a
series over a period of time. It is easy to find out the trend of exports, imports, balance of payments,
industrial production, prices, national income and variety of other phenomena. It is also useful in
forecasting future trends. With the help of index numbers of prices, demand, wages, income etc., a
business executive is in a better position to take decisions about whether a new product should be
launched or whether there is scope for exploring new markets or whether the existing pricing and
production policies need a change.
6. For adjusting National Income : Index numbers are very helpful in deflating (adjusting) national
income on the basis of constant prices to enable us to find out whether there is any change in the real
income of the people. They are used to adjust the original data for price changes, or to adjust wages for
cost of living changes and thus be transformed into real income and nominal sales into real sales
through appropriate index numbers.
According to Samuelson, "By inflation we mean a time of generally rising prices for goods and factors of
production: rising prices of bread, cakes, haircuts, rising wages, rents, etc."
According to Ackley, "Inflation can be defined as a persistent and appreciable rise in the general level of
average prices".
... that keeps the price level rising over time. Inflation is measured with the help of the Wholesale Price
Index (WPI).
Category                            Weights %   No. of items
Primary articles                    22.0        98
Fuel, power, light and lubricants   14.2        19
Manufactured products               63.8        318
Source : Economic Survey 2005-06
Inflation and Wholesale Price Index Number
WPI is the only price index in India which is available on a weekly basis, with a time
lag of two weeks.
Table 1
Year (last week of)    Primary Articles   Fuel, Power, Light and Lubricants   Manufactured Goods   All Commodities
(Base 1981-82 = 100)
1989-90                ..                 ..                                  175                  171.1
1990-91                ..                 ..                                  190                  191.8
1991-92                ..                 ..                                  214                  217.8
1992-93                ..                 ..                                  231                  233.1
1993-94                ..                 ..                                  254                  258.3
(Base 1993-94 = 100)
1994-95                121                109                                 117                  116.9
1995-96                125                115                                 123                  122.2
1996-97                136                130                                 126                  128.8
1997-98                142                148                                 129                  134.6
1998-99                153                153                                 135                  141.7
1999-00                159                193                                 139                  150.9
2000-01                162                223                                 144                  159.2
2001-02                168                231                                 144                  161.8
2002-03                178                256                                 152                  172.3
2003-04                181                263                                 162                  180.3
2004-05                183                290                                 169                  189.5
..                     198                313                                 172                  197.8
..                     199                312                                 173                  198.3
From the above Table 1 of Wholesale Price Index Number obtained from Economic Survey, 2005-2006,
let us understand the uses of WPI under the following heads:
6. Uses in planning
1. Price trends in India : Ever since independence the price trends in India have varied
between sharp to moderate increases. With the exception of some years of the First
Five-Year Plan, viz., 1952-53 and 1954-55 when prices showed a moderate decline, almost
the entire period of over five decades since 1950-51 has shown persistent rise in prices.
Wholesale Prices. The rising trend in wholesale prices, as shown by the Wholesale
Price Index, has continued ever since 1960-61, but it assumed alarming dimensions since
1972-73 after the first oil shock of 1973 when OPEC nations effected a manifold rise
in oil prices. OPEC again increased petroleum prices in 1978 that adversely affected our
economy. The Wholesale Price Index (with 1970-71 as base 100) increased to 175 in 1974-75 and further to 256 in 1980-81 thus
showing a two and a half fold increase in prices in just one decade. The base year for WPI
was changed to 1981-82 = 100 under the new index which rose to 258.3 in 1993-94
showing another two and a half fold rise in prices in a little over one decade. The base
year was again changed to 1993-94 = 100 under the current series of the Wholesale Price
Index, which covers the period between 1993-94 and 2004-05. The Wholesale Price Index stood at 189.5 in 2004-05. Table 1
shows the movement of wholesale prices of various commodity groups since 1986-87.
2. Measuring rate of inflation : WPI is used to measure the rate of inflation. The rate of inflation is useful
to know the real value of income, savings and wealth etc. Using WPI of 2003-04 and 2004-05 for all the
commodities from the table given above, the rate of inflation can be calculated as under :
Rate of inflation = $\frac{189.5}{180.3} \times 100 - 100 = 5.1\%$
Thus, the annual inflation rate during 2004-05 was 5.1% in case of all commodities. One can also
calculate inflation rates for different commodities or commodity groups as required for policy purposes.
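The rate of inflation between two index values is just the percentage change in the index. A minimal Python sketch, using the all-commodities WPI figures quoted in the text for 2003-04 (180.3) and 2004-05 (189.5); the function name is illustrative.

```python
# Annual rate of inflation from two successive values of a price index.


def inflation_rate(previous_index, current_index):
    """Percentage change in the index between two periods."""
    return current_index / previous_index * 100 - 100


# WPI (all commodities): 180.3 in 2003-04 and 189.5 in 2004-05.
rate_2004_05 = inflation_rate(180.3, 189.5)

print(f"{rate_2004_05:.1f}%")
```

The same function applies to any commodity group by passing that group's index values instead.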
Inflation, as measured by the Wholesale Price Index (WPI), increased from 4.6 per cent at end March 2004 to 5.1 per cent at end
March 2005. The year 2005-06 started with an inflation rate of 5.7 per cent on April 2, 2005, which was
followed by a softening trend until August 27, 2005 when it reached a trough of 3.3 per cent. While the
rate rose steadily thereafter, it remained below 5 per cent. At 4.5 per cent on January 21, 2006 it was
significantly lower than 5.4 per cent recorded a year ago. Average WPI inflation decelerated from 10.6
per cent in the first half of 1990s to 4.7 per cent during 2001-02 to 2004-05.
3. Forecasting future prices : From the above time series data of WPI we understand that, compared
with the base year, the wholesale price level has increased in 2004-05 for primary articles by 83%, for fuel,
power, light and lubricants by 191%, for manufactured products by 69% and for all commodities by 89.5%.
Thus, WPI can be used to forecast the increase in future prices.
4. Estimation of demand and supply : One can use an appropriate model to estimate the future
demand and supply as the prices affect both the demand and supply. WPI therefore is useful for
analysing and forecasting trade situations by interpreting the present trend in supply and demand
conditions.
5. Determining real changes in aggregatives : WPI are useful to determine the real changes in
aggregates like, national income, national expenditure, capital formation etc. National income is defined
as the value of goods and services produced in a certain year. National income at current prices can be
obtained after calculating the value of goods and services according to prices prevailing in the same
year.
For example, suppose the national income of the country in 2001 on the basis of current year prices
amounts to Rs 700 crore which is increased to Rs 780 crore in 2002. Suppose the WPI increased to 150 in
the year 2002 as compared to 2001 WPI as 140. The real change in national income can be calculated
as :
Real national income of 2002 = $\frac{140}{150} \times 780$ = Rs 728 crore
Here, the real increase in national income is Rs 28 crore (728 - 700), while the actual monetary increase is
Rs 80 crore (780 - 700).
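The deflation of national income in this example can be written as a short Python sketch. The figures are those given in the text (Rs 700 crore in 2001, Rs 780 crore in 2002, WPI 140 and 150); the function and variable names are illustrative.

```python
# Converting national income at current prices into income at the earlier year's prices.


def deflate(current_value, earlier_index, current_index):
    """Current year value re-expressed at the earlier year's price level."""
    return current_value * earlier_index / current_index


income_2001 = 700   # Rs crore, at 2001 prices
income_2002 = 780   # Rs crore, at 2002 prices

real_income_2002 = deflate(income_2002, earlier_index=140, current_index=150)
real_increase = real_income_2002 - income_2001
monetary_increase = income_2002 - income_2001

print(real_income_2002, real_increase, monetary_increase)
```

Of the Rs 80 crore rise in money terms, Rs 52 crore reflects higher prices and only Rs 28 crore reflects more goods and services.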
6. Uses in planning : The government launches a number of projects which require huge amounts of
expenditure ...
Consumer price index numbers are compiled separately for different groups of consumers, such as industrial
workers, agricultural labourers and urban non-manual employees: the Consumer Price Index for Industrial
Workers (CPI-IW, Base 1982 = 100), the CPI for Agricultural Labourers (CPI-AL, Base 1986-87 = 100) and
the CPI for Urban Non-Manual Employees. The CPI for Industrial Workers is considered a good measure of
price rise or inflation and is used for determining the dearness allowance (DA) of government employees as
well as other ...
Table 2
Major Group : 1. Food ... 4. Housing ... 6. Misc. group
Grand Total : 100.00, 100.00
commodities and is regarded as an index of changes in the cost of living of industrial workers. This index shows
that there has been a five-fold rise in retail prices and cost of living of industrial workers since 1982. This
index has gone up to 548 in Oct. 2006 as against 100 in base year 1982. The General Index for Urban
Non-Manual Employees showed over fourfold rise between 1982 and Oct. 2006. Thus, the impact of rising
prices on urban non-manual employees was a little less than that on the industrial workers. Prices in
rural areas also showed a rising trend, but the extent of rise in prices in this case was lower than that in
urban areas, as is indicated by the general index for agricultural labourers. Table 3 shows the
movement of prices in India as shown by the All India Consumer Price Index.
Table 3
High inflation hurts the poor with their incomes not indexed to prices. It also puts pressure on interest
rates, and adversely affects both savings and investment. Because of its implications for the poor and its
possible destabilizing effects on macro economic stability, containment of inflation is high on the
Government agenda.
4. Deficit Financing
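Laspeyre's Method :
Price Index $P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
Quantity Index $Q_{01} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$
Paasche's Method :
Price Index $P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$
Quantity Index $Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100$
Weighted Average of Price Relatives Method :
$P_{01} = \frac{\sum PV}{\sum V}$
Consumer Price Index :
$CPI = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
$CPI = \frac{\sum WR}{\sum W}$
Index of Industrial Production :
$\frac{\sum \left( \frac{q_1}{q_0} \times 100 \right) W}{\sum W}$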
exercises
Questions :
1. Distinguish between actual difference and relative difference in prices.
2. Define index numbers. Why do we need an index number?
11. Distinguish between Laspeyre's method and Paasche's method of constructing index number.
Define Consumer Price Index number. Explain the uses of consumer price index numbers.
16. Distinguish between 'Wholesale Price Index' and 'Consumer Price Index'.
17. Why is it essential to have different CPI for different categories of consumers?
19. Can CPI number for urban non-manual employees represent the changes in cost of living of the
President of India?
20. What do you mean by inflation? How the wholesale price index numbers are useful for
measuring the rate of inflation?
Problems :
1. Construct the Index Number for 2002 with 2001 as base from the following prices of commodities by
simple (Unweighted) aggregative method.
Commodities :    A    B    C    D    E
Price (2001) :   50   40   10   5    2
Price (2002) :   80   60   20   10   6
2. Using the following data and 2002 as the base period, compute simple aggregative price indices for
the two fuels.
Coal (Rs) 5 3 4
4. Calculate Quantity Index Numbers from the following data by simple aggregative method taking
quantity of 1998 as base.
(Quantity Index No. = 117.1, 125.1, 127.3, 130.5)
5. Calculate index number for 2002 on the base prices for
1991 from the following by average of price relative method.
6.
16 21 6 3 14
Construct the index number for 2000 taking 1990 as base by price relative method using arithmetic
mean.
Commodities -.A B C D
Price (1990) : 10 20 30 40
Price (2000) : 13 17 60 70
392
1993
1994
1995
1996
1997
(in Rs)
75 50 65 60 72
1998
1999
7« 69 75 84 80
wholesale prices in India for second week of Sept. 2002 and L th'e wLr'"^'' ^^^ ""-ber of wholesale
prLes
Weights Index
A B C D 4 3 2 5 20 15 25 10 6 5 3 4 10 23 15 40
aZtitv in? ^"^u price index and quantity index numbers wn-li onoi__j -
2001
Cummodtty
ABC
Pnce
Quantity
2002
Price
252
624
Quantity
31
6
[Laspeyre's : Price Index = 76.92, Quantity Index = 143 18-Paasche's : Price Index = 69.84, Quantity Index
= 130]
11. Calculate weighted aggregative of actual price index number and quantity index number from the
following data using (/) Laspeyre's Method, and {ii) Paasche's Method. Also calculate value index
number and interpret them.
Quantity lbs. Price per lb. Quantity lbs. ' Price per lb.
Bread Meat Tea6 4 0.5 40 paise 45 paise 90 paise 7 5 1.5 30 paise 50 paise 40 paise
Commodity Price Base Year (in Rs) Price Current Year Quantity Base Year (in kg)
A 6.0 8.0 40
B 3.0 3.2 80
C 2.0 3.0 20
13. Prepare consumer price index numbers from the following data for 2000 and 1999 taking 1998 as
base.
[Index numbers, 1999 = 127.25, 2000 = 107.43] From the data given below construct the consumer price
index number
Food 250 45
Rent 150 15
Clothing 320 20
Miscellaneous 300 15
UNIT 4
DEVELOPING PROJECTS IN EC
Chapter 13
Introduction
3. Consumer Awareness
In the previous Unit 1, we have studied the Meaning of Economics; Scope and Importance of Statistics in
Economics; in Unit 2, Collection and Organisation of Data; and in Unit 3, About the Various Statistical
Tools. These tools are very important in our daily life to analyse different economic activities such as
consumption, production, distribution, transport, inland and foreign trade and different business
activities. In this chapter we will learn the method of developing a project report which will help us in
understanding the application of statistical tools to analyse the various types of business activities.
Reports are prepared to give information about the development of institution, business, product,
government activities etc. For example,
1. Consumer may be interested in knowing the quality, price and uses of product in changing
environment and technology, e.g., preference for landline phone or mobile phone, detergent powder
or detergent cake, fully automatic or semi-automatic washing machine, etc. Such surveys are conducted
by manufacturing organisations.
2. Shareholders may be interested to know about the earning of organisation and possibility of
getting dividend while holding the shares of the company. Such surveys are conducted by non-
government organisations, societies, etc.
3. Central/state governments prepare reports for future development in priority areas such as
road, power, telecommunication, education, health, etc. For example, for this purpose the government
conducts surveys to know about likely requirement of primary health centres and schools for basic
education. Similarly government decides the requirement of power (Mega Watts), roads to construct in
the light of changing population of a respective area.
4. Reserve Bank of India plans the opening of new branches of commercial banks, cooperative
banks or agricultural banks in the light of increasing credit requirement of population on the basis of
survey reports. Chambers of Commerce, namely, the
Federation of Indian Chambers of Commerce and Industry (FICCI) and the Confederation of Indian Industry (CII),
conduct surveys abroad to know the business opportunities arising out of the economic development of
the respective nations.
5. In the international context, United Nations Organisation (UNO) plans humanitarian help (food, life
saving drugs, etc.) in war, drought, earthquakes and such other natural calamities based on survey
reports.
In the light of above examples it is very clear that project reports help in understanding the
requirements of shareholders, consumers. Central and State Governments, Reserve Bank of India and
financial institutions and national and international bodies to plan their activities for future operations.
Those organisations who ignore the changing requirement of the consumers or population may fail in
achieving their goals and objectives.
uroject
2. To help in the policy formation about the economic and social development of the country.
3. To direct the efforts of organisation in given objectives based on opportunities provided in the
changing environment.
5. To pay competitive prices for required goods by the consumer to take the real value of the price
paid to sellers.
8. To provide food, medical help to badly affected areas due to any natural calamities by national,
international, social and non-government organisations.
9. It helps in conducting research on various issues such as political, social, economical,
technological aspects of national and international significance.
Consumer Awareness
Consumers may be exploited by manufacturers, government agencies, board of directors and national
and international agencies, e.g., manufacturers charge higher price, provide poor quality, lesser weight,
defective product, etc., to the consumers. The Indian Consumer Protection Act, 1986 has provided
various rights to the consumers, such as right to basic needs, safety, choice, information, education,
redressal, representation and healthy environment. Any consumer who is exploited on these grounds can
approach to the appropriate authorities to seek compensation or replacement of goods. For this
consumers may be made aware about their rights and informed about proper agencies, which they can
approach for grievances.
There are five steps in preparing a project report for consumer awareness :
1. Identification of Problem
2. Preparation of Questionnaire
3. Collection of Data
5. Conclusion
Identification of Problem
We want to know about consumers'/dealers' knowledge about the product of a company manufacturing
namely, colour TV., air-conditioner, washing machine, refrigerator, car, scooter, computer etc. Let us
take the example of air-conditioner where we are interested to know from dealers about the
performance of air-conditioner with respect to price, cooling technology, quality, availability, warranty,
after sales service etc. keeping in view other air-conditioners' manufacturers product available in the
market in competition.
Preparation of Questionnaire
To know more about various aspects of air-conditioner in a more systematic manner, we must design a
questionnaire covering all the aspects discussed above.
Name:.....
Address :. Phone No.
Q 3. Which brand AC would you recommend to the customer? (rank them) (1-Best, 8-Worst)
(i) Videocon
(vi) Authentic
Name .............................................................................................................
(vi) Authentic
(i) Price (iv) Availability (vii) Technology Q. 7. Does the brand name influence the customer?
Name : ............................................................................................................—
(iv) National
(viii) Others
Q. 11. Which AC company do you feel is the most aggressive in giving discounts and scheme (please
specify)?
Q. 12. Do you agree that huge advertisement campaigns are the most responsible factors for the
changing market scenario and increasing demand?
(i) Agree very strongly (iii) Agree (v) Disagree (vii) Don't know
Q. 13. Generally what sort of problems do you face while making a sale?
Specify : .......................................................................................................
(i) Videocon
(ii) Carrier
(iii) Amtrex
(iv) National
(v) LG
(vi) Samsung
(vii) Voltas
(viii) Others
Warranty
6 month-1 yr.
1-2 yrs.
2-3 yrs.
Q. 16. What is the market size of the area you are dealing in?
Q. 17. Average No. of units sold per month from your counter.
(viii) Others
(i) Videocon
(ii) Carrier
(iii) Amtrex
(iv) National
(v) LG
(vi) Samsung
(vii) Voltas
(viii) Others
Q. 19. Please give the sales break-up for the month of April, May and June in last three years :
(i) Videocon
(ii) Carrier
(iii) Amtrex
(iv) National
(v) LG
(vi) Samsung
(vii) Voltas
(viii) Others
Q. 20. To get a substantial growth in your present sale which of the following would you prefer?
Collection of Data
The above questionnaire will be filled in by dealers selected through a sampling method, with the help
of investigators. The number of dealers and the geographical areas covered depend upon our requirement,
that is, where we want to position our product, for example Delhi, Kolkata, Chennai and other state capitals.
We can also collect the information from government and industrial publications to know about the
growth of air-conditioner industry and future government policy in this respect.
Data collected through questionnaire will be classified and presented in the form of tables, graphs and
diagrams, viz., bar diagrams, pie-diagram etc. For rigorous analysis
Illustration: Table and diagram (based on hypothetical data) are given below.
Table 1
Awareness
Conclusion
[Diagram: awareness by Brand, Present Availability, Price and Technology; legend entries include LG and Others]
Observation
3. Price : LG or Videocon
5. Technology : Videocon or LG
Thus, the air-conditioner company will come to know about the brand, present availability, price, after
sales service, technology etc. Through these observations the company will be in a position to decide
the regularity of supply and the number of units to be produced, and to improve after sales service as per
the requirements of future consumers.
Analysis
Let us analyse the given data by applying different statistical tools (Mean, Standard deviation and
Coefficient of Variation) using the following formulae :
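1. Mean : $\bar{X} = \dfrac{\Sigma X}{N}$
2. Standard Deviation : $\sigma = \sqrt{\dfrac{\Sigma (X - \bar{X})^2}{N}}$
3. Coefficient of Variation : $CV = \dfrac{\sigma}{\bar{X}} \times 100$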
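Table 2
(Figures in percentages)
Attribute               Videocon   Carrier    Amtrex  National   Samsung        LG    Voltas    Others
Brand                         24         3         8        10        20        18         8         9
Present Availability          12         9         7        11        14        13        17        17
Price                         25        10         6         5        12        28         9         5
After Sales Service           18        12         6         6        17        21        12         8
Technology                    22        10         7         8        12        17        14        10
ΣX                           101        44        34        40        75        97        60        49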
Observation
Considering brand, present availability, price, after sales service and technology together, Videocon has
the highest average percentage of customers (20.2%); hence customers are most likely to prefer a
Videocon air-conditioner.
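The averages behind this observation can be re-derived from the Table 2 figures. The following is a minimal sketch rather than the book's own working. It assumes the columns follow the order given under "Name of Companies" (Videocon, Carrier, Amtrex, National, Samsung, LG, Voltas, Others), and it infers the after sales service percentages as each company's total minus the four attribute rows that are shown.

```python
# Percentages from Table 2. Column order is an assumption taken from the
# "Name of Companies" list in the text.
COMPANIES = ["Videocon", "Carrier", "Amtrex", "National",
             "Samsung", "LG", "Voltas", "Others"]

TABLE_2 = {
    "Brand":                [24, 3, 8, 10, 20, 18, 8, 9],
    "Present Availability": [12, 9, 7, 11, 14, 13, 17, 17],
    "Price":                [25, 10, 6, 5, 12, 28, 9, 5],
    "Technology":           [22, 10, 7, 8, 12, 17, 14, 10],
}
TOTALS = [101, 44, 34, 40, 75, 97, 60, 49]  # the total (sum of X) row of Table 2

# Assumption: the fifth attribute (after sales service) is whatever is left
# of each total once the four rows above are subtracted.
after_sales = [
    total - sum(row[i] for row in TABLE_2.values())
    for i, total in enumerate(TOTALS)
]

N_ATTRIBUTES = 5  # brand, availability, price, after sales service, technology
means = {company: TOTALS[i] / N_ATTRIBUTES for i, company in enumerate(COMPANIES)}
best = max(means, key=means.get)

for company, mean in means.items():
    print(f"{company:<9} mean = {mean:.1f}%")
print("Highest average:", best)
```

Dividing each total by the five attributes reproduces the 20.2% quoted for Videocon, which is the largest of the eight averages.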
For example, the deviations (X − X̄) are +2, +3, −3, −2, 0 for National and +5, −1, −3, +2, −3 for Samsung.
Thus, we get
Name of Companies    Mean    Σ(X − X̄)²    Standard deviation    Coefficient of Variation (%)
Videocon             20.2      112.8            10.62                    52.57
Carrier               8.8       46.8             6.84                    77.72
Amtrex                6.8        2.8             1.67                    24.56
National              8         26               5.10                    63.74
Samsung              15         48               6.93                    46.2
LG                   19.4      125.2            11.19                    57.68
Voltas               12         54               7.35                    61.25
Others                9.8       78.8             8.88                    90.61
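The standard deviation and coefficient of variation figures can be checked with a short script. This is a minimal sketch, not the book's own working: the company order and the after sales service percentages are assumptions (the latter inferred as each company's total minus the other four percentages). The script reports the population standard deviation, sqrt(Σ(X − X̄)²/N), and also sqrt(Σ(X − X̄)²) without the division by N, because the tabulated figures (for example 10.62 for Videocon, and 63.74 as the coefficient of variation for National) appear to correspond to the latter. Dividing by N = 5 makes every standard deviation and coefficient of variation smaller by a factor of √5, so the ranking of the companies does not change.

```python
import math

# Percentages per company in the order: brand, present availability, price,
# after sales service, technology. The after sales service values are
# inferred from the totals, so treat them as an assumption.
SCORES = {
    "Videocon": [24, 12, 25, 18, 22],
    "Carrier":  [3, 9, 10, 12, 10],
    "Amtrex":   [8, 7, 6, 6, 7],
    "National": [10, 11, 5, 6, 8],
    "Samsung":  [20, 14, 12, 17, 12],
    "LG":       [18, 13, 28, 21, 17],
    "Voltas":   [8, 17, 9, 12, 14],
    "Others":   [9, 17, 5, 8, 10],
}


def summarise(values):
    """Return the mean, sum of squared deviations and two spread measures."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    sd = math.sqrt(ss / n)                     # population standard deviation
    root_ss = math.sqrt(ss)                    # no division by N
    return {
        "mean": mean,
        "ss": ss,
        "sd": sd,
        "cv": sd / mean * 100,
        "root_ss": root_ss,
        "cv_root_ss": root_ss / mean * 100,
    }


results = {company: summarise(values) for company, values in SCORES.items()}

for company, r in results.items():
    print(f"{company:<9} mean={r['mean']:5.1f}  SS={r['ss']:7.2f}  "
          f"sd={r['sd']:5.2f}  CV={r['cv']:5.2f}%  "
          f"root_SS={r['root_ss']:5.2f}  CV(root_SS)={r['cv_root_ss']:5.2f}%")

most_variable = max(results, key=lambda c: results[c]["cv"])
least_variable = min(results, key=lambda c: results[c]["cv"])
print("Most variable:", most_variable, "| least variable:", least_variable)
```

Under either definition the spread is largest for Others and smallest for Amtrex, so an interpretation based on ranking the coefficients of variation is unaffected.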
Observations
The coefficient of variation is the highest for other brands as 90.61%, hence the
Requirement
After collecting data from potential dealers/customers, students may be asked to prepare the required
tables, graphs and diagrams etc.
4. Further, students should be asked to analyse and interpret the data collected by
5. They may also suggest the future course of action for the company.
Productivity
Productivity is the ratio of the output of an organisation to its input. Productivity varies from company
to company. For example, X company manufactures a colour T.V. for Rs 10,000, while Y company
manufactures a similar T.V. for Rs 11,000. In this case X company is more productive than Y company,
because its manufacturing cost per colour T.V. is lower by Rs 1,000. In addition to this overall
productivity, we can also calculate the productivity of the different factors of production such as
labour, capital etc. For example, the cost of labour to manufacture a colour T.V. is Rs 3,000 for
X company and Rs 2,500 for Y company. In this case the labour productivity of Y company is better,
although the overall productivity of X company is better than that of Y company.
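The comparison in this example can be written out as a small calculation. This is only an illustrative sketch: it treats cost per T.V. as the sole measure of input, and the "units per Rs 1,00,000" figure is a hypothetical way of expressing output per unit of input, not a measure defined in the text.

```python
# Cost (Rs) of manufacturing one colour T.V., as given in the example.
COSTS = {
    "X": {"total": 10_000, "labour": 3_000},
    "Y": {"total": 11_000, "labour": 2_500},
}


def lower_cost_company(kind):
    """Return the company that spends less of the given cost kind per T.V."""
    return min(COSTS, key=lambda company: COSTS[company][kind])


def units_per_lakh(company, kind):
    """Hypothetical measure: T.V.s obtained per Rs 1,00,000 of the given input."""
    return 100_000 / COSTS[company][kind]


overall_winner = lower_cost_company("total")    # lower total cost per T.V.
labour_winner = lower_cost_company("labour")    # lower labour cost per T.V.
cost_gap = COSTS["Y"]["total"] - COSTS["X"]["total"]

print(f"Overall: {overall_winner} is more productive (cost lower by Rs {cost_gap})")
print(f"Labour:  {labour_winner} is more productive")
for company in COSTS:
    print(f"{company}: {units_per_lakh(company, 'total'):.2f} T.V.s per lakh of total cost, "
          f"{units_per_lakh(company, 'labour'):.2f} per lakh of labour cost")
```

The two rankings differ because a company can use one factor (here labour) more economically while still spending more in total.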
Productivity is determined by both internal and external factors. Internal factors are technology,
organisation structure, managerial ability, ability of the firm to substitute different inputs, etc. External
factors include growth of agricultural and industrial production, prices, growth of bank deposits and
credit, composition and growth of GDP, structure of foreign trade, savings and capital formation etc.
Identification of Problem
We want to know productivity awareness amongst the enterprises of the following economic problems.
We can identify the problems like :
Collection of Data
Different ministries and departments of the Central and State Governments regularly publish current
information along with statistical data on a number of subjects. This information is quite reliable for
related studies. We can collect data about the identified problems from Newspapers/Economic Surveys/RBI
Bulletin/Government Budget of the State or the Nation/Census Reports/NSS Reports/Annual Survey of
Industries/Labour Gazettes/Agriculture Statistics of India/Indian Trade Journals etc.
Illustration 1.
Table 1
Period
Weights
1995-96
1996-97
1997-98
1998-99
1999-00
2000-01
2001-02
2002-03
2003-04
2004-05
2004-05 (April-Dec.)
2005-06 (April-Dec.)
10.47
9.7 -1.9
0.4
Economic Survey : 2005-2006 (p. 132)
... recovery that commenced from the second quarter of
(i) normal business and investment cycles, (ii) lack of domestic and external demand
Illustration 2.
Table 2
Notes: 1. The ratios to GDP for 2005-06 (BE) are based on CSO's Advance Estimates. GDP at current
market prices prior to 1999-2000 is based on the 1993-94 series and from 1999-2000 on the new 1999-2000
series. 2. The fiscal deficit excludes the transfer of States' share in the small savings collections.
Analysis : The Fiscal Responsibility and Budget Management Act (FRBMA), 2003 continued to provide a
strong institutional mechanism for making sustained progress at
demand on ... as a proportion of GDP, declined from 6.6 per cent ... per cent, respectively. The
Budget for 2005 ...
Requirement
1t