Foreword

Chapter 1 : Introduction

Chapter 2 : Collection of Data

Chapter 3 : Organisation of Data

Chapter 4 : Presentation of Data

Chapter 5 : Measures of Central Tendency

Chapter 6 : Measures of Dispersion

Chapter 7 : Correlation

Chapter 8 : Index Numbers

Chapter 9 : Use of Statistical Tools





Studying this chapter should enable you to:
know what the subject of economics is about;
understand how economics is linked with the study of economic activities in consumption, production and distribution;
understand why knowledge of statistics can help in describing consumption, production and distribution;
learn about some uses of statistics in the understanding of economic activities.

1. WHY ECONOMICS?

You have, perhaps, already had Economics as a subject for your earlier classes at school. You might have been told this subject is mainly around what Alfred Marshall (one of the founders of modern economics) called "the study of man in the ordinary business of life". Let us understand what that means.

When you buy goods (you may want to satisfy your own personal needs or those of your family or those of any other person to whom you want to make a gift) you are called a consumer.

When you sell goods to make a profit for yourself (you may be a shopkeeper), you are called a seller.

When you produce goods (you may be a farmer or a manufacturer), you are called a producer.

When you are in a job, working for some other person, and you get paid for it (you may be employed by somebody who pays you wages or a salary), you are called a service-holder.

When you provide some kind of service to others for a payment (you may be a lawyer or a doctor or a banker or a taxi driver or a transporter of goods), you are called a service-provider.

In all these cases you will be called gainfully employed in an economic activity. Economic activities are ones that are undertaken for a monetary gain. This is what economists mean by the ordinary business of life.

Activities
List different activities of the members of your family. Would you call them economic activities? Give reasons.
Do you consider yourself a consumer? Why?

We cannot get something for nothing

If you have ever heard the story of Aladdin and his Magic Lamp, you would agree that Aladdin was a lucky guy. Whenever and whatever he wanted, he just had to rub his magic lamp, and a genie appeared to fulfil his wish. When he wanted a palace to live in, the genie instantly made one for him. When he wanted expensive gifts to bring to the king when asking for his daughter's hand, he got them at the bat of an eyelid.

In real life we cannot be as lucky as Aladdin. Though, like him, we have unlimited wants, we do not have a magic lamp. Take, for example, the pocket money that you get to spend. If you had more of it then you could have purchased almost all the things you wanted. But since your pocket money is limited, you have to choose only those things that you want the most. This is a basic teaching of Economics.

Activities
Can you think for yourself of some other examples where a person with a given income has to choose which things and in what quantities he or she can buy at the prices that are being charged (called the current prices)?
What will happen if the current prices go up?

Scarcity is the root of all economic problems. Had there been no scarcity, there would have been no economic problem. And you would not have studied Economics either. In our daily life, we face various forms of scarcity. The long queues at railway booking counters, crowded buses and trains, shortage of essential commodities, the rush to get a ticket to watch a new film, etc., are all manifestations of scarcity. We face scarcity because the things that satisfy our wants are limited in availability. Can you think of some more instances of scarcity?

The resources which the producers have are limited and also have
alternative uses. Take the case of food that you eat every day. It satisfies your want of nourishment. Farmers employed in agriculture raise crops that produce your food. At any point of time, the resources in agriculture like land, labour, water, fertiliser, etc., are given. All these resources have alternative uses. The same resources can be used in the production of non-food crops such as rubber, cotton, jute etc. Thus alternative uses of resources give rise to the problem of choice between different commodities that can be produced by those resources.

Activities
Identify your wants. How many of them can you fulfil? How many of them are unfulfilled? Why are you unable to fulfil them?
What are the different kinds of scarcity that you face in your daily life? Identify their causes.

Consumption, Production and Distribution

If you thought about it, you might have realised that Economics involves the study of man engaged in economic activities of various kinds. For this, you need to know reliable facts about all the diverse economic activities like production, consumption and distribution. Economics is often discussed in three parts: consumption, production and distribution.

We want to know how the consumer decides, given his income and many alternative goods to choose from, what to buy when he knows the prices. This is the study of Consumption.

We also want to know how the producer, similarly, chooses what to produce for the market when he knows the costs and prices. This is the study of Production.

Finally, we want to know how the national income or the total income arising from what has been produced in the country (called the Gross Domestic Product or GDP) is distributed through wages (and salaries), profits and interest (we will leave aside here income from international trade and investment). This is the study of Distribution.

Besides these three conventional divisions of the study of Economics about which we want to know all the facts, modern economics has to include some of the basic problems facing the country for special studies.

For example, you might want to know why or to what extent some households in our society have the capacity to earn much more than others. You may want to know how many people in the country are really
poor, how many are middle-class, how many are relatively rich and so on. You may want to know how many are illiterate and will not get jobs requiring education, how many are highly educated and will have the best job opportunities and so on. In other words, you may want to know more facts in terms of numbers that would answer questions about poverty and disparity in society. If you do not like the continuance of poverty and gross disparity and want to do something about the ills of society, you will need to know the facts about all these things before you can ask for appropriate actions by the government. If you know the facts it may also be possible to plan your own life better. Similarly, you hear of (and some of you may even have experienced) disasters like the Tsunami, earthquakes and the bird flu, dangers threatening our country that affect man's "ordinary business of life" enormously. Economists can look at these things provided they know how to collect and put together the facts about what these disasters cost systematically and correctly. You may perhaps think about it and ask yourselves whether it is right that modern economics now includes learning the basic skills involved in making useful studies for measuring poverty, how incomes are distributed, how earning opportunities are related to your education, how environmental disasters affect our lives and so on.

Obviously, if you think along these lines, you will also appreciate why we needed Statistics (which is the study of numbers relating to selected facts in a systematic form) to be added to all modern courses of modern economics.

Would you now agree with the following definition of economics that many economists use?

"Economics is the study of how people and society choose to employ scarce resources that could have alternative uses in order to produce various commodities that satisfy their wants and to distribute them for consumption among various persons and groups in society."

Activity
Would you say, in the light of the discussion above, that this definition, as it used to be given, seems a little inadequate now? What does it miss out?

2. STATISTICS IN ECONOMICS

In the previous section you were told about certain special studies that concern the basic problems facing a country. These studies required that we know more about economic facts in terms of numbers. Such economic facts are also known as data.

The purpose of collecting data about these economic problems is to understand and explain these problems in terms of the various causes behind them. In other words, we try to analyse them. For example, when we analyse the hardships of poverty, we try to explain it in terms of the various factors such as
unemployment, low productivity of people, backward technology, etc.

But what purpose does the analysis of poverty serve unless we are able to find ways to mitigate it? We may, therefore, also try to find those measures that help solve an economic problem. In Economics, such measures are known as policies.

So, do you realise, then, that no analysis of a problem would be possible without the availability of data on various factors underlying an economic problem? And that, in such a situation, no policies can be formulated to solve it? If yes, then you have, to a large extent, understood the basic relationship between Economics and Statistics.

3. WHAT IS STATISTICS?

At this stage you are probably ready to know more about Statistics. You might very well want to know what the subject "Statistics" is all about. What are its specific uses in Economics? Does it have any other meaning? Let us see how we can answer these questions to get closer to the subject.

In our daily language the word "Statistics" is used in two distinct senses: singular and plural. In the plural sense, "statistics" means "numerical facts systematically collected" as described by Oxford Dictionary. Thus, the simple meaning of statistics in plural sense is data.

Do you know that the term statistics in singular means the "science of collecting, classifying and using statistics" or a "statistical fact"?

By data or statistics, we mean both quantitative and qualitative facts that are used in Economics. For example, a statement in Economics like "the production of rice in India has increased from 39.58 million tonnes in 1974-75 to 58.64 million tonnes in 1984-85" is a quantitative fact. The numerical figures such as "39.58 million tonnes" and "58.64 million tonnes" are statistics of the production of rice in India for 1974-75 and 1984-85 respectively.

In addition to the quantitative data, Economics also uses qualitative data. The chief characteristic of such information is that it describes attributes of a single person or a group of persons that are important to record as accurately as possible even though they cannot be measured in quantitative terms. Take, for example, "gender" that distinguishes a person as man/woman or boy/girl. It is often possible (and useful) to state the information about an attribute of a person in terms of degrees (like better/worse; sick/healthy/more healthy; unskilled/skilled/highly skilled etc.). Such qualitative information or statistics is often used in Economics and other social sciences and collected and stored systematically like quantitative information (on prices, incomes, taxes paid etc.), whether for a single person or a group of persons.

You will study in the subsequent chapters that statistics involves collection and organisation of data. The next step is to present the data in
tabular, diagrammatic and graphic forms. The data, then, is summarised by calculating various numerical indices such as mean, variance, standard deviation etc. that represent the broad characteristics of the collected set of information.

Activities
Think of two examples of qualitative and quantitative data.
Which of the following would give you qualitative data: beauty, intelligence, income earned, marks in a subject, ability to sing, learning skills?

4. WHAT STATISTICS DOES?

By now, you know that Statistics is an indispensable tool for an economist that helps him to understand an economic problem. Using its various methods, effort is made to find the causes behind it with the help of the qualitative and the quantitative facts of the economic problem. Once the causes of the problem are identified, it is easier to formulate certain policies to tackle it.

But there is more to Statistics. It enables an economist to present economic facts in a precise and definite form that helps in proper comprehension of what is stated. When economic facts are expressed in statistical terms, they become exact. Exact facts are more convincing than vague statements. For instance, a statement with precise figures, such as "310 people died in the recent earthquake in Kashmir", is more factual and, thus, statistical data. Whereas, saying "hundreds of people died" is not.

Statistics also helps in condensing the mass of data into a few numerical measures (such as mean, variance etc., about which you will learn later). These numerical measures help summarise data. For example, it would be impossible for you to remember the incomes of all the people in a data set if the number of people is very large. Yet, one can easily remember a summary figure like the average income that is obtained statistically. In this way, Statistics summarises and presents meaningful overall information about a mass of data.

Quite often, Statistics is used in finding relationships between different economic factors. An economist may be interested in finding out what happens to the demand for a commodity when its price increases or decreases. Or, would the supply of a commodity be affected by the changes in its own price? Or, would the consumption expenditure increase when the average income increases? Or, what happens to the general price level when the government expenditure increases? Such questions can only be answered if any relationship exists between the various economic factors that have been stated above. Whether such relationships exist or not can be easily verified by applying statistical methods to their data. In some cases the economist might assume certain relationships between them and like
to test whether the assumption she/he made about the relationship is valid or not. The economist can do this only by using statistical techniques.

In another instance, the economist might be interested in predicting the changes in one economic factor due to the changes in another factor. For example, she/he might be interested in knowing the impact of today's investment on the national income in future. Such an exercise cannot be undertaken without the knowledge of Statistics.

Sometimes, formulation of plans and policies requires the knowledge of future trends. For example, an economic planner has to decide in 2005 how much the economy should produce in 2010. In other words, one must know what could be the expected level of consumption in 2010 in order to decide the production plan of the economy for 2010. In this situation, one might make a subjective judgement based on a guess about consumption in 2010. Alternatively, one might use statistical tools to predict consumption in 2010. That could be based on the data of consumption of past years or of recent years obtained by surveys. Thus, statistical methods help formulate appropriate economic policies that solve economic problems.

5. CONCLUSION

Today, we increasingly use Statistics to analyse serious economic problems such as rising prices, growing population, unemployment, poverty etc., to find measures that can solve such problems. Further, it also helps evaluate the impact of such policies in solving the economic problems. For example, it can be ascertained easily using statistical techniques whether the policy of family planning is effective in checking the problem of ever-growing population.

In economic policies, Statistics plays a vital role in decision making. For example, in the present time of rising global oil prices, it might be necessary to decide how much oil India should import in 2010. The decision to import would depend on the expected domestic production of oil and the likely demand for oil in 2010. Without the use of Statistics, it cannot be determined what the expected domestic production of oil and the likely demand for oil would be. Thus, the decision to import oil cannot be made unless we know the actual requirement of oil. This vital information that helps make the decision to import oil can only be obtained statistically.

Statistical methods are no substitute for common sense!
There is an interesting story which is told to make fun of statistics. It is said that a family of four persons (husband, wife and two children) once set out to cross a river. The father knew the average depth of the river. So he calculated the average height of his family members. Since the average height of his family members was greater than the average depth of the river, he thought they could cross safely. Consequently some members of the family (children) drowned while crossing the river.
Does the fault lie with the statistical method of calculating averages or with the misuse of the averages?
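The river-crossing story can be made concrete with a short sketch. The heights and depths below are invented purely for illustration (the story gives no figures), and the variable names are hypothetical. The sketch suggests that an average, although a useful summary figure, hides the individual values that matter for this particular decision.

```python
from statistics import mean

# Hypothetical figures (not from the text): heights of the four family
# members and depths of the river at several points, all in centimetres.
heights_cm = [175, 160, 110, 95]      # husband, wife, two children
depths_cm = [40, 90, 150, 120, 60]    # depth measured along the crossing

avg_height = mean(heights_cm)   # the summary figure the father relied on
avg_depth = mean(depths_cm)
deepest = max(depths_cm)        # the value that actually matters

# The father's (faulty) test: compare the two averages.
safe_by_average = avg_height > avg_depth

# A sounder test: every member must be taller than the deepest point.
at_risk = [h for h in heights_cm if h <= deepest]

print(f"average height = {avg_height} cm, average depth = {avg_depth} cm")
print("safe according to averages:", safe_by_average)
print("heights at risk at the deepest point:", at_risk)
```

With these assumed numbers the comparison of averages says the crossing is safe, while the check against the deepest point flags the two children. The method of averaging is not at fault; the averages were simply used to answer a question they cannot answer.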

Our wants are unlimited but the resources used in the production of goods that satisfy our wants are limited and scarce. Scarcity is the root of all economic problems.
Resources have alternative uses.
Purchase of goods by consumers to satisfy their various needs is Consumption.
Manufacture of goods by producers for the market is Production.
Division of the national income into wages, profits, rents and interests is Distribution.
Statistics finds economic relationships using data and verifies them.
Statistical tools are used in prediction of future trends.
Statistical methods help analyse economic problems and formulate policies to solve them.


1. Mark the following statements as true or false.

(i) Statistics can only deal with quantitative data.
(ii) Statistics solves economic problems.
(iii) Statistics is of no use to Economics without data.
2. Make a list of activities that constitute the ordinary business of life. Are
these economic activities?
3. The Government and policy makers use statistical data to formulate
suitable policies of economic development. Illustrate with two examples.
4. You have unlimited wants and limited resources to satisfy them. Explain
by giving two examples.
5. How will you choose the wants to be satisfied?
6. What are your reasons for studying Economics?
7. Statistical methods are no substitute for common sense. Comment.

Collection of Data

Studying this chapter should enable you to:
understand the meaning and purpose of data collection;
distinguish between primary and secondary sources;
know the mode of collection of data;
distinguish between Census and Sample Surveys;
be familiar with the techniques of sampling;
know about some important sources of secondary data.

1. INTRODUCTION

In the previous chapter, you have read about what economics is. You also studied the role and importance of statistics in economics. In this chapter, you will study the sources of data and the mode of data collection. The purpose of collection of data is to collect evidence for reaching a sound and clear solution to a problem.

In economics, you often come across a statement like,

"After many fluctuations the output of food grains rose to 176 million tonnes in 1990-91 and 199 million tonnes in 1996-97, but fell to 194 million tonnes in 1997-98. Production of food grains then rose continuously and touched 212 million tonnes in 2001-02."

In this statement, you can observe that the food grains production in different years does not remain the same. It varies from year to year and from crop to crop. As these values
vary, they are called variables. The variables are generally represented by the letters X, Y or Z. The values of these variables are the observations. For example, suppose the food grain production in India varies between about 100 million tonnes in 1970-71 and about 220 million tonnes in 2001-02 as shown in the following table. The years are represented by variable X and the production of food grain in India (in million tonnes) is represented by variable Y:

TABLE 2.1
Production of Food Grain in India (Million Tonnes)

X          Y
1970-71    108
1978-79    132
1979-80    108
1990-91    176
1996-97    199
1997-98    194
2001-02    212

Here, these values of the variables X and Y are the data, from which we can obtain information about the trend of the production of food grains in India. To know the fluctuations in the output of food grains, we need the data on the production of food grains in India. Data is a tool, which helps in understanding problems by providing information.

You must be wondering where data come from and how we collect these. In the following sections we will discuss the types of data, method and instruments of data collection and sources of obtaining data.

2. WHAT ARE THE SOURCES OF DATA?

Statistical data can be obtained from two sources. The enumerator (person who collects the data) may collect the data by conducting an enquiry or an investigation. Such data are called Primary Data, as they are based on first hand information. Suppose you want to know about the popularity of a film star among school students. For this, you will have to enquire from a large number of school students, by asking them questions to collect the desired information. The data you get is an example of primary data.

If the data have been collected and processed (scrutinised and tabulated) by some other agency, they are called Secondary Data. Generally, the published data are secondary data. They can be obtained either from published sources or from any other source, for example, a web site. Thus, the data are primary to the source that collects and processes them for the first time and secondary for all sources that later use such data. Use of secondary data saves time and cost. For example, after collecting the data on the popularity of the film star among students, you publish a report. If somebody uses the data collected by you for a similar study, it becomes secondary data.

Do you know how a manufacturer decides about a product or how a political party decides about a candidate? They conduct a survey by
asking questions about a particular product or candidate from a large group of people. The purpose of surveys is to describe some characteristics like price, quality, usefulness (in case of the product) and popularity, honesty, loyalty (in case of the candidate). The purpose of the survey is to collect data. Survey is a method of gathering information from individuals.

Preparation of Instrument

The most common type of instrument used in surveys is questionnaire/interview schedule. The questionnaire is either self administered by the respondent or administered by the researcher (enumerator) or trained investigator. While preparing the questionnaire/interview schedule, you should keep in mind the following points:

The questionnaire should not be too long. The number of questions should be as few as possible. Long questionnaires discourage people from completing them.

The series of questions should move from general to specific. The questionnaire should start from general questions and proceed to more specific ones. This helps the respondents feel comfortable. For example:
Poor Q
(i) Is increase in electricity charges justified?
(ii) Is the electricity supply in your locality regular?
Good Q
(i) Is the electricity supply in your locality regular?
(ii) Is increase in electricity charges justified?

The questions should be precise and clear. For example:
Poor Q
What percentage of your income do you spend on clothing in order to look presentable?
Good Q
What percentage of your income do you spend on clothing?

The questions should not be ambiguous, to enable the respondents to answer quickly, correctly and clearly. For example:
Poor Q
Do you spend a lot of money on books in a month?
Good Q
How much do you spend on books in a month?
(i) Less than Rs 200
(ii) Between Rs 200-300
(iii) Between Rs 300-400
(iv) More than Rs 400

The question should not use double negatives. The questions starting with "Wouldn't you" or "Don't you" should be avoided, as they may lead to biased responses. For example:
Poor Q
Don't you think smoking should be prohibited?
Good Q
Do you think smoking should be prohibited?

The question should not be a leading question, which gives a clue about how the respondent should answer. For example:
Poor Q
How do you like the flavour of this high-quality tea?
Good Q
How do you like the flavour of this tea?

The question should not indicate alternatives to the answer. For example:
Poor Q
Would you like to do a job after college or be a housewife?
Good Q
Would you like to do a job, if possible?

The questionnaire may consist of closed-ended (or structured) questions or open-ended (or unstructured) questions.

Closed-ended or structured questions can either be a two-way question or a multiple choice question. When there are only two possible answers, "yes" or "no", it is called a two-way question.

When there is a possibility of more than two options of answers, multiple choice questions are more appropriate. Example:
Q. Why did you sell your land?
(i) To pay off the debts.
(ii) To finance children's education.
(iii) To invest in another property.
(iv) Any other (please specify).

Closed-ended questions are easy to use, score and code for analysis, because all the respondents respond from the given options. But they are difficult to write as the alternatives should be clearly written to represent both sides of the issue. There is also a possibility that the individual's true response is not present among the options given. For this, the choice of "Any Other" is provided, where the respondent can write a response which was not anticipated by the researcher. Moreover, another limitation of multiple-choice questions is that they tend to restrict the answers by providing alternatives, without which the respondents may have answered differently.

Open-ended questions allow for more individualised responses, but they are difficult to interpret and hard to score, since there are a lot of variations in the responses. Example:
Q. What is your view about globalisation?

Mode of Data Collection

Have you ever come across a television show in which reporters ask questions from children, housewives or general public regarding their examination performance or a brand of soap or a political party? The purpose of asking questions is to do a survey for collection of data. There are three basic ways of collecting data: (i) Personal Interviews, (ii) Mailing (questionnaire) Surveys, and (iii) Telephone Interviews.
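The earlier remark that closed-ended questions are easy to score and code for analysis can be sketched as follows. The list of responses is invented for illustration and the names `OPTIONS`, `responses`, `tally` and `other_answers` are hypothetical; only the option wording comes from the land-sale question. Answers written under "Any other" are kept apart, since free text cannot be tallied directly.

```python
from collections import Counter

# Option codes for the closed-ended question "Why did you sell your land?"
OPTIONS = {
    1: "To pay off the debts",
    2: "To finance children's education",
    3: "To invest in another property",
    4: "Any other (please specify)",
}

# Hypothetical responses: (option code, free text used only with code 4).
responses = [
    (1, ""), (2, ""), (1, ""), (4, "Moved to the city"),
    (3, ""), (1, ""), (4, "Land was unproductive"), (2, ""),
]

# Closed-ended answers reduce to a simple count per option code.
tally = Counter(code for code, _ in responses)

# "Any other" answers are open-ended text and need manual interpretation.
other_answers = [text for code, text in responses if code == 4]

for code, label in OPTIONS.items():
    print(f"({code}) {label}: {tally[code]}")
print("Unanticipated answers:", other_answers)
```

Counting coded options takes one line, whereas the two free-text answers still have to be read and interpreted by the researcher, which is the trade-off the text describes between closed-ended and open-ended questions.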

Personal Interviews

This method is used when the researcher has access to all the members. The researcher (or investigator) conducts face to face interviews with the respondents.

Personal interviews are preferred due to various reasons. Personal contact is made between the respondent and the interviewer. The interviewer has the opportunity of explaining the study and answering any query of the respondents. The interviewer can request the respondent to expand on answers that are particularly important. Misinterpretation and misunderstanding can be avoided. Watching the reactions of the respondents can provide supplementary information.

Personal interview has some demerits too. It is expensive, as it requires trained interviewers. It takes longer time to complete the survey. Presence of the researcher may inhibit respondents from saying what they really think.

Mailing Questionnaire

When the data in a survey are collected by mail, the questionnaire is sent to each individual by mail with a request to complete and return it by a given date. The advantages of this method are that it is less expensive. It allows the researcher to have access to people in remote areas too, who might be difficult to reach in person or by telephone. It does not allow influencing of the respondents by the interviewer. It also permits the respondents to take sufficient time to give thoughtful answers to the questions. These days online surveys or surveys through short messaging service, i.e. SMS, have become popular. Do you know how an online survey is conducted?

The disadvantages of mail survey are that there is less opportunity to provide assistance in clarifying instructions, so there is a possibility of misinterpretation of questions. Mailing is also likely to produce low response rates due to certain factors such as returning the questionnaire without completing it, not returning the questionnaire at all, loss of questionnaire in the mail itself, etc.

Telephone Interviews

In a telephone interview, the investigator asks questions over the telephone. The advantages of telephone interviews are that they are cheaper than personal interviews and can be conducted in a shorter time. They allow the researcher to assist the respondent by clarifying the questions. Telephone interview is better in the cases where the respondents are reluctant to answer certain questions in personal interviews.

The disadvantage of telephone interviews is access to people, as many people may not own telephones. Telephone interviews also prevent the interviewer from watching the visual reactions of the respondents, which can be helpful in obtaining information on sensitive issues.

Activities
You have to collect information from a person who lives in a remote village of India. Which mode of data collection will be the most appropriate for collecting information from him?
You have to interview the parents about the quality of teaching in a school. If the principal of the school is present there, what types of problems can arise?

Pilot Survey

Once the questionnaire is ready, it is advisable to conduct a try-out with a small group, which is known as Pilot Survey or Pre-Testing of the questionnaire. The pilot survey helps in providing a preliminary idea about the survey. It helps in pre-testing of the questionnaire, so as to know the shortcomings and drawbacks of the questions. Pilot survey also helps in assessing the suitability of questions, clarity of instructions, performance of enumerators and the cost and time involved in the actual survey.

4. CENSUS AND SAMPLE SURVEYS

Census or Complete Enumeration

A survey which includes every element of the population is known as Census or the Method of Complete Enumeration. If certain agencies are interested in studying the total population in India, they have to obtain information from all the households in rural and urban India.

Personal Interviews
Advantages: Highest response rate. Allows use of all types of questions. Better for using open-ended questions. Allows clarification of ambiguous questions.
Disadvantages: Most expensive. Possibility of influencing respondents. More time taking.

Mailing Questionnaire
Advantages: Least expensive. Only method to reach remote areas. No influence on respondents. Maintains anonymity of respondents. Best for sensitive questions.
Disadvantages: Cannot be used by illiterates. Long response time. Does not allow explanation of ambiguous questions. Reactions cannot be watched.

Telephone Interviews
Advantages: Relatively low cost. Relatively less influence on respondents. Relatively high response rate.
Disadvantages: Limited use. Reactions cannot be watched. Possibility of influencing respondents.

The essential feature of this method is that this covers every
individual unit in the entire population. You cannot select some and
leave out others. You may be familiar with the Census of India, which
is carried out every ten years. A house-to-house enquiry is carried
out, covering all households in India. Demographic data on birth and
death rates, literacy, workforce, life expectancy, size and
composition of population, etc. are collected and published by the
Registrar General of India. The last Census of India was held in
February 2001.

According to the Census 2001, population of India is 102.70 crore. It
was 23.83 crore according to Census 1901. In a period of hundred
years, the population of our country increased by 78.87 crore. Census
1981 indicated that the rate of population growth during 1960s and
1970s remained almost same. 1991 Census indicated that the annual
growth rate of population during 1980s was 2.14 per cent, which came
down to 1.93 per cent during 1990s according to Census 2001.
At 00.00 hours of first March, 2001 the population of India stood at
1,027,015,247 comprising 531,277,078 males and 495,738,169 females.
Thus, India became the second country in the world after China to
cross the one billion mark.

Source: Census of India, 2001.

Sample Survey
Population or the Universe in statistics means totality of the items
under study. Thus, the Population or the Universe is a group to which
the results of the study are intended to apply. A population is always
all the individuals/items who possess certain characteristics (or a
set of characteristics), according to the purpose of the survey. The
first task in selecting a sample is to identify the population. Once
the population is identified, the researcher selects a Representative
Sample, as it is difficult to study the entire population. A sample
refers to a group or section of the population from which information
is to be obtained. A good sample (representative sample) is generally
smaller than the population and is capable of providing reasonably
accurate information about the population at a much lower cost and
shorter time.
Suppose you want to study the average income of people in a certain
region. According to the Census method, you would be required to find
out the income of every individual in the region, add them up and
divide by number of individuals to get the average income of people in
the region. This method would require huge expenditure, as a large
number of enumerators have to be employed. Alternatively, you select a
representative sample, of a few individuals, from the region and find
out their income. The average income of the selected group of
individuals is used as an estimate of average income of the
individuals of the entire region.

Research problem: To study the economic condition of agricultural
labourers in Churachandpur district of Manipur.
Population: All agricultural labourers in Churachandpur district.
Sample: Ten per cent of the agricultural labourers in Churachandpur
district.

Most of the surveys are sample surveys. These are preferred in
statistics because of a number of reasons. A sample can provide
reasonably reliable and accurate information at a lower cost and
shorter time. As samples are smaller than population, more detailed
information can be collected by conducting intensive enquiries. As we
need a smaller team of enumerators, it is easier to train them and
supervise their work more effectively.
Now the question is how do you do the sampling? There are two main
types of sampling, random and non-random. The following description
will make their distinction clear.

Activities
In which years will the next Census be held in India and China?
If you have to study the opinion of students about the new economics
textbook of class XI, what will be your population and sample?
If a researcher wants to estimate the average yield of wheat in
Punjab, what will be her/his population and sample?

Random Sampling
As the name suggests, random sampling is one where the individual
units from the population (samples) are selected at random. The
government wants to determine the
impact of the rise in petrol price on the household budget of a
particular locality. For this, a representative (random) sample of 30
households has to be taken and studied. The names of all the 300
households of that area are written on pieces of paper and mixed well,
then 30 names to be interviewed are selected one by one.

[Figure: A population of 20 kuchha and 20 pucca houses, with a
representative sample and a non-representative sample.]

In the random sampling, every individual has an equal chance of being
selected and the individuals who are selected are just like the ones
who are not selected. In the above example, all the 300 sampling units
(also called sampling frame) of the population got an equal chance of
being included in the sample of 30 units and hence the sample, so
drawn, is a random sample. This is also called lottery method. The
same could be done using a Random Number Table also.

How to use the Random Number Tables?
Do you know what Random Number Tables are? Random number tables have
been generated to guarantee equal probability of selection of every
individual unit (by their listed serial number in the sampling frame)
in the population. They are available either in a published form or
can be generated by using appropriate software packages (See Appendix
B). You can start using the table from anywhere, i.e., from any page,
column, row or point. In the above example, you need to select a
sample of 30 households out of 300 total households. Here, the largest
serial number is 300, a three digit number and therefore we consult
three digit random numbers in sequence. We will skip the random
numbers greater than 300 since there is no household number greater
than 300. Thus, the 30 selected households are with serial numbers:
149, 219, 111, 165, 230, 007, 089, 212, 051, 244, 300, 051, 244, 155,
300, 051, 152, 156, 205, 070, 015, 157, 040, 243, 479, 116, 122, 081,
160, 162.

Exit Polls
You must have seen that when an election takes place, the television
networks provide election coverage. They also try to predict the
results. This is done through exit polls, wherein a random sample of
voters who exit the polling booths are asked whom they voted for. From
the data of the sample of voters, the prediction is made.
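
The lottery method and the random-number-table procedure described
above can be sketched in Python. This is only an illustrative sketch:
the function names, the seed values and the software-generated stream
of three-digit numbers are assumptions, not part of the text. The
sketch also assumes that a serial number which turns up a second time
is skipped, so that no household is selected twice.

```python
import random


def lottery_sample(units, k, seed=None):
    """Lottery method: draw k distinct units, each with an equal chance."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable (assumption)
    return rng.sample(units, k)


def three_digit_stream(seed=None):
    """Stand-in for a printed table: an endless run of numbers 000 to 999."""
    rng = random.Random(seed)
    while True:
        yield rng.randint(0, 999)


def table_sample(numbers, largest_serial, k):
    """Read numbers in sequence, skipping those outside 1..largest_serial.

    Assumption: a serial number that has already been selected is skipped too.
    """
    selected = []
    for number in numbers:
        if 1 <= number <= largest_serial and number not in selected:
            selected.append(number)
            if len(selected) == k:
                break
    return selected


households = list(range(1, 301))  # serial numbers 1 to 300
by_lottery = lottery_sample(households, 30, seed=1)
by_table = table_sample(three_digit_stream(seed=1), 300, 30)
print(sorted(by_lottery))
print(by_table)
```

Both routes give every listed household the same chance of selection;
the table route simply discards numbers that do not correspond to any
household.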

Activity
You have to analyse the trend of foodgrains production in India for
the last fifty years. As it is difficult to include all the years, you
have to select a sample of production of ten years. Using the Random
Number Tables, how will you select your sample?

Non-Random Sampling
There may be a situation that you have to select 10 out of 100
households in a locality. You have to decide which household to select
and which to reject. You may select the households conveniently
situated or the households known to you or your friend. In this case,
you are using your judgement (bias) in selecting 10 households. This
way of selecting 10 out of 100 households is not a random selection.
In a non-random sampling method not all the units of the population
have an equal chance of being selected, and convenience or judgement
of the investigator plays an important role in selection of the
sample. They are mainly selected on the basis of judgment, purpose,
convenience or quota and are non-random samples.

5. SAMPLING AND NON-SAMPLING ERRORS
Sampling Errors
The purpose of the sample is to take an estimate of the population.
Sampling error refers to the differences between the sample estimate
and the actual value of a characteristic of the population (that may
be the average income, etc.). It is the error that occurs when you
make an observation from the sample taken from the population. Thus,
the difference between the actual value of a parameter of the
population (which is not known) and its estimate (from the sample) is
the sampling error. It is possible to reduce the magnitude of sampling
error by taking a larger sample.

Consider a case of incomes of 5 farmers of Manipur. The variable x
(income of farmers) has measurements 500, 550, 600, 650, 700. We note
that the population average is
(500 + 550 + 600 + 650 + 700) / 5 = 3000 / 5 = 600.
Now, suppose we select a sample of two individuals where x has
measurements of 500 and 600. The sample average is
(500 + 600) / 2 = 1100 / 2 = 550.
Here, the sampling error of the estimate
= 600 (true value) - 550 (estimate) = 50.

Non-Sampling Errors
Non-sampling errors are more serious than sampling errors because a
sampling error can be minimised by taking a larger sample. It is
difficult to minimise non-sampling error, even by taking a large
sample. Even a Census can contain non-sampling errors. Some of the
non-sampling errors are:

Errors in Data Acquisition
This type of error arises from recording of incorrect responses.
Suppose, the teacher asks the students to measure the length of the
teacher's table in the classroom. The measurement by the students may
differ. The differences may occur due to differences in measuring
tape, carelessness of the students etc. Similarly, suppose we want to
collect data on prices of oranges. We know that prices vary from shop
to shop and from market to market. Prices also vary according to the
quality. Therefore, we can only consider the average prices. Recording
mistakes can also take place as the enumerators or the respondents may
commit errors in recording or transcribing the data, for example,
he/she may record 13 instead of 31.

Non-Response Errors
Non-response occurs if an interviewer is unable to contact a person
listed in the sample or a person from the sample refuses to respond.
In this case, the sample observation may not be representative.

Sampling Bias
Sampling bias occurs when the sampling plan is such that some members
of the target population could not possibly be included in the sample.

6. CENSUS OF INDIA AND NSSO
There are some agencies both at the national and state level, which
collect, process and tabulate the statistical data. Some of the major
agencies at the national level are Census of India, National Sample
Survey Organisation (NSSO), Central Statistical Organisation (CSO),
Registrar General of India (RGI), Directorate General of Commercial
Intelligence and Statistics (DGCIS), Labour Bureau etc.
The Census of India provides the most complete and continuous
demographic record of population. The Census has been conducted
regularly every ten years since 1881. The first Census after
Independence was held in 1951. The Census collects information on
various aspects of population such as the size, density, sex ratio,
literacy, migration, rural-urban distribution etc. Census in India is
not merely a statistical operation; the data are interpreted and
analysed in an interesting manner.
The NSSO was established by the government of India to conduct
nation-wide surveys on socio-economic issues. The NSSO does continuous
surveys in successive rounds. The data collected by NSSO surveys, on
different socio-economic subjects, are released through reports and
its quarterly journal Sarvekshana. NSSO provides periodic estimates of
literacy, school enrolment, utilisation of educational services,
employment, unemployment, manufacturing and service sector
enterprises, morbidity, maternity, child care, utilisation of the
public distribution system etc. The NSS 59th round survey
(January-December 2003) was on land and livestock holdings, debt and
investment. The NSS 60th round survey (January-June 2004) was on
morbidity and health care. The NSSO also undertakes the fieldwork of
Annual survey of industries, conducts crop estimation surveys,
collects rural and urban retail prices for compilation of consumer
price index numbers.

7. CONCLUSION
Economic facts, expressed in terms of numbers, are called data. The
purpose of data collection is to understand, explain and analyse a
problem and causes behind it. Primary data is obtained by conducting a
survey. Survey includes various steps, which need to be planned
carefully. There are various agencies which collect, process, tabulate
and publish statistical data. These can be used as secondary data.
However, the choice of source of data and mode of data collection
depends on the objective of the study.

Data is a tool which helps in reaching a sound conclusion on any
problem by providing information.
Primary data is based on first hand information.
Survey can be done by personal interviews, mailing questionnaires
and telephone interviews.
Census covers every individual/unit belonging to the population.
Sample is a smaller group selected from the population from which
the relevant information would be sought.
In a random sampling, every individual is given an equal chance of
being selected for providing information.
Sampling error is the difference between the actual value of a
population characteristic and its estimate from the sample.
Non-sampling errors can arise in data acquisition, by non-response
or by bias in selection.
Census of India and National Sample Survey Organisation
are two important agencies at the national level, which collect,
process and tabulate data.
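
The sampling error example of Section 5 (incomes of five farmers) can
be checked with a short Python sketch. The helper name
average_absolute_error is hypothetical and introduced only for
illustration; it averages the size of the sampling error over every
possible sample of a given size, which is one simple way to see why a
larger sample tends to reduce the error.

```python
from itertools import combinations
from statistics import mean

incomes = [500, 550, 600, 650, 700]   # the whole population (5 farmers)
population_average = mean(incomes)     # 3000 / 5 = 600

sample = [500, 600]                    # the sample used in the text
sample_average = mean(sample)          # 1100 / 2 = 550
sampling_error = population_average - sample_average   # 600 - 550 = 50


def average_absolute_error(population, size):
    """Average size of the sampling error over all samples of `size` units."""
    true_value = mean(population)
    errors = [abs(true_value - mean(s)) for s in combinations(population, size)]
    return mean(errors)


print(sampling_error)
for size in (2, 3, 4, 5):
    print(size, average_absolute_error(incomes, size))
```

For samples of two farmers the average error is 35, for samples of
four it is 15, and a sample of all five farmers (a census) has no
sampling error at all.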


1. Frame at least four appropriate multiple-choice options for the following
questions:

(i) Which of the following is the most important when you buy a new

(ii) How often do you use computers?

(iii) Which of the newspapers do you read regularly?
(iv) Rise in the price of petrol is justified.
(v) What is the monthly income of your family?
2. Frame five two-way questions (with Yes or No).
3. (i) There are many sources of data (true/false).
(ii) Telephone survey is the most suitable method of collecting data, when
the population is literate and spread over a large area (true/false).
(iii) Data collected by investigator is called the secondary data (true/false).
(iv) There is a certain bias involved in the non-random selection of samples
(true/false).
(v) Non-sampling errors can be minimised by taking large samples (true/false).
4. What do you think about the following questions. Do you find any problem
with these questions? If yes, how?
(i) How far do you live from the closest market?
(ii) If plastic bags are only 5 percent of our garbage, should it be banned?
(iii) Wouldn't you be opposed to increase in price of petrol?
(iv) (a) Do you agree with the use of chemical fertilizers?
(b) Do you use fertilizers in your fields?
(c) What is the yield per hectare in your field?
5. You want to research on the popularity of Vegetable Atta Noodles among
children. Design a suitable questionnaire for collecting this information.
6. In a village of 200 farms, a study was conducted to find the cropping
pattern. Out of the 50 farms surveyed, 50% grew only wheat. Identify the
population and the sample here.
7. Give two examples each of sample, population and variable.
8. Which of the following methods give better results and why?
(a) Census (b) Sample
9. Which of the following errors is more serious and why?
(a) Sampling error (b) Non-Sampling error
10. Suppose there are 10 students in your class. You want to select three out
of them. How many samples are possible?
11. Discuss how you would use the lottery method to select 3 students out of
10 in your class?
12. Does the lottery method always give you a random sample? Explain.
13. Explain the procedure of selecting a random sample of 3 students out of
10 in your class, by using random number tables.
14. Do samples provide better results than surveys? Give reasons for your
answer.

Organisation of Data

Studying this chapter should enable you to:
classify the data for further statistical analysis;
distinguish between quantitative and qualitative classification;
prepare a frequency distribution table;
know the technique of forming classes;
be familiar with the method of tally marking;
differentiate between univariate and bivariate frequency distribution.

1. INTRODUCTION
In the previous chapter you have learnt about how data is collected.
You also came to know the difference between census and sampling. In
this chapter, you will know how the data, that you collected, are to
be classified. The purpose of classifying raw data is to bring order
in them so that they can be subjected to further statistical analysis
easily.
Have you ever observed your local junk dealer or kabadiwallah to whom
you sell old newspapers, broken household items, empty glass bottles,
plastics etc.? He purchases these things from you and sells them to
those who recycle them. But with so much junk in his shop it would be
very difficult for him to manage his trade, if he had not organised
them properly. To ease his situation he suitably groups or classifies
various junk. He puts old newspapers together and

ties them with a rope. Then collects all empty glass bottles in a
sack. He heaps the articles of metals in one corner of his shop and
sorts them into groups like "iron", "copper", "aluminium", "brass"
etc., and so on. In this way he groups his junk into different
classes, "newspapers", "plastics", "glass", "metals" etc., and brings
order in them. Once his junk is arranged and classified, it becomes
easier for him to find a particular item that a buyer may demand.
Likewise when you arrange your schoolbooks in a certain order, it
becomes easier for you to handle them. You may classify them according
to subjects where each subject becomes a group or a class. So, when
you need a particular book on history, for instance, all you need to
do is to search that book in the group "History". Otherwise, you would
have to search through your entire collection to find the particular
book you are looking for.
While classification of objects or things saves our valuable time and
effort, it is not done in an arbitrary manner. The kabadiwallah groups
his junk in such a way that each group consists of similar items. For
example, under the group "Glass" he would put empty bottles, broken
mirrors and windowpanes etc. Similarly when you classify your history
books under the group "History" you would not put a book of a
different subject in that group. Otherwise the entire purpose of
grouping would be lost. Classification, therefore, is arranging or
organising similar things into groups or classes.

Activity
Visit your local post-office to find out how letters are sorted. Do
you know what the pin-code in a letter indicates? Ask your

Like the kabadiwallah's junk, the unclassified data or raw data are
highly disorganised. They are often very large and cumbersome to
handle. To draw meaningful conclusions from them is a tedious task
because they do not yield to statistical methods easily. Therefore
proper organisation and presentation of such data is needed before any
systematic statistical analysis is undertaken. Hence after collecting
data the next step is to organise and present them in a classified
form.
Suppose you want to know the performance of students in mathematics
and you have collected data on marks in mathematics of 100

students of your school. If you present them as a table, they may
appear something like Table 3.1.

TABLE 3.1
Marks in Mathematics Obtained by 100 Students in an Examination
47  45  10  60  51  56  66 100  49  40
60  59  56  55  62  48  59  55  51  41
42  69  64  66  50  59  57  65  62  50
64  30  37  75  17  56  20  14  55  90
62  51  55  14  25  34  90  49  56  54
70  47  49  82  40  82  60  85  65  66
49  44  64  69  70  48  12  28  55  65
49  40  25  41  71  80   0  56  14  22
66  53  46  70  43  61  59  12  30  35
45  44  57  76  82  39  32  14  90  25

Or you could have collected data on the monthly expenditure on food of
50 households in your neighbourhood to know their average expenditure
on food. The data collected, in that case, had you presented as a
table, would have resembled Table 3.2.

TABLE 3.2
Monthly Household Expenditure (in Rupees) on Food of 50 Households
1904  1559  3473  1735  2760
2041  1612  1753  1855  4439
5090  1085  1823  2346  1523
1211  1360  1110  2152  1183
1218  1315  1105  2628  2712
4248  1812  1264  1183  1171
1007  1180  1953  1137  2048
2025  1583  1324  2621  3676
1397  1832  1962  2177  2575
1293  1365  1146  3222  1396

Both Tables 3.1 and 3.2 are raw or unclassified data. In both the
tables you find that numbers are not arranged in any order. Now if you
are asked what are the highest marks in mathematics from Table 3.1
then you have to first arrange the marks of 100 students either in
ascending or in descending order. That is a tedious task. It becomes
more tedious, if instead of 100 you have the marks of 1,000 students
to handle. Similarly in Table 3.2, you would note that it is difficult
for you to ascertain the average monthly expenditure of 50 households.
And this difficulty will go up manifold if the number was larger, say,
5,000 households. Like our kabadiwallah, who would be distressed to
find a particular item when his junk becomes large and disarranged,
you would face a similar situation when you try to get any information
from raw data that are large. In one word, therefore, it is a tedious
task to pull information from large unclassified data.
The raw data are summarised, and made comprehensible by
classification. When facts of similar characteristics are placed in
the same class, it enables one to locate them easily, make comparison,
and draw inferences without any difficulty. You

have studied in Chapter 2 that the Government of India conducts Census
of population every ten years. The raw data of census are so large and
fragmented that it appears an almost impossible task to draw any
meaningful conclusion from them. But when the data of Census are
classified according to gender, education, marital status, occupation,
etc., the structure and nature of population of India is, then, easily
understood.
The raw data consist of observations on variables. Each unit of raw
data is an observation. In Table 3.1 an observation shows a particular
value of the variable "marks of a student in mathematics". The raw
data contain 100 observations on "marks of a student" since there are
100 students. In Table 3.2 it shows a particular value of the variable
"monthly expenditure of a household on food". The raw data in it
contain 50 observations on "monthly expenditure on food of a
household" because there are 50 households.

Activity
Collect data of total weekly expenditure of your family for a year and
arrange it in a table. See how many observations you have. Arrange the
data monthly and find the number of observations.

3. CLASSIFICATION OF DATA
The groups or classes of a classification can be formed in various
ways. Instead of classifying your books according to subjects,
"History", "Geography", "Mathematics", "Science" etc., you could have
classified them author-wise in an alphabetical order. Or, you could
have also classified them according to the year of publication. The
way you want to classify them would depend on your requirement.
Likewise the raw data could be classified in various ways depending on
the purpose in hand. They can be grouped according to time. Such a
classification is known as a Chronological Classification. In such a
classification, data are classified either in ascending or in
descending order with reference to time such as years, quarters,
months, weeks, etc. The following example shows the population of
India classified in terms of years. The variable "population" is a
Time Series as it depicts a series of values for different years.

Example 1
Population of India (in crores)
Year    Population (Crores)
1951     35.7
1961     43.8
1971     54.6
1981     68.4
1991     81.8
2001    102.7

In Spatial Classification the data are classified with reference to
geographical locations such as countries, states, cities, districts,
etc. Example 2 shows the yield of wheat in different countries.
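
As a rough illustration, chronological and spatial classification can
be expressed in a few lines of Python. The figures are those of
Example 1 and Example 2 of this chapter; the variable names and the
deliberately shuffled input order are assumptions made for the sketch.

```python
# Observations as they might be collected, in no particular order.
population = [(1981, 68.4), (1951, 35.7), (2001, 102.7),
              (1961, 43.8), (1991, 81.8), (1971, 54.6)]        # (year, crores)
wheat_yield = [("India", 862), ("America", 1925), ("France", 439),
               ("Brazil", 127), ("Denmark", 225), ("China", 893)]  # (country, kg/acre)

# Chronological classification: arrange with reference to time.
ascending = sorted(population)                 # earliest year first
descending = sorted(population, reverse=True)  # latest year first

# Spatial classification: arrange with reference to place.
by_country = sorted(wheat_yield)               # alphabetical by country

for year, crores in ascending:
    print(year, crores)
for country, kg_per_acre in by_country:
    print(country, kg_per_acre)
```

In both cases the values themselves are untouched; only the key used
to arrange them (time or place) changes.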

Example 2
Yield of Wheat for Different Countries
Country    Yield of wheat (kg/acre)
America    1925
Brazil      127
China       893
Denmark     225
France      439
India       862

Activities
In the time-series of Example 1, in which year do you find the
population of India to be the minimum? Find the year when it is the
maximum.
In Example 2, find the country whose yield of wheat is slightly more
than that of India's. How much would that be in terms of percentage?
Arrange the countries of Example 2 in the ascending order of yield. Do
the same exercise for the descending order of yield.

Sometimes you come across characteristics that cannot be expressed
quantitatively. Such characteristics are called Qualities or
Attributes. For example, nationality, literacy, religion, gender,
marital status, etc. They cannot be measured. Yet these attributes can
be classified on the basis of either the presence or the absence of a
qualitative characteristic. Such a classification of data on
attributes is called a Qualitative Classification. In the following
example, we find population of a country is grouped on the basis of
the qualitative variable "gender". An observation could either be a
male or a female. These two characteristics could be further
classified on the basis of marital status (a qualitative variable) as
given below:

Example 3
                    Population
          Male                    Female
   Married   Unmarried     Married   Unmarried

The classification at the first stage is based on the presence and
absence of an attribute i.e. male or not male (female). At the second
stage, each class, male and female, is further subdivided on the basis
of the presence or absence of another attribute i.e. whether married
or unmarried.

Activity
The objects around can be grouped as either living or non-living. Is
it a quantitative classification?

On the other hand, characteristics like height, weight, age, income,
marks of students, etc. are quantitative in nature. When the collected
data of such characteristics are grouped into

classes, the classification is a Quantitative Classification.

Example 4
Frequency Distribution of Marks in Mathematics of 100 Students
Marks     Frequency
0-10          1
10-20         8
20-30         6
30-40         7
40-50        21
50-60        23
60-70        19
70-80         6
80-90         5
90-100        4
Total       100

Example 4 shows quantitative classification of the data of marks in
mathematics of 100 students given in Table 3.1 as a Frequency
Distribution.

Activity
Express the values of frequency of Example 4 as proportion or
percentage of total frequency. Note that frequency expressed in this
way is known as relative frequency.
In Example 4, which class has the maximum concentration of data?
Express it as percentage of total observations. Which class has the
minimum concentration of data?

A simple definition of variable, which you have read in the last
chapter, does not tell you how it varies. Different variables vary
differently and depending on the way they vary, they are broadly
classified into two types:
(i) Continuous and
(ii) Discrete.
A continuous variable can take any numerical value. It may take
integral values (1, 2, 3, 4, ...), fractional values (1/2, 2/3, 3/4,
...), and values that are not exact fractions (√2 ≈ 1.414,
√3 ≈ 1.732, ..., √7 ≈ 2.646). For example, the height of a student, as
he/she grows say from 90 cm to 150 cm, would take all the values in
between them. It can take values that are whole numbers like 90 cm,
100 cm, 108 cm, 150 cm. It can also take fractional values like
90.85 cm, 102.34 cm, 149.99 cm etc. that are not whole numbers. Thus
the variable "height" is capable of manifesting in every conceivable
value and its values can also be broken down into infinite gradations.
Other examples of a continuous variable are weight, time, distance,
etc.
Unlike a continuous variable, a discrete variable can take only
certain values. Its value changes only by finite "jumps". It "jumps"
from one value to another but does not take any intermediate value
between them. For example, a variable like the "number of students in
a class", for different classes, would assume values that are only
whole numbers. It cannot take

any fractional value like 0.5 because "half of a student" is absurd.
Therefore it cannot take a value like 25.5 between 25 and 26. Instead
its value could have been either 25 or 26. What we observe is that as
its value changes from 25 to 26, the values in between them, the
fractions, are not taken by it. But do not have the impression that a
discrete variable cannot take any fractional value. Suppose X is a
variable that takes values like 1/8, 1/16, 1/32, 1/64, ... Is it a
discrete variable? Yes, because though X takes fractional values it
cannot take any value between two adjacent fractional values. It
changes or "jumps" from 1/8 to 1/16 and from 1/16 to 1/32. But it
cannot take a value in between 1/8 and 1/16 or between 1/16 and 1/32.

Activity
Distinguish the following variables as continuous and discrete:
Area, volume, temperature, number appearing on a dice, crop yield,
population, rainfall, number of cars on road, age.

Earlier we have mentioned that Example 4 is the frequency distribution
of marks in mathematics of 100 students as shown in Table 3.1. It
shows how the marks of 100 students are grouped into classes. You will
be wondering as to how we got it from the raw data of Table 3.1. But,
before we address this question, you must know what a frequency
distribution is.

5. WHAT IS A FREQUENCY DISTRIBUTION?
A frequency distribution is a comprehensive way to classify raw data
of a quantitative variable. It shows how the different values of a
variable (here, the marks in mathematics scored by a student) are
distributed in different classes along with their corresponding class
frequencies. In this case we have ten classes of marks: 0-10, 10-20,
..., 90-100. The term Class Frequency means the number of values in a
particular class. For example, in the class 30-40 we find 7 values of
marks from raw data in Table 3.1. They are 30, 37, 34, 30, 35, 39, 32.
The frequency of the class 30-40 is thus 7. But you might be wondering
why 40, which occurs three times in the raw data of Table 3.1, is not
included in the class 30-40. Had it been included the class frequency
of 30-40 would have been 10 instead of 7. The puzzle would be clear to
you if you are patient enough to read this chapter carefully. So carry
on. You will find the answer yourself.
Each class in a frequency distribution table is bounded by Class
Limits. Class limits are the two ends of a class. The lowest value is
called the Lower Class Limit and the highest value the Upper Class
Limit. For example, the class limits for the class 60-70 are 60 and
70. Its lower class limit is 60 and its upper class limit is 70. Class
Interval or Class Width is

the difference between the upper class limit and the lower class
limit. For the class 60-70, the class interval is 10 (upper class
limit minus lower class limit).
The Class Mid-Point or Class Mark is the middle value of a class. It
lies halfway between the lower class limit and the upper class limit
of a class and can be ascertained in the following manner:

Class Mid-Point or Class Mark =
(Upper Class Limit + Lower Class Limit) / 2          ... (1)

The class mark or mid-value of each class is used to represent the
class. Once raw data are grouped into classes, individual observations
are not used in further calculations. Instead, the class mark is used.

TABLE 3.3
The Lower Class Limits, the Upper Class Limits and the Class Mark
Class     Frequency   Lower Class Limit   Upper Class Limit   Class Mark
0-10          1               0                  10                5
10-20         8              10                  20               15
20-30         6              20                  30               25
30-40         7              30                  40               35
40-50        21              40                  50               45
50-60        23              50                  60               55
60-70        19              60                  70               65
70-80         6              70                  80               75
80-90         5              80                  90               85
90-100        4              90                 100               95

Frequency Curve is a graphic representation of a frequency
distribution. Fig. 3.1 shows the diagrammatic presentation of the
frequency distribution of the data in our example above. To obtain the
frequency curve we plot the class marks on the X-axis and frequency on
the Y-axis.

Fig. 3.1: Diagrammatic Presentation of Frequency Distribution of Data.

How to prepare a Frequency Distribution?
While preparing a frequency distribution from the raw data of Table
3.1, the following four questions need to be addressed:
1. How many classes should we have?
2. What should be the size of each class?
3. How should we determine the class limits?
4. How should we get the frequency for each class?

How many classes should we have?
Before we determine the number of classes, we first find out as to
what extent the variable in hand changes in value. Such variations in
the variable's value are captured by its range. The Range is the
difference between the largest and the smallest values of the

variable. A large range indicates that the values of the variable are
widely spread. On the other hand, a small range indicates that the
values of the variable are spread narrowly. In our example the range of
the variable "marks of a student" is 100 because the minimum marks are
0 and the maximum marks 100. It indicates that the variable has a large
variation.
After obtaining the value of range, it becomes easier to determine the
number of classes once we decide the class interval. Note that range is
the sum of all class intervals. If the class intervals are equal then
range is the product of the number of classes and class interval of a
single class.

Range = Number of Classes × Class Interval .....(2)

Activities
Find the range of the following:
population of India in Example 1,
yield of wheat in Example 2.

Given the value of range, the number of classes would be large if we
choose small class intervals. A frequency distribution with too many
classes would look too large. Such a distribution is not easy to
handle. So we want to have a reasonably compact set of data. On the
other hand, given the value of range, if we choose a class interval
that is too large then the number of classes becomes too small. The
data set then may be too compact and we may not like the loss of
information about its diversity. For example, suppose the range is 100
and the class interval is 50. Then the number of classes would be just
2 (i.e. 100/50 = 2). Though there is no hard-and-fast rule to determine
the number of classes, the rule of thumb often used is that the number
of classes should be between 5 and 15. In our example we have chosen to
have 10 classes. Since the range is 100 and the class interval is 10,
the number of classes is 100/10 = 10.

What should be the size of each class?

The answer to this question depends on the answer to the previous
question. The equality (2) shows that given the range of the variable,
we can determine the number of classes once we decide the class
interval. Similarly, we can determine the class interval once we decide
the number of classes. Thus we find that these two decisions are
inter-linked with one another. We cannot decide on one without deciding
on the other.
In Example 4, we have the number of classes as 10. Given the value of
range as 100, the class intervals are automatically 10 by the equality
(2). Note that in the present context we have chosen class intervals
that are equal in magnitude. However we could have chosen class
intervals that are not of equal magnitude. In that case, the classes
would have been of unequal width.
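The relationship between range, number of classes, class interval and class mark can be sketched in a few lines of Python. This is only an illustration for the marks example (minimum 0, maximum 100, class interval 10). The function name `build_classes` is hypothetical, and the sketch assumes equal class intervals that divide the range exactly.

```python
def build_classes(minimum, maximum, class_interval):
    """Return (lower limit, upper limit, class mark) for equal-width classes."""
    value_range = maximum - minimum  # Range = largest value - smallest value
    if value_range % class_interval != 0:
        # Assumption of this sketch: the range is a whole multiple of the interval.
        raise ValueError("range must be a whole multiple of the class interval")
    number_of_classes = value_range // class_interval  # rearranged equality (2)
    classes = []
    for i in range(number_of_classes):
        lower = minimum + i * class_interval
        upper = lower + class_interval
        class_mark = (upper + lower) / 2  # equality (1)
        classes.append((lower, upper, class_mark))
    return classes


marks_classes = build_classes(0, 100, 10)
for lower, upper, class_mark in marks_classes:
    print(f"{lower}-{upper}: class mark {class_mark}")
```

Changing the class interval to 50 in the call gives only two classes, which is the overly compact case described in the text.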

How should we determine the class limits?

When we classify raw data of a continuous variable as a frequency
distribution, we in effect, group the individual observations into
classes. The value of the upper class limit of a class is obtained by
adding the class interval to the value of the lower class limit of that
class. For example, the upper class limit of the class 20-30 is
20 + 10 = 30 where 20 is the lower class limit and 10 is the class
interval. This method is repeated for other classes as well.
But how do we decide the lower class limit of the first class? That is
to say, why is 0 the lower class limit of the first class: 0-10? It is
because we chose the minimum value of the variable as the lower limit
of the first class. In fact, we could have chosen a value less than the
minimum value of the variable as the lower limit of the first class.
Similarly, for the upper class limit for the last class we could have
chosen a value greater than the maximum value of the variable. It is
important to note that, when a frequency distribution is being
constructed, the class limits should be so chosen that the mid-point or
class mark of each class coincides, as far as possible, with any value
around which the data tend to be concentrated.
In our example on marks of 100 students, we chose 0 as the lower limit
of the first class: 0-10 because the minimum marks were 0. And that is
why we could not have chosen 1 as the lower class limit of that class.
Had we done that we would have excluded the observation 0. The upper
class limit of the first class: 0-10 is then obtained by adding the
class interval to the lower class limit of the class. Thus the upper
class limit of the first class becomes 0 + 10 = 10. And this procedure
is followed for the other classes as well.
Have you noticed that the upper class limit of the first class is equal
to the lower class limit of the second class? And both are equal to 10.
This is observed for other classes as well. Why? The reason is that we
have used the Exclusive Method of classification of raw data. Under the
method we form classes in such a way that the lower limit of a class
coincides with the upper class limit of the previous class.
The problem we would face next is how to classify an observation that
is not only equal to the upper class limit of a particular class but is
also equal to the lower class limit of the next class. For example, we
find observation 30 to be equal to the upper class limit of the class
20-30 and it is equal to the lower class limit of class 30-40. Then, in
which of the two classes: 20-30 or 30-40 should we put the observation
30? We can put it either in class 20-30 or in class 30-40. It is a
dilemma that one commonly faces while classifying data in overlapping
classes. This problem is solved by the rule of classification in the
Exclusive Method.
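A minimal Python sketch of how the exclusive rule settles this dilemma is given below. It treats each class as including its lower limit and excluding its upper limit. The function name `classify_exclusive` and the sample marks are hypothetical. Closing the last class at the top, so that the maximum mark 100 is still counted, is an assumption of the sketch, made to match the way the chapter's tally table places 100 in the class 90-100.

```python
def classify_exclusive(value, lower_limits, class_interval):
    """Return the (lower, upper) class that holds `value` under the exclusive method."""
    for lower in lower_limits:
        upper = lower + class_interval
        if lower <= value < upper:  # lower limit included, upper limit excluded
            return (lower, upper)
    last_lower = lower_limits[-1]
    last_upper = last_lower + class_interval
    if value == last_upper:
        # Assumption: the maximum value is kept in the last class.
        return (last_lower, last_upper)
    raise ValueError(f"{value} lies outside all classes")


lower_limits = list(range(0, 100, 10))   # 0, 10, ..., 90
sample_marks = [0, 20, 30, 39, 40, 100]  # hypothetical observations

frequency = {}
for mark in sample_marks:
    cls = classify_exclusive(mark, lower_limits, 10)
    frequency[cls] = frequency.get(cls, 0) + 1

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower}-{upper}: {count}")
```

With this rule the observation 30 is counted in the class 30-40 and never in 20-30, so no observation is counted twice.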

Exclusive Method

The classes, by this method, are formed in such a way that the upper
class limit of one class equals the lower class limit of the next
class. In this way the continuity of the data is maintained. That is
why this method of classification is most suitable in case of data of a
continuous variable. Under the method, the upper class limit is
excluded but the lower class limit of a class is included in the
interval. Thus an observation that is exactly equal to the upper class
limit, according to the method, would not be included in that class but
would be included in the next class. On the other hand, if it were
equal to the lower class limit then it would be included in that class.
In our example on marks of students, the observation 40, that occurs
thrice in the raw data of Table 3.1, is not included in the class:
30-40. It is included in the next class: 40-50. That is why we find the
frequency corresponding to the class 30-40 to be 7 instead of 10.

There is another method of forming classes and it is known as the
Inclusive Method of classification.

Inclusive Method

In comparison to the exclusive method, the Inclusive Method does not
exclude the upper class limit in a class interval. It includes the
upper class limit in a class. Thus both class limits are parts of the
class interval.
For example, in the frequency distribution of Table 3.4 we include in
the class: 800-899 those employees whose income is either Rs 800, or
between Rs 800 and Rs 899, or Rs 899. If the income of an employee is
exactly Rs 900 then he is put in the next class: 900-999.

TABLE 3.4
Frequency Distribution of Incomes of 550 Employees of a Company

Income (Rs)      Number of Employees
800-899               50
900-999              100
1000-1099            200
1100-1199            150
1200-1299             40
1300-1399             10
Total                550

Adjustment in Class Interval

A close observation of the Inclusive Method in Table 3.4 would show
that though the variable "income" is a continuous variable, no such
continuity is maintained when the classes are made. We find a "gap" or
discontinuity between the upper limit of a class and the lower limit of
the next class. For example, between the upper limit of the first
class: 899 and the lower limit of the second class: 900, we find a
"gap" of 1. Then how do we ensure the continuity of the variable while
classifying data? This is achieved by making an adjustment in the class
interval. The adjustment is done in the following way:
1. Find the difference between the lower limit of the second class and
the upper limit of the first class. For example, in Table 3.4 the lower
limit of the second class is 900 and
the upper limit of the first class is 899. The difference between them
is 1, i.e. (900 - 899 = 1).
2. Divide the difference obtained in (1) by two i.e. (1/2 = 0.5).
3. Subtract the value obtained in (2) from lower limits of all classes
(lower class limit - 0.5).
4. Add the value obtained in (2) to upper limits of all classes (upper
class limit + 0.5).
After the adjustment that restores continuity of data in the frequency
distribution, the Table 3.4 is modified into Table 3.5.

TABLE 3.5
Frequency Distribution of Incomes of 550 Employees of a Company

Income (Rs)        Number of Employees
799.5-899.5             50
899.5-999.5            100
999.5-1099.5           200
1099.5-1199.5          150
1199.5-1299.5           40
1299.5-1399.5           10
Total                  550

After the adjustments in class limits, the equality (1) that determines
the value of class-mark would be modified as the following:
Adjusted Class Mark = (Adjusted Upper Class Limit + Adjusted Lower
Class Limit)/2.

How should we get the frequency for each class?

In simple terms, frequency of an observation means how many times that
observation occurs in the raw data. In our Table 3.1, we observe that
the value 40 occurs thrice; 0 and 10 occur only once; 49 occurs five
times and so on. Thus the frequency of 40 is 3, 0 is 1, 10 is 1, 49 is
5 and so on.
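The four-step adjustment of class limits described above can be sketched in Python as follows. The class limits are those of Table 3.4. The function name `adjust_class_limits` is hypothetical, and the sketch assumes that the gap between consecutive classes is the same throughout the table.

```python
def adjust_class_limits(classes):
    """Close the gaps between inclusive classes given as (lower, upper) pairs."""
    gap = classes[1][0] - classes[0][1]  # step 1: 900 - 899 = 1
    half_gap = gap / 2                   # step 2: 1/2 = 0.5
    # steps 3 and 4: lower limit - 0.5, upper limit + 0.5
    return [(lower - half_gap, upper + half_gap) for lower, upper in classes]


inclusive_classes = [
    (800, 899), (900, 999), (1000, 1099),
    (1100, 1199), (1200, 1299), (1300, 1399),
]
adjusted_classes = adjust_class_limits(inclusive_classes)
adjusted_class_marks = [(upper + lower) / 2 for lower, upper in adjusted_classes]

for (lower, upper), mark in zip(adjusted_classes, adjusted_class_marks):
    print(f"{lower}-{upper}: adjusted class mark {mark}")
```

After the adjustment the upper limit of each class equals the lower limit of the next one, which is the continuity that Table 3.5 restores.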
But when the data are grouped into
TABLE 3.6
Tally Marking of Marks of 100 Students in Mathematics

Class    Observations                             Tally Mark                  Frequency  Class Mark
0-10     0                                        /                               1          5
10-20    10, 14, 17, 12, 14, 12, 14, 14           ///// ///                       8         15
20-30    25, 25, 20, 22, 25, 28                   ///// /                         6         25
30-40    30, 37, 34, 39, 32, 30, 35               ///// //                        7         35
40-50    47, 42, 49, 49, 45, 45, 47, 44, 40, 44,  ///// ///// ///// ///// /      21         45
         49, 46, 41, 40, 43, 48, 48, 49, 49, 40,
         41
50-60    59, 51, 53, 56, 55, 57, 55, 51, 50, 56,  ///// ///// ///// ///// ///    23         55
         59, 56, 59, 57, 59, 55, 56, 51, 55, 56,
         55, 50, 54
60-70    60, 64, 62, 66, 69, 64, 64, 60, 66, 69,  ///// ///// ///// ////        19         65
         62, 61, 66, 60, 65, 62, 65, 66, 65
70-80    70, 75, 70, 76, 70, 71                   ///// /                         6         75
80-90    82, 82, 82, 80, 85                       /////                           5         85
90-100   90, 100, 90, 90                          ////                            4         95
Total                                                                           100

classes as in example 3, the Class Frequency refers to the number of
values in a particular class. The counting of class frequency is done
by tally marks against the particular class.

Finding class frequency by tally marking

A tally (/) is put against a class for each student whose marks are
included in that class. For example, if the marks obtained by a student
are 57, we put a tally (/) against class 50-60. If the marks are 71, a
tally is put against the class 70-80. If someone obtains 40 marks, a
tally is put against the class 40-50. Table 3.6 shows the tally marking
of marks of 100 students in mathematics from Table 3.1.
The counting of tally is made easier when four of them are put as ////
and the fifth tally is placed across them, making a bundle of five.
Tallies are then counted as groups of five. So if there are 16 tallies
in a class, we put them as three such bundles followed by a single
tally (/) for the sake of convenience. Thus frequency in a class is
equal to the number of tallies against that class.

Loss of Information

The classification of data as a frequency distribution has an inherent
shortcoming. While it summarises the raw data making it concise and
comprehensible, it does not show the details that are found in raw
data. There is a loss of information in classifying raw data though
much is gained by summarising it as a classified data. Once the data
are grouped into classes, an individual observation has no significance
in further statistical calculations. In Example 4, the class 20-30
contains 6 observations: 25, 25, 20, 22, 25 and 28. So when these data
are grouped as a class 20-30 in the frequency distribution, the latter
provides only the number of records in that class (i.e. frequency = 6)
but not their actual values. All values in this class are assumed to be
equal to the middle value of the class interval or class mark (i.e.
25). Further statistical calculations are based only on the values of
class mark and not on the values of the observations in that class.
This is true for other classes as well. Thus the use of class mark
instead of the actual values of the observations in statistical methods
involves considerable loss of information.

Frequency distribution with unequal classes

By now you are familiar with frequency distributions of equal class
intervals. You know how they are constructed out of raw data. But in
some cases frequency distributions with unequal class intervals are
more appropriate. If you observe the frequency distribution of Example
4, as in Table 3.6, you will notice that most of the observations are
concentrated in classes 40-50, 50-60 and 60-70. Their respective
frequencies are 21, 23 and 19. It means that out of 100 observations,
63 (21+23+19) observations are concentrated in these classes. These
classes are densely populated with observations. Thus, 63 percent of
data lie between 40 and 70. The remaining 37 percent of data are in
classes 0-10, 10-20, 20-30, 30-40, 70-80, 80-90 and 90-100. These
classes are sparsely populated with observations. Further you will also
notice that observations in these classes deviate more from their
respective class marks than those in other classes. But if classes are
to be formed in such a way that class marks coincide, as far as
possible, with a value around which the observations in a class tend to
concentrate, then in that case unequal class interval is more
appropriate.
Table 3.7 shows the same frequency distribution of Table 3.6 in terms
of unequal classes. Each of the classes 40-50, 50-60 and 60-70 is split
into two classes. The class 40-50 is divided into 40-45 and 45-50. The
class 50-60 is divided into 50-55 and 55-60. And class 60-70 is divided
into 60-65 and 65-70. The new classes 40-45, 45-50, 50-55, 55-60, 60-65
and 65-70 have class interval of 5. The other classes: 0-10, 10-20,
20-30, 30-40, 70-80, 80-90 and 90-100 retain their old class interval
of 10. The last column of this table shows the new values of class
marks for these classes. Compare them with the old values of class
marks in Table 3.6. Notice that the observations in these classes
deviated more from their old class mark values than their new class
mark values. Thus the new class mark values are more representative of
the data in these classes than the old values.
TABLE 3.7
Frequency Distribution of Unequal Classes

Class    Observations                                      Frequency  Class Mark
0-10     0                                                     1          5
10-20    10, 14, 17, 12, 14, 12, 14, 14                        8         15
20-30    25, 25, 20, 22, 25, 28                                6         25
30-40    30, 37, 34, 39, 32, 30, 35                            7         35
40-45    42, 44, 40, 44, 41, 40, 43, 40, 41                    9         42.5
45-50    47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49       12         47.5
50-55    51, 53, 51, 50, 51, 50, 54                            7         52.5
55-60    59, 56, 55, 57, 55, 56, 59, 56, 59, 57, 59, 55,      16         57.5
         56, 55, 56, 55
60-65    60, 64, 62, 64, 64, 60, 62, 61, 60, 62               10         62.5
65-70    66, 69, 66, 69, 66, 65, 65, 66, 65                    9         67.5
70-80    70, 75, 70, 76, 70, 71                                6         75
80-90    82, 82, 82, 80, 85                                    5         85
90-100   90, 100, 90, 90                                       4         95
Total                                                        100
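The claim that the new class marks represent the observations better can be checked with a short Python sketch. It uses the observations that Table 3.7 lists for the classes 40-45 and 45-50 and compares their average distance from the old class mark (45) with their average distance from the new class marks (42.5 and 47.5). Using the mean absolute deviation as the measure of "deviation" is an assumption of this sketch; the chapter does not name a specific measure.

```python
from statistics import mean

# Observations listed in Table 3.7 for the two halves of the old class 40-50.
class_40_45 = [42, 44, 40, 44, 41, 40, 43, 40, 41]
class_45_50 = [47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49]

# Average distance from the old class mark of the undivided class 40-50.
old_deviation = mean(abs(x - 45) for x in class_40_45 + class_45_50)

# Average distance when each half is measured from its own new class mark.
new_deviation = mean(
    [abs(x - 42.5) for x in class_40_45] + [abs(x - 47.5) for x in class_45_50]
)

print(f"old class mark 45: average deviation {old_deviation:.2f}")
print(f"new class marks 42.5 and 47.5: average deviation {new_deviation:.2f}")
```

The average deviation from the new class marks comes out at about half of that from the old class mark, which is the sense in which the narrower classes lose less information.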

Figure 3.2 shows the frequency curve of the distribution in Table 3.7.
The class marks of the table are plotted on X-axis and the frequencies
are plotted on Y-axis.

Fig. 3.2: Frequency Curve

If you compare Figure 3.2 with Figure 3.1, what do you observe? Do you
find any difference between them? Can you explain the difference?

Frequency array

So far we have discussed the classification of data for a continuous
variable using the example of percentage marks of 100 students in
mathematics. For a discrete variable, the classification of its data is
known as a Frequency Array. Since a discrete variable takes only
integral values and not intermediate fractional values between two
integral values, we have frequencies that correspond to each of its
integral values.
The example in Table 3.8 illustrates a Frequency Array.

TABLE 3.8
Frequency Array of the Size of Households

Size of the      Number of
Household        Households
1                    5
2                   15
3                   25
4                   35
5                   10
6                    5
7                    3
8                    2
Total              100

The variable "size of the household" is a discrete variable that only
takes integral values as shown in the table. Since it does not take any
fractional value between two adjacent integral values, there are no
classes in this frequency array. Since there are no classes in a
frequency array there would be no class intervals. As the classes are
absent in a discrete frequency distribution, there is no class mark as
well.

The frequency distribution of a single variable is called a Univariate
Distribution. Table 3.3 shows the univariate distribution of the single
variable "marks of a student". A Bivariate Frequency Distribution is
the frequency distribution of two variables.
Table 3.9 shows the frequency distribution of two variables, sales (in
Rs lakhs) and advertisement expenditure (in Rs thousands), of 20
companies. The values of sales are classed in different columns

TABLE 3.9
Bivariate Frequency Distribution of Sales (in Lakh Rs) and Advertisement Expenditure
(in Thousand Rs) of 20 Firms
115-125 125-135 135-145 145-155 155-165 165-175 Total
62-64 2 1 3
64-66 1 3 4
66-68 1 1 2 1 5
68-70 2 2 4
70-72 1 1 1 1 4
Total 4 5 6 3 1 1 20

and the values of advertisement expenditure are classed in different
rows. Each cell shows the frequency of the corresponding row and column
values. For example, there are 3 firms whose sales are between Rs
135-145 lakhs and their advertisement expenditures are between Rs 64-66
thousands. The use of a bivariate distribution would be taken up in
Chapter 7 on correlation.

7. CONCLUSION

The data collected from primary and secondary sources are raw or
unclassified. Once the data are collected, the next step is to classify
them for further statistical analysis. Classification brings order in
the data.
The chapter enables you to know how data can be classified through a
frequency distribution in a comprehensive manner. Once you know the
techniques of classification, it will be easy for you to construct a
frequency distribution, both for continuous and discrete variables.

Classification brings order to raw data.
A Frequency Distribution shows how the different values of a variable
are distributed in different classes along with their corresponding
class frequencies.
The upper class limit is excluded but the lower class limit is included in
the Exclusive Method.
Both the upper and the lower class limits are included in the Inclusive
Method.
In a Frequency Distribution, further statistical calculations are based
only on the class mark values, instead of values of the observations.
The classes should be formed in such a way that the class mark
of each class comes as close as possible, to a value around
which the observations in a class tend to concentrate.


1. Which of the following alternatives is true?

(i) The class midpoint is equal to:
(a) The average of the upper class limit and the lower class limit.
(b) The product of upper class limit and the lower class limit.
(c) The ratio of the upper class limit and the lower class limit.
(d) None of the above.
(ii) The frequency distribution of two variables is known as
(a) Univariate Distribution
(b) Bivariate Distribution
(c) Multivariate Distribution
(d) None of the above
(iii) Statistical calculations in classified data are based on
(a) the actual values of observations
(b) the upper class limits
(c) the lower class limits
(d) the class midpoints
(iv) Under Exclusive method,
(a) the upper class limit of a class is excluded in the class interval
(b) the upper class limit of a class is included in the class interval
(c) the lower class limit of a class is excluded in the class interval
(d) the lower class limit of a class is included in the class interval
(v) Range is the
(a) difference between the largest and the smallest observations
(b) difference between the smallest and the largest observations
(c) average of the largest and the smallest observations
(d) ratio of the largest to the smallest observation
2. Can there be any advantage in classifying things? Explain with an example
from your daily life.
3. What is a variable? Distinguish between a discrete and a continuous
variable.
4. Explain the exclusive and inclusive methods used in classification of
data.
5. Use the data in Table 3.2 that relate to monthly household expenditure
(in Rs) on food of 50 households and
(i) Obtain the range of monthly household expenditure on food.
(ii) Divide the range into appropriate number of class intervals and obtain
the frequency distribution of expenditure.
(iii) Find the number of households whose monthly expenditure on food is
(a) less than Rs 2000
(b) more than Rs 3000

(c) between Rs 1500 and Rs 2500

6. In a city 45 families were surveyed for the number of domestic appliances
they used. Prepare a frequency array based on their replies as recorded
below.

1 3 2 2 2 2 1 2 1 2 2 3 3 3 3
3 3 2 3 2 2 6 1 6 2 1 5 1 5 3
2 4 2 7 4 2 4 3 4 2 0 3 1 4 3
7. What is loss of information in classified data?
8. Do you agree that classified data is better than raw data?
9. Distinguish between univariate and bivariate frequency distribution.
10. Prepare a frequency distribution by inclusive method taking class interval
of 7 from the following data:

28 17 15 22 29 21 23 27 18 12 7 2 9 4 6
1 8 3 10 5 20 16 12 8 4 33 27 21 15 9
3 36 27 18 9 2 4 6 32 31 29 18 14 13
15 11 9 7 1 5 37 32 28 26 24 20 19 25
19 20

From your old mark-sheets find the marks that you obtained in
mathematics in the previous classes. Arrange them year-wise. Check
whether the marks you have secured in the subject is a variable or
not. Also see, if over the years, you have improved in mathematics.

Presentation of Data

Studying this chapter should
enable you to:
present data using tables;
represent data using appropriate
diagrams.

1. INTRODUCTION

You have already learnt in previous chapters how data are collected and
organised. As data are generally voluminous, they need to be put in a
compact and presentable form. This chapter deals with presentation of
data precisely so that the voluminous data collected could be made
readily usable and be easily comprehended. There are generally three
forms of presentation of data:
Textual or Descriptive presentation
Tabular presentation
Diagrammatic presentation.

In textual presentation, data are described within the text. When the
quantity of data is not too large this form of presentation is more
suitable. Look at the following cases:

Case 1
In a bandh call given on 08 September 2005 protesting the hike in
prices of petrol and diesel, 5 petrol pumps were found open and 17 were
closed whereas 2 schools were closed and remaining 9 schools were found
open in a town of Bihar.

Case 2
Census of India 2001 reported that Indian population had risen to 102
crore of which only 49 crore were females against 53 crore males. 74
crore people resided in rural India and only 28 crore lived in towns or
cities. While there were 62 crore non-worker population against 40
crore workers in the entire country, urban population had an even
higher share of non-workers (19 crores) against the workers (9 crores)
as compared to the rural population where there were 31 crore workers
out of a 74 crore population....
In both the cases data have been presented only in the text. A serious
drawback of this method of presentation is that one has to go through
the complete text of presentation for comprehension but at the same
time, it enables one to emphasise certain points of the presentation.

In a tabular presentation, data are presented in rows (read
horizontally) and columns (read vertically). For example see Table 4.1
below tabulating information about literacy rates. It has 3 rows (for
male, female and total) and 3 columns (for urban, rural and total). It
is called a 3 × 3 Table giving 9 items of information in 9 boxes called
the "cells" of the Table. Each cell gives information that relates an
attribute of gender ("male", "female" or total) with a number (literacy
percentages of rural people, urban people and total). The most
important advantage of tabulation is that it organises data for further
statistical treatment and decision-making. Classification used in
tabulation is of four kinds:
Qualitative
Quantitative
Temporal and
Spatial

Qualitative classification
When classification is done according to qualitative characteristics
like social status, physical status, nationality, etc., it is called
qualitative classification. For example, in Table 4.1 the
characteristics for classification are sex and location which are
qualitative in nature.

TABLE 4.1
Literacy in Bihar by sex and location (per cent)

              Location
Sex        Rural     Urban     Total
Male       57.70     80.80     60.32
Female     30.03     63.30     33.57
Total      44.42     72.71     47.53
Source: Census of India 2001, Provisional Population Totals.

Quantitative classification
In quantitative classification, the data are classified on the basis of
characteristics which are quantitative in nature. In other words these
characteristics can be measured quantitatively. For example, age,
height, production, income, etc. are quantitative characteristics.
Classes are formed by assigning limits called class limits for the
values of the characteristic under consideration. An example of
quantitative classification is Table 4.2.

TABLE 4.2
Distribution of 542 respondents by their age in an election study in Bihar

Age group     No. of
(yrs)         respondents     Per cent
20-30              3             0.55
30-40             61            11.25
40-50            132            24.35
50-60            153            28.24
60-70            140            25.83
70-80             51             9.41
80-90              2             0.37
All              542           100.00
Source: Assembly election Patna central constituency 2005, A.N. Sinha
Institute of Social Studies, Patna.

Here classifying characteristic is age in years and is quantifiable.

Temporal classification
In this classification time becomes the classifying variable and data
are categorised according to time. Time may be in hours, days, weeks,
months, years, etc. For example, see Table 4.3.

TABLE 4.3
Yearly sales of a tea shop from 1995 to 2000

Years     Sale (Rs in lakhs)
1995           79.2
1996           81.3
1997           82.4
1998           80.5
1999          100.2
2000           91.2
Data Source: Unpublished data.

In this table the classifying characteristic is year and takes values
in the scale of time.

Activity
Go to your library and collect data on the number of books in
economics, the library had at the end of the year for the last ten
years and present the data in a table.
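The "Per cent" column of a quantitative table such as Table 4.2 is each class frequency expressed as a share of the total. A minimal Python sketch, using the respondent counts from Table 4.2, is shown below. The variable names are hypothetical, and rounding each share to two decimals independently is an assumption of the sketch.

```python
# Number of respondents in each age group, as given in Table 4.2.
respondents = {
    "20-30": 3, "30-40": 61, "40-50": 132, "50-60": 153,
    "60-70": 140, "70-80": 51, "80-90": 2,
}

total = sum(respondents.values())

# Share of each age group in the total, in per cent, rounded to two decimals.
per_cent = {
    group: round(100 * count / total, 2) for group, count in respondents.items()
}

for group, share in per_cent.items():
    print(f"{group}: {respondents[group]} respondents, {share} per cent")
print(f"All: {total} respondents")
```

Shares that are rounded one by one need not add up to exactly 100. Here the 50-60 group works out to 28.23, whereas the printed table shows 28.24, so the printed column may have been adjusted to total 100.00.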

Activities
Construct a table presenting data on preferential liking of the
students of your class for Star News, Zee News, BBC World, CNN, Aaj Tak
and DD News.
Prepare a table of
(i) heights (in cm) and
(ii) weights (in kg) of students of your class.

Spatial classification
When classification is done in such a way that place becomes the
classifying variable, it is called spatial classification. The place
may be a village/town, block, district, state, country, etc.
Table 4.4 is an example of spatial classification. Here the classifying
characteristic is country of the world.

TABLE 4.4
Export from India to rest of the world in one year as share of total
export (per cent)

Destination            Export share
USA                        21.8
Germany                     5.6
Other EU                   14.7
UK                          5.7
Japan                       4.9
Russia                      2.1
Other East Europe           0.6
OPEC                       10.5
Asia                       19.0
Other LDCs                  5.6
Others                      9.5
All                       100.0
(Total Exports: US $ 33658.5 million)

Activity
Construct a table presenting data collected from students of your class
according to their native states/residential locality.

4. TABULATION OF DATA AND PARTS OF A TABLE

To construct a table it is important to learn first what the parts of a
good statistical table are. When put together in a systematically
ordered manner these parts form a table. The simplest way of
conceptualising a table may be data presented in rows and columns along
with some explanatory notes. Tabulation can be done using one-way,
two-way or three-way classification depending upon the number of
characteristics involved. A good table should essentially have the
following:

(i) Table Number
Table number is assigned to a table for identification purpose. If more
than one table is presented, it is the table number that distinguishes
one table from another. It is given at the top or at the beginning of
the title of the table. Generally, table numbers are whole numbers in
ascending order if there are many tables in a book. Subscripted numbers
like 1.2, 3.1, etc. are also in use for identifying the table according
to its location. For example, Table number 4.5 may be read as the fifth
table of the fourth chapter and so on. (See Table 4.5)

(ii) Title
The title of a table narrates the contents of the table. It has to be
very clear, brief and carefully worded so that the interpretations made
from the table are clear and free from any ambiguity. It finds place at
the head of the table succeeding the table number or just below it.
(See Table 4.5).

(iii) Captions or Column Headings
At the top of each column in a table a column designation is given to
explain figures of the column. This is called caption or column
heading. (See Table 4.5)

(iv) Stubs or Row Headings
Like a caption or column heading each row of the table has to be given
a heading. The designations of the rows are also called stubs or stub
items, and the complete left column is known as
stub column. A brief description of the row headings may also be given
at the left hand top in the table. (See Table 4.5).

(v) Body of the Table
Body of a table is the main part and it contains the actual data.
Location of any one figure/data in the table is fixed and determined by
the row and column of the table. For example, data in the second row
and fourth column indicate that 25 crore females in rural India were
non-workers in 2001. (See Table 4.5).

Table 4.5 Population of India according to workers and non-workers by
gender and location                           [Table Number and Title]

                           Workers                                      [Column Headings/Captions]
Location   Gender    Main   Marginal   Total   Non-worker   Total
Rural      Male       17       3        20        18          38        [Row Headings/stubs on the left,
           Female      6       5        11        25          36         Body of the table on the right]
           Total      23       8        31        43          74
Urban      Male        7       1         8         7          15
           Female      1       0         1        12          13
           Total       8       1         9        19          28
All        Male       24       4        28        25          53
           Female      7       5        12        37          49
           Total      31       9        40        62         102

Source : Census of India 2001                                           [Source note]
Foot note : Figures are rounded to nearest crore                        [Foot note]

(Note : Table 4.5 presents the same data in tabular form already
presented through case 2 in textual presentation of data)

(vi) Unit of Measurement
The unit of measurement of the figures in the table (actual data)
should always be stated along with the title if the unit does not
change throughout the table. If different units are there for rows or
columns of the table, these units must be stated along with "stubs" or
"captions". If figures are large, they should be rounded up and the
method
of rounding should be indicated (See Table 4.5).

(vii) Source Note
It is a brief statement or phrase indicating the source of data
presented in the table. If more than one source is there, all the
sources are to be written in the source note. Source note is generally
written at the bottom of the table. (See Table 4.5).

(viii) Footnote
Footnote is the last part of the table. Footnote explains the specific
feature of the data content of the table which is not self explanatory
and has not been explained earlier.

How many rows and columns are essentially required to form a table?
Can the column/row headings of a table be quantitative?

5. DIAGRAMMATIC PRESENTATION OF DATA

This is the third method of presenting data. This method provides the
quickest understanding of the actual situation to be explained by data
in comparison to tabular or textual presentations. Diagrammatic
presentation of data translates quite effectively the highly abstract
ideas contained in numbers into more concrete and easily comprehensible
form.
Diagrams may be less accurate but are much more effective than tables
in presenting the data.
There are various kinds of diagrams in common use. Amongst them the
important ones are the following:
(i) Geometric diagram
(ii) Frequency diagram
(iii) Arithmetic line graph

Geometric Diagram
Bar diagram and pie diagram come in the category of geometric diagram
for presentation of data. The bar diagrams are of three types: simple,
multiple and component bar diagrams.

Bar Diagram
Simple Bar Diagram
Bar diagram comprises a group of equispaced and equiwidth rectangular
bars for each class or category of data. Height or length of the bar
reads the magnitude of data. The lower end of the bar touches the base
line such that the height of a bar starts from the zero unit. Bars of a
bar diagram can be visually compared by their relative height and
accordingly data are comprehended quickly. Data for this can be of
frequency or non-frequency type. In non-frequency type data a
particular characteristic, say production, yield, population, etc. at
various points of time or of different states are noted and
corresponding bars are made of the respective heights according to the
values of the characteristic to construct the diagram. The values of
the characteristics (measured or counted)
retain the identity of each value. Figure 4.1 is an example of a bar diagram.

You had constructed a table presenting the data about the students of your class. Draw a bar diagram for the same table.

Different types of data may require different modes of diagrammatical representation. Bar diagrams are suitable both for frequency type and non-frequency type variables and attributes. Discrete variables like family size, spots on a dice, grades in an examination, etc. and attributes such as gender, religion, caste, country, etc. can be represented by bar diagrams. Bar diagrams are more convenient for non-frequency data such as income-expenditure profile, export/imports over the years, etc.

A category that has a longer bar (literacy of Kerala) than another category (literacy of West Bengal), has more of the measured (or enumerated) characteristics than the other. Bars (also called columns) are usually used in time series data (food grain produced between 1980–2000, decadal variation in work participation
Literacy Rates of Major States of India

                                  2001                     1991
Major Indian States      Person   Male  Female    Person   Male  Female
Andhra Pradesh (AP)        60.5   70.3    50.4      44.1   55.1    32.7
Assam (AS)                 63.3   71.3    54.6      52.9   61.9    43.0
Bihar (BR)                 47.0   59.7    33.1      37.5   51.4    22.0
Jharkhand (JH)             53.6   67.3    38.9      41.4   55.8    31.0
Gujarat (GJ)               69.1   79.7    57.8      61.3   73.1    48.6
Haryana (HR)               67.9   78.5    55.7      55.8   69.1    40.4
Karnataka (KA)             66.6   76.1    56.9      56.0   67.3    44.3
Kerala (KE)                90.9   94.2    87.7      89.8   93.6    86.2
Madhya Pradesh (MP)        63.7   76.1    50.3      44.7   58.5    29.4
Chhattisgarh (CH)          64.7   77.4    51.9      42.9   58.1    27.5
Maharashtra (MR)           76.9   86.0    67.0      64.9   76.6    52.3
Orissa (OR)                63.1   75.3    50.5      49.1   63.1    34.7
Punjab (PB)                69.7   75.2    63.4      58.5   65.7    50.4
Rajasthan (RJ)             60.4   75.7    43.9      38.6   55.0    20.4
Tamil Nadu (TN)            73.5   82.4    64.4      62.7   73.7    51.3
Uttar Pradesh (UP)         56.3   68.8    42.2      40.7   54.8    24.4
Uttaranchal (UT)           71.6   83.3    59.6      57.8   72.9    41.7
West Bengal (WB)           68.6   77.0    59.6      57.7   67.8    46.6
India                      64.8   75.3    53.7      52.2   64.1    39.3

Fig. 4.1: Bar diagram showing literacy rates (person) of major states of India, 2001.

rate, registered unemployed over the years, literacy rates, etc.) (Fig 4.2).

Bar diagrams can have different forms such as multiple bar diagram and component bar diagram.

Activities
How many states (among the major states of India) had higher female literacy rate than the national average in 2001?
Has the gap between maximum and minimum female literacy rates over the states in two consecutive census years 2001 and 1991 declined?

Multiple Bar Diagram
Multiple bar diagrams (Fig.4.2) are used for comparing two or more sets of data, for example income and expenditure or import and export for different years, marks obtained in different subjects in different classes, etc.

Component Bar Diagram
Component bar diagrams or charts (Fig.4.3), also called sub-diagrams, are very useful in comparing the sizes of different component parts (the elements or parts which a thing is made up of) and also for throwing light on the relationship among these integral parts. For example, sales proceeds from different products, expenditure pattern in a typical Indian family (components being food, rent, medicine, education, power, etc.), budget outlay for receipts and expenditures, components of labour force, population etc. Component bar diagrams are usually shaded or coloured suitably.

Fig. 4.2: Multiple bar (column) diagram showing female literacy rates over two census years 1991
and 2001 by major states of India.
Interpretation: It can be very easily derived from Figure 4.2 that female literacy rate over the years
was on increase throughout the country. Similar other interpretations can be made from the figure
like the state of Rajasthan experienced the sharpest rise in female literacy, etc.

TABLE 4.7
Enrolment by gender at schools (per cent) of children aged 6–14 years in a district of Bihar

Gender    Enrolled (per cent)    Out of school (per cent)
Boy              91.5                     8.5
Girl             58.6                    41.4
All              78.0                    22.0

Data Source: Unpublished data

A component bar diagram shows the bar and its sub-divisions into two or more components. For example, the bar might show the total population of children in the age-group of 6–14 years. The components show the proportion of those who are enrolled and those who are not. A component bar diagram might also contain different component bars for boys, girls and the total of children in the given age group range, as shown in Figure 4.3. To construct a component bar diagram, first of all, a bar is constructed on the x-axis with its height equivalent to the total value of the bar [for per cent data the bar height is of 100 units (Figure 4.3)]. Otherwise the height is equated to total value of the bar and proportional heights of the components are worked out using unitary method. Smaller components are given priority in parting the bar.

Fig. 4.3: Enrolment at primary level in a district of Bihar (Component Bar Diagram)

Pie Diagram
A pie diagram is also a component

diagram, but unlike a component bar diagram, a circle whose area is proportionally divided among the components (Fig.4.4) it represents. It is also called a pie chart. The circle is divided into as many parts as there are components by drawing straight lines from the centre to the circumference.

Pie charts usually are not drawn with absolute values of a category. The values of each category are first expressed as percentage of the total value of all the categories. A circle in a pie chart, irrespective of its value of radius, is thought of having 100 equal parts of 3.6° (360°/100) each. To find out the angle the component shall subtend at the centre of the circle, each percentage figure of every component is multiplied by 3.6°. An example of this conversion of percentages of components into angular components of the circle is shown in Table 4.8.

TABLE 4.8
Distribution of Indian population by their working status (crore)

Status             Population    Per cent    Angular component
Marginal Worker         9           8.8            32°
Main Worker            31          30.4           109°
Non-Worker             62          60.8           219°
All                   102         100.0           360°

Fig. 4.4: Pie diagram for different categories of Indian population according to working status

It may be interesting to note that data represented by a component bar diagram can also be represented equally well by a pie chart, the only requirement being that absolute values of the components have to be converted into percentages before they can be used for a pie diagram.

Represent data presented through Figure 4.4 by a component bar diagram.
Does the area of a pie have any bearing on total value of the data to be represented by the pie diagram?

Frequency Diagram
Data in the form of grouped frequency distributions are generally represented by frequency diagrams like histogram, frequency polygon, frequency curve and ogive.
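The percentage-to-angle conversion described for pie diagrams can be sketched in a few lines of Python. This is only an illustration using the figures of Table 4.8; the function name `pie_angles` is an assumption made for this sketch and not part of the text.

```python
def pie_angles(values):
    """Return {category: (per cent of total, angle in degrees)}."""
    total = sum(values.values())
    angles = {}
    for category, value in values.items():
        percent = round(100 * value / total, 1)             # share of the total, one decimal
        angles[category] = (percent, round(percent * 3.6))  # 1 per cent = 3.6 degrees
    return angles

# Population by working status (crore), as listed in Table 4.8
working_status = {"Marginal Worker": 9, "Main Worker": 31, "Non-Worker": 62}
angles = pie_angles(working_status)
for category, (percent, angle) in angles.items():
    print(f"{category}: {percent} per cent, {angle} degrees")
```

The rounded angles come to 32, 109 and 219 degrees, which together make the full circle of 360 degrees.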

Histogram
A histogram is a two dimensional diagram. It is a set of rectangles with bases as the intervals between class boundaries (along X-axis) and with areas proportional to the class frequency (Fig.4.5). If the class intervals are of equal width, which they generally are, the areas of the rectangles are proportional to their respective frequencies. However, in some type of data, it is convenient, at times necessary, to use varying width of class intervals. For example, when tabulating deaths by age at death, it would be very meaningful as well as useful to have very short age intervals (0, 1, 2, ..., yrs/0, 7, 28, ..., days) at the beginning, when death rates are very high compared to deaths at most other higher age segments of the population. For graphical representation of such data, the height of a rectangle is the quotient of its area (here frequency) and its base (here width of the class interval). When intervals are equal, that is, when all rectangles have the same base, area can conveniently be represented by the frequency of any interval for purposes of comparison. When bases vary in their width, the heights of rectangles are to be adjusted to yield comparable measurements. The answer in such a situation is frequency density (class frequency divided by width of the class interval) instead of absolute frequency.

TABLE 4.9
Distribution of daily wage earners in a locality of a town

Daily earning    No. of wage     Cumulative Frequency
(Rs)             earners (f)     'Less than'   'More than'
45–49                 2               2            85
50–54                 3               5            83
55–59                 5              10            80
60–64                 3              13            75
65–69                 6              19            72
70–74                 7              26            66
75–79                12              38            59
80–84                13              51            47
85–89                 9              60            34
90–94                 7              67            25
95–99                 6              73            18
100–104               4              77            12
105–109               2              79             8
110–114               3              82             6
115–119               3              85             3

Source: Unpublished data

Since histograms are rectangles, a line parallel to the base line and of the same magnitude is to be drawn at a vertical distance equal to frequency (or frequency density) of the class interval. A histogram is never drawn for a discrete variable/data. Since in an interval or ratio scale the lower class boundary of a class interval fuses with the upper class boundary of the previous interval, equal or unequal, the rectangles are all adjacent and there is no open space between two consecutive rectangles. If the classes are not continuous they are first converted into continuous classes as discussed in Chapter 3. Sometimes the common portion between two adjacent rectangles (Fig.4.6) is omitted giving a better impression of continuity. The resulting figure gives the impression of a double staircase.
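The frequency-density adjustment for unequal class widths can be illustrated with a short Python sketch. The class intervals and frequencies below are hypothetical, chosen only to show the arithmetic; they do not come from any table in this chapter.

```python
def frequency_density(classes):
    """Frequency density = class frequency / width of the class interval."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical classes given as (lower boundary, upper boundary, frequency)
classes = [(0, 10, 20), (10, 20, 30), (20, 50, 60)]
densities = frequency_density(classes)
for (lower, upper, freq), density in zip(classes, densities):
    print(f"{lower}-{upper}: frequency {freq}, rectangle height {density}")
```

Although the last class has the largest frequency, it is three times as wide as the others, so its rectangle is no taller than that of the first class.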

A histogram looks similar to a bar diagram. But there are more differences between the two than may appear at first impression. In a bar diagram, the spacing and the width or the area of bars are all arbitrary. It is the height and not the width or the area of the bar that really matters. A single vertical line could have served the same purpose as a bar of same width. Moreover, in a histogram no space is left in between two rectangles, but in a bar diagram some space must be left between consecutive bars (except in multiple bar or component bar diagram). Although the bars have the same width, the width of a bar is unimportant for the purpose of comparison. The width in a histogram is as important as its height. We can have a bar diagram both for discrete and continuous variables, but a histogram is drawn only for a continuous variable. A histogram also gives the value of mode of the frequency distribution graphically, as shown in Figure 4.5: the x-coordinate of the dotted vertical line gives the mode.

Frequency Polygon
A frequency polygon is a plane bounded by straight lines, usually four or more lines. Frequency polygon is an alternative to histogram and is also derived from histogram itself. A frequency polygon can be fitted to a histogram for studying the shape of the curve. The simplest method of drawing a frequency polygon is to join the midpoints of the topside of the consecutive rectangles of the histogram. It leaves us with the two

Fig. 4.5: Histogram for the distribution of 85 daily wage earners in a locality of a town.

ends away from the base line, denying the calculation of the area under the curve. The solution is to join the two end-points thus obtained to the base line at the mid-values of the two classes with zero frequency immediately at each end of the distribution. Broken lines or dots may join the two ends with the base line. Now the total area under the curve, like the area in the histogram, represents the total frequency or sample size.

Frequency polygon is the most common method of presenting grouped frequency distribution. Both class boundaries and class-marks can be used along the X-axis, the distances between two consecutive class marks being proportional/equal to the width of the class intervals. Plotting of data becomes easier if the class-marks fall on the heavy lines of the graph paper. No matter whether class boundaries or midpoints are used in the X-axis, frequencies (as ordinates) are always plotted against the mid-point of class intervals. When all the points have been plotted in the graph, they are carefully joined by a series of short straight lines. Broken lines join midpoints of two intervals, one in the beginning and the other at the end, with the two ends of the plotted curve (Fig.4.6). When comparing two or more distributions plotted on the same axes, frequency polygon is likely to be more useful since the vertical and horizontal lines of two or more distributions may coincide in a histogram.

Frequency Curve
The frequency curve is obtained by drawing a smooth freehand curve passing through the points of the

Fig. 4.6: Frequency polygon drawn for the data given in Table 4.9

Fig. 4.7: Frequency curve for Table 4.9

frequency polygon as closely as possible. It may not necessarily pass through all the points of the frequency polygon but it passes through them as closely as possible (Fig. 4.7).

Ogive
Ogive is also called cumulative frequency curve. As there are two types of cumulative frequencies, namely less than type and more than type, accordingly there are two ogives for any grouped frequency distribution data. Here in place of simple frequencies as in the case of frequency polygon, cumulative frequencies are plotted along y-axis against class limits of the frequency distribution. For less than ogive the cumulative frequencies are plotted against the respective upper limits of the class intervals whereas for more than ogives the cumulative frequencies are plotted against the respective lower limits of the class interval. An interesting feature of the two ogives together is that their intersection point gives the median of the frequency distribution (Fig. 4.8(b)). As the shapes of the two ogives suggest, less than ogive is never decreasing and more than ogive is never increasing.

TABLE 4.10
Frequency distribution of marks obtained in mathematics

Marks     Number of       Less than cumulative    More than cumulative
(x)       students (f)    frequency               frequency
0–20           6                  6                      64
20–40          5                 11                      58
40–60         33                 44                      53
60–80         14                 58                      20
80–100         6                 64                       6
Total         64

Fig. 4.8(a): 'Less than' and 'More than' ogive for data given in Table 4.10
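The two cumulative frequency columns that are plotted as ogives can be reproduced with a short Python sketch. The frequencies are those of Table 4.10; the function name `cumulative_frequencies` is an assumption made for this illustration.

```python
from itertools import accumulate

def cumulative_frequencies(freqs):
    """Return the 'less than' and 'more than' cumulative frequency lists."""
    less_than = list(accumulate(freqs))                  # running total from the first class
    more_than = list(accumulate(reversed(freqs)))[::-1]  # running total from the last class
    return less_than, more_than

# Class frequencies for marks 0-20, 20-40, 40-60, 60-80, 80-100 (Table 4.10)
frequencies = [6, 5, 33, 14, 6]
less_than, more_than = cumulative_frequencies(frequencies)
print("Less than:", less_than)
print("More than:", more_than)
```

The 'less than' values are plotted against the upper class limits (20, 40, ..., 100) and the 'more than' values against the lower class limits (0, 20, ..., 80).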
Fig. 4.8(b): 'Less than' and 'More than' ogive for data given in Table 4.10

Can the ogive be helpful in locating the partition values of the distribution it represents?

Arithmetic Line Graph
An arithmetic line graph is also called time series graph and is a method of diagrammatic presentation of data. In it, time (hour, day/date, week, month, year, etc.) is plotted along x-axis and the value of the variable (time series data) along y-axis. The line graph obtained by joining these plotted points is called an arithmetic line graph (time series graph). It helps in understanding the trend, periodicity, etc. in a long term time series data.

TABLE 4.11
Value of Exports and Imports of India (Rs in 100 crores)

Year        Exports    Imports
1977–78        54         60
1978–79        57         68
1979–80        64         91
1980–81        67        125
1982–83        88        143
1983–84        98        158
1984–85       117        171
1985–86       109        197
1986–87       125        201
1987–88       157        222
1988–89       202        282
1989–90       277        353
1990–91       326        432
1991–92       440        479
1992–93       532        634
1993–94       698        731
1994–95       827        900
1995–96      1064       1227
1996–97      1186       1369
1997–98      1301       1542
1998–99      1416       1761

Here you can see from Fig. 4.9 that for the period 1978 to 1999, although the imports were more than the exports all through, the rate of acceleration went on increasing after 1988–89 and the gap between the two (imports and exports) was widened after 1995.

Fig. 4.9: Arithmetic line graph for time series data given in Table 4.11

6. CONCLUSION

By now you must have been able to learn how collected data could be presented using various forms of presentation: textual, tabular and diagrammatic. You are now also able to make an appropriate choice of the form of data presentation as well as the type of diagram to be used for a given set of data. Thus you can make presentation of data meaningful, comprehensive and purposeful.

Data (even voluminous data) speak meaningfully through presentation.
For small data (quantity), textual presentation serves the purpose.
For large quantity of data, tabular presentation helps in accommodating any volume of data for one or more variables.
Tabulated data can be presented through diagrams which enable quicker comprehension of the facts presented otherwise.


Measures of Central Tendency

Studying this chapter should enable you to:
understand the need for summarising a set of data by one single number;
recognise and distinguish between the different types of averages;
learn to compute different types of averages;
draw meaningful conclusions from a set of data;
develop an understanding of which type of average would be most useful in a particular situation.

1. INTRODUCTION

In the previous chapter, you have read the tabular and graphic representation of the data. In this chapter, you will study the measures of central tendency which is a numerical method to explain the data in brief. You can see examples of summarising a large set of data in day to day life like average marks obtained by students of a class in a test, average rainfall in an area, average production in a factory, average income of persons living in a locality or working in a firm etc.

Baiju is a farmer. He grows food grains in his land in a village called Balapur in Buxar district of Bihar. The village consists of 50 small farmers. Baiju has 1 acre of land. You are interested in knowing the economic condition of small farmers of Balapur. You want to compare the economic

condition of Baiju in Balapur village. For this, you may have to evaluate the size of his land holding, by comparing with the size of land holdings of other farmers of Balapur. You may like to see if the land owned by Baiju is
1. above average in ordinary sense (see the Arithmetic Mean below)
2. above the size of what half the farmers own (see the Median below)
3. above what most of the farmers own (see the Mode below)

In order to evaluate Baiju's relative economic condition, you will have to summarise the whole set of data of land holdings of the farmers of Balapur. This can be done by use of central tendency, which summarises the data in a single value in such a way that this single value can represent the entire data. The measuring of central tendency is a way of summarising the data in the form of a typical or representative value.

There are several statistical measures of central tendency or averages. The three most commonly used averages are:
Arithmetic Mean
Median
Mode

You should note that there are two more types of averages i.e. Geometric Mean and Harmonic Mean, which are suitable in certain situations. However, the present discussion will be limited to the three types of averages mentioned above.

2. ARITHMETIC MEAN

Suppose the monthly income (in Rs) of six families is given as:
1600, 1500, 1400, 1525, 1625, 1630.
The mean family income is obtained by adding up the incomes and dividing by the number of families.

$$\frac{1600 + 1500 + 1400 + 1525 + 1625 + 1630}{6} = \text{Rs } 1{,}547$$

It implies that on an average, a family earns Rs 1,547.

Arithmetic mean is the most commonly used measure of central tendency. It is defined as the sum of the values of all observations divided by the number of observations and is usually denoted by $\bar{X}$. In general, if there are N observations as $X_1, X_2, X_3, ..., X_N$, then the Arithmetic Mean is given by

$$\bar{X} = \frac{X_1 + X_2 + X_3 + ... + X_N}{N} = \frac{\Sigma X}{N}$$

Where, $\Sigma X$ = sum of all observations and N = total number of observations.

How Arithmetic Mean is Calculated
The calculation of arithmetic mean can be studied under two broad categories:
1. Arithmetic Mean for Ungrouped Data.
2. Arithmetic Mean for Grouped Data.
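The definition of the arithmetic mean (sum of all observations divided by their number) can be checked with a few lines of Python. This is a minimal sketch using the six family incomes quoted in this section; the function name `arithmetic_mean` is an assumption made for the illustration.

```python
def arithmetic_mean(observations):
    """Sum of all observations divided by the number of observations."""
    return sum(observations) / len(observations)

# Monthly incomes (in Rs) of the six families
incomes = [1600, 1500, 1400, 1525, 1625, 1630]
mean_income = arithmetic_mean(incomes)
print(round(mean_income))  # the text rounds the mean to the nearest rupee
```

The incomes add up to Rs 9,280, so the mean is 9,280 divided by 6, which is rounded to Rs 1,547 in the text.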

Arithmetic Mean for Series of Ungrouped Data

Direct Method
Arithmetic mean by direct method is the sum of all observations in a series divided by the total number of observations.

Example 1
Calculate Arithmetic Mean from the data showing marks of students in a class in an economics test: 40, 50, 55, 78, 58.

$$\bar{X} = \frac{\Sigma X}{N} = \frac{40 + 50 + 55 + 78 + 58}{5} = 56.2$$

The average marks of students in the economics test are 56.2.

Assumed Mean Method
If the number of observations in the data is more and/or figures are large, it is difficult to compute arithmetic mean by direct method. The computation can be made easier by using assumed mean method.

In order to save time of calculation of mean from a data set containing a large number of observations as well as large numerical figures, you can use assumed mean method. Here you assume a particular figure in the data as the arithmetic mean on the basis of logic/experience. Then you may take deviations of the said assumed mean from each of the observation. You can, then, take the summation of these deviations and divide it by the number of observations in the data. The actual arithmetic mean is estimated by taking the sum of the assumed mean and the ratio of sum of deviations to number of observations. Symbolically,
Let, A = assumed mean
X = individual observations
N = total numbers of observations
d = deviation of assumed mean from individual observation, i.e. d = X − A
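The procedure just described in words (assume a mean, take deviations from it, and add the average deviation back to the assumed mean) can be sketched in Python as follows. The marks are those of Example 1; the choice of 55 as the assumed mean and the function name `assumed_mean_method` are assumptions made only for this illustration.

```python
def assumed_mean_method(observations, assumed_mean):
    """Assumed mean plus the ratio of the sum of deviations to the number of observations."""
    deviations = [x - assumed_mean for x in observations]      # d = X - A
    return assumed_mean + sum(deviations) / len(observations)  # A + (sum of d) / N

marks = [40, 50, 55, 78, 58]  # marks from Example 1
mean_marks = assumed_mean_method(marks, assumed_mean=55)
print(round(mean_marks, 1))
```

With A = 55 the deviations are −15, −5, 0, 23 and 3, which sum to 6, so the result agrees with the 56.2 obtained by the direct method. Any other assumed mean gives the same answer.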


Then sum of all deviations is taken as $\Sigma d = \Sigma (X - A)$.

Then find $\frac{\Sigma d}{N}$.

Then add A and $\frac{\Sigma d}{N}$ to get $\bar{X}$.

Therefore, $\bar{X} = A + \frac{\Sigma d}{N}$

You should remember that any value, whether existing in the data or not, can be taken as assumed mean. However, in order to simplify the calculation, centrally located value in the data can be selected as assumed mean.

Example 2
The following data shows the weekly income of 10 families.

Family:                  A     B     C     D     E     F     G     H     I     J
Weekly Income (in Rs):  850   700   100   750  5000    80   420  2500   400   360

Compute mean family income.

Computation of Arithmetic Mean by Assumed Mean Method

Families    Income (X)    d = X − 850    d' = (X − 850)/10
A               850             0               0
B               700          −150             −15
C               100          −750             −75
D               750          −100             −10
E              5000         +4150            +415
F                80          −770             −77
G               420          −430             −43
H              2500         +1650            +165
I               400          −450             −45
J               360          −490             −49
Total         11160         +2660            +266

Arithmetic Mean using assumed mean method

$$\bar{X} = A + \frac{\Sigma d}{N} = 850 + \frac{2{,}660}{10} = \text{Rs } 1{,}116$$

Thus, the average weekly income of a family by both methods is Rs 1,116. You can check this by using the direct method.

Step Deviation Method
The calculations can be further simplified by dividing all the deviations taken from assumed mean by the common factor c. The objective is to avoid large numerical figures, i.e., if d = X − A is very large, then find d'. This can be done as follows:

$$d' = \frac{d}{c} = \frac{X - A}{c}$$

The formula is given below:

$$\bar{X} = A + \frac{\Sigma d'}{N} \times c$$

Where d' = (X − A)/c, c = common factor, N = number of observations, A = Assumed mean.

Thus, you can calculate the arithmetic mean in the example 2, by the step deviation method,

$$\bar{X} = 850 + \frac{266}{10} \times 10 = \text{Rs } 1{,}116$$

Calculation of arithmetic mean for Grouped data

Discrete Series

Direct Method
In case of discrete series, frequency against each of the observations is

multiplied by the value of the observation. The values, so obtained, are summed up and divided by the total number of frequencies. Symbolically,

$$\bar{X} = \frac{\Sigma fX}{\Sigma f}$$

Where, $\Sigma fX$ = sum of product of variables and frequencies.
$\Sigma f$ = sum of frequencies.

Example 3
Calculate mean farm size of cultivating households in a village for the following data.

Farm Size (in acres):            64   63   62   61   60   59
No. of Cultivating Households:    8   18   12    9    7    6

Computation of Arithmetic Mean by Direct Method

Farm Size (X)    No. of cultivating    fX         d            fd
in acres         households (f)        (1 × 2)    (X − 62)     (2 × 4)
(1)              (2)                   (3)        (4)          (5)
64                   8                  512        +2          +16
63                  18                 1134        +1          +18
62                  12                  744         0            0
61                   9                  549        −1           −9
60                   7                  420        −2          −14
59                   6                  354        −3          −18
Total               60                 3713        −3           −7

Arithmetic mean using direct method,

$$\bar{X} = \frac{\Sigma fX}{\Sigma f} = \frac{3713}{60} = 61.88 \text{ acres}$$

Therefore, the mean farm size in a village is 61.88 acres.

Assumed Mean Method
As in case of individual series the calculations can be simplified by using assumed mean method, as described earlier, with a simple modification. Since frequency (f) of each item is given here, we multiply each deviation (d) by the frequency to get fd. Then we get $\Sigma fd$. The next step is to get the total of all frequencies i.e. $\Sigma f$. Then find out $\Sigma fd / \Sigma f$. Finally the arithmetic mean is calculated by

$$\bar{X} = A + \frac{\Sigma fd}{\Sigma f}$$

using assumed mean method.

Step Deviation Method
In this case the deviations are divided by the common factor c which simplifies the calculation. Here we estimate $d' = \frac{d}{c} = \frac{X - A}{c}$ in order to reduce the size of numerical figures for easier calculation. Then get fd' and $\Sigma fd'$. Finally the formula for step deviation method is given as,

$$\bar{X} = A + \frac{\Sigma fd'}{\Sigma f} \times c$$

Activity
Find the mean farm size for the data given in example 3, by using step deviation and assumed mean methods.

Continuous Series
Here, class intervals are given. The process of calculating arithmetic mean

in case of continuous series is same as that of a discrete series. The only difference is that the mid-points of various class intervals are taken. You should note that class intervals may be exclusive or inclusive or of unequal size. Example of exclusive class interval is, say, 0–10, 10–20 and so on. Example of inclusive class interval is, say, 0–9, 10–19 and so on. Example of unequal class interval is, say, 0–20, 20–50 and so on. In all these cases, calculation of arithmetic mean is done in a similar way.

Example 4
Calculate average marks of the following students using (a) Direct method (b) Step deviation method.

Direct Method

Marks:             0–10   10–20   20–30   30–40   40–50   50–60   60–70
No. of Students:     5      12      15      25       8       3       2

TABLE 5.3
Computation of Average Marks for Exclusive Class Interval by Direct Method

Mark      No. of          mid value    fm           d' = (m − 35)/10    fd'
(x)       students (f)    (m)          (2) × (3)
(1)       (2)             (3)          (4)          (5)                 (6)
0–10         5               5           25           −3                −15
10–20       12              15          180           −2                −24
20–30       15              25          375           −1                −15
30–40       25              35          875            0                  0
40–50        8              45          360           +1                 +8
50–60        3              55          165           +2                 +6
60–70        2              65          130           +3                 +6
Total       70                         2110                             −34

Steps:
1. Obtain mid values for each class denoted by m.
2. Obtain $\Sigma fm$ and apply the direct method formula:

$$\bar{X} = \frac{\Sigma fm}{\Sigma f} = \frac{2110}{70} = 30.14 \text{ marks}$$

Step deviation method
1. Obtain $d' = \frac{m - A}{c}$
2. Take A = 35, (any arbitrary figure), c = common factor.

$$\bar{X} = A + \frac{\Sigma fd'}{\Sigma f} \times c = 35 + \frac{(-34)}{70} \times 10 = 30.14 \text{ marks}$$

An interesting property of A.M.
It is interesting to know and useful for checking your calculation that the sum of deviations of items about arithmetic mean is always equal to zero. Symbolically, $\Sigma (X - \bar{X}) = 0$.

However, arithmetic mean is affected by extreme values. Any large value, on either end, can push it up or down.

Weighted Arithmetic Mean
Sometimes it is important to assign weights to various items according to their importance, when you calculate the arithmetic mean. For example, there are two commodities, mangoes and potatoes. You are interested in finding the average price of mangoes ($p_1$) and potatoes ($p_2$). The arithmetic

mean will be $\frac{p_1 + p_2}{2}$. However, you might want to give more importance to the rise in price of potatoes ($p_2$). To do this, you may use as weights the quantity of mangoes ($q_1$) and the quantity of potatoes ($q_2$). Now the arithmetic mean weighted by the quantities would be $\frac{q_1 p_1 + q_2 p_2}{q_1 + q_2}$.

In general the weighted arithmetic mean is given by,

$$\frac{w_1 x_1 + w_2 x_2 + ... + w_n x_n}{w_1 + w_2 + ... + w_n} = \frac{\Sigma wx}{\Sigma w}$$

When the prices rise, you may be interested in the rise in the price of the commodities that are more important to you. You will read more about it in the discussion of Index Numbers in Chapter 8.

Activities
Check this property of the arithmetic mean for the following example:
X: 4 6 8 10 12
In the above example if mean is increased by 2, then what happens to the individual observations, if all are equally affected?
If first three items increase by 2, then what should be the values of the last two items, so that mean remains the same?
Replace the value 12 by 96. What happens to the arithmetic mean?

3. MEDIAN

The arithmetic mean is affected by the presence of extreme values in the data. If you take a measure of central tendency which is based on middle position of the data, it is not affected by extreme items. Median is that positional value of the variable which divides the distribution into two equal parts, one part comprises all values greater than or equal to the median value and the other comprises all values less than or equal to it. The Median is the middle element when the data set is arranged in order of the magnitude.

Computation of median
The median can be easily computed by sorting the data from smallest to largest and counting the middle value.

Example 5
Suppose we have the following observation in a data set: 5, 7, 6, 1, 8, 10, 12, 4, and 3.
Arranging the data, in ascending order you have:
1, 3, 4, 5, 6, 7, 8, 10, 12.
The middle score is 6, so the median is 6. Half of the scores are larger than 6 and half of the scores are smaller.

If there is an even number of observations in the data, there will be two observations which fall in the middle. The median in this case is computed as the
in this case is computed as the

arithmetic mean of the two middle values.

Example 6
The following data provides marks of 20 students. You are required to calculate the median marks.
25, 72, 28, 65, 29, 60, 30, 54, 32, 53, 33, 52, 35, 51, 42, 48, 45, 47, 46, 33.

Arranging the data in an ascending order, you get
25, 28, 29, 30, 32, 33, 33, 35, 42, 45, 46, 47, 48, 51, 52, 53, 54, 60, 65, 72.

You can see that there are two observations in the middle, namely 45 and 46. The median can be obtained by taking the mean of the two observations:

$$\text{Median} = \frac{45 + 46}{2} = 45.5 \text{ marks}$$

In order to calculate median it is important to know the position of the median i.e. item/items at which the median lies. The position of the median can be calculated by the following formula:

$$\text{Position of median} = \frac{(N + 1)}{2}\text{th item}$$

Where N = number of items.

You may note that the above formula gives you the position of the median in an ordered array, not the median itself. Median is computed by the formula:

$$\text{Median} = \text{size of } \frac{(N + 1)}{2}\text{th item}$$

Discrete Series
In case of discrete series the position of median i.e. (N+1)/2th item can be located through cumulative frequency. The corresponding value at this position is the value of median.

Example 7
The frequency distribution of the number of persons and their respective incomes (in Rs) are given below. Calculate the median income.

Income (in Rs):       10   20   30   40
Number of persons:     2    4   10    4

In order to calculate the median income, you may prepare the frequency distribution as given below.

TABLE 5.4
Computation of Median for Discrete Series

Income     No of          Cumulative
(in Rs)    persons (f)    frequency (cf)
10             2               2
20             4               6
30            10              16
40             4              20

The median is located in the (N+1)/2 = (20+1)/2 = 10.5th observation. This can be easily located through cumulative frequency. The 10.5th observation lies in the c.f. of 16. The income corresponding to this is Rs 30, so the median income is Rs 30.

Continuous Series
In case of continuous series you have to locate the median class where

N/2th item [not (N+1)/2th item] lies. The median can then be obtained as follows:

$$\text{Median} = L + \frac{(N/2 - \text{c.f.})}{f} \times h$$

Where, L = lower limit of the median class,
c.f. = cumulative frequency of the class preceding the median class,
f = frequency of the median class,
h = magnitude of the median class interval.

No adjustment is required if frequency is of unequal size or magnitude.

Example 8
Following data relates to daily wages of persons working in a factory. Compute the median daily wage.

Daily wages (in Rs):   55–60   50–55   45–50   40–45   35–40   30–35   25–30   20–25
Number of workers:        7      13      15      20      30      33      28      14

The data is arranged in ascending order in the table below.

Computation of Median for Continuous Series

Daily wages    No. of         Cumulative
(in Rs)        Workers (f)    Frequency
20–25             14              14
25–30             28              42
30–35             33              75
35–40             30             105
40–45             20             125
45–50             15             140
50–55             13             153
55–60              7             160

In the above illustration median class is the value of (N/2)th item (i.e. 160/2) = 80th item of the series, which lies in 35–40 class interval. Applying the formula of the median as:

$$\text{Median} = L + \frac{(N/2 - \text{c.f.})}{f} \times h = 35 + \frac{(80 - 75)}{30} \times (40 - 35) = \text{Rs } 35.83$$

Thus, the median daily wage is Rs 35.83. This means that 50% of the

workers are getting less than or equal to Rs 35.83 and 50% of the workers are getting more than or equal to this wage.

You should remember that median, as a measure of central tendency, is not sensitive to all the values in the series. It concentrates on the values of the central items of the data.

Find mean and median for all four values of the series. What do you observe?

Mean and Median of different series

Series    X (Variable Values)    Mean    Median
A         1, 2, 3                 ?        ?
B         1, 2, 30                ?        ?
C         1, 2, 300               ?        ?
D         1, 2, 3000              ?        ?

Is median affected by extreme values? What are outliers?
Is median a better method than mean?

Quartiles are the measures which divide the data into four equal parts, each portion contains equal number of observations. Thus, there are three quartiles. The first Quartile (denoted by $Q_1$) or lower quartile has 25% of the items of the distribution below it and 75% of the items are greater than it. The second Quartile (denoted by $Q_2$) or median has 50% of items below it and 50% of the observations above it. The third Quartile (denoted by $Q_3$) or upper Quartile has 75% of the items of the distribution below it and 25% of the items above it. Thus, $Q_1$ and $Q_3$ denote the two limits within which central 50% of the data lies.

Percentiles divide the distribution into hundred equal parts, so you can get 99 dividing positions denoted by $P_1, P_2, P_3, ..., P_{99}$. $P_{50}$ is the median value. If you have secured 82 percentile in a management entrance examination, it means that your position is below 18 percent of total candidates appeared in the examination. If a total of one lakh students appeared, where do you stand?

Calculation of Quartiles
The method for locating the Quartile is same as that of the median in case of individual and discrete series. The value of $Q_1$ and $Q_3$ of an ordered series can be obtained by the following formula where N is the number of observations.

$$Q_1 = \text{size of } \frac{(N + 1)}{4}\text{th item}$$

$$Q_3 = \text{size of } \frac{3(N+1)}{4}\text{th item}$$

Example 9
Calculate the value of the lower quartile from the data of the marks
obtained by ten students in an examination.
22, 26, 14, 30, 18, 11, 35, 41, 12, 32.
Arranging the data in ascending order:
11, 12, 14, 18, 22, 26, 30, 32, 35, 41.

$$Q_1 = \text{size of } \frac{(N+1)}{4}\text{th item} = \text{size of } \frac{(10+1)}{4}\text{th item} = \text{size of 2.75th item}$$
= 2nd item + 0.75 (3rd item - 2nd item)
= 12 + 0.75 (14 - 12) = 13.5 marks.

Activity
Find out Q3 yourself.

5. MODE

Sometimes, you may be interested in knowing the most typical value of a
series or the value around which maximum concentration of items occurs.
For example, a manufacturer would like to know the size of shoes that has
maximum demand or the style of shirt that is more frequently demanded.
Here, Mode is the most appropriate measure. The word mode has been
derived from the French word "la Mode", which signifies the most
fashionable values of a distribution, because it is repeated the highest
number of times in the series. Mode is the most frequently observed data
value. It is denoted by Mo.

Computation of Mode

Discrete Series
Consider the data set 1, 2, 3, 4, 4, 5. The mode for this data is 4
because 4 occurs most frequently (twice) in the data.

Example 10
Look at the following discrete series:

Variable    10   20   30   40   50
Frequency    2    8   20   10    5

Here, as you can see, the maximum frequency is 20, so the value of the
mode is 30. In this case, as there is a unique value of mode, the data is
unimodal. But the mode is not necessarily unique, unlike arithmetic mean
and median. You can have data with two modes (bi-modal) or more than two
modes (multi-modal). There may be no mode at all if no value appears more
frequently than any other value in the distribution. For example, in the
series 1, 1, 2, 2, 3, 3, 4, 4, there is no mode.

[Figure: Unimodal Data; Bimodal Data]

Continuous Series
In case of continuous frequency distribution, the modal class is the
class with the largest frequency. Mode can be calculated by using the
formula:

$$M_o = L + \frac{D_1}{D_1 + D_2} \times h$$

Where
L = lower limit of the modal class,
D1 = difference between the frequency of the modal class and the
frequency of the class preceding the modal class (ignoring signs),
D2 = difference between the frequency of the modal class and the
frequency of the class succeeding the modal class (ignoring signs),
h = class interval of the distribution.

You may note that in case of continuous series, class intervals should be
equal and the series should be exclusive to calculate the mode. If mid
points are given, class intervals are to be obtained.

Example 11
Calculate the value of the modal worker family's monthly income from the
following data:

Income per month (in '000 Rs):  Below 50  Below 45  Below 40  Below 35  Below 30  Below 25  Below 20  Below 15
Number of families:                   97        95        90        80        60        30        12         4

As you can see, this is a case of cumulative frequency distribution. In
order to calculate the mode, you will have to convert it into an
exclusive series. In this example, the series is in descending order.
Grouping and Analysis tables would be made to determine the modal class.

Grouping Table
Income           Group Frequency
(in '000 Rs)     I                II    III   IV    V     VI
45–50            97 - 95 = 2
40–45            95 - 90 = 5      7
35–40            90 - 80 = 10           15    17
30–35            80 - 60 = 20     30                35
25–30            60 - 30 = 30           50                60
20–25            30 - 12 = 18     48          68
15–20            12 - 4 = 8             26          56
10–15            4                12                      30

Column I lists the class frequencies. Each entry in columns II to VI is
the sum of the column I frequencies of two (II, III) or three (IV, V, VI)
adjacent classes, written against the last class of the group.

Analysis Table
Columns   Class Intervals
          45–50  40–45  35–40  30–35  25–30  20–25  15–20  10–15
I                                      ×
II                                     ×      ×
III                             ×      ×
IV                              ×      ×      ×
V                                      ×      ×      ×
VI                       ×      ×      ×
Total                    1      3      6      3      1

The value of the mode lies in the 25–30 class interval. By inspection
also, it can be seen that this is the modal class.
Now L = 25, D1 = (30 - 18) = 12, D2 = (30 - 20) = 10, h = 5.
Using the formula, you can obtain the value of the mode as:

$$M_o \text{ (in '000 Rs)} = L + \frac{D_1}{D_1 + D_2} \times h = 25 + \frac{12}{10 + 12} \times 5 = 27.727$$

Thus the modal worker family's monthly income is Rs 27,727.

Activity
Take a small survey in your class to know the students' preference for
Chinese food using an appropriate measure of central tendency.
Can mode be located graphically?

Suppose we express,
Arithmetic Mean = Me
Median = Mi
Mode = Mo
so that e, i and o are the suffixes. The relative magnitude of the three
is Me > Mi > Mo or Me < Mi < Mo (suffixes occurring in alphabetical
order). The median is always between the arithmetic mean and the mode.
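The mode formula for a continuous series can be checked numerically. The sketch below is only an illustration: the function name `grouped_mode` and the variable names are assumptions, not part of the text, and the modal class is picked by inspection (largest frequency) rather than through the grouping and analysis tables. With L = 25, D1 = 12, D2 = 10 and h = 5 it evaluates to about 27.727 (in '000 Rs).

```python
def grouped_mode(lower_limit, f_modal, f_preceding, f_succeeding, width):
    """Mode of a continuous series: L + D1 / (D1 + D2) * h."""
    d1 = abs(f_modal - f_preceding)   # D1, ignoring sign
    d2 = abs(f_modal - f_succeeding)  # D2, ignoring sign
    return lower_limit + d1 / (d1 + d2) * width


# Data of Example 11, written in ascending order of the "below" limits.
upper_limits = [15, 20, 25, 30, 35, 40, 45, 50]
cumulative = [4, 12, 30, 60, 80, 90, 95, 97]

# Convert the cumulative ("below") counts into class frequencies.
frequencies = [c - p for c, p in zip(cumulative, [0] + cumulative[:-1])]

# Modal class by inspection: the class with the largest frequency (25-30).
modal_index = frequencies.index(max(frequencies))
class_width = 5

mode_value = grouped_mode(
    lower_limit=upper_limits[modal_index] - class_width,
    f_modal=frequencies[modal_index],
    f_preceding=frequencies[modal_index - 1],
    f_succeeding=frequencies[modal_index + 1],
    width=class_width,
)
print(round(mode_value, 3))  # value in '000 Rs
```

The sketch assumes the modal class is neither the first nor the last class, so that both neighbouring frequencies exist.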
Activity
A shoe company, making shoes for adults only, wants to know the most
popular size of shoes. Which average will be most appropriate for it?

Measures of central tendency or averages are used to summarise the data.
They specify a single most representative value to describe the data set.
Arithmetic mean is the most commonly used average. It is simple to
calculate and is based on all the observations. But it is unduly affected
by the presence of extreme items. Median is a better summary for such
data. Mode is generally used to describe qualitative data. Median and
mode can be easily computed graphically. In case of open-ended
distributions they can also be easily computed. Thus, it is important to
select an appropriate average depending upon the purpose of analysis and
the nature of the distribution.
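The effect of extreme items on the arithmetic mean can be seen with a few short series. The sketch below uses Python's standard `statistics` module. The series are small illustrative ones in which only the last value changes, and the dictionary name `series` is an assumption made for this example.

```python
from statistics import mean, median

# Four short series that differ only in their largest value.
series = {
    "A": [1, 2, 3],
    "B": [1, 2, 30],
    "C": [1, 2, 300],
    "D": [1, 2, 3000],
}

# Pair each series with its (mean, median).
summary = {name: (mean(values), median(values)) for name, values in series.items()}

for name, (average, middle) in summary.items():
    print(f"Series {name}: mean = {average}, median = {middle}")
```

The mean rises with the extreme item while the median stays at the central value, which is why the median is preferred as a summary when outliers are present.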

• The measure of central tendency summarises the data with a single
  value, which can represent the entire data.
• Arithmetic mean is defined as the sum of the values of all observations
  divided by the number of observations.
• The sum of deviations of items from the arithmetic mean is always
  equal to zero.
• Sometimes, it is important to assign weights to various items
  according to their importance.
• Median is the central value of the distribution in the sense that the
  number of values less than the median is equal to the number greater
  than the median.
• Quartiles divide the total set of values into four equal parts.
• Mode is the value which occurs most frequently.
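
The positional formulae for the median and the quartiles recapped above can be tried out in a few lines. This is a minimal sketch under stated assumptions: the function names `grouped_median` and `item_at_position` are hypothetical, classes are taken to be exclusive and listed in ascending order, and fractional positions are resolved by linear interpolation between neighbouring items, as in Example 9. The data are those of Examples 8 and 9.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of a continuous series: L + (N/2 - c.f.) / f * h."""
    n = sum(frequencies)
    cumulative = 0  # c.f. of the classes preceding the current one
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= n / 2:  # first class whose c.f. reaches N/2
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency distribution")


def item_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in an ordered list."""
    if position < 1 or position > len(ordered):
        raise ValueError("position lies outside the series")
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)


# Example 8: daily wages, classes 20-25 up to 55-60, class width 5.
wage_lower_limits = [20, 25, 30, 35, 40, 45, 50, 55]
workers = [14, 28, 33, 30, 20, 15, 13, 7]
median_wage = grouped_median(wage_lower_limits, workers, width=5)
print(round(median_wage, 2))

# Example 9: lower quartile of ten marks, position (N + 1) / 4.
marks = sorted([22, 26, 14, 30, 18, 11, 35, 41, 12, 32])
q1 = item_at_position(marks, (len(marks) + 1) / 4)
print(q1)
```

The same `item_at_position` helper can be called with position 3(N + 1)/4 to locate the upper quartile.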


