

Dr. Rehan Ahmad Khan



Every day, much of the information we get from radio, television, newspapers and
magazines contains facts and figures, usually called “Statistics”. For example:
• Children who brush their teeth with brand X tooth paste have 40% fewer
• The Bureau of Census projects the population of Pakistan to be 250 million
in the year 2020.
• According to the Department of Health the life expectancy of a new born
male was 45 years in 1950; today it is 57 years.
• Eight out of ten Pakistanis do not have wills.
• The prevalence of diabetes is nearly 3 times as high in overweight people
as it is in non-overweight people.
• More than 10 insurance companies pay more than Rs.50 million in claims
every year.
• Sixty percent study only to pass tests, not to become. Fifty percent admit
that they cheat.
The above examples show that statistical information is and can be used for a
variety of reasons. For example we may use them to:

• Inform the general public

• Explain things that have happened
• Influence decisions that will take place
• Justify a claim
• Provide general comparisons
• Predict future outcomes
• Estimate unknown quantities
• Establish a relationship or association between two factors.

Hence statistics is more than just numbers; it is what is done to or with numbers.



In the modern world of computers and information technology, the
importance of statistics is well recognised by all
disciplines. Statistics originated as a science of statehood
and found applications slowly and steadily in Agriculture,
Computer Science, Economics, Commerce, Biology, Chemistry,
Medicine, Industry, Planning, Education and so on. Today
there is hardly any walk of human life where statistics cannot be
applied.

Origin and Growth of Statistics

The words ‘Statistics’ and ‘Statistical’ are derived from the
Latin word Status, meaning a political state. The theory of
statistics as a distinct branch of scientific method is of
comparatively recent growth. Research particularly into the
mathematical theory of statistics is rapidly proceeding and fresh
discoveries are being made all over the world.

Meaning of Statistics

Statistics is concerned with scientific methods for collecting,

organising, summarising, presenting and analysing data as well
as deriving valid conclusions and making reasonable decisions
on the basis of this analysis. Statistics is concerned with the
systematic collection of numerical data and its interpretation.

The word ‘statistics’ is used to refer to:

1. Numerical facts, such as the number of people living in particular


2. The study of ways of collecting, analysing and interpreting the


Statistics is defined differently by different authors over a period of

time. In the olden days statistics was confined to only state
affairs but in modern days it embraces almost every sphere of
human activity. Therefore a number of old definitions, which were
confined to a narrow field of enquiry, were replaced by more
definitions, which are much more comprehensive and

Secondly, statistics has been defined in two different ways –

Statistical data and statistical methods.

The following are some of the definitions of statistics as numerical data:

1. Statistics are the classified facts representing the conditions of

people in a state. In particular they are the facts, which can be
stated in numbers or in tables of numbers or in any tabular or
classified arrangement.

2. Statistics are measurements, enumerations or estimates of

natural phenomena, usually systematically arranged, analysed
and presented so as to exhibit important interrelationships among

Definition by Croxton and Cowden

Statistics may be defined as the science of collection, presentation,
analysis and interpretation of numerical data from the logical
analysis. The definition of statistics by Croxton and
Cowden is regarded as the most scientific and realistic one.

According to this definition there are four stages:

1. Collection of Data

2. Presentation of data

3. Analysis of data

4. Interpretation of data

Collection of Data: This is the first step and the foundation
upon which the entire analysis rests. Careful planning is essential
before collecting the data. There are different methods of
collection of data, such as census, sampling, primary and
secondary, and the investigator should make use of the correct
method.

Presentation of Data: The mass data collected should be

presented in a suitable, concise form for further analysis. The
collected data may be presented in the form of tabular or
diagrammatic or graphic form.

Analysis of Data: The data presented should be carefully
analysed to draw inferences, using tools such as
measures of central tendency, dispersion, correlation
and regression.

Interpretation of Data: The final step is drawing conclusions from

the data collected. A valid conclusion must be drawn on the
basis of analysis. A high degree of skill and experience is
necessary for the interpretation.

Definition by Horace Secrist

Statistics may be defined as the aggregate of facts affected to a

marked extent by multiplicity of causes, numerically expressed,
enumerated or estimated according to a reasonable standard of
accuracy, collected in a systematic manner, for a predetermined
purpose and placed in relation to each other.

The above definition seems to be the most comprehensive

and exhaustive.

Functions of Statistics

There are many functions of statistics. Let us consider the following

five important functions.


Generally speaking, by the word ‘to condense’ we mean to reduce
or to lessen. Condensation is mainly aimed at making
a huge mass of data understandable by providing only a few
summary values. If only the individual marks of a particular class in a Lahore school
are given, little purpose is served. If instead
we are given the average mark in that particular examination,
it serves the purpose better. Similarly, the range of marks
is another measure of the data. Thus, statistical measures
help to reduce the complexity of the data and consequently to
understand any huge mass of data.
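
As a minimal sketch of condensation, the snippet below reduces a list of marks to two summary measures, the average and the range. The marks and the helper name `condense` are hypothetical and serve only as an illustration.

```python
from statistics import mean

# Hypothetical examination marks for one class (not from the lecture).
marks = [45, 67, 82, 58, 91, 73, 64, 55, 78, 69]

def condense(values):
    """Reduce a list of marks to two summary measures: mean and range."""
    return {"mean": mean(values), "range": max(values) - min(values)}

summary = condense(marks)
print(summary)
```

Ten individual marks are replaced by two numbers that are far easier to compare across classes.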

Classification and tabulation are the two methods that are used to
condense the data. They help us to compare data collected from
different sources. Grand totals, measures of central tendency,
measures of dispersion, graphs and diagrams, the coefficient of
correlation, etc. provide ample scope for comparison.

If we have one group of data, we can make comparisons within it. If the rice
production (in tonnes) in Sheikhupura district is known, then we
can compare one region with another region within the district. Or if
the rice production (in tonnes) of two different districts within
Hafizabad is known, then also a comparative study can be made.
As statistics is an aggregate of facts and figures, comparison is
always possible, and in fact comparison helps us to understand the
data in a better way.

By the word forecasting, we mean to predict or to estimate
beforehand. Given the rainfall data of the last ten years
for a particular district in Punjab, it is possible to predict or
forecast the rainfall for the near future. In business also,
forecasting plays a dominant role in connection with production,
sales, profits, etc. The analysis of time series and regression
analysis play an important role in forecasting.
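
One simple way to forecast from ten years of data is to fit a straight-line trend by least squares and extend it one year ahead. The sketch below assumes a linear trend. The rainfall figures and the helper name `fit_line` are hypothetical, and real rainfall series rarely follow a straight line this neatly.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical annual rainfall (mm) for ten consecutive years.
years = list(range(1, 11))
rainfall = [610, 580, 640, 655, 600, 670, 690, 660, 705, 720]

slope, intercept = fit_line(years, rainfall)
forecast_year_11 = intercept + slope * 11  # extend the trend one year ahead
print(round(forecast_year_11, 1))
```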


One of the main objectives of statistics is to draw inferences about a
population from the analysis of a sample drawn from that
population. The four major branches of statistical inference are:

1. Estimation theory

2. Tests of Hypothesis

3. Non Parametric tests

4. Sequential analysis

In estimation theory, we estimate the unknown value of the population
parameter based on the sample observations. Suppose we are
given a sample of heights of one hundred students in a school. Based
upon the heights of these 100 students, it is possible to estimate
the average height of all students in that school.
Tests of Hypothesis:

A statistical hypothesis is some statement about the probability

distribution, characterising a population on the basis of the
information available from the sample observations. In the
formulation and testing of hypothesis, statistical methods are
extremely useful. Whether crop yield has increased because of
the use of new fertilizer or whether the new medicine is effective
in eliminating a particular disease are some examples of
statements of hypothesis and these are tested by proper
statistical tools.

Scope of Statistics

Statistics is not a mere device for collecting numerical data, but also
a means of developing sound techniques for handling and
analysing them and drawing valid inferences from them. Statistics is
applied in every sphere of human activity, social as well as
physical, such as Biology, Commerce, Education, Planning,
Business Management, Information Technology, etc. It is almost
impossible to find a single department of human activity where
statistics cannot be applied. We now discuss briefly the
applications of statistics in other disciplines.

Limitations of statistics:

Statistics, with all its wide application in every sphere of human
activity, has its own limitations. Some of them are given below:

• Statistics is not suitable to the study of qualitative

• Statistics does not study individuals

• Statistical laws are not exact

• Statistical tables may be misused

• Statistics is only one of the methods of studying a problem

Some More Definitions

Descriptive & Inferential Statistics

Descriptive Statistics is what most people think of when they hear the word
statistics. It consists of those methods which are used for collection,
presentation and description of data. These methods are used to analyze and
to display the information in graphical form for meaningful interpretation, e.g.
the average yield of wheat per unit area of a particular agricultural land, the number
of people in various income categories, the average runs scored by a
particular cricket player during a season, the percentage of registered voters
favoring a particular candidate, the percentage of students who passed an
examination, etc.

Inferential Statistics refers to the techniques of interpreting the values

resulting from the descriptive techniques and using them to make decisions. It
involves the theory of probability and consists of those methods for making
generalization, predictions or estimates about the population based on limited
information e.g. we might predict the wheat yield for the coming year on the
basis of growing trends over previous years.

Descriptive vs. Inferential

I) Descriptive: A tennis player wants to find his average score for the past 20
games.
Inferential: A tennis player wants to estimate his chance of winning an upcoming
tournament based on his current season average and the average of the competing
tennis players.

II) Descriptive: A politician wants to know the exact percentage of votes cast for him in
the last general election.
Inferential: Based on an opinion poll, a politician would like to estimate his chance for
re-election in the upcoming election.

III) Descriptive: Aamir wants to describe the variation in his four test scores in
statistics.
Inferential: Based on the first four test scores, Aamir would like to predict the
variation in his final statistics test scores.

IV) Descriptive: Mrs. Rashid wants to determine the average weekly amount she spent
on groceries in the past 3 months.
Inferential: Based on last year’s grocery bills, Mrs. Rashid would like to predict the
average amount she will spend on groceries for the upcoming year.

V) Descriptive: We would like to describe the changes in our average income over
the last 5 years.
Inferential: Based on the average income for the last 5 years, we would like to predict
the variation in the average income next year.


In a statistical enquiry, all the items, which fall within the purview of
enquiry, are known as Population or Universe. In other words, the
population is a complete set of all possible observations of the type
which is to be investigated. Total number of students studying in a
school or college, total number of books in a library, total number of
houses in a village or town are some examples of population.

Finite population and infinite population:

A population is said to be finite if it consists of a finite number of units.

The number of workers in a factory and the articles produced by a company on a particular
day are examples of finite populations. The total number
of units in a population is called the population size.

A population is said to be infinite if it has an infinite number of
units, for example the number of stars in the sky or the
number of people watching television programmes.


Statisticians use the word sample to describe a portion

chosen from the population. A finite subset of statistical
individuals defined in a population is called a sample. The
number of units in a sample is called the sample size.

Parameters and statistics:

We can describe samples and populations by using measures such as
the mean, median, mode and standard deviation. When these terms
describe the characteristics of a population, they are called parameters.
When they describe the characteristics of a sample, they are called
statistics. A parameter is a characteristic of a population and a statistic
is a characteristic of a sample. Since samples are subsets of the population,
statistics provide estimates of the parameters. That is, when the
parameters are unknown, they are estimated from the values of the
statistics.

In general, we use Greek or capital letters for population parameters
and lower case Roman letters to denote sample statistics. [N, µ, σ are
the standard symbols for the size, mean and standard deviation of a population; n, x̄, s are
the standard symbols for the size, mean and standard deviation of a sample, respectively.]
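
The distinction between parameters and statistics can be sketched in a few lines. The population of heights below is simulated and entirely hypothetical. The variable names follow the symbols N, µ, σ and n, x̄, s used above. Python's `statistics.pstdev` divides by N (population formula), while `statistics.stdev` divides by n − 1 (sample formula).

```python
import random
from statistics import mean, pstdev, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of N = 500 students.
population = [random.gauss(160, 8) for _ in range(500)]
N = len(population)
mu = mean(population)        # population mean (a parameter)
sigma = pstdev(population)   # population standard deviation (a parameter)

# A simple random sample of n = 50 units drawn without replacement.
sample = random.sample(population, 50)
n = len(sample)
x_bar = mean(sample)         # sample mean (a statistic), estimates mu
s = stdev(sample)            # sample standard deviation (a statistic), estimates sigma

print(f"N={N}, mu={mu:.2f}, sigma={sigma:.2f}")
print(f"n={n}, x_bar={x_bar:.2f}, s={s:.2f}")
```

In practice µ and σ are usually unknown, and only the second pair of values would be available.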
• Variables and Constants

• Quantitative Variable

• Qualitative Variable

• Continuous Variable

• Discrete Variable

Scales of Measurement

1. Nominal Scale

2. Ordinal Scale

3. Interval Scale

4. Ratio Scale

Nature of data

It may be noted that different types of data can be collected for different
purposes. The data can be collected in connection with time or
geographical location or in connection with time and location. The
following are the three types of data:

1. Time series data.

2. Spatial data

3. Spatio-temporal data.

Categories of data

Any statistical data can be classified under two categories

depending upon the sources utilized.

These categories are,

1. Primary data

2. Secondary data

Primary Data

Primary data are data collected by the
investigator himself for the purpose of a specific inquiry or study.
Such data are original in character and are generated by surveys

conducted by individuals or research institution or any


The primary data can be collected by the following five methods.

1. Direct personal interviews.

2. Indirect Oral interviews.

3. Information from correspondents.

4. Mailed questionnaire method.

5. Schedules sent through enumerators

Merits and Demerits of Primary Data
1. The collection of data by the method of personal survey is possible only if
the area covered by the investigator is small. Collection of data by sending
enumerators is bound to be expensive. Care should be taken that the
enumerators record correctly the information provided by the informants.

2. Collection of primary data by framing schedules or distributing and
collecting questionnaires by post is less expensive and can be completed in
a shorter time.

3. If the questions are embarrassing or complicated, or if the
questions probe into the personal affairs of individuals, then the schedules may
not be filled in with accurate and correct information, and hence this method is

4. The information collected as primary data is more reliable than that
collected from secondary data.
Secondary Data

Secondary data are those data which have already been collected and
analysed by some earlier agency for its own use, and later the same
data are used by a different agency. According to W.A. Neiswanger, ‘A
primary source is a publication in which the data are published by the
same authority which gathered and analysed them. A secondary source
is a publication, reporting the data which have been gathered by other
authorities and for which others are responsible’.

Sources of Secondary Data

1. Published sources

2. Unpublished sources.

Published Sources

• Reports and official publications

• Semi-official publications of various local bodies such as Municipal
Corporations and District Boards

• Private publications

Unpublished Sources

Not all statistical material is published. There are various
sources of unpublished data, such as records maintained by various
Government and private offices, and studies made by research institutions,
scholars, etc. Such sources can also be used where necessary.

Precautions in the use of Secondary Data

The following are some of the points that are to be considered in the use
of secondary data:

1. How the data has been collected and processed

2. The accuracy of the data

3. How far the data has been summarized

4. How comparable the data is with other tabulations

5. How to interpret the data, especially when figures collected for one
purpose are used for another.

Generally speaking, with secondary data, people have to compromise

between what they want and what they are able to find.

Merits and Demerits of Secondary Data
1. Secondary data is cheap to obtain. Many government publications are
relatively cheap and libraries stock quantities of secondary data produced by the
government, by companies and other organisations.

2. Large quantities of secondary data can be obtained through the internet.

3. Much of the secondary data available has been collected over many years and
therefore it can be used to plot trends.

Secondary data is of value to:

- The government: helps in making decisions and planning future policy.

- Business and industry: in areas such as marketing and sales, in order to
appreciate the general economic and social conditions and to provide
information on competitors.

- Research organisations: by providing social, economic and industrial


The collected data, also known as raw data or ungrouped data, are
always in an unorganised form and need to be organised and
presented in a meaningful and readily comprehensible form in order to
facilitate further statistical analysis. It is, therefore, essential for an
investigator to condense a mass of data into a more
comprehensible and assimilable form. The process of grouping data into
different classes or sub-classes according to some characteristics is
known as classification; tabulation is concerned with the systematic
arrangement and presentation of classified data. Thus classification is
the first step in tabulation.

Types of classification

Statistical data are classified in respect of their characteristics. Broadly

there are four basic types of classification namely:

a) Chronological classification

b) Geographical classification

c) Qualitative classification

d) Quantitative classification


Tabulation is the process of summarizing classified or grouped data in

the form of a table so that it is easily understood and an investigator is
quickly able to locate the desired information. A table is a systematic
arrangement of classified data in columns and rows.

Statistical Data
Data is the plural of datum, a piece of information. The value of the
response variable associated with one element of a population or sample is
known as a datum (or data in a singular sense). For example, Asif enrolled in
college at the age of 18, his hair is black, he is 5 feet 7 inches tall, and he
weighs 140 pounds. The set of values collected for the response
variable from each of the elements belonging to the sample is called data (or
data in a plural sense), for example, the set of 25 weights collected from
25 students.

Frequency Distribution
A frequency distribution is a method of classifying data into classes or
intervals in such a way that the number of observations in each class can be determined.
The number in a class is called the class frequency and is denoted by ‘f’.
This method provides a way of reviewing a set of numbers without actually
having to consider the individual numbers, and it can be very useful when
dealing with large amounts of data.
The procedure for constructing a frequency distribution for a given set of data
depends on the type of data involved, i.e. continuous, discrete or qualitative.
Construction of a Frequency Distribution
There are no hard and fast rules for constructing a frequency distribution; however,
some basic guidelines must be observed.
i) Appropriate number of classes in a frequency distribution
The number of classes, denoted by C, depends on the situation and the
amount of data. There are no hard and fast rules regarding the number of
classes to use, and the choice is arbitrary. It is generally accepted that the
number of classes should be between 5 and 20, depending on the amount of
data. A useful suggestion regarding the number of classes is given by
Sturges' rule. The rule is:
C = 3.3 log (n) + 1
where C denotes the number of classes, n is the number of observations and log is the logarithm to base 10.
For example, if there are 25 observations in a data set, then
C = 3.3 log (25) + 1 = 3.3 (1.3979) + 1 = 5.61 ≈ 6
ii) Find the lowest value and the highest value in the data.
iii) Find the range: the range is obtained by subtracting the lowest value
from the highest value: R = XL – XS, where XL is the largest and XS the smallest value.
iv) Divide the range by the number of classes to find the class width or
class interval h. In case of fractional results, the next higher whole
number is usually taken as the class interval.
v) Determine the value at which the lowest interval should begin. It
should ordinarily be a multiple of the class interval.

vi) Determine the remaining class-limits and class boundaries by adding

the class interval repeatedly. The lowest class should be placed at the
top and the rest should follow according to size. Sometimes, the
highest class is placed at the top.

vii) Using the tally system, enter the raw data in the appropriate class
intervals. It is customary for convenience in counting to place the first
four bars or strokes vertically and fifth one diagonally so as to have a
set of five. Sometimes for a smaller data set, the actual values can be
written against each class instead of tally bars.

viii) Convert each tally to a frequency (f).

ix) Finally, total the frequency column to see that all the data have been
accounted for.
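
Guidelines i) to iv) can be sketched as two small helper functions. The names `sturges_classes` and `class_width` are illustrative only. Rounding the result of Sturges' rule to the nearest whole number is an assumption, since the text rounds 5.61 to 6 without stating a rule.

```python
import math

def sturges_classes(n):
    """Number of classes C = 3.3 * log10(n) + 1, rounded to the nearest whole number."""
    return round(3.3 * math.log10(n) + 1)

def class_width(lowest, highest, num_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((highest - lowest) / num_classes)

print(sturges_classes(25))       # 25 observations, as in the example above
print(class_width(61, 153, 10))  # lowest 61, highest 153, 10 classes
```

Both functions give only a starting point, and the final choice of classes remains a matter of judgement.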
The following data give the index numbers of 100 commodities in a certain
year. Make a frequency distribution.

91 120 138 96 99 113 97 94 119 111

118 83 91 86 71 119 123 87 151 117
87 116 134 90 61 141 104 115 125 79
119 124 112 145 96 114 114 106 113 89
110 111 75 106 153 63 107 96 100 96
81 101 104 108 147 133 100 109 104 110
143 77 109 138 113 86 121 86 136 117
99 95 90 100 104 79 68 88 116 101
144 127 101 128 102 105 106 122 76 78
73 147 127 129 140 120 129 77 108 109

Step 1: We first find the range R. As the Maximum value is 153 and the
Minimum value is 61, the range is
R = XL – XS = 153 – 61 = 92
Step 2: We next decide the number of classes. Suppose we decide to take
C=10 classes. Then the class interval is

h = R / C = 92 / 10 = 9.2 ≈ 10
Typically, the value of R/C is rounded up to the next value determined
by the precision of measurement to produce a convenient value.
Step 3: Next we decide to locate the lower limit of the first class at 60. With this choice,
the class limits will be 60-69, 70-79, 80-89, ….
Step 4: To determine the frequency of each class we use either an entry table
(for a small data set) or a tally column. If a piece of data falls in a class,
we record a tally mark (l) in the tally column corresponding to that class.
The frequency distribution is then constructed as follows:
Class (Index)   Frequency   Class Boundaries   Midpoint
60-69               3       59.5-69.5            64.5
70-79               9       69.5-79.5            74.5
80-89               9       79.5-89.5            84.5
90-99              13       89.5-99.5            94.5
100-109            21       99.5-109.5          104.5
110-119            19       109.5-119.5         114.5
120-129            12       119.5-129.5         124.5
130-139             5       129.5-139.5         134.5
140-149             7       139.5-149.5         144.5
150-159             2       149.5-159.5         154.5
Total             100
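
The tallying in Step 4 can be checked with a short sketch that counts the 100 index numbers into the ten classes starting at 60 with width 10. The function name `frequency_table` is illustrative only, and integer-valued data is assumed.

```python
from collections import Counter

# The 100 index numbers from the example, row by row.
index_numbers = [
    91, 120, 138, 96, 99, 113, 97, 94, 119, 111,
    118, 83, 91, 86, 71, 119, 123, 87, 151, 117,
    87, 116, 134, 90, 61, 141, 104, 115, 125, 79,
    119, 124, 112, 145, 96, 114, 114, 106, 113, 89,
    110, 111, 75, 106, 153, 63, 107, 96, 100, 96,
    81, 101, 104, 108, 147, 133, 100, 109, 104, 110,
    143, 77, 109, 138, 113, 86, 121, 86, 136, 117,
    99, 95, 90, 100, 104, 79, 68, 88, 116, 101,
    144, 127, 101, 128, 102, 105, 106, 122, 76, 78,
    73, 147, 127, 129, 140, 120, 129, 77, 108, 109,
]

def frequency_table(values, start, width, num_classes):
    """Return (class label, frequency, midpoint) for each class of integer data."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width - 1
        table.append((f"{lower}-{upper}", counts.get(k, 0), (lower + upper) / 2))
    return table

for label, f, midpoint in frequency_table(index_numbers, 60, 10, 10):
    print(f"{label:>8} {f:>3} {midpoint:>6}")
```

Summing the frequency column should give back 100, which is the check described in guideline ix).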

The following data set represents the amounts of cash (in rupees) spent in a
particular day by 25 FAST students. Construct a grouped frequency table.
39.78 28.30 28.31 17.95 44.47
46.65 31.47 33.45 29.17 48.39
82.71 43.63 41.17 47.32 52.16
25.94 50.32 35.25 35.70 17.89
60.20 48.14 22.78 38.22 23.25
Class          f    Class Boundaries    Midpoint (X)
17.85-30.84    8    17.845-30.845       24.345
30.85-43.84    8    30.845-43.845       37.345
43.85-56.84    7    43.845-56.845       50.345
56.85-69.84    1    56.845-69.845       63.345
69.85-82.84    1    69.845-82.845       76.345
Relative Frequency: It is sometimes useful to express each value or class in
a frequency table as a fraction or a percentage of the total number of
measurements. The relative frequency for a measurement or class is found by
dividing the frequency, f, of the measurement by the total number of
measurements, n.

Cumulative Frequency: A cumulative frequency is the sum of the

frequencies for several consecutive classes of a frequency distribution.
Class Interval     Frequency   Midpoint   Relative Frequency   Cumulative Frequency
6.30–under 6.50        1         6.40          .025                   1
6.50–under 6.70        2         6.60          .050                   3
6.70–under 6.90        7         6.80          .175                  10
6.90–under 7.10       10         7.00          .250                  20
7.10–under 7.30       13         7.20          .325                  33
7.30–under 7.50        6         7.40          .150                  39
7.50–under 7.70        1         7.60          .025                  40
Total                 40                       1.00
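
The relative and cumulative frequency columns follow directly from the class frequencies. A minimal sketch using the frequencies from the table above:

```python
from itertools import accumulate

# Class frequencies from the table above (7 classes, 40 measurements).
frequencies = [1, 2, 7, 10, 13, 6, 1]

n = sum(frequencies)
relative = [round(f / n, 3) for f in frequencies]  # each frequency divided by n
cumulative = list(accumulate(frequencies))         # running total of the frequencies

for f, rel, cum in zip(frequencies, relative, cumulative):
    print(f"{f:>3} {rel:>6.3f} {cum:>3}")
```

The relative frequencies always sum to 1, and the last cumulative frequency always equals n.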
The following data represents the IQ-Score of 60 students. Make a frequency
distribution of IQ scores.

145 139 126 122 125 130 96 110

118 118 101 142 134 124 112 109
134 113 81 113 123 94 100 136
109 131 117 110 127 124 106 124
115 133 116 102 127 117 109 137
117 90 103 114 139 101 122 105
97 89 102 108 110 128 114 112
114 102 82 101

Diagram or Graph:
• A diagram or graph is a pictorial means for portraying and summarizing
data. No doubt tabulation is a good method of condensing and
summarizing data, but many people have no taste for numbers. They may
prefer a way of representation where figures can be avoided. Moreover, a
pictorial presentation of the data often makes certain features of the data
more apparent than a tabular presentation.

• In the media it is common to represent the data graphically and with the
use of computer graphics it is now further enhanced.

• Diagram refers to various types of devices such as bars, circles, pictorials

etc. Diagrammatic representation is suited to spatial series. The following
are the advantages and limitations of diagrammatic presentations.
Advantages
1. Beautifully and neatly constructed diagrams are more attractive,
impressive and appealing than simple
2. Diagrams are understood by almost everybody (even laymen).
3. Diagrams have a long lasting impression on the mind of the reader.
4. Comparison is made easier with diagrams.
5. Diagram presentations are universally used in all fields to represent
statistical data.
6. One can draw meaningful inferences from the diagrams in a short period of
time and with little labour.
7. Diagram presentations provide more information than data in a table.

Limitations
1. Diagrams show only approximate values.
2. It is difficult to read multi-dimensional diagrams.
3. The construction of a diagram is difficult compared to drawing a table.
4. It is not possible to analyse the diagram further.
5. Diagrams can supplement the tabular presentation but are not an alternative to it.
6. It is not possible to have minute readings from the diagram.
7. A wide gap between two figures is difficult to put on the diagram.

Rules for Constructing Diagrams:
• A proper scale must be chosen for the diagram. It must suit the space

• Every diagram must have a suitable heading showing the main facts of the
diagram. The diagram title should be self-explanatory.

• Diagram should be drawn neatly and accurately with the help of drawing

• An appropriate diagram should be drawn according to the demands of the data.

An inappropriate diagram may distort the facts and may be misleading.

• When more than one item is drawn in a diagram, an index key must be
given for identifying and understanding the diagram.
• The source of the data presented should be indicated at the bottom of the
diagram.
• Never try to overcrowd the diagram. Too much information presented in a
diagram may be confusing.
Types of Diagrams:

Different types of diagrams generally used for representing statistical data are.

• One Dimensional Diagrams – Simple Bar Charts, Multiple Bar Charts, Component Bar
Charts and Percentage Component Bar Charts

• Two Dimensional Diagrams – Rectangles, Squares

• Pie Diagrams – Circles and Sectors

One Dimensional Diagrams

In one dimensional diagrams the quantities are represented
by only one dimension, i.e. by the length of the bars; the width of
the bars is not taken into consideration.

Simple Bar Chart

This is one of the simplest forms of presentation of data. It can be drawn either
horizontally or vertically. It is used to represent data where each item
consists of a single component and the variation among the items is small. One
bar is drawn for each item. Generally the vertical scale represents the
frequencies / quantities in each category. The length (or the height) of each
bar indicates the frequency / size of the item / category it represents. The
width of the bar is not important; however, it must be the same for each item /
category. The gaps between the bars should be equal. The bars can be
shaded or coloured if desired.

A sample of 50 college students was taken who were planning to go to Punjab
University. Each of the students was asked which of the following masters
program he or she intended to choose: Statistics, Economics, Business,
Information Technology (IT), Arts and other. The responses of these students
are presented in table below. Construct a simple bar chart for this data.

Masters Program    f
Statistics          6
Economics          10
Business           12
IT                 15
Arts                3
Other               4

[Figure: simple bar chart of frequency (vertical axis) by Masters Program (horizontal axis)]

Draw a simple bar diagram to represent the Sales of a Company for 5 years.

Year             1997    1998    1999    2000    2001
Sales (Rupees)   75000   80000   90000   92000   95000

[Figure: simple bar chart of Sales (000 Rupees) by year, 1997 to 2001]
Multiple Bar Chart
When two or more sets of data with a common characteristic are to be
represented in the same diagram, a multiple bar diagram is drawn.
The following frequency table gives the sales of paper (1000 tons) in
Lahore for the last three years. Draw a multiple bar diagram to
represent the data.
Categories 2000 2001 2002
Newspaper 50 75 100
Books Printing 60 65 75
Wrapping 20 15 25
Special Variations 10 18 15
Others 40 45 40

[Figure: multiple bar chart of paper sales by category (Newspaper, Books Printing, Wrapping, Special Variations, Others) with three series]
Component or Sectional Bar Chart
In a component bar chart, a bar is drawn to represent the total frequency and
is then divided into components or sections whose lengths are
proportional to the frequencies of the categories they represent. This diagram
can also be drawn in percentage form, where each bar represents 100%;
it is then known as a percentage component bar chart.
Example: Draw the component bar chart for the following data.
Cities Total Males Females
Lahore 7 3.7 3.3
Karachi 10 5.5 4.5
Rawalpindi 4 2.2 1.8
Peshawar 4.5 2.5 2.0
Quetta 2.0 1.1 0.9

[Figure: component bar chart of males and females for Lahore, Karachi, Rawalpindi, Peshawar and Quetta]
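
For the percentage form of the chart, each component is expressed as a percentage of its own bar's total. A minimal sketch using the Males and Females figures from the table above (the function name `percentage_components` is illustrative only):

```python
# (males, females) for each city, taken from the table above.
cities = {
    "Lahore": (3.7, 3.3),
    "Karachi": (5.5, 4.5),
    "Rawalpindi": (2.2, 1.8),
    "Peshawar": (2.5, 2.0),
    "Quetta": (1.1, 0.9),
}

def percentage_components(parts):
    """Express each component as a percentage of the bar total, to one decimal."""
    total = sum(parts)
    return [round(100 * p / total, 1) for p in parts]

for city, parts in cities.items():
    males_pct, females_pct = percentage_components(parts)
    print(f"{city:<11} males {males_pct:>5}%  females {females_pct:>5}%")
```

In the percentage chart every bar has the same height, so only the split between components is compared.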
Pie Chart or Circular Diagrams
Pie charts are generally used for categorical or nominal data. They are used
to display parts of a total. The pie or circle is divided into sectors or pieces
whose areas are proportional to the frequencies of the categories they
represent. The sectors are shaded or coloured differently to show the
relationship of the parts to the whole. To construct the pie chart we must make the
angles of the sectors proportional to the frequencies. As a circle consists of 360°,
the angle for each category is computed by the formula:

Angle = (Component Part / Total) × 360°

Then the circle is divided into different sectors by constructing angles at the
center by means of a protractor.

Example: The following table gives the recipients of charitable giving. Draw a pie
chart to portray the results.

Recipient   Amount (in millions of rupees)
Religion 31.0
Arts and Humanities 4.1
Social Services 6.9
Education 9.0
Health 9.2
Other 4.7

The table below gives the calculations necessary for constructing the pie chart.
For example, the percentage for Education is (9.0 / 64.9) × 100 = 13.9.

Recipient             Amount   Percentage   Degrees
Religion              31.0     47.8         172.1 (47.8% of 360°)
Health                9.2      14.2         51.1
Education             9.0      13.9         50.0
Social Services       6.9      10.6         38.2
Arts and Humanities   4.1      6.3          22.7
Other                 4.7      7.2          25.9
Total                 64.9     100.0        360
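The percentage and degree columns can be reproduced with a short script. This is a sketch under one assumption: the degrees are computed from the percentage already rounded to one decimal place (which appears to be how the table was built). The function name `pie_angles` is hypothetical.

```python
# Amounts from the example above (millions of rupees).
amounts = {
    "Religion": 31.0,
    "Health": 9.2,
    "Education": 9.0,
    "Social Services": 6.9,
    "Arts and Humanities": 4.1,
    "Other": 4.7,
}

def pie_angles(data):
    """Map each category to (percentage, sector angle in degrees)."""
    total = sum(data.values())
    result = {}
    for name, amount in data.items():
        pct = round(100 * amount / total, 1)   # share of the whole
        degrees = round(pct * 3.6, 1)          # 1% of a circle is 3.6 degrees
        result[name] = (pct, degrees)
    return result

angles = pie_angles(amounts)
for name, (pct, degrees) in angles.items():
    print(f"{name:<20} {pct:5.1f}%  {degrees:6.1f} degrees")
```

The sector angles should add up to 360°, which is a quick check on the arithmetic before drawing.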
Stem and Leaf Plot
A stem and leaf plot is a method used to organize statistical data. The
greatest common place value of the data is used to form the stem. The next
greatest common place value is used to form the leaves.

EXAMPLE: Make a stem and leaf plot of the algebra test scores given below.

EXAMPLE: Make a stem and leaf plot of the entry test scores given below.

• What was the lowest score on the history test? 65

• What was the highest score on the history test? 95
• In which interval did most students score? 80 to 89
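A stem and leaf plot for two-digit scores can be sketched as follows. The scores below are hypothetical, invented purely for illustration (they are not the data from the slides), and the function name `stem_and_leaf` is likewise an assumption.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value by its tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

# Hypothetical scores, invented for illustration only.
scores = [65, 72, 78, 81, 83, 85, 88, 89, 91, 95]
plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The lowest and highest scores are read from the first and last leaves, and the stem with the most leaves shows the interval in which most students scored.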

Histogram
A histogram is a bar chart or graph showing the frequency of occurrence of
each value of the variable being analysed. In a histogram, data are plotted as a
series of rectangles. Class boundaries are shown on the X-axis and the
frequencies on the Y-axis. The height of each rectangle represents the
frequency of the class interval. Each rectangle adjoins the next so as
to give a continuous picture.
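The heights of the rectangles are simply the class frequencies. A minimal sketch of the tally, assuming equal-width classes; the data values and the helper name `class_frequencies` are hypothetical and used only for illustration.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count the values falling in each class [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

# Hypothetical data, invented for illustration only.
data = [12, 15, 21, 22, 25, 27, 31, 33, 34, 38, 41, 47]
counts = class_frequencies(data, lower=10, width=10, n_classes=4)
for k, f in enumerate(counts):
    start = 10 + 10 * k
    print(f"{start}-{start + 10}: {'#' * f} ({f})")
```

Each printed row corresponds to one rectangle: the class boundaries give its base and the count gives its height.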

Example: Draw a histogram for the following data.

Frequency Polygon
If we mark the midpoints of the top horizontal sides of the rectangles in a
histogram and join them by straight lines, the figure so formed is called a
Frequency Polygon. This is done under the assumption that the frequencies
in a class interval are evenly distributed throughout the class. The area of the
polygon is equal to the area of the histogram, because the area left outside is
just equal to the area included in it.
Example: Draw a frequency polygon for the following data.


Measures of Central Tendency
The first characteristic of a set of data we want to measure is the center or
central tendency. It locates, in some sense, the middle of a set of data.
Its purpose is to summarize the data set to obtain a general overview, which will
serve as a representative of the data. The term average is generally
associated with the measures of central tendency. Since this central value is
useful in locating a frequency distribution, the measures of central tendency
are also known as the Measures of Location.

Desirable Qualities of a Good Average

• An average should be properly defined.
• An average should be based on all the data values.
• An average should be easy to understand.
• An average should be easy to calculate and interpret.
• An average should be least affected by fluctuations of sampling.
• An average should not be unduly influenced by extreme values.
• An average should be capable of further mathematical operations.
Types of Averages
The most common types of averages are:
• The Arithmetic Mean or Simply Mean
• The Geometric Mean
• The Harmonic Mean
• The Median
• The Mode
• Trimmed Mean

The Arithmetic Mean

The arithmetic mean is what most people think of when the word average is
used. The arithmetic mean or simply mean is defined as “the value which is
obtained by dividing the sum of all the values in a set by their number”. Mean
can be calculated for both samples and populations. They are computed in the
same way, but denoted differently. The mean of a sample is represented by
the symbol $\bar{X}$ (read “X bar”), and in symbols can be expressed as

\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\sum X}{n} \]

Example: The annual amounts (in billions of rupees) of Pakistan's industrial exports
from 1985 to 1994 are 19.9, 21.9, 25.0, 23.6, 30.4, 33.7, 41.2, 45.3, 38.1,
and 39.3. Determine the mean $\bar{X}$ for this sample.
The sum of the ten measurements is
\[ \sum X = 19.9 + 21.9 + \dots + 39.3 = 318.4 \]
The sample mean is
\[ \bar{X} = \frac{\sum X}{n} = \frac{318.4}{10} = 31.84 \text{ billion rupees} \]
Thus, the average amount of industrial exports over the 10 year period is
31.84 billion rupees.
Example: The reaction times of an individual to certain stimuli were measured
by a psychologist to be 0.53, 0.46, 0.50, 0.49, 0.52, 0.53, 0.44 and 0.55
seconds respectively. Determine the mean reaction time of the individual to
the stimuli.
The sum of the eight measurements is
\[ \sum X = 0.53 + 0.46 + \dots + 0.55 = 4.02 \]
The sample mean is
\[ \bar{X} = \frac{\sum X}{n} = \frac{4.02}{8} = 0.5025 \text{ seconds} \]
Hence, the mean reaction time of the individual to the stimuli is 0.5025
seconds.
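Both worked examples can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is an assumption made here for illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Data from the two examples above.
exports = [19.9, 21.9, 25.0, 23.6, 30.4, 33.7, 41.2, 45.3, 38.1, 39.3]
reaction_times = [0.53, 0.46, 0.50, 0.49, 0.52, 0.53, 0.44, 0.55]

print(round(arithmetic_mean(exports), 2))         # billions of rupees
print(round(arithmetic_mean(reaction_times), 4))  # seconds
```

The results are rounded only for display, since floating-point sums of decimals are not exact.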
Properties of Arithmetic Mean:
• The arithmetic mean can always be computed for a data set.
• A set of data has only one arithmetic mean. Therefore it is known as a
unique value.
• The sum of the deviations of the numbers in a set from their mean is zero, i.e. $\sum (X_i - \bar{X}) = 0$.
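The zero-sum property of the deviations follows directly from the definition of the mean. A short derivation in the sample notation, using only the fact that $\sum X_i = n\bar{X}$:

```latex
\[
\sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0
\]
```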


Merits of Arithmetic Mean
1. It is rigidly defined.
2. It is easy to understand and easy to calculate.
3. If the number of items is sufficiently large, it is more accurate and more reliable.
4. It is a calculated value and is not based on its position in the series.
5. It is possible to calculate even if some of the details of the data are lacking.
6. Of all averages, it is affected least by fluctuations of sampling.
7. It provides a good basis for comparison.

Demerits of Arithmetic Mean
1. It cannot be obtained by inspection nor located through a frequency graph.
2. It cannot be used in the study of qualitative phenomena not capable of numerical
measurement, e.g. intelligence, beauty, honesty etc.
3. It can ignore any single item only at the risk of losing its accuracy.
4. It is affected very much by extreme values.
5. It cannot be calculated for open-end classes.
6. It may lead to fallacious conclusions, if the details of the data from which it
is computed are not given.

The Median
The median is that value of the variate which divides the group into two equal
parts, one part comprising all values greater, and the other, all values less
than the median.
First arrange the data in ascending or descending order, then compute the
median by using the following formulas:

\[ \text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation, when } n \text{ is odd} \]

\[ \text{Median} = \frac{1}{2}\left[\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}\right], \text{ when } n \text{ is even} \]

Example: A sample of 9 students was given a statistics test. Find the median
for these test scores:
95 86 78 90 62 73 89 76 69
Solution: We must first arrange the scores in ascending order:
Array: 62 69 73 76 78 86 89 90 95
Here n = 9 (odd), so

\[ \text{Median} = \left(\frac{9+1}{2}\right)^{\text{th}} \text{ observation} = 5^{\text{th}} \text{ observation} = 78 \]
Example: The following IQ scores were observed for a sample of 10 school
children: 112, 109, 102, 93, 89, 111, 105, 95, 104, and 103. Compute the
median.

Solution: We first arrange the scores in ascending order:

Array: 89 93 95 102 103 104 105 109 111 112

Here n = 10 (even), so

\[ \text{Median} = \frac{1}{2}\left[\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}\right] \]
\[ = \frac{1}{2}\left[\left(\frac{10}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{10}{2}+1\right)^{\text{th}} \text{ observation}\right] \]
\[ = \frac{1}{2}\left[5^{\text{th}} \text{ observation} + 6^{\text{th}} \text{ observation}\right] \]
\[ = \frac{1}{2}(103 + 104) = 103.5 \]
Merits of Median
1. Median is not influenced by extreme values because it is a positional
average.
2. Median can be calculated in case of distribution with open-end intervals.
3. Median can be located even if the data are incomplete.
4. Median can be located even for qualitative factors such as ability, honesty etc.

Demerits of Median
1. A slight change in the series may bring drastic change in median value.
2. In case of even number of items or continuous series, median is an
estimated value other than any value in the series.
3. It is not suitable for further mathematical treatment except its use in mean
deviation.
4. It does not take into account all the observations.


Measures of Dispersion
The measures of central tendency, i.e. mean, median, mode etc., condense
the series or frequency distribution into a single figure which is used to
describe a distribution. It is quite possible that several series have the
same average while their individual observations differ widely from that
average. In such cases the average may not be typical or representative of
the data. We therefore require more information about the
spread of the data about the average. A quantity that measures this
characteristic is called a measure of dispersion, spread or variability.

The degree of scatter or variation of the numerical data about the center (i.e.
mean or median) is known as dispersion. Hence dispersion measures the extent to
which the individual values vary from the average value.
Types of Dispersion
There are two types of dispersion, viz. absolute dispersion and relative
dispersion.

1. Absolute Dispersion
If we measure the dispersion and express it in the units of the original data, it is
known as absolute dispersion. For example, if the average salary of 6 individuals is
Rs.7000 and the average dispersion of the salaries from this average is Rs.1290, then
Rs.1290 is called the absolute dispersion.

2. Relative Dispersion
If the objective is to compare the dispersion of two or more series, absolute
dispersion cannot be used. For example, if we have the heights and weights of
students in centimeters and kilograms respectively, we cannot compare the
variability of the two series through an absolute measure of dispersion, as
they are in different units. Hence the dispersion of each series has to be
expressed as a ratio or percentage of its average, and such a measure is known
as a relative measure of dispersion. In short, relative measures of dispersion
are used for comparing two or more series, whereas absolute measures of
dispersion are calculated for a single series.
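A relative measure can be sketched by dividing an absolute measure by the average and expressing the result as a percentage. The salary figures are those quoted above; the height and weight figures are hypothetical, invented only to show a comparison across units, and the function name `relative_dispersion` is an assumption. When the absolute measure is the standard deviation, this ratio is commonly called the coefficient of variation.

```python
def relative_dispersion(absolute_dispersion, average):
    """Absolute dispersion as a percentage of the average (unit-free)."""
    return 100 * absolute_dispersion / average

# Salary example from the text: dispersion Rs.1290 about an average of Rs.7000.
salary_rel = relative_dispersion(1290, 7000)

# Hypothetical figures, invented for illustration only.
height_rel = relative_dispersion(8.0, 160.0)   # centimeters
weight_rel = relative_dispersion(5.5, 55.0)    # kilograms

print(round(salary_rel, 2), round(height_rel, 1), round(weight_rel, 1))
```

Because the units cancel, the height and weight percentages can be compared directly, which the absolute figures in centimeters and kilograms cannot.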
Measures of Absolute Dispersion
Following are the various measures of absolute dispersion:
• Range
• Semi-Inter quartile range or Quartile Deviation
• Mean Deviation
• Variance/Standard deviation
Range
The range R is defined as the difference between the two extreme values, i.e.
the largest and the smallest values of the distribution. In symbols:
Range = XL – XS
where XL stands for the largest value, and
XS stands for the smallest value.
Example: The following marks were obtained by 15 students. Find the range.
51, 90, 40, 25, 7, 14, 28, 72, 44, 23, 65, 85, 3, 59, 67
Here XL (the largest value) = 90, and
XS (the smallest value) = 3
Range = XL – XS = 90 – 3 = 87 marks
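The range calculation for the marks example can be sketched as follows; the function name `value_range` is an assumption made here for illustration.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Marks of the 15 students from the example above.
marks = [51, 90, 40, 25, 7, 14, 28, 72, 44, 23, 65, 85, 3, 59, 67]

print(max(marks), min(marks), value_range(marks))
```

Only two of the fifteen values enter the result, which is why the range is quick to compute but sensitive to a single unusual observation.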

Advantages of Range

• It is relatively simple to determine the range, even for a large data set.
• It is easy to understand.
• It gives a rough and quick picture of the variability of the data.

Limitations of the Range

• It takes into consideration only the two extreme values in a set and does
not tell us anything at all about the other values in the set.
• It is a highly unstable measure because it is based on only two extreme
values.
• It cannot be computed in case of open-end classes.

In spite of these limitations, the range is widely used for daily temperatures,
quality control charts, the share market, precious metal markets etc.
The Variance and the Standard Deviation
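Variance is defined as the mean of the squares of the deviations taken
from the arithmetic mean. When it is computed from the population,
the variance is denoted by $\sigma^2$, while computed from the sample it is
denoted by $S^2$. In symbols

\[ \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \quad \text{(for Population Data)} \]

\[ S^2 = \frac{\sum (X_i - \bar{X})^2}{n} \quad \text{(for Sample Data)} \]

An equivalent computational form for sample data is

\[ S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 \]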
Standard Deviation
It is the most important measure of dispersion and is widely used. Standard
deviation is defined as “the positive square root of the arithmetic mean of
the squares of the deviations of the observations from their mean” or
simply as the positive square root of the variance. When it is computed from
the population, the standard deviation is denoted by $\sigma$, while from the
sample it is denoted by $S$. In symbols

\[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \quad \text{(for Population Data)} \]

\[ S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}} \quad \text{(for Sample Data)} \]

An equivalent computational form for sample data is

\[ S = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} \]
Advantages and disadvantages of Standard Deviation
Advantages
• It is simple to understand and easy to calculate
• It is rigorously defined and always gives a definite value
• It is based on all the observations
• It is suitable for further algebraic treatment
• It is less affected by the fluctuations of sampling and is a stable measure
• It is the basis for measuring the correlation coefficient, sampling and
statistical inference
• It is used to compare the variability of two or more distributions

Disadvantages
• It gives more weight to extreme values, because the values are squared
• It is affected by a change in any item in the series, as it is based on all the
observations
• It is not a popular measure with economists, where most of the data are
positively skewed
Example: A random sample of 10 automobile parts companies gave the
following information about profit (in thousands of rupees):
24 15 9 7 11
19 20 5 29 15
Find the variance and standard deviation.

Direct Method
Step 1. Find the mean of the series, $\bar{X}$.
Step 2. Calculate the deviation of each value from the mean, i.e. $X_i - \bar{X}$.
Step 3. Square the deviations computed in Step 2 and add them to get $\sum (X_i - \bar{X})^2$.
Step 4. Divide this sum, i.e. $\sum (X_i - \bar{X})^2$, by the number of observations to obtain the variance.
Step 5. Take the positive square root of the variance to obtain the standard deviation.

X_i     X_i − X̄    (X_i − X̄)²
24      8.6        73.96
15      -0.4       0.16
9       -6.4       40.96
7       -8.4       70.56
11      -4.4       19.36
19      3.6        12.96
20      4.6        21.16
5       -10.4      108.16
29      13.6       184.96
15      -0.4       0.16
Total   0          532.4

\[ \bar{X} = \frac{\sum X_i}{n} = \frac{154}{10} = 15.4 \]

\[ S^2 = \frac{\sum (X_i - \bar{X})^2}{n} = \frac{532.4}{10} = 53.24 \]

\[ S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}} = \sqrt{53.24} = 7.30 \text{ (thousand rupees)} \]
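Steps 1 to 5 of the direct method translate almost line for line into Python. This is a sketch that follows the slides' convention of dividing by n (not n − 1); the variable names are chosen here for illustration.

```python
from math import sqrt

# Profits of the 10 companies (thousands of rupees), from the example above.
profits = [24, 15, 9, 7, 11, 19, 20, 5, 29, 15]
n = len(profits)

mean = sum(profits) / n                      # Step 1: mean of the series
deviations = [x - mean for x in profits]     # Step 2: deviations from the mean
squared = [d ** 2 for d in deviations]       # Step 3: squared deviations
variance = sum(squared) / n                  # Step 4: divide the sum by n
std_dev = sqrt(variance)                     # Step 5: positive square root

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

The deviations sum to zero (up to floating-point error), which is a useful check on Step 2 before squaring.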
Example: The following table gives the weights of 9 students in a statistics
class. Calculate the Variance and Standard Deviation:
Weights (kg): 45, 52, 56, 67, 59, 70, 40, 58, 67.

X       X²
45      2025
52      2704
56      3136
67      4489
59      3481
70      4900
40      1600
58      3364
67      4489
ΣX = 514    ΣX² = 30188

\[ S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{30188}{9} - \left(\frac{514}{9}\right)^2 \]
\[ = 3354.2222 - (57.1111)^2 = 3354.2222 - 3261.6790 = 92.5432 \]

\[ S = \sqrt{S^2} = \sqrt{92.5432} = 9.62 \text{ kg} \]
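The computational formula used in this example needs only the two column totals. A minimal sketch, again dividing by n as the slides do; variable names are chosen here for illustration. With full precision the variance comes to about 92.54 and the standard deviation to about 9.62 kg; rounding the mean (for instance to 57.11) before squaring shifts the last digits slightly, so hand calculations may differ in the second decimal place.

```python
from math import sqrt

# Weights of the 9 students (kg), from the example above.
weights = [45, 52, 56, 67, 59, 70, 40, 58, 67]
n = len(weights)

sum_x = sum(weights)                    # sum of X
sum_x2 = sum(w * w for w in weights)    # sum of X squared

variance = sum_x2 / n - (sum_x / n) ** 2
std_dev = sqrt(variance)

print(sum_x, sum_x2, round(variance, 2), round(std_dev, 2))
```

Subtracting two large, nearly equal quantities magnifies any rounding in the intermediate steps, so it is safer to round only the final answer.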


