Introduction To Statistics

In the modern world of computers and information technology, the importance of statistics is
well recognized in all disciplines. Statistics originated as a science of the state and
gradually found applications in agriculture, economics, commerce, biology,
medicine, industry, planning, education and so on.
Today there is hardly any walk of human life where statistics cannot be applied.
Origin and Growth of Statistics:
The words ‘Statistics’ and ‘Statistical’ are both derived from the Latin word status, meaning a
political state. The theory of statistics as a distinct branch of scientific method is of
comparatively recent growth. Research, particularly into the mathematical theory of statistics, is
proceeding rapidly, and fresh discoveries are being made all over the world.
Definitions of Statistics:
Statistics has been defined differently by different authors over time.
• Definitions by A.L. Bowley:
• One of Bowley's definitions is ‘Statistics may be called the science of counting’. This is
clearly an incomplete definition, as it takes into account only the collection of data and
ignores other aspects such as analysis, presentation and interpretation.
• Bowley gives another definition, which states that ‘statistics may rightly be called the
science of averages’. This definition is also incomplete: averages play an important role in
understanding and comparing data, but statistics provides many other measures as well.

• Definition by Croxton and Cowden:

• Statistics may be defined as the science of collection, presentation, analysis and
interpretation of numerical data. Of the definitions above, Croxton and Cowden's is
clearly the most scientific and realistic.

Scope of Statistics:
• Statistics is applied in every sphere of human activity, social as well as physical, such as
biology, commerce, education, planning, business management and
information technology. It is almost impossible to find a single area
of human activity where statistics cannot be applied.
• Statistics and Industry:
In industry, control charts are widely used to maintain a certain quality level. In
production engineering, statistical tools such as inspection plans and control charts are of
great importance in finding whether a product conforms to specifications. Inspection plans
rely on sampling, a very important aspect of statistics.
• Statistics and Commerce:
No businessman can afford either understocking or overstocking of his goods.
He first estimates the demand for his goods and then adjusts his output or purchases
accordingly. Thus statistics is indispensable in business and commerce.
• Statistics and Agriculture:
Analysis of variance (ANOVA), one of the statistical tools developed by Professor R.A.
Fisher, plays a prominent role in agricultural experiments. In tests of significance based
on small samples, statistical tests can be used to determine whether the difference between
two sample means is significant.
• Statistics and Economics:
Statistical data and techniques of statistical analysis have proved immensely useful in solving a variety of
economic problems, such as those concerning wages, prices, time series analysis and demand analysis. They have also
facilitated the development of economic theory. Wide applications of mathematical statistics in the study of
economics have led to the development of new disciplines called Economic Statistics and Econometrics.
• Statistics and Education:
Statistics is widely used in education. Research has become a common feature in all branches of
activity. Statistics is necessary for formulating policies on starting new courses, for considering
the facilities available for new courses, etc.

• Statistics and Planning:

Statistics is indispensable in planning. In the modern world, which can be termed the
“world of planning”, almost all government organizations seek the help
of planning for efficient working, for the formulation of policy decisions and for their execution.
• Statistics and Medicine:
In the medical sciences, statistical tools are widely used. To test the efficacy of a
new drug or medicine, a t-test is used; to compare the efficacy of two drugs or two
medicines, the two-sample t-test is used. Statistics is increasingly being applied
in clinical investigations.
Definition and classification of statistics
• Statistics is a collection of numerical facts and data.
• Statistics is a mathematical science dealing with the methods of collecting, organizing,
presenting, analyzing and interpreting data.
• Statistics is a subject that deals with numbers and figures describing certain situations. It
primarily deals with numerical data obtained from surveys and summarizes these data in such a
way that the summary gives a good indication of the nature of the data.
Statistics, in its singular sense, is a subject area or field of study. It is defined as the science that
deals with the collection, processing, analysis, interpretation and presentation of numerical facts.
Statistics as a subject is not a new discipline; it is as old as human society
itself. In the past, however, the sphere of its utility was very restricted.
Classification: Statistics is broadly divided into two categories based on how the collected data are used.
1. Descriptive Statistics
• Descriptive statistics is concerned with collecting data and describing their important
features.
• It consists of the collection, organization, summarization and presentation of data.

2. Inferential Statistics
• Inferential statistics deals with making inferences and/or drawing conclusions about a population
based on data obtained from a limited sample of observations.
• Inferential statistics is used to make predictions or comparisons about a larger group (a
population) using information gathered about a small part of that population.
• There are two main methods used in inferential statistics: estimation and hypothesis testing.
1.1 Definition of some basic terms
Population: A population is the set of all objects we wish to study.
A population may be finite or infinite.
If a population consists of a fixed number of values, the population is said to be
finite; otherwise, it is infinite.
Sample: A sample is the part of the population we study to learn about the population.
A sample should be representative of the population.
Example 1: In a certain study, 900 men were selected from Oromia Region. It was found that 25
of them were smokers.
(a) What is the population in this study?
(b) What is the sample size?
Solution:
(a) The population is all men in Oromia Region.
(b) The sample size is 900.
Example 2
A finite population includes the following:
(a) Students studying Business Administration at the Methodist University.
(b) All football clubs in the first and second divisions in Ghana.
(c) All households in Ethiopia.
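Example 3
An infinite population includes the following:
(a) The set of real numbers between two integers.
(b) All fish in the River Volta (too many to count, so treated as infinite in practice).
(c) All palm trees in West Africa (likewise treated as infinite in practice).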
Sample survey: The technique of collecting information from a portion of the population.
Census survey: A survey that includes every member of the population.
Parameter: A parameter is a descriptive measure of a population, i.e. a summary value calculated from a
population. Examples: the average, range or variance of the population.
Statistic: A statistic is a descriptive measure of a sample, i.e. a summary value calculated from a sample.
Examples: the average, range or variance of the sample.
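To make the parameter/statistic distinction concrete, here is a minimal Python sketch using the standard `statistics` module. The population and sample values are hypothetical, chosen only for illustration. Note the assumption that the sample variance uses the usual n - 1 divisor (`statistics.variance`), while the population variance divides by N (`statistics.pvariance`); these notes do not discuss that convention themselves.

```python
import statistics

# Hypothetical population of 8 values (illustration only, not from the text).
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Hypothetical sample: 4 of the population values, picked by hand so the example is repeatable.
sample = [4, 5, 7, 9]

# Parameters: descriptive measures computed from the whole population.
pop_mean = statistics.mean(population)
pop_range = max(population) - min(population)
pop_variance = statistics.pvariance(population)   # divides by N

# Statistics: the same kinds of measures computed from the sample only.
sample_mean = statistics.mean(sample)
sample_range = max(sample) - min(sample)
sample_variance = statistics.variance(sample)     # divides by n - 1

print("parameters:", pop_mean, pop_range, pop_variance)
print("statistics:", sample_mean, sample_range, round(sample_variance, 3))
```

The sample mean (6.25) differs from the population mean (5); in inferential statistics the statistic is used to estimate the unknown parameter.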
1.2 Stages in Statistical Investigation
We have defined statistics, in the singular sense, as a science that deals with the collection, organization
(classification), presentation, analysis and interpretation of numerical facts. So we consider the
following stages of statistical investigation:
Data Collection: This is a stage where we gather information for our purpose.
Data Organization: This is the stage where we edit our data. A large mass of figures
collected from surveys frequently needs organization, since the collected data may contain irrelevant
figures, incorrect facts, omissions and mistakes.
Data Presentation: The organized data can now be presented in the form of tables, charts,
diagrams and graphs. At this stage, large data sets are presented in a summarized and condensed form.
Data Analysis: This is the stage where we critically study the data. The purpose of data analysis
is to dig out information useful for decision making.
Data Interpretation: This is the stage where we draw valid conclusions from the results obtained
through data analysis. If the analyzed data are not properly interpreted, the whole
purpose of the investigation may be defeated and misleading conclusions may be drawn.
1.3 Applications and limitations of statistics
Uses of statistics
The science of statistics is essential for research and decision-making processes in all
aspects of human life. The following are some of the purposes for which statistical analysis is used:
• To represent facts in the form of numerical data.
• To summarize a mass of data into a few presentable, understandable and precise figures.
• To predict or forecast future trends.
• To help select a course of action among a number of alternatives.
• To help in formulating policies.
Limitations of Statistics
a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty
and standard of living.
b) It does not study a single individual but deals with aggregates of facts. Example: the
population size of a country for some given year, taken alone, does not help us make comparisons.
c) Statistical laws are not exact: It is well known that the laws of the mathematical and physical
sciences are exact, but statistical laws are only approximations.
Statistical conclusions are not universally true; they are true only on average.
d) It is liable to misuse: Statistics must be used only by experts; otherwise, statistical
methods are the most dangerous tools in the hands of the inexpert. The use of statistical
tools by inexperienced and untrained persons might lead to wrong conclusions.
Statistics can easily be misused by quoting wrong figures. As King aptly says,
‘statistics are like clay of which one can make a God or Devil as one pleases’.
1.4 Types of variables and measurement scales
A variable is a characteristic of an object that can have different possible values.
There are two types of variables.
a) Quantitative variables: variables that can be quantified, i.e. that take numerical
values. Examples: height, area, income, temperature, etc.
b) Qualitative variables: variables that cannot be quantified directly. Examples: beauty,
gender, location. Qualitative variables are also called categorical variables. Hence we
have two types of data: quantitative and qualitative data.
• Quantitative variables can be further classified as
• Discrete variables, and
• Continuous variables
a) Discrete variables are variables whose values are counts.
Examples: number of students, household (family) size, number of pages of
a book.
b) Continuous variables are variables that can take any value within an interval.
Examples: weight, length, volume, etc.
1.5 Measurement scales
There are four types of measurement scales for variables:
1. Nominal scale: “Nominal” comes from the Latin word for “name”.
• This is a scale for grouping individuals into different categories.
• This scale of measurement applies to qualitative variables only.
• On the nominal scale, the categories have no natural order.
• We cannot do arithmetic operations on data measured on the
nominal scale.
Example: colour, gender, short/tall, pass/fail, etc.
2. Ordinal scale: “Ordinal” comes from the Latin word for “order”.
• This scale also applies to qualitative data.
• It is a scale for grouping individuals into different categories and ordering those categories.
• In this case one category is lower than the next one, or vice versa.
• Ordinal scale data contain and convey more information than nominal scale data.
• We cannot do arithmetic operations on data measured on the ordinal scale.
Examples: military ranks, ranks in a race, ranks of college academic staff, etc.
3. Interval scale:
• This scale of measurement applies to quantitative data only.
• There is no true zero point (the zero point is arbitrary).
• There is no physical significance to the zero point.
• In this scale, the zero point does not indicate a total absence of the quantity
being measured.
Example: °C and °F (units of temperature)
• It is possible to add and subtract, but multiplication and division are not meaningful:
37 °C - 35 °C = 2 °C
45 °C - 43 °C = 2 °C
40 °C = 2 × (20 °C), but this does not imply that an object at 40 °C is twice as hot as an
object at 20 °C.

• Interval scale data convey better information than nominal and ordinal scale data.
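A short Python sketch can show why ratios are not meaningful on an interval scale: converting the two temperatures to Fahrenheit with the standard formula F = 9/5 × C + 32 changes their ratio, whereas differences only change by the constant factor 9/5. The function name `celsius_to_fahrenheit` is our own, for illustration.

```python
import math

def celsius_to_fahrenheit(c: float) -> float:
    """Standard conversion F = 9/5 * C + 32; the two scales have different zero points."""
    return c * 9 / 5 + 32

hot_c, cool_c = 40.0, 20.0
ratio_c = hot_c / cool_c                                                # 2.0 on the Celsius scale
ratio_f = celsius_to_fahrenheit(hot_c) / celsius_to_fahrenheit(cool_c)  # 104 / 68 on the Fahrenheit scale

# Differences are preserved up to the constant factor 9/5, so they remain meaningful.
diff_c = 37.0 - 35.0
diff_f = celsius_to_fahrenheit(37.0) - celsius_to_fahrenheit(35.0)

print(f"ratio in C: {ratio_c:.3f}, ratio in F: {ratio_f:.3f}")
print("same ratio on both scales?", math.isclose(ratio_c, ratio_f))
print(f"difference in C: {diff_c:.1f}, in F: {diff_f:.1f}")
```

The "twice as hot" ratio disappears after a change of units, which is exactly why the zero point of an interval scale carries no physical meaning.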
4. Ratio scale: a measurement scale in which
• There is a constant interval size between any adjacent units on the measurement scale.
• There exists a zero point on the measurement scale, and this zero point has physical
significance.
• This scale of measurement also applies to quantitative data only.
Examples: height, weight, volume, etc.
• We can say that one value is larger (taller, better) or smaller than another by a certain
amount and also by a certain number of times.
• We can do arithmetic operations on data measured on the ratio scale.
This measurement scale provides better information than the interval scale of measurement.
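By contrast, a ratio-scale measurement keeps its ratios under a change of units because its zero is a true zero. The sketch below assumes the approximate factor 1 kg ≈ 2.20462 lb; the names `KG_TO_LB` and `kg_to_lb` are illustrative only.

```python
KG_TO_LB = 2.20462  # approximate number of pounds in one kilogram

def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds; 0 kg maps to 0 lb because both scales share a true zero."""
    return kg * KG_TO_LB

heavy_kg, light_kg = 80.0, 40.0
ratio_kg = heavy_kg / light_kg                      # 2.0
ratio_lb = kg_to_lb(heavy_kg) / kg_to_lb(light_kg)  # still 2.0 (up to rounding)

print(f"ratio in kg: {ratio_kg:.3f}, ratio in lb: {ratio_lb:.3f}")
print("zero maps to zero:", kg_to_lb(0.0) == 0.0)
```

Here "80 kg is twice as heavy as 40 kg" stays true in pounds, which is what makes multiplication and division meaningful on the ratio scale.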

Figure: Classification of variables. Quantitative variables are either discrete or continuous; qualitative variables are measured using the nominal and ordinal scales of measurement.
1.6 Sources of data and methods of data collection
Not every aggregate of numbers can be called statistical data. An aggregate of numbers is
statistical data only when the numbers are
• Comparable,
• Meaningful, and
• Collected for a well-defined objective.
Nature of Data:
Different types of data can be collected for different purposes. The data can
be collected in connection with time, with geographical location, or with both time and
location. The following are the three types of data:
1. Time series data: data collected over a period of time.
- This type of data may be collected either at regular intervals of time
or at irregular intervals of time.
2. Spatial data: If the data collected are connected with a place, they are termed spatial
data.
3. Spatio-temporal data: If the data collected are connected with both time and place, they are
known as spatio-temporal data.
Raw data: data that have been collected but not yet organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: an arrangement of raw numerical data in ascending or descending order of
magnitude.
• It enables us to find the range of the data set easily, and it also gives us some idea
about the general characteristics of the distribution.
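Forming an array from the raw data example above only needs a sort; this minimal Python sketch also reads off the range as the last element minus the first.

```python
raw_data = [25, 10, 32, 18, 6, 93, 4]  # raw data from the example above

ascending = sorted(raw_data)                 # array in ascending order of magnitude
descending = sorted(raw_data, reverse=True)  # array in descending order of magnitude
data_range = ascending[-1] - ascending[0]    # largest value minus smallest value

print(ascending)   # [4, 6, 10, 18, 25, 32, 93]
print(descending)  # [93, 32, 25, 18, 10, 6, 4]
print(data_range)  # 89
```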
Categories of data based on their sources
Primary source:
A source of data that supplies first-hand information for the immediate purpose of the study.
Primary data are data collected by the investigator himself for the purpose of a
specific inquiry or study. Such data are original in character and are generated by surveys conducted
by individuals, research institutions or other organizations.
• Primary data are more expensive than secondary data.
Secondary source:
Secondary data are those data which have been already collected and analysed by some earlier
agency for its own use; and later the same data are used by a different agency.
Secondary data are available from libraries, government agencies and the internet.
A common place to look for secondary data is a library. Here, data can be obtained from
magazines, journals and newspapers.
Government agencies
Government data can be obtained from publications issued by local, state, national and
international governments. Such data include laws, regulations, statistics and consumer information.
Internet
Secondary data can also be obtained on the internet using search engines such as Yahoo, Google, etc.

Advantages of secondary data

• Immediately available.
• Cheaper than obtaining new data.
Disadvantages of secondary data
• May be incomplete.
• May have been collected to satisfy different needs.
• No control exists over the method of collection and accuracy of the data.

Methods of data collection

Why do we sample from a population? The three reasons why we sample are:
(a) The determination of the characteristic under investigation may involve a destructive test, as
for example in determining the tensile strength of a metal specimen or the lifetime of a car
(b) It is sometimes impossible to check all items in a population. For example, it is not possible
to count the population of fish in a lake, or the populations of birds and snakes.
(c) Studying all the items in a population is often prohibitively costly and time-consuming.
There are three major methods of data collection
i. Observation or measurement
ii. Interviews and questionnaires
iii. The use of documentary sources
I. Observation or measurement
In this method, data can be obtained through direct observation or measurement.
- It requires training the persons who take the measurements in order to ensure the use of standard procedures.
- It provides accurate information, but it is expensive and inconvenient.
II. Interviews and Questionnaires
Questionnaires are written documents that instruct readers or listeners to answer the
questions written on them.
There are three ways of collecting information under this method:
a) Face-to-face interviews (questionnaires administered by interviewers)
b) Telephone interviews
c) Mailed questionnaires (self-administered questionnaires returned by mail)
III. The use of documentary sources
This method extracts information from existing sources (e.g. hospital records).
1. How does statistics help in your profession?
2. Differentiate between descriptive and inferential statistics.
3. Mention some limitations of statistics (discuss with examples).
4. Explain the difference between the following statistical terms, giving examples:
• Qualitative vs. quantitative variables
• Nominal vs. ordinal scales
• Interval vs. ratio scales
• Parameter vs. statistic
• Secondary vs. primary data
5. Explain the various methods of collecting primary and secondary data.

6. What is a questionnaire?

7. Classify the following data based on scale of measurement.

a. Months of the year: Meskerm, Tikimit, Hedare, …

b. The net wages of a group of workers
c. Socioeconomic status of a family when classified as low, middle and upper classes.
d. The daily temperature of Adam town for 30 days.
8. The following is a list of different attributes and rules for assigning numbers to
objects. Try to classify each measurement system into one of the four types of
measurement scales.
a) Your checking account number as a name for your account.
b) Your checking account balance as a measure of the amount of money you have
in that account.
c) Your score on the first statistics test as a measure of your knowledge of statistics.
d) Your score on an individual intelligence test as a measure of your intelligence.
e) A response to the statement "Abortion is a woman's right" where "Strongly
Disagree" = 1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4, and "Strongly
Agree" = 5, as a measure of attitude toward abortion.
f) Times for swimmers to complete a 50-meter race
g) Months of the year: Meskerm, Tikimit, …
h) Socioeconomic status of a family when classified as low, middle and upper classes.
i) Blood type of individuals: A, B, AB and O.
j) Region numbers of Ethiopia (1, 2, 3, etc.).
k) The number of students in a college.
l) The net wages of a group of workers.
m) The height of the men in the same town.
Organization and Methods of Data Presentation
Classification: the process of arranging items/data into classes or categories according to their
similarities and/or differences.
Objects of Classification:
1. Classification eliminates inconsistency and also brings out the points of similarity and/or
dissimilarity of collected items/data.
2. It eliminates unnecessary details.
3. It facilitates comparison and highlights the significant aspect of data.
4. It enables one to get a mental picture of the information and helps in drawing inferences.
5. It helps in the statistical treatment of the information collected.
2.2 Types of classification
a) Geographical:
In this type of classification the data are classified according to geographical region of cities,
districts, countries, etc. For instance, the production of teff in different states of Ethiopia,
the production of wheat in different countries, etc.
Country              America   China   Denmark   France   India
Yield of wheat (kg)  1925      893     225       439      862
b) Chronological :
In chronological classification the collected data are arranged according to the order of time
expressed in years, months, weeks, etc. The data is generally classified in ascending order of
time. For example, the data related with population, sales of a firm, imports and exports of a
country are always subjected to chronological classification on the basis of time.
Example: The estimates of birth rates in India during 1970 – 76 are
Year 1970 1971 1972 1973 1974 1975 1976
Birth Rate 36.8 36.9 36.6 34.6 34.5 35.2 34.2
c) Qualitative classification: according to some qualitative characteristic.
In this type of classification, data are classified on the basis of some attribute or quality, such as sex,
literacy, religion or employment. Such attributes cannot be measured on a scale. For
example, if the population is to be classified with respect to one attribute, say sex, then we can
classify it into two classes, namely males and females. Similarly, it can also be classified
into ‘employed’ or ‘unemployed’ on the basis of another attribute, ‘employment’. Thus when the
classification is done with respect to one attribute that is dichotomous in nature, two classes
are formed, one possessing the attribute and the other not possessing it. This type of
classification is called simple or dichotomous classification.
A simple classification may be shown as follows:
Figure: Simple classification, e.g. a population classified into males and females, or into employed and unemployed.
d) Quantitative classification: – in terms of magnitude.
Quantitative classification refers to the classification of data according to some
characteristics that can be measured, such as height, weight, etc. For example, the students of
a college may be classified according to weight.
2.3 Frequency Distributions
Frequency: - is the number of times a certain value or set of values occurs in a specific group.
The word 'frequency' is derived from 'how frequently' a variable occurs.
Frequency distribution: is a table that presents data according to some criteria with the corresponding
number of items falling in each class (i.e. with the corresponding frequencies.)
Example: A frequency distribution presenting the number of males and females in the 1st-year Statistics department.
Sex      Frequency
Male     1
Female   20
A frequency distribution is constructed for three main purposes:
1. To facilitate the analysis of data.
2. To estimate frequencies of the unknown population distribution from the distribution of
sample data.
3. To facilitate the computation of various statistical measures.
Generally, there are three basic types of frequency distributions: Categorical, Ungrouped and Grouped
frequency distributions.
1. Categorical frequency distribution
– the data are usually qualitative
– the scales of measurement for the data are usually nominal or ordinal
The categorical frequency distribution is used for data that can be placed in specific categories,
such as nominal or ordinal level data. For example, data such as political affiliation, religious
affiliation, blood type, marital status or major field of study would use categorical frequency
distributions.
Example: The following data are on the political party affiliations of a sample of 40 Rural
Development and Agricultural Extension students. D, R and O stand for Democratic,
Republican and Other, respectively.
The classes for grouping are ‘Democratic’, ‘Republican’ and ‘Other’.
Table: Number of students by political party affiliations.

Class Frequency Relative frequency

Democratic 13 0.325
Republican 18 0.45
Other 9 0.225
Total 40 1
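The counting in this table can be sketched in Python with `collections.Counter`. Because the raw D/R/O responses are not listed here, the sketch rebuilds a hypothetical response list from the table's counts; with the real raw data, only the `responses` list would change.

```python
from collections import Counter

# Hypothetical raw responses rebuilt from the table's counts (the raw list is not shown here).
responses = ["D"] * 13 + ["R"] * 18 + ["O"] * 9
names = {"D": "Democratic", "R": "Republican", "O": "Other"}

counts = Counter(responses)                                 # frequency of each class
total = sum(counts.values())                                # total frequency
relative = {code: counts[code] / total for code in names}   # relative frequencies

print(f"{'Class':<12}{'Frequency':>10}{'Relative frequency':>20}")
for code, name in names.items():
    print(f"{name:<12}{counts[code]:>10}{relative[code]:>20.3f}")
print(f"{'Total':<12}{total:>10}{sum(relative.values()):>20.3f}")
```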
Example 1: Last year, thirty students took the Stat 273 course, and their grades were as follows.
Construct an appropriate frequency distribution for these data.
Solution: There are five kinds of grades: A, B, C, D and F which may be used as the classes for
constructing the distribution. The frequency distribution becomes as follows.
Class Tally Frequency Percent
A ///// 5 16.7
B ///// //// 9 30.0
C ///// ///// / 11 36.7
D /// 3 10.0
F // 2 6.7
Example 2: The following are the blood groups of a sample of patients who attend Peace
(a) What is the population in this study?
(b) What is the variable in this study?
(c) Construct a frequency table for the data.
2. Ungrouped frequency distribution
An ungrouped frequency distribution is a table of all potential raw score values that could possibly occur in
the data, along with their corresponding frequencies. An ungrouped frequency distribution is often
constructed for a small set of data or for a discrete variable.
Constructing an ungrouped frequency distribution
To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in the
collected data. Then make a columnar table of all potential raw score values, arranged in order of
magnitude, together with the number of times each value is repeated, i.e. the frequency of that value. To
facilitate counting, tallies can be used.
Example: The following data are the ages in years of 20 women who attended health education last year: 30,
41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30 and 41.
Construct a frequency distribution for these data.
STEP 1. Construct a table, tally the data and complete the frequency column. The frequency distribution
becomes as follows.
Age Tally Frequency
29 / 1
30 //// 4
31 / 1
32 /// 3
33 / 1
35 // 2
36 // 2
37 / 1
39 / 1
41 /// 3
42 / 1
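The tallying step can be reproduced with `collections.Counter` as a quick check on the table; a minimal sketch using the 20 ages given above:

```python
from collections import Counter

# Ages (in years) of the 20 women in the example above.
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

freq = Counter(ages)  # maps each observed age to its frequency

print("Age  Tally  Frequency")
for age in sorted(freq):  # list the values in order of magnitude
    print(f"{age:<4} {'/' * freq[age]:<6} {freq[age]}")
print("Total", sum(freq.values()))
```

To list every potential value between the smallest and largest scores, as the definition above suggests (so that 34, 38 and 40 appear with frequency 0), loop over `range(min(ages), max(ages) + 1)` instead of `sorted(freq)`; a `Counter` returns 0 for values it has not seen.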

3. Grouped frequency distribution

When the range of the data is large, the data must be grouped into classes. A grouped frequency distribution
is a frequency distribution in which several data values are grouped into one class.
Some Important Definitions
– Class: one of the different, non-overlapping groups of data.
Class limits: The class limits are the lowest and the highest values that can be included in the
class. For example, take the class 30-40. The lowest value of the class is 30 and the highest value is
40. These two values are known as the lower limit and the upper limit of the class.
Class Interval: The class interval may be defined as the size of each grouping of data. For
example, 50-75, 75-100, 100-125… are class intervals.
– Class boundaries: separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do not appear in the
collected data. There is no gap between the upper boundary of one class and the lower boundary of
the next class. The lower class boundary (LCB) is found by subtracting 0.5 units of measurement
from the lower class limit (LCL), and the upper class boundary (UCB) is found by adding 0.5 units of
measurement to the upper class limit (UCL). That is, $LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$.
Class width (W): The difference between the upper and lower class boundaries of a class interval
is called the class width of the class interval. Class widths of class intervals can also be found
by subtracting two consecutive lower class limits, or by subtracting two consecutive upper
class limits.
Class mark (M): the midpoint of a class interval,
i.e. $M_i = \frac{UCB_i + LCB_i}{2}$.
– Unit of measurement (U): the smallest difference between any two values of the variable being
measured.
– Cumulative frequency (C f) less than type: the total frequency of all values (observations) less than
or equal to the upper class boundary for the given class.
– Cumulative frequency (C f) more than type: The total frequency of all values (observations) greater
than or equal to the lower class boundary for the given class.
A tabular arrangement of class intervals together with their corresponding cumulative frequency (either
less than or more than type; as defined above) is called a cumulative frequency distribution.
– Relative frequency: the frequency of a class divided by the total frequency (i.e. the sum of all frequencies);
if multiplied by 100, it gives the percentage of values falling in that class.
$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Total frequency}}$
• The relative frequency shows what fraction or proportion of the total frequency belongs to the
corresponding class.
• The sum of all the relative frequencies in a frequency distribution is always 1.
Relative cumulative frequency (less than type/more than type): the total of the relative frequencies
above/below a class, inclusive of that class; equivalently, the cumulative frequency (less than type/more than type)
divided by the total frequency. Multiplied by 100, this gives the percentage of values that are less than the upper
class boundary/greater than the lower class boundary.
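The definitions above can be checked on a small grouped table. The sketch below uses hypothetical inclusive class limits and frequencies (not taken from the text) with U = 1, and computes the boundaries, class marks, widths, relative frequencies and both types of cumulative frequency.

```python
from itertools import accumulate

# Hypothetical inclusive class limits and frequencies (illustration only).
class_limits = [(10, 19), (20, 29), (30, 39)]
frequencies = [4, 10, 6]
U = 1  # unit of measurement: the data are whole numbers

# Class boundaries: LCB = LCL - U/2 and UCB = UCL + U/2
boundaries = [(lcl - U / 2, ucl + U / 2) for lcl, ucl in class_limits]
# Class mark: midpoint of the class interval, (LCB + UCB) / 2
marks = [(lcb + ucb) / 2 for lcb, ucb in boundaries]
# Class width: UCB - LCB
widths = [ucb - lcb for lcb, ucb in boundaries]

total = sum(frequencies)
relative = [f / total for f in frequencies]

# Less-than type: running total from the first class down to each class
less_than = list(accumulate(frequencies))
# More-than type: running total from the last class up to each class
more_than = list(accumulate(reversed(frequencies)))[::-1]
relative_less_than = [c / total for c in less_than]

for i, (lcl, ucl) in enumerate(class_limits):
    lcb, ucb = boundaries[i]
    print(f"{lcl}-{ucl}  {lcb}-{ucb}  M={marks[i]}  W={widths[i]}  "
          f"f={frequencies[i]}  rf={relative[i]:.2f}  "
          f"cf<={less_than[i]}  cf>={more_than[i]}  rcf<={relative_less_than[i]:.2f}")
```

The last less-than cumulative frequency and the first more-than cumulative frequency both equal the total frequency, which is a handy check when building such tables by hand.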

Guidelines to construct a grouped frequency distribution

STEP 1. Find the maximum (Max) and the minimum (Min) observations, and then compute the range, R = Max - Min.
STEP 2. Fix the desired number of classes (k). There are two ways to fix k:
– Fix k arbitrarily between 5 and 20, or
– Use Sturges' formula: $k = 1 + 3.322\log_{10} N$
where N is the total number of observations, $\log_{10}$ is the base-10 logarithm and k is the number of class intervals.
Round this value of k up to get an integer.
STEP 3. Find the class width (W) by dividing the range by the number of classes and rounding the result up
to an integer: $W = \frac{R}{k}$, where W = class width, R = range and k = number of classes.
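Steps 2 and 3 reduce to two one-line formulas. This Python sketch follows the rounding rule stated above (round both k and W up); other texts round k to the nearest integer instead, so treat the rounding as the convention of these notes. The function names are illustrative, and the values R = 39 and k = 8 are those of the worked example in this section.

```python
import math

def sturges_classes(n: int) -> int:
    """Step 2 (Sturges' formula): k = 1 + 3.322 * log10(N), rounded up to an integer."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(data_range: float, k: int) -> int:
    """Step 3: W = R / k, rounded up to an integer."""
    return math.ceil(data_range / k)

print(sturges_classes(40))  # 1 + 3.322 * 1.602... = 6.32..., rounded up to 7
print(class_width(39, 8))   # 39 / 8 = 4.875, rounded up to 5
```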
STEP 4. Pick a suitable starting point less than or equal to the minimum value. This starting point is the
lower limit of the first class. Continue to add the class width to this lower limit to get the rest of
the lower limits.
STEP 5. Find the upper class limits. To find the upper class limit of the first class, subtract one unit of
measurement from the lower limit of the second class. Then continue to add the class width to this
upper limit so as to get the rest of the upper limits.
STEP 6. Compute the class boundaries as $LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$,
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary and
UCB = upper class boundary.
The class boundaries also lie halfway between the upper limit of one class and the lower limit of the next class.
STEP 7. Tally the data.
STEP 8. Find the frequencies.
STEP 9. (If necessary) Find the cumulative frequencies (more than and less than types).
Types of class intervals:
There are three methods of classifying the data according to class intervals namely
a) Exclusive method
b) Inclusive method
c) Open-end classes
a) Exclusive method: When the class intervals are so fixed that the upper limit of one class is the
lower limit of the next class; it is known as the exclusive method of classification. The following
data are classified on this basis

Expenditure in money No. of families

0 – 5000 60
5000-10000 95
10000-15000 122
15000-20000 83
20000-25000 40
Total 400
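In the exclusive method, a value equal to a class's upper limit is usually counted in the next class, so each class behaves like a half-open interval [lower, upper). The sketch below assumes that convention, which the table itself does not state, and looks up the class of a value with Python's `bisect` module; the names `lower_limits` and `exclusive_class` are illustrative.

```python
import bisect

# Lower limits of the exclusive classes in the expenditure table; the last class ends at 25000.
lower_limits = [0, 5000, 10000, 15000, 20000]
upper_end = 25000
labels = [f"{lo}-{lo + 5000}" for lo in lower_limits]

def exclusive_class(value: float) -> str:
    """Return the class label, treating each class as [lower, upper): the upper limit is excluded."""
    if not lower_limits[0] <= value < upper_end:
        raise ValueError(f"{value} lies outside the classes of this table")
    index = bisect.bisect_right(lower_limits, value) - 1
    return labels[index]

print(exclusive_class(4999))   # 0-5000
print(exclusive_class(5000))   # 5000-10000: a value equal to an upper limit goes to the next class
print(exclusive_class(24999))  # 20000-25000
```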
b) Inclusive method: In this method, overlapping of the class intervals is avoided. Both
the lower and upper limits are included in the class interval.
The inclusive method should be used in the case of a discrete variable.
Class interval Frequency
5- 9 7
10-14 12
15-19 15
20-29 21
30-34 10
35-39 5
Total 70
c) Open-end classes: A class limit is missing at the lower end of the first class
interval, at the upper end of the last class interval, or at both.
The necessity of open end classes arises in a number of practical situations, particularly
relating to economic and medical data when there are few very high values or few very low
values which are far apart from the majority of observations.
Salary Range No of workers
Below 2000 7
2000 – 4000 5
4000 – 6000 6
6000 – 8000 4
8000 and above 3
Example: The numbers of hours 40 employees spent on their job during the last 7 working days are given below. Construct a suitable frequency distribution with inclusive type of class intervals for these data using 8 classes.
62 50 35 36 31 43 43 43
41 31 65 30 41 58 49 41
37 62 27 47 65 50 45 48
27 53 40 29 63 34 44 32
58 61 38 41 26 50 47 37
STEP 1. Max = 65, Min = 26, so the range is R = 65 - 26 = 39.
STEP 2. The problem already specifies a frequency distribution with K = 8 classes.
STEP 3. Class width $W = \frac{R}{K} = \frac{39}{8} = 4.875 \approx 5$
STEP 4. Starting point = 26 = lower limit of the first class. Hence the lower class limits are
26 31 36 41 46 51 56 61
STEP 5. Upper limit of the first class = 31 - 1 = 30. Hence the upper class limits are
30 35 40 45 50 55 60 65
The lower and the upper class limits (Steps 4 and 5) can be written as follows.
Class limits
26 – 30
31 – 35
36 – 40
41 – 45
46 – 50
51 – 55
56 – 60
61 – 65
STEP 6. By subtracting 0.5 units of measurement from the lower class limits and by adding 0.5 units of
measurement to the upper class limits, we can get lower and upper class boundaries as follows.
Class boundaries
25.5 – 30.5
30.5 – 35.5
35.5 – 40.5
40.5 – 45.5
45.5 – 50.5
50.5 – 55.5
55.5 – 60.5
60.5 – 65.5

Steps 7, 8 and 9 are displayed in the following table (columns 3, 4, and 5 and 6, respectively).
Class limits  Class boundaries  Tally        Frequency  Cumulative frequency  Cumulative frequency
                                                        (less than type)      (more than type)
26 – 30       25.5 – 30.5       /////        5          5                     40
31 – 35       30.5 – 35.5       /////        5          10                    35
36 – 40       35.5 – 40.5       /////        5          15                    30
41 – 45       40.5 – 45.5       ///// ////   9          24                    25
46 – 50       45.5 – 50.5       ///// //     7          31                    16
51 – 55       50.5 – 55.5       /            1          32                    9
56 – 60       55.5 – 60.5       //           2          34                    8
61 – 65       60.5 – 65.5       ///// /      6          40                    6
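The tally, frequency and cumulative frequency columns can be checked with a short Python sketch. It counts each observation into the inclusive class whose limits contain it (in place of hand tallying) and accumulates the less than and more than cumulative frequencies; the variable names are illustrative only.

```python
from itertools import accumulate

# Hours worked by the 40 employees in the example
hours = [62, 50, 35, 36, 31, 43, 43, 43, 41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48, 27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 50, 47, 37]
# Inclusive class limits 26-30, 31-35, ..., 61-65 (Steps 4 and 5)
limits = [(26 + 5 * i, 30 + 5 * i) for i in range(8)]

# Frequency = number of observations x with LCL <= x <= UCL
freq = [sum(lcl <= x <= ucl for x in hours) for lcl, ucl in limits]

# Less than type: running total from the first class.
# More than type: observations in this class and all classes above it.
less_than = list(accumulate(freq))
more_than = [len(hours) - cf + f for cf, f in zip(less_than, freq)]

for (lcl, ucl), f, lt, mt in zip(limits, freq, less_than, more_than):
    print(f"{lcl}-{ucl}  {'/' * f:<10}{f:>3}{lt:>5}{mt:>5}")
```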

Example 2.2
Table 2.15 gives the ages of a sample of patients who attended Hope Medical Hospital.
(a) Find the sample size. (b) Complete the blank cells.
Table 2.15: Ages of patients
Ages (years)  Frequency  Relative frequency  Cumulative frequency (less than type)
10 – 14       –          –                   –
15 – 19       8          0.16                12
20 – 24       15         –                   –
25 – 29       –          –                   37
30 – 34       –          –                   –
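(a) If the sample size is n, then the relative frequency of the second class interval is 8 ÷ n. Hence n is the root of the equation $\frac{8}{n} = 0.16$, which gives $n = \frac{8}{0.16} = 50$.
Hence, the sample size is 50.
(b) The first class has frequency 12 - 8 = 4, the cumulative frequency up to the third class is 12 + 15 = 27, the fourth class has frequency 37 - 27 = 10 and the last class has frequency 50 - 37 = 13. Each relative frequency is the class frequency divided by 50. The completed table is: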
Ages (years)  Frequency  Relative frequency  Cumulative frequency (less than type)
10 – 14       4          0.08                4
15 – 19       8          0.16                12
20 – 24       15         0.30                27
25 – 29       10         0.20                37
30 – 34       13         0.26                50

Notice again that:

(i) The last cumulative frequency is equal to the sum of all the frequencies;
(ii) Relative frequencies must add up to 1, allowing for rounding errors.
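The arithmetic used to complete Table 2.15 can be written out as a small Python sketch. It only restates the reasoning of parts (a) and (b) from the known entries of the table; the variable names are illustrative.

```python
from itertools import accumulate

# Known entries of Table 2.15
f2, rel2 = 8, 0.16         # second class: frequency and relative frequency
f3 = 15                    # third class frequency
cf2, cf4 = 12, 37          # less than cumulative frequencies of classes 2 and 4

n = round(f2 / rel2)       # 8 / n = 0.16  ->  n = 50
f1 = cf2 - f2              # 12 - 8 = 4
f4 = cf4 - (cf2 + f3)      # 37 - 27 = 10
f5 = n - cf4               # 50 - 37 = 13

freq = [f1, f2, f3, f4, f5]
relative = [f / n for f in freq]
cumulative = list(accumulate(freq))
print(n, freq, relative, cumulative)
```

`round` guards against the tiny floating-point error in 8 / 0.16, so the sample size comes out as the integer 50.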
Exercise 1: Given below are the numbers of tools produced by workers in a factory.
43, 18, 25, 18, 39, 44, 19, 20, 20, 26, 40, 45, 38, 25, 13, 14, 27, 41, 42, 17, 34, 31, 32, 27, 33, 37,
25, 26, 32, 25, 33, 34, 35, 46, 29, 34, 31, 34, 35, 24, 28, 30, 41, 32, 29, 28, 30, 31, 30, 34, 31, 35,
36, 29, 26, 32, 36, 35, 36, 37, 32, 23, 22, 29, 33, 37, 33, 27, 24, 36, 23, 42, 29, 37, 29, 23, 44, 41,
45, 39, 21, 21, 42, 22, 28, 22, 15, 16, 17, 28, 22, 29, 35, 31, 27, 40, 23, 32, 40, 37
Construct a frequency distribution with inclusive type of class intervals. Also find:
1. How many workers produced more than 38 tools?
2. How many workers produced less than 23 tools?
Using Sturges' formula to determine the number of class intervals, with N = 100:
$K = 1 + 3.322\log_{10} N = 1 + 3.322\log_{10} 100 = 1 + 3.322 \times 2 = 7.644 \approx 7.6$
Size of class interval $= \frac{\text{Range}}{\text{Number of class intervals}} = \frac{46 - 13}{7.6} = \frac{33}{7.6} \approx 4.34$, which is rounded up to 5.
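The Sturges calculation can be reproduced in Python. This sketch applies the formula as stated above to the 100 observations, takes the range as the largest minus the smallest value and, following the text, rounds K to one decimal before dividing and rounds the resulting width up; the helper name `sturges_k` is hypothetical.

```python
import math

def sturges_k(n):
    """Hypothetical helper: Sturges' formula K = 1 + 3.322 log10 N."""
    return 1 + 3.322 * math.log10(n)

tools = [43, 18, 25, 18, 39, 44, 19, 20, 20, 26, 40, 45, 38, 25, 13, 14, 27, 41, 42, 17, 34, 31, 32, 27, 33, 37,
         25, 26, 32, 25, 33, 34, 35, 46, 29, 34, 31, 34, 35, 24, 28, 30, 41, 32, 29, 28, 30, 31, 30, 34, 31, 35,
         36, 29, 26, 32, 36, 35, 36, 37, 32, 23, 22, 29, 33, 37, 33, 27, 24, 36, 23, 42, 29, 37, 29, 23, 44, 41,
         45, 39, 21, 21, 42, 22, 28, 22, 15, 16, 17, 28, 22, 29, 35, 31, 27, 40, 23, 32, 40, 37]

n = len(tools)                            # N = 100
k = round(sturges_k(n), 1)                # 7.644 -> 7.6
data_range = max(tools) - min(tools)      # 46 - 13 = 33
width = math.ceil(data_range / k)         # 33 / 7.6 = 4.34 -> rounded up to 5
print(n, k, data_range, width)
```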
Example: The following table gives the distribution of the ages of a sample of patients who attended Hope Hospital.
Age (years) Frequency Relative Frequency
5 – 14 6 0.08
15 – 24 9 –
25 – 34 – 0.24
35 – 44 24 –
45 – 54 15 –
55 – 64 – –
(a) What is the population in this study? (b) What is the variable in this study?
(c) What is the sample size? (d) Complete the blank cells in the table.

2.2 Diagrammatic and Graphic Presentation of Data

Data presented in a frequency distribution can also be displayed diagrammatically or graphically.
Diagrams and graphs:
Significance of Diagrams and Graphs:
Diagrams and graphs are extremely useful because of the following reasons.
1. They are attractive and impressive.
2. They make data simple and intelligible.
3. They make comparison possible
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.
Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for
presenting continuous types of data.
There are three common diagrammatic presentations of data: bar-diagram/charts, pie-chart and
pictograms, as well as three common graphic presentations of data: histogram, frequency polygon, and
cumulative frequency polygon (ogive).
A diagram is a visual form for presentation of statistical data, highlighting their basic facts and relationships.
General rules for constructing diagrams:
1. A diagram should be neatly drawn and attractive.
2. The measurements of geometrical figures used in a diagram should be accurate and proportional.
3. The size of the diagrams should match the size of the paper.
4. Every diagram must have a suitable but short heading.
5. The scale should be mentioned in the diagram.
6. Diagrams should be neatly as well as accurately drawn with the help of drawing instruments.
7. An index (key) must be given for identification so that the reader can easily make out the meaning of the diagram.
8. Footnotes must be given at the bottom of the diagram.
9. Economy in cost and energy should be exercised in drawing diagram.
Types of diagrams:
In practice, a very large variety of diagrams is in use and new ones are constantly being added.
For the sake of convenience and simplicity, they may be divided under the following heads:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
A) One-dimensional diagrams:
In such diagrams, only one-dimensional measurement, i.e. height, is used and the width is not considered. These diagrams are in the form of bar or line charts and can be classified as
1. Line Diagram
2. Simple Bar Diagram
3. Multiple Bar Diagram
4. Sub-divided Bar Diagram
5. Percentage Bar Diagram
Line Diagram:
A line diagram is used where there are many items to be shown and there is not much difference in their values.
Example 1: Show the following data by a line chart:
No. of children 0 1 2 3 4 5
Frequency 10 14 9 6 4 2
I. Bar-diagrams/ Bar-charts
- Bar-diagram is a series of equally spaced bars having equal width and the height of each bar
representing the magnitude or frequency of observations in each group.
- Bar-diagrams are usually used to represent one way or simple frequency distribution.
- Bar-diagrams can be drawn either horizontally or vertically. Usually horizontal bar-diagrams are used for qualitatively classified data, whereas vertical bar-diagrams are used for quantitatively classified data.
Example: Horizontal bar-diagram.
[Figure: horizontal bar-diagram; category axis: Blood Type]
There are several types of bar-diagram. The most common are:

- Simple bar-diagrams
- Deviation (two-way) bar-diagrams
- Broken bar-diagrams
- Component (subdivided) bar-diagrams
- Multiple bar-diagrams
1. Simple bar-diagrams
Simple bar-diagrams are used to depict data on a single variable (a one-way classification).
Example: The following frequency distribution shows sales (in million birr) of four products for the 2004 production year.
Product Sales (in million birr)
A 14
B 21
C 9
D 17
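Where no plotting software is at hand, the idea behind a simple bar-diagram (bars of equal width whose lengths are proportional to the values) can be sketched as horizontal text bars. This is an illustrative Python sketch only; the helper name `text_bars` and the scale of one `#` per million birr are assumptions.

```python
def text_bars(data, unit=1):
    """Hypothetical helper: one horizontal text bar per item; each '#' stands for `unit`."""
    width = max(len(str(label)) for label in data)
    return [f"{str(label):<{width}} | {'#' * round(value / unit)} {value}"
            for label, value in data.items()]

sales = {"A": 14, "B": 21, "C": 9, "D": 17}   # sales in million birr
for line in text_bars(sales):
    print(line)
```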
The bar-diagram presentation for these data is given below.
[Figure: simple bar-diagram of sales by product; vertical axis: Sales (in million birr)]
2. Deviation bar-diagrams
When the data take both positive and negative values (for instance, data on profit, net export, percent change, etc.), deviation bar-diagrams are appropriate.
Example: Present the following data using a suitable bar-diagram.
Data: Net profit (in thousands of birr) from oil sales for five years
Year Profit (in thousands of birr)
1997 12
1998 -5
1999 14
2000 9
2001 -6
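A deviation bar-diagram differs from a simple one only in that negative values are drawn on the opposite side of a zero line. The Python sketch below, with a hypothetical helper `deviation_bars` and an assumed scale of one `#` per thousand birr, renders the profit data as text with losses to the left of the `|` axis and profits to the right.

```python
def deviation_bars(data):
    """Hypothetical helper: text bars on either side of a zero axis '|'."""
    room = max(0, -min(data.values()))   # space needed for the longest negative bar
    lines = []
    for label, value in data.items():
        left = "#" * max(0, -value)       # losses, drawn leftwards from the axis
        right = "#" * max(0, value)       # profits, drawn rightwards from the axis
        lines.append(f"{label} {left:>{room}}|{right}")
    return lines

profit = {1997: 12, 1998: -5, 1999: 14, 2000: 9, 2001: -6}   # thousands of birr
for line in deviation_bars(profit):
    print(line)
```

Right-aligning the negative part to a common width keeps the zero axis in the same column on every line.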

The deviation bar-diagram for the data looks like the following.
[Figure: deviation bar-diagram; vertical axis: Profit (in thousands); horizontal axis: years 1997 to 2001]
3. Broken bar-diagrams
This kind of bar-diagram is used to present data involving a few extreme values where it will be difficult
to accommodate the magnitude of the bars corresponding to these values within the graph paper. In this
case we use pieces of bars with each piece starting with a jump on the numerical scale.
Example: Data: Amount of production per day for four products of a factory.
Product Quantity produced (kg/day)
A 14
B 35
C 23
D 109

4. Component bar-diagrams
When it is desired to show how a total (an aggregate) is divided into component parts, we use component
bar diagram. In such type of bar-diagrams, the bars represent aggregate value of a variable with each
aggregate broken into its component parts and different colors or designs are used for identification.
Example: Represent the following data using bar-charts.
Data: Yields of production of farmers in Southern Ethiopia.
Year     1990 EC  1991 EC  1992 EC  1993 EC
Barley   14       15       26       19
Wheat    10       15       14       25
Maize    2        6        10       3
Total    26       36       50       47
The component bar-diagram for this table is as follows.
[Figure: component bar-diagram of barley, wheat and maize yields for 1990 to 1993 EC]
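Drawing a component bar-diagram amounts to stacking each component on top of the previous one, so the top of each bar equals the total. The following Python sketch (hypothetical helper `stack_segments`) computes the bottom and top of every segment for the yield data, i.e. the heights one would mark on the vertical scale.

```python
def stack_segments(components):
    """Hypothetical helper: for each bar, list (component, bottom, top) in stacking order."""
    n_bars = len(next(iter(components.values())))
    bars = []
    for i in range(n_bars):
        bottom, pieces = 0, []
        for name, values in components.items():
            pieces.append((name, bottom, bottom + values[i]))
            bottom += values[i]
        bars.append(pieces)
    return bars

years = ["1990 EC", "1991 EC", "1992 EC", "1993 EC"]
yields = {"Barley": [14, 15, 26, 19], "Wheat": [10, 15, 14, 25], "Maize": [2, 6, 10, 3]}
for year, pieces in zip(years, stack_segments(yields)):
    print(year, pieces)
```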
5. Multiple bar-diagrams
Multiple bar-diagrams are used to display data on more than one variable. They are used for comparing
different variables at the same time.
Example: The data given in the above example can be presented using a multiple bar-diagram as below.
[Figure: multiple bar-diagram; vertical axis: Production; horizontal axis: years 1990 to 1993]

II. Pie-charts
A pie-chart is a circle that is divided into sections or wedges according to the percentages of frequencies in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of the frequency of the class to the total frequency by 360°, i.e.
$\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$
Note that pie-charts are usually used for depicting nominal level data.
Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses. Below is
the breakdown of the various expenditure items. Draw an appropriate chart to portray the data.
Expenditure item Amount (in birr)
Fuel 603
Interest on car loan 279
Repairs 930
Insurance and license 646
Depreciation 492
Total 2,950

How to draw a pie-chart:

- First find the percentages of each class
- Next calculate the degree measures for each class
- Finally, using a protractor, draw each sector (degree measure) in a circle and give a key for explanation.
Expenditure item       Amount (in birr)  Percentage (approx.)  Degrees (approx.)
Fuel                   603               20                    74
Interest on car loan   279               9                     34
Repairs                930               32                    113
Insurance and license  646               22                    79
Depreciation           492               17                    60
Total                  2,950             100                   360
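The percentage and degree columns follow directly from the sector-angle formula. A minimal Python sketch (hypothetical helper `pie_sectors`) reproduces them; rounding to whole numbers is what makes the columns approximate.

```python
def pie_sectors(amounts):
    """Hypothetical helper: percentage and sector angle (degrees) for each category."""
    total = sum(amounts.values())
    return {item: (100 * a / total, 360 * a / total) for item, a in amounts.items()}

expenses = {"Fuel": 603, "Interest on car loan": 279, "Repairs": 930,
            "Insurance and license": 646, "Depreciation": 492}
for item, (pct, angle) in pie_sectors(expenses).items():
    print(f"{item:<22} {pct:5.1f}% {angle:6.1f} deg")
```

In this example the rounded angles happen to add up to exactly 360; with other data the rounded values may need a one-degree adjustment.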
Now we can draw the pie-chart for the data.
[Figure: pie-chart of car operating expenses: Repairs 32%, Insurance and license 22%, Fuel 20%, Depreciation 17%, Interest on car loan 9%]


III. Pictograms
In pictograms, we represent the data by means of some picture symbols. Here we decide a suitable picture
to represent a definite number of units in which the variable is measured.
Example: The following table shows the orange production in a plantation for the production years 1990-1993. Represent the data by a pictogram.
Table: Orange production from 1990 to 1993.
Production year 1990 1991 1992 1993
Amount (in kg) 3000 3850 3500 5000
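The number of picture symbols in each row is the amount divided by the value one symbol stands for. The table does not fix a scale, so the Python sketch below assumes, purely for illustration, that one symbol represents 500 kg; the helper name `pictogram_row` and the symbol `O` are also hypothetical.

```python
KG_PER_SYMBOL = 500   # assumed scale: one symbol = 500 kg (not given in the table)

def pictogram_row(amount, per_symbol=KG_PER_SYMBOL, symbol="O"):
    """Hypothetical helper: whole symbols plus the fraction of a symbol still needed."""
    full, rest = divmod(amount, per_symbol)
    row = symbol * full
    if rest:
        row += f" + {rest / per_symbol:.1f} of a symbol"
    return row

production = {1990: 3000, 1991: 3850, 1992: 3500, 1993: 5000}   # kg
for year, kg in production.items():
    print(year, pictogram_row(kg))
```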

IV. Histogram
A histogram is another way of data presentation which is more suitable for distributions with continuous classes. In drawing a histogram, we put the class boundaries of each class on the horizontal axis and its respective frequency on the vertical axis.
Example: Draw a histogram presenting the following data.
Class boundaries  Class mark  Frequency  Cumulative frequency  Cumulative frequency
                                         (less than type)      (more than type)
5.5 – 11.5        8.5         2          2                     20
11.5 – 17.5       14.5        2          4                     18
17.5 – 23.5       20.5        7          11                    16
23.5 – 29.5       26.5        4          15                    9
29.5 – 35.5       32.5        3          18                    5
35.5 – 41.5       38.5        2          20                    2

V. Frequency Polygon
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis and their respective class marks along the horizontal axis. Then join the plotted points with straight line segments.
Example: Present the data in the previous example using a frequency polygon.
[Figure: frequency polygon; horizontal axis: Class marks (8.5 to 38.5)]
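The vertices of the frequency polygon are the pairs (class mark, frequency), where each class mark is the midpoint of its class boundaries. A short Python sketch computes them from the table of the histogram example; joining consecutive points gives the polygon.

```python
boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
freq = [2, 2, 7, 4, 3, 2]

# Class mark = (lower boundary + upper boundary) / 2
marks = [(lcb + ucb) / 2 for lcb, ucb in boundaries]
points = list(zip(marks, freq))    # vertices of the frequency polygon
print(points)
```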

VI. Cumulative Frequency Polygon (Ogive)

A cumulative frequency polygon can be drawn on either a less than or a more than cumulative frequency basis. Place the class boundaries along the horizontal axis (upper class boundaries for the less than type, lower class boundaries for the more than type) and the corresponding cumulative frequencies along the vertical axis. Then join the plotted points by a free hand curve.
Example: The data in the previous example can be presented using either a less than or a more than cumulative frequency polygon, as given below in (i) and (ii) respectively.
(i) Less than type cumulative frequency polygon

Less than type cumulative frequencies



11.50 17.50 23.50 29.50 35.50 41.50

Upper class boundaries

(ii) More than type cumulative frequency polygon
[Figure: more than ogive; vertical axis: More than type cumulative frequencies; horizontal axis: Lower class boundaries (5.5 to 35.5)]
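The plotted points of both ogives can be computed in the same way: less than cumulative frequencies pair with upper class boundaries, and more than cumulative frequencies pair with lower class boundaries. A minimal Python sketch using the table of the histogram example:

```python
from itertools import accumulate

boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
freq = [2, 2, 7, 4, 3, 2]
n = sum(freq)

less_than = list(accumulate(freq))                          # 2, 4, 11, 15, 18, 20
more_than = [n - cf + f for cf, f in zip(less_than, freq)]  # 20, 18, 16, 9, 5, 2

less_than_points = [(ucb, cf) for (_, ucb), cf in zip(boundaries, less_than)]
more_than_points = [(lcb, cf) for (lcb, _), cf in zip(boundaries, more_than)]
print(less_than_points)
print(more_than_points)
```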

1. Given the following raw data:
62 50 57 58 51 53 62 64 60 61
60 51 64 55 55 52 60 65 58 60
59 52 63 56 56 58 64 63 62 60
58 54 62 54 54 60 65 60 62 59
56 63 52 53 62 53 61 61 59 65
a) Construct simple frequency distribution table.
b) Construct grouped frequency distribution table.
2. If class mid-points in a frequency distribution of a group of persons are 25, 32, 39, 46, 53, 60, 67, 74
and 81, find (a) size of the class interval, and (b) the class boundaries.
3. In a sample study about coffee drinking habits in two villages A and B, the following information was obtained:
A: Females were 40%. Total coffee drinkers were 45% and male non-coffee drinkers were 20%.
B: Males were 55%, male non-coffee drinkers were 30% and female coffee drinkers were 15%.
Present the above information in a tabular form.
4. The following table shows the marital status of males and females (18 years and older) in a certain city.
Draw a pie chart separately for males and females to display the data.
Marital Status Male (percent of total) Female (percent of total)
Single 21 16
Married 65 73
Widowed 9 4
Divorced 5 7
5. Prepare (a) histogram (b) frequency polygon (c) Ogive for the following frequency distribution of
marks in a final examination.
Class 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89
Frequency 6 12 20 14 12 8 6 2

