2 - Data

STATISTICS-I
Chap-02: Collection, Organization & Presentation of Data

Contents:
Data
Types of data
Methods of Primary data collection
Sources of secondary data
Organization of data
Presentation of data
Data:
Definition:
Data is the plural form of datum. Data are the raw material of
research: the measurements of characteristics of units, objects or
individuals constitute data.
Example: The height, weight and body fitness of the cadets of MCC.
Data:
Types:
a) On the basis of nature of characteristics
i) Qualitative data
ii) Quantitative data (discrete, continuous)
b) According to source
i) Primary data
ii) Secondary data
Data:
Primary data:
Data collected directly from the population or sample units are
called primary data.
Example: If the cadets' personal information is collected directly
from the cadets, it constitutes primary data.
* Primary data are original in character, not well organized, and
expensive in terms of money, time and labor.
Data:
Secondary data:
Data that have already been collected by other persons or
organizations, and have already been published or used, are
called secondary data.
Example: If the cadets' personal information is collected from the
respective form master or the admin office, the data are secondary.
Methods of primary data collection:
a) Direct personal interview, face-to-face interview or schedule method:
It is the most widely used method of primary data collection. In
this method the researcher asks the respondents questions directly
from a previously printed questionnaire.
* The data are more reliable and accurate
* The chance of getting wrong information is lower
* It is useful when the respondents are illiterate
# It is costly and more time consuming
b) Indirect oral inquiry:
In this method the interviewer does not question the respondent
directly; instead, he obtains the answers with the help of other
persons.
* It is needed when the respondents are dangerous, such as
terrorists, drug addicts, smugglers, etc.
# The answers or information may not be reliable
c) Mailed questionnaire method:
In this method the interviewer sends a questionnaire to the
respondent by email or by post. For postal mail, a self-addressed
stamped envelope should be sent with the questionnaire.
* It is useful when there are some sensitive questions
* It is less costly for large-scale surveys
# The chance of getting wrong information is higher
# Not all questions may be answered
# It is applicable to educated respondents only
d) Through local agents or schedule:
In this method the enumerator appoints local agents or
correspondents in different areas to collect the information using a printed
questionnaire called a schedule. Government and non-government agencies,
newspaper agencies, and print and electronic media adopt this method.
* The chance of getting wrong information is lower
* It is useful when the area is large, as in a census
* It is less time consuming
# The information may be biased
# Skilled and well-trained enumerators are needed to get reliable data
e) Telephone interview:
In this method a conversation takes place between the
interviewer and the respondent through a telephone call.
* It takes less time and costs less
* It is applicable to a large number of respondents spread over a
wide geographical area
# Some information may be wrong
# The response rate is lower than in a direct personal interview
f) Online interview:
In this method a survey is conducted through Facebook,
Twitter, video calling, etc.
g) Through experiment:
In natural sciences such as physics, chemistry and astronomy, and in
biological sciences such as botany, zoology, biochemistry,
microbiology and pharmacy, data are obtained when
treatments are applied to experimental units in a
controlled laboratory.
Sample questionnaire:
Title: ‘Applicable or not for Math Olympiad’
Title: ‘Teaching Proficiency’
Title: ‘Cadet’s Personality & Behavior’
Sources of secondary data:
a) Published sources:
* International publications, articles
* Government, semi-government and non-government publications/records
* Official statistics
* Trade and financial journals
* Diaries, books, newspapers, magazines
* Websites, blogs
* International organizations such as the World Bank, WHO, IMF, ILO, UNDP
* Local organizations such as BBS, BRRI, BARDEM, BRAC, ICDDRB
b) Unpublished sources: private firms or business houses, institutions.

Primary data vs secondary data:
* Primary data are more expensive than secondary data in
terms of time, money and labor.
* Secondary data may be less suitable, adequate and
reliable than primary data for the purpose of the
investigation.
* Primary data are original in character but less well organized
than secondary data.
Selection of appropriate method:
* Each method of data collection has its own importance, and none
is superior in all situations.
* Secondary data may be used when the researcher finds them
reliable, adequate and appropriate for the study.
* Finally, the most desirable approach to selecting a method
depends on the nature of the particular problem, the time
and resources available, and the desired degree of accuracy.
Organization (Processing, Classification/tabulation of Data):
Processing: After collection, the data have to be processed
through editing and coding.
Editing:
Once the data have been collected, they must be processed for
proper presentation. Editing is required as preparatory work
before tabulation and statistical analysis are carried out. This
is quite a difficult job and requires a great deal of skill and
experience. While editing primary data, the following
considerations need attention:
* The data should be complete
* The data should be consistent
* The data should be accurate
* The data should be homogeneous
Coding:
When the data are to be processed by computer, they must be coded,
that is, converted into a form the computer can handle. For some
qualitative data, code numbers can be assigned. For example, for the
question "Do you smoke?", a code of 1 can be assigned to the
answer 'yes' and a code of 0 to the answer 'no'.
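The yes/no coding described above can be sketched in Python as follows. This is only an illustration: the code book `SMOKE_CODES`, the helper `code_answers` and the sample responses are hypothetical names and values, not part of the original notes.

```python
# Hypothetical code book for the question "Do you smoke?" (1 = yes, 0 = no).
SMOKE_CODES = {"yes": 1, "no": 0}

def code_answers(answers, code_book):
    """Replace each text answer with its numeric code.

    Answers are trimmed and lower-cased first; anything not in the
    code book becomes None so it can be checked during editing.
    """
    return [code_book.get(a.strip().lower()) for a in answers]

# Made-up responses, for illustration only.
responses = ["Yes", "no", "NO", "yes ", "maybe"]
coded = code_answers(responses, SMOKE_CODES)
print(coded)
```

Leaving unknown answers as `None` instead of guessing a code keeps the editing checks (completeness, consistency) visible in the coded data.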
Classification/tabulation of Data:
Classification/tabulation is the process of arranging the
available information into homogeneous groups, in rows and
columns, according to similarities or common characteristics.
Construction of a table:
A good table consists of the following:
1) Table number, 2) Title of the table, 3) Row heading or stub,
4) Column heading or caption, 5) Body of the table,
6) Footnote (if needed).
Data can be classified/tabulated in the following ways:
a) Geographical classification/tabulation
b) Chronological classification/tabulation
c) Quantitative classification/tabulation (frequency distribution)
d) Qualitative classification/tabulation
Classification of Data:
a) Geographical classification:
In geographical classification the data are classified
on the basis of geographical areas.
Table-01: The number of COVID-19 patients in Bangladesh
classified by division
Division      No. of patients
Dhaka         44000
CTG           30450
Sylhet        15200
Mymensingh    23630
Barishal      12325
Rajshahi      25214
Rangpur       16482
Khulna        23147
b) Chronological classification:
In chronological classification the data are classified
on the basis of time.
Table-02: The number of COVID-19 patients in Bangladesh
in August 2020 classified by date
Date      No. of patients
Aug 01    1005
Aug 02    1245
Aug 03    1348
Aug 04    1578
Aug 05    1658
Aug 06    1875
Aug 07    2312
Aug 08    1925
c) Quantitative classification/frequency distribution:
In Quantitative classification the data are classified in
terms of magnitudes.
Table-03: The number of COVID-19 patients in Bangladesh on 11
August 2020 classified by age
Age interval    No. of patients
0-10            22
10-20           33
20-30           45
30-40           57
40-50           75
50-60           99
60-70           125
70-80           258
d) Qualitative classification:
In qualitative classification the data are classified in
terms of attributes or categories.
Table-04: The number of COVID-19 patients in Bangladesh
classified by sex
Sex       No. of patients
Male      145000
Female    102000

*** Classification/tabulation may be one-way or two-way.

1) One-way classification/simple table:
The whole data are classified according to one characteristic.
Table-01: The number of COVID-19 patients in Bangladesh classified
by sex
Sex       No. of patients
Male      145000
Female    102000
2) Two-way classification/complex table:
The whole data are classified according to two characteristics.
Table-02: The number of COVID-19 patients in Bangladesh classified
by sex and religion
Religion     Male     Female
Muslim       95000    75000
Hindu        35000    15000
Buddhist     13000    11000
Christian    2000     1000

Quantitative Classification/Frequency Distribution:
Frequency:
The number of times a value is repeated is called the frequency
of that value.
For example: The ages of 5 cadets are 24, 25, 24, 26, 25.
Here the frequencies of the three values 24, 25, 26 are 2, 2, 1
respectively.
*** Data with frequencies are called grouped data, and data without
frequencies are called ungrouped data.
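As a small sketch of the idea, the frequencies in the cadet example can be counted with Python's standard `collections.Counter`; the variable names `ages` and `frequency` are chosen here for illustration.

```python
from collections import Counter

# Ungrouped data: the ages of the 5 cadets from the example.
ages = [24, 25, 24, 26, 25]

# Grouped data: each distinct value together with its frequency.
frequency = Counter(ages)

for value in sorted(frequency):
    print(value, frequency[value])
```

The printed pairs are the grouped form of the same five observations.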
Frequency distribution:
It is a statistical table which shows how the whole data are
distributed over different classes.
For example, Table-03 above shows the number of COVID-19 patients in
Bangladesh on 11 August 2020 classified by age.
Frequency Distribution:
Types of Frequency distribution:
According to the type of variable, frequency distributions are of
two types:
1) Discrete frequency distribution
2) Continuous frequency distribution
1) Discrete frequency distribution:
In this frequency distribution the data are arranged against the
values of a discrete variable.
For example, if we consider the number of cars owned by 500
families of Tangail town, we may have the following
frequency distribution:
Number of cars Number of families
0 250
1 125
2 75
3 50
Total 500
2) Continuous frequency distribution:
In this frequency distribution continuous data are
arranged in class intervals.
For example, Table-03 above shows the number of COVID-19 patients in
Bangladesh on 11 August 2020 classified by age.
A continuous frequency distribution can be constructed in
two ways:
i) Exclusive method
ii) Inclusive method
i) Exclusive method:
The upper limit of a class interval is not included in that class.
For example, with classes 10-20, 20-30, … (class interval 10), the
value 20 is counted in the class 20-30, not in 10-20.
ii) Inclusive method:
The upper limit of a class interval is included in that class.
Marks obtained (class interval 10)    No. of cadets
10 – 19                               5
20 – 29                               7
30 – 39                               22
40 – 49                               10
50 – 59                               6
Total                                 50
Class intervals:
The difference between the upper limit and the lower limit of a class or
group is called the class interval.
There are three types of class interval:
a) Equal class intervals (10-20, 20-30, 30-40)
b) Unequal class intervals (10-20, 20-25, 25-37)
c) Open-end class intervals, for example:
Daily income (Tk)    No. of people
Less than 100        50
100 - 150            75
More than 150        100
Construction of continuous frequency distribution:
The following are the important steps to construct
continuous frequency distribution.
Step-1: Determination of range
Range, R = highest value – lowest value
Step-2: Determination of number of classes
The number of classes should be between 5 and 25.
Sturges suggested the following formula for determining the
approximate number of classes:
$K = 1 + 3.322 \log_{10} N$, where N = total number of observations
K is rounded to an integer value.
Step-3: Determination of class interval
The formula for determining the class interval, or width of a class, is
$C = R / K$ (rounded to an integer)
As far as possible one should avoid class intervals such as 3, 4, 7, 11, 26,
etc. Preferably, one should have class intervals of either 5 or multiples of 5.
Step-4: Determination of class limit
The starting point, i.e. the lower limit of the first class,
should be either zero, 5 or a multiple of 5. For example, if
the lowest value of the series is 63 and we have taken a class
interval of 10, then the first class should be 60-70 instead of
63-73.
Step-5: Tally and frequency
Take each item from the data one at a time and put a tally
mark (|) against the class to which the item belongs. Count
the tally marks and write this number against the class.
Finally, count the total frequency and check that it equals N,
the total number of observations.
Mathematical problem:
Construct a frequency table (exclusive method), using a suitable class
interval, from the following marks obtained by 30 students:
34 36 31 46 76 86 42 44 32 46 40 54 66 56 50
42 33 80 77 81 46 40 60 63 64 76 56 57 57 70
Solution:
Here,
The maximum value = 86
The minimum value = 31
So range, R = 86 – 31 = 55
According to Sturges' rule, the number of classes is
$K = 1 + 3.322 \log_{10} N$, with N = 30
$= 1 + 3.322 \log_{10}(30)$
$\approx 5.907$
$\approx 6$
Now the class interval is
$C = R / K = 55 / 6 = 9.1667 \approx 10$
Table: Frequency distribution with class interval 10
Class (marks obtained)    Tally        Frequency f
30 – 40                   |||||        5
40 – 50                   ||||| |||    8
50 – 60                   ||||| |      6
60 – 70                   ||||         4
70 – 80                   ||||         4
80 – 90                   |||          3
Total                                  N = Σf = 30
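The five steps can be retraced in a short Python sketch using the marks from the problem above. It is a minimal illustration, not a definitive procedure: the helper names (`sturges_classes`, `round_up_to_multiple`, `exclusive_table`) are made up here, and rounding the width up to a multiple of 5 and starting the first class at a multiple of 5 are assumptions taken from the guidance in Step-3 and Step-4.

```python
import math

marks = [34, 36, 31, 46, 76, 86, 42, 44, 32, 46, 40, 54, 66, 56, 50,
         42, 33, 80, 77, 81, 46, 40, 60, 63, 64, 76, 56, 57, 57, 70]

def sturges_classes(n):
    """Approximate number of classes, K = 1 + 3.322 * log10(N), as an integer."""
    return round(1 + 3.322 * math.log10(n))

def round_up_to_multiple(x, base=5):
    """Round x up to the next multiple of base (assumed preference for 5, 10, ...)."""
    return base * math.ceil(x / base)

def exclusive_table(data, width, start):
    """Count integer observations in exclusive classes [start, start + width), ..."""
    n_classes = (max(data) - start) // width + 1
    counts = [0] * n_classes
    for x in data:
        counts[(x - start) // width] += 1
    return [((start + i * width, start + (i + 1) * width), c)
            for i, c in enumerate(counts)]

data_range = max(marks) - min(marks)            # Step-1: R
k = sturges_classes(len(marks))                 # Step-2: K
width = round_up_to_multiple(data_range / k)    # Step-3: C
start = 5 * (min(marks) // 5)                   # Step-4: lower limit of first class
table = exclusive_table(marks, width, start)    # Step-5: frequencies

for (lo, hi), f in table:
    print(f"{lo} - {hi}: {f}")
print("Total:", sum(f for _, f in table))
```

The final total is the check from Step-5: it should equal the number of observations.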
Frequency distribution
* Calculation of less than (upward) and more than (downward) cumulative
frequency (F), frequency density f/C, relative frequency RF = f/N and
percentage frequency RF×100 (C = class interval)
Class      f    Less than CF    More than CF    f/C           RF = f/N    RF×100
30 – 40    5    5               30              5/10 = 0.5    0.17        17
40 – 50    8    5+8 = 13        25              0.8           0.27        27
50 – 60    6    19              17              0.6           0.20        20
60 – 70    4    23              11              0.4           0.13        13
70 – 80    4    27              3+4 = 7         0.4           0.13        13
80 – 90    3    30              3               0.3           0.10        10
Total      N = 30                                             1           100
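The columns of this table can be recomputed with a few lines of Python, which is a convenient way to check the arithmetic. The variable names below are illustrative only.

```python
from itertools import accumulate

classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90)]
freq = [5, 8, 6, 4, 4, 3]
n = sum(freq)

# Less than (upward) CF: running total from the first class.
less_than_cf = list(accumulate(freq))
# More than (downward) CF: running total from the last class, shown first-to-last.
more_than_cf = list(accumulate(reversed(freq)))[::-1]
# Frequency density: frequency divided by the class interval.
density = [f / (hi - lo) for f, (lo, hi) in zip(freq, classes)]
# Relative and percentage frequency, rounded as in the table.
relative = [round(f / n, 2) for f in freq]
percentage = [round(100 * f / n) for f in freq]

for row in zip(classes, freq, less_than_cf, more_than_cf, density, relative, percentage):
    print(*row)
```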
Presentation of Data:
The organized data can be presented by diagrams or
graphs in two ways:
i) Quantitative presentation (stem-leaf plot, frequency curve,
frequency polygon, ogive curve, histogram)
ii) Qualitative presentation (bar diagram, pie chart, historigram)
i) Quantitative data presentation:
a) Stem-leaf plot:
It is a graphical technique for representing quantitative data that can be
used to examine the shape of the distribution, the range of the values and
the point of concentration of the values. Each numerical value is divided into
two parts, namely stem and leaf. Usually the stem is the leading digit or digits,
the integer part, or any other suitable part of the observed value, and the leaf
is the trailing digit or decimal place.
Example: For the observed value 27, 2 is the stem and 7 is the leaf
For the observed value 127, 12 is the stem and 7 is the leaf
For the observed value 12.7, 12 is the stem and 7 is the leaf
Problem-1:
The following values are the obtained marks of 30 students:
34 36 31 46 76 86 42 44 32 46 40 54 66 56 50
42 33 80 77 81 46 40 60 63 64 76 56 57 57 70
Use a stem-leaf plot to display the data

Here,
Maximum value = 86
Minimum value = 31
The stem-leaf plot for the given data is given below:
Stem │ Leaf
3    │ 4 6 1 2 3
4    │ 6 2 4 6 0 2 6 0
5    │ 4 6 0 6 7 7
6    │ 6 0 3 4
7    │ 6 7 6 0
8    │ 6 0 1
By arranging the leaves in ascending order the final stem-leaf plot will be:
Stem │ Leaf               │ Frequency
3    │ 1 2 3 4 6          │ 5
4    │ 0 0 2 2 4 6 6 6    │ 8
5    │ 0 4 6 6 7 7        │ 6
6    │ 0 3 4 6            │ 4
7    │ 0 6 6 7            │ 4
8    │ 0 1 6              │ 3
Total                       N = 30
Where 3│1 means 31, and the class interval = 10 (30-40, 40-50, …)
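A minimal Python sketch of the same construction for two-digit whole numbers is given below. The function name `stem_leaf` is hypothetical, and the sketch assumes the stem is the tens digit and the leaf is the units digit, as in this problem.

```python
from collections import defaultdict

def stem_leaf(values):
    """Group whole numbers by tens digit (stem) with sorted units digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 31 -> stem 3, leaf 1
        plot[stem].append(leaf)
    return dict(plot)

marks = [34, 36, 31, 46, 76, 86, 42, 44, 32, 46, 40, 54, 66, 56, 50,
         42, 33, 80, 77, 81, 46, 40, 60, 63, 64, 76, 56, 57, 57, 70]

plot = stem_leaf(marks)
for stem, leaves in plot.items():
    print(stem, "|", *leaves, "| frequency", len(leaves))
```

Because the values are sorted before they are split, the leaves come out in ascending order, which corresponds to the final plot rather than the first draft.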

Problem-2:
The typing speeds of 24 students were recorded as follows:
13 12 6 8 15 18 17 24 28 23 27 23
21 20 15 18 23 25 23 13 17 18 19 18
Use a stem-leaf plot to display the data

Here,
Maximum value = 28
Minimum value = 6
In general, the number of stems should lie between 5 and 25.
Taking stems at intervals of 5, the stem-leaf plot for the given data is as follows:
Stem │ Leaf
5    │ 1 3
10   │ 3 2 3
15   │ 0 3 2 0 3 2 3 4 3
20   │ 4 3 3 1 0 3 3
25   │ 3 2 0
By arranging the leaves in ascending order the final stem-leaf plot will be:
Stem │ Leaf                 │ Frequency
5    │ 1 3                  │ 2
10   │ 2 3 3                │ 3
15   │ 0 0 2 2 3 3 3 3 4    │ 9
20   │ 0 1 3 3 3 3 4        │ 7
25   │ 0 2 3                │ 3
Total                         N = 24
Where 5│1 means 5 + 1 = 6, and the class interval = 5 (5-10, 10-15, …)
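When the stems are class lower limits rather than leading digits, as in this problem, the leaf is the amount by which a value exceeds its stem. A small sketch, assuming whole-number data and using the hypothetical helper `stem_leaf_by_width`:

```python
def stem_leaf_by_width(values, width):
    """Use the lower class limit as stem and (value - stem) as leaf."""
    plot = {}
    for v in sorted(values):
        stem = (v // width) * width          # e.g. 18 with width 5 -> stem 15
        plot.setdefault(stem, []).append(v - stem)
    return plot

speeds = [13, 12, 6, 8, 15, 18, 17, 24, 28, 23, 27, 23,
          21, 20, 15, 18, 23, 25, 23, 13, 17, 18, 19, 18]

plot = stem_leaf_by_width(speeds, 5)
for stem, leaves in plot.items():
    print(stem, "|", *leaves, "| frequency", len(leaves))
```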

Problem-3:
The price-earnings ratios of 20 stocks were recorded as follows:
20.8 20.9 22.0 22.3 22.6 21.7 20.4 21.4 23.3 19.8
20.9 21.0 22.6 21.5 22.2 19.4 20.4 21.5 22.7 21.3
Use a stem-leaf plot to display the data, where 19│4 means 19.4.
Here,
Maximum value = 23.3
Minimum value = 19.4
The stem-leaf plot for the given data will be as follows:
Stem │ Leaf
19   │ 8 4
20   │ 8 9 4 9 4
21   │ 7 4 0 5 5 3
22   │ 0 3 6 6 2 7
23   │ 3
By arranging the leaves in ascending order the final stem-leaf plot will be:
Stem │ Leaf           │ Frequency
19   │ 4 8            │ 2
20   │ 4 4 8 9 9      │ 5
21   │ 0 3 4 5 5 7    │ 6
22   │ 0 2 3 6 6 7    │ 6
23   │ 3              │ 1
Total                   N = 20
Where 19│4 means 19.4, and the class interval = 1 (19-20, 20-21, …)

Importance or uses of stem-leaf plot:
* The range can be measured easily
* The nature of the frequency distribution is easily seen
* The identity of each observation is maintained
* Data can be represented easily in a simple and scientific form
* The actual mode can be determined
* Extreme values can be identified easily
b) Frequency Curve:
When the frequencies are plotted against the mid values of the class
intervals and the points are joined by a smooth freehand curve, the curve
thus obtained is called a frequency curve.
Problem:
Construct a frequency curve from the following distribution:
Class        0-10   10-20   20-30   30-40   40-50   50-60   60-70   Total
Frequency    4      6       8       15      12      4       1       50
Drawing Procedure: Plot the midpoints of the class intervals along the X-axis and
the corresponding frequencies along the Y-axis. Put a dot above every midpoint at the
height of its frequency, and then join the dots with a smooth freehand curve. The curve
thus obtained is the required frequency curve.
[Figure: Frequency curve. X-axis: mid value (5, 15, …, 65), 1 square = 2 units; Y-axis: frequency, 1 square = 1 unit.]
c) Frequency Polygon:
When the frequencies are plotted against the mid values of the class
intervals and the points are joined by straight lines, the polygon thus
obtained is called a frequency polygon.
Problem:
Construct a frequency polygon from the following distribution:
Class        0-10   10-20   20-30   30-40   40-50   50-60   60-70   Total
Frequency    4      6       8       15      12      4       1       50
Drawing Procedure: Plot the midpoints of the class intervals along the X-axis and
the corresponding frequencies along the Y-axis. Put a dot above every midpoint at the
height of its frequency, and then join the dots by straight lines one by one. Join the
first dot to the midpoint of the preceding class and the last dot to the midpoint of the
following class, both at zero frequency. The polygon thus obtained is the required
frequency polygon.
[Figure: Frequency polygon. X-axis: mid value (5, 15, …, 75); Y-axis: frequency, 1 square = 1 unit.]
d) Histogram/column diagram:
A histogram is a suitable graph for representing the frequency distribution of
a continuous or grouped series with exclusive classes. If the class limits are
inclusive, they must first be converted to exclusive ones. The class limits are
plotted along the X-axis and the frequencies along the Y-axis. A rectangle (column)
is drawn on each class, where the height of the rectangle is equal to the
respective frequency and the width is equal to the class interval. All the
rectangles should be adjacent to each other.
Uses:
i) A frequency curve or frequency polygon can be drawn from a histogram by
joining the midpoints of the tops of the rectangles.
ii) The mode can be determined from the histogram of a frequency distribution.
iii) Histograms are used to present social, economic and research data.
Problem:
1) Draw a histogram from the following frequency distribution
Class        20-30   30-40   40-50   50-60   60-70
Frequency    6       8       5       4       3
Drawing procedure:
On graph paper, plot the class limits along the X-axis and the frequencies along
the Y-axis. Draw a rectangle on each class, where the height of the
rectangle is equal to the respective frequency and the width is equal
to the class interval. All the rectangles should be adjacent to each other.
The graph thus obtained is the required histogram.
[Figure: Histogram. X-axis: class limits (20 to 70); Y-axis: frequency, 2 squares = 1 unit; rectangle heights 6, 8, 5, 4, 3.]
Frequency curve from Histogram:
[Figure: Histogram with a frequency curve drawn through the midpoints of the tops of the rectangles. X-axis: class limits; Y-axis: frequency.]
Frequency polygon from Histogram:
[Figure: Histogram with a frequency polygon drawn through the midpoints of the tops of the rectangles. X-axis: class limits; Y-axis: frequency.]
Problem:
Draw a histogram from the following frequency distribution (inclusive classes)
Class        20-29   30-39   40-49   50-59   60-69
Frequency    6       8       5       4       3
Drawing procedure:
Firstly, convert the inclusive class intervals into exclusive class intervals
by adding 0.5 to each upper limit and subtracting 0.5 from each lower limit, where
(lower limit of next class – upper limit of previous class)/2 = (30 – 29)/2 = 0.5
Class        19.5-29.5   29.5-39.5   39.5-49.5   49.5-59.5   59.5-69.5
Frequency    6           8           5           4           3
Now, on graph paper, plot the class limits along the X-axis and the frequencies
along the Y-axis. Draw a rectangle on each class, where the
height of the rectangle is equal to the respective frequency and the
width is equal to the class interval. All the rectangles should be
adjacent to each other. The graph thus obtained is the required
histogram.
[Figure: Histogram. X-axis: class limits (19.5, 29.5, …, 69.5); Y-axis: frequency; rectangle heights 6, 8, 5, 4, 3.]
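The conversion from inclusive to exclusive class limits used in this problem can be sketched in Python as below. The function name `to_exclusive` is hypothetical, and the sketch assumes the gap between consecutive classes is the same throughout, so it is measured once from the first two classes.

```python
def to_exclusive(classes):
    """Convert inclusive class limits to exclusive ones.

    The correction is half the gap between the upper limit of one class
    and the lower limit of the next (assumed equal for all classes).
    """
    half_gap = (classes[1][0] - classes[0][1]) / 2
    return [(lo - half_gap, hi + half_gap) for lo, hi in classes]

inclusive = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
exclusive = to_exclusive(inclusive)
print(exclusive)
```

After the conversion each upper limit coincides with the next lower limit, so the rectangles of the histogram touch each other.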
Problem:
Draw a histogram from the following frequency distribution (unequal classes)
Class        20-25   25-40   40-50   50-60   60-65
Frequency    6       8       5       4       3
Drawing procedure:
Since the class intervals in the given frequency distribution are not equal,
we should use frequency density (frequency divided by class interval) instead
of frequency to get a better graph. The frequency densities are shown below:
Class                20-25   25-40   40-50   50-60   60-65
Frequency density    1.20    0.53    0.50    0.40    0.60
Now, on graph paper, plot the class limits along the X-axis and the frequency
densities along the Y-axis. Draw a rectangle on each class, where the
height of the rectangle is equal to the respective
frequency density and the width is equal to the class interval. All the
rectangles should be adjacent to each other. The graph thus
obtained is the required histogram.
[Figure: Histogram. X-axis: class limits (20, 25, 40, 50, 60, 65); Y-axis: frequency density, 1 square = 0.1 unit; rectangle heights 1.2, 0.53, 0.5, 0.4, 0.6.]
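A short Python sketch of the frequency density calculation for this problem follows. It also checks one consequence of using density as the height: the area of each rectangle (height × width) reproduces the class frequency. Variable names are illustrative.

```python
import math

classes = [(20, 25), (25, 40), (40, 50), (50, 60), (60, 65)]
freq = [6, 8, 5, 4, 3]

widths = [hi - lo for lo, hi in classes]
# Height of each rectangle: frequency divided by class interval.
density = [f / w for f, w in zip(freq, widths)]
# Area of each rectangle: height times width.
areas = [d * w for d, w in zip(density, widths)]

print([round(d, 2) for d in density])
print(all(math.isclose(a, f) for a, f in zip(areas, freq)))
```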
e) Ogive curve:
When the cumulative frequencies of a frequency distribution are plotted
against the upper or lower class limits, the curve obtained is
called a cumulative frequency curve or ogive.
There are two types of ogive:
1) Less than ogive: Here the frequencies are added successively from the first
class, so the cumulative frequency increases; it is plotted against the upper
limit of each class. It is an upward curve.
2) More than ogive: Here the frequencies are added successively from the last
class, so the cumulative frequency decreases from class to class; it is plotted
against the lower limit of each class. It is a downward curve.
Uses: i) To determine the median and quantiles.
ii) To compare the run rates of two teams in cricket.
Example:
Over 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
BN 6 10 15 25 28 36 42 51 60 65 72 77 85 89 94 100 112 124 135 142
NZ 4 12 20 25 32 39 45 52 55 64 69 74 82 90 92 99 115 122 130 137
[Figure: Less than ogive of cumulative runs for BN and NZ. X-axis: over (0 to 25); Y-axis: run (0 to 160).]
Problem: Draw a less than ogive and a more than ogive
from the data given below:
Class      Frequency f
30 – 40    5
40 – 50    8
50 – 60    6
60 – 70    4
70 – 80    4
80 – 90    3
Total      N = 30
Drawing Procedure: Calculate the less than and the more than
cumulative frequencies. For the less than ogive the cumulative frequencies are plotted
against the upper limits of the classes, and for the more than ogive they are plotted
against the lower limits of the classes. Put a dot at every class limit at the height of
its cumulative frequency, and then join the dots with a smooth freehand curve. The curve
thus obtained is the ogive.
Class      Frequency f    Less than/upward F    More than/downward F
30 – 40    5              5                     30
40 – 50    8              13                    25
50 – 60    6              19                    17
60 – 70    4              23                    11
70 – 80    4              27                    7
80 – 90    3              30                    3
Total      N = 30
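The points to be plotted for the two ogives can be listed with a short Python sketch. It only prepares the (class limit, cumulative frequency) pairs described in the drawing procedure; the variable names are illustrative and no plotting library is assumed.

```python
from itertools import accumulate

classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90)]
freq = [5, 8, 6, 4, 4, 3]

less_cf = list(accumulate(freq))
more_cf = list(accumulate(freq[::-1]))[::-1]

# Less than ogive: cumulative frequency against the UPPER limit of each class.
less_than_points = [(hi, cf) for (_, hi), cf in zip(classes, less_cf)]
# More than ogive: cumulative frequency against the LOWER limit of each class.
more_than_points = [(lo, cf) for (lo, _), cf in zip(classes, more_cf)]

print(less_than_points)
print(more_than_points)
```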
Less than ogive (in Excel):
[Figure: Less than ogive curve. X-axis: upper limit (30 to 100); Y-axis: cumulative frequency (0 to 35).]
Less than ogive (manual):
[Figure: Ogive curve. X-axis: upper limit (40 to 90); Y-axis: cumulative frequency, 1 square = 2 units.]
More than ogive (in Excel):
[Figure: More than ogive curve. X-axis: lower limit (20 to 90); Y-axis: cumulative frequency (0 to 35).]
More than ogive (manual):
[Figure: Ogive curve. X-axis: lower limit (30 to 80); Y-axis: cumulative frequency.]
ii) Qualitative/categorical data presentation:
a) Component or Bar diagram
A bar diagram is a suitable graph for representing the frequency distribution
of categorical (qualitative) or time series data. The categories are
plotted along the X-axis and the frequencies along the Y-axis. A rectangle
or bar is drawn on each category, where the height of the bar is equal to the
respective frequency. There must be gaps between the bars. All
bars should be of equal width and equally spaced. Bar diagrams
play an important role in newspapers, journals and campaigns.
Types
1) Simple bar diagram: only one variable
2) Multiple bar diagram: two or more interrelated variables
Problems
1) Draw a suitable diagram for the given data
Game liked by cadets    Football   Cricket   Basketball   Volleyball
No. of cadets           120        80        60           40
Drawing Procedure
Since the variable 'game liked by cadets' is a categorical variable, we can
present the above data by a simple bar diagram. The categories (game
names) are plotted along the X-axis and the frequencies (no. of cadets) are plotted
along the Y-axis. A rectangle or bar is drawn on each category, where the height
of the bar is equal to the respective frequency. There must be gaps
between the bars. All bars should be of equal width and equally spaced.
The diagram thus obtained is the required bar diagram.
[Figure: Bar diagram. X-axis: game name (Football, Cricket, Basketball, Volleyball); Y-axis: no. of cadets; bar heights 120, 80, 60, 40.]
In Excel
[Figure: Bar diagram. X-axis: game name; Y-axis: no. of cadets (0 to 140); bar heights 120, 80, 60, 40.]
2) Draw a suitable diagram for the given data of Bangladesh for the last 4 years
Year                  2016   2017   2018   2019
Export (million $)    140    160    150    180
Import (million $)    170    180    160    190
Drawing Procedure
Since these are time series data with two interrelated variables, we can present them
by a multiple bar diagram. The categories (years) are plotted along the X-axis
and the values (export, import) are plotted along the Y-axis. Two adjacent rectangles or
bars are drawn on each category, where the height of each bar is equal to the
respective value. There must be gaps between the categories. All pairs of bars
should be of equal width and equally spaced. The diagram thus obtained
is the required multiple bar diagram.
[Figure: Multiple bar diagram. X-axis: year (2016 to 2019); Y-axis: export, import; export bars 140, 160, 150, 180 and import bars 170, 180, 160, 190.]
In Excel
[Figure: Multiple bar diagram. X-axis: year (2016 to 2019); Y-axis: export, import (0 to 200); legend: Export, Import.]
3) Present the following data by a suitable diagram
Year           2013   2014   2015   2016   2017   2018
Sales (Ton)    85     110    105    110    140    180
Since these are time series data, a suitable diagram is the bar diagram.
Drawing Procedure: The categories (years) are plotted along the
X-axis and the values (sales) are plotted along the Y-axis. A rectangle
or bar is drawn on each category, where the height of the bar is equal
to the respective value. There must be gaps between
the bars. All bars should be of equal width and equally
spaced. The diagram thus obtained is the required
bar diagram.
[Figure: Bar diagram. X-axis: year (2013 to 2018); Y-axis: sales (0 to 180); bar heights 85, 110, 105, 110, 140, 180.]
b) Pie-Chart or angular diagram
A pie-chart is a suitable graph for representing how a total frequency
(such as a total budget) is distributed over the categories of categorical
or qualitative data. A circle of suitable radius is drawn to represent the total
frequency, and the circle is divided into sectors in proportion to the
magnitude of each item relative to the magnitude of all items. Pie-charts play
an important role in newspapers, journals and campaigns.
Problem
Draw a pie-chart for the following data
Game liked by cadets    Football   Cricket   Basketball   Volleyball   Total
No. of cadets           120        80        60           40           300
Drawing procedure:
For drawing a pie-chart the data are expressed as segments of 360°, with
angle = (frequency / total) × 360°, as shown below:
Game name     No. of cadets    Angle in degrees
Football      120              (120/300) × 360° = 144°
Cricket       80               96°
Basketball    60               72°
Volleyball    40               48°
Total         300              360°
A circle is drawn with a suitable radius. The angles are marked at the center of
the circle with the help of a protractor, so that the circle is
divided into sectors in the required proportions. Each sector is labeled individually. The
chart thus obtained is the required pie-chart.
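The conversion of each frequency into a sector angle out of 360° can be sketched in Python as follows. The function name `pie_angles` is hypothetical; the counts are those of the problem above.

```python
def pie_angles(counts):
    """Return the sector angle in degrees for each category: 360 * frequency / total."""
    total = sum(counts.values())
    return {name: 360 * value / total for name, value in counts.items()}

cadets = {"Football": 120, "Cricket": 80, "Basketball": 60, "Volleyball": 40}
angles = pie_angles(cadets)

for name, angle in angles.items():
    print(name, angle)
print("Total", sum(angles.values()))
```

The angles should add up to 360°, which is a quick check before drawing the sectors.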
[Figure: Pie-chart. Football: 120 (40%); Cricket: 80 (27%); Basketball: 60 (20%); Volleyball: 40 (13%).]

c) Historigram/simple line graph:
A set of data depending on time is called a time series. A time series is a record
of the values of a variable during a particular period, taken at
successive intervals of time. The line graph of a time series is called a
historigram: the values of the variable are plotted against time on graph
paper and the points so obtained are joined by straight line segments.
A historigram gives a rough idea
about the nature of changes in the time-dependent variable.
Problem
1) Present the following data by a suitable graph.
Year           2013   2014   2015   2016   2017   2018
Sales (Ton)    85     110    105    110    140    180

Since these are time series data, a suitable graph is the simple line graph or historigram.
Drawing Procedure: Plot the years along the X-axis and the sales along the Y-axis.
Put a dot above every year at the height of its sales, and then join the dots by straight
line segments one by one. The line graph thus obtained is the required
historigram.
[Figure: Historigram. X-axis: year (2012 to 2019); Y-axis: sales (0 to 200); plotted values 85, 110, 105, 110, 140, 180.]
2) Draw a suitable graph for the given data of Bangladesh for the last 4 years
Year                  2016   2017   2018   2019
Export (million $)    140    190    150    180
Import (million $)    170    180    160    190
Since these are time series data, a suitable graph is the multiple line
graph or historigram.
Drawing Procedure: Plot the years along the X-axis and the exports and imports
along the Y-axis. Put a dot above every year at the height of its export value and
then join the dots by straight line segments one by one. The line graph
thus obtained is the required historigram for exports. Similarly we
get another line graph for imports.
[Figure: Multiple line graph. X-axis: years (2015 to 2020); Y-axis: export, import (0 to 200); legend: Export (million $), Import (million $).]
