2 - Data
b) According to source
i) Primary data
ii) Secondary data
Data:
Primary data:
Data collected directly from the population or sample units are
called primary data.
Example: If the cadets' personal information is collected
from the cadets directly, it constitutes primary data.
* Primary data are original in character, not well
organized, and highly expensive in terms of money, time and
labor.
Secondary data:
Data that have already been obtained by other persons
or organizations and have already been published or used
are called secondary data.
e) Telephone interview:
In this method the interviewer and the respondent
converse over a telephone call.
g) Through experiment:
In natural sciences such as physics, chemistry and astronomy, and in
biological sciences such as botany, zoology, biochemistry,
microbiology and pharmacy, data are obtained when
treatments are applied to experimental units in a
controlled laboratory.
Sample questionnaire:
Title: ‘Applicable or not for Math Olympiad’
Title: ‘Teaching Proficiency’
Title: ‘Cadet’s Personality & Behavior’
Sources of secondary data:
a) Published sources:
International publications, articles
Govt., semi-govt., non govt. publications/records
Official Statistics
Trade and financial Journals
Diaries, Books, Newspapers, magazines
Websites, blogs
International organizations like World Bank, WHO, IMF, ILO, UNDP
Local organizations like BBS, BRRI, BARDEM, BRAC, ICDDRB
The secondary data may be used in case the researcher finds them
reliable, adequate and appropriate for his study.
Editing:
Once the set of data has been collected, it is necessary to
process it for proper presentation. Editing of data is required
as preparatory work before tabulation and statistical analysis
are carried out. This is quite a difficult job and requires a great
deal of skill and experience. While editing primary data the
following considerations need attention:
The data should be complete
The data should be consistent
The data should be accurate
The data should be homogeneous
Coding:
When data are to be processed by computer, they must be coded,
that is, converted into a form the computer can handle. For qualitative
data, code numbers can be assigned. For example, for the
question “Do you smoke?”, a code of 1 can be assigned to the
answer ‘yes’ and a code of 0 to the answer ‘no’.
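The yes/no coding described above can be sketched in a few lines. This is only an illustration: the list of answers and the names `SMOKE_CODES` and `code_answers` are hypothetical, not part of the original notes.

```python
# Codebook from the text: 'yes' -> 1, 'no' -> 0.
SMOKE_CODES = {"yes": 1, "no": 0}

def code_answers(answers, codebook):
    """Replace each qualitative answer with its numeric code."""
    return [codebook[a.strip().lower()] for a in answers]

# Hypothetical responses to the question "Do you smoke?"
answers = ["Yes", "no", "No", "yes", "no"]
coded = code_answers(answers, SMOKE_CODES)
print(coded)
```

An answer that is not in the codebook raises a `KeyError`, which is a simple way to catch inconsistent entries during editing.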
Classification/tabulation of Data:
Classification/tabulation is a process of arranging the
available information into homogeneous groups within
some rows and columns according to similarities or
same characteristics.
Construction of a table:
A good table consists of the following parts:
1) Table number, 2) Title of the table, 3) Row heading or stub,
4) Column heading or caption, 5) Body of the table, 6)
Footnote (if needed).
No. of Patients 22 33 45 57 75 99 125 258
Classification of Data:
d) Qualitative classification:
In Qualitative classification the data are classified in
terms of attributes or categories.
Table-04: The number of COVID-19 patients in Bangladesh
classified by sex is shown below
Sex Male Female
Frequency:
The number of times a value is repeated is called the frequency of
that value.
For example: The ages of 5 cadets are 24, 25, 24, 26, 25.
Here the frequencies of the three values 24, 25 and 26 are 2, 2 and 1
respectively.
*** Data with frequencies are called grouped data, and data without
frequencies are called ungrouped data.
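As a minimal sketch, the frequencies in the cadet-age example can be counted with Python's standard `collections.Counter`. The variable names are illustrative only.

```python
from collections import Counter

ages = [24, 25, 24, 26, 25]      # ungrouped data from the example
frequency = Counter(ages)        # value -> number of times it occurs

# Listing each value with its frequency turns the ungrouped data into grouped data.
for value in sorted(frequency):
    print(value, frequency[value])
```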
Quantitative Classification/Frequency Distribution:
Frequency distribution:
It is a statistical table which shows the distribution of
whole data according to different classes.
0 250
1 125
2 75
3 50
Total 500
Frequency Distribution:
2) Continuous frequency distribution:
In this frequency distribution the continuous data are
represented in terms of some class interval.
C = R / K (should be an integer), where R is the range and K is the number of classes
As far as possible one should avoid class intervals such as 3, 4, 7, 11, 26,
etc. Preferably, one should have class intervals of either 5 or multiples of 5.
Step-4: Determination of class limit
The starting point, i.e. the lower limit of the first class,
should be zero, 5 or a multiple of 5. For example, if
the lowest value of the series is 63 and we have taken a class
interval of 10, then the first class should be 60-70 instead of
63-73.
Step-5: Tally and frequency
Take the items from the data one at a time and put a tally
mark (|) against the class to which each item belongs. Count
the tally marks and place this number against the class to
which the items belong. Count the total frequency and check
that it equals the total number of observations.
Mathematical problem:
Construct a frequency table (exclusive method) by using suitable class
interval from the following obtained marks of 30 students:
34 36 31 46 76 86 42 44 32 46 40 54 66 56 50
42 33 80 77 81 46 40 60 63 64 76 56 57 57 70
Solution:
Here,
The maximum value = 86
The minimum value = 31
So range, R = 86 – 31 = 55
According to Sturges' rule, the number of classes,
K = 1 + 3.322 log10(N) ; N = 30
= 1 + 3.322 log10(30)
= 5.9069
= 6 (app)
Class interval, C = R / K = 55 / 6
= 9.1667
= 10 (app)
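The arithmetic of this step can be checked with a short sketch. Sturges' rule is K = 1 + 3.322 log10(N). Rounding the class interval up to the next multiple of 5 is an assumption made here to match the stated preference for intervals of 5 or its multiples, and the function name is illustrative.

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule: K = 1 + 3.322 * log10(N)."""
    return 1 + 3.322 * math.log10(n)

n = 30
data_range = 86 - 31              # R = maximum value - minimum value
k_raw = sturges_classes(n)        # about 5.907
k = round(k_raw)                  # number of classes
c_raw = data_range / k            # class interval before rounding, about 9.1667
# Assumed convention: round the interval up to the next multiple of 5.
c = math.ceil(c_raw / 5) * 5

print(f"K = {k_raw:.4f} -> {k}")
print(f"C = {c_raw:.4f} -> {c}")
```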
Table: Frequency distribution by taking class interval as 10
Class (Marks Obtained)   Tally   Frequency (f)
30 – 40 5
40 – 50 8
50 – 60 6
60 – 70 4
70 – 80 4
80 – 90 3
Total N = Σf = 30
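The tallying for this table can be reproduced with a small sketch. It assumes the exclusive method means a value x belongs to a class when lower limit ≤ x < upper limit. The function name `frequency_table` is illustrative.

```python
marks = [34, 36, 31, 46, 76, 86, 42, 44, 32, 46, 40, 54, 66, 56, 50,
         42, 33, 80, 77, 81, 46, 40, 60, 63, 64, 76, 56, 57, 57, 70]

def frequency_table(data, start, width, n_classes):
    """Count values per class using the exclusive method (lower <= x < upper)."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        f = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, f))
    return table

table = frequency_table(marks, start=30, width=10, n_classes=6)
for lower, upper, f in table:
    # One '|' per observation stands in for the tally marks.
    print(f"{lower} - {upper}  {'|' * f:<8}  {f}")
print("Total N =", sum(f for _, _, f in table))
```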
Frequency distribution
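*Calculation of less than and more than cumulative frequency (F), frequency density,
relative frequency and percentage frequency (C = class interval, N = total frequency).
Class     f    Less than F   More than F   Frequency density (f/C)   Relative frequency (f/N)   Percentage frequency
30 – 40   5    5             30            5/10 = 0.5                0.17                       17
40 – 50   8    13            25            0.8                       0.27                       27
50 – 60   6    19            17            0.6                       0.20                       20
60 – 70   4    23            11            0.4                       0.13                       13
70 – 80   4    27            7             0.4                       0.13                       13
80 – 90   3    30            3             0.3                       0.10                       10
Total     N = 30                                                     1                          100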
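The derived columns can be computed directly from the class frequencies. This is a sketch under the usual definitions: frequency density = f / class interval, relative frequency = f / N, percentage frequency = 100 × relative frequency.

```python
from itertools import accumulate

classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90)]
freqs = [5, 8, 6, 4, 4, 3]
n = sum(freqs)

less_than = list(accumulate(freqs))                   # running total from the first class
more_than = list(accumulate(reversed(freqs)))[::-1]   # running total from the last class
density = [f / (hi - lo) for (lo, hi), f in zip(classes, freqs)]
relative = [f / n for f in freqs]
percentage = [100 * r for r in relative]

for (lo, hi), f, lt, mt, d, r, p in zip(classes, freqs, less_than, more_than,
                                        density, relative, percentage):
    print(f"{lo} - {hi}  {f:>2}  {lt:>2}  {mt:>2}  {d:.1f}  {r:.2f}  {p:.0f}")
```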
Presentation of Data:
a) Stem-leaf plot:
It is a graphical technique for representing quantitative data that can be
used to examine the shape of the distribution, the range of the values and
the point of concentration of the values. Each numerical value is divided into
two parts, namely a stem and a leaf. Usually the stem is the first digit or digits,
the integer part, or any other suitable part of the observed value, and the leaf
is the trailing digit or the decimal places.
34 36 31 46 76 86 42 44 32 46 40 54 66 56 50
42 33 80 77 81 46 40 60 63 64 76 56 57 57 70
13 12 6 8 15 18 17 24 28 23 27 23
21 20 15 18 23 25 23 13 17 18 19 18
Stem Leaf
5 1 3
10 3 2 3
15 0 3 2 0 3 2 3 4 3
20 4 3 3 1 0 3 3
25 3 2 0
By arranging the leaves in ascending order the final stem-leaf plot will be:
Stem Leaf Frequency
5 1 3 2
10 2 3 3 3
15 0 0 2 2 3 3 3 3 4 9
20 0 1 3 3 3 3 4 7
25 0 2 3 3
Total N = 24
Problem: Use a stem-leaf plot to display the following data, where 19│4 means 19.4:
20.8 20.9 22.0 22.3 22.6 21.7 20.4 21.4 23.3 19.8
20.9 21.0 22.6 21.5 22.2 19.4 20.4 21.5 22.7 21.3
Here,
Maximum value = 23.3
Minimum value = 19.4
The stem-leaf plot for the given data will be as follows:
Stem Leaf
19 8 4
20 8 9 4 9 4
21 7 4 0 5 5 3
22 0 3 6 6 2 7
23 3
By arranging the leaves in ascending order the final stem-leaf plot will be:
Stem Leaf Frequency
19 4 8 2
20 4 4 8 9 9 5
21 0 3 4 5 5 7 6
22 0 2 3 6 6 7 6
23 3 1
Total N = 20
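The same plot can be built programmatically. The sketch below assumes every value has exactly one decimal place, so the stem is the integer part and the leaf is the decimal digit. The function name `stem_leaf` is illustrative.

```python
from collections import defaultdict

values = [20.8, 20.9, 22.0, 22.3, 22.6, 21.7, 20.4, 21.4, 23.3, 19.8,
          20.9, 21.0, 22.6, 21.5, 22.2, 19.4, 20.4, 21.5, 22.7, 21.3]

def stem_leaf(data):
    """Split each one-decimal value into stem (integer part) and leaf (decimal digit)."""
    plot = defaultdict(list)
    for x in data:
        tenths = round(x * 10)            # 19.4 -> 194; rounding avoids float truncation
        stem, leaf = divmod(tenths, 10)
        plot[stem].append(leaf)
    # Sort the stems, and sort the leaves within each stem in ascending order.
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

plot = stem_leaf(values)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(map(str, leaves))}   ({len(leaves)})")
```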
Class 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50 50 - 60 60 - 70 Total
Frequency 4 6 8 15 12 4 1 50
Drawing Procedure: Plot the midpoints of the class intervals along the X-axis and the
corresponding frequencies along the Y-axis. Put a dot against every midpoint at the height of
its frequency and then join the dots with a smooth freehand curve. The freehand curve
thus obtained is the required frequency curve.
[Figure: Frequency Curve. X-axis: Mid Value (1 square = 2 units); Y-axis: Frequency (1 square = 1 unit).]
C) Frequency Polygon:
When the frequencies are plotted against the mid values of the
class intervals and the points are joined by straight lines, the polygon thus obtained is called a frequency polygon.
Problem:
Construct a frequency polygon from the following distribution:
Class 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50 50 – 60 60 - 70 Total
Frequency 4 6 8 15 12 4 1 50
Drawing Procedure: Plot the midpoints of the class intervals along the X-axis and the
corresponding frequencies along the Y-axis. Put a dot against every midpoint at the height of
its frequency and then join the dots by straight lines one by one. Join the first dot
to the midpoint of the preceding class and the last dot to the midpoint of the following class,
both on the X-axis. The polygon thus obtained is the required frequency polygon.
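The vertices of the polygon can be listed before drawing. This sketch assumes equal class widths, so the closing points sit one class width before the first midpoint and one class width after the last, each with frequency 0. The function name is illustrative.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [4, 6, 8, 15, 12, 4, 1]

def polygon_points(classes, freqs):
    """Return (mid value, frequency) vertices, closed on the X-axis at both ends."""
    width = classes[0][1] - classes[0][0]     # equal class widths assumed
    mids = [(lo + hi) / 2 for lo, hi in classes]
    start = (mids[0] - width, 0)              # midpoint of the preceding class
    end = (mids[-1] + width, 0)               # midpoint of the following class
    return [start] + list(zip(mids, freqs)) + [end]

points = polygon_points(classes, freqs)
for x, y in points:
    print(x, y)
```

Joining consecutive points with straight line segments gives the polygon.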
[Figure: Frequency Polygon. X-axis: Mid Value (1 square = 3 units); Y-axis: Frequency (1 square = 1 unit).]
d) Histogram/column diagram:
A histogram is a suitable graph for representing the frequency distribution of
a continuous or grouped series (exclusive). If the class limits are inclusive,
they must be converted to exclusive. Here the class limits are plotted along the
X-axis and the frequencies along the Y-axis. A rectangle (column) is drawn
on each class, where the height of the rectangle is equal to the
respective frequency and the width is equal to the class interval. All
rectangles should be adjacent to each other.
Uses:
i) A frequency curve and a frequency polygon can be drawn from a histogram by
joining the midpoints of the tops of the rectangles.
ii) The mode can be determined from the histogram of any frequency distribution.
iii) A histogram is used to present social, economic and research data.
Problem:
1) Draw a histogram from the following frequency distribution
Drawing procedure:
On graph paper plot the class limits along the X-axis and the frequencies along the
Y-axis. Draw a rectangle on each class, where the height of the
rectangle is equal to the respective frequency and the width is equal
to the class interval. All rectangles should be adjacent to each other.
The graph thus obtained is the required histogram.
[Figure: Histogram. X-axis: Class Limits (1 square = 2 units); Y-axis: Frequency (2 squares = 1 unit).]
Frequency curve from Histogram:
[Figure: Histogram with frequency curve. X-axis: Class Limits (1 square = 2 units); Y-axis: Frequency (2 squares = 1 unit).]
Frequency polygon from Histogram:
[Figure: Histogram with frequency polygon. X-axis: Class Limits (1 square = 2 units); Y-axis: Frequency (2 squares = 1 unit).]
Problem:
2) Draw a histogram from the following frequency distribution
Class 20-29 30-39 40-49 50-59 60-69
Frequency 6 8 5 4 3
Drawing procedure:
Firstly, convert the inclusive class intervals into exclusive class intervals
by adding 0.5 to each upper limit and subtracting 0.5 from each lower limit, where
(lower limit of next class – upper limit of previous class)/2 = (30 – 29)/2 = 0.5
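A minimal sketch of this conversion is given below. It assumes the gap between consecutive classes is the same throughout the table, and the function name `to_exclusive` is illustrative.

```python
inclusive = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

def to_exclusive(classes):
    """Widen each inclusive class by half the gap between consecutive classes."""
    gap = classes[1][0] - classes[0][1]   # lower limit of next - upper limit of previous
    d = gap / 2                           # correction factor, 0.5 for this table
    return [(lo - d, hi + d) for lo, hi in classes]

exclusive = to_exclusive(inclusive)
for lo, hi in exclusive:
    print(f"{lo} - {hi}")
```

After the conversion each upper limit coincides with the next lower limit, so the rectangles of the histogram touch.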
Now on graph paper plot the class limits along the X-axis and the frequencies
along the Y-axis. Draw a rectangle on each class, where the
height of the rectangle is equal to the respective frequency and the
width is equal to the class interval. All rectangles should be
adjacent to each other. The graph thus obtained is the required
histogram.
[Figure: Histogram. X-axis: Class Limits (1 square = 2 units); Y-axis: Frequency (2 squares = 1 unit).]
Drawing procedure:
Since the class intervals in the given frequency distribution are not equal,
we should use frequency density instead of frequency to get a better
graph. The frequency densities are shown below:
Now on graph paper plot the class limits along the X-axis and the frequency
densities along the Y-axis. Draw a rectangle on each class, where the
height of the rectangle is equal to the respective
frequency density and the width is equal to the class interval. All
rectangles should be adjacent to each other. The graph thus
obtained is the required histogram.
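Frequency density is the class frequency divided by the class width. The sketch below illustrates the calculation with a hypothetical distribution with unequal class widths. The classes and frequencies are assumed for illustration and are not taken from the notes.

```python
# Hypothetical distribution with unequal class widths (assumed for illustration).
classes = [(20, 25), (25, 40), (40, 50), (50, 60), (60, 65)]
freqs = [6, 8, 5, 4, 3]

def frequency_density(classes, freqs):
    """Frequency density = class frequency / class width."""
    return [f / (hi - lo) for (lo, hi), f in zip(classes, freqs)]

density = frequency_density(classes, freqs)
for (lo, hi), f, d in zip(classes, freqs, density):
    print(f"{lo} - {hi}  f = {f}  width = {hi - lo}  density = {d:.2f}")
```

With density as the height and the class interval as the width, the area of each rectangle equals the class frequency, which keeps classes of different widths comparable.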
[Figure: Histogram. X-axis: Class Limits (1 square = 2 units); Y-axis: Frequency density (1 square = 0.1 unit).]
e) Ogive curve:
When the cumulative frequencies of a frequency distribution are plotted
against the upper or lower class limits, the curve obtained is
called a cumulative frequency curve or ogive curve.
There are two types of ogive curve:
1) Less than ogive: Here the cumulative frequencies increase successively and
are plotted against the upper limits of the classes. It is an upward curve.
2) More than ogive: Here the cumulative frequencies decrease successively and
are plotted against the lower limits of the classes. It is a
downward curve.
[Figure: Line chart with series BN and NZ. X-axis: Over; Y-axis: Run.]
Problem: Draw a less than ogive and a more than ogive
from the data given below:
Class Frequency (f)
30 – 40 5
40 – 50 8
50 – 60 6
60 – 70 4
70 – 80 4
80 – 90 3
Total N = 30
Drawing Procedure: Calculate the less than and more than
cumulative frequencies. For the less than ogive the cumulative frequencies are plotted against
the upper limits of the classes, and for the more than ogive they are plotted
against the lower limits of the classes. Put a dot against every class limit at the height of its
cumulative frequency and then join the dots with a smooth freehand curve. The freehand curve
thus obtained is the ogive curve.
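The coordinates to be plotted for the two ogives can be sketched as follows. The variable names are illustrative.

```python
from itertools import accumulate

classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90)]
freqs = [5, 8, 6, 4, 4, 3]

# Less than ogive: (upper limit, cumulative frequency counted from the first class).
upper_limits = [hi for _, hi in classes]
less_than_points = list(zip(upper_limits, accumulate(freqs)))

# More than ogive: (lower limit, cumulative frequency counted from the last class).
lower_limits = [lo for lo, _ in classes]
more_cf = list(accumulate(reversed(freqs)))[::-1]
more_than_points = list(zip(lower_limits, more_cf))

print(less_than_points)
print(more_than_points)
```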
[Figure: Ogive curve. X-axis: Upper Limit; Y-axis: Cumulative Frequency.]
Less than ogive (manual):
[Figure: Ogive Curve. X-axis: Upper Limit (1 square = 2 units); Y-axis: Cumulative Frequency (1 square = 2 units).]
More than ogive (in Excel):
[Figure: More than Ogive Curve. X-axis: Lower Limit; Y-axis: Cumulative Frequency.]
More than ogive (manual):
[Figure: Ogive Curve. X-axis: Lower Limit (1 square = 2 units); Y-axis: Cumulative Frequency (1 square = 2 units).]
Presentation of Data:
ii) Qualitative/categorical data presentation:
a) Component or Bar diagram
A bar diagram is a suitable graph for representing the frequency distribution
of categorical, qualitative or time series data. Here the categories are
plotted along the X-axis and the frequencies along the Y-axis. A rectangle
or bar is drawn on each category, where the height of the bar is equal to the
respective frequency. There must be gaps between the bars. All
bars should be of equal width and equally spaced. Bar diagrams
play an important role in newspapers, journals and campaigns.
Types
1) Simple bar diagram: only one variable
2) Multiple bar diagram: two/more interrelated variables
Problems
1) Draw a suitable diagram for the given data
Game liked by cadets Football Cricket Basketball Volleyball
No. of Cadets 120 80 60 40
Drawing Procedure
Since the variable ‘game liked by cadets’ is categorical, we can
present the above data by a simple bar diagram. Here the categories (game
names) are plotted along the X-axis and the frequencies (No. of Cadets) are plotted
along the Y-axis. A rectangle or bar is drawn on each category, where the height
of the bar is equal to the respective frequency. There must be gaps
between the bars. All bars should be of equal width and equally
spaced. The diagram thus obtained is the required bar diagram.
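Before drawing on graph paper, it helps to convert each frequency into a bar height in squares. The sketch below assumes a scale of 1 square = 10 units along the Y-axis and prints a rough text version of the diagram. The names are illustrative.

```python
cadets = {"Football": 120, "Cricket": 80, "Basketball": 60, "Volleyball": 40}
UNITS_PER_SQUARE = 10     # assumed scale: along Y-axis 1 square = 10 units

# Height of each bar measured in graph-paper squares.
heights = {game: count / UNITS_PER_SQUARE for game, count in cadets.items()}

# Rough text rendering: one '#' per square, drawn sideways.
for game, squares in heights.items():
    print(f"{game:<11}{'#' * int(squares)} {cadets[game]}")
```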
[Figure: Bar Diagram. X-axis: Game Name (Football, Cricket, Basketball, Volleyball); Y-axis: No. of Cadets (1 square = 10 units); bar values 120, 80, 60, 40.]
2) Draw a suitable diagram for the given data of Bangladesh in last 4 years
Year 2016 2017 2018 2019
Export(million $) 140 160 150 180
Import(million $) 170 180 160 190
Drawing Procedure
Since it is time series data with two interrelated variables, we can present the above
data by a multiple bar diagram. Here the categories (years) are plotted along the X-axis
and the values (export, import) are plotted along the Y-axis. Two adjacent rectangles or
bars are drawn on each category, where the height of each bar is equal to the
respective value. There must be gaps between the categories. All pairs of bars
should be of equal width and equally spaced. The diagram thus obtained
is the required multiple bar diagram.
[Figure: Multiple Bar Diagram. X-axis: Year (2016 to 2019); Y-axis: Export, Import (1 square = 10 units); legend: Export, Import.]
3) Present the following data by a suitable diagram
Year 2013 2014 2015 2016 2017 2018
Since it is time series data, the suitable diagram is a bar diagram.
Drawing Procedure: Here the categories (years) are plotted along the
X-axis and the frequencies (sales) are plotted along the Y-axis. A rectangle
or bar is drawn on each category, where the height of the bar is equal
to the respective frequency. There must be gaps between
the bars. All bars should be of equal width and equally
spaced. The diagram thus obtained is the required
bar diagram.
[Figure: Bar Diagram. X-axis: Year (2013 to 2018).]
b) Pie-Chart or angular diagram
A pie-chart is a suitable graph for representing how a total
(such as a total budget) is distributed among the categories of categorical or qualitative
data. A circle of suitable radius is drawn to represent the total
frequency, and the circle is divided in proportion to the magnitude
of each item relative to the magnitude of all items. Pie-charts play
an important role in newspapers, journals and campaigns.
Problem
Draw a pie-chart for the following data
Game liked by cadets Football Cricket Basketball Volleyball Total
No. of Cadets 120 80 60 40 300
Drawing procedure:
For drawing a pie-chart the data are expressed as segments of 360°,
as shown below:
Game name     No. of cadets   Angle in Degree
Football      120             (120/300) × 360° = 144°
Cricket       80              (80/300) × 360° = 96°
Basketball    60              (60/300) × 360° = 72°
Volleyball    40              (40/300) × 360° = 48°
Total         300             360°
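The angle of each segment is the category's share of the total multiplied by 360°. A minimal sketch with illustrative names:

```python
cadets = {"Football": 120, "Cricket": 80, "Basketball": 60, "Volleyball": 40}
total = sum(cadets.values())

# Angle at the center = (frequency / total) * 360 degrees.
angles = {game: count * 360 / total for game, count in cadets.items()}
# Percentage share of each category, rounded to whole percent.
percent = {game: round(100 * count / total) for game, count in cadets.items()}

for game in cadets:
    print(f"{game:<11}{cadets[game]:>4}  {angles[game]:>5.0f} degrees  {percent[game]}%")
print("Total angle =", sum(angles.values()))
```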
A circle is drawn with a suitable radius. The angles at the center of
the circle are drawn with the help of a protractor, and the circle is
divided into the corresponding sectors. Each sector is labelled individually. The
chart thus obtained is the required pie-chart.
[Figure: Pie-Chart. Visible labels: Football; 120; 40%. Basketball; 60; 20%. Volleyball; 40; 13%.]
2) Draw a suitable graph for the given data of Bangladesh in last 4 years
Year 2016 2017 2018 2019
Export(million $) 140 190 150 180
Import(million $) 170 180 160 190
Since it is time series data, the suitable graph is a multiple line
graph or historigram.
Drawing Procedure: Plot the years along the X-axis and the exports and imports
along the Y-axis. Put a dot against every year at the height of its export value and
then join the dots by straight line segments one by one. The line graph
thus obtained is the required historigram for exports. Similarly we
get another line graph for imports.
[Figure: Multiple line graph (historigram). X-axis: Years; Y-axis: Export, Import; legend: Export(million $), Import(million $).]