
Bio Statistics 2022

2. FREQUENCY DISTRIBUTION AND GRAPHS

Objectives:
At the end of the chapter, the student should be able to:
1. Summarize and present collected data;
2. State the rules for frequency distributions;
3. Make a grouped frequency distribution; and
4. Identify the three main types of graphs used to represent data that are
in a frequency distribution.

Introduction
The problem most decision makers must resolve is how to deal with the uncertainty that is
inherent in almost all aspects of their jobs. Raw data provide little, if any, information to the decision
makers. Thus, they need a means of converting the raw data into useful information.
The two main functions of descriptive statistics are summarizing and presenting data. The most
common way to summarize data is in a frequency distribution. Charts and graphs are also used to
present data.

Frequency Distribution
The easiest method of organizing data is a frequency distribution, which converts raw data into a
meaningful pattern for statistical analysis. A frequency distribution is a table used to describe a data
set. It summarizes data by showing how many observations fall in each group or class. A categorical
frequency distribution is used for nominal data: it lists the categories and the number of observations
in each category. Numerical data can be presented in an ungrouped or a grouped frequency distribution. An
ungrouped frequency distribution lists each number and the frequency for that number. A grouped
frequency distribution gives several classes and the frequencies for each class. To decide whether to
use an ungrouped or a grouped frequency distribution, find the range. The range is the highest number
minus the lowest number in the data set. If the range is small, use an ungrouped frequency distribution;
otherwise, use a grouped one.
Examples:
a. Ungrouped data:
Table 2. Ages of Freshman Students
Age Frequency
15 5
16 17
17 3
Total 25

b. Grouped data:
Table 3. Family Income
Income (Php000) Frequency
10 – 14 25
15 – 19 17
20 – 24 13
25 – 29 9
30 – 34 6
Total 70
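
The ungrouped case can be sketched in Python as follows. The raw ages are not listed in the text, so the list below is rebuilt from the counts in Table 2 purely as an assumption for illustration; `collections.Counter` then tallies each distinct value, and the range is computed as the highest value minus the lowest.

```python
from collections import Counter

# Raw ages are not given in the text; this list is rebuilt from the
# counts in Table 2 (an assumption made only for illustration).
ages = [15] * 5 + [16] * 17 + [17] * 3

# Range = highest value - lowest value. A small range suggests an
# ungrouped frequency distribution.
data_range = max(ages) - min(ages)

# Tally how many times each distinct age occurs.
frequency = Counter(ages)

print("Range:", data_range)
print("Age  Frequency")
for age in sorted(frequency):
    print(f"{age:<4} {frequency[age]}")
print("Total", sum(frequency.values()))
```

Because the range here is only 2, listing each age separately keeps the table short, which is why an ungrouped distribution is a reasonable choice.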

Prepared by MARIANNE JANE ANTOINETTE D. PUA, M.S. -Page- 1 -



RULES FOR FREQUENCY DISTRIBUTIONS


1) There should be between 5 and 20 classes.
2) The class width should be odd.
3) The classes must be mutually exclusive. Class limits cannot overlap.
4) The classes should be continuous, with no gaps between them.
5) The classes must be exhaustive. There should be a class for every data value.
6) The classes should be equal in width.

STEPS IN MAKING A GROUPED FREQUENCY TABLE


A frequency table lists intervals or ranges of data values called data classes together with the
number of data values from the set that are in each class. This number is called the frequency of the
class. The following are the steps of constructing a grouped frequency distribution:
1) Find the highest and lowest value.
2) Find the range: r = highest - lowest.
3) Decide the number of classes (k) to use (between 5 and 15).
4) Find the suggested class width (cw): cw = r / k. (Round up.)
5) Decide on class bounds. Take the lowest value (or slightly lower) for the first lower class limit.
Add the class width to get the next lower class limit. Keep adding the class width to get all of
the lower class limits.
6) Find the upper limits. If the data are rounded to the units place, subtract 1 from the second
lower class limit to get the first upper class limit. If the data are rounded to the tenths place,
subtract 0.1. If the data are rounded to the hundredths place, subtract 0.01, etc.
7) Find the class boundaries.
8) Tally and write down the frequencies for each class.

Example: Make a frequency distribution of the data below:


200 325 180 210 225
185 320 225 235 310
290 250 285 190 235
260 230 245 205 290
Solution:
Step 1: Highest = 325, lowest = 180
Step 2: r = 325 - 180 = 145
Step 3: Number of classes: k = 6
Step 4: cw = 145 / 6 = 24.17, rounded up to 25
Step 5: The lowest number, 180, can be used as the starting point to get exactly 6 classes.
Lower class limits: Add 25 to get the next lower limit and continue to get the remaining limits.
180
205
230
255
280
305
Step 6:
Upper class limits: Subtract 1 from the second lower class limit to get the first upper class
limit and continue the process to get the remaining upper class limits.
204
229
254
279
304
329
Thus, the class limits are:
Class Limits
180 - 204
205 - 229
230 - 254
255 - 279
280 - 304
305 - 329
Step 7:
Class boundaries: Subtract 0.5 from each lower class limit and add 0.5 to each upper class
limit.

Class Limits    Class Boundaries
180 - 204       179.5 - 204.5
205 - 229       204.5 - 229.5
230 - 254       229.5 - 254.5
255 - 279       254.5 - 279.5
280 - 304       279.5 - 304.5
305 - 329       304.5 - 329.5


Step 8:
The frequencies and the cumulative frequencies

Class Limits    Class Boundaries    f     cf
180 - 204       179.5 - 204.5       4     4
205 - 229       204.5 - 229.5       4     8
230 - 254       229.5 - 254.5       5     13
255 - 279       254.5 - 279.5       1     14
280 - 304       279.5 - 304.5       3     17
305 - 329       304.5 - 329.5       3     20
Total                               20
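
The eight steps can also be sketched in Python. The function name `grouped_frequency` is hypothetical, and the sketch assumes whole-number data (so each upper limit is one less than the next lower limit and the boundaries lie 0.5 beyond the limits). It is applied to the 20 values of the example with k = 6.

```python
import math

data = [200, 325, 180, 210, 225, 185, 320, 225, 235, 310,
        290, 250, 285, 190, 235, 260, 230, 245, 205, 290]


def grouped_frequency(values, k):
    """Return rows of (LL, UL, lower boundary, upper boundary, f, cf).

    Assumes whole-number data; hypothetical helper for illustration.
    """
    lowest, highest = min(values), max(values)   # Step 1
    r = highest - lowest                         # Step 2: range
    width = math.ceil(r / k)                     # Steps 3-4: round up
    rows, cf = [], 0
    lower = lowest                               # Step 5: first lower limit
    while lower <= highest:
        upper = lower + width - 1                # Step 6: upper limit
        f = sum(lower <= v <= upper for v in values)   # Step 8: tally
        cf += f
        # Step 7: boundaries are 0.5 beyond the limits
        rows.append((lower, upper, lower - 0.5, upper + 0.5, f, cf))
        lower += width
    return rows


table = grouped_frequency(data, k=6)
for ll, ul, lb, ub, f, cf in table:
    print(f"{ll} - {ul}   {lb} - {ub}   {f:>2} {cf:>3}")
```

Every value is counted exactly once because the classes are mutually exclusive and exhaustive, so the last cumulative frequency equals the number of data values.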

Example: Statistics exam grades. Suppose that 20 statistics students’ scores on an exam are as
follows:
97, 92, 88, 75, 83, 67, 89, 55, 72, 78, 81, 91, 57, 63, 67, 74, 87, 84, 98, 46
We select k = 6 and r = 98 - 46 = 52; thus cw = r / k = 52 / 6 = 8.67 ≈ 9. For convenience, the
table below uses a class width of 10. The frequency table is as follows:
Table 4. Examination Grades in Statistics 11

Class Limits    Class Boundaries    f     cf
90-99           89.5-99.5           4     4
80-89           79.5-89.5           6     10
70-79           69.5-79.5           4     14
60-69           59.5-69.5           3     17
50-59           49.5-59.5           2     19
40-49           39.5-49.5           1     20
Total                               20

Grouped frequency distributions have parts besides the class limits and frequencies that can
be found if the limits and the frequencies are given. The class boundaries are obtained by taking half
of the distance from one upper class limit to the next lower class limit. Subtract this amount from each
Lower Class Limit (LL) and add this amount to each Upper Class Limit (UL).
Range (r) – The difference between the highest data value (l) and the lowest data value (s): r = l - s.
Lower Class Limit (LL) – The least value that can belong to a class.
Upper Class Limit (UL) – The greatest value that can belong to a class.
Class Width (CW) – The difference between the upper (or lower) class limits of consecutive
classes. All classes should have the same class width.
Class Midpoint (CM) – The middle value of each data class. To find the class midpoint, average
the upper and lower class limits, that is, CM = (LL + UL) / 2.

Example: From the frequency table of statistics grades above.


The upper class limits are 99, 89, 79, 69, 59, and 49.
The lower class limits are 90, 80, 70, 60, 50, and 40.
The class midpoints are 94.5, 84.5, 74.5, 64.5, 54.5, and 44.5.
The width of each class is 10.
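
As a quick check, the midpoints, boundaries, and class width can be computed directly from the class limits. The sketch below assumes whole-number limits listed from the highest class to the lowest; the variable names are illustrative only.

```python
# Class limits of the statistics grades table, highest class first.
class_limits = [(90, 99), (80, 89), (70, 79), (60, 69), (50, 59), (40, 49)]

# Half the gap between an upper limit and the next lower limit
# (here 89 and 90), which is 0.5 for whole-number data.
half_gap = (class_limits[0][0] - class_limits[1][1]) / 2

# Class midpoint: CM = (LL + UL) / 2
midpoints = [(ll + ul) / 2 for ll, ul in class_limits]

# Class boundaries: subtract half_gap from LL, add it to UL.
boundaries = [(ll - half_gap, ul + half_gap) for ll, ul in class_limits]

# Class width: difference between consecutive lower class limits.
width = class_limits[0][0] - class_limits[1][0]

for (ll, ul), cm, (lb, ub) in zip(class_limits, midpoints, boundaries):
    print(f"{ll}-{ul}   boundaries {lb}-{ub}   midpoint {cm}")
print("Class width:", width)
```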


From Table 4, the frequency distribution with class boundaries and midpoints is:
Table 4. Examination Grades in Statistics 11

Class Limits    f     Class Boundaries    Midpoint (CM)
90-99           4     89.5-99.5           94.5
80-89           6     79.5-89.5           84.5
70-79           4     69.5-79.5           74.5
60-69           3     59.5-69.5           64.5
50-59           2     49.5-59.5           54.5
40-49           1     39.5-49.5           44.5
Total           20

NOTE: The frequency distribution shows how the observations cluster around a central value and
the degree of difference between observations.

MATHEMATICAL NOTATION
The following symbols and variables will have the meanings given below unless otherwise
specified.
Variables
x = data value
n = number of values in a sample data set
N = number of values in a population data set
f = frequency of a data class

Symbol
Σ indicates the sum of all values for the following variable or expression.

Example: Using our notation, we can write the statement that the sum of the frequencies in a frequency
table should equal the number of values in the data set as follows:
Σ f = n
CUMULATIVE FREQUENCY
The cumulative frequency of a data class is the number of data elements in that class and all
previous classes. The classes can be listed in either ascending or descending order.
Example:
Class     Frequency (f)    Cumulative Frequency
90-99     4                4
80-89     6                10
70-79     4                14
60-69     3                17
50-59     2                19
40-49     1                20
Notice that the last entry in the cumulative frequency column is n = 20.
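
A running total like this is easy to compute with `itertools.accumulate` from the Python standard library. The sketch below uses the frequencies of the grades table in the same (descending) class order.

```python
from itertools import accumulate

classes = ["90-99", "80-89", "70-79", "60-69", "50-59", "40-49"]
frequencies = [4, 6, 4, 3, 2, 1]

# Each cumulative frequency is the class frequency plus all previous ones.
cumulative = list(accumulate(frequencies))

# The sum of the frequencies equals the number of data values n.
n = sum(frequencies)

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label}   {f}   {cf}")
print("n =", n)
```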

RELATIVE FREQUENCY
The relative frequency of a data class is the proportion of data elements in that class. We can
calculate the relative frequency for each class as follows:

relative frequency = f / n
Example:
Class     Frequency (f)    Cumulative Frequency    Relative Frequency (f / n)
90-99     4                4                       0.20
80-89     6                10                      0.30
70-79     4                14                      0.20
60-69     3                17                      0.15
50-59     2                19                      0.10
40-49     1                20                      0.05
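
The same relative frequencies can be reproduced with a few lines of Python. This is only a sketch; the dictionary name is illustrative, and the final check uses a small tolerance because floating-point sums are not always exact.

```python
frequencies = {"90-99": 4, "80-89": 6, "70-79": 4,
               "60-69": 3, "50-59": 2, "40-49": 1}

# n is the total number of data values.
n = sum(frequencies.values())

# relative frequency = f / n for each class
relative = {label: f / n for label, f in frequencies.items()}

for label, rf in relative.items():
    print(f"{label}   {rf:.2f}")

# The relative frequencies add up to 1 (within rounding error).
total = sum(relative.values())
print("Sum:", round(total, 2))
```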

Note: The sum of the relative frequencies should be 1:
Σ (f / n) = 1

GRAPHS FOR FREQUENCY DISTRIBUTION


There are three main types of graphs used to represent data that are in a frequency distribution:
the HISTOGRAM, the FREQUENCY POLYGON, and the OGIVE. On all graphs and charts, equal spaces should
always represent equal amounts, and the graph should be properly labeled and titled.
A histogram is a graphical representation of the information in a frequency table using a bar graph.
The height of the bars represents the frequencies and the bottom of the bars is labeled with the class
boundaries. Since adjacent classes share a class boundary, the bars touch on the sides.

Example: Created in Excel from the data used in Table 4:
[Figure: histogram titled "Examination Grade in Statistics 11"; vertical axis "Frequency" (0 to 7); horizontal axis "Class limits", marked at the class midpoints 94.5, 84.5, 74.5, 64.5, 54.5, 44.5]

Notice that the bar for each class is centered at the class
midpoint, and the bars for successive classes touch.

A frequency polygon is a line graph representation of the information in a frequency table. The
frequencies are plotted up the side and the midpoints are labeled along the bottom. The numbers written
on the bottom should actually be the midpoints, and the midpoints should be the same distance apart. An
extra space is added to the left and an extra space is added to the right, keeping the same distance
along the x-axis. The frequencies for these two points are zero.
Like a histogram, the vertical axis represents frequency and the horizontal axis represents the
variable being measured in the data set. To construct the graph, a point is plotted for each class at
its midpoint and with height given by the frequency of the class. The points are then connected by
straight lines.

[Figure: frequency polygon titled "Examination Grade in Statistics 11"; vertical axis "Frequency" (0 to 7); horizontal axis "Class limits", marked at 104.5, 94.5, 84.5, 74.5, 64.5, 54.5, 44.5, 34.5]

Example: Created in Excel using the same data in Table 4.
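
The points of the frequency polygon can be listed with a short Python sketch. It assumes equally spaced midpoints and, unlike the Excel figure, lists the classes in ascending order; the two extra points with zero frequency close the polygon at both ends.

```python
# Midpoints and frequencies of the examination grades, lowest class first.
midpoints = [44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [1, 2, 3, 4, 6, 4]

# Distance between consecutive midpoints (the class width).
step = midpoints[1] - midpoints[0]

# Add one extra midpoint on each side, each with frequency zero.
xs = [midpoints[0] - step] + midpoints + [midpoints[-1] + step]
ys = [0] + frequencies + [0]

# Connecting these points in order with straight lines gives the polygon.
points = list(zip(xs, ys))
for x, y in points:
    print(x, y)
```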


The ogive is a graph of the cumulative frequencies against the upper class boundaries. The
cumulative frequencies are plotted up the side and the upper class limits are labeled along the
bottom. An extra space is added to the left of the x-axis and the frequency of this extra point is zero.

[Figure: ogive titled "Examination Grade in Statistics 11"; vertical axis from 0 to 25; horizontal axis marked at the upper class limits 99, 89, 79, 69, 59, 49]

Relative frequency graphs use the proportion for each group instead of the frequencies. The
proportion for each group is found by dividing each frequency by the total number of observations.
Relative frequency histograms, frequency polygons, and ogives will look like the previous graphs except
that the proportions will be graphed up the side instead of frequencies.
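
The points of an ogive, and of its relative frequency version, can be sketched in Python as follows. The sketch assumes the classes are taken in ascending order (a "less than" ogive) and uses the upper class boundaries on the horizontal axis; the Excel figure may order the classes differently.

```python
from itertools import accumulate

# Upper class boundaries and frequencies, lowest class first.
upper_boundaries = [49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
frequencies = [1, 2, 3, 4, 6, 4]

cumulative = list(accumulate(frequencies))
width = upper_boundaries[1] - upper_boundaries[0]

# Extra point on the left with cumulative frequency zero.
xs = [upper_boundaries[0] - width] + upper_boundaries
ys = [0] + cumulative
ogive_points = list(zip(xs, ys))

# Relative frequency ogive: divide each cumulative frequency by n.
n = cumulative[-1]
relative_ys = [cf / n for cf in ys]

for (x, cf), rcf in zip(ogive_points, relative_ys):
    print(x, cf, round(rcf, 2))
```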

OTHER GRAPHS AND CHARTS

Pareto charts represent data of a categorical frequency distribution in a bar graph. The bars
need to be the same width and the same space should be used between each bar.

[Figures: "Pareto Diagram (Excel)", a chart of Frequency with a Cumulative % line; "Pareto Diagram (SPSS)", Count and Percent by class level (FRESHMAN, SOPHOMORE, JUNIOR, SENIOR)]

Time Series graphs plot values over a period of time by putting the time periods along the bottom
of the graph, plotting the amount for each period as a point above that time, and then connecting the
points with a line to show the trend over time.

[Figure: time series graph titled "Enrollment Data from 2003-2008"; vertical axis from 0 to 900; horizontal axis: years 2003 to 2008]

Pie charts divide a circle into sectors whose sizes are proportional to the percent of the total
for each category.

[Figure: pie chart of the examination grades: 90-99, 20%; 80-89, 30%; 70-79, 20%; 60-69, 15%; 50-59, 10%; 40-49, 5%]
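
The size of each pie sector follows from the relative frequency: the percent is 100 × f / n and the central angle is 360 × f / n degrees. A minimal Python sketch using the examination grade frequencies (the names are illustrative):

```python
frequencies = {"90-99": 4, "80-89": 6, "70-79": 4,
               "60-69": 3, "50-59": 2, "40-49": 1}

n = sum(frequencies.values())

# For each class: (percent of total, central angle in degrees).
slices = {label: (100 * f / n, 360 * f / n)
          for label, f in frequencies.items()}

for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.0f}%  {angle:.0f} degrees")
```

The angles add up to 360 degrees, just as the percents add up to 100.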


EXPLORATORY DATA ANALYSIS


Exploratory data analysis is used to examine data to find out what can be discovered about the
data. Two methods presented here for exploratory data analysis are the stem-and-leaf plot and the box plot.

A STEM-AND-LEAF PLOT uses the first digit (or digits) as the stem and the last digit as the
leaf to form groups or classes.

Example 1: A 100 item test was given to 25 statistics students. The result is shown below:
55 32 20 22 43 14 17 48 24
31 21 22 35 23 36 23 18 25
13 28 12 29 13 18 19
Make a stem-and-leaf plot of the above data.
Solution:

Step 1. Arrange the data in ascending order (from lowest to highest).

12 13 13 14 17 18 18 19 20
21 22 22 23 23 24 25 28 29
31 32 35 36 43 48 55

Step 2. Separate the data according to classes. Using the first digit to separate the classes, we have:
12 13 13 14 17 18 18 19
20 21 22 22 23 23 24 25 28 29
31 32 35 36
43 48
55

Step 3. Use the first digit for the leading digit (or stem) and list all the last digits in order for the
trailing digit (or leaf):

Stem Leaf
1 2 3 3 4 7 8 8 9
2 0 1 2 2 3 3 4 5 8 9
3 1 2 5 6
4 3 8
5 5

Interpretation:
The stem-and-leaf plot shows that the largest group of students (10 of 25) obtained scores from 20 to 29.
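
The three steps above can be sketched in Python. The function name `stem_and_leaf` is hypothetical; it assumes whole-number data and uses the last digit as the leaf and the remaining leading digit(s) as the stem, so it also handles three-digit values.

```python
from collections import defaultdict

scores = [55, 32, 20, 22, 43, 14, 17, 48, 24,
          31, 21, 22, 35, 23, 36, 23, 18, 25,
          13, 28, 12, 29, 13, 18, 19]


def stem_and_leaf(values):
    """Group whole numbers by stem (leading digits) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):          # Step 1: ascending order
        stem, leaf = divmod(value, 10)    # Steps 2-3: split off the last digit
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(scores)
print("Stem  Leaf")
for stem, leaves in plot.items():
    print(f"{stem:<5} " + " ".join(str(leaf) for leaf in leaves))
```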

Example 2: Make a stem-and-leaf plot for the following numbers.


215 239 212 245 226 228 246 213 247 225
236 223 221 248 237 242 218 236 232 238

Solution:

Step 1. Arrange the data in ascending order (from lowest to highest).

212 213 215 218 221 223 225 226 228 232
236 236 237 238 239 242 245 246 247 248

Step 2. Separate the data according to classes. Since all of the first digits in the given data are the same
(2), use the second digit to separate the classes.

212 213 215 218
221 223 225 226 228
232 236 236 237 238 239
242 245 246 247 248

Step 3. Use the first 2 digits for the leading digit (or stem) and list all the last digits in order for the
trailing digit (or leaf):
Stem Leaf
21 2 3 5 8
22 1 3 5 6 8
23 2 6 6 7 8 9
24 2 5 6 7 8
Interpretation:
The stem-and-leaf plot shows that the largest group of values (6 of 20) falls between 230 and 239.

A BOX-AND-WHISKER PLOT graphs five values of the set of data on a number line. The
five values are:
1. The lowest value in the set of data.
2. The lower hinge.
3. The median.
4. The upper hinge.
5. The highest value of the set of data.

A box is drawn from the lower hinge to the upper hinge and lines are drawn from the box to the
highest and lowest values. The lower hinge is the median of all the values less than or equal to the
median when the data set has an odd number of values, or the median of all values less than the
median when the data set has an even number of values. The upper hinge is the median of all values
greater than or equal to the median when the data set has an odd number of values, or the median of all
values greater than the median when the data set has an even number of values.

Example 1: A 100 item test was given to 25 statistics students. The result is shown below:
55 32 20 22 43 14 17 48 24
31 21 22 35 23 36 23 18 25
13 28 12 29 13 18 19

Solution:
Step 1. Arrange the data in ascending order (from lowest to highest).

12 13 13 14 17 18 18 19 20
21 22 22 23 23 24 25 28 29
31 32 35 36 43 48 55

Step 2. Determine the five values:


1. The lowest value in the set of data is 12.
2. The highest value of the set of data is 55
3. The median is 23
4. The lower hinge is the midpoint of the numbers below the median which is 18
5. The upper hinge is the midpoint of the numbers above the median which is 31.5
[Figure: box-and-whisker plot of the data, created in SPSS]


Interpretation: The box-and-whisker plot shows that the data is not symmetrical and that the data is
positively skewed since the whisker is longer on the right.
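
The five values of this example can be checked with a short Python sketch. The function name `five_values` is hypothetical. Following the worked solution, each hinge is taken as the median of the values strictly below or strictly above the median; this is an assumption about the convention, and including the median in both halves for an odd number of values would give 31 rather than 31.5 for the upper hinge.

```python
from statistics import median

scores = [55, 32, 20, 22, 43, 14, 17, 48, 24,
          31, 21, 22, 35, 23, 36, 23, 18, 25,
          13, 28, 12, 29, 13, 18, 19]


def five_values(values):
    """Return (lowest, lower hinge, median, upper hinge, highest).

    Hinges are medians of the halves below and above the median;
    the middle value itself is left out when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


lowest, lower_hinge, med, upper_hinge, highest = five_values(scores)
print(lowest, lower_hinge, med, upper_hinge, highest)

# Whisker lengths: a longer right whisker suggests positive skew.
print("Left whisker:", lower_hinge - lowest)
print("Right whisker:", highest - upper_hinge)
```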

Example 2: Using the data from Example 2 of the stem-and-leaf plot, determine the five values:


1. The lowest value in the set of data is 212.
2. The highest value of the set of data is 248
3. The median is 234.
4. The lower hinge is the midpoint of the numbers below the median which is 222.
5. The upper hinge is the midpoint of the numbers above the median which is 240.5

[Figure: box-and-whisker plot of the data, created in SPSS]

Interpretation: The box-and-whisker plot shows that the data is not symmetrical and that the
data is negatively skewed since the whisker is longer on the left.


Worksheet no. 2

1. From the given data, construct their corresponding frequency distribution indicating the Steps 1
to 7.
a. Ages of 30 ISU students
18 17 31 36 30 35 23 33 22 24
19 27 21 22 24 33 19 26 28 26
29 18 21 25 23 25 28 29 27 18

b. The Grade Point Average of 16 students in Statistics 11.


2.14 2.89 2.36 1.56 2.98 2.76 2.48 3.16
3.35 2.69 1.57 1.82 1.78 3.56 2.64 2.36

2. Given the following frequency distribution of examination scores of students in
Statistics, construct the following:
a. Histogram
b. Frequency polygon
c. Ogive
d. Relative frequency histogram
e. Relative frequency polygon
f. Relative frequency ogive

Class Limits    f
40-49           2
50-59           3
60-69           4
70-79           8
80-89           5
90-99           2
Total           24

3. Construct a time series graph using the data below:


School Year 2000 2001 2002 2003 2004 2005 2006
Income 3500 4500 4950 3000 2600 5400 5490
4. Construct a pie graph using the data below:
Brand A B C D E
Number 50 30 20 10 45

