
STATISTICS – GROUPED DATA

REVIEWING

Why Use Grouped Data?

Sometimes data has many values and a wide range, which makes it hard to interpret if left ungrouped. This is especially true for continuous data, that is, data that can take any value within a range (not just whole numbers), such as height, weight or volume. Consider these two scenarios:

1) Taking the shoe size of 50 girls in a class


2) Taking the height of 50 girls in a class

In the first scenario you would have many repeated values, such as size 5 or size 6, so leaving this data ungrouped could still give a useful analysis.

In the second scenario you will have many values, such as 151 cm, 152 cm, …, 165 cm, and maybe even higher. This does not even consider decimal values! It is quite possible that many of the girls have a unique height, so keeping the data ungrouped is no better than looking at the raw data.

As a rule of thumb, use grouped data when your raw data has many different values and not many repeated values.

Definitions

There are several terms related to grouped data that you should be aware of:

- Class/Group Intervals – These refer to the range of numbers in each group. Three examples are 140 – 149, 180 – 189 and 160 – 169.

- Class Limits – These are the numbers that define the beginning and end of each class. They can be further split into lower class limits and upper class limits. In the example above, 140, 180 and 160 are all lower class limits, while 149, 189 and 169 are all upper class limits.
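To see how class intervals and limits turn raw data into groups, here is a minimal Python sketch. It assumes whole-number data and classes of equal width that start at a chosen lower limit. The function names `class_limits` and `tally_into_classes` are hypothetical, made up for this example, and the heights are invented purely for illustration.

```python
def class_limits(first_lower, width, num_classes):
    """Return (lower limit, upper limit) pairs such as (140, 149)."""
    return [(first_lower + i * width, first_lower + i * width + width - 1)
            for i in range(num_classes)]


def tally_into_classes(values, first_lower, width, num_classes):
    """Count how many whole-number values fall into each class."""
    counts = [0] * num_classes
    for v in values:
        # 0 for 140 - 149, 1 for 150 - 159, and so on
        index = (v - first_lower) // width
        if not 0 <= index < num_classes:
            raise ValueError(f"{v} does not fit in any class")
        counts[index] += 1
    return counts


# Hypothetical whole-number heights (cm), made up for illustration only
heights = [141, 149, 150, 158, 163, 172, 189]
limits = class_limits(140, 10, 5)
counts = tally_into_classes(heights, 140, 10, 5)
for (low, high), count in zip(limits, counts):
    print(f"{low} - {high}: {count}")
```

Dividing by the class width with `//` is just one convenient way to find the class a value belongs to; checking each value against the limits one class at a time would give the same counts.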
- Class Boundaries – Class boundaries extend the class limits so that adjacent classes meet with no gap between them. They are used to join up the classes for graphing and other purposes. DO NOT CONFUSE CLASS LIMITS AND CLASS BOUNDARIES! Each class has a lower boundary and an upper boundary. For example:

Class        Lower Boundary    Upper Boundary    Class Boundaries
140 – 149    139.5             149.5             139.5 – 149.5
150 – 159    149.5             159.5             149.5 – 159.5
160 – 169    159.5             169.5             159.5 – 169.5

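Notice that the lower boundary is made by SUBTRACTING 0.5 from the lower limit, while the upper boundary is made by ADDING 0.5 to the upper limit. This makes the upper boundary of each class equal to the lower boundary of the next class, which removes any gaps between the classes. The value 0.5 is half of the gap of 1 between one class and the next (for example, between 149 and 150), since these heights are recorded in whole centimetres.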
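The boundary rule can be written as a small Python function. This is a sketch under one assumption: the `gap` parameter (a hypothetical name) is the distance between one class's upper limit and the next class's lower limit, which is 1 for whole-number data like the heights here, so half of it is 0.5.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary) for one class.

    Half the gap is subtracted from the lower limit and added to the
    upper limit. gap=1 assumes whole-number data (e.g. 149 -> 150).
    """
    half = gap / 2
    return lower_limit - half, upper_limit + half


for low, high in [(140, 149), (150, 159), (160, 169)]:
    lower_b, upper_b = class_boundaries(low, high)
    print(f"{low} - {high}: {lower_b} - {upper_b}")
```

Running the loop reproduces the three rows of the boundary table, and the upper boundary of each class matches the lower boundary of the next.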

- Class Width – This is the number of values that make up each class. For example, the class width of 140 – 149 is 10:
[140 – 149] = {140, 141, 142, 143, 144, 145, 146, 147, 148, 149}, which is ten numbers.
This can also be found using a formula:
$$\text{class width} = \text{upper boundary} - \text{lower boundary} = 149.5 - 139.5 = 10$$
Alternatively:
$$\text{class width} = (\text{upper limit} - \text{lower limit}) + 1 = (149 - 140) + 1 = 10$$
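As a quick check that both formulas agree, here is a short Python sketch; the function names are made up for this example. It also counts the values in the class directly, matching the list of ten numbers above.

```python
def width_from_boundaries(lower_boundary, upper_boundary):
    """Class width = upper boundary - lower boundary."""
    return upper_boundary - lower_boundary


def width_from_limits(lower_limit, upper_limit):
    """Class width = (upper limit - lower limit) + 1.

    Subtracting the limits alone gives 9 for 140 - 149, so 1 is added.
    """
    return (upper_limit - lower_limit) + 1


print(width_from_boundaries(139.5, 149.5))
print(width_from_limits(140, 149))
print(len(range(140, 150)))  # counts 140, 141, ..., 149 directly
```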
- Midpoint – This is the middle number in each class. It is found by adding the limits and dividing by 2. For example, the midpoint of 140 – 149 is:
$$\text{midpoint} = \frac{\text{upper limit} + \text{lower limit}}{2} = \frac{140 + 149}{2} = \frac{289}{2} = 144.5$$
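The midpoint formula translates directly into Python. In this sketch (the function name `midpoint` is just illustrative) it is applied to every class of the height table, and also to the boundaries of 140 – 149, which give the same midpoint because each boundary is 0.5 away from its limit in opposite directions.

```python
def midpoint(lower, upper):
    """Add the two values and divide by 2."""
    return (lower + upper) / 2


print(midpoint(140, 149))      # using the class limits
print(midpoint(139.5, 149.5))  # using the class boundaries

limits = [(140, 149), (150, 159), (160, 169), (170, 179), (180, 189)]
midpoints = [midpoint(low, high) for low, high in limits]
print(midpoints)  # [144.5, 154.5, 164.5, 174.5, 184.5]
```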

- Modal Class – This is the class with the highest frequency. In the frequency table of heights below, this is 140 – 149, with a frequency of 14 students.
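Finding the modal class amounts to finding the largest frequency. The Python sketch below uses the frequencies from the height table in this lesson; the function name `modal_class` is hypothetical. If two classes tied for the highest frequency, `max()` would simply report the first one, so a tie would need to be checked separately.

```python
classes = ["140 - 149", "150 - 159", "160 - 169", "170 - 179", "180 - 189"]
frequencies = [14, 10, 11, 6, 9]  # from the height table in this lesson


def modal_class(classes, frequencies):
    """Return the class with the highest frequency and that frequency."""
    # Index of the largest frequency (max() keeps the first one on a tie)
    best = max(range(len(frequencies)), key=lambda i: frequencies[i])
    return classes[best], frequencies[best]


print(modal_class(classes, frequencies))  # ('140 - 149', 14)
```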

- Median Class – This is the class that contains the median student. Since there are 50 students, the median is at position (50 + 1) ÷ 2 = 25.5, that is, halfway between the 25th and 26th students.

To find the 25.5th student, simply add the frequencies going down the table.

Height (cm)    Frequency    Cumulative Frequency
140 – 149      14           14
150 – 159      10           24
160 – 169      11           35
170 – 179       6           41
180 – 189       9           50

The running sum of the frequencies (the cumulative frequency) marks the end of each class. The class 150 – 159 ends at the 24th student, while 160 – 169 ends at the 35th student, so the 25.5th student falls inside 160 – 169. Hence the median class in this example is 160 – 169.
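The steps for locating the median class can be sketched in Python as follows. The function name `median_class` is hypothetical, and the sketch uses the (n + 1) ÷ 2 position described above: it keeps a running total of the frequencies and stops at the first class whose running total reaches that position.

```python
classes = ["140 - 149", "150 - 159", "160 - 169", "170 - 179", "180 - 189"]
frequencies = [14, 10, 11, 6, 9]  # from the height table in this lesson


def median_class(classes, frequencies):
    """Return the class containing the median position, and that position."""
    n = sum(frequencies)
    position = (n + 1) / 2  # 25.5 when n = 50
    running = 0
    for cls, f in zip(classes, frequencies):
        running += f  # cumulative frequency: the last student in this class
        if running >= position:
            return cls, position
    raise ValueError("no data: the frequencies are empty")


print(median_class(classes, frequencies))  # ('160 - 169', 25.5)
```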

- Cumulative Frequency – The cumulative frequency is the sum of all the frequencies up to and including the current class. You can simply add the current frequency to the previous cumulative frequency to get the new value. We will talk more about cumulative frequencies in the future.
Height (cm)    Frequency    Cumulative Frequency
140 – 149      14           14
150 – 159      10           24
160 – 169      11           35
170 – 179       6           41
180 – 189       9           50
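The "add the current frequency to the previous cumulative frequency" rule is exactly a running total. This Python sketch uses `itertools.accumulate` from the standard library to build the cumulative frequency column for the height table and prints it in the same layout.

```python
from itertools import accumulate

classes = ["140 - 149", "150 - 159", "160 - 169", "170 - 179", "180 - 189"]
frequencies = [14, 10, 11, 6, 9]  # from the height table in this lesson

# Each value is the previous cumulative frequency plus the current frequency
cumulative = list(accumulate(frequencies))

print(f"{'Height (cm)':<14}{'Frequency':<12}{'Cumulative Frequency'}")
for cls, f, cf in zip(classes, frequencies, cumulative):
    print(f"{cls:<14}{f:<12}{cf}")
```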

A Few Things to Note


- The frequencies should add up to the total frequency (here, 50 students). Use this to check your answers.

- The classes all have the same width. Where possible, make sure your class widths are the same.

- Notice the class limits and boundaries increase by the class width (10) going down the
table.

- The midpoint also increases by the class width going down the table. This means you
only need to calculate the first midpoint and add the class width to get the others.

- Where possible, put the units of the classes in the header, for example “Height (cm)”.

- Be careful when calculating class widths. Remember to subtract the boundaries, not the limits. (If you subtract the limits, you must add 1.)

- The last cumulative frequency is the total frequency of the data.
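Several of the notes above can be checked automatically. The Python sketch below builds the whole height table by adding the class width to the first class, then asserts that the widths are equal, that the midpoints go up by the class width, and that the last cumulative frequency equals the total frequency. The function name `build_table` is hypothetical, and the ±0.5 boundaries assume whole-number data as in this lesson.

```python
from itertools import accumulate


def build_table(first_lower, width, frequencies):
    """Build rows of limits, boundaries, midpoint and frequencies."""
    rows = []
    for i, f in enumerate(frequencies):
        lower = first_lower + i * width  # limits go up by the class width
        upper = lower + width - 1
        rows.append({
            "class": f"{lower} - {upper}",
            "boundaries": (lower - 0.5, upper + 0.5),  # assumes whole numbers
            "midpoint": (lower + upper) / 2,
            "frequency": f,
        })
    for row, cf in zip(rows, accumulate(frequencies)):
        row["cumulative"] = cf
    return rows


rows = build_table(140, 10, [14, 10, 11, 6, 9])

# All classes have the same width (found from the boundaries)
assert all(r["boundaries"][1] - r["boundaries"][0] == 10 for r in rows)
# Midpoints increase by the class width going down the table
assert all(b["midpoint"] - a["midpoint"] == 10 for a, b in zip(rows, rows[1:]))
# The last cumulative frequency is the total frequency
assert rows[-1]["cumulative"] == sum(r["frequency"] for r in rows)

for r in rows:
    print(r["class"], r["boundaries"], r["midpoint"], r["frequency"], r["cumulative"])
```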
