
MATH& 146

Lesson 11
Section 1.6
Categorical Data

Frequency
The first step to organizing categorical data is to
count the number of data values there are in each
category of interest.

We can organize these counts (or frequencies)
into a frequency table, which records the totals
and the category names.

Frequency
A class with 20 students had the following
distribution of grades:

A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F

GRADE FREQUENCY
A 3
B 5
C 3
D 6
F 3
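
The tally above can be reproduced with a short script. This is a minimal sketch in Python; the slides name no software, so the language and the variable names `grades` and `frequency` are assumptions made for illustration.

```python
from collections import Counter

# The 20 grades listed on the slide.
grades = "A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F".split(", ")

# Counter tallies how many times each category appears.
frequency = Counter(grades)

print("GRADE FREQUENCY")
for grade in sorted(frequency):
    print(grade, frequency[grade])
```

The counts should add up to the class size of 20, which is a quick check that no data value was missed.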
Relative Frequency
A relative frequency is the proportion of times a
category occurs. Relative frequencies can be
written as fractions, decimals, or percents.

GRADE   FREQUENCY   RELATIVE FREQUENCY
A       3           0.15
B       5           0.25
C       3           0.15
D       6           0.30
F       3           0.15
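
Each relative frequency is the category count divided by the total count. The sketch below, in Python with illustrative variable names (both assumptions, not part of the slides), prints each value as a fraction, a decimal, and a percent.

```python
# Frequencies from the grade table (20 students in total).
frequency = {"A": 3, "B": 5, "C": 3, "D": 6, "F": 3}
total = sum(frequency.values())

# Relative frequency = category count / total count.
relative = {grade: count / total for grade, count in frequency.items()}

for grade, proportion in relative.items():
    # Show each value as a fraction, a decimal, and a percent.
    print(f"{grade}: {frequency[grade]}/{total} = {proportion:.2f} = {proportion:.0%}")
```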

Cumulative Relative Frequency
Cumulative relative frequency is the running total
of the relative frequencies up to and including the
current category.

GRADE   FREQUENCY   RELATIVE FREQUENCY   CUMULATIVE RELATIVE FREQUENCY
A       3           0.15                 0.15
B       5           0.25                 0.40
C       3           0.15                 0.55
D       6           0.30                 0.85
F       3           0.15                 1.00
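
The last column is a running total. The sketch below is one possible way to compute it in Python (the language and variable names are assumptions); it accumulates the counts first and divides once at the end, which avoids adding up rounded decimals.

```python
from itertools import accumulate

grades = ["A", "B", "C", "D", "F"]
frequency = [3, 5, 3, 6, 3]
total = sum(frequency)

# Running total of the counts: 3, 8, 11, 17, 20.
cumulative_counts = list(accumulate(frequency))

# Divide each running total by the overall total.
cumulative_relative = [count / total for count in cumulative_counts]

for grade, value in zip(grades, cumulative_relative):
    print(f"{grade} {value:.2f}")
```

The final value is always 1.00, because the last category includes every data value.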

Example 1
Fifty part-time students were asked how many courses
they were taking this term. The (incomplete) results
are shown below:
# of Courses   Frequency   Relative Frequency   Cumulative Relative Frequency
1 30 0.6
2 15
3

a. Fill in the blanks in the table above.


b. What percent of students take exactly two courses?
c. What percent of students take at most two courses?
Graphs of Categorical Data
There are two simple visual summaries that are
used for categorical data:
Circle graphs (pie charts) show the amount of
data that belong to each category as a proportional
part of the whole.
Bar graphs consist of bars that are separated
from each other. The bars can be rectangles or
rectangular boxes, and they can be vertical or
horizontal.

Graphs of Categorical Data
To get a better sense of graphing categorical data,
consider the following table about the Titanic. The
table lists the number and percentage of people in
each class on the Titanic's voyage.

CLASS    FREQUENCY   RELATIVE FREQUENCY
First    325         14.77%
Second   285         12.95%
Third    706         32.08%
Crew     885         40.21%
Total    2201        100.01%

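
The percentages in the table total 100.01% rather than exactly 100%. The sketch below, in Python with illustrative names (assumptions, not part of the slides), recomputes each percentage from the counts and shows that the extra 0.01 comes from rounding each value to two decimals before adding.

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(counts.values())  # 2201 people on board

# Percent of the whole in each class, rounded to two decimals as in the table.
percents = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, pct in percents.items():
    print(f"{name:<7}{counts[name]:>5}{pct:>8.2f}%")

# Adding the rounded values gives 100.01, not exactly 100,
# because each percentage was rounded before the sum was taken.
print(f"Total  {total:>5}{sum(percents.values()):>8.2f}%")
```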
Pie Charts
When you are interested in relative frequencies, a
pie chart might be your display of choice.

Pie charts slice the circle into pieces whose
sizes are proportional to the fraction of the
whole in each category.

Pie Charts
There are two rules to follow when creating a pie chart:
1) The pieces have to add up to 100%.
2) No person can be represented in more than one piece.

BAD PIE CHART: 271% even without an Other category.

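
The two rules are linked: when one person can be counted in several categories, the percentages overlap and their total exceeds 100%. The sketch below is a hypothetical check in Python. The function name `fits_pie_chart`, the rounding tolerance of 0.05, and the "check all that apply" survey numbers are all assumptions made for illustration; only the Titanic percentages come from the earlier table.

```python
def fits_pie_chart(percents, tolerance=0.05):
    """Return True when the percentages total 100% (within rounding)."""
    return abs(sum(percents) - 100) <= tolerance

# Titanic class percentages from the earlier table: each person is in one class.
titanic = [14.77, 12.95, 32.08, 40.21]

# Hypothetical "check all that apply" survey: one person can land in
# several categories, so the percentages overlap and exceed 100%.
multi_answer = [62.0, 48.0, 35.0]

print(fits_pie_chart(titanic))       # True
print(fits_pie_chart(multi_answer))  # False
```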
Example 2
Which set of percentages
would best fit this pie
chart?

A. 54%, 8%, 30%, 8%
B. 47%, 23%, 8%, 22%
C. 51%, 17%, 15%, 17%
D. 27%, 26%, 24%, 23%

Bar Charts
A bar chart displays the distribution of a
categorical variable, showing the counts for each
category next to each other for easy comparison.
Notice that the bars are separated from one another.

Pie Charts vs. Bar Charts
While pie charts are well known, they are not
typically as useful as other charts. It is generally
more difficult to compare group sizes in a pie chart
than in a bar chart, especially when categories
have nearly identical counts or proportions.

Example 3
Use the graphs to rank the categories from largest
to smallest.

Example 4
Which category is largest? Which is smallest?

The Titanic
Here is part of a data matrix about the passengers
and crew aboard the Titanic. Each case (row) of
the data table represents a person on board the
ship.

Survived   Age     Sex      Class
Died       Adult   Male     Third
Survived   Adult   Male     Crew
Died       Child   Male     Third
Survived   Child   Female   First
Died       Adult   Male     Third
Died       Adult   Female   Crew

The Titanic
The problem with data matrices is that you can't
see what's going on. And seeing is just what we
want to do. We need ways to show the data so
that we can see patterns, relationships, trends,
and exceptions.

Survived   Age     Sex      Class
Died       Adult   Male     Third
Survived   Adult   Male     Crew
Died       Child   Male     Third
Survived   Child   Female   First
Died       Adult   Male     Third
Died       Adult   Female   Crew

The Titanic
To look at two categorical variables together, we
often arrange the counts in a two-way table. Here
is a two-way table of those aboard the Titanic,
classified according to class of ticket and whether
or not they survived.

                    Class
Survival    First  Second  Third  Crew  Total
Survived      203     118    178   212    711
Died          122     167    528   673   1490
Total         325     285    706   885   2201

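
A two-way table is built by counting how many cases fall into each combination of the two variables. The sketch below illustrates the idea in Python using only the six rows shown in the data matrix, so its counts are far smaller than those in the full table. The language and the variable names are assumptions.

```python
from collections import Counter

# The six cases shown in the data matrix: (Survived, Age, Sex, Class).
cases = [
    ("Died", "Adult", "Male", "Third"),
    ("Survived", "Adult", "Male", "Crew"),
    ("Died", "Child", "Male", "Third"),
    ("Survived", "Child", "Female", "First"),
    ("Died", "Adult", "Male", "Third"),
    ("Died", "Adult", "Female", "Crew"),
]

# Count each (survival, class) combination; every pair is one cell.
cells = Counter((survived, ticket_class) for survived, _, _, ticket_class in cases)

classes = ["First", "Second", "Third", "Crew"]
print("Survival", *classes)
for outcome in ["Survived", "Died"]:
    # A Counter returns 0 for combinations that never occur.
    print(outcome, *(cells[(outcome, c)] for c in classes))
```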
The Titanic
Because the table shows how the individuals are
distributed along each variable, contingent on the
value of the other variable, such a table is called a
contingency table.

                    Class
Survival    First  Second  Third  Crew  Total
Survived      203     118    178   212    711
Died          122     167    528   673   1490
Total         325     285    706   885   2201

Contingency Tables
                    Class
Survival    First  Second  Third  Crew  Total
Survived      203     118    178   212    711
Died          122     167    528   673   1490
Total         325     285    706   885   2201

The margins of the table, both on the right and at
the bottom, give totals. The bottom line is just the
frequency table of the variable Class.

Class    Frequency
First          325
Second         285
Third          706
Crew           885
Total         2201

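
The margins can be recovered from the cell counts alone. The sketch below, in Python with illustrative names (assumptions, not part of the slides), adds across each row for the right margin and down each column for the bottom margin.

```python
classes = ["First", "Second", "Third", "Crew"]
counts = {
    "Survived": [203, 118, 178, 212],
    "Died": [122, 167, 528, 673],
}

# Right margin: add across each row.
row_totals = {outcome: sum(row) for outcome, row in counts.items()}

# Bottom margin: add down each column.
column_totals = {
    name: sum(row[i] for row in counts.values())
    for i, name in enumerate(classes)
}

# Both margins must add up to the same grand total.
grand_total = sum(row_totals.values())
print(row_totals)
print(column_totals)
print(grand_total)
```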
Contingency Tables
                    Class
Survival    First  Second  Third  Crew  Total
Survived      203     118    178   212    711
Died          122     167    528   673   1490
Total         325     285    706   885   2201

The right column of the table is the frequency table
of the variable Survival.

Survival    Frequency
Survived          711
Died             1490
Total            2201

Contingency Tables
                    Class
Survival    First  Second  Third  Crew  Total
Survived      203     118    178   212    711
Died          122     167    528   673   1490
Total         325     285    706   885   2201

Each cell of the table gives the count for a
combination of values of the two variables. For
example, the highlighted cell shows that 118
second-class passengers survived.
So what does the green highlighted cell show?

Row Proportions
The table below shows the row proportions for
the Titanic data set. The row proportions are
computed as the counts divided by their row totals.

                                              Class
Survival   First              Second             Third              Crew               Total
Survived   203/711 = .286     118/711 = .166     178/711 = .250     212/711 = .298     711/711 = 1.000
Died       122/1490 = .082    167/1490 = .112    528/1490 = .354    673/1490 = .452    1490/1490 = 1.000
Total      325/2201 = .148    285/2201 = .129    706/2201 = .321    885/2201 = .402    2201/2201 = 1.000

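
Each row proportion is a cell count divided by the total of its own row. The following Python sketch (language and variable names are assumptions) applies that rule to every row, including the Total row, and prints values rounded to three decimals.

```python
classes = ["First", "Second", "Third", "Crew", "Total"]
table = {
    "Survived": [203, 118, 178, 212, 711],
    "Died": [122, 167, 528, 673, 1490],
    "Total": [325, 285, 706, 885, 2201],
}

# Row proportion = cell count / total of that row (the last entry in the row).
row_props = {
    label: [count / row[-1] for count in row]
    for label, row in table.items()
}

print("Survival", *classes)
for label, props in row_props.items():
    print(label, *(f"{p:.3f}" for p in props))
```

Within each row the four class proportions add up to 1, because every person in that row belongs to exactly one class.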
Row Proportions
So what does 203/711 = .286 (first column, first
row) represent?
It corresponds to the proportion of survivors who
were in first class.

                                              Class
Survival   First              Second             Third              Crew               Total
Survived   203/711 = .286     118/711 = .166     178/711 = .250     212/711 = .298     711/711 = 1.000
Died       122/1490 = .082    167/1490 = .112    528/1490 = .354    673/1490 = .452    1490/1490 = 1.000
Total      325/2201 = .148    285/2201 = .129    706/2201 = .321    885/2201 = .402    2201/2201 = 1.000

Example 5
a) What does 167/1490 = .112 (second column,
second row) represent in the table?
b) What does 885/2201 = .402 (fourth column,
third row) represent in the table?

                                              Class
Survival   First              Second             Third              Crew               Total
Survived   203/711 = .286     118/711 = .166     178/711 = .250     212/711 = .298     711/711 = 1.000
Died       122/1490 = .082    167/1490 = .112    528/1490 = .354    673/1490 = .452    1490/1490 = 1.000
Total      325/2201 = .148    285/2201 = .129    706/2201 = .321    885/2201 = .402    2201/2201 = 1.000

Column Proportions
A contingency table of the column proportions is
computed in a similar way, where each column
proportion is computed as the count divided by the
corresponding column total.

                                              Class
Survival   First              Second             Third              Crew               Total
Survived   203/325 = .625     118/285 = .414     178/706 = .252     212/885 = .240     711/2201 = .323
Died       122/325 = .375     167/285 = .586     528/706 = .748     673/885 = .760     1490/2201 = .677
Total      325/325 = 1.000    285/285 = 1.000    706/706 = 1.000    885/885 = 1.000    2201/2201 = 1.000

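
Column proportions use the same division, but with the column total as the denominator. The sketch below is a minimal Python version (language and variable names are assumptions) that computes the Survived and Died proportions for each column.

```python
classes = ["First", "Second", "Third", "Crew", "Total"]
survived = [203, 118, 178, 212, 711]
died = [122, 167, 528, 673, 1490]

# Column total = survived + died for that column.
column_totals = [s + d for s, d in zip(survived, died)]

# Column proportion = cell count / total of that column.
survived_props = [s / t for s, t in zip(survived, column_totals)]
died_props = [d / t for d, t in zip(died, column_totals)]

for name, s, d in zip(classes, survived_props, died_props):
    print(f"{name:<7} survived {s:.3f}  died {d:.3f}")
```

Within each column the two proportions add up to 1, because every person in that class either survived or died.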
Example 6
a) What does 167/285 = .586 (second column,
second row) represent in the table?
b) What does 711/2201 = .323 (fifth column, first
row) represent in the table?

                                              Class
Survival   First              Second             Third              Crew               Total
Survived   203/325 = .625     118/285 = .414     178/706 = .252     212/885 = .240     711/2201 = .323
Died       122/325 = .375     167/285 = .586     528/706 = .748     673/885 = .760     1490/2201 = .677
Total      325/325 = 1.000    285/285 = 1.000    706/706 = 1.000    885/885 = 1.000    2201/2201 = 1.000

Column Proportions
In the table, the value 0.625 indicates that 62.5%
of first-class passengers survived. This survival
rate is much higher than that of second-class
passengers (41.4%), third-class passengers
(25.2%), or crew members (24.0%).

                                              Class
Survival   First              Second             Third              Crew               Total
Survived   203/325 = .625     118/285 = .414     178/706 = .252     212/885 = .240     711/2201 = .323
Died       122/325 = .375     167/285 = .586     528/706 = .748     673/885 = .760     1490/2201 = .677
Total      325/325 = 1.000    285/285 = 1.000    706/706 = 1.000    885/885 = 1.000    2201/2201 = 1.000

Column Proportions
Because differences this large in survival rates
between the classes are unlikely to arise from random
chance alone, the data provide evidence that the class
and survival variables are associated. We say the
two variables are dependent.

                                              Class
Survival   First              Second             Third              Crew               Total
Survived   203/325 = .625     118/285 = .414     178/706 = .252     212/885 = .240     711/2201 = .323
Died       122/325 = .375     167/285 = .586     528/706 = .748     673/885 = .760     1490/2201 = .677
Total      325/325 = 1.000    285/285 = 1.000    706/706 = 1.000    885/885 = 1.000    2201/2201 = 1.000

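
One informal way to see the association is to compare each class's survival rate with the overall rate of .323. If class and survival were unrelated, each class would be expected to sit close to the overall rate. The Python sketch below (language and names are assumptions) only describes the gaps; it is not a formal test of whether they exceed chance variation.

```python
survived = {"First": 203, "Second": 118, "Third": 178, "Crew": 212}
totals = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

# Overall survival rate across everyone on board.
overall = sum(survived.values()) / sum(totals.values())

# Difference between each class's survival rate and the overall rate.
gaps = {name: survived[name] / totals[name] - overall for name in totals}

print(f"overall {overall:.3f}")
for name, gap in gaps.items():
    print(f"{name:<7} {survived[name] / totals[name]:.3f} ({gap:+.3f})")
```

First class sits well above the overall rate, while third class and crew sit below it.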
Example 7
A random sample of 100 pet owners was polled to
see whether there is an association between gender
and preference for a dog or a cat. The results of
the survey are below.

         Dog   Cat   Total
Male      40    10      50
Female    20    30      50
Total     60    40     100

Example 7 continued
a) Compute and interpret the column proportions.
b) Does there appear to be an association
between gender and type of pet? Explain.

         Dog   Cat   Total
Male      40    10      50
Female    20    30      50
Total     60    40     100

Example 8
There are 10 boys and 12 girls in Mr. Fleck's fourth
grade class and 15 boys and 18 girls in Mrs. Parker's
fourth grade class. One student is randomly selected
to be hall monitor.
a) Use this information to complete the contingency
table below.

                    Gender
Teacher       Boy   Girl   Total
Mr. Fleck
Mrs. Parker
Total

Example 8 continued
b) Compute and interpret the row proportions.
c) Does there appear to be an association between
teacher and student's gender? Explain.

                    Gender
Teacher       Boy   Girl   Total
Mr. Fleck      10     12      22
Mrs. Parker    15     18      33
Total          25     30      55
