MATH& 146 Lesson 11: Section 1.6
Lesson 11
Section 1.6
Categorical Data
Frequency
The first step to organizing categorical data is to
count the number of data values there are in each
category of interest.
Frequency
A class with 20 students had the following
distribution of grades:
A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F
GRADE FREQUENCY
A 3
B 5
C 3
D 6
F 3
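The tally above can be reproduced with a short sketch. It assumes the 20 grades are stored as a Python list; the variable names are illustrative only.

```python
from collections import Counter

# The 20 grades from the class, in the order listed on the slide.
grades = "A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F".split(", ")

# Counter tallies how many times each category occurs.
frequency = Counter(grades)

print("GRADE FREQUENCY")
for grade in sorted(frequency):
    print(grade, frequency[grade])
```

The frequencies should add up to the number of data values, here 20.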
Relative Frequency
A relative frequency is the proportion of times a
category occurs. Relative frequencies can be
written as fractions, decimals, or percents.
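As a minimal sketch, the relative frequencies for the grade data from the Frequency slide can be computed by dividing each count by the total. The names below are illustrative, and the three printed forms are the fraction, decimal, and percent.

```python
from fractions import Fraction

# Frequencies from the grade table (20 students in total).
counts = {"A": 3, "B": 5, "C": 3, "D": 6, "F": 3}
total = sum(counts.values())

# Relative frequency = category count / total count.
relative = {grade: count / total for grade, count in counts.items()}

for grade, count in counts.items():
    fraction = Fraction(count, total)  # reduced automatically, e.g. 5/20 -> 1/4
    print(f"{grade}: {fraction} = {relative[grade]:.2f} = {relative[grade]:.0%}")
```

The relative frequencies of all categories should add up to 1 (or 100%).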
Cumulative Relative Frequency
The cumulative relative frequency of a category is
the sum of its relative frequency and the relative
frequencies of all the categories before it.
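A minimal sketch of the running total, using the grade frequencies from the Frequency slide. The category order A through F is taken from that table; the variable names are illustrative.

```python
from itertools import accumulate

grades = ["A", "B", "C", "D", "F"]
frequencies = [3, 5, 3, 6, 3]
total = sum(frequencies)

# Relative frequency of each category.
relative = [f / total for f in frequencies]

# Running sum of the relative frequencies, in category order.
cumulative = list(accumulate(relative))

for grade, rel, cum in zip(grades, relative, cumulative):
    print(f"{grade}  {rel:.2f}  {cum:.2f}")
```

The last cumulative relative frequency should always be 1, since by then every category has been included.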
Example 1
Fifty part-time students were asked how many courses
they were taking this term. The (incomplete) results
are shown below:
# of Courses   Frequency   Relative Frequency   Cumulative Relative Frequency
1              30          0.6
2              15
3
Graphs of Categorical Data
To get a better sense of graphing categorical data,
consider the following table about the Titanic. The
table lists the number and percentages in each class
on the Titanic's voyage.
Pie Charts
There are two rules to follow when creating a pie chart:
1) The pieces have to add up to 100%.
2) No person can be represented in more than one piece.
BAD PIE CHART: the pieces total 271%, even without an Other category.
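The first rule can be checked mechanically. The sketch below is an illustration only: the function name and the rounding tolerance are assumptions, the Titanic class counts are the column totals from the contingency table later in this lesson, and the "overlapping" percentages are made-up numbers chosen to total 271%. The second rule cannot be checked from percentages alone, since it depends on how people were assigned to categories.

```python
def slices_total_100(percentages, tolerance=0.5):
    """Return True when the slices add up to 100%, allowing for rounding."""
    return abs(sum(percentages) - 100) <= tolerance

# Class counts on the Titanic: each person belongs to exactly one class.
class_counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(class_counts.values())
class_percentages = [100 * count / total for count in class_counts.values()]
print(slices_total_100(class_percentages))  # these slices form a valid pie chart

# Hypothetical (made-up) percentages where people could pick several answers.
overlapping = [90, 85, 96]
print(sum(overlapping), slices_total_100(overlapping))
```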
Example 2
Which set of percentages
would best fit this pie
chart?
Bar Charts
A bar chart displays the distribution of a
categorical variable, showing the counts for each
category next to each other for easy comparison.
Notice that the bars are separated from one another.
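A rough text-only sketch of a bar chart, using the grade frequencies from the Frequency slide. The function name is illustrative, and a plotting library would normally be used instead; the point is only that each category gets its own separate bar whose length equals its count.

```python
frequencies = {"A": 3, "B": 5, "C": 3, "D": 6, "F": 3}

def text_bar_chart(counts):
    """Build one horizontal bar per category; bar length equals the count."""
    lines = []
    for category, count in counts.items():
        lines.append(f"{category} | {'#' * count} {count}")
    return "\n".join(lines)

print(text_bar_chart(frequencies))
```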
Pie Charts vs. Bar Charts
While pie charts are well known, they are not
typically as useful as other charts. It is generally
more difficult to compare group sizes in a pie chart
than in a bar chart, especially when categories
have nearly identical counts or proportions.
Example 3
Use the graphs to rank the categories from largest
to smallest.
Example 4
Which category is largest? Which is smallest?
The Titanic
Here is part of a data matrix about the passengers
and crew aboard the Titanic. Each case (row) of
the data table represents a person on board the
ship.
                       Class
Survival   First   Second   Third   Crew   Total
Survived     203      118     178    212     711
Died         122      167     528    673    1490
Total        325      285     706    885    2201
Because the table shows how the individuals are
distributed along each variable, contingent on the
value of the other variable, such a table is called a
contingency table.
Contingency Tables
                          Class
Survival   First            Second           Third            Crew             Total
Survived   203/711 = .286   118/711 = .166   178/711 = .250   212/711 = .298   711/711 = 1.000
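The proportions above divide each count in the Survived row by that row's total (711). A minimal sketch of the same calculation for every row follows; the Died counts are taken from the column-proportion table later in this lesson, and the function and variable names are illustrative.

```python
classes = ["First", "Second", "Third", "Crew"]
counts = {
    "Survived": [203, 118, 178, 212],
    "Died": [122, 167, 528, 673],
}

def row_proportions(table):
    """Divide every count by the total of the row it sits in."""
    result = {}
    for label, row in table.items():
        row_total = sum(row)
        result[label] = [count / row_total for count in row]
    return result

for label, proportions in row_proportions(counts).items():
    cells = ", ".join(f"{name} {p:.3f}" for name, p in zip(classes, proportions))
    print(f"{label}: {cells}")
```

Each row of proportions adds up to 1, so a row proportion answers a question such as "what share of the survivors were in first class?"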
                          Class
Survival   First             Second            Third             Crew              Total
Survived   203/325 = .625    118/285 = .414    178/706 = .252    212/885 = .240    711/2201 = .323
Died       122/325 = .375    167/285 = .586    528/706 = .748    673/885 = .760    1490/2201 = .677
Total      325/325 = 1.000   285/285 = 1.000   706/706 = 1.000   885/885 = 1.000   2201/2201 = 1.000
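The table above divides each count by the total of its column. A minimal sketch of that calculation, with illustrative variable names:

```python
classes = ["First", "Second", "Third", "Crew"]
survived = [203, 118, 178, 212]
died = [122, 167, 528, 673]

# Column proportion of survivors = survivors in the class / everyone in the class.
survival_rate = {}
for name, s, d in zip(classes, survived, died):
    column_total = s + d
    survival_rate[name] = s / column_total

# The Total column uses everyone on board as the denominator.
overall_rate = sum(survived) / (sum(survived) + sum(died))

for name, rate in survival_rate.items():
    print(f"{name}: {rate:.3f}")
print(f"Overall: {overall_rate:.3f}")
```

Each column of proportions adds up to 1, so a column proportion answers a question such as "what share of first-class passengers survived?"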
Example 6
a) What does 167/285 = .586 (second column,
second row) represent in the table?
b) What does 711/2201 = .323 (fifth column, first
row) represent in the table?
Column Proportions
In the table, the value 0.625 indicates that 62.5%
of first-class passengers survived. This survival
rate is much higher than the rate for second-class
passengers (41.4%), third-class passengers
(25.2%), or crew members (24.0%).
Because these differences in survival rates
between the classes are unlikely to result from random
chance alone, they provide evidence that the class
and survival variables are associated. We say the
two variables are dependent.
Example 3
A random sample of 100 pet owners was polled to
see whether there is an association between gender
and a preference for a dog or a cat. The results
of the survey are below.
Female 20 30 50
Total 60 40 100
Example 3 continued
a) Compute and interpret the column proportions.
b) Does there appear to be an association
between gender and type of pet? Explain.
Example 4
There are 10 boys and 12 girls in Mr. Fleck's fourth-grade
class and 15 boys and 18 girls in Mrs. Parker's
fourth-grade class. One student is randomly selected
to be hall monitor.
a) Use this information to complete the contingency
table below.
                    Gender
Teacher        Boy   Girl   Total
Mr. Fleck
Mrs. Parker
Total
Example 4 continued
a) Compute and interpret the row proportions.
b) Does there appear to be an association between
teacher and student's gender? Explain.
                    Gender
Teacher        Boy   Girl   Total
Mr. Fleck       10     12      22
Mrs. Parker     15     18      33
Total           25     30      55