
11.2 INFERENCE FOR RELATIONSHIPS

HW: p. 724 (29-35 odd, 43, 45, 49, 51, 53-58)


OVERVIEW

• Chi-square goodness-of-fit tests allow us to compare expected outcomes with
observed outcomes of a single sample.
• What if we want to compare two or more samples or groups?
• In this section, we’ll use the Chi-square test for homogeneity to do this.
COMPARING DISTRIBUTIONS OF A
CATEGORICAL VARIABLE

• A one-way table can summarize data for a single categorical variable.
• A two-way table can be used to summarize data on the relationship
between two categorical variables.
• One of the clearest ways to describe this relationship is to compare the
conditional distributions of the response variable for each value of the
explanatory variable.
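
As a rough illustration of comparing conditional distributions, the sketch below uses the cocaine survey described later in this section (800 respondents per contact method, with 21%, 25%, and 28% answering "Yes"). The counts are derived from those percentages, and the dictionary layout and the name `conditional_distribution` are assumptions made for this example.

```python
# Counts derived from the survey percentages (800 respondents per group).
# Explanatory variable: contact method. Response variable: Yes/No answer.
counts = {
    "Phone":     {"Yes": 168, "No": 632},
    "Interview": {"Yes": 200, "No": 600},
    "Written":   {"Yes": 224, "No": 576},
}

def conditional_distribution(group_counts):
    """Return the proportion of each response within one group."""
    total = sum(group_counts.values())
    return {response: n / total for response, n in group_counts.items()}

# One conditional distribution of the response per value of the explanatory variable.
distributions = {group: conditional_distribution(c) for group, c in counts.items()}
for group, dist in distributions.items():
    print(group, {response: round(p, 2) for response, p in dist.items()})
```

Because every group has the same size here, counts and proportions tell the same story; with unequal group sizes only the proportions are directly comparable.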
CHECK YOUR UNDERSTANDING, P. 698

The Pennsylvania State University has its main campus in the town of State College and more than 20
smaller “commonwealth campuses” around the state. The Penn State Division of Student Affairs
polled separate random samples of undergraduates from the main campus and commonwealth
campuses about their use of online social networking. Facebook was the most popular site, with more
than 80% of students having an account. Here is a comparison of Facebook use by undergraduates at
the main campus and commonwealth campuses who have a Facebook account:
CONTINUED

1. Calculate the conditional distribution (in proportions) of Facebook use for each campus setting.

For the main campus:


several times a month or less:
at least once a week:
at least once a day:
For the commonwealth campuses:
several times a month or less:
at least once a week:
at least once a day:
CONTINUED

2. Why is it important to compare proportions rather than counts in
Question 1?
Because the sample sizes from the two types of campuses differ so much,
raw counts are not directly comparable; proportions put both groups on
the same scale.
3. Make a bar graph that compares the two conditional distributions.
What are the most important differences in Facebook use between the
two campus settings?
Students on the main campus are more likely to be everyday users of
Facebook than are students on the commonwealth campuses, and those
on the commonwealth campuses are more likely to use Facebook
several times a month or less than are those students on the main
campus.
MULTIPLE COMPARISONS

The problem of multiple comparisons is how to do many comparisons at once
with an overall measure of confidence in all our conclusions.
Statistical methods for dealing with multiple comparisons usually have two parts:
1. An overall test to see if there is good evidence of any differences among the
parameters that we want to compare.
2. A detailed follow-up analysis to decide which of the parameters differ and to
estimate how large the differences are.
HYPOTHESES

• H0: There is no difference in the distribution of a categorical variable for
several populations or treatments.
• Ha: There is a difference in the distribution of a categorical variable for
several populations or treatments.
FINDING EXPECTED COUNTS

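• The expected count in any cell of a two-way table when H0 is true is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
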
Not on AP Formula Sheet!
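
The expected-count rule can be sketched in a few lines of Python. The observed counts below come from the nuclear energy example near the end of this section; the function name `expected_counts` and the list-of-rows layout are assumptions made for illustration.

```python
# Observed counts from the nuclear energy example in this section.
# Rows: Approve, Disapprove, No Opinion.
# Columns: Liberal, Conservative, Independent.
observed = [
    [10, 15, 20],
    [9, 2, 16],
    [8, 2, 18],
]

def expected_counts(table):
    """Expected count per cell = row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The expected counts always add up to the same row and column totals as the observed counts, which is a quick way to check the arithmetic.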

Example
Expected
  French wine bought:
TEST STATISTIC

• $$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

The sum is over all cells (except totals!) in the two-way table.
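
A minimal sketch of the chi-square statistic follows, assuming the two-way table is stored as a list of rows without totals. The counts are derived from the cocaine survey later in this section (800 per group; 21%, 25%, and 28% "Yes"), so the result should land close to the χ2 = 10.619 reported there. The function name `chi_square_statistic` is hypothetical.

```python
# Observed counts derived from the cocaine survey percentages.
# Rows: Yes, No. Columns: Phone, Interview, Written.
observed = [
    [168, 200, 224],
    [632, 600, 576],
]

def chi_square_statistic(table):
    """Sum (observed - expected)^2 / expected over every cell (totals excluded)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / table_total
            statistic += (obs - exp) ** 2 / exp
    return statistic

chi2 = chi_square_statistic(observed)
print(round(chi2, 3))
```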
CHECK YOUR UNDERSTANDING, P. 703

In the previous Check Your Understanding (page 698), we presented data on the use of Facebook by two randomly
selected groups of Penn State students. Here are the data once again.

Do these data provide convincing evidence of a difference in the distributions of Facebook use among students in the
two campus settings?
1. State appropriate null and alternative hypotheses for a significance test to help answer this question.
H0: There is no difference in the distributions of Facebook use between students at Penn State’s main campus and its
commonwealth campuses
Ha: There is a difference in the distributions of Facebook use between students at Penn State’s main campus and its
commonwealth campuses
CONTINUED

2. Calculate the expected counts. Show your work.


Main campus: several times a month or less, 77.56; at least once a week, 220.25; at least once a day, 612.19.
Commonwealth campuses: several times a month or less, 53.44; at least once a week, 151.75; at least once a day, 421.81.
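
One way to show the work is sketched below. The row and column totals are not printed in this section; they are inferred by summing the expected counts above (expected counts share their totals with the observed table), so treat them as an assumption. The variable names are illustrative.

```python
# Totals inferred from the listed expected counts (an assumption, see lead-in).
row_totals = {
    "several times a month or less": 131,
    "at least once a week": 372,
    "at least once a day": 1034,
}
column_totals = {"Main campus": 910, "Commonwealth campuses": 627}
table_total = sum(column_totals.values())

# expected count = row total * column total / table total
expected = {
    (campus, use): row_total * col_total / table_total
    for campus, col_total in column_totals.items()
    for use, row_total in row_totals.items()
}

for (campus, use), value in expected.items():
    print(f"{campus}, {use}: {value:.2f}")
```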

Hint: Add Totals column!
CONTINUED

3. Calculate the chi-square statistic. Show your work.


CHI-SQUARE TEST FOR HOMOGENEITY

• To determine whether the distribution of a categorical variable differs for two
or more populations or treatments, we use a chi-square test for homogeneity.
• This test is similar to the goodness-of-fit test in the sense that we compare observed
counts to expected counts using a chi-square test statistic.
• However, the hypotheses and degrees of freedom differ: for a two-way table,
df = (number of rows − 1)(number of columns − 1).
4-STEP PROCESS

1. STATE the hypotheses and significance level.
2. PLAN: Choose the appropriate inference method and check the conditions.
• Random
• Large Counts (all expected counts at least 5)
• Independent
3. DO: If the conditions are met, calculate the test statistic (χ2) and P-value.
4. CONCLUDE: Interpret your results in the context of the problem.

*If you find evidence to reject the null hypothesis, you should perform a follow-up
analysis to determine the components that contributed the most to the chi-square
statistic.
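
The Large Counts check and the degrees of freedom from the PLAN and DO steps can be sketched as follows, using the expected counts from the Facebook Check Your Understanding (p. 703). The helper names `large_counts_met` and `degrees_of_freedom` are made up for this example.

```python
# Expected counts from the Facebook example.
# Rows: usage categories. Columns: main campus, commonwealth campuses.
expected = [
    [77.56, 53.44],
    [220.25, 151.75],
    [612.19, 421.81],
]

def large_counts_met(expected_table, minimum=5):
    """Large Counts condition: every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_table for e in row)

def degrees_of_freedom(table):
    """df = (number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

print(large_counts_met(expected))
print(degrees_of_freedom(expected))
```

The Random and Independent conditions depend on how the data were collected, so they cannot be checked from the counts alone.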
CHECK YOUR UNDERSTANDING, P. 705

Refer to the previous Check Your Understanding (p. 703).
1. Use Table C to find the P-value. Then use your calculator’s χ2cdf command.
Using Table C with df = (3 − 1)(2 − 1) = 2: P-value < 0.0005.
Using the calculator: 0.000059.
2. Interpret the P-value from the calculator in context.
Assuming that there is no difference in the distributions of Facebook use between students at Penn State’s
main campus and students at Penn State’s commonwealth campuses, the probability of observing a
difference in the distributions of Facebook use for the two samples as large as or larger than the one found
in this study is about 6 in 100,000.
3. What conclusion would you draw? Justify your answer.
Since the P-value was so small, reject H0. It appears that the distribution of Facebook use is different among
students at Penn State’s main campus and students at Penn State’s commonwealth campuses.
CALCULATOR

• You can use your calculator to perform calculations for a chi-square test for homogeneity. We’ll use the data from the
wine and music study in your book.
1. 2nd → (MATRIX) → EDIT
2. Choose 1:[A]
• Enter the dimensions of the matrix (3 x 3)
• Enter observed counts from the two-way table in the same locations in the matrix
3. Choose the χ2-Test
• STAT → TESTS → χ2-Test
• Adjust settings
• Choose Calculate or Draw
4. To see the expected counts, go to the home screen and ask for a display of matrix [B]
• 2nd → (MATRIX)
• Choose 2:[B]
CHECK YOUR UNDERSTANDING, P. 708

Canada has universal health care. The United States does not but often offers more
elaborate treatment to patients with access. How do the two systems compare in
treating heart attacks? Researchers compared random samples of 2600 U.S. and 400
Canadian heart attack patients. One key outcome was the patients’ own assessment of
their quality of life relative to what it had been before the heart attack. Here are the data
for the patients who survived a year:
CONTINUED

1. Construct an appropriate graph to compare the
distributions of opinion about quality of life
among heart attack patients in Canada and the
United States.
CONTINUED

2. Is there a significant difference between the two distributions of quality-of-life ratings? Carry out an
appropriate test at the α = 0.01 level.
State: We want to perform a test at the α = 0.01 level of
H0: There is no difference in the distribution of quality of life in Canada and the United States
Ha: There is a difference in the distribution of quality of life in Canada and the United States

Plan: Use a chi-square test for homogeneity if the conditions are satisfied.
Random: The data came from separate random samples.
Large Counts: The expected counts are: Canada: much better, 77.37; somewhat better, 71.47; about
the same, 109.91; somewhat worse, 41.70; much worse, 10.55. United States: much better, 538.63; somewhat
better, 497.53; about the same, 765.09; somewhat worse, 290.30; much worse, 73.45. All these counts are at least
5.
Independent: Each sample contains less than 10% of all heart attack patients in its country (the United States or Canada).
CONTINUED

Do: The test statistic is χ2 = 11.725 and the P-value is 0.0195 using df = 4. 
Conclude: Since the P-value is greater than 0.01, we fail to reject H0. There is not enough
evidence to conclude that there is a difference in the distribution of quality of life in Canada
and the United States.
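
The P-value in the Do step can be checked without a table. The sketch below relies on a closed form for the upper-tail chi-square probability that holds only when df is even (true here, df = 4); the function name is hypothetical, and a statistics library would be the usual choice for general df.

```python
import math

def chi_square_p_value_even_df(statistic, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = statistic / 2
    # P(X >= x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

alpha = 0.01
p_value = chi_square_p_value_even_df(11.725, df=4)
reject_h0 = p_value < alpha
print(round(p_value, 4), reject_h0)
```

With α = 0.01 the P-value is too large to reject H0, which matches the conclusion above; at α = 0.05 the same data would lead to rejection.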
FOLLOW-UP ANALYSIS

• If the chi-square test allows us to reject the null hypothesis of no difference, we want to
do a follow-up analysis that examines the differences in detail.
• Start by examining which cells in the two-way table show large deviations between the
observed and expected counts.
• Then, look at the individual components to see which terms contribute most to the
chi-square statistic.
MINITAB OUTPUT

• Minitab lists the observed
counts, then the expected
counts, and finally the
individual components for
each cell.
CHECK YOUR UNDERSTANDING, P. 713

Sample surveys on sensitive issues can give different results depending on how the
question is asked. A University of Wisconsin study randomly divided 2400 respondents
into three groups. All participants were asked if they had ever used cocaine. One group of
800 was interviewed by phone; 21% said they had used cocaine. Another 800 people were
asked the question in a one-on-one personal interview; 25% said “Yes.” The remaining
800 were allowed to make an anonymous written response; 28% said “Yes.”
1. Was this an experiment or an observational study? Justify your answer.
This was an experiment. Respondents were randomly assigned to a treatment: the method
by which they were contacted.
CONTINUED

2. Make a two-way table of responses about cocaine use by how the survey was
administered.
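
The two-way table can be built directly from the group sizes and percentages in the problem. The sketch below is one way to do it; the variable names and the printed layout are illustrative choices.

```python
# Survey design from the problem: 800 respondents per contact method.
# Integer percentages avoid floating-point rounding when computing counts.
group_size = 800
percent_yes = {"Phone": 21, "Interview": 25, "Written": 28}

table = {}
for method, pct in percent_yes.items():
    yes = group_size * pct // 100
    table[method] = {"Yes": yes, "No": group_size - yes}

total_yes = sum(row["Yes"] for row in table.values())
total_no = sum(row["No"] for row in table.values())

# Print the two-way table with a Total column and a Total row.
print(f"{'':<10}{'Yes':>6}{'No':>6}{'Total':>7}")
for method, row in table.items():
    print(f"{method:<10}{row['Yes']:>6}{row['No']:>6}{group_size:>7}")
print(f"{'Total':<10}{total_yes:>6}{total_no:>6}{total_yes + total_no:>7}")
```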
CONTINUED

3. Are the differences between the three groups statistically significant? Give appropriate evidence to
support your answer.
State: We want to perform a test at the α = 0.05 level of
H0: There is no difference in the actual proportion who answer yes based on contact method
Ha: There is a difference in the actual proportion who answer yes based on contact method
Plan: Use a chi-square test for homogeneity if the conditions are satisfied. 
Random: The data came from a randomized experiment.
Large Counts: The expected counts are: Phone: Yes, 197.33; No, 602.67. Interview:
Yes, 197.33; No, 602.67. Written response: Yes, 197.33; No, 602.67. All these counts are at least 5.
Independent: Due to the random assignment, these three groups of people can be viewed as
independent. Also, knowing one person’s response gives no information about another person’s
response. 
CONTINUED

Do: The test statistic is χ2 = 10.619 and the P-value is 0.0049 using df = 2. 
Conclude: Since the P-value is less than 0.05, we reject H0 and conclude that there is
convincing evidence of a difference in the proportion of people who answer “Yes”
based on how they are contacted.
CHI-SQUARE FOR
ASSOCIATION/INDEPENDENCE

• To determine whether or not two categorical variables are related in a population, we
can use a chi-square test of association/independence.
• The mechanics of this test are exactly the same as those for the test for homogeneity.
• The only difference is that the hypotheses are defined in terms of an association
between the two categorical variables.
• H0: There is no association between two categorical variables in the population of interest.
• Ha: There is an association between two categorical variables in the population of interest.
EXAMPLE

A recent study looked into the relationship between political views and opinions about
nuclear energy. A survey administered to 100 randomly selected adults asked their political
leanings as well as their approval of nuclear energy. The results are below:
             Liberal   Conservative   Independent
Approve         10          15             20
Disapprove       9           2             16
No Opinion       8           2             18

Do these data provide convincing evidence that political leanings and views on nuclear
energy are associated in the larger population of adults from which the sample was selected?
ANSWERS

• We will perform a chi-square test for association/independence.
H0: There is no association between political leanings and views on nuclear energy.
Ha: There is an association between political leanings and views on nuclear energy.
Conditions: We have a random sample of adults. All expected counts are greater than 5. There are at
least 1000 adults in the population.

Since the P-value is less than 5%, we have significant evidence to reject the null hypothesis. It
appears there is an association between political leanings and views on nuclear energy. A follow-up
analysis suggests the biggest difference occurs with more Conservatives approving of nuclear energy
than expected.
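
The calculation behind this conclusion can be recomputed from the table in the example. The sketch below is one possible way to do it in plain Python; the values it prints are this sketch's own arithmetic rather than figures quoted from the slides, and the closed-form P-value is valid only because df = 4 is even.

```python
import math

# Observed counts from the table in the EXAMPLE.
row_labels = ["Approve", "Disapprove", "No Opinion"]
col_labels = ["Liberal", "Conservative", "Independent"]
observed = [
    [10, 15, 20],
    [9, 2, 16],
    [8, 2, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Individual components (observed - expected)^2 / expected for each cell.
components = {}
for i, r in enumerate(row_labels):
    for j, c in enumerate(col_labels):
        expected = row_totals[i] * col_totals[j] / n
        components[(r, c)] = (observed[i][j] - expected) ** 2 / expected

chi2 = sum(components.values())
df = (len(row_labels) - 1) * (len(col_labels) - 1)

# Upper-tail probability: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!  (even df only).
half = chi2 / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Follow-up analysis: the cell contributing most to the statistic.
largest_cell = max(components, key=components.get)
print(round(chi2, 2), df, round(p_value, 4), largest_cell)
```

Sorting `components` by value gives the full follow-up ranking; the largest term belongs to Conservatives who approve, in line with the follow-up comment above.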
