Section 9 Solutions: Statistics 104 Spring 2020


Section 9 Solutions

Statistics 104
Spring 2020

Topics

– Inference for Binomial Proportions


– Inference for Two-Way Tables

1. Staph Infections. A study investigated ways to prevent staph infections in surgery patients.
In a first step, the researchers examined the nasal secretions of a random sample of 6,771
patients admitted to various hospitals for surgery. They found that 1,251 of these patients
tested positive for Staphylococcus aureus, a bacterium responsible for most staph infections.
Calculate a 95% confidence interval for the proportion of all patients admitted for surgery
who are carriers of S. aureus, and provide an interpretation of the interval.1
The normal approximation and exact methods give intervals that agree to three decimal
places, since the sample size is so large.
The 95% confidence interval is (0.176, 0.194). We are 95% confident that the proportion
of all patients admitted for surgery who are carriers of S. aureus is between 0.176 and
0.194.
#normal approximation
prop.test(x = 1251, n = 6771)$conf.int

## [1] 0.1756216 0.1942558
## attr(,"conf.level")
## [1] 0.95
#exact methods
binom.test(x = 1251, n = 6771)$conf.int

## [1] 0.1755778 0.1942116
## attr(,"conf.level")
## [1] 0.95
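As a cross-check, the textbook (Wald) normal-approximation interval, p̂ ± z* × sqrt(p̂(1 − p̂)/n), can be computed by hand. This is a minimal sketch; the variable names are illustrative and not part of the original solutions. Note that prop.test() inverts a score test with a continuity correction by default, so its endpoints differ slightly from this hand calculation.

```r
# Wald interval by hand (illustrative variable names)
x <- 1251                          # patients testing positive
n <- 6771                          # patients sampled
p.hat <- x / n                     # sample proportion
se <- sqrt(p.hat * (1 - p.hat) / n)
z.star <- qnorm(0.975)             # critical value for 95% confidence
wald.ci <- p.hat + c(-1, 1) * z.star * se
round(wald.ci, 3)
```

To three decimal places, the result should agree with both intervals reported above.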

1 Problem from The Practice of Statistics in the Life Sciences, 3rd ed., p. 482.

2. Dropping Aphids. Pea aphids, Acyrthosiphon pisum, are wingless, sap-sucking insects that
live on plants. They evade predators by dropping off leaves. A study examined the mecha-
nism of aphid drops. Researchers hung live aphids upside down from delicate tweezers and
then released them. Out of the 20 aphids tested, 19 landed on their legs.2
a) From this data, is there evidence that live aphids land right side up (on their legs)
more often than chance alone would predict? Conduct a test and summarize your
conclusions.
The null hypothesis is that the proportion of times live aphids land right side up does
not differ from what would be expected by chance: H0 : p = 0.50. The alternative hypothesis
is that the proportion of times live aphids land right side up is different from what
would be expected by chance: HA : p ≠ 0.50. Let α = 0.05.
The p-value is 4.0 × 10⁻⁵; there is sufficient evidence to reject the null hypothesis in
favor of the alternative.3 The observed proportion landing right side up is 0.95,
which is greater than 0.5. The results of this experiment suggest that live aphids land
right side up more often than chance alone would predict.
binom.test(x = 19, n = 20, p = 0.50, alternative = "two.sided")

##
## Exact binomial test
##
## data: 19 and 20
## number of successes = 19, number of trials = 20, p-value = 4.005e-05
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.7512672 0.9987349
## sample estimates:
## probability of success
## 0.95
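Because the null distribution, Binomial(20, 0.5), is symmetric, the two-sided exact p-value is the probability of an outcome at least as far from 10 as the observed 19: that is, X ≤ 1 or X ≥ 19. A minimal sketch reproducing the p-value (the variable name is illustrative):

```r
# outcomes at least as extreme as 19 out of 20 under H0: p = 0.5
p.value <- sum(dbinom(c(0, 1, 19, 20), size = 20, prob = 0.5))
p.value
```

This equals 42/2^20, which matches the p-value in the binom.test() output.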
b) Researchers were also interested in assessing whether aphid aerial-righting behavior
is an active mechanism, or a passive mechanism arising from body shape and simple
aerodynamics. They repeated the experiment with 23 dead aphids; of these, 12 landed
on their legs after being released. Is there evidence suggesting that dead aphids land
right side up more often than chance alone would predict? Conduct a test and summa-
rize your conclusions.
The null hypothesis is that the proportion of times dead aphids land right side up does
not differ from what would be expected by chance: H0 : p = 0.50. The alternative
hypothesis is that the proportion of times dead aphids land right side up is different
from what would be expected by chance: HA : p ≠ 0.50. Let α = 0.05.
The p-value is 1; there is not sufficient evidence to reject the null hypothesis in favor of
the alternative. The observed proportion landing right side up, 0.52, is not different
enough from 0.50 to suggest that dead aphids land right side up more or less
often than would be expected by random chance.
2 Problem from The Practice of Statistics in the Life Sciences, 3rd ed., p. 475.
3 Since np0 = 10 and n(1 − p0) = 10, prop.test() could also be used.

binom.test(x = 12, n = 23, p = 0.50, alternative = "two.sided")

##
## Exact binomial test
##
## data: 12 and 23
## number of successes = 12, number of trials = 23, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.3058780 0.7318038
## sample estimates:
## probability of success
## 0.5217391
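A p-value of exactly 1 follows from how the exact two-sided test is defined: it sums the null probabilities of every outcome that is no more likely than the observed one. Under Binomial(23, 0.5), 11 and 12 successes are the two most likely outcomes, so every possible outcome qualifies. A sketch under that definition (the small tolerance factor is an assumption added here to guard against floating-point ties):

```r
# null probabilities of every possible count under H0: p = 0.5
null.probs <- dbinom(0:23, size = 23, prob = 0.5)
obs.prob <- dbinom(12, size = 23, prob = 0.5)

# sum the outcomes that are no more likely than the observed count
p.value <- sum(null.probs[null.probs <= obs.prob * (1 + 1e-7)])
p.value
```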
3. Genotype at actn3.r577x. In the FAMuSS study introduced in Unit 1, researchers measured
a variety of demographic and genetic characteristics on study participants, including data
on race and genotype at a specific locus on the ACTN3 gene.
a) Is there evidence of an association between genotype and race at α = 0.05?
First, check the assumptions. It is reasonable to assume independence, since it is un-
likely that any participants were related. None of the expected counts are less than
5.
H0 : Race and genotype are independent. HA : Race and genotype are not independent.
Let α = 0.05.
The test statistic is 19.4, with p = 0.013. There is sufficient evidence to reject the null
hypothesis of independence between race and genotype.
#load the data
library(oibiostat)
data("famuss")

#check expected values


chisq.test(famuss$race, famuss$actn3.r577x)$expected

##             famuss$actn3.r577x
## famuss$race          CC        CT         TT
##   African Am   7.850420  11.84370   7.305882
##   Asian       15.991597  24.12605  14.882353
##   Caucasian  135.783193 204.85210 126.364706
##   Hispanic     6.687395  10.08908   6.223529
##   Other        6.687395  10.08908   6.223529
#conduct test
chisq.test(famuss$race, famuss$actn3.r577x)

##
## Pearson's Chi-squared test
##

## data: famuss$race and famuss$actn3.r577x
## X-squared = 19.4, df = 8, p-value = 0.01286
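The reported p-value is the upper-tail area of a chi-squared distribution with (5 − 1)(3 − 1) = 8 degrees of freedom. A quick sketch using the rounded statistic from the output above, so the result matches the reported p-value only approximately:

```r
# degrees of freedom for a table with 5 rows and 3 columns
df <- (5 - 1) * (3 - 1)

# upper-tail probability for the (rounded) test statistic
p.value <- pchisq(19.4, df = df, lower.tail = FALSE)
p.value
```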
b) Examine the residuals to characterize the nature of the association. Summarize your
conclusions.
The largest residuals are in the first row; there are many more African Americans with
the CC genotype than expected under independence, and fewer with the CT genotype
than expected. The residuals in the second row indicate a similar trend for Asians, but
with a less pronounced difference.
#examine residuals
chisq.test(famuss$race, famuss$actn3.r577x)$residuals

##             famuss$actn3.r577x
## famuss$race           CC          CT          TT
##   African Am  2.90863193 -1.69802497 -0.85310170
##   Asian       1.25242978 -1.24720387  0.28971360
##   Caucasian  -0.92538910  0.77888407 -0.03244366
##   Hispanic   -1.03920927 -0.02804356  1.11294757
##   Other       0.12088363  0.28678513 -0.49045147
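Each entry above is a Pearson residual, (observed − expected) / sqrt(expected). As a sketch of how to read one, the observed count for a cell can be backed out from the printed expected count and residual. The example below does this for African American participants with the CC genotype, using values copied from the printed output (the variable names are illustrative):

```r
expected <- 7.850420      # expected count, African Am / CC
residual <- 2.90863193    # Pearson residual, African Am / CC

# invert residual = (observed - expected) / sqrt(expected)
observed <- expected + residual * sqrt(expected)
observed
```

The result is about 16, roughly double the count expected under independence.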
4. Cilantro Distaste. In a questionnaire, 1,994 respondents out of 14,604 reported that they
thought cilantro tasted like soap. Suppose a random sample of 150 individuals are selected
for further study.
a) Calculate the probability that 20 of the people sampled are soapy-taste detectors.
The probability that 20 of the people sampled are soapy-taste detectors is 0.0953.
dhyper(20, 1994, 14604-1994, 150)

## [1] 0.09526515
b) Calculate the probability that 20 or more of the people sampled are soapy-taste detec-
tors.
The probability that 20 or more of the people sampled are soapy-taste detectors is
0.582.
phyper(19, 1994, 14604-1994, 150, lower.tail = FALSE)

## [1] 0.5820395
c) Suppose that the 150 individuals were sampled with replacement. What is the proba-
bility of selecting 20 soapy-taste detectors?
The probability of selecting 20 soapy-taste detectors is 0.0948.
dbinom(20, 150, 1994/14604)

## [1] 0.09479083
d) Compare the answers from part a) and part c). Explain why the answers are essentially
identical.

When the population is large relative to the sample (here, 150 out of 14,604), sampling
with replacement is highly unlikely to result in any particular individual being sampled
again. In this case, the hypergeometric and binomial distributions produce nearly equal
probabilities (0.0953 versus 0.0948).
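A minimal sketch of this comparison: the sampling fraction is about 1%, and the two probabilities differ by less than 0.001 (the variable names are illustrative):

```r
N <- 14604   # respondents in the questionnaire
m <- 1994    # soapy-taste detectors among them
n <- 150     # individuals sampled

n / N                                # sampling fraction
p.hyper <- dhyper(20, m, N - m, n)   # sampling without replacement
p.binom <- dbinom(20, n, m / N)      # sampling with replacement
abs(p.hyper - p.binom)
```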
5. Chronic Fatigue. A randomized controlled trial was conducted to assess the efficacy of
intramuscular magnesium injections for the treatment of chronic fatigue syndrome (CFS).
In the trial, 32 patients with CFS were randomly allocated to either the treatment group or
the placebo group; 15 patients were assigned to treatment and 17 were assigned to placebo.
Patients were asked about whether they felt they benefited from treatment after 6 weeks.
Of the 15 treated patients, 12 said that they felt better. In contrast, of the 17 patients on
placebo, 3 said that they felt better.
a) Organize the results of the study in a 2 × 2 table.
                 Treatment   Placebo   Sum
Improvement             12         3    15
No Improvement           3        14    17
Sum                     15        17    32
b) Compute the expected values under the null hypothesis of no association.
All four expected cell counts are less than 10 (they range from about 7.0 to 9.0). Thus,
it is not appropriate to use the χ² test for analyzing these data.
#enter the data (rows are treatment groups, columns are outcomes)
cfs.table <- matrix(c(12, 3, 3, 14), nrow = 2, ncol = 2, byrow = TRUE)
dimnames(cfs.table) <- list("Treatment" = c("Magnesium", "Placebo"),
                            "Outcome" = c("Improvement", "No Improvement"))

chisq.test(cfs.table)$expected

##            Outcome
## Treatment   Improvement No Improvement
##   Magnesium     7.03125        7.96875
##   Placebo       7.96875        9.03125
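Under the null hypothesis of no association, each expected count is (row total × column total) / (grand total). A sketch that reproduces the expected counts from the margins; the table is re-entered here so that the snippet runs on its own:

```r
cfs.table <- matrix(c(12, 3, 3, 14), nrow = 2, byrow = TRUE,
                    dimnames = list("Treatment" = c("Magnesium", "Placebo"),
                                    "Outcome" = c("Improvement", "No Improvement")))

# expected count = row total * column total / grand total
expected <- outer(rowSums(cfs.table), colSums(cfs.table)) / sum(cfs.table)
expected
```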
c) Compute the probability of the observed results, under the assumption that the margins are
fixed and that the null hypothesis is true.
Use the hypergeometric distribution with parameters N = 32, m = 15, and n = 15; calculate
P (X = 12). Consider the "successes" to be the individuals who experience an improvement,
and the "number sampled" to be the people randomized to the treatment group. The prob-
ability of the observed set of results, assuming the marginal totals are fixed and the null
hypothesis is true, is 5.47 × 10⁻⁴.
dhyper(12, 15, 32 - 15, 15)

## [1] 0.000546911
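The same probability can be written out with binomial coefficients: choose 12 of the 15 patients who improve and 3 of the 17 who do not to make up the treatment group, out of all the ways to choose 15 of the 32 patients. A sketch (the variable name is illustrative):

```r
# P(X = 12) for the hypergeometric distribution, from first principles
prob <- choose(15, 12) * choose(17, 3) / choose(32, 15)
prob
```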
d) Enumerate all possible sets of results that are more extreme than the observed results, in the
same direction. Calculate the probability of each set of results, under the assumptions that
the margins are fixed and that H0 is true.

More individuals than expected in the treatment group experienced an improvement; thus,
tables that are more extreme in the same direction also consist of those where more people
in the treatment group experience an improvement than observed. These are tables in which
13, 14, or 15 individuals in the treatment group experience an improvement. The associated
probabilities, respectively, are 2.52 × 10⁻⁵, 4.51 × 10⁻⁷, and 1.77 × 10⁻⁹.
dhyper(13, 15, 32 - 15, 15)

## [1] 2.524205e-05
dhyper(14, 15, 32 - 15, 15)

## [1] 4.507509e-07
dhyper(15, 15, 32 - 15, 15)

## [1] 1.76765e-09
                 Treatment   Placebo   Sum
Improvement             13         2    15
No Improvement           2        15    17
Sum                     15        17    32

                 Treatment   Placebo   Sum
Improvement             14         1    15
No Improvement           1        16    17
Sum                     15        17    32

                 Treatment   Placebo   Sum
Improvement             15         0    15
No Improvement           0        17    17
Sum                     15        17    32
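Summing the probabilities of the observed table and the three more extreme tables gives the one-sided p-value, P(X ≥ 12). A sketch (the variable name is illustrative):

```r
# P(X >= 12): observed table plus the three more extreme tables
one.sided <- sum(dhyper(12:15, 15, 32 - 15, 15))
one.sided
```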
e) Assess whether there is statistical evidence for an association between treatment with intra-
muscular magnesium and improved outcome, based on these data. Let α = 0.05. Summarize
your results.
Let p1 represent the population proportion of individuals who experience an improvement
in the treatment group and p2 represent the population proportion of individuals who
experience an improvement in the placebo group. Test H0 : p1 = p2 against HA : p1 ≠ p2. Let
α = 0.05. The p-value from Fisher's exact test is 0.001. There is sufficient evidence to
reject the null in favor of the alternative; the data provide evidence that patients with
chronic fatigue syndrome are more likely to report improvement after magnesium injections
than after placebo.
fisher.test(cfs.table)$p.value

## [1] 0.001032612
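fisher.test() reports a two-sided p-value by default. One way to reproduce it, sketched below, is to sum the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. The tolerance factor is an assumption added here to guard against floating-point ties, and the variable names are illustrative.

```r
# probabilities of all tables with the observed margins, indexed by
# the number of treated patients who improve (0 to 15)
table.probs <- dhyper(0:15, 15, 32 - 15, 15)
obs.prob <- dhyper(12, 15, 32 - 15, 15)

# two-sided p-value: tables no more likely than the observed one
two.sided <- sum(table.probs[table.probs <= obs.prob * (1 + 1e-7)])
two.sided
```

The tables that qualify are those with 0, 1, or 2 improvements in the treatment group, together with the observed and more extreme tables (12 through 15).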
