
STAT 250

PRACTICE PROBLEM SOLUTIONS

1. A scientist wants to determine if a certain drug will enable rats to learn the way through a maze
faster. Describe how he should set up an experiment to test this. How should he analyze the data once
it is collected? What is the conclusion if the p-value = .0317?

Approach #1: Divide a sample of rats into two groups at random. One group gets the drug, the other gets
a placebo. Record the times for each of the rats going through the maze.
One quantitative variable (time through maze) and one categorical (drug or placebo) => Test for a
difference in means (unpaired)

Approach #2: (Matched pairs) Run each rat through the maze twice, once with the drug and once without.
Randomize the order (drugged/not) for each rat.
Paired data, so take the difference in times for each rat.
One quantitative variable (difference in times through the maze) => Paired test for a mean difference
(which would be done as a single mean)

p-value = 0.0317 ==> Reject H0 (at a 5% level). There is sufficient evidence to conclude that the mean
time through the maze is shorter for rats that have the drug.

2. Medical researchers want to determine how strong a relationship there is between number of
cigarettes smoked per day by pregnant women and birth-weight of the baby. They collect this data for a
sample of newborn babies. What should they do with it?

Two quantitative variables (number of cigarettes per day and birth weight of baby) => scatterplot,
correlation
Look at a scatterplot of the data (linear association?), compute the correlation between number of
cigarettes and birth weight, and find a confidence interval for the correlation (because it’s asking “how
strong is the relationship”, not “is there a relationship”).

3. Does alcohol slow reaction time? Seven participants agree to participate in a study in which each
has reaction time measured, then drinks a preset amount of alcohol and again has reaction time tested.
Results are given below. How would you analyze this data?
Before: 15 12 21 25 24 11 18
After: 18 10 28 28 24 16 28

This is paired data, so first take the difference for each case.
Do a test for a single mean to see if the mean difference (after – before) is greater than 0, since
slower reactions show up as longer reaction times after drinking.
Because the sample size is so small, a randomization test would be more appropriate than a t-test.
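
One way to carry out such a randomization test is sketched below. It assumes a sign-flip scheme (under H0 each paired difference is equally likely to be positive or negative) and a one-sided alternative that the mean of after - before is greater than 0. With only seven pairs, all 128 sign patterns can be listed instead of sampled. The variable names are illustrative, and the course software (StatKey) may build its randomization samples differently.

```python
from itertools import product

before = [15, 12, 21, 25, 24, 11, 18]
after = [18, 10, 28, 28, 24, 16, 28]

# Paired differences (after - before); positive values mean slower reactions.
diffs = [a - b for a, b in zip(after, before)]
observed_sum = sum(diffs)

# Under H0 the sign of each difference is arbitrary, so enumerate every
# one of the 2^7 sign assignments and count those at least as extreme.
# Comparing sums is equivalent to comparing means because n is fixed.
total = 0
count_extreme = 0
for signs in product([1, -1], repeat=len(diffs)):
    total += 1
    if sum(s * d for s, d in zip(signs, diffs)) >= observed_sum:
        count_extreme += 1

p_value = count_extreme / total
print(f"mean difference = {observed_sum / len(diffs):.2f}")
print(f"one-sided randomization p-value = {p_value:.4f}")
```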

4. Someone claims that less than half of Penn State students have to write a research paper during
their sophomore year. You ask a random sample of sophomores to see if you can find statistical
evidence to refute this. What statistic(s) would you compute from the sample? What test would you
perform and what would you use for Ho and Ha? What if the p-value comes out to be .1822?

Statistic: $\hat{p}$ = proportion of sampled sophomores who had to write a research paper
Procedure: Test for a single proportion
Hypotheses: H0: p = 0.5 vs. Ha: p > 0.5, where p = proportion of all sophomores who write a research
paper
Conclusion: A p-value of 0.1822 is not very small, so do not reject H0. We do not have enough evidence
to conclude that the proportion of sophomores who write a research paper is more than one half.

5. Assume that total SAT scores of applicants to Podunk University follow a normal distribution with
mean μ=1100 and standard deviation σ=120.
(a) Describe how we might select a random sample of size 50 from among Podunk’s 5,000 applicants.
(b) What can you say about the distribution (center, spread and shape) of sample mean total SAT scores
for random samples of 50 Podunk applicants?

(a) Make a list of the applicants. Either put all the names into a hat and draw 50 randomly, or else put the
names into a spreadsheet and use technology (Minitab or StatKey) to take a random sample of 50.
(b) According to the Central Limit Theorem (CLT), the sample means $\bar{x}$ will follow a normal
distribution centered at the same mean ($\mu = 1100$) with a standard error of
$\sigma/\sqrt{n} = 120/\sqrt{50} = 17.0$, so $\bar{x} \sim N(1100, 17.0)$.
(Note that this is asking about the sampling distribution.)
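
The standard error can be checked numerically. The sketch below computes it from the formula and then compares it with a small simulation. The simulation draws from a normal population with the stated mean and standard deviation rather than from an actual list of 5,000 applicants, and the seed and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

mu, sigma, n = 1100, 120, 50

# Standard error of the sample mean from the formula sigma / sqrt(n).
se = sigma / math.sqrt(n)

# Simulation check: draw many samples of size n and record each sample mean.
rng = random.Random(250)  # fixed seed (arbitrary) so the run is repeatable
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(2000)
]

sim_center = statistics.fmean(sample_means)
sim_spread = statistics.stdev(sample_means)

print(f"formula SE       = {se:.2f}")
print(f"simulated spread = {sim_spread:.2f}")
print(f"simulated center = {sim_center:.1f}")
```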

6. A chemist wants to predict the speed of a chemical reaction based on temperature. How should she
proceed?

She should run the reaction at various temperatures and record the speed each time.
Two quantitative variables (speed of reaction and temperature)
She should then look at a scatterplot of speed vs. temperature and use simple linear regression to find
the regression equation for prediction (if the relationship is approximately linear).

7. A bootstrap distribution is shown for 1000 bootstrap statistics for the correlation between Price
and Mileage for a sample of 25 used Mustang cars.
(a) Estimate the sample correlation for the 25 cars.
(b) Give an approximate 95% confidence interval for the correlation and clearly interpret it.

(a) r ≈ -0.836 (the center of the bootstrap distribution)
(b) Cutting off 2.5%, so 1000(0.025) = 25 dots on each tail, we get a confidence interval of (-0.93, -0.73).
We are 95% confident that the correlation between price and mileage for all used Mustang cars is
between -0.93 and -0.73.

8. You wish to determine whether or not there is a gender bias in what major students select. What
data do you collect, and how do you analyze it? What if the p-value comes out to be .0013?

For a random sample of students we would need to record information on two variables, gender and
major. The data could then be summarized with a two-way table. These are both categorical and major
has more than two categories, so we test for a relationship using a Chi-square test for association.
Conclusion: With this very small p-value, we reject H0. This gives very strong evidence that the choice of
major is associated with gender.

9. A sample of 50 amateur figure skaters showed that they practiced an average of 18 hours per week
with a standard deviation of 6 hours. Find a 90% confidence interval for the number of hours practiced
per week.

One quantitative variable (hours of practice per week) => single mean
Confidence interval: statistic ± t* (SE)
Statistic: $\bar{x} = 18$
SE: $\sigma/\sqrt{n} \approx s/\sqrt{n} = 6/\sqrt{50} = 0.85$
From t-distribution with n-1 = 49 df, t* = 1.677
Confidence interval: statistic ± t* (SE) = 18 ± 1.677(0.85) = (16.58, 19.42)
We are 90% confident that amateur figure skaters practice between 16.58 and 19.42 hours per week, on
average.
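
The arithmetic for this interval can be reproduced in a few lines. This is only a sketch of the calculation: the t* value is typed in from the solution rather than computed, since Python's standard library has no t-distribution quantile function.

```python
import math

n, xbar, s = 50, 18, 6
t_star = 1.677  # value quoted in the solution for 49 df and 90% confidence

se = s / math.sqrt(n)   # standard error of the sample mean
margin = t_star * se    # margin of error
ci = (xbar - margin, xbar + margin)

print(f"SE = {se:.3f}, margin of error = {margin:.3f}")
print(f"90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```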

10. ACME Egg Co. is studying the effects of artificial lighting, used to simulate shortened days, on egg
production of chickens. A brood of 25 chicks were separated at random into two groups, both raised in
windowless coops with artificial lighting. One group followed a natural 24 hour per day light/dark cycle
while the second group experienced an accelerated 20 hour day. The weekly egg productions for the two
groups are summarized in the table below. Test (at a 10% level) whether egg production is higher with the
20 hr day as compared to a 24 hr day.
           Sample size   Mean (# eggs)   Std. Dev.
24 hour        45             5.1           1.8
20 hour        40             6.2           2.9

H0: $\mu_{24} = \mu_{20}$ vs. Ha: $\mu_{24} < \mu_{20}$, where $\mu$ is the mean number of eggs laid per week.
$t = \dfrac{5.1 - 6.2}{\sqrt{\dfrac{1.8^2}{45} + \dfrac{2.9^2}{40}}} = -2.07$
Comparing to the left tail of a t-distribution with 39 df gives p-value = 0.023. Reject H0. We have enough
evidence to conclude that the mean number of eggs produced is higher for the chickens on the 20 hour cycle.
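
The test statistic can be recomputed from the summary table as a check. The sketch below covers only the statistic and the degrees of freedom (the smaller sample size minus 1, matching the 39 df used in the solution). The p-value still has to come from a t-table or statistical software.

```python
import math

# Summary statistics from the table: sample size, mean, standard deviation.
n24, mean24, sd24 = 45, 5.1, 1.8
n20, mean20, sd20 = 40, 6.2, 2.9

# Standard error for a difference in means (unpooled).
se = math.sqrt(sd24**2 / n24 + sd20**2 / n20)
t_stat = (mean24 - mean20) / se

# Conservative degrees of freedom: smaller sample size minus 1.
df = min(n24, n20) - 1

print(f"SE = {se:.4f}, t = {t_stat:.2f}, df = {df}")
```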

11. Which of the statistics below could we find by examining a boxplot for a sample?
____ sample size    ____ mean    __x__ median    __x__ interquartile range
__x__ maximum    ____ std. deviation    __x__ 25%-tile

A boxplot shows the five number summary (min, Q1, median, Q3, max), so we can also find the
IQR = Q3 - Q1, but not the mean, std. dev. or sample size.

12. A new very strong painkiller is tested on eight arthritis sufferers. Each one is given a different dose
of the drug (in mg), and the number of hours that the drug is effective is measured. Let x = the amount of
drug administered, and y = the number of hours that the drug is effective as a painkiller. The data is
given below and the regression line is Hours = 1.5294 + 5.784 Dosage. Find the predicted duration for a
dose of 0.5 and compute the residual. Interpret the slope of the regression line.
Dosage 0.1 0.2 0.4 0.5 0.6 0.8 1.0 1.2
Hours 2 2 4 5 4 8 8 7

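If dosage = 0.5, then the predicted number of hours is 1.5294 + 5.784(0.5) = 4.421.
The observed duration at a dose of 0.5 is 5 hours, so the residual is observed - predicted = 5 - 4.421 = 0.579.
As dosage goes up by 1 mg, the predicted duration of effectiveness goes up by 5.784 hours.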
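
As a check, the regression line, the prediction and the residual can all be recomputed from the data table. The sketch below uses the usual least-squares formulas. Because it keeps full precision in the slope and intercept, its prediction differs from the hand calculation in the fourth decimal place.

```python
dosage = [0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0, 1.2]
hours = [2, 2, 4, 5, 4, 8, 8, 7]

n = len(dosage)
x_bar = sum(dosage) / n
y_bar = sum(hours) / n

# Least-squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosage, hours))
sxx = sum((x - x_bar) ** 2 for x in dosage)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Prediction and residual (observed minus predicted) at a dose of 0.5.
x0 = 0.5
predicted = intercept + slope * x0
observed = hours[dosage.index(x0)]
residual = observed - predicted

print(f"Hours = {intercept:.4f} + {slope:.3f} Dosage")
print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
```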

13. A sample of 40 new businesses established in 2000 showed that 16 of them no longer existed after 5
years.
(a). Find a 99% confidence interval for the proportion of new businesses that fail to last five years.
(b) Describe how you would construct a bootstrap distribution to estimate the confidence interval in
(a).

(a) We could do this either using bootstrapping or the normal distribution and SE formula. We’ll use the
normal distribution here, because you won’t have access to technology on the final. The sample
proportion of new businesses that fail is $\hat{p} = 16/40 = 0.40$. For a 99% confidence interval using a
normal distribution we have z* = 2.575. So the confidence interval is
$\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = 0.40 \pm 2.575 \sqrt{\dfrac{0.40(0.60)}{40}} = 0.40 \pm 0.199 = (0.201, 0.599)$
We are 99% sure that between 20.1% and 59.9% of new businesses will fail within 5 years.
(b) To get a bootstrap sample we would sample 40 businesses (with replacement) from the original
sample and record the proportion that fail. We would repeat this process hundreds of times to get a bootstrap
distribution, then find the 0.5%-tile and 99.5%-tile to get the 99% confidence interval for p.
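
Both approaches can be sketched in code. The first part repeats the normal-based calculation. The second builds a bootstrap distribution by resampling the 40 businesses with replacement and reads off the 0.5 and 99.5 percentiles. The seed, the 5000 repetitions and the variable names are arbitrary choices for illustration, and the bootstrap endpoints will vary slightly from run to run if the seed is changed.

```python
import math
import random

n, failures = 40, 16
p_hat = failures / n
z_star = 2.575  # value used in the solution for 99% confidence

# (a) Normal-based interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)
se = math.sqrt(p_hat * (1 - p_hat) / n)
normal_ci = (p_hat - z_star * se, p_hat + z_star * se)

# (b) Percentile bootstrap: resample the original 40 outcomes with replacement.
sample = [1] * failures + [0] * (n - failures)  # 1 = failed, 0 = survived
rng = random.Random(13)  # fixed seed (arbitrary) for repeatability
reps = 5000
boot_props = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(reps))

cut = reps // 200  # 0.5% of the bootstrap statistics in each tail
boot_ci = (boot_props[cut], boot_props[reps - cut - 1])

print(f"normal 99% CI    = ({normal_ci[0]:.3f}, {normal_ci[1]:.3f})")
print(f"bootstrap 99% CI = ({boot_ci[0]:.3f}, {boot_ci[1]:.3f})")
```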

14. One fall suppose a Penn State gardener planted bulbs for 60 purple crocuses and 40 white
crocuses. The next spring, she counts 48 purple crocuses and 24 white ones. Is there evidence that the
germination rates between the two colors are different? Use a chi-square test for association.

The hypotheses are:
H0: Color and germination are not related
Ha: Color and germination are related
The chi-square statistic is
$\chi^2 = \dfrac{(48-43.2)^2}{43.2} + \dfrac{(12-16.8)^2}{16.8} + \dfrac{(24-28.8)^2}{28.8} + \dfrac{(16-11.2)^2}{11.2} = 4.76$
Using the right tail of a $\chi^2$-distribution with 1 d.f. gives a p-value of 0.029.
We reject H0 and find evidence that the germination success of crocuses depends on the color.
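
The expected counts and the statistic can be reproduced from the two-way table (rows: purple, white; columns: germinated, did not germinate). In the sketch below, the p-value line relies on a shortcut that holds only for 1 degree of freedom, where a chi-square variable is the square of a standard normal. A table with more rows or columns would need a chi-square table or statistical software instead.

```python
import math

# Observed counts: rows are purple and white, columns are
# germinated and did not germinate.
observed = [[48, 12], [24, 16]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count = (row total * column total) / overall total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# For 1 df only: P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected counts = {expected}")
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")
```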

15. How many times do people laugh per day? How would we collect data to answer this question?
How would we visualize this data? How would we summarize this data? How would we do inference for
this data?

One quantitative variable (number of laughs per day)
Visualize with a histogram, boxplot, or dotplot
Summarize with a mean or median, standard deviation, and/or five number summary
Confidence interval for a mean would be most appropriate for inference

16. A sample of intro stat students showed a correlation of r = -0.21 between number of pierces and
number of hours exercising. Is this a significant correlation? The graph shows a randomization
distribution of 100 values for testing Ho: ρ = 0 vs. Ha: ρ ≠ 0 based on this sample.
[Figure: dot plot "Measures from Scrambled Fall2010", randomization distribution of r, horizontal axis
from -0.3 to 0.3]
(a) Explain what a single dot in the dotplot represents.
(b) Explain how you would use the randomization distribution to make a decision in this situation.
(c) Based on the original sample and what you see in the plot, what would you expect to be the
outcome of the test? Explain your reasoning.

(a) Each dot represents the correlation for a sample where one of the variables (say hours exercising)
has been shuffled so that it is not related to the other variable (pierces).
(b) Find the proportion of samples (in this case just 1 out of 100) that are at least as extreme as the
observed r = -0.21 and double to account for two tails. p-value = 2(0.01) = 0.02.
(c) This is a small p-value, so we have evidence that there is a non-zero correlation between pierces and
hours exercising (in this case a negative relationship).
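
The shuffling idea in (a) and (b) can be sketched as a short function. The original class data are not given here, so the two lists below are made-up values used only to show the mechanics. The function names, seed and number of repetitions are likewise assumptions. This version counts shuffled correlations whose absolute value is at least that of the observed r, which is a common alternative to doubling one tail.

```python
import math
import random


def pearson_r(x, y):
    """Sample correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def randomization_p_value(x, y, reps=1000, seed=0):
    """Two-sided randomization test of H0: rho = 0 by shuffling y."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # breaks any link between x and y
        if abs(pearson_r(x, shuffled)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps


# Hypothetical data for illustration only (not the class sample).
pierces = [0, 2, 1, 4, 0, 3, 5, 2, 1, 6, 0, 2]
exercise_hours = [9, 6, 7, 3, 8, 5, 4, 6, 10, 2, 7, 5]

r, p = randomization_p_value(pierces, exercise_hours)
print(f"observed r = {r:.3f}, randomization p-value = {p:.3f}")
```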

17. The editors for a science fiction magazine are interested in estimating the proportion of college
students who believe that aliens from another planet have visited Earth. What data should they collect,
how should they collect it, and how should they analyze it? If in class, collect and analyze data.

They should collect a sample of college students (random, if possible) and ask “Do you believe that aliens
from another planet have visited Earth?” Find the proportion in this sample who say “yes” and use that to
estimate the proportion in the population of all college students, then find a confidence interval for this
proportion.

18. Are pedigreed dogs more likely to exhibit neurotic behavior than mutts? A pet psychologist studied a
random sample of 400 dogs, consisting of 160 with pedigrees and 240 mutts. 56 of the pedigreed dogs
and 60 of the mutts were judged to be neurotic. Test at a 5% significance level whether the proportion of
neurotics among pedigreed dogs is significantly higher than the proportion of neurotics among mutts.
Use the normal distribution and appropriate standard error formula.

Two categorical variables (pedigreed or mutt and neurotic or not), both with just two categories
Test for a difference in proportions
H0: $p_1 = p_2$ vs. Ha: $p_1 > p_2$, where $p_1$ and $p_2$ are the proportions of neurotic dogs among
pedigreed dogs and mutts, respectively. The sample proportions are
$\hat{p}_1 = 56/160 = 0.35$ (pedigreed) and $\hat{p}_2 = 60/240 = 0.25$ (mutts)
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} = \dfrac{0.35 - 0.25}{\sqrt{\dfrac{0.35(1-0.35)}{160} + \dfrac{0.25(1-0.25)}{240}}} = 2.13$
Using the right tail of a normal distribution, the p-value is 0.017, so reject H0 at the 5% level. We have
convincing evidence that the proportion of neurotic dogs is higher for pedigreed dogs than it is for mutts.
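
A short script can evaluate the test statistic directly from the counts in the problem. The sketch below computes z with the unpooled standard error shown in the formula above. It also computes the pooled version, which many textbooks use for a test of H0: p1 = p2. The pooled version is an addition for comparison and is not part of the original solution.

```python
import math
from statistics import NormalDist

x1, n1 = 56, 160   # neurotic pedigreed dogs, total pedigreed dogs
x2, n2 = 60, 240   # neurotic mutts, total mutts
p1, p2 = x1 / n1, x2 / n2

# Unpooled standard error, as in the formula above.
se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_unpooled = (p1 - p2) / se_unpooled

# Pooled standard error (assumption: a common choice under H0: p1 = p2).
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_pooled = (p1 - p2) / se_pooled

for label, z in (("unpooled", z_unpooled), ("pooled", z_pooled)):
    p_value = 1 - NormalDist().cdf(z)  # right-tail area
    print(f"{label}: z = {z:.2f}, p-value = {p_value:.4f}")
```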

19. What is/are the main difference(s) between a bootstrap distribution for finding a confidence interval
and a randomization distribution for testing a null hypothesis?

Bootstrap distribution: We merely sample with replacement from the original sample
Randomization distribution: We simulate in a way that is consistent with the null hypothesis.

20. There are six hall of fame flavors at the Berkey Creamery: Cherry Quist (creamy black cherry and
vanilla), Keeney Beany (double chocolate ice cream with chocolate chunks), Peachy Paterno (peach ice
cream with peach slices), WPSU Coffee Break (coffee ice cream with chocolate chips), Palmer
Mousseum w/ Almonds (chocolate ice cream with almonds and chocolate swirl), and Alumni Swirl (blue
and white: vanilla ice cream with mocha chips and blueberry swirl). Are these flavors equally preferred
among Creamery patrons? How would you collect data to answer this question? How would you analyze
the data? If in class, collect and analyze data.

One categorical variable (flavor preferred), more than two categories => Chi-square goodness-of-fit test
H0: p1 = p2 = p3 = p4 = p5 = p6 = 1/6
Ha: At least one of these proportions differs from 1/6
