MATH 376 - Final Exam Sample Solutions
The final exam will be close to the length of this sample. It will contain problems from
all five chapters we covered. Your “cheat sheet” can be up to three sheets of 8.5 x 11 paper
with notes on both sides. The test will come with the tables as we’ve done previously.
1. Suppose that $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples, with
the variables $X_i$ normally distributed with mean $\mu_1$ and variance $\sigma_1^2$, and the variables
$Y_i$ normally distributed with mean $\mu_2$ and variance $\sigma_2^2$.
(c) Find $V(\bar{X} - \bar{Y})$.
The similar calculation for the variance shows that
$$V(\bar{X} - \bar{Y}) = \sum_{i=1}^{m} \frac{1}{m^2}\sigma_1^2 + \sum_{j=1}^{n} \frac{1}{n^2}\sigma_2^2 = \frac{1}{m}\sigma_1^2 + \frac{1}{n}\sigma_2^2$$
(d) Suppose that $\sigma_1^2 = 4$, $\sigma_2^2 = 3.2$ and $m = n$. Find the sample size so that $\bar{X} - \bar{Y}$
will be within 0.5 units of $\mu_1 - \mu_2$ with probability 0.90.
With these values, $V(\bar{X} - \bar{Y}) = \frac{1}{n}(4 + 3.2) = \frac{7.2}{n}$. We want to find $n$ so that
$$P\left( \frac{-0.5}{\sqrt{7.2/n}} \le \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{7.2/n}} \le \frac{0.5}{\sqrt{7.2/n}} \right) = 0.90$$
The Z-score for the two-sided probability of 0.90 is $Z = 1.645$. Solve
$$\frac{0.5}{\sqrt{7.2/n}} = 1.645$$
for $n$. Since $\left( \frac{1.645\sqrt{7.2}}{0.5} \right)^2 = 77.93$, use $n = 78$.
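The arithmetic in (d) can be checked with a short sketch. The function name `sample_size` and its parameters are hypothetical, introduced only for this illustration; the inputs are the values stated in the problem.

```python
import math

def sample_size(var_sum, half_width, z):
    """Return the exact solution of z * sqrt(var_sum / n) = half_width
    and the smallest integer n that meets the requirement."""
    exact = (z * math.sqrt(var_sum) / half_width) ** 2
    return exact, math.ceil(exact)

# var_sum = sigma_1^2 + sigma_2^2 with m = n, as in part (d)
exact_n, n = sample_size(var_sum=4 + 3.2, half_width=0.5, z=1.645)
print(round(exact_n, 2), n)
```

Rounding up rather than to the nearest integer guarantees the probability requirement is met.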
(a) Do you think that carbon monoxide concentrations in air samples from this city
are normally distributed? Why or why not?
To be normally distributed, we look for almost all the data occurring
within two standard deviations of the mean and for a mound-shaped
distribution that is symmetric about the mean. In this case, the values are
positive and 0 is only 1.5 standard deviations below the mean, so the data cannot
be symmetrically distributed about the mean. We conclude the data are
not normally distributed.
(b) The EPA guidelines say that the air quality standard of 35 ppm should be reached
no more than once a year. Find the probability that the 35 ppm threshold will
be exceeded in 100 randomly selected samples.
Calculate the Z-score: $\frac{35 - 15}{\sigma/\sqrt{n}} = \frac{20}{10/10} = 20$. The probability of this occurring is essentially 0.
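To see how small this probability is, the following sketch evaluates the standard normal upper tail at the Z-score. It assumes, as the calculation above does, a mean of 15, a standard deviation of 10, and that the quantity of interest is the mean of 100 samples; `upper_tail` is a hypothetical helper name.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z, using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mean, sd, n, threshold = 15, 10, 100, 35
z = (threshold - mean) / (sd / math.sqrt(n))
print(z, upper_tail(z))
```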
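The distribution function for the order statistic $Y_{(n)}$, which is the maximum
of the $Y_i$, is the $n$th power of the distribution function for $Y_i$.
For a uniform distribution on the interval $(0, \theta)$, the density function
is $f_Y(y) = \frac{1}{\theta}$ for $0 < y < \theta$. The distribution function $F_Y$ is
$$F_Y(y) = \begin{cases} 0, & y < 0 \\ y/\theta, & 0 \le y \le \theta \\ 1, & y > \theta \end{cases}$$
Hence $U = Y_{(n)}/\theta$ has distribution function $F_U(u) = P(Y_{(n)} \le u\theta) = (u\theta/\theta)^n = u^n$
for $0 \le u \le 1$.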
(b) Is U a pivotal quantity for θ?
Since the distribution of U does not depend on θ, it is a pivotal quantity
for θ.
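(c) Find a 95% lower confidence bound for $\theta$.
We want a value for $a$ so that
$$P\left( \frac{Y_{(n)}}{\theta} \le a \right) = F_U(a) = 0.95$$
Thus $a^n = 0.95$, or $a = 0.95^{1/n}$. Since $Y_{(n)}/\theta \le a$ is equivalent to $\theta \ge Y_{(n)}/a$, the lower confidence bound is $Y_{(n)}\, 0.95^{-1/n}$.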
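A quick simulation can make the coverage claim concrete. This is only a sketch: the values of `theta`, `n`, and the number of trials are arbitrary assumptions, and the function names are hypothetical.

```python
import random

def lower_bound(sample, level=0.95):
    """Lower confidence bound max(sample) * level**(-1/n) for theta of Uniform(0, theta)."""
    n = len(sample)
    return max(sample) * level ** (-1.0 / n)

def coverage(theta, n, trials, seed=0):
    """Fraction of simulated samples whose lower bound does not exceed theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        if lower_bound(sample) <= theta:
            hits += 1
    return hits / trials

observed = coverage(theta=3.0, n=5, trials=20000)
print(observed)
```

The observed fraction should be close to 0.95 for any choice of `theta` and `n`, which is what makes $U$ useful as a pivotal quantity.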
4. Two new drugs were given to patients with hypertension. The first drug lowered the
blood pressure of 16 patients an average of 11 points, with a standard deviation of 6
points. The second drug lowered the blood pressure of 20 other patients an average of
12 points, with a standard deviation of 8 points. Determine a 95% confidence interval
for the difference in the mean reductions in blood pressure. Assume the measurements
are normally distributed with equal variances.
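One way to set up this interval is the pooled-variance two-sample t interval. The sketch below is an illustration under that assumption, not necessarily the method intended for the exam; the function name is hypothetical, and `t_crit = 2.032` is an assumed table value for $t_{0.025}$ with $16 + 20 - 2 = 34$ degrees of freedom.

```python
import math

def pooled_t_interval(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Equal-variance (pooled) two-sample t interval for mean1 - mean2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se

# t_crit is an assumed table value, not taken from the exam text
low, high = pooled_t_interval(16, 11, 6, 20, 12, 8, t_crit=2.032)
print(round(low, 2), round(high, 2))
```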
With $n = 4$ and measurements 353, 351, 351, and 355, the sample mean is
352.5 and the sample variance is $s^2 = 3.67$. For a 90% confidence interval, we
need to look up $\chi^2_{0.95}$ and $\chi^2_{0.05}$ for three degrees of freedom: $\chi^2_{0.95} = 0.351846$
and $\chi^2_{0.05} = 7.81473$. The 90% confidence interval is:
$$\left( \frac{(n-1)s^2}{\chi^2_{0.05}}, \frac{(n-1)s^2}{\chi^2_{0.95}} \right) = \left( \frac{3(3.67)}{7.81473}, \frac{3(3.67)}{0.351846} \right) \approx (1.4, 31.3)$$
Of course, to do this we must assume that the measurements were indepen-
dent and normally distributed. This interval is sufficiently large that the
variance could be larger than 25 so the standard deviation could be larger
than 5. So it is possible that the accuracy is larger than two units.
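The interval for the variance can be reproduced with a few lines. This sketch takes the two chi-square table values quoted above as given inputs; `variance_interval` is a hypothetical helper name.

```python
from statistics import variance

def variance_interval(data, chi2_upper, chi2_lower):
    """Confidence interval ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower) for the variance."""
    n = len(data)
    s2 = variance(data)  # sample variance with divisor n - 1
    return s2, ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)

# chi-square values for 3 degrees of freedom, as quoted in the solution
s2, (low, high) = variance_interval([353, 351, 351, 355], 7.81473, 0.351846)
print(round(s2, 2), round(low, 1), round(high, 1))
```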
where a and b do not depend on θ. Show that
$$\sum_{i=1}^{n} d(Y_i)$$
is sufficient for $\theta$.
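We use the method of factoring the likelihood function $L(\theta) = L(y_1, \ldots, y_n \mid \theta)$.
$$\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} f(y_i \mid \theta) \\
&= \prod_{i=1}^{n} a(\theta)\, b(y_i)\, e^{-c(\theta) d(y_i)} \\
&= a(\theta)^n \left( \prod_{i=1}^{n} b(y_i) \right) e^{-c(\theta) \sum_{i=1}^{n} d(y_i)} \\
&= a(\theta)^n e^{-c(\theta) u} \prod_{i=1}^{n} b(y_i) \\
&= g(u, \theta)\, h(y_1, \ldots, y_n)
\end{aligned}$$
where $u = \sum_{i=1}^{n} d(y_i)$. By the factorization theorem, $u$ is sufficient for $\theta$.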
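As a concrete instance of this factorization, consider the exponential density with mean $\theta$. This example is an assumption chosen for illustration and is not part of the problem. Matching it to the form $a(\theta)b(y)e^{-c(\theta)d(y)}$ gives:

```latex
f(y \mid \theta) = \frac{1}{\theta} e^{-y/\theta}, \quad y > 0
\quad\Longrightarrow\quad
a(\theta) = \frac{1}{\theta},\;\; b(y) = 1,\;\; c(\theta) = \frac{1}{\theta},\;\; d(y) = y,
\qquad
L(\theta) = \theta^{-n} \exp\!\left( -\frac{1}{\theta} \sum_{i=1}^{n} y_i \right)
```

Here $u = \sum_{i=1}^{n} y_i$, so the sample total is sufficient for $\theta$ in this case.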
For other values of n, differentiate the logarithm of $L(p) = p^y (1 - p)^{n-y}$
with respect to $p$. We have
$$\frac{d \ln L(p)}{dp} = \frac{d}{dp}\left( y \ln p + (n - y)\ln(1 - p) \right) = \frac{y}{p} - \frac{n - y}{1 - p} = 0$$
Solve for $p$ in terms of $y$ to get $\hat{p} = \frac{y}{n}$. It turns out that for $y = 0$
and $y = n$, the maxima occur at $\frac{y}{n}$ as well. So the maximum
likelihood estimator is $\hat{p} = \frac{Y}{n}$.
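The closed-form answer can be sanity-checked numerically. The sketch below searches a grid of $p$ values for the largest log-likelihood; the values $y = 7$ and $n = 20$ are arbitrary assumptions for illustration, and the function names are hypothetical.

```python
import math

def log_likelihood(p, y, n):
    """ln L(p) = y ln p + (n - y) ln(1 - p), valid for 0 < p < 1."""
    return y * math.log(p) + (n - y) * math.log(1 - p)

def grid_argmax(y, n, steps=1000):
    """Grid point in (0, 1) with the largest log-likelihood."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, y, n))

best = grid_argmax(y=7, n=20)
print(best, 7 / 20)
```

The grid excludes $p = 0$ and $p = 1$, so it only checks the interior case $0 < y < n$ treated by the derivative argument.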
8. Two different companies have applied to provide cable television service in Worcester.
Let p denote the proportion of all potential subscribers who favor the first company
over the second. Consider testing $H_0: p = 0.5$ versus $H_a: p \neq 0.5$ based on a random
sample of 25 individuals. Let the random variable X denote the number in the sample
who favor the first company and x represent the observed value of X.
(a) Which of the following rejection regions is most appropriate and why?
R1 = {x : x ≤ 7 or x ≥ 18} R2 = {x : x ≤ 8} R3 = {x : x ≥ 17}
R1 is most appropriate since it tests for any preference in either direction.
(b) Describe the corresponding (to your answer to (a)) type I and type II errors.
A type I error would be to reject the null hypothesis that the preferences
are the same when they are in fact the same. A type II error would be to
accept the null hypothesis that the preferences are the same when they
are not.
(c) What is the probability distribution of the test statistic X when H0 is true?
X is a binomial random variable with $n = 25$ and, when $H_0$ is true,
$p = 0.5$. Under $H_0$, the probability that $X$ falls outside $R_1$ is the sum
$$\sum_{x=8}^{17} \binom{25}{x} 0.5^x\, 0.5^{25-x}$$
$$\sum_{x=8}^{17} \binom{25}{x} 0.3^x\, 0.7^{25-x} = \sum_{x=0}^{17} \binom{25}{x} 0.3^x\, 0.7^{25-x} - \sum_{x=0}^{7} \binom{25}{x} 0.3^x\, 0.7^{25-x} = 1 - 0.512 = 0.488$$
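The binomial sums above can be evaluated directly instead of read from a table. In this sketch `binom_prob_between` is a hypothetical helper; the second quantity, the probability of landing in $R_1$ when $p = 0.5$, is included for comparison and is not quoted in the text above.

```python
from math import comb

def binom_prob_between(n, p, low, high):
    """P(low <= X <= high) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(low, high + 1))

# Probability that X lands outside R1 (8 <= x <= 17) when p = 0.3
outside_r1_at_03 = binom_prob_between(25, 0.3, 8, 17)
# Probability that X lands in R1 when p = 0.5
inside_r1_at_05 = 1 - binom_prob_between(25, 0.5, 8, 17)
print(round(outside_r1_at_03, 3), round(inside_r1_at_05, 3))
```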
9. Are male college students more easily bored than their female counterparts? This
question was examined in the article "Boredom in Young Adults: Gender and Cultural
Comparisons" (J. Cross-Cult. Psych., 1991: 209-223). The authors administered a test
called the Boredom Proneness Scale to 97 male and 148 female college students. The
results are below:
Gender   Sample Size   Sample Mean   Sample SD
Male     97            10.40         4.83
Female   148           9.26          4.68
(a) At the 5% level of significance, does the data support the research hypothesis
that the mean Boredom Proneness Rating is higher for men than for women?
Assume that the population standard deviation is known and equal to the sample
standard deviation.
Note that the assumption is reasonable for large sample sizes since the
T -distribution and normal distribution are close.
Since the critical value $z_{0.05}$ is 1.645 and the observed statistic exceeds it,
we reject the null hypothesis and conclude that the data do support the
hypothesis that the rating for men is higher than that for women.
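The test statistic for (a) can be computed from the table. This is a sketch of the usual two-sample z statistic with the sample standard deviations treated as known, as the problem instructs; `two_sample_z` is a hypothetical helper name.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Standard error of mean1 - mean2 and the z statistic for H0: equal means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return se, (mean1 - mean2) / se

se, z = two_sample_z(10.40, 4.83, 97, 9.26, 4.68, 148)
print(round(se, 3), round(z, 2), z > 1.645)
```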
(b) Construct a 90% Confidence Interval for the difference in male and female scores
on the Boredom Proneness Scale.
Notice that $z_{\alpha/2} = z_{0.05} = 1.645$ from (a) and $\sigma_{\hat{\theta}} = 0.623$. Then compute
10. For a normal distribution with mean $\mu$ and variance $\sigma^2 = 25$, an experimenter wishes
to test $H_0: \mu = 10$ versus $H_a: \mu = 5$. Find the sample size $n$ for which the most
powerful test will have $\alpha = \beta = 0.025$.
This is an application of the formula of Example 10.9.
$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.96 + 1.96)^2 \cdot 25}{(5 - 10)^2} = 15.366$$
It follows that n = 16 observations is sufficient.
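The sample-size formula is easy to evaluate for other inputs. A minimal sketch, with a hypothetical function name and the values from this problem:

```python
import math

def required_n(z_alpha, z_beta, sigma2, mu0, mua):
    """Exact value of (z_alpha + z_beta)^2 sigma^2 / (mua - mu0)^2 and its ceiling."""
    exact = (z_alpha + z_beta) ** 2 * sigma2 / (mua - mu0) ** 2
    return exact, math.ceil(exact)

exact_n10, n10 = required_n(1.96, 1.96, sigma2=25, mu0=10, mua=5)
print(round(exact_n10, 3), n10)
```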
11. The following table gives the attendance at a racetrack, x, and the amount that was
bet, y, on n = 10 randomly selected days.
Attendance                117   128   122   119   131   135   125   120   130   127
Amount bet (in millions)  2.07  2.80  3.14  2.26  3.40  3.89  2.93  2.66  3.30  3.54
Figure 1:
(b) Calculate the regression coefficients for a simple linear regression model for the
data.
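Calculating with $\bar{x} = 125.4$, $\bar{y} = 2.999$,
$$S_{xx} = \sum_{i=1}^{10} (x_i - \bar{x})^2 = 306.4$$
$$S_{yy} = \sum_{i=1}^{10} (y_i - \bar{y})^2 = 2.926$$
$$S_{xy} = \sum_{i=1}^{10} (x_i - \bar{x})(y_i - \bar{y}) = 26.44$$
$$SSE = 0.644$$
$$\beta_1 = S_{xy}/S_{xx} = 26.44/306.4 = 0.0863$$
$$\beta_0 = \bar{y} - \beta_1 \bar{x} = -7.824$$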
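These sums can be reproduced from the data table. The sketch below uses only the standard library; the function name is hypothetical, and computing SSE as $S_{yy} - \beta_1 S_{xy}$ is an assumption about how the stated value was obtained.

```python
x = [117, 128, 122, 119, 131, 135, 125, 120, 130, 127]
y = [2.07, 2.80, 3.14, 2.26, 3.40, 3.89, 2.93, 2.66, 3.30, 3.54]

def simple_regression(x, y):
    """Least-squares summaries for the model y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = syy - b1 * sxy  # assumed identity SSE = Syy - b1 * Sxy
    return {"sxx": sxx, "syy": syy, "sxy": sxy, "b1": b1, "b0": b0, "sse": sse}

fit = simple_regression(x, y)
print({name: round(value, 4) for name, value in fit.items()})
```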
(c) Make a scatter plot of the residuals. Does this indicate that simple linear regres-
sion provides a good model for the data?
From the following residual plot, we would conclude the regression line
is a good model for the data.
Figure 2:
(d) Find a prediction interval for the amount bet for the attendance value equal to
the mean attendance.
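In the following, we use $\alpha = 0.05$ and $n - 2 = 8$ degrees of freedom, so
that $t_{\alpha/2} = 2.306$. We want an estimate at $x^* = \bar{x} = 125.4$, with $n = 10$
and $S^2 = \frac{SSE}{n-2} = 0.0805$, so $S = \sqrt{0.0805} = 0.2837$. The prediction interval is
$$\beta_0 + \beta_1 x^* \pm t_{\alpha/2}\, S \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$
$$= -7.824 + 0.0863 \cdot 125.4 \pm 2.306 \cdot 0.2837 \sqrt{\frac{11}{10}} = 2.998 \pm 0.686.$$
Or as an interval, (2.312, 3.684).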
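The prediction interval formula can be evaluated step by step. This sketch plugs in the rounded regression summaries from part (b), takes $S$ as the square root of $SSE/(n-2)$, and uses $1/n = 1/10$; the function name is hypothetical and `t_crit` is the table value quoted above.

```python
import math

def prediction_interval(b0, b1, x_star, xbar, sxx, sse, n, t_crit):
    """Center and margin of b0 + b1 x* +/- t S sqrt(1 + 1/n + (x* - xbar)^2 / Sxx)."""
    s = math.sqrt(sse / (n - 2))  # S, not S^2
    center = b0 + b1 * x_star
    margin = t_crit * s * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)
    return center, margin

center, margin = prediction_interval(
    b0=-7.824, b1=0.0863, x_star=125.4, xbar=125.4,
    sxx=306.4, sse=0.644, n=10, t_crit=2.306,
)
print(round(center, 3), round(margin, 3))
```

Because $x^* = \bar{x}$, the last term under the square root vanishes and the margin depends only on $S$, $n$, and the t value.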
(e) What proportion of the total variation in the amount bet can be explained by the
attendance?
Calculate $r^2 = 1 - SSE/S_{yy} = 1 - 0.644/2.926 = 0.7799$. We would
conclude that approximately 78% of the variation in the amount bet can
be explained by the attendance.