MATH 376 – Final Exam Sample Solutions

May 11, 2018

The final exam will be close to the length of this sample. It will contain problems from
all five chapters we covered. Your “cheat sheet” can be up to three sheets of 8.5 x 11 paper
with notes on both sides. The test will come with the tables as we’ve done previously.
1. Suppose that $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples, with
the variables $X_i$ normally distributed with mean $\mu_1$ and variance $\sigma_1^2$, and the variables
$Y_i$ normally distributed with mean $\mu_2$ and variance $\sigma_2^2$.

(a) What is the distribution of $\bar{X} - \bar{Y}$?

The difference in sample means is a linear combination of the $X_i$ and
$Y_j$, which are independent normally distributed random variables, so it
is also normally distributed.
(b) Find $E(\bar{X} - \bar{Y})$.
The expected value is
\[
E(\bar{X} - \bar{Y}) = \frac{1}{m}\sum_{i=1}^{m} \mu_1 - \frac{1}{n}\sum_{j=1}^{n} \mu_2 = \mu_1 - \mu_2.
\]

(c) Find $V(\bar{X} - \bar{Y})$.
A similar calculation for the variance shows that
\[
V(\bar{X} - \bar{Y}) = \sum_{i=1}^{m} \frac{1}{m^2}\sigma_1^2 + \sum_{j=1}^{n} \frac{1}{n^2}\sigma_2^2 = \frac{1}{m}\sigma_1^2 + \frac{1}{n}\sigma_2^2.
\]

(d) Suppose that $\sigma_1^2 = 4$, $\sigma_2^2 = 3.2$ and $m = n$. Find the sample size so that $\bar{X} - \bar{Y}$
will be within 0.5 units of $\mu_1 - \mu_2$ with probability 0.90.
With these values, $V(\bar{X} - \bar{Y}) = \frac{1}{n}(4 + 3.2) = \frac{7.2}{n}$. We want to find $n$ so
that
\[
P\left( \frac{-0.5}{\sqrt{7.2/n}} \le \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{7.2/n}} \le \frac{0.5}{\sqrt{7.2/n}} \right) = 0.90.
\]
The Z-score for the two-sided probability of 0.90 is $Z = 1.645$. Solve
\[
\frac{0.5}{\sqrt{7.2/n}} = 1.645
\]
for $n$. Since $\left( \frac{1.645\sqrt{7.2}}{0.5} \right)^2 = 77.93$, use $n = 78$.
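
The sample-size calculation in (d) can be checked numerically. The following is a minimal Python sketch, not part of the exam; the function name `sample_size` and its parameters are assumptions introduced for illustration. It uses the unrounded normal quantile from the standard library, so the exact value differs from 77.93 by about a hundredth, while the rounded-up answer is the same.

```python
import math
from statistics import NormalDist


def sample_size(var1, var2, half_width, prob):
    """Common sample size n (m = n) so that the difference of sample means
    is within half_width of mu1 - mu2 with probability prob."""
    # Two-sided probability prob leaves (1 - prob) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + prob / 2)
    # Solve half_width / sqrt((var1 + var2) / n) = z for n.
    return z ** 2 * (var1 + var2) / half_width ** 2


n_exact = sample_size(4.0, 3.2, 0.5, 0.90)
n = math.ceil(n_exact)  # round up to the next whole observation
print(n_exact, n)
```

Rounding up rather than to the nearest integer keeps the probability at or above the 0.90 target.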

2. Environmental Protection Agency air quality standards for carbon monoxide are 35
parts per million (ppm) averaged over one hour. In a large city the average value for
carbon monoxide is 15 ppm with a standard deviation of 10.

(a) Do you think that carbon monoxide concentrations in air samples from this city
are normally distributed? Why or why not?
For data to be normally distributed, we expect almost all of the values to occur
within two standard deviations of the mean and to have a mound-shaped
distribution that is symmetric about the mean. In this case, the values cannot
be negative and 0 is only 1.5 standard deviations below the mean, so the data
cannot be symmetrically distributed about the mean. We conclude the data are
not normally distributed.
(b) The EPA guidelines say that the air quality standard of 35 ppm should be reached
no more than once a year. Find the probability that the 35 ppm threshold will
be exceeded in 100 randomly selected samples.
Calculate the Z-score for the mean of the $n = 100$ samples:
\[
\frac{35 - 15}{\sigma/\sqrt{n}} = \frac{20}{10/10} = 20.
\]
The probability of this occurring is essentially 0.

3. Let $Y_1, \ldots, Y_n$ denote a random sample of size $n$ from a population with a uniform
distribution on the interval $(0, \theta)$. Let $Y_{(n)} = \max\{Y_1, Y_2, \ldots, Y_n\}$ and $U = \frac{1}{\theta} Y_{(n)}$.

(a) Show that $U$ has distribution function
\[
F_U(u) = \begin{cases} 0, & u < 0 \\ u^n, & 0 \le u \le 1 \\ 1, & u > 1 \end{cases}
\]

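The distribution function for the order statistic $Y_{(n)}$, which is the maximum
of the $Y_i$, is the $n$th power of the distribution function for $Y_i$.
For a uniform distribution on the interval $(0, \theta)$, the density function
is $f_Y(y) = \frac{1}{\theta}$ for $0 < y < \theta$. The distribution function $F_Y$ is
\[
F_Y(y) = \begin{cases} 0, & y < 0 \\ y/\theta, & 0 \le y \le \theta \\ 1, & y > \theta \end{cases}
\]
The distribution function for $Y_{(n)}$ will be the $n$th power of $F_Y$,
\[
G_n(y) = \begin{cases} 0, & y < 0 \\ \left(\frac{y}{\theta}\right)^n, & 0 \le y \le \theta \\ 1, & y > \theta \end{cases}
\]
Then the distribution function for $U = \frac{1}{\theta} Y_{(n)}$ will be
\[
F_U(u) = P(U \le u) = P(Y_{(n)} \le \theta u) = G_n(\theta u) = \left(\frac{\theta u}{\theta}\right)^n = u^n
\]
for $0 \le u \le 1$.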
(b) Is U a pivotal quantity for θ?
Since the distribution of U does not depend on θ, it is a pivotal quantity
for θ.

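(c) Find a 95% lower confidence bound for $\theta$.
We want a value for $a$ so that
\[
P\left( \frac{Y_{(n)}}{\theta} \le a \right) = F_U(a) = 0.95.
\]
Thus $a^n = 0.95$, or $a = 0.95^{1/n}$. Since $Y_{(n)}/\theta \le a$ is equivalent to $\theta \ge Y_{(n)}/a$, the lower confidence bound is $Y_{(n)} \cdot 0.95^{-1/n}$.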
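
As a sanity check on the bound in (c), a small simulation can estimate how often $Y_{(n)} \cdot 0.95^{-1/n}$ falls at or below the true $\theta$. This is a minimal Python sketch under stated assumptions: the function names `lower_bound` and `coverage`, the value $\theta = 3$, the sample size 10, and the seed are all chosen here for illustration and do not come from the exam.

```python
import random


def lower_bound(sample, level=0.95):
    """Lower confidence bound for theta: max(sample) * level**(-1/n)."""
    n = len(sample)
    return max(sample) * level ** (-1.0 / n)


def coverage(theta, n, trials, seed=0):
    """Fraction of simulated uniform(0, theta) samples whose bound is <= theta."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        if lower_bound(sample) <= theta:
            hits += 1
    return hits / trials


estimate = coverage(theta=3.0, n=10, trials=20000)
print(estimate)
```

The estimated fraction should be close to 0.95, in line with the pivotal-quantity argument.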
4. Two new drugs were given to patients with hypertension. The first drug lowered the
blood pressure of 16 patients an average of 11 points, with a standard deviation of 6
points. The second drug lowered the blood pressure of 20 other patients an average of
12 points, with a standard deviation of 8 points. Determine a 95% confidence interval
for the difference in the mean reductions in blood pressure. Assume the measurements
are normally distributed with equal variances.

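Use a pooled sample variance:
\[
s_p^2 = \frac{15 \cdot 6^2 + 19 \cdot 8^2}{16 + 20 - 2} = 51.65.
\]
Using the normal approximation, the Z-score for 0.95 is 1.96, so the interval is
\[
\bar{y}_1 - \bar{y}_2 \pm 1.96 \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = 11 - 12 \pm 1.96 \sqrt{51.65 \left( \frac{1}{16} + \frac{1}{20} \right)} = -1 \pm 4.72.
\]
Or as an interval, $(-5.72, 3.72)$.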
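
The pooled-variance arithmetic above can be reproduced in a few lines. This is a minimal Python sketch; the function name `pooled_interval` and its argument order are assumptions for illustration, and the critical value 1.96 is passed in exactly as used in the solution.

```python
import math


def pooled_interval(n1, mean1, sd1, n2, mean2, sd2, crit):
    """Pooled variance and confidence interval for mean1 - mean2,
    assuming equal population variances."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    margin = crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return sp2, diff - margin, diff + margin


# Drug 1: 16 patients, mean 11, SD 6. Drug 2: 20 patients, mean 12, SD 8.
sp2, low, high = pooled_interval(16, 11, 6, 20, 12, 8, crit=1.96)
print(sp2, low, high)
```

Passing the critical value as a parameter makes it easy to substitute a different quantile without touching the rest of the calculation.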
5. A precision instrument is guaranteed to read accurately to within 2 units. A sample
of four instrument readings on the same object yielded the measurements 353, 351,
351, and 355. Find a 90% confidence interval for the population variance. What
assumptions are necessary? Does the guarantee seem reasonable?

With $n = 4$ and measurements 353, 351, 351, and 355, the sample mean is
352.5 and the sample variance is $s^2 = 3.67$. For a 90% confidence interval, we
need to look up $\chi^2_{0.95}$ and $\chi^2_{0.05}$ for three degrees of freedom: $\chi^2_{0.95} = 0.351846$
and $\chi^2_{0.05} = 7.81473$. The 90% confidence interval is
\[
\left( \frac{(n-1)s^2}{\chi^2_{0.05}}, \frac{(n-1)s^2}{\chi^2_{0.95}} \right) = \left( \frac{3 \cdot 3.67}{7.81473}, \frac{3 \cdot 3.67}{0.351846} \right) \approx (1.4, 31.3).
\]
Of course, to do this we must assume that the measurements were independent
and normally distributed. This interval is wide enough that the
variance could be larger than 25, so the standard deviation could be larger
than 5. So it is possible that the readings are off by more than two units.
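
The interval for the variance can be recomputed from the raw readings. The following is a minimal Python sketch; the variable names are assumptions for illustration, and the two chi-square values are copied from the solution above rather than computed.

```python
from statistics import mean, variance

readings = [353.0, 351.0, 351.0, 355.0]
n = len(readings)

x_bar = mean(readings)    # sample mean
s2 = variance(readings)   # sample variance with divisor n - 1

# Chi-square table values for 3 degrees of freedom, taken from the text.
chi2_upper = 7.81473      # chi^2_{0.05}
chi2_lower = 0.351846     # chi^2_{0.95}

lower = (n - 1) * s2 / chi2_upper
upper = (n - 1) * s2 / chi2_lower
print(x_bar, s2, lower, upper)
```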

6. Suppose that $Y_1, \ldots, Y_n$ is a random sample from a probability density function in the
(one-parameter) exponential family. That is,
\[
f(y \mid \theta) = \begin{cases} a(\theta) b(y) e^{-c(\theta) d(y)}, & a \le y \le b \\ 0, & \text{otherwise} \end{cases}
\]
where $a$ and $b$ do not depend on $\theta$. Show that
\[
\sum_{i=1}^{n} d(Y_i)
\]
is sufficient for $\theta$.

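We use the method of factoring the likelihood function $L(\theta) = L(y_1, \ldots, y_n \mid \theta)$.
\[
\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} f(y_i \mid \theta) \\
&= \prod_{i=1}^{n} a(\theta) b(y_i) e^{-c(\theta) d(y_i)} \\
&= a(\theta)^n \left( \prod_{i=1}^{n} b(y_i) \right) e^{-c(\theta) \sum_{i=1}^{n} d(y_i)} \\
&= a(\theta)^n e^{-c(\theta) u} \prod_{i=1}^{n} b(y_i) \\
&= g(u, \theta)\, h(y_1, \ldots, y_n)
\end{aligned}
\]
where $u = \sum_{i=1}^{n} d(y_i)$. By the factorization theorem, $u$ is sufficient for $\theta$.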

7. A binomial experiment consisting of $n$ trials resulted in observations $y_1, \ldots, y_n$, where
$y_i = 1$ if the $i$th trial was a success and $y_i = 0$ otherwise.

(a) What is the likelihood function $L(p)$ of the observed sample?
The probability for a single trial is $p^{y_i}(1-p)^{1-y_i}$, so the likelihood function
for $n$ trials is
\[
L(p) = L(y_1, \ldots, y_n \mid p) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} = p^{\sum_{i=1}^{n} y_i}(1-p)^{n - \sum_{i=1}^{n} y_i} = p^{y}(1-p)^{n-y}
\]
where $y = \sum_{i=1}^{n} y_i$.
(b) What are the possible values of $y = \sum_{i=1}^{n} y_i$?
The values are $0, 1, \ldots, n$.
(c) Find the value of $p$ that maximizes $L(p)$. (Hint: Consider the extreme cases of $y$
separately.)
The extreme cases are $y = 0$ and $y = n$. If $y = 0$, then $L(p) = (1-p)^n$,
and if $y = n$, then $L(p) = p^n$. In the first case $L(p)$ is maximized at $p = 0$
and in the second it is maximized at $p = 1$.
For other values of $y$, differentiate the logarithm of $L(p) = p^{y}(1-p)^{n-y}$
with respect to $p$. We have
\[
\frac{d \ln(L(p))}{dp} = \frac{d}{dp}\bigl( y \ln(p) + (n-y) \ln(1-p) \bigr) = \frac{y}{p} - \frac{n-y}{1-p} = 0.
\]
Solve for $p$ in terms of $y$ to get $\hat{p} = \frac{y}{n}$. It turns out that for $y = 0$
and $y = n$, the maxima occur at $\frac{y}{n}$ as well. So the maximum
likelihood estimator is $\hat{p} = \frac{Y}{n}$.
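
The result $\hat{p} = y/n$, including the two extreme cases, can be checked by brute force. This is a minimal Python sketch that evaluates $L(p)$ on a grid over $[0, 1]$ and picks the largest value; the function names, the grid resolution, and the example values $n = 25$, $y = 7$ are assumptions chosen for illustration. A grid search is used here only as a check; it is not the calculus argument given in the solution.

```python
def likelihood(p, y, n):
    """Binomial likelihood L(p) = p^y (1 - p)^(n - y)."""
    return p ** y * (1 - p) ** (n - y)


def grid_mle(y, n, steps=1000):
    """Grid point in [0, 1] with the largest likelihood."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: likelihood(p, y, n))


n = 25
interior = grid_mle(7, n)    # compare with 7 / 25 = 0.28
at_zero = grid_mle(0, n)     # extreme case y = 0
at_n = grid_mle(n, n)        # extreme case y = n
print(interior, at_zero, at_n)
```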

8. Two different companies have applied to provide cable television service in Worcester.
Let p denote the proportion of all potential subscribers who favor the first company
over the second. Consider testing $H_0: p = 0.5$ versus $H_a: p \ne 0.5$ based on a random
sample of 25 individuals. Let the random variable X denote the number in the sample
who favor the first company and x represent the observed value of X.
(a) Which of the following rejection regions is most appropriate and why?
$R_1 = \{x : x \le 7 \text{ or } x \ge 18\}$, $R_2 = \{x : x \le 8\}$, $R_3 = \{x : x \ge 17\}$
$R_1$ is most appropriate: the alternative hypothesis is two-sided, so the test
should reject for a preference in either direction.
(b) Describe the corresponding (to your answer to (a)) type I and type II errors.
A type I error would be to reject the null hypothesis that the preferences
are the same when they are in fact the same. A type II error would be to
accept the null hypothesis that the preferences are the same when they
are not.
(c) What is the probability distribution of the test statistic $X$ when $H_0$ is true?
When $H_0$ is true, $X$ is a binomial random variable with $n = 25$ and $p = 0.5$,
so $P(X = x) = \binom{25}{x} 0.5^x 0.5^{25-x}$. The probability that $X$ falls outside
the rejection region $R_1$ is the sum
\[
\sum_{x=8}^{17} \binom{25}{x} 0.5^x 0.5^{25-x}.
\]
(d) Compute the probability of a type I error.
The probability of a type I error is the sum of the excluded terms from
(c):
\[
\sum_{x=0}^{7} \binom{25}{x} 0.5^x 0.5^{25-x} + \sum_{x=18}^{25} \binom{25}{x} 0.5^x 0.5^{25-x} = 0.022 + 0.022 = 0.044,
\]
where we used the binomial table with $n = 25$ and $a = 7$. Notice that by
symmetry the second sum is the same as the first.
(e) Compute the probability of a type II error for the selected region when $p = 0.3$.
For a type II error with $p = 0.3$, we must calculate the probability that $X$
takes a value outside the rejection region when $p = 0.3$. We can
write this as a difference of sums so that we can use the binomial table:
\[
\sum_{x=8}^{17} \binom{25}{x} 0.3^x 0.7^{25-x} = \sum_{x=0}^{17} \binom{25}{x} 0.3^x 0.7^{25-x} - \sum_{x=0}^{7} \binom{25}{x} 0.3^x 0.7^{25-x} = 1.000 - 0.512 = 0.488.
\]
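
The table lookups in (d) and (e) can be checked by summing binomial probabilities directly. This is a minimal Python sketch; the helper name `binom_cdf` is an assumption for illustration and is not a library function.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


n = 25

# Type I error for R1 = {x <= 7 or x >= 18} when p = 0.5.
alpha = binom_cdf(7, n, 0.5) + (1 - binom_cdf(17, n, 0.5))

# Type II error: P(8 <= X <= 17) when p = 0.3.
beta = binom_cdf(17, n, 0.3) - binom_cdf(7, n, 0.3)

print(alpha, beta)
```

The results agree with the table-based answers up to the three-decimal rounding of the table entries.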

9. Are male college students more easily bored than their female counterparts? This
question was examined in the article Boredom in Young Adults Gender and Cultural
Comparisons (J. Cross-Cult. Psych., 1991: 209-223). The authors administered a test
called the Boredom Proneness Scale to 97 male and 148 female college students. The
results are below:
Gender    Sample Size    Sample Mean    Sample SD
Male           97           10.40          4.83
Female        148            9.26          4.68

(a) At the 5% level of significance, does the data support the research hypothesis
that the mean Boredom Proneness Rating is higher for men than for women?
Assume that the population standard deviation is known and equal to the sample
standard deviation.
Note that the assumption is reasonable for large sample sizes since the
T -distribution and normal distribution are close.

Use the hypotheses $H_0: \mu_m = \mu_w$ and $H_a: \mu_m > \mu_w$, where $\mu_m$ and
$\mu_w$ are the means for men and women. The test statistic is
\[
Z = \frac{(10.4 - 9.26) - 0}{\sqrt{\frac{4.83^2}{97} + \frac{4.68^2}{148}}} = 1.829.
\]
Since the critical value for $\alpha = 0.05$ is $z_{0.05} = 1.645$ and $1.829 > 1.645$, we
reject the null hypothesis and conclude that the data do support the hypothesis
that the mean rating for men is higher than that for women.
(b) Construct a 90% confidence interval for the difference in male and female scores
on the Boredom Proneness Scale.
Notice that $z_{\alpha/2} = z_{0.05} = 1.645$ from (a) and $\sigma_{\hat{\theta}} = 0.623$. Then compute
\[
\hat{\theta} \pm z_{\alpha/2}\, \sigma_{\hat{\theta}} = 1.14 \pm 1.645 \cdot 0.623 = 1.14 \pm 1.025,
\]
that is, the interval $(0.115, 2.165)$.
As we would expect from (a), 0 is not contained in the confidence interval,
leading to the same conclusion as (a).
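
The test statistic and the 90% interval can both be recomputed from the summary table. This is a minimal Python sketch; the function name `two_sample_z` is an assumption for illustration, and the critical value 1.645 is taken from the solution rather than computed.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic and standard error for mean1 - mean2 with known SDs."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se, se


# Male: n = 97, mean 10.40, SD 4.83. Female: n = 148, mean 9.26, SD 4.68.
z, se = two_sample_z(10.40, 4.83, 97, 9.26, 4.68, 148)

diff = 10.40 - 9.26
margin = 1.645 * se
interval = (diff - margin, diff + margin)
print(z, se, interval)
```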

10. For a normal distribution with mean $\mu$ and variance $\sigma^2 = 25$, an experimenter wishes
to test $H_0: \mu = 10$ versus $H_a: \mu = 5$. Find the sample size $n$ for which the most
powerful test will have $\alpha = \beta = 0.025$.
This is an application of the formula of Example 10.9:
\[
n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.96 + 1.96)^2 \cdot 25}{(10 - 5)^2} = 15.366.
\]
It follows that $n = 16$ observations are sufficient.

11. The following table gives the attendance at a racetrack, $x$, and the amount that was
bet, $y$, on $n = 10$ randomly selected days.

Attendance                  117   128   122   119   131   135   125   120   130   127
Amount bet (in millions)   2.07  2.80  3.14  2.26  3.40  3.89  2.93  2.66  3.30  3.54

(a) Make a scatter plot of $y$ against $x$.
See the following plot (with the regression line).

[Figure 1: scatter plot of amount bet against attendance, with the regression line]

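(b) Calculate the regression coefficients for a simple linear regression model for the
data.
Calculating with $\bar{x} = 125.4$, $\bar{y} = 2.999$,
\[
\begin{aligned}
S_{xx} &= \sum_{i=1}^{10} (x_i - \bar{x})^2 = 306.4 \\
S_{yy} &= \sum_{i=1}^{10} (y_i - \bar{y})^2 = 2.926 \\
S_{xy} &= \sum_{i=1}^{10} (x_i - \bar{x})(y_i - \bar{y}) = 26.44 \\
SSE &= 0.644 \\
\beta_1 &= S_{xy}/S_{xx} = 26.44/306.4 = 0.0863 \\
\beta_0 &= \bar{y} - \beta_1 \bar{x} = -7.824
\end{aligned}
\]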
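
These sums and coefficients can be recomputed from the data table. The following is a minimal Python sketch using only the standard library; the variable names are assumptions for illustration, and SSE is computed with the identity $SSE = S_{yy} - S_{xy}^2/S_{xx}$, which is a standard formula rather than something stated in the solution.

```python
attendance = [117, 128, 122, 119, 131, 135, 125, 120, 130, 127]
amount_bet = [2.07, 2.80, 3.14, 2.26, 3.40, 3.89, 2.93, 2.66, 3.30, 3.54]

n = len(attendance)
x_bar = sum(attendance) / n
y_bar = sum(amount_bet) / n

# Corrected sums of squares and cross products.
sxx = sum((x - x_bar) ** 2 for x in attendance)
syy = sum((y - y_bar) ** 2 for y in amount_bet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(attendance, amount_bet))

beta1 = sxy / sxx                # slope
beta0 = y_bar - beta1 * x_bar    # intercept
sse = syy - sxy ** 2 / sxx       # residual sum of squares
r_squared = 1 - sse / syy        # proportion of variation explained

print(sxx, syy, sxy, beta1, beta0, sse, r_squared)
```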

(c) Make a scatter plot of the residuals. Does this indicate that simple linear regression
provides a good model for the data?
From the following residual plot, we would conclude the regression line
is a good model for the data.

[Figure 2: residual plot]

(d) Find a prediction interval for the amount bet for the attendance value equal to
the mean attendance.
In the following, we use $\alpha = 0.05$ and $n - 2 = 8$ degrees of freedom, so
that $t_{\alpha/2} = 2.306$. We want an estimate at $x^* = \bar{x} = 125.4$, with $n = 10$ and
$S^2 = \frac{SSE}{n-2} = 0.0805$, so that $S = \sqrt{0.0805} = 0.2837$. The prediction interval is
\[
\beta_0 + \beta_1 x^* \pm t_{\alpha/2}\, S \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}
= -7.824 + 0.0863 \cdot 125.4 \pm 2.306 \cdot 0.2837 \sqrt{\frac{11}{10}} = 2.998 \pm 0.686.
\]
Or as an interval, $(2.312, 3.684)$.
(e) What proportion of the total variation in the amount bet can be explained by the
attendance?
Calculate $r^2 = 1 - SSE/S_{yy} = 1 - 0.644/2.926 = 0.7799$. We would
conclude that approximately 78% of the variation in the amount bet can
be explained by the attendance.
