Assignment 02 AK
Spring 2024
Assignment 02
1. In a survey of 320 likely voters, 174 responded that they would vote for the incumbent, and 146
responded that they would vote for the challenger. Let 𝑝 denote the fraction of all likely voters who
preferred the incumbent at the time of the survey, and let 𝑝̂ be the fraction of survey respondents who
preferred the incumbent.
b. Use the estimator of the variance of $\hat{p}$, $\hat{p}(1-\hat{p})/n$, to calculate the standard error of your
estimator.
The estimated variance of $\hat{p}$ is $\widehat{\mathrm{var}}(\hat{p}) = \hat{p}(1-\hat{p})/n = 0.00077527$. The estimated standard error is
the square root of that estimated variance, $SE(\hat{p}) = \sqrt{\widehat{\mathrm{var}}(\hat{p})} = 0.02784$.
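As a quick check, the arithmetic above can be reproduced in R. This is only a minimal sketch: the variable names (n, k_inc, p_hat, var_hat, se_hat) are illustrative and do not come from the original answer key.

```r
# Survey counts taken from the question text
n     <- 320   # likely voters surveyed
k_inc <- 174   # respondents who would vote for the incumbent

p_hat   <- k_inc / n                 # sample fraction preferring the incumbent
var_hat <- p_hat * (1 - p_hat) / n   # estimated variance of p_hat
se_hat  <- sqrt(var_hat)             # estimated standard error of p_hat

print(c(p_hat = p_hat, var_hat = var_hat, se_hat = se_hat))
```

The printed values should agree with 0.54375, 0.00077527 and 0.02784 after rounding.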
c. What is the 𝑝-value for the test of H0: 𝑝 = 0.5 vs. H1: 𝑝 ≠ 0.5?
The t-statistic from this sample is $t = (\hat{p} - p_0)/SE(\hat{p}) = (0.54375 - 0.5)/0.02784 = 1.571$, where $p_0 = 0.5$ is the value of $p$ under H0.
Part (c) is a two-sided test and the p-value is the area in the tails of the standard normal
distribution outside ± (calculated t-statistic). Part (d) is a one-sided test and the p-value is the
area under the standard normal distribution to the right of the calculated t-statistic.
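The two tail areas described above can be computed with pnorm. The sketch below recomputes the t-statistic from the survey counts; the variable names are illustrative, and the use of pnorm is one possible way to get these areas, not necessarily the command used for the answer key.

```r
# Recompute the t-statistic from the survey counts
p_hat  <- 174 / 320
se_hat <- sqrt(p_hat * (1 - p_hat) / 320)
t_stat <- (p_hat - 0.5) / se_hat

# Two-sided test: area in both tails outside +/- t_stat
p_two_sided <- 2 * pnorm(-abs(t_stat))

# One-sided test (H1: p > 0.5): area to the right of t_stat
p_one_sided <- pnorm(t_stat, lower.tail = FALSE)

print(c(t = t_stat, two_sided = p_two_sided, one_sided = p_one_sided))
```

Because the standard normal distribution is symmetric, the two-sided p-value is exactly twice the one-sided one whenever the t-statistic is positive.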
f. Did the survey contain statistically significant evidence that the incumbent was ahead of the
challenger at the time of the survey? Explain in your own words.
For the test H0: p = 0.5 versus H1: p > 0.5, we cannot reject the null hypothesis at the 5%
significance level because the p-value, 0.058, is larger than 0.05. Equivalently, the calculated t-statistic
1.571 is less than the critical value 1.6496 for a one-sided test with a 5% significance level. The
test suggests that the survey did not contain statistically significant evidence that the incumbent
was ahead of the challenger at the time of the survey.
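The critical-value comparison can be sketched in R as follows. The quoted value 1.6496 appears to match the Student t quantile with 320 - 1 = 319 degrees of freedom rather than the standard normal quantile (about 1.645); that reading is an assumption, since the answer key does not say how the value was obtained. The conclusion is the same under either critical value.

```r
# t-statistic from the survey counts
p_hat  <- 174 / 320
t_stat <- (p_hat - 0.5) / sqrt(p_hat * (1 - p_hat) / 320)

# One-sided 5% critical values
z_crit <- qnorm(0.95)              # standard normal
t_crit <- qt(0.95, df = 320 - 1)   # Student t with n - 1 degrees of freedom (assumed)

# Reject H0 only if the t-statistic exceeds the critical value
reject_normal <- t_stat > z_crit
reject_t      <- t_stat > t_crit

print(c(z_crit = z_crit, t_crit = t_crit))
print(c(reject_normal = reject_normal, reject_t = reject_t))
```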
2. Use the spreadsheet CPS96_15 (see the assignment instructions), which contains an extended version
of the data set used in Table 3.1 of the text for the years 1996 and 2015. It contains data on full-time
workers, ages 25–34, with a high school diploma or a B.A./B.S. as their highest degree. A detailed
description is given in CPS96_15_Description (see the assignment instructions). Use these data to
complete the following:
a. Compute the sample mean for average hourly earnings (AHE) in 1996 and 2015. Copy &
paste your R output here.
It’s the same output for the two questions, a and b (I just used a one-line command to get all of it
and to check the counts, to be sure).
A simple alternative is to use the shortest direct commands. For example:
> mean(subset(Asmt02$ahe, Asmt02$year == 2015))
[1] 24.65605
b. Compute the sample standard deviation for AHE in 1996 and 2015. Copy & paste your R
output here.
> sd(subset(Asmt02$ahe, Asmt02$year == 1996))
[1] 6.719149
> sd(subset(Asmt02$ahe, Asmt02$year == 2015))
[1] 12.36774
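One possible way to get the mean, standard deviation and count for both years in a single step is sketched below. The data frame demo is simulated and only stands in for the assignment data; its values are not the CPS96_15 numbers, and this is not necessarily the one-line command used for the answer key.

```r
set.seed(1)

# Simulated stand-in for the assignment data (NOT the real CPS96_15 values)
demo <- data.frame(
  year = rep(c(1996, 2015), each = 200),
  ahe  = c(rnorm(200, mean = 16, sd = 7), rnorm(200, mean = 25, sd = 12))
)

# Split ahe by year, then compute mean, sd and count for each group
stats_by_year <- sapply(
  split(demo$ahe, demo$year),
  function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)

print(stats_by_year)
```

The result is a small matrix with one column per year, so the counts can be checked alongside the means and standard deviations.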
c. Construct a 95% confidence interval for the population means of AHE in 1996 and 2015.
> ahe1996 <- lm(Asmt02$ahe[which(Asmt02$year == 1996)] ~ 1)
> ahe2015 <- lm(Asmt02$ahe[which(Asmt02$year == 2015)] ~ 1)
> confint(ahe1996, level=0.95)
2.5 % 97.5 %
(Intercept) 15.21009 16.27253
> confint(ahe2015, level=0.95)
2.5 % 97.5 %
(Intercept) 23.67425 25.63785
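Fitting an intercept-only regression and calling confint gives the usual t-based interval for a mean: the sample mean plus or minus the t quantile times sd/sqrt(n). The sketch below checks this on simulated data; the vector y is hypothetical and does not contain the assignment data.

```r
set.seed(2)

# Simulated hourly earnings (hypothetical values, not the assignment data)
y <- rnorm(150, mean = 16, sd = 7)

# Manual 95% interval: mean +/- t quantile * standard error of the mean
n         <- length(y)
se_mean   <- sd(y) / sqrt(n)
manual_ci <- mean(y) + c(-1, 1) * qt(0.975, df = n - 1) * se_mean

# Same interval from an intercept-only regression
lm_ci <- confint(lm(y ~ 1), level = 0.95)

print(manual_ci)
print(lm_ci)
```

Both approaches use the t distribution with n - 1 degrees of freedom, so the two intervals coincide.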
d. Construct a 95% confidence interval for the change in the population means of AHE between
1996 and 2015.
Here diff is the difference between the two sample means and se_of_diff is the standard error of that
difference (see formula 3.19 on page 78 of the textbook). A pooled standard error is not appropriate
because the sample standard deviations for 1996 and 2015 are quite different, so the data do not
indicate the same population variance in the two years. Then, use formula 3.21 (page 78 in
the textbook) to calculate the lower and upper limits of the confidence interval:
> diff-1.96*se_of_diff
[1] 7.800627
> diff+1.96*se_of_diff
[1] 10.02886
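The answer key does not show how diff and se_of_diff were created. A minimal sketch, assuming the unpooled standard error sqrt(s1^2/n1 + s2^2/n2), is given below. The vectors ahe_1996 and ahe_2015 are simulated stand-ins, so the printed interval will not match the numbers above.

```r
set.seed(3)

# Simulated stand-ins for the two years (NOT the real assignment data)
ahe_1996 <- rnorm(300, mean = 16, sd = 7)
ahe_2015 <- rnorm(300, mean = 25, sd = 12)

# Difference between the two sample means
# (the name diff follows the answer key; it masks base::diff as a variable only)
diff <- mean(ahe_2015) - mean(ahe_1996)

# Unpooled standard error: the two variances are not assumed equal
se_of_diff <- sqrt(var(ahe_1996) / length(ahe_1996) +
                   var(ahe_2015) / length(ahe_2015))

# 95% confidence interval for the change in the population mean
ci <- diff + c(-1, 1) * 1.96 * se_of_diff
print(ci)
```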
> samplemean <- mean(A2$ahe)
> print(samplemean)
[1] 17.28735
Notice that I used different ways to get the answers, just to remind you about R’s flexibility
and your opportunities to play with the software.