Assignment 02 AK

ECON 333 D100 Statistical Analysis of Economic Data
Spring 2024
Assignment 02
1. In a survey of 320 likely voters, 174 responded that they would vote for the incumbent, and 146
responded that they would vote for the challenger. Let 𝑝 denote the fraction of all likely voters who
preferred the incumbent at the time of the survey, and let 𝑝̂ be the fraction of survey respondents who
preferred the incumbent.
a. Use the survey results to estimate 𝑝.

The estimate of 𝑝 is the sample proportion 𝑝̂ calculated from the sample: 𝑝̂ = 174/320 = 0.54375.
b. Use the estimator of the variance of 𝑝̂, 𝑝̂(1−𝑝̂)/𝑛, to calculate the standard error of your
estimator.
The estimated variance of 𝑝̂ is 𝑣𝑎𝑟̂(𝑝̂) = 𝑝̂(1−𝑝̂)/𝑛 = 0.00077527. The estimated standard error is
the square root of that estimated variance, 𝑆𝐸(𝑝̂) = √𝑣𝑎𝑟̂(𝑝̂) = 0.02784.
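The arithmetic in (a) and (b) can be reproduced in a few lines of R. This is only a minimal sketch; the variable names (n, x, p_hat, var_hat, se_hat) are illustrative choices and do not come from the assignment.

```r
# Survey counts taken from the question
n <- 320   # number of respondents
x <- 174   # respondents who prefer the incumbent

p_hat   <- x / n                     # sample proportion, the estimate of p
var_hat <- p_hat * (1 - p_hat) / n   # estimated variance of p_hat
se_hat  <- sqrt(var_hat)             # standard error of p_hat

print(c(p_hat = p_hat, var_hat = var_hat, se_hat = se_hat))
```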
c. What is the 𝑝-value for the test of H0: 𝑝 = 0.5 vs. H1: 𝑝 ≠ 0.5?
The t-statistic from this sample is 𝑡 = (𝑝̂ − 𝑝₀)/𝑆𝐸(𝑝̂) = (0.54375 − 0.5)/0.02784 = 1.571.
The p-value = 2 × 0.05855 = 0.1171, if I get the numbers calculated by statistical software
(a value consistent with a Student t distribution with 𝑛 − 1 = 319 degrees of freedom rather than with Φ).
The p-value = 2Φ(−|𝑡|) = 2Φ(−1.571) = 2 × 0.0582 = 0.1164, if I look the numbers up in
the Z-table [Table 1 in the appendices, page 721 in the textbook].
We are OK to go with the t-statistic and to use the standard normal distribution for these
calculations because the sample size is large (n = 120 and up is good, although one would have
to be careful if the proportion gets far away from 0.5 and very close to either 0 or 1).
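The two-sided test in (c) can be checked with base R. The sketch below computes the p-value from the standard normal distribution and, for comparison, from a Student t distribution. Using n − 1 degrees of freedom is an assumption made here for illustration, and the variable names are hypothetical.

```r
n     <- 320
p_hat <- 174 / n
p0    <- 0.5                               # value of p under H0
se    <- sqrt(p_hat * (1 - p_hat) / n)     # standard error from part (b)

t_stat <- (p_hat - p0) / se

# Two-sided p-value from the standard normal distribution
p_norm <- 2 * pnorm(-abs(t_stat))

# Same calculation with a Student t distribution
# (df = n - 1 is an assumption, not something stated in the text)
p_t <- 2 * pt(-abs(t_stat), df = n - 1)

print(round(c(t = t_stat, p_normal = p_norm, p_t = p_t), 4))
```

With a sample this large the two p-values differ only in the third decimal place, so the conclusion of the test does not depend on which distribution is used.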
d. What is the 𝑝-value for the test of H0: 𝑝 = 0.5 vs. H1: 𝑝 > 0.5?
The p-value = 0.05855, if I get the numbers calculated by statistical software (a value
consistent with a Student t distribution with 𝑛 − 1 = 319 degrees of freedom rather than with Φ).
The p-value = 1 − Φ(𝑡) = 1 − Φ(1.571) = Φ(−1.571) = 0.0582, if I look the numbers up in the Z-table
[Table 1 in the appendices, page 721 in the textbook].
e. Why do the results from (c) and (d) differ? Explain in your own words.
Part (c) is a two-sided test and the p-value is the area in the tails of the standard normal
distribution outside ± (calculated t-statistic). Part (d) is a one-sided test and the p-value is the
area under the standard normal distribution to the right of the calculated t-statistic.
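The relationship between the two tail areas can be verified numerically. This is a small sketch using the rounded t-statistic 1.571 and the standard normal distribution; the variable names are illustrative.

```r
t_stat <- 1.571   # t-statistic reported in part (c)

p_two_sided <- 2 * pnorm(-abs(t_stat))             # area in both tails
p_one_sided <- pnorm(t_stat, lower.tail = FALSE)   # right tail only, for H1: p > 0.5

# For a positive t-statistic the right-tail area is half the two-tail area
print(c(two_sided = p_two_sided, one_sided = p_one_sided))
print(all.equal(p_two_sided, 2 * p_one_sided))
```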
f. Did the survey contain statistically significant evidence that the incumbent was ahead of the
challenger at the time of the survey? Explain in your own words.
For the test H0: p = 0.5 versus H1: p > 0.5, we cannot reject the null hypothesis at the 5%
significance level. The p-value 0.058 is larger than 0.05. Equivalently, the calculated t-statistic
1.571 is less than the critical value for a one-sided test with a 5% significance level (1.645 from
the standard normal distribution; the software value 1.6496 is consistent with a t distribution with
319 degrees of freedom). The test suggests that the survey did not contain statistically significant
evidence that the incumbent was ahead of the challenger at the time of the survey.
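The critical-value comparison in (f) can be sketched as follows. The t-based critical value uses 320 − 1 degrees of freedom, which is an assumption made here for comparison; the variable names are illustrative.

```r
alpha  <- 0.05
t_stat <- 1.571   # t-statistic from part (c)

crit_norm <- qnorm(1 - alpha)              # one-sided critical value, standard normal
crit_t    <- qt(1 - alpha, df = 320 - 1)   # same, Student t (assumed df = n - 1)

# Reject H0 in favour of H1: p > 0.5 only if t exceeds the critical value
reject <- t_stat > crit_norm

print(c(crit_norm = crit_norm, crit_t = crit_t))
print(reject)
```

Either critical value leads to the same decision, because the t-statistic lies below both.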
2. Use the spreadsheet CPS96_15 (see the assignment instructions), which contains an extended version
of the data set used in Table 3.1 of the text for the years 1996 and 2015. It contains data on full-time
workers, ages 25–34, with a high school diploma or a B.A./B.S. as their highest degree. A detailed
description is given in CPS96_15_Description (see the assignment instructions). Use these data to
complete the following:
a. Compute the sample mean for average hourly earnings (AHE) in 1996 and 2015. Copy &
paste your R output here.
> aggregate(ahe ~ year, data = Asmt02,
+           FUN = function(x) c(mean = mean(x), sd = sd(x),
+                               se = sd(x) / sqrt(length(x)), count = length(x)))
year ahe.mean ahe.sd ahe.se ahe.count
1 1996 15.7413111 6.7191494 0.2705028 617.0000000
2 2015 24.6560529 12.3677404 0.4999363 612.0000000
It’s the same output for the two questions, (a) and (b): I just used a one-line command to get all
of it and to check the counts, to be sure.
A simple alternative is just to go with the shortest direct commands. Some examples:
> mean(subset(Asmt02$ahe, Asmt02$year == 2015))
[1] 24.65605
> mean(Asmt02$ahe[which(Asmt02$year == 2015)])
[1] 24.65605
b. Compute the sample standard deviation for AHE in 1996 and 2015. Copy & paste your R
output here.
> sd(subset(Asmt02$ahe, Asmt02$year == 1996))
[1] 6.719149
> sd(subset(Asmt02$ahe, Asmt02$year == 2015))
[1] 12.36774
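The grouping idioms used in (a) and (b) can be tried on a small made-up data frame when the spreadsheet is not at hand. The data frame `toy` and its values below are hypothetical and are not the CPS96_15 data; `tapply()` is shown as one more way to get per-year statistics.

```r
# Hypothetical toy data, NOT the CPS96_15 data
toy <- data.frame(
  year = c(1996, 1996, 1996, 2015, 2015, 2015),
  ahe  = c(10, 14, 18, 20, 25, 30)
)

# Per-year mean and standard deviation, one call each
means <- tapply(toy$ahe, toy$year, mean)
sds   <- tapply(toy$ahe, toy$year, sd)
print(means)
print(sds)

# Direct subsetting gives the same number for a single year
m2015 <- mean(toy$ahe[toy$year == 2015])
print(m2015)
```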
c. Construct a 95% confidence interval for the population means of AHE in 1996 and 2015.
> ahe1996 <- lm(Asmt02$ahe[which(Asmt02$year == 1996)] ~ 1)
> ahe2015 <- lm(Asmt02$ahe[which(Asmt02$year == 2015)] ~ 1)
> confint(ahe1996, level=0.95)
2.5 % 97.5 %
(Intercept) 15.21009 16.27253
> confint(ahe2015, level=0.95)
2.5 % 97.5 %
(Intercept) 23.67425 25.63785
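The intervals from the intercept-only regressions are the usual "mean ± critical value × standard error" intervals. As a sketch, they can be rebuilt from the summary statistics reported in (a); the vector names are illustrative, and the t critical value is used because `confint()` on an `lm` fit is based on the t distribution.

```r
# Summary statistics copied from the aggregate() output in part (a)
m <- c(`1996` = 15.7413111, `2015` = 24.6560529)   # sample means
s <- c(`1996` = 6.7191494,  `2015` = 12.3677404)   # sample standard deviations
n <- c(`1996` = 617,        `2015` = 612)          # sample sizes

se     <- s / sqrt(n)                # standard error of each sample mean
t_crit <- qt(0.975, df = n - 1)      # t critical values, one per year

ci <- cbind(lower = m - t_crit * se, upper = m + t_crit * se)
print(round(ci, 5))
```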
d. Construct a 95% confidence interval for the change in the population means of AHE between
1996 and 2015.
> diff <- mean(Asmt02$ahe[which(Asmt02$year == 2015)]) -
+   mean(Asmt02$ahe[which(Asmt02$year == 1996)])
> sd2015 <- sd(Asmt02$ahe[which(Asmt02$year == 2015)])
> sd1996 <- sd(Asmt02$ahe[which(Asmt02$year == 1996)])
> n2015 <- length(which(Asmt02$year == 2015))
> n1996 <- length(which(Asmt02$year == 1996))
> se_of_diff <- sqrt((sd2015*sd2015/n2015)+(sd1996*sd1996/n1996))
> print(diff)
[1] 8.914742
> print(se_of_diff)
[1] 0.5684259
So here we have the difference between the two sample means (diff) and the standard error of the
difference (se_of_diff) – you can check this one out on page 78 of the textbook, formula 3.19. A
pooled standard error is not appropriate as the data do not indicate the same population variance in
2015 and in 1996 (the sample standard deviations are quite different). Then, use formula 3.21 (page 78 in
the textbook) to calculate the lower and upper limit of the confidence interval:
> diff-1.96*se_of_diff
[1] 7.800627
> diff+1.96*se_of_diff
[1] 10.02886
No need to worry about t rather than Z as the sample is large.
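To see why the choice between t and Z hardly matters here, the interval can be recomputed from the summary statistics in (a) with both critical values. The sketch below uses the Welch-Satterthwaite approximation for the degrees of freedom, which is a technique brought in for comparison and is not used in the answer above; all variable names are illustrative.

```r
# Summary statistics from part (a)
m1 <- 24.6560529; s1 <- 12.3677404; n1 <- 612   # 2015
m0 <- 15.7413111; s0 <- 6.7191494;  n0 <- 617   # 1996

d    <- m1 - m0          # difference in sample means
v1   <- s1^2 / n1        # variance of the 2015 sample mean
v0   <- s0^2 / n0        # variance of the 1996 sample mean
se_d <- sqrt(v1 + v0)    # unpooled standard error of the difference

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v0)^2 / (v1^2 / (n1 - 1) + v0^2 / (n0 - 1))

ci_z <- d + c(-1, 1) * qnorm(0.975) * se_d
ci_t <- d + c(-1, 1) * qt(0.975, df = df_welch) * se_d
print(rbind(normal = ci_z, welch_t = ci_t))
```

With degrees of freedom in the hundreds, the two intervals agree to about two decimal places.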
Hints for (a) and (b):
Here’s an example of ‘Copy & paste your R output here:’
For the mean (a):

> library("readxl")
> A2 <- read_excel(file.choose())
> samplemean <- mean(A2$ahe)
> print(samplemean)
[1] 17.28735
For the standard deviation (b):

> stdev <- sqrt(sum((A2$ahe-samplemean)^2/(length(A2$ahe)-1)))
> print(stdev)
[1] 10.76467
Notice that I used different ways to get the answers, just to remind you about R’s flexibility
and your opportunities to play with the software.
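As a quick check that the by-hand formulas in the hints agree with the built-in functions, here is a short sketch on a made-up vector. The values in `x` are hypothetical and are used only for illustration.

```r
# Hypothetical earnings values, for illustration only
x <- c(12.5, 15.0, 9.75, 22.0, 18.25)

m <- sum(x) / length(x)                         # mean by hand
s <- sqrt(sum((x - m)^2) / (length(x) - 1))     # sd by hand, with the n - 1 divisor

print(c(mean_by_hand = m, mean_built_in = mean(x)))
print(c(sd_by_hand = s, sd_built_in = sd(x)))
```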
