Econ140 Spring2016 Section05 Handout Solutions

Econ 140 - Spring 2016

Section 5
GSI: Fenella Carpena
February 18, 2016

1 Hypothesis Testing for Regression Coefficients


Exercise 1.1. (Adapted from Stock & Watson, Exercise 5.1) Suppose that a researcher, using data on
class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

$\widehat{TestScore} = 520.4 - 5.82 \cdot CS, \quad R^2 = 0.08, \quad SER = 11.5$
with standard errors 20.4 (intercept) and 2.21 (slope on CS).

(a) Test H0 : β1 = −1.5 vs. Ha : β1 < −1.5 at 5% significance level using a t-statistic.
• t-stat = (−5.82 + 1.5)/2.21 = −1.95
• One-sided critical value = −1.64
• We reject H0 at the 5% level since −1.95 < −1.64
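As a quick check of part (a), the arithmetic can be reproduced with a short script. This is a minimal sketch that assumes the large-sample normal approximation used in the solution; the variable names are illustrative and do not come from the handout.

```python
from statistics import NormalDist

beta_hat = -5.82    # estimated slope on CS
se_beta = 2.21      # standard error of the slope
beta_null = -1.5    # value of beta_1 under H0

# t-statistic for H0: beta_1 = -1.5
t_stat = (beta_hat - beta_null) / se_beta

# Left-tail critical value for a 5% one-sided test (standard normal)
crit = NormalDist().inv_cdf(0.05)

# Reject H0 in favor of Ha: beta_1 < -1.5 if t falls below the critical value
reject = t_stat < crit

print(f"t-stat = {t_stat:.2f}, critical value = {crit:.3f}, reject H0: {reject}")
```

The exact 5% left-tail critical value is about −1.645, which the solution rounds to −1.64; the conclusion is the same either way.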

(b) Calculate the p-value for the two-sided test of the null hypothesis H0 : β1 = 0. Do you reject the null
hypothesis at the 5% level? At the 1% level?
• t-stat = (−5.82 − 0)/2.21 = −2.63
• p-value = 2 · P(Z > |−2.63|) = 2 · 0.0043 = 0.0086
• We reject H0 at the 5% level since 0.86% < 5%
• We reject H0 at the 1% level since 0.86% < 1%

(c) Calculate the p-value for the two-sided test of the null hypothesis H0 : β1 = −5.6. Without doing any
additional calculations, determine whether −5.6 is contained in the 95% confidence interval for β1 .
• t-stat = (−5.82 + 5.6)/2.21 = −0.1
• p-value = 2 · P(Z > |−0.1|) = 2 · 0.4602 = 0.9204
• We fail to reject H0 at the 5% level since 92.04% > 5%. Hence −5.6 is contained in the 95% confidence
interval.
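The two-sided p-values in parts (b) and (c) can be checked the same way. The sketch below assumes the standard normal approximation; the helper name `two_sided_p_value` is hypothetical. Because it uses the unrounded t-statistic, its output differs slightly in the later decimals from the table-based values above.

```python
from statistics import NormalDist

def two_sided_p_value(beta_hat, se, beta_null):
    """Two-sided p-value for H0: beta = beta_null (normal approximation)."""
    t_stat = (beta_hat - beta_null) / se
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))

p_b = two_sided_p_value(-5.82, 2.21, 0.0)    # part (b): H0: beta_1 = 0
p_c = two_sided_p_value(-5.82, 2.21, -5.6)   # part (c): H0: beta_1 = -5.6

for label, p in [("(b)", p_b), ("(c)", p_c)]:
    print(f"{label} p-value = {p:.4f}, reject at 5%: {p < 0.05}, reject at 1%: {p < 0.01}")
```

A two-sided test at the 5% level fails to reject exactly when the hypothesized value lies inside the 95% confidence interval, which is why part (c) needs no further calculation.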

(d) Construct a 95% confidence interval for β1 .


β̂1 ± 1.96 · SE(β̂1) = −5.82 ± 1.96 · 2.21 = (−10.15, −1.49)

(e) Construct a 99% confidence interval for β0.
β̂0 ± 2.58 · SE(β̂0) = 520.4 ± 2.58 · 20.4 = (467.8, 573.0)
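The intervals in parts (d) and (e) follow the same recipe, estimate ± critical value · SE. A minimal sketch, assuming normal critical values; the function name `conf_interval` is illustrative only.

```python
from statistics import NormalDist

def conf_interval(estimate, se, level):
    """Two-sided confidence interval using a standard normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se

ci_slope = conf_interval(-5.82, 2.21, 0.95)      # part (d)
ci_intercept = conf_interval(520.4, 20.4, 0.99)  # part (e)

print("95% CI for beta_1:", tuple(round(v, 2) for v in ci_slope))
print("99% CI for beta_0:", tuple(round(v, 1) for v in ci_intercept))

# Consistency check with part (c): -5.6 should lie inside the 95% interval
print("-5.6 inside 95% CI:", ci_slope[0] < -5.6 < ci_slope[1])
```

The exact 99% critical value is about 2.576 rather than the rounded 2.58, so the computed endpoints can differ from the hand calculation in the later decimals.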

2 Dummy Variables
Exercise 2.1. Suppose that the Berkeley Undergraduate Program Office wishes to examine the potential
educational benefits of providing students with an iPad. Using data from 141 students, the table below
summarizes the mean and SD of the variable collGPA (college GPA) for students with and without an iPad.

                 n     Mean     SD
Without iPad     85    2.989    0.321
With iPad        56    3.159    0.421

(a) Let X̄1 be the sample mean college GPA of students with an iPad, and X̄0 be the sample mean college
GPA of students without an iPad. Calculate:
(i) X̄1 − X̄0
3.159 − 2.989 = 0.17

(ii) SE(X̄1 − X̄0)
√(s1²/n1 + s0²/n0) = √(0.421²/56 + 0.321²/85) = 0.06616

(b) The program office does not plan to provide iPads to students unless there is evidence that having an
iPad increases students’ GPAs. What are the one-sided null and alternative hypotheses that the program
office would like to test?
Let µ1 be the population mean GPA of students with an iPad. Let µ0 be the population mean GPA of
students without an iPad. The one-sided hypothesis we want to test is H0 : µ1 − µ0 = 0, H1 : µ1 − µ0 > 0.

(c) Carry out the test from part (b) at the 1% level. Do you reject or fail to reject the null hypothesis?
Using t-statistic:
• t-stat = (X̄1 − X̄0 − 0)/SE(X̄1 − X̄0) = 0.17/0.06616 = 2.5695
• One-sided critical value = 2.33
• We reject H0 at the 1% level since 2.5695 > 2.33
Using p-value:

• p-value = P(Z > 2.569) = 0.005
• We reject H0 at the 1% level since 0.5% < 1%
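The difference-in-means test in parts (a) through (c) can be reproduced directly from the summary table. This is a sketch under the normal approximation, with the SE computed without assuming equal variances in the two groups; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table
n1, mean1, sd1 = 56, 3.159, 0.421   # with iPad
n0, mean0, sd0 = 85, 2.989, 0.321   # without iPad

diff = mean1 - mean0
se_diff = sqrt(sd1**2 / n1 + sd0**2 / n0)   # unequal-variance standard error
t_stat = diff / se_diff

# One-sided (right-tail) test of H0: mu1 - mu0 = 0 vs H1: mu1 - mu0 > 0
p_value = 1 - NormalDist().cdf(t_stat)
crit = NormalDist().inv_cdf(0.99)           # 1% one-sided critical value

print(f"diff = {diff:.3f}, SE = {se_diff:.5f}, t = {t_stat:.4f}")
print(f"p-value = {p_value:.4f}, critical value = {crit:.2f}, reject H0: {t_stat > crit}")
```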

(d) Can the difference in the GPA between students with and without an iPad be attributed to iPad ownership
alone? Explain why or why not.
No. For example, maybe students who have an iPad are also richer (because iPads are expensive), and
richer students do better in school because they have more money and resources to spend on tutoring,
etc. In this case, the difference in GPA between iPad owners and non-iPad owners could be due to
wealth as well.

(e) Suppose that an analyst at the program office instead estimated a regression of college GPA on HasIPAD,
a dummy variable equal to 1 if the student has an iPad, and 0 otherwise, obtaining the following regression
results. Interpret each of the coefficients in the regression.

Linear regression                               Number of obs     =        141
                                                F(1, 139)         =       6.57
                                                Prob > F          =     0.0114
                                                R-squared         =     0.0500
                                                Root MSE          =     .36419

------------------------------------------------------------------------------
             |               Robust
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     HasIPAD |   .1695168   .0661386     2.56   0.011      .038749    .3002846
       _cons |   2.989412   .0349104    85.63   0.000     2.920388    3.058436
------------------------------------------------------------------------------

• β̂0 = 2.989. This is the sample average GPA of students without an iPad.

• β̂1 = 0.1695. This is the difference in sample average GPA between students with and without an iPad:
those with an iPad have a GPA that is 0.170 points higher on average.

(f) Consider again the null and alternative hypothesis from part (b).
(i) How can you use the regression coefficients from part (e) to test the same null and alternative
hypothesis as in part (b)?
(ii) Carry out the hypothesis test using the regression coefficients at the 1% level.
(iii) What do parts (f)(i) and (f)(ii) tell us about how we can use regressions to carry out a test
for the differences of means across two groups?

(i) We can test H0 : β1 = 0, H1 : β1 > 0, where β1 is the coefficient on HasIPAD.

(ii) The t-stat = β̂1/SE(β̂1) = 0.1695/0.0661 = 2.56. The one-sided critical value is 2.33. Therefore we
reject H0 since 2.56 > 2.33.

(iii) Note that the t-stat we get here is the same as the t-stat from part (c), save for rounding differences.
Testing the difference of means H0 : µ1 − µ0 = 0, H1 : µ1 − µ0 > 0 is equivalent to testing H0 : β1 = 0,
H1 : β1 > 0, where β1 is the coefficient on HasIPAD, the variable that defines the two groups (iPad owners
vs. non-iPad owners).
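The equivalence in part (f) can be illustrated numerically. The handout's student-level data are not available, so the sketch below uses simulated GPAs (a stated assumption: group sizes match the table, but the individual values are made up) and computes the OLS coefficients by hand. The intercept equals the mean of the group with dummy = 0, and the slope equals the difference in group means.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical data: 56 students with an iPad, 85 without (values are simulated)
has_ipad = [1] * 56 + [0] * 85
gpa = [random.gauss(3.16 if d else 2.99, 0.4 if d else 0.3) for d in has_ipad]

# OLS with a single regressor: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
x_bar, y_bar = mean(has_ipad), mean(gpa)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(has_ipad, gpa))
s_xx = sum((x - x_bar) ** 2 for x in has_ipad)
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar

# Group means computed directly
mean1 = mean(y for x, y in zip(has_ipad, gpa) if x == 1)
mean0 = mean(y for x, y in zip(has_ipad, gpa) if x == 0)

print(f"beta0 = {beta0:.6f}, mean without iPad = {mean0:.6f}")
print(f"beta1 = {beta1:.6f}, difference in means = {mean1 - mean0:.6f}")
```

The match is an algebraic identity for a regression on a single dummy variable, so it holds for any data set, not just this simulated one.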

3 Heteroskedasticity and Homoskedasticity
Exercise 3.1. (Stock & Watson, Review the Concepts 5.3) Define homoskedasticity and heteroskedasticity.
Provide a hypothetical empirical example in which you think the errors would be heteroskedastic and explain
your reasoning.

• We say that the population error term ui is homoskedastic if var(ui | Xi) is constant for all i, that is,
if it does not depend on Xi. Otherwise, we say that ui is heteroskedastic.
• An example in which the errors are likely heteroskedastic is a regression of wages on schooling, as
follows:
wagesi = β0 + β1 schoolingi + ui .
◦ If ui is homoskedastic, both var(ui | schoolingi) and var(wagesi | schoolingi) are constant.
◦ In particular, homoskedasticity would imply that var(wagesi | schoolingi) is the same for all schooling
levels. Another way of saying this is that the variability of wages around its mean is the same
regardless of educational attainment.
◦ Homoskedasticity is not realistic in this case because it is likely that people with more education
have wider job opportunities, which could lead to more variability in wages. In contrast, people
with low education levels have fewer opportunities and probably work minimum wage jobs, so
there is less dispersion of wages among the uneducated.
◦ In sum, we would expect that variability in wages is higher for the highly educated, and the
variability in wages is low for those with low levels of schooling. Therefore, in this example, the
errors ui are likely heteroskedastic.
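To make the wage example concrete, the sketch below simulates errors whose spread grows with schooling and compares their variance at low and high schooling levels. Everything here is hypothetical: the schooling range, the scale factor 0.5, and the cutoffs of 12 and 16 years are made-up values chosen only for illustration.

```python
import random
from statistics import variance

random.seed(1)

# Hypothetical population: the error standard deviation is proportional to schooling.
# The 8-20 range and the 0.5 scale factor are assumptions, not estimates.
schooling = [random.randint(8, 20) for _ in range(5000)]
errors = [random.gauss(0, 0.5 * s) for s in schooling]

# Sample variance of u_i among low-schooling and high-schooling observations
low = [u for s, u in zip(schooling, errors) if s <= 12]
high = [u for s, u in zip(schooling, errors) if s >= 16]
var_low, var_high = variance(low), variance(high)

print(f"var(u | schooling <= 12) = {var_low:.1f}")
print(f"var(u | schooling >= 16) = {var_high:.1f}")
```

Under homoskedasticity the two variances would be roughly equal; here the variance at high schooling levels is clearly larger, which is the pattern described above.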

Exercise 3.2. (Stock & Watson, Exercise 5.5, 5.6) In the 1980s, Tennessee conducted an experiment in
which kindergarten students were randomly assigned to “regular” and “small” classes and given standardized
tests at the end of the year. (Regular classes contained approximately 24 students, and small classes contained
approximately 15 students.) Suppose that, in the population, the standardized tests have a mean score of
925 points and a standard deviation of 75 points. Let SmallClass denote a binary variable equal to 1 if the
student is assigned to a small class, and 0 otherwise. A regression of TestScore on SmallClass yields

$\widehat{TestScore} = 918.0 + 13.9 \cdot SmallClass, \quad R^2 = 0.01, \quad SER = 74.6$
with standard errors 1.6 (intercept) and 2.5 (slope on SmallClass).

(a) Do small classes improve test scores? By how much? Is this effect large? Explain.
The estimated gain from being in a small class is 13.9 points. This is a moderate increase, since 13.9 is
about 1/5 of the standard deviation of test scores (13.9/75 ≈ 0.19).
(b) Is the estimated effect of class size on test scores statistically significant? Carry out the test at the 5%
level.
Here, we are considering H0 : β1 = 0, H1 : β1 ≠ 0. The t-stat = 13.9/2.5 = 5.56. The critical value is
1.96. Since |5.56| > 1.96, we reject H0 at the 5% level. The estimated effect of class size on test scores
is statistically significant.
(c) Construct a 99% confidence interval for the effect of SmallClass on TestScore.
13.9 ± 2.58 · 2.5 = 13.9 ± 6.45 = (7.45, 20.35)
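Parts (a) through (c) use only three numbers from the regression output, so they can be verified in a few lines. A minimal sketch, assuming normal critical values; the variable names are illustrative.

```python
from statistics import NormalDist

beta1_hat = 13.9   # estimated coefficient on SmallClass
se_beta1 = 2.5     # its standard error
sd_scores = 75.0   # population standard deviation of test scores

# (a) Effect size measured in standard deviations of the test score
effect_in_sd = beta1_hat / sd_scores

# (b) Two-sided test of H0: beta_1 = 0 at the 5% level
t_stat = beta1_hat / se_beta1
crit_5 = NormalDist().inv_cdf(0.975)
significant = abs(t_stat) > crit_5

# (c) 99% confidence interval
z_99 = NormalDist().inv_cdf(0.995)
ci_99 = (beta1_hat - z_99 * se_beta1, beta1_hat + z_99 * se_beta1)

print(f"effect = {effect_in_sd:.3f} SD, t = {t_stat:.2f}, significant at 5%: {significant}")
print(f"99% CI = ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
```

The script uses the exact 99% critical value (about 2.576), so its interval is marginally narrower than the one obtained with the rounded value 2.58.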
(d) Do you think that the regression errors are plausibly homoskedastic? Explain.
The question asks whether the variability in test scores in large classes is the same as the variability
in small classes. It is hard to say. On the one hand, teachers in small classes might be able to spend
more time bringing all of the students along, reducing the poor performance of particularly unprepared
students. On the other hand, most of the variability in test scores might be beyond the control of the
teacher.

(e) SE(β̂1) was computed using heteroskedasticity-robust standard errors. Suppose that the regression errors were
homoskedastic: would this affect the validity of the confidence interval constructed in part (c)? Explain.
The CI would still be valid. Heteroskedasticity-robust SEs are valid whether the true population
regression errors are homoskedastic or heteroskedastic.
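A small simulation can illustrate this point. The sketch below generates hypothetical test scores with homoskedastic errors (the sample size, seed, and data are assumptions; only the coefficients 918.0 and 13.9 and the SD of 75 echo the exercise) and compares the homoskedasticity-only SE with a robust SE that uses the n/(n − 2) degrees-of-freedom correction, often labeled HC1.

```python
import random
from math import sqrt
from statistics import mean

random.seed(2)

# Hypothetical sample with homoskedastic errors: the same sd of 75 in both groups
n = 4000
small = [random.randint(0, 1) for _ in range(n)]
score = [918.0 + 13.9 * d + random.gauss(0, 75) for d in small]

# OLS slope, intercept, and residuals
x_bar, y_bar = mean(small), mean(score)
s_xx = sum((x - x_bar) ** 2 for x in small)
beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(small, score)) / s_xx
beta0 = y_bar - beta1 * x_bar
resid = [y - beta0 - beta1 * x for x, y in zip(small, score)]

# Homoskedasticity-only SE: s^2 / S_xx with s^2 = SSR / (n - 2)
se_homo = sqrt(sum(e ** 2 for e in resid) / (n - 2) / s_xx)

# Heteroskedasticity-robust SE with the n / (n - 2) correction
meat = sum(((x - x_bar) * e) ** 2 for x, e in zip(small, resid))
se_robust = sqrt(n / (n - 2) * meat / s_xx ** 2)

print(f"homoskedasticity-only SE = {se_homo:.3f}")
print(f"robust SE                = {se_robust:.3f}")
```

When the errors really are homoskedastic, the two SEs should be close, so a confidence interval built from the robust SE remains valid. The reverse does not hold: the homoskedasticity-only SE is generally not valid under heteroskedasticity.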
