
Universidad Carlos III de Madrid

Econometrics
Me+MIEM
Multiple Linear Regression. Estimation II
Problem Set 3_SOLUTIONS

1. \((Y_i, X_{1i}, X_{2i})\) satisfy the assumptions of the multiple regression model RLM.1-RLM.4. You are interested in \(\beta_1\),
the causal effect of \(X_1\) on \(Y\). Assume that \(X_1\) and \(X_2\) are uncorrelated. \(\beta_1\) is estimated by regressing \(Y\) on \(X_1\)
(so that \(X_2\) is not included in the regression). Does this estimator suffer from omitted variable bias? Explain.
No, because when \(X_1\) and \(X_2\) are uncorrelated the omitted variable bias is equal to zero. That is, if \(\tilde{\beta}_1\) is the
OLS estimate using the simple regression model

\[ Y_i = \beta_0 + \beta_1 X_{1i} + u_i \]

instead of using the true regression model with both regressors,

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i, \qquad E[U_i \mid X_{1i}, X_{2i}] = 0, \]

we can obtain that


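\[ \tilde{\beta}_1 \to_p \beta_1 + \beta_2 \frac{\operatorname{Cov}(X_{1i}, X_{2i})}{\operatorname{Var}(X_{1i})}. \]

Then, when \(\operatorname{Cov}(X_{1i}, X_{2i}) = 0\) we obtain \(\tilde{\beta}_1 \to_p \beta_1\).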
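The drawback of omitting \(X_{2i}\) in this case is that the contribution of \(\beta_2 X_{2i}\) moves into the new error, \(u_i = \beta_2 X_{2i} + U_i\), which, despite being uncorrelated with \(X_{1i}\), has a larger variance than the error \(U_i\) of the regression with both regressors,

\[
\begin{aligned}
\sigma_u^2 &= \operatorname{Var}(\beta_2 X_{2i} + U_i) \\
&= \operatorname{Var}(\beta_2 X_{2i}) + \operatorname{Var}(U_i) + 2\operatorname{Cov}(\beta_2 X_{2i}, U_i) \\
&= \beta_2^2 \sigma_{X_2}^2 + \sigma_U^2 + 0 \\
&> \sigma_U^2,
\end{aligned}
\]

unless \(\beta_2 = 0\), in which case \(X_{2i}\) was actually not an omitted variable and \(u_i = U_i\).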
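
The two claims above (no bias, but a noisier error) can be illustrated with a small simulation. This is only a sketch under assumed values: the coefficients, the sample size and the normal distributions are hypothetical choices, not part of the problem.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(0)                        # fixed seed so the run is reproducible
n = 20000
beta0, beta1, beta2 = 1.0, 2.0, 3.0   # hypothetical true coefficients

x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]   # drawn independently of x1
U = [random.gauss(0, 1) for _ in range(n)]    # Var(U) = 1
y = [beta0 + beta1 * a + beta2 * b + e for a, b, e in zip(x1, x2, U)]

# Simple regression of Y on X1 only (X2 omitted)
b1_short = cov(y, x1) / cov(x1, x1)
b0_short = mean(y) - b1_short * mean(x1)
resid = [yi - b0_short - b1_short * a for yi, a in zip(y, x1)]
resid_var = cov(resid, resid)

print("slope omitting X2:", round(b1_short, 3))   # close to beta1 = 2
print("residual variance:", round(resid_var, 3))  # close to beta2^2 * 1 + 1 = 10, not 1
```

The slope stays near the true value, while the residual variance is close to \(\beta_2^2 \sigma_{X_2}^2 + \sigma_U^2\) rather than \(\sigma_U^2\).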

2. \((Y_i, X_{1i}, X_{2i})\) satisfy the assumptions of the multiple regression model RLM.1-RLM.4. Furthermore, \(\operatorname{Var}(U_i \mid X_{1i}, X_{2i}) = 4\) and \(\operatorname{Var}(X_{1i}) = 6\). A random sample of size \(n = 400\) is drawn from the population.

(a) Suppose that \(X_1\) and \(X_2\) are uncorrelated. Calculate the variance of \(\hat{\beta}_1\).


In this case we know that the OLS estimate of the multiple regression and the OLS estimate of the simple
regression using only \(X_1\) are equivalent and have the same variance.
Furthermore, in this case we also have homoskedasticity, \(\operatorname{Var}(U_i \mid X_{1i}, X_{2i}) = 4 = \operatorname{Var}(U_i)\), so we can use
the simplified formula for the variance of the OLS estimate of simple regression,

\[ \operatorname{Var}(\hat{\beta}_1) \approx \frac{1}{n} \frac{\sigma_U^2}{\sigma_{X_1}^2} = \frac{1}{400} \cdot \frac{4}{6} = \frac{1}{600} = 0.00167. \]

[**These arguments are not strictly true, as in the simple regression the error also includes the omitted part,
i.e. the actual error is \(U_i + \beta_2 X_{2i}\), not \(U_i\), and there is no guarantee either that \(\operatorname{Var}(U_i + \beta_2 X_{2i} \mid X_{1i}) =
\operatorname{Var}(U_i \mid X_{1i}) + \operatorname{Var}(\beta_2 X_{2i} \mid X_{1i}) = 4 + \beta_2^2 \operatorname{Var}(X_{2i} \mid X_{1i})\) is constant.]
(b) Suppose that \(\operatorname{Corr}(X_1, X_2) = 0.5\). Calculate the variance of \(\hat{\beta}_1\).
In this case the model is still homoskedastic, but the OLS estimates of simple and multiple regression are
no longer equivalent and we have to use the general formula deduced from the two-stage interpretation
of multiple regression (partitioned regression or Frisch-Waugh-Lovell Theorem):
1. First regress \(X_{1i}\) on \(X_{2i}\) and save the residuals \(\hat{\varepsilon}_{X_1 i}\).
2. Obtain \(\hat{\beta}_1\) as the OLS coefficient of the simple "regression model" of \(Y_i\) on the residuals \(\hat{\varepsilon}_{X_1 i}\).

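The residuals \(\hat{\varepsilon}_{X_1 i}\) of the first stage estimate the errors \(\varepsilon_{X_1 i}\) of the BLP (best linear predictor) of \(X_1\) on \(X_2\), which have
variance \(\sigma^2_{\varepsilon_{X_1}} = \sigma^2_{X_1} (1 - R_1^2)\), where \(R_1^2\) is the population \(R^2\) of this first stage regression, i.e.

\[ R_1^2 = \rho^2_{X_1, X_2} = \operatorname{Corr}(X_1, X_2)^2 = 0.5^2. \]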
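
The two-step procedure above can be checked numerically. The sketch below uses simulated data with hypothetical coefficients and unit-variance regressors (so the variance of \(X_1\) is 1, not 6); it compares the coefficient on \(X_1\) from the two-regressor normal equations with the slope obtained from the residual regression, and checks that the residual variance equals the sample version of \(\sigma^2_{X_1}(1 - R_1^2)\).

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(1)
n = 5000
rho = 0.5                                   # Corr(X1, X2), as in part (b)
x2 = [random.gauss(0, 1) for _ in range(n)]
# X1 = rho*X2 + sqrt(1 - rho^2)*Z has unit variance and correlation rho with X2
x1 = [rho * b + (1 - rho**2) ** 0.5 * random.gauss(0, 1) for b in x2]
# Hypothetical coefficients; the error has standard deviation 2, so Var(U) = 4
y = [1.0 + 2.0 * a - 1.0 * b + random.gauss(0, 2) for a, b in zip(x1, x2)]

# Coefficient on X1 in the regression of Y on X1 and X2 (normal equations)
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
b1_multiple = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12**2)

# Step 1: regress X1 on X2 (with intercept) and keep the residuals
d = s12 / s22
m1, m2 = mean(x1), mean(x2)
e1 = [(a - m1) - d * (b - m2) for a, b in zip(x1, x2)]

# Step 2: simple regression of Y on the first-stage residuals
b1_fwl = cov(y, e1) / cov(e1, e1)

r2_first_stage = s12**2 / (s11 * s22)       # sample R^2 of the first stage
print(b1_multiple, b1_fwl)                  # identical up to rounding error
print(cov(e1, e1), s11 * (1 - r2_first_stage))
```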

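Then, the variance of \(\hat{\beta}_1\) obtained in the second regression can be calculated using the expression of the
variance of simple regression OLS under homoskedasticity,

\[ \operatorname{Var}(\hat{\beta}_1) \approx \frac{1}{n} \frac{\sigma_U^2}{\sigma^2_{\varepsilon_{X_1}}} = \frac{1}{n} \frac{\sigma_U^2}{\sigma_{X_1}^2 (1 - R_1^2)} = \frac{1}{400} \cdot \frac{4}{6 (1 - 0.5^2)} = 0.00222. \]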
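
The arithmetic in parts (a) and (b) follows from one expression, since (a) is the special case with zero correlation. A minimal sketch, where the function name `var_beta1` is only an illustrative choice:

```python
def var_beta1(n, sigma2_u, var_x1, corr_x1_x2):
    """Approximate homoskedastic variance of the OLS coefficient on X1
    in a regression with two regressors X1 and X2."""
    return sigma2_u / (n * var_x1 * (1.0 - corr_x1_x2**2))

var_a = var_beta1(400, 4.0, 6.0, 0.0)   # part (a): uncorrelated regressors
var_b = var_beta1(400, 4.0, 6.0, 0.5)   # part (b): Corr(X1, X2) = 0.5

print(round(var_a, 5))   # 0.00167
print(round(var_b, 5))   # 0.00222
```

The ratio of the two variances is \(1 / (1 - 0.5^2) = 4/3\), the cost in precision of the correlation between the regressors.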

(c) Comment on the following statement: "If \(X_1\) and \(X_2\) are correlated, the variance of \(\hat{\beta}_1\) is bigger than it
would be if \(X_1\) and \(X_2\) were uncorrelated. Therefore, if we are interested in \(\beta_1\), it is better to leave \(X_2\) out
of the regression if it is correlated with \(X_1\)."
While it is true that the variance of multiple regression estimates can be larger than that of simple
regression estimates, if \(X_1\) and \(X_2\) are correlated (and \(\beta_2 \neq 0\)) the OLS estimates in a simple regression omitting \(X_2\)
are biased, and the magnitude of the bias does not decrease with the sample size, whereas the variance does.

3. A school district runs an experiment to estimate the effect of class size on test scores in the second-year
exams. The district allocates 50% of its first-course students from the previous year to second-course small
classes (18 students per class) and 50% to classes of normal size (21 students per class). Students new to the
district are treated differently: 20% are randomly assigned to small classes and 80% to normal size classes. At
the end of the second course, each student takes a standardized test. Let \(Y_i\) be the
grade obtained by the \(i\)th student, \(X_{1i}\) a binary variable equal to 1 if the student is assigned to a small class,
and \(X_{2i}\) a binary variable equal to 1 if the student is incoming (new to the district). Let \(\beta_1\) be the causal effect on
test scores of reducing class size from a normal size to a small size.

(a) Consider the regression \(Y_i = \beta_0 + \beta_1 X_{1i} + U_i\). Do you think that \(E(U_i \mid X_{1i}) = 0\)? Is the OLS estimator
unbiased and consistent? Explain.
Treatment (assignment to small classes) was not randomly assigned in the population (the continuing
and newly-enrolled students) because of the difference in the proportion of treated continuing and newly-
enrolled students. Thus, the treatment indicator \(X_1\) is correlated with \(X_2\). If newly-enrolled students
perform systematically differently on standardized tests than continuing students (perhaps because of
adjustment to a new school), then this becomes part of the error term \(U_i\) in (a). This leads to correlation
between \(X_1\) and \(U\), so that \(E(U_i \mid X_{1i}) \neq 0\). Because \(E(U_i \mid X_{1i}) \neq 0\), \(\hat{\beta}_1\) is biased and inconsistent.
(b) Consider the regression \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i\). Do you think that \(E(U_i \mid X_{1i}, X_{2i})\) depends on
\(X_{1i}\)? Explain. Do you think that \(E(U_i \mid X_{1i}, X_{2i})\) depends on \(X_{2i}\)? Explain. Will the OLS estimator \(\hat{\beta}_2\)
provide an unbiased and consistent estimate of the causal effect of the change to the new school (that
is, of being an incoming student)? Explain.
Because treatment was randomly assigned conditional on enrollment status (\(X_2\): continuing or newly
enrolled), \(E(U_i \mid X_{1i}, X_{2i})\) will not depend on \(X_1\). This means that the assumption of conditional mean
independence is satisfied, and \(\hat{\beta}_1\) is unbiased and consistent. However, because \(X_2\) was not randomly
assigned (newly-enrolled students may, on average, have attributes other than being newly enrolled that
affect test scores), \(E(U_i \mid X_{1i}, X_{2i})\) may depend on \(X_2\), so that \(\hat{\beta}_2\) may be biased and inconsistent.
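
The contrast between regressions (a) and (b) can be illustrated by simulating the assignment scheme of the experiment. In this sketch the assignment probabilities (0.5 and 0.2) come from the problem, while the share of incoming students, the size of the effects and the error distribution are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(3)
n = 50000
beta1, beta2 = 5.0, -10.0   # hypothetical effects of a small class and of being new
share_new = 0.3             # hypothetical share of incoming students

x1, x2, y = [], [], []
for _ in range(n):
    new = 1 if random.random() < share_new else 0
    p_small = 0.2 if new else 0.5          # assignment probabilities from the problem
    small = 1 if random.random() < p_small else 0
    x2.append(new)
    x1.append(small)
    y.append(700.0 + beta1 * small + beta2 * new + random.gauss(0, 10))

# Regression (a): Y on X1 only
b1_short = cov(y, x1) / cov(x1, x1)

# Regression (b): Y on X1 and X2, coefficient on X1 from the normal equations
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
b1_long = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12**2)

print("omitting X2: ", round(b1_short, 2))   # systematically away from beta1 = 5
print("including X2:", round(b1_long, 2))    # close to beta1 = 5
```

In this design \(X_2\) is generated independently of the error, so the simulation only speaks to \(\hat{\beta}_1\); it does not reproduce the possible bias in \(\hat{\beta}_2\) discussed above.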

4. Using the data set CollegeDistance carry out the following exercises.

(a) Run a regression of years of completed education (ED) on distance to the nearest college (Dist). What is
the estimated slope?
-0.073
(b) Run a regression of ED on Dist, but include some additional regressors to control for characteristics of
the student, the student's family and the local labor market. In particular, include as additional regressors
the variables Bytest, Female, Black, Hispanic, Incomehi, Ownhome, DadColl, Cue80, and Stwmfg80.
What is the estimated effect of Dist on ED?
-0.032
(c) Is the estimated effect of Dist on ED in the regression in (b) substantively different from the regression
in (a)? Based on this, does the regression in (a) seem to suffer from important omitted variable bias?
The coefficient has fallen in absolute value by more than 50%. Thus, it seems that the result in (a) did suffer from omitted
variable bias.
(d) Compare the fit of the regression in (a) and (b) using the regression standard errors, \(R^2\) and \(\bar{R}^2\). Why are
the \(R^2\) and \(\bar{R}^2\) so similar in regression (b)?
The regression in (b) fits the data much better, as indicated by the \(R^2\) (0.2788 compared to 0.0074), \(\bar{R}^2\),
and SER. The \(R^2\) and \(\bar{R}^2\) are similar because the number of observations is large (\(n = 3796\)).
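
To see how small the adjustment is with \(n = 3796\), the reported \(R^2\) values can be plugged into the usual definition \(\bar{R}^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2)\). This is a sketch assuming that definition, with \(k = 1\) regressor in (a) and \(k = 10\) in (b); the adjusted values are computed here and are not taken from the regression output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with k regressors plus an intercept."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

n = 3796
adj_a = adjusted_r2(0.0074, n, k=1)    # regression (a): Dist only
adj_b = adjusted_r2(0.2788, n, k=10)   # regression (b): Dist plus nine controls

print(round(adj_a, 4), round(adj_b, 4))   # 0.0071 0.2769
```

The correction factor \((n-1)/(n-k-1)\) is about 1.003 in regression (b), which is why the two measures nearly coincide.
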
(e) The value of the coefficient on DadColl is positive. What does this coefficient measure?
Students with DadColl = 1 (so that the student's father went to college) complete 0.696 more years
of education, on average, than students with DadColl = 0 (so that the student's father did not go to
college), holding the other regressors constant.
(f) Explain why Cue80 and Stwmfg80 appear in the regression. Are the signs of their estimated coefficients
(+ or -) what you would have expected? Interpret the magnitudes of these coefficients.
These terms capture the opportunity cost of attending college. As Stwmfg80, the 1980 state hourly
wage in manufacturing, increases, forgone wages increase, so that, on average, college attendance declines.
The negative sign on the coefficient is consistent with this. As Cue80, the county unemployment rate,
increases, it is more difficult to find a job, which lowers the opportunity cost of attending college, so that
college attendance increases. The positive sign on the coefficient is consistent with this. In magnitude, the
estimates used in (g) imply that a one-dollar increase in Stwmfg80 is associated with 0.052 fewer years of
education, and a one-percentage-point increase in Cue80 with 0.023 more years.
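(g) Bob is a black male. His high school was 20 miles from the nearest college. His test score (Bytest) was 58.
His family income in 1980 was $26,000 and his family owned a home. His mother attended college, but
his father did not. The unemployment rate in his county was 7.5%, and the state average manufacturing
hourly wage was $9.74. Predict the number of years of completed schooling for Bob using the regression in
(b).
Bob's predicted years of education are 14.79. The calculation uses Dist = 2 for 20 miles (Dist is measured in tens of miles) and Incomehi = 1:

\[
\begin{aligned}
\widehat{ED}(\text{Bob}) &= \hat{\beta}_0 + \hat{\beta}_1 Dist + \hat{\beta}_2 Bytest + \hat{\beta}_3 Female + \hat{\beta}_4 Black + \hat{\beta}_5 Hispanic + \hat{\beta}_6 Incomehi \\
&\quad + \hat{\beta}_7 Ownhome + \hat{\beta}_8 DadColl + \hat{\beta}_9 Cue80 + \hat{\beta}_{10} Stwmfg80 \\
&= \hat{\beta}_0 + \hat{\beta}_1 \cdot 2 + \hat{\beta}_2 \cdot 58 + \hat{\beta}_3 \cdot 0 + \hat{\beta}_4 \cdot 1 + \hat{\beta}_5 \cdot 0 + \hat{\beta}_6 \cdot 1 \\
&\quad + \hat{\beta}_7 \cdot 1 + \hat{\beta}_8 \cdot 0 + \hat{\beta}_9 \cdot 7.5 + \hat{\beta}_{10} \cdot 9.74 \\
&= 8.828 - 0.0315 \cdot 2 + 0.094 \cdot 58 + 0.145 \cdot 0 + 0.368 \cdot 1 + 0.399 \cdot 0 + 0.395 \cdot 1 \\
&\quad + 0.152 \cdot 1 + 0.696 \cdot 0 + 0.023 \cdot 7.5 - 0.052 \cdot 9.74 \\
&= 14.79.
\end{aligned}
\]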

Jim has the same characteristics as Bob except that his high school was 40 miles from the nearest college.
Predict Jim's years of completed schooling using the regression in (b).
Jim's expected years of education is 0.0630 less than Bob's. Thus, Jim's expected years of education is
14.72:

\[ \widehat{ED}(\text{Jim}) - \widehat{ED}(\text{Bob}) = \hat{\beta}_1 \, \Delta Dist = \hat{\beta}_1 (4 - 2) = 2 \hat{\beta}_1 = 2 \, (-0.0315) = -0.0630. \]
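
The two predictions can be reproduced from the rounded coefficients shown in the calculation for Bob. This is a sketch: the dictionary keys are illustrative lower-case names, and because the coefficients are rounded to three or four decimals, the results can differ from the reported 14.79 and 14.72 in the second decimal.

```python
# Rounded coefficient estimates from regression (b), as printed in the solution
coef = {
    "const": 8.828, "dist": -0.0315, "bytest": 0.094, "female": 0.145,
    "black": 0.368, "hispanic": 0.399, "incomehi": 0.395, "ownhome": 0.152,
    "dadcoll": 0.696, "cue80": 0.023, "stwmfg80": -0.052,
}

def predict(person):
    """Predicted years of education for a dict of regressor values."""
    return coef["const"] + sum(coef[name] * value for name, value in person.items())

bob = {
    "dist": 2, "bytest": 58, "female": 0, "black": 1, "hispanic": 0,
    "incomehi": 1, "ownhome": 1, "dadcoll": 0, "cue80": 7.5, "stwmfg80": 9.74,
}
jim = dict(bob, dist=4)   # same characteristics, 40 miles from the nearest college

ed_bob = predict(bob)
ed_jim = predict(jim)
print(round(ed_bob, 2), round(ed_jim, 2), round(ed_jim - ed_bob, 4))
```

Only the difference between the two predictions, \(2\hat{\beta}_1\), is unaffected by the rounding of the other coefficients.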

5. A researcher plans to study the causal effect of police on crime based on data from a random sample of US
counties. A regression of the county's crime rate on the size (per capita) of the county police force is proposed.

(a) Explain why this regression is likely to have an omitted variable bias. What variables would you add to
the regression to control for the important omitted variables?
The regression considered is
\[ crime = \beta_0 + \beta_1 \, police + u \]
but there are other important determinants of a county's crime rate that are likely to be correlated with the size of the police force, including demographic characteristics
of the population, e.g. income per capita, the youth unemployment rate, age composition, etc.
(b) Use your answer in (a) and the expression of the omitted variable bias to determine if the regression is
likely to over- or underestimate the effect of the police on the crime rate (i.e. do you think that \(\hat{\beta}_1 > \beta_1\)
or that \(\hat{\beta}_1 < \beta_1\)?)
Suppose that the crime rate is positively affected by the fraction of young males in the population, and
that counties with high crime rates tend to hire more police. In this case, the size of the police force is
likely to be positively correlated with the fraction of young males in the population, leading to a positive
value for the omitted variable bias, so that \(\hat{\beta}_1 > \beta_1\):

\[
\begin{aligned}
E[\hat{\beta}_1] &= \beta_1 + \beta_2 \frac{\operatorname{Cov}(ymales, police)}{\operatorname{Var}(police)} \\
&= \beta_1 + (+) \, (+) \\
&> \beta_1,
\end{aligned}
\]

where the covariance between both regressors is positive, as is \(\beta_2\), the coefficient of the proportion
of young males.
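
The sign argument can be checked on simulated data, where the sample analogue of the omitted variable bias expression holds exactly: the slope of the short regression equals the long-regression slope plus the coefficient on the omitted variable times \(\widehat{\operatorname{Cov}} / \widehat{\operatorname{Var}}\). All numbers below (a true effect of police of -1, an effect of young males of +2, and the strength of the correlation) are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(5)
n = 2000
ymales = [random.gauss(0, 1) for _ in range(n)]
police = [0.5 * m + random.gauss(0, 1) for m in ymales]   # positively correlated with ymales
crime = [10.0 - 1.0 * p + 2.0 * m + random.gauss(0, 1) for p, m in zip(police, ymales)]

# Short regression: crime on police only
b1_short = cov(crime, police) / cov(police, police)

# Long regression: crime on police and ymales (normal equations)
s11, s22, s12 = cov(police, police), cov(ymales, ymales), cov(police, ymales)
s1y, s2y = cov(police, crime), cov(ymales, crime)
det = s11 * s22 - s12**2
b1_long = (s22 * s1y - s12 * s2y) / det
b2_long = (s11 * s2y - s12 * s1y) / det

bias_term = b2_long * s12 / s11     # (+) times (+) in this design
print(round(b1_short, 3), round(b1_long + bias_term, 3))   # the two numbers coincide
```

With a positive bias term the short regression understates how much police reduce crime in this design, matching the direction \(\hat{\beta}_1 > \beta_1\).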

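6. This problem deals with the difference between linear and causal relations and the misspecification bias. Given
two variables \(Y\) and \(X\), we know that

\[ E(Y \mid X) = \beta_0 + \beta_1 \log X, \]

where \(\beta_0\) and \(\beta_1\) are two unknown parameters. We know that \(\beta_1 \neq 0\). However, we estimate the following
model by OLS
\[ Y = \gamma_0 + \gamma_1 X + \varepsilon, \qquad (1) \]
where \(\gamma_0\) and \(\gamma_1\) are unknown parameters, and we know that the error term \(\varepsilon\) satisfies \(E(\varepsilon) = E(\varepsilon X) = 0\).

(a) Establish the relation between \(\gamma_1\) and \(\beta_1\).

The conditions \(E(\varepsilon) = E(\varepsilon X) = 0\) imply that \(\operatorname{Cov}(X, \varepsilon) = 0\) and therefore, using (1), we have that
\(\gamma_1 = \operatorname{Cov}(Y, X) / \operatorname{Var}(X)\). Then

\[
\begin{aligned}
\gamma_1 &= \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)} && \text{by definition of the regression coefficient in (1)} \\
&= \frac{\operatorname{Cov}(\beta_0 + \beta_1 \log X + U, X)}{\operatorname{Var}(X)} && \text{writing } Y = E(Y \mid X) + U \\
&= \frac{\beta_1 \operatorname{Cov}(\log X, X) + \operatorname{Cov}(U, X)}{\operatorname{Var}(X)} && \text{properties of Cov} \\
&= \beta_1 \frac{\operatorname{Cov}(\log X, X)}{\operatorname{Var}(X)} && \text{because } \operatorname{Cov}(U, X) = 0,
\end{aligned}
\]

where \(U = Y - E(Y \mid X)\), so that \(E(U \mid X) = 0\) and hence \(\operatorname{Cov}(U, X) = 0\).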


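(b) Establish the relation between the OLS estimator of \(\gamma_1\) in model (1) and the OLS estimator of \(\beta_1\) in the
model
\[ Y = \beta_0 + \beta_1 \log X + U, \]
where \(U\) is an error term.
The relationship between both OLS coefficients is approximately the same as between the corresponding
population coefficients, but replacing population moments by sample moments, i.e.

\[ \hat{\gamma}_1 \approx \hat{\beta}_1 \frac{\widehat{\operatorname{Cov}}(\log X, X)}{\widehat{\operatorname{Var}}(X)}, \]

as can be checked by

\[
\begin{aligned}
\hat{\gamma}_1 &= \frac{\widehat{\operatorname{Cov}}(Y, X)}{\widehat{\operatorname{Var}}(X)} = \frac{\widehat{\operatorname{Cov}}(\hat{\beta}_0 + \hat{\beta}_1 \log X + \hat{U}, X)}{\widehat{\operatorname{Var}}(X)} \\
&= \hat{\beta}_1 \frac{\widehat{\operatorname{Cov}}(\log X, X)}{\widehat{\operatorname{Var}}(X)} + \frac{\widehat{\operatorname{Cov}}(\hat{U}, X)}{\widehat{\operatorname{Var}}(X)} \\
&\approx \hat{\beta}_1 \frac{\widehat{\operatorname{Cov}}(\log X, X)}{\widehat{\operatorname{Var}}(X)},
\end{aligned}
\]

because \(\widehat{\operatorname{Cov}}(\hat{U}, X) \approx \widehat{\operatorname{Cov}}(U, X) \approx \operatorname{Cov}(U, X) = 0\): \(\hat{U} \approx U\) because \(\hat{U}\) are the residuals of the correct
model, which is estimated consistently, and \(E(U \mid X) = 0\).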
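
The approximate relation between the two OLS slopes can be checked on simulated data. This is a sketch under assumed values: \(X\) uniform on \([1, 10]\), hypothetical coefficients 1 and 2 in the log model, and a normal error; the variable names mirror the notation used here for model (1) and the log model.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(6)
n = 20000
x = [random.uniform(1.0, 10.0) for _ in range(n)]
logx = [math.log(v) for v in x]
# True model: E(Y | X) = 1 + 2 log X  (hypothetical coefficients)
y = [1.0 + 2.0 * lv + random.gauss(0, 0.5) for lv in logx]

# Slope of the misspecified linear model (1): Y on X
gamma1_hat = cov(y, x) / cov(x, x)

# Slope of the correctly specified model: Y on log X
beta1_hat = cov(y, logx) / cov(logx, logx)

# Value implied by the relation derived above
implied = beta1_hat * cov(logx, x) / cov(x, x)

print(round(gamma1_hat, 4), round(implied, 4))   # close, but not identical
```

The small gap between the two numbers is the term \(\widehat{\operatorname{Cov}}(\hat{U}, X) / \widehat{\operatorname{Var}}(X)\), which is not exactly zero in a finite sample because the residuals are orthogonal to \(\log X\), not to \(X\).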

ANSWERS:

4. (a) -0.073
(b) -0.032
(c) The coefficient has fallen by more than 50%. Thus, it seems that the result in (a) did suffer from omitted
variable bias.
(d) The regression in (b) fits the data much better, as indicated by the \(R^2\), \(\bar{R}^2\), and SER. The \(R^2\) and \(\bar{R}^2\) are
similar because the number of observations is large (\(n = 3796\)).
(e) Students with DadColl = 1 (so that the student's father went to college) complete 0.696 more years of
education, on average, than students with DadColl = 0 (so that the student's father did not go to college).
(f) These terms capture the opportunity cost of attending college. As Stwmfg80, the 1980 state hourly wage
in manufacturing, increases, forgone wages increase, so that, on average, college attendance declines. The
negative sign on the coefficient is consistent with this. As Cue80, the county unemployment rate, increases, it
is more difficult to find a job, which lowers the opportunity cost of attending college, so that college attendance
increases. The positive sign on the coefficient is consistent with this.
(g) Bob's predicted years of education are 14.79.
(h) Jim's expected years of education is 0.0630 less than Bob's. Thus, Jim's expected years of education is
14.72.
