
ECON3334 Introduction to Econometrics
Final Exam, Fall 2021

There are three parts, 10 questions in total. Let us know immediately if there are questions missing in your exam paper. The total points are 100.

Part 1 (28 pts). True or False. Provide a brief (one or two sentences) explanation for every question.

Q1. Under imperfect multicollinearity, the OLS estimator must be inconsistent.

Q2. Omitting a variable that has a nonzero effect on the dependent variable must lead to omitted variable bias.

Q3. In general, one cannot rely only on LASSO to drop bad controls.

Q4. If the variable of interest X is randomized, then including more variables in the regression always makes the standard error of the estimated coefficient on X smaller.

Q5. An insignificant estimate implies the true effect must be zero.

Q6. For a Wald test, a p-value equal to 0.02 means that the test is significant at the 5% level.

Q7. In a linear regression model with multiple regressors, Alice wants to test a joint hypothesis that two slope coefficients equal zero. She conducts two t-tests and rejects the null when both t-statistics (in absolute value) are greater than the 5% critical value. The probability of making a Type 1 error is no greater than 5%.

Part 2 (10 pts). Write a one/two-sentence explanation for each regression diagnosis problem.

Q8. Alice wants to study how hard work causally affects students' academic performance. She surveyed a random sample of students and obtained their CGA and 3 different variables measuring effort: how many days in a typical week they go to the library, how many lectures they skip in a typical semester, and how often they discuss questions with peers or professors. Suppose all three effort variables are highly correlated. Now Alice runs a regression of CGA on these three variables. Surprisingly, she finds that none of the t-tests for the effects of these effort variables is significant (null is 0), contradicting the commonly believed hypothesis that hard work improves academic performance. Alice suspects that it's because the standard errors are too big. What is the most likely reason for that in this case?

Q9. Suppose Alice reads a paper which says that mother smoking during pregnancy decreases a newborn's birthweight, and a newborn's birthweight positively affects her/his future test scores. Inspired by this, Alice wishes to estimate the causal effect of mother smoking in pregnancy on children's test scores. She has a random sample of three variables: children's test scores, a dummy indicating whether the mother smoked in pregnancy, and children's birthweight. She regresses test score on the smoking dummy and birthweight. Recall that her variable of interest is smoking. Can she get a good estimate of the causal effect she's interested in?

Part 3 (62 pts).

Q10. Alice wishes to estimate the average causal effect of X on Y. She collects an i.i.d. dataset of X, Y, and a control variable W.

a) (4 pts) Alice first plots the scatter plot between X (horizontal axis) and Y (vertical axis). [Scatterplot: Y against X, with X ranging from about 0 to 10 and Y from about 5 to 20.] Suppose Alice only wishes to know the sign of the effect of X on Y. Can she just run a global linear regression using all the data points? Why? Will your answer change if she wishes to know the magnitude as well?

b) (13 pts) Alice decides to split the sample into two halves, one with X ≤ 6 and the other with X > 6. The regression results are as follows (the numbers in parentheses are standard errors):

Subsample regression with X ≤ 6:  Y-hat = 0.1 + 3.3 X
                                          (0.5) (3)

Subsample regression with X > 6:  Y-hat = 13 + 0.5 X
                                          (1)  (0.1)

Let the true slope coefficient when X ≤ 6 be β1 and the true slope coefficient for X > 6 be β2. Answer the following questions:
• Calculate the t-values (in absolute value) for β1 and β2 respectively. (Set the null hypothesis for each t-test to be 0.)
• Decide whether the results are statistically significant or not at the 1%, 5%, and 10% levels. (The critical values are 2.6, 2, and 1.6 respectively.)
• If in fact the true β1 is positive, discuss why you get a significant or non-significant result given that there are only a few data points for X ≤ 6.
c) (10 pts) From the scatter plot, β2 looks different from β1. To formally test this hypothesis, Alice constructs two dummy variables D1 and D2. Specifically,

D1 = 1 if X ≤ 6 and 0 if X > 6.
D2 = 1 if X > 6 and 0 if X ≤ 6.

Then Alice runs the following regression (XD1 means the product of X and D1; XD2 means the product of X and D2):

Y = θ0 + θ1 X + θ2 XD1 + θ3 XD2 + u

• Prove that this regression suffers from perfect multicollinearity.
• By dropping which regressor can you resolve this issue?
• Choose one to drop, and based on the resulting model, write the null hypothesis that "the marginal effects of X on Y for X ≤ 6 and for X > 6 are equal" in terms of the coefficients in your regression.

d) (12 pts) Alice solves the perfect multicollinearity issue by rewriting the model as:

Y = θ0 + β1 XD1 + β2 XD2 + u,  E(u | X, D1, D2) = 0

• What are the interpretations of β1 and β2?
• How would you express "the marginal effect of X on Y when X > 6 is three times as large as that when X ≤ 6" in terms of the βs?
• Now treat it as a null hypothesis and use the "transform the regression" approach to derive how to test it.

For part e) to part g), use the following information:

Now Alice includes a control variable W. The distribution of W is

Pr(W = −1) = Pr(W = 1) = 0.5

Imagine that after including W, the true model becomes

Y = γ0 + γ1 X + γ2 W + V

where V is an error term.

e) (7 pts) Suppose X = 1 − W + Z with E(Z | W) = 0. If γ2 ≠ 0, will there be omitted variable bias if W is not included? Why?

f) (8 pts) Alice includes W in the regression. She conducts a Wald test for H0: γ1 = γ2 = 0. She gets a test statistic equal to 7. For the 1%, 5%, and 10% significance levels, the critical values for the Wald test are

Degrees of freedom = 1: 6.76, 4, 2.6
Degrees of freedom = 2: 9.21, 5.99, 4.6
Degrees of freedom = 3: 11.34, 7.8, 6.25

What is the degree of freedom of this null? At which level is the result significant? Why?

g) (8 pts) Unfortunately, in the above estimation, E(V | W) ≠ 0, so unconfoundedness E(V | X, W) = 0 does NOT hold.
• Under what assumptions can γ1 still be consistently estimated?
• Suppose E(V | W = −1) = 2 and E(V | W = 1) = −2. Derive the numbers that can be put in the blanks to make the equality hold: E(V | W) = ___ + ___ W.
• Which of the assumptions you specified in the first bullet point is satisfied then?

[End of the exam.]

Econ 3334
Module 8
Linear Regression with Too Many Regressors (Big Data): Estimation
Department of Economics, HKUST
Instructor: Junlong Feng
Fall 2022

Menu of Module 8 (slide 2)
I. What is "Big Data"?
II. Econometrics and Machine Learning
III. Bias-variance tradeoff
IV. LASSO
I. What is "Big Data"
• We are in an era of information explosion.
• Data are generated and recorded every second.
• It's usually a good thing to have many data.
  • A few decades ago, 30-60 data points were called a large sample.
  • More information.
  • Closer to asymptotia.
• But new challenges also arise.
  • How to store and read the data?
  • How to get computation done fast enough?
  • Theory?

I. What is "Big Data"
• The term Big Data bears different meanings in different contexts.
• When thinking about data storage and computation, the volume of the data may matter most.
  • KB, MB, GB, TB, what's next?
  • PB (Petabytes), EB (Exabytes), ZB (Zettabytes)…
  • Check this if you are interested: Opportunities & Challenges: Lessons from Analyzing Terabytes of Scanner Data, a lecture given at Harvard by Serena Ng from Columbia.
• When thinking about how to tame the data to extract useful information from it, the dimension and structure of the data matter more.
  • How many variables does the data set contain?
  • Network structures.

(slides 3-4)

I. What is "Big Data"
• In this course, we only focus on one type of Big Data: high-dimensional data.
• Dimension of data may mean different things.
• Example 1:
  Y_i = β_0 + β_1 X_1i + ⋯ + β_p X_pi + u_i
  When p is close to n, p ≈ n, or p > n, our data set as well as our model is high-dimensional.
• Example 2: Suppose we observe returns of N stocks over T periods,
  {R_it : i = 1, …, N, t = 1, …, T}.
  We can stack all these returns as follows:
  R_11 ⋯ R_1T
   ⋮   ⋱   ⋮
  R_N1 ⋯ R_NT
  This format of listing items is called a matrix. When N and T are both large, this is a high-dimensional matrix.
• In this course, we only focus on the case in Example 1.

Why is high dimensionality so special? An example:
• Suppose you have an independent sample Y_i : i = 1, …, n.
• However, you don't believe the Y_i's have identical distributions.
• Insisting that they have different means, you propose to estimate each mean separately.
• Then this is all you can do: μ̂_i = Y_i.
• We've shown in Module 3 that μ̂_i is not consistent for any i.

(slides 5-6)

I. What is "Big Data"
The example is a big data problem:
• Recall that estimating group means is the same as running a regression of Y on group dummies.
• The example is equivalent to a regression of Y on n dummy variables (or fixed effects, for those of you who know this jargon), as in the sketch below:
  Y_i = β_1 1{i = 1} + β_2 1{i = 2} + β_3 1{i = 3} + ⋯ + β_n 1{i = n} + u_i
• Immediately, the number of regressors is the same as the sample size.

II. Econometrics and Machine Learning
A set of tools developed by computer scientists and statisticians to handle some of the Big Data problems is called Machine Learning (ML) methods.
• But… what is a machine? What is learning?

(slides 7-8)
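A minimal R sketch of the equivalence claimed above, on simulated data (all names here are illustrative):

  # Estimating n group means is the same as regressing Y on n group dummies.
  set.seed(1)
  n  <- 10
  id <- factor(1:n)            # each observation is its own "group"
  y  <- rnorm(n, mean = 1:n)   # one draw per group: effective sample size per mean is 1
  fit <- lm(y ~ 0 + id)        # regression on n dummies, no intercept
  all.equal(unname(coef(fit)), y)   # TRUE: each dummy coefficient is that observation's y

With one observation per mean, the variance of each estimated mean never shrinks as n grows, which is the consistency failure described on the previous slide.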
II. Econometrics and Machine Learning
ML has a lot in common with stats/metrics, only under different names.

  Machine Learning                      Stats/Metrics
  Example or instance or observation    Data point or observation
  Label                                 Dependent variable
  Feature                               Independent variable or regressor or predictor
  Learning                              Estimation
  Machine                               Estimator (sometimes regression)
  Training set                          The sample used for estimation

II. Econometrics and Machine Learning
Machine learning can be divided into supervised learning and unsupervised learning according to whether there is a label, i.e., a dependent variable.
• Supervised learning: there is a label.
  • Regression analysis belongs to this.
  • Another example is classification: classify observations into pre-set categories. Logit/Probit.

(slides 9-10)

II. Econometrics and Machine Learning
Machine learning can be divided into supervised learning and unsupervised learning according to whether there is a label, i.e., a dependent variable.
• Unsupervised learning: there is no label.
  • Extract main characteristics from a dataset.
  • Our Example 2: returns of N stocks over T periods. Recall we have a large N×T matrix. Maybe all N of these stocks are driven by a small number of economic factors. Recover a low-rank matrix from it. No label.
  • Another example: signal denoising.
  • Machines: (robust) principal component analysis (PCA).

III. Bias-variance tradeoff
Now let's focus on supervised learning. The high-dimensional problem causes huge variance:
• Recall that when the Y_i have different means, for each i we use Y_i to estimate μ_i.
• The effective sample size is 1.
• The variance of the estimator is large because variance is proportional to 1/(sample size).
• Consistency does not hold because the variance of the estimator does not converge to 0 as n → ∞.
• But, in Chapter 3 we did see that Y_i is an unbiased estimator because E(Y_i) = μ_i.
• What if we sacrifice unbiasedness to reduce the variance?

(slides 11-12)

III. Bias-variance tradeoff
• More generally, when we run a regression with a lot of regressors (i.e., a large p), each coefficient, roughly speaking, only obtains n/p data points for estimation.
• When p is close to n, or even larger than n, the relative sample size is small.
• This results in large variance of the β̂'s.
• Consistency may fail as a consequence.
• Idea: introduce a biased estimator which has a smaller variance.

IV. LASSO
LASSO: Least Absolute Shrinkage and Selection Operator.
• First introduced in the geophysics literature in 1986.
• Independently rediscovered by Robert Tibshirani (1996).
• From the name, LASSO makes all the coefficients smaller ("shrinkage").
• The already small coefficients will be shrunk to 0.
• Then the number of effective regressors becomes much smaller than p and n.

(slides 13-14)
IV. LASSO
LASSO minimizes the following objective function:

  min over b_1, …, b_p of:  Σ_{i=1}^{n} (Y_i − b_1 X_1i − ⋯ − b_p X_pi)^2 + λ Σ_{j=1}^{p} |b_j|

• All the data are standardized (X: demean, divide by the sample sd; Y: demean). So there is no need to include an intercept (β_0 = E(Y) − E(β_1 X_1 + ⋯ + β_p X_p) − E(u) = 0).
• The only difference between LASSO and OLS is λ Σ_j |b_j|, which we call the penalty or regularization.
• This type of estimator that has a penalty term is called a penalized estimator or regularized estimator.
• The coefficient λ is a constant.

IV. LASSO
To see why LASSO is a shrinkage estimator, imagine n = 1 and p = 1 (see the sketch below).
• Suppose our data are (Y, X) = (6, 2).
• We are looking for a b such that the following is smallest:
  (6 − 2b)^2 + λ|b|
• The OLS estimator corresponds to λ = 0, so β̂_OLS = 3.
• Now suppose λ = 1 and substitute β̂_OLS into the objective: 0 + 3 = 3. But consider b̃ = 2.9:
  (6 − 2×2.9)^2 + 2.9 = 2.94 < 3
  So the OLS estimator is no longer the minimizer. This b̃ is better than β̂_OLS.
• The OLS estimator only makes the first term smallest.
• For the penalty to be small, b simply needs to be small.

(slides 15-16)
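A minimal R sketch of the single-observation shrinkage example above (the data values are the slide's; the grid is illustrative):

  # LASSO objective for (Y, X) = (6, 2): (6 - 2b)^2 + lambda*|b|
  y <- 6; x <- 2
  lasso_obj <- function(b, lambda) (y - x * b)^2 + lambda * abs(b)
  b_grid <- seq(-1, 4, by = 0.001)
  b_grid[which.min(lasso_obj(b_grid, lambda = 0))]   # 3: the OLS estimate
  b_grid[which.min(lasso_obj(b_grid, lambda = 1))]   # 2.875: shrunk toward 0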

IV. LASSO
To see why LASSO is a shrinkage estimator, imagine n = 1 and p = 1.
• Suppose our data are (Y, X) = (0.2, 2).
• We are looking for a b such that the following is smallest:
  (0.2 − 2b)^2 + λ|b|
• The OLS estimator corresponds to λ = 0, so β̂_OLS = 0.1.
• Now suppose λ = 1 and substitute β̂_OLS into the objective: 0 + 0.1 = 0.1. But consider b̃ = 0:
  (0.2 − 2×0)^2 + 1×0 = 0.04 < 0.1
• So when the OLS estimator is sufficiently small, LASSO will just make it exactly 0.
• As a consequence, the corresponding regressor will be dropped.

IV. LASSO
We should also expect that the larger the penalty constant λ, the more severe the shrinkage will be.
• But why does shrinkage reduce variance?
• Consider a generic random variable Z whose variance is σ_Z^2.
• Shrink it by cZ where 0 < c < 1.
• The variance of cZ is c^2 σ_Z^2 < σ_Z^2.

(slides 17-18)

IV. LASSO
• So what the LASSO penalty λ Σ_j |b_j| does is to shrink the OLS estimator towards 0.
• And if the original OLS estimator is already small, meaning that the regressor is not important, the estimator will be set to 0 and the regressor is dropped.
• By the way, this is why standardization of the regressors is recommended:
  • Without standardization, different covariates have different scales, and thus the magnitudes of their coefficients can be arbitrary.
  • E.g. Y = β_1 X_1 + β_2 X_2 + u = 1000β_1 (0.001X_1) + β_2 X_2 + u.
  • By standardization, all X's have the same unit.
  • Then the β's are comparable. When we set those that are small equal to zero, we are really discarding the almost unnecessary regressors compared with the others.

IV. LASSO
A numerical example.
• One regressor and one data point.
• X = 1, Y = 1.
• Lasso objective function:
  min over b of (1 − b)^2 + λ|b|
• Generate 401 grid points to guarantee that 0 is in the sequence.

(slides 19-20)
IV. LASSO
A numerical example.
• One regressor and one data point.
• X = 1, Y = 1.
• Lasso objective function: min over b of (1 − b)^2 + λ|b|
• λ = 0: b̂ = 1
• λ = 2.5: b̂ = 0
[Figure: the LASSO objective and the SSR plotted against b over [−4, 4].]

IV. LASSO
A numerical example.
• One regressor and one data point.
• X = 1, Y = 1.
• Lasso objective function: min over b of (1 − b)^2 + λ|b|
• λ = 0: b̂ = 1
• λ = 1: b̂ = 0.5
[Figure: the LASSO objective and the SSR plotted against b over [−4, 4].]

(slides 21-22)

IV. LASSO
When there is one regressor with n observations, LASSO has an analytical solution:

  β̂_LASSO = max( β̂_OLS − 0.5λ / Σ_{i=1}^{n} X_i^2 , 0 ),  if β̂_OLS ≥ 0
  β̂_LASSO = min( β̂_OLS + 0.5λ / Σ_{i=1}^{n} X_i^2 , 0 ),  if β̂_OLS < 0

IV. LASSO
• LASSO selects a subset of regressors whose effect on Y is relatively large.
• Under the assumption that the true effects of the unselected regressors are indeed 0, the estimates are consistent, but biased.
• To remove the bias, run another regression by standard OLS including only the regressors selected by LASSO.
• For causal inference (see the sketch below):
  • Let your key variable be X and the control variables be W_1, …, W_p.
  • First regress Y on W_1, …, W_p by LASSO.
  • Then regress X on W_1, …, W_p by LASSO.
  • Keep the union of the W's selected by the previous two regressions. Regress Y on X and these W's by OLS. Conduct inference in the usual way.
  • The R package "hdm" does all these steps for you.

(slides 23-24)
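A minimal sketch of the double-selection steps above, written with the glmnet package on simulated data (the slide's one-call alternative is the "hdm" package; every name and value below is illustrative):

  library(glmnet)
  set.seed(1)
  n <- 500; p <- 50
  W <- matrix(rnorm(n * p), n, p)                   # candidate controls W1,...,Wp
  X <- 0.5 * W[, 1] + rnorm(n)                      # key regressor, driven by W1
  Y <- X + 0.8 * W[, 1] + 0.5 * W[, 2] + rnorm(n)   # outcome

  # Step 1: LASSO of Y on the controls; Step 2: LASSO of X on the controls
  bY <- as.numeric(coef(cv.glmnet(W, Y), s = "lambda.min"))[-1]
  bX <- as.numeric(coef(cv.glmnet(W, X), s = "lambda.min"))[-1]
  keep <- union(which(bY != 0), which(bX != 0))     # union of selected controls

  # Step 3: OLS of Y on X and the selected controls; inference in the usual way
  # (assumes at least one control is selected, as happens in this simulation)
  ols <- lm(Y ~ X + W[, keep])
  summary(ols)$coefficients["X", ]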

Econ 3334
Module 7 (Cont'd)
Linear Regression with Multiple Regressors: Inference
Department of Economics, HKUST
Instructor: Junlong Feng
Fall 2022

Menu of Module 7.IV Regression Diagnosis (slide 2)
IV.I A Pretty Linear World
IV.II Get the Regression Under Control
IV.III Significance Not Too Significant
IV.IV From Data to Table
IV.I A Pretty Linear World
For many people, perhaps the first issue they find in linear regression is: why linear?
• The world is indeed linear: our model tells the truth.
• The world is nonlinear but can be linearly approximated: our model is a good approximation.

IV.I A Pretty Linear World
Sometimes the true regression line is naturally linear with no assumptions.
• Suppose X ∈ {0, 1}. What's the conditional expectation of Y given X?
• E(Y|X) can only be one of E(Y|X = 1) and E(Y|X = 0). So
  E(Y|X) = E(Y|X = 0) + [E(Y|X = 1) − E(Y|X = 0)]·X
• E(Y|X = 1) and E(Y|X = 0) are just two numbers. Write β_0 = E(Y|X = 0) and β_1 = E(Y|X = 1) − E(Y|X = 0).
• Then
  E(Y|X) = β_0 + β_1 X
• We didn't make any assumptions on the functional form of E(Y|X). It's linear by definition.

(slides 3-4)

IV.I A Pretty Linear World
Another justification: suppose you suspect that E(Y|X) is not linear but quadratic in X:
  E(Y|X) = δ_0 + δ_1 X + δ_2 X^2
• But recall X ∈ {0, 1}, so X^2 = X. So E(Y|X) = δ_0 + (δ_1 + δ_2)X.
• This holds for any polynomial of X!
• For other functions, say E(Y|X) = a_0 sin(a_1 X):
  When X = 1, sin(a_1 X) = sin(a_1)·X
  When X = 0, sin(0) = 0 = sin(a_1)·X
• So still E(Y|X) = a_0 sin(a_1)·X ≡ β_0 + β_1 X.

IV.I A Pretty Linear World
• What if X takes on more than two values?
• Return to education: X ∈ {high school, college dropout, college and above}
• Let HS = 1 if X = high school, = 0 otherwise
• Let CD = 1 if X = college dropout, = 0 otherwise
• Let CA = 1 if X = college and above, = 0 otherwise
• Let Y be income.
  E(Y|X) = E(Y|HS, CD, CA)
         = E(Y|HS = 1)·HS + E(Y|CD = 1)·CD + E(Y|CA = 1)·CA
         ≡ γ_1 HS + γ_2 CD + γ_3 CA
         ≡ δ_0 + δ_1 CD + δ_2 CA
• Takeaway: for a categorical variable taking on more than 2 values, the conditional expectation of Y on it is still LINEAR in the dummies of all the categories.

(slides 5-6)

IV.I A Pretty Linear World
• What if you have two regressors X_1 and X_2?
• X_1 ∈ {high school, college dropout, college and above}
• X_2 ∈ {female, male}
• In this case, there are six categories. It's equivalent to having six dummies for:
  female high school, female college dropout, female college and above,
  male high school, male college dropout, male college and above.
• The conditional expectation E(Y|X_1, X_2) must be linear in these 6 dummies under the reasoning of the previous slide.

IV.I A Pretty Linear World
• More often in practice: let F and M denote the female dummy and the male dummy.
• Instead of creating the 6 dummies,
  E(Y|X_1, X_2) = E(Y|HS, CD, CA, F, M)
               = β_1 HS + β_2 CD + β_3 CA + β_4 HS×F + β_5 CD×F + β_6 CA×F
• Exercise: show this model is equivalent to the 6-dummies model in the previous slide (a small simulation sketch follows below).
• This is called a saturated model. It is linear, and it fully captures the conditional expectation without any assumptions.

(slides 7-8)
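A minimal simulation sketch of the exercise above, with illustrative names and an arbitrary data-generating process: both parameterizations span the same six cell indicators, so their fitted values coincide.

  set.seed(1)
  n  <- 600
  ed <- sample(c("hs", "cd", "ca"), n, replace = TRUE)   # education category
  f  <- rbinom(n, 1, 0.5)                                # female dummy
  y  <- rnorm(n, mean = as.numeric(factor(ed)) + 2 * f)  # any cell means will do

  # Model A: one dummy per (education, gender) cell, no intercept
  cell    <- interaction(ed, f)
  model_a <- lm(y ~ 0 + cell)

  # Model B: education dummies plus their interactions with the female dummy, no intercept
  hs <- as.numeric(ed == "hs"); cd <- as.numeric(ed == "cd"); ca <- as.numeric(ed == "ca")
  model_b <- lm(y ~ 0 + hs + cd + ca + hs:f + cd:f + ca:f)

  all.equal(fitted(model_a), fitted(model_b))   # TRUE: the two models are equivalent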
IV.I A Pretty Linear World
When the regressor is not categorical, nonlinearity may be natural.
• Both X and Y are continuous; there is no way to make a saturated model.
• The relationship between X and Y is obviously nonlinear.
• Example: X distance to hospital, Y mortality rate.
• But is a linear model too wrong?
[Figure: Scatterplot 1, Y against X.]

IV.I A Pretty Linear World
When the regressor is not categorical, nonlinearity may be natural.
• The red straight line is the fitted linear regression model Ŷ = β̂_0 + β̂_1 X.
• The blue line is the true function of Y given X, up to the errors.
• They are different at almost every point.
• But they are similar.
[Figure: Scatterplot 1 with the fitted line (red) and the true function (blue).]

(slides 9-10)

IV.I A Pretty Linear World
• Whether the red line, i.e., a linear regression model, is acceptable depends on two things:
• 1. Do you only care about the sign of the effect?
  • The slope estimate is obviously positive, while the blue curve is also increasing everywhere.
• 2. At which point of X do you care about the effect most?
  • The true marginal effect E(Y|X = x + 1) − E(Y|X = x) now depends on x.
  • The red curve approximates the effect around x = 3 well, but does not do a good job for very small or very big x.
[Figure: Scatterplot 1 with the fitted line and the true curve.]

IV.I A Pretty Linear World
Sometimes a linear approximation misses the point:
• The relationship between X and Y again looks nonlinear.
• Also, there's not an obvious trend.
[Figure: Scatterplot 2, Y against X.]

(slides 11-12)

IV.I A Pretty Linear World
Sometimes a linear approximation misses the point:
• Linear regression (red) says the effect of X on Y is zero.
• But the truth (blue) is that the effect is negative first and then positive.
[Figure: Scatterplot 2 with the fitted line and the true curve.]

IV.I A Pretty Linear World
In both examples, the fundamental issue of the linear approximation is that it extrapolates too much.
• A linear function β_0 + β_1 x is used to approximate the function E(Y|X = x) for a range of X that is too wide.
• It will be much better if we only look at smaller windows of x and, in each window, estimate a linear model.

(slides 13-14)
IV.I A Pretty Linear World
• Now we estimate Y = β_0 + β_1 X + u on two subsamples: X > 0 and X < 0 (see the R sketch after this slide pair).
• The blue and the green straight lines are the corresponding fitted regression lines.
• They look much better.
[Figure: Scatterplot 2 with the two local fitted lines.]

IV.I A Pretty Linear World
The smaller the window you choose, the better a linear function approximates the true relationship.
• Recall the saturated model. When the windows are shorter, the model is more saturated!
• Unfortunately there is NO FREE LUNCH:
  • The shorter the windows for X are, the smaller the sample size is in each window.
  • The bias decreases but the variance goes up!
  • Recall the variance of your estimator is proportional to 1/n.
• This is an example of the famous Bias-Variance Tradeoff in any branch of data science.
• We will see more such tradeoffs later.
• Of course you may use a nonlinear function as your model in the first place, but the question is: there is only one linear function but infinitely many nonlinear ones. How can you know which to use?

(slides 15-16)
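A minimal R sketch of the global-versus-local fits above, on simulated data chosen to mimic Scatterplot 2 (all values are illustrative):

  set.seed(1)
  n <- 200
  X <- runif(n, -2, 2)
  Y <- X^2 + rnorm(n, sd = 0.5)           # nonlinear truth: decreasing then increasing

  global <- lm(Y ~ X)                     # slope near 0: misses the point
  left   <- lm(Y ~ X, subset = X < 0)     # negative slope
  right  <- lm(Y ~ X, subset = X > 0)     # positive slope
  c(coef(global)[2], coef(left)[2], coef(right)[2])

The local fits track the truth better, but each uses roughly half the sample, so their standard errors are larger: the bias-variance tradeoff described above.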

IV.I A Pretty Linear World
Main takeaways from IV.I
• Linearity is not a crazy assumption.
• It's not even an assumption for saturated models.
• For regressors where making a saturated model is impossible, PLOT your data first!
• It's very likely that you will observe nonlinearity in the plot.
• DON'T PANIC!
• Decide whether to run a global linear regression or a bunch of local linear regressions based on the answers to:
  • Does the nonlinear relationship increase/decrease uniformly?
  • Does the slope change too much over the span of X?

IV.II Get Regression Under Control
Not all regressors are born equal.
• In economic applications, we often have a variable of interest X. We care about its causal effect on Y.
• However, it is rare to see a regression with a single regressor.
• The other regressors included in the regression are called control variables, or simply, controls.
• To emphasize that we are treating them differently, let's write the regression as
  Y_i = β_0 + αX_i + β_1 W_1i + ⋯ + β_p W_pi + u_i
• Although under unconfoundedness α, β_1, β_2, …, β_p all have a causal interpretation (each of them is a marginal average treatment effect), in most cases we add W_1i, …, W_pi NOT because we are interested in their effects.

(slides 17-18)

IV.II Get Regression Under Control
Controls are added to a regression for one or more of the following three reasons:
• Control the omitted variable bias (OVB).
• Control the SE.
• Show robustness.

IV.II Get Regression Under Control: OVB
As we have learned, omitting variables that are
• correlated with our variable of interest X, and
• affecting Y
would lead to OVB.
But in practice, how do we know which variables would cause OVB if omitted?

(slides 19-20)
IV.II Get Regression Under Control: OVB
Example 1. Black, D. A., Smith, J. A., Berger, M. C., & Noel, B. J. (2003). Is the threat of reemployment services more effective than the services themselves? Evidence from random assignment in the UI system. American Economic Review, 93(4), 1313-1327.
1. Research question: Is a mandatory re-training program effective in increasing the later earnings of the unemployed?
2. Determine the dependent variable (Y) and the variable of interest (X). Y? X?
3. Determine whether there is OVB if no controls are included.
• Since this is a mandatory program, self-selection may not be a concern.
• However, if the eligibility criteria are correlated with future income, OVB may still be an issue.

IV.II Get Regression Under Control: OVB
Example 1 (continued).
• The authors found proper controls to kill the OVB by exploiting the institutional background.
• The eligibility criteria were known: eligibility was solely determined by some personal characteristics and past unemployment and job histories.
• If there were more workers who met the criteria than could be served, participation was determined by a lottery.
• Conditional on the characteristics, participation is purely determined by the lottery and thus independent of any other unobserved factors.
• All of these characteristics were available in their dataset.

(slides 21-22)

IV.II Get Regression Under Control: OVB
Example 2. Angrist, J. D. (1998). Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants. Econometrica, 249-288.
1. Research question: What's the effect of voluntary military service on the later earnings of soldiers?
2. Determine the dependent variable (Y) and the variable of interest (X). Y? X?
3. Determine whether there is OVB if no controls are included.
• Unlike the re-training program, military service here is voluntary.
• Huge concerns about self-selection: I want to join the army because I…, which would … my income if I did not join.
• Then participation is correlated with such self-characteristics, which also affect income.

IV.II Get Regression Under Control: OVB
Example 2 (continued). Josh Angrist attempted to solve the problem by
1. Only using data from veterans and nonveterans who applied.
  • This way, the individuals in the sample at least all had the same motivation.
2. Then controlling for observed covariates like age, schooling, and test scores.
  • The controls are correlated with both the results of their applications and their later income.
• By carefully choosing the sample and controls, the paper argues that the only difference in the income of the veterans and nonveterans is military service.
• Is the approach perfect? What if a qualified applicant dropped out?

(slides 23-24)

IV.II Get Regression Under Control: OVB
However, not all controls can kill OVB.
• We already learned that the controls themselves need to be exogenous, i.e., E(u_i | X_i, W_1i, …, W_pi) = 0.
• There are two other issues: bad controls and bad proxies.
• The examples used are adapted from Chapter 3 of Mostly Harmless Econometrics by Angrist and Pischke (Princeton University Press, 2008).

IV.II Get Regression Under Control: OVB
1. Bad control
• You cannot use a W that is also a consequence of X as a control!
[Diagram: X affects W, and both X and W affect Y.]
• This is easy to illustrate in linear models: if W is linear in X, W and X are perfectly multicollinear.
• But the actual reason is pretty deep.

(slides 25-26)
IV.II Get Regression Under Control: OVB
1. Bad control: a return-to-education example
• Suppose Alice wants to know the causal effect of going to college (X) on income (Y).
• Alice thinks that income is affected by occupation (W) as well.
• Meanwhile, occupation is correlated with going to college.
• Therefore she includes dummies for occupations as controls.
• The problem is that occupation is partly a consequence of going to college.
• In fact, even if going to college is randomized, including occupation causes bias.
• Conditional on occupation, you are actually comparing the incomes of
  A. people who go to college and have the occupation, and
  B. people who do not go to college but still have the same occupation.
• These two groups could be fundamentally different in unobservables.

IV.II Get Regression Under Control: OVB
1. Bad control
• A rule of thumb: think twice before controlling for variables that realize after the timing of the variable of interest.
• Another type of control that is not very likely to be a bad control is a variable at a more aggregated level.
  • Example: individuals may be affected by the entire economic environment, but the latter is not very likely to be a consequence of any single individual's economic behavior.

(slides 27-28)

IV.II Get Regression Under Control: OVB
2. Bad proxies
• A similar type of control is the bad proxy.
• A proxy is a variable that approximates the variable we want but cannot observe.
• In the return-to-education example, we want to include ability, but ability is not observable.
• Suppose we can observe a post-graduate IQ score (proxying later ability) and a pre-education IQ score (proxying innate ability).
• Which one is better?

IV.II Get Regression Under Control: OVB
2. Bad proxies
• The post-graduate IQ score is partly a consequence of education.
• It falls into the bad control problem.
• Conditional on the same post-graduate IQ, you are comparing the income of people who
  A. attain more education, and
  B. attain less education.
  • If we believe that post-graduate IQ is positively associated with more education, then group A has lower ability compared with group B.
  • Therefore, the income of A is lower than that of those who have the same level of education as A and the same ability as B.
  • Your estimate is downward biased.
• Now run a regression without the proxy. By the OVB formula, the estimate is upward biased!

(slides 29-30)

IV.II Get Regression Under Control: OVB
2. Bad proxies
• You got an upper and a lower bound for the average marginal effect of education on income!!!
• But you cannot use the standard t-tests to do inference. Multiple testing problem.
  • Bonferroni correction: construct two t-stats but use the critical values for α/2 instead of α.
• This example schooled us with one important lesson:
  There may be no perfect regression, but a "wrong regression" with careful critical thinking may still yield part of the truth.

IV.II Get Regression Under Control
Controls are added to a regression for one or more of the following three reasons:
• Control the omitted variable bias (OVB).
• Control the SE.
• Show robustness.

(slides 31-32)
IV.II Get Regression Under Control: Control SE
Even if your regressor of interest X is independent of everything potentially in the unobservable, there are still reasons to include controls.
• The first reason is to reduce the standard error.
• Consider two regressions:
  Y_i = β_0 + αX_i + β_1 W_i + u_i
  Y_i = β_0 + αX_i + v_i
  where X ⊥ W ⊥ u.
• α is the marginal causal effect of X (holding W constant) in both regressions.
• OLS is consistent for α in both regressions.
• What about the SEs? This depends on whether β_1 = 0.
  • The SE matters because a large SE makes the confidence interval too wide. Imprecise estimates.

IV.II Get Regression Under Control: Control SE
A simulated example
• Independently draw 200 points for X, W_1, and u from N(0, 1).
• Construct Y = 1 + X + W_1 + u.
• Although W_1 affects Y, it's not correlated with X, so omitting it does not cause OVB.
• Model1 = lm(Y ~ X)
• Model2 = lm(Y ~ X + W1)
• From the results, the estimates of the coefficient on X are both good, but the SE in Model2 is much smaller.

(slides 33-34)

IV.II Get Regression Under Control: Control SE
A simulated example (continued; see the R sketch after this slide pair)
• Now draw 200 points of W_2 from N(0, 1).
• W_2 does not affect Y at all.
• The table shows what would happen if we throw W_2 into the regression.
• Model3 = lm(Y ~ X + W1 + W2)
• The SEs of the coefficients on X and W_1 both increase!

IV.II Get Regression Under Control: Control SE
Takeaways:
• When OVB is no longer a concern, you can still include relevant controls (but not bad controls, of course!) to reduce the standard errors.
• Irrelevant controls, on the contrary, increase the SEs.

(slides 35-36)
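A minimal R sketch of the simulated example above (Model1-Model3); the data-generating process follows the slides, everything else is illustrative:

  set.seed(1)
  n  <- 200
  X  <- rnorm(n); W1 <- rnorm(n); W2 <- rnorm(n); u <- rnorm(n)
  Y  <- 1 + X + W1 + u          # W1 is relevant but independent of X; W2 is irrelevant

  model1 <- lm(Y ~ X)           # omits W1: no OVB, but a noisier residual
  model2 <- lm(Y ~ X + W1)      # relevant control added: SE on X shrinks
  model3 <- lm(Y ~ X + W1 + W2) # irrelevant control added: SEs tend to creep back up

  rbind(model1 = summary(model1)$coefficients["X", 1:2],
        model2 = summary(model2)$coefficients["X", 1:2],
        model3 = summary(model3)$coefficients["X", 1:2])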

IV.II Get Regression Under Control: Show robustness
The second reason to include controls even when the variable of interest has no confounding issues is to show robustness.
• If there was OVB before I added a control, the new estimate must be different.
• A contrapositive argument: if my estimate does not change after a control is added, my variable of interest is exogenous and my results are robust.

IV.II Get Regression Under Control
The last question is: how many controls should I include?
• Given all the previous guidelines, in fact the number of "good controls" may be quite limited even if you have a dataset with many variables.
• Throwing in tons of regressors is a very bad practice.
  • This is called a kitchen-sink regression.
  • Even if your goal is prediction, the kitchen sink leads to overfitting.
  • For causal inference, the more you add, the riskier it is that a problematic control is introduced into the model.
• Don't start from big to small.
• Small to big, starting from your variable of interest, is more natural.

(slides 37-38)
IV.II Get Regression Under Control
You may have heard of techniques called AIC and BIC which claim to tell you how many regressors to include.
• Just like the R-squared or the adjusted R-squared, these data-driven methods are only appropriate for forecasting or prediction.
• In general, do not use them in causal inference without a strong justification.
  • They are useful in some sub-problems of a big causal inference question, but that's beyond the scope of this course.
• Machine learning techniques can also select regressors. Again, using them without discipline harms causal inference.

IV.II Get Regression Under Control
Summary:
• Adding controls may be useful no matter whether the variable of interest is already exogenous.
• Be very careful which controls to choose. You need to avoid controls that are themselves endogenous (unconfoundedness failure), consequences of the variable of interest (bad control problem), and/or irrelevant in the regression (pushing up the SEs).
• Sometimes, including controls or not, the estimate is biased. DO NOT PANIC! Comparing the results may still tell you part of the story.
• There are no rules for how many controls to have. But try not to add too many.

(slides 39-40)

IV.III Significance Not Too Significant
Too often, you may find your estimate not significant…

  T-stat looks too good.
  Use robust standard errors--
  significance gone.
  ---------- A Haiku by Keisuke Hirano

• He wrote a lot. Find them all at https://keihirano.github.io/haiku.html. Super fun to read when you are tired of econometrixing.

IV.III Significance Not Too Significant
Technically, significance or not depends on your null hypothesis. But economists often, if not always, implicitly assume in their mind that the null is zero.
• This is because we care about causal effects, and the most distinguished case is zero effect.
• Under a zero null, whether your estimate is significant depends on two things: the magnitude of the estimate β̂ and its standard error SE(β̂).
• It's all about relativity:
  • Your estimate may be insignificant even under great precision. E.g. β̂ = 0.001 while SE(β̂) = 0.0008.
  • Your estimate may be significant even if it's not precise at all. E.g. β̂ = 100 while SE(β̂) = 50.
• So when your estimate is not significant, maybe it's just that the true effect is too small.
• Truthfully reporting this is not problematic at all.

(slides 41-42)

IV.III Significance Not Too Significant
But suppose your reader is obsessed with significance: how do we achieve a smaller SE?
• First-order effect: sample size.
  • There's a factor of 1/n in the SE.
  • Sample size thus has the dominant effect on the SE and, in turn, on significance.
  • When you have a small sample, having an insignificant estimate is not a shame at all.
• Second-order effects: imperfect multicollinearity.
  • We have seen this in Module 6.
  • When X and a control W are highly correlated, omitting W leads to OVB, but including W drives the SE upward and may lead to an insignificant estimate.
  • What should we do?

IV.III Significance Not Too Significant
Imperfect multicollinearity. Suppose we run two regressions:
  Y_i = β_0 + β_1 X_i + v_i              (1)
  Y_i = β_0 + β_1 X_i + β_2 W_i + u_i    (2)
Denote the OLS estimator of β_1 from model (1) by β̂^(1) and from model (2) by β̂^(2). Suppose
• β̂^(1) is biased and inconsistent but significant.
• β̂^(2) is unbiased and consistent but not significant.
Then there are several cases.

(slides 43-44)
IV.III Significance Not Too Significant
• If β̂^(1) and β̂^(2) have different signs, or if they have the same sign but |β̂^(1)| > |β̂^(2)|:
  1. Do two things. A. A Wald test for β_1 = β_2 = 0. B. Compute the correlation between X and W.
  2. If the Wald test is significant and X and W are highly correlated, tell the reader that the effect is β̂^(2) but the standard error is contaminated by multicollinearity.
• If β̂^(1) and β̂^(2) have the same sign and |β̂^(1)| < |β̂^(2)|: congratulations! Now both estimates are useful.
  • β̂^(1), a biased and inconsistent estimator, is now a lower (in magnitude) bound for the consistent one, β̂^(2).
  • The lower bound is significantly different from zero.
  • Then the true effect must be different from zero.
  • Provide the Wald test and the correlation between X and W to further support this.
  • Provide economic intuition for why the OVB is towards zero for further evidence.

IV.III Significance Not Too Significant
Finally, as the other determinant of significance, β̂ needs to be correctly estimated.
• In finite samples, there are many issues that may make β̂ unreasonably small.
• One issue is outliers, even small outliers.
• Like imperfect multicollinearity, small outliers do not cause a theoretical problem.
• But in finite samples, they may bias the estimate downward.
• This is again a Bias-Variance Tradeoff.

(slides 45-46)

IV.III Significance Not Too Significant
A simulated example (see the R sketch after this slide pair)
• Independently draw 100 points for X and u from N(0, 1).
• Construct Y = 2 + X + u.
• For X > 1, replace Y with 2 + u.
• Red line: estimate a linear model using the full sample.
• Blue line: estimate a linear model dropping the outliers.
[Figure: Scatterplot 3 with the two fitted lines.]

IV.III Significance Not Too Significant
We can either view it as an outlier problem or as a violation of linearity.
• Including all the data points imposes a potential (downward) finite-sample bias, but the SE is smaller because of a larger n.
• Including only a subsample may give a larger estimate, but the SE is also larger because of a smaller n.
• Again, a bias-variance tradeoff, but this time both cases lead to a small t-value.
• In practice, plot the data and try both.
[Figure: Scatterplot 3.]

(slides 47-48)
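A minimal R sketch of the outlier example above; the data-generating process follows the slide, everything else is illustrative:

  set.seed(1)
  n <- 100
  X <- rnorm(n); u <- rnorm(n)
  Y <- 2 + X + u
  Y[X > 1] <- 2 + u[X > 1]             # the linear relation breaks down for X > 1

  full <- lm(Y ~ X)                    # smaller slope estimate, smaller SE (larger n)
  sub  <- lm(Y ~ X, subset = X <= 1)   # larger slope estimate, larger SE (smaller n)

  rbind(full = summary(full)$coefficients["X", 1:2],
        sub  = summary(sub)$coefficients["X", 1:2])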

IV.III Significance Not Too Significant
Now you can see that significance depends on many factors.
• "Significantly different from 0" has no empirical content in the first place when the magnitude of the effect is huge.
• Even for smaller effects, a small sample size or imperfect multicollinearity may screw up the SE, but that's not evidence for the nonexistence of a causal effect.
• Conventionally, people mark the estimates with stars as superscripts when they report the results: *** refers to significant at the 1% level, ** at 5%, and * at 10%.
• The wind is changing, and this has been abandoned in top economic journals:
  "In tables, please report standard errors in parentheses but do not use ∗s to report significance levels." —— Submission guidelines, American Economic Review

IV.III Significance Not Too Significant
Pursuing significance is no longer desired.
• When your estimate is insignificant, explain why. Small sample size? Multicollinearity? Small outliers?
• Provide evidence (Wald test, plot, etc.), and conduct some robustness checks (drop outliers, drop highly correlated controls) and see whether the situation is improved.
• Confidence intervals are much more informative. For example, with a confidence interval of [−0.5, 5], you know your estimate is not significant, but with pretty high probability it's not negative, and even if it is, the magnitude is bounded to be small.

(slides 49-50)
IV.IV From Data to Table
Now it's time to put things together.
1. Come up with an economic question. Identify what your variable of interest is and what the possible outcome is.
2. Find a dataset.
• It's common to have missing values in a dataset.
• Usually, information is missing without any pattern (missing at random). Just ignore it (recall one midterm question).
• Sometimes some values of the regressors are not observed. For instance, you want to control for age, but the max age in your sample is 30. Not a problem either: your estimated average causal effect would be averaged over that subgroup.
• Sometimes some values of the dependent variable are not observed. This is a problem. It biases your estimation. For instance, you want to estimate how price affects willingness to buy. You use observed demand to measure the latter. When demand hits zero, willingness to buy may keep decreasing but is no longer reflected in the data. Bias arises.

IV.IV From Data to Table
3. Try to find exogenous variables from an RCT, a policy shock and/or a natural experiment (earthquake, rainfall) as your variable of interest.
4. Try hard to think about what controls to use. Economic theories may be helpful here.
5. Plot your Y and X before you run the regression. See if the data tell the story in your head. Also see if there is nonlinearity or there are outliers.
6. Run regressions. You may run different specifications with/without controls, and/or using subsamples when nonlinearity or outliers are spotted.
• These different specs are not cherry-picking.
• They need to be run with justification: the bias-variance tradeoff in the various situations we discussed.

(slides 51-52)

IV.IV From Data to Table
7. Present your results in a table.
• No need to report the coefficients and SEs of the controls, especially when there are many.
• Report all your specs for robustness.
8. Explain your results by proposing the mechanisms behind them.
• Why does the effect have this sign?
• Why does it have this magnitude? Is it realistic?
• If the standard error is small, why is that?
• How do you compare results in different specs?
9. Think about to what extent the results can be generalized.
• Would the effect of STR on test score in the US be equal to that in HK?
• This question is called external validity.
• Even if your regression is internally valid, it may not generalize to other populations.
• Be careful not to make a statement that is way beyond the scope of your sample!

IV.IV From Data to Table
This table is taken from p. 46 of Mostly Harmless Econometrics.
• Y: log wage. X: years of schooling.
• 5 specs with different sets of controls.
• None is perfect.
• You see a decreasing trend as more controls are added. Upward OVB.
• (4) and (5) suffer from the bad control problem; they are lower bounds.
• SEs in (4) and (5) increase because X could be highly correlated with AFQT and occupation.
[Table from the book not reproduced here.]

(slides 53-54)

Econ 3334
Module 7
Linear Regression with Multiple Regressors: Inference
Department of Economics, HKUST
Instructor: Junlong Feng
Fall 2022

Menu of Module 7 (slide 2)
I. Inference about one coefficient
II. Inference about multiple coefficients
III. Inference about functions of coefficients
IV. Regression diagnosis
I. Inference about one coefficient
  Y_i = β_0 + β_1 X_1i + β_2 X_2i + ⋯ + β_k X_ki + u_i
• Estimate (β_0, …, β_k) using OLS.
• Under the four assumptions (unconfoundedness, i.i.d., no large outliers, and no perfect multicollinearity), the OLS estimators are unbiased, consistent, and asymptotically normal.
• For any j ∈ {0, …, k}, β̂_j is approximately distributed as N(β_j, σ²_β̂j).
• σ²_β̂j is unknown. In practice it is replaced with a consistent estimator, giving SE(β̂_j).
• A formula for SE(β̂_j) is available but too nasty without matrix algebra.
• SE(β̂_j) can be computed by software.

I. Inference about one coefficient: Testing
  Y_i = β_0 + β_1 X_1i + β_2 X_2i + ⋯ + β_k X_ki + u_i
• So far everything is the same as in the single-variable regression.
• For a hypothesis H_0: β_j = β_j0 vs. H_1: β_j ≠ β_j0, construct the t-statistic:
  t_n = (β̂_j − β_j0) / SE(β̂_j)
• Reject the null if |t_n| > z_{1−α/2}.
• For α = 0.01, 0.05, 0.1, reject the null if |t_n| > 2.58, 1.96, 1.64.

(slides 3-4)

I. Inference about one coefficient: Testing
  Y_i = β_0 + β_1 X_1i + β_2 X_2i + ⋯ + β_k X_ki + u_i
• So far everything is the same as in the single-variable regression.
• For the same hypothesis H_0: β_j = β_j0 vs. H_1: β_j ≠ β_j0, we can also compute the p-value by
  p = 2Φ(−|t_n|)
• Reject the null at level α if p < α.

I. Inference about one coefficient: Confidence interval
  Y_i = β_0 + β_1 X_1i + β_2 X_2i + ⋯ + β_k X_ki + u_i
• So far everything is the same as in the single-variable regression.
• A (1 − α)-confidence interval for β_j (that is, a random interval that covers β_j with probability 1 − α):
  CI(1 − α) = [ β̂_j − SE(β̂_j)·z_{1−α/2} , β̂_j + SE(β̂_j)·z_{1−α/2} ]
• For any null H_0: β_j = β_j0, reject at level α if β_j0 ∉ CI(1 − α) for the corresponding z_{1−α/2}.

(slides 5-6)

I. Inference about one coefficient: Heteroscedasticity
  Y_i = β_0 + β_1 X_1i + β_2 X_2i + ⋯ + β_k X_ki + u_i
• Heteroscedasticity is still relevant.
  • Homoscedastic if E(u_i² | X_1i, …, X_ki) = E(u_i²) ≡ σ_u².
  • Heteroscedastic if E(u_i² | X_1i, …, X_ki) ≠ E(u_i²).
• Same as in the single-regressor case: hetero or homo, it doesn't change any properties of the OLS estimator we learned.
• The only difference is in the form of σ²_β̂j, thus affecting how SE(β̂_j) is constructed.
• Same as in the single-regressor case, we always use the heteroscedasticity-robust standard errors for robustness.
• So still use the command "coeftest" with HC1 to compute the standard errors (see the sketch after this slide pair).

I. Inference about one coefficient: Example
Test score revisited:
  TestScore-hat = 686.0 − 1.10×STR − 0.650×PctEL
                  (8.7)   (0.43)     (0.031)
• PctEL: percentage of English Learners in the district.
• STR still has a significantly negative effect (at the 5% level) on the average test score holding PctEL constant because |−1.10| / 0.43 = 2.56 > 1.96.
• A 95% confidence interval for the coefficient on STR is
  [−1.1 − 1.96×0.43, −1.1 + 1.96×0.43] = [−1.94, −0.26]
• We then know the true marginal average causal effect of STR on test score, holding the percentage of English Learners constant, is bounded between −1.94 and −0.26 with 95% confidence.
• What do we do if we want to test whether both STR and PctEL have no effect on test score?

(slides 7-8)
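A minimal sketch of the robust-SE computation above, using the lmtest and sandwich packages on simulated data that mimics the test-score regression (all names and numbers below are illustrative, not the course dataset):

  library(lmtest)
  library(sandwich)
  set.seed(1)
  n   <- 400
  str <- runif(n, 14, 26)
  el  <- runif(n, 0, 80)
  testscr <- 686 - 1.1 * str - 0.65 * el + rnorm(n, sd = 10 + 0.3 * el)  # heteroscedastic errors

  fit <- lm(testscr ~ str + el)
  coeftest(fit, vcov = vcovHC(fit, type = "HC1"))   # t-tests with HC1 robust standard errors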
II. Inference about multiple coefficients
  Y_i = β_0 + β_1 X_1i + β_2 X_2i + ⋯ + β_k X_ki + u_i
• How does one test H_0: β_1 = β_2 = 0 vs. H_1: at least one of them is nonzero?
• Alice: do a t-test for both β_1 and β_2; reject the null if both tests are significant.
• Is Alice's approach correct? Why?

II. Inference about multiple coefficients
• Alice's idea in action:
  t_1n = β̂_1 / SE(β̂_1),  t_2n = β̂_2 / SE(β̂_2)
  Reject at the 0.05 level if both |t_1n| > 1.96 and |t_2n| > 1.96.

(slides 9-10)

II. Inference about multiple coefficients
• Let's calculate the probability of making a Type I error, i.e., β_1 = β_2 = 0 but |t_1n| > 1.96 and |t_2n| > 1.96.
  Prob. of Type I error = Pr(|t_1n| > 1.96, |t_2n| > 1.96)
• Under β_1 = β_2 = 0, t_1n and t_2n are both approximately standard normal.
• So Pr(|t_1n| > 1.96) ≈ 0.05 and Pr(|t_2n| > 1.96) ≈ 0.05.
• Hence Pr(|t_1n| > 1.96, |t_2n| > 1.96) ≤ 0.05.

II. Inference about multiple coefficients
• Therefore, Alice's approach is too conservative.
• She chooses 1.96 because she is fine with a 5% Type I error probability.
• But it turns out the probability is smaller than 5%.
• It is harder to be significant than it should be.
• Given the fixed sample size, the power may be lowered.
• Alice sacrifices power for a size that she doesn't need.

(slides 11-12)

II. Inference about multiple coefficients
• How to test H_0: β_1 = β_2 = 0 vs. H_1: at least one of them is nonzero?
• Bob: do a t-test for both β_1 and β_2; reject the null if at least one of them is significant.
• Is Bob's approach correct? Why?

II. Inference about multiple coefficients
• Bob's idea in action:
  t_1n = β̂_1 / SE(β̂_1),  t_2n = β̂_2 / SE(β̂_2)
  Reject at the 0.05 level if at least one of |t_1n| and |t_2n| is bigger than 1.96.

(slides 13-14)
II. Inference about multiple coefficients
• To calculate the probability of making a Type I error, it is more convenient to work with the probability of acceptance and subtract it from 1.
  Prob. of Type I error = 1 − Pr(|t_1n| ≤ 1.96, |t_2n| ≤ 1.96)
• Under β_1 = β_2 = 0, t_1n and t_2n are both approximately standard normal.
• So Pr(|t_1n| ≤ 1.96) ≈ 0.95 and Pr(|t_2n| ≤ 1.96) ≈ 0.95.
• Hence Pr(|t_1n| ≤ 1.96, |t_2n| ≤ 1.96) ≤ 0.95.
• Prob. of Type I error ≥ 0.05.

II. Inference about multiple coefficients
• Therefore, Bob's approach does not work either.
• The actual probability of a Type I error is greater than he wants.

(slides 15-16)

II. Inference about multiple coefficients
• How does one test H_0: β_1 = β_2 = 0 vs. H_1: at least one of them is nonzero?
• Note that the null is equivalent to the following:
  β_1² + β_2² = 0
• Since β̂_1 and β̂_2 are consistent for β_1 and β_2, β̂_1² + β̂_2² should be close to 0 as well when n is large.
• If that is true, β̂_1²/SE(β̂_1)² + β̂_2²/SE(β̂_2)², i.e., t_1n² + t_2n², should be close to 0 as well.
• Recall β̂_1/SE(β̂_1) is approximately N(0, 1), and β̂_2/SE(β̂_2) is approximately N(0, 1).
• Moreover, if the two normals are independent, their squared sum should be χ²_2!

II. Inference about multiple coefficients
• More generally, the two normals are not independent, i.e., β̂_1 and β̂_2 are not independent.
• One can orthogonalize them: define
  W_n ≡ [1 / (1 − ρ²_β̂1β̂2)] · [ (β̂_1/SE(β̂_1))² + (β̂_2/SE(β̂_2))² − 2ρ_β̂1β̂2 · (β̂_1/SE(β̂_1)) · (β̂_2/SE(β̂_2)) ]
      = [1 / (1 − ρ²_β̂1β̂2)] · [ t_1n² + t_2n² − 2ρ_β̂1β̂2 · t_1n t_2n ]
  where ρ_β̂1β̂2 is the correlation between β̂_1 and β̂_2.
• W_n is approximately χ²_2.

(slides 17-18)

II. Inference about multiple coefficients
• How does one test H_0: β_1 = β_2 = 0 vs. H_1: at least one of them is nonzero?
• Let c_{1−α} be the (1 − α)-th quantile of the χ²_2 distribution.
• Reject the null if W_n > c_{1−α}.
• For α = 0.01, 0.05, 0.1, the critical values c_{1−α} for χ²_2 are 9.21, 5.99, 4.61.
• This has been called the "F-test" for historical reasons, just like the t-test.
  • When the sample size goes to infinity, W_n is distributed as an F distribution multiplied by the number of equality signs in the null.
  • Today it is more commonly known as the Wald test.
• Just like we never use the t-distribution for critical values in a single-coefficient test, we'll never use the F-distribution for critical values in a joint test.

II. Inference about multiple coefficients
A couple of remarks are in order.
1. Although the hypothesis is two-sided, the rejection criterion is one-sided because the chi-square distribution is nonnegative by construction. For the same reason, the critical value is the (1 − α)-th quantile, not the (1 − α/2)-th as in the t-test.
2. From the formula W_n ≡ [1 / (1 − ρ²_β̂1β̂2)] · [t_1n² + t_2n² − 2ρ_β̂1β̂2 t_1n t_2n], it is not defined when ρ_β̂1β̂2 = ±1. This is a consequence of perfect multicollinearity. For all other cases, W_n is strictly positive because ρ²_β̂1β̂2 − 1 < 0. In particular, when ρ_β̂1β̂2 = 0, the two standard normals are independent and W_n reduces to the simple case in the heuristics.

(slides 19-20)
II. Inference about multiple coefficients
More generally, the Wald test can test an arbitrary number of joint hypotheses. The number of hypotheses is called the degrees of freedom.
• Example: H_0: β_1 = β_2 = β_3 = 0. Three equality signs, so the degrees of freedom is 3. The critical value at level α is the (1 − α) quantile of χ²_3.
  • For α = 0.01, 0.05, 0.1, it's equal to 11.34, 7.81 and 6.25.
• In general, for a joint null with q equality signs, use the quantiles of χ²_q as the critical values.
  • To obtain these values in R, type qchisq(1 − α, q) (of course you need to replace 1 − α and q by their values).
• A special case is q = 1. Then W_n = t_n². So for a single coefficient, you could also use the Wald test instead of the t-test.

II. Inference about multiple coefficients
We can also calculate the p-value for the Wald test.
• Let w be the realized value of W_n. Then
  p = 1 − F_χ²q(w)
  where F_χ²q is the CDF of the χ²_q distribution.
• The value of F_χ²q(w) can be obtained in R by pchisq(w, q).
• Same as the t-test: reject the null at level α if p < α.

(slides 21-22)
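A quick R check of the chi-square critical values and the Wald p-value formula above (the statistic value 7 is just an illustration):

  qchisq(1 - 0.05, df = 2)   # 5.99: 5% critical value with 2 restrictions
  qchisq(1 - 0.01, df = 3)   # 11.34: 1% critical value with 3 restrictions
  1 - pchisq(7, df = 2)      # p-value of a Wald statistic of 7 with 2 restrictions (about 0.03)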

II. Inference about multiple coefficients
Test score revisited:
  TestScore-hat = 686.0 − 1.10×STR − 0.650×PctEL
                  (8.7)   (0.43)     (0.031)
Here is how it looks in R: [R output not reproduced here.]

II. Inference about multiple coefficients
Let the coefficients on str and el_pct be β_1 and β_2. To test H_0: β_1 = β_2 = 0, we need the package "car". Then write: [R code and output not reproduced here; a sketch follows below.]

(slides 23-24)
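A minimal, self-contained sketch of the joint Wald test above using car::linearHypothesis, run on simulated data with illustrative names (the slides use the California test-score data and HC1-robust standard errors):

  library(car)
  set.seed(1)
  n   <- 400
  str <- runif(n, 14, 26); el <- runif(n, 0, 80)
  testscr <- 686 - 1.1 * str - 0.65 * el + rnorm(n, sd = 15)

  fit <- lm(testscr ~ str + el)
  # Joint test of H0: coefficient on str = 0 and coefficient on el = 0,
  # with heteroscedasticity-robust (HC1) standard errors
  linearHypothesis(fit, c("str = 0", "el = 0"), white.adjust = "hc1")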

II. Inference about multiple coefficients
We can also calculate the statistic by hand. Recall
  W_n = [1 / (1 − ρ²_β̂1β̂2)] · [ t_1n² + t_2n² − 2ρ_β̂1β̂2 t_1n t_2n ]
• ρ_β̂1β̂2 can be computed from the estimated covariance matrix of the coefficients. [R output not reproduced here.]
• The number −0.0003131024 is the covariance of β̂_1 and β̂_2. With their standard errors and t-statistics on slide 23, you can calculate W_n. I tried, and it's equal to 447.6463.
• Knowing this is useful because now you may realize heteroscedasticity is also relevant for the Wald test. Here we used heteroscedasticity-robust standard errors and covariance, so the test is valid under both heteroscedasticity and homoscedasticity.

III. Inference about functions of coefficients
  Y_i = β_0 + β_1 X_1i + β_2 X_2i + ⋯ + β_k X_ki + u_i
Sometimes we are interested in testing a function of coefficients.
• Recall the gender income gap example. Now suppose age also affects income:
  Income_i = β_0 + β_1 F_i + β_2 Age_i + u_i
• We may want to know whether gender and age have equal effects on income.
• Formally, H_0: β_1 = β_2 vs. H_1: β_1 ≠ β_2.
• This is NOT a joint test although it involves two coefficients.
  • To check whether a null is joint, count the number of equality signs.
• Let θ ≡ β_1 − β_2.
• The null is equivalent to θ = 0.
• So it is still a null with one parameter. We can conduct either a t-test or a Wald test. The question is: what's the standard error of θ̂ ≡ β̂_1 − β̂_2?

(slides 25-26)
III. Inference about functions of coefficients
Another example is testing the Cobb-Douglas production function (see the sketch after this slide pair).
• Suppose the output of firm i is determined by Y_i = A_i K_i^α L_i^β. Microeconomics tells us that α + β = 1. We can test it by the following steps:
  Step 1: Take logs on both sides of the model:
    log Y_i = α log K_i + β log L_i + u_i
  where u_i ≡ log A_i, which is usually hard to quantify or unobserved in a dataset.
  Step 2: Assume technology is uncorrelated with labor and capital (this is actually not likely). Estimate α and β by OLS.
  Step 3: Test H_0: α + β − 1 = 0 vs. H_1: α + β − 1 ≠ 0.
• This is again a single-parameter hypothesis, not a joint one.

III. Inference about functions of coefficients
There are two approaches to handle such hypotheses.
• Approach 1: Direct testing using software.
• Approach 2: Transform the regression.

(slides 27-28)
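A minimal R sketch of the Cobb-Douglas test above, using the transform-the-regression idea on simulated firms (all numbers are illustrative; the true α + β equals 1 here):

  set.seed(1)
  n    <- 300
  logK <- rnorm(n); logL <- rnorm(n)
  logY <- 0.4 * logK + 0.6 * logL + rnorm(n, sd = 0.2)

  # Rewrite logY = (alpha + beta - 1)*logK + beta*(logL - logK) + logK + u,
  # so the coefficient on logK in the transformed regression is alpha + beta - 1.
  ynew <- logY - logK
  Z    <- logL - logK
  fit  <- lm(ynew ~ logK + Z)
  summary(fit)$coefficients["logK", ]   # t-test of H0: alpha + beta - 1 = 0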

II. Inference about multiple coefficients
Approach 1. Direct test.
  TestScore-hat = 686.0 − 1.10×STR − 0.650×PctEL
                  (8.7)   (0.43)     (0.031)
Looking at the coefficients, we may want to test whether β_1 = 2β_2. [R output not reproduced here.]

Approach 1. Direct test.
Or maybe we want to check whether β_1 + β_2 = −2. [R output not reproduced here.]

Approach 1. Direct test.
We can even test two functions jointly using this command, e.g. H_0: β_1 + β_2 = −2 and β_1 = 2β_2. [R output not reproduced here.]

Approach 1. Direct test.
But this is not very interesting, because it boils down to testing H_0: β_1 = −4/3 and β_2 = −2/3.

(slides 29-32)
III. Inference about functions of coefficients

Approach 2. Transform the regression.
• Consider the example TestScoreᵢ = β₀ + β₁strᵢ + β₂el_pctᵢ + uᵢ
• Consider the null β₁ = 2β₂, i.e., β₁ − 2β₂ = 0.
  Step 1: Replace β₁ with β₁ − 2β₂ and see what happens:
      TestScoreᵢ = β₀ + β₁strᵢ + β₂el_pctᵢ + uᵢ
                 = β₀ + (β₁ − 2β₂)strᵢ + β₂(2strᵢ + el_pctᵢ) + uᵢ
  Step 2: Construct a new regressor Wᵢ ≡ 2strᵢ + el_pctᵢ and run a regression of test score on str and W:
      TestScoreᵢ = β₀ + (β₁ − 2β₂)strᵢ + β₂Wᵢ + uᵢ
  Step 3: Look at the t-stat for the coefficient on str. (The slide shows the R output; a sketch follows below.)
• You can verify that 0.4519² = 0.2042, exactly equal to the Wald statistic on page 29.
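A sketch of Approach 2 for this null in R, again using the illustrative caschool data frame:

    # Approach 2 for H0: beta1 = 2*beta2, i.e. beta1 - 2*beta2 = 0
    library(lmtest)
    library(sandwich)

    caschool$W <- 2 * caschool$str + caschool$el_pct    # new regressor W = 2*str + el_pct
    fit_t <- lm(testscr ~ str + W, data = caschool)

    # In this regression the coefficient on str is beta1 - 2*beta2,
    # so its robust t-statistic tests the null directly.
    coeftest(fit_t, vcov. = vcovHC(fit_t, type = "HC1"))["str", ]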
III. Inference about functions of coefficients

Approach 2. Transform the regression.
• Consider the example TestScoreᵢ = β₀ + β₁strᵢ + β₂el_pctᵢ + uᵢ
• Consider the null β₁ + β₂ = −2, i.e., β₁ + β₂ + 2 = 0.
  Step 1: Replace β₁ with β₁ + β₂ + 2 and see what happens:
      TestScoreᵢ = β₀ + β₁strᵢ + β₂el_pctᵢ + uᵢ
                 = β₀ + (β₁ + β₂ + 2)strᵢ + β₂(el_pctᵢ − strᵢ) − 2strᵢ + uᵢ
  We cannot keep −2strᵢ on the right-hand side due to perfect multicollinearity.
  Step 2: Construct a new regressor Wᵢ ≡ el_pctᵢ − strᵢ and a new dependent variable Yᵢ = TestScoreᵢ + 2strᵢ, and run a regression of Y on str and W:
      Yᵢ = β₀ + (β₁ + β₂ + 2)strᵢ + β₂Wᵢ + uᵢ
  Step 3: Look at the t-stat for the coefficient on str. (The slide shows the R output; a sketch follows below.)
• You can verify that 0.5746² = 0.33017, almost equal to the Wald statistic on page 30.
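And a sketch of the same transformation in R (illustrative names as before):

    # Approach 2 for H0: beta1 + beta2 = -2, i.e. beta1 + beta2 + 2 = 0
    library(lmtest)
    library(sandwich)

    caschool$W2   <- caschool$el_pct - caschool$str        # new regressor
    caschool$Ynew <- caschool$testscr + 2 * caschool$str   # new dependent variable

    fit_t2 <- lm(Ynew ~ str + W2, data = caschool)

    # The coefficient on str is now beta1 + beta2 + 2; its robust t-statistic tests the null.
    coeftest(fit_t2, vcov. = vcovHC(fit_t2, type = "HC1"))["str", ]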
III. Inference about functions of coefficients

Some final remarks.
• For Approach 2, the transformation is usually not unique. For instance, for the null β₁ − 2β₂ = 0, you can also replace β₂ with β₁ − 2β₂ and see what happens.
• We only focus on linear functions of the coefficients. So for both Approach 1 and Approach 2, if a joint null contains more functions than the coefficients involved, it is either automatically rejected or redundant relative to the case where the number of functions equals the number of coefficients involved.
  • Example 1: H₀: β₁ − 2β₂ = 0 and β₁ + β₂ + 2 = 0 and 2β₁ + β₂ + 5 = 0. Not possible, because there do not exist β₁ and β₂ that make all three equations hold.
  • Example 2: H₀: β₁ − 2β₂ = 0 and β₁ + β₂ + 2 = 0 and 2β₁ − β₂ + 2 = 0. All three can hold, but one is redundant, i.e., it is implied by the other two.

Econ 3334
Module 6
Linear Regression with Multiple Regressors: Estimation
Department of Economics, HKUST
Instructor: Junlong Feng
Fall 2022
Menu of Module 6

I. Why multiple regressors?
II. Omitted variable bias (OVB)
III. Linear regression model
IV. OLS estimator
V. Assumptions
VI. Prediction

I. Why multiple regressors?

Consider a linear regression model with a single regressor:
    Yᵢ = β₀ + β₁Xᵢ + uᵢ
For β₁ to be the ATE on Y of increasing X by one unit, and for β̂₁ →ᵖ β₁, one key assumption is E[uᵢ | Xᵢ] = 0.
• Necessarily, it has to be the case that Cov(uᵢ, Xᵢ) = 0.
• Example: Y = income, X = years of schooling. One concern is that uᵢ captures ability.
• Ability is not observable; we can do nothing about it anyways.
• But what about other variables such as family background?
• Suppose parents' education level also affects children's income. So parents' education is in u.
• Meanwhile, parents' education is correlated with children's education. So Cov(X, u) ≠ 0.
• Unlike ability, suppose you can observe parents' education in data. What should you do?
I. Why multiple regressors?

Discarding the information of parents' education is harmful.
• Mathematically, let W denote parents' education level.
• If we ignore W, W enters the error term. Here's why.
• Suppose the true model is
    Yᵢ = β₀ + β₁Xᵢ + β₂Wᵢ + uᵢ, with E[uᵢ | Xᵢ] = 0.
• Ignoring Wᵢ, let β₂Wᵢ + uᵢ = vᵢ. Then
    Yᵢ = β₀ + β₁Xᵢ + vᵢ
• But for this model, E[vᵢ | Xᵢ] = β₂E[Wᵢ | Xᵢ] + E[uᵢ | Xᵢ] = β₂E[Wᵢ | Xᵢ]
• If β₂ ≠ 0 and E[Wᵢ | Xᵢ] ≠ 0, unconfoundedness of X fails.

II. Omitted variable bias

In this case, what is the OLS estimator for a single-regressor model estimating? Recall that

    β̂₁ = Σᵢ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ(Xᵢ − X̄)²  ≈  [(1/n)Σᵢ(Xᵢ − μ_X)(Yᵢ − μ_Y)] / [(1/n)Σᵢ(Xᵢ − μ_X)²]  →ᵖ  Cov(X, Y)/σ²_X

• Cov(X, Y) = Cov(X, β₀ + β₁X + v) = β₁σ²_X + Cov(X, v)
• Recall v = β₂W + u and E[u | X] = 0. Then
    Cov(X, v) = Cov(X, β₂W + u) = β₂Cov(X, W)
• Therefore, β̂₁ →ᵖ β₁ + β₂·Cov(X, W)/σ²_X = β₁ + β₂·ρ_XW·(σ_W/σ_X)
• Equation (6.1) in the textbook (p. 214) is a more general version.
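A quick simulation can illustrate this formula; the data-generating numbers below are purely illustrative and not from the slides:

    # Checking plim(beta1_hat) = beta1 + beta2 * rho_XW * sigma_W / sigma_X by simulation
    set.seed(1)
    n <- 1e6                           # large n, so the estimate is close to its probability limit
    W <- rnorm(n)                      # the omitted variable
    X <- 0.5 * W + rnorm(n)            # X is correlated with W
    u <- rnorm(n)
    Y <- 1 + 2 * X + 3 * W + u         # true model: beta1 = 2, beta2 = 3

    coef(lm(Y ~ X))["X"]               # short regression that omits W

    2 + 3 * cor(X, W) * sd(W) / sd(X)  # beta1 + beta2 * rho * sd(W)/sd(X): should be very close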
II. Omitted variable bias

OVB when only ONE regressor (W) is omitted: β₂·ρ_XW·(σ_W/σ_X)
• OVB is positive if the omitted variable has a positive (negative) effect on Y AND the omitted variable and the variable of interest (X) are positively (negatively) correlated.
• OVB is negative if the omitted variable has a positive (negative) effect on Y AND the omitted variable and the variable of interest (X) are negatively (positively) correlated.
• No OVB if X and W are NOT correlated, or W has no effect on Y.

We can easily generalize OVB to the case where you have omitted multiple regressors:
• Suppose the true model is Y = β₀ + X₁β₁ + X₂β₂ + ⋯ + Xₖβₖ + u
• If you omit all of X₂, …, Xₖ: let v = X₂β₂ + ⋯ + Xₖβₖ + u, with E[u | X₁] = 0, and Y = β₀ + X₁β₁ + v
• Then again,
    β̂₁ →ᵖ β₁ + Cov(X₁, v)/σ²_X₁ = β₁ + Cov(X₁, X₂β₂ + ⋯ + Xₖβₖ)/σ²_X₁
         = β₁ + β₂·ρ_X₁X₂·(σ_X₂/σ_X₁) + β₃·ρ_X₁X₃·(σ_X₃/σ_X₁) + ⋯ + βₖ·ρ_X₁Xₖ·(σ_Xₖ/σ_X₁)
• OVB = β₂·ρ_X₁X₂·(σ_X₂/σ_X₁) + β₃·ρ_X₁X₃·(σ_X₃/σ_X₁) + ⋯ + βₖ·ρ_X₁Xₖ·(σ_Xₖ/σ_X₁)
• The sign is hard to determine except in some special cases.
III. Linear regression model

Yᵢ = β₀ + X₁ᵢβ₁ + ⋯ + Xₖᵢβₖ + uᵢ. Under E[uᵢ | X₁ᵢ, …, Xₖᵢ] = 0,
• E[Yᵢ | X₁ᵢ, …, Xₖᵢ] = β₀ + X₁ᵢβ₁ + ⋯ + Xₖᵢβₖ. This is called the population regression line or function.
• β₀ is still called the intercept.
• β₁ is the slope coefficient of X₁ᵢ, or simply, the coefficient on X₁ᵢ.
• β₂, …, βₖ have similar names.
• ATE of X₁ changing from a to b on Y while keeping all other regressors fixed:
    E[Yᵢ | X₁ᵢ = b, X₂ᵢ, …, Xₖᵢ] − E[Yᵢ | X₁ᵢ = a, X₂ᵢ, …, Xₖᵢ] = β₁(b − a)
• β₁ is the marginal average treatment (or causal) effect (or partial effect) of X₁ᵢ on Yᵢ holding all other regressors constant, or controlling for other regressors.
• β₂, …, βₖ have similar interpretations.

IV. OLS estimator

How do we estimate β₀, β₁, …, βₖ?
• The intuition is the same as in the single-regressor case.
• Suppose the unobservable (uᵢ) is zero for all i:
    Y₁ − β₀ − X₁₁β₁ − X₂₁β₂ − ⋯ − Xₖ₁βₖ = 0
    Y₂ − β₀ − X₁₂β₁ − X₂₂β₂ − ⋯ − Xₖ₂βₖ = 0
    ⋯⋯
    Yₙ − β₀ − X₁ₙβ₁ − X₂ₙβ₂ − ⋯ − Xₖₙβₖ = 0
• n equations, (k + 1) unknowns.
• In general (when uᵢ ≠ 0), there are more equations than unknowns ⇒ no solution exists.
• In big data, it can be the case that k + 1 > n. Then there are infinitely many solutions, the analysis is completely different, and new techniques are needed. This will be covered in Module 8.
IV. OLS estimator

Instead of forcing the n equations to hold, find b₀, b₁, …, bₖ to minimize the Euclidean distance between the n equations and the zeros:

    min over b₀, b₁, …, bₖ of  √( Σᵢ₌₁ⁿ (Yᵢ − b₀ − X₁ᵢb₁ − ⋯ − Xₖᵢbₖ)² )

Or equivalently,

    min over b₀, b₁, …, bₖ of  Σᵢ₌₁ⁿ (Yᵢ − b₀ − X₁ᵢb₁ − ⋯ − Xₖᵢbₖ)²

• The minimizer of this problem is called the OLS estimator of β₀, β₁, …, βₖ.
• Denote them by β̂₀, β̂₁, …, β̂ₖ.

V. Assumptions

Again, similar to the single-variable regression, we need assumptions to bridge the following two gaps:
• When does a linear regression model represent a conditional expectation with slope coefficients equal to the marginal ATE holding other regressors constant?
• When does the OLS estimator consistently estimate these coefficients?
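To make the minimization concrete, here is a small illustrative check (not from the slides) that lm() returns exactly the minimizer of the sum of squared residuals; optim() is used only to make the minimization explicit:

    # lm() vs. explicit numerical minimization of the sum of squared residuals
    set.seed(2)
    n  <- 200
    X1 <- rnorm(n); X2 <- rnorm(n)
    Y  <- 1 + 2 * X1 - X2 + rnorm(n)

    ssr <- function(b) sum((Y - b[1] - b[2] * X1 - b[3] * X2)^2)
    opt <- optim(c(0, 0, 0), ssr, method = "BFGS")

    rbind(optim = opt$par,
          lm    = coef(lm(Y ~ X1 + X2)))   # the two rows agree up to numerical error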
V. Assumptions

Linear regression model: Yᵢ = β₀ + X₁ᵢβ₁ + ⋯ + Xₖᵢβₖ + uᵢ
• Assumption 1 (Unconfoundedness). E[uᵢ | X₁ᵢ, …, Xₖᵢ] = 0.
• Assumption 2 (i.i.d.). (Yᵢ, X₁ᵢ, …, Xₖᵢ), i = 1, …, n, are i.i.d.
• Assumption 3. Large outliers are unlikely for all regressors and the dependent variable.
• Assumption 4. There is no perfect multicollinearity.
  • Perfect multicollinearity: one regressor is a linear combination of the others.
  • For instance, X₁ᵢ = X₂ᵢ + 2X₃ᵢ.

Under Assumptions 1-4,
• β̂₁, …, β̂ₖ are unbiased for β₁, …, βₖ.
• β̂₁, …, β̂ₖ are consistent for β₁, …, βₖ.
• For any j = 1, …, k, β̂ⱼ is approximately distributed as N(βⱼ, σ²_β̂ⱼ) for some σ²_β̂ⱼ.
• Moreover, (β̂₁, …, β̂ₖ) is approximately jointly normally distributed with mean (β₁, …, βₖ) and some covariance structure.
V. Assumptions: 1. Unconfoundedness

Unconfoundedness: E[uᵢ | X₁ᵢ, …, Xₖᵢ] = 0
• We say X₁, X₂, …, Xₖ are exogenous if unconfoundedness holds.
• It's more demanding than in the single-regressor model.
• Exercise: show that E[uᵢ | X₁ᵢ, …, Xₖᵢ] = 0 ⇒ E[uᵢ | X₁ᵢ] = 0.
  • Proof. E[uᵢ | X₁ᵢ] = E[ E[uᵢ | X₁ᵢ, …, Xₖᵢ] | X₁ᵢ ] = 0.
• Therefore, if E[uᵢ | X₁ᵢ, …, Xₖᵢ] = 0, then E[uᵢ | X₁ᵢ] = E[uᵢ | X₂ᵢ] = ⋯ = E[uᵢ | Xₖᵢ] = 0.

• However, this is NOT to say that we can omit X₂ᵢ, …, Xₖᵢ if we only care about β₁.
• Under the true model Yᵢ = β₀ + X₁ᵢβ₁ + ⋯ + Xₖᵢβₖ + uᵢ, if you omit X₂ᵢβ₂ + ⋯ + Xₖᵢβₖ, then not only does E[uᵢ | X₁ᵢ] = 0 matter; whether E[X₂ᵢ | X₁ᵢ] = 0, E[X₃ᵢ | X₁ᵢ] = 0, … will matter too.
• Now we have a tough tradeoff:
  • We include X₂, X₃, …, Xₖ in order to fight OVB for X₁.
  • But all of these other variables need to be exogenous as well.
• Example: We include parents' education in the regression of children's income on children's education, but is parents' education itself exogenous?
V. Assumptions: 1. Unconfoundedness

• There is one special case where the other regressors are not exogenous but the coefficient on the regressor of interest is still causal.
• For simplicity, suppose there are only two regressors, X and W:
    Yᵢ = β₀ + Xᵢβ₁ + Wᵢβ₂ + uᵢ
• Suppose we only care about the causal effect of Xᵢ, i.e., β₁.
• Wᵢ is only included to eliminate the OVB.
• However, suppose Cov(Wᵢ, uᵢ) ≠ 0.
  • Unconfoundedness fails.
• A weaker condition: E[uᵢ | Xᵢ, Wᵢ] = E[uᵢ | Wᵢ]
  • This is called conditional mean independence.

Yᵢ = β₀ + Xᵢβ₁ + Wᵢβ₂ + uᵢ
• Conditional mean independence: E[uᵢ | Xᵢ, Wᵢ] = E[uᵢ | Wᵢ]. Under this condition:
• Claim 1: β₁ = E[Yᵢ | Xᵢ = a + 1, Wᵢ] − E[Yᵢ | Xᵢ = a, Wᵢ]
  Proof.
    E[Yᵢ | Xᵢ = a + 1, Wᵢ] = β₀ + (a + 1)β₁ + Wᵢβ₂ + E[uᵢ | Wᵢ]
    E[Yᵢ | Xᵢ = a, Wᵢ] = β₀ + aβ₁ + Wᵢβ₂ + E[uᵢ | Wᵢ]
  Take the difference and we are done.
• So β₁ is causal.
V. Assumptions: 1. Unconfoundedness

Yᵢ = β₀ + Xᵢβ₁ + Wᵢβ₂ + uᵢ
• Conditional mean independence: E[uᵢ | Xᵢ, Wᵢ] = E[uᵢ | Wᵢ]. Under this condition:
• Claim 2: β₂ ≠ E[Yᵢ | Xᵢ, Wᵢ = a + 1] − E[Yᵢ | Xᵢ, Wᵢ = a]
  Proof.
    E[Yᵢ | Xᵢ, Wᵢ = a + 1] = β₀ + Xᵢβ₁ + (a + 1)β₂ + E[uᵢ | Wᵢ = a + 1]
    E[Yᵢ | Xᵢ, Wᵢ = a] = β₀ + Xᵢβ₁ + aβ₂ + E[uᵢ | Wᵢ = a]
  Take the difference:
    E[Yᵢ | Xᵢ, Wᵢ = a + 1] − E[Yᵢ | Xᵢ, Wᵢ = a] = β₂ + E[uᵢ | Wᵢ = a + 1] − E[uᵢ | Wᵢ = a]
• So β₂ is not causal.

Yᵢ = β₀ + Xᵢβ₁ + Wᵢβ₂ + uᵢ
• Conditional mean independence: E[uᵢ | Xᵢ, Wᵢ] = E[uᵢ | Wᵢ]. Under this condition:
• Although β₁ still captures the marginal causal effect of X on Y, it in general cannot be consistently estimated by OLS!!!
• This is the first time we see discrepant consequences of an assumption in terms of the causal effect and the consistency of OLS.
• There are a few exceptions. SW discusses one exception in a way as if it were very general. It is NOT!
V. Assumptions: 1. Unconfoundedness

Yᵢ = β₀ + Xᵢβ₁ + Wᵢβ₂ + uᵢ
• Conditional mean independence: E[uᵢ | Xᵢ, Wᵢ] = E[uᵢ | Wᵢ]. Under this condition:
• Claim 3: β̂₁ →ᵖ β₁ if E[uᵢ | Wᵢ] is linear in Wᵢ.
  Proof. Suppose E[uᵢ | Wᵢ] = γ₀ + γ₁Wᵢ. Let εᵢ = uᵢ − E[uᵢ | Wᵢ]. Then
    E[εᵢ | Xᵢ, Wᵢ] = E[uᵢ | Xᵢ, Wᵢ] − E[ E[uᵢ | Wᵢ] | Xᵢ, Wᵢ ] = E[uᵢ | Wᵢ] − E[uᵢ | Wᵢ] = 0
  Note that the regression model is equivalent to
    Yᵢ = β₀ + Xᵢβ₁ + Wᵢβ₂ + uᵢ
       = β₀ + Xᵢβ₁ + Wᵢβ₂ + uᵢ − E[uᵢ | Wᵢ] + E[uᵢ | Wᵢ]
       = (γ₀ + β₀) + Xᵢβ₁ + Wᵢ(γ₁ + β₂) + εᵢ
  Unconfoundedness holds, so the OLS estimator of β₁ is consistent.

A simulated example.
• Draw W from N(0,1) with n = 200.
• Generate X = W³ + N(0,1).
• Draw ε from N(0,1) independently of X and W.
• Construct uᵢ = 1 + Wᵢ + εᵢ.
• True model: Yᵢ = 1 + Xᵢ + Wᵢ + uᵢ. (The slide shows the resulting OLS output; a sketch follows below.)
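Here is a minimal sketch of that simulation; the seed and the comments are mine, not from the slides:

    # Simulated example: E[u | W] = 1 + W is linear in W, so the coefficient on X is
    # consistently estimated even though W is not exogenous.
    set.seed(3)
    n <- 200
    W <- rnorm(n)
    X <- W^3 + rnorm(n)
    e <- rnorm(n)
    u <- 1 + W + e
    Y <- 1 + X + W + u                # true causal coefficients: 1 on X and 1 on W

    coef(lm(Y ~ X + W))               # coefficient on X is close to 1;
                                      # coefficient on W is close to 2 (= beta2 + gamma1), not 1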
V. Assumptions: 1. Unconfoundedness

A simulated example when E[uᵢ | Wᵢ] is nonlinear.
• Draw W from N(0,1) with n = 200.
• Generate X = W³ + N(0,1).
• Draw ε from N(0,1) independently of X and W.
• Construct uᵢ = exp(Wᵢ) + log|Wᵢ| + Wᵢ³ + N(0,1).
• True model: Yᵢ = 1 + Xᵢ + Wᵢ + uᵢ. (See the sketch after the next slide.)

V. Assumptions: 2&3. i.i.d. and no large outliers

Same as in the single-regressor regression.
• Cross-sectional data: usually i.i.d.
• Bounded data: no large outliers.
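Continuing the previous sketch with the nonlinear construction of u described above (again, the seed and comments are illustrative):

    # Now E[u | W] = exp(W) + log|W| + W^3 is nonlinear in W. Conditional mean independence
    # still holds, but the linear regression no longer recovers the causal coefficient on X.
    set.seed(3)
    n <- 200
    W <- rnorm(n)
    X <- W^3 + rnorm(n)
    u <- exp(W) + log(abs(W)) + W^3 + rnorm(n)
    Y <- 1 + X + W + u

    coef(lm(Y ~ X + W))    # the coefficient on X is typically far from its causal value of 1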
V. Assumptions: 4. No perfect multicollinearity

Again, consider a two-regressor model:
    Yᵢ = β₀ + β₁Xᵢ + β₂Wᵢ + uᵢ
• Suppose Assumptions 1-3 hold.
• However, suppose Xᵢ = 2Wᵢ (just an example; not necessarily 2).
• Then the model can be written as
    Yᵢ = β₀ + 0·Xᵢ + (2β₁ + β₂)Wᵢ + uᵢ
       = β₀ + 0.5β₁Xᵢ + (β₁ + β₂)Wᵢ + uᵢ
       = …
• There are infinitely many representations, all with different coefficients on the two regressors.
• The OLS estimator has no idea how to allocate the total effect among these regressors.

A frequently made mistake: the dummy variable trap.
• Suppose you want to estimate the gender income gap.
• Let Fᵢ ∈ {0,1} represent whether individual i is female. F = 1 for female.
• Let Mᵢ ∈ {0,1} represent whether individual i is male. M = 1 for male.
• What's the problem with the following regression?
    Yᵢ = β₀ + β₁Fᵢ + β₂Mᵢ + uᵢ
• Recall β₀ can be viewed as β₀·1. The number 1 is a constant regressor.
• For simplicity, suppose there are only two genders. Then Fᵢ + Mᵢ = 1, i.e., Fᵢ is a linear combination of the constant and Mᵢ.
• Solution: drop one of the three.
V. Assumptions: 4. No perfect multicollinearity

Yᵢ = β₁Fᵢ + β₂Mᵢ + uᵢ
• β₁: mean wage of females.
• β₂: mean wage of males.
• β₁ − β₂: wage gap.
• To exclude the intercept in R, write lm(income~M+F+0) or lm(income~M+F-1).

Yᵢ = γ₀ + γ₁Fᵢ + uᵢ
• γ₀: mean wage of males.
• γ₁: wage gap.
• γ₀ + γ₁: mean wage of females.

Yᵢ = δ₀ + δ₁Mᵢ + uᵢ
• Exercise 1: in this parameterization, what are the mean wages of females and males and the wage gap?
• Exercise 2: Consider a different application. For example, what should you do if you want to estimate seasonal differences in GDP?
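Here is a sketch with simulated data (the data-generating numbers are illustrative) showing the dummy variable trap and the three parameterizations; female and male below play the roles of F and M, and the no-intercept call mirrors the lm(income~M+F+0) command quoted above:

    # Dummy variable trap and three equivalent parameterizations
    set.seed(4)
    n      <- 500
    female <- rbinom(n, 1, 0.5)          # female dummy (F on the slide)
    male   <- 1 - female                 # male dummy (M on the slide)
    income <- 10 + 2 * female + rnorm(n) # illustrative gap of 2 in favor of females

    coef(lm(income ~ female + male))     # trap: female + male equals the constant, one coef is NA
    coef(lm(income ~ male + female + 0)) # no intercept: the coefficients are the two group means
    coef(lm(income ~ female))            # intercept = male mean, slope = wage gap
    coef(lm(income ~ male))              # intercept = female mean, slope = minus the wage gap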
V. Assumptions: 4. No perfect multicollinearity

Although perfect multicollinearity is a serious problem in theory, it is not a huge concern in practice; the software can detect it and automatically drop one regressor. (The slide shows R output illustrating this; a sketch follows below.)

A more worrisome problem in practice is imperfect multicollinearity, i.e., regressors that are highly, but not perfectly, correlated.
• Unlike perfect multicollinearity, imperfect multicollinearity does not lead to any theoretical problems.
• But it makes estimation less precise; OLS has difficulty distinguishing the regressors.
• Recall the single-regressor model: the variance of β̂₁ increases if the variance of X is small.
  • That is because X is more like a constant.
• Here the intuition is similar. If |ρ_X₁X₂| is large, X₁ and X₂ are similar.
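A sketch of how R handles the exact-collinearity case (the X = 2W example from the earlier slide; numbers are illustrative):

    # Perfect multicollinearity: lm() detects it and drops one regressor (reported as NA)
    set.seed(5)
    n <- 200
    W <- rnorm(n)
    X <- 2 * W                    # X is an exact linear function of W
    Y <- 1 + X + W + rnorm(n)

    coef(lm(Y ~ X + W))           # the coefficient on W comes back as NA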
V. Assumptions: 4. No perfect multicollinearity

A simulated example.
• Draw X₁ and X₂ from a bivariate normal distribution with μ_X₁ = μ_X₂ = 0, σ_X₁ = σ_X₂ = 1, and ρ_X₁X₂ = 0.2, 0.99, or 1. Sample size n = 200.
• The error term u is drawn from N(0,1) independently of X₁ and X₂.
• True model: Yᵢ = 1 + X₁ᵢ + X₂ᵢ + uᵢ
• When ρ = 0.2: (the slide shows the corresponding regression output)
• When ρ = 0.99: (the slide shows the corresponding regression output; a sketch of the simulation follows below)
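A minimal sketch of that simulation; MASS::mvrnorm is one way to draw the correlated regressors, and the wrapper function is mine:

    # Imperfect multicollinearity: compare standard errors at rho = 0.2 and rho = 0.99
    library(MASS)                          # mvrnorm() for bivariate normal draws
    set.seed(6)

    run_once <- function(rho, n = 200) {
      XX <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2, 2))
      X1 <- XX[, 1]; X2 <- XX[, 2]
      Y  <- 1 + X1 + X2 + rnorm(n)
      summary(lm(Y ~ X1 + X2))$coefficients[, "Std. Error"]
    }

    run_once(0.2)     # standard errors on X1 and X2 are modest
    run_once(0.99)    # standard errors on X1 and X2 are several times larger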
V. Assumptions: 4. No perfect multicollinearity

A simulated example (continued).
• Same setup as before: X₁ and X₂ bivariate normal with means 0, standard deviations 1, and correlation ρ; u drawn from N(0,1) independently of X₁ and X₂; true model Yᵢ = 1 + X₁ᵢ + X₂ᵢ + uᵢ; n = 200.
• When ρ = 1: (the slide shows the corresponding regression output)

Why is imperfect multicollinearity not a theoretical problem?
• σ²_β̂₁ → 0 as n grows.
• Intuition: when you have enough sample points, an arbitrarily small difference between X₁ and X₂ can be detected, as long as they are different (ρ ≠ ±1).
• Repeat the simulation experiment with ρ = 0.99 and n = 20000: (the slide shows the output)
V. Assumptions: 4. No perfect multicollinearity

But perfect multicollinearity is always a theoretical problem.
• No matter how large n is, there is no way to tell X₁ and X₂ apart.
• Repeat the simulation experiment with ρ = 1 and n = 20000: (the slide shows the output)

VI. Prediction: Adjusted R²

After you get the OLS estimators:
• Predicted (or fitted) value: Ŷᵢ ≡ β̂₀ + X₁ᵢβ̂₁ + ⋯ + Xₖᵢβ̂ₖ
• Residual: ûᵢ ≡ Yᵢ − Ŷᵢ
• SSR and TSS: same definitions as in the single-regressor case.
• R² ≡ 1 − SSR/TSS
  • R² increases mechanically as k increases, so it does not truthfully reflect whether the model fits the data well.
• Adjusted R²: R̄² ≡ 1 − [(n − 1)/(n − k − 1)]·(SSR/TSS)
  • The more regressors you throw into a regression model, the smaller n − k − 1 is.
• No matter whether it's R² or R̄², it's not very relevant to causal inference.
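As a small sketch (simulated data, illustrative only), both quantities can be computed by hand from the residuals and checked against what summary() reports:

    # R^2 and adjusted R^2 by hand, checked against summary()
    set.seed(7)
    n  <- 200
    X1 <- rnorm(n); X2 <- rnorm(n)
    Y  <- 1 + X1 + X2 + rnorm(n)

    fit <- lm(Y ~ X1 + X2)
    k   <- 2                                   # number of regressors, excluding the intercept

    SSR <- sum(resid(fit)^2)
    TSS <- sum((Y - mean(Y))^2)

    c(R2     = 1 - SSR / TSS,
      R2_adj = 1 - (n - 1) / (n - k - 1) * SSR / TSS)

    c(summary(fit)$r.squared, summary(fit)$adj.r.squared)   # should match the line above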