Checking the Instrumental Variables Assumptions

FOR IV TO DO WHAT WE WANT IT TO, WE ARE RELYING ON THE RELEVANCE ASSUMPTION.
Remember, the relevance assumption is the assumption that the instrumental
variable Z and the treatment/endogenous variable X are related to each other.

We can go a bit further and say that we need to assume that X and Z are strongly enough
related to each other that we don’t run into the “weak instruments problem.”

A weak instrument is one that is valid and does predict the treatment variable, but only
a little bit. It predicts weakly. Keep in mind our general intuition that IV gives us
Cov(Z,Y)/Cov(Z,X), where Y is the outcome. If Cov(Z,X) is small, we’re nearing a
divide-by-zero problem! The estimate as a whole balloons up really big (since you’re
dividing by something tiny) and the sampling variation gets huge.
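
To make the divide-by-zero intuition concrete, here is a small simulation sketch. It is not from the original text: the data-generating process, the coefficient values, the sample sizes, and the function names are all assumptions chosen for illustration. It computes the simple IV ratio Cov(Z,Y)/Cov(Z,X) many times, once with a strong first stage and once with a weak one, and compares how spread out the estimates are.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def iv_estimates(first_stage_coef, n_obs=200, n_reps=300, true_effect=1.0, seed=0):
    """Simulate the ratio Cov(Z,Y)/Cov(Z,X) across many samples.

    Hypothetical setup: u is an unobserved confounder that affects both
    X and Y, and Z is a valid instrument whose strength is set by
    first_stage_coef.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        z = [rng.gauss(0, 1) for _ in range(n_obs)]
        u = [rng.gauss(0, 1) for _ in range(n_obs)]
        x = [first_stage_coef * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
        y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
        estimates.append(cov(z, y) / cov(z, x))
    return estimates


def spread(values):
    """Interquartile range, used because weak-IV estimates have extreme outliers."""
    ordered = sorted(values)
    return ordered[(3 * len(ordered)) // 4] - ordered[len(ordered) // 4]


strong = iv_estimates(first_stage_coef=1.0)
weak = iv_estimates(first_stage_coef=0.05)
print("spread with a strong instrument:", round(spread(strong), 3))
print("spread with a weak instrument:  ", round(spread(weak), 3))
```

Under these assumptions the weak-instrument estimates should be far more dispersed than the strong-instrument ones, which is the "dividing by something tiny" problem showing up in simulated data.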

By far the most common way to check for relevance is the first-stage F-statistic test. This is
sometimes referred to as an under-identification test. Conveniently, it’s also the easiest. All
you have to do is:

1. Estimate the first stage of the model (regress the treatment/endogenous variable on
the controls and instruments)
2. Do a joint F test on the instruments
3. Get the F statistic from that joint F-test and use it to decide if the instrument is
relevant

That’s it! The calculation gets a bit trickier if there’s more than one treatment/endogenous
variable (since there’s not really a single first stage in the same way) but the idea remains the
same.
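
The three steps above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: it assumes NumPy is available, uses the classical (homoskedastic) F-statistic that compares the first stage with and without the instruments, and runs on simulated data. The function name `first_stage_f` and all the simulated coefficients are hypothetical.

```python
import numpy as np


def first_stage_f(x, instruments, controls=None):
    """Joint F-statistic for the instruments in the first-stage regression.

    Compares the residual sum of squares of the first stage without the
    instruments (restricted) and with them (full). Assumes homoskedastic
    errors and a full-rank design matrix.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    z = np.asarray(instruments, dtype=float).reshape(n, -1)
    restricted = np.ones((n, 1))
    if controls is not None:
        w = np.asarray(controls, dtype=float).reshape(n, -1)
        restricted = np.hstack([restricted, w])
    full = np.hstack([restricted, z])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, x, rcond=None)
        resid = x - design @ coef
        return float(resid @ resid)

    n_instruments = z.shape[1]
    df_resid = n - full.shape[1]
    rss_restricted, rss_full = rss(restricted), rss(full)
    return ((rss_restricted - rss_full) / n_instruments) / (rss_full / df_resid)


# Simulated example: one control, three instruments.
rng = np.random.default_rng(0)
n = 500
w = rng.normal(size=n)
z = rng.normal(size=(n, 3))
x_relevant = 0.5 * w + z @ np.array([0.5, 0.4, 0.3]) + rng.normal(size=n)
x_irrelevant = 0.5 * w + rng.normal(size=n)  # instruments do not enter at all

f_relevant = first_stage_f(x_relevant, z, controls=w)
f_irrelevant = first_stage_f(x_irrelevant, z, controls=w)
print("F with relevant instruments:  ", round(f_relevant, 2))
print("F with irrelevant instruments:", round(f_irrelevant, 2))
```

In applied work you would more likely read this statistic off your regression software's first-stage output, and a heteroskedasticity-robust version may be more appropriate than the classical one sketched here.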

WE’VE GOT OUR FIRST-STAGE F-STATISTIC NOW. HOW BIG DOES IT NEED TO BE? Checking if
the instrument is statistically significant is not nearly enough - we aren’t just concerned that
the relationship is zero, we’re concerned that it’s small.

Since there’s no single precise definition of “small,” there’s no single correct cutoff F-statistic
to look for. Instead, we have a tradeoff. The bigger your F-statistic, the less bias you get. So,
the F-statistic you want will be based on how much bias you’re willing to accept.

Weak instruments lead to bias because, even if the instrument is truly valid, in an actual
sample of data the instrument will have a nonzero relationship with the error term just by
random chance, worsening validity and giving you bias. The weaker the instrument is, the
worse this gets.

We can frame our choice of cutoff F-statistic as a response to this tradeoff. Stock and Yogo
(2005) calculate the bias that you get with instrumental variables relative to the bias you’d
get by just running OLS on its own. The stronger the instrument is, the smaller the IV bias
will be relative to the OLS bias. Their tables will show you that, for example, if you have one
treatment/endogenous variable and three instruments, you need an F-statistic above 13.91 to
reduce IV bias to less than 5% of OLS bias in 2SLS, but only an F-statistic above 9.08 to
reduce IV bias to less than 10% of OLS bias. What if you have four instruments? Then you
need 16.85 or 10.27. Some first-stage test commands (like Stata’s) will tell you the relevant
cutoffs automatically.

Stock, James H., and Motohiro Yogo. 2005. “Testing for Weak Instruments in Linear IV
Regression.” In Identification and Inference for Econometric Models: Essays in Honor of
Thomas Rothenberg, 80–108. Cambridge: Cambridge University Press.

There is another rule of thumb that just says your F-statistic must be 10 or above in general.
That’s certainly much easier to remember than looking up a value in a table, but it’s also
very rough. Like many one-size-fits-all values in statistics, this is a tradition probably better
left behind.
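
As a small illustration of how the table-based cutoffs differ from the rule of 10, the sketch below hard-codes only the four Stock and Yogo values quoted above (one treatment/endogenous variable, 2SLS). The dictionary layout and the function name are hypothetical, and any other case needs the full published tables.

```python
# Cutoffs quoted in the text: one endogenous variable, 2SLS.
# Outer key: number of instruments. Inner key: maximum IV bias as a
# percentage of OLS bias.
STOCK_YOGO_CUTOFFS = {
    3: {5: 13.91, 10: 9.08},
    4: {5: 16.85, 10: 10.27},
}


def passes_relative_bias_cutoff(f_stat, n_instruments, max_bias_percent):
    """Return True if the first-stage F exceeds the quoted cutoff."""
    try:
        cutoff = STOCK_YOGO_CUTOFFS[n_instruments][max_bias_percent]
    except KeyError:
        raise ValueError(
            "No cutoff quoted for this case; consult the full tables."
        ) from None
    return f_stat > cutoff


# An F-statistic of 12 clears the rule of 10 in every case, yet:
print(passes_relative_bias_cutoff(12.0, 3, 10))  # True: 12 > 9.08
print(passes_relative_bias_cutoff(12.0, 3, 5))   # False: 12 < 13.91
print(passes_relative_bias_cutoff(12.0, 4, 10))  # True: 12 > 10.27
print(passes_relative_bias_cutoff(12.0, 4, 5))   # False: 12 < 16.85
```

The same F-statistic can be acceptable or not depending on how many instruments you have and how much relative bias you are willing to tolerate, which is why a single fixed threshold is so rough.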
