
BT2101 Study

Instrumental Variable Regression

The general idea is to regress Xi on an instrumental variable Zi and use the fitted values of Xi (call them X̂i) in place of Xi itself, so that the regressor actually used is uncorrelated with the omitted variable.

Zi can be loosely thought of as the reverse of a mediator: it influences Y only indirectly through Xi and must have no direct relationship with Y -> it cannot be directly related to Y.

Xi can be split into two components:

1. A component that is uncorrelated with the omitted variable.
2. A component that may be correlated with the omitted variable.

We remove the part of Xi that causes the problem by replacing Xi with the component that is uncorrelated with the omitted variable.

An unbiased estimate of B1 can then be obtained.

- Use an instrumental variable, Z, which is correlated only with Xi and not with other determinants of Y.
- The instrumental variable is used to predict changes in Xi that are uncorrelated with the omitted variables.

Two conditions for instrument validity:

- Instrument relevance
  o The correlation between Xi and the instrumental variable Zi is non-zero.
  o This can be checked with a simple correlation test (or the first-stage F-statistic).
- Instrument exogeneity
  o Zi must not be correlated with any other determinants of Yi.
  o Zi affects Y only through Xi.

1. Z must be associated with X.
2. Z must causally affect Y only through X.
3. There must not be any prior causes of both Y and Z.
4. The effect of X on Y must be homogeneous. This assumption/requirement has two forms, weak and strong:
   - Weak homogeneity of the effect of X on Y: the effect of X on Y does not vary by the levels of Z (i.e. Z cannot modify the effect of X on Y).
   - Strong homogeneity of the effect of X on Y: the effect of X on Y is constant across all individuals (or whatever your unit of analysis is).
Instruments that do not meet these assumptions are generally invalid. (2) and (3) are
generally difficult to provide strong evidence for (hence assumptions).
The strong version of condition (4) may be a very unreasonable assumption to make
depending on the nature of the phenomena being studied (e.g. the effects of drugs on
individuals' health generally varies from individual to individual). The weak version of
condition (4) may require the use of atypical IV estimators, depending on the
circumstance.

The weakness of the effect of Z on X does not really have a formal definition. Certainly IV estimation produces biased results when the effect of Z on X is small relative to the effect of U (an unmeasured confounder) on X, but there is no hard and fast cutoff, and the bias depends on sample size. Hernán and Robins are (respectfully and constructively) critical of the utility of IV regression relative to estimates based on the formal causal reasoning of their approach (that is, the counterfactual-causality approach of Pearl and others).

Estimation with instrument:

1. Isolate the part of X that is uncorrelated with the error term (the omitted variable).

   This is done by regressing X on Z using OLS (Z is uncorrelated with component 2):

   Xi = a + b*Zi + e (a is the intercept, e is the error term)

   We then compute the fitted values X̂i from this first-stage regression for every observation.

2. Replace Xi with the fitted values X̂i and regress Yi on X̂i to estimate the intercept and B1.

   This removes the part of Xi that is correlated with the omitted variable.

2SLS regression

Why does IV regression work?

In econometrics, we are interested in finding the causal effect of X on Y.

IV regression eliminates omitted variable bias by removing the component of Xi that is correlated with the error term. This allows us to estimate B1 without bias.
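A minimal sketch of the two-stage procedure in R, using ivreg() from the AER package alongside a manual two-stage version for intuition. The data frame df and the variables y, x and z are hypothetical placeholders, not objects from these notes.

```
library(AER)   # provides ivreg()

# Assume a hypothetical data frame `df` with outcome y, endogenous regressor x, instrument z.

# Manual two stages (illustrative only -- its standard errors are NOT correct):
stage1  <- lm(x ~ z, data = df)        # first stage: isolate the variation in x driven by z
summary(stage1)                        # check instrument relevance (rule of thumb: first-stage F > 10)
df$xhat <- fitted(stage1)              # predicted x, purged of the part correlated with the omitted variable
stage2  <- lm(y ~ xhat, data = df)     # second stage: regress y on the fitted values

# Preferred: ivreg() runs both stages and reports valid 2SLS standard errors
iv_model <- ivreg(y ~ x | z, data = df)
summary(iv_model, diagnostics = TRUE)  # includes a weak-instrument (first-stage F) diagnostic
```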

2SLS regression assumptions


1. (Yi, Xi, Zi) are i.i.d. draws.
2. All variables have finite, non-zero fourth moments (kurtosis).
3. No perfect multicollinearity.
4. Instrument validity (see above).
5. The conditional mean of the error term given Xi is not assumed to be zero; that endogeneity is the problem IV addresses.

Selecting IV

Fixed-Effects Regression

For condition 1:
Note that the error term must not be correlated with the x values at any point in time, whether past, present or future (strict exogeneity).

Fixed-effects models mostly work with panel data.

In a balanced panel, each entity is observed in every time period, i.e. there are no missing entity-period observations. For example, if we observe x across several time periods and a certain entity is missing in one of those periods, the panel is unbalanced.

Checking for dataset imbalance

```
library(AER)
data("Guns")

# obtain an overview of the dataset
summary(Guns)

# verify that `Guns` is a balanced panel:
# the number of year levels times the number of state levels
# should equal the number of rows
years  <- length(levels(Guns$year))
states <- length(levels(Guns$state))
years * states == nrow(Guns)
```

Variables are denoted with i and t subscripts, where i stands for the entity and t stands for the time period.

Entity FE regression.

Yi,t = B0 + B1Xi,t + B2Zi + ui,t (ui,t is the error term; B2Zi is the time-invariant term, which varies across i but not across t)

All time-invariant variables can be eliminated through entity FE regression.


Time-invariant variables vary over entities but not over time. Examples include natural features and other fixed characteristics of the entity.

Entity FE regression can be written in several equivalent forms (see also the three equations under "Specification" below):

n-1 binary regressors: Yi,t = B0 + B1Xi,t + Ɣ2D2i + … + ƔnDni + ui,t

Entity-demeaned regression – subtract the entity averages from each variable; this is equivalent to the fixed-effects form Yit = 𝜷0 + 𝜷1 Xit + Ɣi + uit

For fixed effects regression – use plm(). See below for the R code:

```
# entity (individual) fixed effects via the within estimator
model_4 <- plm(frate ~ beertax + as.factor(state), data = Fatalities,
               index = c("state", "year"), effect = "individual", model = "within")
summary(model_4)

# time fixed effects via the within estimator
model_5 <- plm(frate ~ beertax + as.factor(year), data = Fatalities,
               index = c("state", "year"), effect = "time", model = "within")
summary(model_5)

# manual time-demeaning: subtract the year averages, then run OLS
Fatalities$demeanedfrate_y   <- Fatalities$frate   - ave(Fatalities$frate,   Fatalities$year)
Fatalities$demeanedbeertax_y <- Fatalities$beertax - ave(Fatalities$beertax, Fatalities$year)

model7 <- lm(demeanedfrate_y ~ demeanedbeertax_y, data = Fatalities)
summary(model7)
```

For multiple entity fixed effects

```
library(dplyr)   # needed for %>% and mutate()

# demean each variable by firm (fcode), then run OLS on the demeaned data
jtrain_demean <- jtrain %>%
  mutate(hrsemp_demean = with(jtrain, hrsemp - ave(hrsemp, fcode)),
         grant_demean  = with(jtrain, grant  - ave(grant,  fcode)),
         employ_demean = with(jtrain, employ - ave(employ, fcode)))

model_demean <- lm(hrsemp_demean ~ grant_demean + employ_demean, data = jtrain_demean)
summary(model_demean)
```

```
# two-ways demean (not for this sub-question)
# note: the full two-way within transformation also adds back the grand mean,
# e.g. hrsemp - ave(hrsemp, year) - ave(hrsemp, fcode) + mean(hrsemp)

# jtrain_demean <- jtrain %>%
#   mutate(hrsemp_demean = with(jtrain, hrsemp - ave(hrsemp, year) - ave(hrsemp, fcode)),
#          grant_demean  = with(jtrain, grant  - ave(grant,  year) - ave(grant,  fcode)),
#          employ_demean = with(jtrain, employ - ave(employ, year) - ave(employ, fcode)))
```

```
# estimate all seven models
# (`fatal_rate` is assumed to have been created beforehand as a fatality-rate column)

fatalities_mod1 <- lm(fatal_rate ~ beertax, data = Fatalities)

fatalities_mod2 <- plm(fatal_rate ~ beertax + state, data = Fatalities)

fatalities_mod3 <- plm(fatal_rate ~ beertax + state + year,
                       index = c("state", "year"),
                       model = "within",
                       effect = "twoways",
                       data = Fatalities)

fatalities_mod4 <- plm(fatal_rate ~ beertax + state + year + drinkagec
                       + punish + miles + unemp + log(income),
                       index = c("state", "year"),
                       model = "within",
                       effect = "twoways",
                       data = Fatalities)

fatalities_mod5 <- plm(fatal_rate ~ beertax + state + year + drinkagec
                       + punish + miles,
                       index = c("state", "year"),
                       model = "within",
                       effect = "twoways",
                       data = Fatalities)

fatalities_mod6 <- plm(fatal_rate ~ beertax + year + drinkage
                       + punish + miles + unemp + log(income),
                       index = c("state", "year"),
                       model = "within",
                       effect = "twoways",
                       data = Fatalities)

fatalities_mod7 <- plm(fatal_rate ~ beertax + state + year + drinkagec
                       + punish + miles + unemp + log(income),
                       index = c("state", "year"),
                       model = "within",
                       effect = "twoways",
                       data = Fatalities_1982_1988)
```

Autocorrelation and standard errors

Standard Errors for Fixed Effects Regression


As with heteroskedasticity, autocorrelation invalidates the usual standard error formulas as well as heteroskedasticity-robust standard errors, since these are derived under the assumption that there is no autocorrelation. When there is both heteroskedasticity and autocorrelation, so-called heteroskedasticity- and autocorrelation-consistent (HAC) standard errors need to be used. Clustered standard errors belong to this type of standard errors. They allow for heteroskedasticity and autocorrelated errors within an entity, but not correlation across entities.

As shown in the examples throughout this chapter, it is fairly easy to specify the usage of clustered standard errors in regression summaries produced by functions like coeftest() in conjunction with vcovHC() from the package sandwich. Conveniently, vcovHC() recognizes panel model objects (objects of class plm) and computes clustered standard errors by default.

The regressions conducted in this chapter are good examples of why the usage of clustered standard errors is crucial in empirical applications of fixed effects models. For example, consider the entity and time fixed effects model for fatalities. Since fatal_tefe_lm_mod is an object of class lm, coeftest() does not compute clustered standard errors but uses robust standard errors that are only valid in the absence of autocorrelated errors.
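A minimal sketch of the usual pattern, assuming a plm object such as model_4 from the earlier code and a plain lm object fatal_tefe_lm_mod as named in the quoted passage:

```
library(plm)
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC() for lm objects

# for a plm object, vcovHC() computes clustered (by entity) standard errors by default
coeftest(model_4, vcov. = vcovHC(model_4, type = "HC1"))

# for an lm object, vcovHC() only gives heteroskedasticity-robust (not clustered) SEs,
# which is the caveat about fatal_tefe_lm_mod noted above
coeftest(fatal_tefe_lm_mod, vcov. = vcovHC(fatal_tefe_lm_mod, type = "HC1"))
```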

● Specification
○ Zi is the entity-fixed effects, which capture all the variables which are time-
invariant
○ St is the time-fixed effects, which capture all the variables which are entity-
invariant
○ Recall that these effects include those variables which can be confounding,
and variables which can be non-confounding together!
○ There are 3 ways of representing such models - all of them are equivalent to
one another:
■ Equation 1: Yit = 𝜷0 + 𝜷1 Xit + 𝜷2 Zi + 𝜷3 St + uit
■ Equation 2: Yit = 𝜷0 + 𝜷1 Xit + Ɣ2D2i + … + ƔnDni + 𝜹2 B2t + … + 𝜹T BTt + uit
■ Equation 3: Yit = 𝜷1 Xit + 𝛂i + Ƙt + uit

Random Experiment

Observational vs experimental data: observational data are collected from empirical observation of naturally occurring behaviour, while experimental data are collected from conducting randomized experiments.

Experiments can overcome many limitations such as threats to internal validity, but they are rare because they can be costly, unethical, impractical, intrusive or infeasible.

Ideal Randomized Controlled Experiments : The choice of treatment group and control
group is completely random.

Randomized controlled experiments provide a benchmark.

Strengths: they address threats to internal validity.

Weakness: they have their own threats to internal validity (see below).

When control and treatment groups are randomly assigned, there is no correlation between the independent variables and the error term, so we can assume E(ui | Xi) = 0.

Random Assignment
● When we perform random assignment, the “health-conscious” people are randomly
assigned to either the control or treatment group.
● When the groups get increasingly large, we can say that the two groups are largely
similar to each other in all characteristics - e.g., age, gender, etc
● This removes any confounding variables, as both groups are seen as the same.
To establish a causal claim, we need random assignment, which is highly emphasized throughout this module.

Threats to internal validity of RCT

1. Failure to randomize (Imperfect Randomization)

2. Failure to follow Protocols.

Errors-in-variables bias: this arises when protocols are not followed, e.g. some members of the control group end up receiving the treatment, or some members of the treatment group do not take it.

Attrition: when people drop out of the study, random assignment can fail, because those who drop out are typically not a random subset of participants.

3. Experimental Effects -> Experimenter Bias

E.g. the teacher may put in more effort teaching the control group than the treatment group, which will bias the result. Remedies:

1. Double blind the experiment


2. Use placebo.

Guidelines for good experiments


1. Random allocation
2. Check attrition for non-random dropouts
3. No spillover
4. Sufficient sample size – power of the test (see the sketch below)
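As a rough illustration of the sample-size point, base R's power.t.test() can solve for how many subjects per group are needed to detect a given effect. The effect size, power and significance level below are made-up values for illustration only.

```
# required sample size per group to detect a difference of 0.5 SD
# with 80% power at the 5% significance level (illustrative numbers)
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
```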

Difference between quasi vs natural experiments


In a quasi-experiment, the criterion for assignment is selected by the researcher, while in a natural experiment the assignment occurs 'naturally', without the researcher's intervention.

Slides 27 - 28
Categorical Variables
- Can be binary (1 = yes, 0 = no)
  o Only one regressor required to represent the variable
  o Gender
  o Treatment / Control
- Can be non-binary (multiple categories)
  o N-1 regressors required to represent the variable (where N = number of categories)
    - Adding all N regressors would violate the no-perfect-multicollinearity assumption
  o Colour {Red, Blue, Green}
  o Below is an example (also known as dummy coding or one-hot encoding):

    Colour | x1 | x2
    Blue   |  0 |  0
    Red    |  1 |  0
    Green  |  0 |  1

- Allows you to run regression models, similarly to continuous variables
  o Interpretation of coefficients is different**

Binary Categorical Variables


- Binary variable regression = difference of means (two-sample t-test)

Slides 29 - 30
Binary Categorical Variables Example
wage = β0 + β1 × Male + ε

- If you were to instead use wage = β0 + β1 × Male + β2 × Female + ε:
  o This violates the no-perfect-multicollinearity assumption! (Male = 1 − Female)

wage_male = β0 + β1
wage_female = β0

- β̂1 = wage_Male − wage_Female (difference of means)

Interpretation:

If β1 is…                                            | Interpretation
Statistically significant                            | Statistically significant difference in wages between the male and female populations
Statistically significant and negative               | Being male is associated with lower wages compared to being female (reference group: Male dummy variable = 0)
Not statistically significant                        | No statistical difference in wages between the male and female populations
Not statistically significant, but β0 is significant | No statistical difference in wages between the male and female populations, but wages for females are statistically different from zero

See pg. 186 – 188, 230 (Dummy Variable Trap) Introduction to Econometrics 4th Edition
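A minimal sketch of the equivalence between the binary-dummy regression and a two-sample t-test, using a hypothetical data frame wages with columns wage and male (0/1); the names are placeholders for illustration.

```
# hypothetical data frame `wages` with columns wage and male (1 = male, 0 = female)
model_binary <- lm(wage ~ male, data = wages)
summary(model_binary)   # beta1_hat = mean(wage | male) - mean(wage | female)

# the same comparison as a two-sample t-test (equal variances assumed, to match OLS)
t.test(wage ~ male, data = wages, var.equal = TRUE)
```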

Slide 31
Multiple Category Example
price = β0 + β1 × Red + β2 × Green + ε

Colour | Red | Green
Blue   |  0  |  0
Red    |  1  |  0
Green  |  0  |  1

- Is the difference in price between red houses and blue houses statistically significant?
  o Check the statistical significance of β̂1
- Is the difference in price between blue houses and green houses statistically significant?
  o Check the statistical significance of β̂2

Two-sample t-tests are used to determine whether the means of two populations differ.

Use VIF to detect multicollinearity among the regressors (see the sketch below).
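A minimal sketch of fitting the multi-category model and checking VIF, assuming a hypothetical data frame houses with columns price, colour and an extra covariate size (all placeholders); vif() comes from the car package.

```
library(car)   # provides vif()

# hypothetical data frame `houses` with numeric price and size, and a colour factor
houses$colour <- factor(houses$colour, levels = c("Blue", "Red", "Green"))  # Blue = reference

model_colour <- lm(price ~ colour + size, data = houses)  # R creates the N-1 dummies automatically
summary(model_colour)   # coefficients for Red and Green are differences relative to Blue

vif(model_colour)        # large values flag (imperfect) multicollinearity among the regressors
```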

Slide 32
Experiments Context
Outcome = β0 + β1 × Treatment + ε
- Control group: Treatment = 0
- Treatment group: Treatment = 1
- Similar to performing a two-sample test (whether the means of treatment and control are statistically different)

- β̂1 (the difference in group means) is also known as the average treatment effect
- It is the average causal effect
- The differences estimator (β̂1) estimates the average treatment effect

Heaping

For a wide variety of reasons, heaping is common in many types of data. For example, we
often observe heaping when data are self-reported (e.g., income, age, height), when tools
with limited precision are used for measurement (e.g., birth weight, pollution, rainfall), and
when continuous data are rounded or otherwise discretized (e.g., letter grades, grade point
averages). Heaping also occurs as a matter of practice, such as with work hours (e.g., 40
hours per week, eight hours per day) and retirement ages (e.g., 62 and 65). While ignoring
heaping might often be innocuous, in this paper we show that doing so can have serious
consequences. In particular, in RD designs, estimates are likely to be biased if attributes
related to the outcomes of interest predict heaping in the running variable.

Quasi-Experiments – Regression Discontinuity Approach

It is a natural experiment setup where assignment to treatment/control depends on a threshold value of a running variable.

Comparing observations just above the threshold against those just below can give the average treatment effect.

Bandwidth selection
- The smaller the bandwidth, the closer the assignment is to random.
- However, the sample size tends to be small and it is difficult to extrapolate the results.
- Bandwidth selection is subjective and requires understanding of the context and data quality.

When the bandwidth is large:
- Assignment becomes less random.
- E.g. a GPA can be far from the cutoff because many exams or assignments went badly, which is not a random occurrence.

Note:
Apart from treatment/control status, there should not be any systematic differences between data points immediately around the threshold. Check that the other covariates do not differ significantly between the two groups.
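A minimal sketch of a sharp RD estimate done by hand, assuming a hypothetical data frame df with an outcome y, a running variable gpa, a cutoff of 3.0 and a bandwidth of 0.3 (all names and numbers are illustrative, not from the notes).

```
cutoff    <- 3.0
bandwidth <- 0.3

# keep only observations close to the threshold
rd_sample <- subset(df, abs(gpa - cutoff) < bandwidth)

# treatment indicator and centred running variable
rd_sample$treated  <- as.numeric(rd_sample$gpa >= cutoff)
rd_sample$gpa_cent <- rd_sample$gpa - cutoff

# local linear regression: the coefficient on `treated` is the RD estimate
# (the interaction allows different slopes on each side of the cutoff)
rd_model <- lm(y ~ treated * gpa_cent, data = rd_sample)
summary(rd_model)
```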

Slide 10
Quasi-experiments - Differences in Differences
- Natural experiment setup in which assignment to treatment / control is not fully random
- Based on one key assumption**:
  o In the absence of treatment, the unobserved differences between the treatment and control groups are the same over time (parallel trends; refer to the chart in the slides).
  o Hence, data for both groups pre-intervention and post-intervention are required.
See pg. 492 - 494 Introduction to Econometrics 4th Edition

Slides 11 - 12
Swimming Pools in HDB Example
- Even though Jurong East and Clementi are not identical, we can still estimate the treatment effect if:
  o **The main assumption is fulfilled: treatment and control groups have parallel trends (same unobserved differences) in house prices
  o Allocation of swimming pools was not dependent on house prices during the baseline (allocation of swimming pools is not related to the y variable)
  o Composition of the treatment and control groups is stable (e.g. the number of 4-RM HDB flats in each group remains the same, etc.)
  o No spillover (units remain in their same group throughout)

Slide 13
Parallel Trends Assumption
- Most critical assumption
- Hardest to fulfill – there are uncontrollable events which occur systematically in each group
- In the absence of treatment, the difference between the treatment and control groups is constant over time (refer to the diagram in the slides for better understanding)
- No statistical / quantitative approach to test this – judge by visual inspection and/or with your context-specific knowledge

Slide 17
Strengths and Limitations of Differences in differences
- Strengths
o Intuitive interpretation (can be observed from the plot)
o Can obtain causal effect using observational data if assumptions are met
o Can use either individual or group level data
o Treatment and control group can have different starting points – DiD focuses
on change rather than absolute levels
o Accounts for change / non-change due to factors other than intervention (as
long as parallel trends assumption holds)
- Limitations
o Requires baseline data & non-intervention group (all 4 data must be collected
in the table above)
o Cannot use if comparison groups have different outcome trend (violates
parallel trend assumption)
o Cannot use if intervention allocation determined by baseline outcome
o Cannot use if composition of groups pre/post change are not stable
(members of treatment group cannot switch to control group and vice versa)

Interpretation of DiD regression.
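In the standard DiD regression Y = β0 + β1·Treat + β2·Post + β3·(Treat × Post) + u, β1 captures the baseline gap between the groups, β2 the common time trend, and β3 is the DiD estimate of the treatment effect. A minimal sketch in R, with a hypothetical data frame hdb containing price, treat (1 = treatment group) and post (1 = post-intervention period); the names are placeholders.

```
# hypothetical panel of flat prices with group and period indicators
did_model <- lm(price ~ treat * post, data = hdb)
summary(did_model)
# coefficient on treat:post = difference-in-differences estimate
# = (treated_post - treated_pre) - (control_post - control_pre)
```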

What is internal validity?


A statistical analysis is said to have internal validity if the statistical inferences about causal
effects are valid for the population being studied.
• Alternative definition: The extent to which the observed results represent the truth in the
population we are studying
• When a research study is not internally valid, you cannot use your results to make causal
claims about the population

Threats to internal validity


There are five main threats to internal validity:
• Omitted variables
• Simultaneous causality
  • X causes Y AND Y causes X
• Errors-in-variables
  • X is measured with error
• Problems with sample selection
  • Observations are not selected through random sampling
• Functional form misspecification
  • Nonlinear terms of the primary independent variable are omitted

• In the presence of threats to internal validity, the first OLS assumption is violated
  • The conditional expectation of the errors is no longer zero
• Threats to internal validity bias the coefficient estimates and create problems for causal inference

Logarithmic Functions
● There are essentially 3 types of logarithmic functions
○ Linear-Log
○ Log-Linear
○ Log-Log

Case       | Population Regression Function | Interpretation of Coefficients
Linear-Log | Yi = 𝜷0 + 𝜷1 ln(Xi) + ui       | A 1% increase in X is associated with a 0.01 𝜷1 change in Y
Log-Linear | ln(Yi) = 𝜷0 + 𝜷1 Xi + ui       | A change in X by 1 unit is associated with a 100 𝜷1 % change in Y
Log-Log    | ln(Yi) = 𝜷0 + 𝜷1 ln(Xi) + ui   | A change in X by 1% is associated with a 𝜷1 % change in Y
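A minimal sketch of estimating the three specifications in R, using a hypothetical data frame df with strictly positive columns x and y (placeholders for illustration).

```
# hypothetical data frame `df` with strictly positive columns x and y
linear_log <- lm(y ~ log(x), data = df)        # 1% increase in x  -> ~0.01 * beta1 change in y
log_linear <- lm(log(y) ~ x, data = df)        # 1 unit increase in x -> ~100 * beta1 % change in y
log_log    <- lm(log(y) ~ log(x), data = df)   # 1% increase in x  -> ~beta1 % change in y (elasticity)
summary(log_log)
```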

Causal inference using Linear Regression


1. Statistically significant estimates do not imply causation
   a. OLS assumptions need to be met for causal inference.
   b. OLS assumptions must be considered during model specification, not after running the model.
   c. There is no mathematical way to prove the assumptions are satisfied.
2. When assumptions are violated -> garbage in, garbage out!
   a. Need to check for confounding variables.
   b. Cannot be fixed by increasing the sample size
3. Establishing causality is not easy
4. No mathematical way to prove causality using historical data
5. Causality can only be established with “high level of confidence”
a. Based on linear regression models and econometrics on historical data, we
can never be 100% certain of a causality relationship.
6. To establish causality, the best practices are:
a. Conducting experiments / randomized controlled trials / AB testing
b. If not, observational data.
i. Using Linear Regression with OLS assumptions met on data

Handling outliers
- Large outliers in the sample can drastically change the value of the sample mean
- Including outliers in OLS regression may not give an ideal representation of the population-level relationship
- Dealing with outliers:
  o Exploratory data analysis (plot your data – boxplot / histogram)
  o Use technical metrics to flag outliers, such as values more than 3 SD from the mean, or more than 1.5 × IQR below the 1st or above the 3rd quartile (see the sketch below)
  o Be careful when dropping outliers – outliers may provide useful insights
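A minimal sketch of the 1.5 × IQR rule in R, applied to a hypothetical numeric vector x (the data are simulated for illustration).

```
# hypothetical numeric vector with two obvious outliers appended
x <- c(rnorm(100), 8, -7)

q1  <- quantile(x, 0.25)
q3  <- quantile(x, 0.75)
iqr <- q3 - q1

# flag values more than 1.5 * IQR below Q1 or above Q3
outliers <- x[x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr]
outliers

boxplot(x)   # the same points show up as dots beyond the whiskers
```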

R-squared

Multiple R2 vs Adjusted R2
- Multiple R2
  o How much variance in Y is explained by the model?
- Adjusted R2
  o How much variance in Y is explained by the model after controlling for the number of regressors?
  o Used to mitigate overfitting (see the sketch below)
- More regressors -> explained variance of the model increases -> higher multiple R2
- More regressors does not necessarily mean a higher adjusted R2 (it penalizes the inclusion of variables which are not useful)
- Neither R2 has any bearing on causal inference.
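A minimal sketch comparing the two measures in R, with simulated data in which z is an irrelevant noise regressor (all names and values are illustrative).

```
# hypothetical data: y depends on x, while z is pure noise
set.seed(1)
df <- data.frame(x = rnorm(100))
df$y <- 2 * df$x + rnorm(100)
df$z <- rnorm(100)

m1 <- lm(y ~ x, data = df)
m2 <- lm(y ~ x + z, data = df)

summary(m1)$r.squared;      summary(m2)$r.squared       # multiple R2 never decreases when adding z
summary(m1)$adj.r.squared;  summary(m2)$adj.r.squared   # adjusted R2 can decrease for a useless regressor
```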
