The general idea is to regress Xi on an instrumental variable and use the fitted values, X̂i, in place of Xi when predicting Y. These fitted values are constructed to be uncorrelated with the omitted variable.
Zi can be thought of as something like a mediator in reverse: it affects Y only through Xi, and must itself have no direct relationship with Y.
We remove the part of Xi that is causing the issue by replacing Xi with the component that is uncorrelated with the omitted variable.
- Use an instrumental variable, Z, which is correlated with Xi but not with the other determinants of Y.
- The instrumental variable is used to predict changes in Xi that are uncorrelated with the omitted variables.
1. Z must be associated with X.
2. Z must causally affect Y only through X.
3. There must not be any prior causes of both Y and Z.
4. The effect of X on Y must be homogeneous. This assumption/requirement has
two forms, weak and strong:
Weak homogeneity of the effect of X on Y: the effect of X on Y does not vary
by the levels of Z (i.e. Z cannot modify the effect of X on Y).
Strong homogeneity of the effect of X on Y: the effect of X on Y is constant
across all individuals (or whatever your unit of analysis is).
Instruments that do not meet these assumptions are generally invalid. (2) and (3) are
generally difficult to provide strong evidence for (hence assumptions).
The strong version of condition (4) may be a very unreasonable assumption to make
depending on the nature of the phenomena being studied (e.g. the effects of drugs on
individuals' health generally varies from individual to individual). The weak version of
condition (4) may require the use of atypical IV estimators, depending on the
circumstance.
The weakness of the effect of Z on X does not really have a formal definition.
IV estimation certainly produces biased results when the effect of Z on X is small
relative to the effect of U (the unmeasured confounder) on X, but there is no hard
and fast cutoff, and the bias depends on sample size. Hernán and Robins are
(respectfully and constructively) critical of the utility of IV regression relative to
estimates based on formal causal reasoning (that is, the counterfactual approach
of Pearl and others).
1. Isolate the part of Xi that is uncorrelated with the error term (the omitted variable) by regressing Xi on the instrument Zi.
2. Regress Y on the fitted values X̂i to estimate both the intercept and β1.
2SLS regression
IV regression eliminates the omitted variable bias by removing the component of Xi that is correlated with the error term. This allows us to estimate β1 without bias.
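The two stages above can be sketched with simulated data (a Python illustration rather than the course's R; the confounder U, the instrument Z, and the true β1 = 2 are all assumptions of the simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated data: U is an unmeasured confounder; Z is a valid instrument
# (relevant to X, unrelated to U, affects Y only through X).
U = rng.normal(size=n)
Z = rng.normal(size=n)
X = 0.8 * Z + U + rng.normal(size=n)
Y = 2.0 * X + 3.0 * U + rng.normal(size=n)   # true beta1 = 2

def ols(y, x):
    """Intercept and slope from a simple OLS fit of y on x."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Naive OLS is biased upward because U drives both X and Y.
naive = ols(Y, X)[1]

# Stage 1: regress X on Z and keep the fitted values X_hat.
a0, a1 = ols(X, Z)
X_hat = a0 + a1 * Z

# Stage 2: regress Y on X_hat; the slope is the 2SLS estimate of beta1.
b0, b1 = ols(Y, X_hat)
```

With a large sample, `b1` lands near the true value of 2 while `naive` stays well above it, which is the omitted variable bias the instrument removes.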
Selecting IV
Fixed-Effects Regression
For condition 1 :
Take note that the error term should not be correlated with the x values at any point
in time, be it past, present, or future.
```
library(AER)   # provides the Fatalities panel data set
library(plm)   # plm() for panel regressions
data("Fatalities")
# Fatality rate per 10,000 people
Fatalities$frate <- with(Fatalities, fatal / pop * 10000)
```
Entity FE regression:
Yi,t = B0 + B1 Xi,t + B2 Zi + ui,t (ui,t is the error term, and B2 Zi is the time-invariant component: it varies across i but not across t)
```
# Entity (individual) fixed effects; the within estimator demeans by state,
# so state dummies need not be added by hand.
model_4 <- plm(frate ~ beertax, data = Fatalities,
               index = c("state", "year"),
               effect = "individual", model = "within")
summary(model_4)

# Time fixed effects
model_5 <- plm(frate ~ beertax, data = Fatalities,
               index = c("state", "year"),
               effect = "time", model = "within")
summary(model_5)
```
As shown in the examples throughout this chapter, it is fairly easy to specify
clustered standard errors in regression summaries produced by functions
like coeftest() in conjunction with vcovHC() from the package sandwich.
Conveniently, vcovHC() recognizes panel model objects (objects of class plm) and
computes clustered standard errors by default.
The regressions conducted in this chapter are good examples of why the use of clustered
standard errors is crucial in empirical applications of fixed effects models. For example,
consider the entity and time fixed effects model for fatalities.
Since fatal_tefe_lm_mod is an object of class lm, coeftest() does not compute
clustered standard errors but uses robust standard errors that are only valid in the absence
of autocorrelated errors.
● Specification
○ Zi represents the entity fixed effects, which capture all variables that are time-
invariant
○ St represents the time fixed effects, which capture all variables that are entity-
invariant
○ Recall that these effects absorb both confounding and non-confounding
variables alike!
○ There are 3 ways of representing such models - all of them are equivalent to
one another:
■ Equation 1: Yit = 𝜷0 + 𝜷1 Xit + 𝜷2 Zi + 𝜷3 St + uit
■ Equation 2: Yit = 𝜷0 + 𝜷1 Xit + Ɣ2D2i + … + ƔnDni + 𝜹2 B2t + … + 𝜹T BTt + uit
■ Equation 3: Yit = 𝜷1 Xit + 𝛂i + λt + uit
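The within transformation behind model = "within" can be illustrated with a small simulation (a Python sketch, not the course's R; the entity effect αi, its correlation with x, and the true slope of 1.5 are all assumptions of the simulation):

```python
import numpy as np

rng = np.random.default_rng(5)
n_entities, n_periods = 50, 10

alpha = rng.normal(0, 5, size=n_entities)                      # entity fixed effects
x = rng.normal(size=(n_entities, n_periods)) + alpha[:, None]  # x correlated with alpha
y = 1.5 * x + alpha[:, None] + rng.normal(0, 0.5, size=(n_entities, n_periods))

# Pooled OLS is biased: alpha sits in the error term and is correlated with x.
pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# Within transformation: demean x and y by entity, so alpha_i drops out.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
within = np.polyfit(xd.ravel(), yd.ravel(), 1)[0]  # close to the true 1.5
```

Demeaning by entity removes everything that varies across i but not across t, which is exactly what the entity fixed effect Zi (or 𝛂i) represents.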
Random Experiment
Ideal Randomized Controlled Experiments : The choice of treatment group and control
group is completely random.
When control and treatment groups are randomly assigned, there is no correlation between
the regressors and the error term, so we can assume E(ui | Xi) = 0.
Random Assignment
● When we perform random assignment, the “health-conscious” people are randomly
assigned to either the control or treatment group.
● As the groups become larger, we can say that the two groups are largely
similar to each other in all characteristics - e.g., age, gender, etc.
● This removes any confounding variables, as both groups are seen as the same.
To establish a causal claim, we will need to use random assignment, which is highly
emphasized throughout this module.
Errors-in-variables bias : this is the case where some of those assigned to the control
group end up receiving the treatment (or vice versa), so actual treatment differs from
assigned treatment.
Attrition : when people drop out of the study, random assignment can fail.
This is because those who drop out typically do so non-randomly.
E.g. the teacher may put in more effort teaching the control group than the treatment group.
This will bias the result.
Slides 27 - 28
Categorical Variables
Can be binary (1 = yes, 0 = no)
o Only one regressor required to represent the variable
o Gender
o Treatment / Control
Can be non-binary (multiple categories)
o N−1 regressors required to represent the variable (where N = number of categories)
Adding all N regressors (plus the intercept) violates the no-perfect-multicollinearity assumption (the dummy variable trap)
o Colour {Red, Blue, Green}
o Below is an example (dummy coding, with Blue as the reference category):
Colour x1 x2
Blue 0 0
Red 1 0
Green 0 1
Allows you to run regression models, similarly to continuous variables
o Interpretation of coefficients is different**
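The Colour table above can be reproduced in code (a Python sketch using pandas; the colour observations are made up, and Blue is taken as the reference category):

```python
import pandas as pd

# Hypothetical colour observations.
df = pd.DataFrame({"colour": ["Blue", "Red", "Green", "Red", "Blue", "Green"]})

# One-hot encoding produces N columns; dropping the reference category
# ("Blue") leaves the N-1 regressors and avoids the dummy variable trap.
dummies = pd.get_dummies(df["colour"]).astype(int)
X = dummies.drop(columns=["Blue"])
```

Each remaining column is a 0/1 indicator, and a Blue observation is the row of all zeros, matching the table.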
Slides 29 - 30
Binary Categorical Variables Example
wage = β0 + β1 × Male + ε
Interpretation:
If β1 is…
o Statistically significant: statistically significant difference in wages between the male and female populations
o Statistically significant and negative: being male is associated with lower wages compared to being female (reference group: Male dummy variable = 0)
o Not statistically significant: no statistical difference in wages between the male and female populations
o Not statistically significant, while β0 is significant: no statistical difference in wages between the male and female populations, but wages for females are statistically different from zero
See pg. 186 – 188, 230 (Dummy Variable Trap) Introduction to Econometrics 4th Edition
Slide 31
Multiple Category Example
price = β0 + β1 × Red + β2 × Green + ε
Is the difference in price between red houses and blue houses statistically different?
o Check the statistical significance of β̂1
Is the difference in price between green houses and blue houses statistically different?
o Check the statistical significance of β̂2
Two-sample t-tests are used to determine whether the means of two populations differ.
Slide 32
Experiments Context
Outcome = β0 + β1 × Treatment + ε
Control group: Treatment = 0
Treatment group: Treatment = 1
Similar to performing a two-sample test (whether the means of the treatment and control
groups are statistically different)
β̂1 estimates μtreatment − μcontrol and is also known as the average treatment effect
(average causal effect)
The differences estimator (β̂1) estimates the average treatment effect
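The equivalence between the regression slope and the difference in group means can be checked with a quick simulation (a Python sketch; the outcome distributions and the true treatment effect of 5 are assumptions of the simulation):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical experiment: treatment raises the outcome by 5 on average.
control = rng.normal(50, 10, size=5000)
treated = rng.normal(55, 10, size=5000)

y = np.concatenate([control, treated])
d = np.concatenate([np.zeros(5000), np.ones(5000)])

# OLS of the outcome on the treatment dummy (with intercept):
# the slope equals the difference in group means.
A = np.column_stack([np.ones_like(d), d])
b0, b1 = np.linalg.lstsq(A, y, rcond=None)[0]

diff_in_means = treated.mean() - control.mean()
```

`b1` and `diff_in_means` coincide exactly, which is why the differences estimator and the two-sample comparison of means are the same thing.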
Heaping
For a wide variety of reasons, heaping is common in many types of data. For example, we
often observe heaping when data are self-reported (e.g., income, age, height), when tools
with limited precision are used for measurement (e.g., birth weight, pollution, rainfall), and
when continuous data are rounded or otherwise discretized (e.g., letter grades, grade point
averages). Heaping also occurs as a matter of practice, such as with work hours (e.g., 40
hours per week, eight hours per day) and retirement ages (e.g., 62 and 65). While ignoring
heaping might often be innocuous, in this paper we show that doing so can have serious
consequences. In particular, in RD designs, estimates are likely to be biased if attributes
related to the outcomes of interest predict heaping in the running variable
Comparing values just above the threshold against those just below can give the average
treatment effect (at the threshold).
Bandwidth selection
The smaller the bandwidth, the closer the assignment is to being random.
• Sample size tends to be small
• Difficult to extrapolate the results
Bandwidth selection is subjective and requires understanding of the context and data quality.
When the bandwidth is large:
• Assignment becomes less random
• A GPA can be far from the cutoff: many exams or assignments went badly, which is not a
random occurrence
Note :
Except for treatment/control group status, there should not be any systematic differences
between data points immediately around the threshold. Make sure other covariates are not
statistically significantly different between the two groups.
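The bandwidth trade-off can be seen in a small simulation (a Python sketch of a hypothetical sharp RD; the running variable, the slope of 1, and the true jump of 2 at the cutoff are all assumptions): a narrow bandwidth gives a nearly unbiased comparison of means, while a wide one lets the running variable's own trend leak into the estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
# Sharp RD: treatment assigned when the running variable >= 0,
# with a true jump of 2 at the cutoff and a slope of 1 elsewhere.
run = rng.uniform(-1, 1, size=20000)
y = 1.0 * run + 2.0 * (run >= 0) + rng.normal(0, 0.5, size=20000)

def rd_diff(bandwidth):
    """Difference in mean outcome just above vs just below the cutoff."""
    above = y[(run >= 0) & (run < bandwidth)]
    below = y[(run < 0) & (run >= -bandwidth)]
    return above.mean() - below.mean()

wide, narrow = rd_diff(0.5), rd_diff(0.05)
```

`narrow` sits close to the true jump of 2, whereas `wide` overshoots because units far from the cutoff are systematically different, exactly the "assignment becomes less random" point above.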
Slide 10
Quasi-experiments - Differences in Differences
Natural experiment setup in which assignment to treatment / control is not fully
random
Based on one key assumption**:
o In absence of treatment, the unobserved differences between treatment
and control groups are the same over time. (Refer to chart below)
o Hence, data for both groups pre-intervention and post-intervention are
required.
See pg. 492 - 494 Introduction to Econometrics 4th Edition
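The 2×2 differences-in-differences logic can be written out directly (a Python sketch; the group means are made up, with an assumed common trend of +2 and a true treatment effect of +3):

```python
# Hypothetical group means for treatment/control, pre/post intervention.
means = {
    ("control", "pre"): 10.0, ("control", "post"): 12.0,  # common trend: +2
    ("treated", "pre"): 11.0, ("treated", "post"): 16.0,  # +2 trend, +3 effect
}

# DiD = (treated post - treated pre) - (control post - control pre):
# the control group's change estimates the counterfactual trend.
did = (means[("treated", "post")] - means[("treated", "pre")]) \
    - (means[("control", "post")] - means[("control", "pre")])
```

Subtracting the control group's change strips out the common trend, leaving only the treatment effect of 3, provided the parallel trends assumption holds.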
Slides 11 - 12
Swimming Pools in HDB Example
Even though Jurong East and Clementi are not identical, we can still estimate the
treatment effect if:
o **Fulfills the main assumption: Treatment and control groups have parallel
trends (same unobserved differences) in house prices
o Allocation of swimming pools was not dependent on house prices during
baseline (allocation of swimming pools is not related to y variable)
o Composition of treatment and control groups is stable (e.g. the number of
4-RM HDB flats in each group remains the same, etc.)
o No spillover (remain in their same group throughout)
Slide 13
Parallel Trends Assumption
Most critical assumption
Hardest to fulfill – there may be uncontrollable events which systematically affect
one group but not the other
In the absence of treatment, the difference between treatment and control group is
constant over time. (Refer to diagram above for better understanding)
No statistical / quantitative approach to test this – judge by visual inspection and/or
with your context-specific knowledge
Slide 17
Strengths and Limitations of Differences in differences
Strengths
o Intuitive interpretation (can be observed from the plot)
o Can obtain causal effect using observational data if assumptions are met
o Can use either individual or group level data
o Treatment and control group can have different starting points – DiD focuses
on change rather than absolute levels
o Accounts for change / non-change due to factors other than intervention (as
long as parallel trends assumption holds)
Limitations
o Requires baseline data & a non-intervention group (all four cells in the table
above must be collected)
o Cannot use if comparison groups have different outcome trend (violates
parallel trend assumption)
o Cannot use if intervention allocation determined by baseline outcome
o Cannot use if composition of groups pre/post change are not stable
(members of treatment group cannot switch to control group and vice versa)
Logarithmic Functions
● There are essentially 3 types of logarithmic functions
○ Linear-Log
○ Log-Linear
○ Log-Log
Case: Linear-Log
Population regression function: Yi = 𝜷0 + 𝜷1 ln(Xi) + ui
Interpretation: a 1% increase in X is associated with a 0.01 𝜷1 change in Y

Case: Log-Linear
Population regression function: ln(Yi) = 𝜷0 + 𝜷1 Xi + ui
Interpretation: a change in X by 1 unit is associated with a 100 𝜷1 % change in Y

Case: Log-Log
Population regression function: ln(Yi) = 𝜷0 + 𝜷1 ln(Xi) + ui
Interpretation: a change in X by 1% is associated with a 𝜷1 % change in Y
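The log-log (elasticity) interpretation can be verified with simulated data (a Python sketch; the true elasticity of 0.5 is an assumption of the simulation):

```python
import numpy as np

rng = np.random.default_rng(3)
# Log-log model: ln(Y) = 1 + 0.5 ln(X) + u, so a 1% change in X is
# associated with roughly a 0.5% change in Y.
X = np.exp(rng.normal(size=20000))
lnY = 1.0 + 0.5 * np.log(X) + rng.normal(0, 0.1, size=20000)

# Regress ln(Y) on ln(X); the slope estimates the elasticity.
A = np.column_stack([np.ones_like(X), np.log(X)])
b0, b1 = np.linalg.lstsq(A, lnY, rcond=None)[0]
```

The recovered slope `b1` lands near the assumed elasticity of 0.5, matching the Log-Log row of the table above.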
Handling outliers
Large outliers from the sample can drastically change the value of the sample mean
Including outliers in OLS regression may not give the ideal representation of the
population-level relationship
Dealing with outliers:
o Exploratory data analysis (plot your data – boxplot / histogram)
o Use technical criteria to flag outliers, such as values more than 3 SD from the
mean, or more than 1.5 × IQR below the 1st or above the 3rd quartile
o Be careful when dropping outliers: outliers may provide some useful insights
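The 1.5 × IQR rule can be sketched as follows (a Python illustration; the sample values are made up, with 95 as an obvious outlier):

```python
import numpy as np

# Hypothetical sample with one extreme value.
x = np.array([10, 12, 11, 13, 12, 11, 14, 95], dtype=float)

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 1.5 x IQR fences

# Values outside the fences are flagged as outliers.
outliers = x[(x < lo) | (x > hi)]
```

Flagging is only the first step: whether to drop, winsorize, or keep the flagged values depends on why they occur.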
Rsquared
Multiple R2 vs Adjusted R2
Multiple R2
o How much variance in Y is explained by the model?
Adjusted R2
o How much variance in Y is explained by the model after controlling the
number of regressors?
o Used to mitigate overfitting
More regressors means the explained variance of the model can only increase, hence a
higher multiple R2.
More regressors does not necessarily mean a higher adjusted R2 (it penalizes including
variables which are not useful).
Neither R2 has any bearing on causal inference.
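A small sketch makes the contrast concrete (Python; the data are simulated so that x1 is useful and x2 is pure noise, and the adjusted R2 formula 1 − (1 − R2)(n − 1)/(n − k − 1) is applied directly):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
y = rng.normal(size=n)
x1 = y + rng.normal(size=n)   # a useful regressor
x2 = rng.normal(size=n)       # pure noise

def r2_adj(y, regressors):
    """Multiple R2 and adjusted R2 for an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y))] + list(regressors))
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    k = len(regressors)                                   # number of regressors
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)  # penalized version
    return r2, adj

r2_a, adj_a = r2_adj(y, [x1])        # x1 only
r2_b, adj_b = r2_adj(y, [x1, x2])    # add the noise regressor
```

Multiple R2 never falls when a regressor is added (`r2_b >= r2_a`), while adjusted R2 discounts the mechanical gain from the useless x2.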