
|T-stat| > critical value = reject null hypothesis

|T-stat| < critical value = fail to reject null hypothesis

p-value > α = fail to reject null hypothesis

p-value < α = reject null hypothesis

F-TEST

RSSm = residual sum of squares (RSS) of the restricted model

RSS = residual sum of squares of the unrestricted model
M = number of restrictions in the null hypothesis (the variables you want to test)
Ex: testing chicken price and beef price jointly, M = 2
k = number of independent variables in the unrestricted model
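The formula these symbols belong to seems to be missing from the notes. In its standard form, with n as the number of observations (an assumption here, since n is only used later in the degrees of freedom), the F-statistic is:

```latex
F = \frac{(RSS_M - RSS)/M}{RSS/(n-k-1)}
```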

1. Hypotheses:

H0: null

Ha: alternate

2. Significance level

- always α; an F-test has no one-sided or two-sided version

3. Estimate restricted and unrestricted models, where the restricted model is one
obtained if Ho is true

4. Compare RSS across the two models by computing the F-statistic

5. Find critical value for F-statistic

- Fc = F(M, n-k-1, α)

- M = number of restrictions (variables being tested)

6. Compare observed statistic to the critical value – reject Ho if F>Fc and write conclusion
in words
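Steps 4 to 6 can be sketched in a few lines. This is only an illustration: the RSS values, n and k below are made-up numbers, and the 5% critical value for F(2, 50) is an approximate table value.

```python
def f_statistic(rss_restricted, rss_unrestricted, m, n, k):
    """F-statistic for m restrictions; k = regressors in the unrestricted model."""
    numerator = (rss_restricted - rss_unrestricted) / m
    denominator = rss_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical numbers, for illustration only
f_obs = f_statistic(rss_restricted=120.0, rss_unrestricted=100.0, m=2, n=53, k=2)

f_crit = 3.18  # approximate 5% critical value for F(2, 50), read from a table
reject = f_obs > f_crit

print(f_obs, reject)  # 5.0 True
```

Here F = 5.0 > Fc, so H0 would be rejected: the tested variables are jointly significant.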

Restricted model imposes H0, so it leaves out the variables you want to test

- Reject H0 if F > Fc

Model F-test (joint test on all independent variables)

- All independent variables are tested

- Therefore, no variables in the restricted model, only the intercept (the mean of Y)
- No need to write equation for restricted model
- Therefore RSSm = TSS
- M = k
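Plugging RSSm = TSS and M = k into the standard F-statistic gives the model F-test. The second form uses R² = 1 - RSS/TSS; n (number of observations) is assumed notation:

```latex
F = \frac{(TSS - RSS)/k}{RSS/(n-k-1)} = \frac{R^2/k}{(1-R^2)/(n-k-1)}
```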

Omitted Variable Bias

- Violation of assumption 3 (all independent variables are uncorrelated with the error
term) (exogeneity)
- Excluding a variable Oi results in omitted variable bias if:
o Oi is a predictor of Yi (criterion 1)
o Oi is correlated with one of your Xs (criterion 2)

- Excluding Oi pushes its effect into the error term

- The error term now contains Oi, so it is correlated with X
o Corr(ei,Xi) ≠ 0

- Hence violating assumption 3

OVB leads to an over- or underestimated B

B1 = true beta
a1*b2 = size of the bias

- a1 (alpha 1) = bivariate relationship between the included and the omitted variable

- b2 = partial effect of the omitted variable on Y

. reg yrsmarr kids

- yrsmarr is the omitted variable

- For a1, use the coefficient from regressing the omitted variable on the included variable
- For b2, use the coefficient on the omitted variable from the unrestricted model
- If a1*b2 is positive: positive bias (B1 is overestimated)
- If a1*b2 is negative: negative bias (B1 is underestimated)
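A small sketch of the bias calculation, using made-up data (x is the included variable, o the omitted one; all names and numbers are hypothetical). It checks that the coefficient from the restricted regression equals b1 + a1*b2 from the unrestricted and auxiliary regressions:

```python
def mean(values):
    return sum(values) / len(values)

def cov_sum(a, b):
    """Sum of cross-products of deviations from the mean."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

# Hypothetical data: x is the included variable, o the omitted one
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
o = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 8.2, 8.8, 13.1, 13.9]

sxx, soo, sxo = cov_sum(x, x), cov_sum(o, o), cov_sum(x, o)
sxy, soy = cov_sum(x, y), cov_sum(o, y)

# Unrestricted model: y on x and o (solve the 2x2 normal equations)
det = sxx * soo - sxo ** 2
b1 = (sxy * soo - soy * sxo) / det
b2 = (soy * sxx - sxy * sxo) / det

# Auxiliary regression: omitted variable on included variable
a1 = sxo / sxx

# Restricted model: y on x only (o left out)
b1_short = sxy / sxx

bias = a1 * b2
print(b1_short, b1 + bias)  # the two numbers agree
```

The sign of `bias` tells you whether the restricted estimate over- or underestimates b1.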

Rescaling variables

- If X1 is multiplied by 10
o B1 is divided by 10
o Standard error of B1 is divided by 10
- If Y is multiplied by 10
o Every coefficient in the regression (including the intercept) is multiplied by 10
o Standard errors of all coefficients are multiplied by 10
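The rescaling rules can be checked numerically. A minimal sketch with a one-variable regression and made-up data (the helper `simple_ols` is written for this illustration only):

```python
import math

def simple_ols(x, y):
    """Slope and its standard error for y = b0 + b1*x + e."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1, se_b1

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, se = simple_ols(x, y)
b1_x10, se_x10 = simple_ols([10 * xi for xi in x], y)   # X multiplied by 10
b1_y10, se_y10 = simple_ols(x, [10 * yi for yi in y])   # Y multiplied by 10

print(b1, b1_x10, b1_y10)
print(se, se_x10, se_y10)
```

Because the coefficient and its standard error scale together, the t-statistic does not change under rescaling.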

Level-Level

- Usual interpretation

Log-log
- Both X and Y interpreted as a percent (elasticity)
- Don't multiply by 100, so a 0.37 coefficient means a 1% increase in X changes Y by 0.37%

Log-level
- Income increases by 1 euro, house size increases by 20%
- Ln(y)
- Y interpreted as a percentage, X in its own units
- Multiply coefficient by 100 for percentage

Level-log
- Income increases by 1%, house size increases by 0.015 square meters
- X interpreted as a percentage, Y not
- Divide coefficient by 100
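The four functional forms side by side, as a compact reference. The notation (β0, β1, e) is assumed here, and the percentage readings are the usual approximations for small changes:

```latex
\begin{array}{lll}
\text{Level-level} & Y = \beta_0 + \beta_1 X + e & \Delta Y = \beta_1 \,\Delta X \\
\text{Log-log} & \ln Y = \beta_0 + \beta_1 \ln X + e & \%\Delta Y \approx \beta_1 \,\%\Delta X \\
\text{Log-level} & \ln Y = \beta_0 + \beta_1 X + e & \%\Delta Y \approx 100\,\beta_1 \,\Delta X \\
\text{Level-log} & Y = \beta_0 + \beta_1 \ln X + e & \Delta Y \approx (\beta_1/100)\,\%\Delta X
\end{array}
```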

Dummy variable
- 1 or 0
- Gender

- Economic interpretation

o Men will have 0.015 more votes than women, ceteris paribus

Interaction term

- If you want to see if the impact of X on Y is the same across (2) groups

Nominal Variable

- Do not have an inherent order

- Example: race or color
- So each race is its own dummy variable
- With three categories, add two of the dummies and omit one to avoid perfect multicollinearity

B4 interpretation =

- By being Asian you earn b4% more or less wages than a white person, ceteris paribus
- Since white is omitted, you always compare with the omitted category, in this case
white
- Compare variable against omitted category

Ordinal variable

- Ranking
- Age group, happiness
o 0-18 years
o 18-40 years
o 40+ years

- Add two variables and omit one variable, same as nominal


- To avoid perfect multicollinearity
- Omitted variable serves as reference category
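Why one category has to be left out can be seen directly. A minimal sketch with a made-up sample and hypothetical category labels "a", "b", "c":

```python
# Hypothetical sample: each person falls in exactly one of three categories
groups = ["a", "b", "c", "a", "c", "b"]

# One dummy per category
dummies = {g: [1 if gi == g else 0 for gi in groups] for g in ("a", "b", "c")}

# With all three dummies included, they sum to 1 in every row, which is
# exactly the intercept column: perfect multicollinearity (dummy variable trap)
row_sums = [sum(dummies[g][i] for g in dummies) for i in range(len(groups))]
print(row_sums)  # [1, 1, 1, 1, 1, 1]

# Omitting one category ("c" here, the reference category) breaks the exact
# linear dependence: rows from the omitted group now sum to 0
kept = ("a", "b")
row_sums_kept = [sum(dummies[g][i] for g in kept) for i in range(len(groups))]
print(row_sums_kept)  # [1, 1, 0, 1, 0, 1]
```

The coefficients on the kept dummies are then read relative to the omitted group.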

B3 interpretation =
- If you are less than 18, then you will earn b3 percent more or less than individuals who
are over 60, ceteris paribus
- As over 60 is the group omitted

Which specification to choose (ordinal variable)?


- Highest R-squared, since it explains more of the variation in Y

If you include categorical variable (dummy) you are not comparing against the overall
population but against omitted category

Linear probability model: the dependent variable is always either 0 or 1, and the fitted value is read as a probability

Heteroskedasticity is when the variance of the error term is non-constant

Use standard errors that are robust to heteroskedasticity

Key question of causal interpretation of a regression model is whether the error term is
correlated with the independent variable(s)

- Omitted Variable Bias


- If it is correlated, your estimates are biased

- B1 always refers to the population parameter.


Ex: difference between the mean test score of a male and a female pupil, ceteris paribus, in the
population
- Study the method of taking partial derivatives

Week 8:

- Linear probability model: always use the robust option

- How to calculate predicted values outside 0-1 range for the LPM

o Outside 0 or 1 = unbounded

- Dummy variable trap

Week 7

Spurious regression

- Assumption that there is a causal relationship


- In fact, relationship between variables is NOT caused by an underlying causal
relationship
- Main causes:
o Trending time variables (variables that change with time – GDP, stock price,
predictable)
o Non-stationary variables (unpredictable)

- When dealing with time series, always check whether time trend is significant
- Include time trend in regression model to make sure you control for time
o Put year in regression
- Check P VALUE for significance

Non-stationary variables:

- When the statistical properties of a time series (mean, variance, covariance) are affected by a
change of time
Example:
- Trending variable
- Trending time series
- White noise time series
o Yt = et
o Constant variance
- Random walk
o No time trend, value changes randomly over time
o Ex: stock prices/exchange rates

Diagnosing stationarity: Dickey-Fuller test

If t value is smaller (more negative) than critical value = reject null hypothesis of a unit root

Unit root = variable that follows random walk


- If stationary, allowed to use the variables in levels, so that the regression parameters
are consistent estimates

- Not cointegrated = specified in first differences

- Cointegrated = specified in levels
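A simplified sketch of the Dickey-Fuller idea, applied to a simulated white noise series and a simulated random walk. It is not the full augmented test: it uses no lagged differences, the seed and sample size are arbitrary, and the 5% critical value of about -2.86 is the usual large-sample table value for a regression with a constant and no trend.

```python
import math
import random

def dickey_fuller_t(series):
    """t-statistic on gamma in: delta_y[t] = a + gamma * y[t-1] + e[t]."""
    lagged = series[:-1]
    delta = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(delta)
    x_bar, d_bar = sum(lagged) / n, sum(delta) / n
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    sxy = sum((x - x_bar) * (d - d_bar) for x, d in zip(lagged, delta))
    gamma = sxy / sxx
    a = d_bar - gamma * x_bar
    rss = sum((d - a - gamma * x) ** 2 for x, d in zip(lagged, delta))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma

random.seed(0)
shocks = [random.gauss(0, 1) for _ in range(500)]

white_noise = shocks          # Yt = et, stationary
random_walk = []              # Yt = Y(t-1) + et, unit root
level = 0.0
for e in shocks:
    level += e
    random_walk.append(level)

CRITICAL_5PCT = -2.86  # approximate large-sample value, constant, no trend

t_wn = dickey_fuller_t(white_noise)
t_rw = dickey_fuller_t(random_walk)
print(t_wn, t_wn < CRITICAL_5PCT)   # far below the critical value: reject unit root
print(t_rw, t_rw < CRITICAL_5PCT)   # typically not below it: unit root not rejected
```

In Stata the test is normally run with a built-in command rather than by hand; the sketch only shows what is being compared to the critical value.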

Week 8
Linear probability model:

2 problems:

- Heteroskedasticity
o Always use the robust option in Stata
- Predicted probabilities may lie outside the 0-1 interval
o If they are outside use logit or probit model instead of LPM
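The second problem is easy to reproduce. A minimal sketch with a made-up data set where a 0/1 outcome is regressed on a single X by OLS:

```python
# Hypothetical data: y is a dummy dependent variable
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Predicted "probabilities" from the linear probability model
fitted = [b0 + b1 * xi for xi in x]

# Predictions below 0 or above 1 cannot be read as probabilities
outside = [round(p, 3) for p in fitted if p < 0 or p > 1]
print(outside)
```

The lowest and highest values of X give predictions below 0 and above 1, which is the case where logit or probit is preferred.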

Causality

Always looking for causal effects in econometrics:

- Not always possible (OVB)


- Observational
o We cannot control for everything, unlike in an experiment (where OVB is not a big problem)
- Natural Experiment
o Random variation contained within the observational data (as opposed to
experiments designed by the researchers themselves)
- Observational does not collect all data therefore the causal relationship can be wrong

Solutions:
- Observational
o Control for more observables to remove OVB; use a larger sample to obtain
more precision

- Experimental
o Increase validity
o Have other age groups
o Use larger sample to obtain more precision
o Treatment vs non-treatment
 Give two groups different treatment intake levels to gauge the effect
of the treatment intensity

- Usually experimental is more accurate to determine causal relationships


Exogeneity

Strict exogeneity (optimal option)

- Error term at time t and explanatory variables are uncorrelated at all times: present, past,
future
- In theory, does not exist in real life

Weak exogeneity

- Error term at time t is uncorrelated with the explanatory variables in the present, but may be
correlated with them at other times (e.g. in the future)

TIPS FOR THE EXAM

- Practice
- Learn the pattern
- Do not spend too much time on lecture slides

- Time series economic interpretation


o Add in the same time period
- Long-run effect of unemployment on real wage would be (b2+b3)/(1-b1), where b1 is the coefficient on wage(t-1)
- Numerator: sum of the coefficients on unemployment in t and t-1
- Add all x variables (current and lagged)
- So can be b2+b3+b4
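Where the long-run formula comes from, as a short derivation. The model below is an assumed form (w = real wage, u = unemployment) that matches the coefficient labels in the notes; in the long run the variables settle at constant values w* and u*:

```latex
\begin{aligned}
w_t &= b_0 + b_1 w_{t-1} + b_2 u_t + b_3 u_{t-1} + e_t \\
w^* &= b_0 + b_1 w^* + (b_2 + b_3)\,u^* \\
w^* &= \frac{b_0}{1-b_1} + \frac{b_2 + b_3}{1-b_1}\,u^* \\
\frac{\partial w^*}{\partial u^*} &= \frac{b_2 + b_3}{1-b_1}
\end{aligned}
```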

c.) Would it be necessary to include a time trend in the equation?

- If both wage and unemployment contain a time trend, then it would be necessary to
include a time trend

First-order autocorrelation of error term

- Error term correlated with error term in previous time period

c. loses contain no unit root, therefore they are both stationary

Stationary gives non-spurious regression – relationship present


Non-stationary give spurious regression – misleading

For every additional year of education, wage increases by 2.7%, all else equal

c. Region 9 workers will earn 13.3% more on wages compared to region 5, all else equal

- Check to see which region has been omitted

d. effect of iq on log wages

- Variables contain iq and iq^2


- Therefore the derivative has to be taken
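With both iq and iq^2 in the model, the effect of iq depends on the level of iq. A sketch of the derivative, where the coefficient names b1 and b2 are assumptions for illustration:

```latex
\begin{aligned}
\ln(wage) &= b_0 + b_1\, iq + b_2\, iq^2 + \dots + e \\
\frac{\partial \ln(wage)}{\partial iq} &= b_1 + 2\, b_2\, iq \\
iq^* &= -\frac{b_1}{2\, b_2} \quad \text{(stationary point, where the effect is zero)}
\end{aligned}
```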

This will be in the exam

e.) Working experience is omitted therefore it can cause OVB

2 conditions:

- Omitted variable has a relationship with one of the other independent variables
- Omitted variable has a relationship with y variable

- Violates OLS assumption that error term and independent variable cannot be correlated
- Parameter estimates are biased, E(b_hat)!=B hence do not have a causal interpretation

a. Long run effect =

- (b2+b3+b4)/(1-b1) = 0.0009

- Hence long run effect is that 1000 more prison inmates leads to an increase of 0.0009
percentage points in the unemployment rate

- Check the p-value: if a coefficient is insignificant, don't add it into the long run effect

b. Dependent is a dummy variable

- Always multiply coefficient by 100 (to get percentage points) if the dependent variable is a dummy

- Male politicians are 1.6 percentage points more likely to win in a municipal election compared to female
politicians, all else equal

c. Causal effect of age on covid-morbidity

- Another variable = co-morbidities; an example would be BMI, another control variable
being the obesity rate of the individuals selected

- This will increase validity of the regression and reduce omitted variable bias

- Variable has to correlate with age (the initial independent variable)

- The estimate for the causal effect of age would go down/become smaller once obesity
rate is controlled

d. T-TEST

a. Explain economic interpretation of ldist (average kilometer distance from nearest
Ebola case)
- Brackets are the standard error

- T-stat = 7.25

- 2.9/0.4 = 7.25

- Formula = (Bhat - B) / standard error

- Tc = 1.96

- T > Tc = reject null hypothesis
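The same t-test as a short calculation. The coefficient, standard error and critical value are the ones in the notes; taking B = 0 under H0 is an assumption, since the notes do not state the null value.

```python
b_hat = 2.9        # estimated coefficient (from the notes)
b_null = 0.0       # value under H0 (assumed: H0 is B = 0)
std_error = 0.4    # standard error, reported in brackets
t_crit = 1.96      # critical value used in the notes

t_stat = (b_hat - b_null) / std_error
reject = abs(t_stat) > t_crit

print(round(t_stat, 2), reject)  # 7.25 True
```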

Women who live in urban areas are 6.5 percentage points less likely to be in a polygamous relationship
compared to women in non-urban areas, all else equal

Summary

Heteroskedasticity:

- Error term does not have constant variance

- Breusch-Pagan test: Ho = homoskedastic (constant variance of error term)

- Ha = heteroskedastic (no constant variance)

- Heteroskedasticity does not bias the coefficient estimates, but biases the standard errors, so
hypothesis tests are not reliable

- Solution: Use robust standard errors
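What "robust" changes can be shown on a small made-up data set where the spread of Y grows with |X|. The sketch below compares the conventional standard error of the slope with the basic robust (HC0) version; Stata's robust option applies an additional small-sample scaling, so its numbers would differ slightly.

```python
import math

# Hypothetical data: residual spread is larger at the extremes of x
x = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
y = [1, -5, -1, -1, 0, 0, 1, 1, 5, -1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Conventional standard error: assumes constant error variance
se_classic = math.sqrt(sum(e ** 2 for e in resid) / (n - 2) / sxx)

# Robust (HC0) standard error: weights each squared residual by (x - x_bar)^2
se_robust = math.sqrt(
    sum(((xi - x_bar) ** 2) * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
)

print(b1, round(se_classic, 3), round(se_robust, 3))  # 1.0 0.474 0.6
```

The coefficient is the same in both cases; only the standard error, and therefore the t-statistic, changes.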

Omitted variable bias:


- Exogeneity means that independent variables are uncorrelated with the error terms

Multicollinearity:

- Variables are highly correlated


- Correlation =1, perfect multicollinearity

- Increases standard errors, reduces t-statistics

A1: Regression model linear in parameters


A2: Error term has a zero population mean
A3: All explanatory variables are uncorrelated with the error term, Corr(ei,Xi) = 0 (exogeneity)
A4: No serial correlation = error term not correlated with lagged error term
A5: No perfect multicollinearity
A6: No heteroskedasticity
A7: Error term normally distributed

Strict exogeneity (optimal option)

- Error term at time t and explanatory variables are uncorrelated at all times: present, past,
future
- In theory, does not exist in real life

Weak exogeneity

- Error term at time t is uncorrelated with the explanatory variables in the present, but may be
correlated with them at other times (e.g. in the future)

Biased upwards/downwards

LPM: Estimates probability of event occurring

Drawbacks of LPM:

- Heteroskedasticity, variance not constant


- Biased standard errors, can be fixed by using robust standard errors
- Predicted values are unbounded, can take any values (> 1 or < 0)
o If this occurs, use a different estimator than OLS, e.g. logistic regression

Time trend:

- Include time variable to avoid spurious regression


- If one or both variables are non-stationary (due to trend/unit root), model will be spurious
- P-value < significance level = reject null, time trend present

- Dickey-Fuller test to check for unit roots


o Ho = non-stationary, unit root, spurious
o Ha = stationary, no unit root, non-spurious

Partial derivative outside domain = The stationary point (10.60) is outside of the domain for
grade, since grade goes from 1 to 10 [1 pt], hence log SETs (/log evaluation scores) increase
with grades but at a decreasing rate / is part of the upward going graph of a hump-shaped
function [2 pts]

LPM: errors are heteroskedastic by construction


Lag added to correct serial correlation

Serial correlation

- Standard errors biased

- Coefficients remain unbiased
- If model contains a lagged dependent variable, coefficients are also biased
o Thus, Corr(ei,Xi) = 0 is violated as well
