Econometrics Formula

|T-stat| > critical value = reject null hypothesis
|T-stat| < critical value = fail to reject null hypothesis
p-value < α = reject null hypothesis
p-value > α = fail to reject null hypothesis
F-TEST
RSSm = residual sum of squares of the restricted model
RSS = residual sum of squares of the unrestricted model
M = number of restrictions in the null hypothesis (the variables you want to test)
Ex: chicken and beef price tested jointly, so M = 2
k = number of independent variables in the unrestricted model
1. Hypotheses:
H0: null
Ha: alternative
2. Significance level
- Always α; the F-test has no one-sided or two-sided version
3. Estimate the restricted and unrestricted models, where the restricted model is the one
obtained if H0 is true
4. Compare RSS across the two models using the F-statistic:
- F = [(RSSm - RSS)/M] / [RSS/(n - k - 1)]
5. Find the critical value for the F-statistic
- Fc = F(M, n-k-1, α)
- M = numerator degrees of freedom (number of restrictions); n-k-1 = denominator degrees of freedom
6. Compare the observed statistic to the critical value: reject H0 if F > Fc, and write the conclusion
in words
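The restricted model imposes H0 (e.g. the coefficients being tested are set to zero), so it goes against what you want to test
- Reject H0 if F > Fc (F is never negative, so no absolute value is needed)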
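The F-statistic from step 4 can be computed directly from the two RSS values. A minimal Python sketch; the RSS values, n and k below are hypothetical, and the critical value (about 3.20 for F(2, 45) at α = 0.05) is an assumption you should look up in an F table yourself:

```python
def f_statistic(rss_restricted, rss_unrestricted, m, n, k):
    """F = ((RSSm - RSS) / M) / (RSS / (n - k - 1))."""
    numerator = (rss_restricted - rss_unrestricted) / m
    denominator = rss_unrestricted / (n - k - 1)
    return numerator / denominator


def f_test(rss_restricted, rss_unrestricted, m, n, k, f_critical):
    """Return (F, reject_h0); f_critical = F(M, n-k-1, alpha) taken from a table."""
    f = f_statistic(rss_restricted, rss_unrestricted, m, n, k)
    return f, f > f_critical


# Hypothetical example: chicken and beef price tested jointly (M = 2),
# n = 50 observations, k = 4 independent variables in the unrestricted model.
f, reject = f_test(rss_restricted=120.0, rss_unrestricted=100.0,
                   m=2, n=50, k=4, f_critical=3.20)
print(f"F = {f:.2f}, reject H0: {reject}")
```

A larger drop in RSS when the restrictions are removed gives a larger F, which is why a big F is evidence against H0.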
Model F-test (joint test on all independent variables)
- All independent variables are tested
- Therefore, the restricted model contains no independent variables, only Y (and an intercept)!
- No need to write out the equation for the restricted model
- Therefore, RSSm = TSS
- The restricted RSS is the TSS
- M = k
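Substituting RSSm = TSS and M = k into the F formula, and using R² = 1 - RSS/TSS, gives the overall F-statistic in terms of R² (a standard identity, written out here as a worked step):

```latex
F = \frac{(\mathrm{RSS}_m - \mathrm{RSS})/M}{\mathrm{RSS}/(n-k-1)}
  = \frac{(\mathrm{TSS} - \mathrm{RSS})/k}{\mathrm{RSS}/(n-k-1)}
  = \frac{R^2/k}{(1-R^2)/(n-k-1)}
```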
Omitted Variable Bias
- Violation of assumption 3 (all independent variables are uncorrelated with the error
term) (exogeneity)
- Excluding Oi results in omitted variable bias if both hold:
o Oi is a predictor of Yi (criterion 1)
o Oi is correlated with one of your Xs (criterion 2)
- Excluding Oi pushes its effect into the error term
- The error term now contains Oi, so it is related to X
o Corr(ei,Xi) ≠ 0
- Hence assumption 3 is violated
OVB leads to an over- or underestimated B1
- B1 = true beta; the regression that omits Oi estimates B1 + a1*b2 on average
- a1*b2 is the size of the bias
- Alpha 1 (a1) = bivariate relationship between the included and omitted variable: the slope from regressing the omitted variable on the included one
- b2 = partial effect of the omitted variable on Y
. reg yrsmarr kids
- yrsmarr = omitted variable, regressed on the included variable kids
- For alpha 1, use the coefficient on the included variable (kids) from this regression
- For b2, use the coefficient on the omitted variable from the unrestricted model
- Positive bias (a1*b2 > 0): B1 is overestimated
- Negative bias (a1*b2 < 0): B1 is underestimated
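The bias formula can be checked numerically. A minimal Python sketch on simulated data (all variable names and coefficients are hypothetical): in any OLS sample, the slope from the short regression equals the long-regression slope plus a1*b2 exactly, where a1 comes from regressing the omitted variable on the included one, the role `reg yrsmarr kids` plays above.

```python
import random


def slope(x, y):
    """Bivariate OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_regressor_ols(x1, x2, y):
    """Slopes (b1, b2) from regressing y on x1 and x2 (with intercept), via demeaned normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical data-generating process
random.seed(1)
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]               # included variable
omitted = [0.8 * a + random.gauss(0, 1) for a in x1]      # correlated with x1 (criterion 2)
y = [1 + 2 * a + 1.5 * o + random.gauss(0, 1)             # omitted predicts y (criterion 1)
     for a, o in zip(x1, omitted)]

b1_long, b2_long = two_regressor_ols(x1, omitted, y)
a1 = slope(x1, omitted)      # omitted regressed on included
b1_short = slope(x1, y)      # regression that leaves the omitted variable out
print(b1_short, b1_long + a1 * b2_long)   # identical up to rounding
print("bias:", a1 * b2_long)               # positive here, so B1 is overestimated
```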
Rescaling variables
- If X1 is multiplied by 10
o B1 is divided by 10
o Standard error of B1 is divided by 10
- If Y is multiplied by 10
o Every coefficient in the regression (including the intercept) is multiplied by 10
o Standard errors of all coefficients are multiplied by 10
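The rescaling rules can be verified on any small dataset. A minimal Python sketch with hypothetical numbers, fitting a bivariate OLS with the usual (non-robust) standard error:

```python
import math


def ols_with_se(x, y):
    """Bivariate OLS: returns (intercept, slope, se_slope) with the usual non-robust SE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, se_b1


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

base = ols_with_se(x, y)
x_times_10 = ols_with_se([10 * a for a in x], y)   # slope and its SE shrink by 10
y_times_10 = ols_with_se(x, [10 * b for b in y])   # everything grows by 10
print(base, x_times_10, y_times_10, sep="\n")
```

The t-statistic (slope / SE) is identical in all three fits, so rescaling never changes significance.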
Level-Level
- Usual interpretation: x increases by 1 unit, y changes by B1 units
Log-log
- Both variables in percent (B1 is an elasticity)
- Don't multiply by 100: a coefficient of 0.37 means a 1% increase in x changes y by 0.37%
Log-level
- Ln(y) on x
- Ex: income increases by 1 euro, house size increases by 20% (coefficient 0.20)
- Y interpreted as a percentage change, x in units
- Multiply the coefficient by 100 for the percentage
Level-log
- y on ln(x)
- Ex: income increases by 1%, house size increases by 0.015 square meters (coefficient 1.5)
- X interpreted as a percentage, Y not
- Divide the coefficient by 100
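The four interpretation rules can be collected in one helper. A minimal Python sketch; the function name `interpret` and its output wording are hypothetical, and the log-level and level-log rules are the usual approximations that work well only for small coefficients:

```python
def interpret(spec, b):
    """Effect implied by coefficient b on x, using the rules of thumb in these notes."""
    if spec == "level-level":
        return f"x up 1 unit -> y changes by {b:g} units"
    if spec == "log-log":
        return f"x up 1% -> y changes by {b:g}%"            # elasticity, no x100
    if spec == "log-level":
        return f"x up 1 unit -> y changes by {100 * b:g}%"  # multiply by 100
    if spec == "level-log":
        return f"x up 1% -> y changes by {b / 100:g} units"  # divide by 100
    raise ValueError(f"unknown specification: {spec}")


print(interpret("log-log", 0.37))    # x up 1% -> y changes by 0.37%
print(interpret("log-level", 0.2))   # x up 1 unit -> y changes by 20%
print(interpret("level-log", 1.5))   # x up 1% -> y changes by 0.015 units
```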
Dummy variable
- Takes the value 1 or 0
- Ex: gender
- Economic interpretation
o Men will have 0.015 more votes than women, ceteris paribus
Interaction term
- Used to test whether the impact of X on Y is the same across (2) groups
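With a dummy D marking the two groups, one standard way to write the interaction model is below (the symbols are illustrative). The effect of X differs between groups by β3, so testing H0: β3 = 0 checks whether the impact of X is the same in both groups:

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + \varepsilon_i \\
\frac{\partial Y_i}{\partial X_i} &= \beta_1 + \beta_3 D_i =
\begin{cases} \beta_1 & \text{if } D_i = 0 \\ \beta_1 + \beta_3 & \text{if } D_i = 1 \end{cases}
\end{aligned}
```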
Nominal Variable
- No inherent order
- Example: race or color
- So each race becomes its own dummy variable
- With three categories, include two dummies and omit one to avoid perfect multicollinearity (dummy variable trap)
B4 interpretation =
- Being Asian means you earn (100*B4)% more or less in wages than a white person (wage in logs), ceteris paribus
- Since white is omitted, you always compare with the omitted category, in this case,
white
- Compare each included category against the omitted category
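Building the dummies while leaving out the reference category can be sketched in Python as below. The helper name `make_dummies` and the category labels are hypothetical; including all three dummies alongside an intercept would make them sum to the intercept column, which is exactly the perfect multicollinearity the notes warn about.

```python
def make_dummies(categories, reference):
    """One 0/1 column per category, except the omitted reference category."""
    levels = sorted(set(categories))
    if reference not in levels:
        raise ValueError(f"reference category {reference!r} not in data")
    kept = [c for c in levels if c != reference]
    return {c: [1 if v == c else 0 for v in categories] for c in kept}


# Hypothetical observations; "white" is the reference (omitted) category
race = ["white", "asian", "black", "white", "asian"]
dummies = make_dummies(race, reference="white")
print(dummies)  # {'asian': [0, 1, 0, 0, 1], 'black': [0, 0, 1, 0, 0]}
```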
Ordinal variable
- Ranking
- Ex: age group, happiness
o 0-18 years
o 18-40 years
o 40+ years
- Add two variables and omit one, same as nominal
- To avoid perfect multicollinearity
- The omitted variable serves as the reference category
B3 interpretation =
- If you are younger than 18, you earn b3 percent more or less than individuals who
are over 60, ceteris paribus
- As over 60 is the omitted group
Which specification to choose (ordinal variable)?
- The one with the highest R squared, since it explains more of the variation in Y
If you include a categorical variable (dummies), you are not comparing against the overall
population but against the omitted category
In a linear probability model the dependent variable is always either 0 or 1; the fitted values are probabilities
Heteroskedasticity is when the variance of the error term is non-constant
Use heteroskedasticity-robust standard errors for your estimates
The key question for a causal interpretation of a regression model is whether the error term is
correlated with the independent variable(s)
- Omitted Variable Bias
- If it is correlated, your estimates are biased
- B1 always refers to the population parameter
Ex: difference between the mean test score of a male and a female pupil, ceteris paribus, in the
population
- Study the method of taking partial derivatives
Week 8:
- Linear probability model: always use the robust command
- How to calculate predicted values outside the 0-1 range for the LPM
o Predictions outside 0-1 occur because the LPM is unbounded
- Dummy variable trap
Week 7
Spurious regression
- The regression suggests a causal relationship
- In fact, the relationship between the variables is NOT caused by an underlying causal
relationship
- Main causes:
o Trending time variables (variables that change with time in a predictable way, e.g. GDP, stock price)
o Non-stationary variables (unpredictable)
- When dealing with time series, always check whether the time trend is significant
- Include a time trend in the regression model to make sure you control for time
o Put year in the regression
- Check the p-value for significance
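Why "put year in the regression" helps can be seen in a small simulation. A minimal Python sketch with hypothetical data: x and y both trend upward but are otherwise unrelated, so the slope on x looks large until the time trend is controlled for (the p-value check is not computed here).

```python
import random


def ols(columns, y):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b


# Hypothetical data: both series trend with time, neither causes the other
random.seed(0)
year = list(range(1, 101))
x = [0.5 * t + random.gauss(0, 3) for t in year]
y = [0.3 * t + random.gauss(0, 1) for t in year]

print("slope on x, no trend:  ", ols([x], y)[1])        # large: spurious
print("slope on x, with trend:", ols([x, year], y)[1])  # close to 0
```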
Non-stationary variables:
- When a time series' statistical properties (mean, variance, covariance) are affected by a
change of time
Example:
- Trending variable
- Trending time series
- White noise time series (stationary)
o Yt = et
o Zero mean, constant variance
- Random walk
o Yt = Yt-1 + et
o No time trend, value changes randomly over time
o Ex: stock prices/exchange rates
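White noise and a random walk can be simulated from the same shocks. A minimal Python sketch (function names hypothetical): the random walk accumulates every shock, so its first difference gives the white noise back, which is why non-stationary, non-cointegrated series are specified in first differences.

```python
import random


def white_noise(n, sd=1.0, seed=None):
    """Y_t = e_t: zero mean, constant variance (stationary)."""
    rng = random.Random(seed)
    return [rng.gauss(0, sd) for _ in range(n)]


def random_walk(n, sd=1.0, seed=None):
    """Y_t = Y_{t-1} + e_t with Y_0 = 0 (unit root, non-stationary)."""
    rng = random.Random(seed)
    path, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0, sd)  # today's value = yesterday's value + a new shock
        path.append(level)
    return path


shocks = white_noise(200, seed=7)
walk = random_walk(200, seed=7)  # same seed, so it accumulates the same shocks
first_diff = [cur - prev for prev, cur in zip(walk[:-1], walk[1:])]
print(max(abs(d - e) for d, e in zip(first_diff, shocks[1:])))  # ~0
```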
Diagnosis of stationarity: Dickey-Fuller test
If the t-value is smaller (more negative) than the Dickey-Fuller critical value = reject the null hypothesis of a unit root
Unit root = variable that follows a random walk
- If stationary, you are allowed to use the variables in levels, so that the regression parameters
are consistent estimates
- Non-stationary and not cointegrated = specified in first differences
- Non-stationary but cointegrated = specified in levels
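The mechanics of the (plain, non-augmented) Dickey-Fuller test can be sketched in Python: regress ΔY_t on a constant and Y_{t-1}, and compare the t-statistic on Y_{t-1} with a Dickey-Fuller critical value, not the usual t table. The default of -2.86 is assumed here as the approximate large-sample 5% value for the constant-only case; check your own table (Stata's `dfuller` reports the appropriate critical values).

```python
import math
import random


def dickey_fuller_t(y):
    """t-stat on gamma in  dY_t = alpha + gamma * Y_{t-1} + u_t  (constant, no lags)."""
    lag = y[:-1]
    dy = [cur - prev for prev, cur in zip(y[:-1], y[1:])]
    n = len(dy)
    mx, my = sum(lag) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in lag)
    gamma = sum((v - mx) * (w - my) for v, w in zip(lag, dy)) / sxx
    alpha = my - gamma * mx
    rss = sum((w - alpha - gamma * v) ** 2 for v, w in zip(lag, dy))
    se = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se


def has_unit_root(y, critical_value=-2.86):
    """H0 = unit root. Reject H0 only if t is below (more negative than) the critical value."""
    return dickey_fuller_t(y) >= critical_value


# Hypothetical series
rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(500)]
walk = []
level = 0.0
for e in noise:
    level += e
    walk.append(level)

print("white noise:", round(dickey_fuller_t(noise), 2), "unit root?", has_unit_root(noise))
print("random walk:", round(dickey_fuller_t(walk), 2), "unit root?", has_unit_root(walk))
```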
Week 8
Linear probability model:
2 problems:
- Heteroskedasticity
o Always use the robust option in Stata
- Predicted probabilities may lie outside the 0-1 interval
o If they do, use a logit or probit model instead of the LPM
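The second LPM problem is easy to reproduce. A minimal Python sketch with hypothetical data: an OLS line fitted to a 0/1 outcome produces "probabilities" below 0 and above 1 at the ends of the x range.

```python
def fit_line(x, y):
    """Bivariate OLS: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


x = list(range(1, 11))                 # hypothetical regressor
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]     # hypothetical 0/1 outcome
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * v for v in x]      # LPM predicted probabilities
outside = [(v, round(p, 3)) for v, p in zip(x, fitted) if p < 0 or p > 1]
print(outside)  # [(1, -0.127), (10, 1.127)]
```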
Causality
Always looking for causal effects in econometrics:
- Not always possible (OVB)
- Observational data
o Unlike in an experiment, we cannot control for everything (in a randomized experiment, OVB is not a big problem)
- Natural experiment
o Random variation contained within the observational data (as opposed to
experiments designed by the researchers themselves)
- Observational data do not capture all relevant variables, so the estimated causal relationship can be wrong
Solutions:
- Observational
o Control for more observables to reduce OVB; use a larger sample to obtain
more precision
- Experimental
o Increase validity
o Include other age groups
o Use a larger sample to obtain more precision
o Treatment vs non-treatment
   - Give two groups different treatment intake levels to gauge the effect
of the treatment intensity
- Experiments are usually better at determining causal relationships

Exogeneity
Strict exogeneity (optimal option)
- The error term at time t is uncorrelated with the explanatory variables at all times: present, past,
future
- A theoretical ideal that rarely holds in real life
Weak exogeneity
- The error term at time t is uncorrelated with the explanatory variables in the present (same period), but may be correlated
with their past or future values
TIPS FOR THE EXAM
- Practice
- Learn the pattern
- Do not spend too much time on lecture slides
- Time series economic interpretation
o Add up the coefficients of the same variable across time periods
- Long-run effect of unemployment on real wage = (b2+b3)/(1-b1), where b1 is the coefficient on wage(t-1)
- Sum the coefficients on unemployment in t and t-1
- (b2+b3)/(1-b1)
- Add all x variables (current and lagged)
- So it can be (b2+b3+b4)/(1-b1)
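The long-run formula can be wrapped in a small helper. A minimal Python sketch; the function name and the coefficient values are hypothetical, and the |b1| < 1 check is an added assumption (a stable model), not something stated in the notes. Following the exam tip, only pass coefficients that are significant.

```python
def long_run_effect(b_lagged_y, x_coefficients):
    """(sum of the coefficients on x_t, x_{t-1}, ...) / (1 - coefficient on y_{t-1})."""
    if not -1 < b_lagged_y < 1:
        raise ValueError("need |b1| < 1 for the long-run effect to be finite")
    return sum(x_coefficients) / (1 - b_lagged_y)


# Hypothetical: b1 = 0.5 on wage(t-1), b2 = 0.25 on unemployment(t), b3 = 0.125 on unemployment(t-1)
print(long_run_effect(0.5, [0.25, 0.125]))  # 0.75
```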
c.) Would it be necessary to include a time trend in the equation?
- If both wage and unemployment contain a time trend, then it would be necessary to
include a time trend
First-order autocorrelation of error term
- Error term correlated with error term in previous time period
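First-order autocorrelation is usually written as below, with u_t a white-noise error: ρ ≠ 0 means the error is correlated with the previous period's error (ρ > 0 positive, ρ < 0 negative serial correlation):

```latex
\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \qquad -1 < \rho < 1
```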
c. Both series contain no unit root, therefore they are both stationary
Stationary variables give a non-spurious regression: the relationship is real
Non-stationary variables give a spurious regression: misleading
For every additional year of education, wage increases by 2.7%, all else equal
c. Region 9 workers earn 13.3% more in wages than region 5 workers, all else equal
- Check which region has been omitted (the reference category)
d. Effect of iq on log wages
- Variables include iq and iq^2
- Therefore the partial derivative has to be taken: b_iq + 2*b_iq2*iq
This will be in the exam bitch
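The derivative and the stationary point of a quadratic can be computed directly. A minimal Python sketch; the coefficient values are hypothetical. With log wages, 100 times the marginal effect is the percent change per extra IQ point; if the turning point falls outside the observed range, the variable increases (or decreases) at a changing rate over the whole domain, as in the grade example in the summary.

```python
def marginal_effect(b_linear, b_squared, x):
    """dy/dx for y = b0 + b_linear * x + b_squared * x**2 + ..."""
    return b_linear + 2 * b_squared * x


def turning_point(b_linear, b_squared):
    """x where dy/dx = 0 (the stationary point): -b_linear / (2 * b_squared)."""
    return -b_linear / (2 * b_squared)


# Hypothetical coefficients on iq and iq^2 in a log-wage regression
b_iq, b_iq2 = 0.03, -0.0001
for iq in (90, 100, 110):
    print(iq, round(100 * marginal_effect(b_iq, b_iq2, iq), 2), "% per extra IQ point")
print("turning point:", turning_point(b_iq, b_iq2))
```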
e.) Working experience is omitted, therefore it can cause OVB
2 conditions:
- The omitted variable is related to one of the other independent variables
- The omitted variable is related to the y variable
- Violates the OLS assumption that the error term and the independent variables are uncorrelated
- Parameter estimates are biased, E(Bhat) ≠ B, hence they do not have a causal interpretation
a. Long run effect =
- (b2+b3+b4)/(1-b1) = 0.0009
- Hence, in the long run, 1000 more prison inmates lead to an increase of 0.0009
percentage points in the unemployment rate
- Check the p-values: if a coefficient is insignificant, don't add it into the long-run effect
b. Dependent variable is a dummy variable
- Always multiply the coefficient by 100 (percentage points) if the dependent variable is a dummy
- Male politicians are 1.6 percentage points more likely to win a municipal election than female
politicians, all else equal
c. Causal effect of age on covid morbidity
- Another variable = co-morbidities, e.g. BMI; another control variable would be
the obesity rate of the selected individuals
- This increases the validity of the regression and reduces omitted variable bias
- The variable has to correlate with age (the initial independent variable)
- The estimate of the causal effect of age would go down (become smaller) once the obesity
rate is controlled for
d. T-TEST
a. Explain the economic interpretation of ldist (average kilometer distance from the nearest
ebola case)
- Standard errors are in brackets
- T-stat = 7.25
- 2.9/0.4 = 7.25
- Formula = (Bhat - B) / standard error
- Tc = 1.96
- T > Tc, so reject H0: ldist is statistically significant at the 5% level
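The t-test on ldist can be reproduced in a few lines. A minimal Python sketch (the function name is hypothetical); 1.96 is the large-sample two-sided 5% critical value, and H0 sets the true coefficient to 0 by default:

```python
def t_test(b_hat, se, b_null=0.0, t_critical=1.96):
    """Two-sided t-test of H0: B = b_null. Reject if |t| > t_critical."""
    t = (b_hat - b_null) / se
    return t, abs(t) > t_critical


# ldist example from the notes: coefficient 2.9, standard error 0.4 (in brackets)
t, reject = t_test(2.9, 0.4)
print(f"t = {t:.2f}, reject H0: {reject}")  # t = 7.25, reject H0: True
```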
Women who live in urban areas are 6.5% less likely to be in a polygamous relationship
than women in non-urban areas, all else equal
Summary
Heteroskedasticity:
- Error term does not have constant variance
- Breusch-Pagan test: H0 = homoskedastic (constant variance of error term)
- Ha = heteroskedastic (no constant variance)
- Heteroskedasticity does not bias the coefficient estimates, but it biases the usual standard errors, so
t- and F-tests based on them are invalid
- Solution: use robust standard errors
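A heteroskedasticity-robust standard error weights each observation by its own squared residual instead of one common error variance. A minimal Python sketch of the White (HC0) version for a bivariate regression, on hypothetical data; Stata's `robust` option applies an additional small-sample correction (HC1), so its numbers will differ slightly.

```python
import math


def robust_se_slope(x, y):
    """White/HC0 heteroskedasticity-robust SE of the slope in y = b0 + b1*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (b - my) for d, b in zip(dx, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - b0 - b1 * a for a, b in zip(x, y)]
    # Each observation's squared residual gets its own weight
    return math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2)


# Hypothetical data whose spread grows with x (heteroskedastic errors)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.8, 3.5, 3.2, 6.8, 4.9, 9.5, 6.1]
print(robust_se_slope(x, y))
```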
Omitted variable bias:
- Exogeneity means that independent variables are uncorrelated with the error terms
Multicollinearity:
- Independent variables are highly correlated with each other
- Correlation = 1: perfect multicollinearity
- Increases standard errors, reduces t-statistics
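How much the standard errors grow can be quantified with the variance inflation factor. A minimal Python sketch for a model with two regressors, where VIF = 1 / (1 - r²) and r is their correlation; the standard error of each slope is sqrt(VIF) times what it would be if the regressors were uncorrelated (other things equal). The data and function names are hypothetical.

```python
import math


def correlation(a, b):
    """Sample correlation between two lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def variance_inflation(x1, x2):
    """VIF for either regressor in a two-regressor model: 1 / (1 - r^2)."""
    r = correlation(x1, x2)
    if 1 - r ** 2 < 1e-12:
        raise ValueError("perfect multicollinearity: VIF is infinite")
    return 1 / (1 - r ** 2)


# Hypothetical, nearly collinear regressors
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
vif = variance_inflation(x1, x2)
print(vif, "-> SE inflated by a factor of", math.sqrt(vif))
```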
A1: Regression model is linear in the parameters
A2: Error term has a zero population mean
A3: All explanatory variables are uncorrelated with the error term, Corr(ei,Xi) = 0 (exogeneity)
A4: No serial correlation: error term not correlated with the lagged error term
A5: No perfect multicollinearity
A6: No heteroskedasticity
A7: Error term normally distributed
Strict exogeneity (optimal option)
- The error term at time t is uncorrelated with the explanatory variables at all times: present, past,
future
- A theoretical ideal that rarely holds in real life
Weak exogeneity
- The error term at time t is uncorrelated with the explanatory variables in the present (same period), but may be correlated
with their past or future values
Biased upwards/downwards
LPM: Estimates the probability of an event occurring
Drawbacks of LPM:
- Heteroskedasticity: error variance not constant
- Biases the usual standard errors; can be fixed by using robust standard errors
- Predicted values are unbounded and can take any value (> 1 or < 0)
o If this occurs, use a different estimator than OLS, e.g. logit (logistic regression) or probit
Time trend:
- Include a time variable to avoid spurious regression
- If one or both variables are non-stationary (due to a trend or unit root), the model can be spurious
- P-value < significance level = reject null, time trend present
- Dickey-Fuller to check for unit roots
o H0 = non-stationary, unit root, spurious
o Ha = stationary, no unit root, non-spurious
Partial derivative outside domain: The stationary point (10.60) lies outside the domain of
grade, since grade goes from 1 to 10 [1 pt]; hence log SETs (log evaluation scores) increase
with grades but at a decreasing rate, i.e. this is the upward-sloping part of a hump-shaped
function [2 pts]
LPM: errors are heteroskedastic by construction
Lag added to correct serial correlation
Serial correlation
- Standard errors biased
- Coefficients remain unbiased
- If the model contains a lagged dependent variable, coefficients are also biased
o Thus, Corr(ei,Xi) = 0 is violated as well
