
ECONOMETRICS

Model: y= ß0 + ß1x + u

Ordinary Least Squares (OLS): estimate ß0,ß1 by minimizing sum of the squared residuals:
- Fitted value for y when x = xi: y^i = ß^0 + ß^1xi
- Residual = difference between actual yi and its fitted value: ûi = yi - y^i = yi - ß^0 - ß^1xi
- Min ∑ ûi2 = ∑(yi- ß^0 - ß^1xi)2

- Positive correlation between x and y => positive ß^1 (positive slope)


- Assumptions: (A1) Linear in parameters (y = ß0 + ß1x + u), (A2) Random sampling
(yi = ß0 + ß1xi + ui), (A3) Variation in indep. variable (Var(x) ≠ 0), (A4) zero conditional mean
(exogeneity of x: E(u|x) = E(u) = 0), (A5) Homoskedasticity (Var(u|x) = σ2)
- Theorem 1 - Unbiasedness of OLS: E(ß^0) = ß0 and E(ß^1) = ß1 => holds if A1-A4 hold
- Variance (efficiency) of the OLS estimator: ß^1 is unbiased, but how far can we expect ß^1 to be
away from ß1 on average?
- Homoskedasticity: the unobservable term u has a constant variance; under A1-A5:
Var(ß^1) = σ2 / ∑(xi - x̄)2 = σ2 / SSTx
σ2 ↑ => Var(ß^1)↑
Var(x)↑ => Var(ß^1)↓
=> larger sample size => Var(ß^1)↓
- σ2 is unobserved => use the residuals ûi to construct an unbiased estimator of σ2: σ^2 = ∑ ûi2 / (n-2)
Under A1-A5: E(σ^2) = σ2 (n-2 = degrees of freedom)

- Standard error of ß^1: se(ß^1) = σ^ / √∑(xi - x̄)2

- Heteroskedasticity: e.g. variance in error term increases with x


- (Explained) variation in the dependent variable: SST = SSE + SSR
- Total sum of squares (SST): ∑(yi - ȳ)2 => total sample variation in yi
- Explained sum of squares (SSE): ∑(y^i - ȳ)2 => sample variation in y^i (explained variation)
- Residual sum of squares (SSR): ∑ ûi2 => sample variation in ûi (unexplained variation)
=> R2 = SSE/SST = 1 - SSR/SST = fraction of the total sum of squares that is explained by the model:
0 ≤ R2 ≤ 1
Never decreases when another independent variable is added
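
A minimal numerical sketch of these SLR formulas (simulated data, numpy only; all numbers and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(5, 2, n)              # regressor with Var(x) > 0 (A3)
u = rng.normal(0, 1, n)              # error with E(u|x) = 0, homoskedastic
y = 1.0 + 0.5 * x + u                # true model: ß0 = 1, ß1 = 0.5

# OLS slope and intercept from minimizing the sum of squared residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

yhat = b0 + b1 * x                   # fitted values
uhat = y - yhat                      # residuals

# variance decomposition and R^2
SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((yhat - y.mean()) ** 2)
SSR = np.sum(uhat ** 2)
R2 = SSE / SST                       # equals 1 - SSR/SST

# sigma^2 estimate with n-2 degrees of freedom and se(ß^1)
sigma2_hat = SSR / (n - 2)
se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))
print(b0, b1, R2, se_b1)
```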

Model: y = ß0 + ß1x1 + ß2x2 + ß3x3 + … + ßkxk + u

• In SLR, A4 is often unrealistic => control for more factors (take them „out of the error term“ and
control for them when we suspect they are correlated with a regressor)
• ßi measures the ceteris paribus effect of xi on y, holding the other regressors constant
• Assume SLR: y~ = ß~0 + ß~1x1 and MLR: y^ = ß^0 + ß^1x1 + ß^2x2 => in general: ß~1 ≠ ß^1,
unless ß^2 = 0 (no partial effect of x2) or x1 and x2 are uncorrelated in the sample
• Assumptions: (A1) Linear in parameters (linear in the ßs, the xi can enter non-linearly), (A2) Random
sampling, (A3) no perfect collinearity (no exact linear relationship among the independent variables),
(A4) zero conditional mean (E(u|x1,x2,x3,…,xk) = 0)

• OLS estimation: min ∑ ûi2 => ß^ = (X'X)-1X'y => defined if det(X'X) ≠ 0

- Theorem 1 - Unbiasedness of the OLS estimators (under A1-A4): E(ß^j) = ßj


- Omitted Variable Bias: when a model fails to include one or more relevant variables => OLS is biased.
If the true model is y = ß0 + ß1x1 + ß2x2 + u but we only regress y on x1, then
E(ß~1) = ß1 + ß2·δ~1, where δ~1 is the slope from regressing x2 on x1
=> Bias = ß2·δ~1 = 0 if ß2 = 0 or when x1 and x2 are uncorrelated (see the simulation sketch below)
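
A small simulation sketch of this bias (all numbers illustrative; the correlation between x1 and x2 is what makes δ~1 ≠ 0):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)        # x2 correlated with x1
u = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.7 * x2 + u         # true ß1 = 0.5, ß2 = 0.7

def ols(X, y):
    # OLS via the normal equations: ß^ = (X'X)^-1 X'y
    return np.linalg.solve(X.T @ X, X.T @ y)

X_full = np.column_stack([np.ones(n), x1, x2])
X_omit = np.column_stack([np.ones(n), x1])

b_full = ols(X_full, y)                   # approx [1.0, 0.5, 0.7]
b_omit = ols(X_omit, y)                   # slope on x1 is biased upward here

delta1 = ols(X_omit, x2)[1]               # slope from regressing x2 on x1
print(b_omit[1], b_full[1] + b_full[2] * delta1)   # the two coincide
```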
- A5: Homoskedasticity: Var(u|x1,…,xk) = σ2
- Theorem 2: Variance of the OLS estimator (under A1-A5):
Var(ß^j) = σ2 / [SSTj(1 - Rj2)], where SSTj = ∑(xij - x̄j)2 and Rj2 is the R2 from regressing xj on all other regressors
- Variance matrix of ß^: Var(ß^) = σ2(X'X)-1 => estimated: Var^(ß^) = σ^2(X'X)-1 with σ^2 = SSR/(n-k-1)


=> larger error variance σ2 implies larger Var(ß^j)
=> larger SSTj (total sample variation in xj) implies smaller Var(ß^j)
=> linear relationships among the independent variables: larger Rj2 implies larger Var(ß^j)
- Theorem 3: Unbiased estimation of σ2: E(σ^2) = σ2 (under A1-A5)
- Theorem 4: Gauss-Markov (under A1-A5): ß^0, ß^1, …, ß^k are the best linear unbiased
estimators of ß0, ß1, …, ßk (=> OLS has the smallest variance among all linear unbiased estimators)
- A6: Normality (for hypothesis testing): u ~ N(0,σ2) => (A1-A6): ß^j ~ Normal(ßj, Var(ß^j))
=> therefore: (ß^j - ßj)/se(ß^j) ~ tn-k-1

Hypothesis testing:

t-Test:

• t-distributed; test H0: ßj = 0 against H1: ßj ≠ 0 (H0 means xj has no effect on y, controlling for the other x's)

=> t-statistic for ß^j: t = (ß^j - aj)/se(ß^j); here: aj = 0

• Significance level of 5%: 5% probability of rejecting H0 if it is really true


=> If we reject H0: xj is statistically significant at the α% level
• Having picked a significance level α, we look up the (1-α)th percentile in a t-
distribution with n-k-1 df and call this c, the critical value => Reject H0 if t > c
• Symmetric t distribution: for H1: ßj < 0 => Reject H0 if t < -c
=> for a two-sided test: set c based on α/2 and accept H1: ßj ≠ 0 if |t| > c
• Other way of statistical testing: (1-α)·100% confidence interval: ß^j ± c·se(ß^j),
where c is the (1-α/2) percentile in a tn-k-1 distribution
• p-value: what is the smallest significance level at which H0 would be rejected?
=> Compute the t-statistic and then look up what percentile it is in the appropriate t distribution
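
A brief sketch of the t-test and confidence-interval mechanics using statsmodels (simulated data; variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=n)   # x2 has no true effect

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

t_stats = res.params / res.bse            # t = ß^j / se(ß^j) for H0: ßj = 0
df = int(res.df_resid)                    # n - k - 1
c = stats.t.ppf(1 - 0.05 / 2, df)         # two-sided 5% critical value
p_vals = 2 * (1 - stats.t.cdf(np.abs(t_stats), df))
ci = res.conf_int(alpha=0.05)             # ß^j +/- c * se(ß^j)
print(t_stats, p_vals, ci, sep="\n")
```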

F-test:

• F-test is a joint hypothesis test that simultaneously tests two or more restrictions under H0
(can also be used for testing one restriction involving more than one parameter, e.g. H0: ß1=ß3)
• Test: H0: Rß-r = 0 vs. H1: Rß-r ≠ 0 => Weighted squared distance measure for H0
• Heteroskedasticity-robust F-statistic: F = (Rß^ - r)'[R·Var^(ß^)·R']-1(Rß^ - r) / q,
where R is a q x (k+1) matrix with q ≤ k+1 and rank(R) = q, and r is a q x 1 vector
=> the numerator is the weighted sum of squared distances for the q restrictions of H0, and
R·Var^(ß^)·R' is the estimator for the (approximate) variance matrix of (Rß^ - r)

- The inverse of the estimated variance matrix of (Rß^ - r) weights the sum of the squared distances
(=> restrictions with high variance get a relatively lower weighting)
- Considering covariances between restrictions (if we have covariances => no diagonal matrix),
with a measuring the distances of the restrictions:
Rß^ - r = a ≠ 0

- Inverse: we divide by diagonal entries, i.e. the distances with a high estimated variance
get a lower weight, relative to the distances that are estimated more accurately, i.e. with
smaller variance
Testing linear Combinations:

• Instead of testing if ß1 is equal to a constant, test if it is equal to another parameter: H0: ß1 = ß2
• t-statistic: t = (ß^1 - ß^2) / se(ß^1 - ß^2),

with se(ß^1 - ß^2) = √(Var^(ß^1) + Var^(ß^2) - 2s12), where s12 is an estimator of Cov(ß^1, ß^2)

Joint Hypotheses Tests:

• Jointly test multiple hypotheses about parameters (e.g. H0: ßk-q+1=0,…,ßk=0, H1: H0 not true)
=> can't check each separately, because we want to know if they are jointly significant at a given
level (it's possible for no parameter to be individually significant while they are jointly significant)
• Estimate the „restricted model“ without xk-q+1,…,xk as well as the „unrestricted model“ with all
x's included => Is the change in SSR big enough to warrant inclusion of xk-q+1,…,xk:
F = [(SSRr - SSRur)/q] / [SSRur/(n-k-1)]
r = restricted model
ur = unrestricted model
q = number of restrictions, or dfr - dfur, where dfur = n-k-1
• F statistic is always non-negative (SSR from the restricted model ≥ SSR from the unrestricted)
• F statistic measures relative increase in SSR when moving from unrestricted to restricted model
• F ~ Fq,n-k-1, where q is referred to as the numerator degrees of freedom and n-k-1 as the denominator
degrees of freedom => Reject H0 at a significance level α if F > c
• Alternative use: calculate p-values by looking up the percentile of the F statistic in the appropriate F distribution

• Special case of exclusion restrictions is to test overall significance, H0: ß1=ß2=…=ßk=0: F = [R2/k] / [(1-R2)/(n-k-1)]

• If only one exclusion is being tested: F = t2, and the p-values will be the same
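
A sketch of the restricted-vs-unrestricted F test on simulated data (the same statistic that statsmodels' compare_f_test reports):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 4))                     # x1..x4; x3, x4 truly irrelevant
y = 1.0 + 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

X_ur = sm.add_constant(X)                       # unrestricted: all regressors
X_r = sm.add_constant(X[:, :2])                 # restricted: drop x3, x4 (q = 2)
res_ur = sm.OLS(y, X_ur).fit()
res_r = sm.OLS(y, X_r).fit()

q = 2
df_ur = int(res_ur.df_resid)                    # n - k - 1
F = ((res_r.ssr - res_ur.ssr) / q) / (res_ur.ssr / df_ur)
p = 1 - stats.f.cdf(F, q, df_ur)
print(F, p)
print(res_ur.compare_f_test(res_r))             # same F statistic and p-value
```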

Asymptotics:

• Under weaker assumptions (A1-A4): OLS will still be consistent: plim ß^j = ßj as n -> ∞


=> the distribution of the estimator converges in probability to the true parameter value
• A sequence of random vectors An ∈ Rk converges in probability to A if: lim n→∞ P(|An - A| > ε) = 0 for every ε > 0

• If we have consistency and n is large, then t and F statistics will be approximately t- and F-
distributed (under Gauss Markov assumptions, the OLS estimator is consistent and unbiased)

Central Limit Theorem:

• Standardized average of any population with mean µ and variance σ2 is asymptotically ~ N(0,1):
√n(x̄ - µ)/σ -> N(0,1) in distribution => OLS estimators are asymptotically normal

Elasticity:

• Marginal effects: dy/dxi


• Elasticity: dW/dA * A/W
• Marginal effects of the log model: dW/dlog(W) * dlog(W)/dA
- dlog(W)/dW = 1/W => dW/dlog(W) = W => ME = W * dlog(W)/dA
• Elasticity of W w.r.t. A: dW/dA * A/W = (dW/dlog(W) * dlog(W)/dA) * A/W = W * dlog(W)/dA * A/W
= A * dlog(W)/dA
• Semi-elasticity of W w.r.t. FEM (dummy): dlog(W)/dFEM = (dW/dFEM)/W
• Age when max income occurs: maximize logW wrt A: dlog(W) / dA = 0
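
A small sketch of these transformations for a hypothetical quadratic log-wage model log(W) = g0 + g1·A + g2·A2 (coefficients made up for illustration):

```python
import numpy as np

# hypothetical fitted coefficients of log(W) = g0 + g1*A + g2*A^2
g0, g1, g2 = 1.5, 0.08, -0.001

A = np.linspace(20, 65, 200)                 # age grid
logW = g0 + g1 * A + g2 * A ** 2
W = np.exp(logW)

dlogW_dA = g1 + 2 * g2 * A                   # semi-elasticity of W w.r.t. A
marginal_effect = W * dlogW_dA               # dW/dA = W * dlog(W)/dA
elasticity = A * dlogW_dA                    # (dW/dA) * A/W = A * dlog(W)/dA

A_max = -g1 / (2 * g2)                       # age where dlog(W)/dA = 0
print(A_max)                                 # 40 with these made-up numbers
```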
Scaling of variables:

• Idea: replace y and each x variable with a standardized version, i.e. subtract the mean and divide by the
standard deviation (=> variables are distributed with mean 0 and s.d. 1)
=> coefficient reflects the change in standard deviations of y for a one standard deviation change in x
• Changing the scale of the y variable will lead to a corresponding change in the scale of the
coefficients and standard errors, no change in the significance or interpretation
• Changing the scale of an x variable will lead to a change in the scale of that coefficient and
standard error, no change in significance or interpretation.
• Functional forms: OLS can also be used for not strictly linear relationships in x and y by using
nonlinear functions (log, quadratic, interactions) of x and y (model is still linear in its parameters)
- Quadratic models: for a model y = ß0 + ß1x + ß2x2 + u => change in y: dy/dx = ß1 + 2ß2x
- Interaction terms: y = ß0 + ß1x1 + ß2x2 + ß3x1x2 + u => change in y: dy/dx1 = ß1 + ß3x2
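
A sketch of the standardized-coefficient idea (z-scoring y and the x's before OLS; simulated data, names illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(10, 3, n)
x2 = rng.normal(0, 5, n)
y = 2.0 + 0.4 * x1 - 0.2 * x2 + rng.normal(size=n)

def zscore(v):
    # subtract mean, divide by sample standard deviation
    return (v - v.mean()) / v.std(ddof=1)

Xz = sm.add_constant(np.column_stack([zscore(x1), zscore(x2)]))
res_std = sm.OLS(zscore(y), Xz).fit()
# slopes = change in y (in s.d. of y) for a one-s.d. change in each x
print(res_std.params)
```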
Log Models:

• If the model is ln(y) = ß0 + ß1ln(x) + u


=> ß1 is elasticity of y with respect to x: a 1% change in x leads to a ß1% change in y
• If the model is ln(y) = ß0 + ß1x + u
=> ß1 is approximately the percentage change in y given a 1 unit change in x
• If the model is y = ß0 + ß1 ln(x) + u
=> ß1 is approximately the (absolute) change in y given a 100% change in x
• Log models are invariant to the scale of variables since they are measuring percent changes
and give a direct estimate of elasticity => BUT: can only be used for variable values > 0

Chow-Test:

• Goal: Testing whether a regression function is different for one group versus another can be
thought of as simply testing for the joint significance of the dummy and its interactions with all
other x variables (e.g. difference between men and women).
- Test for structural change over time: interact the X variables whose effect we suspect to change
over time with time indicators (e.g. year dummies), then test for individual (t-test) or joint (F-test)
significance of the coefficients
• Run the model separately for group one to get SSR1 and for group two to get SSR2,
then run the restricted (pooled) model on all observations to get SSR:
SSR from pooled model: y = Xß
SSR1 from model Mmale: y = Xßmale
SSR2 from model Mwoman: y = Xßwoman
• The Chow test is just an F test for exclusion restrictions => SSRur = SSR1 + SSR2, SSRr = SSR
• Note: number of restrictions: k+1 (each of the slope coefficients and the intercept)
• Note: the unrestricted model estimates 2 different intercepts and 2 different sets of slope coefficients:
df = n - 2k - 2
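
A sketch of the Chow F statistic under these definitions (simulated two-group data; the group dummy g is illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n = 400
g = (rng.random(n) < 0.5).astype(int)            # group indicator (e.g. female)
x = rng.normal(size=n)
# different intercept and slope in group g = 1
y = 1.0 + 0.5 * x + g * (0.4 + 0.3 * x) + rng.normal(size=n)

X = sm.add_constant(x)
ssr_pooled = sm.OLS(y, X).fit().ssr              # restricted: one line for all
ssr_0 = sm.OLS(y[g == 0], X[g == 0]).fit().ssr   # group 0 separately
ssr_1 = sm.OLS(y[g == 1], X[g == 1]).fit().ssr   # group 1 separately

k = 1                                            # number of slope coefficients
q = k + 1                                        # restrictions: intercept + slopes
df = n - 2 * k - 2
ssr_ur = ssr_0 + ssr_1
F = ((ssr_pooled - ssr_ur) / q) / (ssr_ur / df)
print(F, 1 - stats.f.cdf(F, q, df))
```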

Heteroskedasticity:

• Def: if the variance of u is different for different values of the x's, the errors are heteroskedastic
• OLS coefficients are still unbiased and consistent, but the usual estimators of the coefficient variances
Var(ß^j) are biased if we have heteroskedasticity
• Robust estimator for Var(ß^1) in SLR: under heteroskedasticity Var(ß^1) = ∑(xi - x̄)2σi2 / SSTx2
=> estimator: Var^(ß^1) = ∑(xi - x̄)2ûi2 / SSTx2, with SSTx = ∑(xi - x̄)2

• Robust estimator for Var(ß^j) in MLR: Var^(ß^j) = ∑ r^ij2ûi2 / SSRj2,
where r^ij = ith residual from regressing xj on all other independent variables and SSRj is the SSR
from that regression
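
A sketch of heteroskedasticity-robust (White/HC0) standard errors, both from the sandwich formula and via statsmodels' cov_type option:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(size=n) * x                   # error variance increases with x
y = 1.0 + 0.5 * x + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# HC0 robust variance matrix: (X'X)^-1 X' diag(û^2) X (X'X)^-1
uhat = res.resid
XtX_inv = np.linalg.inv(X.T @ X)
V_robust = XtX_inv @ (X.T * uhat ** 2) @ X @ XtX_inv
print(np.sqrt(np.diag(V_robust)))            # robust se's by hand
print(sm.OLS(y, X).fit(cov_type="HC0").bse)  # same via statsmodels
print(res.bse)                               # usual (here biased) se's
```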


Testing for Heteroskedasticity:

• Test H0: Var(u|x1,x2,…,xk) = σ2, which is equivalent to H0: E(u2 | x1,x2,…,xk) = E(u2) = σ2
• If we assume the relationship between u2 and xj to be linear, we can test it as a linear restriction:
u2 = d0 + d1x1 + … + dkxk + v => H0: d1 = d2 = … = dk = 0

The Breusch-Pagan Test:

• Goal: Detect any linear form of heteroskedasticity


• Since we don't observe the errors => estimate them with the residuals from the OLS regression
• After regressing residuals squared on all of the x’s, we can use the R2 to form an F or LM test
• F-statistic: F = [R2û2/k] / [(1-R2û2)/(n-k-1)], where R2û2 is the R2 from the regression of û2 on the x's
=> Fk,n-k-1 distributed

• The LM statistic: LM = n·R2û2, asymptotically X2k distributed (X2q if only q restrictions are tested)


• If the F or LM statistic is sufficiently large, the H0 of homoskedasticity is rejected
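
A sketch of the Breusch-Pagan test, by hand and with statsmodels' het_breuschpagan (which returns both the LM and F variants):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n) * x1   # heteroskedastic

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

# regress squared residuals on all x's and use the R^2 of that regression
aux = sm.OLS(res.resid ** 2, X).fit()
k = 2
LM = n * aux.rsquared                     # LM = n * R^2, ~ chi2(k) under H0
F = (aux.rsquared / k) / ((1 - aux.rsquared) / (n - k - 1))
print(LM, F)

print(het_breuschpagan(res.resid, X))     # (LM, LM p-value, F, F p-value)
```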

White Test:

• Goal: test the model for misspecification => test for heteroskedasticity
• Idea: White test allows for nonlinearities by using squares and cross products of all the x’s
- Test: H0: Var(ui | Xi) = σ2 vs H1: Var(ui | Xi) = σi2 (variance depends on the regressors)
- Use ûi2 as a proxy for σi2 and then test if the conditional variance depends linearly on the
regressors and the squares of the regressors (standard case)
- If there is a relationship between the squared residuals and the specific function of the
regressors, we have evidence against homoscedasticity (reject H0)
• Procedure: Test if σi2 = f(X1i,…,Xki)
1. Run the original OLS model, save the residuals ûi = Yi - Xi'ß^
2. Determine R2 of the auxiliary/test regression:
ûi2 = γ0 + γ1X1i + … + γkXki + γk+1X1i2 + … + γ2kXki2 + error
3. Test H0: γ1 = … = γ2k = 0 vs. H1: at least one γj ≠ 0


=> test statistic: WH = nR2 -> X22k: if H0 holds, the test statistic converges asymptotically in distribution
=> Reject H0 for high values of WH, compared against an X22k distribution with 2k degrees of freedom
- If p-value < significance level: Reject H0 => evidence for heteroskedasticity
• Or use an F statistic to test whether all the xj, xj2 and xjxh are jointly significant
• Extension with interaction terms => use WH = nR2 -> X2p (p = number of restrictions = number of γ
coefficients whose significance is tested under H0)
• Alternative form of the White test: consider that the fitted values from OLS, y^, are a function of all the x's
=> y^2 will be a function of the squares & cross products => y^ and y^2 proxy for all xj, xj2 and xjxh
=> Regress the squared residuals on y^ and y^2 and use the R2 to form an F statistic (note: we
only test for 2 restrictions now)
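
A sketch of the White test in its special (fitted-values) form, plus statsmodels' het_white helper for the full form with squares and cross products:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(8)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n) * x1

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

# special form: regress û^2 on y^ and y^2 (2 restrictions)
yhat = res.fittedvalues
Z = sm.add_constant(np.column_stack([yhat, yhat ** 2]))
aux = sm.OLS(res.resid ** 2, Z).fit()
LM = n * aux.rsquared                     # ~ chi2(2) under H0
print(LM, aux.fvalue, aux.f_pvalue)       # F form of the same test

print(het_white(res.resid, X))            # full White test
```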

Weighted Least Squares:

• In case of heteroskedasticity, OLS estimates with robust standard errors are still unbiased & consistent,
but WLS/GLS is more efficient if we know something about the specific form of heteroskedasticity
• Idea: transform the model into one that has homoskedastic errors - called weighted least squares
• Suppose heteroskedasticity is modeled as Var(u|x) = σ2h(x) => figure out what h(x) = hi looks like
- E(ui/√hi |x) = 0, because hi is only a function of x, and Var(ui/√hi |x) = σ2, because we know σ2hi
- If we divide the whole equation by √hi => model where the error is homoskedastic
Generalized Least Squares:

• Estimating the transformed equation by OLS => example of generalized least squares (GLS):

• GLS is a weighted least squares (WLS) procedure where each squared residual is weighted by
the inverse of Var(ui | xi) (OLS as special case of GLS, where all weights are equal)
• If data is heteroskedastic => GLS is more efficient than OLS (considers heteroskedasticity)
• Typically we don’t know the form of heteroskedasticity => estimate h(xi):
- Model: Var(u|x) = σ2·exp(d0 + d1x1 + … + dkxk) (if we knew the di we could apply WLS directly => so estimate the di)
(exponential form is used because estimated variances must be positive in order to perform WLS)
1. Run the original OLS model, save the residuals û, square them:
u2 = σ2·exp(d0 + d1x1 + … + dkxk)·v, where E(v|x) = 1 (if v is independent of x, taking logs gives the error e below)
2. Take the log of the squared residuals:
ln(u2) = a0 + d1x1 + … + dkxk + e , where E(e) = 0 and e is independent of x
=> û is an estimate of u, so we can estimate this by OLS
3. Regress ln(û2) on all of the independent variables and get the fitted values g^
4. Do WLS using 1/exp(g^) as the weight
=> The estimate of h is obtained as h^ = exp(g^), where g^ are the fitted values from the regression
of log(û2) on the x's => the inverse of this is the weight

here: g^ = Ψ, h^ = P

=> WLS and GLS are equal if we don’t have autocorrelation (diagonal psi matrix)
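
A sketch of this feasible GLS procedure (steps 1-4) using statsmodels WLS; the exponential variance model and the variable names are illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 600
x = rng.uniform(1, 5, n)
u = rng.normal(size=n) * np.exp(0.5 * x)     # Var(u|x) = sigma^2 * exp(d*x)
y = 1.0 + 0.5 * x + u

X = sm.add_constant(x)

# 1. original OLS, save the residuals
res_ols = sm.OLS(y, X).fit()
# 2.-3. regress log of squared residuals on the x's, get fitted values g^
aux = sm.OLS(np.log(res_ols.resid ** 2), X).fit()
g_hat = aux.fittedvalues
h_hat = np.exp(g_hat)                        # estimated h(x)
# 4. WLS with weights 1/h^
res_fgls = sm.WLS(y, X, weights=1.0 / h_hat).fit()

print(res_ols.params, res_ols.bse)
print(res_fgls.params, res_fgls.bse)         # typically smaller standard errors
```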

RESET-test (Ramsey's regression specification error test):

• Goal: test of the functional form of the model, to find the right functional form for our model
- Are there (non)linear relationships, should we use log, quadratic forms or interaction terms?
• Idea: Instead of adding functions of the x's directly, we add and test functions of y^
=> estimate: y = ß0 + ß1x1 + … + ßkxk + d1y^2 + d2y^3 + error (with y^i = ß^0 + ß^1x1 + … + ß^kxk)
=> test: H0: d1 = 0, d2 = 0 (linear model is correctly specified) vs. H1: at least one dj ≠ 0
=> test statistic: F~F2,n-k-3
=> if p-value > significance level: Accept H0 => linearity of regressors
=> if p-value < significance level: Reject H0 => non-linearity of regressors / omitted variables
• Rejection of H0 is evidence of functional form misspecification, but it doesn't tell us the correct form
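
A sketch of the RESET test by adding y^2 and y^3 and testing them jointly (statsmodels also ships a linear_reset helper in statsmodels.stats.diagnostic, but the manual version below relies only on the plain OLS API):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(10)
n = 300
x = rng.uniform(0, 3, n)
y = 1.0 + 0.5 * x + 0.4 * x ** 2 + rng.normal(size=n)   # true model is quadratic

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()                                  # misspecified linear fit
yhat = res.fittedvalues

# augmented regression with y^2 and y^3
X_aug = sm.add_constant(np.column_stack([x, yhat ** 2, yhat ** 3]))
res_aug = sm.OLS(y, X_aug).fit()

# F test of H0: d1 = d2 = 0 (2 restrictions), df = n - k - 3 with k = 1
q, df = 2, n - 1 - 3
F = ((res.ssr - res_aug.ssr) / q) / (res_aug.ssr / df)
print(F, 1 - stats.f.cdf(F, q, df))                       # small p => misspecified
```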

TO DO Wilcox test:
