Additional Cheat Sheet
By Marcelo Moreno - Universidad Rey Juan Carlos
The Econometrics Cheat Sheet Project

OLS matrix notation

The general econometric model:
$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + u_i$$
Can be written in matrix notation as:
$$y = X\beta + u$$
Let's call $\hat{u}$ the vector of estimated residuals ($\hat{u} \neq u$):
$$\hat{u} = y - X\hat{\beta}$$
The objective of OLS is to minimize the SSR:
$$\min SSR = \min \sum_{i=1}^{n} \hat{u}_i^2 = \min \hat{u}^T \hat{u}$$
• Defining $\hat{u}^T \hat{u}$:
$$\hat{u}^T \hat{u} = (y - X\hat{\beta})^T (y - X\hat{\beta}) = y^T y - 2\hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta}$$
• Minimizing $\hat{u}^T \hat{u}$:
$$\frac{\partial \hat{u}^T \hat{u}}{\partial \hat{\beta}} = -2X^T y + 2X^T X \hat{\beta} = 0$$
$$\hat{\beta} = (X^T X)^{-1} (X^T y)$$
$$\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} =
\begin{bmatrix}
n & \sum x_1 & \cdots & \sum x_k \\
\sum x_1 & \sum x_1^2 & \cdots & \sum x_1 x_k \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_k & \sum x_k x_1 & \cdots & \sum x_k^2
\end{bmatrix}^{-1} \cdot
\begin{bmatrix} \sum y \\ \sum y x_1 \\ \vdots \\ \sum y x_k \end{bmatrix}$$
The second derivative, $\frac{\partial^2 \hat{u}^T \hat{u}}{\partial \hat{\beta} \, \partial \hat{\beta}^T} = 2X^T X$, is positive definite when $X$ has full column rank (is a min.).

Variance-covariance matrix of $\hat{\beta}$

Has the following form:
$$\mathrm{Var}(\hat{\beta}) = \hat{\sigma}_u^2 \cdot (X^T X)^{-1} =
\begin{bmatrix}
\mathrm{Var}(\hat{\beta}_0) & \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) & \cdots & \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_k) \\
\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_0) & \mathrm{Var}(\hat{\beta}_1) & \cdots & \mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_k) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(\hat{\beta}_k, \hat{\beta}_0) & \mathrm{Cov}(\hat{\beta}_k, \hat{\beta}_1) & \cdots & \mathrm{Var}(\hat{\beta}_k)
\end{bmatrix}$$
where: $\hat{\sigma}_u^2 = \frac{\hat{u}^T \hat{u}}{n - k - 1}$
The standard errors are the square roots of the diagonal elements:
$$se(\hat{\beta}_j) = \sqrt{\mathrm{Var}(\hat{\beta}_j)}$$

Error measurements

• $SSR = \hat{u}^T \hat{u} = y^T y - \hat{\beta}^T X^T y = \sum (y_i - \hat{y}_i)^2$
• $SSE = \hat{\beta}^T X^T y - n\bar{y}^2 = \sum (\hat{y}_i - \bar{y})^2$
• $SST = SSR + SSE = y^T y - n\bar{y}^2 = \sum (y_i - \bar{y})^2$

Variance-covariance matrix of u

Has the following shape:
$$\mathrm{Var}(u) =
\begin{bmatrix}
\mathrm{Var}(u_1) & \mathrm{Cov}(u_1, u_2) & \cdots & \mathrm{Cov}(u_1, u_n) \\
\mathrm{Cov}(u_2, u_1) & \mathrm{Var}(u_2) & \cdots & \mathrm{Cov}(u_2, u_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(u_n, u_1) & \mathrm{Cov}(u_n, u_2) & \cdots & \mathrm{Var}(u_n)
\end{bmatrix}$$
When there is no heteroscedasticity and no autocorrelation, the variance-covariance matrix of u has the form:
$$\mathrm{Var}(u) = \sigma_u^2 \cdot I_n =
\begin{bmatrix}
\sigma_u^2 & 0 & \cdots & 0 \\
0 & \sigma_u^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_u^2
\end{bmatrix}$$
where $I_n$ is an identity matrix of $n \times n$ elements.
When there is heteroscedasticity and autocorrelation, the variance-covariance matrix of u has the shape:
$$\mathrm{Var}(u) = \sigma_u^2 \cdot \Omega =
\begin{bmatrix}
\sigma_{u_1}^2 & \sigma_{u_{12}} & \cdots & \sigma_{u_{1n}} \\
\sigma_{u_{21}} & \sigma_{u_2}^2 & \cdots & \sigma_{u_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{u_{n1}} & \sigma_{u_{n2}} & \cdots & \sigma_{u_n}^2
\end{bmatrix}$$
where $\Omega \neq I_n$.
• Heteroscedasticity: $\mathrm{Var}(u_i) = \sigma_{u_i}^2 \neq \sigma_u^2$
• Autocorrelation: $\mathrm{Cov}(u_i, u_j) = \sigma_{u_{ij}} \neq 0$ for some $i \neq j$

Variable omission

Most of the time, it is hard to get all the relevant variables for an analysis. For example, a true model with all variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v$$
where $\beta_2 \neq 0$, $v$ is the error term and $\mathrm{Cov}(v, x_1) = \mathrm{Cov}(v, x_2) = 0$.
The model with the available variables:
$$y = \alpha_0 + \alpha_1 x_1 + u$$
where $u = v + \beta_2 x_2$.
Omitting a relevant variable causes the OLS estimators to be biased and inconsistent, because there is no weak exogeneity, $\mathrm{Cov}(x_1, u) \neq 0$. Depending on $\mathrm{Corr}(x_1, x_2)$ and the sign of $\beta_2$, the bias on $\hat{\alpha}_1$ could be:

                 Corr(x1, x2) > 0    Corr(x1, x2) < 0
  β2 > 0         (+) bias            (−) bias
  β2 < 0         (−) bias            (+) bias

• (+) bias: $\hat{\alpha}_1$ will be higher than it should be (it includes the effect of $x_2$) → $\hat{\alpha}_1 > \beta_1$
• (−) bias: $\hat{\alpha}_1$ will be lower than it should be (it includes the effect of $x_2$) → $\hat{\alpha}_1 < \beta_1$
If $\mathrm{Corr}(x_1, x_2) = 0$, there is no bias on $\hat{\alpha}_1$, because the effect of $x_2$ will be fully picked up by the error term, $u$.

Variable omission correction

Proxy variables

This is the approach to use when a relevant variable is not available because it is non-observable and there is no data for it.
• A proxy variable is a variable related to the non-observable variable for which data is available.
For example, GDP per capita is a proxy variable for quality of life (non-observable).

Instrumental variables

When the variable of interest ($x$) is observable but endogenous, the proxy variables approach is no longer valid.
• An instrumental variable (IV) is an observable variable ($z$) that is related to the endogenous variable of interest ($x$) and meets the requirements:
  $\mathrm{Cov}(z, u) = 0$ → instrument exogeneity
  $\mathrm{Cov}(z, x) \neq 0$ → instrument relevance
Instrumental variables leave the omitted variable in the error term, but instead of estimating the model by OLS, they use a method that recognizes the presence of an omitted variable. They can also address measurement error problems.
• Two-Stage Least Squares (TSLS) is a method to estimate a model with multiple instrumental variables. The $\mathrm{Cov}(z, u) = 0$ requirement can be relaxed, but a minimum number of instruments has to satisfy it.
The TSLS estimation procedure is as follows:
1. Estimate a model regressing $x$ on $z$ using OLS, obtaining $\hat{x}$:
   $$\hat{x} = \hat{\pi}_0 + \hat{\pi}_1 z$$
2. Replace $x$ by $\hat{x}$ in the final model and estimate it by OLS:
   $$y = \beta_0 + \beta_1 \hat{x} + u$$
There are some important things to know about TSLS:
– TSLS estimators are less efficient than OLS when the explanatory variables are exogenous. The Hausman test can be used to check it:
  H0: OLS estimators are consistent.
  If H0 is not rejected, the OLS estimators are preferable to TSLS, and vice versa.
– Some (or all) of the instruments may not be valid. When there are more instruments than endogenous variables (over-identification), the Sargan test can be used to check it:
  H0: all instruments are valid.
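The two TSLS steps described in this sheet can be sketched in a few lines of plain Python for the simplest case: one endogenous regressor x and one instrument z. The data values and the helper names (`ols_simple`, `tsls_simple`) are made up for illustration and are not part of the original cheat sheet.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols_simple(x, y):
    """OLS of y on a constant and one regressor x; returns (intercept, slope)."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

def tsls_simple(y, x, z):
    """Two-stage least squares with one endogenous x and one instrument z."""
    # Stage 1: regress x on z and keep the fitted values x_hat.
    pi0, pi1 = ols_simple(z, x)
    x_hat = [pi0 + pi1 * zi for zi in z]
    # Stage 2: regress y on x_hat.
    return ols_simple(x_hat, y)

# Hypothetical data, only to make the sketch runnable.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]
y = [2.1, 4.2, 5.8, 8.1, 9.7, 12.3]

beta0, beta1 = tsls_simple(y, x, z)

# With a single instrument, the TSLS slope equals Cov(z, y) / Cov(z, x).
iv_slope = cov(z, y) / cov(z, x)
print(beta1, iv_slope)
```

The sketch only reproduces the point estimates. Standard errors taken directly from the second-stage OLS are not valid for TSLS, so in practice a dedicated IV routine would be used for inference.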

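As a small numerical check of the OLS matrix formulas on this page, the sketch below solves the normal equations for a model with one regressor (k = 1), then computes SSR, SSE, SST, the estimated variance-covariance matrix of the estimators and the standard errors. The data are invented and the 2 × 2 inverse is written out by hand so that only the standard library is needed.

```python
import math

# Hypothetical sample with one regressor, so X = [1, x] and k = 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n, k = len(y), 1

# Build X'X and X'y from sums, as in the normal equations.
sx, sxx = sum(x), sum(xi * xi for xi in x)
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
xtx = [[n, sx], [sx, sxx]]
xty = [sy, sxy]

# Invert the 2 x 2 matrix X'X explicitly.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
xtx_inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]

# beta_hat = (X'X)^{-1} X'y
beta = [xtx_inv[0][0] * xty[0] + xtx_inv[0][1] * xty[1],
        xtx_inv[1][0] * xty[0] + xtx_inv[1][1] * xty[1]]

# Sums of squares from fitted values.
y_hat = [beta[0] + beta[1] * xi for xi in x]
y_bar = sy / n
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sse = sum((yh - y_bar) ** 2 for yh in y_hat)
sst = sum((yi - y_bar) ** 2 for yi in y)

# Same SSR from the matrix expression y'y - beta_hat' X'y.
yty = sum(yi * yi for yi in y)
ssr_matrix = yty - (beta[0] * xty[0] + beta[1] * xty[1])

# Var(beta_hat) = sigma2_hat * (X'X)^{-1}; standard errors from its diagonal.
sigma2_hat = ssr / (n - k - 1)
var_beta = [[sigma2_hat * v for v in row] for row in xtx_inv]
se = [math.sqrt(var_beta[j][j]) for j in range(k + 1)]

print(beta, ssr, sse, sst, se)
```

For more than one regressor the same steps apply, but a linear algebra library would replace the hand-written inverse.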
ADD-24.2-EN - github.com/marcelomijas/econometrics-cheatsheet - CC-BY-4.0 license


Information criterion

It is used to compare models with a different number of parameters ($p$). The general formula:
$$Cr(p) = \log\left(\frac{SSR}{n}\right) + c_n \varphi(p)$$
where:
• SSR is the Sum of Squared Residuals from a model of order $p$.
• $c_n$ is a sequence indexed by the sample size.
• $\varphi(p)$ is a function that penalizes large $p$ orders.
It is interpreted as the relative amount of information lost by the model. The order $p$ that minimizes the criterion is chosen.
There are different $c_n \varphi(p)$ functions:
• Akaike: $AIC(p) = \log\left(\frac{SSR}{n}\right) + \frac{2}{n} p$
• Hannan-Quinn: $HQ(p) = \log\left(\frac{SSR}{n}\right) + \frac{2 \log(\log(n))}{n} p$
• Schwarz: $Sc(p) = \log\left(\frac{SSR}{n}\right) + \frac{\log(n)}{n} p$
The orders selected by each criterion satisfy $\hat{p}(Sc) \leq \hat{p}(HQ) \leq \hat{p}(AIC)$ (the second inequality holds for $n \geq 16$), because the Schwarz penalty is the largest and the Akaike penalty the smallest.

Incorrect functional form

To check if the model functional form is correct, we can use Ramsey's RESET (Regression Specification Error Test). It tests the original model against a model that adds powers of the fitted values.
H0: the model is correctly specified.
Test procedure:
1. Estimate the original model and obtain $\hat{y}$ and $R^2$:
   $$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$$
2. Estimate a new model adding powers of $\hat{y}$ and obtain the new $R^2_{new}$:
   $$\tilde{y} = \hat{y} + \tilde{\gamma}_2 \hat{y}^2 + \cdots + \tilde{\gamma}_l \hat{y}^l$$
3. Define the test statistic, under $\gamma_2 = \cdots = \gamma_l = 0$ as null hypothesis:
   $$F = \frac{R^2_{new} - R^2}{1 - R^2_{new}} \cdot \frac{n - (k + 1) - l}{l} \sim F_{l,\, n-(k+1)-l}$$
If $F$ is greater than the critical value $F_{l,\, n-(k+1)-l}$, there is evidence to reject H0.

Statistical definitions

Let $\xi$, $\eta$ be random variables, $a, b \in \mathbb{R}$ constants, and let $P$ denote probability.

Mean

Definition: $E(\xi) = \sum_{i=1}^{n} \xi_i \cdot P[\xi = \xi_i]$
Population mean: $E(\xi) = \frac{1}{N} \sum_{i=1}^{N} \xi_i$
Sample mean: $E(\xi) = \frac{1}{n} \sum_{i=1}^{n} \xi_i$
Some properties:
• $E(a) = a$
• $E(\xi + a) = E(\xi) + a$
• $E(a \cdot \xi) = a \cdot E(\xi)$
• $E(\xi \pm \eta) = E(\xi) \pm E(\eta)$
• $E(\xi \cdot \eta) = E(\xi) \cdot E(\eta)$ if $\xi$ and $\eta$ are independent.
• $E(\xi - E(\xi)) = 0$
• $E(a \cdot \xi + b \cdot \eta) = a \cdot E(\xi) + b \cdot E(\eta)$
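To make the information criteria on this page concrete, the following sketch evaluates AIC, HQ and Sc for several candidate orders and picks the order that minimizes each one. The SSR values and the sample size are hypothetical, chosen only so that the criteria disagree.

```python
import math

def criteria(ssr, n, p):
    """Return (AIC, HQ, Sc) for a model of order p with the given SSR."""
    base = math.log(ssr / n)
    aic = base + 2.0 / n * p
    hq = base + 2.0 * math.log(math.log(n)) / n * p
    sc = base + math.log(n) / n * p
    return aic, hq, sc

# Hypothetical SSR for each candidate order p, with n = 100 observations.
n = 100
ssr_by_order = {1: 250.0, 2: 200.0, 3: 195.0, 4: 193.0}

values = {p: criteria(ssr, n, p) for p, ssr in ssr_by_order.items()}

# For each criterion, choose the order with the smallest value.
names = ["AIC", "HQ", "Sc"]
selected = {name: min(values, key=lambda p, j=j: values[p][j])
            for j, name in enumerate(names)}
print(selected)
```

In this made-up case AIC keeps the third lag while HQ and Sc stop at two, which illustrates that the heavier penalties tend to select more parsimonious models.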
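The non-restricted hypothesis test

It is an alternative to the F test when there are few hypotheses to test on the parameters. Let $\beta_i$, $\beta_j$ be parameters and $a, b, c \in \mathbb{R}$ constants.
• H0: $a\beta_i + b\beta_j = c$
• H1: $a\beta_i + b\beta_j \neq c$
Under H0:
$$t = \frac{a\hat{\beta}_i + b\hat{\beta}_j - c}{\sqrt{\mathrm{Var}(a\hat{\beta}_i + b\hat{\beta}_j)}} = \frac{a\hat{\beta}_i + b\hat{\beta}_j - c}{\sqrt{a^2 \mathrm{Var}(\hat{\beta}_i) + b^2 \mathrm{Var}(\hat{\beta}_j) + 2ab \, \mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j)}}$$
If $|t| > |t_{n-k-1,\, \alpha/2}|$, there is evidence to reject H0.

ANOVA

Decomposes the total sum of squares into the sum of squared residuals and the explained sum of squares: $SST = SSR + SSE$

  Variation origin   Sum Sq.   df          Mean Sq.
  Regression         SSE       k           SSE/k
  Residuals          SSR       n - k - 1   SSR/(n - k - 1)
  Total              SST       n - 1

The F statistic, for H0: $\beta_1 = \cdots = \beta_k = 0$:
$$F = \frac{\text{Mean Sq. of regression}}{\text{Mean Sq. of residuals}} = \frac{SSE}{SSR} \cdot \frac{n - k - 1}{k} \sim F_{k,\, n-k-1}$$
If $F$ is greater than the critical value $F_{k,\, n-k-1}$, there is evidence to reject H0.

Logistic regression

When the dependent variable is binary (0, 1), the linear regression model is no longer valid, and we can use logistic regression instead. For example, a logit model:
$$P_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i + u_i)}} = \frac{e^{\beta_0 + \beta_1 x_i + u_i}}{1 + e^{\beta_0 + \beta_1 x_i + u_i}}$$
where $P_i = P(y_i = 1 \mid x_i) = E(y_i \mid x_i)$ and $(1 - P_i) = P(y_i = 0 \mid x_i)$.
The odds (in favor of $y_i = 1$):
$$\frac{P_i}{1 - P_i} = \frac{1 + e^{\beta_0 + \beta_1 x_i + u_i}}{1 + e^{-(\beta_0 + \beta_1 x_i + u_i)}} = e^{\beta_0 + \beta_1 x_i + u_i}$$
Taking the natural logarithm of the odds, we obtain the logit:
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 x_i + u_i$$
$P_i$ is between 0 and 1, but $L_i$ goes from $-\infty$ to $+\infty$. If $L_i$ is positive, $P_i > 0.5$ and the odds favor $y_i = 1$. If $\beta_1$ is positive, the probability of $y_i = 1$ increases when $x_i$ increases, and vice versa.
[Figure: logistic curve of P against x, bounded between 0 and 1.]

Variance

Definition: $\mathrm{Var}(\xi) = E[(\xi - E(\xi))^2]$
Population variance: $\mathrm{Var}(\xi) = \frac{\sum_{i=1}^{N} (\xi_i - E(\xi))^2}{N}$
Sample variance: $\mathrm{Var}(\xi) = \frac{\sum_{i=1}^{n} (\xi_i - E(\xi))^2}{n - 1}$
Some properties:
• $\mathrm{Var}(a) = 0$
• $\mathrm{Var}(\xi + a) = \mathrm{Var}(\xi)$
• $\mathrm{Var}(a \cdot \xi) = a^2 \cdot \mathrm{Var}(\xi)$
• $\mathrm{Var}(\xi \pm \eta) = \mathrm{Var}(\xi) + \mathrm{Var}(\eta) \pm 2 \cdot \mathrm{Cov}(\xi, \eta)$
• $\mathrm{Var}(a \cdot \xi \pm b \cdot \eta) = a^2 \cdot \mathrm{Var}(\xi) + b^2 \cdot \mathrm{Var}(\eta) \pm 2ab \cdot \mathrm{Cov}(\xi, \eta)$

Covariance

Definition: $\mathrm{Cov}(\xi, \eta) = E[(\xi - E(\xi)) \cdot (\eta - E(\eta))]$
Population covariance: $\frac{\sum_{i=1}^{N} (\xi_i - E(\xi)) \cdot (\eta_i - E(\eta))}{N}$
Sample covariance: $\frac{\sum_{i=1}^{n} (\xi_i - E(\xi)) \cdot (\eta_i - E(\eta))}{n - 1}$
Some properties:
• $\mathrm{Cov}(\xi, a) = 0$
• $\mathrm{Cov}(\xi + a, \eta + b) = \mathrm{Cov}(\xi, \eta)$
• $\mathrm{Cov}(a \cdot \xi, b \cdot \eta) = ab \cdot \mathrm{Cov}(\xi, \eta)$
• $\mathrm{Cov}(\xi, \xi) = \mathrm{Var}(\xi)$
• $\mathrm{Cov}(\xi, \eta) = \mathrm{Cov}(\eta, \xi)$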
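The t statistic for a linear restriction on two parameters relies on the variance of a linear combination, a²·Var + b²·Var + 2ab·Cov. A minimal sketch follows, with invented estimates and an invented function name (`t_linear_combination`); the critical value would still come from a t table or a statistics library.

```python
import math

def t_linear_combination(a, b, c, beta_i, beta_j, var_i, var_j, cov_ij):
    """t statistic for H0: a*beta_i + b*beta_j = c."""
    # Variance of a*beta_i + b*beta_j; the sign of the covariance
    # term is carried by the signs of a and b.
    var_comb = a * a * var_i + b * b * var_j + 2.0 * a * b * cov_ij
    return (a * beta_i + b * beta_j - c) / math.sqrt(var_comb)

# Hypothetical estimates: test H0: beta_i - beta_j = 0.
t = t_linear_combination(a=1.0, b=-1.0, c=0.0,
                         beta_i=0.8, beta_j=0.5,
                         var_i=0.04, var_j=0.09, cov_ij=0.01)
print(t)
```

With b = -1 the covariance term enters with a negative sign automatically, so no separate "plus or minus" case is needed.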

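The relation between the probability, the odds and the logit in the logistic regression section can be checked numerically. The coefficients below are hypothetical and the error term is set to zero for simplicity.

```python
import math

def logistic(z):
    """P = 1 / (1 + exp(-z)), always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds in favor of y = 1."""
    return p / (1.0 - p)

def logit(p):
    """Natural logarithm of the odds."""
    return math.log(odds(p))

# Hypothetical coefficients with beta1 > 0.
beta0, beta1 = -1.0, 0.5
xs = [-2.0, 0.0, 2.0, 4.0]

index = [beta0 + beta1 * x for x in xs]   # linear index beta0 + beta1 * x
probs = [logistic(z) for z in index]      # probabilities P_i
logits = [logit(p) for p in probs]        # recovers the linear index

for x, p, l in zip(xs, probs, logits):
    print(x, round(p, 4), round(l, 4))
```

Because beta1 is positive here, the probabilities rise with x, and the logit is positive only when the probability exceeds 0.5.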


VAR (Vector Autoregressive)

A VAR model captures dynamic interactions between time series variables. The VAR(p):
$$y_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + B_0 x_t + \cdots + B_q x_{t-q} + C D_t + u_t$$
where:
• $y_t = (y_{1t}, \ldots, y_{Kt})^T$ is a vector of K observable endogenous time series variables.
• $A_i$'s are $K \times K$ coefficient matrices.
• $x_t = (x_{1t}, \ldots, x_{Mt})^T$ is a vector of M observable exogenous time series variables.
• $B_j$'s are $K \times M$ coefficient matrices.
• $D_t$ is a vector that contains all deterministic terms, which may be a constant, a linear trend, seasonal dummies, and/or any other user-specified dummy variables.
• $C$ is a coefficient matrix of suitable dimension.
• $u_t = (u_{1t}, \ldots, u_{Kt})^T$ is a vector of K white noise series.
The process is stable if:
$$\det(I_K - A_1 z - \cdots - A_p z^p) \neq 0 \quad \text{for } |z| \leq 1$$
that is, there are no roots in or on the complex unit circle.
For example, a VAR model with two endogenous variables (K = 2), two lags (p = 2), an exogenous contemporaneous variable (M = 1), a constant (const = 1) and a trend ($Trend_t$):
$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} =
\begin{bmatrix} a_{11,1} & a_{12,1} \\ a_{21,1} & a_{22,1} \end{bmatrix} \cdot
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} +
\begin{bmatrix} a_{11,2} & a_{12,2} \\ a_{21,2} & a_{22,2} \end{bmatrix} \cdot
\begin{bmatrix} y_{1,t-2} \\ y_{2,t-2} \end{bmatrix} +
\begin{bmatrix} b_{11} \\ b_{21} \end{bmatrix} \cdot x_t +
\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} \cdot
\begin{bmatrix} const \\ Trend_t \end{bmatrix} +
\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}$$
Visualizing the separate equations:
$$y_{1t} = a_{11,1} y_{1,t-1} + a_{12,1} y_{2,t-1} + a_{11,2} y_{1,t-2} + a_{12,2} y_{2,t-2} + b_{11} x_t + c_{11} + c_{12} Trend_t + u_{1t}$$
$$y_{2t} = a_{21,1} y_{1,t-1} + a_{22,1} y_{2,t-1} + a_{21,2} y_{1,t-2} + a_{22,2} y_{2,t-2} + b_{21} x_t + c_{21} + c_{22} Trend_t + u_{2t}$$
If there is a unit root, the determinant is zero for z = 1. Then some or all variables are integrated and a VAR model is no longer appropriate (it is unstable).

VECM (Vector Error Correction Model)

If cointegrating relations are present in a system of variables, the VAR form is not the most convenient. It is better to use a VECM, that is, the levels VAR after subtracting $y_{t-1}$ from both sides. The VECM(p − 1):
$$\Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + B_0 x_t + \cdots + B_q x_{t-q} + C D_t + u_t$$
where:
• $y_t$, $x_t$, $D_t$ and $u_t$ are as specified in VAR.
• $\Pi = -(I_K - A_1 - \cdots - A_p)$; $\Pi y_{t-1}$ is referred to as the long-term part.
• $\Gamma_i = -(A_{i+1} + \cdots + A_p)$ for $i = 1, \ldots, p - 1$ are referred to as the short-term parameters.
• $A_i$, $B_j$ and $C$ are coefficient matrices of suitable dimensions.
If the VAR(p) process is unstable (there are unit roots), $\Pi$ can be written as a product of $(K \times r)$ matrices $\alpha$ (loading matrix) and $\beta$ (cointegration matrix) with $rk(\Pi) = rk(\alpha) = rk(\beta) = r$ (cointegrating rank) as follows: $\Pi = \alpha \beta^T$.
• $\beta^T y_{t-1}$ contains the cointegrating relations.
For example, if there are three endogenous variables (K = 3) with two cointegrating relations (r = 2), the long-term part of the VECM:
$$\Pi y_{t-1} = \alpha \beta^T y_{t-1} =
\begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \\ \alpha_{31} & \alpha_{32} \end{bmatrix}
\begin{bmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{bmatrix} =
\begin{bmatrix}
\alpha_{11} ec_{1,t-1} + \alpha_{12} ec_{2,t-1} \\
\alpha_{21} ec_{1,t-1} + \alpha_{22} ec_{2,t-1} \\
\alpha_{31} ec_{1,t-1} + \alpha_{32} ec_{2,t-1}
\end{bmatrix}$$
where:
$$ec_{1,t-1} = \beta_{11} y_{1,t-1} + \beta_{21} y_{2,t-1} + \beta_{31} y_{3,t-1}$$
$$ec_{2,t-1} = \beta_{12} y_{1,t-1} + \beta_{22} y_{2,t-1} + \beta_{32} y_{3,t-1}$$
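The stability condition of a VAR can be checked by hand in the smallest case. The sketch below is restricted to a VAR(1) with K = 2, where det(I − A·z) is the quadratic 1 − tr(A)·z + det(A)·z², so its roots follow from the quadratic formula. The coefficient matrices and function names are illustrative assumptions; larger systems would normally be checked through the eigenvalues of the companion matrix with a linear algebra library.

```python
import cmath

def var1_roots(A):
    """Roots of det(I - A z) = 1 - tr(A) z + det(A) z^2 for a 2 x 2 matrix A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        # The polynomial is linear (or constant when tr is also zero).
        return [] if tr == 0 else [complex(1.0 / tr)]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return [(tr + disc) / (2.0 * det), (tr - disc) / (2.0 * det)]

def is_stable(A):
    """Stable when every root lies strictly outside the unit circle."""
    return all(abs(z) > 1.0 for z in var1_roots(A))

# Hypothetical coefficient matrices.
A_stable = [[0.5, 0.1], [0.2, 0.3]]
A_unit_root = [[1.0, 0.0], [0.0, 0.5]]   # first variable is a random walk

print(var1_roots(A_stable), is_stable(A_stable))
print(var1_roots(A_unit_root), is_stable(A_unit_root))
```

In the second matrix one root is exactly z = 1, which is the unit root case where the VAR in levels is not stable.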

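The mapping from VAR coefficient matrices to the VECM matrices, Π = −(I − A₁ − ⋯ − A_p) and Γ_i = −(A_{i+1} + ⋯ + A_p), is simple enough to sketch with nested lists. The matrices below are hypothetical and chosen so that Π has rank 1; the helper names are not from the original sheet.

```python
def mat_add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def mat_neg(P):
    return [[-p for p in row] for row in P]

def identity(K):
    return [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]

def mat_sum(mats, K):
    total = [[0.0] * K for _ in range(K)]
    for M in mats:
        total = mat_add(total, M)
    return total

def vecm_from_var(A_list):
    """Return (Pi, [Gamma_1, ..., Gamma_{p-1}]) from [A_1, ..., A_p]."""
    K, p = len(A_list[0]), len(A_list)
    # Pi = -(I - A_1 - ... - A_p) = (A_1 + ... + A_p) - I
    Pi = mat_add(mat_sum(A_list, K), mat_neg(identity(K)))
    # Gamma_i = -(A_{i+1} + ... + A_p); A_list[i:] starts at A_{i+1}
    Gammas = [mat_neg(mat_sum(A_list[i:], K)) for i in range(1, p)]
    return Pi, Gammas

# Hypothetical VAR(2) with K = 2.
A1 = [[0.5, 0.3], [0.2, 0.6]]
A2 = [[0.3, -0.1], [0.0, 0.2]]

Pi, Gammas = vecm_from_var([A1, A2])
print(Pi)       # approximately [[-0.2, 0.2], [0.2, -0.2]], rank 1
print(Gammas)   # Gamma_1 = -A2

# Consistency check: for p = 2, A_1 = I + Pi + Gamma_1.
A1_back = mat_add(mat_add(identity(2), Pi), Gammas[0])
```

Here Π factors as α·βᵀ with α = (−0.2, 0.2)ᵀ and β = (1, −1)ᵀ, so the single cointegrating relation in this made-up system is y₁ − y₂.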
