1. The variance-covariance matrix of the error term (u) in a regression model takes on different forms depending on whether there is heteroscedasticity and autocorrelation.
2. When relevant variables are unavailable, proxy variables can be used if they are related to the omitted variable. However, if the included variable is endogenous, instrumental variables are needed.
3. An instrumental variable is correlated with the endogenous variable but uncorrelated with the error term, allowing estimation that accounts for omitted variable bias. Two-stage least squares is a method for using multiple instruments.
Additional Cheat Sheet
By Marcelo Moreno - Universidad Rey Juan Carlos
The Econometrics Cheat Sheet Project

OLS matrix notation

The general econometric model:
$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + u_i$$
Can be written in matrix notation as:
$$y = X\beta + u$$
Let's call $\hat{u}$ the vector of estimated residuals ($\hat{u} \neq u$):
$$\hat{u} = y - X\hat{\beta}$$
The objective of OLS is to minimize the SSR:
$$\min SSR = \min \sum_{i=1}^{n} \hat{u}_i^2 = \min \hat{u}^T \hat{u}$$
Defining $\hat{u}^T \hat{u}$:
$$\hat{u}^T \hat{u} = (y - X\hat{\beta})^T (y - X\hat{\beta}) = y^T y - 2\hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta}$$
Minimizing $\hat{u}^T \hat{u}$:
$$\frac{\partial \hat{u}^T \hat{u}}{\partial \hat{\beta}} = -2X^T y + 2X^T X \hat{\beta} = 0$$
$$\hat{\beta} = (X^T X)^{-1} (X^T y)$$
$$\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = \begin{bmatrix} n & \sum x_1 & \dots & \sum x_k \\ \sum x_1 & \sum x_1^2 & \dots & \sum x_1 x_k \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_k & \sum x_k x_1 & \dots & \sum x_k^2 \end{bmatrix}^{-1} \cdot \begin{bmatrix} \sum y \\ \sum y x_1 \\ \vdots \\ \sum y x_k \end{bmatrix}$$
The second derivative $\frac{\partial^2 \hat{u}^T \hat{u}}{\partial \hat{\beta}^2} = 2X^T X > 0$ (positive definite when $X$ has full column rank, so it is a minimum).

Variance-covariance matrix of $\hat{\beta}$

Has the following form:
$$\mathrm{Var}(\hat{\beta}) = \hat{\sigma}_u^2 \cdot (X^T X)^{-1} = \begin{bmatrix} \mathrm{Var}(\hat{\beta}_0) & \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) & \dots & \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_k) \\ \mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_0) & \mathrm{Var}(\hat{\beta}_1) & \dots & \mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_k) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(\hat{\beta}_k, \hat{\beta}_0) & \mathrm{Cov}(\hat{\beta}_k, \hat{\beta}_1) & \dots & \mathrm{Var}(\hat{\beta}_k) \end{bmatrix}$$
where: $\hat{\sigma}_u^2 = \frac{\hat{u}^T \hat{u}}{n - k - 1}$
The standard errors are in the diagonal of:
$$\mathrm{se}(\hat{\beta}) = \sqrt{\mathrm{Var}(\hat{\beta})}$$

Error measurements

$$SSR = \hat{u}^T \hat{u} = y^T y - \hat{\beta}^T X^T y = \sum (y_i - \hat{y}_i)^2$$
$$SSE = \hat{\beta}^T X^T y - n\bar{y}^2 = \sum (\hat{y}_i - \bar{y})^2$$
$$SST = SSR + SSE = y^T y - n\bar{y}^2 = \sum (y_i - \bar{y})^2$$

Variance-covariance matrix of u

Has the following shape:
$$\mathrm{Var}(u) = \begin{bmatrix} \mathrm{Var}(u_1) & \mathrm{Cov}(u_1, u_2) & \dots & \mathrm{Cov}(u_1, u_n) \\ \mathrm{Cov}(u_2, u_1) & \mathrm{Var}(u_2) & \dots & \mathrm{Cov}(u_2, u_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(u_n, u_1) & \mathrm{Cov}(u_n, u_2) & \dots & \mathrm{Var}(u_n) \end{bmatrix}$$
When there is no heteroscedasticity and no autocorrelation, the variance-covariance matrix of u has the form:
$$\mathrm{Var}(u) = \sigma_u^2 \cdot I_n = \begin{bmatrix} \sigma_u^2 & 0 & \dots & 0 \\ 0 & \sigma_u^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_u^2 \end{bmatrix}$$
where $I_n$ is an identity matrix of $n \times n$ elements.
When there is heteroscedasticity and autocorrelation, the variance-covariance matrix of u has the shape:
$$\mathrm{Var}(u) = \sigma_u^2 \cdot \Omega = \begin{bmatrix} \sigma_{u_1}^2 & \sigma_{u_{12}} & \dots & \sigma_{u_{1n}} \\ \sigma_{u_{21}} & \sigma_{u_2}^2 & \dots & \sigma_{u_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{u_{n1}} & \sigma_{u_{n2}} & \dots & \sigma_{u_n}^2 \end{bmatrix}$$
where $\Omega \neq I_n$.
Heteroscedasticity: $\mathrm{Var}(u_i) = \sigma_{u_i}^2 \neq \sigma_u^2$
Autocorrelation: $\mathrm{Cov}(u_i, u_j) = \sigma_{u_{ij}} \neq 0, \ \forall i \neq j$

Variable omission

Most of the time, it is hard to get all the relevant variables for an analysis. For example, a true model with all variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v$$
where $\beta_2 \neq 0$, $v$ is the error term and $E(v \mid x_1, x_2) = 0$.
The model with the available variables:
$$y = \alpha_0 + \alpha_1 x_1 + u$$
where $u = v + \beta_2 x_2$.
Relevant variable omission causes OLS estimators to be biased and inconsistent, because there is no weak exogeneity, $\mathrm{Cov}(x_1, u) \neq 0$. Depending on $\mathrm{Corr}(x_1, x_2)$ and the sign of $\beta_2$, the bias on $\hat{\alpha}_1$ could be:

                 Corr(x1, x2) > 0    Corr(x1, x2) < 0
  β2 > 0         (+) bias            (−) bias
  β2 < 0         (−) bias            (+) bias

(+) bias: $\hat{\alpha}_1$ will be higher than it should be (it includes the effect of $x_2$) → $\hat{\alpha}_1 > \beta_1$
(−) bias: $\hat{\alpha}_1$ will be lower than it should be (it includes the effect of $x_2$) → $\hat{\alpha}_1 < \beta_1$
If $\mathrm{Corr}(x_1, x_2) = 0$, there is no bias on $\hat{\alpha}_1$, because the effect of $x_2$ will be fully picked up by the error term, $u$.

Variable omission correction

Proxy variables

This is the approach when a relevant variable is not available because it is non-observable and there is no data for it. A proxy variable is something related to the non-observable variable that does have data available.
For example, GDP per capita is a proxy variable for quality of life (non-observable).

Instrumental variables

When the variable of interest ($x$) is observable but endogenous, the proxy variables approach is no longer valid. An instrumental variable (IV) is an observable variable ($z$) that is related to the endogenous variable of interest ($x$) and meets the requirements:
$\mathrm{Cov}(z, u) = 0$ → instrument exogeneity
$\mathrm{Cov}(z, x) \neq 0$ → instrument relevance
Instrumental variables leave the omitted variable in the error term but, instead of estimating the model by OLS, use a method that recognizes the presence of the omitted variable. They can also solve measurement error problems.
Two-Stage Least Squares (TSLS) is a method to estimate a model with multiple instrumental variables. The $\mathrm{Cov}(z, u) = 0$ requirement can be relaxed, but there has to be a minimum number of variables that satisfy it.
The TSLS estimation procedure is as follows:
1. Estimate a model regressing $x$ on $z$ using OLS, obtaining $\hat{x}$:
$$\hat{x} = \hat{\pi}_0 + \hat{\pi}_1 z$$
2. Replace $x$ by $\hat{x}$ in the final model and estimate it by OLS:
$$y = \beta_0 + \beta_1 \hat{x} + u$$
There are some important things to know about TSLS:
- TSLS estimators are less efficient than OLS when the explanatory variables are exogenous. The Hausman test can be used to check it:
  $H_0$: OLS estimators are consistent.
  If $H_0$ is not rejected, the OLS estimators are better than TSLS, and vice versa.
- Some (or all) of the instruments could be invalid. When there are more instruments than endogenous variables (over-identification), the Sargan test can be used to check it:
  $H_0$: all instruments are valid.
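The two TSLS steps described above can be sketched in a few lines. This is a minimal illustration, assuming a single endogenous regressor and a single instrument; the helper names `ols_simple` and `tsls_simple`, the simulated data, and the true coefficients (intercept 1, slope 2) are assumptions made for the example and are not part of the cheat sheet.

```python
import random


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def tsls_simple(y, x, z):
    """Two-stage least squares with one regressor x and one instrument z."""
    # Stage 1: regress x on z and keep the fitted values x_hat.
    pi0, pi1 = ols_simple(z, x)
    x_hat = [pi0 + pi1 * zi for zi in z]
    # Stage 2: regress y on x_hat.
    return ols_simple(x_hat, y)


# Simulated data (hypothetical): w is an omitted variable that enters both
# x and the error term u, so Cov(x, u) != 0, while z is independent of u.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + wi + rng.gauss(0, 0.5) for zi, wi in zip(z, w)]
u = [wi + rng.gauss(0, 0.5) for wi in w]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

ols_b0, ols_b1 = ols_simple(x, y)
iv_b0, iv_b1 = tsls_simple(y, x, z)
print("OLS slope:", round(ols_b1, 3))
print("TSLS slope:", round(iv_b1, 3))
```

In this setup the OLS slope should come out above the true value of 2 (a positive bias, since the omitted variable is positively correlated with x and has a positive effect on y), while the TSLS slope should be close to 2. Note that the standard errors printed by a naive second-stage OLS are not valid for inference; dedicated IV routines correct them.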
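Information criterion

It is used to compare models with different numbers of parameters ($p$). The general formula:
$$Cr(p) = \log\left(\frac{SSR}{n}\right) + c_n \varphi(p)$$
where:
SSR is the Sum of Squared Residuals from a model of order $p$.
$c_n$ is a sequence indexed by the sample size.
$\varphi(p)$ is a function that penalizes large $p$ orders.
It is interpreted as the relative amount of information lost by the model. The order $p$ that minimizes the criterion is chosen.
There are different $c_n \varphi(p)$ functions:
Akaike: $AIC(p) = \log\left(\frac{SSR}{n}\right) + \frac{2}{n} p$
Hannan-Quinn: $HQ(p) = \log\left(\frac{SSR}{n}\right) + \frac{2 \log(\log(n))}{n} p$
Schwarz: $Sc(p) = \log\left(\frac{SSR}{n}\right) + \frac{\log(n)}{n} p$
Because the penalties satisfy $\log(n) > 2\log(\log(n)) > 2$ for $n \geq 16$, the selected orders satisfy $\hat{p}(Sc) \leq \hat{p}(HQ) \leq \hat{p}(AIC)$.

The non-restricted hypothesis test

It is an alternative to the F test when there are few hypotheses to test on the parameters. Let $\beta_i, \beta_j$ be parameters and $a, b, c \in \mathbb{R}$ constants.
$H_0: a\beta_i + b\beta_j = c$
$H_1: a\beta_i + b\beta_j \neq c$
Under $H_0$:
$$t = \frac{a\hat{\beta}_i + b\hat{\beta}_j - c}{\sqrt{\mathrm{Var}(a\hat{\beta}_i + b\hat{\beta}_j)}} = \frac{a\hat{\beta}_i + b\hat{\beta}_j - c}{\sqrt{a^2 \mathrm{Var}(\hat{\beta}_i) + b^2 \mathrm{Var}(\hat{\beta}_j) + 2ab\,\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j)}}$$
If $|t| > |t_{n-k-1, \alpha/2}|$, there is evidence to reject $H_0$.

ANOVA

Decomposes the total sum of squares into the sum of squared residuals and the explained sum of squares: $SST = SSR + SSE$

  Variation origin    Sum Sq.    df           Sum Sq. Avg.
  Regression          SSE        k            SSE / k
  Residuals           SSR        n − k − 1    SSR / (n − k − 1)
  Total               SST        n − 1

The F statistic (SSA is the Sum Sq. Avg. column):
$$F = \frac{\text{SSA of SSE}}{\text{SSA of SSR}} = \frac{SSE}{SSR} \cdot \frac{n - k - 1}{k} \sim F_{k, n-k-1}$$
If $F_{k, n-k-1} < F$, there is evidence to reject $H_0$.

Incorrect functional form

To check if the model functional form is correct, we can use Ramsey's RESET (Regression Specification Error Test). It tests the original model against a model with variables in powers.
$H_0$: the model is correctly specified.
Test procedure:
1. Estimate the original model and obtain $\hat{y}$ and $R^2$:
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$$
2. Estimate a new model adding powers of $\hat{y}$ and obtain the new $R^2_{new}$:
$$\tilde{y} = \hat{y} + \tilde{\gamma}_2 \hat{y}^2 + \cdots + \tilde{\gamma}_l \hat{y}^l$$
3. Define the test statistic, under $\gamma_2 = \cdots = \gamma_l = 0$ as null hypothesis:
$$F = \frac{R^2_{new} - R^2}{1 - R^2_{new}} \cdot \frac{n - (k+1) - l}{l} \sim F_{l, n-(k+1)-l}$$
If $F_{l, n-(k+1)-l} < F$, there is evidence to reject $H_0$.

Logistic regression

When the dependent variable is binary (0, 1), the linear regression model is no longer valid; we can use logistic regression instead. For example, a logit model:
$$P_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i + u_i)}} = \frac{e^{\beta_0 + \beta_1 x_i + u_i}}{1 + e^{\beta_0 + \beta_1 x_i + u_i}}$$
where $P_i = P[y_i = 1 \mid x_i]$ and $(1 - P_i) = P[y_i = 0 \mid x_i]$.
The odds ratio (in favor of $y_i = 1$):
$$\frac{P_i}{1 - P_i} = \frac{1 + e^{\beta_0 + \beta_1 x_i + u_i}}{1 + e^{-(\beta_0 + \beta_1 x_i + u_i)}} = e^{\beta_0 + \beta_1 x_i + u_i}$$
Taking the natural logarithm of the odds ratio, we obtain the logit:
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 x_i + u_i$$
$P_i$ is between 0 and 1, but $L_i$ goes from $-\infty$ to $+\infty$. If $L_i$ is positive, the odds favor $y_i = 1$ ($P_i > 0.5$). If $\beta_1$ is positive, the probability of $y_i = 1$ increases when $x_i$ increases, and vice versa.
[Figure: logistic curve of $P_i$ (vertical axis, from 0 to 1) against $x$ (horizontal axis).]

Statistical definitions

Let $\xi, \eta$ be random variables, $a, b \in \mathbb{R}$ constants, and $P$ denotes probability.

Mean

Definition: $E(\xi) = \sum_{i=1}^{n} \xi_i \cdot P[\xi = \xi_i]$
Population mean: $E(\xi) = \frac{1}{N} \sum_{i=1}^{N} \xi_i$
Sample mean: $E(\xi) = \frac{1}{n} \sum_{i=1}^{n} \xi_i$
Some properties:
$E(a) = a$
$E(\xi + a) = E(\xi) + a$
$E(a \cdot \xi) = a \cdot E(\xi)$
$E(\xi \pm \eta) = E(\xi) \pm E(\eta)$
$E(\xi \cdot \eta) = E(\xi) \cdot E(\eta)$ only if $\xi$ and $\eta$ are independent.
$E(\xi - E(\xi)) = 0$
$E(a \cdot \xi + b \cdot \eta) = a \cdot E(\xi) + b \cdot E(\eta)$

Variance

Definition: $\mathrm{Var}(\xi) = E[(\xi - E(\xi))^2]$
Population variance: $\mathrm{Var}(\xi) = \frac{\sum_{i=1}^{N} (\xi_i - E(\xi))^2}{N}$
Sample variance: $\mathrm{Var}(\xi) = \frac{\sum_{i=1}^{n} (\xi_i - E(\xi))^2}{n - 1}$
Some properties:
$\mathrm{Var}(a) = 0$
$\mathrm{Var}(\xi + a) = \mathrm{Var}(\xi)$
$\mathrm{Var}(a \cdot \xi) = a^2 \cdot \mathrm{Var}(\xi)$
$\mathrm{Var}(\xi \pm \eta) = \mathrm{Var}(\xi) + \mathrm{Var}(\eta) \pm 2 \cdot \mathrm{Cov}(\xi, \eta)$
$\mathrm{Var}(a \cdot \xi \pm b \cdot \eta) = a^2 \cdot \mathrm{Var}(\xi) + b^2 \cdot \mathrm{Var}(\eta) \pm 2ab \cdot \mathrm{Cov}(\xi, \eta)$

Covariance

Definition: $\mathrm{Cov}(\xi, \eta) = E[(\xi - E(\xi)) \cdot (\eta - E(\eta))]$
Population covariance: $\frac{\sum_{i=1}^{N} (\xi_i - E(\xi)) \cdot (\eta_i - E(\eta))}{N}$
Sample covariance: $\frac{\sum_{i=1}^{n} (\xi_i - E(\xi)) \cdot (\eta_i - E(\eta))}{n - 1}$
Some properties:
$\mathrm{Cov}(\xi, a) = 0$
$\mathrm{Cov}(\xi + a, \eta + b) = \mathrm{Cov}(\xi, \eta)$
$\mathrm{Cov}(a \cdot \xi, b \cdot \eta) = ab \cdot \mathrm{Cov}(\xi, \eta)$
$\mathrm{Cov}(\xi, \xi) = \mathrm{Var}(\xi)$
$\mathrm{Cov}(\xi, \eta) = \mathrm{Cov}(\eta, \xi)$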
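As a small worked example of the information criteria defined in this section (AIC, HQ and Sc), the sketch below evaluates the three formulas and picks the order that minimizes each one. The function name `info_criteria`, the sample size and the SSR values per order are hypothetical numbers chosen only for illustration.

```python
import math


def info_criteria(ssr, n, p):
    """Return AIC, HQ and Sc for a model of order p with the given SSR."""
    base = math.log(ssr / n)
    return {
        "AIC": base + 2.0 / n * p,
        "HQ": base + 2.0 * math.log(math.log(n)) / n * p,
        "Sc": base + math.log(n) / n * p,
    }


# Hypothetical SSR values: the fit improves with p, but with diminishing gains.
n = 100
ssr_by_order = {1: 120.0, 2: 100.0, 3: 97.5, 4: 97.0}

# For each criterion, choose the order p with the smallest criterion value.
selected = {}
for name in ("AIC", "HQ", "Sc"):
    selected[name] = min(
        ssr_by_order,
        key=lambda p: info_criteria(ssr_by_order[p], n, p)[name],
    )

print(selected)
```

With these numbers AIC selects order 3, while HQ and Sc select order 2: the heavier penalties of HQ and Sc favor the more parsimonious model.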
VAR (Vector Autoregressive)

A VAR model captures dynamic interactions between time series variables. The VAR(p):
$$y_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + B_0 x_t + \cdots + B_q x_{t-q} + C D_t + u_t$$
where:
$y_t = (y_{1t}, \dots, y_{Kt})^T$ is a vector of K observable endogenous time series variables.
$A_i$'s are $K \times K$ coefficient matrices.
$x_t = (x_{1t}, \dots, x_{Mt})^T$ is a vector of M observable exogenous time series variables.
$B_j$'s are $K \times M$ coefficient matrices.
$D_t$ is a vector that contains all deterministic terms, which may be a constant, a linear trend, seasonal dummies, and/or any other user-specified dummy variables.
$C$ is a coefficient matrix of suitable dimension.
$u_t = (u_{1t}, \dots, u_{Kt})^T$ is a vector of K white noise series.
The process is stable if:
$$\det(I_K - A_1 z - \cdots - A_p z^p) \neq 0 \quad \text{for } |z| \leq 1$$
that is, there are no roots in or on the complex unit circle.
For example, a VAR model with two endogenous variables ($K = 2$), two lags ($p = 2$), an exogenous contemporaneous variable ($M = 1$), a constant (const) and a trend ($\text{Trend}_t$):
$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} a_{11,1} & a_{12,1} \\ a_{21,1} & a_{22,1} \end{bmatrix} \cdot \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} a_{11,2} & a_{12,2} \\ a_{21,2} & a_{22,2} \end{bmatrix} \cdot \begin{bmatrix} y_{1,t-2} \\ y_{2,t-2} \end{bmatrix} + \begin{bmatrix} b_{11} \\ b_{21} \end{bmatrix} \cdot x_t + \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} \cdot \begin{bmatrix} \text{const} \\ \text{Trend}_t \end{bmatrix} + \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}$$
Visualizing the separate equations:
$$y_{1t} = a_{11,1} y_{1,t-1} + a_{12,1} y_{2,t-1} + a_{11,2} y_{1,t-2} + a_{12,2} y_{2,t-2} + b_{11} x_t + c_{11} + c_{12} \text{Trend}_t + u_{1t}$$
$$y_{2t} = a_{21,1} y_{1,t-1} + a_{22,1} y_{2,t-1} + a_{21,2} y_{1,t-2} + a_{22,2} y_{2,t-2} + b_{21} x_t + c_{21} + c_{22} \text{Trend}_t + u_{2t}$$
If there is a unit root, the determinant is zero for $z = 1$; then some or all variables are integrated and a VAR model is no longer appropriate (it is unstable).

VECM (Vector Error Correction Model)

If cointegrating relations are present in a system of variables, the VAR form is not the most convenient. It is better to use a VECM, that is, the levels VAR after subtracting $y_{t-1}$ from both sides. The VECM(p − 1):
$$\Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \cdots + \Gamma_{p-1} \Delta y_{t-p+1} + B_0 x_t + \cdots + B_q x_{t-q} + C D_t + u_t$$
where:
$y_t$, $x_t$, $D_t$ and $u_t$ are as specified in VAR.
$\Pi = -(I_K - A_1 - \cdots - A_p)$; $\Pi y_{t-1}$ is referred to as the long-term part.
$\Gamma_i = -(A_{i+1} + \cdots + A_p)$ for $i = 1, \dots, p - 1$ are referred to as the short-term parameters.
$A_i$, $B_j$ and $C$ are coefficient matrices of suitable dimensions.
If the VAR(p) process is unstable (there are unit roots), $\Pi$ can be written as a product of $(K \times r)$ matrices $\alpha$ (loading matrix) and $\beta$ (cointegration matrix) with $\mathrm{rk}(\Pi) = \mathrm{rk}(\alpha) = \mathrm{rk}(\beta) = r$ (cointegrating rank) as follows: $\Pi = \alpha \beta^T$.
$\beta^T y_{t-1}$ contains the cointegrating relations.
For example, if there are three endogenous variables ($K = 3$) with two cointegrating relations ($r = 2$), the long-term part of the VECM:
$$\Pi y_{t-1} = \alpha \beta^T y_{t-1} = \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \\ \alpha_{31} & \alpha_{32} \end{bmatrix} \begin{bmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{bmatrix} = \begin{bmatrix} \alpha_{11} ec_{1,t-1} + \alpha_{12} ec_{2,t-1} \\ \alpha_{21} ec_{1,t-1} + \alpha_{22} ec_{2,t-1} \\ \alpha_{31} ec_{1,t-1} + \alpha_{32} ec_{2,t-1} \end{bmatrix}$$
where:
$$ec_{1,t-1} = \beta_{11} y_{1,t-1} + \beta_{21} y_{2,t-1} + \beta_{31} y_{3,t-1}$$
$$ec_{2,t-1} = \beta_{12} y_{1,t-1} + \beta_{22} y_{2,t-1} + \beta_{32} y_{3,t-1}$$
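To make the link between the VAR and VECM forms concrete, here is a minimal sketch that builds Π and the Γ matrices from a list of VAR coefficient matrices, using Π = −(I_K − A_1 − ⋯ − A_p) and Γ_i = −(A_{i+1} + ⋯ + A_p). The helper names (`vecm_from_var`, `mat_add`, `mat_vec`) and the numeric matrices are hypothetical; exogenous and deterministic terms are left out for brevity.

```python
def mat_add(a, b):
    """Element-wise sum of two matrices stored as lists of rows."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_vec(a, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def zeros(k):
    return [[0.0] * k for _ in range(k)]


def vecm_from_var(a_list):
    """Return (Pi, [Gamma_1, ..., Gamma_{p-1}]) from [A_1, ..., A_p]."""
    k = len(a_list[0])
    total = zeros(k)
    for a in a_list:
        total = mat_add(total, a)
    # Pi = -(I_K - A_1 - ... - A_p) = (A_1 + ... + A_p) - I_K
    pi = [[total[i][j] - (1.0 if i == j else 0.0) for j in range(k)]
          for i in range(k)]
    gammas = []
    for i in range(1, len(a_list)):
        tail = zeros(k)
        for a in a_list[i:]:  # A_{i+1}, ..., A_p
            tail = mat_add(tail, a)
        gammas.append([[-x for x in row] for row in tail])
    return pi, gammas


# Hypothetical VAR(2) with K = 2 and a unit root.
A1 = [[0.75, 0.5], [0.0, 1.0]]
A2 = [[-0.25, 0.0], [0.0, 0.0]]
pi, gammas = vecm_from_var([A1, A2])

# det(Pi) = 0 means Pi has reduced rank, i.e. det(I_K - A_1 - A_2) = 0 at z = 1.
det_pi = pi[0][0] * pi[1][1] - pi[0][1] * pi[1][0]

# Both forms give the same prediction for y_t from the same lagged values.
y_lag1, y_lag2 = [2.0, 1.0], [1.0, 3.0]
var_pred = [p + q for p, q in zip(mat_vec(A1, y_lag1), mat_vec(A2, y_lag2))]
d_lag1 = [p - q for p, q in zip(y_lag1, y_lag2)]
vecm_pred = [y + p + g for y, p, g in
             zip(y_lag1, mat_vec(pi, y_lag1), mat_vec(gammas[0], d_lag1))]
print(pi, gammas, det_pi, var_pred, vecm_pred)
```

In this example Π has rank 1 and factors as αβ^T with α = (−0.5, 0)^T and β = (1, −1)^T, so the single cointegrating relation is y_1 − y_2.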