GLS and FGLS


Outline of Linear Systems of Equations

POLS, GLS, FGLS, GMM


Common Coefficients, Panel Data Model

Preliminaries
The linear panel data model is a static model because all explanatory variables are dated contemporaneously with the dependent variable. It is also considered a common coefficient model because β is the same for all individuals across time.

$$y_{it} = x_{it}'\beta + u_{it}$$

where $x_{it}$ is $K \times 1$; $i = 1, \dots, n$ and $t = 1, \dots, T$; $n$ is large and $T$ is small ($n \geq T$). We assume observations for $i \neq j$ are independent.
Want time heterogeneity? Then use time dummies or the Seemingly Unrelated Regression (SUR) model.
Want individual heterogeneity? Use Fixed Effects (FE) and/or Random Effects (RE), or something more general such as Random Coefficients (RC).
For right now, there is no individual or time heterogeneity present in the model. We will include unobserved individual heterogeneity in the panel data model later. We will also discuss multivariate linear systems with time heterogeneity, i.e., the SUR model, at another time.
To simplify the notation, we can stack the model over time:

$$y_i = x_i\beta + u_i$$

where $y_i : T \times 1$, $u_i : T \times 1$, and

$$x_i = \begin{pmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{pmatrix}_{T \times K}$$

POLS

Identification Assumptions
Assumption POLS.1: $E(x_{it}u_{it}) = 0 \ \forall\ i, t$ (within-equation, or contemporaneous, exogeneity). For most applications, $x_i$ has a sufficient number of elements equal to unity, so that Assumption POLS.1 implies that $E(u_i) = 0$.
This is the weakest assumption we can impose in a regression framework to get consistent estimators of β, and it can hold when some elements of $x_i$ are correlated with some elements of $u_i$. For example, it allows $x_{is}$ and $u_{it}$ to be correlated when $s \neq t$.
Under Assumption POLS.1, the vector β satisfies

$$E\left[x_i'(y_i - x_i\beta)\right] = 0 \quad (1)$$

or $E(x_i'x_i)\beta = E(x_i'y_i)$. For each $i$, $x_i'y_i$ is a $K \times 1$ vector and $x_i'x_i$ is a $K \times K$ symmetric, positive semidefinite random matrix. Therefore, $E(x_i'x_i)$ is always a $K \times K$ symmetric, positive semidefinite nonrandom matrix (the expectation here is defined over the population distribution of $x_i$). To be able to estimate β, we need to assume that it is the only $K \times 1$ vector that satisfies (1).
Assumption POLS.2: $\mathrm{rank}\left[\sum_{t=1}^{T} E(x_{it}x_{it}')\right] = K$.
Under Assumptions POLS.1 and POLS.2, we can write $\beta = [E(x_i'x_i)]^{-1}E(x_i'y_i)$, which shows that the two assumptions identify the vector β.

Estimator
Define the Pooled Ordinary Least Squares (POLS) estimator as:
$$\hat\beta_{POLS} = \left(\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}x_{it}'\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}y_{it}\right) = \left(\sum_{i=1}^{n} x_i'x_i\right)^{-1}\left(\sum_{i=1}^{n} x_i'y_i\right)$$

For computing $\hat\beta_{POLS}$ using a matrix programming language, it is sometimes useful to write $\hat\beta = (X'X)^{-1}X'Y$ where

$$X \equiv \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}_{nT \times K} \quad \text{and} \quad Y \equiv \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}_{nT \times 1}$$

This estimator is called the pooled ordinary least squares (POLS) estimator because it corre-
sponds to running OLS on the observations pooled across i and t.
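As a concrete illustration, here is a minimal numpy sketch of the POLS formula above; the data layout, function name, and simulated example are my own assumptions, not from the text:

```python
import numpy as np

def pols(X, Y):
    """Pooled OLS: beta_hat = (X'X)^{-1} X'Y, with X: (n*T, K), Y: (n*T,)."""
    XtX = X.T @ X            # sum over i,t of x_it x_it'
    XtY = X.T @ Y            # sum over i,t of x_it y_it
    return np.linalg.solve(XtX, XtY)

# Hypothetical usage with simulated data (n=500 individuals, T=4, K=3):
rng = np.random.default_rng(0)
n, T, K = 500, 4, 3
X = rng.normal(size=(n * T, K))
beta = np.array([1.0, -0.5, 2.0])
Y = X @ beta + rng.normal(size=n * T)
print(pols(X, Y))  # should be close to beta
```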

Asymptotic Properties
Consistency

Since $E(x_{it}u_{it}) = 0$ by assumption,

$$\hat\beta_{POLS} - \beta = \left(\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}x_{it}'\right)^{-1}\left(\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}u_{it}\right) \xrightarrow{p} \left[E(x_{it}x_{it}')\right]^{-1}E(x_{it}u_{it}) = 0$$

Asymptotic Normality

$$\sqrt{n}\left(\hat\beta_{POLS} - \beta\right) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i'x_i\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i'u_i\right) \xrightarrow{d} \left[E(x_i'x_i)\right]^{-1}\cdot N\left(0,\ E(x_i'u_iu_i'x_i)\right)$$

The robust asymptotic variance and its estimator are

$$V_R = \left[E(x_i'x_i)\right]^{-1}E(x_i'u_iu_i'x_i)\left[E(x_i'x_i)\right]^{-1}$$

$$\hat V_R = \left(\frac{1}{n}\sum_{i=1}^{n} x_i'x_i\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i'\hat u_i\hat u_i'x_i\right)\left(\frac{1}{n}\sum_{i=1}^{n} x_i'x_i\right)^{-1}$$

where $\hat u_i = y_i - x_i\hat\beta_{POLS}$.


To test $H_0: R\beta = r$, where $R$ is a $q \times K$ matrix of rank $q$, use the Wald statistic:

$$n\left(R\hat\beta - r\right)'\left[R\hat V_R R'\right]^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2_q$$
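A minimal sketch of the cluster-robust variance $\hat V_R$ and the Wald statistic, assuming the data are reshaped so that each individual's block is separate (shapes and names are my own assumptions):

```python
import numpy as np

def pols_robust_vcov(X, Y, beta_hat):
    """Robust variance A^{-1} B A^{-1} / n; X: (n, T, K), Y: (n, T)."""
    n, T, K = X.shape
    U = Y - X @ beta_hat                      # residuals u_i_hat, shape (n, T)
    A = np.einsum('itk,itl->kl', X, X) / n    # (1/n) sum_i x_i' x_i
    s = np.einsum('itk,it->ik', X, U)         # x_i' u_i, shape (n, K)
    B = s.T @ s / n                           # (1/n) sum_i x_i'u_i u_i'x_i
    Ainv = np.linalg.inv(A)
    return Ainv @ B @ Ainv / n                # estimates AVar(beta_hat) itself

def wald_stat(beta_hat, V, R, r):
    """Wald statistic for H0: R beta = r; V is AVar(beta_hat), already
    divided by n, so no extra factor of n is needed here."""
    d = R @ beta_hat - r
    return d @ np.linalg.solve(R @ V @ R.T, d)
```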

System Conditional Homoskedasticity (SCH)

Assumption: $E(u_iu_i'|x_i) = E(u_iu_i')$

By the law of iterated expectations, the SCH assumption implies that $E(x_i'u_iu_i'x_i) = E(x_i'\Omega x_i)$, where $\Omega_{T\times T} \equiv E(u_iu_i')$.

$$V_{NR} = \left[E(x_i'x_i)\right]^{-1}E(x_i'\Omega x_i)\left[E(x_i'x_i)\right]^{-1}$$

$$\hat V_{NR} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i'x_i\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i'\hat\Omega x_i\right)\left(\frac{1}{n}\sum_{i=1}^{n} x_i'x_i\right)^{-1}$$

$$\hat\Omega = \frac{1}{n}\sum_{i=1}^{n}\hat u_i\hat u_i' \xrightarrow{p} E(u_iu_i') = \Omega$$
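A sketch of the nonrobust estimator $\hat V_{NR}$ under SCH, with the same assumed per-individual array layout as above:

```python
import numpy as np

def pols_sch_vcov(X, Y, beta_hat):
    """Nonrobust variance under SCH; X: (n, T, K), Y: (n, T)."""
    n = X.shape[0]
    U = Y - X @ beta_hat                          # residuals, (n, T)
    Omega_hat = U.T @ U / n                       # (1/n) sum_i u_i u_i', (T, T)
    A = np.einsum('itk,itl->kl', X, X) / n        # (1/n) sum_i x_i' x_i
    M = np.einsum('itk,ts,isl->kl', X, Omega_hat, X) / n  # (1/n) sum_i x_i' Omega_hat x_i
    Ainv = np.linalg.inv(A)
    return Ainv @ M @ Ainv / n                    # estimates AVar(beta_hat)
```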

Homoskedasticity and No Serial Correlation
To apply the usual OLS statistics from the pooled OLS regression across i and t, and for pooled OLS to be relatively efficient, we require that $u_{it}$ be homoskedastic across t and serially uncorrelated. The weakest forms of these conditions are the following:

Assumption POLS.3: (a) $E(u_{it}^2 x_{it}x_{it}') = \sigma^2 E(x_{it}x_{it}')$, $t = 1,\dots,T$, where $\sigma^2 = E(u_{it}^2)\ \forall\ t$; (b) $E(u_{it}u_{is}x_{it}x_{is}') = 0$, $t \neq s$, $t,s = 1,\dots,T$.
The first part of Assumption POLS.3 is a fairly strong homoskedasticity assumption; sufficient is $E(u_{it}^2|x_{it}) = E(u_{it}^2) = \sigma^2\ \forall\ t$. This means not only that the conditional variance does not depend on $x_{it}$, but also that the unconditional variance is the same in every time period. Assumption POLS.3(b) essentially restricts the conditional covariances of the errors across different time periods to be zero. In fact, since $x_{it}$ almost always contains a constant, POLS.3(b) requires at a minimum that $E(u_{it}u_{is}) = 0$, $t \neq s$. Sufficient for POLS.3(b) is $E(u_{it}u_{is}|x_{it}, x_{is}) = E(u_{it}u_{is}) = 0$, $t \neq s$, $t,s = 1,\dots,T$.
It is important to remember that Assumption POLS.3 implies more than just a certain form of the unconditional variance matrix of $u_i$. Assumption POLS.3 implies $E(u_iu_i') = \sigma^2 I_T$, which means that the unconditional variances are constant and the unconditional covariances are zero, but it also effectively restricts the conditional variances and covariances.
If Assumption POLS.3 holds, then $\mathrm{AVar}(\hat\beta_{POLS}) = \sigma^2\left[E(x_i'x_i)\right]^{-1}/n$, so its appropriate estimator is

$$\hat\sigma^2(X'X)^{-1} = \hat\sigma^2\left(\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}x_{it}'\right)^{-1}$$

where $\hat\sigma^2$ is the usual OLS variance estimator from the pooled regression of $y_{it}$ on $x_{it}$.
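Under POLS.3 the variance computation collapses to the familiar OLS formula. A minimal sketch (the degrees-of-freedom correction is the usual OLS convention, an assumption on my part since the text does not specify it):

```python
import numpy as np

def pols_classical_vcov(X, Y, beta_hat):
    """Nonrobust variance sigma2_hat * (X'X)^{-1}; X: (nT, K), Y: (nT,)."""
    nT, K = X.shape
    u = Y - X @ beta_hat                     # pooled residuals
    sigma2_hat = u @ u / (nT - K)            # usual OLS variance estimator
    return sigma2_hat * np.linalg.inv(X.T @ X)
```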

GLS

Identification Assumptions
Assumption SGLS.1: $E(x_{it}u_{is}) = 0\ \forall\ t,s = 1,\dots,T$ (cross-equation exogeneity, i.e., strict exogeneity).

This assumption is more easily stated using the Kronecker product: $E(x_i \otimes u_i) = 0$. Typically, at least one element of $x_i$ is unity, so in practice Assumption SGLS.1 implies that $E(u_i) = 0$.
SGLS.1 is stronger than POLS.1, i.e., SGLS.1 implies POLS.1. This stronger assumption is needed for GLS to be consistent. Note, GLS is less robust than POLS, but it is more efficient than POLS if SGLS.1 holds and we add assumptions on the conditional variance matrix of $u_i$. A sufficient condition for Assumption SGLS.1 is the zero conditional mean assumption, i.e., $E(u_i|x_i) = 0$.

The second moment matrix of $u_i$, which is necessarily constant across i by the random sampling assumption, plays a critical role for GLS estimation of systems of equations. Define the $T \times T$ positive semidefinite matrix $\Omega \equiv E(u_iu_i')$. Because $E(u_i) = 0$ in the vast majority of applications, we will refer to Ω as the unconditional variance matrix of $u_i$. Sometimes, an equation must be dropped to ensure that Ω is nonsingular. Here, we assume Ω is nonsingular, so Assumption SGLS.1 implies that $E(x_i'\Omega^{-1}u_i) = 0$.

In place of Assumption POLS.2, we assume that a weighted expected outer product of $x_i$ is nonsingular. Here we insert the assumption of a nonsingular variance matrix for completeness.

Assumption SGLS.2: Ω is positive definite and $E(x_i'\Omega^{-1}x_i)$ is nonsingular.

Estimator
Let $\Omega = C^{1/2}C^{1/2\prime}$. (We use the Cholesky, or triangular, decomposition, which we can do for any symmetric and positive semidefinite matrix.) Since Ω is invertible, $\Omega^{-1} = C^{-1/2\prime}C^{-1/2}$. The usual motivation for the GLS estimator is to transform a system of equations where the error has a nonscalar variance-covariance matrix into a system where the error vector has a scalar variance-covariance matrix. We obtain this by premultiplying the stacked equation by $C^{-1/2}$, which gives $\tilde y_i = \tilde x_i\beta + \tilde u_i$, where $\tilde y_i = C^{-1/2}y_i$, $\tilde x_i = C^{-1/2}x_i$, and $\tilde u_i = C^{-1/2}u_i$. Simple algebra shows that $E(\tilde u_i\tilde u_i') = I_T$.

The generalized least squares (GLS) estimator of β is obtained by performing POLS of $\tilde y_i$ on $\tilde x_i$:
$$\hat\beta_{GLS} = \left(\sum_{i=1}^{n}\tilde x_i'\tilde x_i\right)^{-1}\left(\sum_{i=1}^{n}\tilde x_i'\tilde y_i\right) = \left(\sum_{i=1}^{n} x_i'\Omega^{-1}x_i\right)^{-1}\left(\sum_{i=1}^{n} x_i'\Omega^{-1}y_i\right) = \left[X'(I_n \otimes \Omega^{-1})X\right]^{-1}X'(I_n \otimes \Omega^{-1})Y$$
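A minimal sketch of GLS with a known Ω, using the weighted form $\left(\sum_i x_i'\Omega^{-1}x_i\right)^{-1}\sum_i x_i'\Omega^{-1}y_i$, which is algebraically identical to POLS on the $C^{-1/2}$-transformed data (the array layout is my own assumption):

```python
import numpy as np

def gls(X, Y, Omega):
    """GLS with known Omega (T x T); X: (n, T, K), Y: (n, T)."""
    Oinv = np.linalg.inv(Omega)
    A = np.einsum('itk,ts,isl->kl', X, Oinv, X)   # sum_i x_i' Omega^{-1} x_i
    b = np.einsum('itk,ts,is->k', X, Oinv, Y)     # sum_i x_i' Omega^{-1} y_i
    return np.linalg.solve(A, b)
```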

Asymptotic Properties
Consistency

Since $E(x_i'\Omega^{-1}u_i) = 0$,

$$\hat\beta_{GLS} - \beta = \left(\sum_{i=1}^{n} x_i'\Omega^{-1}x_i\right)^{-1}\left(\sum_{i=1}^{n} x_i'\Omega^{-1}u_i\right) \xrightarrow{p} A^{-1}E(x_i'\Omega^{-1}u_i) = 0$$

where $A \equiv E(x_i'\Omega^{-1}x_i)$.


 

If we are willing to make the zero conditional mean assumption, $\hat\beta_{GLS}$ can be shown to be unbiased conditional on X. Note, consistency fails if we only make Assumption POLS.1: $E(x_i'u_i) = 0$ does not imply $E(x_i'\Omega^{-1}u_i) = 0$. If Assumption POLS.1 holds but Assumption SGLS.1 fails, the transformed equation $\tilde y_i = \tilde x_i\beta + \tilde u_i$ generally induces correlation between $\tilde x_i$ and $\tilde u_i$.

Asymptotic Normality
$$\sqrt{n}\left(\hat\beta_{GLS} - \beta\right) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i'\Omega^{-1}x_i\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i'\Omega^{-1}u_i\right)$$

By the CLT,

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i'\Omega^{-1}u_i \xrightarrow{d} N(0, B)$$

where $B \equiv E(x_i'\Omega^{-1}u_iu_i'\Omega^{-1}x_i)$.


Since $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i'\Omega^{-1}u_i = O_p(1)$ and $\left(\frac{1}{n}\sum_{i=1}^{n} x_i'\Omega^{-1}x_i\right)^{-1} - A^{-1} = o_p(1)$, we can write

$$\sqrt{n}\left(\hat\beta_{GLS} - \beta\right) = A^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i'\Omega^{-1}u_i\right) + o_p(1)$$

It follows from the asymptotic equivalence lemma that $\sqrt{n}(\hat\beta_{GLS} - \beta) \stackrel{a}{\sim} N(0, A^{-1}BA^{-1})$.

$$V_R = \left[E(\tilde x_i'\tilde x_i)\right]^{-1}E(\tilde x_i'\tilde u_i\tilde u_i'\tilde x_i)\left[E(\tilde x_i'\tilde x_i)\right]^{-1} \iff \mathrm{AVar}(\hat\beta_{GLS}) = A^{-1}BA^{-1}/n$$


SE: Use the robust standard error for POLS of ỹi on x̃i

Feasible Generalized Least Squares (FGLS)

Asymptotic Properties
Obtaining the GLS estimator $\hat\beta_{GLS}$ requires knowing Ω up to scale. That is, we must be able to write $\Omega = \sigma^2 C$, where C is a known $T \times T$ positive definite matrix and $\sigma^2$ is allowed to be an unknown constant. Sometimes C is known, but more often it is unknown. Therefore, we now turn to the analysis of feasible GLS (FGLS) estimation.

In FGLS estimation, we replace the unknown matrix Ω with a consistent estimator. Because the estimator of Ω appears highly nonlinearly in the expression for the FGLS estimator, deriving finite sample properties of FGLS is generally difficult. The asymptotic properties of the FGLS estimator are easily established as $n \to \infty$ because its first-order asymptotic properties are identical to those of the GLS estimator under Assumptions SGLS.1 and SGLS.2.
We initially assume we have a consistent estimator $\hat\Omega$ of Ω: $\mathrm{plim}_{n\to\infty}\,\hat\Omega = \Omega$. When Ω is allowed to be a general positive definite matrix, the following estimation approach can be used. First, obtain the POLS estimator of β, which we denote $\check\beta$. We already showed that $\check\beta$ is consistent for β under Assumptions POLS.1 and POLS.2, and therefore under Assumptions SGLS.1 and POLS.2. So, a natural estimator of Ω is

$$\hat\Omega \equiv \frac{1}{n}\sum_{i=1}^{n}\check u_i\check u_i'$$

where $\check u_i \equiv y_i - x_i\check\beta$ are the POLS residuals. We can show that this estimator is consistent for Ω under Assumptions SGLS.1 and POLS.2 and standard moment conditions. Given $\hat\Omega$, the feasible GLS (FGLS) estimator of β is

$$\hat\beta_{FGLS} = \left[\sum_{i=1}^{n} x_i'\hat\Omega^{-1}x_i\right]^{-1}\left[\sum_{i=1}^{n} x_i'\hat\Omega^{-1}y_i\right] = \left[X'\left(I_n \otimes \hat\Omega^{-1}\right)X\right]^{-1}X'\left(I_n \otimes \hat\Omega^{-1}\right)Y$$

We already know that GLS is consistent and asymptotically normal. Because $\hat\Omega$ converges to Ω, it is not surprising that FGLS is consistent, and we can also verify that FGLS has the same limiting distribution as GLS, i.e., they are $\sqrt{n}$-equivalent. This asymptotic equivalence is important because we do not have to worry that $\hat\Omega$ is an estimator when performing asymptotic inference about β using $\hat\beta_{FGLS}$.
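Putting the two steps together, a minimal FGLS sketch under the same assumed data layout (POLS first step, then GLS with $\hat\Omega$):

```python
import numpy as np

def fgls(X, Y):
    """Two-step FGLS; X: (n, T, K), Y: (n, T)."""
    n, T, K = X.shape
    # Step 1: POLS on the pooled data to get residuals u_check
    Xf, Yf = X.reshape(n * T, K), Y.reshape(n * T)
    beta_check = np.linalg.solve(Xf.T @ Xf, Xf.T @ Yf)
    U = Y - X @ beta_check                   # (n, T) matrix of residuals
    Omega_hat = U.T @ U / n                  # (1/n) sum_i u_i u_i'
    # Step 2: GLS with the estimated Omega
    Oinv = np.linalg.inv(Omega_hat)
    A = np.einsum('itk,ts,isl->kl', X, Oinv, X)
    b = np.einsum('itk,ts,is->k', X, Oinv, Y)
    return np.linalg.solve(A, b), Omega_hat
```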
In the FGLS context, a consistent estimator of A is

$$\hat A \equiv \frac{1}{n}\sum_{i=1}^{n} x_i'\hat\Omega^{-1}x_i$$

A consistent estimator of B is also readily available after FGLS estimation. Define the FGLS residuals by $\hat u_i \equiv y_i - x_i\hat\beta_{FGLS}$. Using standard arguments, a consistent estimator of B is

$$\hat B \equiv \frac{1}{n}\sum_{i=1}^{n} x_i'\hat\Omega^{-1}\hat u_i\hat u_i'\hat\Omega^{-1}x_i$$

The estimator of $\mathrm{AVar}(\hat\beta)$ can be written as $\hat A^{-1}\hat B\hat A^{-1}/n$. This is the extension of the White heteroskedasticity-robust asymptotic variance estimator, and it is robust under Assumptions SGLS.1 and SGLS.2.
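A sketch of this robust FGLS variance estimator $\hat A^{-1}\hat B\hat A^{-1}/n$ under the same assumed array layout:

```python
import numpy as np

def fgls_robust_vcov(X, Y, beta_fgls, Omega_hat):
    """Robust FGLS variance A_hat^{-1} B_hat A_hat^{-1} / n; X: (n, T, K), Y: (n, T)."""
    n = X.shape[0]
    Oinv = np.linalg.inv(Omega_hat)
    U = Y - X @ beta_fgls                             # FGLS residuals, (n, T)
    A_hat = np.einsum('itk,ts,isl->kl', X, Oinv, X) / n
    s = np.einsum('itk,ts,is->ik', X, Oinv, U)        # x_i' Oinv u_i, (n, K)
    B_hat = s.T @ s / n
    Ainv = np.linalg.inv(A_hat)
    return Ainv @ B_hat @ Ainv / n
```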

System Conditional Homoskedasticity (SCH) Assumption


Under the assumptions so far, FGLS has nothing to offer over POLS, and it is less robust. However, under an additional assumption, FGLS is asymptotically more efficient than POLS (and other estimators).

Assumption SGLS.3: $E(u_iu_i'|x_i) = E(u_iu_i') = \Omega$

The SCH assumption puts restrictions on the conditional variances and covariances of elements of $u_i$. If $E(u_i|x_i) = 0$, then this assumption is the same as assuming $\mathrm{Var}(u_i|x_i) = \mathrm{Var}(u_i) = \Omega$. Another way to state this assumption is B = A, which simplifies the asymptotic variance.
By the law of iterated expectations, the SCH assumption implies that $E(x_i'\Omega^{-1}u_iu_i'\Omega^{-1}x_i) = E(x_i'\Omega^{-1}x_i)$, where $\Omega_{T\times T} \equiv E(u_iu_i')$. Note, we only need this weaker condition to determine the usual variance matrix for FGLS. Under this weaker assumption, along with Assumptions SGLS.1 and SGLS.2, the asymptotic variance of the FGLS estimator is $\mathrm{AVar}(\hat\beta) \equiv A^{-1}/n$. We obtain an estimator of this variance matrix by using our consistent estimator of A, so $\widehat{\mathrm{AVar}}(\hat\beta) = \hat A^{-1}/n$. This is the "usual" formula for the asymptotic variance of FGLS. It is nonrobust in the sense that it relies on the homoskedasticity assumption. If heteroskedasticity in $u_i$ is suspected, then the robust estimator, which was derived earlier, should be used.

Under Assumptions SGLS.1, POLS.2, SGLS.2, and SGLS.3, the FGLS estimator is more efficient than the POLS estimator. We can actually say much more: FGLS is more efficient than any other estimator that uses the orthogonality conditions $E(x_i \otimes u_i) = 0$.

Summary of the Various System GMM Estimators

Preliminaries
$$y_{it} = x_{it}'\beta + u_{it}$$

For all t, $x_{it}$ is a $K \times 1$ vector. Suppose we have an $L_t \times 1$ vector of instruments $z_{it}$, so the number of instruments can vary with time. The instruments must satisfy $E(z_{it}u_{it}) = 0$ for all t. Stacking the equations over t, we have

$$y_i = x_i\beta + u_i$$

which is the same setup as in (2), and $z_i$ has the structure of (4). Thus, the moment conditions are given by:

$$E(z_i'[y_i - x_i\beta]) = E(z_i'u_i) = E(g_i)_{L\times 1} = 0$$

The efficient GMM estimator that uses only the moments $E(z_{it}u_{it}) = 0$ for all t is the GMM estimator with the optimal weighting matrix. However, the choice of instrument matrix in (5) means we are only using the moment conditions aggregated across time, $\sum_{t=1}^{T}E(z_{it}u_{it}) = 0$. Thus, to obtain the efficient GMM estimator, the matrix of instruments should be as in (4), because this expresses the full set of moment conditions.

The estimators available to deal with endogeneity are: system GMM, 3SLS, S2SLS, P2SLS, SIV, PIV, and HT.

1. GMM Estimator
$$\hat\beta_{GMM} = \arg\min_\beta\ \left(\frac{1}{n}\sum_{i=1}^{n} z_i'u_i\right)'\hat W\left(\frac{1}{n}\sum_{i=1}^{n} z_i'u_i\right) = \left[\left(\sum_{i=1}^{n} x_i'z_i\right)\hat W\left(\sum_{i=1}^{n} z_i'x_i\right)\right]^{-1}\left[\left(\sum_{i=1}^{n} x_i'z_i\right)\hat W\left(\sum_{i=1}^{n} z_i'y_i\right)\right]$$

where $u_i = y_i - x_i\beta$ inside the criterion. To obtain the optimal GMM estimator, we choose $\hat W$ such that $\mathrm{plim}_{n\to\infty}\,\hat W = W \equiv \left[E(z_i'u_iu_i'z_i)\right]^{-1}$. Thus, the weighting matrix for the optimal GMM estimator is

$$\hat W = \left(\frac{1}{n}\sum_{i=1}^{n} z_i'\hat u_i\hat u_i'z_i\right)^{-1}$$

where the $\hat u_i$ are residuals from a preliminary consistent estimator.
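A sketch of the system GMM formula for a given weighting matrix, plus the optimal weight built from residuals. Here $z_i$ is assumed stacked as a $T \times L$ block with the same L each period, matching the pooled structure; the names are illustrative:

```python
import numpy as np

def linear_gmm(X, Z, Y, W):
    """System GMM: [(sum x'z) W (sum z'x)]^{-1} (sum x'z) W (sum z'y).
    X: (n, T, K), Z: (n, T, L), Y: (n, T), W: (L, L)."""
    Szx = np.einsum('itl,itk->lk', Z, X)      # sum_i z_i' x_i, (L, K)
    Szy = np.einsum('itl,it->l', Z, Y)        # sum_i z_i' y_i, (L,)
    M = Szx.T @ W @ Szx
    return np.linalg.solve(M, Szx.T @ W @ Szy)

def optimal_weight(Z, U):
    """W_hat = [(1/n) sum_i z_i' u_i u_i' z_i]^{-1}, U: (n, T) residuals."""
    n = Z.shape[0]
    s = np.einsum('itl,it->il', Z, U)         # z_i' u_i, (n, L)
    return np.linalg.inv(s.T @ s / n)
```

In a typical two-step implementation, a first step with a simple weight (e.g., the S2SLS weight below) supplies the residuals U passed to optimal_weight, and the model is then re-estimated.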

2. 3SLS
The weighting matrix used by the 3SLS estimator is

$$\hat W = \left(\frac{1}{n}\sum_{i=1}^{n} z_i'\hat\Omega z_i\right)^{-1}$$

where $\hat\Omega = \frac{1}{n}\sum_{i=1}^{n}\hat u_i\hat u_i'$.

The procedure for obtaining the 3SLS estimator is:
• First two stages: run P2SLS to get $\hat u_i$.
• Third stage: obtain $\hat W$ and perform system GMM estimation.

The 3SLS estimator is efficient under the conditional homoskedasticity assumption: $E(u_iu_i'|z_i) = E(u_iu_i') \equiv \Omega$.
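Continuing the sketch, the 3SLS weight under the same assumed layout:

```python
import numpy as np

def threesls_weight(Z, U):
    """3SLS weight [(1/n) sum_i z_i' Omega_hat z_i]^{-1}.
    Z: (n, T, L), U: (n, T) residuals from a P2SLS first step."""
    n = Z.shape[0]
    Omega_hat = U.T @ U / n                                 # (1/n) sum_i u_i u_i'
    S = np.einsum('itl,ts,isk->lk', Z, Omega_hat, Z) / n    # (1/n) sum_i z_i' Omega_hat z_i
    return np.linalg.inv(S)
```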

3. S2SLS
The weighting matrix used by the S2SLS estimator is

$$\hat W = \left(\frac{1}{n}\sum_{i=1}^{n} z_i'z_i\right)^{-1}$$

The S2SLS estimator is efficient under the conditional homoskedasticity assumption and when Ω is spherical, i.e., $\Omega = \sigma^2 I_T$.

4. P2SLS
If $L_t$ is the same for all t, i.e., $L_t = L$ for all t, then $z_i$ has the structure of (5). The P2SLS estimator exploits the orthogonality condition

$$E(z_i'u_i) = E(z_{i1}u_{i1} + \cdots + z_{iT}u_{iT}) = 0$$

and the conditional homoskedasticity assumption. So, when $z_i$ has the structure of (5), the weighting matrix used by the P2SLS estimator is

$$\hat W = \left(\frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} z_{it}z_{it}'\right)^{-1}$$

and the P2SLS estimator is given by

$$\hat\beta = \left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}z_{it}'\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T} z_{it}z_{it}'\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T} z_{it}x_{it}'\right)\right]^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}z_{it}'\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T} z_{it}z_{it}'\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T} z_{it}y_{it}\right)$$

The P2SLS estimator is efficient under the conditional homoskedasticity assumption. Note, when $z_{it} = x_{it}$, this estimator reduces to the POLS estimator.
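A minimal P2SLS sketch on pooled arrays (shapes and names assumed, not from the text):

```python
import numpy as np

def p2sls(X, Z, Y):
    """Pooled 2SLS; X: (nT, K), Z: (nT, L), Y: (nT,), with L >= K."""
    Szz = Z.T @ Z                              # sum_{i,t} z_it z_it'
    Sxz = X.T @ Z                              # sum_{i,t} x_it z_it'
    Szy = Z.T @ Y                              # sum_{i,t} z_it y_it
    A = Sxz @ np.linalg.solve(Szz, Sxz.T)      # Sxz Szz^{-1} Szx
    b = Sxz @ np.linalg.solve(Szz, Szy)        # Sxz Szz^{-1} Szy
    return np.linalg.solve(A, b)
```

When Z equals X, this collapses to the POLS formula, consistent with the note above.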

5. SIV
If $z_i$ has the structure of (4) and L = K, then we have exactly enough IVs for the explanatory variables in the system. Thus, the SIV estimator is given by

$$\hat\beta = \left(n^{-1}\sum_{i=1}^{n} z_i'x_i\right)^{-1}\left(n^{-1}\sum_{i=1}^{n} z_i'y_i\right)$$

6. PIV
If $z_i$ has the structure of (5) and L = K, then we have exactly enough IVs for the explanatory variables in the system. Thus, the pooled instrumental variables (PIV) estimator is given by

$$\hat\beta = \left(\sum_{i=1}^{n}\sum_{t=1}^{T} z_{it}x_{it}'\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T} z_{it}y_{it}\right)$$

Note, when $z_{it} = x_{it}$, this estimator reduces to the POLS estimator.

