Quasi Maximum Likelihood Theory - Lecture Notes
CHUNG-MING KUAN
Department of Finance & CRETA
C.-M. Kuan (National Taiwan Univ.) Quasi Maximum Likelihood Theory December 20, 2010 1 / 119
Quasi-Maximum Likelihood (QML) Theory
Kullback-Leibler Information Criterion (KLIC)
Given IP(A) = p, the message that A will surely occur would be more
(less) valuable or more (less) surprising when p is small (large).
The information content of the message above ought to be a
decreasing function of p. An information function is
ι(p) = log(1/p),
The information that IP(A) changes from p to q would be useful when p and q are very different. The information content of this change is ι(p) − ι(q) = log(q/p).

Remark: The KLIC measures the “closeness” between f and g, but it is not a metric because it is not symmetric in general, i.e., II(g : f) ≠ II(f : g), and it does not obey the triangle inequality.
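As a concrete illustration (not part of the original notes), the KLIC between two normal densities has a closed form, and computing it in both directions shows the asymmetry; the helper `kl_normal` below is ours.

```python
import math

def kl_normal(m1, s1, m2, s2):
    """KLIC II(g : f) for g = N(m1, s1^2) and f = N(m2, s2^2), closed form."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# The KLIC is zero when the two densities coincide ...
same = kl_normal(0.0, 1.0, 0.0, 1.0)
# ... and it is not symmetric: II(g : f) != II(f : g) in general.
forward = kl_normal(0.0, 1.0, 0.0, 2.0)
reverse = kl_normal(0.0, 2.0, 0.0, 1.0)
```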
Let z_t = (y_t, w_t′)′ be ν × 1 and z^t = {z_1, z_2, . . . , z_t}.
For a sample of T observations, the average of T individual KLICs is:

ĪI_T({g_t : f_t}; θ) := (1/T) Σ_{t=1}^T II(g_t : f_t; θ)
                     = (1/T) Σ_{t=1}^T { IE[log g_t(y_t | x_t)] − IE[log f_t(y_t | x_t; θ)] }.
If there exists a θ o ∈ Θ such that ft (yt | xt ; θ o ) = gt (yt | xt ) for all t,
we say that {ft } is correctly specified for {yt | xt }. In this case,
II(gt : ft ; θ o ) = 0, so that ĪIT ({gt : ft }; θ) is minimized at θ ∗ = θ o .
Maximizing the sample counterpart of L̄_T(θ):

L_T(y^T, x^T; θ) := (1/T) Σ_{t=1}^T log f_t(y_t | x_t; θ),
Concentrating on a certain conditional attribute of y_t, we have, for example, y_t | x_t ∼ N(μ_t(x_t; β), σ²), or more generally, y_t | x_t ∼ N(μ_t(x_t; β), h(x_t; α)).
The maximizer of T^{−1} Σ_{t=1}^T log f_t(y_t | x_t; θ) also solves

min_β (1/T) Σ_{t=1}^T [y_t − μ_t(x_t; β)]′ [y_t − μ_t(x_t; β)].
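This equivalence is easy to verify numerically. The sketch below (ours, with a made-up scalar-regressor sample) checks that the least-squares coefficient maximizes the average Gaussian quasi-log-likelihood.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

def avg_loglik(beta, sigma2=1.0):
    """Average Gaussian quasi-log-likelihood with mean mu_t = beta * x_t."""
    n = len(xs)
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (y - beta * x) ** 2 / (2 * sigma2)
               for x, y in zip(xs, ys)) / n

# Least-squares coefficient (no intercept): beta = sum(x y) / sum(x^2).
beta_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

Since the quasi-log-likelihood is exactly quadratic in β here, perturbing β in either direction can only lower it.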
Asymptotic Properties of the QMLE
Asymptotic Normality
where the left-hand side is zero because θ̃ T must satisfy the first order
condition.
When H_T(θ∗) is nonsingular,

√T (θ̃_T − θ∗) = −H_T(θ∗)^{−1} √T ∇L_T(y^T, x^T; θ∗) + o_IP(1).

Let B_T(θ) = var(√T ∇L_T(y^T, x^T; θ)) be the information matrix.
This shows that √T (θ̃_T − θ∗) is asymptotically equivalent to

−H_T(θ∗)^{−1} B_T(θ∗)^{1/2} [B_T(θ∗)^{−1/2} √T ∇L_T(y^T, x^T; θ∗)],
Information Matrix Equality
HT (θ o ) + BT (θ o ) = 0.
st (yt , xt ; θ) = ∇ log ft (yt |xt ; θ) = ft (yt |xt ; θ)−1 ∇ft (yt |xt ; θ),
By permitting interchange of differentiation and integration,

∫_ℝ s_t(y_t, x_t; θ) f_t(y_t|x_t; θ) dy_t = ∇ ∫_ℝ f_t(y_t|x_t; θ) dy_t = 0.
If {f_t} is correctly specified for {y_t|x_t}, IE[s_t(y_t, x_t; θ_o)|x_t] = 0, where the conditional expectation is taken with respect to g_t(y_t|x_t) = f_t(y_t|x_t; θ_o). Thus, the mean score is zero: IE[s_t(y_t, x_t; θ_o)] = 0.
We have shown that, for each t,

IE[∇s_t(y_t, x_t; θ_o)] + IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′] = 0.

It follows that

(1/T) Σ_{t=1}^T IE[∇s_t(y_t, x_t; θ_o)] + (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′]
  = H_T(θ_o) + (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′]
  = 0.
By definition, the information matrix is

B_T(θ_o) = (1/T) IE[ ( Σ_{t=1}^T s_t(y_t, x_t; θ_o) ) ( Σ_{t=1}^T s_t(y_t, x_t; θ_o) )′ ]
         = (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′]
         + (1/T) Σ_{τ=1}^{T−1} Σ_{t=τ+1}^{T} IE[s_{t−τ}(y_{t−τ}, x_{t−τ}; θ_o) s_t(y_t, x_t; θ_o)′]
         + (1/T) Σ_{τ=1}^{T−1} Σ_{t=1}^{T−τ} IE[s_t(y_t, x_t; θ_o) s_{t+τ}(y_{t+τ}, x_{t+τ}; θ_o)′].

That is, the information matrix involves the variances and autocovariances of individual score functions.
A specification of {y_t|x_t} is said to have dynamic misspecification if it is not correctly specified for {y_t|w_t, z^{t−1}}; that is, there does not exist any θ_o such that f_t(y_t|x_t; θ_o) = g_t(y_t|w_t, z^{t−1}).
Theorem 9.3
Suppose that there exists a θ o such that ft (yt |xt ; θ o ) = gt (yt |xt ) and
there is no dynamic misspecification. Then,
HT (θ o ) + BT (θ o ) = 0,
where H_T(θ_o) = T^{−1} Σ_{t=1}^T IE[∇s_t(y_t, x_t; θ_o)] and

B_T(θ_o) = (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′].
When Theorem 9.3 holds, B_T(θ_o)^{1/2} √T (θ̃_T − θ_o) →^D N(0, I_k). That is, the QMLE is asymptotically efficient because it achieves the Cramér-Rao lower bound asymptotically.
Example: Assume y_t|x_t ∼ N(x_t′β, σ²) for all t, so that

L_T(y^T, x^T; θ) = −(1/2) log(2π) − (1/2) log(σ²) − (1/T) Σ_{t=1}^T (y_t − x_t′β)²/(2σ²).
If the specification above is correct for {y_t|x_t}, there exists θ_o = (β_o′, σ_o²)′ such that y_t|x_t ∼ N(x_t′β_o, σ_o²). Taking expectations with respect to the true distribution,

IE[x_t(y_t − x_t′β)] = IE[x_t(IE(y_t|x_t) − x_t′β)] = IE(x_t x_t′)(β_o − β),
The results above show that

H_T(θ_o) = (1/T) Σ_{t=1}^T [ −IE(x_t x_t′)/σ_o²    0               ]
                           [ 0′                    −1/(2(σ_o²)²)   ].
Without dynamic misspecification, the information matrix B_T(θ) is

(1/T) Σ_{t=1}^T IE [ (y_t − x_t′β)² x_t x_t′/(σ²)²                                −x_t(y_t − x_t′β)/(2(σ²)²) + x_t(y_t − x_t′β)³/(2(σ²)³)        ]
                   [ −(y_t − x_t′β)x_t′/(2(σ²)²) + (y_t − x_t′β)³x_t′/(2(σ²)³)    1/(4(σ²)²) − (y_t − x_t′β)²/(2(σ²)³) + (y_t − x_t′β)⁴/(4(σ²)⁴) ].

The relevant moments are

IE[(y_t − x_t′β)³] = 3σ_o² IE(x_t′β_o − x_t′β) + IE[(x_t′β_o − x_t′β)³],
IE[(y_t − x_t′β)⁴] = 3(σ_o²)² + 6σ_o² IE[(x_t′β_o − x_t′β)²] + IE[(x_t′β_o − x_t′β)⁴],
It is now easily seen that the information matrix equality holds because

B_T(θ_o) = (1/T) Σ_{t=1}^T [ IE(x_t x_t′)/σ_o²    0              ]
                           [ 0′                   1/(2(σ_o²)²)   ].
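The information matrix equality can also be checked by simulation. The following sketch (ours; scalar regressor, true values θ_o = (β_o, σ_o²) = (1, 1)) averages score outer products and Hessian contributions over simulated draws and verifies that H_T + B_T ≈ 0.

```python
import math, random

random.seed(7)
beta, sigma2 = 1.0, 1.0
T = 200_000

# Accumulate the average Hessian H and average score outer product B
# for the Gaussian model y | x ~ N(x*beta, sigma2), theta = (beta, sigma2).
H = [[0.0, 0.0], [0.0, 0.0]]
B = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(T):
    x = random.gauss(0.0, 1.0)
    e = random.gauss(0.0, math.sqrt(sigma2))
    s_b = x * e / sigma2                              # score w.r.t. beta
    s_v = e * e / (2 * sigma2**2) - 1 / (2 * sigma2)  # score w.r.t. sigma2
    B[0][0] += s_b * s_b; B[0][1] += s_b * s_v; B[1][1] += s_v * s_v
    H[0][0] += -x * x / sigma2                        # d2/dbeta2
    H[0][1] += -x * e / sigma2**2                     # d2/dbeta dsigma2
    H[1][1] += 1 / (2 * sigma2**2) - e * e / sigma2**3  # d2/d(sigma2)^2
for M in (H, B):
    for i in range(2):
        for j in range(2):
            M[i][j] /= T
    M[1][0] = M[0][1]

# Largest entry of H + B; should be near zero up to Monte Carlo error.
gap = max(abs(H[i][j] + B[i][j]) for i in range(2) for j in range(2))
```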
Example: The specification is y_t|x_t ∼ N(x_t′β, σ²), but the true conditional behavior is y_t|x_t ∼ N(x_t′β_o, h(x_t, α_o)). That is, our specification is correct only for the conditional mean. Due to misspecification, the KLIC minimizer is θ∗ = (β_o′, (σ∗)²)′. Then,

H_T(θ∗) = (1/T) Σ_{t=1}^T [ −IE(x_t x_t′)/(σ∗)²    0                                        ]
                          [ 0′                      1/(2(σ∗)⁴) − IE[h(x_t, α_o)]/(σ∗)⁶      ],

and

B_T(θ∗) = (1/T) Σ_{t=1}^T [ IE[h(x_t, α_o) x_t x_t′]/(σ∗)⁴    0                                                                    ]
                          [ 0′                                 1/(4(σ∗)⁴) − IE[h(x_t, α_o)]/(2(σ∗)⁶) + 3 IE[h(x_t, α_o)²]/(4(σ∗)⁸) ].

The information matrix equality does not hold here, even though the conditional mean function is correctly specified.
The upper-left block of H̃_T is −Σ_{t=1}^T x_t x_t′/(T σ̃_T²), which remains a consistent estimator of the corresponding block in H_T(θ∗). The information matrix B_T(θ∗) can be consistently estimated by a block-diagonal matrix with the upper-left block

Σ_{t=1}^T ê_t² x_t x_t′ / ( T (σ̃_T²)² ).

Combining these blocks, the asymptotic covariance matrix of β̃_T can be estimated by the sandwich form:

( (1/T) Σ_{t=1}^T x_t x_t′ )^{−1} ( (1/T) Σ_{t=1}^T ê_t² x_t x_t′ ) ( (1/T) Σ_{t=1}^T x_t x_t′ )^{−1}.
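For a scalar regressor, the sandwich form above is easy to compute directly. The sketch below (ours; made-up data, no intercept) returns both the classical and the heteroskedasticity-robust variance estimates for the slope.

```python
def ols_variances(xs, ys):
    """Classical and sandwich (robust) variance estimates for the slope
    of a no-intercept OLS fit with a scalar regressor."""
    T = len(xs)
    sxx = sum(x * x for x in xs) / T
    beta = sum(x * y for x, y in zip(xs, ys)) / (sxx * T)
    resid = [y - beta * x for x, y in zip(xs, ys)]
    sigma2 = sum(e * e for e in resid) / T
    classical = sigma2 / sxx / T               # sigma^2 (X'X)^{-1}
    meat = sum(e * e * x * x for e, x in zip(resid, xs)) / T
    robust = meat / (sxx * sxx) / T            # (X'X)^{-1} X'diag(e^2)X (X'X)^{-1}
    return beta, classical, robust

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 2.3, 2.7, 4.6, 4.4, 6.9]
beta, v_classical, v_robust = ols_variances(xs, ys)
```

Under homoskedasticity the two estimates agree asymptotically; under conditional heteroskedasticity only the robust one is valid.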
Wald Test
W_T = T (Rθ̃_T − r)′ (R C̃_T R′)^{−1} (Rθ̃_T − r) →^D χ²(q).
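For a single restriction, the statistic reduces to a scalar ratio. A minimal sketch (ours, with an illustrative θ̃_T and covariance estimate):

```python
def wald_single(theta_hat, C_hat, R, r, T):
    """Wald statistic for one linear restriction R'theta = r, with C_hat an
    estimate of the asymptotic covariance matrix of theta_hat."""
    k = len(theta_hat)
    Rtheta = sum(R[j] * theta_hat[j] for j in range(k)) - r
    RCR = sum(R[i] * C_hat[i][j] * R[j] for i in range(k) for j in range(k))
    return T * Rtheta * Rtheta / RCR

# Test theta_2 = 0 in a two-parameter model.
theta_hat = [1.0, 0.5]
C_hat = [[1.0, 0.0], [0.0, 2.0]]
W = wald_single(theta_hat, C_hat, R=[0.0, 1.0], r=0.0, T=100)
# W = 100 * 0.5^2 / 2 = 12.5, beyond the 5% chi-square(1) critical value 3.84.
```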
Example: Specification: y_t|x_t ∼ N(x_t′β, σ²). Write θ = (σ², β′)′ and β = (b₁′, b₂′)′, where b₁ is (k − s) × 1 and b₂ is s × 1. We are interested in the hypothesis b₂∗ = Rθ∗ = 0, where R = [0 R₁] is s × (k + 1) and R₁ = [0 I_s] is s × k. When the information matrix equality holds, C̃_T = −H̃_T^{−1}, so that
LM (Score) Test
L_T(θ) + θ′R′λ,
Recall from the discussion of the Wald test that

√T (θ̃_T − θ∗) = −H_T(θ∗)^{−1} √T ∇L_T(θ∗) + o_IP(1).

Expanding the first order condition of the constrained estimation, we obtain

0 = −H_T(θ∗)^{−1} √T ∇L_T(θ∗) − H_T(θ∗)^{−1} ∇²L_T(θ†_T) √T (θ̈_T − θ∗) − H_T(θ∗)^{−1} √T R′λ̈_T
  = √T (θ̃_T − θ∗) − √T (θ̈_T − θ∗) − H_T(θ∗)^{−1} R′ √T λ̈_T + o_IP(1).
For Λ_T(θ∗) = [R H_T(θ∗)^{−1} R′]^{−1} R C_T(θ∗) R′ [R H_T(θ∗)^{−1} R′]^{−1}, the normalized Lagrange multiplier is

Λ_T(θ∗)^{−1/2} √T λ̈_T = Λ_T(θ∗)^{−1/2} [R H_T(θ∗)^{−1} R′]^{−1} R √T (θ̃_T − θ∗) →^D N(0, I_q).

It follows that

Λ̈_T^{−1/2} √T λ̈_T →^D N(0, I_q),

where Λ̈_T = (R Ḧ_T^{−1} R′)^{−1} R C̈_T R′ (R Ḧ_T^{−1} R′)^{−1}, and Ḧ_T and C̈_T are consistent estimators based on the constrained QMLE θ̈_T. The LM test is

LM_T = T λ̈_T′ Λ̈_T^{−1} λ̈_T →^D χ²(q).
In light of the saddle-point condition ∇L_T(θ̈_T) + R′λ̈_T = 0,

LM_T = T λ̈_T′ R Ḧ_T^{−1} R′ (R C̈_T R′)^{−1} R Ḧ_T^{−1} R′ λ̈_T
     = T [∇L_T(θ̈_T)]′ Ḧ_T^{−1} R′ (R C̈_T R′)^{−1} R Ḧ_T^{−1} [∇L_T(θ̈_T)].

The LM test is also known as the score test in the statistics literature.
Example: Specification: y_t|x_t ∼ N(x_t′β, σ²). Let θ = (σ², β′)′ and β = (b₁′, b₂′)′, where b₁ is (k − s) × 1 and b₂ is s × 1. The null hypothesis is b₂∗ = Rθ∗ = 0, where R = [0 R₁] is s × (k + 1) and R₁ = [0 I_s] is s × k. From the saddle-point condition, ∇L_T(θ̈_T) = −R′λ̈_T, and hence

∇L_T(θ̈_T) = [ ∇_{σ²} L_T(θ̈_T) ]   [ 0     ]
             [ ∇_{b₁} L_T(θ̈_T) ] = [ 0     ]
             [ ∇_{b₂} L_T(θ̈_T) ]   [ −λ̈_T ].
The LM test statistic now reads

LM_T = T [ 0′  0′  ë′X₂/(T σ̈_T²) ] Ḧ_T^{−1} R′ (R C̈_T R′)^{−1} R Ḧ_T^{−1} [ 0′  0′  X₂′ë/(T σ̈_T²) ]′,

with ë the vector of constrained residuals.
Example (Breusch-Pagan Test):
Let h : ℝ → (0, ∞) be a differentiable function. Consider the specification:

y_t | x_t, ζ_t ∼ N( x_t′β, h(ζ_t′α) ),

where ζ_t′α = α₀ + Σ_{i=1}^p ζ_ti α_i. Under this specification,

L_T(y^T, x^T, ζ^T; θ) = −(1/2) log(2π) − (1/(2T)) Σ_{t=1}^T log h(ζ_t′α) − (1/T) Σ_{t=1}^T (y_t − x_t′β)²/(2h(ζ_t′α)).
For the LM test, the corresponding score vector is

∇_α L_T(y^T, x^T, ζ^T; θ) = (1/T) Σ_{t=1}^T [ h₁(ζ_t′α) ζ_t/(2h(ζ_t′α)) ] [ (y_t − x_t′β)²/h(ζ_t′α) − 1 ],

where h₁(η) = dh(η)/dη.
It can be shown that the (p + 1) × (p + 1) block of the Hessian matrix corresponding to α is

(1/T) Σ_{t=1}^T { [ −(y_t − x_t′β)²/h³(ζ_t′α) + 1/(2h²(ζ_t′α)) ] [h₁(ζ_t′α)]² + [ (y_t − x_t′β)²/(2h²(ζ_t′α)) − 1/(2h(ζ_t′α)) ] h₂(ζ_t′α) } ζ_t ζ_t′,

where h₂(η) = dh₁(η)/dη. Evaluating the expectation of this block at θ_o = (β_o′, α₀, 0′)′ and noting that σ_o² = h(α₀), we have

−( c²/(2(σ_o²)²) ) ( (1/T) Σ_{t=1}^T IE(ζ_t ζ_t′) ),

with c = h₁(α₀).
This block of the expected Hessian matrix can be consistently estimated by

−( c²/(2(σ̈_T²)²) ) ( (1/T) Σ_{t=1}^T ζ_t ζ_t′ ).

With d_t = ê_t²/σ̈_T² − 1, the LM statistic is

LM_T = (1/2) ( Σ_{t=1}^T d_t ζ_t′ ) ( Σ_{t=1}^T ζ_t ζ_t′ )^{−1} ( Σ_{t=1}^T ζ_t d_t ) →^D χ²(p),
Remarks:
In the Breusch-Pagan test, the function h does not show up in the
statistic. As such, this test is capable of testing general conditional
heteroskedasticity without specifying a functional form of h.
When ζ_t contains the squares and all cross-product terms of the non-constant elements of x_t, i.e., x_it² and x_it x_jt (let there be n such terms), the resulting Breusch-Pagan test is also the test of heteroskedasticity of unknown form due to White (1980) and has the limiting distribution χ²(n).
The Breusch-Pagan test is valid when the information matrix equality
holds. Thus, the Breusch-Pagan test is not robust to dynamic
misspecification, e.g., when the errors are serially correlated.
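A minimal sketch of the statistic above (ours; one test variable ζ_t plus a constant, made-up residuals), using the closed form of a 2 × 2 inverse:

```python
def breusch_pagan(resid, zs):
    """LM statistic (1/2) d'Z (Z'Z)^{-1} Z'd with d_t = e_t^2/s2 - 1 and
    Z containing a constant and a single test variable z_t."""
    T = len(resid)
    s2 = sum(e * e for e in resid) / T
    d = [e * e / s2 - 1.0 for e in resid]
    # Z'Z for Z = [1, z_t], inverted via the 2x2 determinant formula.
    a = float(T)
    b = sum(zs)
    c = sum(z * z for z in zs)
    det = a * c - b * b
    v0 = sum(d)                                    # sum of d_t * 1
    v1 = sum(dt * z for dt, z in zip(d, zs))       # sum of d_t * z_t
    quad = (c * v0 * v0 - 2 * b * v0 * v1 + a * v1 * v1) / det
    return 0.5 * quad

resid = [0.2, -1.5, 0.4, 2.1, -0.3, -1.0, 0.8, 1.9]
zs = [1.0, 2.0, 1.5, 3.0, 1.2, 2.5, 1.8, 3.5]
LM = breusch_pagan(resid, zs)
```

Since the quadratic form involves the inverse of a positive definite matrix, the statistic is always non-negative.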
Koenker (1981): As T^{−1} Σ_{t=1}^T ê_t⁴ →^IP 3(σ_o²)² under the null,

(1/T) Σ_{t=1}^T d_t² = (1/T) Σ_{t=1}^T ê_t⁴/(σ̈_T²)² − (2/T) Σ_{t=1}^T ê_t²/σ̈_T² + 1 →^IP 2.

A studentized version of the statistic is

LM_T = T ( Σ_{t=1}^T d_t ζ_t′ ) ( Σ_{t=1}^T ζ_t ζ_t′ )^{−1} ( Σ_{t=1}^T ζ_t d_t ) / ( Σ_{t=1}^T d_t² ),
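Koenker's studentized statistic can be computed the same way; since it equals T times the uncentered R² of regressing d_t on ζ_t, it always lies between 0 and T. A sketch (ours, with made-up data):

```python
def koenker_lm(resid, zs):
    """Koenker's studentized LM: T * d'Z(Z'Z)^{-1}Z'd / sum(d_t^2),
    with d_t = e_t^2/s2 - 1 and Z = [1, z_t]."""
    T = len(resid)
    s2 = sum(e * e for e in resid) / T
    d = [e * e / s2 - 1.0 for e in resid]
    a, b, c = float(T), sum(zs), sum(z * z for z in zs)
    det = a * c - b * b
    v0, v1 = sum(d), sum(dt * z for dt, z in zip(d, zs))
    quad = (c * v0 * v0 - 2 * b * v0 * v1 + a * v1 * v1) / det
    return T * quad / sum(dt * dt for dt in d)

resid = [0.2, -1.5, 0.4, 2.1, -0.3, -1.0, 0.8, 1.9]
zs = [1.0, 2.0, 1.5, 3.0, 1.2, 2.5, 1.8, 3.5]
LM_k = koenker_lm(resid, zs)
```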
Example (Breusch-Godfrey Test):
Given the specification y_t|x_t ∼ N(x_t′β, σ²), we are interested in testing whether the errors y_t − x_t′β are serially uncorrelated. For the AR(1) error,

y_t − x_t′β = ρ(y_{t−1} − x_{t−1}′β) + u_t,

with |ρ| < 1 and {u_t} a white noise. The null hypothesis is ρ∗ = 0. Consider a general specification that admits serial correlations:
When the information matrix equality holds, the LM test can be computed as TR², where R² is from the regression of ê_t = y_t − x_t′β̂_T on x_t and y_{t−1} − x_{t−1}′β. Replacing β with its constrained estimator β̂_T, we can also obtain R² from the regression of ê_t on x_t and ê_{t−1}. This test has the limiting χ²(1) distribution under the null.
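The TR² version of the test can be sketched as follows (ours; a single regressor without constant, made-up data, uncentered R²):

```python
def breusch_godfrey(resid, xs):
    """TR^2 from the auxiliary regression of e_t on (x_t, e_{t-1}),
    a sketch for a single regressor with no constant (uncentered R^2)."""
    e = resid[1:]                 # dependent variable: e_t
    r1 = xs[1:]                   # regressor 1: x_t
    r2 = resid[:-1]               # regressor 2: lagged residual e_{t-1}
    T = len(e)
    # 2x2 normal equations for the auxiliary coefficients (a, b).
    s11 = sum(u * u for u in r1); s12 = sum(u * v for u, v in zip(r1, r2))
    s22 = sum(v * v for v in r2)
    t1 = sum(u * w for u, w in zip(r1, e)); t2 = sum(v * w for v, w in zip(r2, e))
    det = s11 * s22 - s12 * s12
    a = (s22 * t1 - s12 * t2) / det
    b = (s11 * t2 - s12 * t1) / det
    fitted = [a * u + b * v for u, v in zip(r1, r2)]
    r2_unc = sum(f * f for f in fitted) / sum(w * w for w in e)
    return T * r2_unc

resid = [0.5, -0.2, 0.7, -0.4, 0.6, -0.1, 0.3, -0.5, 0.4]
xs = [1.0, 1.5, 0.8, 2.0, 1.1, 1.7, 0.9, 2.2, 1.3]
LM_bg = breusch_godfrey(resid, xs)
```

Because the fitted values come from an exact least-squares projection, the uncentered R² lies in [0, 1], so the statistic lies in [0, T].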
If the specification is yt − x0t β = ut + αut−1 , i.e., the errors follow an
MA(1) process, we can write
Likelihood Ratio Test
By Taylor expansion of LT (θ̈ T ) about θ̃ T , we have
LR_T = −2T [ L_T(θ̈_T) − L_T(θ̃_T) ].
The first term on the RHS would have an asymptotic χ2 (q) distribution
provided that the information matrix equality holds. (Why?)
Testing Non-Nested Models
H0 : yt |xt , ξ t ∼ N (x0t β, σ 2 ), β ∈ B ⊆ Rk ,
H1 : yt |xt , ξ t ∼ N (ξ 0t δ, σ 2 ), δ ∈ D ⊆ Rr ,
Under the null: y_t|x_t, ξ_t ∼ N(x_t′β_o, σ_o²), IE(ξ_t y_t) = IE(ξ_t x_t′)β_o, and hence

δ̂_T = ( (1/T) Σ_{t=1}^T ξ_t ξ_t′ )^{−1} ( (1/T) Σ_{t=1}^T ξ_t y_t ) →^IP M_ξξ^{−1} M_ξx β_o,

where

M_ξξ = lim_{T→∞} (1/T) Σ_{t=1}^T IE(ξ_t ξ_t′),   M_ξx = lim_{T→∞} (1/T) Σ_{t=1}^T IE(ξ_t x_t′).
Clearly, the pseudo-true parameter δ(β_o) = M_ξξ^{−1} M_ξx β_o would not be the probability limit of δ̂_T if x_t′β is an incorrect specification of the conditional mean. Thus, whether the difference between δ̂_T and the sample counterpart of δ(β_o) is sufficiently close to zero constitutes evidence for or against the null hypothesis.
The WET is then based on:

δ̂_T − δ̂(β̂_T) = δ̂_T − ( (1/T) Σ_{t=1}^T ξ_t ξ_t′ )^{−1} ( (1/T) Σ_{t=1}^T ξ_t x_t′ β̂_T )
             = ( (1/T) Σ_{t=1}^T ξ_t ξ_t′ )^{−1} ( (1/T) Σ_{t=1}^T ξ_t (y_t − x_t′β̂_T) ),

and √T times the second factor is

(1/√T) Σ_{t=1}^T ξ_t ε_t − ( (1/T) Σ_{t=1}^T ξ_t x_t′ ) ( (1/T) Σ_{t=1}^T x_t x_t′ )^{−1} (1/√T) Σ_{t=1}^T x_t ε_t.
Setting ξ̂_t = M_ξx M_xx^{−1} x_t, we can write

(1/√T) Σ_{t=1}^T ξ_t ê_t = (1/√T) Σ_{t=1}^T (ξ_t − ξ̂_t) ε_t + o_IP(1),

where

Σ_o = σ_o² lim_{T→∞} (1/T) Σ_{t=1}^T IE[ (ξ_t − ξ̂_t)(ξ_t − ξ̂_t)′ ].
Consequently,

T^{1/2} [δ̂_T − δ̂(β̂_T)] →^D N( 0, M_ξξ^{−1} Σ_o M_ξξ^{−1} ),

and hence

T [δ̂_T − δ̂(β̂_T)]′ M_ξξ Σ_o^{−1} M_ξξ [δ̂_T − δ̂(β̂_T)] →^D χ²(r).

In practice, Σ_o may be estimated by σ̂_T² times

(1/T) Σ_{t=1}^T ξ_t ξ_t′ − ( (1/T) Σ_{t=1}^T ξ_t x_t′ ) ( (1/T) Σ_{t=1}^T x_t x_t′ )^{−1} ( (1/T) Σ_{t=1}^T x_t ξ_t′ ).
The WET statistic reads

WE_T = T [δ̂_T − δ̂(β̂_T)]′ ( (1/T) Σ_{t=1}^T ξ_t ξ_t′ ) Σ̂_T^{−1} ( (1/T) Σ_{t=1}^T ξ_t ξ_t′ ) [δ̂_T − δ̂(β̂_T)] →^D χ²(r).
Score Encompassing Test
Under H₁: y_t|x_t, ξ_t ∼ N(ξ_t′δ, σ²), the score function evaluated at the pseudo-true parameter δ(β_o) is (apart from a constant σ^{−2})

(1/T) Σ_{t=1}^T ξ_t [y_t − ξ_t′ δ(β_o)] = (1/T) Σ_{t=1}^T ξ_t (y_t − ξ_t′ M_ξξ^{−1} M_ξx β_o),

whose sample counterpart is

(1/T) Σ_{t=1}^T ξ_t ê_t.
The score encompassing test (SET) is based on

(1/√T) Σ_{t=1}^T ξ_t ê_t →^D N(0, Σ_o),
The WET and SET are based on the same ingredient and both
require evaluating the pseudo-true parameter.
The WET and SET are difficult to implement when the pseudo-true
parameter is not readily derived; this may happen when, e.g., the
QMLE does not have an analytic form. (Examples?)
Pseudo-True Score Encompassing Test
Chen and Kuan (2002) propose the pseudo-true score encompassing (PSE)
test which is based on the pseudo-true value of the score function under
the alternative. Although it may be difficult to evaluate the pseudo-true
value of a QMLE when it does not have a closed form, it would be easier
to evaluate the pseudo-true score because the analytic form of the score function is usually available.
Let s_{f,t}(θ) = ∇ log f(y_t|x_t; θ), s_{ϕ,t}(ψ) = ∇ log ϕ(y_t|ξ_t; ψ), and

∇L_{f,T}(θ) = (1/T) Σ_{t=1}^T s_{f,t}(θ),   ∇L_{ϕ,T}(ψ) = (1/T) Σ_{t=1}^T s_{ϕ,t}(ψ).

As the pseudo-true parameter ψ(θ_o) is the KLIC minimizer when the null hypothesis is specified correctly, we have

J_ϕ(θ_o, ψ(θ_o)) = 0.
Following Wooldridge (1990), we can incorporate the null model into the score function and write

∇L_{ϕ,T}(θ, ψ) = (1/T) Σ_{t=1}^T d_{1,t}(θ, ψ) + (1/T) Σ_{t=1}^T d_{2,t}(θ, ψ) c_t(θ),

J_ϕ(θ, ψ) = lim_{T→∞} (1/T) Σ_{t=1}^T IE_{f(θ)}[ d_{1,t}(θ, ψ) ],
For non-nested, linear specifications of the conditional mean, we can incorporate the null model (y_t = x_t′β + ε_t) into the score and obtain

∇_δ L_{ϕ,T}(θ, ψ) = (1/T) Σ_{t=1}^T ξ_t (x_t′β − ξ_t′δ) + (1/T) Σ_{t=1}^T ξ_t ε_t,

with d_{1,t}(θ, ψ) = ξ_t(x_t′β − ξ_t′δ), d_{2,t}(θ, ψ) = ξ_t, and c_t(θ) = ε_t. Then,

Ĵ_ϕ(θ̂_T, ψ̂_T) = (1/T) Σ_{t=1}^T ξ_t (x_t′β̂_T − ξ_t′δ̂_T) = −(1/T) Σ_{t=1}^T ξ_t ê_t,
For nonlinear specifications, it can be seen that

Ĵ_ϕ(θ̂_T, ψ̂_T) = (1/T) Σ_{t=1}^T ∇_δ μ(ξ_t, δ̂_T) [ m(x_t, β̂_T) − μ(ξ_t, δ̂_T) ]
               = −(1/T) Σ_{t=1}^T ∇_δ μ(ξ_t, δ̂_T) ê_t,

with ê_t = y_t − m(x_t, β̂_T) the nonlinear OLS residuals. Yet it is not easy to compute the SET here because evaluating the pseudo-true value of δ̂_T is a formidable task.
The linear expansion of T^{1/2} Ĵ_ϕ(θ̂_T, ψ̂_T) about (θ_o, ψ(θ_o)) is

√T Ĵ_ϕ(θ̂_T, ψ̂_T) ≈ −(1/√T) Σ_{t=1}^T d_{2,t}(θ_o, ψ(θ_o)) c_t(θ_o) − A_o √T (θ̂_T − θ_o),

where A_o = lim_{T→∞} T^{−1} Σ_{t=1}^T IE_{f(θ_o)}[ d_{2,t}(θ_o, ψ(θ_o)) ∇_θ c_t(θ_o) ]. Note that the other terms in the expansion that involve c_t would vanish in the limit because they have zero mean. Recall also that

√T (θ̂_T − θ_o) = −H_T(θ_o)^{−1} (1/√T) Σ_{t=1}^T s_{f,t}(θ_o) + o_IP(1),
Collecting terms, we have

√T Ĵ_ϕ(θ̂_T, ψ̂_T) = −(1/√T) Σ_{t=1}^T b_t(θ_o, ψ(θ_o)) + o_IP(1),
It follows that the PSE test is

PSE_T = T Ĵ_ϕ(θ̂_T, ψ̂_T)′ Σ̂_T^{−} Ĵ_ϕ(θ̂_T, ψ̂_T) →^D χ²(k),
Example: Consider non-nested specifications of the conditional variance:

H₀ : y_t|x_t, ξ_t ∼ N( 0, h(x_t, α) ),
H₁ : y_t|x_t, ξ_t ∼ N( 0, κ(ξ_t, γ) ).

The corresponding scores are

s_{h,t}(α) = [ ∇_α h_t/(2h_t²) ] (y_t² − h_t),
s_{κ,t}(γ) = [ ∇_γ κ_t/(2κ_t²) ] (y_t² − κ_t) = [ ∇_γ κ_t/(2κ_t²) ] (h_t − κ_t) + [ ∇_γ κ_t/(2κ_t²) ] (y_t² − h_t),

with d_{1,t} = [∇_γ κ_t/(2κ_t²)](h_t − κ_t), d_{2,t} = ∇_γ κ_t/(2κ_t²), and c_t = y_t² − h_t.
The sample counterpart of the pseudo-true score function is thus

(1/T) Σ_{t=1}^T [ ∇_γ κ̂_t/(2κ̂_t²) ] (y_t² − ĥ_t).
Binary Choice Models
Consider the binary dependent variable:

y_t = { 1, with probability IP(y_t = 1|x_t),
      { 0, with probability 1 − IP(y_t = 1|x_t).

The density function of y_t given x_t is

g(y_t|x_t) = IP(y_t = 1|x_t)^{y_t} [1 − IP(y_t = 1|x_t)]^{1−y_t}.
Probit model:

F(x_t; θ) = Φ(x_t′θ) = ∫_{−∞}^{x_t′θ} (1/√(2π)) e^{−v²/2} dv,

where Φ denotes the standard normal distribution function.

Logit model:

F(x_t; θ) = G(x_t′θ) = 1/(1 + exp(−x_t′θ)) = exp(x_t′θ)/(1 + exp(x_t′θ)),
Global Concavity
+ (θ − θ̃ T )0 ∇2θ LT (θ † )(θ − θ̃ T )
For the logit model, as G′(u) = G(u)[1 − G(u)], we have

∇_θ L_T(θ) = (1/T) Σ_{t=1}^T { y_t G′(x_t′θ)/G(x_t′θ) − (1 − y_t) G′(x_t′θ)/[1 − G(x_t′θ)] } x_t
           = (1/T) Σ_{t=1}^T { y_t [1 − G(x_t′θ)] − (1 − y_t) G(x_t′θ) } x_t
           = (1/T) Σ_{t=1}^T [ y_t − G(x_t′θ) ] x_t,
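Because the quasi-log-likelihood is globally concave, Newton-Raphson iterations on the score above converge reliably. A scalar-parameter sketch (ours, with a small artificial sample chosen to be non-separable):

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 1, 0, 1, 1]          # not separable, so the QMLE is finite

def gradient(theta):
    """(1/T) sum [y_t - G(x_t * theta)] x_t for a scalar parameter."""
    return sum((y - logistic(theta * x)) * x for x, y in zip(xs, ys)) / len(xs)

def hessian(theta):
    """-(1/T) sum G(1-G) x_t^2, always negative: global concavity."""
    return -sum(logistic(theta * x) * (1 - logistic(theta * x)) * x * x
                for x in xs) / len(xs)

# Newton-Raphson on the globally concave quasi-log-likelihood.
theta = 0.0
for _ in range(25):
    theta -= gradient(theta) / hessian(theta)
```

At convergence the score is (numerically) zero, which is the first order condition for the QMLE.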
For the probit model,

∇_θ L_T(θ) = (1/T) Σ_{t=1}^T { y_t φ(x_t′θ)/Φ(x_t′θ) − (1 − y_t) φ(x_t′θ)/[1 − Φ(x_t′θ)] } x_t
           = (1/T) Σ_{t=1}^T [ y_t − Φ(x_t′θ) ] φ(x_t′θ) x_t / ( Φ(x_t′θ)[1 − Φ(x_t′θ)] ),

and the Hessian matrix is also negative definite; see, e.g., Amemiya (1985, pp. 273–274).
As IE(yt | xt ) = IP(yt = 1 | xt ), we can write
yt = F (xt ; θ) + et .
Thus, the probit and logit models are also different nonlinear mean
specifications with conditional heteroskedasticity.
Marginal response: For the probit model,

∂Φ(x_t′θ)/∂x_tj = φ(x_t′θ) θ_j;

for the logit model,

∂G(x_t′θ)/∂x_tj = G(x_t′θ)[1 − G(x_t′θ)] θ_j.
Letting p = IP(y = 1|x), p/(1 − p) is the odds ratio: the probability of y = 1 relative to the probability of y = 0. For the logit model, the log odds ratio is

ln[ p_t/(1 − p_t) ] = ln( exp(x_t′θ) ) = x_t′θ,

so that

∂ ln[ p_t/(1 − p_t) ]/∂x_tj = θ_j.
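The two marginal responses are simple to evaluate; in the sketch below (ours) the index x_t′θ is set to 0, where the logit factor G(1 − G) equals 1/4 and the probit factor is φ(0) = 1/√(2π).

```python
import math

def logit_marginal(index, theta_j):
    """Logit marginal response: G(u)[1 - G(u)] * theta_j."""
    G = 1.0 / (1.0 + math.exp(-index))
    return G * (1.0 - G) * theta_j

def probit_marginal(index, theta_j):
    """Probit marginal response: phi(u) * theta_j."""
    phi = math.exp(-index * index / 2.0) / math.sqrt(2.0 * math.pi)
    return phi * theta_j

me_logit = logit_marginal(0.0, 2.0)    # 0.25 * 2 = 0.5
me_probit = probit_marginal(0.0, 2.0)  # 2 / sqrt(2 pi)
```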
Measure of Goodness of Fit
which is also IP(et < xt β|xt ) provided that et is symmetric about zero.
The probit specification Φ or the logit specification G can be understood
as specifications of the conditional distribution of et .
Multinomial Models
Consider J + 1 mutually exclusive choices that do not have a natural ordering, e.g., employment status and commuting mode. The dependent variable y_t takes on J + 1 values such that

y_t = { 0, with probability IP(y_t = 0|x_t),
      { 1, with probability IP(y_t = 1|x_t),
        ...
      { J, with probability IP(y_t = J|x_t).
The density function of d_{t,0}, . . . , d_{t,J} given x_t is then

g(d_{t,0}, . . . , d_{t,J}|x_t) = Π_{j=0}^J IP(y_t = j|x_t)^{d_{t,j}}.
Multinomial Logit Model

F_j(x_t; θ) = G_{t,j} = exp(x_t′θ_j) / Σ_{k=0}^J exp(x_t′θ_k),   j = 0, . . . , J.

The θ_j are not separately identified, because shifting every θ_k by the same γ leaves the probabilities unchanged:

exp[x_t′(θ_0 + γ)] / ( exp[x_t′(θ_0 + γ)] + Σ_{k=1}^J exp(x_t′θ_k) )
  = exp(x_t′θ_0) / ( exp(x_t′θ_0) + Σ_{k=1}^J exp[x_t′(θ_k − γ)] ).
For normalization, we set θ_0 = 0, so that

F_0(x_t; θ) = G_{t,0} = 1 / ( 1 + Σ_{k=1}^J exp(x_t′θ_k) ),
F_j(x_t; θ) = G_{t,j} = exp(x_t′θ_j) / ( 1 + Σ_{k=1}^J exp(x_t′θ_k) ),   j = 1, . . . , J,
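The normalized probabilities can be computed with a small softmax-style routine; the sketch below (ours, with illustrative parameter values) checks that the J + 1 probabilities are positive and sum to one.

```python
import math

def mnl_probs(x, thetas):
    """Multinomial logit choice probabilities with theta_0 = 0:
    G_0 = 1/(1 + sum_k exp(x'theta_k)), G_j = exp(x'theta_j)/(same denominator).
    `x` is a list of regressors; `thetas` holds theta_1, ..., theta_J."""
    scores = [sum(xi * ti for xi, ti in zip(x, th)) for th in thetas]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    return [1.0 / denom] + [math.exp(s) / denom for s in scores]

x = [1.0, 0.5]                       # constant and one regressor
thetas = [[0.2, -0.4], [-1.0, 0.8]]  # theta_1, theta_2 (J = 2)
probs = mnl_probs(x, thetas)
```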
It is easy to see that ∇_{θ_j} G_{t,k} = −G_{t,k} G_{t,j} x_t for k ≠ j, and ∇_{θ_j} G_{t,j} = G_{t,j}(1 − G_{t,j}) x_t. It follows that

∇_{θ_j} L_T(θ) = (1/T) Σ_{t=1}^T (d_{t,j} − G_{t,j}) x_t,   j = 1, . . . , J.
The marginal responses of G_{t,j} to changes in x_t are

∇_{x_t} G_{t,0} = −G_{t,0} Σ_{i=1}^J G_{t,i} θ_i,
∇_{x_t} G_{t,j} = G_{t,j} ( θ_j − Σ_{i=1}^J G_{t,i} θ_i ),   j = 1, . . . , J.
Conditional Logit Model

G_{t,j} = exp(x_{t,j}′θ) / Σ_{k=0}^J exp(x_{t,k}′θ),   j = 0, 1, . . . , J.
The quasi-log-likelihood function is

L_T(θ) = (1/T) Σ_{t=1}^T Σ_{j=0}^J d_{t,j} x_{t,j}′θ − (1/T) Σ_{t=1}^T log( Σ_{k=0}^J exp(x_{t,k}′θ) ),

and the corresponding Hessian matrix is

−(1/T) Σ_{t=1}^T Σ_{j=0}^J G_{t,j} (x_{t,j} − x̄_t)(x_{t,j} − x̄_t)′,

where x̄_t = Σ_{j=0}^J G_{t,j} x_{t,j} is the weighted average of the x_{t,j}.
It can be shown that the Hessian matrix is negative definite so that the
quasi-log-likelihood function is globally concave.
Each choice probability G_{t,j} is affected not only by x_{t,j} but also by the regressors for other choices, x_{t,i}, because
Note that for positive θ, an increase in xt,j increases the probability of the
j th choice but decreases the probabilities of other choices.
Random Utility Interpretation
McFadden (1974): Consider the random utility of the choice j:
Let ε̃_{t,ji} = ε_{t,j} − ε_{t,i} and Ṽ_{t,ji} = V_{t,j} − V_{t,i}. Then we have (for j = 0),
This setup is consistent with the theory of decision making. Different
models are obtained by imposing different assumptions on the joint
distribution of εt,j .
Suppose that εt,j are independent random variables across j with the type
I extreme value distribution: exp[− exp(−εt,j )], and the density:
We obtain the conditional logit model when Vt,j = x0t,j θ and multinomial
logit model when Vt,j = x0t θ j .
Remarks:
1 In the conditional logit model, the choices must be quite distinct, so that they are independent of each other. This is known as independence of irrelevant alternatives (IIA), which is a restrictive condition in practice.
2 The multinomial probit model is obtained by assuming that the ε_{t,j} are jointly normally distributed. This model permits correlations among the choices, yet it requires numerical or simulation methods to evaluate the J-fold integral for the choice probabilities.
Nested Logit Models
Consider the case where there are J groups of choices, in which the j-th group contains choices 1, . . . , K_j. The random utility is:
For example, one must first choose between taking public transportation
and driving and then select a vehicle in the chosen group. The nested logit
model postulates the GEV distribution for εt,jk :
where rj = [1 − corr(εjk , εjm )]1/2 . When the choices in the j th group are
uncorrelated, we have rj = 1 and hence the multinomial logit model.
Define the binary variable d_{t,jk} as

d_{t,jk} = { 1, if y_t = jk,
           { 0, otherwise,     j = 1, . . . , J,  k = 1, . . . , K_j.
Let I_{t,j} = ln Σ_{k=1}^{K_j} exp(w_{t,jk}′γ_j/r_j) denote the inclusive value, where the r_j are known as scale parameters. We can write the choice probabilities as products of within-group and between-group probabilities; details can be found in Cameron and Trivedi (2005, pp. 526–527). Note that for a given j, p_{t,k|j} is a conditional logit model with the parameter γ_j/r_j.
The approximating density is

f = Π_{j=1}^J Π_{k=1}^{K_j} ( p_{t,j} × p_{t,k|j} )^{d_{t,jk}} = Π_{j=1}^J Π_{k=1}^{K_j} p_{t,j}^{d_{t,jk}} p_{t,k|j}^{d_{t,jk}},

so that

L_T = (1/T) Σ_{t=1}^T Σ_{j=1}^J Σ_{k=1}^{K_j} d_{t,jk} ln(p_{t,j}) + (1/T) Σ_{t=1}^T Σ_{j=1}^J Σ_{k=1}^{K_j} d_{t,jk} ln(p_{t,k|j}).
Ordered Multinomial Models
where yt∗ are latent variables. For example, language proficiency status
and education level have a natural ordering. Such models may be
estimated using multinomial logit models; yet taking into account this
structure yields simpler models.
Setting yt∗ = x0t θ + et , the conditional probabilities are:
Pratt (1981) showed that the Hessian matrix is negative definite so that
the quasi-log-likelihood function is globally concave.
The marginal response of the choice probabilities to the change of
regressors are
Truncated Regression Models
IP(y_t > c|x_t) = (1/σ) ∫_c^∞ φ( (y_t − x_t′β)/σ ) dy_t = ∫_{(c−x_t′β)/σ}^∞ φ(u_t) du_t
               = 1 − Φ( (c − x_t′β)/σ ).

The truncated density function g̃(y_t|y_t > c, x_t) is then approximated by
The quasi-log-likelihood function is

−(1/2)[ log(2π) + log(σ²) ] − (1/(2Tσ²)) Σ_{t=1}^T (y_t − x_t′β)² − (1/T) Σ_{t=1}^T log( 1 − Φ( (c − x_t′β)/σ ) ).
The first-order condition, with α = β/σ and γ = 1/σ, is

∇_θ L_T(θ) = [ (1/T) Σ_{t=1}^T { (γy_t − x_t′α) − φ(γc − x_t′α)/[1 − Φ(γc − x_t′α)] } x_t          ]
             [ 1/γ − (1/T) Σ_{t=1}^T { (γy_t − x_t′α) y_t − φ(γc − x_t′α)/[1 − Φ(γc − x_t′α)] c } ] = 0,

from which we can solve for the QMLE of θ. It can also be verified that L_T(θ) is globally concave in θ.
Letting u_{t,o} = (y_t − x_t'β_o)/σ_o and c_{t,o} = (c − x_t'β_o)/σ_o, we have
\[
\begin{aligned}
\mathrm{IE}(y_t \mid y_t > c, x_t) &= \int_{c_{t,o}}^{\infty} (\sigma_o u_{t,o} + x_t'\beta_o)\,\frac{\phi(u_{t,o})}{1 - \Phi(c_{t,o})}\,\mathrm{d}u_{t,o} \\
&= \frac{\sigma_o}{1 - \Phi(c_{t,o})}\int_{c_{t,o}}^{\infty} u_{t,o}\,\phi(u_{t,o})\,\mathrm{d}u_{t,o} + x_t'\beta_o \\
&= \frac{\sigma_o}{1 - \Phi(c_{t,o})}\int_{c_{t,o}}^{\infty} -\phi'(u_{t,o})\,\mathrm{d}u_{t,o} + x_t'\beta_o \\
&= \sigma_o\,\frac{\phi(c_{t,o})}{1 - \Phi(c_{t,o})} + x_t'\beta_o.
\end{aligned}
\]
That is, even when y_t has a linear conditional mean function, its truncated mean function is necessarily nonlinear. The OLS estimator from regressing y_t on x_t is therefore inconsistent for β_o. Although NLS estimation may be employed, QML estimation is typically preferred in practice.
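The inconsistency of OLS under truncation shows up clearly in simulation; a minimal sketch with assumed parameter values β_o = 1 and σ_o = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
T, c = 100_000, 0.0
x = rng.normal(size=T)
y = 1.0 * x + rng.normal(size=T)        # beta_o = 1, sigma_o = 1 (assumed)
keep = y > c                             # only y_t > c is observed
X = np.column_stack([np.ones(keep.sum()), x[keep]])
# OLS on the truncated sample
b_ols = np.linalg.lstsq(X, y[keep], rcond=None)[0]
```

The fitted slope settles well below the true value of 1, reflecting the attenuation implied by the nonlinear truncated mean.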
Define the hazard function λ as λ(u) = φ(u)/[1 − Φ(u)], also known as the inverse Mills ratio. Then,
\[
\frac{\mathrm{d}\lambda(u)}{\mathrm{d}u} = -\frac{\phi(u)\,u}{1 - \Phi(u)} + \left[\frac{\phi(u)}{1 - \Phi(u)}\right]^2 = -\lambda(u)\,u + \lambda(u)^2.
\]
The truncated mean is IE(y_t | y_t > c, x_t) = x_t'β_o + σ_o λ(c_{t,o}), and it can be shown that the truncated variance is
\[
\mathrm{var}(y_t \mid y_t > c, x_t) = \sigma_o^2\big[1 + \lambda(c_{t,o})\,c_{t,o} - \lambda^2(c_{t,o})\big],
\]
instead of σ_o². Note that the marginal response of IE(y_t | y_t > c, x_t) to a change of x_t is proportional to this conditional variance:
\[
\beta_o\big[1 + \lambda(c_{t,o})\,c_{t,o} - \lambda^2(c_{t,o})\big] = \frac{\beta_o}{\sigma_o^2}\,\mathrm{var}(y_t \mid y_t > c, x_t).
\]
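Both truncated moment formulas can be checked against scipy's truncated normal distribution; the parameter values below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm, truncnorm

def lam(u):
    """Hazard function (inverse Mills ratio): phi(u) / [1 - Phi(u)]."""
    return norm.pdf(u) / norm.sf(u)

mu, sigma, c = 0.5, 2.0, 1.0                    # illustrative parameters
a = (c - mu) / sigma                             # standardized truncation point c_{t,o}
mean_formula = mu + sigma * lam(a)
var_formula = sigma**2 * (1 + lam(a) * a - lam(a)**2)

# distribution of y | y > c; truncnorm takes standardized bounds
tn = truncnorm(a, np.inf, loc=mu, scale=sigma)
```

`tn.mean()` and `tn.var()` agree with the two formulas to numerical precision.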
Consider now the case that the dependent variable y_t is truncated from above at c, i.e., only y_t < c is observed. The truncated density function g̃(y_t | y_t < c, x_t) can be approximated by
\[
f(y_t \mid y_t < c, x_t; \beta, \sigma^2) = \frac{\phi\big((y_t - x_t'\beta)/\sigma\big)/\sigma}{\Phi\big((c - x_t'\beta)/\sigma\big)}.
\]
Censored Regression Models
where y_t^* = x_t'β + e_t is an index variable. It does not matter whether the threshold value of y_t^* is zero or a non-zero constant c.
Let g and g^* denote the densities of y_t and y_t^* conditional on x_t. When y_t^* > 0, g(y_t | x_t) = g^*(y_t^* | x_t), and when y_t^* ≤ 0, censoring yields
\[
\mathrm{IP}(y_t = 0 \mid x_t) = \int_{-\infty}^{0} g^*(y_t^* \mid x_t)\,\mathrm{d}y_t^*.
\]
The density g is a hybrid of g^* and IP(y_t = 0 | x_t). Define
\[
d_t =
\begin{cases}
1, & \text{if } y_t > 0, \\
0, & \text{if } y_t = 0.
\end{cases}
\]
The density of y_t can then be written in terms of d_t.
The approximating density is
\[
f(y_t \mid x_t; \beta, \sigma^2) = \left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_t - x_t'\beta}{\sigma}\right)\right]^{d_t}\left[1 - \Phi\!\left(\frac{x_t'\beta}{\sigma}\right)\right]^{1-d_t}.
\]
The quasi-log-likelihood function is then
\[
\frac{1}{T}\sum_{\{t:\,y_t = 0\}}\log\!\left[1 - \Phi\!\left(\frac{x_t'\beta}{\sigma}\right)\right] - \frac{T_1}{2T}\log(2\pi\sigma^2) - \frac{1}{2T}\sum_{\{t:\,y_t > 0\}}\big[(y_t - x_t'\beta)/\sigma\big]^2,
\]
where T_1 is the number of observations with y_t > 0.
Letting α = β/σ and γ = σ^{-1}, the QMLE of θ = (α' γ)' is obtained by maximizing (up to a constant)
\[
L_T(\theta) = \frac{1}{T}\sum_{\{t:\,y_t = 0\}}\log\big[1 - \Phi(x_t'\alpha)\big] + \frac{T_1}{T}\log\gamma - \frac{1}{2T}\sum_{\{t:\,y_t > 0\}}(\gamma y_t - x_t'\alpha)^2,
\]
and solving the first-order condition \nabla_\theta L_T(\theta) = 0.
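A minimal numerical sketch of this censored-regression (Tobit) QMLE on simulated data; the data-generating process and the scipy optimizer are assumptions:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(2)
T = 2000
x = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_o, sigma_o = np.array([0.5, 1.0]), 1.0
y = np.maximum(x @ beta_o + sigma_o * rng.normal(size=T), 0.0)  # censored at zero
d = y > 0                                                        # censoring indicator d_t

def neg_loglik(theta):
    """Negative Tobit quasi-log-likelihood in (alpha, gamma) = (beta/sigma, 1/sigma)."""
    alpha, gamma = theta[:-1], theta[-1]
    idx = x @ alpha
    ll = np.where(d,
                  np.log(gamma) + norm.logpdf(gamma * y - idx),  # uncensored part
                  norm.logcdf(-idx))                             # log IP(y_t = 0) = log[1 - Phi(x'alpha)]
    return -ll.mean()

res = minimize(neg_loglik, np.array([0.0, 0.0, 1.0]), method="L-BFGS-B",
               bounds=[(None, None), (None, None), (1e-4, None)])
beta_hat = res.x[:-1] / res.x[-1]
sigma_hat = 1.0 / res.x[-1]
```

The two branches of the likelihood mirror the two parts of the approximating density: a normal density term for uncensored observations and a probability mass at zero for censored ones.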
The conditional mean of censored y_t is
\[
\mathrm{IE}(y_t \mid x_t) = \mathrm{IP}(y_t > 0 \mid x_t)\,\mathrm{IE}(y_t^* \mid y_t^* > 0, x_t) = \Phi\!\left(\frac{x_t'\beta_o}{\sigma_o}\right)\mathrm{IE}(y_t^* \mid y_t^* > 0, x_t).
\]
Thus, given y_t^* > 0,
\[
\mathrm{IE}(y_t^* \mid y_t^* > 0, x_t) = \int_0^{\infty} y_t^*\,\tilde g(y_t^* \mid y_t^* > 0, x_t)\,\mathrm{d}y_t^* = x_t'\beta_o + \sigma_o\,\frac{\phi(x_t'\beta_o/\sigma_o)}{\Phi(x_t'\beta_o/\sigma_o)}.
\]
This shows that x_t'β cannot be the correct specification of the conditional mean of censored y_t. Regressing y_t on x_t thus results in an inconsistent estimator for β_o.
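The truncated mean formula above is easy to confirm by simulation; the values of x_t'β_o and σ_o below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
xb, sigma = 0.3, 1.5                     # assumed values of x_t'beta_o and sigma_o
y_star = xb + sigma * rng.normal(size=2_000_000)
sim_mean = y_star[y_star > 0].mean()     # E(y* | y* > 0) by simulation
# E(y* | y* > 0) = x'beta + sigma * phi(x'beta/sigma) / Phi(x'beta/sigma)
formula = xb + sigma * norm.pdf(xb / sigma) / norm.cdf(xb / sigma)
```

With two million draws the simulated mean matches the closed form to about two decimal places.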
Consider the case that y_t is censored from above:
\[
y_t =
\begin{cases}
y_t^*, & y_t^* < 0, \\
0, & y_t^* \ge 0.
\end{cases}
\]
Then
\[
\mathrm{IE}(y_t^* \mid y_t^* < 0, x_t) = x_t'\beta_o - \sigma_o\,\frac{\phi(x_t'\beta_o/\sigma_o)}{1 - \Phi(x_t'\beta_o/\sigma_o)}.
\]
The conditional mean of y_t censored from above is
\[
\mathrm{IE}(y_t \mid x_t) = \big[1 - \Phi(x_t'\beta_o/\sigma_o)\big]\,\mathrm{IE}(y_t^* \mid y_t^* < 0, x_t) = \big[1 - \Phi(x_t'\beta_o/\sigma_o)\big]\,x_t'\beta_o - \sigma_o\,\phi(x_t'\beta_o/\sigma_o).
\]
Sample Selection Models
In general, the likelihood function of y_t contains two parts:
\[
\big[\mathrm{IP}(y_{1,t}^* \le 0 \mid x_t)\big]^{1 - y_{1,t}}\big[f(y_{2,t} \mid y_{1,t}^* > 0, x_t)\,\mathrm{IP}(y_{1,t}^* > 0 \mid x_t)\big]^{y_{1,t}};
\]
We know
\[
\mathrm{IE}(y_{2,t}^* \mid y_{1,t}^*) = x_{2,t}'\gamma_o + \sigma_{12,o}\,(y_{1,t}^* - x_{1,t}'\beta_o)/\sigma_{1,o}^2,
\]
and
\[
\mathrm{var}(y_{2,t}^* \mid y_{1,t}^*) = \sigma_{2,o}^2 - \sigma_{12,o}^2/\sigma_{1,o}^2.
\]
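These are the usual bivariate normal conditional moment formulas; a quick simulation check, with assumed covariance parameters and conditioning on a narrow band:

```python
import numpy as np

rng = np.random.default_rng(4)
s1, s2, s12 = 1.0, 1.3, 0.6                   # sigma_1, sigma_2, sigma_12 (assumed)
cov = np.array([[s1**2, s12], [s12, s2**2]])
e = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000)

# condition on e1 falling in a narrow band around a fixed value a
a = 0.8
band = np.abs(e[:, 0] - a) < 0.02
cond_mean = e[band, 1].mean()   # formula: s12 * a / s1**2
cond_var = e[band, 1].var()     # formula: s2**2 - s12**2 / s1**2
```

Conditioning on a band rather than an exact value introduces only a small discretization error, so both sample moments land close to the closed-form expressions.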
Recall from truncated regression that
\[
\mathrm{IE}(y_{1,t}^* \mid y_{1,t}^* > c) = x_{1,t}'\beta_o + \sigma_{1,o}\,\phi(c_{t,o})/[1 - \Phi(c_{t,o})],
\]
so that, for truncation at zero,
\[
\mathrm{IE}(y_{1,t}^* \mid y_{1,t}^* > 0) = x_{1,t}'\beta_o + \sigma_{1,o}\,\lambda\!\left(\frac{x_{1,t}'\beta_o}{\sigma_{1,o}}\right),
\]
where λ(u) here denotes the inverse Mills ratio in the form φ(u)/Φ(u) standard for sample selection models; by symmetry of φ, this equals the hazard function defined earlier evaluated at −u. Hence
\[
\begin{aligned}
\mathrm{IE}(y_{2,t}^* \mid y_{1,t}^* > 0, x_t) &= x_{2,t}'\gamma_o + \frac{\sigma_{12,o}}{\sigma_{1,o}^2}\Big[\mathrm{IE}(y_{1,t}^* \mid y_{1,t}^* > 0, x_t) - x_{1,t}'\beta_o\Big] \\
&= x_{2,t}'\gamma_o + \frac{\sigma_{12,o}}{\sigma_{1,o}}\,\lambda\!\left(\frac{x_{1,t}'\beta_o}{\sigma_{1,o}}\right).
\end{aligned}
\]
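The selection-adjusted mean can likewise be verified by simulation; the parameter values are illustrative assumptions, and λ is evaluated as φ(·)/Φ(·):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
s1, s2, s12 = 1.0, 1.3, 0.6
x1b, x2g = 0.4, 1.0                        # assumed x1'beta_o and x2'gamma_o
cov = np.array([[s1**2, s12], [s12, s2**2]])
e = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000)
y1_star = x1b + e[:, 0]
y2 = x2g + e[:, 1]
sel = y1_star > 0                          # selection rule

sim_mean = y2[sel].mean()
# E(y2 | y1* > 0) = x2'gamma + (s12/s1) * phi(x1'beta/s1) / Phi(x1'beta/s1)
formula = x2g + (s12 / s1) * norm.pdf(x1b / s1) / norm.cdf(x1b / s1)
```

The selected-sample mean of y2 exceeds x2'γ_o because selection on y1* > 0 pulls in positively correlated errors; the gap is exactly the λ term.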
Again from the truncated regression result, we have
\[
\mathrm{var}(y_{1,t}^* \mid y_{1,t}^* > 0, x_t) = \sigma_{1,o}^2\left[1 - \lambda\!\left(\frac{x_{1,t}'\beta_o}{\sigma_{1,o}}\right)\frac{x_{1,t}'\beta_o}{\sigma_{1,o}} - \lambda^2\!\left(\frac{x_{1,t}'\beta_o}{\sigma_{1,o}}\right)\right].
\]
Writing
\[
y_{2,t} = x_{2,t}'\gamma_o + \sigma_{12,o}\,(y_{1,t}^* - x_{1,t}'\beta_o)/\sigma_{1,o}^2 + v_t,
\]
where v_t | y_{1,t}^* ∼ \mathcal{N}(0, \sigma_{2,o}^2 - \sigma_{12,o}^2/\sigma_{1,o}^2), we see that var(y_{2,t} | y_{1,t}^* > 0, x_t) is
\[
\begin{aligned}
&\frac{\sigma_{12,o}^2}{\sigma_{1,o}^4}\,\mathrm{var}(y_{1,t}^* \mid y_{1,t}^* > 0, x_t) + \mathrm{var}(v_t \mid y_{1,t}^* > 0, x_t) \\
&\quad= \frac{\sigma_{12,o}^2}{\sigma_{1,o}^2}\left[1 - \lambda\!\left(\frac{x_{1,t}'\beta_o}{\sigma_{1,o}}\right)\frac{x_{1,t}'\beta_o}{\sigma_{1,o}} - \lambda^2\!\left(\frac{x_{1,t}'\beta_o}{\sigma_{1,o}}\right)\right] + \sigma_{2,o}^2 - \frac{\sigma_{12,o}^2}{\sigma_{1,o}^2} \\
&\quad= \sigma_{2,o}^2 - \frac{\sigma_{12,o}^2}{\sigma_{1,o}^2}\left[\lambda\!\left(\frac{x_{1,t}'\beta_o}{\sigma_{1,o}}\right)\frac{x_{1,t}'\beta_o}{\sigma_{1,o}} + \lambda^2\!\left(\frac{x_{1,t}'\beta_o}{\sigma_{1,o}}\right)\right].
\end{aligned}
\]
Heckman’s Two-Step Estimator
The variance estimate can be computed as
\[
\hat\sigma_2^2 = \frac{1}{T}\sum_{t=1}^{T}\left[\tilde e_t^2 + \frac{\hat\sigma_{12}^2}{\hat\sigma_1^2}\,\tilde\lambda_t\big(x_{1,t}'\tilde\alpha_T + \tilde\lambda_t\big)\right],
\]
where \tilde e_t are the residuals from the second-step regression.
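Heckman's two-step estimator can be sketched as follows; the data-generating process, the exclusion restriction (z2 entering only the selection equation), and the probit implementation via scipy are all illustrative assumptions, with σ_1 normalized to 1 so the first-step probit estimates β_o directly:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(6)
T = 20_000
z1, z2 = rng.normal(size=T), rng.normal(size=T)
x1 = np.column_stack([np.ones(T), z1, z2])    # selection regressors
x2 = np.column_stack([np.ones(T), z1])        # outcome regressors (z2 excluded)
beta_o = np.array([0.3, 1.0, 0.8])
gamma_o = np.array([1.0, 0.5])
s1, s2, s12 = 1.0, 1.2, 0.5
e = rng.multivariate_normal([0.0, 0.0], [[s1**2, s12], [s12, s2**2]], size=T)
y1 = (x1 @ beta_o + e[:, 0] > 0).astype(float)   # selection indicator
y2 = x2 @ gamma_o + e[:, 1]                      # outcome, used only when selected

# Step 1: probit of y1 on x1 estimates alpha = beta_o / sigma_1
def probit_nll(a):
    idx = x1 @ a
    return -np.sum(np.where(y1 > 0, norm.logcdf(idx), norm.logcdf(-idx)))

alpha_t = minimize(probit_nll, np.zeros(3), method="BFGS").x
lam_t = norm.pdf(x1 @ alpha_t) / norm.cdf(x1 @ alpha_t)   # inverse Mills ratio

# Step 2: OLS of y2 on x2 and the estimated lambda, selected subsample only
sel = y1 > 0
Z = np.column_stack([x2[sel], lam_t[sel]])
coef = np.linalg.lstsq(Z, y2[sel], rcond=None)[0]
gamma_hat, s12_over_s1_hat = coef[:-1], coef[-1]
```

The coefficient on the inverse Mills ratio consistently estimates σ_12/σ_1, matching the selection-adjusted mean derived above.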
Time Series Models