GLM Project 2
Kuan-Wei Tseng
National Taiwan University
All content following this page was uploaded by Kuan-Wei Tseng on 25 May 2015.
1. Kullback-Leibler Distance
The Kullback-Leibler distance (or Kullback-Leibler information) is a
measure of the difference between two distributions, defined as
\[
K = \min_{\theta \in \Theta} E\left[\ln \frac{f(y \mid x; \theta_0)}{f(y \mid x; \theta)}\right]
  = E\left[\ln \frac{f(y \mid x; \theta_0)}{f(y \mid x; \theta^*)}\right],
\]
where f(y|x; θ0) denotes the "true" conditional density of y given x,
and θ* is the minimizing parameter value.
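As a quick numerical illustration (not part of the original notes), the discrete analogue of this distance, K = Σᵢ pᵢ ln(pᵢ/qᵢ), can be computed directly; the distributions p and q below are arbitrary example values.

```python
import math

def kl_divergence(p, q):
    """Discrete Kullback-Leibler distance sum_i p_i * ln(p_i / q_i),
    taking p as the "true" distribution and q as the candidate model."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # "true" distribution (illustrative values)
q = [0.4, 0.4, 0.2]   # candidate model

# K is zero exactly when the candidate matches the truth ...
print(kl_divergence(p, p))   # 0.0
# ... strictly positive otherwise, and not symmetric in its arguments.
print(kl_divergence(p, q))
print(kl_divergence(q, p))
```

Note that K is not a metric: it is asymmetric and does not satisfy the triangle inequality, which is why the notes call it an "information" as well as a "distance".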
The Kullback-Leibler information can be used as a criterion for
model selection. For a given model with estimated parameters θ̂,
its Kullback-Leibler distance is
\[
\hat{K} = E\left[\ln \frac{f(y \mid x; \theta_0)}{f(y \mid x; \hat{\theta})}\right].
\]
Since the term E[ln f(y|x; θ0)] is the same for every candidate model,
minimizing K̂ amounts to maximizing E[ln f(y|x; θ̂)]. Replacing this
expectation by its sample analogue, the model with the maximum value of
\[
\frac{1}{n} \sum_{i=1}^{n} \ln f(y_i \mid x_i; \hat{\theta})
\]
is chosen.
This criterion shares a disadvantage with R²: the maximized log-likelihood
is non-decreasing as the number of independent variables increases, so the
criterion tends to prefer larger models, and overfitting may occur.
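This non-decreasing behaviour is easy to see numerically. A minimal sketch (not from the notes), assuming Gaussian linear models fit to synthetic data; the data and variable names are illustrative:

```python
import numpy as np

def max_loglik(y, X):
    """Maximized Gaussian log-likelihood of the linear model y = Xb + e,
    plugging in the MLE sigma^2 = RSS / n."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ b) ** 2))
    n = len(y)
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

rng = np.random.default_rng(0)
n = 50
y = rng.normal(size=n)                      # y is pure noise
junk = rng.normal(size=n)                   # a completely irrelevant regressor

X1 = np.ones((n, 1))                        # intercept only
X2 = np.column_stack([np.ones(n), junk])    # intercept + irrelevant regressor

ll1, ll2 = max_loglik(y, X1), max_loglik(y, X2)
print(ll1, ll2)  # ll2 >= ll1: the fitted likelihood never worsens when a variable is added
```

Because adding a column can only lower the residual sum of squares, the larger model always fits at least as well, even when the extra regressor is pure noise.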
2. Akaike Information Criterion
Let K̃ denote the sample analogue of the Kullback-Leibler distance,
\[
\tilde{K} = \frac{1}{n} \sum_{i=1}^{n} \ln \frac{f(y_i \mid x_i; \theta_0)}{f(y_i \mid x_i; \hat{\theta})}.
\]
Expanding nK̃ around θ̂, the first-order term is zero by the likelihood
equation. Thus, nK̃ is asymptotically equivalent to
\[
-\frac{n}{2} (\hat{\theta} - \theta_0)^{\top} I(\theta_0) (\hat{\theta} - \theta_0),
\]
where I(θ0) is the Fisher information. We obtain that
\[
-2n\tilde{K} \xrightarrow{d} \chi^2_p.
\]
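A small simulation (an illustration, not from the notes) of this χ²ₚ limit, using the simplest possible case: a normal mean model with known unit variance, where p = 1 and the likelihood-ratio statistic 2(ℓ(θ̂) − ℓ(θ0)) equals n(ȳ − θ0)² exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, theta0 = 100, 5000, 0.0

# y_i ~ N(theta0, 1); the MLE of the mean is ybar, and here
# 2(l(theta_hat) - l(theta0)) = n * (ybar - theta0)^2 exactly.
y = rng.normal(theta0, 1.0, size=(reps, n))
stat = n * (y.mean(axis=1) - theta0) ** 2

# A chi-square with p = 1 degree of freedom has mean 1 and variance 2.
print(stat.mean(), stat.var())
```

The simulated mean and variance land close to 1 and 2, matching the χ²₁ limit.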
Similarly, from the Taylor expansion of nK̂ around θ0,
\[
n\hat{K} = -n E\left[\frac{\partial \ln f(y \mid x; \theta_0)}{\partial \theta}\right]^{\top} (\hat{\theta} - \theta_0)
- \frac{n}{2} (\hat{\theta} - \theta_0)^{\top} E\left[\frac{\partial^2 \ln f(y \mid x; \theta_0)}{\partial \theta \, \partial \theta^{\top}}\right] (\hat{\theta} - \theta_0) + o_p(1)
= \frac{n}{2} (\hat{\theta} - \theta_0)^{\top} I(\theta_0) (\hat{\theta} - \theta_0) + o_p(1),
\]
since the first term is zero by the property of the score, E[∂ ln f(y|x; θ0)/∂θ] = 0.
Hence 2nK̂ is asymptotically distributed as χ²_p, and the asymptotic
expectation of nK̂ is p/2.
By the argument above, the asymptotic expectation of the difference
between nK̃ and nK̂ is −p. To find a K̃* such that E(nK̃* − nK̂) → 0,
consider
\[
\tilde{K}^* = \tilde{K} + \frac{p}{n}.
\]
Therefore, the Akaike information criterion is defined as
\[
\mathrm{AIC} = \sum_{i=1}^{n} \ln f(y_i \mid x_i; \hat{\theta}) - p = \ell(\hat{\theta}) - p.
\]
The model with the maximum value is then chosen. We see that the AIC is
a criterion based on minimizing the Kullback-Leibler distance, but with
a penalty term for the number of parameters.
Another form of the AIC chooses the model with the minimum value of
\[
\mathrm{AIC} = -2 \sum_{i=1}^{n} \ln f(y_i \mid x_i; \hat{\theta}) + 2p = -2\ell(\hat{\theta}) + 2p.
\]
In this form, the first term −2ℓ(θ̂) can be directly replaced by other
asymptotically equivalent test statistics, such as those in the Wald or
the score test.
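The two forms rank models identically (maximizing ℓ(θ̂) − p is the same as minimizing −2ℓ(θ̂) + 2p). A minimal sketch, assuming Gaussian linear models on synthetic data; the data, parameter counts, and names are illustrative:

```python
import numpy as np

def max_loglik(y, X):
    """Maximized Gaussian log-likelihood with the MLE sigma^2 = RSS / n."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ b) ** 2))
    n = len(y)
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)      # the regressor x truly matters

X1 = np.ones((n, 1))                        # M1: intercept only, p = 2 (b0, sigma^2)
X2 = np.column_stack([np.ones(n), x])       # M2: intercept + x,  p = 3

ll1, ll2 = max_loglik(y, X1), max_loglik(y, X2)
aic_max = {"M1": ll1 - 2, "M2": ll2 - 3}            # maximize this form
aic_min = {"M1": -2 * ll1 + 4, "M2": -2 * ll2 + 6}  # or minimize this one
print(aic_max, aic_min)  # both forms prefer M2 here
```

Since x has a strong effect, the gain in log-likelihood from including it far exceeds the one-parameter penalty, and both forms select M2.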
The AIC can be applied to both nested and non-nested models. However,
the AIC procedure cannot be used as a hypothesis test: its numerical
value does not indicate whether a model is true. Moreover, the AIC is
not consistent in some cases; for two nested normal linear models with
M1 ⊂ M2 where M1 is true, the probability of choosing M1 does not
converge to one.
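This inconsistency can be illustrated by a simulation (not from the notes), again using the normal mean model with known unit variance. Here M1 fixes the mean at 0 (the truth) and M2 frees it, so AIC prefers M2 exactly when 2(ℓ₂ − ℓ₁) = nȳ² exceeds the penalty difference 2; that event has probability P(χ²₁ > 2) ≈ 0.157 for every n.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5000

# M1 (true): y_i ~ N(0, 1).  M2: y_i ~ N(mu, 1), mu free (one extra parameter).
# AIC prefers the larger model M2 exactly when 2(l2 - l1) = n * ybar^2 > 2.
y = rng.normal(0.0, 1.0, size=(reps, n))
overfit_rate = np.mean(n * y.mean(axis=1) ** 2 > 2.0)
print(overfit_rate)  # stays near P(chi2_1 > 2), about 0.157, however large n is
```

Because the overfitting probability does not shrink with n, the probability of choosing the true model M1 stays bounded away from one.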
3. Bayesian Information Criterion (Schwarz Criterion)
Another criterion for model selection is the Bayesian Information
Criterion (BIC), also known as the Schwarz criterion. Schwarz (1978)
derived the criterion from a Bayesian argument, approximately maximizing
the posterior probability of the model, and showed that asymptotically
the criterion does not depend on the prior distribution.
The BIC is defined as
\[
\mathrm{BIC} = \ell(\hat{\theta}) - \frac{p}{2} \ln n,
\]
where the model with the maximum value is chosen, or equivalently, in
the form to be minimized,
\[
\mathrm{BIC} = -2\ell(\hat{\theta}) + p \ln n.
\]
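Comparing the minimization forms of the two criteria shows how they differ: an extra parameter costs 2 under AIC but ln n under BIC, so BIC penalizes model size more heavily whenever ln n > 2, i.e. n > e² ≈ 7.4. A minimal sketch (function names are illustrative):

```python
import math

def aic(loglik, p):
    """AIC in the minimization form, -2 l(theta_hat) + 2p."""
    return -2 * loglik + 2 * p

def bic(loglik, p, n):
    """BIC in the minimization form, -2 l(theta_hat) + p ln n."""
    return -2 * loglik + p * math.log(n)

ll, n = -120.0, 100   # an arbitrary maximized log-likelihood and sample size

print(aic(ll, 3) - aic(ll, 2))        # cost of one extra parameter under AIC: 2.0
print(bic(ll, 3, n) - bic(ll, 2, n))  # under BIC: ln(100), about 4.61
```

Because its penalty grows with n, BIC selects the true model with probability tending to one in the nested setting where AIC overfits.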