Genlog Poisson
Notation
The following notation is used throughout this chapter unless otherwise stated:
$x_{ik}$  An element in the $i$th row and the $k$th column of the design matrix.
GENLOG Poisson Loglinear Model
Random Component
The random component describes the joint distribution of the counts.
• The count $n_i$ has a Poisson distribution with parameter $m_i$.
• The joint probability distribution of the $n_i$ is the product of these $r$ independent Poisson distributions. The probability density function is

$$\prod_{i=1}^{r} \frac{m_i^{n_i} e^{-m_i}}{n_i!}$$

• The covariance of the counts is

$$\operatorname{cov}(n_i, n_{i'}) = \begin{cases} m_i & \text{if } i = i' \\ 0 & \text{if } i \ne i' \end{cases}$$
Systematic Component
The systematic component describes the linkage function between the expected
counts and the parameters. The expected counts are themselves functions of
parameters. For $i = 1, \ldots, r$,

$$m_i = \begin{cases} z_i e^{\beta_0 + v_i} & \text{if } z_i > 0 \\ 0 & \text{if } z_i \le 0 \end{cases}$$

where

$$v_i = \sum_{k=1}^{p} x_{ik} \beta_k \tag{1}$$
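The systematic component can be illustrated with a minimal Python sketch. The function name `expected_counts` and the three-cell data are hypothetical, chosen only to show how a structural zero ($z_i \le 0$) forces $m_i = 0$.

```python
import math

def expected_counts(z, x, beta0, beta):
    """Return m_i = z_i * exp(beta0 + v_i) if z_i > 0, else 0, with v_i from (1)."""
    m = []
    for z_i, x_i in zip(z, x):
        v_i = sum(x_ik * b_k for x_ik, b_k in zip(x_i, beta))
        m.append(z_i * math.exp(beta0 + v_i) if z_i > 0 else 0.0)
    return m

# Hypothetical data: three cells, one design column, last cell a structural zero.
z = [1.0, 1.0, 0.0]
x = [[0.0], [1.0], [1.0]]
m = expected_counts(z, x, beta0=math.log(2.0), beta=[math.log(3.0)])
print(m)
```

The first two cells evaluate to roughly 2 and 6, and the third is exactly 0 regardless of the parameters.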
Maximum-Likelihood Estimation
The Poisson log-likelihood is

$$L(\beta) = L(\beta_0, \ldots, \beta_p) = \text{constant} + \sum_{i=1}^{r} \bigl( n_i \log(m_i) - m_i \bigr) \tag{2}$$
Likelihood Equations
It can be shown that
$$\frac{\partial L}{\partial \beta_0} = \sum_{i=1}^{r} (n_i - m_i)$$

$$\frac{\partial L}{\partial \beta_k} = \sum_{i=1}^{r} (n_i - m_i)\, x_{ik}, \qquad k = 1, \ldots, p$$
Let $g(\beta) = (g_0(\beta), \ldots, g_p(\beta))'$ be the $(p+1) \times 1$ gradient vector with

$$g_k(\beta) = \frac{\partial L}{\partial \beta_k}$$

The maximum-likelihood estimates $\hat{\beta} = (\hat{\beta}_0, \ldots, \hat{\beta}_p)'$ are regarded as a solution to the vector of likelihood equations:

$$g(\beta) = 0 \tag{3}$$
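As a sanity check on the likelihood equations, the following sketch evaluates the kernel of (2) and the gradient, which can then be compared with a numerical derivative. The function names and the four-cell data are hypothetical. Cells with $z_i \le 0$ are skipped, which is an assumption of this sketch based on $m_i$ being fixed at 0 there.

```python
import math

def fitted(z, x, b):
    """m_i for the coefficient list b = [beta_0, beta_1, ..., beta_p]."""
    return [z_i * math.exp(b[0] + sum(a * c for a, c in zip(x_i, b[1:]))) if z_i > 0 else 0.0
            for z_i, x_i in zip(z, x)]

def loglik(n, z, x, b):
    """Kernel of (2): sum of n_i*log(m_i) - m_i over cells with m_i > 0."""
    return sum(n_i * math.log(m_i) - m_i
               for n_i, m_i in zip(n, fitted(z, x, b)) if m_i > 0)

def gradient(n, z, x, b):
    """g_0 = sum(n_i - m_i); g_k = sum((n_i - m_i) * x_ik), over cells with z_i > 0."""
    m = fitted(z, x, b)
    cells = [(n_i - m_i, x_i) for n_i, m_i, x_i, z_i in zip(n, m, x, z) if z_i > 0]
    g = [sum(diff for diff, _ in cells)]
    for k in range(len(b) - 1):
        g.append(sum(diff * x_i[k] for diff, x_i in cells))
    return g

# Hypothetical data: two groups of two cells each, coded by a single 0/1 column.
n = [10, 20, 15, 30]
z = [1, 1, 1, 1]
x = [[0], [1], [0], [1]]
print(gradient(n, z, x, [2.5, 0.3]))
```

For this data the gradient vanishes at $\beta_0 = \log 12.5$, $\beta_1 = \log 2$, where the fitted counts equal the group means 12.5 and 25.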
Hessian Matrix
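The likelihood equations are nonlinear functions of $\beta$. Solving them for $\hat{\beta}$ requires an iterative method. The Newton-Raphson method is used. It can be shown that

$$\frac{\partial^2 L}{\partial \beta_0^2} = -\sum_{i=1}^{r} m_i$$

$$\frac{\partial^2 L}{\partial \beta_0\, \partial \beta_l} = -\sum_{i=1}^{r} m_i x_{il}$$

$$\frac{\partial^2 L}{\partial \beta_k\, \partial \beta_0} = -\sum_{i=1}^{r} m_i x_{ik}$$

$$\frac{\partial^2 L}{\partial \beta_k\, \partial \beta_l} = -\sum_{i=1}^{r} m_i x_{ik} x_{il}$$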
Let $H(\beta)$ be the $(p+1) \times (p+1)$ information matrix, where $-H(\beta)$ is the Hessian matrix of (2). The elements of $H(\beta)$ are

$$h_{kl}(\beta) = -\frac{\partial^2 L}{\partial \beta_k\, \partial \beta_l}, \qquad k = 0, \ldots, p \text{ and } l = 0, \ldots, p \tag{4}$$

Note: $H(\beta)$ is a symmetric positive definite matrix. The asymptotic covariance matrix of $\hat{\beta}$ is estimated by $H^{-1}(\hat{\beta})$.
Newton-Raphson Method
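Let $\beta^{(s)}$ denote the $s$th approximation for the solution to (3). By the Newton-Raphson method,

$$\beta^{(s+1)} = \beta^{(s)} + H^{-1}(\beta^{(s)})\, g(\beta^{(s)})$$

Define $q(\beta) = H(\beta)\beta + g(\beta)$. With $x_{i0} = 1$ for the intercept, the $k$th element of $q(\beta)$ is

$$q_k(\beta) = \sum_{i=1}^{r} \eta_i x_{ik} \tag{5}$$

where

$$\eta_i = \begin{cases} m_i (\beta_0 + v_i) + (n_i - m_i) & \text{if } z_i > 0 \text{ and } m_i > 0 \\ 0 & \text{otherwise} \end{cases}$$

Since $\beta_0 + v_i = \log(m_i / z_i)$ when $z_i > 0$, this agrees with the form used for the initial values in (8). Then

$$H(\beta^{(s)})\, \beta^{(s+1)} = q(\beta^{(s)}) \tag{6}$$

Thus, given $\beta^{(s)}$, the $(s+1)$th approximation $\beta^{(s+1)}$ is found by solving the system of equations in (6).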
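The step from the Newton-Raphson update to (6) can be written out as a short derivation. It uses $h_{kl}(\beta) = \sum_i m_i x_{ik} x_{il}$, the negative of the second derivatives listed under Hessian Matrix, and the convention $x_{i0} = 1$ for the intercept column, which is an assumption of this sketch.

```latex
\begin{aligned}
H(\beta^{(s)})\,\beta^{(s+1)}
  &= H(\beta^{(s)})\,\beta^{(s)} + g(\beta^{(s)}) = q(\beta^{(s)}), \\
q_k(\beta)
  &= \sum_{l=0}^{p} h_{kl}(\beta)\,\beta_l + g_k(\beta) \\
  &= \sum_{i=1}^{r} m_i x_{ik}\Bigl(\beta_0 + \sum_{l=1}^{p} x_{il}\beta_l\Bigr)
     + \sum_{i=1}^{r} (n_i - m_i)\,x_{ik} \\
  &= \sum_{i=1}^{r} \bigl[\,m_i(\beta_0 + v_i) + (n_i - m_i)\,\bigr]\,x_{ik}.
\end{aligned}
```

Solving the linear system (6) therefore gives the same $\beta^{(s+1)}$ as the explicit update, without forming $H^{-1}$.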
Initial Values
SPSS uses the $\beta^{(0)}$, which corresponds to a saturated model, as the initial value for $\beta$. Then the initial estimates for the expected cell counts are

$$m_i^{(0)} = \begin{cases} n_i + \Delta & \text{if } z_i > 0 \\ 0 & \text{if } z_i \le 0 \end{cases} \tag{7}$$

where $\Delta \ge 0$ is a constant.
Note: For saturated models, SPSS adds $\Delta$ to $n_i$ if $z_i > 0$. This is done to avoid numerical problems in the case that some observed counts are 0. We advise users to set $\Delta$ to 0 whenever all observed counts (other than structural zeros) are positive.
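The initial values for $\eta_i$ are

$$\eta_i^{(0)} = \begin{cases} m_i^{(0)} \log\bigl(m_i^{(0)} / z_i\bigr) + \bigl(n_i - m_i^{(0)}\bigr) & \text{if } z_i > 0 \text{ and } m_i^{(0)} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{8}$$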
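Equations (7) and (8) translate directly into code. The sketch below is illustrative only: the function name `initial_values`, the sample counts, and the choice $\Delta = 0.5$ are hypothetical.

```python
import math

def initial_values(n, z, delta):
    """Return (m0, eta0) following (7) and (8)."""
    m0 = [n_i + delta if z_i > 0 else 0.0 for n_i, z_i in zip(n, z)]
    eta0 = [m_i * math.log(m_i / z_i) + (n_i - m_i) if z_i > 0 and m_i > 0 else 0.0
            for n_i, z_i, m_i in zip(n, z, m0)]
    return m0, eta0

# Hypothetical cells: a sampling zero, a positive count, and a structural zero.
m0, eta0 = initial_values(n=[0, 4, 7], z=[1, 1, 0], delta=0.5)
print(m0, eta0)
```

With $\Delta = 0$ the first cell would start at $m_i^{(0)} = 0$, which is the situation the note above says the constant is meant to avoid.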
Stopping Criteria
SPSS checks the following conditions for convergence:
1. $\max_i \bigl( |m_i^{(s+1)} - m_i^{(s)}| / m_i^{(s)} \bigr) < \varepsilon$, provided that $m_i^{(s)} > 0$
2. $\max_i |m_i^{(s+1)} - m_i^{(s)}| < \varepsilon$
3. $\sum_{k=0}^{p} g_k^2(\hat{\beta}) / (p + 1) < \varepsilon$
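The three conditions can be evaluated as in the sketch below. The text does not say how they are combined into a single convergence decision, so the hypothetical function `convergence_checks` returns each result separately. The sample values and tolerance are made up for illustration.

```python
def convergence_checks(m_new, m_old, grad, eps):
    """Return the truth values of stopping conditions 1, 2, and 3."""
    # Condition 1: largest relative change, over cells with m_old > 0 only.
    rel = max((abs(a - b) / b for a, b in zip(m_new, m_old) if b > 0), default=0.0)
    # Condition 2: largest absolute change.
    abs_change = max(abs(a - b) for a, b in zip(m_new, m_old))
    # Condition 3: mean of the squared gradient elements, g_0 through g_p.
    grad_mean_sq = sum(g * g for g in grad) / len(grad)
    return rel < eps, abs_change < eps, grad_mean_sq < eps

checks = convergence_checks(
    m_new=[10.001, 20.002, 0.0],
    m_old=[10.0, 20.0, 0.0],
    grad=[1e-4, -2e-4],
    eps=1e-3,
)
print(checks)
```

Here the relative change is about 0.0001 and passes, while the absolute change of 0.002 in the second cell does not.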
Algorithm
The iteration process uses the following steps:
1. Calculate $m_i^{(0)}$ using (7) and $\eta_i^{(0)}$ using (8).
2. Set $s = 0$.
3. Calculate $H(\beta^{(s)})$ using (4) evaluated at $m_i^{(s)}$, and $q(\beta^{(s)})$ using (5) evaluated at $\eta_i = \eta_i^{(s)}$.
4. Solve for $\beta^{(s+1)}$ using (6).
5. Calculate $v_i^{(s+1)} = \sum_{k=1}^{p} x_{ik} \beta_k^{(s+1)}$ and

$$m_i^{(s+1)} = \begin{cases} z_i e^{\beta_0^{(s+1)} + v_i^{(s+1)}} & \text{if } z_i > 0 \\ 0 & \text{if } z_i \le 0 \end{cases}$$

6. Check whether the stopping criteria are satisfied. If yes, iteration stops and the process is declared converged. Otherwise continue.
7. Increase $s$ by 1 and check whether the maximum iteration has been reached. If yes, iteration stops and the process is declared not converged. Otherwise, repeat steps 3-7.
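The estimated expected cell counts are

$$\hat{m}_i = \begin{cases} z_i e^{\hat{\beta}_0 + \hat{v}_i} & \text{if } z_i > 0 \\ 0 & \text{if } z_i \le 0 \end{cases}$$

where

$$\hat{v}_i = \sum_{k=1}^{p} x_{ik} \hat{\beta}_k$$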
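The iteration steps above can be sketched end to end in plain Python. This is not the SPSS implementation: the names `solve` and `fit`, the tolerance, the iteration limit, and the data are hypothetical, only the absolute-change criterion is checked, and the design matrix is assumed to have full rank so that the linear system in (6) has a unique solution.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, size))
        sol[r] = (aug[r][size] - tail) / aug[r][r]
    return sol

def fit(n, z, x, delta=0.0, eps=1e-8, max_iter=20):
    """Return (beta, m, converged); beta[0] is the intercept beta_0."""
    rows = [[1.0] + list(x_i) for x_i in x]       # x_i0 = 1 carries the intercept
    dim = len(rows[0])
    # Step 1: initial values from (7) and (8).
    m = [n_i + delta if z_i > 0 else 0.0 for n_i, z_i in zip(n, z)]
    eta = [m_i * math.log(m_i / z_i) + (n_i - m_i) if m_i > 0 else 0.0
           for n_i, z_i, m_i in zip(n, z, m)]
    beta = [0.0] * dim
    for _ in range(max_iter):                     # steps 2 and 7: iteration counter
        # Step 3: H from (4) at the current m, q from (5) at the current eta.
        h = [[sum(m_i * row[k] * row[l] for m_i, row in zip(m, rows))
              for l in range(dim)] for k in range(dim)]
        q = [sum(e_i * row[k] for e_i, row in zip(eta, rows)) for k in range(dim)]
        beta = solve(h, q)                        # step 4: solve (6)
        # Step 5: update the linear predictor and the expected counts.
        lin = [sum(c * b for c, b in zip(row, beta)) for row in rows]
        m_new = [z_i * math.exp(l_i) if z_i > 0 else 0.0 for z_i, l_i in zip(z, lin)]
        # Step 6: this sketch checks the absolute-change criterion only.
        converged = max(abs(a - b) for a, b in zip(m_new, m)) < eps
        m = m_new
        eta = [m_i * l_i + (n_i - m_i) if m_i > 0 else 0.0
               for m_i, l_i, n_i in zip(m, lin, n)]
        if converged:
            return beta, m, True
    return beta, m, False

# Hypothetical data: two groups of two cells, coded by a single 0/1 column.
beta, m_hat, ok = fit([10, 20, 15, 30], [1, 1, 1, 1], [[0], [1], [0], [1]])
print(beta, m_hat, ok)
```

For this data the fitted counts are the group means, 12.5 and 25, so $\hat{\beta}_0 = \log 12.5$ and $\hat{\beta}_1 = \log 2$.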
Goodness-of-Fit Statistics
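The Pearson chi-square statistic is

$$X^2 = \sum_{i=1}^{r} X_i^2$$

where, for cells with $z_i > 0$ and $\hat{m}_i > 0$, $X_i^2 = (n_i - \hat{m}_i)^2 / \hat{m}_i$ is the square of the standardized residual.

The likelihood-ratio chi-square statistic is

$$G^2 = 2 \sum_{i=1}^{r} G_i^2$$

where

$$G_i^2 = \begin{cases} n_i \log(n_i / \hat{m}_i) - (n_i - \hat{m}_i) & \text{if } z_i > 0,\ n_i > 0, \text{ and } \hat{m}_i > 0 \\ \text{undefined} & \text{if } z_i > 0,\ n_i > 0, \text{ and } \hat{m}_i = 0 \\ \hat{m}_i & \text{if } z_i > 0,\ n_i = 0, \text{ and } \hat{m}_i > 0 \\ 0 & \text{if } z_i \le 0 \text{ or } n_i = \hat{m}_i \end{cases}$$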
Degrees of Freedom
The degrees of freedom for each statistic are defined as $a = r - 1 - p - E$, where $E$ is the number of cells with $z_i \le 0$ or $\hat{m}_i = 0$.
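The two statistics and their degrees of freedom can be computed as in the following sketch. The function name `goodness_of_fit` is hypothetical, and raising an error for a positive count with a fitted value of 0 is this sketch's own choice for a case where the logarithm is undefined.

```python
import math

def goodness_of_fit(n, m_hat, z, p):
    """Return (X2, G2, degrees of freedom) for p non-intercept parameters."""
    x2 = 0.0
    g2_half = 0.0
    excluded = 0                       # E: cells with z_i <= 0 or fitted count 0
    for n_i, m_i, z_i in zip(n, m_hat, z):
        if z_i <= 0 or m_i == 0:
            excluded += 1
        if z_i <= 0 or n_i == m_i:
            continue                   # contributes 0 to both sums
        if m_i == 0:
            raise ValueError("undefined: positive count with fitted value 0")
        x2 += (n_i - m_i) ** 2 / m_i
        g2_half += n_i * math.log(n_i / m_i) - (n_i - m_i) if n_i > 0 else m_i
    return x2, 2.0 * g2_half, len(n) - 1 - p - excluded

# Hypothetical counts with fitted values equal to the two group means.
x2, g2, df = goodness_of_fit([10, 20, 15, 30], [12.5, 25.0, 12.5, 25.0], [1, 1, 1, 1], p=1)
print(x2, g2, df)
```

In this example $X^2 = 3$ on 2 degrees of freedom, and $G^2$ is slightly larger at about 3.02.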
Significance Level
The significance level (or the p value) for the Pearson chi-square statistic is $\operatorname{Prob}(\chi_a^2 > X^2)$ and that for the likelihood-ratio chi-square statistic is $\operatorname{Prob}(\chi_a^2 > G^2)$. In both cases, $\chi_a^2$ is the central chi-square distribution with $a$ degrees of freedom.
Residuals
Goodness-of-fit statistics provide only broad summaries of how models fit data.
The pattern of lack of fit is revealed in cell-by-cell comparisons of observed and
fitted cell counts.
Simple Residuals
The simple residual of the ith cell is
$$r_i = \begin{cases} n_i - \hat{m}_i & \text{if } z_i > 0 \\ \text{SYSMIS} & \text{if } z_i \le 0 \end{cases}$$
Standardized Residuals
The standardized residual for the ith cell is
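$$r_i^S = \begin{cases} (n_i - \hat{m}_i) / \sqrt{\hat{m}_i} & \text{if } z_i > 0 \text{ and } 0 < \hat{m}_i \\ 0 & \text{if } z_i > 0 \text{ and } n_i = \hat{m}_i \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

Notice that $\sum_{i=1}^{r} (r_i^S)^2 = X^2$ when all $z_i > 0$. Hence, standardized residuals are also known as Pearson residuals. Although the standardized residuals are asymptotically normal, their asymptotic variances are less than 1.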
Adjusted Residuals
The adjusted residual is the simple residual divided by its estimated standard
error. This statistic for the ith cell is
$$r_i^A = \begin{cases} (n_i - \hat{m}_i) / \sqrt{\hat{m}_i (1 - a_{ii})} & \text{if } z_i > 0,\ n_i \ne \hat{m}_i, \text{ and } \hat{m}_i > 0 \\ 0 & \text{if } z_i > 0 \text{ and } n_i = \hat{m}_i \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

where

$$a_{ii} = \hat{m}_i \left( h^{00} + 2 \sum_{k=1}^{p} x_{ik} h^{k0} + \sum_{k=1}^{p} \sum_{l=1}^{p} x_{ik} x_{il} h^{kl} \right)$$

$h^{kl}$ is the $(k,l)$th element of $H^{-1}(\hat{\beta})$. The adjusted residuals are asymptotically standard normal.
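A minimal sketch of the adjusted residual follows, taking $H^{-1}(\hat{\beta})$ as an input with the intercept in row and column 0. The function name is hypothetical, `None` stands in for SYSMIS, and the inverse matrix shown was worked out by hand for the made-up two-group data, where $H = \begin{pmatrix} 75 & 50 \\ 50 & 50 \end{pmatrix}$.

```python
import math

def adjusted_residuals(n, m_hat, z, x, h_inv):
    """Adjusted residuals; None stands in for SYSMIS."""
    out = []
    for n_i, m_i, z_i, x_i in zip(n, m_hat, z, x):
        if z_i > 0 and n_i == m_i:
            out.append(0.0)
        elif z_i > 0 and m_i > 0:
            row = [1.0] + list(x_i)    # x_i0 = 1 for the intercept
            # Full quadratic form; equals h00 + 2*sum(x_ik h^k0) + double sum,
            # because H^{-1} is symmetric.
            quad = sum(row[k] * row[l] * h_inv[k][l]
                       for k in range(len(row)) for l in range(len(row)))
            a_ii = m_i * quad
            out.append((n_i - m_i) / math.sqrt(m_i * (1.0 - a_ii)))
        else:
            out.append(None)
    return out

# Hypothetical data; h_inv is the inverse of [[75, 50], [50, 50]].
h_inv = [[0.04, -0.04], [-0.04, 0.06]]
res = adjusted_residuals([10, 20, 15, 30], [12.5, 25.0, 12.5, 25.0],
                         [1, 1, 1, 1], [[0], [1], [0], [1]], h_inv)
print(res)
```

Every cell here has $a_{ii} = 0.5$, so each adjusted residual is the standardized residual divided by $\sqrt{0.5}$. A saturated fit would give $a_{ii} = 1$ and a zero denominator, which this sketch does not guard against.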
Deviance Residuals
Pierce and Schafer (1986) and McCullagh and Nelder (1989) define the signed
square root of the individual contribution to the G 2 statistic as the deviance
residual. This statistic for the ith cell is
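$$r_i^D = \operatorname{sign}(n_i - \hat{m}_i) \sqrt{d_i}$$

where $d_i = 2 G_i^2$ is the contribution of the $i$th cell to $G^2$.

When all $z_i > 0$, $\sum_{i=1}^{r} (r_i^D)^2 = G^2$.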
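The simple, standardized, and deviance residuals can be computed together, as in the sketch below. The function name is hypothetical and `None` stands in for SYSMIS. Returning `None` for the deviance residual when $z_i \le 0$ or $\hat{m}_i = 0$ is an assumption made by analogy with the other residuals.

```python
import math

def residuals(n, m_hat, z):
    """Return (simple, standardized, deviance) lists; None stands in for SYSMIS."""
    simple, pearson, deviance = [], [], []
    for n_i, m_i, z_i in zip(n, m_hat, z):
        if z_i <= 0:
            simple.append(None)
            pearson.append(None)
            deviance.append(None)
            continue
        diff = n_i - m_i
        simple.append(diff)
        if diff == 0:
            pearson.append(0.0)
            deviance.append(0.0)
        elif m_i > 0:
            pearson.append(diff / math.sqrt(m_i))
            g_i = n_i * math.log(n_i / m_i) - diff if n_i > 0 else m_i
            d_i = 2.0 * max(g_i, 0.0)  # guard against tiny negative rounding error
            deviance.append(math.copysign(math.sqrt(d_i), diff))
        else:
            pearson.append(None)
            deviance.append(None)
    return simple, pearson, deviance

# Hypothetical counts with fitted values equal to the two group means.
simple, pearson, deviance = residuals([10, 20, 15, 30], [12.5, 25.0, 12.5, 25.0], [1, 1, 1, 1])
print(sum(r * r for r in pearson), sum(r * r for r in deviance))
```

The two printed sums reproduce $X^2$ and $G^2$ for this data, which is the identity stated for the case where all $z_i > 0$.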
Generalized Residual
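Consider a linear combination of the cell counts $\sum_{i=1}^{r} d_i n_i$, where $d_i$ are real numbers.

The estimated expected value is

$$\sum_{i=1}^{r} d_i \hat{m}_i$$

The simple residual for this linear combination is

$$\sum_{i=1}^{r} d_i (n_i - \hat{m}_i)$$

The standardized residual for this linear combination is

$$\frac{\sum_{i=1}^{r} d_i (n_i - \hat{m}_i)}{\sqrt{\sum_{i=1}^{r} d_i^2 \hat{m}_i}}$$

Using the results in Christensen (1990, p. 227), the adjusted residual for this linear combination is

$$\frac{\sum_{i=1}^{r} d_i (n_i - \hat{m}_i)}{\sqrt{V}}$$

where

$$V = \sum_{i=1}^{r} \sum_{j=1}^{r} d_i d_j \hat{m}_i (\delta_{ij} - a_{ij}) = \sum_{i=1}^{r} d_i^2 \hat{m}_i - \sum_{i=1}^{r} \sum_{j=1}^{r} d_i d_j a_{ij} \hat{m}_i$$

with $\delta_{ij} = 1$ if $i = j$ and 0 otherwise, and where

$$a_{ij} = \hat{m}_j \left( h^{00} + \sum_{k=1}^{p} (x_{ik} + x_{jk}) h^{k0} + \sum_{k=1}^{p} \sum_{l=1}^{p} x_{ik} x_{jl} h^{kl} \right)$$

$h^{kl}$ is the $(k,l)$th element of $H^{-1}(\hat{\beta})$.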
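The adjusted residual for a linear combination can be sketched as follows. The code takes $a_{ij} = \hat{m}_j\, x_i' H^{-1}(\hat{\beta})\, x_j$ with $x_i = (1, x_{i1}, \ldots, x_{ip})'$, the form under which $a_{ij}$ reduces to $a_{ii}$ for $i = j$ and $V$ is symmetric in $i$ and $j$. That reading, the function name, and the data are assumptions of this sketch, and all cells are assumed to have $z_i > 0$.

```python
import math

def generalized_adjusted_residual(d, n, m_hat, x, h_inv):
    """Return (adjusted residual, V) for the combination sum(d_i * n_i)."""
    rows = [[1.0] + list(x_i) for x_i in x]       # x_i0 = 1 for the intercept
    dim = len(h_inv)

    def quad(i, j):
        """x_i' H^{-1} x_j."""
        return sum(rows[i][k] * rows[j][l] * h_inv[k][l]
                   for k in range(dim) for l in range(dim))

    r = len(n)
    v = sum(d[i] ** 2 * m_hat[i] for i in range(r))
    v -= sum(d[i] * d[j] * m_hat[i] * m_hat[j] * quad(i, j)
             for i in range(r) for j in range(r))
    resid = sum(d_i * (n_i - m_i) for d_i, n_i, m_i in zip(d, n, m_hat))
    return resid / math.sqrt(v), v

# Hypothetical data; h_inv is the inverse of [[75, 50], [50, 50]].
h_inv = [[0.04, -0.04], [-0.04, 0.06]]
adj, v = generalized_adjusted_residual(
    [1, -1, 0, 0], [10, 20, 15, 30], [12.5, 25.0, 12.5, 25.0],
    [[0], [1], [0], [1]], h_inv)
print(adj, v)
```

In this example $V = 18.75$, the sum of the residual variances 6.25 and 12.5 of the two cells involved, which belong to different groups.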
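Generalized Log-Odds Ratio
Consider a linear combination of the natural logarithms of the expected cell counts

$$\sum_{i=1}^{r} d_i \log(m_i) \tag{9}$$

where the $d_i$ are real numbers satisfying

$$\sum_{i=1}^{r} d_i = 0$$

Because the $d_i$ sum to zero, the intercept $\beta_0$ drops out, and the linear combination is estimated by

$$\sum_{i=1}^{r} d_i \log(\hat{m}_i) = \sum_{i=1}^{r} d_i \log(z_i) + \sum_{i=1}^{r} \sum_{k=1}^{p} d_i x_{ik} \hat{\beta}_k$$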
The variance is

$$\operatorname{var}\left( \sum_{i=1}^{r} d_i \log(\hat{m}_i) \right) = \sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl} \tag{10}$$

where

$$w_k = \sum_{i=1}^{r} d_i x_{ik}, \qquad k = 1, \ldots, p$$
Wald Statistic
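The null hypothesis is

$$H_0: \sum_{i=1}^{r} d_i \log(m_i) = 0$$

The Wald statistic is

$$W = \frac{\left( \sum_{i=1}^{r} d_i \log(\hat{m}_i) \right)^2}{\sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl}}$$

The $100(1 - \alpha)\%$ confidence interval for the linear combination is

$$\sum_{i=1}^{r} d_i \log(\hat{m}_i) \pm z_{\alpha/2} \sqrt{\sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl}}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution. The default value of $\alpha$ is 0.05.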
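The estimate, its variance from (10), the Wald statistic, and the confidence interval can be computed together. In this sketch the function name and data are hypothetical, and $H^{-1}(\hat{\beta})$ is passed in with the intercept in row and column 0, so $h^{kl}$ for $k, l \ge 1$ is read from `h_inv[k][l]`.

```python
import math
from statistics import NormalDist

def glor_inference(d, m_hat, x, h_inv, alpha=0.05):
    """Return (estimate, variance, Wald statistic, confidence interval)."""
    # Cells with d_i = 0 are skipped so that a fitted count of 0 there is harmless.
    estimate = sum(d_i * math.log(m_i) for d_i, m_i in zip(d, m_hat) if d_i != 0)
    p = len(x[0])
    w = [sum(d_i * x_i[k] for d_i, x_i in zip(d, x)) for k in range(p)]
    # w[k] here is w_{k+1} in the text, hence the index shift into h_inv.
    var = sum(w[k] * w[l] * h_inv[k + 1][l + 1] for k in range(p) for l in range(p))
    wald = estimate ** 2 / var
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(var)
    return estimate, var, wald, (estimate - half, estimate + half)

# Hypothetical contrast between the first two cells; coefficients sum to zero.
h_inv = [[0.04, -0.04], [-0.04, 0.06]]
est, var, wald, (lo, hi) = glor_inference(
    [-1, 1, 0, 0], [12.5, 25.0, 12.5, 25.0], [[0], [1], [0], [1]], h_inv)
print(est, var, wald, lo, hi)
```

With this contrast the estimate equals $\hat{\beta}_1 = \log 2$ and the variance is $h^{11} = 0.06$.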
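The following formulas aggregate case-level values, indexed by $s$, to the cell level. In these formulas,

$$n_{is}^{+} = \begin{cases} n_{is} & \text{if } n_{is} > 0 \text{ and } z_{is} > 0 \\ 0 & \text{if } n_{is} \le 0 \text{ and } z_{is} > 0 \end{cases}$$

and $\sum^{*}_{1 \le s \le v_i}$ means summation over the range of $s$ with the terms $z_{is} > 0$.

The cell weight is

$$z_i = \begin{cases} \sum^{*}_{1 \le s \le v_i} n_{is}^{+} z_{is} / n_i & \text{if } n_i > 0 \text{ and } v_i^{+} > 0 \\ \sum^{*}_{1 \le s \le v_i} z_{is} / v_i^{+} & \text{if } n_i = 0 \text{ and } v_i^{+} > 0 \\ 0 & \text{if } v_i^{+} = 0 \\ 1 & \text{if } v_i = 0 \end{cases}$$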
If no variable is specified as the cell weight variable, then all cases have unit cell
weights by default.
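The cell covariate value is

$$x_{ij} = \begin{cases} \sum^{*}_{1 \le s \le v_i} n_{is}^{+} x_{is} / n_i & \text{if } n_i > 0 \text{ and } v_i > 0 \\ \sum^{*}_{1 \le s \le v_i} x_{is} / v_i^{+} & \text{if } n_i = 0 \text{ and } v_i^{+} > 0 \\ 0 & \text{if } v_i^{+} = 0 \text{ or } v_i = 0 \end{cases}$$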
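The cell GRESID coefficient is

$$c_i = \begin{cases} \sum^{*}_{1 \le s \le v_i} n_{is}^{+} c_{is} / n_i & \text{if } n_i > 0 \text{ and } v_i > 0 \\ \sum^{*}_{1 \le s \le v_i} c_{is} / v_i^{+} & \text{if } n_i = 0 \text{ and } v_i^{+} > 0 \\ 0 & \text{if } v_i^{+} = 0 \text{ or } v_i = 0 \end{cases}$$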
There are no defaults for the GRESID coefficients.
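The cell GLOR coefficient is

$$e_i = \begin{cases} \sum^{*}_{1 \le s \le v_i} n_{is}^{+} e_{is} / n_i & \text{if } n_i > 0 \text{ and } v_i > 0 \\ \sum^{*}_{1 \le s \le v_i} e_{is} / v_i^{+} & \text{if } n_i = 0 \text{ and } v_i^{+} > 0 \\ 0 & \text{if } v_i^{+} = 0 \text{ or } v_i = 0 \end{cases}$$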
There are no defaults for the GLOR coefficients.
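The covariate, GRESID, and GLOR aggregation formulas share one pattern: a weighted mean when the cell count is positive, a plain mean otherwise. The sketch below illustrates it under several assumptions that the text does not state: that $n_i = \sum^{*} n_{is}^{+}$, that $v_i^{+}$ is the number of cases in the cell with $z_{is} > 0$, and that $v_i$ is the total number of cases in the cell. The function name and sample values are hypothetical.

```python
def aggregate_cell_value(case_counts, case_structure, case_values):
    """Aggregate one cell's per-case values (covariate, GRESID, or GLOR coefficient)."""
    # Keep only cases with z_is > 0, truncating non-positive counts to 0 (n_is^+).
    kept = [(max(n_is, 0.0), val)
            for n_is, z_is, val in zip(case_counts, case_structure, case_values)
            if z_is > 0]
    n_i = sum(n_plus for n_plus, _ in kept)       # assumed cell count
    if n_i > 0:
        return sum(n_plus * val for n_plus, val in kept) / n_i
    if kept:                                      # n_i = 0 but v_i^+ > 0
        return sum(val for _, val in kept) / len(kept)
    return 0.0                                    # v_i^+ = 0 or v_i = 0

print(aggregate_cell_value([2, 3, 0], [1, 1, 1], [1.0, 2.0, 10.0]))
```

The third case has a count of 0, so its value of 10 receives no weight in the first branch. It would count equally with the others only if every case in the cell had a count of 0.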
References
Agresti, A. 1990. Categorical data analysis. New York: John Wiley & Sons, Inc.
McCullagh, P., and Nelder, J. A. 1989. Generalized linear models. 2nd ed. London:
Chapman and Hall.