Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

GENLOG

Poisson Loglinear Model

This chapter describes the algorithm to calculate maximum-likelihood estimates for


the Poisson loglinear model. This algorithm is applicable only to aggregated data.
See “Aggregated Data (Poisson)” on p. 16 for producing aggregated data.

Notation
The following notation is used throughout this chapter unless otherwise stated:

B Generic categorical dependent (response) variable. Its categories are


indexed by an array of integers.
r Number of categories of B.
p Number of nonredundant (nonaliased) parameters.
i Generic index for the category of B.
k Generic index for the parameter.
ni Observed count in the ith response of B.
r
N Total observed count, equal to ∑n . i
i =1
mi Expected count.
zi Cell structure value.
βk The kth nonredundant parameter.
β
3 8
Vector of β 0 , β 1, K, β p ′ .

xik An element in the ith row and the kth column of the design matrix.

• Because of the Poisson distribution assumptions, the logit model is not


applicable for a Poisson distribution.

• The Poisson distribution is available in GENLOG only.

1
2 GENLOG Poisson Loglinear Model

Components of the Model


There are two components in a loglinear model: the random component and the
systematic component.

Random Component
The random component describes the joint distribution of the counts.
• ;@
The count ni has a Poisson distribution with parameter mi .

• The counts ni and ni′ are independent if i ≠ i ′ .

• ;@
The joint probability distribution of ni is the product of these r independent
Poisson distributions. The probability density function is

mi i e − mi
r n

∏ ni !
i =1

• The expected count is E ni = mi . 1 6


• The covariance is

1
cov ni , ni′ = 6 %&'0mi if i = i ′
if i ≠ i ′

Systematic Component
The systematic component describes the linkage function between the expected
counts and the parameters. The expected counts are themselves functions of
parameters. For i = 1,K, r ,

%Kz eβ 0 + vi
mi = &K0 i if zi > 0
' if zi ≤ 0
GENLOG Poisson Loglinear Model 3

where

p
vi = ∑x ik β k (1)
k =1

Since there are no constraints on the observed counts, β 0 is a free parameter in a


Poisson loglinear model.

Cell Structure Values


Cell structure values play two roles in SPSS loglinear procedures, depending on
their signs. If zi > 0 , it is a usual weight for the corresponding cell and log zi is 1 6
sometimes called the offset. If zi ≤ 0 , a structural zero is imposed on the cell
( B = i ). Contingency tables containing at least one structural zero are called
incomplete tables. If ni = 0 but zi > 0 , the cell ( B = i ) contains a sampling zero.
Although SPSS still considers a structural zero part of the contingency table, it is
not used in fitting the model. Cellwise statistics are not computed for structural
zeros.

Maximum-Likelihood Estimation
The multinomial log-likelihood is

r
16 3 8
L β = L β 0 ,K, β p = constant + ∑ 2n log1m 6 − m 7
i i i (2)
i =1

Likelihood Equations
It can be shown that

∑ 1n − m 6
∂L
= i i
∂β 0
i =1

∑ 1n − m 6 x
∂L
= i i ik k = 1,K, p
∂β k
i =1
4 GENLOG Poisson Loglinear Model

16 3 16 1 68 0 5
Let g β = g0 β ,K, g p β ′ be the p + 1 × 1 gradient vector with

16
gk β =
∂L
∂β k

4 ′
9
The maximum-likelihood estimates β$ = β$ 0 ,K, β$ p are regarded as a solution to
the vector of likelihood equations:

16
g β =0 (3)

Hessian Matrix

The likelihood equations are nonlinear functions of β. Solving them for β$ requires
an iterative method. The Newton-Raphson method is used. It can be shown that

∂2L
r

∂ 2β 0 ∑
= − mi
i =1

∂2L
r

∂β 0 ∂β 1 ∑
= − mi x il
i =1

∂ 2L
r

∂β k ∂β 0 i =1

= − mi xik

∂2L
r

∂β k ∂β l ∑
= − mi x ik x il
i =1

16 0 5 0 5 16
Let H β be the p + 1 × p + 1 information matrix, where −H β is the Hessian
matrix of (2). The elements of H1 β 6 are

16
hkl β =
∂2L
∂β k ∂β l
k = 0,K, p and l = 1,K, p (4)
GENLOG Poisson Loglinear Model 5

16
Note: H β is a symmetric positive definite matrix. The asymptotic covariance
matrix of β$ is estimated by H −1 β . 16
Newton-Raphson Method
05
Let β s denote the sth approximation for the solution to (3). By the Newton-
Raphson method,

β 0 s +15 = β 0 s 5 + H −1 β 0s 5 g β 0s 5
4 94 9
16 16 16
Define q β = H β β + g β . The kth element of q β is 16
r
1 6 ∑η x
qk β = i ik (5)
i =1

where

ηi =
%&m v + 1n − m 6
i i i i if zi > 0 and m i > 0
'0 otherwise

Then

H β 0 s5 β 0 s +15 = q β 0 s5
4 9 4 9 (6)

05 0 5 0 5
Thus, given β s , the s + 1 th approximation β s+1 is found by solving the system
of equations in (6).

Initial Values
05
SPSS uses the β 0 , which corresponds to a saturated model as the initial value for
β . Then the initial estimates for the expected cell counts are

005 = %&ni + ∆
if zi > 0
mi
0 ' if zi ≤ 0
(7)

where ∆ ≥ 0 is a constant.
6 GENLOG Poisson Loglinear Model

Note: For saturated models, SPSS adds ∆ to ni if zi > 0 . This is done to avoid
numerical problems in the case that some observed counts are 0. We advise users to
set ∆ to 0 whenever all observed counts (other than structural zeros) are positive.
The initial values for η i are

η 0i
0 5 = %K&mi log4mi005 / zi 9 + 4ni − mi005 9 if zi > 0 and mi
005 > 0
K'0 otherwise
(8)

Stopping Criteria
SPSS checks the following conditions for convergence:

1.  0
max i mi
s +15 − m 0s5 / m 0s5  < ε
i i 
0s5 > 0
provided that mi

max  m
0 5 − m 0s5  < ε
 
s +1
2. i i i

 ∑ p
gk2 β$ 4 9 / 0 p + 15 < ε
3.
 k =0

The iteration is said to be converged if either conditions 1 and 3 or conditions 2


and 3 are satisfied. The iteration is said to be not converged if neither pair of
conditions is satisfied within the maximum number of iterations.

Algorithm
The iteration process uses the following steps:
1. Calculate mi
005 using (7) and n005 using (8).
i
2. Set s = 0 .

3. Calculate H β s 4 0 5 9 using (4) evaluated at mi0s5 , and q4β 0 5 9 using (5) evaluated
s

0s5
at η i = η i .

4. 0 5
Solve for β s+1 using (6).
0 s +15 0s +15 and

p
5. Calculate vi = xik β k
k =1
GENLOG Poisson Loglinear Model 7

0s +15 = %K&zi e β
0 s +15 + v0 s +15
0 i
if zi > 0
K'0
m i
if zi ≤ 0

6. Check whether the stopping criteria are satisfied. If yes, iteration stops and the
process is declared converged. Otherwise continue.
7. Increase s by 1 and check whether the maximum iteration has been reached. If
yes, iteration stops and the process is declared not converged. Otherwise,
repeat steps 3-7.

Estimated Cell Counts


The estimated expected count is

%Kβ 0 + vi$

&K
$
m$ i = zi e if zi > 0
0 ' if zi ≤ 0

where

p
v$i = ∑x ik β k
$
k =1

Goodness-of-Fit Statistics
The Pearson chi-square statistic is

r
X2 = ∑X 2
i
i =1

where

%K1ni − m$ i 62 / m$ i if zi > 0, ni > 0, and m$ i > 0


Xi2 = &SYSMIS if zi > 0, ni > 0, and m$ i = 0
KK'0 if zi ≤ 0 or ni = m$ i
8 GENLOG Poisson Loglinear Model

If any Xi2 is system missing, then X 2 is also system missing.


The likelihood-ratio chi-square statistic is

r
G2 = 2 ∑G 2
i
i =1

where

%Kn 2log1n / m$ 67 − 1n − m$ 6 if zi > 0, ni > 0, and m$ i > 0


KSYSMIS
i i i i i

=&
if zi > 0, ni > 0, and m$ i = 0
Gi2
KKm$ i if zi > 0, ni = 0, and m$ i > 0
'0 if zi ≤ 0 or ni = m$ i

If any Gi2 is system missing, then G 2 is also system missing.

Degrees of Freedom
The degrees of freedom for each statistic is defined as a = r − 1 − p − E , where E is
the number of cells with zi ≤ 0 or m$ i = 0 .

Significance Level

The significance level (or the p value) for the Pearson chi-square statistic is

4
Prob χ 2a > X 2 9 and that for the likelihood-ratio chi-square statistic is
Prob4 χ 2
a >G 2
9 . In both cases, χ 2a is the central chi-square distribution with a
degrees of freedom.
GENLOG Poisson Loglinear Model 9

Residuals
Goodness-of-fit statistics provide only broad summaries of how models fit data.
The pattern of lack of fit is revealed in cell-by-cell comparisons of observed and
fitted cell counts.

Simple Residuals
The simple residual of the ith cell is

ri =
%&n − m$
i i if zi > 0
'SYSMIS if zi ≤ 0

Standardized Residuals
The standardized residual for the ith cell is

%K1n − m$ 6 /
i i m$ i if zi > 0 and 0 < m$ i
riS = &0 if zi > 0 and ni = m$ i
K'SYSMIS otherwise

∑ 4r 9
r S 2
Notice that i = X 2 when all zi > 0 . Hence, standardized residuals are
i =1
also known as Pearson residuals. Although the standardized residuals are

asymptotically normal, their asymptotic variances are less than 1.

Adjusted Residuals
The adjusted residual is the simple residual divided by its estimated standard
error. This statistic for the ith cell is
10 GENLOG Poisson Loglinear Model

%K1n − m$ 6 / m$ 11 − a 6
i i i ii if zi > 0, ni ≠ m$ i , and m$ i > 0
A
= &0 if zi > 0 and ni = m$ i
KKSYSMIS
ri

' otherwise

where

 p p p 

aii = m$ i h 00 + 2 ∑ xik h k 0 + ∑∑ xik xil h kl 
 k =1 k =1 l =1 
49
h kl is the (k,l)th element of H −1 β$ . The adjusted residuals are asymptotically
standard normal.

Deviance Residuals
Pierce and Schafer (1986) and McCullagh and Nelder (1989) define the signed
square root of the individual contribution to the G 2 statistic as the deviance
residual. This statistic for the ith cell is

1
riD = sign ni − m$ i 6 di

where

%K24n 2log1n / m$ 67 − 1n − m$ 69 if zi > 0, m$ i > 0, and ni > 0


K2m$
i i i i i

=& i if zi > 0, m$ i ≥ 0, and ni = 0


KK0
di
if zi > 0 and ni = m$ i
'SYSMIS otherwise

∑ 4r 9
r D 2
When all zi > 0 , i = G2 .
i =1
GENLOG Poisson Loglinear Model 11

Generalized Residual

r
Consider a linear combination of the cell counts di ni , where di are real
i =1
numbers.
The estimated expected value is

∑ d m$ i i
i =1

The simple residual for this linear combination is

∑ d 1n − m$ 6
i i i
i =1

The standardized residual for this linear combination is

∑ d 1n − m$ 6
r
i i i
i =1

∑ d m$
r 2
i i
i =1

Using the results in Christensen (1990, p. 227), the adjusted residual for this linear
combination is


r
i =1
1
di ni − m$ i 6
V

where
12 GENLOG Poisson Loglinear Model

r r
V= ∑ ∑ d d m$ 3δ i j i ij − aij 8
i =1 j =1
r r r
= ∑ di2 m$ i − ∑ ∑ d d a m$ i j ij i
i =1 i =1 j =1

where

 p p p 

aij = m$ i h 00 + ∑ 3x 8
+ x jk h k 0 + ∑∑ xik x jl h kl 
 k =1
ik
k =1 l =1 
h kl is the (k,l)th element of H −1 β . 16

Generalized Log-Odds Ratio


Consider a linear combination of the natural logarithm of cell counts

∑ d log1m 6
i i (9)
i =1

where di are real numbers with the restriction

∑d i =0
i =1

The quantity (9) is estimated by

r r r p

∑ 1 6 ∑ d log1z 6 + ∑ ∑ d x
di log m$ i = j i
$
i ik β k
i =1 i =1 i =1 k =1
GENLOG Poisson Loglinear Model 13

The variance is

 r  p p
var  ∑ d log1m$ 6 = ∑ ∑ w w h
 
kl
i i k l (10)
i =1 k =1 l =1

where

r
wk = ∑d x i ik k = 1,K, p
i =1

Wald Statistic
The null hypothesis is

r
H0 : ∑ d log1m 6 = 0
i i
i =1

The Wald statistic is

 ∑ d log1m$ 6 r 2

W=
  i =1
i i

∑ ∑ w wh
p p kl
k l
k =1 l =1

Under H0 , W asymptotically distributes as a chi-square distribution with 1 degree


4
of freedom. The significance level is Prob χ 12 ≥ W . Note: W will be system 9
missing if (10) is 0.

Asymptotic Confidence Interval


0 5
The asymptotic 1 − α × 100 % confidence interval for (9) is
14 GENLOG Poisson Loglinear Model

r p p

∑ 1 6
di log m$ i ± zα / 2 ∑ ∑ wk wl h kl
i =1 k =1 l =1

where zα /2 is the upper α / 2 point of the standard normal distribution. The default
value of α is 0.05.

Aggregated Data (Poisson)


This section shows how data are aggregated for a Poisson distribution. The
following notation is used in this section:

vi The number of SPSS cases for B = i i = 1,K, r 0 5


nis 1
The sth SPSS caseweight for B = i s = 1,K, vi 6
xis Covariate
zis Cell weight
c is GRESID coefficient
eis GLOR coefficient
vi+ The number of positive zis (cell weights) for 1 ≤ s ≤ vi

The cell count is

%K∑* nis+ if vi+ > 0


ni = & 1≤ s ≤v
K'0 i
if vi = 0 or vi+ = 0

where

nis+ =
%&nis if nis > 0 and zis > 0
'0 if nis ≤ 0 and zis > 0


*
and means summation over the range of s with the terms zis > 0 .
1≤ s≤ vi

The cell weight value is


GENLOG Poisson Loglinear Model 15

%K∑ *
nis+ zis / nij if ni > 0 and vi+ > 0
KK 1≤ s ≤ vi

= &∑
*
zis / vi+ if ni = 0 and vi+ > 0
KK0
zi 1≤ s ≤ vi
if vi+ = 0
K'1 if vi = 0

If no variable is specified as the cell weight variable, then all cases have unit cell
weights by default.
The cell covariate value is
%K∑ *
nis+ xis / ni if ni > 0 and vi > 0
KK 1≤ s ≤ vi

= &∑
*
xis / vi+ if ni = 0 and vi+ > 0
KK0
xij
1≤ s ≤ vi
if vi+ = 0 or vi = 0
K'
The cell GRESID coefficient is

%K∑ *
nis+ cis / ni if ni > 0 and vi > 0
KK 1≤ s ≤ vi

c = &∑
*
cis / vi+ if ni = 0 and vi+ > 0
i
KK0 1≤ s ≤ vi
if vi+ = 0 or vi = 0
K'
There are no defaults for the GRESID coefficients.
The cell coefficient is

%K∑ *
nis+ eis / ni if ni > 0 and vi > 0
KK 1≤ s ≤ vi

e = &∑
*
eis / vi+ if ni = 0 and vi+ > 0
i
KK0 1≤ s ≤ vi
if vi+ = 0 or vi = 0
K'
There are no defaults for the GLOR coefficients.
16 GENLOG Poisson Loglinear Model

References
Agresti, A. 1990. Categorical data analysis. New York: John Wiley & Sons, Inc.

Christensen, R. 1990. Log-linear models. New York: Springer-Verlag.

Haberman, S. J. 1973. The analysis of residuals in cross-classified tables.


Biometrics, 29: 205–220.

McCullagh, P., and Nelder, J. A. 1989. Generalized linear models. 2nd ed. London:
Chapman and Hall.

Pierce, D. A., and Schafer, D. W. 1986. Residuals in generalized linear models.


Journal of the American Statistical Association, 81: 977–986.

You might also like