03 Mart 4
Example:
A marketing survey on laundry detergents was conducted on a sample of 1008 customers.
Each customer was classified according to whether they preferred Brand X or Brand M,
whether they had previously used Brand M, and whether they washed at high or low
temperature. The data are summarized in the following three-way
contingency table:
               Previous Use    Brand Preference
Temperature    of M            X       M       Total
High           Yes             66      119     185
Low            Yes             141     156     297
High           No              104     80      184
Low            No              197     145     342
Total                          508     500     1008
Denote the variables in such a table by X, Y and Z. Notice that we display the distribution of
X − Y cell counts at different levels of Z using cross-sections of the three-way contingency
table. These cross-sections are called partial tables. In the partial tables, Z is controlled;
that is, its value is held constant. The two-way contingency table obtained by combining the
partial tables is called the X − Y marginal table. This table, rather than controlling Z, ignores
it. For the example, the following are two of the marginal tables:
               Brand Preference
Temperature    X       M       Total
High           170     199     369
Low            338     301     639
Total          508     500     1008
               Previous use of M
Temperature    Yes     No      Total
High           185     184     369
Low            297     342     639
Total          482     526     1008
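The relationship between partial and marginal tables can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper names `partial_table` and `marginal_table` are assumptions made here, not part of the original notes. A partial table fixes one variable at a level, while a marginal table sums over it.

```python
# Observed counts keyed by (brand, previous use of M, temperature),
# copied from the three-way table above.
counts = {
    ("X", "Yes", "High"): 66,  ("M", "Yes", "High"): 119,
    ("X", "Yes", "Low"): 141,  ("M", "Yes", "Low"): 156,
    ("X", "No", "High"): 104,  ("M", "No", "High"): 80,
    ("X", "No", "Low"): 197,   ("M", "No", "Low"): 145,
}

def partial_table(counts, axis, level):
    """Cross-section with the variable at position `axis` held at `level`."""
    return {
        tuple(v for a, v in enumerate(key) if a != axis): n
        for key, n in counts.items()
        if key[axis] == level
    }

def marginal_table(counts, axis):
    """Two-way table obtained by summing over (ignoring) the variable at `axis`."""
    table = {}
    for key, n in counts.items():
        reduced = tuple(v for a, v in enumerate(key) if a != axis)
        table[reduced] = table.get(reduced, 0) + n
    return table

# Brand x Temperature, ignoring previous use (position 1).
brand_by_temp = marginal_table(counts, axis=1)
# Previous use x Temperature, ignoring brand (position 0).
use_by_temp = marginal_table(counts, axis=0)
# Brand x Previous use for high-temperature washers only (position 2 fixed).
high_temp_partial = partial_table(counts, axis=2, level="High")

print(brand_by_temp[("X", "High")], brand_by_temp[("M", "High")])  # 170 199
print(use_by_temp[("Yes", "Low")], use_by_temp[("No", "Low")])     # 297 342
```

The printed cells agree with the two marginal tables shown above.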
To study the relationship between the variables, consider for each pair of variables,
estimates for the marginal odds ratio and the partial (conditional) odds ratio at each
level of the third variable. The marginal odds ratios describe the association when the
third variable is ignored. The partial odds ratios describe the association when the
third variable is controlled.
                                     VARIABLES
ASSOCIATION         Brand-PreviousUse    Brand-Temp    PreviousUse-Temp
Marginal            0.563                0.761         1.158
Partial  Level 1    0.427                0.614         0.887
         Level 2    0.665                0.957         1.383
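The entries of this table can be recomputed from the raw counts. The following is a minimal sketch, assuming the level ordering (X, M), (Yes, No), (High, Low) and taking "Level 1" to mean the first level of the controlled variable. The helper names `cell_sum` and `odds_ratios` are hypothetical.

```python
# Observed counts keyed by (brand, previous use of M, temperature).
counts = {
    ("X", "Yes", "High"): 66,  ("M", "Yes", "High"): 119,
    ("X", "Yes", "Low"): 141,  ("M", "Yes", "Low"): 156,
    ("X", "No", "High"): 104,  ("M", "No", "High"): 80,
    ("X", "No", "Low"): 197,   ("M", "No", "Low"): 145,
}
# Assumed ordering of the levels of each variable (first level listed first).
LEVELS = (("X", "M"), ("Yes", "No"), ("High", "Low"))

def cell_sum(counts, fixed):
    """Sum the counts over all cells matching `fixed`, a {position: level} dict."""
    return sum(n for key, n in counts.items()
               if all(key[pos] == lev for pos, lev in fixed.items()))

def odds_ratios(counts, a, b):
    """Marginal and partial odds ratios for the variables at positions a and b."""
    c = 3 - a - b                      # position of the third variable
    (a1, a2), (b1, b2) = LEVELS[a], LEVELS[b]

    def ratio(extra):
        n11 = cell_sum(counts, {a: a1, b: b1, **extra})
        n12 = cell_sum(counts, {a: a1, b: b2, **extra})
        n21 = cell_sum(counts, {a: a2, b: b1, **extra})
        n22 = cell_sum(counts, {a: a2, b: b2, **extra})
        return (n11 * n22) / (n12 * n21)

    marginal = ratio({})                               # third variable ignored
    partial = [ratio({c: lev}) for lev in LEVELS[c]]   # third variable held fixed
    return marginal, partial

for name, (a, b) in {"Brand-PreviousUse": (0, 1),
                     "Brand-Temp": (0, 2),
                     "PreviousUse-Temp": (1, 2)}.items():
    marginal, partial = odds_ratios(counts, a, b)
    print(name, round(marginal, 3), [round(p, 3) for p in partial])
```

For example, the marginal Brand-PreviousUse odds ratio is (207 × 225)/(301 × 275) ≈ 0.563, while the partial value at high temperature is (66 × 80)/(104 × 119) ≈ 0.427.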
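Let $\eta_{ijk} = \log m_{ijk}$, and let a dot in a subscript denote the average with respect to that
index, so that, for example,

\[
\eta_{\cdot jk} = \frac{1}{I}\sum_{i=1}^{I} \eta_{ijk}.
\]

Then

\[
\begin{aligned}
\mu &= \eta_{\cdot\cdot\cdot} \\
\lambda_i^{X} &= \eta_{i\cdot\cdot} - \eta_{\cdot\cdot\cdot} \\
\lambda_j^{Y} &= \eta_{\cdot j\cdot} - \eta_{\cdot\cdot\cdot} \\
\lambda_k^{Z} &= \eta_{\cdot\cdot k} - \eta_{\cdot\cdot\cdot} \\
\lambda_{ij}^{XY} &= \eta_{ij\cdot} - \eta_{i\cdot\cdot} - \eta_{\cdot j\cdot} + \eta_{\cdot\cdot\cdot} \\
\lambda_{ik}^{XZ} &= \eta_{i\cdot k} - \eta_{i\cdot\cdot} - \eta_{\cdot\cdot k} + \eta_{\cdot\cdot\cdot} \\
\lambda_{jk}^{YZ} &= \eta_{\cdot jk} - \eta_{\cdot j\cdot} - \eta_{\cdot\cdot k} + \eta_{\cdot\cdot\cdot} \\
\lambda_{ijk}^{XYZ} &= \eta_{ijk} - \eta_{ij\cdot} - \eta_{i\cdot k} - \eta_{\cdot jk}
  + \eta_{i\cdot\cdot} + \eta_{\cdot j\cdot} + \eta_{\cdot\cdot k} - \eta_{\cdot\cdot\cdot}
\end{aligned}
\]

with

\[
\sum_{i=1}^{I}\lambda_i^{X} = \sum_{j=1}^{J}\lambda_j^{Y} = \sum_{k=1}^{K}\lambda_k^{Z} = 0,
\]
\[
\sum_{i=1}^{I}\lambda_{ij}^{XY} = \sum_{j=1}^{J}\lambda_{ij}^{XY} = 0, \quad
\sum_{i=1}^{I}\lambda_{ik}^{XZ} = \sum_{k=1}^{K}\lambda_{ik}^{XZ} = 0, \quad
\sum_{j=1}^{J}\lambda_{jk}^{YZ} = \sum_{k=1}^{K}\lambda_{jk}^{YZ} = 0,
\]
\[
\sum_{i=1}^{I}\lambda_{ijk}^{XYZ} = \sum_{j=1}^{J}\lambda_{ijk}^{XYZ} = \sum_{k=1}^{K}\lambda_{ijk}^{XYZ} = 0.
\]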
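To make the averaging definitions concrete, the sketch below computes every parameter for the detergent data, taking the observed counts as the expected frequencies (which is what the saturated model does). The array layout and the helper `avg` are assumptions made for illustration. Index order is brand (X, M), previous use (Yes, No), temperature (High, Low).

```python
import math
from itertools import product

# n[i][j][k]: i = brand (X, M), j = previous use (Yes, No), k = temperature (High, Low)
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]
I, J, K = 2, 2, 2
eta = [[[math.log(n[i][j][k]) for k in range(K)] for j in range(J)] for i in range(I)]

def avg(i=None, j=None, k=None):
    """Average of eta over every index left as None (the 'dot' subscripts)."""
    cells = [eta[a][b][c]
             for a, b, c in product(range(I), range(J), range(K))
             if i in (None, a) and j in (None, b) and k in (None, c)]
    return sum(cells) / len(cells)

mu = avg()
lam_X = [avg(i=i) - mu for i in range(I)]
lam_Y = [avg(j=j) - mu for j in range(J)]
lam_Z = [avg(k=k) - mu for k in range(K)]
lam_XY = [[avg(i=i, j=j) - avg(i=i) - avg(j=j) + mu for j in range(J)] for i in range(I)]
lam_XZ = [[avg(i=i, k=k) - avg(i=i) - avg(k=k) + mu for k in range(K)] for i in range(I)]
lam_YZ = [[avg(j=j, k=k) - avg(j=j) - avg(k=k) + mu for k in range(K)] for j in range(J)]
lam_XYZ = [[[eta[i][j][k] - avg(i=i, j=j) - avg(i=i, k=k) - avg(j=j, k=k)
             + avg(i=i) + avg(j=j) + avg(k=k) - mu
             for k in range(K)] for j in range(J)] for i in range(I)]

# The parameters add back up to eta_ijk in every cell.
for i, j, k in product(range(I), range(J), range(K)):
    rebuilt = (mu + lam_X[i] + lam_Y[j] + lam_Z[k]
               + lam_XY[i][j] + lam_XZ[i][k] + lam_YZ[j][k] + lam_XYZ[i][j][k])
    assert math.isclose(rebuilt, eta[i][j][k])

print(abs(sum(lam_X)) < 1e-12, abs(sum(lam_XY[0])) < 1e-12)  # sum-to-zero checks
```

The sum-to-zero constraints are not imposed separately here; they follow from the averaging definitions.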
Setting certain parameters to zero in the general model above yields models such as the
following.
Note: Each model is assigned a symbol that lists the highest order term(s) for each
variable.
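Loglinear Model                                                                                                                                     Symbol
$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z$                                                                                      X, Y, Z
$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY}$                                                                  XY, Z
$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ}$                                              XY, XZ
$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$                          XY, XZ, YZ
$\log m_{ijk} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}$    XYZ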
The models above are called hierarchical models. This means that whenever the model
contains higher-order effects, it also incorporates lower-order effects composed from the
variables.
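The hierarchy rule means a model symbol determines the full list of terms: every non-empty subset of each listed highest-order term is included. A small sketch of that expansion follows. The function name `expand` and the string format of the symbol are assumptions made for illustration.

```python
from itertools import combinations

def expand(symbol):
    """Terms implied by a hierarchical model symbol such as 'XY, XZ'."""
    terms = set()
    for generator in symbol.replace(" ", "").split(","):
        # Every non-empty subset of a highest-order term is in the model.
        for size in range(1, len(generator) + 1):
            for subset in combinations(generator, size):
                terms.add("".join(subset))
    return sorted(terms, key=lambda t: (len(t), t))

print(expand("XY, Z"))   # ['X', 'Y', 'Z', 'XY']
print(expand("XY, XZ"))  # ['X', 'Y', 'Z', 'XY', 'XZ']
print(expand("XYZ"))     # ['X', 'Y', 'Z', 'XY', 'XZ', 'YZ', 'XYZ']
```

The constant term μ is always present and is not listed by this helper.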
To interpret loglinear models, we describe their marginal and partial associations using odds
ratios.
The X − Y marginal table consisting of $\pi_{ij+}$ uses a set of $(I-1)(J-1)$ odds ratios such as

\[
\theta_{ij}^{XY} = \frac{\pi_{ij+}\,\pi_{i+1,j+1,+}}{\pi_{i+1,j,+}\,\pi_{i,j+1,+}}
\quad \text{for } 1 \le i \le I-1,\ 1 \le j \le J-1.
\]

Analogous sets of odds ratios can be defined for the X − Z and Y − Z marginal tables.
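As a sketch of this definition, the function below returns the (I − 1)(J − 1) odds ratios formed from adjacent rows and columns of a two-way table. It works equally on counts or proportions because the normalizing constant cancels. The name `local_odds_ratios` and the 3 × 3 table are hypothetical illustrations, while the 2 × 2 table is the Brand by Temperature marginal table from the example.

```python
def local_odds_ratios(table):
    """(I-1) x (J-1) odds ratios from adjacent rows and columns of a two-way table."""
    rows, cols = len(table), len(table[0])
    return [[(table[i][j] * table[i + 1][j + 1]) / (table[i + 1][j] * table[i][j + 1])
             for j in range(cols - 1)]
            for i in range(rows - 1)]

# Brand (X, M) by Temperature (High, Low) marginal counts from the example.
brand_by_temp = [[170, 338],
                 [199, 301]]
print(round(local_odds_ratios(brand_by_temp)[0][0], 3))  # 0.761

# Hypothetical 3 x 3 table, only to show that (3-1)(3-1) = 4 ratios result.
made_up = [[10, 20, 30],
           [20, 20, 20],
           [30, 20, 10]]
ratios = local_odds_ratios(made_up)
print(len(ratios), len(ratios[0]))  # 2 2
```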
Loglinear model parameters are functions of the conditional odds ratios. For example, for a
2×2×2 table under the general model XYZ above, each parameter can be written in terms of the conditional odds ratios.
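One relationship of this kind can be sketched as follows. It is offered as an illustration derived from the sum-to-zero constraints, not as the notes' own display, and the notation $\theta_{11(k)}$ for the conditional X − Y odds ratio at level $k$ of Z is introduced here. In a 2×2×2 table the constraints force every three-factor term to equal $\pm\lambda_{111}^{XYZ}$, and all lower-order terms cancel in the signed sum of the $\eta_{ijk}$:

```latex
\[
\begin{aligned}
\theta_{11(k)} &= \frac{m_{11k}\,m_{22k}}{m_{12k}\,m_{21k}}, \qquad k = 1, 2,\\
\log\theta_{11(1)} - \log\theta_{11(2)}
  &= \sum_{i,j,k} (-1)^{i+j+k+1}\,\eta_{ijk} = 8\,\lambda_{111}^{XYZ},\\
\lambda_{111}^{XYZ} &= \frac{1}{8}\,\log\frac{\theta_{11(1)}}{\theta_{11(2)}}.
\end{aligned}
\]
```

Under this identity the three-factor term vanishes exactly when the two conditional odds ratios are equal.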
To obtain estimates of expected frequencies m ijk when the data are based on a multinomial
sample, we only have to get the estimates under the assumption that the data are based on
the Poisson distribution, regardless of the model. (Remember that μ is not relevant for a
multinomial sample). Under this assumption, the distribution of the data would be
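\[
\prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K} \frac{e^{-m_{ijk}}\, m_{ijk}^{\,n_{ijk}}}{n_{ijk}!}
\]

which has kernel

\[
\prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K} e^{-m_{ijk}}\, m_{ijk}^{\,n_{ijk}}.
\]

Taking logarithms, the kernel of the log likelihood is

\[
L(m) = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} n_{ijk}\log m_{ijk} - \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} m_{ijk}.
\]

Substituting the general model XYZ for $\log m_{ijk}$, we get

\[
\begin{aligned}
L(m) ={}& n\mu + \sum_{i=1}^{I} n_{i++}\lambda_i^{X} + \sum_{j=1}^{J} n_{+j+}\lambda_j^{Y} + \sum_{k=1}^{K} n_{++k}\lambda_k^{Z} \\
&+ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij+}\lambda_{ij}^{XY} + \sum_{i=1}^{I}\sum_{k=1}^{K} n_{i+k}\lambda_{ik}^{XZ} + \sum_{j=1}^{J}\sum_{k=1}^{K} n_{+jk}\lambda_{jk}^{YZ}
 + \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} n_{ijk}\lambda_{ijk}^{XYZ} \\
&- \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \exp\!\left(\mu + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}\right).
\end{aligned}
\]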
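The expansion can be checked numerically. The sketch below uses the simpler model (X, Y, Z), where only the main-effect sums remain, and compares the two forms of the kernel. The parameter values are arbitrary and hypothetical, because the identity holds for any values.

```python
import math
from itertools import product

# n[i][j][k]: brand (X, M), previous use (Yes, No), temperature (High, Low)
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]
idx = range(2)
cells = list(product(idx, repeat=3))

def kernel(m):
    """L(m) = sum of n_ijk * log(m_ijk) minus sum of m_ijk."""
    return sum(n[i][j][k] * math.log(m[i][j][k]) - m[i][j][k] for i, j, k in cells)

# Hypothetical parameter values for model (X, Y, Z), chosen only for the check.
mu = 4.8
lam_X, lam_Y, lam_Z = [0.1, -0.1], [-0.05, 0.05], [-0.3, 0.3]
m = [[[math.exp(mu + lam_X[i] + lam_Y[j] + lam_Z[k]) for k in idx]
      for j in idx] for i in idx]

# The data enter the expanded form only through the total and the margins.
total = sum(n[i][j][k] for i, j, k in cells)
n_X = [sum(n[i][j][k] for j in idx for k in idx) for i in idx]  # n_{i++}
n_Y = [sum(n[i][j][k] for i in idx for k in idx) for j in idx]  # n_{+j+}
n_Z = [sum(n[i][j][k] for i in idx for j in idx) for k in idx]  # n_{++k}

expanded = (total * mu
            + sum(n_X[i] * lam_X[i] for i in idx)
            + sum(n_Y[j] * lam_Y[j] for j in idx)
            + sum(n_Z[k] * lam_Z[k] for k in idx)
            - sum(m[i][j][k] for i, j, k in cells))

print(math.isclose(kernel(m), expanded))  # True
```

The check illustrates why the marginal totals are the sufficient statistics: the cell counts appear only through the margins that multiply each parameter.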
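The following table presents the derivatives of $L(m)$ to be taken and the resulting likelihood
equations.

DERIVATIVE                                      LIKELIHOOD EQUATION
$\partial L/\partial \mu$                       not relevant for multinomial sample
$\partial L/\partial \lambda_i^{X}$             $m_{i++} = n_{i++}$
$\partial L/\partial \lambda_j^{Y}$             $m_{+j+} = n_{+j+}$
$\partial L/\partial \lambda_k^{Z}$             $m_{++k} = n_{++k}$
$\partial L/\partial \lambda_{ij}^{XY}$         $m_{ij+} = n_{ij+}$
$\partial L/\partial \lambda_{ik}^{XZ}$         $m_{i+k} = n_{i+k}$
$\partial L/\partial \lambda_{jk}^{YZ}$         $m_{+jk} = n_{+jk}$
$\partial L/\partial \lambda_{ijk}^{XYZ}$       $m_{ijk} = n_{ijk}$

Note that the likelihood equation $m_{ijk} = n_{ijk}$ associated with $\partial L/\partial \lambda_{ijk}^{XYZ}$ implies that the other
likelihood equations hold.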
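For a model with fewer terms, only the equations for the terms it contains apply. As an illustration, the sketch below uses the standard closed-form fit for the mutual independence model (X, Y, Z), $m_{ijk} = n_{i++} n_{+j+} n_{++k} / n^2$. This formula is a well-known result that is not stated in the notes above. The code checks that the one-way margins are reproduced while a two-way margin is not.

```python
import math

# n[i][j][k]: brand (X, M), previous use (Yes, No), temperature (High, Low)
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]
idx = range(2)
total = sum(n[i][j][k] for i in idx for j in idx for k in idx)

n_X = [sum(n[i][j][k] for j in idx for k in idx) for i in idx]  # n_{i++}
n_Y = [sum(n[i][j][k] for i in idx for k in idx) for j in idx]  # n_{+j+}
n_Z = [sum(n[i][j][k] for i in idx for j in idx) for k in idx]  # n_{++k}

# Closed-form fitted values for model (X, Y, Z) (standard result, assumed here).
m = [[[n_X[i] * n_Y[j] * n_Z[k] / total**2 for k in idx] for j in idx] for i in idx]

# Likelihood equations for the terms in the model: one-way margins match.
m_X = [sum(m[i][j][k] for j in idx for k in idx) for i in idx]
m_Y = [sum(m[i][j][k] for i in idx for k in idx) for j in idx]
m_Z = [sum(m[i][j][k] for i in idx for j in idx) for k in idx]
print(all(math.isclose(a, b) for a, b in zip(m_X + m_Y + m_Z, n_X + n_Y + n_Z)))  # True

# The XY term is absent, so m_{11+} need not equal n_{11+} = 207.
m_11 = sum(m[0][0][k] for k in idx)
n_11 = sum(n[0][0][k] for k in idx)
print(n_11, round(m_11, 1))
```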
Similar to the saturated model for a two-way table, the general model XYZ for a three-way
table will fit the data perfectly since $m_{ijk} = n_{ijk}$. Thus $X^2$ and $G^2$ will both be zero. For the
model XYZ, we also have that

\[
\begin{aligned}
df = IJK - \big[\,&1 + (I-1) + (J-1) + (K-1) + (I-1)(J-1) + (I-1)(K-1) \\
&+ (J-1)(K-1) + (I-1)(J-1)(K-1)\,\big] = 0.
\end{aligned}
\]
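The same counting gives the residual degrees of freedom of any hierarchical model: the number of cells minus the number of independent parameters, where a term contributes the product of (levels − 1) over its variables. A minimal sketch, with the function name `residual_df` and the string form of the model symbol assumed for illustration:

```python
from itertools import combinations
from math import prod

def residual_df(symbol, levels):
    """Cells minus independent parameters for a hierarchical model symbol."""
    generators = symbol.replace(" ", "").split(",")
    # All terms implied by the hierarchy: non-empty subsets of each generator.
    terms = {frozenset(subset)
             for g in generators
             for size in range(1, len(g) + 1)
             for subset in combinations(g, size)}
    # 1 for mu, plus prod(levels - 1) independent parameters per term.
    n_params = 1 + sum(prod(levels[v] - 1 for v in term) for term in terms)
    return prod(levels.values()) - n_params

levels = {"X": 2, "Y": 2, "Z": 2}   # the 2x2x2 detergent table
for symbol in ["X, Y, Z", "XY, Z", "XY, XZ", "XY, XZ, YZ", "XYZ"]:
    print(symbol, residual_df(symbol, levels))
```

For the 2×2×2 table this gives 4, 3, 2, 1 and 0 degrees of freedom for the five models listed earlier, with 0 for the saturated model XYZ as derived above.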
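/* Detergent survey data: one record per cell of the 2x2x2 table. */
options pagesize=max;
data market;
input brand $ prevuse $ temp $ count;
cards;
x yes high 66
x yes low 141
x no high 104
x no low 197
m yes high 119
m yes low 156
m no high 80
m no low 145
;
/* Saturated loglinear model (XYZ): all main effects, all two-factor
   terms, and the three-factor term. COUNT holds the cell frequencies. */
proc catmod order=data;
model brand*prevuse*temp=_response_ / covb pred=freq;
loglin brand prevuse temp brand*prevuse brand*temp prevuse*temp brand*prevuse*temp;
weight count;
run;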