03 Mart 4


Three-Way Tables I

To study the relationship between a response variable and an explanatory variable, we
should control for covariates that can influence that relationship.

Example:
A marketing survey on laundry detergents was conducted on a sample of 1008 customers.
Each customer was classified according to whether they preferred Brand X or Brand M,
whether they had previously used Brand M, and whether they washed at high or low
temperature. The data are summarized in the following three-way contingency table:
                Previous Use    Brand Preference
Temperature     of M            X       M       Total
High            Yes             66      119     185
Low             Yes             141     156     297
High            No              104     80      184
Low             No              197     145     342
Total                           508     500     1008

Denote the variables in such a table by X, Y and Z. We can display the distribution of the
X − Y cell counts at each level of Z using cross-sections of the three-way contingency
table. These cross-sections are called partial tables. In the partial tables, Z is controlled;
that is, its value is held constant. The two-way contingency table obtained by adding the
partial tables cell by cell is called the X − Y marginal table. This table ignores Z rather
than controlling it. For the example, the marginal tables are:

Previous Use    Brand Preference
of M            X       M       Total
Yes             207     275     482
No              301     225     526
Total           508     500     1008

                Brand Preference
Temperature     X       M       Total
High            170     199     369
Low             338     301     639
Total           508     500     1008

                Previous Use of M
Temperature     Yes     No      Total
High            185     184     369
Low             297     342     639
Total           482     526     1008
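As an illustration (not part of the original notes), the marginal tables can be obtained by summing the partial tables over the omitted variable. This minimal Python sketch stores the counts as `n[b][u][t]`, an assumed index order (brand, previous use of M, temperature); the helper name `marginal` is hypothetical.

```python
# Detergent survey counts stored as n[b][u][t] (assumed index order):
#   b = brand preference (0 = X, 1 = M)
#   u = previous use of M (0 = yes, 1 = no)
#   t = temperature      (0 = high, 1 = low)
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]


def marginal(table, rows, cols):
    """Collapse a 2x2x2 table onto axes `rows` and `cols` (0, 1 or 2)
    by summing over the remaining axis, i.e. adding the partial tables."""
    out = [[0, 0], [0, 0]]
    for b in range(2):
        for u in range(2):
            for t in range(2):
                idx = (b, u, t)
                out[idx[rows]][idx[cols]] += table[b][u][t]
    return out


prevuse_by_brand = marginal(n, rows=1, cols=0)  # previous use x brand
temp_by_brand = marginal(n, rows=2, cols=0)     # temperature x brand
temp_by_prevuse = marginal(n, rows=2, cols=1)   # temperature x previous use

print(prevuse_by_brand)  # [[207, 275], [301, 225]]
print(temp_by_brand)     # [[170, 199], [338, 301]]
print(temp_by_prevuse)   # [[185, 184], [297, 342]]
```

The row totals, column totals and grand total of each result agree with the three marginal tables.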

To study the relationships among the variables, consider, for each pair of variables,
estimates of the marginal odds ratio and of the partial (conditional) odds ratio at each
level of the third variable. The marginal odds ratios describe the association when the
third variable is ignored; the partial odds ratios describe the association when the
third variable is controlled.
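                        VARIABLES
ASSOCIATION             Brand-PreviousUse   Brand-Temp   PreviousUse-Temp
Marginal                0.563               0.761        1.158
Partial, Level 1        0.427               0.614        0.887
Partial, Level 2        0.665               0.957        1.383

Levels 1 and 2 refer to the third variable of each pair: Temperature (High, Low) for
Brand-PreviousUse, Previous use of M (Yes, No) for Brand-Temp, and Brand (X, M) for
PreviousUse-Temp.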
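The entries of the odds-ratio table can be reproduced with a short Python sketch. It assumes the counts are stored as `n[b][u][t]` (brand, previous use of M, temperature, each coded 0/1 in the order of the data table). Python level index 0 corresponds to "Level 1" in the table. The helper names `two_way` and `odds_ratio` are illustrative, not from the notes.

```python
# Detergent counts as n[b][u][t] (assumed order):
#   b = brand (0 = X, 1 = M), u = previous use of M (0 = yes, 1 = no),
#   t = temperature (0 = high, 1 = low)
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]


def two_way(table, third, level=None):
    """2x2 table for the two axes other than `third`.

    level=None sums over `third` (marginal table); level=0 or 1
    keeps only that level of `third` (partial table).
    """
    out = [[0, 0], [0, 0]]
    for b in range(2):
        for u in range(2):
            for t in range(2):
                idx = (b, u, t)
                if level is not None and idx[third] != level:
                    continue
                r, c = (idx[a] for a in range(3) if a != third)
                out[r][c] += table[b][u][t]
    return out


def odds_ratio(tab):
    """Sample odds ratio n11 * n22 / (n12 * n21) of a 2x2 table."""
    return tab[0][0] * tab[1][1] / (tab[0][1] * tab[1][0])


# Each pair of variables and the axis of the third (controlled) variable.
pairs = {"Brand-PreviousUse": 2, "Brand-Temp": 1, "PreviousUse-Temp": 0}
results = {}
for name, third in pairs.items():
    marginal_or = odds_ratio(two_way(n, third))
    partial_or = [odds_ratio(two_way(n, third, lvl)) for lvl in (0, 1)]
    results[name] = (marginal_or, *partial_or)
    print(f"{name:18s} marginal {marginal_or:.3f}  "
          f"level 1 {partial_or[0]:.3f}  level 2 {partial_or[1]:.3f}")
```

Rounded to three decimals, the printed values match the table. For example, the Brand-PreviousUse marginal odds ratio is 207 · 225 / (275 · 301) ≈ 0.563.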

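The general loglinear model for a three-way table, where $m_{ijk}$ is the expected frequency in cell (i, j, k), is

$$\log m_{ijk} = \mu + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk} + \lambda^{XYZ}_{ijk}$$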

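Let $\eta_{ijk} = \log m_{ijk}$, and let a dot in a subscript denote the average over that index, so that, for example,

$$\eta_{\cdot jk} = \frac{1}{I}\sum_{i=1}^{I} \eta_{ijk}.$$

Then

$$\mu = \eta_{\cdot\cdot\cdot}$$
$$\lambda^X_i = \eta_{i\cdot\cdot} - \eta_{\cdot\cdot\cdot}$$
$$\lambda^Y_j = \eta_{\cdot j\cdot} - \eta_{\cdot\cdot\cdot}$$
$$\lambda^Z_k = \eta_{\cdot\cdot k} - \eta_{\cdot\cdot\cdot}$$
$$\lambda^{XY}_{ij} = \eta_{ij\cdot} - \eta_{i\cdot\cdot} - \eta_{\cdot j\cdot} + \eta_{\cdot\cdot\cdot}$$
$$\lambda^{XZ}_{ik} = \eta_{i\cdot k} - \eta_{i\cdot\cdot} - \eta_{\cdot\cdot k} + \eta_{\cdot\cdot\cdot}$$
$$\lambda^{YZ}_{jk} = \eta_{\cdot jk} - \eta_{\cdot j\cdot} - \eta_{\cdot\cdot k} + \eta_{\cdot\cdot\cdot}$$
$$\lambda^{XYZ}_{ijk} = \eta_{ijk} - \eta_{ij\cdot} - \eta_{i\cdot k} - \eta_{\cdot jk} + \eta_{i\cdot\cdot} + \eta_{\cdot j\cdot} + \eta_{\cdot\cdot k} - \eta_{\cdot\cdot\cdot}$$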

with the zero-sum constraints

$$\sum_{i=1}^{I} \lambda^X_i = \sum_{j=1}^{J} \lambda^Y_j = \sum_{k=1}^{K} \lambda^Z_k = 0,$$

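$$\sum_{i=1}^{I} \lambda^{XY}_{ij} = \sum_{j=1}^{J} \lambda^{XY}_{ij} = 0, \quad \sum_{i=1}^{I} \lambda^{XZ}_{ik} = \sum_{k=1}^{K} \lambda^{XZ}_{ik} = 0, \quad \sum_{j=1}^{J} \lambda^{YZ}_{jk} = \sum_{k=1}^{K} \lambda^{YZ}_{jk} = 0,$$
$$\sum_{i=1}^{I} \lambda^{XYZ}_{ijk} = \sum_{j=1}^{J} \lambda^{XYZ}_{ijk} = \sum_{k=1}^{K} \lambda^{XYZ}_{ijk} = 0.$$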
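The averaging formulas for the parameters can be checked numerically. The Python sketch below is illustrative: the names `eta`, `lam_X`, and so on are hypothetical, and the assignment X = brand, Y = previous use of M, Z = temperature is an assumption. Under the saturated model the fitted values equal the counts, so it takes $\eta_{ijk} = \log n_{ijk}$ for the detergent data. It then computes every λ term by averaging and confirms that the terms satisfy the zero-sum constraints and add back up to $\eta_{ijk}$.

```python
import math

# Detergent counts n[i][j][k]; assumed X = brand (X, M),
# Y = previous use of M (yes, no), Z = temperature (high, low).
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]
I, J, K = len(n), len(n[0]), len(n[0][0])
RI, RJ, RK = range(I), range(J), range(K)

# Saturated model: fitted m_ijk = n_ijk, so eta_ijk = log n_ijk.
eta = [[[math.log(n[i][j][k]) for k in RK] for j in RJ] for i in RI]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Dot averages: a dot replaces the index that has been averaged out.
e_ij = [[mean(eta[i][j][k] for k in RK) for j in RJ] for i in RI]  # eta_{ij.}
e_ik = [[mean(eta[i][j][k] for j in RJ) for k in RK] for i in RI]  # eta_{i.k}
e_jk = [[mean(eta[i][j][k] for i in RI) for k in RK] for j in RJ]  # eta_{.jk}
e_i = [mean(e_ij[i][j] for j in RJ) for i in RI]                   # eta_{i..}
e_j = [mean(e_ij[i][j] for i in RI) for j in RJ]                   # eta_{.j.}
e_k = [mean(e_ik[i][k] for i in RI) for k in RK]                   # eta_{..k}
mu = mean(e_i)                                                     # eta_{...}

lam_X = [e_i[i] - mu for i in RI]
lam_Y = [e_j[j] - mu for j in RJ]
lam_Z = [e_k[k] - mu for k in RK]
lam_XY = [[e_ij[i][j] - e_i[i] - e_j[j] + mu for j in RJ] for i in RI]
lam_XZ = [[e_ik[i][k] - e_i[i] - e_k[k] + mu for k in RK] for i in RI]
lam_YZ = [[e_jk[j][k] - e_j[j] - e_k[k] + mu for k in RK] for j in RJ]
lam_XYZ = [[[eta[i][j][k] - e_ij[i][j] - e_ik[i][k] - e_jk[j][k]
             + e_i[i] + e_j[j] + e_k[k] - mu
             for k in RK] for j in RJ] for i in RI]

# Zero-sum constraints hold up to floating-point rounding.
assert abs(sum(lam_X)) < 1e-12 and abs(sum(lam_Y)) < 1e-12 and abs(sum(lam_Z)) < 1e-12
assert all(abs(sum(lam_XY[i][j] for i in RI)) < 1e-12 for j in RJ)
assert all(abs(sum(lam_XYZ[i][j][k] for k in RK)) < 1e-12 for i in RI for j in RJ)

# The saturated model reproduces every log count exactly.
max_err = max(abs(mu + lam_X[i] + lam_Y[j] + lam_Z[k] + lam_XY[i][j]
                  + lam_XZ[i][k] + lam_YZ[j][k] + lam_XYZ[i][j][k] - eta[i][j][k])
              for i in RI for j in RJ for k in RK)
print(f"mu = {mu:.4f}, lambda^XYZ_111 = {lam_XYZ[0][0][0]:.4f}, max error = {max_err:.1e}")
```

The same code works for any I × J × K table with positive counts, since nothing in it is specific to two levels.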

Singly-subscripted terms pertain to main effects, doubly-subscripted terms pertain to partial
associations (two-way interactions), and triply-subscripted terms pertain to three-factor
interactions. Like the saturated model for a two-way table, this general model has as many
parameters as there are cells and fits the data perfectly.

Setting certain parameters to zero in the general model above yields models such as the
following.
Note: Each model is assigned a symbol that lists the highest-order term(s) for each
variable.

Symbol          Loglinear Model
X, Y, Z         $\log m_{ijk} = \mu + \lambda^X_i + \lambda^Y_j + \lambda^Z_k$
XY, Z           $\log m_{ijk} = \mu + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij}$
XY, XZ          $\log m_{ijk} = \mu + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik}$
XY, XZ, YZ      $\log m_{ijk} = \mu + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk}$
XYZ             $\log m_{ijk} = \mu + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk} + \lambda^{XYZ}_{ijk}$

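The models above are called hierarchical models: whenever a model contains a higher-order
term, it also contains every lower-order term formed from a subset of that term's variables.
For example, a model containing $\lambda^{XY}_{ij}$ must also contain $\lambda^X_i$ and $\lambda^Y_j$.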
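The link between a model's symbol and its full set of terms can be made concrete. This hedged Python sketch uses hypothetical helper names. It expands a symbol such as `["XY", "Z"]` into every term it implies, so the hierarchy is built in. It then counts the free parameters under the zero-sum constraints, assuming the standard rule that a term contributes the product of (levels − 1) over its variables. Residual degrees of freedom are taken as cells minus parameters. For the saturated model, this is the same count that gives df = 0 in the notes.

```python
from itertools import combinations
from math import prod


def hierarchical_terms(symbol):
    """Expand a model symbol such as ["XY", "Z"] into all of its terms.

    Every subset of each highest-order term is added, so the result is
    hierarchical by construction; "" stands for the constant mu.
    """
    terms = set()
    for generator in symbol:
        for size in range(len(generator) + 1):
            for combo in combinations(sorted(generator), size):
                terms.add("".join(combo))
    return terms


def n_parameters(terms, levels):
    """Free parameters under zero-sum constraints: a term contributes the
    product of (levels - 1) over its variables (1 for mu)."""
    return sum(prod(levels[v] - 1 for v in term) for term in terms)


levels = {"X": 2, "Y": 2, "Z": 2}  # a 2 x 2 x 2 table, as in the example
cells = prod(levels.values())
dfs = {}
for symbol in (["X", "Y", "Z"], ["XY", "Z"], ["XY", "XZ"],
               ["XY", "XZ", "YZ"], ["XYZ"]):
    key = ", ".join(symbol)
    terms = hierarchical_terms(symbol)
    dfs[key] = cells - n_parameters(terms, levels)
    print(key, sorted(terms, key=lambda t: (len(t), t)), "df =", dfs[key])
```

For a 2 × 2 × 2 table, each term has a single free parameter, so the degrees of freedom drop by one per added interaction: 4, 3, 2, 1 and 0.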

To interpret loglinear models, we describe their marginal and partial associations using odds
ratios.
The X − Y marginal table, with cell probabilities $\pi_{ij+}$, is described by the $(I-1)(J-1)$ odds ratios

$$\theta^{XY}_{ij} = \frac{\pi_{ij+}\,\pi_{i+1,j+1,+}}{\pi_{i+1,j,+}\,\pi_{i,j+1,+}}, \qquad 1 \le i \le I-1,\ 1 \le j \le J-1.$$

Analogous sets of odds ratios can be defined for the X − Z and Y − Z marginal tables.

Within a fixed level k of Z, the corresponding odds ratios

$$\theta_{ij(k)} = \frac{\pi_{ijk}\,\pi_{i+1,j+1,k}}{\pi_{i+1,j,k}\,\pi_{i,j+1,k}}, \qquad 1 \le i \le I-1,\ 1 \le j \le J-1,$$

describe the conditional X − Y association. Similarly, the conditional association between X and
Z is described by the $(I-1)(K-1)$ odds ratios $\theta_{i(j)k}$ at each of the J levels of Y, and the
conditional association between Y and Z is described by the $(J-1)(K-1)$ odds ratios $\theta_{(i)jk}$
at each of the I levels of X. In each case, the index in parentheses is the one held fixed.
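For tables with more than two levels per variable, the marginal and conditional odds ratios are sets of "local" odds ratios built from adjacent rows and columns. The Python sketch below computes them from counts. It uses the sample proportions $n_{ijk}/n$ in place of $\pi_{ijk}$, and the common factor n cancels in every ratio. The function names are hypothetical. The detergent example assumes X = brand, Y = previous use of M and Z = temperature.

```python
def local_odds_ratios(tab):
    """(I-1) x (J-1) local odds ratios of an I x J table given as a list of rows:
    theta_ij = tab[i][j] * tab[i+1][j+1] / (tab[i+1][j] * tab[i][j+1])."""
    I, J = len(tab), len(tab[0])
    return [[tab[i][j] * tab[i + 1][j + 1] / (tab[i + 1][j] * tab[i][j + 1])
             for j in range(J - 1)]
            for i in range(I - 1)]


def marginal_xy(n):
    """Local odds ratios of the X - Y marginal table n_{ij+} (n indexed n[i][j][k])."""
    return local_odds_ratios([[sum(n_ij) for n_ij in plane] for plane in n])


def conditional_xy(n):
    """Local X - Y odds ratios within each level k of Z: one (I-1) x (J-1) list per k."""
    K = len(n[0][0])
    return [local_odds_ratios([[row[k] for row in plane] for plane in n])
            for k in range(K)]


# Detergent counts, assumed X = brand, Y = previous use of M, Z = temperature.
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]
print(marginal_xy(n))     # one odds ratio, since I = J = 2
print(conditional_xy(n))  # one odds ratio per temperature level
```

Applying the same functions to a reordered array such as `n[i][k][j]` gives the X − Z odds ratios. The Y − Z odds ratios follow in the same way.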

Loglinear model parameters are functions of the conditional odds ratios. For example, for a
2 × 2 × 2 table under the general model XYZ, one can show that

$$\lambda^{XYZ}_{111} = \frac{1}{8}\log\frac{\theta_{11(1)}}{\theta_{11(2)}} = \frac{1}{8}\log\frac{\theta_{1(1)1}}{\theta_{1(2)1}} = \frac{1}{8}\log\frac{\theta_{(1)11}}{\theta_{(2)11}}$$

for zero-sum constraints on $\lambda^{XYZ}_{ijk}$.
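The identity for $\lambda^{XYZ}_{111}$ can be checked on the detergent data with a small Python sketch. It assumes X = brand, Y = previous use of M and Z = temperature, and uses Python indices 0 and 1 for levels 1 and 2. Under the saturated model, $\eta_{ijk} = \log n_{ijk}$. The sketch computes $\lambda^{XYZ}_{111}$ from the averaging formula and compares it with each of the three odds-ratio expressions.

```python
import math

# Detergent counts n[i][j][k]; assumed X = brand, Y = previous use of M,
# Z = temperature. Python indices 0 and 1 stand for levels 1 and 2.
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]


def eta(i, j, k):
    return math.log(n[i][j][k])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


L = (0, 1)
# lambda^{XYZ}_{111} from the averaging formula, with eta = log n (saturated fit).
lam_111 = (eta(0, 0, 0)
           - mean(eta(0, 0, k) for k in L)                         # eta_{11.}
           - mean(eta(0, j, 0) for j in L)                         # eta_{1.1}
           - mean(eta(i, 0, 0) for i in L)                         # eta_{.11}
           + mean(eta(0, j, k) for j in L for k in L)              # eta_{1..}
           + mean(eta(i, 0, k) for i in L for k in L)              # eta_{.1.}
           + mean(eta(i, j, 0) for i in L for j in L)              # eta_{..1}
           - mean(eta(i, j, k) for i in L for j in L for k in L))  # eta_{...}


def theta_xy(k):  # theta_{11(k)}: X - Y odds ratio at level k of Z
    return n[0][0][k] * n[1][1][k] / (n[1][0][k] * n[0][1][k])


def theta_xz(j):  # theta_{1(j)1}: X - Z odds ratio at level j of Y
    return n[0][j][0] * n[1][j][1] / (n[1][j][0] * n[0][j][1])


def theta_yz(i):  # theta_{(i)11}: Y - Z odds ratio at level i of X
    return n[i][0][0] * n[i][1][1] / (n[i][1][0] * n[i][0][1])


via_xy = math.log(theta_xy(0) / theta_xy(1)) / 8
via_xz = math.log(theta_xz(0) / theta_xz(1)) / 8
via_yz = math.log(theta_yz(0) / theta_yz(1)) / 8
print(lam_111, via_xy, via_xz, via_yz)  # all four agree up to rounding
```

All four numbers are the same signed sum of the eight log counts, divided by 8, which is why the three odds-ratio forms are interchangeable.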

To obtain estimates of the expected frequencies $m_{ijk}$ when the data come from a multinomial
sample, it suffices to compute the estimates under the assumption that the cell counts are
independent Poisson variables; whatever loglinear model is fitted, the two sampling schemes give
the same estimates of $m_{ijk}$. (Remember that μ is not relevant for a multinomial sample.)
Under the Poisson assumption, the distribution of the data is

$$\prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K} \frac{e^{-m_{ijk}}\, m_{ijk}^{n_{ijk}}}{n_{ijk}!},$$

which has kernel

$$\prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K} e^{-m_{ijk}}\, m_{ijk}^{n_{ijk}}.$$

The log likelihood of this kernel is

$$L(m) = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} n_{ijk}\log m_{ijk} - \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} m_{ijk}.$$

For the general model XYZ given by


logm ijk  = μ + λ Xi + λ Yj + λ Zk + λ XY
ij + λ ik + λ jk + λ ijk
XZ YZ XYZ

we get

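$$\begin{aligned}
L(m) = {} & n\mu + \sum_{i=1}^{I} n_{i++}\lambda^X_i + \sum_{j=1}^{J} n_{+j+}\lambda^Y_j + \sum_{k=1}^{K} n_{++k}\lambda^Z_k \\
& + \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij+}\lambda^{XY}_{ij} + \sum_{i=1}^{I}\sum_{k=1}^{K} n_{i+k}\lambda^{XZ}_{ik} + \sum_{j=1}^{J}\sum_{k=1}^{K} n_{+jk}\lambda^{YZ}_{jk} + \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} n_{ijk}\lambda^{XYZ}_{ijk} \\
& - \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \exp\left(\mu + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk} + \lambda^{XYZ}_{ijk}\right)
\end{aligned}$$

where $n = n_{+++}$ is the total sample size.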

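The following table lists the derivatives of L(m) to be taken and the resulting likelihood
equations (obtained by setting each derivative equal to zero):

DERIVATIVE                                  LIKELIHOOD EQUATION
$\partial L/\partial \mu$                   not relevant for a multinomial sample
$\partial L/\partial \lambda^X_i$           $m_{i++} = n_{i++}$
$\partial L/\partial \lambda^Y_j$           $m_{+j+} = n_{+j+}$
$\partial L/\partial \lambda^Z_k$           $m_{++k} = n_{++k}$
$\partial L/\partial \lambda^{XY}_{ij}$     $m_{ij+} = n_{ij+}$
$\partial L/\partial \lambda^{XZ}_{ik}$     $m_{i+k} = n_{i+k}$
$\partial L/\partial \lambda^{YZ}_{jk}$     $m_{+jk} = n_{+jk}$
$\partial L/\partial \lambda^{XYZ}_{ijk}$   $m_{ijk} = n_{ijk}$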

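Note that the likelihood equation $m_{ijk} = n_{ijk}$ associated with $\partial L/\partial \lambda^{XYZ}_{ijk}$ implies that the
other likelihood equations hold, since each of them is obtained by summing $m_{ijk} = n_{ijk}$ over
one or two indices.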
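For unsaturated models, the likelihood equations in the table do not always have a closed-form solution. For example, the model (XY, XZ, YZ) requires $m_{ij+} = n_{ij+}$, $m_{i+k} = n_{i+k}$ and $m_{+jk} = n_{+jk}$ simultaneously. One standard numerical method, iterative proportional fitting (IPF), is not covered in these notes. It starts from a constant table and rescales it in turn to match each required margin. The Python sketch below is a minimal version that assumes every two-way margin of the data is positive. The function name and the stopping rule (`tol`, `max_iter`) are illustrative choices.

```python
def ipf_no_three_factor(n, tol=1e-10, max_iter=1000):
    """Fit the loglinear model (XY, XZ, YZ) to n[i][j][k] by iterative
    proportional fitting. Assumes every two-way margin of n is positive."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]  # start with all cells equal
    for _ in range(max_iter):
        previous = [[row[:] for row in plane] for plane in m]
        # Step 1: rescale so that m_{ij+} = n_{ij+}.
        for i in range(I):
            for j in range(J):
                ratio = sum(n[i][j]) / sum(m[i][j])
                for k in range(K):
                    m[i][j][k] *= ratio
        # Step 2: rescale so that m_{i+k} = n_{i+k}.
        for i in range(I):
            for k in range(K):
                ratio = (sum(n[i][j][k] for j in range(J))
                         / sum(m[i][j][k] for j in range(J)))
                for j in range(J):
                    m[i][j][k] *= ratio
        # Step 3: rescale so that m_{+jk} = n_{+jk}.
        for j in range(J):
            for k in range(K):
                ratio = (sum(n[i][j][k] for i in range(I))
                         / sum(m[i][j][k] for i in range(I)))
                for i in range(I):
                    m[i][j][k] *= ratio
        # Stop once a full cycle no longer changes the fitted values.
        change = max(abs(m[i][j][k] - previous[i][j][k])
                     for i in range(I) for j in range(J) for k in range(K))
        if change < tol:
            break
    return m


# Detergent counts, assumed X = brand, Y = previous use of M, Z = temperature.
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]
m = ipf_no_three_factor(n)
for i in range(2):
    print([[round(v, 2) for v in row] for row in m[i]])
```

Because the model has no $\lambda^{XYZ}_{ijk}$ term, the fitted conditional X − Y odds ratios come out equal at both levels of Z. This is consistent with the 2 × 2 × 2 identity for $\lambda^{XYZ}_{111}$.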

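To assess the fit of loglinear models for three-way tables, we use

$$X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \frac{(n_{ijk} - m_{ijk})^2}{m_{ijk}} \quad\text{and}\quad G^2 = 2\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} n_{ijk}\log\left(\frac{n_{ijk}}{m_{ijk}}\right),$$

where $m_{ijk}$ here denotes the estimated expected frequencies under the fitted model.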

As with the saturated model for a two-way table, the general model XYZ for a three-way
table fits the data perfectly, since $m_{ijk} = n_{ijk}$; thus $X^2$ and $G^2$ are both zero. For the
model XYZ we also have

$$df = IJK - \left[1 + (I-1) + (J-1) + (K-1) + (I-1)(J-1) + (I-1)(K-1) + (J-1)(K-1) + (I-1)(J-1)(K-1)\right] = 0$$
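As a contrast to the saturated model, the Python sketch below computes $X^2$, $G^2$ and the degrees of freedom for both XYZ and the mutual-independence model (X, Y, Z). The model (X, Y, Z) is not analysed in the notes. For it, the sketch relies on the standard closed-form solution of the likelihood equations $m_{i++} = n_{i++}$, $m_{+j+} = n_{+j+}$, $m_{++k} = n_{++k}$, namely $m_{ijk} = n_{i++}\,n_{+j+}\,n_{++k}/n^2$. The helper name `fit_statistics` is hypothetical. Cells with a zero count are taken to contribute nothing to $G^2$.

```python
import math

# Detergent counts n[i][j][k]; assumed X = brand, Y = previous use of M,
# Z = temperature.
n = [[[66, 141], [104, 197]],
     [[119, 156], [80, 145]]]
I, J, K = len(n), len(n[0]), len(n[0][0])
cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]


def fit_statistics(obs, fit):
    """Pearson X^2 and likelihood-ratio G^2 for observed and fitted 3-D tables.
    Cells with a zero count add nothing to G^2 (0 * log 0 taken as 0)."""
    x2 = g2 = 0.0
    for i, j, k in cells:
        o, e = obs[i][j][k], fit[i][j][k]
        x2 += (o - e) ** 2 / e
        if o > 0:
            g2 += 2 * o * math.log(o / e)
    return x2, g2


# Saturated model XYZ: fitted values equal the counts.
x2_sat, g2_sat = fit_statistics(n, n)

# Mutual independence (X, Y, Z): m_ijk = n_{i++} n_{+j+} n_{++k} / n^2.
total = sum(n[i][j][k] for i, j, k in cells)
n_i = [sum(n[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
n_j = [sum(n[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
n_k = [sum(n[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
m_ind = [[[n_i[i] * n_j[j] * n_k[k] / total ** 2 for k in range(K)]
          for j in range(J)] for i in range(I)]
x2_ind, g2_ind = fit_statistics(n, m_ind)
df_ind = I * J * K - (1 + (I - 1) + (J - 1) + (K - 1))

print(f"XYZ:     X2 = {x2_sat}, G2 = {g2_sat}, df = 0")
print(f"X, Y, Z: X2 = {x2_ind:.2f}, G2 = {g2_ind:.2f}, df = {df_ind}")
```

The saturated fit gives exactly zero for both statistics, matching the statement above. The independence model leaves IJK − (1 + (I − 1) + (J − 1) + (K − 1)) = 4 degrees of freedom for this 2 × 2 × 2 table.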

SAS program for the above analysis (saturated model XYZ):

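options pagesize=max;

/* One record per cell: brand preference, previous use of M,   */
/* wash temperature, and the number of customers in that cell  */
data market;
   input brand $ prevuse $ temp $ count;
   cards;
x yes high 66
x yes low 141
x no high 104
x no low 197
m yes high 119
m yes low 156
m no high 80
m no low 145
;

/* Saturated loglinear model XYZ; ORDER=DATA keeps the levels  */
/* in the order in which they first appear in the data         */
proc catmod order=data;
   weight count;   /* each record represents COUNT customers */
   model brand*prevuse*temp=_response_ / covb pred=freq;
   /* all main effects, two-factor and three-factor interactions */
   loglin brand prevuse temp brand*prevuse brand*temp prevuse*temp brand*prevuse*temp;
run;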
