Professional Documents
Culture Documents
ST 544: Applied Categorical Data Analysis: Daowen Zhang
ST 544: Applied Categorical Data Analysis: Daowen Zhang
ST 544: Applied Categorical Data Analysis: Daowen Zhang
D.
c Zhang
Daowen Zhang
zhang@stat.ncsu.edu
http://www4.stat.ncsu.edu/∼dzhang2
Slide 1
TABLE OF CONTENTS ST 544, D. Zhang
Contents
1 Introduction 3
2 Contingency Tables 40
1 Introduction
I. Categorical Data
Definition
• A categorical variable is a (random) variable that can only take finite
or countably many values (categories).
• Type of categorical variables:
? Gender: F/M or 0/1; Race: White, Black, Others – Nominal
? Patient’s Health Status: Excellent, Good, Fair, Bad – Ordinal
? # of car accidents in next Jan in Wake County – Interval
Slide 3
CHAPTER 1 ST 544, D. Zhang
Slide 4
CHAPTER 1 ST 544, D. Zhang
Slide 5
CHAPTER 1 ST 544, D. Zhang
Slide 6
CHAPTER 1 ST 544, D. Zhang
P [Y ≥ 4] = 1 − P [Y ≤ 3] = 1 − 0.0548 = 0.9452.
Slide 7
CHAPTER 1 ST 544, D. Zhang
E(Y ) = nπ
var(Y ) = nπ(1 − π)
p p
σ = var(Y ) = nπ(1 − π)
Slide 8
CHAPTER 1 ST 544, D. Zhang
Slide 9
CHAPTER 1 ST 544, D. Zhang
Y 1 2 ··· c
Prob π1 π2 · · · πc
Pc
where πi = P [Y = j] > 0, j=1 πj = 1.
• Each trial of n trials results in an outcome in one (and only one) of c
categories, represented by
Yi1 0
Yi2 1
Yi = . , i = 1, 2, ..., n. For example, Y i = .
.. ..
e
e
.
Yic 0
• Often time, we may not have the individual outcome. Instead, we have
the following summary:
n1
n2
n = . ,
.
e .
nc
where nj is the # of trials resulting outcome in the j category. That is
Pn
nj = i=1 Yij .
• The probability of observing n is
e
n!
p(n1 , n2 , ..., nc ) = π1n1 π2n2 · · · πcnc .
n1 !n2 ! · · · nc !
Slide 11
CHAPTER 1 ST 544, D. Zhang
Slide 12
CHAPTER 1 ST 544, D. Zhang
cov(ni , nj ) = −nπi πj , i 6= j.
• n can be written:
e
n1
n
n2 X
n= = Y i.
..
e
. i=1 e
nc
Slide 13
CHAPTER 1 ST 544, D. Zhang
Slide 16
CHAPTER 1 ST 544, D. Zhang
Slide 17
CHAPTER 1 ST 544, D. Zhang
Slide 19
CHAPTER 1 ST 544, D. Zhang
`0 = y log π0 + (n − y) log(1 − π0 )
`1 = y log p + (n − y) log(1 − p)
G2 = 2(`1 − `0 )
= 2 [y(log p − log π0 ) + (n − y){log(1 − p) − log(1 − π0 )}]
p (1 − p)
= 2 y log + (n − y) log
π0 (1 − π0 )
np n(1 − p)
= 2 y log + (n − y) log
nπ0 n(1 − π0 )
y (n − y)
= 2 y log + (n − y) log
nπ0 n − nπ0
X obs.
= 2 obs. log
exp.
2 cells
Slide 20
CHAPTER 1 ST 544, D. Zhang
Compare G2 to χ21 .
Slide 21
CHAPTER 1 ST 544, D. Zhang
Slide 22
CHAPTER 1 ST 544, D. Zhang
2. Score test:
p − π0 0.448 − 0.5
z=p =p = −3.11.
π0 (1 − π0 )/n 0.5 × (1 − 0.5)/893
Since z < −1.96, reject H0 at 0.05 significance level.
Slide 23
CHAPTER 1 ST 544, D. Zhang
3. LRT:
2
X obs.
G = 2 obs. log
exp.
2 cells
= 2[400 × log{400/(893 × 0.5)}
+(893 − 400) × log{(893 − 400)/(893 − 893 × 0.5)}]
= 9.7 > 1.962 = 3.84,
Slide 24
CHAPTER 1 ST 544, D. Zhang
Slide 25
CHAPTER 1 ST 544, D. Zhang
Slide 26
CHAPTER 1 ST 544, D. Zhang
⇒ [0.416, 0.481].
Note: Here the sample size n is very large, the Wald CI and the score
CI are very close.
Slide 27
CHAPTER 1 ST 544, D. Zhang
Slide 28
CHAPTER 1 ST 544, D. Zhang
Slide 29
CHAPTER 1 ST 544, D. Zhang
Slide 30
CHAPTER 1 ST 544, D. Zhang
• Note: We see from the GSS example that, for large sample size n, the
Wald, score, LR CIs are all very close. However, if n is not large, there
will be some discrepancy among them.
• For example, if y = 9, n = 10, then:
1. Wald CI: [0.714, 1.086] = [0.714, 1]
2. Score CI: [0.596, 0.982]
3. LR CI: [0.628, 0.994]
Slide 31
CHAPTER 1 ST 544, D. Zhang
Given data y ∼ Bin(n, π), the testing procedure would be: Reject H0
if y is large.
Slide 32
CHAPTER 1 ST 544, D. Zhang
Given data y ∼ Bin(n, π), the testing procedure would be: Reject H0
if |y − nπ0 | is large.
Slide 34
CHAPTER 1 ST 544, D. Zhang
For example, if we are testing H0 : π = 0.5 v.s. Ha : π > 0.5 and our
significance level =0.05 using data y from Bin(n = 10, π). Then based
on Table 1.2, we should reject H0 only if y = 9 or y = 10. However,
the actual type I error probability is 0.011 < α = 0.05. Conservative!
Slide 35
CHAPTER 1 ST 544, D. Zhang
Slide 36
CHAPTER 1 ST 544, D. Zhang
Slide 38
CHAPTER 1 ST 544, D. Zhang
2 Contingency Tables
Slide 41
CHAPTER 2 ST 544, D. Zhang
Slide 45
CHAPTER 2 ST 544, D. Zhang
Slide 46
CHAPTER 2 ST 544, D. Zhang
Slide 47
CHAPTER 2 ST 544, D. Zhang
Slide 48
CHAPTER 2 ST 544, D. Zhang
Slide 49
CHAPTER 2 ST 544, D. Zhang
? In general,
Lung Cancer
Yes No
Smoking Yes n11 n12
No n21 n22
n+1 n+2
where n+1 , n+2 , are all fixed.
n11 ⊥ n12
n11 ∼ Bin(n+1 , π1 ), π1 = P [smoking|case].
n12 ∼ Bin(n+2 , π2 ), π2 = P [smoking|control].
Slide 50
CHAPTER 2 ST 544, D. Zhang
Slide 51
CHAPTER 2 ST 544, D. Zhang
Slide 52
CHAPTER 2 ST 544, D. Zhang
Slide 53
CHAPTER 2 ST 544, D. Zhang
Slide 54
CHAPTER 2 ST 544, D. Zhang
Slide 55
CHAPTER 2 ST 544, D. Zhang
π11 π21
π1 = = = π2
π1+ π2+
Slide 56
CHAPTER 2 ST 544, D. Zhang
Slide 57
CHAPTER 2 ST 544, D. Zhang
1. Estimate of π1 − π2 :
n11 n21
p1 − p2 = − .
n1+ n2+
3. Large-sample (1 − α) CI for π1 − π2 :
p1 − p2 ± zα/2 SE(p1 − p2 ).
Slide 58
CHAPTER 2 ST 544, D. Zhang
Slide 60
CHAPTER 2 ST 544, D. Zhang
Slide 61
CHAPTER 2 ST 544, D. Zhang
Slide 62
CHAPTER 2 ST 544, D. Zhang
Slide 63
CHAPTER 2 ST 544, D. Zhang
Slide 64
CHAPTER 2 ST 544, D. Zhang
Slide 65
CHAPTER 2 ST 544, D. Zhang
• var(log θ)
b can be estimated by
1 1 1 1
var(log θ) =
b + + + .
n11 n12 n21 n22
Slide 66
CHAPTER 2 ST 544, D. Zhang
189×10933
θb = 10845×104 = 1.8321(≈ RR)
1 1 1 1
var(log θ)
b =
189 + 10845 + 104 + 10933 = 0.01509
√
95%CI for log θ: log(1.8321) ± 1.96 0.01509 = [0.3647, 0.8462].
Slide 67
CHAPTER 2 ST 544, D. Zhang
-
0 θ RR 1 RR θ
Slide 69
CHAPTER 2 ST 544, D. Zhang
Slide 70
CHAPTER 2 ST 544, D. Zhang
1 1 1 1
var(log θ)
b = + + + .
n11 n12 n21 n22
Slide 71
CHAPTER 2 ST 544, D. Zhang
1 1 1 1
var(log θ) =
b + + + = 0.0256.
172 173 90 346
95% CI for log θ:
√
log(3.82) ± 1.96 × 0.0256 = [1.02665, 1.65385]
95% CI for θ: [e1.02665 , e1.65385 ] = [2.79, 5.227].
• Since MI is a rare event, RR ≈ θ, so
d ≈ 3.82 ≈ 4.
RR
That is, ever smokers is about 3 times more likely to develop MI than
never smokers.
Slide 72
CHAPTER 2 ST 544, D. Zhang
Y
1 2 J
X 1 π11 π12 . π1J
2 π21 π22 . π2J
. . . . .
I πI1 πI2 . πIJ
Given data {nij }’s from a multinomial sampling, we would like to test
H0 : πij = πij (θ), for i = 1, .., I, and j = 1, ..., J, where θ is a parameter
vector with dim(θ) = k.
If dim(θ) = 0, then πij ’s are totally known under H0 .
Slide 73
CHAPTER 2 ST 544, D. Zhang
• MLE θb of θ under H0 ; µ
bij = nπij (θ),
b where n = n++ .
Some χ2 distributions
Slide 75
CHAPTER 2 ST 544, D. Zhang
df = IJ − 1 − (I − 1 + J − 1) = (I − 1)(J − 1).
Reject H0 : X ⊥ Y if χ2 or G2 ≥ χ2df,α .
Slide 76
CHAPTER 2 ST 544, D. Zhang
It can be shown that the Pearson χ2 and LRT test stats are the same
with the same null dist χ2(I−1)(J−1) .
Slide 77
CHAPTER 2 ST 544, D. Zhang
Y –Party Identification
Democrat Independent Republican Total
X – Gender Female 762 327 468 1557
Male 484 239 477 1200
1246 566 945 n = 2757
b11 = 1557 × 1246/2757 = 703.7,
Then µ
b12 = 1557 × 566/2757 = 319.6, etc.
µ
Slide 79
CHAPTER 2 ST 544, D. Zhang
Slide 80
CHAPTER 2 ST 544, D. Zhang
• Under H0 : X ⊥ Y , E(est st st
ij ) ≈ 0, var(eij ) ≈ 1, and eij behaves like a
N(0, 1) variable.
• We can use est
ij to check the departure from H0 : X ⊥ Y .
Slide 81
CHAPTER 2 ST 544, D. Zhang
Slide 82
CHAPTER 2 ST 544, D. Zhang
Y –Party Identification
Democrat Independent Republican Total
X – Gender Female 4.5 0.7 -5.3
Male -4.5 -0.7 5.3
We see from the table that the independence model does not fit the
data well.
Slide 83
CHAPTER 2 ST 544, D. Zhang
Slide 84
CHAPTER 2 ST 544, D. Zhang
Patient X Y
1 u1 v1
2 u1 v1
3 u1 v2
4 u1 v3
Y 5 u1 v3
v1 v2 v3 6 u1 v3
u1 2 1 3 ⇒ 7 u2 v1
X u2 1 2 1 8 u2 v2
u3 1 1 2 9 u2 v2
10 u2 v3
11 u3 v1
12 u3 v2
13 u3 v3
14 u3 v3
Slide 85
CHAPTER 2 ST 544, D. Zhang
1
Pn
n−1 − x̄)(yi − ȳ)
i=1 (xi
r= q Pn ,
1 2 1
P n 2
n−1 i=1 (xi − x̄) n−1 i=1 (yi − ȳ)
where
n I I
1X 1X X
x̄ = xi = ni+ ui = pi+ ui = ū
n i=1 n i=1 i=1
n J J
1X 1X X
ȳ = yi = n+j vj = p+j vj = v̄
n i=1 n j=1 j=1
Slide 86
CHAPTER 2 ST 544, D. Zhang
=⇒ PI PJ
i=1 pij (ui − ū)(vj − v̄)
j=1
r = qP
I 2
PJ 2
i=1 pi+ (ui − ū) j=1 p +j (v j − v̄)
Slide 87
CHAPTER 2 ST 544, D. Zhang
• How to choose scores {ui }’s for X and {vj }’s for Y :
1. Any increasing/decreasing seq is ok for {ui }’s and {vj }’s. They
have to be chosen before analyzing data.
2. Mid-rank. For example,
Y
1 2 3 ui
1 2 1 3 6 3.5
X 2 1 2 1 4 8.5
3 1 1 2 4 12.5
4 4 6
vj 2.5 6.5 11.5
Proc Freq order=data
tables x*y/CMH1 Scores=rank;
run;
Slide 88
CHAPTER 2 ST 544, D. Zhang
√ √
From slide 80, n − 1r = 28.98 = 5.4 ⇒ reject H0 : X ⊥ Y in
favor of H1 : ρ > 0 (even if r = 0.1).
Slide 89
CHAPTER 2 ST 544, D. Zhang
Alcohol Malformation
Consumption Present (Y = 1) Absent (Y = 0)
0 48 17, 066
<1 38 14, 464
1−2 5 788
3−5 1 126
≥6 1 37
Slide 90
CHAPTER 2 ST 544, D. Zhang
• SAS program:
data table2_7;
input alcohol malform count @@;
datalines;
0 1 48 0 0 17066
0.5 1 38 0.5 0 14464
1.5 1 5 1.5 0 788
4 1 1 4 0 126
7 1 1 7 0 37
;
title "Analysis of infant malformation data";
proc freq data=table2_7;
weight count;
tables alcohol*malform / measures chisq cmh;
run;
Slide 91
CHAPTER 2 ST 544, D. Zhang
Slide 92
CHAPTER 2 ST 544, D. Zhang
Consider
πi = P [Y = 1|X = ui ].
πi = α + βui
Then H0 : X ⊥ Y =⇒ H0∗ : β = 0
An unbiased estimate of πi :
ni1
π
bi = = pi ← sample proportion at X = ui
ni+
pi = α + βui + εi ,
Slide 93
CHAPTER 2 ST 544, D. Zhang
var(β)
b under H0 can be estimated by
var b = P p(1 − p)
c H0 (β) .
I 2
i=1 ni+ (ui − ū)
Slide 94
CHAPTER 2 ST 544, D. Zhang
βb
Z=q
var
c H0 (β)
b
Slide 95
CHAPTER 2 ST 544, D. Zhang
Slide 96
CHAPTER 2 ST 544, D. Zhang
Slide 97
CHAPTER 2 ST 544, D. Zhang
proc freq;
weight count;
tables x*y / cmh2;
run;
Slide 99
CHAPTER 2 ST 544, D. Zhang
• When {nij }’s are large, we can use the Pearson χ2 or LRT G2 to test
H0 : X ⊥ Y .
• However, when some cell counts {nij }’s are small, the exact dist. of
χ2 or LRT G2 under H0 may be far from χ21 , =⇒ use of asym. dist
may give wrong conclusions.
• Fisher’s tea example: Fisher’s colleague, Muriel Bristol claimed she
could tell whether or not tea (or milk) was added to the cup first.
Muriel’s Guess
Milk Tea
True Milk 3 1 4
Tea 1 3 4
4 4
Slide 101
CHAPTER 2 ST 544, D. Zhang
Slide 102
CHAPTER 2 ST 544, D. Zhang
• With multinomial sampling, (n11 , n12 , n21 , n22 ) are random variables
(only the sum n = n++ is fixed).
• Under H0 : θ = 1(X ⊥ Y ), πij = πi+ π+j , there are two unknown
π1+ , π+1 parameters. So the distribution of data (n11 , n12 , n21 , n22 ) is
unknown even under H0 .
• It can be shown that under H0 : θ = 1(X ⊥ Y ), the conditional
distribution of n11 |n1+ , n+1 is totally known:
n1+
n2+
t0 n+1 −t0
P [n11 = t0 ] = n
.
n+1
Slide 103
CHAPTER 2 ST 544, D. Zhang
Slide 104
CHAPTER 2 ST 544, D. Zhang
Slide 105
CHAPTER 2 ST 544, D. Zhang
Slide 106
CHAPTER 2 ST 544, D. Zhang
Slide 107
CHAPTER 2 ST 544, D. Zhang
Slide 108
CHAPTER 2 ST 544, D. Zhang
Slide 109
CHAPTER 2 ST 544, D. Zhang
Lung Cancer
Yes No
Second Hand Smoking Yes π11 π12
No π21 π22
Slide 110
CHAPTER 2 ST 544, D. Zhang
Slide 111
CHAPTER 2 ST 544, D. Zhang
53 × 176
ψ = 1.39, θ =
b b = 1.45
430 × 15
⇒ White defendants are (40%) more likely to receive a death penalty
than black defendants.
• Maybe the race of victims (Z) affects the XY association?
Slide 112
CHAPTER 2 ST 544, D. Zhang
Slide 113
CHAPTER 2 ST 544, D. Zhang
Slide 114
CHAPTER 2 ST 544, D. Zhang
Slide 115
CHAPTER 2 ST 544, D. Zhang
Slide 116
CHAPTER 2 ST 544, D. Zhang
θbXY = 1.45
53 × 37
θbXY (1) = = 0.43
11 × 414
0 × 139
θbXY (2) = =0
4 × 16
mod 0.5 × 139.5
θbXY (2) = = 0.94
4.5 × 16.5
Slide 117
CHAPTER 2 ST 544, D. Zhang
Slide 118
CHAPTER 2 ST 544, D. Zhang
Marginally,
Y
S F
X A 20 20
B 20 40
θbXY = 2 ⇒A>B
Slide 119
CHAPTER 2 ST 544, D. Zhang
Marginally,
Y
S F
X A 10 10
B 10 10
θbXY = 1 ⇒A=B
Slide 120
CHAPTER 2 ST 544, D. Zhang
When θXY (k) are not all the same, Z is called an effect modifier (there
is interaction).
0 Introduction
• In a simple linear regression model for continuous Y :
Y = α + βx + ε,
iid
usually ε ∼ N(0, σ 2 ).
Y = response
x = (numeric) covariate, indep or explanatory variable
β = E(Y |x + 1) − E(Y |x)
2β = E(Y |x + 2) − E(Y |x), etc.
Slide 122
CHAPTER 3 ST 544, D. Zhang
where x̄ is the sample mean of {xi }’s, ȳ is the sample mean of {yi }’s.
• α
b, βb have good statistical properties.
• Normality is Not required for the LS estimation.
Slide 123
CHAPTER 3 ST 544, D. Zhang
Slide 124
CHAPTER 3 ST 544, D. Zhang
iid
• Under ε ∼ N(0, σ 2 ) (so Y is also normal), the above model can be
re-written as
ind
Y |x ∼ N(α + βx, σ 2 ),
or equivalently
ind
Y |x ∼ N(µ(x), σ 2 ), µ(x) = α + βx
yi = response
xi = (x1i , x2i , · · · xpi ) covariate, indep or explanatory variable
Slide 126
CHAPTER 3 ST 544, D. Zhang
α + β1 x 1 + β2 x 2 + · · · + βp x p .
Slide 127
CHAPTER 3 ST 544, D. Zhang
g(µ) = α + β1 x1 + β2 x2 + · · · + βp xp .
g(µ) = µ.
Slide 128
CHAPTER 3 ST 544, D. Zhang
g(Y ) = α + βx + .
g(µ) = α + βx
Slide 129
CHAPTER 3 ST 544, D. Zhang
Slide 130
CHAPTER 3 ST 544, D. Zhang
? If y is binary (1/0) with 1 being the success (that is, we would like
to model P [Y = 1]), we should use descending option in Proc
Genmod.
Note: y and n are two variables in the data set. We don’t define a
new variable p = y/n and use “model p = x”. The / in y/n is
just a symbol.
• Data is organized in the same way as for Proc Reg of SAS.
Slide 131
CHAPTER 3 ST 544, D. Zhang
µ = E(Y ) = 1 × P [Y = 1] + 0 × P [Y = 0] = P [Y = 1] = π
g(π) = α + βx.
Slide 132
CHAPTER 3 ST 544, D. Zhang
π = α + βx.
β = π(x + 1) − π(x)
π = α + βx.
Slide 134
CHAPTER 3 ST 544, D. Zhang
Slide 135
CHAPTER 3 ST 544, D. Zhang
Slide 136
CHAPTER 3 ST 544, D. Zhang
Slide 137
CHAPTER 3 ST 544, D. Zhang
Slide 138
CHAPTER 3 ST 544, D. Zhang
• The plots indicates linear probability model with the chosen scores for
snoring may fit the data well (good choice of snoring scores).
• Consider linear probability model:
π = α + βx,
Slide 139
CHAPTER 3 ST 544, D. Zhang
• SAS output:
**************************************************************************
Snoring and heart disease data using score with identity link 13
The GENMOD Procedure
Model Information
Data Set WORK.TABLE3_1
Distribution Binomial
Link Function Identity
Response Variable (Events) y
Response Variable (Trials) n
Slide 140
CHAPTER 3 ST 544, D. Zhang
π
b = 0.017 + 0.0198x, x = 0, 2, 4, 5
Slide 141
CHAPTER 3 ST 544, D. Zhang
• From the fitted model, we can calculate the estimated heart disease
probability for each level of snoring:
Heart Disease Linear
Snoring(x) Yes (yi ) No ni pi Fit
0 Never 24 1355 1379 0.017 0.017
2 Occasionally 35 605 638 0.055 0.057
4 Nearly every night 21 192 213 0.099 0.096
5 Every night 30 224 254 0.118 0.116
b ≈ pi , the linear probability model fits the data
Since the fitted values π
well.
• The model has a nice interpretation: For non-snorers, the heart disease
prob is 0.017 (the intercept).
For occasional snorers, the HD prob increases 0.04 (more than double),
etc.
Slide 142
CHAPTER 3 ST 544, D. Zhang
• Note: We can recover the original binary data (1/0 – called hd in the
new data set) with 1 for heart disease, and use the following program
to get exactly the same results:
title "Snoring and binary heart disease in proc genmod";
proc genmod descending;
model hd = score / dist=bin link=identity;
run;
**************************************************************************
Analysis Of Maximum Likelihood Parameter Estimates
Standard Wald 95% Confidence Wald
Parameter DF Estimate Error Limits Chi-Square
Intercept 1 0.0172 0.0034 0.0105 0.0240 25.18
score 1 0.0198 0.0028 0.0143 0.0253 49.97
Scale 0 1.0000 0.0000 1.0000 1.0000
Slide 143
CHAPTER 3 ST 544, D. Zhang
• We can also fit a linear regression model to the binary data and will
get similar results.
title "Snoring and binary heart disease with LS approach";
proc reg;
model hd = score;
run;
************************************************************************
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.01687 0.00516 3.27 0.0011
score 1 0.02004 0.00232 8.65 <.0001
Note: Since proc reg models E(Y ) = π, the above results should be
similar to the linear prob model with the option descending (if binary
response data is used).
Slide 144
CHAPTER 3 ST 544, D. Zhang
logit(π) = α + βx.
eαb+βx
b
π
b(x) = .
1+ eαb+βx
b
Slide 145
CHAPTER 3 ST 544, D. Zhang
Slide 146
CHAPTER 3 ST 544, D. Zhang
• Interpretation of β:
π(x)
π at x : log = α + βx
1 − π(x)
π(x + 1)
π at x + 1 : log = α + β(x + 1)
1 − π(x + 1)
π(x + 1) π(x)
log − log =β
1 − π(x + 1) 1 − π(x)
π(x + 1)/{1 − π(x + 1)}
β = log
π(x)/{1 − π(x)}
β π(x + 1)/{1 − π(x + 1)}
e =
π(x)/{1 − π(x)}
odds-ratio with one unit increase in x
π(x + 2)/{1 − π(x + 2)}
⇒ 2β = log
π(x)/{1 − π(x)}
log odds-ratio with two unit increase in x, etc.
Slide 147
CHAPTER 3 ST 544, D. Zhang
logit(π) = α + βx.
Slide 148
CHAPTER 3 ST 544, D. Zhang
Slide 149
CHAPTER 3 ST 544, D. Zhang
Slide 150
CHAPTER 3 ST 544, D. Zhang
title "Snoring and heart disease data using score with logit link";
proc genmod;
model y/n = score / dist=bin link=logit;
run;
**************************************************************************
Analysis Of Maximum Likelihood Parameter Estimates
Standard Wald 95% Confidence Wald
Parameter DF Estimate Error Limits Chi-Square
Intercept 1 -3.8662 0.1662 -4.1920 -3.5405 541.06
score 1 0.3973 0.0500 0.2993 0.4954 63.12
• We can also use the original binary response hd and use the following
SAS program with descending option and will get the same results.
title "Snoring and heart disease data using score with logit link";
proc genmod descending;
model hd = score / dist=bin link=logit;
run;
**************************************************************************
Analysis Of Maximum Likelihood Parameter Estimates
Standard Wald 95% Confidence Wald
Parameter DF Estimate Error Limits Chi-Square
Intercept 1 -3.8662 0.1662 -4.1920 -3.5405 541.06
score 1 0.3973 0.0500 0.2993 0.4954 63.12
log(π) = α + βx.
π = eα+βx .
Of course, the model is only reasonable if the model produces valid π’s
in (0,1) for x in the valid range.
Slide 153
CHAPTER 3 ST 544, D. Zhang
• Interpretation of β:
log π(x) = α + βx
log π(x + 1) = α + β(x + 1)
log π(x + 1) − log π(x) = β
π(x + 1)
β = log
π(x)
β π(x + 1)
e =
π(x)
RR with one unit increase in x
2β π(x + 2)
⇒ e =
π(x)
RR with two unit increase in x
Slide 154
CHAPTER 3 ST 544, D. Zhang
log(π) = α + βx.
Testing H0 : β = 0 ⇔ H0 : X ⊥ Y .
Slide 155
CHAPTER 3 ST 544, D. Zhang
Slide 156
CHAPTER 3 ST 544, D. Zhang
π = Φ(α + βx).
Φ−1 {π(x)} = α + βx
is true, then
logit {π(x)} ≈ α∗ + β ∗ x
with α∗ = 1.7α and β ∗ = 1.7β. However, the fitted probs from these 2
models will be similar.
Slide 157
CHAPTER 3 ST 544, D. Zhang
Note: 1.7 × (−2.0606) = −3.5, 1.7 × 0.1878 = 0.32, very close to the
estimates from the logistic model.
Slide 158
CHAPTER 3 ST 544, D. Zhang
• We can also use the original binary response hd and use the following
SAS program with descending option and will get the same results.
title "Snoring and heart disease data using score with logit link";
proc genmod descending;
model hd = score / dist=bin link=probit;
run;
**************************************************************************
Analysis Of Maximum Likelihood Parameter Estimates
Standard Wald 95% Confidence Wald
Parameter DF Estimate Error Limits Chi-Square
Intercept 1 -2.0606 0.0704 -2.1986 -1.9225 855.49
score 1 0.1878 0.0236 0.1415 0.2341 63.14
Therefore, all estimates will be the mirror image of those from the
previous probit model.
Slide 159
CHAPTER 3 ST 544, D. Zhang
Slide 160
CHAPTER 3 ST 544, D. Zhang
Slide 161
CHAPTER 3 ST 544, D. Zhang
III.1 Example: Female horseshoe crabs and their satellites (Table 3.2,
page 76-77)
Slide 163
CHAPTER 3 ST 544, D. Zhang
• Data (a subset):
data crab;
input color spine width satell weight;
weight=weight/1000; color=color-1;
datalines;
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
4 3 24.8 0 2100
4 3 26.0 4 2600
3 3 23.8 0 2100
2 1 26.5 0 2350
4 2 24.7 0 1900
.
.
.
Slide 164
CHAPTER 3 ST 544, D. Zhang
b(x) = e−3.3048+0.1640x .
⇒µ
Slide 165
CHAPTER 3 ST 544, D. Zhang
Slide 166
CHAPTER 3 ST 544, D. Zhang
var(Y ) = φE(Y ),
φ− over-dispersion parameter.
• Estimation of φ using the Pearson statistic
X (yi − µ 2
1 bi )
φbP =
df µ
bi
This can be specified by scale=pearson or scale=p in Proc Genmod.
A common choice.
• Estimation of φ using the Deviance statistic:
2[log(LS ) − log(LM )]
φbD =
df
This can be specified by scale=deviance or scale=d in Proc
Genmod.
Slide 167
CHAPTER 3 ST 544, D. Zhang
Slide 168
CHAPTER 3 ST 544, D. Zhang
• With
q the option scale=pearson, the Pearson estimate
φbP = 1.7839, indicating a lot of over-dispersion.
• From the output, we see that we got the sameqestimates of α and β.
However, their standard errors are inflated by φb = 1.7839 (larger
SE’s).
• Based on the estimated model:
Slide 169
CHAPTER 3 ST 544, D. Zhang
Slide 170
CHAPTER 3 ST 544, D. Zhang
µ = α + βx.
⇒
1/2
1. A lot of over-dispersion: φbP = 1.7811.
2. Significant evidence against H0 : β = 0.
3. Fitted model: µb = −11.5321 + 0.5495x.
Slide 171
CHAPTER 3 ST 544, D. Zhang
Slide 172
CHAPTER 3 ST 544, D. Zhang
⇒ 1. D b = 1.1.
µ) = −4.0525 + 0.1921x. similar fit.
2. Fitted model: log(b
• Note: We don’t use the option scale=. There may be some
computational issue with neg. bin. dist.
Slide 174
CHAPTER 3 ST 544, D. Zhang
log(r) = α + βx,
• Example: British train accidents over time (Table 3.4, page 83):
Slide 176
CHAPTER 3 ST 544, D. Zhang
Slide 177
CHAPTER 3 ST 544, D. Zhang
1 T
= α + βx, ⇒ = α + βx
r µ
⇒
1
= α(1/T ) + β(x/T ).
µ
So the link function is g(µ) = µ−1 . If we define t1 for 1/T and x1
for x/T in our data set, then we can use the following program to
fit the above model:
proc genmod data=mydata;
model y = t1 x1 / noint dist=poi link=power(-1) scale=pearson;
run;
Slide 178
CHAPTER 3 ST 544, D. Zhang
G2 = 2(log L1 − log L0 ),
Slide 180
CHAPTER 3 ST 544, D. Zhang
2(68.4463−35.9898)
G2 = 3.1822 = 20.2, compared to χ21 .
Slide 181
CHAPTER 3 ST 544, D. Zhang
? Construct a (1 − α) CI for β:
[eβL , eβU ].
b b
Slide 182
CHAPTER 3 ST 544, D. Zhang
g(µ) = α + β1 x1 + · · · + βp xp
Slide 183
CHAPTER 3 ST 544, D. Zhang
logit{π(x)} = α + βx.
⇒ ML LM .
? A Saturated model has a separate πi for each value of x (perfect
fit).
⇒ ML LS .
Slide 184
CHAPTER 3 ST 544, D. Zhang
where E(y
b i )model is the est. mean of yi under current model,
var(y
c i )model is the est. variance of yi under current model.
Slide 186
CHAPTER 3 ST 544, D. Zhang
IV.3 Residuals
• We can obtain Deviance residuals or Pearson χ2 residuals after fitting
a GLM.
• Deviance residuals:
X
Dev = 2[log(LS ) − log(LM )] = di ,
1/2
rDi = di sign(yi − µi ) is the deviance residual.
• Standardized Deviance residuals is the standardized version of rDi .
Standardized deviance residuals can be used to identify outliers.
• Pearson residuals:
yi − µbi
ei = p .
var(y
c i)
E(ei ) ≈ 0, var(ei ) < 1.
Slide 187
CHAPTER 3 ST 544, D. Zhang
Slide 188
CHAPTER 4 ST 544, D. Zhang
4 Logistic Regression
π(x)
= eα+βx
1 − π(x)
eα+βx
π(x) = .
1 + eα+βx
Slide 189
CHAPTER 4 ST 544, D. Zhang
Slide 190
CHAPTER 4 ST 544, D. Zhang
logitπ(xi ) = α + βxi
is reasonable.
Slide 192
CHAPTER 4 ST 544, D. Zhang
data crab;
input color spine width satell weight;
weight=weight/1000; color=color-1;
y=(satell>0);
datalines;
3 3 28.3 8 3050
4 3 22.5 0 1550
2 1 26.0 9 2300
....
;
title "Define mid width for every [w+0.25, w+1.25)";
data crab; set crab;
if width <=23.25 then
mid_width = 22.75;
else if width <= 29.25 then
mid_width = ceil(width-0.25) - 0.25;
else
mid_width = 29.75;
run;
proc sort data=crab;
by mid_width;
run;
proc summary data=crab noprint;
var y;
by mid_width;
output out=crab2 sum=y;
run;
Slide 193
CHAPTER 4 ST 544, D. Zhang
• The above plot indicates that the logistic model may be reasonable.
Slide 194
CHAPTER 4 ST 544, D. Zhang
logitπ(xi ) = α + βxi .
Slide 195
CHAPTER 4 ST 544, D. Zhang
Response Profile
Ordered Total
Value y Frequency
1 1 111
2 0 62
Probability modeled is y=1.
Slide 196
CHAPTER 4 ST 544, D. Zhang
Slide 197
CHAPTER 4 ST 544, D. Zhang
Slide 198
CHAPTER 4 ST 544, D. Zhang
Slide 199
CHAPTER 4 ST 544, D. Zhang
Slide 200
CHAPTER 4 ST 544, D. Zhang
Covariate Y =1 Y =0
x1
X x2
.. .. ..
. . .
xI
n1 n0
• With a multinomial sample (random sample), or a product-binomial
sample on X, we can model π(x) = P [Y = 1|x].
• Assume the logistic model
logit{π(x)} = α + βx
is true in the population, we then can make inference on α and β using
Slide 201
CHAPTER 4 ST 544, D. Zhang
the data.
• However, for rare events (either in terms of Y = 1 or Y = 0), it is not
efficient to conduct a multinomial sampling or a product-binomial
sampling on X. A solution is to conduct case-control studies.
• Question: Suppose we have data from a case-control study, can we still
make inference on α, β (especially on β)?
• In a case-control study, we (randomly) sample n1 cases and n0
controls (we may over-sample or under-sample cases). Then their
exposure history (x) is identified.
• Let π ∗ (x) = P [Y = 1|x, design], then it can be shown that π ∗ (x) also
has a logistic model with the same slope β:
logit{π ∗ (x)} = α∗ + βx,
where α∗ depends on α and sampling prob’s for cases and controls.
We can ignore the design and fit the logistic model!
Logistic model is the ONLY GLM that has this invariance property!
Slide 202
CHAPTER 4 ST 544, D. Zhang
logitπ(x) = α + βx
Slide 203
CHAPTER 4 ST 544, D. Zhang
2. LRT Test:
Fit the full model logit{π(x)} = α + βx ⇒ `1
Fit the null model logit{π(x)} = α ⇒ `0
Compare G2 = 2(`1 − `0 ) to χ21 .
∂`
3. Score Test: based on U = ∂β .
H0
Proc Logistic of SAS reports all of them.
Slide 204
CHAPTER 4 ST 544, D. Zhang
or
Proc Genmod; * may need "descending" here;
model y = x / dist=bin LRCI;
* or model y/n = x / dist=bin LRCI;
Run;
ηb(x0 ) = α
b + βx
b 0,
var(b
η (x0 )) = var(b
α) + 2x0 cov(b b + x2 var(β)
α, β) b
0
c η (x0 ))}1/2 = [b
=⇒ (1 − α) CI for η(x0 ): ηb(x0 ) ± zα/2 {var(b η1 , ηb2 ]
=⇒ (1 − α) CI for π(x0 ):
ηb1
eηb2
e
η
,
1+e b 1 1 + eηb2
• Note: Need to use option covout in Proc Logistic, or option covb
in model statement of Proc GenMod to get cd
ov(b
α, β).
b
Slide 206
CHAPTER 4 ST 544, D. Zhang
(1 − α) CI for α∗ is α
b∗ ± zα/2 SE(b
c α∗ ) = [b b2∗ ].
α1∗ , α
α ∗ , α ∗ .
1+e b 1 1+e b 2
Slide 207
CHAPTER 4 ST 544, D. Zhang
e−12.351+0.497(26.5)
π
b(x0 ) = −12.351+0.497(26.5)
= 0.695.
1+e
e0.44 e1.21
[ , ] = [0.61, 0.77].
1 + e0.44 1 + e1.21
Slide 208
CHAPTER 4 ST 544, D. Zhang
• Note: The CI for π(26.5) can also be obtained from Proc Logistic:
proc logistic data=crab descending;
model y=width;
output out=out predicted=pihat lower=lower upper=upper / alpha=0.05;
run;
************************************************************************
mid_
Obs color spine width satell weight y width _LEVEL_ pihat lower upper
97 2 3 26.3 1 2.400 1 26.75 1 0.67400 0.59147 0.74700
98 1 1 26.5 0 2.350 0 26.75 1 0.69546 0.61205 0.76775
• If the value x0 is not in the data set, we can insert one data point with
x0 only (others are missing). For example, x0 = 22.8 is not in the data
set, then we insert one data point before we run the above program:
data x0;
input width y;
cards;
22.8 .
;
run;
data crab; set crab x0;
run;
***********************************************************************
mid_
Obs color spine width satell weight y width _LEVEL_ pihat lower upper
5 4 3 22.5 4 1.475 1 22.75 1 0.23810 0.12999 0.39528
6 . . 22.8 . . . 22.75 1 0.26621 0.15454 0.41861
Slide 209
CHAPTER 4 ST 544, D. Zhang
In the data set, at x = 26.5, there are 6 female crabs with 4 having
satellites. So another estimate of π(26.5) is p = 4/6 = 0.667. A large
sample 95% CI without using the logistic model is:
p
4/6 ± 1.96 0.667(1 − 0.667)/6 = [0.290, 1.044] = [0.29, 1].
The exact 95% CI for π(26.5) 4/6 is [0.22, 0.96]. Both the large
sample and exact CIs are much wider than the one based on the model.
Slide 210
CHAPTER 4 ST 544, D. Zhang
Y =1 Y =0 Y =1 Y =0
1 14 93 109 1 11 52 63
X 0 32 81 113 X 0 12 43 55
Z=1 Z=0
Slide 211
CHAPTER 4 ST 544, D. Zhang
logit{π(x, z)} = α + β1 x + β2 z.
• Model implies:
logitπ(x = 1, z) = α + β1 + β2 z
logitπ(x = 0, z) = α + 0 + β2 z
⇒ β1 = logitπ(x = 1, z) − logitπ(x = 0, z)
π(x = 1, z)/{1 − π(x = 1, z)}
⇒ eβ1 =
π(x = 0, z)/{1 − π(x = 0, z)}
Slide 212
CHAPTER 4 ST 544, D. Zhang
logitπ(x, z = 1) = α + β1 x + β2
logitπ(x, z = 0) = α + β1 x + 0
⇒ β2 = logitπ(x, z = 1) − logitπ(x, z = 0)
π(x, z = 1)/{1 − π(x, z = 1)}
⇒ eβ2 =
π(x, z = 0)/{1 − π(x, z = 0)}
⇒ The partial associations between Z and Y are the same at X = 0
and X = 1 and are equal to eβ2 .
⇒ homogeneous ZY association across levels of X.
Y =1 Y =0 Y =1 Y =0
1 1
X 0 X 0
Z=1 Z=0
we can fit the above homogeneous model and test the above
conditional independence hypotheses (particularly X ⊥ Y |Z) under the
assumed model using the Wald, LRT and score test.
Slide 214
CHAPTER 4 ST 544, D. Zhang
proc genmod;
model sym/n = azt race / dist=bin link=logit type3 lrci;
run;
Slide 215
CHAPTER 4 ST 544, D. Zhang
******************************************************************************
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 1 1.3835 1.3835
Scaled Deviance 1 1.3835 1.3835
Pearson Chi-Square 1 1.3910 1.3910
Scaled Pearson X2 1 1.3910 1.3910
Slide 216
CHAPTER 4 ST 544, D. Zhang
Slide 217
CHAPTER 4 ST 544, D. Zhang
βb1 = −0.72
β
e = 0.49
b1
SE(
d βb1 ) = 0.2790
95% LRCI for β1 = [−1.2773, −0.1799]
95% LRCI for eβ1 = [e−1.2773 , e−0.1799 ] = [0.28, 0.84].
⇒ For each race, the odds of having AIDS symptom for patients with
immediate AZT treatment is only about half of the odds for patients
with delayed AZT treatment.
• Note 1: The first program also gives goodness-of-fit Pearson
χ2 = 1.39 and deviance=1.38, with df = 1, p-value=0.24, indicating
reasonable fit of the model to the data.
Slide 218
CHAPTER 4 ST 544, D. Zhang
Model implies:
logitπ(x = 1, z) = α + β 1 + β 2 z + β3 z
logitπ(x = 0, z) = α + 0 + β2 z + 0
⇒ logitπ(x = 1, z) − logitπ(x = 0, z) = β1 + β3 z
π(x = 1, z)/{1 − π(x = 1, z)}
⇒ = eβ1 +β3 z
π(x = 0, z)/{1 − π(x = 0, z)}
The model allows different treatment effects for different races.
Slide 219
CHAPTER 4 ST 544, D. Zhang
proc genmod;
model sym/n = azt race azt*race / dist=bin type3 lrci;
run;
*******************************************************************
Analysis Of Maximum Likelihood Parameter Estimates
Likelihood Ratio
Standard 95% Confidence Wald
Parameter DF Estimate Error Limits Chi-Square Pr > ChiSq
Intercept 1 -1.2763 0.3265 -1.9611 -0.6692 15.28 <.0001
azt 1 -0.2771 0.4655 -1.2024 0.6394 0.35 0.5518
race 1 0.3476 0.3875 -0.3930 1.1367 0.80 0.3698
azt*race 1 -0.6878 0.5852 -1.8452 0.4599 1.38 0.2399
Scale 0 1.0000 0.0000 1.0000 1.0000
NOTE: The scale parameter was held fixed.
The Wald and LRT statistics are all equal to 1.38 (df = 1), with
p-value=0.24.
The LRT statistic 1.38 is the same as the deviance 1.38 from the
homogeneous model since the model with interaction is saturated.
Slide 220
CHAPTER 4 ST 544, D. Zhang
S F S F S F S F
trt 6 11 1 10 1 4 4 2
control 0 12 0 10 1 8 6 1
Z=5 Z=6 Z=7 Z=8
What we observed: There is a lot of variation in success
probabilities among centers.
Slide 221
CHAPTER 4 ST 544, D. Zhang
Slide 222
CHAPTER 4 ST 544, D. Zhang
Slide 223
CHAPTER 4 ST 544, D. Zhang
Slide 224
CHAPTER 4 ST 544, D. Zhang
SE(
d β) b = 0.3067 ⇒ 95% Wald CI of β: [0.176, 1.378], 95% Wald CI
for eβ : [1.19, 3.97]
Wald test for H0 : β = 0(X ⊥ Y |Z) : χ2 = 6.42, p-value=0.01
Score test for H0 : β = 0(X ⊥ Y |Z) : χ2 = 6.56, p-value=0.01.
Slide 225
CHAPTER 4 ST 544, D. Zhang
Slide 226
CHAPTER 4 ST 544, D. Zhang
The above program also gives the Pearson χ2 = 8.03 and deviance =
9.75 with df = 7 for goodness-of-fit (p-values = 0.33 and 0.20).
Slide 227
CHAPTER 4 ST 544, D. Zhang
Slide 228
CHAPTER 4 ST 544, D. Zhang
Slide 229
CHAPTER 4 ST 544, D. Zhang
Slide 230
CHAPTER 4 ST 544, D. Zhang
Slide 231
CHAPTER 4 ST 544, D. Zhang
Slide 232
CHAPTER 4 ST 544, D. Zhang
logit{π(x)} = α + β1 x1 + β2 x2 + · · · + βp xp .
Slide 234
CHAPTER 4 ST 544, D. Zhang
M1 : logit{π(x, c)} = α + β1 c1 + β2 c2 + β3 c3 + β4 x
Slide 236
CHAPTER 4 ST 544, D. Zhang
βb1 = 1.330, eβ1 = e1.330 = 3.78. The odds that medium light crabs
b
have satellites is 3.78 times the odds that dark crabs have satellites.
For crabs with the same color, one cm increase in carapace width will
increase the odds by e0.468 − 1 = 0.60 (60%).
From the fitted model, we can obtain a fitted model for crabs with a
particular color. For example, for medium light crabs with width x, the
fitted model is
Slide 237
CHAPTER 4 ST 544, D. Zhang
Slide 238
CHAPTER 4 ST 544, D. Zhang
Slide 239
CHAPTER 4 ST 544, D. Zhang
However, the estimated effects from these 2 models are very different.
Slide 240
CHAPTER 4 ST 544, D. Zhang
• Fitted model (M1 ) and Figure 4.4 showed that c1 , c2 and c3 have
similar effects, indicating that we can group crabs with colors 1, 2, 3
and divide crabs into 2 groups: non-dark (color = 1, 2, 3) and dark
(color = 4). Denote c = 1 for non-dark crabs and c = 0 for dark crabs
and consider the model
M3 : logit{π(x, c)} = α + β1 c + β2 x
Slide 241
CHAPTER 4 ST 544, D. Zhang
M4 : logit{π(x, c)} = α + β1 c + β2 x + β3 c × x.
Slide 242
CHAPTER 4 ST 544, D. Zhang
The LRT for the interaction: G2 = 1.17 (df = 1), p-value=0.28, not
significant.
Slide 243
CHAPTER 4 ST 544, D. Zhang
logit{π(x)} = α + β1 x1 + β2 x2 + · · · + βp xp .
Slide 244
CHAPTER 4 ST 544, D. Zhang
• When [Y = 1|x]’s are not rare events (π(x)’s are not close to 0), we
can apply the linear approximation to π(x):
∂π(x)
= βk π(x){1 − π(x)}.
∂xk
⇒ With 1 unit increase in xk , the success probability will increase
additively by approximately βk π(x){1 − π(x)}.
Slide 245
CHAPTER 4 ST 544, D. Zhang
• For example, for the crab data with the fitted model:
M3 : logit{π(x, c)} = −12.980 + 1.301c + 0.478x,
where c = 1 for non-dark crabs, c = 0 for dark crabs, x = carapace
width.
The approximation will be better if we use π(c̄, x̄) = 0.674 for 0.51:
0.478 × 0.674(1 − 0.674) = 0.105.
Slide 247
CHAPTER 5 ST 544, D. Zhang
Slide 248
CHAPTER 5 ST 544, D. Zhang
Slide 250
CHAPTER 5 ST 544, D. Zhang
The table indicates that model 5 (M3 on slide 241) may be considered
the final model.
Slide 251
CHAPTER 5 ST 544, D. Zhang
Slide 252
CHAPTER 5 ST 544, D. Zhang
Slide 254
CHAPTER 5 ST 544, D. Zhang
• An example:
Y π
b Yb0.3− Yb0.4− Yb0.5− Yb0.6− Yb0.7− Yb0.8− Yb0.8+
1 0.8 1 1 1 1 1 1 0
1 0.6 1 1 1 1 0 0 0
1 0.4 1 1 0 0 0 0 0
0 0.7 1 1 1 1 1 0 0
0 0.5 1 1 1 0 0 0 0
0 0.3 1 0 0 0 0 0 0
Yb
Y 1 0
1 3 0 3 0 2 1 2 1 1 2 1 2 0 3
0 3 0 2 1 2 1 1 2 1 2 0 3 0 3
3 3 2 2 1 1 0
se = 3
se = 3
se = 3
se = 3
se = 3
se = 3
se = 3
0 1 1 2 2 3 3
sp = 3
sp = 3
sp = 3
sp = 3
sp = 3
sp = 3
sp = 3
Slide 256
CHAPTER 5 ST 544, D. Zhang
Slide 257
CHAPTER 5 ST 544, D. Zhang
Slide 258
CHAPTER 5 ST 544, D. Zhang
• If there are ties in πbi ’s, need to do some adjustment. For example,
suppose two π bi ’ for a Yi = 1 and a Yi = 0 are the same (0.4):
Y π
b Yb0.4− Yb0.5− Yb0.6− Yb0.7− Yb0.8− Yb0.8+
1 0.8 1 1 1 1 1 0
1 0.6 1 1 1 0 0 0
1 0.4 1 0 0 0 0 0
0 0.7 1 1 1 1 0 0
0 0.5 1 1 0 0 0 0
0 0.4 1 0 0 0 0 0
The corresponding classification tables are:
Yb
Y 1 0
1 3 0 2 1 2 1 1 2 1 2 0 3
0 3 0 2 1 1 2 1 2 0 3 0 3
3 2 2 1 1 0
se = 3
se = 3
se = 3
se = 3
se = 3
se = 3
0 1 2 2 3 3
sp = 3
sp = 3
sp = 3
sp = 3
sp = 3
sp = 3
Slide 259
CHAPTER 5 ST 544, D. Zhang
Slide 260
CHAPTER 5 ST 544, D. Zhang
• AUC = 5.5
9
9 = # of pairs with diff outcomes
5.5 = # of concordant pairs (5) + 0.5 × # of ties in π
bi ’s with diff.
outcomes (1).
• Note: For binomial data, we need to decompose them as binary data.
There will be a lot tied predicted probabilities.
• The program to get π
bi , ROC curve and c-index:
Proc logistic; * may need descending for binary y;
model y/n = x / outroc=roc;
output out=outpred predicted=pihat;
run;
title "ROC Plot";
symbol1 v=dot i=join;
proc gplot data=roc;
plot _sensit_*_1mspec_;
run;
Slide 261
CHAPTER 5 ST 544, D. Zhang
• SAS program and output for the logistic model for crab data:
M3 : logit{π(x, c)} = α + β1 c + β2 x
Slide 262
CHAPTER 5 ST 544, D. Zhang
Slide 263
CHAPTER 5 ST 544, D. Zhang
log{(π(x)} = α + βx
fits the data well, we can fit a more complex model such as
log{(π(x)} = α + β1 x + β2 x2 .
and test H0 : β2 = 0 using the Wald, score and LRT tests. LRT is
usually preferred.
Slide 264
CHAPTER 5 ST 544, D. Zhang
II.2 Goodness of fit using deviance and Pearson χ2 for grouped data
• For binomial data like the Snoring/Heart disease example:
Heart Disease
x Yes (yi ) No ni
0 Never 24 1355 1379
Snoring 2 Occasionally 35 605 638
4 Nearly every night 21 192 213
5 Every night 30 224 254
where ni → ∞, we can use the deviance or Pearson χ2 to check the
goodness of fit of the logistic model
logit{(π(x)} = α + βx.
Slide 265
CHAPTER 5 ST 544, D. Zhang
Slide 266
CHAPTER 5 ST 544, D. Zhang
Slide 267
CHAPTER 5 ST 544, D. Zhang
Slide 268
CHAPTER 5 ST 544, D. Zhang
logit(πi ) = α + βxi .
b, βb ⇒ π
After we got α bi :
eαb+xi β
b
π
bi = .
1+ eαb+xi βb
• Pearson Residual:
yi − ni π
bi
ei = p
bi (1 − π
ni π bi )
• Standardized Pearson residual
yi − n i π
bi yi − n i π
bi ei
est
i = =p =√
SE bi (1 − π
ni π bi )(1 − hi ) 1 − hi
where hi is the ith element of the hat matrix.
Slide 269
CHAPTER 5 ST 544, D. Zhang
• E(est st st
i ) ≈ 0, var(ei ) ≈ 1 for large ni . So ei behaves like a N(0,1)
random variable. Large est st
i ( |ei | > 2) indicates potential outlier.
• Plots of est
i v.s. xi or xi β may detect lack of fit.
b
Slide 270
CHAPTER 5 ST 544, D. Zhang
Slide 271
CHAPTER 5 ST 544, D. Zhang
Slide 272
CHAPTER 5 ST 544, D. Zhang
Slide 274
CHAPTER 5 ST 544, D. Zhang
Slide 277
CHAPTER 5 ST 544, D. Zhang
Slide 278
CHAPTER 5 ST 544, D. Zhang
M1 : logit(πi ) = α + βx1i
and
M2 : logit(πi ) = α + βx2i ?
Slide 280
CHAPTER 5 ST 544, D. Zhang
Complete separation in x1
If we fit M1 , α → −∞, β → ∞.
How about M2 ?
Slide 281
CHAPTER 5 ST 544, D. Zhang
Slide 282
CHAPTER 5 ST 544, D. Zhang
Slide 283
CHAPTER 5 ST 544, D. Zhang
proc genmod;
class center;
model y/n = center trt / noint;
run;
*********************************************************************************
Analysis Of Maximum Likelihood Parameter Estimates
Standard Wald 95% Wald
Parameter DF Estimate Error Confidence Limits Chi-Square Pr > ChiSq
Intercept 0 0.0000 0.0000 0.0000 0.0000 . .
center 1 1 -28.0221 213410.4 -418305 418248.7 0.00 0.9999
center 2 1 -4.2025 1.1891 -6.5331 -1.8720 12.49 0.0004
center 3 1 -27.9293 188688.5 -369851 369794.7 0.00 0.9999
center 4 1 -0.9592 0.6548 -2.2426 0.3242 2.15 0.1430
center 5 1 -2.0223 0.6700 -3.3354 -0.7092 9.11 0.0025
trt 1 1.5460 0.7017 0.1708 2.9212 4.85 0.0276
Scale 0 1.0000 0.0000 1.0000 1.0000
• From the output, we know that for centers 1 & 3, βbkZ = −∞.
• βb = 1.546, SE(
d β)b = 0.702, p-value from Wald test = 0.0276. May
not be valid!
Slide 284
CHAPTER 5 ST 544, D. Zhang
Slide 285
CHAPTER 5 ST 544, D. Zhang
Slide 286
CHAPTER 5 ST 544, D. Zhang
Slide 287
CHAPTER 5 ST 544, D. Zhang
• However, since the tables are sparse, all three tests may not be valid ⇒
exact conditional inference!
Slide 288
CHAPTER 5 ST 544, D. Zhang
Slide 289
CHAPTER 5 ST 544, D. Zhang
• Note: Since the above exact test is based on the conditional dist. of
n11k given margins, which is the dist that CMH test is based, it can be
shown that the above exact score test is actually the exact CMH test!
Compare this to the large-sample CMH test on the next slide.
data y1; set fungal;
count=y;
drop y0;
y=1;
run;
Slide 290
CHAPTER 5 ST 544, D. Zhang
Slide 291
CHAPTER 5 ST 544, D. Zhang
logit{π(x)} = α + β1 x1 + β2 x2 + · · · + βp xp
Slide 292
CHAPTER 5 ST 544, D. Zhang
logit(P [Y = 1]) = α + βx
Slide 293
CHAPTER 5 ST 544, D. Zhang
logit{π(x)} = α + βx.
It can be shown that the resulting exact score test is the exact
Cochran-Armitage trend test.
• Example: Mother’s alcohol consumption and infant malformation
Alcohol Malformation
Consumption Present (Y = 1) Absent (Y = 0)
0 (0) 48 17, 066
< 1 (0.5) 38 14, 464
1 − 2 (1.5) 5 788
3 − 5 (4) 1 126
≥ 6 (7) 1 37
Slide 294
CHAPTER 5 ST 544, D. Zhang
|T | ≥ zα/2 ,
Slide 296
CHAPTER 5 ST 544, D. Zhang
P [T ≥ zα/2 |Ha : π1 − π2 = δ] = 1 − β.
• Assume equal sample size for each group: n1 = n2 , then the above
power statement leads to (approximately)
"
p1 − p2 − δ
P p
π1 (1 − π1 )/n1 + π2 (1 − π2 )/n1
#
δ
≥ zα/2 − p Ha = 1 − β
π1 (1 − π1 )/n1 + π2 (1 − π2 )/n1
⇒
√ p
P [Z ≥ zα/2 − δ n1 / π1 (1 − π1 ) + π2 (1 − π2 )] = 1 − β
Slide 297
CHAPTER 5 ST 544, D. Zhang
⇒
√ p
zα/2 − δ n1 / π1 (1 − π1 ) + π2 (1 − π2 ) = −zβ
⇒
(zα/2 + zβ )2 [π1 (1 − π1 ) + π2 (1 − π2 )]
n1 = n2 = 2
.
(π1 − π2 )
• For example, if we would like to detect Ha : π1 = 0.3, π2 = 0.2 with
90% power at level 0.05, then
(z0.05/2 + z0.1 )2 [0.3(1 − 0.3) + 0.2(1 − 0.2)]
n1 = n2 =
(0.3 − 0.2)2
(1.96 + 1.28)2 [0.3(1 − 0.3) + 0.2(1 − 0.2)]
= 2
= 388.4 = 389.
(0.3 − 0.2)
Slide 298
CHAPTER 6 ST 544, D. Zhang
Note: Each quantity on the LHS is a generalized logit. π1 (xi )/πJ (xi )
is the conditional odds that Yi is in cell 1 v.s. that Yi is in cell J given
that Yi is in either cell 1 or cell J.
Slide 300
CHAPTER 6 ST 544, D. Zhang
Slide 301
CHAPTER 6 ST 544, D. Zhang
Slide 302
CHAPTER 6 ST 544, D. Zhang
Slide 303
CHAPTER 6 ST 544, D. Zhang
• Software:
Proc Logistic;
freq count;
model y (ref="1") = x / link=glogit aggregate=(x) scale=none;
run;
• Want to see how alligators’ size (length) affects their food choice.
Slide 305
CHAPTER 6 ST 544, D. Zhang
Slide 306
CHAPTER 6 ST 544, D. Zhang
Slide 307
CHAPTER 6 ST 544, D. Zhang
Slide 309
CHAPTER 6 ST 544, D. Zhang
Slide 310
CHAPTER 6 ST 544, D. Zhang
Slide 312
CHAPTER 6 ST 544, D. Zhang
Table of racesex by y
racesex y
Frequency |
Row Pct | 1| 2| 3| Total
---------------+--------+--------+--------+
Black Female | 64 | 9 | 15 | 88
| 72.73 | 10.23 | 17.05 |
---------------+--------+--------+--------+
Black Male | 25 | 5 | 13 | 43
| 58.14 | 11.63 | 30.23 |
---------------+--------+--------+--------+
White Female | 371 | 49 | 74 | 494
| 75.10 | 9.92 | 14.98 |
---------------+--------+--------+--------+
White Male | 250 | 45 | 71 | 366
| 68.31 | 12.30 | 19.40 |
---------------+--------+--------+--------+
Total 710 108 173 991
Slide 313
CHAPTER 6 ST 544, D. Zhang
Slide 314
CHAPTER 6 ST 544, D. Zhang
Slide 315
CHAPTER 6 ST 544, D. Zhang
• Compared to the saturated model, this model has a good fit (small
deviance and Pearson χ2 - valid for non-sparse contingency tables.
• Gender has a significant overall effect, race is not significant!
Slide 316
CHAPTER 6 ST 544, D. Zhang
Slide 317
CHAPTER 6 ST 544, D. Zhang
Slide 319
CHAPTER 6 ST 544, D. Zhang
Slide 320
CHAPTER 6 ST 544, D. Zhang
Slide 321
CHAPTER 6 ST 544, D. Zhang
eαj +βx
τj (x) = , j = 1, 2..., J − 1
1 + eαj +βx
⇒
π1 (x) = τ1 (x)
π2 (x) = τ2 (x) − τ1 (x)
...
πj (x) = τj (x) − τj−1 (x)
...
πJ−1 (x) = τJ−1 (x) − τJ−2 (x)
πJ (x) = 1 − τJ−1 (x)
Slide 322
CHAPTER 6 ST 544, D. Zhang
Slide 323
CHAPTER 6 ST 544, D. Zhang
Slide 324
CHAPTER 6 ST 544, D. Zhang
Slide 325
CHAPTER 6 ST 544, D. Zhang
Slide 326
CHAPTER 6 ST 544, D. Zhang
Slide 327
CHAPTER 6 ST 544, D. Zhang
Slide 328
CHAPTER 6 ST 544, D. Zhang
θj (x) = e−0.366+0.509x
= e−0.366+0.509 = 1.15 for Democrats (x = 1)
=
= e−0.366+0 = 0.69 for Republicans (x = 0)
Slide 329
CHAPTER 6 ST 544, D. Zhang
Slide 330
CHAPTER 6 ST 544, D. Zhang
Slide 331
CHAPTER 6 ST 544, D. Zhang
• We can also consider a more complicated model with different β’s for
different category j for the same x and conduct a score test. For
example, for the political ideology example,
Slide 332
CHAPTER 6 ST 544, D. Zhang
P [Y ≤ j]
log = αj + β1 x1 + β2 x2 , j = 1, 2, 3.
1 − P [Y ≤ j]
Slide 333
CHAPTER 6 ST 544, D. Zhang
Slide 334
CHAPTER 6 ST 544, D. Zhang
Cumulative logistic model for mental impairment example with main effects 1
The LOGISTIC Procedure
Probabilities modeled are cumulated over the lower Ordered Values.
Slide 335
CHAPTER 6 ST 544, D. Zhang
Slide 336
CHAPTER 6 ST 544, D. Zhang
? Fitted model:
⇒ The odds for subjects with higher SES to have better mental
health is e1.1111 = 3.038 times the odds for subjects lower SES to
have better mental health.
⇒ The odds for subjects with one less life event index to have
better mental health is e0.3189 = 1.38 times the odds for subjects
with one more life event index to have better mental health.
Slide 337
CHAPTER 6 ST 544, D. Zhang
Slide 338
CHAPTER 6 ST 544, D. Zhang
• Note 1: The score GOF test for the cumulative logit model
P [Y ≤ j]
log = αj + β1 x1 + β2 x2 , j = 1, 2, 3,
1 − P [Y ≤ j]
has test statistic = 2.33 with df :
df = (J − 2) × dim(x) = (4 − 2) × 2 = 4.
title "Fitting the above cumulative logistic model using proc genmod";
proc genmod; * default is ascending, may put order=data or descending here;
* we can put a freq statement here;
model mental = life ses / dist=multinomial link=cumlogit
aggregate=(life ses);
run;
Slide 339
CHAPTER 6 ST 544, D. Zhang
Slide 340
CHAPTER 6 ST 544, D. Zhang
Slide 341
CHAPTER 6 ST 544, D. Zhang
Y ∗ = −βx + ε,
such that
[Y = j] ⇐⇒ αj−1 < Y ∗ ≤ αj
Slide 342
CHAPTER 6 ST 544, D. Zhang
Then
τj (x) = P [Y ≤ j|x]
= P [Y ∗ ≤ αj |x]
= P [Y ∗ + βx ≤ αj + βx|x]
= P [ε ≤ αj + βx|x]
= G(αj + βx).
Slide 343
CHAPTER 6 ST 544, D. Zhang
logit(τj ) = αj + βx,
logit(P [Y ≤ j]) = αj + β1 x1 + β2 x2 , j = 1, 2, 3.
Slide 344
CHAPTER 6 ST 544, D. Zhang
Slide 345
CHAPTER 6 ST 544, D. Zhang
Slide 346
CHAPTER 6 ST 544, D. Zhang
• What we observed:
Overall, the original model is more efficient (with smaller SE’s for
model parameter estimates), even though the model with combined
categories has a better fit! (P-value from score test is 0.9142)
Slide 347
CHAPTER 6 ST 544, D. Zhang
Slide 348
CHAPTER 6 ST 544, D. Zhang
Slide 349
CHAPTER 6 ST 544, D. Zhang
Slide 350
CHAPTER 6 ST 544, D. Zhang
Slide 351
CHAPTER 6 ST 544, D. Zhang
Slide 352
CHAPTER 6 ST 544, D. Zhang
Slide 353
CHAPTER 6 ST 544, D. Zhang
Slide 354
CHAPTER 6 ST 544, D. Zhang
proc logistic;
freq count;
model y = x / aggregate scale=none;
run;
*************************************************************************
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 54.7744 1 <.0001
Score 53.5619 1 <.0001
Wald 53.8161 1 <.0001
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 1 -0.9555 0.1559 37.5777 <.0001
Intercept 2 1 1.9249 0.1627 139.9875 <.0001
x 1 -0.5575 0.0760 53.8161 <.0001
Slide 357
CHAPTER 6 ST 544, D. Zhang
logit(P [Y ≤ j]) = αj + β1 x1 + β2 x2 , j = 1, 2
Slide 358
CHAPTER 6 ST 544, D. Zhang
Slide 359
CHAPTER 6 ST 544, D. Zhang
IV.2 Tests of X ⊥ Y |Z
• Test independence between income (X) and job satisfaction (Y ) given
gender (Z). Data – 1991 GSS.
Slide 360
CHAPTER 6 ST 544, D. Zhang
Slide 361
CHAPTER 6 ST 544, D. Zhang
logit(P [Y ≤ j]) = αj + βx + βz z, j = 1, 2, 3.
proc logistic;
freq count;
class gender / param=ref;
model y = gender / aggregate=(income gender) scale=none;
run;
Deviance and Pearson Goodness-of-Fit Statistics
Criterion Value DF Value/DF Pr > ChiSq
Deviance 19.6230 20 0.9812 0.4817
Pearson 20.9457 20 1.0473 0.4003
proc logistic;
freq count;
class gender / param=ref;
model y = incscore gender / aggregate=(income gender) scale=none;
run;
Deviance and Pearson Goodness-of-Fit Statistics
Criterion Value DF Value/DF Pr > ChiSq
Deviance 13.9519 19 0.7343 0.7865
Pearson 14.3128 19 0.7533 0.7652
logit(P [Y ≤ j]) = αj + β1 x1 + β2 x2 + β3 x3 + βz z, j = 1, 2, 3
Slide 363
CHAPTER 6 ST 544, D. Zhang
Slide 365
CHAPTER 8 ST 544, D. Zhang
• If we convert table to
Yes No
Pay higher taxes 359 785 1144
Cut living standard 334 810 1144
P [Y1 = 1]: π
b1 = 359/1144 = 0.314
P [Y2 = 1]: π
b2 = 334/1144 = 0.292
b1 − π
Difference π b2 = 0.022
π1 − π
var(b b2 )?
π1 − π
No way to get var(b b2 ) if data is summarized using this table.
Need to go back to the original table!
Slide 367
CHAPTER 8 ST 544, D. Zhang
Slide 368
CHAPTER 8 ST 544, D. Zhang
Slide 370
CHAPTER 8 ST 544, D. Zhang
⇒
2 n12 + n21 n12 + n21
var(δ)
bH
0
= × =
n 2n n2
δb2
χ2 =
var(δ)
bH
0
Slide 371
CHAPTER 8 ST 544, D. Zhang
Slide 372
CHAPTER 8 ST 544, D. Zhang
Slide 373
CHAPTER 8 ST 544, D. Zhang
• Note: The McNemar’s test can be derived from the Pearson χ2 test.
Slide 374
CHAPTER 8 ST 544, D. Zhang
Let π(x) = P [Y = 1|X = x]. If we fit a GLM link to π(x) with the
identity
π(x) = α + βx,
then β = δ, the risk difference.
As we indicated before, vd
ar(δ)
b cannot be derived from this table and
we need to go back to the original table
Slide 375
CHAPTER 8 ST 544, D. Zhang
Slide 376
CHAPTER 8 ST 544, D. Zhang
Slide 377
CHAPTER 8 ST 544, D. Zhang
Slide 378
CHAPTER 8 ST 544, D. Zhang
Slide 379
CHAPTER 8 ST 544, D. Zhang
population-averaged odds-ratio.
• We can also consider models at the individual level
Y
X Yes (1) No (0)
Pay higher taxes (1) yi1 1 − yi1 1
Cut living standard (0) yi2 1 − yi2 1
Let πi (x) = P [Yij = 1|x, αi ] the individual probability of responding
“Yes” to question j and consider the logit model:
logit{πi (x)} = αi + βs x,
logit{πi (x)} = αi + βs x, i = 1, 2, · · · , n.
Slide 384
CHAPTER 8 ST 544, D. Zhang
• Let Yij = 1/0 for MI/control for subject j in pair i, x = 1/0 for
diabetes/no diabetes. There are 144 tables like the following:
Y
1 0 1 0 1 0 1 0
X 1 1 1 0 1 1 0 0 0
0 0 0 1 0 0 1 1 1
Type I: 9 III: 16 II: 37 IV: 82
Slide 385
CHAPTER 8 ST 544, D. Zhang
Slide 386
CHAPTER 8 ST 544, D. Zhang
Slide 387
CHAPTER 8 ST 544, D. Zhang
Slide 388
CHAPTER 8 ST 544, D. Zhang
⇒
[n11 (1 − 1) + n12 (1 − 0.5) + n21 (0 − 0.5) + n22 (0 − 0)]2
χ2CM H =
n11 × 0 + n12 × 0.25 + n21 × 0.25 + n22 × 0
(n12 − n21 )2
= ,
n21 + n12
the same as the McNemar’s test!
Slide 389
CHAPTER 8 ST 544, D. Zhang
Slide 390
CHAPTER 8 ST 544, D. Zhang
• Let
Y1 = coffee brand choice at first purchase,
Y2 = coffee brand choice at second purchase.
Then construct
H
χ2 = dT {var(d)}−1 d ∼0 χ2I−1 .
Slide 391
CHAPTER 8 ST 544, D. Zhang
Slide 392
CHAPTER 8 ST 544, D. Zhang
Slide 393
CHAPTER 8 ST 544, D. Zhang
Slide 394
CHAPTER 8 ST 544, D. Zhang
Slide 395
CHAPTER 8 ST 544, D. Zhang
• Let Yi1 be the subject i’s response to “How often do you make a
special effort to sort ...”, Yi2 be the subject i’s response to “How often
do you cut back on driving ...”.
Slide 396
CHAPTER 8 ST 544, D. Zhang
Slide 397
CHAPTER 8 ST 544, D. Zhang
Slide 398
CHAPTER 8 ST 544, D. Zhang
Slide 399
CHAPTER 8 ST 544, D. Zhang
Response Profile
Ordered Total
Value y Frequency
1 4 471
2 3 382
3 2 676
4 1 931
PROC GENMOD is modeling the probabilities of levels of y having LOWER Ordered
Values in the response profile table.
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z|
Intercept1 -3.3511 0.0829 -3.5136 -3.1886 -40.43 <.0001
Intercept2 -2.2767 0.0743 -2.4224 -2.1311 -30.64 <.0001
Intercept3 -0.5849 0.0588 -0.7002 -0.4696 -9.94 <.0001
x 2.7536 0.0815 2.5939 2.9133 33.80 <.0001
Slide 400
CHAPTER 8 ST 544, D. Zhang
Slide 401
CHAPTER 8 ST 544, D. Zhang
Slide 402
CHAPTER 8 ST 544, D. Zhang
Slide 403
CHAPTER 8 ST 544, D. Zhang
Slide 404
CHAPTER 8 ST 544, D. Zhang
Slide 405
CHAPTER 8 ST 544, D. Zhang
Slide 406
CHAPTER 8 ST 544, D. Zhang
• We can also use Proc Logistic to fit the above model and get a test
of symmetry under the Quasi-symmetry model.
title "Quasi-symmetry model using proc logistic";
proc logistic descending;
freq count;
model y = x1 x2 x3 x4 / link=logit noint;
run;
*************************************************************************
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 12.4989 4 0.0140
Score 12.2913 4 0.0153
Wald 11.8742 4 0.0183
Slide 407
CHAPTER 8 ST 544, D. Zhang
with df = 10.
• Assuming quasi-symmetry model, symmetry model can be tested
using LRT
LRT = 22.5 − 10.0 = 12.5,
with df = 10 − 6 = 4, ⇒ Reject symmetry model under
quasi-symmetry model.
Slide 408
CHAPTER 8 ST 544, D. Zhang
Slide 409
CHAPTER 8 ST 544, D. Zhang
• Let us use the recycle example to illustrate this above model. SAS
program and part of output:
data table8_6; set table8_6;
if recycle=driveles then delete;
if recycle>driveles then do;
y=1;
x=recycle-driveles;
ind1=driveles;
ind2=recycle;
end;
else do;
y=0;
x=driveles-recycle;
ind1=recycle;
ind2=driveles;
end;
array z {4};
do k=1 to 4;
if k=ind1 then
z[k]=1;
else if k=ind2 then
z[k]=-1;
else
z[k]=0;
end;
run;
Slide 410
CHAPTER 8 ST 544, D. Zhang
Slide 411
CHAPTER 8 ST 544, D. Zhang
log(b
π12 /b
π21 ) = 2.3936(2 − 1) = 2.3936
log(b
π13 /b
π31 ) = 2.3936(3 − 1) = 4.78
log(b
π14 /b
π41 ) = 2.3936(4 − 1) = 7.18
log(b
π23 /b
π32 ) = 2.3936(3 − 2) = 2.3936
log(b
π24 /b
π42 ) = 2.3936(4 − 2) = 4.78
log(b
π34 /b
π43 ) = 2.3936
For example,
π b21 e2.3936 = 11b
b12 = π π21
That is,
Slide 412
CHAPTER 8 ST 544, D. Zhang
Slide 413
CHAPTER 8 ST 544, D. Zhang
log(b
π12 /b
π21 ) = 6.9269 − 4.3452 = 2.58
log(b
π13 /b
π31 ) = 6.9269 − 1.9937 = 4.93
log(b
π14 /b
π41 ) = 6.9269
log(b
π23 /b
π32 ) = 4.3452 − 1.9937 = 2.35
log(b
π24 /b
π42 ) = 4.3452
log(b
π34 /b
π43 ) = 1.9937
Very similar to the results from the ordinal quasi-symmetry model fit.
• Note: Pearson GOF and LRT for symmetry: χ2 = 856, G2 = 1093,
df = 6. Very poor fit!
Slide 414
CHAPTER 8 ST 544, D. Zhang
Slide 415
CHAPTER 8 ST 544, D. Zhang
• Usually, the diagnoses (Y1 , Y2 ) between two raters are correlated (not
independent). So if we use Pearson χ2 or LRT G2 , we would reject
independence. Indeed,
χ2 = 120, G2 = 118, df = 9,
even without taking into the ordinal scale. See the program and output
on the next slide.
• However, (Y1 , Y2 ) being dependent does not mean Y1 agrees well with
Y2 . That is, association is not the same as agreement.
• Pearson χ2 for symmetry H0 : πij = πji is χ2 = 30.3 with df = 6.
Symmetry model not good either!
• We may consider models that captures agreement and disagreement.
Slide 416
CHAPTER 8 ST 544, D. Zhang
data table8_7;
input rater1 y1-y4;
cards;
1 22 2 2 0
2 5 7 14 0
3 0 2 36 0
4 0 1 17 10
;
data table8_7; set table8_7;
array temp {4} y1-y4;
do rater2=1 to 4;
count=temp(rater2);
output;
end;
run;
proc freq;
weight count;
tables rater1*rater2 / norow nocol chisq;
test agree;
run;
*************************************************************************
Statistics for Table of rater1 by rater2
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 9 120.2635 <.0001
Likelihood Ratio Chi-Square 9 117.9569 <.0001
Mantel-Haenszel Chi-Square 1 73.4843 <.0001
Test of Symmetry
------------------------
Statistic (S) 30.2857
DF 6
Pr > S <.0001
Slide 417
CHAPTER 8 ST 544, D. Zhang
Slide 418
CHAPTER 8 ST 544, D. Zhang
log µij = λ + λX Y
i + λj + δi I(i = j).
Slide 419
CHAPTER 8 ST 544, D. Zhang
χ2 = 11.5, G2 = 13.2, df = 5.
Slide 421
CHAPTER 8 ST 544, D. Zhang
π
b12 /b
π21 = eβ1 −β2 = 0.51
b b
π
b13 /b
π31 = eβ1 −β3 = 4.48
b b
π
b14 /b
π41 = eβ1 = 0
b
π
b23 /b
π32 = eβ2 −β3 = 8.78
b b
π
b24 /b
π42 = eβ2 = 0
b
π
b34 /b
π43 = eβ3 = 0
b
Slide 423
CHAPTER 8 ST 544, D. Zhang
Slide 424
CHAPTER 8 ST 544, D. Zhang
Slide 425
CHAPTER 8 ST 544, D. Zhang
Slide 426
CHAPTER 8 ST 544, D. Zhang
Slide 427
CHAPTER 8 ST 544, D. Zhang
• Let
Πij = P [Player i wins Player j].
Consider Bradley-Terry model for comparison:
Need to set βI = 0.
• We can rank players based on βi ’s.
• The above model can be fit by treating it as a quasi-symmetry model.
Slide 428
CHAPTER 8 ST 544, D. Zhang
data table8_9;
input winner player $ y1-y5;
cards;
1 Agassi . 0 0 1 1
2 Federer 6 . 3 9 5
3 Henman 0 1 . 0 1
4 Hewitt 0 0 2 . 3
5 Roddick 0 0 1 2 .
;
data table8_9; set table8_9;
array temp {5} y1-y5;
do loser=1 to 5;
count=temp(loser);
output;
end;
run;
data table8_9; set table8_9;
if winner=loser then delete;
if winner<loser then do;
y=1; ind1=winner; ind2=loser;
end;
else do ;
y=0; ind1=loser; ind2=winner;
end;
array x {5};
do k=1 to 5;
if k=ind1 then
x[k]=1;
else if k=ind2 then
x[k]=-1;
else
x[k]=0;
end;
drop y1-y5 k;
run;
Slide 429
CHAPTER 8 ST 544, D. Zhang
proc sort;
by ind1 ind2 descending y;
run;
proc print;
run;
**************************************************************************
Obs winner player loser count y ind1 ind2 x1 x2 x3 x4 x5
1 1 Agassi 2 0 1 1 2 1 -1 0 0 0
2 2 Federer 1 6 0 1 2 1 -1 0 0 0
3 1 Agassi 3 0 1 1 3 1 0 -1 0 0
4 3 Henman 1 0 0 1 3 1 0 -1 0 0
5 1 Agassi 4 1 1 1 4 1 0 0 -1 0
6 4 Hewitt 1 0 0 1 4 1 0 0 -1 0
7 1 Agassi 5 1 1 1 5 1 0 0 0 -1
8 5 Roddick 1 0 0 1 5 1 0 0 0 -1
9 2 Federer 3 3 1 2 3 0 1 -1 0 0
10 3 Henman 2 1 0 2 3 0 1 -1 0 0
11 2 Federer 4 9 1 2 4 0 1 0 -1 0
12 4 Hewitt 2 0 0 2 4 0 1 0 -1 0
13 2 Federer 5 5 1 2 5 0 1 0 0 -1
14 5 Roddick 2 0 0 2 5 0 1 0 0 -1
15 3 Henman 4 0 1 3 4 0 0 1 -1 0
16 4 Hewitt 3 2 0 3 4 0 0 1 -1 0
17 3 Henman 5 1 1 3 5 0 0 1 0 -1
18 5 Roddick 3 1 0 3 5 0 0 1 0 -1
19 4 Hewitt 5 3 1 4 5 0 0 0 1 -1
20 5 Roddick 4 2 0 4 5 0 0 0 1 -1
Slide 430
CHAPTER 8 ST 544, D. Zhang
Slide 431
CHAPTER 8 ST 544, D. Zhang
Slide 432
CHAPTER 8 ST 544, D. Zhang
c βb2 − βb1 )
var( = var( c βb1 ) − 2d
c βb2 ) + var( cov(βb2 , βb1 )
= 1.73340 + 1.93092 − 2 × 1.06655 = 1.5312
c βb2 − βb1 )
SE( = 1.24
A 95% CI for β2 − β1 :
Slide 433
CHAPTER 8 ST 544, D. Zhang
Slide 434
CHAPTER 9 ST 544, D. Zhang
Denote
yi1 µi1
yi2 µi2
yi = , µi = .
.. ..
.
.
yini µini
Slide 435
CHAPTER 9 ST 544, D. Zhang
• Under some regularity conditions, the solution βb from the above GEE
Slide 436
CHAPTER 9 ST 544, D. Zhang
where
Σ = I0−1 I1 I0−1
Xm
I0 = DiT Vi−1 Di
i=1
Xm
I1 = DiT Vi−1 var(yi |xi )Vi−1 Di
i=1
Xm
= DiT Vi−1 (yi − µi (β))(y
b b T −1
i − µi (β)) Vi Di
i=1
where
var(yi1 |xi1 ) 0 ··· 0
0 var(yi2 |xi2 ) · · · 0
Ai = ,
.. .. .. ..
. . . .
0 ··· 0 var(yini |xini )
Slide 438
CHAPTER 9 ST 544, D. Zhang
3. AR(1) (ar(1)):
2 ni −1
1 ρ ρ ··· ρ
ni −2
ρ 1 ρ ··· ρ
Ri =
.. .. .. ..
. . . .
ρni −1 ρni −2 ρni −3 ··· 1
∗∗
Pm
where N = i=1 (ni − 1) (total # of adjacent pairs).
4. Unstructured (un): Let data determine Ri .
• Many more can be found in Proc GenMod of SAS.
Slide 440
CHAPTER 9 ST 544, D. Zhang
Slide 442
CHAPTER 9 ST 544, D. Zhang
µ(x) = α + β1 x1 + · · · + βp xp
logit{π(x)} = α + β1 x1 + · · · + βp xp
Slide 443
CHAPTER 9 ST 544, D. Zhang
log{λ(x)} = α + β1 x1 + · · · + βp xp
λ(x) is the rate (e.g. λ(x) is the incidence rate of a disease) for the
count data (number of events) y over a (time, space) region T such
that
y|x ∼ Poisson{λ(x)T }
Here log(.) link is used. Other link functions are possible.
Note: For count data, we usually have to be concerned about the
possible over-dispersion in the data. That is
Slide 444
CHAPTER 9 ST 544, D. Zhang
Slide 445
CHAPTER 9 ST 544, D. Zhang
Severity Time
Week 1 Week 2 Week 4
Mild 52% 68% 81% 150
Severe 19% 38% 64% 190
• We could analyze data at each time point using ML ⇒ multiple test
issues, no way to assess time effect.
• Assessment of the treatment effect over time should take into account
the correlation of 3 observations from each patient.
Slide 446
CHAPTER 9 ST 544, D. Zhang
• Let s = 1/0 for severe/mild, d = 1/0 for new drug and standard,
t = log2 (week) time in log2 scale, and
Slide 447
CHAPTER 9 ST 544, D. Zhang
Slide 448
CHAPTER 9 ST 544, D. Zhang
logit{π(s, d = 1, t} = α + β1 s + β2 × 1 + β3 t + β4 (1 × t)
logit{π(s, d = 0, t} = α + β1 s + β2 × 0 + β3 t + β4 (0 × t)
logit{π(s, d = 1, t} − logit{π(s, d = 0, t} = β4 t + β2
θ(s, t) = eβ4 t+β2
Slide 449
CHAPTER 9 ST 544, D. Zhang
The new drug is much better at week 4 than the standard drug.
• Working correlation: ρb12 = 0.07, ρb13 = −0.03, ρb23 = 0.06.
• Note: If there is baseline response Y , we can put it as part of the
outcome Y and model the change since baseline.
Slide 450
CHAPTER 9 ST 544, D. Zhang
yig ∼ Bin(nig , πg )?
Slide 451
CHAPTER 9 ST 544, D. Zhang
Slide 452
CHAPTER 9 ST 544, D. Zhang
Slide 453
CHAPTER 9 ST 544, D. Zhang
• We could also model the original binary data, but need to account for
correlation:
title "Recover individual rat’s data";
data rat2; set rat;
do i=1 to n1;
y=1;
output;
end;
do i=1 to n0;
y=0;
output;
end;
run;
title "GEE for individual rat’s data";
Proc Genmod data=rat2 descending;
class litter group;
model y = gp2 gp3 gp4 / dist=bin link=logit;
repeated subject=litter / type=exch corrw;
run;
Slide 454
CHAPTER 9 ST 544, D. Zhang
Slide 455
CHAPTER 9 ST 544, D. Zhang
1 101 76 1 0 8 18
2 101 11 1 1 2 18
3 101 14 1 2 2 18
4 101 9 1 3 2 18
5 101 8 1 4 2 18
6 102 38 1 0 8 32
7 102 8 1 1 2 32
8 102 7 1 2 2 32
9 102 9 1 3 2 32
10 102 4 1 4 2 32
11 103 19 1 0 8 20
12 103 0 1 1 2 20
13 103 4 1 2 2 20
14 103 3 1 3 2 20
15 103 0 1 4 2 20
16 104 11 0 0 8 31
17 104 5 0 1 2 31
18 104 3 0 2 2 31
19 104 3 0 3 2 31
20 104 3 0 4 2 31
Slide 457
CHAPTER 9 ST 544, D. Zhang
Slide 458
CHAPTER 9 ST 544, D. Zhang
• Data:
? 59 patients, 28 in control group, 31 in treatment (progabide) group.
? 5 seizure counts (including baseline) were obtained.
? Covariates: treatment (covariate of interest), age.
• GEE Poisson model: yij =seizure counts obtained at the jth
(j = 1, ..., 5) time point for patient i, yij ∼ over-dispersed
Poisson(µij ), µij = E(yij ) = tij λij , where tij is the length of time
from which the seizure count yij was observed, λij is hence the rate to
have a seizure. First consider model
Slide 459
CHAPTER 9 ST 544, D. Zhang
• Interpretation of β’s:
Slide 460
CHAPTER 9 ST 544, D. Zhang
data seizure;
infile "seize.dat";
input id seize visit trt age;
nobs=_n_;
interval = 2;
if visit=0 then interval=8;
logtime = log(interval);
assign = (visit>0);
run;
proc genmod data=seizure;
class id;
model seize = assign trt assign*trt
/ dist=poisson link=log offset=logtime;
repeated subject=id / type=exch corrw;
run;
Working Correlation Matrix
Col1 Col2 Col3 Col4 Col5
Row1 1.0000 0.7716 0.7716 0.7716 0.7716
Row2 0.7716 1.0000 0.7716 0.7716 0.7716
Row3 0.7716 0.7716 1.0000 0.7716 0.7716
Row4 0.7716 0.7716 0.7716 1.0000 0.7716
Row5 0.7716 0.7716 0.7716 0.7716 1.0000
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z|
Intercept 1.3476 0.1574 1.0392 1.6560 8.56 <.0001
assign 0.1108 0.1161 -0.1168 0.3383 0.95 0.3399
trt 0.0265 0.2219 -0.4083 0.4613 0.12 0.9049
assign*trt -0.1037 0.2136 -0.5223 0.3150 -0.49 0.6274
Slide 461
CHAPTER 9 ST 544, D. Zhang
Slide 462
CHAPTER 9 ST 544, D. Zhang
Slide 463
CHAPTER 9 ST 544, D. Zhang
• Interpretation of β1 , β2 , β3 :
1. β1 : Effect of time + placebo
2. β2 : Group difference at baseline (can be set to 0 by randomization)
3. β3 : Treatment effect after taking into account the time and
placebo effects.
Slide 464
CHAPTER 9 ST 544, D. Zhang
Slide 466
CHAPTER 9 ST 544, D. Zhang
baseline.
2. There is not much group difference at baseline (p-value = 0.88),
which is expected.
3. Strong evidence of treatment effect: βb3 = 0.71(SE = 0.24).
eβ1 +β3 = e1.746 = 5.7: the odds that treated patients have shorter
b b
Slide 467
CHAPTER 9 ST 544, D. Zhang
• Interpretation of β1 , β2 , β3 :
1. β1 : Effect of time + placebo
2. β2 : Group difference at baseline (can be set to 0 by randomization)
3. β3 : Treatment effect after taking into account the time and
placebo effects.
Slide 468
CHAPTER 9 ST 544, D. Zhang
Slide 469
CHAPTER 9 ST 544, D. Zhang
Slide 470
CHAPTER 9 ST 544, D. Zhang
VI Transitional Models
VI.1 Use previous responses as covariates
• In a longitudinal study with time t = 1, 2, · · · , for each individual, we
have response variables {y1 , y2 , · · · , yt , ·}.
• We may model Yt given the past {y1 , y2 , · · · , yt−1 } and covariates
x1 , x2 , · · · , xk . Usually, the correlation in {Yt }’s can be totally
explained by the past ⇒ {Yt }’s are conditionally independent given the
past
⇒ Markov chain.
• In the above Markov chain model, we may assume that Yt only
depends on yt−1 , this is the Markov chain with order = 1.
• When Y is binary, the above Markov model with order 1 may be
Slide 472
CHAPTER 9 ST 544, D. Zhang
Slide 473
CHAPTER 9 ST 544, D. Zhang
Slide 476
CHAPTER 9 ST 544, D. Zhang
Slide 477
CHAPTER 9 ST 544, D. Zhang
Slide 478
CHAPTER 9 ST 544, D. Zhang
Slide 479
CHAPTER 10 ST 544, D. Zhang
Slide 480
CHAPTER 10 ST 544, D. Zhang
Slide 482
CHAPTER 10 ST 544, D. Zhang
• Note 1: In the above model, we usually assume that Yi1 , Yi2 are
conditionally independent given random effects ui . However, marginally
Yi1 , Yi2 are correlated. The correlation is induced by the shared
random effect ui . The variance σ 2 of ui characterizes the magnitude
of between-subject variance, and hence the correlation. Greater σ 2
corresponds to greater marginal correlation between Yi1 and Yi2 .
• Note 2: We could also estimate random effects ui by borrowing
information from other subjects (taking into account ui ∼ N(0, σ 2 )).
This method is different from treating ui as parameters. The only
model parameters are α, β and σ 2 .
Slide 483
CHAPTER 10 ST 544, D. Zhang
Slide 484
CHAPTER 10 ST 544, D. Zhang
Slide 485
CHAPTER 10 ST 544, D. Zhang
Fit Statistics
-2 Log Likelihood 2520.54
AIC (smaller is better) 2526.54
AICC (smaller is better) 2526.55
BIC (smaller is better) 2541.67
CAIC (smaller is better) 2544.67
HQIC (smaller is better) 2532.26
Fit Statistics for Conditional
Distribution
-2 log L(y | r. effects) 1041.77
Pearson Chi-Square 702.92
Pearson Chi-Square / DF 0.31
Slide 486
CHAPTER 10 ST 544, D. Zhang
Slide 487
CHAPTER 10 ST 544, D. Zhang
Slide 488
CHAPTER 10 ST 544, D. Zhang
• Let yit = 1/0 be the response (1=yes, 0=no) for subject i on item
t(t = 1, 2, 3) and consider
Slide 489
CHAPTER 10 ST 544, D. Zhang
Slide 490
CHAPTER 10 ST 544, D. Zhang
title "Use GLMM for opinion on abortion: dummies for items 1, 2";
proc glimmix method=quad(qpoints=19);
class id;
model y = item1 item2 female / dist=bin link=logit s;
random int / subject=id type=vc;
run;
************************************************************************
Covariance Parameter Estimates
Standard
Cov Parm Subject Estimate Error
Intercept id 77.4375 8.0860
title "Use GLMM for opinion on abortion: dummies for items 1, 3";
proc glimmix method=quad(qpoints=19);
class id;
model y = item1 item3 female / dist=bin link=logit s;
random int / subject=id type=vc;
run;
************************************************************************
Solutions for Fixed Effects
Standard
Effect Estimate Error DF t Value Pr > |t|
Intercept -0.3224 0.3754 1848 -0.86 0.3905
item1 0.5344 0.1558 3698 3.43 0.0006
item3 -0.2878 0.1554 3698 -1.85 0.0641
female 0.01258 0.4868 3698 0.03 0.9794
Slide 493
CHAPTER 10 ST 544, D. Zhang
Slide 494
CHAPTER 10 ST 544, D. Zhang
logit(πi ) = α + ui ,
where ui ∼ N(0, σ 2 ).
• After we fit this GLMM, we can get the estimates α
b and u
bi , and then
get the new estimate of πi :
eαb+bui −1
π
bi = α +bu
= logit (b
α+u
bi ),
1+e b i
Slide 496
CHAPTER 10 ST 544, D. Zhang
Slide 497
CHAPTER 10 ST 544, D. Zhang
Slide 498
CHAPTER 10 ST 544, D. Zhang
Slide 499
CHAPTER 10 ST 544, D. Zhang
ui ∼ N(0, σ 2 )
P [−1.96σ ≤ ui ≤ 1.96σ] = 0.95
P [α − 1.96σ ≤ α + ui ≤ α + 1.96σ] = 0.95
P [logit−1 (α − 1.96σ) ≤ logit−1 (α + ui ) ≤ logit−1 (α + 1.96σ)] = 0.95
P [logit−1 (α − 1.96σ) ≤ πi ≤ logit−1 (α + 1.96σ)] = 0.95
α−1.96σ 0.9076−1.96×0.424
e e
logit−1 (α − 1.96σ) = = = 0.52
1 + eα−1.96σ 1 + e0.9076−1.96×0.424
α+1.96σ 0.9076+1.96×0.424
e e
logit−1 (α + 1.96σ) = = = 0.85
1 + eα+1.96σ 1 + e0.9076+1.96×0.424
P [0.52 ≤ πi ≤ 0.85] = 0.95,
that is, the prob that this player’s success prob is between 0.52 and
0.85 is 0.95.
Slide 500
CHAPTER 10 ST 544, D. Zhang
Slide 502
CHAPTER 10 ST 544, D. Zhang
data rat;
input litter group n y;
gp1 = (group=1); gp2 = (group=2); gp3 = (group=3); gp4 = (group=4);
datalines;
1 1 10 1
2 1 11 4
3 1 12 9
4 1 4 4
5 1 10 10
6 1 11 9
7 1 9 9
8 1 11 11
9 1 10 10
10 1 10 7
11 1 12 12
12 1 10 9
13 1 8 8
14 1 11 9
15 1 6 4
16 1 9 7
17 1 14 14
18 1 12 7
19 1 11 9
20 1 13 8
21 1 14 5
22 1 10 10
23 1 12 10
24 1 13 8
25 1 10 10
26 1 14 3
27 1 13 13
28 1 4 3
29 1 8 8
30 1 13 5
31 1 12 12
32 2 10 1
33 2 3 1
34 2 13 1
35 2 12 0
Slide 503
CHAPTER 10 ST 544, D. Zhang
36 2 14 4
37 2 9 2
38 2 13 2
39 2 16 1
40 2 11 0
41 2 4 0
42 2 1 0
43 2 12 0
44 3 8 0
45 3 11 1
46 3 14 0
47 3 14 1
48 3 11 0
49 4 3 0
50 4 13 0
51 4 9 2
52 4 17 2
53 4 15 0
54 4 2 0
55 4 14 1
56 4 8 0
57 4 6 0
58 4 17 0
;
Slide 504
CHAPTER 10 ST 544, D. Zhang
where
µbij = E(yij |bi ) = tij λbij , var(yij |bi ) = φµbij ,
λbij is the rate to have a seizure for subject i. Consider model
Slide 506
CHAPTER 10 ST 544, D. Zhang
• Interpretation of β’s:
Slide 507
CHAPTER 10 ST 544, D. Zhang
title "Random intercept model for seizure data with conditional overdispersion";
proc glimmix data=seizure;
class id;
model seize = assign agn_trt / dist=poisson link=log offset=logtime s;
random int / subject=id type=vc;
random _residual_; *for conditional overdispersion;
run;
Slide 508
CHAPTER 10 ST 544, D. Zhang
Slide 509
CHAPTER 10 ST 544, D. Zhang
Slide 510
CHAPTER 10 ST 544, D. Zhang
Slide 512
CHAPTER 10 ST 544, D. Zhang
• βb1 = 1.6, eβ1 = 5: for a placebo patient, his/her odds of having shorter
b
Slide 513
CHAPTER 10 ST 544, D. Zhang
Slide 514