
Ch7 Logistic Regression

The Basic Situation:

Observe a set of n independent binary responses {x1, …, xn}, where

    xi = 0, if the ith outcome is a "failure"
    xi = 1, if the ith outcome is a "success"

In addition to the binary response xi, we observe values of explanatory variables, say z′i = (z1i, …, zri), which provide information about the conditions under which the ith response was obtained.

OBJECTIVE:

Construct a model for the conditional probabilities (π1, …, πn) given z′i = (z1i, …, zri):

    πi = Pr{xi = 1 | z′i = (z1i, …, zri)}
    1 − πi = Pr{xi = 0 | z′i = (z1i, …, zri)}

Linear models ?

You might consider πi = β0 + β1 z1i + ⋯ + βr zri, i = 1, …, n, and use least squares to obtain parameter estimates (b0, …, br).

One problem is that

    π̂i = b0 + b1 z1i + ⋯ + br zri

can be less than 0 or larger than 1.

If we set

    πi = Pr{xi = 1 | z′i = (z1i, …, zri)} = F(β0 + ∑_{j=1}^r βj zji)

where F(·) is a cumulative distribution function (c.d.f.) and β = (β0, β1, …, βr)′ is a vector of unknown parameters, then clearly 0 ≤ πi ≤ 1 for all zi. Moreover, πi can achieve any value between 0 and 1 when F is continuous. F(·) links the conditional probabilities {πi} to the parameters β; it is a special kind of "link function".

(1) Probit model:

Let

    F(w) = Φ(w) = ∫_{−∞}^{w} (1/√(2π)) e^{−u²/2} du

be the c.d.f. of the standard normal distribution. Then

    πi = Φ(β0 + ∑_{j=1}^r βj zji) = ∫_{−∞}^{β0 + ∑_{j=1}^r βj zji} (1/√(2π)) e^{−u²/2} du
(2) Logit model (logistic regression):

The c.d.f. of the "standard" logistic distribution is

    F(y) = Pr(Y ≤ y) = 1/(1 + e^{−y}) = e^y/(1 + e^y)

and the logistic regression model is

    πi = F(β0 + ∑_{j=1}^r βj zji) = exp(β0 + ∑_{j=1}^r βj zji) / (1 + exp(β0 + ∑_{j=1}^r βj zji))

which is equivalent to

    ln(πi/(1 − πi)) = β0 + ∑_{j=1}^r βj zji,  i = 1, …, n.

The left side is the natural logarithm of the odds of success given z′i, which is called a logit. The right side is a linear function of the parameters.

Probit and logit models are quite similar, except for values of π close to 0 or 1.

    Logit link:  πi = exp(β0 + ∑_{j=1}^r βj zji) / (1 + exp(β0 + ∑_{j=1}^r βj zji))

    Probit link: πi = ∫_{−∞}^{β0 + ∑_{j=1}^r βj zji} (1/√(2π)) e^{−u²/2} du
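As a rough numerical comparison of the two links, here is a minimal sketch in Python (the standard normal c.d.f. is written via the error function; the function names are mine):

```python
import math

def logit_link(eta):
    """Inverse logit: maps the linear predictor eta to a probability in (0, 1)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def probit_link(eta):
    """Standard normal c.d.f. Phi(eta), expressed via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Both links give probability 0.5 at eta = 0 and differ mainly in the tails.
for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(eta, round(logit_link(eta), 4), round(probit_link(eta), 4))
```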
Ex: Mortality of a certain species of beetle after 5 hours of exposure to gaseous carbon disulfide.

    Dose (mg/liter)   Z = ln(dose)   Number killed   Number survived   Pi            Empirical logit = ln(Pi/(1−Pi))
    49.057            3.893           6              53                6/(6+53)      −2.179
    52.991            3.970          13              47                13/(13+47)    −1.285
    56.991            4.043          18              44                …             −0.894
    60.542            4.103          28              28                …              0
    64.359            4.164          52              11                …              1.553
    68.891            4.233          53               6                …              2.179
    72.611            4.285          61               1                …              4.111
    76.542            4.338          60               0                60/(60+0)     —

Logistic regression model:

    πi = Pr{an insect dies | exposure to zi = ln(dosei)} = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi))

or

    ln(πi/(1 − πi)) = β0 + β1 zi,  zi = ln(dosei),  i = 1, 2, …, 8

    Pi = xi/ni = (number of dead insects for the ith dose) / (number exposed to the ith dose)

is an unbiased estimator of πi. Plot the empirical logit ln(Pi/(1 − Pi)) against zi = ln(dosei) to assess the fit of the proposed logistic regression model.
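The empirical-logit column can be reproduced directly from the counts; a sketch in Python (variable names are mine):

```python
import math

# Beetle mortality data from the table above: (dose, number killed, number survived)
data = [
    (49.057, 6, 53), (52.991, 13, 47), (56.991, 18, 44), (60.542, 28, 28),
    (64.359, 52, 11), (68.891, 53, 6), (72.611, 61, 1), (76.542, 60, 0),
]

for dose, killed, survived in data:
    z = math.log(dose)          # Z = ln(dose)
    n = killed + survived       # number of insects exposed
    p = killed / n              # Pi, the observed mortality proportion
    # The empirical logit is undefined when Pi = 0 or 1 (last dose: 60/60 killed).
    logit = math.log(p / (1 - p)) if 0 < p < 1 else float("nan")
    print(f"z={z:.3f}  P={p:.3f}  empirical logit={logit:.3f}")
```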

Interpretation of model parameters:

We define ln(πi/(1 − πi)) = β0 + β1 zi, where zi = ln(dosei).

β0 = ln(π0/(1 − π0)) is the log-odds that an insect is dead when dose = 1 mg/liter (so that z = ln(1) = 0). Equivalently, the conditional probability that a randomly selected insect exposed to 1 mg/liter of gaseous carbon disulfide is killed is

    π0 = exp(β0)/(1 + exp(β0))

Since ln(πi/(1 − πi)) = β0 + β1 zi, β1 represents the increase in the log-odds when the log-dose zi is increased to zi + 1:

    β1 = (β0 + β1(z + 1)) − (β0 + β1 z)

       = ln(πz+1/(1 − πz+1)) − ln(πz/(1 − πz))

(i.e., the change in the conditional log-odds for mortality given exposure to z = ln(dose))

       = ln[ (πz+1/(1 − πz+1)) / (πz/(1 − πz)) ]   (i.e., the log of a ratio of odds)

Then

    πz+1/(1 − πz+1) = e^{β1} · (πz/(1 − πz))

so e^{β1} is the multiplicative increase in the odds when z is increased by 1.
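A quick numeric check of this identity, using hypothetical coefficients β0 = −3 and β1 = 1.2 chosen only for illustration:

```python
import math

b0, b1 = -3.0, 1.2   # hypothetical coefficients, for illustration only

def odds(z):
    """Odds pi/(1 - pi) implied by the model logit = b0 + b1*z."""
    return math.exp(b0 + b1 * z)

z = 0.7
ratio = odds(z + 1) / odds(z)
# Increasing z by 1 multiplies the odds by exactly e^{b1}, whatever z is.
print(ratio, math.exp(b1))
```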

Parameter estimation:

Maximum likelihood estimation:

First, construct the likelihood function. Let ni denote the number of insects exposed to zi = ln(dosei). Assume each of these ni insects corresponds to an independent trial with conditional probability of mortality

    πi = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi)),  i = 1, 2, …, 8

Let Xi = observed number of dead ("success") insects after exposure to zi = ln(dosei). Then Xi ~ Bin(ni, πi), i.e.,

    Pr(Xi = xi) = C(ni, xi) · πi^{xi} · (1 − πi)^{ni − xi},  xi = 0, …, ni

Assume the results are independent across the given doses. Then the joint likelihood function is

    ∏_{i=1}^{8} { C(ni, xi) · πi^{xi} · (1 − πi)^{ni − xi} }

where πi = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi)).

Let β = (β0, β1)′ and x = (x1, …, x8)′.

The log-likelihood is

    l(β, x) = ∑_{i=1}^{8} {ln(ni!) − ln(xi!) − ln((ni − xi)!)}
              + ∑_{i=1}^{8} xi ln(πi) + ∑_{i=1}^{8} (ni − xi) ln(1 − πi)

            = ∑_{i=1}^{8} ln( ni! / (xi!(ni − xi)!) )
              + β0 ∑_{i=1}^{8} xi + β1 ∑_{i=1}^{8} xi zi
              − ∑_{i=1}^{8} ni ln(1 + e^{β0 + β1 zi})
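The simplified form of the log-likelihood (dropping the binomial-coefficient term, which does not involve β) is easy to evaluate; a sketch in Python using the beetle data:

```python
import math

# Beetle data: z = ln(dose), x = number killed, n = number exposed
doses = [49.057, 52.991, 56.991, 60.542, 64.359, 68.891, 72.611, 76.542]
z = [math.log(d) for d in doses]
x = [6, 13, 18, 28, 52, 53, 61, 60]
n = [59, 60, 62, 56, 63, 59, 62, 60]

def log_lik(b0, b1):
    """Log-likelihood up to the additive constant sum(ln C(ni, xi))."""
    return sum(xi * (b0 + b1 * zi) - ni * math.log(1 + math.exp(b0 + b1 * zi))
               for xi, ni, zi in zip(x, n, z))

# The reported m.l.e. should give a higher log-likelihood than, say, beta = (0, 0).
print(log_lik(0.0, 0.0), log_lik(-60.717, 14.883))
```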

Likelihood equations:

    ∂l(β, x)/∂β0 = ∑_{i=1}^{8} (xi − ni πi) = 0

    ∂l(β, x)/∂β1 = ∑_{i=1}^{8} (xi − ni πi) zi = 0

Matrix representation of the likelihood equations:

    T = [1 z1; 1 z2; ⋯; 1 z8]  (an 8 × 2 matrix),
    x = (x1, x2, …, x8)′,  n = (n1, n2, …, n8)′,
    M = (n1π1, n2π2, …, n8π8)′,  m = (n1π̂1, n2π̂2, …, n8π̂8)′,

where π1, π2, …, π8 are the true parameter values and π̂1, π̂2, …, π̂8 are the estimates computed from the likelihood equations. Then the likelihood equations are

    (∂l/∂β0, ∂l/∂β1)′ = T′(x − m) = 0

Matrix of second partial derivatives of the log-likelihood function (multiplied by −1):

    H = [ −∂²l/∂β0²     −∂²l/∂β0∂β1 ]  =  T′VT
        [ −∂²l/∂β1∂β0   −∂²l/∂β1²   ]

where

    V = diag( n1π1(1 − π1), …, n8π8(1 − π8) ) = diag( Var(x1), …, Var(x8) )

Odds ratio:

    (πz+1/(1 − πz+1)) / (πz/(1 − πz)) = ratio of the conditional odds for mortality at levels z + 1 and z.

Newton–Raphson (Fisher scoring) algorithm:

To evaluate the m.l.e. of β = (β0, β1)′, we iterate

    β̂(new) = β̂(old) + αĤ^{−1}Q̂ = β̂(old) + α(T′VT)^{−1} T′(x − m)

(1) Take α = 1, unless α = 1 is too large to give an improvement in the log-likelihood (in which case use a smaller α).
(2) Evaluate the matrix V and the vector m on the right side at β̂(old).

For starting values you might use

    β̂(old) = (ln(P/(1 − P)), 0)′

where P = (total number of successes)/(total number of trials) = ∑xi/∑ni. Continue iterating until β̂(new) is close enough to β̂(old).
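The algorithm above can be sketched in pure Python for the beetle data, with the 2 × 2 information matrix inverted by hand and α fixed at 1 (variable names are mine; this is a sketch, not a production fitter):

```python
import math

# Beetle data: z = ln(dose), x = number killed, n = number exposed
doses = [49.057, 52.991, 56.991, 60.542, 64.359, 68.891, 72.611, 76.542]
z = [math.log(d) for d in doses]
x = [6, 13, 18, 28, 52, 53, 61, 60]
n = [59, 60, 62, 56, 63, 59, 62, 60]

def fisher_scoring(z, x, n, tol=1e-8, max_iter=50):
    # Starting values: b0 = ln(P/(1-P)) with P = total successes / total trials, b1 = 0
    P = sum(x) / sum(n)
    b0, b1 = math.log(P / (1 - P)), 0.0
    for _ in range(max_iter):
        pi = [math.exp(b0 + b1 * zi) / (1 + math.exp(b0 + b1 * zi)) for zi in z]
        # Score vector T'(x - m), where m_i = n_i * pi_i
        s0 = sum(xi - ni * pii for xi, ni, pii in zip(x, n, pi))
        s1 = sum((xi - ni * pii) * zi for xi, ni, pii, zi in zip(x, n, pi, z))
        # Information matrix T'VT, where V = diag(n_i * pi_i * (1 - pi_i))
        w = [ni * pii * (1 - pii) for ni, pii in zip(n, pi)]
        a = sum(w)
        b = sum(wi * zi for wi, zi in zip(w, z))
        c = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = a * c - b * b
        # Newton step (T'VT)^{-1} T'(x - m), with alpha = 1
        d0 = (c * s0 - b * s1) / det
        d1 = (-b * s0 + a * s1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

b0, b1 = fisher_scoring(z, x, n)
print(b0, b1)   # close to the values reported in the notes, (-60.717, 14.883)
```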

Let β̂ denote the final numerical vector. Since β̂ is an m.l.e., we have the following large-sample result:

    β̂ ~ N(β, (T′VT)^{−1})

That is, the covariance matrix for β̂ is the inverse of the Fisher information matrix, (T′VT)^{−1}, with V evaluated at β̂.

For the above data with z = ln(dose), the m.l.e.'s are

    β̂ = (β̂0, β̂1)′ = (−60.717, 14.883)′

and the m.l.e. of πi = Pr(mortality | zi) is

    π̂i = exp(β̂0 + β̂1 zi) / (1 + exp(β̂0 + β̂1 zi)),  i = 1, 2, …, 8

Thus we have π̂ = (0.0586, …, 0.9791)′, and

    V̂(β̂) = (T′VT)^{−1} = [ ∑_{i=1}^{8} ni π̂i(1 − π̂i)      ∑_{i=1}^{8} ni zi π̂i(1 − π̂i)  ]^{−1}
                           [ ∑_{i=1}^{8} ni zi π̂i(1 − π̂i)   ∑_{i=1}^{8} ni zi² π̂i(1 − π̂i) ]

           = (  26.84   −6.55 )
             ( −6.55     1.6  )

To test H0: β0 = 0 vs. H1: β0 ≠ 0, we use

    χ² = [(β̂0 − 0)/S0]² = [−60.717/5.18]² = 137.36 ~ χ²(1)

The intercept is smaller than 0, which implies that the probability of mortality is less than 0.5 when ln(dose) = 0, i.e., dose = 1.

To test H0: β1 = 0 vs. H1: β1 ≠ 0, we have

    χ² = [(β̂1 − 0)/S1]² = [14.883/1.265]² = 138.49 ~ χ²(1)

β1 is positive, which implies that the mortality rate increases as z = ln(dose) increases.

Note that β̂1 = 14.883 implies that the log-odds for mortality increase by about 15 when z = ln(dose) is increased to z + 1 (equivalently, the odds are multiplied by e^{14.883}).
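The two Wald statistics can be reproduced from the reported estimates and the diagonal of the covariance matrix above (S0 = √26.84, S1 = √1.6):

```python
import math

# Reported m.l.e.'s and the diagonal of the estimated covariance matrix
b0_hat, b1_hat = -60.717, 14.883
var_b0, var_b1 = 26.84, 1.6

s0, s1 = math.sqrt(var_b0), math.sqrt(var_b1)   # standard errors
chi2_b0 = (b0_hat / s0) ** 2   # Wald chi-square for H0: beta0 = 0
chi2_b1 = (b1_hat / s1) ** 2   # Wald chi-square for H0: beta1 = 0
print(round(chi2_b0, 2), round(chi2_b1, 2))   # near the reported 137.36 and 138.49
```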

When the dose is doubled, z = ln(dose) increases to ln(2·dose) = ln 2 + ln(dose). Then

    ln( π̂_{z+ln2} / (1 − π̂_{z+ln2}) ) − ln( π̂z / (1 − π̂z) ) = β̂1 × ln 2

and the odds of mortality increase by a factor of

    ( π̂_{z+ln2} / (1 − π̂_{z+ln2}) ) / ( π̂z / (1 − π̂z) ) = exp(β̂1 ln 2) = 2^{β̂1} ≈ 2^{15}

An approximate 95% C.I. for β1 is β̂1 ± (1.96)·S1, or 14.883 ± (1.96)(1.265) ⇒ [12.40, 17.36].

An approximate 95% C.I. for (ln 2)β1 is

    [β̂1 × ln 2] ± 1.96[S1 × ln 2] ⟹ [8.60, 12.03]

Hence an approximate 95% C.I. for 2^{β1} = exp(β1 × ln 2) is

    [e^{8.60}, e^{12.03}] ⟹ [5418, 167700]

So doubling the dose increases the odds of mortality by a factor of between about 5,000 and 168,000.
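Reproducing that interval arithmetic (a sketch; the endpoints differ slightly from the notes depending on where you round):

```python
import math

b1_hat, s1 = 14.883, 1.265   # reported m.l.e. and standard error for beta1

# 95% C.I. for beta1, then for (ln 2)*beta1, then exponentiate to get it for 2**beta1
lo_b1, hi_b1 = b1_hat - 1.96 * s1, b1_hat + 1.96 * s1
lo_log, hi_log = lo_b1 * math.log(2), hi_b1 * math.log(2)
lo_or, hi_or = math.exp(lo_log), math.exp(hi_log)

print(f"beta1:        [{lo_b1:.2f}, {hi_b1:.2f}]")   # about [12.40, 17.36]
print(f"(ln2)*beta1:  [{lo_log:.2f}, {hi_log:.2f}]") # about [8.60, 12.03]
print(f"2**beta1:     [{lo_or:.0f}, {hi_or:.0f}]")   # roughly 5.4e3 to 1.7e5
```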
