
Ch7 Logistic Regression

The Basic Situation:

Observe a set of n independent binary responses {x1, …, xn}, where

    xi = 0, if the ith outcome is a "failure"
    xi = 1, if the ith outcome is a "success"

In addition to the binary response xi, we observe values of explanatory variables, say z′i = (z1i, …, zri), which provide information about the conditions under which the ith response was obtained.

OBJECTIVE:

Construct a model for the conditional probabilities (π1, …, πn) given z′i = (z1i, …, zri):

    πi = Pr{xi = 1 | z′i = (z1i, …, zri)}
    1 − πi = Pr{xi = 0 | z′i = (z1i, …, zri)}

Linear models ?

You might consider πi = β0 + β1 z1i + ⋯ + βr zri, i = 1, …, n, and use least squares to obtain parameter estimates (b0, …, br).

One problem is that

    π̂i = b0 + b1 z1i + ⋯ + br zri

can be less than 0 or larger than 1.

If we set

    πi = Pr{xi = 1 | z′i = (z1i, …, zri)} = F(β0 + ∑_{j=1}^r βj zji)

where F(·) is a cumulative distribution function (c.d.f.) and β = (β0, β1, …, βr)′ is a vector of unknown parameters, then clearly 0 ≤ πi ≤ 1 for all zi. Moreover, πi can achieve any value between 0 and 1 when F is continuous. F(·) links the conditional probabilities {πi} to the parameters β; it is a special kind of "link function".

(1) Probit model:

Let

    F(w) = Φ(w) = ∫_{−∞}^{w} (1/√(2π)) e^{−u²/2} du

be the c.d.f. of the standard normal distribution. Then

    πi = Φ(β0 + ∑_{j=1}^r βj zji) = ∫_{−∞}^{β0 + ∑_{j=1}^r βj zji} (1/√(2π)) e^{−u²/2} du
(2) Logit model (logistic regression):

The c.d.f. of the "standard" logistic distribution is

    F(y) = Pr(Y ≤ y) = 1/(1 + e^{−y}) = e^y/(1 + e^y)

and the logistic regression model is

    πi = F(β0 + ∑_{j=1}^r βj zji) = exp(β0 + ∑_{j=1}^r βj zji) / (1 + exp(β0 + ∑_{j=1}^r βj zji))

which is equivalent to

    ln(πi/(1 − πi)) = β0 + ∑_{j=1}^r βj zji,  i = 1, …, n.

The left side is the natural logarithm of the odds of success given z′i, which is called a logit. The right side is a linear function of the parameters.

Probit and logit models are quite similar, except for values of π close to 0 or 1.

    Logit link:  πi = exp(β0 + ∑_{j=1}^r βj zji) / (1 + exp(β0 + ∑_{j=1}^r βj zji))

    Probit link: πi = ∫_{−∞}^{β0 + ∑_{j=1}^r βj zji} (1/√(2π)) e^{−u²/2} du
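As a rough numerical comparison of the two links, here is a minimal sketch in Python (the standard normal c.d.f. is written via the error function; the function names are mine):

```python
import math

def logit_link(eta):
    """Inverse logit: maps the linear predictor eta to a probability in (0, 1)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def probit_link(eta):
    """Standard normal c.d.f. Phi(eta), expressed via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Both links give probability 0.5 at eta = 0 and differ mainly in the tails.
for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(eta, round(logit_link(eta), 4), round(probit_link(eta), 4))
```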
Ex: Mortality of a certain species of beetle after 5 hours of exposure to gaseous carbon disulfide.

    Dose (mg/liter)   Z = ln(dose)   Number killed   Number survived   Pi            Empirical logit = ln(Pi/(1−Pi))
    49.057            3.893           6              53                6/(6+53)      −2.179
    52.991            3.970          13              47                13/(13+47)    −1.285
    56.991            4.043          18              44                …             −0.894
    60.542            4.103          28              28                …              0
    64.359            4.164          52              11                …              1.553
    68.891            4.233          53               6                …              2.179
    72.611            4.285          61               1                …              4.111
    76.542            4.338          60               0                60/(60+0)     —

Logistic regression model:

    πi = Pr{an insect dies | exposure to zi = ln(dosei)} = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi))

or

    ln(πi/(1 − πi)) = β0 + β1 zi,  zi = ln(dosei),  i = 1, 2, …, 8

    Pi = xi/ni = (number of dead insects for the ith dose) / (number exposed to the ith dose)

is an unbiased estimator of πi. Plot the empirical logit ln(Pi/(1 − Pi)) against zi = ln(dosei) to assess the fit of the proposed logistic regression model.
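The empirical-logit column can be reproduced directly from the counts; a sketch in Python (variable names are mine):

```python
import math

# Beetle mortality data from the table above: (dose, number killed, number survived)
data = [
    (49.057, 6, 53), (52.991, 13, 47), (56.991, 18, 44), (60.542, 28, 28),
    (64.359, 52, 11), (68.891, 53, 6), (72.611, 61, 1), (76.542, 60, 0),
]

for dose, killed, survived in data:
    z = math.log(dose)          # Z = ln(dose)
    n = killed + survived       # number of insects exposed
    p = killed / n              # Pi, the observed mortality proportion
    # The empirical logit is undefined when Pi = 0 or 1 (last dose: 60/60 killed).
    logit = math.log(p / (1 - p)) if 0 < p < 1 else float("nan")
    print(f"z={z:.3f}  P={p:.3f}  empirical logit={logit:.3f}")
```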

Interpretation of model parameters:

We define ln(πi/(1 − πi)) = β0 + β1 zi, where zi = ln(dosei).

β0 = ln(π0/(1 − π0)) is the log-odds that an insect is dead when dose = 1 mg/liter (so that z = ln(1) = 0). Equivalently, the conditional probability that a randomly selected insect exposed to 1 mg/liter of gaseous carbon disulfide is killed is

    π0 = exp(β0)/(1 + exp(β0))

Since ln(πi/(1 − πi)) = β0 + β1 zi, β1 represents the increase in the log-odds when the log-dose zi is increased to zi + 1:

    β1 = (β0 + β1(z + 1)) − (β0 + β1 z)

       = ln(πz+1/(1 − πz+1)) − ln(πz/(1 − πz))

(i.e., the change in the conditional log-odds for mortality given exposure to z = ln(dose))

       = ln[ (πz+1/(1 − πz+1)) / (πz/(1 − πz)) ]   (i.e., the log of a ratio of odds)

Then

    πz+1/(1 − πz+1) = e^{β1} · (πz/(1 − πz))

so e^{β1} is the multiplicative increase in the odds when z is increased by 1.
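A quick numeric check of this identity, using hypothetical coefficients β0 = −3 and β1 = 1.2 chosen only for illustration:

```python
import math

b0, b1 = -3.0, 1.2   # hypothetical coefficients, for illustration only

def odds(z):
    """Odds pi/(1 - pi) implied by the model logit = b0 + b1*z."""
    return math.exp(b0 + b1 * z)

z = 0.7
ratio = odds(z + 1) / odds(z)
# Increasing z by 1 multiplies the odds by exactly e^{b1}, whatever z is.
print(ratio, math.exp(b1))
```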

Parameter estimation:

Maximum likelihood estimation:

First, construct the likelihood function. Let ni denote the number of insects exposed to zi = ln(dosei). Assume each of these ni insects corresponds to an independent trial with conditional probability of mortality

    πi = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi)),  i = 1, 2, …, 8

Let Xi = observed number of dead ("success") insects after exposure to zi = ln(dosei). Then Xi ~ Bin(ni, πi), i.e.,

    Pr(Xi = xi) = C(ni, xi) · πi^{xi} · (1 − πi)^{ni − xi},  xi = 0, …, ni

Assume the results are independent across the given doses. Then the joint likelihood function is

    ∏_{i=1}^{8} { C(ni, xi) · πi^{xi} · (1 − πi)^{ni − xi} }

where πi = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi)).

Let β = (β0, β1)′ and x = (x1, …, x8)′.

The log-likelihood is

    l(β, x) = ∑_{i=1}^{8} {ln(ni!) − ln(xi!) − ln((ni − xi)!)}
              + ∑_{i=1}^{8} xi ln(πi) + ∑_{i=1}^{8} (ni − xi) ln(1 − πi)

            = ∑_{i=1}^{8} ln( ni! / (xi!(ni − xi)!) )
              + β0 ∑_{i=1}^{8} xi + β1 ∑_{i=1}^{8} xi zi
              − ∑_{i=1}^{8} ni ln(1 + e^{β0 + β1 zi})
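The simplified form of the log-likelihood (dropping the binomial-coefficient term, which does not involve β) is easy to evaluate; a sketch in Python using the beetle data:

```python
import math

# Beetle data: z = ln(dose), x = number killed, n = number exposed
doses = [49.057, 52.991, 56.991, 60.542, 64.359, 68.891, 72.611, 76.542]
z = [math.log(d) for d in doses]
x = [6, 13, 18, 28, 52, 53, 61, 60]
n = [59, 60, 62, 56, 63, 59, 62, 60]

def log_lik(b0, b1):
    """Log-likelihood up to the additive constant sum(ln C(ni, xi))."""
    return sum(xi * (b0 + b1 * zi) - ni * math.log(1 + math.exp(b0 + b1 * zi))
               for xi, ni, zi in zip(x, n, z))

# The reported m.l.e. should give a higher log-likelihood than, say, beta = (0, 0).
print(log_lik(0.0, 0.0), log_lik(-60.717, 14.883))
```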

Likelihood equations:

    ∂l(β, x)/∂β0 = ∑_{i=1}^{8} (xi − ni πi) = 0

    ∂l(β, x)/∂β1 = ∑_{i=1}^{8} (xi − ni πi) zi = 0

Matrix representation of the likelihood equations:

    T = [1 z1; 1 z2; ⋯; 1 z8]  (an 8 × 2 matrix),
    x = (x1, x2, …, x8)′,  n = (n1, n2, …, n8)′,
    M = (n1π1, n2π2, …, n8π8)′,  m = (n1π̂1, n2π̂2, …, n8π̂8)′,

where π1, π2, …, π8 are the true parameter values and π̂1, π̂2, …, π̂8 are the estimates computed from the likelihood equations. Then the likelihood equations are

    (∂l/∂β0, ∂l/∂β1)′ = T′(x − m) = 0

Matrix of second partial derivatives of the log-likelihood function (multiplied by −1):

    H = [ −∂²l/∂β0²     −∂²l/∂β0∂β1 ]  =  T′VT
        [ −∂²l/∂β1∂β0   −∂²l/∂β1²   ]

where

    V = diag( n1π1(1 − π1), …, n8π8(1 − π8) ) = diag( Var(x1), …, Var(x8) )

Odds ratio:

    (πz+1/(1 − πz+1)) / (πz/(1 − πz)) = ratio of the conditional odds for mortality at levels z + 1 and z.

Newton–Raphson (Fisher scoring) algorithm:

To evaluate the m.l.e. of β = (β0, β1)′, we iterate

    β̂(new) = β̂(old) + αĤ^{−1}Q̂ = β̂(old) + α(T′VT)^{−1} T′(x − m)

(1) Take α = 1, unless α = 1 is too large to give an improvement in the log-likelihood (in which case use a smaller α).
(2) Evaluate the matrix V and the vector m on the right side at β̂(old).

For starting values you might use

    β̂(old) = (ln(P/(1 − P)), 0)′

where P = (total number of successes)/(total number of trials) = ∑xi/∑ni. Continue iterating until β̂(new) is close enough to β̂(old).
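The algorithm above can be sketched in pure Python for the beetle data, with the 2 × 2 information matrix inverted by hand and α fixed at 1 (variable names are mine; this is a sketch, not a production fitter):

```python
import math

# Beetle data: z = ln(dose), x = number killed, n = number exposed
doses = [49.057, 52.991, 56.991, 60.542, 64.359, 68.891, 72.611, 76.542]
z = [math.log(d) for d in doses]
x = [6, 13, 18, 28, 52, 53, 61, 60]
n = [59, 60, 62, 56, 63, 59, 62, 60]

def fisher_scoring(z, x, n, tol=1e-8, max_iter=50):
    # Starting values: b0 = ln(P/(1-P)) with P = total successes / total trials, b1 = 0
    P = sum(x) / sum(n)
    b0, b1 = math.log(P / (1 - P)), 0.0
    for _ in range(max_iter):
        pi = [math.exp(b0 + b1 * zi) / (1 + math.exp(b0 + b1 * zi)) for zi in z]
        # Score vector T'(x - m), where m_i = n_i * pi_i
        s0 = sum(xi - ni * pii for xi, ni, pii in zip(x, n, pi))
        s1 = sum((xi - ni * pii) * zi for xi, ni, pii, zi in zip(x, n, pi, z))
        # Information matrix T'VT, where V = diag(n_i * pi_i * (1 - pi_i))
        w = [ni * pii * (1 - pii) for ni, pii in zip(n, pi)]
        a = sum(w)
        b = sum(wi * zi for wi, zi in zip(w, z))
        c = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = a * c - b * b
        # Newton step (T'VT)^{-1} T'(x - m), with alpha = 1
        d0 = (c * s0 - b * s1) / det
        d1 = (-b * s0 + a * s1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

b0, b1 = fisher_scoring(z, x, n)
print(b0, b1)   # close to the values reported in the notes, (-60.717, 14.883)
```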

Let β̂ denote the final numerical vector. Since β̂ is an m.l.e., we have the following large-sample result:

    β̂ ~ N(β, (T′VT)^{−1})

That is, the covariance matrix for β̂ is the inverse of the Fisher information matrix, (T′VT)^{−1}, with V evaluated at β̂.

For the above data with z = ln(dose), the m.l.e.'s are

    β̂ = (β̂0, β̂1)′ = (−60.717, 14.883)′

and the m.l.e. of πi = Pr(mortality | zi) is

    π̂i = exp(β̂0 + β̂1 zi) / (1 + exp(β̂0 + β̂1 zi)),  i = 1, 2, …, 8

Thus we have π̂ = (0.0586, …, 0.9791)′, and

    V̂(β̂) = (T′VT)^{−1} = [ ∑_{i=1}^{8} ni π̂i(1 − π̂i)      ∑_{i=1}^{8} ni zi π̂i(1 − π̂i)  ]^{−1}
                           [ ∑_{i=1}^{8} ni zi π̂i(1 − π̂i)   ∑_{i=1}^{8} ni zi² π̂i(1 − π̂i) ]

           = (  26.84   −6.55 )
             ( −6.55     1.6  )

To test H0: β0 = 0 vs. H1: β0 ≠ 0, we use

    χ² = [(β̂0 − 0)/S0]² = [−60.717/5.18]² = 137.36 ~ χ²(1)

The intercept is smaller than 0, which implies that the probability of mortality is less than 0.5 when ln(dose) = 0, i.e., dose = 1.

To test H0: β1 = 0 vs. H1: β1 ≠ 0, we have

    χ² = [(β̂1 − 0)/S1]² = [14.883/1.265]² = 138.49 ~ χ²(1)

β1 is positive, which implies that the mortality rate increases as z = ln(dose) increases.

Note that β̂1 = 14.883 implies that the log-odds for mortality increase by about 15 when z = ln(dose) is increased to z + 1 (equivalently, the odds are multiplied by e^{14.883}).
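The two Wald statistics can be reproduced from the reported estimates and the diagonal of the covariance matrix above (S0 = √26.84, S1 = √1.6):

```python
import math

# Reported m.l.e.'s and the diagonal of the estimated covariance matrix
b0_hat, b1_hat = -60.717, 14.883
var_b0, var_b1 = 26.84, 1.6

s0, s1 = math.sqrt(var_b0), math.sqrt(var_b1)   # standard errors
chi2_b0 = (b0_hat / s0) ** 2   # Wald chi-square for H0: beta0 = 0
chi2_b1 = (b1_hat / s1) ** 2   # Wald chi-square for H0: beta1 = 0
print(round(chi2_b0, 2), round(chi2_b1, 2))   # near the reported 137.36 and 138.49
```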

When the dose is doubled, z = ln(dose) increases to ln(2·dose) = ln 2 + ln(dose). Then

    ln( π̂_{z+ln2} / (1 − π̂_{z+ln2}) ) − ln( π̂z / (1 − π̂z) ) = β̂1 × ln 2

and the odds of mortality increase by a factor of

    ( π̂_{z+ln2} / (1 − π̂_{z+ln2}) ) / ( π̂z / (1 − π̂z) ) = exp(β̂1 ln 2) = 2^{β̂1} ≈ 2^{15}

An approximate 95% C.I. for β1 is β̂1 ± (1.96)·S1, or 14.883 ± (1.96)(1.265) ⇒ [12.40, 17.36].

An approximate 95% C.I. for (ln 2)β1 is

    [β̂1 × ln 2] ± 1.96[S1 × ln 2] ⟹ [8.60, 12.03]

Hence an approximate 95% C.I. for 2^{β1} = exp(β1 × ln 2) is

    [e^{8.60}, e^{12.03}] ⟹ [5418, 167700]

So doubling the dose increases the odds of mortality by a factor of between about 5,000 and 168,000.
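Reproducing that interval arithmetic (a sketch; the endpoints differ slightly from the notes depending on where you round):

```python
import math

b1_hat, s1 = 14.883, 1.265   # reported m.l.e. and standard error for beta1

# 95% C.I. for beta1, then for (ln 2)*beta1, then exponentiate to get it for 2**beta1
lo_b1, hi_b1 = b1_hat - 1.96 * s1, b1_hat + 1.96 * s1
lo_log, hi_log = lo_b1 * math.log(2), hi_b1 * math.log(2)
lo_or, hi_or = math.exp(lo_log), math.exp(hi_log)

print(f"beta1:        [{lo_b1:.2f}, {hi_b1:.2f}]")   # about [12.40, 17.36]
print(f"(ln2)*beta1:  [{lo_log:.2f}, {hi_log:.2f}]") # about [8.60, 12.03]
print(f"2**beta1:     [{lo_or:.0f}, {hi_or:.0f}]")   # roughly 5.4e3 to 1.7e5
```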
