
Epidemiology and Data Analysis

Lecture 16

Logistic Regression

Tue, 10 July 2012

Masashi Kizuki
Health Promotion/ International Health
Tokyo Medical and Dental University
Today’s Topics

• Contingency table method (review)

• Logistic regression analysis

• Likelihood ratio test

• The Hosmer-Lemeshow goodness-of-fit test


Case 1

We will compare two antibiotics, cefaclor and amoxicillin, in an RCT of 214 children with acute otitis media. The primary endpoint is cure within 14 days.

Are the two antibiotics different? Use children as the unit of analysis.
2x2 Contingency Table
Table. Crude association
              Cured   Not cured
Cefaclor        89       61
Amoxicillin     56       72

The odds of cure among children given cefaclor is 89/61 = 1.459.
The odds of cure among children given amoxicillin is 56/72 = 0.778.
The odds ratio (OR) of cure comparing cefaclor with amoxicillin is
OR = 1.459/0.778 = 1.88
95% Confidence Interval for OR
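One common way to obtain this interval is the log-based (Woolf) method, in which the standard error of loge(OR) is the square root of the summed reciprocals of the four cell counts. Whether the lecture used exactly this formula is an assumption; the sketch below applies it to the crude 2x2 table.

```python
import math

# Cell counts from the crude 2x2 table
a, b = 89, 61   # cefaclor: cured, not cured
c, d = 56, 72   # amoxicillin: cured, not cured

odds_ratio = (a * d) / (b * c)
log_or = math.log(odds_ratio)

# Woolf standard error of log(OR): an assumed method, not confirmed by the slide
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96  # approximate 97.5th percentile of the standard normal distribution
ci_low = math.exp(log_or - z * se_log_or)
ci_high = math.exp(log_or + z * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

The interval is computed on the log scale and then exponentiated, so it is not symmetric around the odds ratio itself.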
Chi-square Test for 2x2 Table
H0: not different vs. H1: different
Expected table
              Cured                  Not cured              Total
Cefaclor      150*145/278 = 78.24    150*133/278 = 71.76     150
Amoxicillin   128*145/278 = 66.76    128*133/278 = 61.24     128
Total         145                    133                     278

Test statistic: χ² = ∑ (observed - expected)² / expected = 6.72
Because 6.72 > χ²(1 df, 0.95) = 3.84, we reject H0.
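The expected counts and the test statistic above can be reproduced with a few lines of plain Python. This is a minimal sketch without continuity correction; the critical value 3.84 is hard-coded from the slide rather than computed.

```python
# Observed 2x2 table: rows = cefaclor, amoxicillin; columns = cured, not cured
observed = [[89, 61], [56, 72]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under H0 = row total * column total / grand total
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson chi-square statistic (no continuity correction)
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

CRITICAL_1DF_95 = 3.84  # 95th percentile of chi-square with 1 df, taken from the slide
reject_h0 = chi_square > CRITICAL_1DF_95

print(f"chi-square = {chi_square:.2f}, reject H0: {reject_h0}")
```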


Case 2

We want to repeat the same analysis using a mathematical model.
We will develop a statistical model to predict the cure of disease.

Let
Y: cure by 14 days (1=yes, 0=no)
X: antibiotic (1=cefaclor, 2=amoxicillin)
Link Function

In a linear regression analysis, we expect that the relation between Y and X is linear,
i.e. Y = α + β1x1 + β2x2 + …

In reality, there are many non-linear relations. For example, when the outcome is binary (0 and 1), the relation cannot be linear.

We will expand the model using a link function.
{Link function of Y} = α + β1x1 + β2x2 + …

A link function is a kind of transformation of Y.


Hypothetical Relation Between Probability and Exposure
To analyze a binary outcome variable, we usually use p = Pr(Y=1) rather than Y (0 or 1) itself.
Empirically, we assume a logistic curve between the probability of event and the level of exposure, as below.

[Figure: logistic function. Horizontal axis: level of exposure x. Vertical axis: probability of event p, from 0 to 1.]
Transformation From Logistic Function to Logit

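The logistic function is
p = exp(α + βx) / (1 + exp(α + βx))
Then
p / (1 - p) = exp(α + βx)
Take loge of each side
loge(p / (1 - p)) = α + βx
The two forms are mathematically equivalent. The left-hand side, loge(p / (1 - p)), is the logit of p.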
Calculation of Logit

Logit of probability p is defined as loge of the odds.

p      logit(p)
0      loge(0/1) = loge(0) = -∞
0.2    loge(0.2/0.8) = loge(0.25) = -1.39
0.5    loge(0.5/0.5) = loge(1) = 0
0.8    loge(0.8/0.2) = loge(4) = 1.39
1      loge(1/0) = loge(∞) = +∞
logit(p) ranges from -∞ (p=0) to +∞ (p=1).
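The table values can be checked with a small helper. This is a minimal sketch; the function name `logit` and the handling of the boundary cases p=0 and p=1 (returned as infinities instead of raising an error) are choices made here for illustration.

```python
import math

def logit(p):
    """Return the natural log of the odds p / (1 - p)."""
    if p <= 0.0:
        return -math.inf  # log(0) diverges to minus infinity
    if p >= 1.0:
        return math.inf   # odds are unbounded as p approaches 1
    return math.log(p / (1 - p))

for p in (0, 0.2, 0.5, 0.8, 1):
    print(f"p = {p}: logit = {logit(p):.2f}")
```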
Logistic Model

The relation between a binary outcome variable Y and independent variables, x1, x2, …, can be expressed using a logistic model,
logit(p) = α + β1x1 + β2x2 + …
where p = Pr(Y=1) and the link function is logit(p).

Here, we assume that the relation between logit(p) and the predictors is linear, or equivalently, that the relation between p and the predictors is a logistic curve.

In logistic regression, the maximum likelihood method is used to estimate the model parameters, α and the β’s.
Result of Logistic Regression

Because X, antibiotic, is a categorical variable, we should create a dummy variable for antibiotic.

Antibo(1) = 0 if amoxicillin and = 1 if cefaclor.

The estimated coefficient is β = 0.629 (standard error 0.244): the log(odds) of cure is higher by 0.629 when cefaclor is used than when amoxicillin is used.
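The coefficient can be reproduced from the 2x2 table by maximizing the likelihood directly. The sketch below uses Newton-Raphson on the grouped data; it is an illustration of maximum likelihood estimation for this one-predictor model, not a description of the algorithm SPSS runs internally.

```python
import math

# Grouped data from the 2x2 table: (dummy x, number cured, number treated)
groups = [
    (1, 89, 150),  # cefaclor (x = 1)
    (0, 56, 128),  # amoxicillin (x = 0, reference)
]

alpha, beta = 0.0, 0.0
for _ in range(25):
    # Score vector (g0, g1) and information matrix [[i00, i01], [i01, i11]]
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, cured, n in groups:
        p = 1 / (1 + math.exp(-(alpha + beta * x)))
        resid = cured - n * p      # observed minus expected number cured
        w = n * p * (1 - p)        # binomial variance weight
        g0 += resid
        g1 += resid * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    det = i00 * i11 - i01 * i01
    # Newton step: solve the 2x2 system information * delta = score
    d_alpha = (i11 * g0 - i01 * g1) / det
    d_beta = (i00 * g1 - i01 * g0) / det
    alpha += d_alpha
    beta += d_beta
    if abs(d_alpha) + abs(d_beta) < 1e-10:
        break

# Standard error of beta from the inverse information matrix
se_beta = math.sqrt(i00 / det)

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, SE(beta) = {se_beta:.3f}")
```

With a single binary predictor the model is saturated, so the estimates coincide with the log odds and log odds ratio computed from the table.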
Interpretation of β for Dummy Variable

Let pA be the probability of event when X1=1, and pB be the probability when X1=0. Then,
logit(pA) = α + β
logit(pB) = α
β = logit(pA) - logit(pB) = loge(odds A / odds B) = loge(OR)

Therefore, exp(β) = odds ratio comparing X1=1 with X1=0 (reference).
Interpretation of Results

exp(β) = exp(0.629) = 1.876
The odds ratio of cure comparing cefaclor with amoxicillin is 1.88.
The 95% confidence interval for β is
0.629 ± 1.96 × 0.244 = (0.1508, 1.1072)
The 95% confidence interval for OR = exp(β) is
(exp(0.1508), exp(1.1072)) = (1.16, 3.03)
Hypothesis Testing in Logistic Regression

Hypotheses are H0: β=0 vs. H1: β≠0
Test statistic is z = β / SE(β), which follows the standard normal distribution under H0.
The test statistic for β is z = 0.629 / 0.244 = 2.58
Because 2.58 > z0.975 = 1.96, we reject H0.
p-value = 2*Pr(z>2.58) = 0.010
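The test above is a Wald test. As a quick check, the two-sided p-value can be computed from the standard normal distribution using only the standard library; the coefficient 0.629 and standard error 0.244 are the values reported in the lecture.

```python
import math

beta = 0.629  # estimated coefficient for cefaclor
se = 0.244    # its standard error

z = beta / se

# Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p = {p_value:.3f}")
```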
Prediction of Probability

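The logistic model is mathematically equivalent to
p = exp(α + β1x1 + β2x2 + …) / (1 + exp(α + β1x1 + β2x2 + …))

We can estimate the probability of event (e.g. Y=1) for any combination of independent variables, x’s.

In our example, logit(p) = α + 0.629x, where α = loge(56/72) = -0.251 is the log odds of cure with amoxicillin (x=0).

For cefaclor (x=1),
p = exp(-0.251 + 0.629) / (1 + exp(-0.251 + 0.629)) = 0.593

This is the same as the observed relative frequency, 89/150 = 0.593.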
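A short sketch of this prediction step follows. The helper name `predicted_probability` is hypothetical, and the intercept is derived here from the 2x2 table as the log odds of cure on amoxicillin, which is an assumption about what the fitted model reported.

```python
import math

def predicted_probability(alpha, betas, xs):
    """Inverse logit of the linear predictor alpha + sum(beta_i * x_i)."""
    linear = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1 / (1 + math.exp(-linear))

# Intercept assumed equal to the log odds of cure in the reference group
alpha = math.log(56 / 72)
beta = 0.629  # coefficient for cefaclor from the lecture

p_cefaclor = predicted_probability(alpha, [beta], [1])
p_amoxicillin = predicted_probability(alpha, [beta], [0])

print(f"cefaclor: {p_cefaclor:.3f} (observed {89 / 150:.3f})")
print(f"amoxicillin: {p_amoxicillin:.3f} (observed {56 / 128:.3f})")
```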


Interpretation of β for Continuous Variable

Let pA be the probability of event when X=x1, and pB be the probability when X=x1+1. Then
logit(pA) = α + βx1
logit(pB) = α + β(x1+1)
Then
β = logit(pB) - logit(pA)

Therefore, exp(β) = odds ratio for an increase in X by 1.


Case 3

We also measured child age and side of ear. We want to use this additional information to improve our model.

Age: 1=0-1 years, 2=2-5 years, 3=6+ years
Side of ear: 1=one side, 2=both sides
Dummy Coding

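Antibiotic (reference is amoxicillin)
cef = 1 if cefaclor, 0 if amoxicillin

Age (reference is 0-1 years)
age2_5 = 1 if 2-5 years, 0 otherwise
age6_ = 1 if 6+ years, 0 otherwise

Side of ear (reference is one side)
bothsides = 1 if both sides, 0 if one side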


Multivariable Logistic Regression Model

logit(p) = -1.045 + 0.746 cef + 1.000 age2_5 + 1.605 age6_ + (-0.256) bothsides

The adjusted odds ratio comparing cefaclor with amoxicillin is exp(0.746) = 2.11.
The difference between the adjusted odds ratio and the crude odds ratio (1.88) suggests the presence of confounding.
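To illustrate how the fitted coefficients are used, the sketch below exponentiates each one into an adjusted odds ratio and predicts the probability of cure for one profile. The dictionary keys follow the variable names in the model equation; the helper `predicted_probability` and the example profile are hypothetical.

```python
import math

intercept = -1.045
coefficients = {
    "cef": 0.746,
    "age2_5": 1.000,
    "age6_": 1.605,
    "bothsides": -0.256,
}

# Adjusted odds ratio for each dummy variable is exp(coefficient)
adjusted_or = {name: math.exp(b) for name, b in coefficients.items()}

def predicted_probability(**dummies):
    """Probability of cure; dummy variables not passed are treated as 0."""
    linear = intercept + sum(coefficients[k] * v for k, v in dummies.items())
    return 1 / (1 + math.exp(-linear))

# Hypothetical profile: cefaclor, age 2-5 years, one ear affected
p_example = predicted_probability(cef=1, age2_5=1)

for name, value in adjusted_or.items():
    print(f"{name}: OR = {value:.2f}")
print(f"predicted probability = {p_example:.2f}")
```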
Number of Variables in Logistic Regression

It is recommended to have ≥10 cases with an event (or with no event, whichever is fewer) for every independent variable in the model.

In our example, there are 145 cured and 133 non-cured children. Because 145 > 133, we will use 133.
133/10 = 13.3
We can include ≤13 independent variables in the multivariable logistic regression.
Note that all dummy variables should be counted. So if one characteristic has 3 categories and we create 2 dummy variables, they are counted as 2 variables.
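The rule of thumb can be written as a one-line calculation. The function name `max_variables` is hypothetical, and the threshold of 10 is the value quoted above.

```python
def max_variables(n_event, n_no_event, per_variable=10):
    """Largest number of independent variables allowed by the events-per-variable rule."""
    return min(n_event, n_no_event) // per_variable

limit = max_variables(145, 133)

# Dummy variables in the full model: antibiotic (1) + age (2) + side of ear (1)
n_dummies = 1 + 2 + 1

print(limit, n_dummies <= limit)
```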
Likelihood

In logistic regression, the likelihood quantifies the probability of obtaining our sample data given the specified model.

The maximum likelihood method finds the model parameters, α and the β’s, that maximize the overall likelihood.

We consider a model with a higher likelihood to be a better model.

We use “-2 log likelihood” as an indicator for assessing model fit. Because of its minus sign, a lower “-2 log likelihood” indicates better fit.
Likelihood Ratio Test

We can compare two models by -2 log likelihood.
Model 1 (k1 variables): -2 log L1
Model 2 (k1+k2 variables): -2 log L2
Model 2 is more complicated than model 1.

Hypotheses are
H0: -2 log L1 = -2 log L2
H1: -2 log L1 > -2 log L2

The likelihood ratio test statistic is
LR = (-2 log L1) - (-2 log L2)
Under H0, the LR statistic follows a chi-square distribution with k2 degrees of freedom (the number of variables added in model 2).
Example of Likelihood Ratio Test

Model 1 (simpler model)
1 variable: antibiotic (1)
Model 2 (more complicated model)
1+2=3 variables: antibiotic (1), age (2)

Test statistic is
LR = (-2 log L1) - (-2 log L2)
= 378.127 - 354.078 = 24.05 ~ χ² with 2 df
Because 24.05 > χ²(2 df, 0.95) = 5.99, we reject H0.

Model 2 is significantly better than the simpler model 1.
Age is significantly related to the probability of cure.
-2 Log Likelihood in SPSS
Model 1 (antibiotic): -2 log likelihood = 378.127
Model 2 (antibiotic & age): -2 log likelihood = 354.078


Example of Likelihood Ratio Test

Model 1 (simpler model)
1 variable: antibiotic (1)
Model 2 (more complicated model)
1+1=2 variables: antibiotic (1), side of ear (1)

Test statistic is
LR = (-2 log L1) - (-2 log L2)
= 378.127 - 375.706 = 2.42 ~ χ² with 1 df
Because 2.42 < χ²(1 df, 0.95) = 3.84, we do not reject H0.

Model 2 is not significantly better than the simpler model 1.
Side of ear is not significantly related to the probability of cure.
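Both likelihood ratio examples can be checked numerically. The sketch below computes the LR statistic and a p-value using closed-form chi-square tail probabilities, which exist for 1 and 2 degrees of freedom; the helper names `chi2_sf` and `lr_test` are hypothetical, and other degrees of freedom would need a statistics library.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")

def lr_test(m2ll_simple, m2ll_complex, df):
    """Return (LR statistic, p-value) from two -2 log likelihood values."""
    lr = m2ll_simple - m2ll_complex
    return lr, chi2_sf(lr, df)

# Adding age (2 dummy variables) to the antibiotic-only model
lr_age, p_age = lr_test(378.127, 354.078, df=2)

# Adding side of ear (1 dummy variable) to the antibiotic-only model
lr_side, p_side = lr_test(378.127, 375.706, df=1)

print(f"age:  LR = {lr_age:.2f}, p = {p_age:.2g}")
print(f"side: LR = {lr_side:.2f}, p = {p_side:.2g}")
```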
-2 Log Likelihood in SPSS
Model 1 (antibiotic): -2 log likelihood = 378.127
Model 2 (antibiotic & side of ear): -2 log likelihood = 375.706


Nested Models

The likelihood ratio test can compare a simpler model 1 with a more complicated model 2.

Model 1 is said to be nested within model 2.
The simpler model should be obtained from the more complicated model either by
• Removing some independent variables
e.g. remove age from the model, remove x2 from the model, etc.
• Imposing some assumptions about the relation
e.g. use a continuous variable instead of a categorical variable, to impose a linear association
Case 4

Now we want to use contingency tables to adjust for confounders and compare the result with logistic regression.

We will create 6 tables (3 age groups × 2 sides of ear) and use the Mantel-Haenszel weight method to estimate the pooled odds ratio.
Mantel-Haenszel Pooled Odds Ratio
Table   OR                     M-H weight
1       14*21/17/6 = 2.88      17*6/58 = 1.76
2       39*25/13/20 = 3.75     13*20/97 = 2.68
3       15*8/8/17 = 0.88       8*17/48 = 2.83
4       8*13/10/2 = 5.20       10*2/33 = 0.61
5       10*4/12/5 = 0.67       12*5/31 = 1.94
6       3*1/1/6 = 0.50         1*6/11 = 0.55

ORpool = ∑(wi × ORi) / ∑wi = 2.16   cf. ORadj = 2.11 in logistic regression
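The pooled estimate can be reproduced from the six stratum tables. In the sketch below each tuple holds the counts (a, b, c, d) read off the expressions above, taken to be cefaclor cured, cefaclor not cured, amoxicillin cured, amoxicillin not cured; that labelling is inferred from the column sums matching the crude table. Which age and ear stratum each table represents is not stated, so the tables are simply numbered in order.

```python
# (a, b, c, d) per stratum: cefaclor cured / not cured, amoxicillin cured / not cured
strata = [
    (14, 17, 6, 21),
    (39, 13, 20, 25),
    (15, 8, 17, 8),
    (8, 10, 2, 13),
    (10, 12, 5, 4),
    (3, 1, 6, 1),
]

stratum_or = []
weights = []
for a, b, c, d in strata:
    n = a + b + c + d
    stratum_or.append((a * d) / (b * c))  # stratum-specific odds ratio
    weights.append(b * c / n)             # Mantel-Haenszel weight

# Weighted average of the stratum odds ratios
or_pooled = sum(w * o for w, o in zip(weights, stratum_or)) / sum(weights)

print(f"Mantel-Haenszel pooled OR = {or_pooled:.2f}")
```

Because each weight is bc/n, the weighted average simplifies to ∑(ad/n) / ∑(bc/n), the usual form of the Mantel-Haenszel estimator.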
Logistic Regression & Contingency Table Analysis
Transparency of procedure
  Logistic regression analysis: Black box
  Contingency table analysis: Clear. Easy to check details

Adjustment for confounders
  Logistic regression analysis: Easy. Add variables in the model
  Contingency table analysis: Difficult or inefficient. Number of tables becomes too many

Continuous independent variables
  Logistic regression analysis: OK. Use as continuous variables directly
  Contingency table analysis: Not OK. Variables should be categorized

Many independent variables
  Logistic regression analysis: Relatively OK. ≥10 events per variable in the model is recommended
  Contingency table analysis: Number of strata becomes too many. Some strata have too few data to analyze
Case 5

We want to assess the goodness of fit of a logistic regression model.

Here, we will use the Hosmer-Lemeshow goodness-of-fit test.
H0: model fits the data
H1: model does not fit the data
We will test whether the predicted probabilities from the model are reasonably close to the observed relative frequencies.
Hosmer-Lemeshow goodness-of-fit test

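1. Order the subjects by predicted probability, from the lowest to the highest.
2. Group the ordered subjects into g (≥3) groups according to the predicted probabilities (usually g=10).
3. Compute the following values for each group j (with nj subjects)
Oj = ∑ y (observed frequency for group j)
Ej = ∑ p̂ (expected frequency for group j, the sum of predicted probabilities)
p̄j = Ej / nj (expected mean probability for group j)
4. Compute a test statistic
C = ∑ (Oj - Ej)² / (nj × p̄j × (1 - p̄j)), which approximately follows a chi-square distribution with g-2 degrees of freedom under H0.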
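The four steps can be sketched as a short function. The data below are made up purely for illustration (12 subjects, g=3), the function name `hosmer_lemeshow` is hypothetical, and groups are formed as equal-sized blocks of the ordered subjects, which is one common convention; statistical packages may handle ties and group boundaries differently.

```python
def hosmer_lemeshow(y, p_hat, g=10):
    """Return (test statistic, degrees of freedom) for the grouped fit test."""
    # Step 1: order subjects by predicted probability
    pairs = sorted(zip(p_hat, y))
    n = len(pairs)
    statistic = 0.0
    for j in range(g):
        # Step 2: take the j-th block of the ordered subjects
        group = pairs[j * n // g:(j + 1) * n // g]
        n_j = len(group)
        if n_j == 0:
            continue
        # Step 3: observed and expected frequencies, mean predicted probability
        o_j = sum(obs for _, obs in group)
        e_j = sum(p for p, _ in group)
        p_bar = e_j / n_j
        # Step 4: add this group's contribution (undefined if p_bar is exactly 0 or 1)
        statistic += (o_j - e_j) ** 2 / (n_j * p_bar * (1 - p_bar))
    return statistic, g - 2

# Hypothetical predicted probabilities and outcomes, not from the lecture data
p_hat = [0.1, 0.1, 0.2, 0.2, 0.4, 0.4, 0.5, 0.5, 0.8, 0.8, 0.9, 0.9]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0]

stat, df = hosmer_lemeshow(y, p_hat, g=3)
print(f"C = {stat:.3f} with {df} df")
```

A small statistic relative to the chi-square critical value means the observed and expected frequencies agree within each group.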
Example of Hosmer and Lemeshow Test
Independent variables: antibiotic, age, side of ear

Observed and expected frequencies are close.

Because p=0.498, we do not reject H0.
There is no evidence that the model fits the data poorly.
Today’s Progress

• Contingency table method (review)

• Logistic regression analysis

• Likelihood ratio test

• The Hosmer-Lemeshow goodness-of-fit test


Next Topics

• Interaction
• Variable selection
