L7 Confounding


PHPC2017

Confounding

Prof. Joey Yang


Division of Epidemiology
JC School of Public Health and Primary Care
The Chinese University of Hong Kong
Learning Objectives

Understand confounding and confounders


Understand the methods to control for
confounding in different stages of epidemiological
research
Review: What is epidemiology?

World Health Organization: Epidemiology is the study of


the distribution and determinants of health-related states
or events (including disease), and the application of this
study to the control of diseases and other health problems.

Distribution: prevalence, incidence


Determinants: causes of disease or health-related events
(etiology)
Importance of research on etiology

Evaluate the causal relationship between exposures (risk


factors or protective factors) and health outcomes

Inform the prevention, prediction and treatment of disease

Review: Major types of
epidemiological study designs

Exploratory research (hypothesis generating)


Case reports/case series
Correlational (ecological) studies
Cross-sectional studies
Further examination (higher scientific rigor)
Case-control studies
Cohort studies
Randomized controlled trials (next lecture)
Research on etiology:
① The need for a control group

2181 smokers -> follow-up -> 31 (1.4%) lung cancer
2181 non-smokers -> follow-up -> ?? lung cancer

Is smoking a risk factor of lung cancer?
Research on etiology:
The need for a control group

100 patients with common cold -> drug treatment -> 2 weeks later -> 95 patients recover
100 patients with common cold -> no drug treatment -> 2 weeks later -> ?? recover

Is the drug treatment effective?
Research on etiology:
② Problem of incomparable control groups

2181 smokers (92.4% young, 7.6% old) -> follow-up -> 31 (1.4%) lung cancer
3327 non-smokers (45.4% young, 54.6% old) -> follow-up -> 118 (3.5%) lung cancer

Is smoking a protective factor of lung cancer?
Definition of confounding
A situation in which the effect of exposure or association
between exposure and outcome is distorted by the
presence of another variable (confounder)
The observed effect of exposure on the outcome is not
purely the effect of exposure itself, but a mixture of the
effects of exposure and other factors
Example 1: A hypothetical cohort study for
smoking and lung cancer

                            Cancer+   Cancer-   Total
Smokers (92.4% young)          31       2150     2181
Non-smokers (45.4% young)     118       3209     3327

RR = (31/2181)/(118/3327) = 0.40
-> Smoking is a protective factor for lung cancer?

Does this reflect the effect of smoking itself or of the
generally younger age in the smokers, or both?
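The crude RR in Example 1 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `risk_ratio` is illustrative and not part of the lecture.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Counts from the Example 1 table (smokers vs. non-smokers).
rr_crude = risk_ratio(31, 2181, 118, 3327)
print(f"Crude RR = {rr_crude:.2f}")  # prints "Crude RR = 0.40"
```

An RR below 1 is what makes smoking look protective here, before age is taken into account.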
Example 2: Risk factor of Down Syndrome

[Figure: Cases of Down syndrome by birth order. Y-axis: cases per 100 000 live births (0 to 180); x-axis: birth order (1 to 5).]

Birth order is associated with Down Syndrome
Example 2: Risk factor of Down Syndrome

[Figure: Cases of Down syndrome by maternal age. Y-axis: cases per 100 000 live births (0 to 1000); x-axis: age groups (< 20, 20-24, 25-29, 30-34, 35-39, 40+).]

Maternal age is also associated with Down Syndrome
Definition of confounder (the three criteria)

A) Associated with the disease
Cause or a proxy for a cause (determinant of the disease)
Not an effect of the disease

B) Not part of the causal pathway of the exposure
Must not be an effect of the exposure

C) Associated with the exposure
Imbalance in the comparison groups (exposed vs. non-exposed)

(A) and (B): judged by prior knowledge, common sense or
biological reasoning, not testable by the data from your own
study. (C): can be tested with your study data.
Definition of confounder

Exposure -> Outcome

The third variable (confounder) is:
A) a determinant of the disease
B) not part of the causal pathway of the exposure
C) associated with the exposure

The relationship between exposure, outcome and confounder

Is age a confounder in Example 1?

A) Associated with the disease


Cause or a proxy for a cause (determinant of the disease)
Not an effect of the disease

Prior knowledge: Old people are more likely to develop lung cancer,
regardless of their smoking status

Age -> Lung cancer
Is age a confounder in Example 1?

B) Not part of the causal pathway of the exposure


Must not be an effect of the exposure

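Common sense / biological reasoning:

Smoking -> changes in the cells/genes/proteins of lung -> Lung cancer (plausible causal pathway)
Smoking -> changes in age -> Lung cancer (not plausible: smoking does not change a person's age)

Smoking Lung cancer

Age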
Is age a confounder in Example 1?
C) Associated with the exposure
Imbalance in the comparison groups (exposed vs. non-exposed)

Test with data from the study:

                            Cancer+   Cancer-   Total
Smokers (92.4% young)          31       2150     2181
Non-smokers (45.4% young)     118       3209     3327

               Old people   Young people   Total
Smokers            165          2016        2181
Non-smokers       1815          1512        3327

RR = (165/2181)/(1815/3327) = 0.14 ≠ 1
(A ratio of 1 would mean no association; a ratio above or below 1 means there is an association.)

Age is associated with smoking (old people less likely to smoke)
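Criterion (C) is the only one that can be checked with the study data, and the check is a simple comparison of proportions. A minimal sketch, using the age-by-smoking counts from the table above (variable names are illustrative):

```python
# Age distribution by smoking status, taken from the table above.
old_smokers, total_smokers = 165, 2181
old_nonsmokers, total_nonsmokers = 1815, 3327

prop_old_smokers = old_smokers / total_smokers
prop_old_nonsmokers = old_nonsmokers / total_nonsmokers

# A ratio of 1 would mean the same age distribution in both groups.
ratio = prop_old_smokers / prop_old_nonsmokers

print(f"Old among smokers:     {prop_old_smokers:.1%}")     # 7.6%
print(f"Old among non-smokers: {prop_old_nonsmokers:.1%}")  # 54.6%
print(f"Ratio = {ratio:.2f}")                               # 0.14
```

The two proportions match the "7.6% old" and "54.6% old" figures quoted earlier for smokers and non-smokers.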


Is age a confounder in Example 1?

Smoking Lung cancer

Age

The relationship between smoking (exposure), lung cancer


(outcome) and age (the third variable): age is a confounder
Is maternal age a confounder in Example 2?

Birth order Down Syndrome

Maternal age

A) Maternal age is a risk factor of Down Syndrome ✓

B) Maternal age is not part of the causal pathway of the
exposure (e.g., birth order -> maternal age -> DS) ✓

C) Maternal age is correlated with birth order ✓

Maternal age is a confounder


Result of confounding: change in effect estimate
Young+Old       Cancer+   Cancer-   Total
Smokers            31       2150     2181    RRcrude=0.4
Non-smokers       118       3209     3327

Old People      Cancer+   Cancer-   Total
Smokers            15        150      165    RRold=1.5
Non-smokers       110       1705     1815

Young People    Cancer+   Cancer-   Total
Smokers            16       2000     2016    RRyoung=1.5
Non-smokers         8       1504     1512

RRadjusted (1.5) ≠ RRcrude (0.4)
Crude: smoking appears protective (RR < 1). Adjusted: smoking is a risk factor (RR > 1).
Result of confounding: change in effect estimate

Young+Old       Cancer+   Cancer-   Total
Smokers           150        350      500    RRcrude=2.3
Non-smokers       170       1150     1320

Old People      Cancer+   Cancer-   Total
Smokers           100        200      300    RRold=2.0
Non-smokers        20        100      120

Young People    Cancer+   Cancer-   Total
Smokers            50        150      200    RRyoung=2.0
Non-smokers       150       1050     1200

RRadjusted (2.0) ≠ RRcrude (2.3)
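The crude table in this second example is just the sum of the two age strata, so the crude and stratum-specific RRs can be compared directly. A small sketch with the counts above (the names `strata` and `risk_ratio` are illustrative):

```python
def risk_ratio(a, n1, c, n0):
    """(cases/total in smokers) divided by (cases/total in non-smokers)."""
    return (a / n1) / (c / n0)


# Each stratum: (cases in smokers, total smokers,
#                cases in non-smokers, total non-smokers)
strata = {
    "old": (100, 300, 20, 120),
    "young": (50, 200, 150, 1200),
}

# Collapsing over age (adding the strata cell by cell) gives the crude table.
crude = tuple(sum(s[i] for s in strata.values()) for i in range(4))
rr_crude = risk_ratio(*crude)
rr_by_stratum = {name: risk_ratio(*s) for name, s in strata.items()}

print(f"Crude table: {crude}")
print(f"Crude RR = {rr_crude:.2f}")
for name, rr in rr_by_stratum.items():
    print(f"RR ({name}) = {rr:.2f}")
```

Here the crude RR (about 2.33) is larger than the RR of 2.0 seen in both strata, so ignoring age exaggerates the association.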


Result of confounding: change in effect estimate

Confounder may spuriously strengthen (positive


confounding) or weaken (negative confounding) the
association between exposure and outcome.

After controlling (adjusting) for confounding:

RRadjusted (true) ≠ RRcrude (false)
Result of confounding: change in effect estimate

As long as the third factor (suspected confounder) meets all of


the three criteria, any difference between the adjusted and
crude effects indicates the existence of confounding
The magnitude of the difference (small or big) is a matter of
whether the confounding effect is large enough to be
clinically/practically important, not of whether the
confounding effect exists
The so-called rule of thumb (e.g., using ~10% difference as
the threshold to define confounding) is not justifiable, because
the magnitude of difference is related to many factors,
including sample size and the inherent ability of the third
factor in affecting the outcome (strong or weak)
Rationale for controlling confounding
A) Associated with the disease
Cause or a proxy for a cause (determinant of the disease)
Not an effect of the disease

B) Not part of the causal pathway of the exposure


Must not be an effect of the exposure

C) Associated with the exposure


Imbalance in the comparison groups (exposed vs. non-exposed)

(A) and (B): judged by prior knowledge, common sense or
biological reasoning, not testable by the data from your own
study; not modifiable in your study.
(C): can be tested with your study data; modifiable in your
study.
Rationale for controlling confounding
Make the comparison groups comparable in the
distribution of potential confounders in terms of their
average values (e.g., for age) or percentages (e.g., for sex)
Then, the potential confounders will no longer be
associated with the exposure, hence not fulfilling the
criteria of confounder

Exposure Outcome

The third variable


(potential confounder)
Methods for controlling confounding

In the design stage


1) Restriction
2) Matching
3) Randomization (random allocation)

In the analysis stage


1) Stratified analysis
2) Standardization (Discussed in Lecture 2)
3) Multivariable regression analyses (discussed in
biostatistics courses)
Methods for controlling confounding in the
design stage: (1) restriction

Inclusion/exclusion criteria: Limits study to one


category/level of the potential confounder.
Smoking status (2 categories): smoker vs non-smoker
Sex (2 categories): male vs female

Example: A study aims to examine the association of
exposure to asbestos with lung cancer.
Previous studies/prior knowledge showed that
smoking could be an important confounder.
Methods for controlling confounding in the
design stage: (1) restriction
To prevent the confounding caused by smoking, smokers
are excluded from the study, i.e. only non-smokers are
included.
In the exposed group, 100% are non-smokers; in the non-
exposed group, the same. Thus, smoking status is not
associated with the exposure and will not cause
confounding.

Asbestos Lung cancer

Smoking
Methods for controlling confounding in the
design stage: (1) restriction
Cons:
Cannot study the effect of the restricting factor
May be difficult to achieve the desired sample size if there
are more than a few restricting factors and if each of
them represents a fairly large part of the population
Undermines the generalizability of study results

An extreme case: if you restrict the study to young, non-
smoking females
Methods for controlling confounding in the
design stage: (2) matching
The selection of unexposed subjects that in certain
important characteristics are identical, or nearly so,
to the exposed ones.
For example, in the following study about smoking
and lung cancer, each smoker is matched with a non-
smoker of the same sex
Case no.       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Smokers        F  M  M  M  F  F  M  F  M  M  M  M  F  M  M
Non-smokers    F  M  M  M  F  F  M  F  M  M  M  M  F  M  M
Methods for controlling confounding in the
design stage: (2) matching
The proportion of female: 33.3% (5/15) in both smokers
and non-smokers
Sex is not associated with the exposure and will not cause
confounding

Smoking Lung cancer

Sex
Methods for controlling confounding in the
design stage: (2) matching

In the following study about smoking and lung cancer,


each smoker is matched with a non-smoker of similar
age (+/- 3 years)
Case no.       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Smokers       52 49 59 63 46 72 57 55 61 63 50 68 56 54 65
Non-smokers   52 50 56 61 47 69 60 56 59 62 50 66 57 55 63
Methods for controlling confounding in the
design stage: (2) matching
Mean age of smokers (58 yrs) is similar to that of non-
smokers (57.5 yrs); thus, age is not associated with the
exposure.
Confounding may not be totally removed in this case, but
has been reduced to a minimum while ensuring the study's
feasibility.

Smoking Lung cancer

Age
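The age-matching example can be checked numerically. The sketch below uses the 15 matched pairs listed in this example, confirms that every pair is within the ±3 year tolerance, and reproduces the two mean ages. The constant name `CALIPER` is illustrative.

```python
from statistics import mean

# Ages of the 15 matched pairs from the example (pair i is smokers[i], non_smokers[i]).
smokers = [52, 49, 59, 63, 46, 72, 57, 55, 61, 63, 50, 68, 56, 54, 65]
non_smokers = [52, 50, 56, 61, 47, 69, 60, 56, 59, 62, 50, 66, 57, 55, 63]

CALIPER = 3  # maximum allowed age difference within a pair, in years

# Every pair must satisfy the matching rule.
pairs_ok = all(abs(s - n) <= CALIPER for s, n in zip(smokers, non_smokers))

mean_smokers = mean(smokers)
mean_non_smokers = mean(non_smokers)

print(f"All pairs within +/- {CALIPER} years: {pairs_ok}")
print(f"Mean age, smokers:     {mean_smokers:.1f}")      # 58.0
print(f"Mean age, non-smokers: {mean_non_smokers:.1f}")  # 57.5
```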
Methods for controlling confounding in the
design stage: (2) matching

The selection of unexposed subjects that in certain


important characteristics are identical, or nearly so,
to the exposed ones.

Cons:
Not practical if the confounders are more than a few
(feasibility issue)
Cannot study the effects of matching factors
Methods for controlling confounding in the
design stage: (3) randomization

Randomly assign the exposure (a certain


intervention, e.g. drug treatment) to study subjects
Balance all factors (including potential
confounders), known or unknown, especially for
large studies

Further discussed in next lecture (intervention studies)
Methods for controlling confounding in the
design stage: (3) randomization
In a study that aimed to evaluate the efficacy of ramipril
(an anti-hypertensive drug) in reducing cardiovascular
events in high-risk patients:
Patient characteristics                    Treatment   Control
Age (years)                                   66          66
Female sex (%)                                27.5        25.8
Blood pressure (mmHg)                         139/79      139/79
Heart rate (beats/min)                        69          69
History of coronary artery disease (%)        79.5        81.4
History of stroke (%)                         10.8        11.0
Methods for controlling confounding in the
design stage: (3) randomization
All factors are similar between exposed (treatment) and
non-exposed (control) groups
Thus, all confounders, known or unknown, are not
associated with the exposure (treatment)

Treatment CVD events

All potential confounders
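Why random allocation balances a factor can be illustrated with a small simulation. Everything in this sketch is hypothetical: the ages are simulated rather than taken from the ramipril trial, and the sample size and seed are arbitrary choices.

```python
import random
from statistics import mean

rng = random.Random(2017)  # fixed seed so the sketch is reproducible

# Hypothetical study population: 10,000 subjects aged 40 to 90.
ages = [rng.uniform(40, 90) for _ in range(10_000)]

# Random allocation: shuffle the subjects, then split them in half.
rng.shuffle(ages)
treatment, control = ages[:5_000], ages[5_000:]

difference = mean(treatment) - mean(control)
print(f"Mean age, treatment: {mean(treatment):.1f}")
print(f"Mean age, control:   {mean(control):.1f}")
print(f"Difference:          {difference:.2f}")
```

Because allocation ignores age entirely, the two group means differ only by chance, and that chance difference shrinks as the study gets larger. The same argument applies to factors nobody measured.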


Methods for controlling confounding in the
design stage: (3) randomization
Cons: can only be used to study potentially beneficial
interventions
The inherent characteristics of study subjects,
such as age, sex and history of stroke, can by no
means be assigned/changed by others
It is not ethical to assign potentially harmful
interventions to study subjects
Methods for controlling confounding in the
analysis stage: (1) stratified analysis
1. Calculate the crude measure of association (e.g. crude RR)
2. Divide the data into strata according to categories of a third factor
(e.g., gender, age).
3. Within each stratum, calculate a stratum-specific RR (e.g. gender-
specific RRs)
4. If the stratum-specific RRs were similar to each other (needs to be
examined by a formal statistical test of their difference), pool them
over all strata to calculate a weighted average (i.e. the adjusted RR)
using the Mantel-Haenszel method
5. Compare the adjusted RR with the crude one. If there is a difference
between the two, confounding exists.
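Steps 1, 3, 4 and 5 above can be sketched in Python using the smoking and lung cancer counts from this lecture. The pooled estimate uses the usual Mantel-Haenszel risk ratio formula for cohort data, RR_MH = Σ(a·n0/N) / Σ(c·n1/N). The formal test of homogeneity mentioned in step 4 is not implemented here, and the function names are illustrative.

```python
def risk_ratio(a, n1, c, n0):
    """(cases/total in exposed) divided by (cases/total in unexposed)."""
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(strata):
    """Pooled RR over strata; each stratum is (a, n1, c, n0)."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata:
        stratum_total = n1 + n0
        numerator += a * n0 / stratum_total
        denominator += c * n1 / stratum_total
    return numerator / denominator


# (cases in smokers, total smokers, cases in non-smokers, total non-smokers)
old = (15, 165, 110, 1815)
young = (16, 2016, 8, 1512)

rr_crude = risk_ratio(31, 2181, 118, 3327)   # step 1
rr_old = risk_ratio(*old)                    # step 3
rr_young = risk_ratio(*young)                # step 3
rr_mh = mantel_haenszel_rr([old, young])     # step 4

# Step 5: compare the crude and adjusted estimates.
print(f"Crude RR    = {rr_crude:.2f}")  # 0.40
print(f"RR (old)    = {rr_old:.2f}")    # 1.50
print(f"RR (young)  = {rr_young:.2f}")  # 1.50
print(f"Adjusted RR = {rr_mh:.2f}")     # 1.50
```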
Methods for controlling confounding in the
analysis stage: (1) stratified analysis
Young+Old       Cancer+   Cancer-   Total
Smokers            31       2150     2181    RRcrude=0.4
Non-smokers       118       3209     3327

Old People      Cancer+   Cancer-   Total
Smokers            15        150      165    RRold=1.5
Non-smokers       110       1705     1815

Young People    Cancer+   Cancer-   Total
Smokers            16       2000     2016    RRyoung=1.5
Non-smokers         8       1504     1512

RRadjusted (1.5) ≠ RRcrude (0.4)


Methods for controlling confounding in the
analysis stage: (1) stratified analysis

[Figure: Cases of Down syndrome by birth order and mother's age. Y-axis: cases per 100 000 live births (0 to 1000); x-axis: birth order (1 to 5).]

Older maternal age is a risk factor of Down Syndrome;
birth order is not
Methods for controlling confounding in the
analysis stage: (1) stratified analysis
This approach is simple and intuitive, but not practical for
controlling many potential confounders.
The above example is stratified according to age, so only 2
age-specific RRs need to be calculated.
If the data are stratified by both age and sex, then the above
calculations will need to be done in males and females
separately (total calculations: 2×2=4).
In epidemiological studies, generally there are many
confounders, e.g. more than 10
Total number of calculations = levels of factor 1 × levels of
factor 2 × ... × levels of factor n (could be huge!!!)
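The growth in the number of strata is a simple product, which a short sketch makes concrete. The case of ten factors with two levels each is a hypothetical illustration of the "more than 10 confounders" situation, not a figure from the lecture.

```python
from math import prod

# Levels per stratification factor in the lecture's example.
levels = {"age": 2, "sex": 2}
n_strata = prod(levels.values())
print(f"Age x sex: {n_strata} strata")  # 4

# Hypothetical: ten confounders with only two levels each.
n_strata_ten = prod([2] * 10)
print(f"Ten binary factors: {n_strata_ten} strata")  # 1024
```

With a fixed sample size, most of those strata would contain very few subjects, which is why stratification stops being practical.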
Methods for controlling confounding in the
analysis stage: (2) standardization

Summary from Lecture 2:


Using a reference population
Calculate and compare the expected rate in the two (or
more) studied populations
Methods for controlling confounding in the
analysis stage: (2) standardization
Mortality in Sweden, 1962:
Age         Deaths    Population   Mortality rate per 1000 persons
All ages    73,555    7,496,000     9.8
0-29         3,523    3,145,000     1.1
30-59       10,928    3,057,000     3.6
60+         59,104    1,294,000    45.7

Mortality in Panama, 1962:
Age         Deaths    Population   Mortality rate per 1000 persons
All ages     8,281    1,075,000     7.7
0-29         3,904      741,000     5.3
30-59        1,421      275,000     5.2
60+          2,956       59,000    50.1
Methods for controlling confounding in the
analysis stage: (2) standardization
            Age-specific mortality rates
            (per 1000 persons)             Standard/reference
Age         Sweden      Panama             population
0-29          1.1         5.3                56,000
30-59         3.6         5.2                33,000
60+          45.7        50.1                11,000
All ages                                    100,000

Age-standardized (adjusted) mortality rate in Sweden =

$$\frac{(1.1 \times 56{,}000) + (3.6 \times 33{,}000) + (45.7 \times 11{,}000)}{100{,}000} = 6.8 \text{ per 1000 persons}$$
Methods for controlling confounding in the
analysis stage: (2) standardization
            Age-specific mortality rates
            (per 1000 persons)             Standard/reference
Age         Sweden      Panama             population
0-29          1.1         5.3                56,000
30-59         3.6         5.2                33,000
60+          45.7        50.1                11,000
All ages                                    100,000

Age-standardized (adjusted) mortality rate in Panama =

$$\frac{(5.3 \times 56{,}000) + (5.2 \times 33{,}000) + (50.1 \times 11{,}000)}{100{,}000} = 10.2 \text{ per 1000 persons}$$
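The two standardized rates are weighted averages of the age-specific rates, with the reference population as the weights. A minimal sketch of that calculation, using the figures from the tables above (the function name `standardized_rate` is illustrative):

```python
# Reference population and age-specific rates (per 1000 persons).
standard_population = {"0-29": 56_000, "30-59": 33_000, "60+": 11_000}
rates_per_1000 = {
    "Sweden": {"0-29": 1.1, "30-59": 3.6, "60+": 45.7},
    "Panama": {"0-29": 5.3, "30-59": 5.2, "60+": 50.1},
}


def standardized_rate(age_rates, standard):
    """Average of age-specific rates weighted by the standard population."""
    total = sum(standard.values())
    weighted_sum = sum(age_rates[age] * standard[age] for age in standard)
    return weighted_sum / total


adjusted = {
    country: standardized_rate(age_rates, standard_population)
    for country, age_rates in rates_per_1000.items()
}

for country, rate in adjusted.items():
    print(f"{country}: {rate:.1f} per 1000 persons")
# Sweden: 6.8 per 1000 persons
# Panama: 10.2 per 1000 persons
```

The crude rates ranked the countries the other way round (9.8 for Sweden vs. 7.7 for Panama), which reflects Sweden's much older population.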
Methods for controlling confounding in the
analysis stage: (2) standardization
By using a reference/standard population, the
proportions of different ages became the same for both
countries.
The confounder is no longer associated with the exposure

Country Death

Age
Methods for controlling confounding in the
analysis stage: (2) standardization
Not practical when there are more than a few factors to be
standardized:
The above example is standardized for age alone, so only 3
age-specific rates are calculated for each population. In total,
3×2=6 rates need to be calculated
If both age and sex are standardized, then the above
calculations will need to be done in males and females separately
(total rates: 3×2×2=12). Total number of calculations = levels
of factor 1 × levels of factor 2 × ... × number of populations
(could be huge!!!)
Very difficult to identify a reference population with such
detailed information available
Methods for controlling confounding in the
analysis stage: (3) multivariate regression
analysis (not required to know the technical
details)
Multiple regression analyses, such as logistic regression
and Cox model, can easily and efficiently control for many
variables at the same time in one analysis.
A difference between the crude and adjusted effects will
suggest confounding and the adjusted effect is unbiased.
The most practical and widely used method
Step 1: In the logistic regression model, include the disease
(hypertension) as the outcome and smoking alone as the exposure to
estimate the crude OR (=0.938)
Step 2: In the logistic regression model, include both smoking (the
exposure) and the potential confounder (age) to estimate the OR for
smoking after adjusting for age (=1.029)
Step 3: Compare the crude OR and adjusted OR and
draw a conclusion
Crude OR=0.938 < adjusted OR = 1.029
Conclusion: There is confounding caused by age. The
confounding biased the true effect of smoking (i.e. 1.029)
towards the opposite direction (i.e. <1).
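The three steps can be illustrated without a statistics package. The data behind the hypertension example are not given in these slides, so the sketch below instead reuses the smoking and lung cancer counts from Example 1, with age (old vs. young) as the potential confounder. The Newton-Raphson fitting routine and the helper names are illustrative assumptions; in practice a standard statistical package would be used.

```python
import math

# Grouped data from Example 1: (smoker, old, cases, total)
groups = [(1, 1, 15, 165), (0, 1, 110, 1815), (1, 0, 16, 2016), (0, 0, 8, 1512)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


def fit_logistic(rows):
    """Maximum-likelihood fit; rows are (covariates, cases, total)."""
    p = len(rows[0][0]) + 1  # +1 for the intercept
    beta = [0.0] * p
    for _ in range(50):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for covariates, cases, total in rows:
            x = [1.0] + list(covariates)
            eta = sum(b * xi for b, xi in zip(beta, x))
            prob = 1.0 / (1.0 + math.exp(-eta))
            weight = total * prob * (1.0 - prob)
            for i in range(p):
                grad[i] += x[i] * (cases - total * prob)
                for j in range(p):
                    hess[i][j] += weight * x[i] * x[j]
        step = solve(hess, grad)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


# Step 1: smoking alone. Step 2: smoking plus age.
beta_crude = fit_logistic([([s], c, n) for s, old, c, n in groups])
beta_adjusted = fit_logistic([([s, old], c, n) for s, old, c, n in groups])

# Step 3: compare the two odds ratios for smoking.
or_crude = math.exp(beta_crude[1])
or_adjusted = math.exp(beta_adjusted[1])
print(f"Crude OR for smoking    = {or_crude:.2f}")
print(f"Adjusted OR for smoking = {or_adjusted:.2f}")
```

With these counts the crude OR falls below 1 and the age-adjusted OR rises above 1, the same kind of reversal described in the conclusion above.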
May adjust for many factors together at the
same time:
Methods for controlling confounding: summary
Restriction, matching, stratified analysis and
standardization can be used to control the confounding
caused by a very small number of factors
Randomization is a perfect method but can only be used to
study interventions, such as medical treatment (discussed
in next lecture)
In the other study designs, multivariate regression analysis
is the most practical and powerful method
Some studies may choose to combine several methods, e.g.
matching by one or two very important confounders
(sex/age) + multivariate regression analysis
Thank you!
Exercise 1.

Give an example of confounding


• Exposure: smoking
• Outcome

• Confounder

Each group should give one example related to human health and one example not
related to health/disease

Exercise 2.

A team of reproductive epidemiologists studied the relationship between low birth


weight and risk of cognitive, motor and behavioural problems. They recruited 360
low-birth-weight babies and 360 normal-weight babies based on birth certificates. All
babies were followed up and then received a standardized developmental screening
test at 3 years of age. The testing results were categorized into two groups: normal
development or delayed development. The results from the study were:

Delayed development Normal development

Low birth weight 140 220

Normal birth weight 77 283

1) Calculate the crude risk ratio for the primary exposure (low birth weight)

RR = (140/360)/(77/360) = 1.82

To take account of the possibility that environmental lead exposure might confound
the relationship between birth weight and developmental status, blood lead levels
were determined from blood samples collected during follow-up. Elevated lead levels
(> 10 µg/dL) were found in 173 of the low-birth-weight children (88 of whom had
delayed development according to the screening test). Elevated lead levels were also
found in 72 of the normal-birth-weight children (24 of whom had delayed
development).
2) Clinical evidence has suggested that high blood lead level can lead to
developmental delay in children. According to the conceptual definition of
confounding, do you think blood lead level is a potential confounder of the
association between low birth weight and delayed development in this study
population?

Prevalence ratio of high blood lead (low vs. normal birth weight) = (173/360)/(72/360) = 2.4
There is an association between blood lead level and low birth weight (the
primary exposure), so blood lead level is a potential confounder of the
association between low birth weight and delayed development.
3) Do a stratified analysis to determine whether environmental lead exposure has
confounded the association between low birth weight and developmental delay.
(Create 2 x 2 tables for each stratum, estimate the RR for each stratum, and
interpret the results in comparison with the crude RR). Which measure of effect
would you report, crude RR or adjusted RR, and why?

Low lead               Delayed development   Normal development
Low birth weight                52                  135            RR = 1.51
Normal birth weight             53                  235

High lead              Delayed development   Normal development
Low birth weight                88                   85            RR = 1.53
Normal birth weight             24                   48

After stratifying by blood lead level, the stratum-specific associations are
similar to each other and different from the crude RR.
