
Assessing data quality: Bias

Dr. Stephanie Roll
Institute for Social Medicine, Epidemiology and Health Economics
Charité University Medical Center, Berlin, Germany
Learning objectives
• Assess the quality of study data.

• Learn about different types of bias.

• Distinguish between random and systematic error.

Example: Study on tuberculosis

Study question
What is the prevalence of
tuberculosis (TB) in Cambodia?
The perfect study
• Includes all Cambodians (ca. 14 mil.).
• Assesses TB status objectively, equally, and at the same time.
 Result (invented)
Persons with TB: 140 000 (1%)

Persons without TB: 13 860 000 (99%)

 we know the exact prevalence of TB: 1%


 Unfortunately, this study is a dream (unrealistic)
The realistic study
• Survey (cross-sectional) planned in 50 villages.
• Agreed to participate: 32 villages.
• Within each village: 97% of inhabitants tested.
 Result (invented)
Persons with TB: 280 000 (2%)

Persons without TB: 13 720 000 (98%)

 prevalence of TB: 2%?

What happened?
• Only 32 of 50 villages participated.

[Map: villages that participated vs. villages that did not participate]

• Villages with bigger TB problems may be more likely to participate
(more interested).

 "selection bias"
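The arithmetic behind this can be sketched in a few lines. The split of villages below is invented (the slides give only the totals), chosen so the true prevalence matches the slide's 1% and the observed survey result matches its 2%:

```python
# A minimal sketch with invented numbers, mirroring the slide's TB survey:
# if high-TB villages are more willing to participate, the survey only
# reaches them, and the observed prevalence overestimates the truth.

# Hypothetical split of Cambodia's 14 million inhabitants:
participating = {"n": 6_000_000, "prev": 0.02}        # high-TB villages, keen to join
not_participating = {"n": 8_000_000, "prev": 0.0025}  # low-TB villages, decline

total_n = participating["n"] + not_participating["n"]
true_cases = (participating["n"] * participating["prev"]
              + not_participating["n"] * not_participating["prev"])
true_prev = true_cases / total_n          # what the "perfect study" would find

observed_prev = participating["prev"]     # the survey only sees participants
print(f"true prevalence: {true_prev:.3f}, observed: {observed_prev:.3f}")
# -> true prevalence: 0.010, observed: 0.020
```

No amount of careful measurement within the 32 villages fixes this: the error comes from who was sampled, not how they were tested.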
Bias (systematic error)

Bias is a systematic tendency to get an incorrect result.

Bias is an error in the design or conduct of a study that results in a
conclusion which is different from the truth.
Selection bias I
[Diagram: total population vs. study sample]
1) Is the sample representative?

Selection bias II
[Diagram: study sample split into Group A and Group B]
2) Are the groups comparable?
3 possible explanations for a result

Association between exposure and outcome:
• Bias (systematic error)
• Chance (random error)
• True effect
Consequences of bias

Bias will result in an over- or underestimation of the true effect.
Two main types of bias

• selection bias: any aspect of the way subjects are assembled in the study
that creates a systematic difference between the compared populations that
is not due to the association under study

• information bias: any aspect of the way information is collected in the
study that creates a systematic difference between the compared populations
that is not due to the association under study
Two main types of bias, with many categories

selection bias:
non-respondent bias, self-referral bias, giving consent bias, sampling bias,
missing data bias, attrition bias, lost to follow-up bias, ... many more ...

information bias:
measurement bias, regression to the mean, recall bias, misclassification bias,
reporting bias, observer bias, interviewer bias, ... many more ...
Examples of selection bias

"Self-selection" of individuals to participate
• people interested in taking part in a study
• people giving consent vs. not giving consent

Selection of the sample by researchers
• different selection process for cases and controls
• different selection process for exposed and unexposed participants
Selection bias in a cohort study
Example: self-referral bias

A birth cohort on asthma to estimate the incidence and prevalence of asthma
in children over the next 10 years.

Parents who have asthma themselves are more likely to participate.
Mini-Quiz
How will this type of self-referral bias influence the prevalence of asthma
in children?

a) the prevalence will be overestimated 


b) the prevalence will be underestimated

c) the prevalence could be over or underestimated


Selection bias in a case-control study
Association of mobile phone use and brain tumors

• selection of cases: patients in hospital

• selection of controls: customers of a mobile phone store

A good idea?
a) yes
b) no

Selection bias in a case-control study
Association of mobile phone use and brain tumors

• selection of cases: patients in hospital

• selection of controls: customers of a mobile phone store

[Diagram: cases (brain tumor) include both phone users and non-users;
controls (no brain tumor), recruited at the phone store, are all phone users]
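The distortion can be quantified with a toy 2x2 table. The counts below are invented for illustration; the scenario assumes no true association and 50% phone use in the source population:

```python
# A minimal sketch with invented counts: recruiting controls at a phone
# store inflates exposure (phone use) among controls, dragging the odds
# ratio below the true value of 1.0.

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = exposed/unexposed cases, c/d = controls."""
    return (a * d) / (b * c)

# Cases (brain tumor) from hospital: 50 phone users, 50 non-users.
# Controls from the phone store: 95 phone users, 5 non-users.
biased_or = odds_ratio(50, 50, 95, 5)
print(f"observed OR = {biased_or:.2f} (true OR assumed 1.0)")
# -> observed OR = 0.05
```

The study would spuriously conclude that phone use is strongly "protective", purely because the control group was drawn from a place where exposure is near-universal.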
Attrition bias

• attrition = loss of participants
  – from the entire study
  – differential loss of participants between groups
Attrition bias: example
Study on hand washing promotion in children age 4-10

children N=200

hand washing promotion             control group (no hand washing promotion)
(N=100, mean age 6.7 y)            (N=100, mean age 6.4 y)

diarrhea                           diarrhea
(N=60, mean age 8.6 y)             (N=85, mean age 6.5 y)
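A simulation of a hypothetical mechanism (not the study's actual data) shows why the differing mean ages among completers matter. Here there is no true treatment effect at all; diarrhea risk is assumed to depend only on age, and one arm differentially loses its younger children:

```python
# A minimal sketch: differential dropout alone can make an ineffective
# treatment look protective, because the completers differ in a risk factor.
import random

random.seed(0)
N = 100_000

def risk(age):
    # Assumed for illustration: younger children are at higher risk.
    return 0.8 if age < 7 else 0.3

def completer_risk(dropout_if_young):
    outcomes = []
    for _ in range(N):
        age = random.uniform(4, 10)
        # Differential attrition: young children may leave this arm.
        if age < 7 and random.random() < dropout_if_young:
            continue
        outcomes.append(random.random() < risk(age))
    return sum(outcomes) / len(outcomes)

control = completer_risk(dropout_if_young=0.0)
treated = completer_risk(dropout_if_young=0.5)
print(f"control risk {control:.2f}, 'treated' risk {treated:.2f}")
# The 'treated' arm looks protective only because its completers are older.
```

This mirrors the slide's pattern: the promotion group's completers are notably older (8.6 vs. 6.5 years), so a naive comparison of completers confounds the intervention with age.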
Two main types of bias, with many categories

selection bias:
non-respondent bias, self-referral bias, giving consent bias, sampling bias,
missing data bias, attrition bias, lost to follow-up bias, ... many more ...

information bias:
measurement bias, regression to the mean, recall bias, misclassification bias,
reporting bias, observer bias, interviewer bias, ... many more ...
Information bias
also called measurement bias, classification bias, or misclassification bias

• systematic differences in how outcomes or exposures are assessed and
interpreted

• outcomes or exposures are 'misclassified'
Differential vs. non-differential information bias

Non-differential
• if misclassification of exposure is unrelated to disease
• if misclassification of disease is unrelated to exposure
 effect: bias towards the null (OR and RR closer to 1.0)

Differential
• if misclassification of exposure is related to disease
• if misclassification of disease is related to exposure
 effect: bias can go in either direction from the null; it can inflate or
attenuate your effect estimates (OR and RR)
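The "bias towards the null" claim can be checked numerically. The counts, sensitivity, and specificity below are invented; the key assumption is that the imperfect exposure measurement is applied identically to cases and controls (non-differential):

```python
# A minimal sketch: non-differential exposure misclassification pulls
# the odds ratio toward 1.0.

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = exposed/unexposed cases, c/d = controls."""
    return (a * d) / (b * c)

# True counts (hypothetical): the true OR is 4.0.
a, b = 200, 100   # cases: exposed, unexposed
c, d = 100, 200   # controls: exposed, unexposed
true_or = odds_ratio(a, b, c, d)

# Exposure measured with sensitivity 0.8 and specificity 0.9,
# identically in both groups (non-differential).
se, sp = 0.8, 0.9

def observed(exposed, unexposed):
    obs_exp = se * exposed + (1 - sp) * unexposed
    obs_unexp = (1 - se) * exposed + sp * unexposed
    return obs_exp, obs_unexp

a_obs, b_obs = observed(a, b)
c_obs, d_obs = observed(c, d)
obs_or = odds_ratio(a_obs, b_obs, c_obs, d_obs)
print(f"true OR = {true_or:.2f}, observed OR = {obs_or:.2f}")
# -> true OR = 4.00, observed OR = 2.62
```

If instead `se`/`sp` differed between cases and controls (differential misclassification), the observed OR could land on either side of the truth.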
Diagnostic suspicion bias

Knowledge about a subject's exposure leads to a more thorough search for the
outcome than for an unexposed individual.
 Exposed subjects are more likely to have the disease diagnosed than the
nonexposed.

Example: for a heavy smoker, one might check more thoroughly for signs of
cancer.
Mini-Quiz

How will diagnostic suspicion bias influence the association between
exposure and outcome?

a) the association will be overestimated



b) the association will be underestimated
Detection bias

• Occurs when an exposure, rather than causing disease, causes symptoms
that lead to a search for the disease.

 Even though unrelated, the disease is more often diagnosed in exposed
subjects.
Mini-Quiz

How will detection bias influence the prevalence?

a) the risk will be overestimated 


b) the risk will be underestimated
Recall bias

• Cases may more closely look into their past, searching for possible
explanations of their illness.
• Controls, not having the disease, may less closely examine their past
history.
• Recall bias is a problem especially in case-control / retrospective
studies!
Mini-Quiz
If recall bias is present in a case-control study, how will it usually
affect the result?

a) a greater association is found 

b) a smaller association is found
Reporting bias
• different information on exposure or outcome is obtained

Examples
• Cases (with severe or long-lasting disease) tend to have more complete
records and more complete information about exposures than controls
• Study participants give desirable answers
  – to support the researcher's hypothesis
  – to conceal undesirable or unaccepted behaviours (smoking during
pregnancy, violence within the family) or particular diseases
(sexually transmitted diseases, HIV)
Overmatching bias

In a matched case-control study: cases and controls are matched on a
non-confounding variable that is associated with the exposure but not with
the disease.

• Overmatching can underestimate an association

 Prevention: match only on confounding variables

Stages of research prone to bias (Sackett, 1979)
• Literature Review

• Study Design

• Study Execution

• Data Collection

• Data Analysis

• Interpretation of Results

• Publication

 All stages!
Biases in... Literature Review
- Foreign language exclusion bias
- Literature search bias
- One-sided reference bias
- Rhetoric bias
Biases in... Study Design: Selection bias
- Sampling frame bias
- Berkson (admission rate) bias
- Centripetal bias
- Diagnostic access bias
- Diagnostic purity bias
- Hospital access bias
- Migrator bias
- Prevalence-incidence (Neyman / selective survival; attrition) bias
- Telephone sampling bias
- Nonrandom sampling bias
- Autopsy series bias
- Detection bias
- Diagnostic work-up bias
- Door-to-door solicitation bias
- Previous opinion bias
- Referral filter bias
- Sampling bias
- Self-selection bias
- Unmasking bias
Biases in... Study Execution
- Wrong control bias
- Contamination bias (controls also receive treatment / are exposed)
- Compliance bias
Biases in... Data Collection
- Instrument bias:
  case definition bias, diagnostic vogue bias, forced choice bias,
  framing bias, insensitive measure bias, juxtaposed scale bias,
  laboratory data bias, questionnaire bias, scale format bias,
  sensitive question bias, stage bias, unacceptability bias,
  underlying/contributing cause of death bias, voluntary reporting bias
- Data source bias:
  competing death bias, family history bias, hospital discharge bias,
  spatial bias
- Observer bias:
  diagnostic suspicion bias, exposure suspicion bias, expectation bias,
  interviewer bias, therapeutic personality bias
Biases in... Data Collection (continued)
- Subject bias:
  apprehension bias, attention bias (Hawthorne effect), culture bias,
  end-aversion bias (end-of-scale/central tendency bias), faking bad bias,
  faking good bias, family information bias, interview setting bias,
  obsequiousness bias, positive satisfaction bias, proxy respondent bias,
  recall bias, reporting bias, response fatigue bias,
  unacceptable disease bias, unacceptable exposure bias,
  underlying cause (rumination) bias, yes-saying bias
- Data handling bias:
  data capture error, data entry bias, data merging error,
  digit preference bias, record linkage bias
Biases in... Data Analysis
- Confounding bias:
  latency bias, multiple exposure bias, nonrandom sampling bias,
  standard population bias, spectrum bias
- Analysis strategy bias:
  distribution assumption bias, enquiry unit bias, estimator bias,
  missing data handling bias, outlier handling bias, overmatching bias,
  scale degradation bias
- Post hoc analysis bias:
  data dredging bias, post hoc significance bias, repeated peeks bias
Biases in... Interpretation of Results
- Assumption bias
- Cognitive dissonance bias
- Correlation bias
- Generalisation bias
- Magnitude bias
- Significance bias
- Underexhaustion bias
Biases in... Publication
- All's well literature bias
- Positive result bias
- Hot topic bias
Overview of types of bias
Bias. Delgado-Rodríguez M, Llorca J. J Epidemiol Community Health.
2004 Aug;58(8):635-41. (additional pdf file)
Exercise: Sources of bias
Please find an example from your field of work
(a real one that you experienced or a hypothetical example).

1. Diagnostic suspicion bias
(knowledge about a subject's exposure leads to a more thorough search for
the outcome compared to an unexposed individual)
2. Recall bias
(cases and controls recall exposures differently)
3. Reporting bias of participants giving desirable answers
4. Selection bias
(regarding the total study population, or the groups to be compared)
3 possible explanations for a result

Association between exposure and outcome:
• Bias (systematic error)
• Chance (random error)
• True effect
Random errors
= deviation of results from the truth, occurring only as
a result of chance.

Possible reasons
• Variability of chosen sample from underlying
population
• Outcome or risk factor incorrectly assessed
(independent of group)
How to deal with random errors?
• Use a big sample size

• Calculate p-values and confidence intervals


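Both remedies can be seen in a short simulation. The true prevalence of 1% is assumed (echoing the TB example); a Wald confidence interval is used for simplicity:

```python
# A minimal sketch with simulated data: a larger sample shrinks random
# error, and the 95% confidence interval quantifies what remains.
import math
import random

random.seed(1)
TRUE_PREV = 0.01   # assumed true prevalence, as in the TB example

def estimate(n):
    cases = sum(random.random() < TRUE_PREV for _ in range(n))
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)             # standard error
    return p, (p - 1.96 * se, p + 1.96 * se)    # Wald 95% CI

results = {}
for n in (1_000, 100_000):
    p, (lo, hi) = estimate(n)
    results[n] = (lo, hi)
    print(f"n={n:>7}: prevalence {p:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
```

The interval at n=100,000 is roughly ten times narrower than at n=1,000, reflecting the 1/sqrt(n) behaviour of random error.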
Random error and systematic error
random error
ystemattic errorr

good precision poor precision


good
d accuracy (unbiased)
( bi d) good
d accuracy (unbiased)
( bi d)
sy

good precision poor precision


poor accuracy (biased) poor accuracy (biased)
Mini-Quiz
If the study size increases, how does this affect random and systematic
errors?

random errors will:
a) get smaller 
b) get bigger
c) stay the same

systematic errors will:
d) get smaller
e) get bigger
f) stay the same
Random vs. systematic errors in epidemiological studies

Random error:
• will cancel each other out in the long run (large sample size)
• leads to imprecise results

Systematic error:
• will not cancel each other out, whatever the sample size
• leads to invalid (inaccurate) results
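The contrast above can be demonstrated directly. The numbers are assumed: a true value of 100 measured with a fixed +5 offset (systematic error, e.g. a miscalibrated instrument) plus Gaussian noise (random error):

```python
# A minimal sketch: averaging many measurements cancels random error
# but leaves systematic error untouched.
import random

random.seed(42)
TRUE_VALUE = 100.0
OFFSET = 5.0   # systematic error (assumed miscalibration)

def measure():
    return TRUE_VALUE + OFFSET + random.gauss(0, 10)  # + random error

means = {}
for n in (10, 100_000):
    means[n] = sum(measure() for _ in range(n)) / n
    print(f"n={n:>7}: mean = {means[n]:.2f} (truth = {TRUE_VALUE})")
# Precision improves with n, but the mean converges to 105, not 100:
# the +5 bias does not average away.
```

This is exactly the quiz's answer in code: growing the sample shrinks random error while the systematic error stays the same.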
Summary
• Observational studies are especially prone to bias!
• Be aware of all kinds and sources of bias when planning and conducting
a study.
• Think of all possible sources and types of bias and how to avoid them
(e.g. blinding wherever possible).
• Be aware of all kinds and sources of bias before believing the results
of a published study.
Association between exposure and outcome:
• Bias (systematic error)
• Chance (random error)
• True effect
Any questions or comments?
