
Reliability, Validity & Threats to them

By
Dr. Abebe Megerso
(BSc. MPHE. PhD. Asst. Prof. Epidemiology)

Outline
• Introductory concepts & definitions of terms,
• Reliability & Threats to it,
• Reliability assessment methods,
• Validity & its common classifications,
• Relation b/n sample size & Validity,
• Threats to validity & methods to mitigate them,
• Demonstration using software applications,

Introduction
• How do you define Epidemiology?
• An informal definition of Epidemiology is that it
is 'a Science of Sciences'.
• What is Science?
‘Science is the pursuit & application of
knowledge & understanding of the natural &
social world following a systematic
methodology based on evidence.’

(Science Council, 2009)


Introduction …
• Public Health is not a simple, reactive ‘take the pill
3X a day’ solution;
– it is a systematic approach of generating evidence &
guiding interventions,
• Epidemiology is one of the pillars of PH where
Epidemiologists study distribution & determinants
of PH problems,
– Therefore, measuring, understanding & documenting
evidence is the key role of Epidemiologists,
• Ensuring Reliability/precision & Validity/accuracy
of measurements is crucial for Epidemiologists,
Reliability/ Precision
• Reliability refers to whether measurement
(data collection) techniques & analytic
procedures would reproduce consistent
findings:
– If they were repeated on other occasions, or,
– If they were replicated by another researcher,
• It is about how the measure provides stable,
dependable & consistent results,
– It does not guarantee usefulness of the result, but it
is a precondition for validity,
Precision …
[Figure: two target diagrams. Left: precise but not valid estimation. Right: less precise but valid estimation.]
Threats to Research Reliability
• Participant Error – any factor which adversely
alters the way in which a participant performs,
• Participant Bias – any factor which produces a
false response,
• E.g. social desirability bias, volunteer bias
• Researcher Error – any factor which alters the
researcher's interpretation,
• Researcher Bias – any factor which introduces
systematic error in the researchers’ recording of
responses,
– E.g. Bias at measurement, analysis or interpretation
Reliability Coefficient
• In measurement, there are two possible
variances:
– Variance of the true score (Vt) &
– Variance in the observed score (Vo),
• Reliability coefficient = Vt/Vo → a value b/n 0 & 1
• It is the proportion of observed-score variability
explained by true-score differences among the participants,
– E.g. if it is 0.95 → 95% of the score variability is
explained by true score differences while 5% is due
to measurement error,
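A minimal Python sketch of this idea, using simulated scores (all variable names & numbers are illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(50, 10, size=5000)    # Vt = 10**2 = 100
error = rng.normal(0, 2.3, size=5000)          # random measurement error
observed = true_scores + error                 # Vo = Vt + error variance

reliability = true_scores.var() / observed.var()
print(round(reliability, 2))   # ~0.95: ~95% of variability is true-score variance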
Reliability Assessment Techniques
• There are a number of techniques which can
be selected based on the characteristic to be
measured,
• Some examples are:
i. Test-retest
ii. Alternate forms Reliability,
iii. Internal Consistency,
iv. Inter-rater consistency, etc.

i. Test-Retest Technique
• This is the use of the same test on the same group at
different times, then testing for agreement
between the two sets of results using appropriate
statistical methods,
– This technique is good if the instrument measures
stable characteristics (e.g. intelligence),
• It is not appropriate when scores can be
affected by repeated measurement,
• It is also inappropriate if participants change b/n
the two measurement administrations,
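A minimal sketch of the test-retest check in Python (simulated administrations; the Pearson correlation shown is one of several statistics that can be used for the agreement test):

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
trait = rng.normal(100, 15, 80)        # stable characteristic
time1 = trait + rng.normal(0, 5, 80)   # first administration
time2 = trait + rng.normal(0, 5, 80)   # second administration

r, p = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")      # high r suggests consistent scores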
ii. Alternate forms Reliability
• This is the application of two equivalent forms
of an instrument to the same group,
• It can be applied at different times, but this
can contribute to error,
• This approach is appropriate to measure
stable characteristics & when scores are not
affected by repeated measurement,

iii. Internal Consistency
• This is the most commonly used technique to
assess whether the items in the instrument
measure the same attribute of the characteristic,
• It includes different methods:
– Split-half method – split the instrument into halves
so that each participant has two scores,
• A lower number of items results in decreased reliability
(see the Spearman-Brown correction in the sketch below),
– Cronbach's alpha – a formula used to calculate
inter-item consistency,

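A minimal sketch of the split-half method in Python (simulated item scores; the Spearman-Brown step corrects for each half having fewer items, which addresses the reduced-items caveat above):

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
trait = rng.normal(0, 1, 200)
items = trait[:, None] + rng.normal(0, 1, (200, 10))   # 10 noisy items per person

odd_half = items[:, 0::2].sum(axis=1)                  # score on odd items
even_half = items[:, 1::2].sum(axis=1)                 # score on even items
r_half, _ = pearsonr(odd_half, even_half)
split_half = 2 * r_half / (1 + r_half)                 # Spearman-Brown correction
print(round(split_half, 2))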
Internal Consistency ..
• It tests whether multiple-question Likert scale
surveys are reliable.
• Likert scale questions measure latent variables –
hidden or unobservable variables like:
– a person's conscientiousness, neuroticism or openness.
• Cronbach's alpha tells us how closely related a set
of test items are as a group in measuring these
difficult-to-measure/latent variables.
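A minimal sketch of the Cronbach's alpha calculation in Python (the helper function & simulated Likert responses are illustrative):

import numpy as np

def cronbach_alpha(items):
    # items: participants x items matrix of scores
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(3)
trait = rng.normal(0, 1, 300)                             # latent variable
responses = trait[:, None] + rng.normal(0, 1, (300, 8))   # 8 related items
print(round(cronbach_alpha(responses), 2))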
Rule of Thumb for Cronbach's alpha
Commonly cited thresholds (e.g. George & Mallery):
α ≥ 0.9 Excellent
0.8 ≤ α < 0.9 Good
0.7 ≤ α < 0.8 Acceptable
0.6 ≤ α < 0.7 Questionable
0.5 ≤ α < 0.6 Poor
α < 0.5 Unacceptable
iv. Inter-rater consistency
• This method is used when there are two or
more raters:
– Calculation of a correlation coefficient or,
– Calculation of percentage agreement b/n raters,
• Used for behavioral observations,
• Providing training & having the raters work
independently reduces error,

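A minimal sketch in Python of percentage agreement alongside Cohen's kappa (kappa is not named on the slide, but is a standard chance-corrected agreement statistic for two raters; the ratings below are hypothetical):

import numpy as np
from sklearn.metrics import cohen_kappa_score

rater_a = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
rater_b = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

agreement = (rater_a == rater_b).mean()        # simple percentage agreement
kappa = cohen_kappa_score(rater_a, rater_b)    # corrects for chance agreement
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")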
Validity
• An instrument is valid when it measures what it
was intended to measure,
• A valid instrument is also accurate & reliable; but
not all reliable instruments are valid,
• Generally, there are four major categories of
Validity:
– Internal validity
– External validity
– Conclusion validity
– Construct validity

I. Internal Validity
• Internal validity is established when our
research demonstrates a causal relationship
between the study variables,
– e.g. Hyperlipidemia → Statin prescription … why?
• It can also be defined as how well the results of the
study describe the actual population studied,
– Sample → study population → source/target
population,

II. External Validity
• External validity is concerned with whether a
study’s result can be generalized to other
relevant settings or groups,
– e.g. a study on whites applied to blacks,

III. Conclusion Validity
• Conclusion validity is concerned with
whether the conclusions drawn are based upon
the results of the study,

IV. Construct Validity
• A construct is a group of inter-related variables
which cannot be observed directly (a latent
variable),
– e.g. intelligence, depression, aggression,
• Construct validity is the extent to which a
test/scale measures the construct it claims to
measure,
• E.g. does the scale correlate with actual outcomes?
• Adequacy of operational definitions of study
variables,

Construct Validity …
• Construct validity has different types of
subcategories:
i. Face validity
ii. Content validity
iii. Criterion validity (Concurrent & predictive)
iv. Convergent
v. Discriminant

i. Face Validity
• This is not a true form of validity, but the first
requirement in designing instruments,
– Assessed through judgment or non-expert rating,
• It is achieved when an instrument looks like it
measures what it was designed to measure,
• It is useful because participants may not
respond accurately without face validity,

ii. Content Validity
• An instrument has content validity when its items
are a representative sample of the larger content
domain,
– Does the measure sufficiently cover the area it intends
to cover?
• Content validity is used when we want to know
whether a sample of items truly reflects an entire
universe of possible items,
• It is determined by subject matter experts &
associated with achievement tests,
• See example in the following slide:
Which of these has Content Validity?
[Figure: two example instruments. Left: lacks content validity. Right: has content validity.]
iii. Criterion Validity
• This is achieved when an instrument’s scores are
substantially related to a criterion/standard in the
real world,
• This validity has two types:
– Concurrent validity – the instrument correlates highly
with a well-established instrument when the two
measures are administered at the same time,
– Predictive validity – the same as the concurrent, but
one measure is administered first & intended to predict
a criterion in the future,
• Scores on the measure predict behavior on a criterion
measured at a time in the future,
Example for Criterion Validity

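One way to illustrate predictive validity: correlate scores on a measure given first with a criterion observed later. A minimal Python sketch (simulated entrance-exam scores & later GPA; the names & numbers are illustrative, not the slide's original example):

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
entrance_score = rng.normal(60, 10, 150)                      # administered first
later_gpa = 0.03 * entrance_score + rng.normal(0, 0.4, 150)   # future criterion

r, p = pearsonr(entrance_score, later_gpa)
print(f"predictive validity r = {r:.2f}")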
iv. Convergent & Discriminant Validity
• Convergent – how the instrument agrees
with well-established instruments.
– Scores on the measure are related to other
measures of the same construct (high
correlation),
• Discriminant – how the instrument disagrees
with instruments measuring theoretically different constructs.
– Scores on the measure are not related to other
measures that are theoretically different (low
correlation),

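A minimal sketch of both checks in Python (a hypothetical new scale compared against an established scale for the same construct & against an unrelated measure):

import numpy as np

rng = np.random.default_rng(5)
construct = rng.normal(0, 1, 250)                  # latent construct

new_scale = construct + rng.normal(0, 0.5, 250)    # scale under test
established = construct + rng.normal(0, 0.5, 250)  # measures same construct
unrelated = rng.normal(0, 1, 250)                  # different construct

print(np.corrcoef(new_scale, established)[0, 1])   # high -> convergent validity
print(np.corrcoef(new_scale, unrelated)[0, 1])     # near 0 -> discriminant validity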
Relation b/n sample size & Accuracy/Validity

[Figure: accuracy (%) plotted against sample size, rising steeply for small samples & then flattening toward 100% as the sample approaches a census]
Lesson from accuracy-sample plot
• Accuracy is 100% in the case of a census & the
pattern of accuracy growth is not linear,
• The accuracy of a sample equal to half of the
population size is not 50%, but very near to 100%,
• Good accuracy levels can be achieved at relatively
small sample sizes, if the samples are representative,
• The result of this relationship is that, beyond a certain
sample size, the gains in accuracy are negligible while
sampling costs increase significantly,

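A minimal sketch of this diminishing-returns pattern, assuming simple random sampling & the usual 95% margin-of-error formula for a proportion (the formula stands in for the plot above):

import math

def margin_of_error(p, n):
    # 95% margin of error for an estimated proportion p at sample size n
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (50, 100, 400, 1000, 5000, 20000):
    print(n, round(margin_of_error(0.5, n), 3))
# the error shrinks with 1/sqrt(n): quadrupling n only halves it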
Threats to Validity
• Systematic Errors:
– Bias
– Confounding
– Interaction
– Effect modification,
• Random Error:
– Chance,

NB: We need to remain vigilant of these threats as
we generate evidence,

Bias
• Broad classification of Bias:
1. Selection or sampling bias,
– Is simple random sampling always possible?
• How to compensate for the lost power?
– How to sample hard-to-reach populations?
• Snowball, respondent-driven sampling etc.
– Design-related selection biases
• E.g. survivor bias in surveys etc.
2. Information or measurement bias,
– Erroneous instruments, skill gaps, respondent-related factors etc.
– Desirability bias, researcher bias, etc.
• Once introduced, little or nothing can be done to
control bias afterwards,
Confounding
• A confusion caused by a 3rd variable which:
– Affects both the exposure & the outcome,
– Lies away from the causal pathway (acyclic),
• Control at design stage:
– Restriction
– Randomization
– Matching
• Control at analysis stage:
– Stratified analysis
– Adjusting (multivariable analysis), as sketched below
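A minimal sketch of the adjustment approach, fitting crude & multivariable logistic models with Python's statsmodels (simulated data; age is the assumed confounder & 0.4 the built-in exposure effect):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 2000
age = rng.normal(50, 10, n)                                      # confounder
p_exp = 1 / (1 + np.exp(-(age - 50) / 10))                       # age drives exposure
exposure = (rng.random(n) < p_exp).astype(int)
p_out = 1 / (1 + np.exp(-(-5 + 0.05 * age + 0.4 * exposure)))    # age & exposure drive outcome
outcome = (rng.random(n) < p_out).astype(int)

df = pd.DataFrame({"outcome": outcome, "exposure": exposure, "age": age})
crude = smf.logit("outcome ~ exposure", data=df).fit(disp=0)
adjusted = smf.logit("outcome ~ exposure + age", data=df).fit(disp=0)
print(crude.params["exposure"])      # distorted by confounding
print(adjusted.params["exposure"])   # closer to the true 0.4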
Demo: How to assess reliability & Validity

• SPSS
• JAMOVI

• (Using COVID-19 vaccine data)

Reliability Result in JAMOVI

[Screenshot: jamovi reliability analysis output]
Assignment
• Please, search for & download a published
paper of any analytical study design,
• Assess the article for threats to reliability &
validity,
• Prepare a short summary of your findings to
present in class for discussion,
• It has to be ready for the following week's
class,

Thank you!
