
Reliability and Validity

Introduction to Study Skills & Research Methods (HL10040)

Dr James Betts
Lecture Outline:
• Definition of Terms
• Types of Validity
• Threats to Validity
• Types of Reliability
• Threats to Reliability
• Introduction to Measurement Error.
Commonly used terms…

“She has a valid point”

“My car is unreliable”

…in science…
“The conclusion of the study was not valid”

“The findings of the study were not reliable”.


Some definitions…
• Validity

“The soundness or appropriateness of a test or instrument in measuring what it is designed to measure”

(Vincent 1999)
Some definitions…
• Validity

“Degree to which a test or instrument measures what it purports to measure”

(Thomas & Nelson 1996)


Some definitions…
• Reliability

“…the degree to which a test or measure produces the same scores when applied in the same circumstances…”

(Nelson 1997)
Some definitions…
• Objectivity

“…the degree to which different observers agree on measurements…”

(Atkinson & Nevill 1998)


Types of Experimental Validity
• Internal

– Is the experimenter measuring the effect of the independent variable on the dependent variable?

• External

– Can the results be generalised to the wider population?
Validity
• Logical: Face, Content
• Statistical (AKA Criterion): Concurrent, Predictive
• Construct (Logical/Statistical)

Reliability Consistency Objectivity


Logical Validity
• Face Validity
– Infers that a test is valid by definition
– It is clear that the test measures what it is supposed to

e.g.
If you want to assess reaction time, measuring how long it takes an individual to react to a given stimulus would have face validity.
(Externally valid?)
Logical Validity
• Face Validity
– Infers that a test is valid by definition
– It is clear that the test measures what it is supposed to

i.e.
Would assessing 15 m sprint time be a valid means of assessing reaction time?

Assessing face validity is therefore a subjective process.


Logical Validity
• Content Validity
– Infers that the test measures all aspects contributing to the variable of interest
e.g.
Who is the most physically fit?
VO2 max test?
Wingate test?
1 RM?
…also a subjective process.
Overall:

A logically valid test simply appears to measure the right variable in its entirety?
Statistical Validity
• Concurrent Validity
– Infers that the test produces similar results to a previously validated test

e.g.
VO2 max: Incremental Treadmill Protocol with expired gas analysis vs. Multi-Stage Fitness (Beep) Test
Statistical Validity
• Predictive Validity
– Infers that the test provides a valid reflection of future performance using a similar test

e.g.
Can performance during test A be used to predict future performance in test B?
http://www.youtube.com/watch?v=vdPQ3QxDZ1s
Overall:

A statistically valid test produces results that agree with other similar tests?
Logical/Statistical Validity
• Construct Validity
– Infers not only that the test is measuring what it is supposed to, but also that it is capable of detecting what should exist, theoretically
– Therefore relates to hypothetical or intangible constructs
e.g.
Team Rivalry
Sportsmanship.
Logical/Statistical Validity
• Construct Validity
– Infers not only that the test is measuring what it is supposed to, but also that it is capable of detecting what should exist, theoretically
– Therefore relates to hypothetical or intangible constructs
– This makes assessment difficult, i.e. if what should exist cannot be detected, this could mean:
a) Test Invalid?
b) Theory Incorrect?
c) Sensitivity/Specificity Issues?


Interesting Example: Breast Cancer
• Incidence: ~1 % (0.8 %)
(i.e. approximately 1 in every 100 women tested actually has breast cancer)
• Sensitivity: ~90 % (87 %)
(the mammogram is sensitive enough that approximately 90 in every 100 breast cancer patients will receive a positive result)
• Specificity: ~90 % (93 %)
(the mammogram is specific enough that approximately 90 in every 100 healthy patients will receive a negative result).

Data from Kerlikowske et al. (1996)


Quick Test

• What is the probability that a patient receiving a positive result actually has breast cancer?
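
One way to work through the Quick Test is with Bayes' theorem. The sketch below is an illustration only, not part of the original slides: the function name `positive_predictive_value` is hypothetical, and the inputs are the rounded and bracketed figures quoted on the previous slide.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of disease given a positive test (Bayes' theorem)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Rounded figures from the slide: ~1 %, ~90 %, ~90 %
rounded = positive_predictive_value(0.01, 0.90, 0.90)
# Bracketed figures from the slide: 0.8 %, 87 %, 93 %
reported = positive_predictive_value(0.008, 0.87, 0.93)

print(f"Rounded figures:   {rounded:.1%}")
print(f"Bracketed figures: {reported:.1%}")
```

With either set of figures the answer is below 10 %, because the false positives among the many healthy women tested outnumber the true positives among the few with the disease.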
Threats to Validity
(and possible solutions?)
Threats to Internal Validity
• Maturation
– Changes in the DV over time irrespective of the IV
Threats to Internal Validity
• Maturation
e.g. One Group Pre-test Post-test

O1 T O2
Threats to Internal Validity
• Maturation (possible solution)

Time series

O1 O2 O3 T O4 O5 O6
Threats to Internal Validity
• Maturation (possible solution)
Pre-test Post-test Randomised Group Comparison
R O1 T O2
R O3 P O4

n.b. RCT
Threats to Internal Validity
• Maturation (possible solution)
Repeated measures designs can occasionally be an inappropriate
solution, even when randomised and counterbalanced
e.g.
Muscle Damage (repeated bout effect)
Vitamin Supplementation (wash-out period)

In which case independent measures designs could be used.


Threats to Internal Validity
• History
– Unplanned events between measurements
Threats to Internal Validity
• History

O1 T O2

e.g. exercise?

Therefore, solution = control extraneous variables!


Threats to Internal/External Validity
• Pre-testing
– Interactive effects due to the pre-test (e.g. learning,
sensitisation, etc.)
– Also influences External Validity
Threats to Internal/External Validity
• Pre-testing
e.g.
R O1 T O2
R O3 P O4

Assessing muscle mass here (at the pre-test) could make them train harder in both trials…
…but then respond better to the T than the P…
…so it is actually T+O1 that is better than P, not T alone.
Threats to Internal/External Validity
• Pre-testing (possible solution)
Solomon Four-Group Design
R O1 T O2
R O3 P O4
R    T O5
R    P O6
Threats to Internal Validity
• Statistical Regression
– AKA regression to the mean
– An initial extreme score is likely to be followed by less extreme subsequent scores
e.g.
Sophomore Slump & SI ‘Cover Jinx’
Training has the greatest effect on untrained individuals.

Therefore, solution = effective sampling.
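
Regression to the mean can be seen in a small simulation. This is a sketch under stated assumptions rather than anything taken from the slides: each observed score is modelled as a stable true score plus independent random noise, and the means, standard deviations and sample size are hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical model: observed score = stable true score + random noise
true_scores = [random.gauss(50, 5) for _ in range(10_000)]
test_1 = [t + random.gauss(0, 5) for t in true_scores]
test_2 = [t + random.gauss(0, 5) for t in true_scores]

# Select the 10 % of individuals with the highest (most extreme) first scores
cutoff = sorted(test_1)[int(0.9 * len(test_1))]
extreme = [i for i, score in enumerate(test_1) if score >= cutoff]

mean_first = statistics.mean(test_1[i] for i in extreme)
mean_second = statistics.mean(test_2[i] for i in extreme)
print(f"Extreme group, test 1: {mean_first:.1f}")
print(f"Extreme group, test 2: {mean_second:.1f}")
```

No treatment is applied between the two tests, yet the selected group scores closer to the overall mean on the second test. A group selected for extreme low scores would show the mirror image, an apparent improvement, which is why sampling matters when interpreting a training effect.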


Threats to Internal Validity
• Instrumentation
– A difference in the way 2 comparable variables
were measured
e.g.
Uncalibrated equipment

Therefore, solution = calibrate!


Threats to Internal Validity
• Selection Bias
– The groups for comparison are not equivalent
Threats to Internal Validity
• Selection Bias
e.g. Groups not randomly assigned
Static Group Comparison
T O1
P Oa

i.e. Group T were resistance trained to start with
Threats to Internal Validity
• Selection Bias (possible solution)

Either:
- Randomise group assignment,
- Pre-test and post-test difference,
- Repeated Measures Design.

T O1
P Oa
Threats to Internal/External Validity
• Experimental Mortality
– Missing Data due to subject drop-out
– Reduced n = reduced statistical Power
– Not only challenges quality of data gathered (Internal Validity) but also our ability to generalise (External Validity).
Therefore, solution = recruit sufficient (young?) participants
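
The link between drop-out and statistical power can be sketched numerically. The example below is an assumption-laden illustration, not a method from the slides: it uses a normal approximation for a two-sided comparison of two independent groups, and the function name `approx_power`, the effect size of 0.8 and the group sizes are all hypothetical.

```python
from statistics import NormalDist

def approx_power(n_per_group, effect_size, alpha=0.05):
    """Normal-approximation power for a two-sided, two-group comparison.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return NormalDist().cdf(noncentrality - z_crit)

recruited = approx_power(30, 0.8)  # hypothetical: 30 per group recruited
completed = approx_power(20, 0.8)  # hypothetical: 10 per group drop out
print(f"Power with full sample:  {recruited:.2f}")
print(f"Power after drop-out:    {completed:.2f}")
```

Under these assumptions, losing a third of each group lowers the chance of detecting the same effect, which is the argument for recruiting more participants than the minimum required.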
Threats to External Validity
• Inadequate description
– 5th characteristic of research… …should be replicable

If nobody can replicate the methods of a given study, then it is irrefutable and therefore lacks external validity.

Therefore, solution = comprehensive methodology


Threats to External Validity
• Biased sampling
– Linked to statistical regression
– Sample does not reflect target population
– n ≠ N
e.g. Results generalised across gender

Therefore, solution = random sample (of target population).


Threats to External Validity
• Hawthorne Effect
– DV is influenced by the fact that it is being recorded
e.g.
Fastest sprint when professor enters lab

Therefore, solution = control the lab environment.
Threats to External Validity
• Demand Characteristics
– Participants detect the purpose of the study and behave accordingly
e.g.
Sports Science students already know that the carbohydrate drink is supposedly superior (CHO vs. H2O)
Therefore, solution = double or single blinding.
Threats to External Validity
• Operationalisation
– AKA Ecological Validity
– The DV must have some relevance in the ‘real world’
e.g.
TTE (time to exhaustion) has no Olympic equivalent

Therefore, solution = choose your DV carefully.


Reliability
• Reliability is a pre-requisite of validity
e.g. Direct versus Indirect measures of VO2 max

Direct: Gold Standard (i.e. valid and reliable), Expensive, Complex
Indirect: Predictive, Cheap, Easy
Reliability

Subject 1   60 ml·kg⁻¹·min⁻¹   60 ml·kg⁻¹·min⁻¹   60 ml·kg⁻¹·min⁻¹
Subject 2   55 ml·kg⁻¹·min⁻¹   55 ml·kg⁻¹·min⁻¹   55 ml·kg⁻¹·min⁻¹
Subject 3   70 ml·kg⁻¹·min⁻¹   70 ml·kg⁻¹·min⁻¹   70 ml·kg⁻¹·min⁻¹

Valid and Reliable


Reliability

Subject 1   60 ml·kg⁻¹·min⁻¹   65 ml·kg⁻¹·min⁻¹   65 ml·kg⁻¹·min⁻¹
Subject 2   55 ml·kg⁻¹·min⁻¹   60 ml·kg⁻¹·min⁻¹   60 ml·kg⁻¹·min⁻¹
Subject 3   70 ml·kg⁻¹·min⁻¹   75 ml·kg⁻¹·min⁻¹   75 ml·kg⁻¹·min⁻¹

Not Valid but Reliable
(5 ml·kg⁻¹·min⁻¹ correction?)
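
The suggested correction can be checked with a few lines of arithmetic. This sketch assumes, since the slide's columns are unlabelled, that the first column holds the criterion (true) values and the second and third columns are repeated trials of the test; the variable names are hypothetical.

```python
criterion = [60, 55, 70]  # assumed criterion values (first column)
trial_1 = [65, 60, 75]    # assumed first trial of the test
trial_2 = [65, 60, 75]    # assumed repeat trial of the test

# Systematic bias: mean difference between the test and the criterion
differences = [t - c for t, c in zip(trial_1, criterion)]
bias = sum(differences) / len(differences)

# The test repeats exactly, so subtracting the bias recovers the criterion
corrected = [t - bias for t in trial_1]
print(f"Bias: {bias} ml/kg/min, corrected scores: {corrected}")
```

Because the test is consistent from trial to trial, a constant offset of this kind can be removed. The same would not work if the error changed unpredictably between trials.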
Reliability

Subject 1   60 ml·kg⁻¹·min⁻¹   72 ml·kg⁻¹·min⁻¹   57 ml·kg⁻¹·min⁻¹
Subject 2   55 ml·kg⁻¹·min⁻¹   61 ml·kg⁻¹·min⁻¹   52 ml·kg⁻¹·min⁻¹
Subject 3   70 ml·kg⁻¹·min⁻¹   40 ml·kg⁻¹·min⁻¹   84 ml·kg⁻¹·min⁻¹

Not Valid and not Reliable
i.e. a test can never be valid without being reliable?
Types of Reliability

• Relative
• Absolute
• Rater reliability (Objectivity)
– Intrarater reliability
– Interrater reliability.
Relative Reliability

Subject 1   60 ml·kg⁻¹·min⁻¹   63 ml·kg⁻¹·min⁻¹   57 ml·kg⁻¹·min⁻¹
Subject 2   55 ml·kg⁻¹·min⁻¹   56 ml·kg⁻¹·min⁻¹   48 ml·kg⁻¹·min⁻¹
Subject 3   70 ml·kg⁻¹·min⁻¹   65 ml·kg⁻¹·min⁻¹   66 ml·kg⁻¹·min⁻¹

Relatively Reliable
i.e. Individuals maintain position in the group
Absolute Reliability

Subject 1   60 ml·kg⁻¹·min⁻¹   63 ml·kg⁻¹·min⁻¹   57 ml·kg⁻¹·min⁻¹
Subject 2   55 ml·kg⁻¹·min⁻¹   56 ml·kg⁻¹·min⁻¹   48 ml·kg⁻¹·min⁻¹
Subject 3   70 ml·kg⁻¹·min⁻¹   65 ml·kg⁻¹·min⁻¹   66 ml·kg⁻¹·min⁻¹

Not Absolutely Reliable
i.e. Test-Retest within individuals
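
The contrast between the last two slides can be made concrete using their shared data. This is an illustrative sketch, assuming the three columns are three repeated trials; the rank-order check and the within-subject range are simple stand-ins chosen for this example, not the statistics recommended in the reading list.

```python
trials = {
    "Subject 1": [60, 63, 57],
    "Subject 2": [55, 56, 48],
    "Subject 3": [70, 65, 66],
}

def ranking(trial_index):
    """Subjects ordered from highest to lowest score on one trial."""
    return sorted(trials, key=lambda s: trials[s][trial_index], reverse=True)

# Relative reliability: is the rank order the same on every trial?
rankings = [ranking(i) for i in range(3)]
same_order = all(r == rankings[0] for r in rankings)

# Absolute reliability: how much does each individual's own score vary?
ranges = {s: max(v) - min(v) for s, v in trials.items()}

print("Rank order maintained:", same_order)
print("Within-subject range (ml/kg/min):", ranges)
```

The rank order is identical on all three trials, yet each individual's score moves by 5 to 8 ml·kg⁻¹·min⁻¹, so the same data are relatively reliable but not absolutely reliable.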
Rater Reliability
• Intrarater reliability
– The consistency of a given observer or
measurement tool on more than one occasion
Rater Reliability
• Interrater reliability
– The consistency of a given measurement from
more than one observer or measurement tool
e.g.
Score for the American Gymnast
British Judge = 9.9
French Judge = 4.4
Japanese Judge = 7.0
Threats to Reliability
• Fatigue
            8 am               9 am               10 am
Subject 1   60 ml·kg⁻¹·min⁻¹   55 ml·kg⁻¹·min⁻¹   50 ml·kg⁻¹·min⁻¹

Therefore, solution = increase time between tests.


Threats to Reliability
• Habituation

Subject 1   60 ml·kg⁻¹·min⁻¹   65 ml·kg⁻¹·min⁻¹   70 ml·kg⁻¹·min⁻¹

Therefore, solution = familiarise prior to test.


Threats to Reliability
• Standardisation of Procedures
– Control of extraneous variables

• Precision of Measurements
– i.e. if we are happy to measure VO2 max to the nearest 10 ml·kg⁻¹·min⁻¹, then it could probably be reliably predicted from your training volume and age.
Measurement Errors
• Ultimately, reliability is dependent on the degree of measurement error in a given study

• The overall error in any measurement is comprised of both systematic and random error

• We will address measurement error further next week…
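
As a preview of the split between systematic and random error, the sketch below separates the two for the data on the earlier "Not Valid and not Reliable" slide. It rests on assumptions: the first column is taken as the criterion value and the second as a single trial of the test, and the mean and standard deviation of the errors are used as one simple way to summarise each component.

```python
import statistics

criterion = [60, 55, 70]  # assumed criterion values (first column)
measured = [72, 61, 40]   # assumed single trial of the test (second column)

# Error in each measurement relative to the criterion
errors = [m - c for m, c in zip(measured, criterion)]

# Systematic error: the average over- or under-estimate
systematic_error = statistics.mean(errors)
# Random error: the spread of the errors around that average
random_error = statistics.stdev(errors)

print(f"Errors: {errors}")
print(f"Systematic error: {systematic_error} ml/kg/min")
print(f"Random error (SD): {random_error:.1f} ml/kg/min")
```

Here the average error is small relative to its spread, so no single correction factor would rescue the test, unlike a purely systematic offset.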
Literature Search Assignment
• The handout lists 8 questions which can be
answered through retrieving the corresponding
source articles
• Answer as many as possible and bring them to
next week’s lecture
• DO NOT contact author or order articles.
Selected Reading
• Atkinson, G. and A. M. Nevill. Statistical methods for assessing measurement error (Reliability) in variables relevant to sports medicine. Sports Medicine. 26:217-238, 1998.

• Holmes, T. H. Ten categories of statistical errors: a guide for research in endocrinology and metabolism. American Journal of Physiology. 286: E495-501.

• Thomas J. R. & Nelson J. K. (2001) Research Methods in Physical Activity, 4th edition. Champaign, Illinois: Human Kinetics
J.Betts@bath.ac.uk
