
VALIDITY AND RELIABILITY

Delivered By:
Assoc. Prof. Dr. Othman Md. Johan

Primary Criteria Of Evaluation
Two primary criteria of evaluation in any measurement or observation are:
 Whether we are measuring what we intend to measure.
 Whether the same measurement process yields the same results.
 These two concepts are, respectively, validity and reliability.

Reliability
 Reliability is concerned with questions of stability and consistency:
- Does the same measurement tool yield stable and consistent results when repeated over time?
- E.g., a tape measure is a highly reliable measuring instrument.
Validity
Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring).
E.g., a tape measure that has been created with accurate spacing for inches, feet, etc. should yield valid results as well.
i.e., measuring a piece of wood with a "good" tape measure should produce a correct measurement of the wood's length.

 To apply these concepts to psychological assessment and evaluation, we want to use measurement tools that are both reliable and valid.
 We want questions that yield consistent responses when asked multiple times - this is reliability.
 Similarly, we want questions that get accurate responses from respondents - this is validity.

Reliability
 Reliability refers to a condition where a measurement process yields consistent scores (given an unchanged measured phenomenon) over repeated measurements.
 There are three criteria of reliability:
- Test-retest reliability
- Inter-item reliability
- Interobserver reliability

Test-retest Reliability
 A measure has test-retest reliability when a researcher administers the same measurement tool multiple times - asking the same question, following the same research procedures, and assuming there has been no change in whatever is being measured - and obtains consistent results.
 E.g., when a researcher asks the same person the same question twice ("What's your name?") and gets the same answer both times, the measure has test-retest reliability.
 Measurement of the piece of wood discussed earlier has high test-retest reliability.
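
In practice, test-retest reliability is usually quantified as the correlation between scores from two administrations of the same instrument. Below is a minimal Python sketch; the scores are invented for illustration, and a coefficient near 1 indicates high stability.

    # Test-retest reliability as the Pearson correlation between two
    # administrations of the same test to the same respondents.
    from statistics import mean

    def pearson_r(x, y):
        # Pearson correlation between two paired lists of scores.
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    time1 = [12, 15, 9, 20, 17]   # hypothetical scores, first administration
    time2 = [13, 14, 10, 19, 18]  # same respondents, second administration
    print(f"test-retest r = {pearson_r(time1, time2):.2f}")  # ~0.97 here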

Inter-item reliability

 This is a dimension that applies to cases where multiple items are used to measure a single concept.
 In such cases, answers to a set of questions designed to measure a single concept (e.g., altruism) should be associated with each other.
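
Inter-item reliability is commonly summarized with Cronbach's alpha, which is high when answers to the items move together. A minimal sketch follows; the three altruism items and five respondents' 1-5 ratings are invented for illustration.

    # Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    def cronbach_alpha(items):
        # items: one list of scores per question, aligned by respondent.
        k = len(items)
        totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
        return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))

    items = [
        [4, 5, 2, 4, 3],  # hypothetical item: "I enjoy helping strangers"
        [4, 4, 1, 5, 3],  # hypothetical item: "I donate when I can"
        [5, 4, 2, 4, 2],  # hypothetical item: "Others' wellbeing matters to me"
    ]
    print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")  # ~0.91 here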
Interobserver reliability

 The extent to which different interviewers or observers using the same measure get equivalent results.
 If different observers or interviewers use the same instrument to score the same thing, their scores should match.
- E.g., the interobserver reliability of an observational assessment of parent-child interaction is often evaluated by showing two observers a videotape of a parent and child at play.
 If the instrument has high interobserver reliability, the scores of the two observers should match.
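
For categorical codings like these, interobserver agreement is often reported as Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. A minimal sketch; the two observers' codes for ten videotaped episodes are invented for illustration.

    # Cohen's kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        ca, cb = Counter(rater_a), Counter(rater_b)
        # Chance: probability both raters independently pick the same category.
        chance = sum((ca[c] / n) * (cb[c] / n) for c in set(rater_a) | set(rater_b))
        return (observed - chance) / (1 - chance)

    # Codes for 10 parent-child episodes: (P)ositive, (N)eutral, (H)ostile.
    obs1 = ["P", "P", "N", "H", "P", "N", "N", "P", "H", "P"]
    obs2 = ["P", "P", "N", "H", "N", "N", "N", "P", "H", "P"]
    print(f"Cohen's kappa = {cohens_kappa(obs1, obs2):.2f}")  # ~0.84 here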

Validity

 Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring).
 A valid measure should satisfy four criteria:
- Face Validity
- Content Validity
- Criterion-related Validity
- Construct Validity

Face Validity
 An assessment of whether a measure appears, on the face of it, to measure the concept it is intended to measure.
 This is a minimal assessment - if a measure cannot satisfy this criterion, then the other criteria are inconsequential.
 Examples of observational measures of behavior that would have face validity:
- Striking out at another person would have face validity as an indicator of aggression.
- Offering assistance to a stranger would meet the criterion of face validity for helping.
- However, asking people about their favorite movie to measure racial prejudice has little face validity.

Content Validity

 The extent to which a measure adequately represents all facets of a concept.
 Consider a series of questions that serve as indicators of depression (doesn't feel like eating, has lost interest in things usually enjoyed, etc.).
 If there were other common behaviors that mark a person as depressed that were not included in the index, then the index would have low content validity, since it would not adequately represent all facets of the concept.
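
One standard way to quantify expert judgments of coverage is Lawshe's content validity ratio (CVR), computed per item from how many panelists rate the item "essential". The slides do not name this method, so the sketch below is an illustrative assumption, with an invented panel of 10 experts rating candidate depression items.

    # Lawshe's CVR = (n_e - N/2) / (N/2), where n_e is the number of
    # panelists rating the item "essential" and N is the panel size.
    def cvr(n_essential, n_panelists):
        half = n_panelists / 2
        return (n_essential - half) / half

    ratings = {"appetite loss": 9, "loss of interest": 10, "favorite movie": 2}
    for item, n_e in ratings.items():
        print(f"{item}: CVR = {cvr(n_e, 10):+.1f}")  # near +1 keep, near -1 drop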

Criterion-related validity
 Criterion-related validity applies to instruments that have been developed to be useful as indicators of a specific trait or behavior, either now or in the future.
 E.g., a driving test is a social measurement with fairly good predictive validity.
 That is, an individual's performance on a driving test correlates well with his/her actual driving ability.
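
Criterion-related (here, predictive) validity is typically assessed as the correlation between instrument scores and the external criterion. A minimal sketch using the standard-library statistics.correlation (Python 3.10+); the driving-test scores and later on-road ratings are invented.

    # Predictive validity: correlate test scores with a later criterion measure.
    from statistics import correlation  # Pearson correlation, Python 3.10+

    test_scores = [88, 72, 95, 60, 80, 91]        # hypothetical driving-test scores
    road_rating = [8.5, 6.0, 9.0, 5.5, 7.5, 9.2]  # later examiner ratings on the road
    r = correlation(test_scores, road_rating)
    print(f"criterion validity r = {r:.2f}")  # a strong positive r supports validity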

Construct Validity
 For many things we want to measure, there is not necessarily a pertinent criterion available, so we relate the measure to other measures instead.
 Construct validity thus concerns the extent to which a measure is related to other measures as specified by theory or previous research.
 Does a measure stack up with other variables the way we expect it to?
 E.g., self-esteem refers to a person's sense of self-worth or self-respect.

Construct Validity

 Clinical observations in psychology have shown that people with low self-esteem often have depression.
 Therefore, to establish the construct validity of a self-esteem measure, researchers would show that those with higher scores on the self-esteem measure have lower depression scores, while those with low self-esteem have higher depression scores.
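
A construct validity check of this kind amounts to confirming the theoretically predicted direction of association. A minimal sketch; the paired self-esteem and depression scores are invented, and theory predicts a negative correlation.

    # Construct validity: self-esteem should correlate negatively with depression.
    from statistics import correlation  # Pearson correlation, Python 3.10+

    self_esteem = [30, 22, 38, 15, 27, 35]  # hypothetical self-esteem scale scores
    depression  = [12, 18,  6, 25, 14,  8]  # same respondents, depression scale
    r = correlation(self_esteem, depression)
    print(f"r = {r:.2f}")  # a clearly negative r supports construct validity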

Group Work

 Factors affecting test validity
 Factors affecting test reliability
 State the method(s) used to determine:
- Face Validity
- Content Validity
- Criterion-related Validity
- Construct Validity
 State the method(s) used to determine:
- Test-retest Reliability
- Inter-item reliability
- Interobserver reliability

