
INSTRUMENTATION AND STATISTICAL TREATMENT
RESEARCH INSTRUMENTS
• Tools that measure variables in the study and are designed to obtain data on a topic of interest from the subjects of research.
VALIDITY
The validity of an instrument is the extent to which it measures what it is intended to measure.

Validity tells you how accurately a method measures something.
TYPES OF VALIDITY
• Content Related (appropriate content)
1. Face validity: does the test appear to test what it aims to test?
2. Content validity: does the test measure the concept that it's intended to measure?
• Criterion Related (relationship to other measures)
1. Concurrent validity: does the test relate to an existing similar measure?
2. Predictive validity: does the test predict later performance on a related criterion?
FACE VALIDITY
• Simply whether the test appears (at face value) to measure what it claims to. This is the least sophisticated measure of validity.
• As face validity is a subjective measure, it’s
often considered the weakest form of validity.
However, it can be useful in the initial stages of
developing a method.
FACE VALIDITY
1. Whether the content of a questionnaire or test seems suitable on the surface.
2. Considers the structure and language used in the study.
3. Informal and subjective.
Example
You create a survey to measure the regularity of
people’s dietary habits. You review the survey
items, which ask questions about every meal of
the day and snacks eaten in between for every day
of the week. On its surface, the survey seems like
a good representation of what you want to test, so
you consider it to have high face validity.
CONTENT VALIDITY
The questions in a questionnaire must measure what they are intended to measure, and they must cover all relevant parts of the subject the questionnaire aims to measure.

Example:
A Physics exam should cover only topics actually taught to students, not unrelated material like English or Biology.
Example:
If you develop a questionnaire to diagnose depression, you need to know: does the questionnaire really measure the construct of depression (which can be measured only by observing other indicators)? Or is it actually measuring the respondent's mood, self-esteem, or some other construct?
CONCURRENT VALIDITY
It requires a criterion on which cases
(for example, people) are known to
differ and that is relevant to the concept
in question.
It measures how well a new test
compares to a well-known established
test.
Example: administer both the Hamilton Depression Rating Scale and Stephanie's new depression scale to the same respondents, then compare the results.
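A minimal sketch of this comparison in Python, assuming hypothetical scores for the same respondents on both instruments (all names and values below are illustrative, not real data):

```python
from scipy.stats import pearsonr

# Hypothetical scores for the same respondents on both instruments
hamilton_scores = [18, 25, 9, 30, 14, 22, 7, 27]    # established scale
new_scale_scores = [20, 27, 11, 33, 15, 24, 8, 29]  # new scale

# A strong positive correlation supports concurrent validity
r, p_value = pearsonr(hamilton_scores, new_scale_scores)
print(f"r = {r:.2f}, p = {p_value:.3f}")
```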
PREDICTIVE VALIDITY
It uses a future criterion measure rather than a contemporary one.

Some uses of predictive validity:
1. University admission tests
2. Pre-employment tests
CONVERGENT VALIDITY VS. DISCRIMINANT VALIDITY
CONVERGENT VALIDITY
It tests that constructs that are expected to be related are, in fact, related.

Example:
A researcher who wants to show the convergent validity of a measure of self-esteem may also show that it relates to similar constructs, such as self-worth, confidence, social skills, and self-appraisal, which are also related to self-esteem.
DISCRIMINANT VALIDITY
It shows that two measures that are not supposed to be related are, in fact, unrelated.

Example:
Showing that a scale that measures motivation is not related to a scale that measures self-belief.
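Both ideas are usually checked with correlations: convergent validity expects a high correlation, discriminant validity expects a low one. A minimal sketch with hypothetical scores (all variable names and values below are illustrative):

```python
import numpy as np

# Hypothetical total scores for three constructs, one entry per respondent
self_esteem = np.array([30, 25, 40, 35, 28, 33])
self_worth = np.array([32, 24, 41, 36, 27, 34])  # expected to correlate (convergent)
motivation = np.array([15, 38, 22, 30, 19, 26])  # expected to be unrelated (discriminant)

print(np.corrcoef(self_esteem, self_worth)[0, 1])  # should be high (close to 1)
print(np.corrcoef(self_esteem, motivation)[0, 1])  # should be low (close to 0)
```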
RELIABILITY
It means how consistently a method measures something.
If the same result can be consistently achieved
by using the same methods under the same
circumstances, the measurement is considered
reliable.
Example:
You measure the temperature of a liquid
sample several times under identical
conditions. The thermometer displays
the same temperature every time, so the
results are reliable.
A doctor uses a symptom questionnaire to diagnose a patient with a long-term medical condition. Several different doctors use the same questionnaire with the same patient but give different diagnoses.

What can you say about the reliability of the questionnaire?
It indicates that the questionnaire has low reliability as a measure of the condition.
INTER-RATER RELIABILITY
(INTER-OBSERVER RELIABILITY)
It measures the consistency between two or more independent
raters (observers) of the same construct.
How to measure it?
1. Different researchers conduct the same measurement or
observation on the same sample.
2. Calculate the correlation between their different sets of results.
3. If all the researchers give similar ratings, the test has high inter-rater reliability.
Example:
A team of researchers observes the progress of wound healing in patients. To record the stages of healing, rating scales are used, with a set of criteria to assess various aspects of wounds. The results of different researchers assessing the same set of patients are compared, and there is a strong correlation between all sets of results, so the test has high inter-rater reliability.
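A minimal sketch of this comparison in Python, with hypothetical wound-healing stage ratings from two observers (all values below are illustrative):

```python
from scipy.stats import pearsonr

# Hypothetical wound-healing stage ratings (1-5) by two independent raters
rater_a = [3, 4, 2, 5, 1, 4, 3, 2]
rater_b = [3, 4, 3, 5, 1, 4, 2, 2]

r, _ = pearsonr(rater_a, rater_b)
print(f"inter-rater correlation: r = {r:.2f}")  # close to 1 => high inter-rater reliability
```

For purely categorical ratings, an agreement statistic such as Cohen's kappa (e.g. sklearn.metrics.cohen_kappa_score) is a common alternative to a plain correlation.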
TEST-RETEST RELIABILITY
It measures the consistency between two
measurements (tests) of the same construct
administered to the same sample at two different
points in time.
The researcher uses this reliability test when measuring something that he/she expects to stay constant in the sample.
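A minimal sketch in Python, assuming hypothetical scores from the same sample at two points in time (values are illustrative):

```python
from scipy.stats import pearsonr

# Hypothetical scores from the same sample at two points in time
time_1 = [52, 61, 45, 70, 58, 49]
time_2 = [50, 63, 47, 68, 57, 51]

r, _ = pearsonr(time_1, time_2)
print(f"test-retest reliability: r = {r:.2f}")  # high r => scores are stable over time
```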
Problems with Test-Retest Reliability
The longer the time gap, the greater the chance that what is being measured will change during this time, and the lower the test-retest reliability will be.
Participants may gain knowledge about the purpose of the test, so they are more prepared the second time around.
SPLIT-HALF RELIABILITY
A test for a single knowledge area is split into two parts, and then both parts are given to one group of students at the same time.
Example:
If you have a 10-item measure of a given construct, randomly split those 10 items into two sets of five and administer the entire instrument to a sample of respondents.

A reliable test will have a high correlation between the two halves, indicating that a student would perform equally well (or as poorly) on both halves of the test.
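A minimal sketch in Python, with hypothetical half-test totals (values are illustrative). The Spearman-Brown correction shown here is a standard companion to split-half reliability, though the slides do not name it: it estimates the reliability of the full-length test from the half-test correlation.

```python
import numpy as np

# Hypothetical half-test totals for 8 respondents (sum of five items each)
half_a = np.array([18, 22, 15, 25, 20, 12, 23, 17])
half_b = np.array([19, 21, 14, 24, 22, 13, 22, 16])

# Correlation between the two halves
r = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown correction: each half is only half as long as the
# full instrument, so the half-half correlation understates reliability
full_test_reliability = (2 * r) / (1 + r)
print(f"half-half r = {r:.2f}, full-test estimate = {full_test_reliability:.2f}")
```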
INTERNAL CONSISTENCY RELIABILITY
It measures how well the items on a test or survey actually measure what you want them to measure.

If all items on a test measure the same construct or idea, then the test has internal consistency reliability.

The survey has good internal consistency if respondents answer the related items consistently, e.g. three "agrees" or three "strongly disagrees".
Example:
Level of customer service in ABC Restaurant
Strongly Agree/Agree/Neutral/Disagree/Strongly
Disagree
1. I was satisfied with my experience in the restaurant.
2. I will probably recommend your restaurant to others.
3. If I write an online review, it would be positive.
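Internal consistency is commonly quantified with Cronbach's alpha (not named in the slides). A minimal sketch in Python, using hypothetical Likert responses to the three restaurant items above:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical responses (1 = Strongly Disagree ... 5 = Strongly Agree)
# to the three ABC Restaurant items, one row per respondent
responses = np.array([
    [5, 5, 4],
    [4, 4, 4],
    [2, 2, 1],
    [5, 4, 5],
    [3, 3, 3],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")  # >= 0.70 is often considered acceptable
```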
PARALLEL FORMS RELIABILITY
It measures the correlation between two equivalent versions of a test.

It is used when you have two different assessment tools or sets of questions designed to measure the same thing.
Example:
If the same students take two different versions of a test, then they should get similar results on both tests.
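A minimal sketch in Python, with hypothetical scores of the same students on two forms of a test (values are illustrative):

```python
import numpy as np

# Hypothetical scores of the same students on Form A and Form B
form_a = np.array([78, 85, 62, 90, 71, 88])
form_b = np.array([80, 83, 65, 92, 70, 86])

# A high correlation between the two forms indicates parallel-forms reliability
r = np.corrcoef(form_a, form_b)[0, 1]
print(f"parallel-forms reliability: r = {r:.2f}")
```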
RANDOM ERRORS VS. SYSTEMATIC ERRORS
RANDOM ERRORS
They are caused by unknown and unpredictable changes in the experiment.

Examples of causes of random errors:
Electronic noise in the circuit of an electrical instrument
Irregular changes in the heat loss rate from a solar collector due to changes in wind
SYSTEMATIC ERRORS
They primarily influence a measurement's accuracy, due to imprecision or problems with instruments.
Examples of causes of systematic errors:
Observational error
Imperfect instrument calibration
Environmental interference
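A minimal simulation in Python contrasting the two error types (the true value, bias, and noise level below are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
true_temperature = 25.0  # hypothetical true value being measured

# Random error: unpredictable fluctuations that scatter readings around the true value
random_readings = true_temperature + rng.normal(0, 0.5, size=10)

# Systematic error: a constant bias, e.g. a thermometer miscalibrated by +1.5 degrees
systematic_readings = true_temperature + 1.5 + rng.normal(0, 0.5, size=10)

print(f"random-error mean:     {random_readings.mean():.2f}")      # close to 25.0
print(f"systematic-error mean: {systematic_readings.mean():.2f}")  # offset by the bias
```

Averaging more readings reduces the effect of random error, but no amount of averaging removes a systematic bias; that requires recalibrating the instrument or correcting the procedure.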
ACTIVITY
• Based on your previous questionnaires, examine the validity of your scales.

• Questions:
1. What kind of validity is applied in your questionnaire?
2. Do you think your questionnaires are valid?

• 2 to 3 groups will be chosen to present their critiques.
