
Ed 9

Assessment in Learning 1
Eljon T. Tabulinar
Course Facilitator
Validity and Reliability
Learning Outcomes
• Explain the meaning of item analysis, item validity, reliability, item difficulty, and discrimination index.
• Determine the validity and reliability of given test items.
• Determine the quality of a test item by its difficulty index, discrimination index, and plausibility of options (for a selected-response test).
What to do after item analysis and revision of test items?
After performing the item analysis and revising the items that need revision, the next step is to validate the instrument.
Validation & Validity
• Validation is the process of collecting and analyzing evidence to support the meaningfulness and usefulness of the test.
• Validity is the extent to which a test measures what it purports to measure, or the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.
• A teacher who conducts test validation might want to gather different kinds of evidence.
Validation & Validity
Types of Validity Evidence
1. Content-related – refers to the content and format of the instrument.
a) How appropriate is the content?
b) How comprehensive is it?
c) Does it logically get at the intended variable?
d) How adequately does the sample of items or questions represent the content to be assessed?
Validation & Validity
How to obtain evidence for content-related validity?
The teacher writes out the objectives of the test based on the Table of Specifications and then gives these, together with the test and a description of the intended test takers, to at least two (2) experts. The experts look at the objectives, read over the items in the test, and place a check mark in front of each question or item that they feel does not measure one or more objectives. They also place a check mark in front of each objective not assessed by any item in the test. The teacher then rewrites any item that was checked and resubmits it to the experts, and/or writes new items to cover those objectives not covered by the existing test. This continues until the experts approve of all items and agree that all of the objectives are sufficiently covered by the test.
Validation & Validity
Types of Validity Evidence
2. Criterion-related – refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called the criterion).
a) How strong is this relationship?
b) How well do such scores estimate present performance or predict future performance of a certain type?
Validation & Validity
How to obtain evidence for criterion-related validity?
The teacher usually compares scores on the test in question with scores on some other independent criterion test that presumably already has high validity.
Concurrent validity – if a test is designed to measure the mathematics ability of students and it correlates highly with a standardized mathematics achievement test (the external criterion), then we say we have high criterion-related evidence of validity.
Predictive validity – the test scores on the instrument are correlated with the students' scores on a later performance (criterion measure). For example, the mathematics ability test constructed by the teacher may be correlated with the students' later performance on a Division-wide mathematics achievement test.
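Because criterion-related evidence rests on the strength of this relationship, it is usually summarized with a correlation coefficient. The sketch below is illustrative only: the paired scores are invented, and statistics.correlation requires Python 3.10+.

```python
# A minimal sketch of summarizing criterion-related evidence with a
# Pearson correlation. The paired scores are invented for the example.
from statistics import correlation  # Python 3.10+

teacher_test = [35, 42, 28, 50, 39, 31, 45, 37]  # teacher-made math test
criterion    = [70, 85, 60, 95, 78, 65, 88, 74]  # standardized math test

r = correlation(teacher_test, criterion)
print(f"validity coefficient r = {r:.2f}")  # close to 1.00 for this data
```

The closer the coefficient is to 1.0, the stronger the criterion-related evidence of validity.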
Validation & Validity
Types of Validity Evidence
3. Construct-related – refers to the nature of the psychological construct or characteristic being measured by the test.
a) How well does a measure of the construct explain differences in the behavior of individuals or their performance on a certain task?
Reliability
Reliability refers to the consistency of the scores obtained: how consistent they are for each individual from one administration of an instrument to another, and from one set of items to another.
For internal consistency, use the split-half method or the Kuder-Richardson formulas (KR-20 or KR-21), as sketched below.
• If an instrument is unreliable, it cannot yield valid results.
• As reliability improves, validity may improve (or it may not).
• However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
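The split-half and Kuder-Richardson estimates mentioned above can both be computed from a matrix of dichotomously scored (0/1) items. The sketch below uses invented data; it assumes the standard KR-20 formula, (k/(k-1))(1 - Σpq/σ²), and the Spearman-Brown correction for the split-half coefficient.

```python
# A minimal sketch of two internal-consistency estimates, assuming
# dichotomous (0/1) item scores. The data are invented for the example.
from statistics import pvariance, correlation  # correlation: Python 3.10+

# rows = students, columns = items (1 = correct, 0 = incorrect)
scores = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 1],
]

def kr20(matrix):
    """KR-20 = (k/(k-1)) * (1 - sum(p*q) / variance of total scores)."""
    k = len(matrix[0])
    totals = [sum(row) for row in matrix]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / len(matrix)  # item difficulty
        sum_pq += p * (1 - p)                            # p * q
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

def split_half(matrix):
    """Correlate odd-item vs. even-item half scores, then step the
    half-test r up to full test length with Spearman-Brown."""
    odd  = [sum(row[0::2]) for row in matrix]
    even = [sum(row[1::2]) for row in matrix]
    r_half = correlation(odd, even)
    return 2 * r_half / (1 + r_half)

print(f"KR-20      = {kr20(scores):.2f}")
print(f"Split-half = {split_half(scores):.2f}")
```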
Reliability
The following table is a standard followed almost universally in educational test and measurement.
Reliability     Interpretation
.90 and above   Excellent reliability; at the level of the best standardized tests.
.80 – .90       Very good for a classroom test.
.70 – .80       Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.
.60 – .70       Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 – .60       Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
Below .50       Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
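For illustration only, the table can be applied as a simple lookup. The helper below is an assumption for the example (the function name and abbreviated labels are invented), mirroring the thresholds above with the lower bound of each band treated as inclusive.

```python
# A small helper applying the interpretation table above as a lookup;
# the function name and abbreviated labels are illustrative.
def interpret_reliability(r):
    if r >= 0.90: return "Excellent; level of the best standardized tests"
    if r >= 0.80: return "Very good for a classroom test"
    if r >= 0.70: return "Good for a classroom test; a few items could be improved"
    if r >= 0.60: return "Somewhat low; supplement with other measures for grading"
    if r >= 0.50: return "Needs revision unless quite short; supplement for grading"
    return "Questionable reliability; revise; should not weigh heavily in grades"

print(interpret_reliability(0.74))  # Good for a classroom test; ...
```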
In Short
Validity and reliability are illustrated in the figure below.
[Figure: validity and reliability]