Validity and Reliability

VALIDATION OF TESTS

 process of collecting and analyzing evidence to support the meaningfulness and
usefulness of the test
 purpose: determine the characteristics of the whole test – validity and reliability

VALIDITY

 extent to which a test measures what it intends to measure; also refers to the
appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a
teacher makes based on the test results

Types of validity

1. criterion-related

 relationship of scores obtained using the instrument and scores obtained using one or
more other tests
 teacher compares scores on the test in question with the scores on some other
independent criterion test which presumably already has high validity
 types: concurrent and predictive
 concurrent: compare a math ability test with a standardized math achievement test
(external criterion)
 predictive: test scores in the instrument are correlated with scores on a later
performance of the students, e.g., a math test constructed by the teacher may be
correlated with the students’ later performance in a math achievement test conducted by
the division or national office of DepEd
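
In practice, both concurrent and predictive validity evidence are usually summarized as a
correlation (validity) coefficient. The sketch below computes a Pearson correlation between
scores on a hypothetical teacher-made test and an external criterion test; the scores and
variable names are invented for illustration only.

    from math import sqrt

    def pearson_r(x, y):
        """Pearson correlation coefficient between two paired lists of scores."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
        sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
        return cov / (sd_x * sd_y)

    # Hypothetical paired scores: teacher-made math test vs. an external
    # standardized math achievement test (concurrent validity evidence).
    teacher_test = [35, 42, 28, 47, 31, 39, 44, 25]
    criterion    = [70, 85, 60, 92, 66, 78, 88, 55]

    print(f"validity coefficient r = {pearson_r(teacher_test, criterion):.2f}")

The closer the coefficient is to 1.0, the stronger the criterion-related validity evidence.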

Gronlund’s expectancy table

                    Grade Point Average (GPA)
Test Score    Very Good    Good    Needs Improvement
High              20         10             5
Average           10         25             5
Low                1         10            14
 criterion measure is the GPA of the students
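
An expectancy table like the one above can be tallied by cross-classifying each student's test
score level against his or her GPA level. The sketch below does this from a short list of
hypothetical (test score level, GPA level) pairs; the bands and counts are invented.

    from collections import Counter

    # Hypothetical paired records: (test score level, GPA level) for each student.
    records = [
        ("High", "Very Good"), ("High", "Good"), ("Average", "Good"),
        ("Low", "Needs Improvement"), ("Average", "Very Good"), ("Low", "Good"),
    ]

    counts = Counter(records)
    score_levels = ["High", "Average", "Low"]
    gpa_levels = ["Very Good", "Good", "Needs Improvement"]

    # Print the expectancy table: rows = test score levels, columns = GPA levels.
    print(f"{'Test Score':<12}" + "".join(f"{g:>19}" for g in gpa_levels))
    for s in score_levels:
        print(f"{s:<12}" + "".join(f"{counts[(s, g)]:>19}" for g in gpa_levels))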

2. construct-related

 nature of the psychological construct or characteristic being measured by the test
 defines how well a test or experiment measures up to its claims
 refers to whether the operational definition of a variable actually reflects the true
theoretical meaning of a concept
 construct validity isn’t easy to measure – several measures are usually required to
demonstrate it, including pilot studies and clinical trials
 one of the reasons it’s so hard to measure is one of the very reasons it exists: in the
social sciences, there’s a lot of subjectivity and most constructs have no real unit of
measurement
 related to, and sometimes used interchangeably with, face validity

3. content-related

 usual procedure: (1) the teacher writes out the objectives of the test based on the table
of specifications and gives these, together with the test and a description of the intended
test takers, to a panel of experts; (2) the experts look at the objectives, read over the
items in the test, and place a check mark in front of each question or item that they feel
does not measure one or more objectives; they also place a check mark in front of each
objective not assessed by any item in the test; and (3) the teacher rewrites any item
checked and resubmits it to the experts and/or writes new items.

RELIABILITY

 consistency of the scores obtained
 internal consistency formulas: Kuder-Richardson 20 (KR-20) or Kuder-Richardson 21
(KR-21); a sketch of both formulas appears at the end of this section
 There are four general classes of reliability estimates, each of which estimates reliability
in a different way. They are:
 Inter-Rater or Inter-Observer Reliability
 Used to assess the degree to which different raters/observers give consistent estimates
of the same phenomenon (a simple agreement sketch follows this list).
 Test-Retest Reliability
 Used to assess the consistency of a measure from one time to another.
 Parallel-Forms Reliability
 Used to assess the consistency of the results of two tests constructed in the same way
from the same content domain.
 Internal Consistency Reliability
 Used to assess the consistency of results across items within a test.
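
As a concrete illustration of the first of these, a simple percent-agreement check for
inter-rater reliability might look like the sketch below; the ratings are invented, and more
formal agreement indices (e.g., Cohen's kappa) are often preferred in practice.

    # Hypothetical ratings given by two observers to the same ten essays (1-4 scale).
    rater_a = [3, 4, 2, 3, 1, 4, 2, 3, 4, 2]
    rater_b = [3, 4, 2, 2, 1, 4, 2, 3, 3, 2]

    # Simple percent-agreement estimate of inter-rater reliability.
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    print(f"inter-rater agreement = {agreements / len(rater_a):.0%}")

Test-retest and parallel-forms reliability, by contrast, are usually estimated by correlating
the two sets of scores (for example, with a Pearson correlation as sketched earlier).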

For internal consistency reliability, we judge the reliability of the instrument by estimating
how well the items that reflect the same construct yield similar results. We are looking at
how consistent the results are for different items measuring the same construct within the
measure. There are a wide variety of internal consistency measures that can be used,
including the KR-20 and KR-21 formulas sketched below.
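
A minimal sketch of the KR-20 and KR-21 formulas mentioned above, computed from a small
invented matrix of dichotomous (0/1) item responses. It assumes the population variance of
the total scores; KR-21 additionally assumes the items are of roughly equal difficulty.

    # Hypothetical 0/1 response matrix: rows = students, columns = test items.
    responses = [
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 1, 0, 0, 1],
    ]

    n = len(responses)        # number of students
    k = len(responses[0])     # number of items
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n   # population variance of total scores

    # KR-20 uses each item's difficulty p (proportion answering correctly) and q = 1 - p.
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    kr20 = (k / (k - 1)) * (1 - sum_pq / var)

    # KR-21 is a shortcut that uses only the mean and variance of the total scores.
    kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))

    print(f"KR-20 = {kr20:.2f}, KR-21 = {kr21:.2f}")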
