
RELIABILITY

MeEvalEd

(PETER JOHN MELCHOR)


• Reliability refers to the consistency of scores obtained in an experiment.
• Like validity, reliability is an abstraction.
• Several indicators of reliability are commonly used. Most indicators are
statistical.
• Being familiar with some of the conventional methods for estimating the reliability of test scores is useful.
• Most methods express reliability as a number that ranges from .00 to 1.00.
Inter-Rater or Inter-Observer Reliability

[Diagram: Observer 1 and Observer 2 independently rate the same object or phenomenon; the question is whether their ratings agree (Observer 1 = Observer 2?).]
With subjectively scored assessments such as essay tests and judgments of products produced by students, considerable inconsistency may arise in scoring. That is, two raters who separately review a student's work may arrive at very different judgments of that student's performance. The inter-rater method can be used to detect this inconsistency.
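To make the idea concrete, here is a minimal sketch of one common way to check rater consistency: correlate the scores the two raters assign to the same set of papers. The scores and the use of NumPy are illustrative assumptions.

```python
import numpy as np

# Hypothetical essay scores assigned by two raters to the same ten students
rater1 = np.array([18, 15, 20, 12, 17, 19, 14, 16, 13, 18])
rater2 = np.array([17, 14, 19, 13, 18, 20, 12, 15, 14, 17])

# The correlation between the two sets of ratings serves as a simple
# inter-rater reliability estimate (values near 1.00 indicate consistent scoring)
inter_rater_r = np.corrcoef(rater1, rater2)[0, 1]
print(f"Inter-rater reliability estimate: {inter_rater_r:.2f}")
```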
Test-Retest Reliability

[Diagram: the same test is administered at Time 1 and again at Time 2; agreement between the two sets of scores indicates stability over time.]
Probably the most obvious method for judging whether a test measures something consistently is to readminister the test to the same students. If, on the readministration, the students who originally obtained the highest scores continue to achieve high scores, the middle-scoring students continue to achieve middle scores, and so on, then one would conclude that the test is reliable.
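A minimal sketch of how such a test-retest coefficient might be computed, assuming the same students took the test on two occasions; the scores below are hypothetical:

```python
import numpy as np

# Hypothetical scores for the same eight students at Time 1 and Time 2
time1 = np.array([45, 38, 50, 29, 41, 33, 47, 36])
time2 = np.array([43, 40, 48, 31, 42, 30, 49, 35])

# The test-retest reliability coefficient is the correlation between the
# two administrations; values near 1.00 mean the rank ordering is stable
test_retest_r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: {test_retest_r:.2f}")
```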
Parallel-Forms Reliability

[Diagram: Form A is administered at Time 1 and Form B at Time 2; agreement between the two forms indicates stability across forms.]
Parallel-Forms (Alternate-Form)
Reliability
• Administer both forms to the same people.
• Get the correlation between the two forms (see the sketch after this list).
• Usually done in educational contexts where you need alternative forms because of the frequency of retesting and where you can sample from a large pool of equivalent questions.
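The computation mirrors the test-retest case, except that the second set of scores comes from an alternate form rather than a second administration of the same test. A minimal sketch with hypothetical Form A and Form B totals:

```python
import numpy as np

# Hypothetical total scores of the same six students on two equivalent forms
form_a = np.array([72, 65, 80, 58, 69, 75])
form_b = np.array([70, 67, 78, 60, 66, 77])

# Parallel-forms reliability is the correlation between the two forms
parallel_forms_r = np.corrcoef(form_a, form_b)[0, 1]
print(f"Parallel-forms reliability: {parallel_forms_r:.2f}")
```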
Internal-Consistency Reliability
Types of Internal Consistency Reliability
There are three major types of internal consistency testing:
Split-half Method
Kuder-Richardson Method
Cronbach Coefficient Alpha
Split-Half Method
In this procedure, the test is split in half, usually odd items versus even items, and each half is scored separately. A coefficient is then calculated to determine the degree to which the two halves yield the same results.
The full-test reliability R is then estimated from the correlation r between the two halves using the Spearman-Brown formula:
R = 2r / (r + 1)
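A minimal sketch of the whole procedure, assuming a students-by-items matrix of item scores and an odd/even split; the data below are hypothetical:

```python
import numpy as np

def split_half_reliability(items):
    """Split-half reliability with the Spearman-Brown correction.

    `items` is a (students x items) array of item scores.
    Odd-numbered items form one half, even-numbered items the other.
    """
    odd_half = items[:, 0::2].sum(axis=1)        # items 1, 3, 5, ...
    even_half = items[:, 1::2].sum(axis=1)       # items 2, 4, 6, ...
    r = np.corrcoef(odd_half, even_half)[0, 1]   # correlation between the halves
    return 2 * r / (r + 1)                       # Spearman-Brown full-test estimate

# Hypothetical 0/1 scores for five students on six items
scores = np.array([
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
])
print(f"Split-half reliability: {split_half_reliability(scores):.2f}")
```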
Kuder-Richardson Method
K-R20 and K-R21
These are applicable to tests scored dichotomously, where 1 point is given for a correct answer and 0 for a wrong answer.
The KR-20 requires the proportion (p) of correct responses for each item in the test and is preferred when the p values vary across items.
If the test items do not vary widely in their p values, the simpler KR-21 can be used to estimate the reliability index.
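A minimal sketch of both estimates, assuming a students-by-items matrix of 0/1 scores; the data are hypothetical, and textbooks differ on whether sample or population variance is used in the denominator:

```python
import numpy as np

def kr20(items):
    """KR-20 for dichotomously scored (0/1) items; `items` is students x items."""
    k = items.shape[1]
    p = items.mean(axis=0)                       # proportion correct per item
    q = 1 - p
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

def kr21(items):
    """KR-21: a simpler estimate that assumes items have similar p values."""
    k = items.shape[1]
    totals = items.sum(axis=1)
    m, var = totals.mean(), totals.var(ddof=1)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))

# Hypothetical 0/1 scores for five students on five items
scores = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
])
print(f"KR-20: {kr20(scores):.2f}   KR-21: {kr21(scores):.2f}")
```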
Cronbach Coefficient Alpha
Coefficient alpha provides a reliability estimate for a measure composed of items scored with values other than 0 and 1. Such is the case with essay tests whose items carry varying point values, or attitude scales that offer responses such as strongly agree and strongly disagree with intermediate response options.
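A minimal sketch of the computation, assuming a respondents-by-items matrix of item scores (for example, ratings on a 5-point attitude scale); the data are hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's coefficient alpha; `items` is a (respondents x items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses on a 5-point attitude scale (rows = respondents)
responses = np.array([
    [5, 4, 5, 4],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [4, 5, 4, 5],
    [1, 2, 2, 1],
])
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```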
