The Reliability of Assessments


No examination can produce a perfect, error-free result. The size of the errors
due to the first of the sources of error listed above can be estimated from the
internal consistency of a test's results. If, for a test composed of several items,
candidates are divided according to their overall score on the test, then one can look at
each component question (or item) to see whether those with a high overall score
have high scores on this question, and those with low overall scores have low
scores on this question. If this turns out to be the case, then the question is said to
have 'high discrimination'. If most of the questions have high discrimination, then
they are consistent with one another in putting the candidates in more or less the
same order. The reliability coefficient that is often quoted for a test is a measure of
the internal consistency between the different questions that make up the test. Its
value will be a number between zero and one. The measures usually employed
are the Kuder-Richardson coefficient (for multiple-choice tests) or Cronbach's
alpha (for other types of test); the principle underlying these two is the same.
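As a rough illustration of what such a coefficient measures, Cronbach's alpha can be computed directly from candidates' item scores. The sketch below uses only the standard library and an entirely hypothetical dataset; it is a minimal illustration of the formula, not the procedure of any particular testing agency:

```python
# Minimal sketch of Cronbach's alpha: rows are candidates, columns
# are their scores on each item. Dataset is hypothetical.
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: list of candidate rows, each a list of item scores."""
    k = len(scores[0])                        # number of items
    items = list(zip(*scores))                # transpose: one tuple per item
    item_vars = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical scores for five candidates on four items
data = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(data)  # roughly 0.94 for this dataset
```

A value near one indicates that the items rank the candidates consistently; a low value suggests many items have poor discrimination.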
If this internal consistency is high, then it is likely that a much longer test
sampling more of the syllabus will give approximately the same result.
However, if checks on internal consistency reveal (say) that the reliability of a
test is at the level of 0.85, then in order to increase it to 0.95 with questions of
the same type, it would be necessary to more than triple the length of the test.
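The 'more than triple' figure is consistent with the Spearman-Brown prophecy formula, which relates a test's length to its internal-consistency reliability (the text does not name the formula it relies on, so this attribution is an assumption):

```python
# Spearman-Brown prophecy formula: the factor by which a test must be
# lengthened (with questions of the same type) to raise its
# reliability coefficient from rho_old to rho_new.
def length_factor(rho_old: float, rho_new: float) -> float:
    return (rho_new * (1 - rho_old)) / (rho_old * (1 - rho_new))

factor = length_factor(0.85, 0.95)
# 0.95 * 0.15 / (0.85 * 0.05) = 0.1425 / 0.0425, approximately 3.35:
# more than triple the original length, as the text states.
```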
Reliability could be increased in another way - by removing from the test all
those questions which had low discrimination and replacing them with
questions with high discrimination. This can only be done if questions are
pretested, and might have the effect of narrowing the diversity of issues
represented in the test in order to homogenize it.
Indices based on such checks are often claimed to give the reliability of an
examination result. Such a claim is not justified, however, for it takes no account
of other possible sources of error. For example, a second source of error means
that the actual score achieved by a candidate on a given day could vary
substantially from day to day. Again, this figure could be improved, but only by
setting the test in sections with each taken on different days. Data on this source
are hard to find, so it is usually not possible to estimate its effect. It would seem
hard to claim a priori that it is negligible.
The third source - marker error - is dealt with in part by careful selection and
training of markers, in part by rigorous rules of procedure laid down for
markers to follow, and in part by careful checks on samples of marked work.
Whilst errors due to this source could be reduced by double marking of every
script, this would also lead to very large increases both in the cost of
examinations and in the time taken to determine results. Particular cases of
marker error justifiably attract public concern, yet overall the errors due to this
source are probably small in comparison with the effects of the other sources
listed here.
It is important to note, therefore, that the main limitations on the accuracy of
examination results are not the fault of testing agencies. All of the sources could
be tackled, but only if increases in costs, examining times and times taken to
produce results were to be accepted by the educational system. Such acceptance
seems most unlikely; in this, as in many other situations, the public gets what it
is prepared to pay for.