Discrimination and Difficulty Indices of A Senior High School Entrance Examination Using Classical Test Theory
Abstract
Measurement of the psychological capacities of a person is done worldwide through the use of
achievement testing. It is therefore important that institutions using achievement tests create
correct, relevant, and reliable test constructs in order to obtain beneficial results. This study
was done to evaluate the Discrimination and Difficulty Indices of the Annual Senior High School
Entrance Examination, which consists of 75 English, 30 Science, 40 Mathematics, and 25 Aptitude
multiple-choice questions, of the Senior High School Department of Mindanao State University -
Tawi-Tawi College of Technology and Oceanography using the Classical Test Theory. A descriptive
quantitative design was employed, and raw data from the scored answer sheets of 200 examinees were
used. Stratified sampling was applied to the raw data. Then, a computer application, the Statistical
Package for the Social Sciences (SPSS), was employed to determine the discrimination and difficulty
indices. The study concluded that most of the multiple-choice items of the examination have
difficulty values less than 0.5, which means these items are difficult for the takers, and discrimination
values higher than 0.2, which means they can be considered good items. The results also implied that the test
constructs are highly reliable. The study recommends further enhancement of the examination.
Keywords: psychological testing, achievement tests, discrimination index, difficulty index, classical
test theory
behavior samples, norms and standards, standardized procedures, and prediction of non-test behavior.

Essay, multiple-choice, and performance items are cognitive items that are used in academic achievement tests. These are often broadly categorized into objective items and performance assessments. The former are more structured and usually have only one correct answer. They are divided into two categories: selection-type (or recognition-type) items and supply-type items. Examples of the selection type include multiple-choice, true-or-false, and matching-type tests, wherein the respondent is required to distinguish the correct answer from among those provided. Supply-type items, on the other hand, require the respondent to generate the right answer, as in sentence completion or short-answer tests.

The most versatile of all item types are the multiple-choice items. It is often concluded that multiple-choice items can only measure rote recall of information, but when they are cleverly constructed, they are capable of tapping into higher-level cognitive processes such as analysis and synthesis of information. Items that require the respondent to detect similarities or differences, interpret graphs or tables, make comparisons, or mold previously learned material into a new context emphasize higher-level cognitive processes, and these are appropriate for a wide variety of subject matter. Another benefit of multiple-choice items is the fact that they can provide useful diagnostic information regarding the respondent’s misunderstandings. Foils, also called distractors or incorrect options, must be based on common misconceptions or errors (Bandalos, 2018).

Establishing the psychometric properties of the test items, and thereby promoting better outcome-based results from the test questionnaire, requires psychological testing.

To make better use of psychological testing, item analysis is done on each of the questions in the test questionnaire. Item analysis is an important phase in the development of a test. It will reveal if an item is too easy, too difficult, or scored incorrectly. Moreover, it will also show the difference between skilled and unskilled examinees.

Item analysis is a term that refers to a wide range of strategies, both qualitative and quantitative, that are used to assess the quality of the items in a pool. These are usually used during the scale development process to help choose the best set of items from a pool of potential candidates. The procedures in item analysis are basic to the scale development process and are a crucial antecedent to the reliability and validity studies (Bandalos, 2018).

Different theories can be used to evaluate the different perspectives of a test and the items on it. Two of them are the Classical Test Theory (CTT) and the Item Response Theory (IRT). They are used in educational measurement to develop, evaluate, and study test items. They are based on different assumptions and also use different statistical approaches. Their concerns are not only to develop, evaluate, or determine the reliability and validity of the test but also to improve the quality of test items holistically (Awopeju, 2008).

In measurement theory, inconsistencies across test items, occasions, and raters are known as measurement errors, and a theory known as classical test theory is used to describe the effects of measurement error on test scores (Bandalos, 2018).

Classical Test Theory (CTT)

One of the world’s oldest theories of behavioral or psychological measurement is Classical True Score Theory, often called Classical Test Theory (CTT). According to Gullicksen (1950), CTT is called “classical” because it is regarded to be the first operational use of mathematics to describe this relationship.

Teo (2013, as cited by Sallil, 2017) said that the primary feature of CTT is its adherence to learning theories that follow notions of classical and operant conditioning, for example, behaviorism, social learning theory, and motivation. In CTT, the domain, with its theoretical parameters, can be accurately sampled by the test items or exercises. It focuses on determining the degree to which the examinee has mastered the domain, which is the individual’s implied true score, inferred through responses to the test’s stimuli.

The foundation for the CTT model was laid down by Spearman (1907). He stated that any observed test score can be seen as the composite of two hypothetical components: a true score and a random error component.

According to De Champlain (2009, as cited in Sallil, 2017), the main advantage of CTT is the fact that it is based on relatively weak assumptions that are easy to meet with real data and modest sample sizes. In addition, CTT is easy to apply in many testing situations (Hambleton & Jones, 1993). CTT is also the most common paradigm for scale development and validation; it partitions the observed score into True Score +
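Spearman's description of an observed score as a composite of a true score and a random error component can be written compactly. The symbols X, T, and E below are notation assumed here for the observed score, true score, and error; the text names the components but does not fix symbols, and the two side conditions are the standard CTT assumptions rather than statements taken from this study.

```latex
X = T + E, \qquad \mathbb{E}[E] = 0, \qquad \operatorname{Cov}(T, E) = 0
```

Under these assumptions the observed-score variance splits as $\sigma_X^2 = \sigma_T^2 + \sigma_E^2$, and reliability is conventionally defined as the ratio $\sigma_T^2 / \sigma_X^2$.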
For a better understanding of the values of the CTT item difficulty index, the intervals and their corresponding interpretations in Table 1 will be used.

Table 1. Interpretation of the Difficulty Index (P)

The Discrimination Index, on the other hand, is computed as the difference between the percentage of students in the upper group (PU), i.e., the top 27% of scorers, who obtained the correct response, and the percentage of those in the lower group (PL), i.e., the bottom 27% of scorers, who obtained the correct response; thus:

Discrimination Index = PU - PL

For a better understanding of the values of the CTT item discrimination index, the intervals and their corresponding interpretations in Table 2 will be used.

Methodology

The research design used for this study is the descriptive quantitative design, which involves observing and describing the behavior of data (quantitative data) without influencing it in any way. Scored answer sheets of the Senior High School Entrance Examination of Mindanao State University - Tawi-Tawi College of Technology and Oceanography given in November 2018 were used as the data of this study, with the necessary approval for use by the MSU TCTO Admissions Office. To prevent bias, stratified sampling was applied. Respondents were grouped into different strata (per municipality) in order to have a proper distribution of the test takers. From the strata, random envelopes containing the answer sheets were picked until the desired number of respondents was reached.

Correct and wrong answers were tallied using MS Excel: 1 for correct answers and 0 for wrong answers. Moreover, the names and total scores of the students were represented by numerical values. The study applied the Classical Test Theory (CTT) formulas using the Statistical Package for the Social Sciences (SPSS) to determine the difficulty and discrimination indices of the test. A statistician was consulted for the proper use of the program.
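The two item statistics used in this study, the difficulty index P and the discrimination index computed from the upper and lower 27% groups, can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the SPSS procedure the study used; the function name `item_analysis`, the rounding of the 27% group size, and the tie-breaking by input order are all assumptions made for the example.

```python
import numpy as np


def item_analysis(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) arrays for a 0/1 response matrix.

    responses: rows are examinees, columns are items; 1 = correct, 0 = wrong.
    group_fraction: share of examinees in each of the upper and lower groups.
    """
    scores = np.asarray(responses, dtype=float)
    n_examinees = scores.shape[0]

    # Difficulty index P: proportion of all examinees answering each item correctly.
    difficulty = scores.mean(axis=0)

    # Rank examinees by total score, highest first.
    # A stable sort keeps tied examinees in their input order (an assumption).
    order = np.argsort(-scores.sum(axis=1), kind="stable")

    # Group size: 27% of examinees, rounded, and at least one (an assumption).
    k = max(1, int(round(group_fraction * n_examinees)))
    upper = scores[order[:k]]
    lower = scores[order[-k:]]

    # Discrimination index: PU - PL, the gap in proportion correct between groups.
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return difficulty, discrimination


# Hypothetical data: 10 examinees, 4 items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
p_values, d_values = item_analysis(example)
print("Difficulty:", p_values)
print("Discrimination:", d_values)
```

Following the thresholds stated in this study, items with P below 0.5 would be read as difficult for the takers, and items with a discrimination value above 0.2 as good items.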
Furthermore, results showed that items with zero discrimination values are very few. This implies that such items need to be improved or revised. Most of the items have discrimination values higher than 0.2, which can be considered good items.

Table 3 shows the reliability indices of 0.714, 0.739,

Acknowledgement

References

Gullicksen, H. (1950). Theories of Mental Test Score. New York.

Haladyna, T., & Downing, S. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17-27.

Hambleton, R. K., & Jones, R. W. (1993). Comparison of Classical Test Theory and Item Response Theory and their Application to Test Development.

Kelly, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30, 17-24.

Lord, F. M., & Novick, M. (1969). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Osarumwense, H. J., & Oyedeji, S. O. (2015). Empirical Comparison of Methods of Establishing Item Difficulty Index of Test Items Using Classical Test Theory (CTT).

Price, L. R. (2017). Theory into Practice. The Guild Press, New York, London, p. 5.

Affiliations and Corresponding Information

Jeffrey Imer C. Salim
Mindanao State University
Tawi-Tawi College of Technology and Oceanography - Philippines