
Module Assessment of Learning 1

Chapter 5 ITEM ANALYSIS AND VALIDATION

Intended Learning Outcomes: At the end of this chapter, the students are expected to:

1. Explain the meaning of item analysis, validity, reliability, difficulty and
discrimination.
2. Determine the validity and reliability of the given test items.
3. Determine the validity of a test by its difficulty index, discrimination
index and plausibility index of options.

The management of an assessment program passes through the different stages of a
systems approach. From planning, the assessment proceeds to organizing, staffing,
implementing, evaluating, and reporting. Item analysis is one of the many tasks that
every assessor or teacher has to perform before reaching the point of implementing and
evaluating.

5.1 Indices of Superior Quality of Items

Item analysis serves three different but important functions in the assessment of
learning, namely to determine the:

1. Difficulty Index
2. Discrimination or Separation Index
3. Plausibility Index

This process is best suited to a test in which every item has a correct or wrong
answer.

The preliminary steps involved in item analysis are:

1. Arrange the scores of the students from lowest to highest.
2. Get the top 27% for the upper group (U) and the bottom 27% for the lower group (L)
of the students, leaving the middle 46% as the average group.
3. Record the frequencies of those who got the correct answer in each item for
each group of students (upper and lower groups).
4. Do the item analysis for the difficulty index, the discrimination or separation
index, and the plausibility index.
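
The preliminary steps above can be sketched in Python. This is only an illustrative sketch: the function names, the dictionary layout of the scores and answers, and the rule of rounding each group's size to the nearest whole student are assumptions, not part of the module.

```python
def split_groups(scores, fraction=0.27):
    """Split students into upper and lower groups by total score.

    scores: dict mapping a student identifier to a total test score.
    fraction: share of students placed in each group (27% in the module).
    Rounding the group size to the nearest whole student is an assumption.
    """
    # Step 1: arrange the students from lowest to highest score.
    ranked = sorted(scores, key=lambda student: scores[student])
    # Step 2: take the bottom and top 27% as the lower and upper groups.
    size = max(1, round(len(ranked) * fraction))
    lower = ranked[:size]
    upper = ranked[-size:]
    return upper, lower


def count_correct(group, answers, item, key):
    """Step 3: count the students in a group who answered an item correctly.

    answers: dict mapping a student to a dict of {item number: chosen option}.
    key: the correct option for the item.
    """
    return sum(1 for student in group if answers[student][item] == key)
```

The two counts returned by `count_correct` for the upper and lower groups are the frequencies used in step 4.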

5.1.1 Difficulty Index

The difficulty index is a gauge of how easy or how difficult an item is for the
group of learners. It provides information on the number of students who
got the correct and wrong answers in each item.

Formula:

$$DI = \frac{n_u + n_l}{N} \times 100$$

where:

$DI$ = difficulty index
$n_u$ = number of learners in the upper group who got the correct answer
$n_l$ = number of learners in the lower group who got the correct answer
$N$ = number of learners who were assessed

Illustration:

There are 40 students comprising the upper and lower groups. Only 8 students
from the upper group and 6 from the lower group got the correct answer in item
number 1 of the given test in Probability and Statistics. Find the difficulty index.

Solution:

$$DI = \frac{n_u + n_l}{N} \times 100$$

$$DI = \frac{8 + 6}{40} \times 100$$

$$DI = 35\%$$

This indicates that the item is “difficult”. In the given illustrative example, the
proper action for the framer of the item is to “revise” it.

In terms of the action to be taken for or against an item, the scale below is
suggested as a guide.

Difficulty Index    Interpretation      Action

0 - 20%             Very difficult      Discard
21% - 40%           Difficult           Revise
41% - 60%           Moderate/Average    Retain
61% - 80%           Easy                Revise
81% - 100%          Very easy           Discard
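
The difficulty index formula and the scale above can be sketched as a short Python example. The function names are hypothetical, and treating each band's upper limit as inclusive for fractional values (so that 20.5% counts as "Difficult") is an assumption, since the scale lists whole percentages only.

```python
def difficulty_index(n_upper, n_lower, n_total):
    """Return DI as a percentage: (n_u + n_l) / N x 100."""
    return (n_upper + n_lower) * 100 / n_total


def interpret_difficulty(di):
    """Map a DI percentage to (interpretation, action) using the scale above.

    Assumption: a fractional value belongs to the first band whose upper
    limit it does not exceed.
    """
    if not 0 <= di <= 100:
        raise ValueError("DI must be between 0 and 100")
    bands = [
        (20, "Very difficult", "Discard"),
        (40, "Difficult", "Revise"),
        (60, "Moderate/Average", "Retain"),
        (80, "Easy", "Revise"),
        (100, "Very easy", "Discard"),
    ]
    for upper_limit, interpretation, action in bands:
        if di <= upper_limit:
            return interpretation, action


# Item number 1 from the illustration: 8 upper, 6 lower, 40 assessed.
di = difficulty_index(8, 6, 40)
print(di, interpret_difficulty(di))  # 35.0 ('Difficult', 'Revise')
```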

5.1.2 Discrimination Index

It describes the ability of an item to distinguish between high and low scorers
(the upper and lower 27% of students after the obtained scores have been ranked).
A highly discriminating item indicates that the students who had high test scores
got the item correct whereas students who had low test scores got the item incorrect.
Items with discrimination values near or less than 0 should be removed from the test.
A negative value indicates that students who did poorly on the test overall did better
on that item than students who did well overall. The item may be confusing for your
better-scoring students in some way.

Assessment of Learning I
Module Page 3 of 7

USMKCC-COL-F-050
Formula:

$$S = \frac{n_u - n_l}{\frac{1}{2}N} \times 100$$

where:

$S$ = discrimination or separation index
$n_u$ = number of learners in the upper group who got the correct answer
$n_l$ = number of learners in the lower group who got the correct answer
$N$ = total number of learners who were assessed

Illustration:

There are 40 students comprising the upper and lower groups. Only 8 students
from the upper group and 6 from the lower group got the correct answer in item
number 1 of the given test in Probability and Statistics. Find the discrimination or
separation index.

Solution:

$$S = \frac{n_u - n_l}{\frac{1}{2}N} \times 100$$

$$S = \frac{8 - 6}{\frac{1}{2}(40)} \times 100$$

$$S = \frac{2}{20} \times 100$$

$$S = 10\%$$

The obtained index of 10% (0.10 when expressed as a proportion) indicates that the
item has poor discriminating power and thus must be rejected or improved through
revision.

In terms of the action to be taken for or against an item, the scale below is
suggested as a guide.

Discrimination Index             Interpretation/Action

0.40 and higher                  Very good discrimination
0.30-0.39                        Reasonably good discrimination but possibly subject to
                                 improvement
0.20-0.29                        Marginal/acceptable discrimination (subject to improvement)
0.00-0.19                        Poor discrimination (to be rejected or improved by revision)
Negative discrimination index    To be rejected
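
The discrimination index and its scale can be sketched in Python as well. The function names are hypothetical. The formula yields a percentage while the scale is stated in proportions, so the sketch compares against the scale's limits multiplied by 100. Placing a value that falls between two listed bands (for example 19.5%) in the lower band is an assumption.

```python
def discrimination_index(n_upper, n_lower, n_total):
    """Return S as a percentage: (n_u - n_l) / (N / 2) x 100."""
    return (n_upper - n_lower) * 100 / (n_total / 2)


def interpret_discrimination(s_percent):
    """Map S (in percent) to the interpretation in the scale above.

    The scale is written as proportions (0.40, 0.30, ...); the thresholds
    here are the same values expressed in percent.
    """
    if s_percent < 0:
        return "To be rejected"
    if s_percent >= 40:
        return "Very good discrimination"
    if s_percent >= 30:
        return "Reasonably good discrimination but possibly subject to improvement"
    if s_percent >= 20:
        return "Marginal/acceptable discrimination (subject to improvement)"
    return "Poor discrimination (to be rejected or improved by revision)"


# Item number 1 from the illustration: 8 upper, 6 lower, 40 assessed.
s = discrimination_index(8, 6, 40)
print(s, interpret_discrimination(s))
```

For item number 1 this gives S = 10.0, which falls in the "poor discrimination" band, in agreement with the worked solution.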

Interpretation using the Difficulty and Discrimination Index

                           Item 1
Upper Group (20)           8
Lower Group (20)           6
Total correct responses    14
Difficulty Index           35%
Discrimination Index       10%

In general, item number 1 is difficult and has poor discriminating power, thus it
must be subjected to revision.

5.1.3 Plausibility Index

The plausibility or breaking index (P) is applicable to an assessment using a test
instrument with 4 or 5 options for each item, noting that the minimum number of options
for a multiple-choice assessment is 4. This index determines the extent to which an
option draws the learner's attention away from the correct answer and toward itself.
Hence, if the learner is uncertain or doubtful of the answer, a plausible option is
persuasive as an alternative even though it is wrong.

In determining the plausibility index, the number of times each option was chosen
for the item is taken in relation to the total of the upper and lower groups
used in the item analysis. Hence the formula P = n/N, where “n” stands for the
frequency with which the option was chosen and N for the total number of students in
both groups.

Below are the parameters to guide the item analyst.

4-option test     5-option test     Evaluation    Decision

0.25 and up       0.20 and up       Excellent     Retain
0.20-0.24         0.16-0.19         Good          Retain
0.15-0.19         0.12-0.15         Fair          Discard
0.09 and below    0.04 and below    Very poor     Discard
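
The formula P = n/N can be applied to every option of an item at once. The sketch below is illustrative only: the function name and the option counts are hypothetical, and the resulting values are meant to be read against the parameters above rather than classified automatically.

```python
def plausibility_indices(option_counts, n_total):
    """Return P = n / N for each option of one item.

    option_counts: dict mapping an option label to the number of students
    in the upper and lower groups combined who chose it (n).
    n_total: total number of students in both groups (N).
    """
    return {option: count / n_total for option, count in option_counts.items()}


# Hypothetical counts for a 4-option item answered by 40 students.
counts = {"A": 14, "B": 10, "C": 12, "D": 4}
for option, p in plausibility_indices(counts, 40).items():
    print(option, round(p, 2))
```

With these made-up counts, option D is chosen by only 4 of 40 students, so its index of 0.10 would be compared with the 4-option column of the parameters.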

5.2 Validation and Validity

After performing the item analysis and revising the items that need revision, the
next step is to validate the instrument. The purpose of validation is to determine the
characteristics of the whole test itself, namely the validity and reliability of the test.
Validation is the process of collecting and analyzing evidence to support the
meaningfulness and usefulness of the test.

5.2.1 Validity

Validity refers to the accuracy of an assessment, that is, whether or not it measures
what it is supposed to measure. Even if a test is reliable, it may not provide a valid
measure.

Let’s imagine a bathroom scale that consistently tells you that you weigh 130
pounds. The reliability (consistency) of this scale is very good, but it is not accurate
(valid) because you actually weigh 145 pounds (perhaps you re-set the scale in a weak
moment)! Since teachers, parents, and school districts make decisions about students
based on assessments (such as grades, promotions and graduation), the validity of the
inferences drawn from the assessments is essential, even more crucial than the
reliability. Also, if a test is valid, it is almost always reliable.

Validity can be measured in three ways. In order to have confidence that a test is
valid (and therefore the inferences we make based on the test scores are valid), all three
kinds of validity evidence should be considered.

Validity     Definition                       Example

Content      The extent to which the          A teacher wishes to validate a test in
             content of the test matches      Mathematics. He requests experts in
             the instructional objectives.    Mathematics to judge whether the items or
                                              questions measure the knowledge, skills,
                                              and values supposed to be measured.

Criterion    The extent to which scores on    Mr. Celso wants to know the predictive
             the test are in agreement        validity of his test administered in the
             with (concurrent validity) or    previous year by correlating the scores with
             predict (predictive validity)    the grades of the same students obtained in
             an external criterion.           a test at a later date.

Construct    The extent to which an           A teacher might investigate whether an
             assessment corresponds to        educational program increases artistic
             other variables, as predicted    ability amongst pre-school children.
             by some rationale or theory.     Construct validity is a measure of whether
                                              the research actually measures artistic
                                              ability, a slightly abstract label.

5.2.2 Reliability

Reliability refers to the consistency of the scores obtained, that is, how consistent
they are for each individual from one administration of an instrument to another and
from one set of items to another.
Reliability and validity are related concepts. If an instrument is unreliable, it
cannot yield valid outcomes. As reliability improves, validity may improve. However, if
an instrument is shown scientifically to be valid, then it is almost certain that it is
also reliable.

Predictive validity compares the test scores with an outcome assessed at a later
time. An example of predictive validity is a comparison of scores in the National
Achievement Test (NAT) with the first semester grade point average (GPA) in college. Do
NAT scores predict college performance? Construct validity refers to the ability of a
test to measure what it is supposed to measure.
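
One common way to make such a comparison is to correlate the two sets of scores. The sketch below computes a Pearson correlation coefficient by hand. The choice of Pearson's r is an assumption (the module only speaks of correlating or comparing scores), and the function name and the sample figures are hypothetical, not real NAT or GPA data.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    squares_x = sum((a - mean_x) ** 2 for a in x)
    squares_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(squares_x * squares_y)


# Hypothetical data: earlier test scores and later grades of the same students.
test_scores = [78, 85, 62, 90, 70]
later_grades = [2.9, 3.4, 2.5, 3.8, 3.1]
print(round(pearson_r(test_scores, later_grades), 2))
```

A coefficient close to +1 would suggest that students who scored high on the earlier test also tended to earn high grades later. The function raises a `ZeroDivisionError` if all values in either list are identical, since the correlation is then undefined.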

The following table is a standard used almost universally in educational testing and
measurement.

Reliability      Interpretation

.90 and above    Excellent reliability; at the level of the best standardized
                 tests.
.80-.90          Very good for a classroom test.
.70-.80          Good for a classroom test; in the range of most. There are
                 probably a few items which could be improved.
.60-.70          Somewhat low. This test needs to be supplemented by
                 other measures to determine grades. There are probably
                 some items which could be improved.
.50-.60          Suggests a need for revision of the test, unless it is quite
                 short. The test definitely needs to be supplemented by other
                 measures for grading.
.50 or below     Questionable reliability. This test should not contribute
                 heavily to the course grade, and it needs revision.
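
The table can be turned into a small lookup, sketched below in Python. The function name and the shortened labels are hypothetical. Because adjacent bands in the table share their end points (for example .80 appears in two rows), the sketch assumes that a shared value belongs to the higher band.

```python
def interpret_reliability(coefficient):
    """Return a short label for a reliability coefficient, per the table above.

    Assumption: a value on a shared boundary (e.g. 0.80) is placed in the
    higher band, so only values below 0.50 are labelled questionable.
    """
    bands = [
        (0.90, "Excellent reliability"),
        (0.80, "Very good for a classroom test"),
        (0.70, "Good for a classroom test"),
        (0.60, "Somewhat low"),
        (0.50, "Suggests a need for revision of the test"),
    ]
    for lower_limit, label in bands:
        if coefficient >= lower_limit:
            return label
    return "Questionable reliability"


print(interpret_reliability(0.84))  # Very good for a classroom test
```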
