Test Construction 2 With Item Analysis


Flowchart of Test Construction with Item Analysis


Flowchart of Test Construction
1. Formulate test objectives
 Remember the AB3C when determining test objectives:
 Audience, or the actor
 Behavior, or the action verb specifying the learning outcome
 Content, or the subject matter
 Condition, or the circumstances under which the behavior is demonstrated
 Criterion, or the degree of performance considered sufficient to demonstrate mastery
Flowchart of Test Construction
Example of a test objective:
 The examinee (audience) should distinguish (behavior) all (criterion) objectives indicating learning outcomes (content) from a set of objectives having both learning outcomes and learning activities (condition).
 The objective should be SMART:
 SPECIFIC
 MEASURABLE
 ATTAINABLE
 REALISTIC
 TIME BOUND
2. Construct a Table of Specifications (TOS)
• A device for describing test items in terms of content and process dimensions
 Decide on the test format:
 1. Selective
 Multiple choice
 Binary (true-false)
 Matching type
 2. Supply
 Completion
 Short answer (identification, labeling)
 Essay/open-ended/constructed type
Table of Specifications
• A test map (blueprint) that guides the teacher in constructing a test
• Ensures that there is a balance between items that test lower-level thinking skills and those that test higher-order thinking skills (or, alternatively, a balance between easy and difficult items) in the test
Steps in Preparing a Table of Specifications (TOS)
1. Write down the topics/coverage of the test extracted from the syllabus.
2. Classify the topics according to the general behavioral objectives of the taxonomy used.
3. Write the content and behavioral objectives on a two-way grid.
4. Fill in the instructional time for each specific content (objective).
5. Determine the percentage of content using the formula:
Category % = (time spent per category / total time of teaching) × 100
6. Add rows at the bottom of the grid and determine the total number of items of the test.
7. Compute the number of items for each content area using the formula:
No. of items per content area = Category % × total number of items
8. Determine the type of test to be prepared for each content area.
9. Indicate the item placement.
Format of TOS
 One-Way Format

Objectives | Topics | No. of Hours Taught | Format, No. & Placement of Items | Number of Items (or % of items)

Number of Items = (Number of hours taught / Total number of class hours) × Desired number of items
% of items = (Number of items on a topic / Total number of items) × 100

The number of items for a given test objective should be in proportion to the estimated time for instruction. Example: if a given test objective was taught for 1 hour out of the 20 hours covered by the test, then a test with 100 items should have 5 items for that given objective.
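As a quick arithmetic check, the sketch below (in Python) applies the two formulas above to a hypothetical syllabus; the topic names and hours are illustrative, not from the source, but the first topic reproduces the 1-hour-out-of-20, 100-item worked example.

# Sketch: allocating test items in proportion to instructional time
# (steps 5 and 7 above; topic names and hours are hypothetical).
def items_for_topic(hours_taught, total_hours, total_items):
    """Number of Items = (hours taught / total class hours) x desired total items."""
    return hours_taught / total_hours * total_items

syllabus = {"Topic A": 1, "Topic B": 7, "Topic C": 12}  # hours per topic
total_hours = sum(syllabus.values())                    # 20 hours in this example
total_items = 100

for topic, hours in syllabus.items():
    pct = hours / total_hours * 100                     # Category %
    n = items_for_topic(hours, total_hours, total_items)
    print(f"{topic}: {pct:.0f}% of time -> {round(n)} items")
# Topic A: 5% of time -> 5 items  (matches the worked example above)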

 One-Way Format
What is the advantage of this format? It is easy to use, because one simply works on the objectives without worrying about the different levels of cognitive behavior.

What is the disadvantage of this format? The test may not capture all the levels of cognitive behavior that meaningful instruction should have developed.

 Two-Way Format

Content | Instructional Time | No. of Items | % of items | KD | Levels of Behavior (R, U, Ap, An, E, C), Item Format, No. & Placement

Legend: KD = Knowledge Dimension (F = Factual); R = Remember, U = Understand, Ap = Apply, An = Analyze, E = Evaluate, C = Create. Format: I = Multiple Choice, II = Essay.

What is the advantage of this format? It allows one to see at a glance what levels of thinking skills are emphasized.

 Two-Way Format: Example

Content | Instructional Time | No. of Items | % of items | KD | Item Format & Placement
1. Methods of assessment | | | | F | I #1, II #4
2. etc. | | | | |
Total | 15 | 50 | 100% | | R 5, U 10, Ap 20, An 10, E 3, C 2
What is the disadvantage of this format? It is difficult to construct items for a target level without a statement of the objective in which the required behavior is given.
What is another disadvantage of this format? Its complexity may require much time from the test writer.
 Three-Way Format

Content | Test Objectives | Instructional Time | No. of Items | % of items | KD | Item Format & Placement
1. Methods of assessment | | | | | F | I #1, II #4
2. etc. | | | | | |
Total | | 15 | 50 | 100% | | R 5, U 10, Ap 20, An 10, E 3, C 2
Flowchart of Test Construction
3. Have the TOS approved by experts
 The experts check the scope of the test, the distribution of items, and the correctness of the classification of behaviors.
Flowchart of Test Construction
4. Write test items
 Follow the guidelines in constructing the different test formats.
Types of Tests According to Format
1. Selective Test - provides choices for the answer.
a. Multiple Choice - consists of a stem, which describes the problem, and 3 or more alternatives, which give the suggested solutions. One of the alternatives is the correct answer while the other alternatives are the distracters.
b. True-False or Alternative Response - consists of a declarative statement that one has to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.
c. Matching Type - consists of two parallel columns: Column A, the column of premises from which a match is sought; Column B, the column of responses from which the selection is made.
2. Supply Test
Short Answer - uses a direct question that can be answered by a word, a phrase, a number, or a symbol.
Completion Test - consists of an incomplete statement to be filled in.
3. Essay Test and the Scoring Rubrics
Restricted Response - limits the content of the response by restricting the scope of the topic.
Extended Response - allows the students to select any factual information that they think is pertinent and to organize their answers in accordance with their best judgment.
Different Methods of Assessment
• Objective Supply: Short Answer, Completion Test
• Objective Selection: Multiple Choice, Matching Type, True/False
• Essay: Restricted Response, Extended Response
• Performance-Based: Paper Presentations, Projects, Demonstrations, Exhibitions, Portfolios
• Oral Question: Oral Examinations, Conferences, Interviews
• Observation: Informal, Formal
• Self-Report: Attitude Surveys, Sociometric Devices, Questionnaires, Inventories
General Suggestions in Writing Tests
• Use the test specifications as a guide to item writing.
• Construct more test items than needed, to have extra items when deciding which items have to be discarded or revised.
• Have a test of sufficient length to adequately measure the target performance (note: the longer the test, the more reliable it is).
• Write the test items well in advance of the testing date to allow time for checking face and content validity.
• Write the test items with reference to the test objectives.
Specific Suggestions: Multiple Choice
Have:
 A clear problem
 Stems that are meaningful
 A negatively stated stem only when significant learning outcomes require it, with the negative word highlighted
 Plausible distracters
 Alternatives that are grammatically parallel to the stem
 Only one correct and clearly best answer
 Choices that are arranged alphabetically, or according to value or length
 Stems and options that are on the same page
Avoid:
 Double negatives in the stem
 Irrelevant information in the stem
 Patterns in the answers
 Verbal clues in the stem and the correct answer
 Alternatives like "all of the above," especially when it is the correct answer
 Alternatives like "none of the above" when there are many possible distracters close to the correct answer
 Answers that are relatively longer than the other alternatives
 Using multiple choice when there are better test formats for the test objectives
Specific Suggestions: Alternative-Response Test
Have:
 Meaningful items
 Simple sentences
 Only one correct and clearly best answer
 An equal or approximately equal number of items for each choice being the correct answer
Avoid:
 Trivial statements
 Long sentences, unless cause-and-effect relationships are being measured
 Obviously negative words or double negatives in an item
 Two ideas in one statement, unless cause-and-effect relationships are being measured
 Opinionated ideas, unless you acknowledge the source or the ability to identify opinion is being specifically measured
Specific Suggestions: Matching Type
Have:
 An unequal number of responses and premises, and instruct the pupils that responses may be used once, more than once, or not at all
 Lists of items to be matched that are brief
 The shorter responses at the right
 Responses arranged in logical order
 Directions indicating the basis for matching the responses and premises
 A maximum of 15 items per match
Avoid:
 Clues or patterns for the correct answer
 Different or heterogeneous items in a single exercise
 Redundant items
 Breaking the whole match across two pages
Specific Suggestions: Supply Objective Test
Have:
 Items that require a brief and specific answer or unit
 A direct question, which is generally more desirable than an incomplete statement
 Blanks for answers that are equal in length
 The answers written before the item number for easy checking
Avoid:
 Clues or patterns for the correct answer
 Statements taken directly from textbooks as a basis for short-answer items
 Too many blanks in a single item
 Blanks at the beginning of the sentence
Specific Suggestions: Essay Test
Have:
 Items that target high-level thinking skills
 Questions that clearly specify the behavior of the learning outcome
 Items that all students can fairly answer regardless of their religion, gender, or social status
 A rubric for scoring the work, given to the students as a guide in answering the question
Avoid:
 Items that simply require recall of facts
 Items that are taken directly from textbooks
 Optional questions or items that vary in levels of difficulty
Flowchart of Test Construction
5. Validate the face and content of the items
 Check that:
 The test looks good;
 The guidelines in test construction were followed; and
 The target behaviors in the TOS were met.
Flowchart of Test Construction
6. Pilot test the instrument and revise if necessary
 Check the length of time required and the readability of the material.
Item Analysis
 A technique used to assess the quality or utility of an item and its distracters
 A process of examining the learners' responses to each item in the test
 Used to characterize an item as:
 Desirable - can be retained for subsequent use
 Undesirable - can be revised or rejected
When and Why Do Item Analysis?
When? Before and after trying out the test.
Why?
• To improve the quality of the test
• To determine the level of difficulty and discriminatory power of each test item
Two Types of Item Analysis
1. Qualitative Analysis
– It is done for all test formats and kinds
– It is done by matching items with objectives and by editing poorly written test items
Example of Qualitative Item Analysis

Content | Objective | No. of Items | % of items | KD | Item Format & Placement
Methods of assessment | Identify the most appropriate method of assessment for a given test objective | | | F | I #1, II #4
 | Judge the appropriateness of a given method of assessment for a given learning objective | | | |
Total | | 50 | 100% | | R 5, U 10, Ap 20, An 10, E 3, C 2

For qualitative item analysis, have a second look at the face, content, and instructional validity of the test item. Assuming that these types of validity have been established but you want to improve the item, try out the test to determine the level of difficulty and the discriminatory power of each item using quantitative analysis. Also test the plausibility of the distracters.
2. Quantitative Analysis
– It is suited for the analysis of multiple-choice formats
– It is necessary particularly for norm-referenced tests
– It tells whether responses to a test item are characterized by guessing, or whether the item is ambiguous or miskeyed
Try out the test for the purpose of item analysis
 If this is a school-wide test, the samples should come from heterogeneous groups.
Item Analysis
1. Get the scores of the students in the test.
2. Arrange the papers from highest to lowest score.
3. Determine the number of students belonging to the upper 27%, middle 46%, and lower 27%. The 27% can be computed as 27% × N, e.g., 27% × 30 ≈ 8.
4. For each group, determine the number of students who answered each item correctly.
5. Prepare a table like the one shown below and write the frequency of the top 27% of the class who got the item right.
Item Analysis Grid
Discrimination per item: (pU - pL) / N, where pU and pL are the numbers of students in the upper and lower groups who got the item right, and N is the size of the upper (or lower) group.
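A minimal sketch (in Python) of steps 2-5 above: ranking students by total score and forming the upper and lower 27% groups. The scores are illustrative.

# Sketch: forming the upper and lower 27% groups.
def split_groups(scores, fraction=0.27):
    """Return (upper, lower) lists of students, each ~27% of the class."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest to lowest
    n = round(len(ranked) * fraction)                      # e.g., 27% of 30 ~ 8
    return ranked[:n], ranked[-n:]

scores = {f"S{i}": s for i, s in enumerate(
    [45, 42, 40, 39, 38, 36, 35, 35, 33, 32, 31, 30, 29, 28, 27,
     27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 15, 14, 12, 10])}
upper, lower = split_groups(scores)
print(len(upper), len(lower))  # 8 8  (27% of a class of 30)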
Item Analysis
6. For each item, compute the difficulty index and the discrimination index.
*Difficulty Index (Df):
 If all examinees are included: Df = p / N, where p = no. of students who got the item right and N = total number of students taking the test.
 If only the upper and lower groups are used: Df = (Ug + Lg) / (Nu + Nl), the proportion of students in the upper and lower groups combined who correctly answered the item (equivalently, Df = (pUg + pLg) / 2 for equal-sized groups).
Discrimination Index: Di = (Ug - Lg) / Nu
where:
Ug = no. of students in the upper 27% group who got the item right
Lg = no. of students in the lower 27% group who got the item right
Nu = no. of students in the upper group (the same as the number in the lower group)
or, equivalently,
Di = pUg - pLg
where pUg = proportion of the upper group who got the item right and pLg = proportion of the lower group who got the item right.
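A small sketch of both indices; the counts come from item 1 of the sample worksheet below (12 of 14 upper-group and 3 of 14 lower-group students answered correctly).

def difficulty_index(ug, lg, nu, nl):
    """Df = proportion of the combined upper + lower groups answering correctly."""
    return (ug + lg) / (nu + nl)

def discrimination_index(ug, lg, n):
    """Di = pUg - pLg, with equal-sized groups of n students."""
    return (ug - lg) / n

ug, lg, n = 12, 3, 14
print(round(difficulty_index(ug, lg, n, n), 2))   # 0.54
print(round(discrimination_index(ug, lg, n), 2))  # 0.64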
Item Analysis
7. Based on the difficulty and discrimination indices, decide what to do with the item:

Difficulty Index | Interpretation | Decision
0.81 or higher | Very easy item | Eliminate item
0.20 - 0.80 | Good item | Retain/Revise item
0.19 or lower | Very difficult item | Eliminate item

Discrimination Index | Interpretation | Decision
0.40 or higher | Very good item | Retain item
0.30 - 0.39 | Reasonably good item | Revise/Improve item
0.20 - 0.29 | Marginal item | Revise/Improve item
0.19 or below / negative | Poor item | Eliminate item

Note: In a classroom achievement test, the desired difficulty index is not lower than 0.20 nor higher than 0.80; the average is from 0.30 to 0.80.
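As a sketch, the two tables translate directly into threshold functions (boundary handling follows the tables as written):

def interpret_difficulty(df):
    """Interpretation and decision per the difficulty-index table above."""
    if df >= 0.81: return "very easy - eliminate"
    if df >= 0.20: return "good item - retain/revise"
    return "very difficult - eliminate"

def interpret_discrimination(di):
    """Interpretation and decision per the discrimination-index table above."""
    if di >= 0.40: return "very good - retain"
    if di >= 0.30: return "reasonably good - revise/improve"
    if di >= 0.20: return "marginal - revise/improve"
    return "poor - eliminate"

print(interpret_difficulty(0.54), "|", interpret_discrimination(0.64))
# good item - retain/revise | very good - retain   (worksheet item 1)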
• Maximum Discrimination (DM) is the sum of the proportions of the upper and lower groups who answered the item correctly: DM = pUg + pLg. It occurs when half or less of the combined upper and lower groups answered the item correctly.
• Discrimination Efficiency (DE) is the index of discrimination divided by the maximum discrimination: DE = Di / DM.
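A sketch of these two quantities, again using the item-1 counts (pUg = 12/14, pLg = 3/14):

def discrimination_efficiency(ug, lg, n):
    """DE = Di / DM, with DM = pUg + pLg as defined above."""
    p_u, p_l = ug / n, lg / n
    dm = p_u + p_l   # maximum discrimination
    di = p_u - p_l   # discrimination index
    return di / dm

print(round(discrimination_efficiency(12, 3, 14), 2))  # 0.6 (= 9/15)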
Index of Discrimination
• Positive discrimination - the upper group obtained a higher proportion than the lower group
• Negative discrimination - the lower group obtained a higher proportion than the upper group
• Zero discrimination - the proportions are equal

• A good (retained) item must have both an acceptable index of difficulty and an acceptable index of discrimination.
• A fair (revised) item has either an unacceptable difficulty index or an unacceptable discrimination index.
• A poor (rejected) item has both an unacceptable difficulty index and an unacceptable discrimination index.
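Combining the two indices gives a simple retain/revise/eliminate classifier. This sketch treats 0.20-0.80 as the acceptable difficulty range and Di >= 0.20 as acceptable discrimination, per the tables above; these cutoffs are one reasonable reading, not the only one.

def classify_item(df, di):
    """Good -> retain, fair -> revise, poor -> eliminate (rules above)."""
    ok_difficulty = 0.20 <= df <= 0.80
    ok_discrimination = di >= 0.20
    if ok_difficulty and ok_discrimination:
        return "retain"
    if ok_difficulty or ok_discrimination:
        return "revise"
    return "eliminate"

print(classify_item(0.54, 0.64))   # retain  (worksheet item 1)
print(classify_item(0.61, -0.21))  # revise  (worksheet item 3)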
Sample Item Analysis Worksheet

Item no. | Upper 27% (14) | Lower 27% (14) | Difficulty index | Discrimination index | Decision
1 | 12 | 3 | 0.54 | 0.64 | Retain
2 | 14 | 7 | 0.75 | 0.50 | Retain
3 | 7 | 10 | 0.61 | -0.21 | Revise
4 | 12 | 6 | 0.64 | 0.43 | Retain
5 | 10 | 4 | 0.50 | 0.43 | Retain
6 | 14 | 14 | 1.00 | 0.00 | Revise
7 | 11 | 1 | 0.43 | 0.71 | Retain
8 | 13 | 12 | 0.89 | 0.07 | Retain
9 | 9 | 7 | 0.57 | 0.14 | Revise
10 | 4 | 14 | 0.64 | -0.71 | Revise
Determining Test Reliability
8. Administer the final version of the test to another group of students.
9. Check the papers and determine the number of students who answered each item correctly.
10. Compute the reliability coefficient using Kuder-Richardson formula 20:

KR20 = [k / (k - 1)] × [1 - (Σpq / s²)]

where: k = no. of items,
p = ratio of the no. of students who got the item right to the total no. of students,
q = 1 - p,
s = standard deviation of the scores.
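A self-contained sketch of the KR-20 computation from a 0/1 response matrix. The six-student, five-item data set is made up for illustration, and s² is taken as the population variance of the total scores (one common convention).

import statistics

def kr20(responses):
    """KR-20 = (k/(k-1)) * (1 - sum(p*q) / s^2) for 0/1 item responses."""
    k = len(responses[0])                     # number of items
    totals = [sum(row) for row in responses]  # each student's total score
    s2 = statistics.pvariance(totals)         # variance of total scores
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / len(responses)  # item difficulty
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / s2)

# Rows = students, columns = items (illustrative data).
data = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(round(kr20(data), 2))  # 0.83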
Interpretation of Reliability Coefficient

.90 & above | Excellent reliability; at the level of the best standardized tests
.80 - .89 | Very good for a classroom test
.70 - .79 | Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.60 - .69 | Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 - .59 | Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
Below .50 | Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
Examine the Results of Your Item Analysis
 If your test is a criterion-referenced test, it may have items with varying levels of difficulty and at least a zero discrimination index.
 If your test is a norm-referenced test, then retain only items with an average level of difficulty and an acceptable discrimination index.
• Remember: in the new grading system of the DepEd, a CRT should have this distribution of items:
– 60% easy
– 30% average
– 10% difficult
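As a quick check of what that distribution means in practice, here is the split for a hypothetical 50-item CRT (the test length is illustrative):

# Sketch: DepEd CRT target distribution applied to a 50-item test.
total_items = 50
targets = {"easy": 0.60, "average": 0.30, "difficult": 0.10}
allocation = {level: round(share * total_items) for level, share in targets.items()}
print(allocation)  # {'easy': 30, 'average': 15, 'difficult': 5}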
Does your test satisfy all the acceptable indices of item difficulty and discrimination? If your answer is NO, then pause and do a further quantitative analysis of the plausibility of the distracters.
What are plausible distracters? A plausible distracter is an item option that is incorrect yet could attract the attention of examinees, particularly those who do not know the correct answer. It has a negative discrimination index.
How do we determine the plausibility or discriminatory power of a distracter?
Causes of Poor Test Item Distracters
1. Miskeying - indicated by more students in the upper half selecting a distracter rather than the keyed answer.
Example (C* is the keyed answer):
        A  B  C*  D
Upper   1  1  2   5
2. Guessing - indicated by an almost equal frequency of students in the upper half selecting each choice. This could be due to a lack of instructional validity of the item.
Example:
        A  B  C*  D
Upper   2  3  2   3
3. Ambiguity - indicated by a distracter that is chosen as frequently as the correct choice. The error could be due to the use of terms not common to the students.
Example:
        A  B  C  D*
Upper   4  1  1  4
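These three patterns can be screened for automatically. The sketch below uses the upper-group counts from the three examples above; the "almost equal" threshold (a spread of at most 1) is my own illustrative choice, not from the source.

def diagnose_distracters(upper_counts, key):
    """Flag miskeying, guessing, or ambiguity from upper-group choice counts."""
    counts = list(upper_counts.values())
    key_n = upper_counts[key]
    distracters = [n for c, n in upper_counts.items() if c != key]
    if max(counts) - min(counts) <= 1:
        return "possible guessing"    # choices drawn with almost equal frequency
    if max(distracters) > key_n:
        return "possible miskeying"   # a distracter beats the keyed answer
    if max(distracters) == key_n:
        return "possible ambiguity"   # a distracter ties the keyed answer
    return "no red flags"

print(diagnose_distracters({"A": 1, "B": 1, "C": 2, "D": 5}, key="C"))  # miskeying
print(diagnose_distracters({"A": 2, "B": 3, "C": 2, "D": 3}, key="C"))  # guessing
print(diagnose_distracters({"A": 4, "B": 1, "C": 1, "D": 4}, key="D"))  # ambiguity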
Checklist in Improving a Criterion-Referenced Test
1. Does your item fail to satisfy the level of difficulty you have targeted?
2. Does the item discriminate negatively?
3. Do the distracters discriminate positively?
4. In the upper group:
– Was there any distracter chosen more frequently than the key (miskeying)?
– Do the choices have almost the same frequency (guessing)?
– Was there a distracter chosen as frequently as the key (ambiguity)?
Note:
• If you answered YES to any one of the questions, then revise the item.
• If you answered YES to at least two questions, then it would be better to eliminate the item.
• If you answered NO to all questions, then retain the item.
Test Construction According to Garcia (2008)
• 1) Identification of learning outcomes
• 2) Listing of the topics to be covered by the test
• 3) Preparation of the Table of Specifications (TOS)
• 4) Selection of the appropriate types of tests
• 5) Writing test items
• 6) Sequencing the items
• 7) Writing the directions or instructions
• 8) Preparation of the answer sheet and scoring key
