
MARIANO MARCOS STATE UNIVERSITY

College of Teacher Education

MODULE 3
Designing and Developing Assessment Tools


INTRODUCTION
We learned in the previous module that the four components of instruction (learning objectives, teaching strategies, learning activities, and assessment) should be constructively aligned. In this module, we will study how to design and develop assessment tools so that test items are appropriately and accurately aligned with the learning objectives. The importance of classroom tests and assessments in measuring students' performance cannot be overstated; they are useful indicators of learning progress. The primary purpose of classroom testing and assessment is to acquire accurate, valid, and relevant information about students' progress (Mehrens and Lehmann, 1991). Only with such information can sound instructional and assessment decisions be made.

Learning Outcomes
At the end of the unit, the pre-service teachers must have:
1. constructed a Table of Specifications;
2. constructed paper-and-pencil tests in accordance with the guidelines in test construction;
3. performed item analysis;
4. determined the quality of a test item by its difficulty index and discrimination index; and
5. explained the characteristics of assessment methods.

PRE-TEST


Before you browse through the module, you are required to take the pre-test. You
are encouraged to take the test with utmost honesty so that your teacher will be able to
provide the appropriate help that you need.
Answer the pre-test found at the end of this module. Scan your answer sheet and
send the file in the Assignment section.

LESSONS

LESSON 1. PLANNING A TEST AND CONSTRUCTION OF TABLE OF SPECIFICATIONS (TOS)

Planning a test is an important aspect of the assessment process. So, if a teacher's goal
is to improve learning and instruction, he or she must take the necessary steps in planning
and implementing assessment instruments.

The Important Steps in Planning and Developing Assessment Tools


1. Identifying test objectives
2. Deciding on the type of objective test to be prepared
3. Preparing a Table of Specifications (TOS)
4. Constructing the draft test items
5. Item Analysis and Try-out

Let's take a closer look at the phases involved in creating effective assessment tools. It is critical to follow the steps in order so that the test items created accurately measure the various learning outcomes. In this way, the teacher is able to measure what needs to be measured. Consider the following points as you progress through each step.

1. Identifying Test Objectives


 An objective test must cover the various levels of Bloom’s taxonomy if it is to be inclusive. Each objective consists of a statement of what has to be accomplished and, ideally, by how many learners. The test objectives should match the test items to be constructed as well as the learning objectives of the instruction.

2. Deciding on the type of objective test
 The test objectives determine the type of objective test that the teacher will create and construct. For example, we may want to build a multiple-choice type of examination for the first four (4) levels, while for application and judgment we may want to give an essay test or a modified essay test.


3. Preparing a Table of Specifications (TOS)


 A table of specifications, or TOS, is a test map and plan that guides the teacher through the process of creating a test, helping to ensure the test's validity, reliability, and objectivity.

What is the Table of Specification (TOS) for?

a. The TOS assures that the exam contains a balance of items that test lower-level thinking skills and items that test higher-level thinking skills (or, alternatively, a balance of easy and difficult items);

b. it guarantees that instructional objectives and content are adequately covered within a predetermined time frame, such as one unit; and

c. it aids in the equitable distribution of skill measurement.

Without a TOS, a test constructor may take the shortest route and unintentionally focus on the lower levels of thinking skills. A TOS makes the test writer more aware of the need to broaden the coverage of the test to encompass higher-order thinking skills.


Format of Table of Specifications


There are two possible formats for a table of specifications. A TOS could
either be one-way or two-way.

Below is an example of a One-Way Table of Specification.

One-Way Table of Specifications in Reading Comprehension Test for Grade 6

Objective | No. of Recitation or Discussion Days | No. of Items | % of Items | Item Placement
1. Identify topic sentence in a paragraph. | 5 | 4 | 10 | 1-4
2. Restate main idea of a paragraph. | 5 | 4 | 10 | 5-8
3. Identify supporting details in a paragraph. | 3 | 2 | 5 | 9-10
4. Classify ideas under proper headings. | 5 | 4 | 10 | 11-14
5. Give title to a paragraph. | 5 | 4 | 10 | 15-18
6. Give title to a poem. | 4 | 3 | 8 | 19-21
7. Sequence events as they happen in a story. | 5 | 4 | 10 | 22-25
8. Make a 3-point sentence outline. | 8 | 7 | 18 | 26-32
9. Make a 3-point topic outline. | 10 | 8 | 20 | 33-40
TOTAL | 50 | 40 | 100 |
Source: Assessment of Learning in the Cognitive Domain by Danilo S. Gutierrez


Here is a sample of a two-way TOS

A Two-Way Table of Specifications in a Reading Test for Grade 6

Objective | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation | Total No. of Items
1. Identify synonyms and antonyms of common words. | 3 | | | | | | 3
2. Note explicit details in a paragraph. | 5 | | | | | | 5
3. Give the main idea of a given paragraph. | | 5 | | | | | 5
4. Classify ideas. | | | 5 | | | | 5
5. Analyze data from a table. | | | | 5 | | | 5
6. Arrange events in a story. | | | 5 | | | | 5
7. Infer details in passages that are not explicitly stated. | | | | 3 | | | 3
8. Make a 3-point outline. | | | | | 5 | | 5
9. Draw conclusions from given data. | | | | | 6 | | 6
10. Distinguish between fact and opinion. | | | | | | 8 | 8
TOTAL NO. OF ITEMS | 8 | 5 | 10 | 8 | 11 | 8 | 50
% OF ITEMS | 16 | 10 | 20 | 16 | 22 | 16 | 100


Source: Assessment of Learning in the Cognitive Domain by Danilo S. Gutierrez

 Another format of the TOS is the two-way grid. The purpose of this table is to assure the teacher that a particular test measures a representative sample of the learning outcomes and the content to be covered.
 The two-way table indicates the objectives to be measured and their classification according to Bloom's Taxonomy of Educational Objectives (Cognitive Domain), which is not included in the one-way TOS.

Let us look at another example of a two-way TOS.


Source: Lecture of Arnel O. Rivera – "Table of Specifications (TOS) with an overview on Test Construction"
How does it differ from the first example of two-way TOS?

 In this example, the following are replaced: first, the test objectives are replaced with topics; and second, Bloom's Taxonomy of Educational Objectives in the Cognitive Domain is replaced with Lorin Anderson's revised taxonomy of educational objectives in the cognitive domain.

The following are included in the second example:


 Item placement (last column)
 Number of discussion days or number of hours per topic or per objective (2nd column)

NOW, let us look at the parts of TOS.

Parts of Table of Specifications

Parts | Description

1. Test Objectives or Topics | The test objectives, like the learning objectives, should be written in behavioral terms. They should be SMART (specific, measurable, attainable, realistic, and time-bound) and aligned with the learning objectives.

2. Number of Discussion Days or Number of Hours per Topic | This specifies the number of days or hours a teacher taught a particular topic or spent attaining a particular objective. It can be determined by looking at the lesson plan.

3. Number of Items (per Objective or Topic) | This refers to the number of test items to be constructed per objective or topic. It is proportional to the number of discussion days and is computed using the formula:

   # of items = (# of discussion days for the objective / total # of discussion days) x total # of items

4. Percentage of Items | The percentage of items tells what proportion of the entire test is given to each learning objective or topic. It is computed using the formula:

   % of items = (# of items per objective / total # of items) x 100

5. Levels of the Cognitive Domain | The test items for each objective or topic are classified according to the levels of the cognitive domain. If the test objective is at the knowledge (remembering) level, then the test items to be constructed should likewise be classified at the knowledge or remembering level.

6. Item Placement | The item placement specifies the location of a particular set of items in the entire test. It is used for easy reference when the teacher wants to determine which of the skills are mastered or not mastered by the students.

Let us try to use the formulas for the number of items and the percentage of items using a part of the one-way TOS example.


For objective #1:

# of items = (# of discussion days / total # of discussion days) x total # of items
           = (5 / 50) x 40
           = 4

So, there should be four test items for the first test objective.

% of items = (# of items per objective / total # of items) x 100
           = (4 / 40) x 100
           = 10%

This indicates that 10% of the total number of test items is from objective number 1.

As a wrap-up, preparing a table of specifications involves the following steps:
1. Determine the desired learning objectives to be measured.
2. Assign the corresponding number of items in terms of emphasis given during
instruction.
3. Create a table with relative weights by proportionally distributing the contents
throughout the cells of the table.
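To make the computation concrete, the short sketch below (not part of the original module) applies the number-of-items and percentage-of-items formulas above to a few rows of the one-way TOS example; the rounding rule and the running item-placement counter are illustrative assumptions.

```python
# Minimal sketch: allocate test items to objectives in proportion to discussion days,
# following the two formulas in this lesson. The use of round() and the running
# item-placement counter are illustrative assumptions, not prescribed by the module.

objectives = [
    ("Identify topic sentence in a paragraph", 5),
    ("Restate main idea of a paragraph", 5),
    ("Identify supporting details in a paragraph", 3),
]
total_days = 50    # total discussion days for the whole TOS (all objectives)
total_items = 40   # desired length of the test

start = 1  # first item number assigned to the next objective
for name, days in objectives:
    n_items = round(days / total_days * total_items)   # number-of-items formula
    percent = n_items / total_items * 100              # percentage-of-items formula
    placement = f"{start}-{start + n_items - 1}"
    print(f"{name}: {n_items} items ({percent:.0f}%), items {placement}")
    start += n_items
```

Running this for the first three objectives reproduces the 4, 4, and 2 items (10%, 10%, and 5%) placed at items 1-4, 5-8, and 9-10 in the sample one-way TOS.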


Self-Assessment Exercise 1

Let us check your understanding.


Compute the number of items and the percentage of items. Identify also a possible location for the test items. Round off answers to whole numbers.

Objective | # of discussion days | # of items | % of items | Item Placement
1. Identify the place value in a 3-digit number. | 3 | | |
2. Use symbols to write numbers from 1 to 1000. | 5 | | |
3. Skip count numbers by 50s up to 1000. | 4 | | |
4. Recognize ordinal numbers up to 20th. | 6 | | |
5. Write money as pesos and centavos through Php 100. | 6 | | |
6. Write Roman numerals in equivalent Hindu-Arabic through L (50). | 5 | | |
7. Add 2- to 3-digit numbers with sums up to 999. | 3 | | |
8. Add 2- to 3-digit numbers with zero in any of the addends, without regrouping. | 5 | | |
9. Add 2- to 3-digit numbers with zero in any of the addends, with regrouping. | 7 | | |
10. Show properties of addition in adding numbers. | 5 | | |
TOTAL | 50 | 30 | 100 |
Source: Assessment of Learning in the Cognitive Domain by Danilo S. Gutierrez

LESSONS

LESSON 2. PRINCIPLES OF TEST CONSTRUCTION

The fourth step in developing assessment tools is writing the draft test items.
The actual construction of the test items follows the TOS. Aside from observing that test
items should match the learning objectives, a teacher should be knowledgeable on the
principles of test construction in writing test items. According to a research, 13% of students
who got low grades in exams are caused by faulty test questions (WorldWatch The
Philadelphia Trumpet August 2005). Thus, it is very necessary for a teacher to follow the
principles of test construction. This assures the quality, validity and reliability of test items.


In this lesson, we will learn the principles of constructing objective tests and subjective (non-objective) tests.

A test is a measuring instrument used to quantify what our students have learned from us. It reflects what we have taught them and how we have taught them. Figure 1 shows the types of objective test.

Figure 1. Types of Objective Test

There are two general types of objective tests: supply type test items and selection type test items. The supply type includes the short answer (identification) type and the completion type (fill-in-the-blank). The selection type includes multiple choice, matching type, true-false, arrangement, and analogy tests.

Before we take up the specific principles in constructing the supply type and selection type test items, let us look into the general suggestions in test item writing.

GENERAL SUGGESTIONS IN TEST ITEM WRITING


1. Use the table of specifications as a guide for item construction.
2. Create more test items than are required.
3. Prepare the test items well in advance of the testing date.
4. Write each item at the appropriate reading level and difficulty level.
5. Write a test item in such a way that it does not serve as a hint for other test items.
6. Prepare the test item with answers that can be agreed on by all experts.
7. Recheck the relevance of a test whenever it is revised.


LESSONS

LESSON 2.1 GUIDELINES IN THE CONSTRUCTION OF SUPPLY TYPE TEST ITEMS

A. SHORT ANSWER / IDENTIFICATION/ SIMPLE RECALL

This is a test in which an unknown specimen is to be identified by name or some other criterion. In this test, a term is defined, described, explained, or indicated by a picture, and the term referred to is supplied by the learner. The response requires the learner to recall previously learned material, and the answers are typically brief, consisting of a single word or phrase.

Advantages
1. The teacher could easily identify the test's or students' strengths and weaknesses.
2. Unbiased, in the sense that teachers' preconceived notions about students' work cannot influence marking.
3. Relatively useful as a pre-test.
4. The answer key is unambiguous, so two observers arrive at the same conclusion.

Disadvantages
1. Most types are limited to factual recall.
2. Its construction is time consuming.
3. It promotes guessing.
4. It is not suitable for language skill testing.

GUIDELINES IN THE CONSTRUCTION OF SHORT ANSWER / IDENTIFICATION/ SIMPLE


RECALL
1. In most cases, the direct-question form is preferable to the statement form.
2. The blanks for the answers, as much as possible should be at the right column of
the test items.
3. The question should be phrased in such a way that there is only one correct answer.

Example:
1. How many centimeters make up 2 meters? 1. _________
2. Convert 7, 000 grams to kilograms. 2. __________
3. It is the fundamental unit of an element. 3. __________


This format is applicable when the test questionnaire also serves as the answer sheet.

B. COMPLETION TYPE TEST ITEM / FILL – IN THE BLANK


This test consists of a series of items in which the subject must fill in the blanks with
a word or phrase. This is useful for the testing of specific facts. The completion-drawing
test is a modified version in which an incomplete drawing is presented and the learner
must complete it.

Advantages
1. Construction is relatively easy.
2. Gives better diagnostic information; the teacher can easily see where students go wrong.
3. It minimizes guessing.
4. The correct answer may not be as obvious as in a multiple-choice question.
5. It may be appropriate for students who are unable to generate answers on their own.

Disadvantages
1. The understanding assessed is possibly trivial.
2. It is difficult to avoid ambiguity when developing questions.
3. There may be numerous possible correct answers, so it is harder to mark than multiple choice.
4. It might confuse students if the context is not obvious.
5. Scoring takes more time than for other objective types.

When to Use
 Use completion questions only when the recall of ideas and words is important.

GUIDELINES IN THE CONSTRUCTION OF COMPLETION TYPE OR FILL – IN THE BLANK


1. Word the item specifically and clearly.
Example
Poor: The type of test in which students must select the correct answer is ________.
Better: The type of test that consists of a stem and three to five alternatives is_____.

2. The blank should be placed at the end of the sentence.


Example
Poor: ________ are the wrong choices or options in a multiple-choice item.


Better: The wrong choices or options in a multiple – choice item are called
________.
3. Avoid several blanks; one blank is most recommended.
4. Attend to the length and arrangement of blanks. Make the blanks of equal length.
Example
Poor: The ____ subatomic particle is the _______.
Better: The positively charged subatomic particle is the ________.

5. Do not give grammatical clues to the correct answer.


Example
Poor: A collection of questions held in a system of storage is called a ________.
Better: A collection of questions held in a system of storage is called a/an ______
6. Avoid copying exact words of the textbook.

Self-Assessment Exercise 2

Let us check your understanding.


Choose the better completion test item. Write the letter only. Identify also the
guideline or principle that is violated by the poor item.


1. a) Every atom has a central ____ known as the nucleus.


b) Each atom has a central core known as ______.

2. a) The wrong answer for the choices of a multiple choice is called a/an______.
b) The wrong answer for the choices of a multiple choice is called a______.

3. a) It is a type of test that requires a learner to provide an answer by completing a


sentence is _________
b) It is a type of test that requires a student to provide an answer is ______.

4. a) The products of photosynthesis are sugar and _____________.


b) The products of photosynthesis are _________ and _________.

LESSONS

LESSON 2.2 GUIDELINES IN THE CONSTRUCTION OF SELECTION TYPE TEST ITEMS

A. TRUE – FALSE TEST OR ALTERNATIVE RESPONSE TEST.


True-false items require students to identify statements as correct or incorrect. Only two responses are possible in this item format: true or false, right or wrong, yes or no.


Advantages
1. Enables teachers to sample a variety of cognitive behaviors in a limited amount of time.
2. Scoring is simple and easy

Disadvantages
1. Problem solving and complex learning cannot be measured by the alternative response test.
2. The possibility of guessing is very high.

When to Use
 It can be used to evaluate results concerned with the recall of concepts or the ability to
discriminate. They can be used to provide an encouraging lead-in to assessment.
 True – False tests are better used for self-assessment and diagnostic assessment than in
summative exams because learners can guess the correct answer.

GUIDELINES IN THE CONSTRUCTION OF TRUE-FALSE TEST:


1. Do not provide a hint in the body of the question.

Example:
Poor: The Philippines gained independence in 1898 and thus marked its centennial
year in 2000.

 It is very clear that the answer is FALSE. It is because 100 years


from 1898 is not 2000 but 1998.

Better: The Philippines gained its independence in 1898.


OR
The centennial year of Philippine independence is in 2000.

2. Do not use the words “always,” “never,” “often,” and similar adverbs, which tend to make statements either always true or always false.

Example
Poor: True – False test items are always prone to guessing.

 Statements containing the word always are almost always false. Even if
a student does not know anything about the test, he can easily guess
his way through it and get high scores.

Better: True – False test items are prone to guessing.


3. Avoid long sentences because they tend to be "true." Keep your sentences short and
simple.

Example
Poor : Tests must be valid, reliable, and useful; however, ensuring that these test
characteristics are present would take a significant amount of time and
effort.

 Take note that the statement is correct. However, we are unsure which
part of the sentence the student believes to be true. It's just a
coincidence that all of the preceding sentences are true in this case.

Better: Test items should have the following characteristics - valid, reliable and
useful.

4. Avoid tricky statements with minor misleading word or spelling errors, misplaced
phrases, and so on. A wise student who is unfamiliar with the subject matter may
detect this strategy and thus correctly answer the question.

Example
Poor: Mr. Albert P. Panadero is the Principle of our school.

 The Principal’s name may actually be correct but since the word is
misspelled and the entire sentence takes a different meaning, the answer
would be false! This is an example of a tricky but entirely useless item.

Better: Mr. Albert P. Panadero is the Principal of our school.

5. Avoid double negatives.


Example
Poor: True-False items cannot be scored by an untrained person.
 Take note of the double negative cannot and untrained. To improve
this item, delete one of the negative words and use capital letter to
show emphasis.

Better: True – False items CANNOT be scored by a trained person.


6. Avoid quoting statements verbatim from learning materials or textbooks. This practice sends the wrong signal to students that it is necessary to memorize the textbook word for word, and thus the acquisition of higher-level thinking skills is not given due importance.


7. Avoid a grossly disproportionate number of either true or false statements, as well as patterns in the occurrence of true and false statements.

Self-Assessment Exercise 3

Let us check your understanding

I. Which is the better True – False test item?

1. A) Tuberculosis is a communicable disease.


B) Tuberculosis is not a noncommunicable disease.

2. A) The true-false item is more subject to guessing but it should be used in place of
multiple choice item, if well - constructed, when there is a dearth of distracters
that are plausible.
B) The true-false item should be used in place of the multiple choice item when
only two alternatives are possible.

3. A) A statement of opinion should never be used in a true – false item.


B) A statement of opinion cannot be marked true or false.

4. A) The true – false item is also called alternative response item.


B) The true – false item, which is favored by all tests experts, is also called an
alternative response.

Below is a Practice Exercise on evaluating True – False test items


(Source: Assessment of Learning 1 in the Cognitive Domain by Danilo S. Gutierrez, 2007)
II. Read each True-False item below. Evaluate each item by writing G for good item and P
for poor item. Write your comment if your answer is P by identifying what part of the item
makes it poor. Use the space provided below each item.


1. Each item task should match the learning outcome to be measured.


_________________________________________________________________
_________________________________________________________________
2. The option none of the above is perhaps used more often with mathematical
problems.
_________________________________________________________________
_________________________________________________________________

3. No pattern should appear in the test.


_________________________________________________________________
_________________________________________________________________

4. It is not desirable to make the options approximately unequal in length.


_________________________________________________________________
_________________________________________________________________

5. Teachers tend to use multiple choice items more often because many items can
be prepared.
_________________________________________________________________
_________________________________________________________________

B. MULTIPLE-CHOICE TEST
This test is made up of items that consist of three or more plausible options in each
item. It consists of two parts:
1. the stem; and
2. the alternatives/options.
In the set of options there is a “correct” option while the others are considered
“distracters”.


Advantages
1. Used in measuring different kinds of content, almost any type of cognitive behavior,
including higher cognitive levels like reasoning, understanding and giving judgments
2. Easy to score
3. It limits scoring bias
4. Incorrect response pattern can be analyzed
5. Highly reliable test scores.
6. Scoring is efficient and accurate.

Disadvantages
1. Tends to focus on low level learning objectives
2. Development of a good test item is time consuming
3. It encourages guessing.
4. It is the most difficult and the most time-consuming type of test items to construct.
5. It places a high degree of dependence on the student's reading ability and
instructor's writing ability.

When to Use
 Multiple choice exams are appropriate when the attainment of the educational objective can be measured by having students select their response from a list of several alternative responses.

A. GUIDELINES IN WRITING GOOD STEM


1. The stem of the item should clearly present a simple central problem or idea. The problem or idea must be accurately stated.
Example
Poor: A table of specifications
A. indicates how learning can be improved.


B. adequately samples the behavior to be tested.


C. specifies the weaknesses of student performance.
D. reduces the amount of time to construct the items.

 The stem of the item is vague, ambiguous and broad. The stem is the
foundation of the item. After reading the stem, the student should know
exactly what the problem is and what he/she is expected to do to answer
it. The stem should adequately cover enough details to make it clear and
precise.

Better: Why is a table of specification primarily important before writing any test item?
A. It indicates how learning can be improved.
B. It adequately samples the behavior to be tested.
C. It specifies the weaknesses of student performance.
D. It reduces the amount of time to construct the items.

 The improved stem presents one clear problem that the students will surely
understand. It is specific enough that the question can be answered even
without the options.

2. All relevant information should be included in the stem. Include all the information
necessary for the examinee to understand the intent of the item.
Example
Poor: When a piece of stone is dropped into the graduated cylinder, the water level
rose. What is the volume of the stone?
A. 0.6 mL B. 1.6 mL C.32 mL D. 132 mL

 One cannot compute the volume of the stone because numerical values are not provided in the stem.

Better: When a piece of stone is dropped into the graduated cylinder, the water
level rose from 50 mL to 82 mL . What is the volume of the stone?
A. 0.6 mL B. 1.6 mL C.32 mL D. 132 mL

3. All irrelevant materials should be omitted from the stem. Avoid the inclusion of
nonfunctional words. A word or phrase is nonfunctional when it does not contribute
to the basis for choosing a response.

Example
Poor: Pitong was walking in the park when he passed by a well. He wanted to
know how deep the well was so he picked up a stone and dropped the stone
into the well. What kind of force is acting on the falling stone?


A. electrical C. gravity
B. friction D. magnetism

 The stem is wordy and contains unimportant information; the clause “Pitong was walking in the park when he passed by a well” is not necessary.

Better: A stone was dropped into a deep well. What kind of force is acting on the
stone?
A. electrical B. friction C. gravity D. magnetism

4. The stem should be stated in positive form. If the negative form is used, emphasize
the fact by underlining, using italics or capitalizing it. Avoid using double negatives.

Example
Poor: Each of the following substances EXCEPT ONE is a mineral. Which one is not?

 Avoid using double negative – except one and which one is not.

Better: Which of the following substances is a mineral?


OR
Which of the following is NOT a mineral?

5. Place all information that can be placed in the stem to avoid repetition in the option.

Example
Poor: Substances expand when heated because
A. molecules move very fast. C. molecules move in all directions.
B. molecules move very slowly. D. molecules move farther from each other.

 The phrase molecules move in the option can be placed in the stem to
avoid repetition.

Better: Substances expand when heated because molecules move


A. very fast. C. in all directions.
B. very slowly. D. farther from each other.
6. Avoid giving grammatical clues.

Example
Poor: A word used to describe a noun is called an
A. a verb. B. a pronoun. C. an adjective. D. a conjunction.

 The word AN in the stem serves as a clue to the correct answer, ADJECTIVE, because the other options begin with consonants.
 The problem can be corrected by placing the appropriate article a/an in each option or by placing a/an in the stem.

Better: A word used to describe a noun is called
A. a verb. B. a pronoun. C. an adjective. D. a conjunction.
OR
A word used to describe a noun is called a/an
A. verb. B. pronoun. C. adjective. D. conjunction.

7. If the stem is an incomplete statement, do not (as much as possible) place a blank line at the end of the stem; simply place a period at the end of each alternative.

Example
Poor: Substances expand when heated because molecules move __________.
A. very fast. C. in all directions.
B. very slowly. D. farther from each other.

Better: Substances expand when heated because molecules move


A. very fast. C. in all directions.
B. very slowly. D. farther from each other.

B. GUIDELINES IN WRITING PLAUSIBLE OPTIONS


1. The options should be homogeneous in the sense that each should be a member of the same set of things.

Example
Poor: Which one of the following animals is most clearly in danger of extinction?
A. mackerel B. monkey-eating eagle C. sampaguita D. tamaraw

 The correct answer, tamaraw, is a mammal; to make the distracters plausible, the options should also be mammals.

 Homogeneous options are those that belong to one category or classification. If the key or correct answer is an animal, the distracters should also be animals of the same kind. Homogeneous options are plausible alternatives. Plausible alternatives have the possibility of being chosen by uninformed students.

Better: A. carabao B. cow C. horse D. tamaraw

2. The options should be related but must not have similar meaning or synonymous.

Example
Poor: Sliding in the bathroom can be prevented by wearing
A. sandals with even soles. C. sandals with soapy water.


B. sandals with rough soles. D. sandals with smooth soles.

 Options A and B are similar in meaning. If the soles are even, the
description implies that the soles are smooth. If two options are similar in
meaning, the student can eliminate them as a possible answer.
Better: Sliding in the bathroom can be prevented by wearing sandals with
A. grease. C. soapy water.
B. rough soles. D. smooth soles.
3. The key (correct option) should be of the same length as the distractors to avoid giving
a clue.

Example
Poor: One problem met by scientists in cloning animals is that cloned animals
A. get old fast. C. do not reproduce.
B. remain young. D. do not live as long as uncloned animals do.

 Most of the time, the correct answer tends to be longer than the distracters. Smart students can often detect this.

Better: A. die early. C. get old fast.


B. stay young. D. do not reproduce.

 The correct answer, “do not live as long as uncloned animals do,” is reworded to “die early” to make the lengths of the alternatives as similar as possible.

Numerous studies have indicated that items are easier when the answer is noticeably longer than the distracters, compared with when all of the alternatives are similar in length (Haladyna & Downing, 1986, as cited by Burton et al., 1991).

4. Make all the options grammatically consistent and parallel in form with the stem of the item.

Example
Poor: How is the movement of bones made possible?
A. By pushing the skeletal muscle C. Muscles and bones are combined
B. By pulling the skeletal muscles D. Muscle pushes the other muscles 

 Options C and D are not parallel in form with the correct answer. They must be revised.

Better: A. By pushing the skeletal muscle C. By pulling and pushing the muscles
B. By pulling the skeletal muscles D. By combining the muscles and bones

5. State options in sequential (natural) order, whether alphabetically or numerically.
Example
Poor: Mars’ gravity is 0.38 times that of Earth. What will be the weight on planet Mars of
an astronaut who weighs 400 N on Earth?
A. 152 N B. 10.5 N C. 400.38 N D. 3.62 N

 Arrange the answers in sequential order.

Better: A. 3.62 N B. 10.5 N C. 152 N D. 400.38 N

Here is another example.


Which animal is hatched from eggs?
A. carabao B. goat C. rabbit D. snake
 Options are arranged alphabetically.

6. If alternatives are sentences or phrases, arrange them in order of increasing length.

Example
Poor: Substances expand when heated because molecules move
A. in all directions. C. very slowly.
B. very fast. D. farther from each other.

 Arrange the phrases preferably in order of increasing length.

Better: Substances expand when heated because molecules move


A. very fast. C. in all directions.
B. very slowly. D. farther from each other.

7. Put the term to be defined in the stem and suggest various definitions in the alternatives
or options.

Example
Poor: The tendency of particles to move from greater concentration to lesser
concentration is called
A. diffusion. B. evaporation. C. osmosis. D. transpiration.


Better: Which of the following correctly defines diffusion?


A. It is the tendency of particles to be evenly distributed.
B. It is the loss of materials from cells of plants and animals.
C. It is the spreading of particles towards the bottom of a container.
D. It is the movement of particles from lesser to greater concentration.

8. Avoid using ALL OF THE ABOVE and NONE OF THE ABOVE as one of the alternatives.

Example
Poor: Which is an example of a mixture?
A. air C. seawater
B. juice D. all of the above

 The alternative ALL OF THE ABOVE or NONE OF THE ABOVE can serve as a clue for the student. ALL OF THE ABOVE means that all the other choices are correct, while NONE OF THE ABOVE means that all the other choices are wrong. If this happens, a student need not read the other alternatives, and the distracters lose their function of distinguishing a good performer from a poor one.

Better: Which is an example of a mixture?


A. air C. sugar
B. salt D. water

OR if all choices are correct, then you can follow the format below:

Which of the following is / are mixtures?


I. air II. juice III. seawater IV. steel

A. I only C. I, II, and IV only


B. I and II only D. I, II, III, and IV
C. MATCHING TYPE

The matching item is selection type of item consisting of a series of stimuli (or stems)
called premises, and a series of options called responses. The premises and responses are
arranged in columns. Usually the premises are placed in the left column (Column A) while
the responses are set in the right column (Column B). Directions provide the basis for
matching.


The following are the possible premises and responses:

Column A (Premise) Column B (Response)

Accomplishments Persons

Noted events Dates

Definitions Terms/Phrases

Uses and Functions Parts/Machines

Quantities and Qualities Symbols/Signs

Examples Rules/Principles

Plants, Animals, Element Classification/Category

Advantages
1. Valuable in content areas that have a lot of facts
2. Easy to score
3. Relatively easy to construct
4. Measures primarily associations and relationships as well as sequence of events

Disadvantages
1. Not very effective in measuring higher order skills
2. Time consuming for students
When to Use
 They are used when you need to measure the learner’s ability to identify the
relationships or association between similar items.

GUIDELINES IN WRITING MATCHING TYPE ITEMS


1. Use only homogeneous material in a single matching exercise.
2. Keep the list of items to be matched brief, and place the shorter entries at the right, in the column for responses.
 The list of premises should not be more than 15. Keeping the list short enables the students to locate answers easily. It also saves reading time. Confusion is avoided and the premises are made more homogeneous.


3. There should be more responses than premises. For example, if there are five premises, there should be six responses.
 The uneven match between premises and responses reduces guessing.

4. Arrange the list of responses in logical order.


5. Indicate in the directions the basis for matching the premises and responses. This will
make the task clear and specific.
Examples
 Match the descriptions of the parts of the digestive system in Column A with the
digestive organs being described in Column B. Write the letter of the correct
answer on the blank before each number in Column A.

 Match the function of the part of computer in Column A with its name in Column
B. Write the letter of your choice before the number.

6. Place the matching items on one page.

Here is an example of a matching type of test.

Match the descriptions of the parts of the digestive system in Column A with the
digestive organs being described in Column B. Write the letter of the correct answer
on the blank before each number in Column A.

Column A
_____ 1. coil-like structure where food is digested and absorbed
_____ 2. tube-like structure where food is swallowed
_____ 3. muscular tube which collects undigested food
_____ 4. pear-shaped organ where food is churned
_____ 5. an opening where food is digested

Column B
a. esophagus
b. large intestine
c. liver
d. mouth
e. small intestine
f. stomach

Match the capitals in Column B with the provinces of the Philippines in Column A by writing
the letter of the capital in the blank provided before the corresponding province.

Column A (Province) Column B (Capital)

_____ 1. La Union A. Balanga


_____ 2.Zambales B. Bangued
_____ 3.Abra C. Iba
_____ 4.Palawan D. Laoag


_____ 5.Ilocos Norte E. Puerto Princesa


F. San Fernando

D. ARRANGEMENT TYPE
This type consists of a multiple-option item that requires a chronological, logical, rank, or similar order. The test consists of ordering and assembling items on some basis. It is used to test knowledge of sequence and order. Ordering measures memory of relationships and concepts of organization, while assembly measures mechanical ability and the ability to figure out spatial relationships. This type of test can be written as a multiple choice item.

The following can be arranged in a specified order:


1. letters of the alphabet
2. events (e.g. historical events)
3. geographical location
4. numbers according to a magnitude
5. process (e.g. stages of the process of digestion)
6. incidents in a story (in order of their occurrence)
7. quality of importance (from the least to the most important)
8. jumbled words into a sentence or letters in a word
9. planets (e.g. based on the distance between the planet and the sun)

GUIDELINES IN THE CONSTRUCTION OF ARRANGEMENT TYPE OF TEST


1. The basis for arrangement should be explicitly stated in the direction.
2. The items to be arranged should belong to only one category.
3. Provide instructions on the rationale for arrangement or sequencing.
4. Specify the response code students have to use in arranging the items.
5. Provide sufficient space for the writing of the answers.

Example 1
The sentences numbered 1 to 5 below make up one paragraph. Read the sentences and arrange them in the best order to form a complete and well-organized paragraph. Choose the best order from the options.

1. Miss Castro, their teacher, gave them a good grade.
2. The study was on the determination of the protein content of dried anchovy.
3. One day, the students in the chemistry class conducted an experiment.
4. All of them were happy.
5. They were all successful in their analysis.


A. 32514 B. 31425 C. 35241 D. 23514 E. 25143

Example 2
Arrange the planets in the solar system according to their nearness to the sun by
using numbers 1,2,3….
___ Earth
___ Venus
___ Uranus
___ Mars
___ Jupiter
___ Mercury
___ Neptune
___ Saturn

E. ANALOGY
This type is made up of items consisting of pairs of words that are related to each other. An analogy is a comparison between two things that are usually thought to be different from each other but have some similarities.
It is designed to measure the ability of students to recognize the relationship between paired words or concepts. This type of test can be written as a multiple choice item.

Advantages
1. They are valuable tools in conceptual change learning
2. They provide visualization and understanding of the abstract by pointing to
similarities in the real world.
3. They may incite pupils' interest and hence have a motivational effect.
4. They force the teacher to take into consideration pupils' prior knowledge and may
reveal misconceptions in previously taught topics.

15 KINDS OF ANALOGY TEST


1. Purpose Relationship – It is a relationship in which one word in the pair shows the
purpose of the other word

Example : SHOE is to SHOELACE as DOOR is to


A. hinge B. key C. threshold D. transom

2. Cause and Effect Relationship – The similarity in this type derives from the cause on
one side and its indisputable effect on the other side.


Example: SMOKE is to FIRE as WATER is to
A. cloud B. H2O C. rain D. sky

3. Part – Whole Relationship - The analogy of this type is based on whether the item is
a member of the same group or category.

Example: SLICE is to LOAF as ISLAND is to


A. archipelago B. land C. ocean D. peninsula

4. Part – Part Relationship - The analogy is based on a part of a whole of a particular


item.

Example: HAND is to ELBOW as FEET is to


A. knee B. leg C. muscle D. toe

5. Action to Object Relationship - The analogy is based on two sets of performers and
their corresponding actions

Example:- OBEY is to CHILDREN as COMMAND is to


A. armies B. parents C. principal D. teacher

6. Object to Action Relationship – The analogy is based on the action that would be
done to a particular object.

Example: EGG is to BOIL as POTATO is to


A. hash B. mash C. slash D. slice

7. Synonym Relationship – It is a relationship in which both words have similar


meanings or a word having the same or nearly the same meaning as another.

Example: DIG is to EXCAVATE as KILL is to


A. average B. convict C. slay D. try

8. Antonym Relationship – It is a relationship in which the meaning of the word is the


opposite of the other word.

Example: FLY is to CRAWL as RUN is to


A. hop B. jump C. skip D. walk

9. Place Relationship - The analogy is based on the location of the object.

Example: WATER is to AQUEDUCT as BLOOD is to


A. body B. corpuscle C. plasma D. vein


10. Degree Relationship - It is a relationship in which one is more intense than the
other.

Example: POSSIBLE is to PROBABLE as HOPE is to


A. deceive B. expect C. prove D. resent

11. Characteristic Relationship – It is a relationship in which it compares the


characteristics of two objects

Example: RICH is to WEALTH as WISE is to


A. divulge B. knowledge C. save D. teach

12. Sequence Relationship – It is a relationship in which both words may compare


objects, events or actions according to each sequence.

Example: SUNDAY is to TUESDAY as THURSDAY is to


A. Monday B. Wednesday C. Friday D. Saturday

13. Grammatical Relationship – It is a relationship in which both words are related by


their grammatical type.

Example: ANGRY is to ANGRILY as BEAUTY is to


A. charming B. beautifully C. nice D. pretty

14. Numerical Relationship – The analogy is based on a mathematical relationship, such as equality or proportion among numbers.
Example: 0 2 4 is to 1 3 5 as 6 8 10 is to
A. 11 13 17 B. 11 15 17 C. 11 13 15 D. 11 15 19
15. Association Relationship – The analogy relationship points out the cause and effect,
functional and sequential order relationships
Example: ROMANCE is to HEART as RIBBON is to
A. baloney B. card C. gift D. lace

Self-Assessment Exercise 4

Let us check your understanding


I. Evaluate the following multiple choice test items. Choose from the options
given below. Write the letter only.

A. Good stem and plausible options


B. Good stem and poor options
C. Poor stem and plausible options
D. Poor stem and poor options

1. Luis is twelve years old. How many orbits around the sun has the earth made
since he was born?
A. 12 B. 30 C. 52 D. 365

2. A standardized test has


A. norms. B. objectivity. C. reliability. D. scorability.

3. Generally, the longer the test, the higher is its


A. interpretability. B. reliability. C. usability. D. validity.

4. A reading teacher wants to find out if the pupils are ready to move on to the next
lesson. What kind of test should she give?
A. diagnostic B. formative C. placement D. summative

5. A gumamela is a
A. complete flower. C. pistillate flower.
B. incomplete flower. D. staminate flower.

6. Which of the following methods is not used in hydroponics?


A. slop – method C. sub – irrigation method
B. water – culture method D. all of the above

7. Which of the following is the general purpose of educational measurement and


evaluation?
A. improve instruction
B. construct good test items
C. facilitate learning
D. improve the curriculum

II. Evaluate the matching type test item given below. Identify the guidelines that
are violated.


Direction: Match Column A with Column B.

Column A Column B

___1. animals with fins and scales a. reptiles


___2. animals with dry scaly skin b. fishes
___3. animals that feed milk to their young c. amphibians
___4. animals that live in both water and land d. mammals
___5. animals that have feathers e. birds

LESSONS

LESSON 2.3 GUIDELINES IN THE CONSTRUCTION OF SUBJECTIVE TEST /


NON – OBJECTIVE TEST

A. ESSAY
Essays, classified as subjective tests or non-objective tests, allow for the assessment
of higher order thinking skills. Such tests require students to organize their thoughts on a


subject matter in coherent sentences in order to inform an audience. In essay tests,


students are required to write one or more paragraphs on a specified topic.

Essay questions can be used to measure attainment of a variety of objectives.


Stecklein (1955) has listed 14 types of abilities that can be measured by essay items:

1. Comparisons between two or more things


2. The development and defense of an opinion
3. Questions of cause and effect
4. Explanations of meanings
5. Summarizing of information in a designated area
6. Analysis
7. Knowledge of relationship
8. Illustrations of rules, principles, procedures and applications
9. Applications of rules, laws and principles to new situations
10. Criticisms of the adequacy, relevance or correctness of a concept, idea or
information
11. Formulation of new questions and problems
12. Reorganization of facts
13. Discriminations between objects, concepts or events
14. Inferential thinking

 Note that all these involve the higher-level skills mentioned in Bloom’s Taxonomy.

Types of Essay
1. Restricted-response essay
 Restricted-response items limit the ways in which students are permitted to answer. There ARE correct answers, but students are allowed to express the answers in their own words.
 The restricted-response format is predicated on the notion that students supply the answers rather than selecting them from a group of options.

Let us look at the following examples:


1. “Write a brief essay comparing and contrasting the term analysis and
synthesis as they relate to constructing (a) objective items and (b) essay
items.”

2. “What is the poet’s attitude toward literature as it is apparent in lines 1 to 8? What words in these lines make this apparent?”


3. “A car traveling 50 mph leaves Chicago at 9am. A train traveling at 70 mph


leaves Milwaukee at 10 am. Who will arrive in Toledo (250 miles away) first?
Show your work.”
2. Extended-Response Essay
 Extended-response items allow students to express their own ideas and the interrelationships among ideas, and to use their own strategy for organization. There are no single “correct” answers; responses are judged on reasons and logic.
 Because the focus is on logical argument and reasoned answers, the teacher must be open to and accepting of uncomfortable responses.
 Responses are based on the creative process of the student, often including reasoning and factual presentations.

Let us look at the following examples:


1. Devise a plan to determine whether the democrats or republicans are evenly
distributed throughout the city, or whether the supporters of each party are
concentrated in certain wards.
2. Design an experiment to calculate the height of a redwood tree.
3. If you were Pnoy, which of the 8 MDGs would you give more focus? Justify
your answer.

Advantages
1. It measures the higher level of knowledge – ability to interpret, evaluate, apply
principles, create, organize thoughts and ideas, compare and contrast, etc.
2. Helps students organize their thoughts and ideas logically – they can practice in
topical organization and discussion
3. Easy to prepare
4. Can be used in any subject
5. Harder to cheat than objective type

Disadvantages
1. There is difficulty in scoring, as giving the right weight to each question is difficult. It cannot be mechanically scored.
2. Its usability, validity, and reliability are low.
3. Sampling is limited, as only a few questions can be included in the test.
4. Scoring is subjective.
5. Standards of excellence vary from teacher to teacher.
6. The physical and mental condition of the checker affects the scoring.
7. Irrelevant factors like grammatical errors, poor penmanship, language difficulty, wrong spelling, and the like adversely affect the scoring.


8. Time-consuming to score, especially if scoring guidelines are followed. It can become nearly impossible when one scores conscientiously, gives good feedback, and has many students.
9. Limited sampling of the content domain.

RULES FOR PREPARING/ADMINISTERING ESSAY QUESTIONS


1. State questions that require a clear, specific, and narrow task or topic to be performed.
 Some sample terms to use that make the task clear and specific are as follows: compare, describe, explain, summarize, relate, differentiate, criticize, and appraise.

 Give explicit instructions on type of answer desired.


o Example: Your answer should be confined to 100-150 words. It will be evaluated in terms of
the appropriateness of the facts and examples presented and the skill with which it is
written.

2. Give an adequate time limit for answering each essay question.


 Enough time to think and write the answers should be given to students. Only a few
essay questions should be formulated for a given class period.
 It is not wise to include so many questions to cover as many desired learning
outcomes because there may not be enough time for the students to finish the test.
 It is advisable to include one or two essay questions in a testing period.

3. Require students to answer all questions.


 It is best to require all students to respond to the same essay questions for point of
comparison and representativeness. If only a few questions are answered, the
problem of adequate and representative sampling of behavior becomes evident.

4. Make it clear to students if spelling, punctuation, content, clarity and style are to be
considered in scoring the essay questions. When these criteria are clear and specific to
students, the item becomes valid.

5. Grade each essay question by the point method, using well-defined criteria.
 By using certain criteria as guide, scoring essay questions becomes less subjective
and more objective.

 Examples of criteria to use in scoring an essay question are given below.


A. Completeness of ideas presented (40%)
B. Clarity of expressions used (30%)
C. Organization of ideas (30%)

 Below is a sample scoring guide:

Score | Description
5 | Complete ideas presented; clear expressions used; organization of ideas is evident with the use of an outline
4 | Complete ideas presented; clear expressions used; organization of ideas is evident with the use of an outline
3 | Lacks some important ideas; clear expressions used; organization of some ideas not evident
2 | Lacks some important ideas; some expressions used are vague; organization of some ideas not evident
1 | Lacks most of the important ideas; most expressions used are vague; organization of some ideas not evident

Source: Assessment of Learning Outcomes in the Cognitive Domain, Book 1, by Danilo S. Gutierrez
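To illustrate the point method with the weighted criteria listed above (completeness 40%, clarity 30%, organization 30%), here is a minimal sketch; the 1-to-5 rating scale, the function name, and the sample ratings are assumptions for illustration and are not prescribed by the module.

```python
# Minimal sketch of point-method essay scoring with weighted criteria.
# The weights follow the sample criteria above (completeness 40%, clarity 30%,
# organization 30%); the 1-5 rating scale and the sample ratings are assumptions.

WEIGHTS = {"completeness": 0.40, "clarity": 0.30, "organization": 0.30}

def weighted_score(ratings, max_rating=5):
    """Combine per-criterion ratings (1..max_rating) into a percentage score."""
    return sum(WEIGHTS[c] * (r / max_rating) for c, r in ratings.items()) * 100

# Example: a response rated 5 on completeness, 4 on clarity, 3 on organization.
print(weighted_score({"completeness": 5, "clarity": 4, "organization": 3}))  # 82.0
```

Under this weighting, a response rated 5, 4, and 3 on the three criteria earns 82%.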

6. Evaluate all of the students’ responses to one question before going to the next question.
 Scoring essay test question-by-question rather than student-by-student maintains
uniform standard for evaluating the answer to each question.

7. Evaluate answers to essay questions without identifying the student.


 There is a tendency to score essay questions based on personal bias. To prevent
knowing the identity of the student, a code number may be assigned to each student.
In this way, answers can be scored more objectively based on their own merits.

8. If possible, two or more correctors must be employed to ensure reliable results.

B. INTERPRETATIVE TEST
This type of test is often used in testing higher cognitive behavior. It may involve
analysis of maps, figures, charts and even comprehension of written passages.

TIPS FOR CONSTRUCTION


1. The interpretative exercise must be related to the instruction provided to the
students.
2. The material to be presented should be new but similar to what was presented
during instruction.


3. Written passages should be as brief as possible. The exercise should not be a test of
general reading ability.
4. The students have to interpret, apply, analyze and comprehend in order to answer a
given question in the exercise.

C. PERFORMANCE TESTS
A performance test consists of a list of behaviors that make up a type of performance.
 It is used to determine whether or not an individual behaves in a certain way when asked to complete a particular task.
 These are not measured through the usual paper-and-pencil tests; responses are shown through overt manual, vocal, and other behavioral activities.
 Examples are: visual arts, music, dramatics, speech, TLE, PE, military training, sports, and the like.
 When the behavior is present while an individual is being observed, the teacher places a check opposite it on the list.
 The activities measured yield intangible and tangible finished products.
 Intangible finished product – the actual process of performance, which is no longer observable afterwards (renditions of vocal and instrumental music, public speaking, dance, gymnastics, etc.)
 Tangible finished product – a concrete object or article produced by the performer (baskets, dresses, cross-stitch, etc.)
 Some common performance tests are:
o Recognition – may be simple recognition, where pupils are required to identify tools described, or a task for which students pick out the materials they need from those given and use them to perform the task.
o Simulated – used as a substitute for expensive operations, such as military operations.
o Work sample – not expensive, easy to prepare, and practical.

Advantages
1. Performance tests allow teachers to assess areas of learning that traditional
assessments do not address
2. Good instructional alignment - instructional alignment means that teachers test what
they teach.
3. Performance tests usually involve real-world tasks, so students tend to find them
more engaging and challenging
4. Performance tests provide high-quality feedback to students throughout the
assessment because they have a formative component


5. Because performance tests are linked with instruction, the two can be accomplished
simultaneously, thus increasing instructional efficiency.
6. Performance tests can empower students by giving them freedom to make choices,
within parameters set by teachers, about the direction that their learning should
take
7. Performance tests prompt students to use higher-order thinking skills such as
analysis, synthesis, and evaluation.

Disadvantages
1. Performance test items frequently are unrelated to those tasks and behaviors
required in the classroom setting
2. Performance test results reflect behavior or ability that has been measured during a
single point in time and, as such, are greatly influenced by noncognitive factors
3. Performance test results do not provide the type of information required for making
curricular modifications or instructional change

TIPS FOR CONSTRUCTION


1. Performance of students should be measured under controlled conditions that can
be made equal or the same for all students.
2. Develop the test in such a way that it includes all the activities that are in the
work sample
3. Develop an overall procedure for administration with instructions and directions
incorporated.
4. Develop a rating form for measuring the important aspects or dimensions in the
sample activities.
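
As an illustration only (the module does not prescribe any particular format), a rating form for a performance task can be as simple as a checklist of dimensions with point ratings. The dimensions, point values and ratings in the short Python sketch below are hypothetical, not part of the module.

    # Illustrative only: a hypothetical rating form for a performance task.
    # The dimensions and the 5-point ratings below are assumptions.
    rating_form = {
        "Follows correct procedure": 4,        # observer's rating out of 5
        "Proper use of tools": 5,
        "Neatness of finished product": 3,
        "Completed within time limit": 4,
    }

    total = sum(rating_form.values())
    maximum = 5 * len(rating_form)
    print(f"Score: {total}/{maximum} ({100 * total / maximum:.0f}%)")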

LESSONS

LESSON 3. ITEM ANALYSIS

In the previous lessons, we discussed how to plan a test and how to write good test
items. In this lesson, we will consider how to improve test items using an appropriate
procedure - item analysis.


What is Item Analysis?


 Item analysis is a process of examining the students' responses to each item in the
test. It consists of different procedures for assessing the quality of the test items given
to the students.
 It is a procedure used to identify good items by determining the index of difficulty and
the index of discrimination
 It is a way of establishing the objectivity of a test, aside from determining the
structural adequacy or weakness of an item.

Uses of Item Analysis

1. It provides useful information for efficient class discussion of the test result.
2. It provides data which help students improve their learning by identifying areas of
weakness that need remediation
3. It provides a basis for increased skills in test construction.
4. It helps identify structural or content defects of the items.
5. It helps detect learning difficulties of the class.
6. It provides a basis for constructing a test bank

Phases of Item Analysis and Validation


1. Try Out Phase
First, the teacher tries out the draft test on a group of students with characteristics
similar to those of the intended test takers.

2. Item Analysis Phase


From the try-out group, each item will be analyzed according to its ability to
discriminate between those who know and those who do not know and also its
level of difficulty.

3. Item Revision Phase


The item analysis will provide information that will allow the teacher to decide
whether to revise or replace an item.

4. Validation
Lastly, the final draft shall then be subjected to validation if the intent is to make
use of it as a standard test for a particular unit or grading period.

Types of Quantitative Item Analysis


There are three common types of quantitative item analysis which provide
teachers with three different types of information about individual test items. These
are the difficulty index, the discrimination index, and response options analysis.
1. Difficulty Index is the proportion of students/test takers who answered the item
correctly, i.e., the percentage of students who answered the item correctly. It is
computed using the formula:

DF = (U + L) / T

Where: DF = difficulty index
U = number of students in the upper group who got the item
correctly
L = number of students in the lower group who got the item
correctly
T = total number of papers analyzed

2. Discrimination Index is the extent to which a test item can differentiate between
good performers and poor performers. It tells whether an item can discriminate
between those who know and those who do not know the answer. It is computed
using the formula:

DI = (U - L) / (½T)

Where: DI = discrimination index
U = number of students in the upper group who got the item
correctly
L = number of students in the lower group who got the item
correctly
½ T = one-half of the total number of papers analyzed
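
As a minimal sketch (Python, not part of the original module), the two indices above can be computed directly from the counts U, L and T; the function names are illustrative.

    # Minimal sketch of the two formulas above; function names are illustrative.
    def difficulty_index(upper_correct, lower_correct, total_papers):
        # DF = (U + L) / T
        return (upper_correct + lower_correct) / total_papers

    def discrimination_index(upper_correct, lower_correct, total_papers):
        # DI = (U - L) / (T / 2)
        return (upper_correct - lower_correct) / (total_papers / 2)

    # Item 1 of the worked example that follows: U = 6, L = 4, T = 28
    print(round(difficulty_index(6, 4, 28), 2))      # 0.36
    print(round(discrimination_index(6, 4, 28), 2))  # 0.14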

STEPS IN PERFORMING ITEM ANALYSIS


To illustrate how item analysis is done, let us assume that a test given to 52 students
in a science class (or any class) has just been checked and scored. The procedure for
the item analysis is shown in the succeeding illustrations.


1. Arrange the scores from the highest to the lowest.


2. a. Get the top 27% of the papers or scores. Designate these as the upper group.
b. Next, get the lowest 27% of the papers or scores. Designate these as the
lower group.
c. Set aside the middle group. These are not needed in the item analysis.

Let us look at this specific illustration.

Example
Let us assume that there are 52 papers

First, get 27% of 52:

52 x .27 = 14.04, or 14 papers (rounded off)
Second, count 14 papers from the highest score down to the 14th
paper. This is the upper group.

Third, count 14 papers from the lowest score up to the 14th paper.
This is the lower group.

Fourth, set aside the remaining 24 papers. This is the middle group.
They are not included in the item analysis.

Note: If the sample size is less than 50, the 50% grouping is used.
Example: Sample size = 30 papers
50% = 15 papers upper group and 15 papers lower group
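
The grouping step above can also be done programmatically. The short Python sketch below is illustrative only; the score values are placeholders, and the rounding and the 50% rule follow the note above.

    # Illustrative sketch of the grouping step; the score values are placeholders.
    scores = sorted([78, 65, 90, 54, 72, 88, 61, 95, 70, 83], reverse=True)  # highest to lowest

    n = len(scores)
    k = round(n * 0.27) if n >= 50 else n // 2   # 27% rule, or 50% split for small classes

    upper_group = scores[:k]         # top k papers
    lower_group = scores[-k:]        # bottom k papers
    middle_group = scores[k:n - k]   # set aside; not used in the item analysis

    print(len(upper_group), len(lower_group), len(middle_group))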

3. Tabulate the choices or answers of the students in the upper and lower groups.
3.1 Write the frequency of the answers written in a table similar to Table 1.
 For each paper, tally the answer for every number. Then summarize it in a
table similar to Table 1.
3.2 In each item, highlight the number of students who got the item correctly.

Table 1. Students’ (Upper and Lower Groups) Responses to the Items


Subject/Topic ___________________________________ No. of Students:__28__
No of Items: 100


              UPPER GROUP                      |              LOWER GROUP
Item No.   A    B    C    D   omit  Total      |   A    B    C    D   omit  Total
   1       5    6    2    1    0     14        |   4    4    4    2    0     14
   2       5    -    9    0    -     14        |   1    9    4    0    -     14
   3       4    9    1    0    -     14        |   5    1    1    6    1     14
   4       3    -   11    0    -     14        |   5    1    8    0    -     14
   5       -   13    1    0    -     14        |   3    7    1    2    1     14

 Double check entries – The total should be equal to 14 for the upper and lower
groups
 Tally under omit if the student did not answer an item.
4. Compute for the indices of difficulty and discrimination for each item.
The index of difficulty tells the percentage of students who got the item right.
The index of discrimination tells the difference between the number of students in
the
upper and lower groups who got the item right.

4.1 Write the frequency of those who answered correctly in the upper 27% and the
lower 27% in a table similar to Table 2.


Table 2. Computed Difficulty and Discrimination Indices and their Descriptive
Interpretations

Item   Upper   Lower   Difficulty                     Discrimination                    Remarks /
No.    Group   Group   Index       Interpretation     Index           Interpretation    Action Taken
1      6       4       .36         Average item       .14             Poor item         Improve

4.2 Interpret the computations using the values presented by Ebel and Frisbie (1986),
as follows:

 If an item was answered correctly by 75% or more of the students, it is easy.


 If only 24% or less answered it correctly, the item is difficult.
This is the difficulty index.
 Refer to Table 3 for the interpretation of difficulty index and discrimination index
result.
 Example: For Item #1, DF = .36. The interpretation is "average item."
This means that the item is average in terms of difficulty.

Table 3. Interpretation for Difficulty Index and Discrimination Index

Difficulty Index   Interpretation    |   Discrimination Index   Interpretation
.76 or higher      Easy item         |   .40 and up             Very good item
.25 to .75         Average item      |   .30 to .39             Reasonably good item
.24 or lower       Difficult item    |   .20 to .29             Marginal item
                                     |   .19 and below          Poor item

Source: Dr. Natividad E. Lorenzo. Lecture handout on Item Analysis

 With regard to the discrimination index, an item with an index of .40 or higher is a
very good item, while an item with an index of .19 or below is a poor item.

 Example: For Item #1, DI = .14. The interpretation is "poor item."
This means that the item cannot discriminate or differentiate between
those who know and those who do not know the answer.
4.3 Improve the test using the following guide:
For the last column of Table 2 – Remarks / Action Taken, refer to Table 4.

Table 4. Remarks / Action Taken

Difficulty Index   Discrimination Index              Remarks / Action Taken
Easy               Very good                         Improve
Average            Very good, reasonably good        Retain
Difficult          Very good                         Improve
Easy               Reasonably good, marginal item    Improve
Average            Marginal item                     Improve
Difficult          Reasonably good, marginal item    Improve
Easy               Poor                              Discard
Average            Poor                              Improve
Difficult          Poor                              Discard

Source: Dr. Natividad E. Lorenzo. Lecture handout on Item Analysis
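
For checking purposes, the decision rules in Tables 3 and 4 can be expressed as a small lookup. The Python sketch below is illustrative only; it simply restates the two tables, and the function names are assumptions.

    # Illustrative sketch combining Table 3 and Table 4; names are assumptions.
    def interpret_difficulty(df):
        if df >= 0.76: return "Easy item"
        if df >= 0.25: return "Average item"
        return "Difficult item"

    def interpret_discrimination(di):
        if di >= 0.40: return "Very good item"
        if di >= 0.30: return "Reasonably good item"
        if di >= 0.20: return "Marginal item"
        return "Poor item"

    def action_taken(df, di):
        difficulty = interpret_difficulty(df)
        discrimination = interpret_discrimination(di)
        if discrimination == "Poor item":
            # Table 4: Average + Poor -> Improve; Easy/Difficult + Poor -> Discard
            return "Improve" if difficulty == "Average item" else "Discard"
        if difficulty == "Average item" and discrimination in ("Very good item", "Reasonably good item"):
            return "Retain"
        return "Improve"

    print(action_taken(0.36, 0.14))  # Item 1: Average item + Poor item -> Improve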


What is the proper action for each remark?


1. Items with a remark of “retain” are good items. They could be included in
the item bank or file of high quality items for future use.

2. Items with a remark of “improve” should be reworded and tried out again to
be included in the item bank.

3. Items with a remark of “discard” should be rejected and not included
in the final copy of the test.

Let us look at this particular example:

For Item number 1


DF = .36 average item
DI = .14 poor item
 The combination of an average item and a poor item calls for "improve." (Refer to Table
4)
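
For reference, the two values above come directly from the counts in Table 2 (U = 6, L = 4, T = 28):

    DF = (6 + 4) / 28 = 10 / 28 = .36 (rounded)
    DI = (6 - 4) / 14 = 2 / 14 = .14 (rounded)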


Self -Assessment Exercise 5

Let us check your understanding

Accomplish Table 2. Get the data for upper group and lower group from Table 1.

Table 2. Computed Difficulty and Discrimination Indices and their Descriptive
Interpretations

Item   Upper   Lower   Difficulty                     Discrimination                    Remarks /
No.    Group   Group   Index       Interpretation     Index           Interpretation    Action Taken
1      6       4       .36         Average item       .14             Poor item         Improve
2      9       4
3      9       1
4      11      8
5      13      7

Questions
1. Based on the numerical results, which item do you think is the easiest? ________

2. Which do you think is the most difficult? ________

3. Which item can least identify good performers from slow performers? _______


LESSONS

LESSON 4. CHARACTERISTICS OF ASSESSMENT METHODS

The quality of an assessment instrument is important since evaluation depends

on the values obtained using the instrument. After constructing good test items
based on the principles of test construction, we also need to know the
characteristics of assessment methods. So, the important question to ask is: What
are the characteristics of a good test? In this lesson, we will discuss the answer to
this question in detail.

CHARACTERISTICS OF A GOOD MEASURING INSTRUMENT

A good measuring instrument should have the following four characteristics:


validity, reliability, objectivity and usability.

A. VALIDITY
The validity of a test refers to the degree of accuracy with which it measures
what it aims to measure. It also refers to the appropriateness, correctness,
meaningfulness and usefulness of the specific decisions a teacher makes based on
the test results. The degree of validity is expressed numerically as a coefficient of
correlation with another test of the same kind and of known validity.
 It is the most important characteristic of a test.
 A valid test is always reliable

Factors Influencing the Validity of Test in General


1. Appropriateness of test. It should measure the abilities, skills and information it
is supposed to measure.
 The test items should be the topics covered in the class discussion.

2. Directions. It should be very clear. Indicate exactly how the learners should
answer and record their answers.


3. Reading vocabulary and sentence structures. Terminologies and words used


should be based on the intellectual level of maturity and background experience
of the learners.
4. Difficulty of Items. The test should have items that are neither too difficult nor too
easy, so that it can discriminate the bright pupils from the slow ones.
 Do not give test items that are too easy or too difficult.
5. Construction of Test Items. Items should not provide clues (otherwise the test
becomes a test of clue-hunting) and should not be ambiguous (otherwise it becomes a
test of interpretation).
6. Length of Test. The test should be of sufficient length for the grade level of the
learner so that it can measure what it is supposed to measure; it should not be so
short that it cannot adequately measure the performance we want to measure.
Example: The number of test items for a Grade 1 pupil is 10-20 items, for
Grade 6 about 50 items, and for a college student it can be 100 items.
7. Arrangement of Items. It should have items that are arranged in ascending level
of difficulty such that it starts with the easy items so that students will be
encouraged to pursue or continue the test.
 First few items should be easy. Essay or problem solving part of the
test should be at the last part of the test.
8. Patterns of Answers. It should not allow the creation of patterns in answering
the test.
 Once the student discovers the pattern, he/she will not read the items
anymore but will just follow the pattern.

TYPES OF VALIDITY
1. Content validity. This refers to the relevance of the test items of a test to the
subject matter or situation from which they are taken. It refers to the content
and format of the instrument.
 Sometimes called face validity or logical validity.
 Uses item analysis to determine its validity
 This could be done by asking at least 3 experts in the field to judge the
content.

Example 1 (on content): For instance, you are to make a test in
Algebra. For the test to have high content validity, all items should be
on algebra; if the test items are mostly on arithmetic, then the test has
low content validity.

Example 2 (on format): For a matching type of test, all test items
should be on one page; some items should not spill over to the next
page. Likewise, for a multiple-choice item, the stem should not be on
page 1 with the choices on page 2.


 This can be time-consuming on the part of the learner

and may even confuse him/her.

2. Concurrent Validity. This refers to the correspondence of the scores of a group


in a test with the scores of the same group in a similar test of already known
validity used as a criterion.
 Also called criterion-related validity in which the test is judged against a
specific criterion
Example: (How to evaluate the criterion-related validity of a test?)
Take a newly constructed test and a similar test of known validity and
give both tests to the same students. If the correlation is high, the test
has a high concurrent validity. This means that if the students get high
scores in the newly constructed test, they should also score high in the test
of known validity.

3. Predictive Validity. This refers to the degree of accuracy of how a test


predicts the level of performance in a certain activity which it intends to
foretell.

Example. Intelligence tests foretell the level of performance of students

in school work. If a student gets a high score in an intelligence test and
also gets high grades in school, the intelligence test has a high predictive
validity.

4. Construct validity . This refers to the agreement of test results with certain
characteristics which the test aims to portray.

Example 1: If students with high intellectual ability score higher than


students with lower intellectual ability, the test has a high construct
validity

Example 2: Grade 12 STEM students should score higher in a Gen Chem 2
test than Grade 12 HUMSS students or students in other academic tracks.
If the STEM students score low and students in other tracks score high,
then the test has very poor construct validity.

Methods of Validating a Test:


a. judgement by experts in the field ( Content Validity)
b. correlating the scores against a valid criterion (Concurrent or Criterion-
Related Validity)
c. computation of the percentage of students who got the answer right both in
the upper and lower groups
d. by factor analysis


B. RELIABILITY
The reliability of a test is the degree of consistency and accuracy of
measurement that an instrument gives. It is also called dependability or stability. The
degree of reliability is numerically expressed as a coefficient of correlation.
A reliability index of 0.50 or higher for a teacher-made test is already acceptable.
This could be determined using the Kuder-Richardson Formula 21.
 Reliability is a factor of validity - a test could not be valid without it
being reliable but not the reverse. It means two things: first, a test could
be reliable without it being valid and second, a valid test is reliable.

Example: (How to qualitatively determine the reliability of the test?)


If a test is given to the same students after a certain lapse of time
(e.g., 2 weeks) after the first take and the results of the two administrations
more or less agree, then the test is reliable.

Factors Affecting Reliability of a Test


1. Adequacy. It refers to the appropriate length of the test and the proper sampling
of the test content
 A test is adequate if it is long enough to contain a sufficient number of
representative items of the behaviour being measured
 Items should be made proportional from all topics in the subject matter
which is the subject of the test.
2. Testing condition. This refers to the conditions of the examination room like
lighting, ventilation, presence of too much noise and distractions and including
chairs and spaces.
3. Test administration procedures. The manner of administering a test also affects
its reliability. Explicit direction should accompany the test and these should be
followed strictly.
 Directions should be clear and easy to understand
 Test materials should be sufficient and available
4. Moderate in item difficulty. Reliability is increased when items are of moderate
difficulty because the scores spread over a greater range.
5. Objective scoring. Reliability is higher when test are scored objectively
6. Heterogeneity of the student groups. Reliability is higher when scores are
spread over a range of abilities since measurement error is smaller.
7. Time. A test is more reliable if speed is one of the factors.
 A test for one hour should be administered for 1 hour and not 2
hours.


Methods of Determining Test Reliability:


1. Test-retest method - the test is administered twice to the same group of
students, with a time interval between the two administrations.
 The results of the two administrations are correlated using Pearson r

2. Alternate-Form Method - give two forms of a test similar in content, type of


items, difficulty and others in close succession to the same group of students.
 Correlating results through Pearson r

3. Split-half Method – the test is administered once and the results are broken down into
halves, usually by an odd-even division
 Test scores are divided into odd-numbered items and even-numbered
items
 The two half-scores are correlated using Pearson's coefficient of correlation
(a computational sketch follows this list):

r_oe = (n∑XY - ∑X∑Y) / √[(n∑X² - (∑X)²)(n∑Y² - (∑Y)²)]

where X is a student's score on the odd-numbered items and Y is the score on
the even-numbered items.

 The reliability index of the whole test is then obtained with the Spearman-Brown
formula:

r_t = 2r_oe / (1 + r_oe)

4. Kuder-Richardson Formula 21 (KR-21)

o The test is administered only once, with the assumption that the items
are of equal difficulty
o Uses the formula:

KR21 = [k / (k - 1)] [1 - x(k - x) / (k·s²)]

Where: k – number of items
x – mean of the total scores
s – standard deviation of the total scores
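
A minimal computational sketch of the split-half and KR-21 procedures above, in Python. The item-response matrix is a placeholder (1 = correct, 0 = wrong) and the helper names are illustrative, not part of the module.

    # Minimal sketch of the split-half and KR-21 procedures; placeholder data only.
    import math
    import statistics

    responses = [            # one row per student, one column per item
        [1, 1, 0, 1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 0],
        [0, 1, 0, 0, 1, 0, 0, 0],
        [1, 1, 1, 0, 1, 1, 1, 1],
    ]

    def pearson_r(x, y):
        # Pearson r using the raw-score formula shown above
        n = len(x)
        sx, sy = sum(x), sum(y)
        sxy = sum(a * b for a, b in zip(x, y))
        sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
        return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

    # Split-half: odd-numbered items vs even-numbered items, then Spearman-Brown.
    odd_scores = [sum(row[0::2]) for row in responses]
    even_scores = [sum(row[1::2]) for row in responses]
    r_oe = pearson_r(odd_scores, even_scores)
    r_t = (2 * r_oe) / (1 + r_oe)          # Spearman-Brown whole-test reliability

    # KR-21: single administration, items assumed to be of equal difficulty.
    totals = [sum(row) for row in responses]
    k = len(responses[0])
    mean = statistics.mean(totals)
    var = statistics.pvariance(totals)      # population variance of the total scores
    kr21 = (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * var))

    print(round(r_t, 2), round(kr21, 2))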

C. OBJECTIVITY
This refers to the degree to which personal judgment is eliminated in the
scoring of the test.
 An objective test is one which yields the same score no matter who
checks it even when the test is checked at different times.
 Objectivity requires that personal opinion of the teachers does not
affect the score of the individual student.
 The more objective the test, the greater is its reliability.


 To make a test objective, make the responses to the items single


symbols, words or phrases.

D. USABILITY
This refers to the characteristics of administrability, scorability, economy,
comparability and test utility. A test is usable if it is easy to administer, easy to
score, economical, and if its results can be given meaning.

1. Administrability. This refers to the ease with which the test can be
administered; tests that are easy to administer are more in demand.
2. Scorability. This refers to the quality of a test that allows it to be scored
in the simplest way and in the quickest possible time.
 Tests that are easier to score are more in demand
 To facilitate scoring, directions should be clear and a separate
answer sheet should be provided.
3. Economy. This refers to the cheapest way of giving the test.
 Cheaper tests are more in demand
4. Comparability/interpretability. This refers to the availability of norms
with which scores of tests are compared to determine the meanings of
their scores.
 A test whose results can be readily, easily and properly interpreted is
more in demand
5. Utility. A test is useful if it adequately serves the very purpose for which
it is intended


Lesson References

Balagtas, M. U. and Dacana, A. G, Licensure Examination for Teachers Refresher Course,


Philippine Normal University

Burton, S. J. et al. 1991. How to Prepare Better Multiple Choice Test Items: Guidelines for
University Faculty. Brigham Young University Testing Services and The
Department of Instructional Science.

Buendicho, F.C. (2010). Assessment of Student Learning 1. Rex Bookstore Inc. Sta. Mesa
Heights, Quezon City

Gabuyo, Y.A (2015). Assessment of Learning 1. Rex Bookstore Inc. Sta. Mesa Heights,
Quezon City

Gutierrez, D. S (2007). Assessment of Learning Outcomes (Cognitive Domain) Book 1.


Kerusso Publishing House, Malabon, Metro Manila.

Lorenzo, Natividad E. 2011. Lecture on Traditional Assessment, Mariano Marcos State


University College of Teacher Education.

Okonkwo, C.A. 2006. Measurement and Evaluation. National Open University of Nigeria.
www.nou.edu.ng

Puzon and Legaspi. Lecture PowerPoint on Analogy Type of Test


https://www.slideshare.net/apuz2/analogy-type-of-testpptx-new
Date of Access 11/3/2020

Rivera, Arnel O. Test Construction: The Art of Effective Evaluation. UPHSD Molino Campus
(PPt, slideshare)

Santos, R. D. (2007). Assessment of Learning 1. Lorimar Publishing Inc., Cubao, Quezon City

