Module 3 - Designing and Developing Assessment Tools
MODULE 3
Designing and Developing Assessment Tools
MARIANO MARCOS STATE UNIVERSITY
College of Teacher Education
INTRODUCTION
We learnt in the last module that the four components of instruction (learning
objectives, teaching strategies, learning activities, and assessment) should be
constructively aligned. In this module, we will study how to design and develop assessment tools so
that test items are appropriately and accurately aligned with the learning objectives. The
importance of classroom tests and assessments in measuring students' performance
cannot be overstated. They are useful indicators of learning progress. The primary purpose
of classroom testing and assessment is to obtain accurate, valid, and relevant information
about students' progress (Mehrens and Lehmann, 1991). Only in this way is it possible.
Learning Outcomes
At the end of the unit, the pre-service teachers must have:
1. constructed a Table of Specifications;
2. constructed paper-and-pencil tests in accordance with the guidelines in test construction;
3. performed item analysis;
4. determined the quality of a test item by its difficulty index and discrimination index; and
5. explained the characteristics of assessment methods.
PRE-TEST
Before you browse through the module, you are required to take the pre-test. You
are encouraged to take the test with utmost honesty so that your teacher will be able to
provide the appropriate help that you need.
Answer the pre-test found at the end of this module. Scan your answer sheet and
send the file in the Assignment section.
LESSONS
Planning a test is an important aspect of the assessment process. If a teacher's goal
is to improve learning and instruction, he or she must take the necessary steps in planning
and implementing assessment instruments.
Let's take a closer look at the steps involved in creating effective assessment
tools. It is critical to follow these steps so that the test items created accurately
measure the intended learning outcomes and the teacher is able to measure
what needs to be measured. Consider the following points as you progress through each
step.
a. The Table of Specifications (TOS) ensures that the test contains a balance of items that
test lower-level thinking skills and items that test higher-level thinking skills (or,
alternatively, a balance of easy and hard items).
Without a TOS, a test constructor tends to take the shortest route and may
unintentionally focus on the lower levels of thinking skills. A TOS makes the
test constructor more aware of the need to broaden the coverage to include
higher-order cognitive abilities.
TOTAL 50 40 100
Source: Assessment of Learning in the Cognitive Domain by Danilo S. Gutierrez
Classification columns: Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation

Objective                                                      # of items (under its classification)   Total # of items
1. Identify synonyms and antonyms of common words.             3                                       3
2. Note explicit details in a paragraph.                       5                                       5
4. Classify ideas.                                             5                                       5
5. Analyze data from a table.                                  5                                       5
6. Arrange events in a story.                                  5                                       5
7. Infer details in passages that are not explicitly stated.   3                                       3
8. Make a 3-point outline.                                     5                                       5
9. Draw conclusions from given data.                           6                                       6
10. Distinguish between fact and opinion.                      8                                       8
TOTAL NO. OF ITEMS (per classification column, then overall): 8, 5, 10, 8, 11, 8; 50
Another format of the TOS is the two-way grid. The purpose of this table is to assure the
teacher that a particular test will adequately measure a representative sample of the
learning outcomes and content to be covered.
The two-way table indicates the objectives to be measured and their classification
according to Bloom's Taxonomy of Educational Objectives (Cognitive Domain), which
is not included in the one-way TOS.
Source: Lecture of Arnel O. Rivera – "Table of Specifications (TOS) with an overview on Test Construction"
How does it differ from the first example of two-way TOS?
In this example, two things are replaced: first, the test objectives are replaced
with topics, and second, the levels of Bloom's Taxonomy of Educational Objectives in the
Cognitive Domain are replaced with those of Lorin Anderson's revised taxonomy of
educational objectives in the cognitive domain.
Parts and their descriptions
1. Test Objectives or Topics: The test objectives, like the learning objectives, should be written in behavioral terms. They should be SMART (specific, measurable, attainable, realistic, and time-bound). They should also be aligned with the learning objectives.
2. Number of discussion days or number of hours per topic: This specifies the number of days or hours a teacher spent teaching a particular topic or attaining a particular objective. This can be determined by looking at the lesson plan.
3. Number of items (per objective or topic): This refers to the number of test items to be constructed per objective or topic. It is proportional to the number of discussion days. It is computed by using the formula given below:
5. Levels of Cognitive Domain: The test items for each objective or topic are classified based on the levels of the cognitive domain. If the test objective is at the knowledge level, then the test item to be constructed should likewise be classified under the knowledge (or remembering) level.
Let us try to use the formulas for the number of items and the percentage of items using a part of the
example of the one-way TOS.
For objective #1:
So, there should be four test items for the first test objective.
Percentage of items = (4 / 40) x 100
= 10%
This indicates that 10% of the total number of test items is from objective number 1.
As a wrap-up, preparing a table of specifications involves the following steps:
1. Determine the desired learning objectives to be measured.
2. Assign the corresponding number of items in terms of emphasis given during
instruction.
3. Create a table with relative weights by proportionally distributing the contents
throughout the cells of the table.
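The computation of the number and percentage of items per objective can be sketched in Python. This is only a minimal sketch: it assumes the usual proportional rule implied by the text (number of items = discussion days for the objective, divided by the total discussion days, times the total number of items), because the formula itself does not appear in this copy of the module. The function names and the sample figures (5 of 50 discussion days on a 40-item test) are illustrative assumptions, chosen so that the result matches the worked example of 4 items and 10%.

```python
def number_of_items(days_for_objective, total_days, total_items):
    """Allocate items to one objective in proportion to its discussion days.

    Assumed rule (not shown in the module): days / total days x total items,
    rounded to the nearest whole item.
    """
    return round(days_for_objective * total_items / total_days)


def percentage_of_items(items_for_objective, total_items):
    """Share of the whole test taken up by one objective, in percent."""
    return items_for_objective * 100 / total_items


# Hypothetical figures: 5 discussion days out of 50, on a 40-item test.
items = number_of_items(5, 50, 40)
percent = percentage_of_items(items, 40)
print(items, percent)  # 4 10.0
```

Because each count is rounded separately, the counts for all objectives may not add up exactly to the intended total; in that case the teacher would adjust one or two objectives by hand.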
Objective    # of discussion days    # of items    % of items    Item Placement
LESSONS
The fourth step in developing assessment tools is writing the draft test items.
The actual construction of the test items follows the TOS. Aside from observing that test
items should match the learning objectives, a teacher should be knowledgeable on the
principles of test construction in writing test items. According to one report, 13% of students
who got low grades in exams did so because of faulty test questions (WorldWatch, The
Philadelphia Trumpet, August 2005). Thus, it is essential for a teacher to follow the
principles of test construction. Doing so helps ensure the quality, validity, and reliability of test items.
In this lesson, we will learn the principles of constructing objective tests and
subjective (non-objective) tests.
A test is a measuring instrument used to quantify what our students have
learned from us. It reflects what we have taught them and how we have taught them. Figure
1 shows the types of objective test.
There are two general types of objective test items: supply type and selection
type. Supply-type items are the short-answer (identification) type and the
completion (fill-in-the-blank) type. Selection-type items are multiple
choice, matching type, true-false, arrangement, and analogy.
Before we take up the specific principles in constructing supply-type and selection-type
test items, let us look into the general suggestions in test item writing.
LESSONS
Advantages
1. The teacher could easily identify the test's or students' strengths and weaknesses.
2. Unbiased in the sense that teachers' preconceived notions about students' work
cannot influence marking.
3. Relatively useful as a pre-test
4. The answer key is unambiguous, so two observers arrive at the same conclusion.
Disadvantages
1. Most types are limited to factual recall
2. Its construction is time consuming.
3. It promotes guessing
4. It is not suitable for language skill testing.
Example:
1. How many centimeters make up 2 meters? 1. _________
2. Convert 7,000 grams to kilograms. 2. __________
3. It is the fundamental unit of an element. 3. __________
This format is applicable if the test question is also the answer sheet.
Advantages
1. Construction is relatively easy
2. Gives better diagnostic information: it is easy to see where students go wrong
3. It minimizes guessing.
4. The correct answer may not be as obvious as in a multiple choice question.
5. It may be appropriate for students who are unable to generate answers on their own.
Disadvantages
1. The understanding assessed is likely to be trivial.
2. It is difficult to avoid ambiguity when developing questions.
3. There may be several possible correct answers, so it is harder to mark
than multiple choice.
4. It might confuse students if the context is not obvious.
5. Scoring takes more time than for other objective test types.
When to Use
Use completion questions only when the recall of ideas and words is important.
Better: The wrong choices or options in a multiple – choice item are called
________.
3. Avoid using several blanks; a single blank is recommended.
4. Attend to the length and arrangement of blanks. Make the blanks of equal length.
Example
Poor: The ____ subatomic particle is the _______.
Better: The positively charged subatomic particle is the ________.
2. a) The wrong answer for the choices of a multiple choice is called a/an______.
b) The wrong answer for the choices of a multiple choice is called a______.
LESSONS
Advantages
1. Enables teachers to sample a variety of cognitive behaviors in a limited amount of time.
2. Scoring is simple and easy
Disadvantages
1. Problem solving and complex learning could not be measured by the alternate response test
2. Possibility of guessing is very high
When to Use
True-false items can be used to evaluate outcomes concerned with the recall of concepts or the ability to
discriminate. They can also be used to provide an encouraging lead-in to assessment.
True-false tests are better used for self-assessment and diagnostic assessment than in
summative exams because learners can guess the correct answer.
Example:
Poor: The Philippines gained independence in 1898 and thus marked its centennial
year in 2000.
2. Do not use the words “always,” “never,” “often,” and similar adverbs, because statements
containing them tend to be either always true or always false.
Example
Poor: True – False test items are always prone to guessing.
Statements containing the word always are almost always false. Even if
a student does not know anything about the test, he can easily guess
his way through it and get high scores.
3. Avoid long sentences because they tend to be "true." Keep your sentences short and
simple.
Example
Poor : Tests must be valid, reliable, and useful; however, ensuring that these test
characteristics are present would take a significant amount of time and
effort.
Take note that the statement is correct. However, we are unsure which
part of the sentence the student believes to be true. It is just a
coincidence that all parts of the sentence are true in this case.
Better: Test items should have the following characteristics - valid, reliable and
useful.
5. Avoid tricky statements with minor misleading word or spelling errors, misplaced
phrases, and so on. A wise student who is unfamiliar with the subject matter may
detect this strategy and thus correctly answer the question.
Example
Poor: The Mr. Albert P. Panadero is the Principle of our school .
The Principal’s name may actually be correct but since the word is
misspelled and the entire sentence takes a different meaning, the answer
would be false! This is an example of a tricky but entirely useless item.
2. A) The true-false item is more subject to guessing but it should be used in place of
multiple choice item, if well - constructed, when there is a dearth of distracters
that are plausible.
B) The true-false item should be used in place of the multiple choice item when
only two alternatives are possible.
5. Teachers tend to use multiple choice items more often because many items can
be prepared.
_________________________________________________________________
_________________________________________________________________
B. MULTIPLE-CHOICE TEST
This test is made up of items that consist of three or more plausible options in each
item. It consists of two parts:
1. the stem; and
2. the alternatives/options.
In the set of options there is a “correct” option while the others are considered
“distracters”.
Advantages
1. Used in measuring different kinds of content, almost any type of cognitive behavior,
including higher cognitive levels like reasoning, understanding and giving judgments
2. Easy to score
3. It limits scoring bias
4. Incorrect response pattern can be analyzed
5. Highly reliable test scores.
6. Scoring is efficient and accurate.
Disadvantages
1. Tends to focus on low level learning objectives
2. Development of a good test item is time consuming
3. It encourages guessing.
4. It is the most difficult and the most time-consuming type of test items to construct.
5. It places a high degree of dependence on the student's reading ability and
instructor's writing ability.
When to Use
Multiple-choice exams are appropriate when the attainment of the
educational objective can be measured by having the students select their
response from a list of several alternative responses.
The stem of the item is vague, ambiguous and broad. The stem is the
foundation of the item. After reading the stem, the student should know
exactly what the problem is and what he/she is expected to do to answer
it. The stem should adequately cover enough details to make it clear and
precise.
Better: Why is a table of specification primarily important before writing any test item?
A. It indicates how learning can be improved.
B. It adequately samples the behavior to be tested.
C. It specifies the weaknesses of student performance.
D. It reduces the amount of time to construct the items.
The improved stem presents one clear problem that the students will surely
understand. It is specific enough that the question can be answered even
without the options.
2. All relevant information should be included in the stem. Include all the information
necessary for the examinee to understand the intent of the item.
Example
Poor: When a piece of stone is dropped into the graduated cylinder, the water level
rose. What is the volume of the stone?
A. 0.6 mL B. 1.6 mL C. 32 mL D. 132 mL
One cannot compute the volume of the stone because numerical
values are not provided in the stem.
Better: When a piece of stone is dropped into the graduated cylinder, the water
level rose from 50 mL to 82 mL . What is the volume of the stone?
A. 0.6 mL B. 1.6 mL C. 32 mL D. 132 mL
3. All irrelevant materials should be omitted from the stem. Avoid the inclusion of
nonfunctional words. A word or phrase is nonfunctional when it does not contribute
to the basis for choosing a response.
Example
Poor: Pitong was walking in the park when he passed by a well. He wanted to
know how deep the well was so he picked up a stone and dropped the stone
into the well. What kind of force is acting on the falling stone?
A. electrical C. gravity
B. friction D. magnetism
The stem is wordy and contains information that is not important. For example,
“Pitong was walking in the park when he passed by a well” is not necessary.
Better: A stone was dropped into a deep well. What kind of force is acting on the
stone?
A. electrical B. friction C. gravity D. magnetism
4. The stem should be stated in positive form. If the negative form is used, emphasize
the fact by underlining, using italics or capitalizing it. Avoid using double negatives.
Example
Poor: Each of the following substances EXCEPT ONE is a mineral. Which one is not?
Avoid using double negative – except one and which one is not.
5. Place all information that can be placed in the stem to avoid repetition in the option.
Example
Poor: Substances expand when heated because
A. molecules move very fast. C. molecules move in all directions.
B. molecules move very slowly. D. molecules move farther from each other.
The phrase molecules move in the option can be placed in the stem to
avoid repetition.
Example
Poor: A word used to describe a noun is called an
A. a verb. B. a pronoun. C. an adjective. D. a conjunction.
7. If the stem is an incomplete statement, avoid (as much as possible) placing a blank
line in it, and end each of the alternatives with a period.
Example
Poor: Substances expand when heated because molecules move __________.
A. very fast. C. in all directions.
B. very slowly. D. farther from each other.
Example
Poor: Which one of the following animals is most clearly in danger of extinction?
A. mackerel B. monkey-eating eagle C. sampaguita D. tamaraw
2. The options should be related but must not be synonymous or similar in meaning.
Example
Poor: Sliding in the bathroom can be prevented by wearing
A. sandals with even soles. C. sandals with soapy water.
Options A and B are similar in meaning. If the soles are even, the
description implies that the soles are smooth. If two options are similar in
meaning, the student can eliminate them as a possible answer.
Better: Sliding in the bathroom can be prevented by wearing sandals with
A. grease. C. soapy water.
B. rough soles. D. smooth soles.
3. The key (correct option) should be of the same length as the distractors to avoid giving
a clue.
Example
Poor: One problem met by scientists about cloning animals is that cloned animals.
A. get old fast. C. do not reproduce.
B. remain young. D. do not live as long as uncloned animals do.
Most of the time, the correct answer tends to be longer than the
distracters. Smart students can often detect this.
Numerous studies have indicated that items are easier when the answer is
noticeably longer than the distracters than when all of the
alternatives are similar in length (Haladyna & Downing, 1986, as cited by
Burton et al., 1991).
4. Make all the options grammatically consistent and parallel in form with the stem of the
item.
Example
Poor: How is the movement of bones made possible?
A. By pushing the skeletal muscle C. Muscles and bones are combined
B. By pulling the skeletal muscles D. Muscle pushes the other muscles
Options C and D are not parallel in form with the correct answer. They
must be revised.
Better: A. By pushing the skeletal muscle C. By pulling and pushing the muscles
B. By pulling the skeletal muscles D. By combining the muscles and bones
Example
Poor: Substances expand when heated because molecules move
A. in all directions. C. very slowly.
B. very fast. D. farther from each other.
8. Put the term to be defined in the stem and suggest various definitions in the alternatives
or options.
Example
Poor: The tendency of particles to move from greater concentration to lesser
concentration is called
A. diffusion. B. evaporation. C. osmosis. D. transpiration.
9. Avoid using ALL OF THE ABOVE and NONE OF THE ABOVE as one of the alternatives
Example
Poor: Which is an example of a mixture?
A. air C. seawater
B. juice D. all of the above
OR if all choices are correct, then you can follow the format below:
The matching item is a selection-type item consisting of a series of stimuli (or stems)
called premises and a series of options called responses. The premises and responses are
arranged in columns. Usually the premises are placed in the left column (Column A) while
the responses are placed in the right column (Column B). The directions provide the basis for
matching.
Accomplishments Persons
Definitions Terms/Phrases
Examples Rules/Principles
Advantages
1. Valuable in content areas that have a lot of facts
2. Easy to score
3. Relatively easy to construct
4. Measures primarily associations and relationships as well as sequence of events
Disadvantages
1. Not very effective in measuring higher order skills
2. Time consuming for students
When to Use
They are used when you need to measure the learner’s ability to identify the
relationships or association between similar items.
3. There should be more responses than premises. For example, if there are five
premises, there should be six responses.
The uneven match between premises and responses reduces guessing.
Match the function of the part of computer in Column A with its name in Column
B. Write the letter of your choice before the number.
Match the descriptions of the parts of the digestive system in Column A with the
digestive organs being described in Column B. Write the letter of the correct answer
on the blank before each number in Column A.
Column A                                                              Column B
_____ 1. coil-like structure where food is digested and absorbed      a. esophagus
_____ 2. tube-like structure where food is swallowed                  b. large intestine
_____ 3. muscular tube which collects undigested food                 c. liver
_____ 4. pear-shaped organ where food is churned                      d. mouth
_____ 5. an opening where food is digested                            e. small intestine
                                                                      f. stomach
Match the capitals in Column B with the provinces of the Philippines in Column A by writing
the letter of the capital in the blank provided before the corresponding province.
D. ARRANGEMENT TYPE
This type consists of multiple-option items that require the examinee to put things in
chronological, logical, rank, or some other order. The test consists of ordering and
assembling items on some basis.
It is used to test knowledge of sequence and order. Ordering measures memory of
relationships and concepts of organization, while assembly measures mechanical ability
and the ability to figure out spatial relationships. This type of test can be a
multiple-choice item.
Example 1
The sentences numbered 1 to 5 below make up one paragraph. Read the
sentences and arrange them in the best order to form a complete and
well-organized paragraph. Choose the best order from the options.
Example 2
Arrange the planets in the solar system according to their nearness to the sun by
using numbers 1,2,3….
___ Earth
___ Venus
___ Uranus
___ Mars
___ Jupiter
___ Mercury
___ Neptune
___ Saturn
E. ANALOGY
This type is made up of items consisting of a pair of words which are related to each
other. It is a comparison between two things that are usually thought to be different from
each other but have some similarities.
It is designed to measure the ability of students to observe the pair relationship
between paired words or concepts. This type of test can be a multiple choice item.
Advantages
1. They are valuable tools in conceptual change learning
2. They provide visualization and understanding of the abstract by pointing to
similarities in the real world.
3. They may incite pupils' interest and hence have a motivational effect.
4. They force the teacher to take into consideration pupils' prior knowledge and may
reveal misconceptions in previously taught topics.
2. Cause and Effect Relationship – The similarity in this type derives from the cause on
one side and its indisputable effect on the other side.
3. Part – Whole Relationship - The analogy of this type is based on whether the item is
a member of the same group or category.
5. Action to Object Relationship - The analogy is based on two sets of performers and
their corresponding actions
6. Object to Action Relationship – The analogy is based on the action that would be
done to a particular object.
10. Degree Relationship - It is a relationship in which one is more intense than the
other.
I. Evaluate the following multiple choice test items. Choose from the options
given below. Write the letter only.
1. Luis is twelve years old. How many orbits around the sun has the earth made
since he was born?
A. 12 B. 30 C. 52 D. 365
4. A reading teacher wants to find out if the pupils are ready to move on to the next
lesson. What kind of test should she give?
A. diagnostic B. formative C. placement D. summative
5. A gumamela is a
A. complete flower. C. pistillate flower.
B. incomplete flower. D. staminate flower.
II. Evaluate the matching type test item given below. Identify the guidelines that
are violated.
Column A Column B
LESSONS
A. ESSAY
Essays, classified as subjective tests or non-objective tests, allow for the assessment
of higher order thinking skills. Such tests require students to organize their thoughts on a
Note that all these involve the higher-level skills mentioned in Bloom’s Taxonomy.
Types of Essay
1. Restricted-response essay
Restricted-response items limit the ways in which students are permitted to answer.
There ARE correct answers, and students are allowed to express the answer in their own
words.
The restricted-response essay is predicated on the notion that students supply the answers
rather than select the answer from a group of options.
Advantages
1. It measures the higher level of knowledge – ability to interpret, evaluate, apply
principles, create, organize thoughts and ideas, compare and contrast, etc.
2. Helps students organize their thoughts and ideas logically: they get practice in
topical organization and discussion
3. Easy to prepare
4. Can be used in any subject
5. Harder to cheat than objective type
Disadvantages
1. There is difficulty in scoring as giving of the right weight to each question is difficult.
It cannot be mechanically scored.
2. Its usability, validity and reliability are low
3. Sampling is limited as only few questions could be included in the test
4. Scoring is subjective
5. Standards of excellence vary from teacher to teacher
6. The physical and mental condition of the checker affects the scoring
7. Irrelevant factors like grammatical errors, poor penmanship, language difficulty,
wrong spelling and the like adversely affect the scoring
4. Make it clear to students if spelling, punctuation, content, clarity and style are to be
considered in scoring the essay questions. When these criteria are clear and specific to
students, the item becomes valid.
5. Grade each essay question by the point method, using well-defined criteria.
By using certain criteria as guide, scoring essay questions becomes less subjective
and more objective.
Score Description
Source: Danilo S. Gutierrez, Assessment of Learning Outcomes in the Cognitive Domain, Book 1
6. Evaluate all of the students’ responses to one question before going on to the next question.
Scoring an essay test question by question rather than student by student maintains
a uniform standard for evaluating the answers to each question.
B. INTERPRETATIVE TEST
This type of test is often used in testing higher cognitive behavior. It may involve
analysis of maps, figures, charts and even comprehension of written passages.
3. Written passages should be as brief as possible. The exercise should not be a test of
general reading ability.
4. The students have to interpret, apply, analyze and comprehend in order to answer a
given question in the exercise.
III. Performance Tests. A performance test consists of a list of behaviors that make up a type of performance.
It is used to determine whether or not an individual behaves in a certain way when
asked to complete a particular task.
These behaviors are not measured through the usual paper-and-pencil tests; responses are
shown through overt manual, vocal, and other behavioral activities.
Examples are visual arts, music, dramatics, speech, TLE, PE, military training,
sports, and the like.
When a behavior is present while an individual is being observed, the teacher
places a check opposite it on the list.
The activities measured are intangible and tangible finished products:
Intangible finished product – the actual process of performance, after which it is no
longer observable (renditions of vocal and instrumental music, public speaking,
dance, gymnastics, etc.)
Tangible finished product – a concrete object or article produced by the performer
(baskets, dresses, cross stitch, etc.)
Some common performance tests are:
o Recognition – it may be simple recognition, where pupils are
required to identify the tools described, or a task in which students
pick out, from the materials given, those they need and use them to
perform the task.
o Simulated – used as a substitute for expensive operations like military
operations
o Work sample – not expensive, easy to prepare, and practical
Advantages
1. Performance tests allow teachers to assess areas of learning that traditional
assessments do not address.
2. Good instructional alignment: instructional alignment means that teachers test what
they teach.
3. Performance tests usually involve real-world tasks, which students tend to find
more engaging and challenging.
4. Performance tests provide high-quality feedback to students throughout the
assessment because they have a formative component.
5. Because performance tests are linked with instruction, the two can be accomplished
simultaneously, thus increasing instructional efficiency.
6. Performance tests can empower students by giving them freedom to make choices,
within parameters set by teachers, about the direction that their learning should
take.
7. Performance tests prompt students to use higher-order thinking skills such as
analysis, synthesis, and evaluation.
Disadvantages
1. Performance test items are frequently unrelated to the tasks and behaviors
required in the classroom setting.
2. Performance test results reflect behavior or ability measured at a
single point in time and, as such, are greatly influenced by noncognitive factors.
3. Performance test results do not provide the type of information required for making
curricular modifications or instructional changes.
LESSONS
In the previous lessons, we discussed how to plan a test and how to write good test
items. In this lesson, we will consider how to improve test items using an appropriate
procedure: item analysis.
1. It provides useful information for efficient class discussion of the test results.
2. It provides data that help students improve their learning by identifying the areas
of weakness that need remediation.
3. It provides a basis for increased skills in test construction.
4. It helps identify structural or content defects of the items.
5. It helps detect learning difficulties of the class.
6. It provides a basis for constructing a test bank.
4. Validation
Lastly, the final draft is subjected to validation if it is intended to be
used as a standard test for a particular unit or grading period.
There are three common types of quantitative item analysis, which provide
teachers with three different types of information about individual test items. These
are the difficulty index, the discrimination index, and response options analysis.
1. Difficulty Index is the proportion of students/test takers who answered the item
correctly or the percentage of students who answered each item correctly. It is
computed by using the formula:
2. Discrimination Index is the extent to which a test item can differentiate between
good performers and poor performers. It tells whether an item can discriminate
between those who know and those who do not know the answer.
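Written out, the two definitions above amount to the following expressions. This is a sketch under stated assumptions: the symbols p, D, R, N, R_U, R_L and n are labels chosen here for illustration, and the upper-lower form of the discrimination index is the commonly used convention with equal-sized groups, assumed rather than taken from this module.

```latex
% Difficulty index: proportion of test takers who answered the item correctly
%   R = number of students who answered the item correctly
%   N = total number of students who took the item
p = \frac{R}{N}

% Discrimination index (upper-lower method, equal group sizes assumed)
%   R_U, R_L = number correct in the upper and lower groups
%   n        = number of students in each group
D = \frac{R_U - R_L}{n}
```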
Example
Let us assume that there are 52 papers.
Third, count 14 papers from the lowest score up to the 14th paper.
This is the lower group.
Fourth, set aside the remaining 24 papers (52 less the 14 papers in each of the upper
and lower groups). This is the middle group.
They are not included in the item analysis.
Note: If the sample size is less than 50, the 50% grouping is used.
Example: Sample size = 30 papers
50% = 15 papers in the upper group and 15 papers in the lower group
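The grouping rule in this example can be sketched as a small function. This is an illustration only: the name `group_size`, the rounding of 27% to the nearest whole paper, and the use of integer division for an odd number of papers under the 50% rule are assumptions, not part of the module.

```python
def group_size(n_papers, cutoff=50, fraction=0.27):
    """Return how many papers go into EACH of the upper and lower groups.

    Assumption: with fewer than `cutoff` papers the 50% grouping is used;
    otherwise each group takes 27% of the papers, rounded to the nearest paper.
    Papers not placed in either group form the middle group.
    """
    if n_papers < cutoff:
        return n_papers // 2          # 50% grouping
    return round(n_papers * fraction)  # 27% grouping


print(group_size(52))  # 14 papers in each group
print(group_size(30))  # 15 papers in each group
```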
3. Tabulate the choices or answers of the students in the upper and lower groups.
3.1 For each paper, tally the answer given for every item number. Then summarize
the frequencies in a table similar to Table 1.
3.2 For each item, highlight the number of students who answered the item correctly.
Item No. | Upper group: A B C D omit Total | Lower group: A B C D omit Total
1 5 6 2 1 0 14 4 4 4 2 0 14
2 5 9 0 1 9 4 0 14
3 4 9 1 0 14 5 1 1 6 1 14
4 3 11 0 14 5 1 8 0 14
5 13 1 0 3 7 1 2 1 14
Double-check the entries: the total should be equal to 14 for both the upper and
lower groups.
Tally under omit if the student did not answer an item.
4. Compute the indices of difficulty and discrimination for each item.
The index of difficulty tells the percentage of students who got the item right.
The index of discrimination tells the difference between the number of students in
the upper and lower groups who got the item right.
4.1 Write the frequency of those who answered correctly in the upper 27% and the
lower 27% in a table similar to Table 2.
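The computation in step 4 can be sketched as follows. The function name `item_indices` and the counts in the usage line are hypothetical. Dividing the upper-lower difference by the group size, so that the discrimination index is expressed as a proportion, is a common convention that is assumed here rather than taken from this module.

```python
def item_indices(upper_correct, lower_correct, group_size):
    """Return (difficulty, discrimination) for one test item.

    Assumes the upper and lower groups have the same number of papers.
    """
    # Difficulty: share of students in both groups who answered correctly.
    difficulty = (upper_correct + lower_correct) / (2 * group_size)
    # Discrimination: upper-lower difference, scaled by the group size.
    discrimination = (upper_correct - lower_correct) / group_size
    return difficulty, discrimination


# Hypothetical item: 10 of 14 upper-group and 4 of 14 lower-group students
# answered correctly.
difficulty, discrimination = item_indices(10, 4, 14)
print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")
# difficulty = 0.50, discrimination = 0.43
```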
4.2 Interpret the computed indices using the values presented by Ebel and
Frisbie (1986).
With regard to the discrimination index, an item with an index of 0.40 (40%) or higher
is a very good item; an item with an index of 0.19 (19%) or lower is a poor item.
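The two cut-offs stated above can be sketched as a simple check, with the index expressed as a proportion (0.40 and 0.19). The function name `rate_discrimination` is hypothetical, and the label "intermediate" is only a placeholder for values between the two cut-offs, whose interpretation is not covered by this sketch.

```python
def rate_discrimination(index):
    """Rate an item using only the two cut-offs stated in the text."""
    if index >= 0.40:
        return "very good"
    if index <= 0.19:
        return "poor"
    # Placeholder label (assumption): values between the two cut-offs.
    return "intermediate"


print(rate_discrimination(0.45))  # very good
print(rate_discrimination(0.10))  # poor
```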
2. Items with a remark of “improve” should be reworded and tried out again to
be included in the item bank.
Accomplish Table 2. Get the data for upper group and lower group from Table 1.
Item No. | Upper group | Lower group
2 | 9 | 4
3 | 9 | 1
4 | 11 | 8
5 | 13 | 7
Questions
1. Based on the numerical results, which item do you think is the easiest? ________
3. Which item can least distinguish good performers from poor performers? _______
LESSONS
A. VALIDITY
The validity of a test refers to the degree of accuracy with which it measures
what it aims to measure. It also refers to the appropriateness, correctness,
meaningfulness, and usefulness of the specific decisions a teacher makes based on
the test results. The degree of validity is expressed numerically as a coefficient of
correlation with another test of the same kind and of known validity.
It is the most important characteristic of a test.
A valid test is always reliable.
2. Directions. They should be very clear. Indicate exactly how the learners should
answer and record their answers.
TYPES OF VALIDITY
1. Content validity. This refers to the relevance of the test items to the
subject matter or situation from which they are taken. It refers to the content
and format of the instrument.
Sometimes called face validity or logical validity.
Uses item analysis to determine its validity.
This could be done by asking at least 3 experts in the field to judge the
content.
Example 2 (on format): For a matching type of test, all test items
should be on one page; some items should not spill over onto
another page.
Also, for a multiple-choice item, the stem should not be on page 1
while the choices are on page 2.
4. Construct validity. This refers to the agreement of test results with certain
characteristics which the test aims to portray.
Example 2: Grade 12 STEM students, rather than Grade 12 HUMSS students or those in
other academic tracks, should get high scores in Gen Chem 2. If STEM students get
low test scores and students in other tracks get high scores, then the test has very
poor construct validity.
B. RELIABILITY
The reliability of a test is the degree of consistency and accuracy of
measurement that the instrument gives. It is also called dependability or stability.
The degree of reliability is numerically expressed as a coefficient of correlation.
A reliability index of 0.50 or higher for a teacher-made test is already acceptable.
This could be done using Kuder-Richardson Formula 21.
Reliability is a factor of validity: a test cannot be valid without
being reliable, but the reverse does not hold. This means two things: first, a test
can be reliable without being valid, and second, a valid test is reliable.
3. Split-half Method – the test is administered once and the results are split into
halves, as in the odd-even division.
Test scores are divided into scores on odd-numbered items and scores on
even-numbered items.
Uses Pearson's correlation coefficient formula.
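$$ r_{oe} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$

$$ r_{t} = \frac{2\,r_{oe}}{1 + r_{oe}} $$

$$ KR_{21} = \left[\frac{k}{k-1}\right]\left[1 - \frac{\bar{x}\,(k - \bar{x})}{k\,s^{2}}\right] $$

Where: k – number of items
$\bar{x}$ – mean
s – standard deviation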
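The three formulas above can be sketched in code as follows. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration. Here x and y are taken to be each student's scores on the odd-numbered and even-numbered halves, which is an assumption based on the odd-even division described above.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation between two lists of half-test scores (r_oe)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def whole_test_reliability(r_oe):
    """Step the half-test correlation up to the whole test (r_t)."""
    return 2 * r_oe / (1 + r_oe)


def kr21(k, mean, sd):
    """Kuder-Richardson Formula 21 from item count, mean, and standard deviation."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))


# Hypothetical values for illustration only.
print(round(whole_test_reliability(0.60), 2))   # 0.75
print(round(kr21(k=50, mean=30, sd=8), 2))      # 0.83
```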
C. OBJECTIVITY
This refers to the degree to which personal judgement is eliminated in the
scoring of the test.
An objective test is one which yields the same score no matter who
checks it, even when the test is checked at different times.
Objectivity requires that the personal opinion of the teacher does not
affect the score of the individual student.
The more objective the test, the greater is its reliability.
D. USABILITY
This refers to the characteristics of administrability, scorability, economy,
comparability, and test utility. A test is usable if it is easy to administer, easy to
score, and economical, and if its results can be given meaning.
Lesson References
Burton, S. J. et. al. 1991. How to Prepare Better Multiple Choice Test Items: Guidelines for
University Faculty. Brigham Young University Testing Services and The
Department of Instructional Science.
Buendicho, F.C. (2010). Assessment of Student Learning 1. Rex Bookstore Inc. Sta. Mesa
Heights, Quezon City
Gabuyo, Y.A (2015). Assessment of Learning 1. Rex Bookstore Inc. Sta. Mesa Heights,
Quezon City
Rivera, Arnel O. Test Construction: The Art of Effective Evaluation. UPHSD Molino Campus
(PPt, slideshare)
Santos, R D (2007). Assessment of Learning 1. Lorimar Publishing Inc. Cubao, Quezon Cit