Chapter 3
Designing and Developing Assessment Tools
Time Allotment: 12 hours (Week 6-9)
(Week 10- devoted to Midterm Examination)

Qualities/Characteristics Desired in an Assessment Instrument


Major Characteristics
A. Validity – the degree to which a test measures what it is supposed or intended to
measure. It is the usefulness of the test for a given purpose. It is the most important
quality/characteristic desired in an assessment instrument.
B. Reliability – refers to the consistency of measurement, that is, how consistent test
scores or other assessment results are from one measurement to another. It is the
most important characteristic of an assessment instrument next to validity.

Minor Characteristics
A. Administrability – the test should be easy to administer: the directions
should clearly indicate how a student should respond to the test/task items and
how much time he/she should spend on each test item or on the whole test.
B. Scoreability – the test should be easy to score: the directions for scoring are
clear and the point/s for each correct answer is/are specified.
C. Interpretability – test scores can easily be interpreted and described in terms of
the specific tasks that a student can perform or his/her relative position in a clearly
defined group.
D. Economy – the test should be given in the cheapest way in terms of the time and effort
spent on administration, and answer sheets should be provided so that the test
can be given from time to time.

Factors Influencing the Validity of an Assessment Instrument

1. Unclear directions – directions that do not clearly indicate to the students how
to respond to the tasks and how to record the responses tend to reduce validity.
2. Reading vocabulary and sentence structure too difficult – vocabulary and
sentence structure that are too complicated for the students turn the task into an
assessment of reading comprehension, thus altering the meaning of the assessment
results.
3. Ambiguity – ambiguous statements in assessment tasks contribute to
misinterpretations and confusion. Ambiguity sometimes confuses the better
students more than it does the poor students.
4. Inadequate time limits – time limits that do not provide students with enough
time to consider the tasks and provide thoughtful responses can reduce the validity
of interpretations of results. Rather than measuring what a student knows about a
topic or is able to do given adequate time, the assessment may become a measure of
the speed with which the student can respond. For some content (e.g. a typing test),
speed may be important. However, most assessments of achievement should
minimize the effects of speed on student performance.
5. Overemphasis of easy-to-assess aspects of the domain at the expense of
important but hard-to-assess aspects (construct underrepresentation) –
it is easy to develop test questions that assess factual recall and generally harder to
develop ones that tap conceptual understanding or higher-order thinking processes
such as the evaluation of competing positions or arguments. Hence, it is important
to guard against underrepresentation of tasks getting at the important but more
difficult-to-assess aspects of achievement.
6. Test items inappropriate for the outcomes being measured- attempting to
measure understanding, thinking skills, and other complex types of achievement
with test forms that are appropriate only for measuring factual knowledge will
invalidate the results.
7. Poorly constructed test items – test items that unintentionally provide clues to
the answer tend to measure the students’ alertness in detecting cues as well as
mastery of skills or knowledge the test is intended to measure.
8. Test too short – if a test is too short to provide a representative sample of the
performance we are interested in, its validity will suffer accordingly.
9. Improper arrangement of items – test items are typically arranged in order of
difficulty, with the easiest items first. Placing difficult items first in the test may
cause students to spend too much time on these and prevent them from reaching
items they could easily answer. Improper arrangement may also influence validity by
having a detrimental effect on student motivation.
10. Identifiable pattern of answers – placing correct answers in some systematic
pattern (e.g. T, T, F, F or B, B, B, C, C, C, C, D, D, D) enables students to guess the
answers to some items more easily, and this lowers validity.
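
One way to screen an answer key for the patterns described in item 10 is sketched below. This is only an illustration: the function names are hypothetical, and the two checks (longest streak of the same answer, and how often each option is keyed) are assumptions about what counts as a pattern, not a standard procedure. The first key is the patterned example from item 10; the second is the 15-item multiple-choice key shown later in this chapter.

```python
from collections import Counter
from itertools import groupby


def longest_run(key):
    """Length of the longest streak of identical consecutive answers."""
    return max((len(list(group)) for _, group in groupby(key)), default=0)


def option_counts(key):
    """How many times each option is the keyed answer."""
    return dict(Counter(key))


# Patterned key quoted in item 10 (B,B,B,C,C,C,C,D,D,D).
patterned = list("BBBCCCCDDD")
# The 15-item key used later in the chapter as a well-mixed example.
mixed = list("baacdccbdbdabcd")

print(longest_run(patterned), option_counts(patterned))
print(longest_run(mixed), option_counts(mixed))
```

A long streak or a very uneven count is only a prompt to reshuffle the options. Strictly alternating keys (T, F, T, F) would pass both checks, so the key should still be inspected by eye.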

Improving Test Reliability

Several test characteristics affect reliability. They include the following:

1. Test length – in general, a longer test is more reliable than a shorter one because
longer tests sample the instructional objectives more adequately.
2. Spread of scores – the type of students taking the test can influence reliability. A
group of students with heterogeneous ability will produce a larger spread of test
scores than a group with homogeneous ability, and a larger spread of scores tends to
yield a higher reliability estimate.
3. Item difficulty – in general, tests composed of items of moderate or average
difficulty (0.30 to 0.70) tend to be more reliable than those composed
primarily of easy or very difficult items.
4. Item discrimination – in general, tests composed of more discriminating items
will have greater reliability than those composed of less discriminating items.
5. Time limits- adding a time factor may improve reliability for lower-level cognitive
test items. Since all students do not function at the same pace, a time factor adds
another criterion to the test that causes discrimination, thus improving reliability.
Teachers should not, however, arbitrarily impose a time limit. For higher-level
cognitive test items, the imposition of a time limit may defeat the intended purpose
of the items.
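
The effect of test length in item 1 can be illustrated with the Spearman-Brown prophecy formula. The formula is not named in this module and is brought in here only as an illustration. It assumes the added items are of the same quality as the existing ones, and the starting reliability of 0.60 for a 20-item test is a made-up figure.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test length is multiplied by length_factor.

    Assumes the added (or removed) items are comparable to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


r_20_items = 0.60  # hypothetical reliability of a 20-item test

for factor in (0.5, 1, 2, 3):
    items = int(20 * factor)
    print(f"{items:>3} items -> predicted reliability {spearman_brown(r_20_items, factor):.2f}")
```

Under these assumptions, doubling the test to 40 items raises the predicted reliability from 0.60 to 0.75, while halving it lowers the estimate.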

Test

➢ It is an instrument or systematic procedure which typically consists of a set of
questions for measuring a sample of behavior.
➢ It is a special form of assessment made under contrived circumstances especially so
that it may be administered.
➢ It is a systematic form of assessment that answers the question, “How well does the
individual perform, either in comparison with others or in comparison with a
domain of performance tasks?”
➢ An instrument designed to measure any quality, ability, skill or knowledge.

Purposes/Uses of Tests

✓ Instructional Uses of Tests


• grouping learners for instruction within a class
• identifying learners who need corrective and enrichment experiences
• measuring class progress for any given period
• assigning grades/marks
• guiding activities for specific learners (the slow, average, fast)
✓ Guidance Uses of Tests
• assisting learners to set educational and vocational goals
• improving teacher, counselor and parents’ understanding of children
with problems
• preparing information/data to guide conferences with parents about
their children
• determining interests in types of occupations not previously
considered or known by the students
• predicting success in future educational or vocational endeavor
✓ Administrative Uses of Tests
• determining emphasis to be given to the different learning areas in
the curriculum
• measuring the school progress from year to year
• determining how well students are attaining worthwhile educational
goals
• determining appropriateness of the school curriculum for students of
different levels of ability
• developing adequate basis for pupil promotion or retention

Classification of Tests According to Format

I. Standardized Tests – tests that have been carefully constructed by experts in
the light of accepted objectives.
(Achievement Tests, Diagnostic Tests, Intelligence Tests)
1. Ability Tests – combine verbal and numerical ability, reasoning and
computations.
Ex. OLSAT (Otis-Lennon School Ability Test)
2. Aptitude Tests – tests which measure potential in a specific field or area;
predict the degree to which an individual will succeed in any given area
such as art, music, mechanical task or academic studies.
Ex. DAT – Differential Aptitude Test

II. Teacher-made Tests – tests constructed by the classroom teacher which
measure and appraise student progress in terms of specific
classroom/instructional objectives.
1. Objective Type – answers are in the form of a single word, phrase, symbol, or
number, or are selected from a number of alternatives or choices.
a. Limited Response Type
i. Multiple Choice Test – consists of a stem followed by three to five
alternatives or options, of which only one is correct or definitely better
than the others. The correct option in each item is called the answer, and
the rest of the alternatives are called distracters, decoys, or foils.
ii. True-False or Alternative Response –consists of declarative
statements that one has to respond or mark true or false, right or
wrong, correct or incorrect, yes or no, fact or opinion, agree or disagree
and the like. It is a test made up of items which allow dichotomous
responses.
iii. Matching Type – consists of two parallel columns with each word,
number, or symbol in one column being matched to a word, sentence,
or phrase in the other column. The items in Column I or A for which a
match is sought are called premises, and the items in Column II or B
from which the selection is made are called responses.
b. Free Response Type or Supply Test- requires the student to supply
or give the correct answer.
i. Short answer – uses a direct question that can be answered by a
word, phrase, number or symbol.
ii. Completion Test – consists of an incomplete statement that can
also be answered by a word, phrase, number, or symbol
2. Essay Type – essay questions provide freedom of response that is needed
to adequately assess students’ ability to formulate, organize, integrate and
evaluate ideas and information or apply knowledge and skills.
a. Restricted Essay – limits both the content and the response.
Content is usually restricted by the scope of the topic to be discussed.
b. Extended Essay – allows the students to select any factual
information that they think is pertinent, to organize their answers in
accordance with their best judgment, and to integrate and evaluate
ideas which they think appropriate.

Other Classification of Tests

➢ Psychological Tests – aim to measure students’ intangible aspects of behavior,
i.e. intelligence, attitudes, interests and aptitude.
➢ Educational Tests – aim to measure the results/effects of instruction.
➢ Survey Tests – measure general level of student’s achievement over a broad range
of learning outcomes and tend to emphasize norm-referenced interpretation.
➢ Mastery Tests – measure the degree of mastery of a limited set of specific
learning outcomes and typically use criterion referenced interpretations.
➢ Verbal Tests – verbal test is one on which words are very necessary and the
examinee should be equipped with vocabulary in attaching meaning to or
responding to test items.
➢ Non-verbal Tests – one on which words are not that important, student responds
to test items in the form of drawings, pictures or designs.
➢ Standardized Tests – constructed by a professional item writer, cover a large
domain of learning tasks with just a few items measuring each specific task.
Typically, items are of average difficulty, with very easy and very difficult
items omitted; emphasize discrimination among individuals in terms of relative level of
learning.
➢ Teacher-made Tests – constructed by a classroom teacher, focus on a
limited domain of learning tasks with a relatively large number of items measuring
each specific task. Match item difficulty to the learning tasks, without altering
item difficulty or omitting easy or difficult items; emphasize description of what
learning tasks students can and cannot do/perform.
➢ Individual Tests – administered on a one-to-one basis using careful oral
questioning.
➢ Group Tests – administered to a group of individuals; questions are typically
answered using the paper-and-pencil technique.
➢ Objective Test – one on which equally competent examinees will get the same
scores, e.g. multiple-choice test
➢ Subjective Test- one on which the scores can be influenced by the
opinion/judgment of the rater, e.g. essay test.
➢ Power Tests – designed to measure level of performance under sufficient time
conditions, consist of items arranged in order of increasing difficulty.
➢ Speed Tests- designed to measure the number of items an individual can
complete in a given time, consists of items approximately of the same level of
difficulty.

Stages in the Development and Validation of an Assessment Instrument

Phase I- Planning Stage

1. Specify the objectives/skills and content areas to be measured.
2. Prepare the Table of Specifications
3. Decide on the item format-short answer
form/multiple choice, etc.

Phase II- Test Construction/Item Writing Stage

1. Writing of test items based on the table of specifications
2. Consultation with experts-subject teacher/
test expert for validation (content) and
editing

Phase III- Test Administration Stage/Try-out Stage

1. First trial run-using 50 to 100 students
2. Scoring
3. First item analysis- determine difficulty
and discrimination indices
4. First option analysis
5. Revision of the test items-based on the
results of test item analysis
6. Second trial run/field testing
7. Second item analysis
8. Second option analysis
9. Writing the final form of the test
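
The item analysis in steps 3 and 7 can be sketched as follows. The module does not give formulas for the two indices, so the definitions used here are assumptions based on common practice: difficulty is the proportion of examinees answering correctly, and discrimination is the proportion correct in the upper-scoring group minus that in the lower-scoring group, with 27% of examinees in each group. The try-out data are invented for illustration.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Proportion correct in the upper group minus proportion correct in the lower group.

    Groups are formed from total test scores; `fraction` (27% here) is an assumed convention.
    """
    n_group = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:n_group], order[-n_group:]
    p_upper = sum(item_scores[i] for i in upper) / n_group
    p_lower = sum(item_scores[i] for i in lower) / n_group
    return p_upper - p_lower


# Hypothetical try-out data for one item answered by ten students.
item = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]                # 1 = correct on this item
totals = [48, 45, 44, 30, 41, 22, 38, 25, 20, 35]    # total test scores

p = difficulty_index(item)
d = discrimination_index(item, totals)
print(f"difficulty = {p:.2f} (moderate: {0.30 <= p <= 0.70})")
print(f"discrimination = {d:.2f}")
```

Items whose difficulty falls outside the moderate range of 0.30 to 0.70, or whose discrimination is low or negative, are the natural candidates for revision in step 5.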

Phase IV- Evaluation Stage

1. Administration of the final form of the test.
2. Establish test validity
3. Estimate test reliability

Table of Specifications (TOS)

➢ A plan prepared as basis for test construction.
➢ The blueprint of a test.
➢ A table that relates the instructional outcomes or course contents to the thinking
skills that we want to measure.

Importance of TOS

➢ It provides an assurance that the test questions are representative samples of the
lessons covered.
➢ It results in a balanced test.
➢ It helps teachers determine the content mastery of the learners.

Steps in Making TOS

1. List down the learning outcomes, topics or competencies that you want to measure.
2. Determine the number of class sessions or the no. of hours spent per learning
outcome.
3. Decide on the number of items to be prepared.
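4. Determine the number of items to be prepared per outcome.
Divide the number of hours spent on the outcome by the total number of class
hours, then multiply by the total number of items:
items per outcome = (hours spent on outcome ÷ total hours) × total number of items
The result gives the number of items per outcome.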

5. Distribute each of the items according to the level of thinking skills being
measured.
6. Determine the type of items to be prepared.
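
The computation in step 4 can be sketched as a short function. The function name is hypothetical; the hours and the 70-item total are taken from the Long Exam in Educ 106A example in this chapter.

```python
def proportional_items(hours_per_outcome, total_items):
    """Share of the test for each outcome: (hours on outcome / total hours) x total items."""
    total_hours = sum(hours_per_outcome.values())
    return {
        outcome: hours / total_hours * total_items
        for outcome, hours in hours_per_outcome.items()
    }


# Class hours from the Long Exam in Educ 106A example (13.5 hours, 70 items).
hours = {
    "I. Introduction": 3,
    "II. Purposes of assessment": 1.5,
    "III. Relevance of assessment": 3,
    "IV. Roles of assessment": 3,
    "V. Types of tests": 3,
}

shares = proportional_items(hours, total_items=70)
for outcome, share in shares.items():
    print(f"{outcome}: {share:.2f}")
```

The raw shares are rarely whole numbers (about 15.56 for a 3-hour topic and 7.78 for the 1.5-hour topic here), so they are rounded up or down in such a way that the counts still add up to the planned total.
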

One-way TOS

Topics Covered | Class Sessions (in hours) | Number of Items | Item Type
1.
2.
3.
4.
5.
Total

One-way Table of Specifications
Long Exam in Educ 106A

Topics Covered | Class Sessions | Total Items | Item Type
I. Introduction (measurement, assessment, evaluation, testing) | 3 | 15 | Multiple choice
II. Purposes of assessment (assessment for/of/as learning) | 1.5 | 8 | Multiple choice, Supply test
III. Relevance of assessment | 3 | 15 | Multiple choice
IV. Roles of assessment (placement, formative, diagnostic, summative) | 3 | 16 | Multiple choice, Supply test
V. Types of tests (written, oral, performance, objective, subjective, standardized, non-standardized, norm-referenced, criterion-referenced, power & speed, verbal & non-verbal) | 3 | 16 | Multiple choice
Total | 13.5 | 70 |

Two-way TOS

Topics Covered | Class Sessions (in hours) | Levels of Thinking Skills (R U A A E C) | Number of Items | Item Type
1.
2.
3.
4.
5.
Total

Legend: R – Remembering; U – Understanding; A – Applying; A – Analyzing; E – Evaluating; C – Creating

Two-way Table of Specifications
Long Exam in Educ 106A

Topics Covered | Class Sessions | Levels of Thinking Skills (R U A A E) | Total Items | Item Type
I. Introduction (measurement, assessment, evaluation, testing) | 3 | 3 5 6 1 | 15 | MC
II. Purposes of assessment (assessment for/of/as learning) | 1.5 | 4 4 | 8 | MC, S
III. Relevance of assessment | 3 | 3 7 2 3 | 15 | MC
IV. Roles of assessment (placement, formative, diagnostic, summative) | 3 | 4 5 2 5 | 16 | MC, S
V. Types of tests (written, oral, performance, objective, subjective, standardized, non-standardized, norm-referenced, criterion-referenced, power & speed, verbal & non-verbal) | 3 | 5 4 7 | 16 | MC
Total | 13.5 | | 70 |

Legend: MC – Multiple Choice; S – Supply Test

TABLE OF SPECIFICATIONS
Midterm Examination in Educ 106A- Assessment of Learning 1
1st Semester, SY 2021-2022
Step 1: List down the learning outcomes, topics or competencies that you want to
measure.

Learning Outcomes | Class Sessions (in hours) | No. of Items | Item Type
1. Identify whether a given scenario is assessment, measurement, or evaluation. | | |
2. Determine the purpose of assessment in the given assessment scenario (Assessment for Learning, Assessment of Learning, or Assessment as Learning). | | |
3. Determine the roles and relevance of assessment. | | |
4. Classify the given learning outcome according to the domains of learning: Cognitive, Psychomotor, and Affective. | | |
5. Identify the different kinds/types of test. | | |
6. Select the assessment method appropriate for a particular learning outcome. | | |
7. Explain the relevance of validity in assessment; how to validate different assessment methods, and identify the types of validity. | | |
TOTAL | | |


Step 2: Determine the number of class sessions or the no. of hours spent per
learning outcome.

Learning Outcomes | Class Sessions (in hours) | No. of Items | Item Type
1. Identify whether a given scenario is assessment, measurement, or evaluation. | 2 | |
2. Determine the purpose of assessment in the given assessment scenario (Assessment for Learning, Assessment of Learning, or Assessment as Learning). | 3 | |
3. Determine the roles and relevance of assessment. | 3 | |
4. Classify the given learning outcome according to the domains of learning: Cognitive, Psychomotor, and Affective. | 3 | |
5. Identify the different kinds/types of test. | 3 | |
6. Select the assessment method appropriate for a particular learning outcome. | 3 | |
7. Explain the relevance of validity in assessment; how to validate different assessment methods, and identify the types of validity. | 3 | |
TOTAL | 20 | |

Step 3: Decide on the number of items to be prepared.

Learning Outcomes | Class Sessions (in hours) | No. of Items | Item Type
1. Identify whether a given scenario is assessment, measurement, or evaluation. | 2 | |
2. Determine the purpose of assessment in the given assessment scenario (Assessment for Learning, Assessment of Learning, or Assessment as Learning). | 3 | |
3. Determine the roles and relevance of assessment. | 3 | |
4. Classify the given learning outcome according to the domains of learning: Cognitive, Psychomotor, and Affective. | 3 | |
5. Identify the different kinds/types of test. | 3 | |
6. Select the assessment method appropriate for a particular learning outcome. | 3 | |
7. Explain the relevance of validity in assessment; how to validate different assessment methods, and identify the types of validity. | 3 | |
TOTAL | 20 | 70 |


Step 4: Determine the number of items to be prepared per outcome.
Divide the number of hours spent on the outcome by the total number of class
hours, then multiply by the total number of items.
The result gives the number of items per outcome.

Learning Outcomes | Class Sessions (in hours) | No. of Items | Item Type
1. Identify whether a given scenario is assessment, measurement, or evaluation. | 2 | 7 |
2. Determine the purpose of assessment in the given assessment scenario (Assessment for Learning, Assessment of Learning, or Assessment as Learning). | 3 | 10 |
3. Determine the roles and relevance of assessment. | 3 | 10 |
4. Classify the given learning outcome according to the domains of learning: Cognitive, Psychomotor, and Affective. | 3 | 11 |
5. Identify the different kinds/types of test. | 3 | 10 |
6. Select the assessment method appropriate for a particular learning outcome. | 3 | 11 |
7. Explain the relevance of validity in assessment; how to validate different assessment methods, and identify the types of validity. | 3 | 11 |
TOTAL | 20 | 70 |
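
In this example each 3-hour outcome works out to 3 ÷ 20 × 70 = 10.5 items, so some outcomes receive 10 items and others 11 to keep the total at 70. The sketch below shows one way to do that rounding mechanically. The module does not name a rounding rule, so the largest-remainder approach and the tie-breaking order used here are assumptions, and the function name is hypothetical.

```python
import math


def allocate_items(hours, total_items):
    """Round proportional shares to whole items while keeping the overall total fixed."""
    total_hours = sum(hours)
    shares = [h * total_items / total_hours for h in hours]
    counts = [math.floor(s) for s in shares]
    leftover = total_items - sum(counts)
    # Give the leftover items to the largest fractional parts.
    # Ties go to later outcomes here, which is an arbitrary choice.
    by_remainder = sorted(
        range(len(hours)),
        key=lambda i: (shares[i] - counts[i], i),
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Hours per learning outcome from the midterm example (20 hours, 70 items).
counts = allocate_items([2, 3, 3, 3, 3, 3, 3], total_items=70)
print(counts, sum(counts))  # [7, 10, 10, 10, 11, 11, 11] 70
```

Because all six 3-hour outcomes tie at 10.5, which three receive the extra item is a judgment call. The table above gives 11 items to outcomes 4, 6, and 7, which is just as valid as the allocation the sketch produces. A teacher may reasonably give the extra items to the outcomes considered most important.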

Step 5: Distribute each of the items according to the level of thinking skills being
measured (for two-way TOS).
Step 6: Determine the type of items to be prepared.

Learning Outcomes | Class Sessions (in hours) | No. of Items | Item Type
1. Identify whether a given scenario is assessment, measurement, or evaluation. | 2 | 7 | Multiple choice
2. Determine the purpose of assessment in the given assessment scenario (Assessment for Learning, Assessment of Learning, or Assessment as Learning). | 3 | 10 | Multiple choice
3. Determine the roles and relevance of assessment. | 3 | 10 | Supply test
4. Classify the given learning outcome according to the domains of learning: Cognitive, Psychomotor, and Affective. | 3 | 11 | Supply test
5. Identify the different kinds/types of test. | 3 | 10 | Multiple choice
6. Select the assessment method appropriate for a particular learning outcome. | 3 | 11 | Multiple choice, Essay
7. Explain the relevance of validity in assessment; how to validate different assessment methods, and identify the types of validity. | 3 | 11 | Multiple choice, Essay
TOTAL | 20 | 70 |

ASSESSMENT TOOLS DEVELOPMENT


General Suggestions for Writing Assessment Tasks and Test Items

1. Use assessment specifications (TOS) as a guide to item/task writing.


2. Construct more items/tasks than needed.
3. Write the items/tasks ahead of the testing date.
4. Write each test item/task at an appropriate reading level and difficulty.
5. Write each test item/task in a way that it does not provide help in answering other
test items or tasks.
6. Write each test item/task so that the task to be performed is clearly defined and it
calls forth the performance described in the intended learning outcome.
7. Write a test item/task whose answer is one that would be agreed upon by the
experts.
8. Whenever a test is revised, recheck its relevance.

POINTERS TO BE OBSERVED IN CONSTRUCTING AND SCORING THE DIFFERENT TYPES OF TESTS

A. RECALL TYPES
1. Completion type/Supply type of test
a. Only important words or phrases should be omitted to avoid confusion.
Ask question on more significant item not on trivial matter.
EX. Jose Rizal was born on June ___, 1861.
b. Blanks should be of equal lengths. The length of the blanks must not suggest
the answer. So better to make the blanks uniform in size.
c. The blank should be at the end or near the end of the sentence. The question
must first be asked before an answer is expected.
d. Articles a, an, and the should not be provided before the omitted word or
phrase to avoid clues for answers.
e. Do not take statements directly from textbooks
f. If the item is to be expressed in numerical units, indicate the type of answer
wanted.
g. When the completion items are to be used, do not include too many blanks.
Ex.
The ____produced by the ______ is used by the green _____ to
change the ____ and ____ into _____. This process is called ____.

h. Avoid open-ended items. There should be only one acceptable answer. The
item below is open-ended and hence not a good test item.

Ex. Ernest Hemingway wrote _________.

i. Score is the number of correct answers.

Sample Fill-in the blanks:

Subject: Araling Panlipunan

Directions: Identify the following. Write the correct answer in the blank. (5 points)

1. The highest mountain in the whole world is ___________.
2. The largest ocean in the world is ____________.
3. Topography shows the ___________ characteristics of a place or
region.
4. The largest mass of land on the surface of the earth is called a
___________.
5. The Philippines is located in the continent of __________.

2. Enumeration type
a. The exact number of expected answers should be stated.
b. Score is the number of correct answers.

Sample Enumeration type:

Subject: Science

Directions: Enumerate the following. Write your answer in the space provided. (1 point
each)
1-3. Main parts of a plant
4-7. Uses of plants
8-10. Ways of taking care of plants

3. Identification type/ Analogy


a. The items should make an examinee think of a word, number, or group of
words that would complete the statement or answer the problem.
b. Score is the number of correct answers.

Sample Identification type:

Subject: TLE

Directions: Read the following statements carefully and identify what farm tools,
implements and equipment are being described. Write your answer in the
blank provided before the number. ( 1 pt. each)

_______1. It is a tool used for digging canals, breaking hard topsoil and digging up
stones and tree stumps.

_______2. It is an implement mounted to a tractor used for tilling and pulverizing the soil.

_______3. It is a piece of equipment used to pull a disc plow and disc harrow in preparing
a much bigger area of land.

_______4. An implement made of metal mounted to a tractor which is used for tilling
and pulverizing the soil.

_______5. A tool used for cutting branches of planting materials and unnecessary
branches of plants.

B. RECOGNITION TYPES

1. True-false or alternate-response type


a. Declarative sentences should be used.
b. The number of “true” and “false” items should be more or less equal.
Ex:
1. T 6. F
2. F 7. T
3. F 8. F
4. T 9. F
5. T 10.T

c. The “modified true-false” is preferable to the “plain true-false”.
d. In arranging the items, avoid the regular recurrence of “true” and “false”
statements.
Ex: 1. T 6. F
2. F 7. T
3. T 8. F
4. F 9. T
5. T 10.F
e. Avoid using specific determiners like: all, always, never, none, nothing, most,
often, some, etc., and avoid weak qualifiers such as may, sometimes, as a rule, in
general, etc. Test items that use these words tend to be predictably true or
predictably false: statements with absolute terms are usually false, while
moderately worded statements are more likely to be true than false.

Ex: Christmas always falls on Sunday because it is a Sabbath day.

Statements that use the word “always” are almost always false. A test-wise
student can easily guess his way through a test like this and get high
scores even if he does not know anything about the subject.

Ex: Executives usually suffer from hyperacidity.

The statement tends to be correct. The word “usually” leads to the answer.
f. Minimize the use of qualitative terms like: few, great, many, more, etc.
g. Avoid leading clues to answers in all terms.
h. Avoid broad, trivial statements and use of negative words especially double
negatives.
Ex: Elasticity is not a property of solids.
Not obeying one's parents is not good behavior. (Double negative)

i. Avoid multiple facts or including two ideas in one statement, unless cause-
effect relationship is being measured.
j. If opinion is used, attribute it to some source unless the ability to identify
opinion is being specifically measured.
Ex:
The youth is the hope of the nation. (It might be true or false)
According to Dr. Jose Rizal, the youth is the hope of the nation. (This is really
true)
k. True statements and false statements should be approximately equal in
length.
l. Do not give a hint in the body of the question.
Example: The Philippines gained its independence in 1898 and therefore
celebrated its centennial year in 2000.

m. Avoid long sentences as these tend to be “true”. Keep sentences short.
Example: Tests need to be valid, reliable and useful, although, it would
require a great amount of time and effort to ensure that tests possess these
test characteristics.

n. Avoid trick statements with some minor misleading word or spelling
anomaly, misplaced phrases, etc. A wise student who does not know the
subject matter may detect this strategy and thus get the answer correctly.
Example: The Raven was written by Edgar Allen Poe.

o. Avoid quoting verbatim from reference materials or textbooks. This
practice sends the wrong signal to the students that it is necessary to
memorize the textbook word for word and thus, acquisition of higher
level thinking skills is not given due importance.

p. Score is the number of correct answers in “modified true-false” and right
answers minus wrong answers in “plain true-false”.
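
The two scoring rules in pointer (p) can be sketched as below. The function name is hypothetical, the student's responses are invented, and treating an omitted item as neither right nor wrong is an assumption, since the module does not say how omissions are handled. The key is the ten-item example key shown under pointer (b).

```python
def score_true_false(responses, key, rights_minus_wrongs=True):
    """Score a true-false test.

    responses: list of "T", "F", or None for an omitted item.
    rights_minus_wrongs=True applies the plain true-false rule (R - W);
    False counts correct answers only, as in modified true-false.
    """
    rights = sum(1 for r, k in zip(responses, key) if r == k)
    # Omitted items (None) are assumed to count as neither right nor wrong.
    wrongs = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return rights - wrongs if rights_minus_wrongs else rights


key = list("TFFTTFTFFT")  # example key from pointer (b)
responses = ["T", "F", "T", "T", None, "F", "T", "T", "F", "T"]  # hypothetical student

print(score_true_false(responses, key))                             # rights minus wrongs
print(score_true_false(responses, key, rights_minus_wrongs=False))  # number correct
```

With 7 right, 2 wrong, and 1 omitted, the student scores 5 under rights minus wrongs and 7 when only correct answers are counted. The deduction is meant to offset blind guessing, which on a two-choice item succeeds about half the time.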

Sample Plain True-False Type:


Subject: Science 9
Topic: From Parents to Offspring

Directions: Write the word True if the statement is correct and False if otherwise. Write
your answer in the space provided before each number. (1 pt. each)
_______1. Genetics is a branch of Biology that deals with the study of heredity and
variation.
_______2. The law of segregation states that different genes are not affected by each
other or separate independently from each other during gamete formation.
_______3. Sex chromosomes determine the sex of an individual.
_______4. A Punnett square is used to predict the results of genetic crosses.
_______5. Gregor Mendel is the father of Biology.

Sample Modified True-False Type:

Subject: Science 9
Topic: Light Gives Life
Directions: Write the word True if the statement is correct. If it is false, underline the
word/s that make/s the statement incorrect, then write the correct answer in
the blank provided to make the statement correct. (1 point each)
_______1. Photosynthesis is a multistep process whereby light energy is trapped by
chlorophyll in plants and converted into chemical energy.
_______2. Organisms use cellular respiration to break down glucose and harvest energy.
_______3. A chloroplast has two membranes surrounding the liquid in its interior called
the granum.
_______4. Oxygen and water are produced during the process of cell respiration.
_______5. Plants are called autotrophs because they are self-feeders.

2. Multiple-response type
a. There should be three to five choices. The number of choices used in the first
item should be the same number of choices in all the items of this type of
test.
b. The choices should be numbered or lettered so that only the number or letter
can be encircled or written on the blank provided.
c. If the choices are figures, they should be arranged in ascending order.
Ex: How many factors does 86 have?
a. 3 b. 4 c. 5 d. 6
d. Avoid the use of “a” or “an” as the last word prior to the listing of the
responses.
e. The correct answer should appear approximately equal number of times but
in random order.
Ex:
1. b 6. c 11. d
2. a 7. c 12. a
3. a 8. b 13. b
4. c 9. d 14. c
5. d 10.b 15. d
f. The choices should be related in some way or should belong to the same
class.
g. Use a negatively stated stem only when significant learning outcomes require
it and stress/highlight the negative words for emphasis.
Ex:
The following are properties of solid except
a.
b.
c.
d.
h. An item should only contain one correct or clearly best answer.
i. Use “none of the above” and “all of the above” sparingly; better still, do
not use them at all.
j. Use the “none of the above” option only when the keyed answer is totally
correct. When choice of the “best” response is intended, “none of the above”
is not appropriate, since the implication has already been made that the
correct response may be partially inaccurate.
k. Note that use of “all of the above” may allow credit for partial knowledge.
In a multiple option item, (allowing only one option choice) if a student only
knew that two (2) options were correct, he could then deduce the
correctness of “all of the above”. This assumes you are allowed only one
correct choice.
l. Do not use unfamiliar words, terms, and phrases. The ability of the
item to discriminate or its level of difficulty should stem from the subject
matter rather than from the wording of the question.
Example: What would be the system reliability of a computer system
whose slave and peripherals are connected in parallel
circuits and each one has a known time to failure
probability of 0.05?

m. Do not use modifiers that are vague and whose meanings can differ from
one person to the next such as: much, often, usually, etc.
Example:
Much of the process of photosynthesis takes place in the:
a. bark
b. leaf
c. stem
n. Do not use negatives or double negatives as such statements tend to be
confusing. It is best to use simpler sentences rather than sentences that
would require expertise in grammatical construction.
Example:
(Poor) Which of the following will not cause inflation in the Philippine
economy?
(Better) Which of the following will cause inflation in the Philippine
economy?
(Poor) What does the statement “Development patterns acquired
during the formative years are NOT unchangeable” imply?
(Better) What does the statement “Development patterns acquired
during the formative years are changeable” imply?
o. Each item should be as short as possible; otherwise you risk testing
reading and comprehension skills more than the subject matter.
p. Distracters should be equally plausible and attractive.
Example:
The short story “May Day Eve” was written by which Filipino author?
a. Jose Garcia Villa
b. Nick Joaquin
c. Genoveva Edrosa Matute
d. Robert Frost
e. Edgar Allan Poe

q. All multiple choice options should be grammatically consistent with the
stem.
Example: As compared to the autos of the 1960s, autos in the
1980s ______.
A. travel faster             C. use less fuel
B. have bigger interiors     D. contain more safety measures

r. Avoid stems that reveal the answer to another item.
Example:

1. Who will most strongly disagree with the progressivist who claims that
the child should be taught only that which interests him and if he is not
interested, wait till the child gets interested?
A. Essentialist C. Progressivist
B. Empiricist D. Rationalist

2. Which group will most strongly focus its teaching on the interest of the
child?
A. Progressivist C. Perennialist
B. Essentialist D. Reconstructionist

s. Avoid use of unnecessary words or phrases, which are not relevant to the
problem at hand (unless such discrimination ability is the primary intent of
the evaluation). The item’s value is particularly damaged if the unnecessary
material is designed to distract or mislead. Such items test the student’s
reading comprehension rather than knowledge of the subject matter.

Example:
The side opposite the 30-degree angle in a right triangle is equal to half
the length of the hypotenuse. If the sine of 30 degrees is 0.5 and the hypotenuse
is 5, what is the length of the side opposite the 30-degree angle?
a. 2.5
b. 3.5
c. 5.5
d. 1.5

t. State the question in the stem. The example below has a stem that poses
no question; avoid this by all means.
Example:
The Roman Empire _______.
a. had no central government
b. had no definite territory
c. had no heroes
d. had no common religion
u. Always have the stem and alternatives on the same page.
v. Score is the number of correct answers.
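
Guideline (e) above, on keying each option about equally often and in random order, can be checked or automated with a few lines of code. The following is only a minimal sketch: the function name balanced_key, the default option letters, and the seed value are illustrative assumptions, not part of the module.

```python
import random
from collections import Counter


def balanced_key(n_items, options="abcd", seed=None):
    """Return n_items keyed letters with near-equal counts per option."""
    rng = random.Random(seed)  # a seed makes the key reproducible
    letters = list(options)
    # Every option is keyed the same number of times ...
    key = letters * (n_items // len(letters))
    # ... and any leftover positions go to distinct, randomly chosen options,
    # so the counts never differ by more than one.
    key += rng.sample(letters, n_items % len(letters))
    rng.shuffle(key)  # random order, as guideline (e) asks
    return key


# Hypothetical 15-item test with four options per item.
key = balanced_key(15, seed=2024)
for number, letter in enumerate(key, start=1):
    print(f"{number}. {letter}")
print(Counter(key))  # how often each letter is the keyed answer
```

With 15 items and four options, three options are keyed four times and one is keyed three times, which matches the pattern of the sample key under guideline (e).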

Sample Multiple Choice type:

Subject: Assessment of Student Learning

Directions: Choose the best answer. Write the letter of your choice in the space provided
before each number. (1 point each)

_____1. In a positively skewed distribution, the following statements are true EXCEPT:

a. Median is higher than the Mode
b. Mean is higher than the Median
c. Mean is lower than the Mode
d. Mean is not lower than the Mode
_____2. Miss Cruz administered a test to her class and the result is positively skewed.
What kind of test do you think Miss Cruz gave to her students?

a. Posttest     c. Mastery Test
b. Pretest      d. Criterion-referenced Test
_____3. Which of the following indicates how compressed or expanded the distribution
of scores is?

a. Measures of position        c. Measures of correlation
b. Measures of variability     d. Measures of central tendency
_____4. In a frequency distribution, what is the interval size of the class whose lower
and upper limits are 9.5 and 19.5?

a. 5     b. 9     c. 10     d. 11
_____5. Bert obtained a percentile rank of 97 in an aptitude test. This means

a. he answered 97% of the items correctly.
b. he belongs to the 97% of the group who took the test.
c. 79% of the examinees did better than him on the test.
d. he surpassed 97% of those who took the test.

3. Matching type

a. There should be two columns. Under “A” are the stimuli which should be
longer and more descriptive than the responses under column “B”. The
response may be a word, a phrase, a number or a formula.
b. The stimuli under column “A” should be numbered and the responses under
column “B” should be lettered. Answers will be indicated by letters only on
lines provided in column “A”.
c. Matching sets should neither be too long nor too short.
d. All items should be on the same page to avoid turning of pages in the process
of matching pairs.
e. Use only homogeneous material in a single matching exercise.

Ex.: The test items are all about Filipino heroes and historical figures, nothing more.

Directions: Match the items in Column A with the items in Column B.

              A                                      B
___1. First President of the Republic        a. Magellan
___2. National Hero                          b. Mabini
___3. Discovered the Philippines             c. Rizal
___4. Brain of the Revolution                d. Lapu-Lapu
___5. The great painter                      e. Aguinaldo
___6. Defended Mactan Island                 f. Juan Luna
                                             g. Antonio Luna
f. Include an unequal number of responses and premises and instruct the pupil
that responses may be used once, more than once, or not at all. This
reduces guessing by elimination.
g. Arrange the list of responses in logical order.
h. Limit a matching exercise to not more than 10 to 15 items.
i. As in any other test, directions must be given. The examinees
must know exactly what to do.
j. Score is the number of correct answers.
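
Several of the matching-type guidelines above (numbered premises and lettered responses, an unequal number of premises and responses, responses in a logical order, and a limit of 15 items) can be enforced when the exercise is assembled. The sketch below is one possible way to do this; the function name build_matching, the choice of alphabetical order as the "logical order", and the sample data are assumptions made for illustration.

```python
from string import ascii_lowercase


def build_matching(pairs, distractors=()):
    """pairs: list of (premise, correct_response) tuples.

    Returns (column_a, column_b, answer_key).
    """
    if len(pairs) > 15:
        raise ValueError("limit a matching exercise to 15 items or fewer")
    # Responses are listed once each, in alphabetical order.
    responses = sorted({response for _, response in pairs} | set(distractors))
    if len(responses) == len(pairs):
        raise ValueError("use an unequal number of premises and responses")
    letter_of = dict(zip(responses, ascii_lowercase))
    column_a = [f"___{n}. {premise}" for n, (premise, _) in enumerate(pairs, 1)]
    column_b = [f"{letter_of[r]}. {r}" for r in responses]
    answer_key = [letter_of[response] for _, response in pairs]
    return column_a, column_b, answer_key


# Hypothetical data for illustration only.
pairs = [
    ("First President of the Republic", "Aguinaldo"),
    ("National Hero", "Rizal"),
    ("The great painter", "Juan Luna"),
]
column_a, column_b, answer_key = build_matching(pairs, distractors=["Antonio Luna"])
print("\n".join(column_a))
print("\n".join(column_b))
print(answer_key)
```

The extra response ("Antonio Luna" here) is what makes the two columns unequal, so a student cannot get the last pair right by elimination alone.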

C. ESSAY TYPE OF TEST

a. Restrict the use of essay questions to those learning outcomes that cannot be
satisfactorily measured by objective items.
b. Construct questions that will call forth the skills specified in the learning
standards.
c. Avoid the use of optional questions.
d. Indicate the approximate time limit or the number of points for each
question.
e. Prepare in advance an outline of the expected answer or a scoring rubric.

Sample Restricted Essay:

In one (1) paragraph, explain the significance of classroom assessment.

Sample Extended Essay:

Discuss the different measures of reliability. Justify the use of each measure in the
context of measuring reliability. (5 points)


ITEM ANALYSIS

Item analysis is a statistical technique used for selecting and rejecting
test items on the basis of their difficulty value and discriminating power.

STEPS OF ITEM ANALYSIS


1. Arrange the scores in descending order
2. Separate the test papers into two subgroups.
3. Take the top 27% of the scores and the bottom 27% of the scores.
4. Count the number of right answers in the upper group and in the lower
group, and compute the proportion for each group.
5. Solve for the difficulty and discrimination indices.

Difficulty Index = (UGprop + LGprop) / 2
Discrimination Index = UGprop - LGprop
6. Decide whether the item is to be retained, revised or rejected/discarded.

Note: Items with a difficulty index within 0.26 to 0.75 and a discrimination index
of 0.20 and above are to be retained. Items with a difficulty index within 0.26 to
0.75 but a discrimination index of 0.19 and below, or with a discrimination index
of 0.20 and above but a difficulty index not within 0.26 to 0.75, should be
revised. Items with a difficulty index not within 0.26 to 0.75 and a
discrimination index of 0.19 and below should be rejected/discarded.
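
The two formulas and the decision rule in the note above can be expressed as a short program. This is a sketch under stated assumptions: the function names item_indices and decide are illustrative, both groups are assumed to have the same size, and the indices are rounded to two decimals before they are compared with the cut-offs, because the cut-offs are quoted to two decimals.

```python
def item_indices(ug_correct, lg_correct, group_size):
    """Return (difficulty, discrimination) for one item.

    ug_correct / lg_correct: number of correct answers in the upper and
    lower groups; group_size: number of students in each group.
    """
    ug_prop = ug_correct / group_size
    lg_prop = lg_correct / group_size
    difficulty = (ug_prop + lg_prop) / 2
    discrimination = ug_prop - lg_prop
    return difficulty, discrimination


def decide(difficulty, discrimination):
    """Apply the retain / revise / reject rule to one item."""
    difficulty_ok = 0.26 <= round(difficulty, 2) <= 0.75
    discrimination_ok = round(discrimination, 2) >= 0.20
    if difficulty_ok and discrimination_ok:
        return "Retained"
    if difficulty_ok or discrimination_ok:
        return "Revised"  # only one of the two criteria is met
    return "Rejected"


# Hypothetical counts: 10 of 13 correct in the upper group, 5 of 13 in the lower.
difficulty, discrimination = item_indices(10, 5, 13)
print(f"{difficulty:.2f} {discrimination:.2f} {decide(difficulty, discrimination)}")
```

Because the sketch works with unrounded proportions, the last decimal of an index can differ slightly from a hand computation that rounds each proportion first.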

Illustrative Example:
The teacher gave a 40-item summative examination in Science to 48 students.
Analyze each item of the test to determine its difficulty and
discrimination indices, and decide whether the item is to be retained,
revised or discarded/rejected.

STEPS OF ITEM ANALYSIS

1. Arrange the scores in descending order
- Arrange the scores of the 48 students from highest score to lowest score.

2. Separate the test papers into two subgroups
- The two subgroups refer to the Upper Group (UG) and the Lower Group (LG).

3. Take the top 27% of the scores and the bottom 27% of the scores
48 students x 0.27 = 12.96, or 13 students. This means that there will be 13
students in the Upper Group (top 13 students) and 13 students in the
Lower Group (bottom 13 students).

4. Count the number of right answers in the upper group and in the lower
group, and compute the proportion for each group.
- Determine the proportion for the upper group and for the lower group:
for each item, take the number of students in the group who got the
correct answer and divide it by the total number of students in that group.

Say there are 13 students in the upper group and 13 students in the lower
group. Ten students in the upper group and 5 students in the lower group
got the correct answer.
Proportion of the upper group: 10/13 = 0.77
Proportion of the lower group: 5/13 = 0.38
5. Solve for the difficulty and discrimination indices.

Difficulty Index = (UGprop + LGprop) / 2 = (0.77 + 0.38) / 2 = 0.58
Discrimination Index = UGprop - LGprop = 0.77 - 0.38 = 0.39
6. Decide whether the item is to be retained, revised or rejected/discarded.
Since the difficulty index is within 0.26 to 0.75 and the discrimination
index is above 0.19, the item is to be retained. This means
that the item is moderately difficult and discriminating: it
distinguishes the upper group from the lower group.
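
Steps 1 to 4 above (ranking the scores, taking the top and bottom 27%, and computing the proportion of correct answers in each group) can be sketched in code as follows. The function names, the student labels and the scores are made-up values for illustration; only the 27% rule and the class size of 48 come from the example.

```python
def split_groups(scores, fraction=0.27):
    """Return (upper, lower) lists of (student, score) pairs.

    scores: dict mapping a student label to a total test score.
    """
    # Step 1: arrange the scores from highest to lowest.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    # Step 3: 27% of the class, rounded to the nearest whole student
    # (48 x 0.27 = 12.96, which becomes 13).
    size = int(len(ranked) * fraction + 0.5)
    if size == 0:
        return [], []
    # Step 2: the two subgroups are the top and bottom slices.
    return ranked[:size], ranked[-size:]


def proportion_correct(group, answered_correctly):
    """Step 4: share of a group that answered one item correctly.

    answered_correctly: set of student labels with the right answer.
    """
    hits = sum(1 for student, _ in group if student in answered_correctly)
    return hits / len(group)


# Hypothetical scores for 48 students on a 40-item test.
scores = {f"S{i:02d}": i % 41 for i in range(1, 49)}
upper, lower = split_groups(scores)
print(len(upper), len(lower))  # 13 13
```

Students with tied scores at the 27% boundary are kept in the order the sort leaves them, which is a simplification; a teacher may prefer another tie-breaking rule.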

Template for Item Analysis

Item     Group   No. of     No. of students   Proportion   Difficulty   Discrimination   Decision
Number           students   who got the                    Index        Index
                            correct answer
1        UG      13         10                0.77         0.58         0.39             Retained
         LG      13         5                 0.38
2        UG      13         12                0.92         0.84         0.15             Rejected
         LG      13         10                0.77
3        UG      13         2                 0.15         0.08         0.15             Rejected
         LG      13         0                 0
4        UG      13         10                0.77         0.70         0.15             Revised
         LG      13         8                 0.62
5        UG      13         12                0.92         0.77         0.30             Revised
         LG      13         8                 0.62
6        UG      13
         LG      13
until    UG      13
         LG      13
40       UG      13
         LG      13

Note: Item No. 1 is a good item because it is moderately difficult and
discriminating. Retain this item.

Item No. 2 is an easy item because most of the students in both the upper group
and the lower group got the correct answer. It does not discriminate between the
lower group and the upper group. Therefore, it should be discarded. Construct
another item to replace this item.

Item No. 3 is a difficult item because almost all of the students in both groups
did not get the correct answer. It does not discriminate between the lower group
and the upper group. Therefore, it should be discarded/rejected. Construct another
item to replace this item.

Item No. 4 needs to be revised. Although the item is moderately difficult,
it is not discriminating. You can restate or improve the question.

Item No. 5 needs to be revised. Although the item is discriminating,
it is an easy item. You can restate or improve the question.

***The same procedure will be employed in analyzing item no. 6 up to item no. 40.

References

Navarro, Rosita L., Santos, Rosita G. and Corpuz, Brenda B. 2017. Assessment of Learning 1.
LORIMAR Publishing Inc.

Professional Education (A Reviewer for the Licensure Examinations for Teachers). Philippine
Normal University. Manila.

Disclaimer

This module is prepared for instructional purposes only based on our course syllabus. The teacher
who prepared this does not claim ownership of this module but patterned the ideas from different
authors.
