
MODULE 4.

TEACHER-MADE TESTS

Objectives:

1. Enumerate and explain the guidelines in constructing teacher-made tests.
2. Construct test questions/items for each type of test following the hierarchy or levels of the cognitive domain.
3. Practice administering and scoring teacher-made tests.

Introduction

In the field of teaching, written and oral tests are common ways of measuring
student achievement.
Teacher-made tests provide more frequent evaluation, and they are more
closely related to the school's and teacher's objectives and to the content of the course.
Teacher-made tests are often prepared hurriedly and are not subjected to any
statistical procedures. Standardized tests, on the other hand, are generally prepared
by specialists, are given to a large population, and are subjected to validity and
reliability tests.

ESSAY EXAMINATION

An essay examination consists of questions to which the students respond in one
or more sentences to a specific question or problem. It is widely considered a
measuring instrument for evaluating knowledge of the subject matter or for measuring
skill in writing, since it tests the student's ability to express ideas accurately and to
think critically within a certain period of time. In other words, an essay examination
may be evaluated in terms of content and form.

Suggestions for constructing an essay examination

1. If an essay examination is to be used effectively in the classroom, it must be
planned and constructed carefully in advance.
2. The questions must cover the major aspects of the lesson, so care must
be taken to distribute them evenly among the different units.
3. After the test has been planned and the questions have been written
tentatively, precautions should be taken against the causes of unreliability.
4. In assembling the test questions into final form, the teacher should be
careful to phrase each question vividly so that its scope is clear to the students.
5. The time limit for each question should be reckoned so that students have
adequate time to answer.

Types of essay examination questions

Monroe and Carter proposed a very suggestive classification of twenty types of
thought questions. These are (1) selective recall, (2) evaluating recall, (3) comparison
of two things (specific), (4) comparison of two things (general), (5) decision (for or
against), (6) cause or effect, (7) explanation of the use or exact meaning of some
phrase or statement in a passage, (8) summary of some unit of the text or of
some article read, (9) analysis, (10) statement of relationship, (11) illustrations and
examples of principles in science, constructions in language, etc., (12) classification, (13)
application of rules or principles in new situations, (14) discussion, (15) statement of
aim, (16) criticism, (17) outline, (18) reorganization of facts, (19) formulation of new
questions (problems and questions raised), and (20) new method or procedure.

1. Selective recall. The basis is given. For instance: Name the heads of state in
the world who have been listed in the "World Who's Who of Women."

2. Evaluating recall. The basis is also given. For example: Name five departments
in the Philippines which have had the greatest influence on the economic
development of the country.
3. Comparison of two things (specific). There is one single designated basis.
Examples: (3.1) Compare the bamboo raft and stake methods of culturing
Eucheuma. (3.2) Compare traditional and modern methods of teaching
mathematics.

4. Comparison of two things (general). Two things are compared in general.
Examples: (4.1) Compare fish farming in China with that in the Philippines. (4.2)
Compare education in the United States with that in the Philippines.

5. Decision (for or against). Examples: (5.1) In your own opinion, in which can you
do better, an oral or a written examination? Why? (5.2) In your opinion, which is
better, an essay or an objective examination? Why?

6. Cause and effect. Examples: (6.1) Why is vocational education (agriculture,
fisheries, and trade) given more stress by the government than other courses
in the Philippines? (6.2) Why is values education given more emphasis by the
government than other subjects in the curriculum?

7. Explanation of the use and exact meaning of some phrase or statement in a
passage. Example: What does this passage mean: "Be the best of what you
are; if you can't be a pine on top of a hill, be a scrub in the valley, but be the
best little scrub at the side of the hill"?

8. Summary of some unit of the text or of some article read.
Example: Summarize in a paragraph or two the advantages and disadvantages
of the teaching of modern mathematics.

9. Analysis. (The word itself is seldom used in the question.) Example: What
are the characteristics of Cory Aquino which make you understand why the
Filipino people sympathize with her?

10. Statement of relationship. Examples: (10.1) Why is knowledge of algebra
helpful in studying statistics? (10.2) Why is knowledge of chemistry helpful in
studying fish processing?

11. Illustrations and examples of principles in science, constructions in language, etc.
Example: From your own experience, give three examples of the use of
mathematics in your daily life.

12. Classification. (Usually the reverse of No. 11.) Examples: (12.1) To what class
does the human being belong in the animal kingdom? (12.2) To what class does
Gracilaria verrucosa belong in the plant kingdom?

13. Application of rules or principles in new situations. Example: Why is the
objective type of examination more commonly used than the essay type in
government examinations in the Philippines, such as the NCEE, the Professional
Board Examination for Teachers, the Career Service Examination, etc.?
14. Discussion. Example: Discuss briefly the functions of measurement and
evaluation.

15. Statement of aim. Example: State the rules in constructing a matching-type
test.

16. Criticism. As to the adequacy, correctness, or relevance of a printed
statement or a student's answer to a question on the lesson. Example: What is
wrong with the statement "Practice makes perfect"?

17. Outline. Example: Outline the rules for constructing a multiple-choice test.

18. Reorganization of facts. (A good type of review question to give training in
organization.) Example: Discuss the theory-and-practice approach based
upon the book, class discussion, outside reading, and actual practice.

19. Formulation of new questions (problems and questions raised). Example: What
else must be known in order to understand the matter under consideration?

20. New method or procedure. Example: Suggest a plan for proving the truth or
falsity of the contention that the abolition of the NCEE is a good policy in education.

Advantages of an essay examination

The advantages of an essay examination are as follows:

1. Easy to construct. With regard to preparation, an essay examination is easier
for a classroom teacher to construct, and it saves time and energy in construction
because it involves only a few items.

2. Economical. An essay examination is economical with respect to duplicating
facilities such as the typewriter, computer, or mimeographing machine,
because the questions can be written on the board. It is therefore advantageous
for schools which lack duplicating facilities.

3. Trains the powers of organizing, expressing, and reasoning. An essay
examination trains the students to organize, express, and reason out their
ideas.

4. Minimizes guessing. Responses to an essay examination consist of one or more
sentences; hence, guessing is minimized.

5. Develops critical thinking. An essay examination trains the students to
think critically. Essay questions call for comparison, analysis,
reorganization of facts, criticism, defense of opinion, decision, and
other mental activities.
6. Minimizes cheating and memorizing. Cheating and memorizing in an essay
examination are minimized because essay tests are evaluated in terms of
content and form, and an answer to a question is composed of one or
more sentences.

7. Develops good study habits. An essay examination develops good study habits
on the part of the students, in the sense that they study their lessons for
comprehension rather than rote memory.

Disadvantages of Essay Examination

1. Low validity. An essay examination has low validity because of its limited sampling of content.

2. Low reliability. Low reliability may occur in an essay examination due to the
subjectivity of scoring. Some teachers tend to react unfavorably to the
answers of students whom they consider weak and to react favorably to the
answers of bright students.

3. Low usability. An essay examination is time-consuming for both teacher and
students; much time and energy are spent on it.

4. Encourages bluffing. Another limitation of an essay examination is that it
encourages bluffing on the part of the examinee. The tendency of a
student who does not know the answer is to bluff just to cover up his lack
of information. If the student is intelligent enough to present a worthy
discussion as an answer related to the scope covered by the question, this
often misleads the teacher, and sometimes such an answer may give a sense
of completeness. If bluffing succeeds on an essay examination, the measuring
instrument becomes inaccurate and the evaluation of the student's
achievement may not be valid and reliable.

5. Difficult to correct or score. Another weakness of an essay examination is
the difficulty, on the part of the teacher, of correcting or scoring it, since the
answer to each question consists of one or more sentences.

6. Disadvantageous for students with poor penmanship. Some teachers react
unfavorably to the responses of students with poor handwriting and untidy
papers.

Scoring an Essay Examination

To avoid subjectivity in scoring an essay test, the following procedures are
presented.

1. Brush up on the answers before scoring.
2. Quickly read through the papers, judge their worthiness, and sort them into
five groups: (a) very superior, (b) superior, (c) average, (d) inferior, and
(e) very inferior.
3. Read all responses to the same question at one time.
4. Re-read the papers in each group and shift any that you feel have been
misplaced.
5. Avoid looking at the names on the papers you are scoring.

Test Yourself

1. Discuss the advantages and disadvantages of an essay examination.
2. Based on the topic assigned to you, formulate 2 essay questions for each
type of essay question.

OBJECTIVE EXAMINATION

Advantages of an objective examination

1. Easy to correct or score. With regard to ease in scoring, an objective test is
easier for classroom teachers to correct because of the short response
involved in each item. A response may consist of a single word, letter, number,
or phrase.
2. Eliminates subjectivity. An objective test eliminates subjectivity in scoring
because the responses are short and exact.
3. Adequate sampling. More items can be included in an objective test, so the
validity and reliability of the test can be adequately established.
4. Objectivity in scoring. Objective tests can be scored objectively because
each item has a short, single correct response.
5. Eliminates bluffing. Bluffing is eliminated in an objective type of test because
the students only choose the answers from the options provided.
6. Norms can be established. Due to the adequate sampling of the test, norms can
be established.
7. Saves time and energy in answering questions. An objective test saves the
students' time and energy in answering questions because the options are
provided and the answers are selected from short statements.

LIMITATIONS OF AN INFORMAL OBJECTIVE TEST

1. Difficult to construct. As far as preparation of the test is concerned, an
objective test is difficult to construct because more items are involved.
2. Encourages cheating and guessing. An objective test encourages cheating
and guessing because of the short answer given for each item. Responses can
be in the form of a letter, number, word, or phrase.
3. Expensive. Due to the adequate sampling of an objective test, it is expensive
in terms of duplicating facilities. The questions cannot all be written on the
board, which is disadvantageous for schools without duplicating facilities.
4. Encourages rote memorization. An objective test encourages rote
memorization rather than logical learning because an answer to an
item may consist only of a single word or a phrase. The student's ability to
think critically, express, organize, and reason out his ideas is not developed.
5. Time consuming. Preparation of an objective test is time-consuming on the
part of the teacher.

There are two main types of objective tests, namely, the recall type and the recognition type.
The recall type is categorized into: 1) simple recall, and 2) completion.

Recall Type

SIMPLE RECALL – This test is one of the easiest objective types to construct. Each
item appears as a direct question, and the response requires the subject to recall
previously learned material. The answers are usually short, consisting of either a
word or a phrase.

Rules and suggestions for the construction of simple-recall type:

1. The test item should be worded so that the response is as brief as possible,
preferably a single word, number, symbol, or a very brief phrase.
2. The direct-question form is usually preferable to the statement form.
3. The blanks for the responses should be arranged in a column, preferably at the
right side. This facilitates scoring and is more convenient for the students.
4. The question should be so worded that there is only one correct response.
Whenever this is impossible, all acceptable answers should be included in the
scoring key.
5. Make minimum use of textbook language in wording the questions.

COMPLETION TYPE. This test consists of a series of items which require the subject
to fill in a word or phrase in the blanks. An item may contain one or more blanks.

Some rules and suggestions for the construction of completion tests:

1. Word each item so that the blank or answer space is toward the end of the
sentence.
2. Avoid indefinite statements.
Ex. Miriam Defensor Santiago was born in ________.
(The answer could be either a date or a place.)

3. Avoid over-mutilated statements.
Ex. The ____ is obtained by dividing the ___ by the _______.
(It is impossible to tell what the statement refers to.)

4. Avoid giving students unwarranted clues to the desired responses.
a. Avoid lifting statements directly from the book.
b. Omit only key words or phrases rather than trivial details.
c. Whenever possible, avoid "a" or "an" immediately before the blank.
These words may give a clue as to whether the response begins with a consonant
or a vowel.
d. Do not indicate the expected answer by varying the lengths of blanks or by
using a dot for each letter in the correct word.
e. Avoid giving grammatical clues to the answer expected.

5. Arrange the test so as to facilitate scoring.



Recognition type

MULTIPLE-CHOICE TEST

The multiple-choice test is regarded as one of the best forms of testing. This form
is most valuable and widely used in standardized tests due to its flexibility and
objectivity in scoring.

The multiple-choice item is considered somewhat more difficult to construct
than the other objective items. However, it is a much more effective item for
measuring higher cognitive processes.

A multiple-choice test is made up of items, each of which presents two or more
responses, only one of which is correct or definitely better than the others.

Each item may be in the form of a complete sentence, a question, an incomplete
statement, or a stimulus word or phrase. The given responses from which the one
correct response is selected are called options.

The introductory part of an item is called the stem, and its functions are to
ask a question, set the task to be performed, or state the problem to be solved. As a
general rule, after the examinee has read the stem, he or she should understand the
task at hand and know what is required by the stem.

The suggested responses are called alternatives, responses, or options. Usually,
only one of the alternatives is the correct or best answer to the question or problem
posed. The remaining incorrect alternatives are called "distractors" or "foils". Their
function is to appear as plausible answers or solutions to the problem for those
examinees who do not possess sufficient knowledge.

In short, each multiple-choice item consists of a stem and a series of
alternative responses, one of which is the correct response. Alternatives that are
incorrect are, for obvious reasons, called distractors.

Parts of a multiple-choice item and their functions:

Stem – functions to set the problem or pose the question.
Alternatives, responses, or "options" – made up of the correct alternative and the
distractors, which function as plausible answers to the question for those lacking
sufficient knowledge.

The stem

The stem in a multiple-choice question should present the problem so clearly
that students will know what is expected of them. It should be constructed in such a
way that it leads directly to the alternatives without ambiguity. This can be assured if
both the stem and the correct alternative are written as grammatically complete
statements.

For example:

The Connecticut River originates at the Connecticut Lakes in Northern Vermont.

Stated this way, the entire item is more likely to have a clearly-stated stem and
a good set of alternatives. Then, break the sentence in the following way to construct
the alternatives, responses and distractors:

The Connecticut River originates at the Connecticut Lakes in:

a. Southern Canada
b. Northwestern New Hampshire
c. Northern Vermont
d. Northeastern Connecticut

It does not matter very much where the stem is split so long as it makes good
sense and contains most of the information. Items at this level should provide clues
for accurate recall in order for the students to be accurate in their selection of an
answer.

It does not matter either whether the stem is written as an incomplete
sentence, as above, or whether it is restated as a question, for example:

Where does the Connecticut River originate?



a. Southern Canada
b. Western New Hampshire
c. Northeast Connecticut

The Alternatives

The alternatives (sometimes called options) are the “multiple choices” from
which students select.

Since the incorrect alternatives are designed to be as plausible as the correct
response, they are called "distractors". They are designed to force students to think
by making their choices more difficult.

Good multiple-choice items, then, must contain a clearly-stated stem that
contains enough information for students to know exactly what is expected of them.
They must also contain a correct alternative and several incorrect distractors, all of
which must flow logically and grammatically from the stem. The distractors must be
plausible but completely wrong.

Variety of multiple-choice items

A. The correct-answer variety

Who invented the sewing machine?

a. Fulton b. Howe c. Singer d. White e. Whitney

B. The best-answer variety

What is the basic purpose of the Marshall Plan?

a. Militarily defend Western Europe
b. Re-establish business and industry in Western Europe
c. Settle United States differences with Russia
d. Directly help the hungry and homeless in Europe

C. The multiple-response variety

What factors are principally responsible for the clotting of blood?

a. Contact of blood with a foreign substance
b. Contact of blood with injured tissue*
c. Oxidation of hemoglobin
d. Presence of unchanged prothrombin

D. The incomplete statement variety

Millions of dollars’ worth of corn, oats, wheat, and rye are destroyed annually in the
United States by:
a. mildew   b. mold   c. rust   d. smut*

E. The negative variety

Which of these is NOT true of viruses?

a. Viruses live only in plants and animals
b. Viruses reproduce themselves
c. Viruses are composed of very large living cells*
d. Viruses can cause disease

F. The substitution variety


(Passage to be read)

"Surely the forces of education should be fully utilized to acquaint the youth with the
real nature of the danger to democracy (1), for no other place offers (2) as good or
better opportunities than the school for a (3) rational consideration of the problems
involved."

Items to be answered:

1. a. , for*   b. for   c. - for   d. no punctuation needed
2. a. as good or better opportunities than
   b. as good opportunities or better than
   c. as good opportunities as or better than
   d. better opportunities than*
3. a. rational*   b. radical   c. reasonable   d. realistic

G. The incomplete-alternative variety

An apple that has a sharp, pungent, but not disagreeably sour or bitter, taste is
said to be (4)
a. p   b. q   c. t*   d. v   e. w

(The numeral in parentheses indicates the number of letters in the correct answer,
which in this case is "tart".)

H. The combined-response variety

In what order should these sentences be written in order to make a coherent
paragraph?

a. A sharp distinction must be drawn between table manners and sporting manners.
b. This kind of handling of a spoon at the table, however, is likely to produce
nothing more than an angry protest against squirting grapefruit juice about.
c. Thus, for example, a fly ball caught by an outfielder in baseball or a completed pass
in football is a subject for applause.
d. Similarly, the dexterous handling of a spoon in golf to release a ball from the sand
trap may win a championship match.
e. But a biscuit or a muffin tossed and caught at the table produces scornful reproach.

A. a,b,c,d,e
B. a,c,e,d,b*
C. a,e,c,d,b
D. b,e,d,c,a

Advantages of multiple-choice items

1. The multiple-choice item can be used to test a greater variety of instructional
objectives.
2. It does not require the examinee to write out and elaborate on an answer,
minimizing the opportunity for less knowledgeable examinees to "bluff" or "dress
up" their answers.
3. It focuses on reading and thinking.
4. There is less of a chance for an examinee to guess the correct answer to a
multiple-choice item.

Disadvantages of multiple-choice tests

1. They require students to choose from among a fixed list of options, rather than to
create or express their own ideas or solutions.
2. Poorly written multiple-choice items can be superficial, trivial, and limited to
factual knowledge.
3. Bright pupils may detect flaws in multiple-choice items, such as ambiguities of
wording, divergent viewpoints, or cases where only one option on an item is keyed
as correct, and they may be penalized for this.
4. Multiple-choice items tend to be based on "standardized, vulgarized," or
"approved" knowledge and give students the impression that there is a single
correct answer.

Some suggestions for constructing a multiple-choice test:

A. Constructing/Improving the Main Stem

1. Statements borrowed from textbooks or other reference materials must be
avoided. Use familiar phrases to test the comprehension of students.

2. The main stem of the test item may be constructed in question form, completion
form, or direction form.

Question Form

Which is the same as four hundred seventy?


a. b. c.

Completion Form

Four hundred seventy is the same as ___.


a. b. c.

Direction form
Add 22 + 43

a. b. c.

3. The articles "a" and "an" must be avoided as the last word of an incomplete
sentence. These words give clues to the probable answer as to whether the best
option starts with a consonant or a vowel.

4. The main stem should be clear. Avoid awkward stems.

Example of an awkward stem:

If there are 9 chairs in the classroom and 16 children in the class, the class
lacks how many chairs?

a. 7   b. 8   c. 9

Improved stem:

There are 16 children and 9 chairs in the classroom. How many more chairs are
needed?

a. 7   b. 8   c. 9

5. In items testing definitions, place the word or term in the stem and use definitions
or descriptions as the alternatives.

6. Avoid negatively-worded items.

B. Constructing/Improving Alternatives

1. Alternatives should be as closely related to each other as possible.
2. Alternatives should be arranged according to length: from shortest to
longest or longest to shortest.
3. All options must be plausible, so that students are attracted to the
distractors (incorrect responses) and only those of high intellectual
level get the best option.
4. All options must be grammatically consistent. For instance, if the stem
is singular, the options are all singular.
5. Four or more options must be provided in each item to minimize
guessing.
6. The order of correct answers across items should be randomly arranged
rather than following a regular pattern.
7. A uniform number of options must be used in each item. For instance,
if there are twenty items of this type and item 1 starts with five
options, the rest of the items should also have five options.
8. Avoid using "not given," "none of the above," "all of the above,"
etc. as alternatives in best-answer types of items.

An illustration of a multiple-choice item that measures behavior in the cognitive
domain

Knowledge Level

Where is the mouth of the Connecticut River located?

a. New Haven   b. New London   c. Saybrook   d. Essex

Simple recall of information is all that is asked.

Understanding Level

Which term most accurately describes the soil deposits at the base of a
canyon?

a. volcanic rock   b. alluvial   c. sedimentary deposit   d. conglomerate

Children need to recall information about erosion and soil formation
accurately and understand how these phenomena build specific geographic
formations.

Application Level

To help retain valuable farm lands along a river, man often builds:

a. dikes   b. underwater dams   c. waterfalls   d. floodgates

Children must apply their knowledge and understanding of rivers and flooding
to know that dikes will prevent a rampaging flood from carrying the soil away.

Analysis Level

A river that flows between steep mountains for a hundred miles and then
suddenly opens into a broad plain will require people who live in the plain to build dams:

a. at the head of the canyon
b. at the mouth of the canyon
c. two miles below the mouth of the canyon
d. at several points around the canyon

In analyzing the flow of such a river, students should understand how water
from the mountain streams will swell the water level in the river and cause it to flow
faster and in dangerous amounts. They should conclude, if they can perform at this
cognitive level, that a series of dams will likely afford the best protection.

Synthesis Level

In addition to providing drinking water, a reservoir high in the mountains can
be an important source for which of the following needs of man?

a. transportation   b. irrigation   c. electricity

Students now will have to analyze the information they have gained about the
flow of water in order to synthesize a new way to make use of the reservoir.

Evaluation Level

Which of the following strategies would be the most equitable solution to the
perennial drought problems of a large population living in a plain below a
well-watered upland area?

a. Divert the water from the upland lakes by aqueducts
b. Change the course of a major river that serves the upland region
c. Drill deep wells in the plain area
d. Build a series of dams in the upland region to store water for the plains area

Each response is plausible and each poses economic and emotional problems.
Making a thoughtful judgment in terms of available information is called for.

Test Yourself

1. Discuss the advantages and disadvantages of objective type tests.

2. Based on the selected topic, construct 10 test items for the following:

a) simple recall
b) completion type

3. Group Activity (by pair): Construct at least 2 multiple-choice test items for
every variety of multiple-choice item.

4. Among the test items constructed, identify the questions that correspond to the
levels of the cognitive domain, such as:

a) knowledge
b) comprehension
c) application
d) analysis
e) synthesis
f) evaluation.

Matching Type

This type consists of two columns in which the proper pairing relationship of
two things is strictly observed. For instance, Column A is to be matched with Column
B.

In a balanced matching test, the number of items is equal to the number
of options. For instance, if there are 15 items in Column A, there are also 15 options
in Column B. In other words, all options have pairs.

On the other hand, it is said to be an unbalanced matching type if there are
unequal numbers in the two columns. For instance, Column A has five items while
Column B has seven options.

Suggestions for construction of a matching type test

1. Using heterogeneous materials must be avoided in matching exercises. For
instance, dates and terms, persons and events, measurements and
definitions, and many others must not be mixed with each other. Make each
matching exercise homogeneous.
2. All options, including distractors, must be plausible.
3. The item column must be placed at the left and the option column at the
right.
4. The option column must be arranged in alphabetical order, and dates in
chronological order, to facilitate selection of the correct answers. Each option is
assigned a code number or letter.
5. There should be only one correct response for each item.
6. The ideal number of items is 5 to 10, with a maximum of 15.
7. All items must appear on one page to avoid waste of time and energy in
turning pages.

Sample Test Items of Matching-Type

Column A                                                Column B
1. The father of educational measurement               a. Cattell, M.
2. The originator of the questionnaire method          b. Cattell, R.
   and the theory of eugenics                          c. Ebbinghaus
3. The first to adopt IQ                               d. Esquirol
4. The founder of the quantitative study of memory     e. Fisher
5. The first to use the term "mental test"             f. Pearson
                                                       g. Stone
                                                       h. Terman
                                                       i. Thorndike

Answers: 1._________  2._________  3._________  4._________  5._________

Test Yourself
1. Construct a 15-item matching type test following the guidelines in test
construction

True-false Test

The true-false test is one of the most widely used objective tests because it
gives students greater opportunities to show what they have learned.
True-false tests can be constructed to assess higher cognitive functioning, and
they have the advantage of sampling large amounts of subject matter.

Good true-false tests are written according to the following principles:

1. Avoid the use of absolute modifiers such as all, none, no, always, never,
nothing, only, alone, more, and not, since statements containing them are more
likely to be false unless they are part of a fact or truth.
Moreover, determiners such as many, some, seldom, sometimes,
usually, often, frequently, and generally must be avoided because
they give indirect suggestions of the possible answers.

Examples: All athletes are strong. (false)
Some Filipinos are hardworking. (true)

2. The test items must be arranged in groups of five to facilitate scoring. The
groups must be separated by two single spaces and the items within a group
by a single space.

3. The manner of indicating the response must be as simple as possible; a single
letter is enough to facilitate scoring, for instance, T for true and F for false, or
X for true and O for false. It is better to place the responses in one column at the
right margin. Sometimes the response is written before the item number, but
the former placement is preferable.

4. The use of similar statements from a book must be avoided to minimize rote
memorization.

5. The items must be carefully constructed so that the language is within the level
of the students; hence, flowery statements are avoided.

6. Statements which are partly right and partly wrong must be avoided. The
truth or falsity of a statement should depend upon the main idea and not upon
any minor element, word, phrase, or clause.

7. Ambiguous and double-negative statements must be avoided.

8. True and false statements should be equal in number, equal in length, and
arranged at random to avoid giving any clues.

9. Correct responses should not follow a pattern, otherwise the students may
be able to give the right symbols although they do not know the real answers.

10. Do not use statements that cannot be answered by true or false or by yes or
no. A statement cannot be answered by true or false if it is too abstract or too
general.
Example:
Filipinos are industrious.

An illustration of a true-false item that measures behavior at the higher cognitive
levels:

Application

Question
A fuse in a television set will prevent lightning from damaging the TV.
(T or F)?

Analysis

Question

When a fuse blows, we know that its resistance to the flow of electricity is
less than that of the wire to which it is connected.

The students should be able to analyze the knowledge they have gained and
identify the important components of a circuit that make the system work.

Synthesis

Question

A way to protect an entire neighborhood from excessive electrical current is
to place fuses in the wires leading from the main power line to the neighborhood.

The fifth level of the taxonomy – synthesis – calls for behavior that includes
the ability to rearrange and recombine learned information about electricity to arrive
at a different application of the knowledge than the ones studied.

Evaluation

True-false items will have to be constructed very carefully to call for evidence
of this ability. In this case, a hypothetical circuit might be presented to the class with
an item asking them to evaluate its usefulness, effectiveness, or safety.

Ex.

You are building a house and you wish to have five wall outlets in each of the
three bedrooms and four overhead light fixtures operated independently of each other.
In addition, each room will be equipped with a 4,000-BTU window air conditioner.

Questions

1. Each room should have its own 15-amp fuse.
2. Each air conditioner should be provided with a separate circuit and fuse.
3. One 20-amp fuse will be adequate for all three rooms.

Test Yourself

1. Based on the selected topic, write 15 true-false test items.

2. Among the test questions formulated, identify which one measures the
levels of cognitive domain:

a) knowledge
b) comprehension
c) application
d) analysis
e) synthesis
f) evaluation

Analogy Type

This type is made up of items consisting of pairs of words which are related to
each other. It is designed to measure the ability of students to observe the
relationship in the first pair and apply it to the second pair.

Suggestions for the construction of analogy types of tests

1. The relationship of the first pair of words must be equal to the relationship of the
second pair.

2. Distractors must be plausible alongside the correct option, so that the correct
answer is obtained by logical elimination rather than by guessing.

3. All options must be constructed in parallel language.

4. Four or more options must be included in each item to minimize the chances of
guessing. If fewer than four options are used, a correction formula must be applied
(a sketch of one commonly used correction formula appears after this list).

5. Only homogeneous relationships must be included in each item. For instance, if a
purpose relationship is used in the first pair of words, the second pair must also
show a purpose relationship.
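The module mentions a correction formula but does not state one. As an illustrative sketch only (the formula and the function name below are not taken from the module), the correction for guessing most commonly applied to selection-type items with k options per item is: corrected score = right − wrong / (k − 1), with omitted items simply not counted.

```python
# Illustrative sketch of the common correction-for-guessing formula
# (assumed, not quoted from the module): corrected = right - wrong / (k - 1).

def corrected_score(right: int, wrong: int, options_per_item: int) -> float:
    """Return the guessing-corrected score; omitted (blank) items are not counted."""
    if options_per_item < 2:
        raise ValueError("Each item needs at least two options.")
    return right - wrong / (options_per_item - 1)

# Example: a 30-item test with 3 options per item; 21 right, 6 wrong, 3 omitted.
print(corrected_score(21, 6, 3))  # 21 - 6/2 = 18.0
```

With only two options per item, the penalty equals the number of wrong answers, which is why adding options makes blind guessing less profitable.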

There are 15 kinds of relationships. These are:

1. Purpose Relationship

SHOE is to SHOELACE as DOOR is to

1. transom   2. threshold   3. hinge   4. key

2. Cause-and-Effect Relationship

HEAT is to FIRE as WATER is to

1. sky   2. rain   3. cloud   4. H2O

3. Part-Whole Relationship

SLICE is to LOAF as ISLAND is to

1. land   2. archipelago   3. peninsula   4. ocean

4. Part-Part Relationship

HAND is to ELBOW as FOOT is to

1. muscle   2. knee   3. leg   4. toe

5. Action-to-Object Relationship

OBEY is to CHILDREN as COMMAND is to

1. performance   2. parents   3. army   4. result

6. Object-to-Action Relationship

GUN is to SHOOT as CAR is to

1. enjoy   2. drive   3. move   4. repair

7. Synonym Relationship

DIG : EXCAVATE :: KILL : _____

1. try   2. average   3. convict   4. slay

8. Antonym Relationship

FLY : SPIDER :: MOUSE : _____

1. rat   2. cat   3. rodent   4. animal

9. Place Relationship

WATER : AQUEDUCT :: BLOOD : _____

1. corpuscle   2. body   3. dumb   4. rough

10. Degree Relationship

POSSIBLE : PROBABLE :: HOPE : _____

1. expect   2. deceive   3. resent   4. prove



11. Characteristic Relationship

PILOT : ALERT :: MARKSMAN : _____

1. strong   2. cruel   3. kind   4. steady

12. Sequence Relationship

TUESDAY : THURSDAY :: SATURDAY : _____

1. Sunday   2. Monday   3. Wednesday   4. Friday

13. Grammatical Relationship

SHABBY : SHABBILY :: HARMONIOUS : _____

1. harp   2. harmonica   3. harmoniously   4. harmony

14. Numerical Relationship

2 : 8 :: 1/3 : _____

1. 2/3   2. 4/3   3. 12   4. 4

15. Association Relationship

ROMANCE : MOON :: RIBBON : _____

1. gift   2. horse   3. baloney   4. city



Test Yourself

1. Discuss the guidelines in constructing analogy-type tests.

2. Construct 2 test items for each type of analogy test.



Module 5. GRADING SYSTEMS

Objectives:

1. Differentiate the norm-referenced grading system from the criterion-referenced
grading system

2. Discuss some of the grading systems

Assessment of student performance is essentially knowing how the student is
progressing in a course. The first step in assessment is, of course, testing (either by
paper-and-pencil objective tests or by some performance-based testing procedure).

Grading, therefore, is the next step after testing. Different schools have different
grading systems. In the American system, for example, grades are expressed in terms
of letters, A, B, B+, B-, C, C-, D, or what is referred to as a seven-point system. In
Philippine colleges and universities, the letters are replaced with numerical values: 1,
1.25, 1.50, 1.75, 2.0, 2.5, 3.0, and 4.0, or an eight-point system. In basic education,
grades are expressed as percentages (of accomplishment) such as 80% or 75%.
Whatever the system may be, grading follows testing.

Norm-Referenced Grading

The most commonly used grading system falls under the category of norm-
referenced grading. Norm-referenced grading refers to a grading system wherein a
student's grade is placed in relation to the performance of a group. Thus, in this
system, a grade of 80 means that the student performed better than or the same as
80% of the class (or group).

In norm-referenced grading, the students, while they may work individually,
are actually in competition to achieve a standard of performance that will classify
them into a desired grade range.

Criterion-referenced Grading Systems

Criterion-referenced grading systems are based on a fixed criterion measure.
There is a fixed target, and the students must achieve that target in order to obtain a
passing grade in a course, regardless of how the other students in the class perform.
Criterion-referenced systems are often used in situations where the teachers
agree on the meaning of a "standard of performance".

Most schools use the criterion-referenced grading system with a fixed standard
formula.
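To make the contrast concrete, here is a minimal sketch (the class scores, the 75% cutoff, and the function names are hypothetical illustrations, not taken from the module) that grades the same set of scores both ways: against a fixed criterion and against the performance of the group.

```python
# Minimal sketch: criterion-referenced vs. norm-referenced grading.
# The scores and the 75% cutoff are hypothetical illustrations.

scores = {"Ana": 46, "Ben": 38, "Carla": 42, "Dino": 30, "Ella": 48}  # out of 50 points

# Criterion-referenced: a fixed target (here, at least 75% of the points),
# applied regardless of how the rest of the class performs.
CUTOFF = 0.75 * 50
criterion = {name: ("pass" if s >= CUTOFF else "fail") for name, s in scores.items()}

# Norm-referenced: standing relative to the group, i.e. the percent of the
# class a student performed as well as or better than.
def percent_at_or_below(score, group):
    return 100 * sum(1 for g in group if g <= score) / len(group)

norm = {name: percent_at_or_below(s, scores.values()) for name, s in scores.items()}

print(criterion)  # fixed standard: each student passes or fails on his own
print(norm)       # relative standing within this particular class
```

Under the fixed criterion every student can, in principle, pass; under the norm-referenced scheme a student's standing always depends on how that particular class performed.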

Alternative Grading System

Pass-Fail Systems. Some colleges, universities, and institutions use pass-fail
grading systems, especially in the fine arts and music, as well as in practicum
activities.

Non-graded Evaluations. This is not practiced in Philippine schools. In non-graded
evaluation, numeric or letter grades are not assigned; performances are described
in narrative form.

Standardized test Scoring

Test standardization is a process by which teacher-made or researcher-made tests
are validated and item-analyzed. After a thorough process of validation, the test
characteristics are established. Standardized tests are psychometric instruments
whose scoring systems are developed by norming the test on national samples of
test takers.

Cumulative and Averaging System of Grading

In the Philippines, two types of grading systems are used: the averaging and the
cumulative grading systems. In the averaging system, the grade of a student in a
particular grading period equals the average of the grades obtained in the prior
grading periods and the current grading period. In the cumulative grading system,
the grade of a student in a grading period equals his grade in the current grading
period only, which is assumed to carry the cumulative effect of the previous grading
periods.
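As a worked illustration (the quarterly grades below are hypothetical, not from the module), the following sketch computes the grade reported each grading period under both schemes.

```python
# Minimal sketch of the two grading schemes described above,
# using hypothetical quarterly grades for one student.

quarter_grades = [80, 84, 88, 90]

# Averaging system: the grade reported for a period is the average of the
# current period's grade and all prior periods' grades.
averaging = [round(sum(quarter_grades[:i + 1]) / (i + 1), 1)
             for i in range(len(quarter_grades))]

# Cumulative system: the grade reported for a period is the current period's
# grade itself, taken to already carry the effect of the earlier periods.
cumulative = list(quarter_grades)

print("Averaging: ", averaging)   # [80.0, 82.0, 84.0, 85.5]
print("Cumulative:", cumulative)  # [80, 84, 88, 90]
```

Note how the averaging scheme smooths the reported grades while the cumulative scheme lets them move freely with each period, which bears on the reflection question in the exercise below.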

Test Yourself

1. Describe the different grading systems used in rating students’ performance.

2. In your observation, in which grading system in the Philippines would there

be more fluctuations observed in students’ grades? Cumulative or averaging?



Module 6. AUTHENTIC AND ALTERNATIVE ASSESSMENT

In the traditional assessment method, tests are standardized, uniform, and
rather impersonal and absolute. Precisely because of these characteristics, they
cannot be fair. A test is fair when it is appropriate and flexible. Generally, authentic
assessment is designed for evaluating strengths and weaknesses which cannot be
measured by traditional tests.

There are ways of assessing students’ learning other than the traditional methods.

1. Mind-mapping – involves writing down a central idea and thinking up new and
related ideas which radiate from the center. Mind maps can be used as visual
aids, speakers' guides, note-taking techniques, evaluation tools, and brainstorming
and awareness-raising tools.

Example: a mind map with "MIND-MAPPING" at the center and branches for its
Uses (e.g., note-taking), Benefits (e.g., understanding, recall), How it is done, and
Tips (e.g., use symbols and colors, emphasize with capitals).

2. Case Study – A case study is an oral or written account of a situation in which
learners are asked to identify or solve a problem.
Case studies use a storyline approach to document projects, processes, issues, and
events.

How to Prepare a case study:


a. identify its purpose
b. the performance objectives it is intended to support
c. the targeted learners
d. conduct some research inside and outside the organization
e. try to find some existing related studies
When to use a case study:
a. to encourage assertive behavior
b. to develop critical thinking and decision-making skills
c. to provide realistic and practical experience
d. to evaluate learning and test analytical knowledge or abilities
e. to learn to separate facts from inferences

Advantages:
a. actively involves participants
b. keeps interest levels high because of participant activity and
relevance to real world situations
c. blends well with other methods
(Ex. Make a case study on how boys respond to testing compared to girls)

3. Self-expression through pictures – a means by which students express and share
their personal perspectives by crystallizing them in pictures. The purpose of this is
to let students reflect on what they have learned.
(Ex. >> From a set of pictures shown, choose one that best describes you and
explain why. Discuss in class.
>> Make a collage of a particular theme.)

4. Role Play – Role playing is an activity where students act out situations. A
facilitator leads the discussion of the ideas and feelings that emerge. Students are
given a problem situation and a short description through which they depict real-life
responses and behavior.
Purpose:
a. to practice interaction with people
b. behavior rehearsal and behavior modelling
c. assessment and evaluation

Example: Role play the following learning situations:
Historical events: the declaration of independence;
the EDSA Revolution.

5. Problem-Solving – a teaching strategy that employs scientific methods in
searching for scientific information. The purpose of problem solving is to
develop skill in employing scientific processes and to develop higher-order thinking
skills while students learn to accept the opinions of others.
Five basic steps in the scientific method:
1. sensing and defining a problem
2. formulating a hypothesis
3. testing the likely hypothesis (by observing, conducting an
experiment or a survey, and collecting data)
4. analyzing, interpreting, and evaluating the data
5. formulating conclusions

Tips for Problem-solving:

1. Take a problem that is challenging and write down your thought
processes as you solve it.
2. Students learn by doing. Help them trouble-shoot in class before they
have to do it alone on the homework. Put students in small groups with
similar competency levels.
3. Try not to solve the problems for students.
4. When teaching abstract concepts, make analogies to concrete things in
everyday life. Encourage students to come up with analogies.
5. Focus on problem-solving as a skill to be learned.
6. Help students to be aware of their own process of problem-solving. Pair
students up according to their competency and give each pair a problem
to work on.
(Ex. Try this problem: Conservation of energy at home)

6. Timelines – A timeline is a listing of key events with their corresponding dates. It
is most often used as a tool in participatory appraisal. A timeline is a useful tool to
complement case studies and action research. It provides a lot of historical
information in a simple and easily understandable form, and it shows the importance
of the past to the present. The value of timelines is that they can reveal what a
person or community believes to be important in their history.

(Ex. Make a timeline of events before the Philippines was declared independent.)

7. Narratives – As a teaching tool, narratives are anchored on two popular
arguments, namely: they facilitate consolidation of the information needed to
understand a concept, a story, or a literary piece; and they are powerful tools for
instant recall. Narratives help students develop skill in recognizing coherent and
accurate accounts of events.

Good sources of narratives are stories shared with one another in both
formal and informal settings. A narrative is a powerful tool for instant recall.
Students are encouraged to construct their own narratives because they test
student understanding and improve one’s ability to consolidate segments of
information into a coherent whole.

Sources of narratives are life stories of prominent people, biographies,
autobiographies, historical accounts, research activities, and chronicles of significant
events.

(Activity: Make a narrative of the history of your school from the time it was
founded up to the present.)

8. Portfolio – a documentation of learning. It is a purposeful collection of a
student's work that exhibits his or her learning efforts or achievements in one or
more areas. It is a new approach to evaluating student learning. It brings the
assessment process into alignment with instructional goals by testing these goals
more directly. It generally contains examples of students' work over a period of time,
which gives them the capacity to demonstrate improvements in performance. The
overall purpose of the portfolio is to enable the student to demonstrate learning and
progress. The greatest value of portfolios is that, in building them, students become
active participants in the learning process and its assessment.

Key characteristics of portfolio assessment:

a. A portfolio is a form of assessment that students do together with their
teachers.
b. A portfolio is not just a collection of student work but a selection – the
student must be involved in choosing and justifying the pieces to be included.
c. A portfolio provides samples of the student's work which show growth over
time. By reflecting on their own learning (self-assessment), students begin to
identify the strengths and weaknesses in their work. These weaknesses then
become improvement goals.
d. The criteria for selecting and assessing the portfolio contents must be clear
to teacher and student at the start of the process.

Why use portfolio:

Portfolio Assessment
a. matches assessment to teaching
b. has clear goals
c. gives a profile of learner abilities
d. is a tool for assessing a variety of skills
e. develops awareness of own learning
f. caters to individuals in a heterogeneous class
g. develops social skills
h. develops independent and active learners
i. provides opportunity for student-teacher dialogue
j. can improve motivation for learning
k. is an efficient tool for demonstrating learning

Essential elements of the portfolio:

1. Cover letter (about the author)
2. Table of contents (with numbered pages)
3. Entries (both core and optional)
4. Dates on all entries (to facilitate proof of growth over time)
5. Drafts of oral and written products and their revised or corrected versions
6. Reflections, which can appear at different stages of the learning process:
- What did I learn from it?
- What did I do well?
- What do I want to improve in the item?
- How do I feel about my performance?
- What were the problem areas?

Stages in implementing portfolio assessment

1. Identify the teaching goals to assess through the portfolio.
2. Introduce the idea of portfolios to your class.
3. Specify the portfolio content.
4. Give clear and detailed guidelines for portfolio presentation.
5. Explain how the portfolio will be graded.

Assessing portfolio:
1. Quality not quantity counts.
2. Get students to provide route maps
3. Be clear about what you are assessing
4. Structure your feedback
5. Provide opportunities for self-assessment
6. Set up an exhibition
7. Assess in a team
8. Encourage creativity
9. Think about where and when you will mark portfolios

(Example: Evidence of learning in Experiential Learning Courses – FS and Practice
Teaching)

Types of Portfolio
a. Documentation portfolio – involves a collection of work over time
showing growth and improvement, reflecting the students' learning of
identified outcomes.
b. Process portfolio – demonstrates all facets of the learning process.
c. Showcase portfolio – shows only the best of the students' outputs and
products.

Test Yourself

1. Differentiate traditional method from authentic method of assessing students’


performance.

2. Discuss how each alternative assessment is done to measure performance.

3. Propose a portfolio assessment for an identified activity. Specify the format and
the desired contents. Propose an assessment method for the portfolio.

Module 7. SCORING RUBRICS

Objectives:

1. Design a teacher-made scoring rubric appropriate for any learning
activity/output
2. Differentiate an analytic rubric from a holistic rubric

What is a scoring rubric?

  A Scoring rubric is a scoring scale used to assess student performance along a


task-specific set of criteria.

Authentic assessments typically are criterion-referenced measures. That is, a
student's aptitude on a task is determined by matching the student's performance
against a set of criteria to determine the degree to which the student's performance
meets the criteria for the task. To measure student performance against a pre-
determined set of criteria, a rubric, or scoring scale, is typically created which
contains the essential criteria for the task and appropriate levels of performance for
each criterion.

For example, the following rubric (scoring scale) covers the research portion of a
project:

Research Rubric

Criteria (weight)            1                            2                             3
Number of Sources (x1)       1-4                          5-9                           10-12
Historical Accuracy (x3)     Lots of historical           Few inaccuracies              No apparent
                             inaccuracies                                               inaccuracies
Organization (x1)            Can not tell from which      Can tell with difficulty      Can easily tell which
                             source information came      where information came from   sources information
                                                                                        was drawn from
Bibliography (x1)            Bibliography contains        Bibliography contains         All relevant
                             very little relevant         most relevant information     information is included
                             information

As in the above example, a rubric is composed of two components: criteria
and levels of performance. Each rubric has at least two criteria and at least two levels
of performance. The criteria, characteristics of good performance on a task, are
listed in the left-hand column of the rubric above (number of sources, historical
accuracy, organization, and bibliography). Actually, as is common in rubrics, the
author has used shorthand for each criterion to make it fit easily into the table. The
full criteria are statements of performance such as "include a sufficient number of
sources" and "project contains few historical inaccuracies."

  For each criterion, the evaluator applying the rubric can determine to what
degree the student has met the criterion, i.e., the level of performance. In the above
rubric, there are three levels of performance for each criterion. For example, the
project can contain lots of historical inaccuracies, few inaccuracies or no inaccuracies.

Finally, the rubric above contains a mechanism for assigning a score to each
project. (Assessments and their accompanying rubrics can be used for purposes
other than evaluation and, thus, do not have to have points or grades attached to
them.) In the second column from the left, a weight is assigned to each criterion.
Students can receive 1, 2 or 3 points for "number of sources." But historical accuracy,
more important in this teacher's mind, is weighted three times (x3) as heavily. So,
students can receive 3, 6 or 9 points (i.e., 1, 2 or 3 times 3) for the level of accuracy
in their projects.
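The weighted total is simple arithmetic; the following minimal sketch (the sample ratings are hypothetical) shows how a score on the Research Rubric would be computed from the weights described above.

```python
# Minimal sketch of computing a weighted analytic rubric score.
# Weights mirror the Research Rubric above; the ratings are hypothetical.

weights = {"number_of_sources": 1, "historical_accuracy": 3,
           "organization": 1, "bibliography": 1}

# Level of performance (1, 2, or 3) awarded for each criterion.
ratings = {"number_of_sources": 2, "historical_accuracy": 3,
           "organization": 2, "bibliography": 1}

total = sum(weights[c] * ratings[c] for c in weights)
maximum = 3 * sum(weights.values())

print(total, "out of", maximum)  # 2 + 9 + 2 + 1 = 14 out of 18
```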

 Descriptors

The above rubric includes another common, but not a necessary, component
of rubrics -- descriptors. Descriptors spell out what is expected of students at each
level of performance for each criterion. In the above example, "lots of historical
inaccuracies," "can tell with difficulty where information came from" and "all relevant
information is included" are descriptors. A descriptor tells students more precisely
what performance looks like at each level and how their work may be distinguished
from the work of others for each criterion. Similarly, the descriptors help the teacher
more precisely and consistently distinguish between student work.

Many rubrics do not contain descriptors, just the criteria and labels for the
different levels of performance. For example, imagine we strip the rubric above of its
descriptors and put in labels for each level instead. Here is how it would look:

Criteria                     Poor (1)      Good (2)      Excellent (3)
Number of Sources (x1)
Historical Accuracy (x3)
Organization (x1)
Bibliography (x1)

It is not easy to write good descriptors for each level and each criterion. So,
when you first construct and use a rubric you might not include descriptors. That is
okay. You might just include the criteria and some type of labels for the levels of
performance as in the table above. Once you have used the rubric and identified
student work that fits into each level it will become easier to articulate what you
mean by "good" or "excellent." Thus, you might add or expand upon descriptors the
next time you use the rubric.

 Why Include Levels of Performance?

Clearer expectations

As mentioned in Step 3, it is very useful for the students and the teacher if the
criteria are identified and communicated prior to completion of the task. Students
know what is expected of them and teachers know what to look for in student
performance. Similarly, students better understand what good (or bad) performance
on a task looks like if levels of performance are identified, particularly if descriptors
for each level are included.

More consistent and objective assessment

In addition to better communicating teacher expectations, levels of performance


permit the teacher to more consistently and objectively distinguish between good
and bad performance, or between superior, mediocre and poor performance, when
evaluating student work.

Better feedback

Furthermore, identifying specific levels of student performance allows the teacher to


provide more detailed feedback to students. The teacher and the students can more
clearly recognize areas that need improvement.

 Analytic Versus Holistic Rubrics

For a particular task you assign students, do you want to be able to assess how well
the students perform on each criterion, or do you want to get a more global picture
of the students' performance on the entire task? The answer to that question is likely
to determine the type of rubric you choose to create or use: Analytic or holistic.

Analytic rubric

Most rubrics, like the Research rubric above, are analytic rubrics. An analytic rubric
articulates levels of performance for each criterion so the teacher can assess student
performance on each criterion. Using the Research rubric, a teacher could assess
whether a student has done a poor, good or excellent job of "organization" and
distinguish that from how well the student did on "historical accuracy."

Holistic rubric

In contrast, a holistic rubric does not list separate levels of performance for each
criterion. Instead, a holistic rubric assigns a level of performance by assessing
performance across multiple criteria as a whole.

For example, the analytic research rubric above can be turned into a holistic rubric:

3 - Excellent Researcher

- included 10-12 sources
- no apparent historical inaccuracies
- can easily tell which sources information was drawn from
- all relevant information is included

2 - Good Researcher

- included 5-9 sources
- few historical inaccuracies
- can tell with difficulty where information came from
- bibliography contains most relevant information

1 - Poor Researcher

- included 1-4 sources
- lots of historical inaccuracies
- cannot tell from which source information came
- bibliography contains very little information

In the analytic version of this rubric, 1, 2 or 3 points is awarded for the number
of sources the student included. In contrast, number of sources is considered along
with historical accuracy and the other criteria in the use of a holistic rubric to arrive at
a more global (or holistic) impression of the student work.

When to choose an analytic rubric



Analytic rubrics are more common because teachers typically want to assess
each criterion separately, particularly for assignments that involve a larger number of
criteria. It becomes more and more difficult to assign a level of performance in a
holistic rubric as the number of criteria increases. For example, what level would you
assign a student on the holistic research rubric above if the student included 12
sources, had lots of inaccuracies, did not make it clear from which source information
came, and whose bibliography contained most relevant information? As student
performance increasingly varies across criteria it becomes more difficult to assign an
appropriate holistic category to the performance. Additionally, an analytic rubric
better handles weighting of criteria. How would you treat "historical accuracy" as a
more important criterion in the holistic rubric? It is not easy. But the analytic rubric
handles it well by using a simple multiplier for each criterion.
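
To make that arithmetic concrete, here is a minimal sketch (in Python, purely for
illustration; it is not part of the module) of how an analytic rubric applies a
multiplier to each criterion. The weights mirror the Research rubric above (x1, x3,
x1, x1); the sample level scores are invented.

# Minimal sketch of weighted analytic scoring; weights follow the Research rubric above.
WEIGHTS = {
    "number of sources": 1,
    "historical accuracy": 3,
    "organization": 1,
    "bibliography": 1,
}

def analytic_score(levels):
    # Multiply the level earned on each criterion (1-3) by that criterion's weight.
    return sum(levels[criterion] * weight for criterion, weight in WEIGHTS.items())

# Hypothetical student: Excellent (3) on historical accuracy, Good (2) elsewhere.
student = {
    "number of sources": 2,
    "historical accuracy": 3,
    "organization": 2,
    "bibliography": 2,
}

best_possible = analytic_score({criterion: 3 for criterion in WEIGHTS})
print(analytic_score(student), "out of", best_possible)   # prints: 15 out of 18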

When to choose a holistic rubric

So, when might you use a holistic rubric? Holistic rubrics tend to be used when
a quick or gross judgment needs to be made. If the assessment is a minor one, such
as a brief homework assignment, it may be sufficient to apply a holistic judgment
(e.g., check, check-plus, or no-check) to quickly review student work. But holistic
rubrics can also be employed for more substantial assignments. On some tasks it is
not easy to evaluate performance on one criterion independently of performance on
a different criterion. For example, many writing rubrics are holistic because it is not
always easy to disentangle clarity from organization or content from presentation.
So, some educators believe a holistic or global assessment of student performance
better captures student ability on certain tasks. (Alternatively, if two criteria are
nearly inseparable, the combination of the two can be treated as a single criterion in
an analytic rubric.)

 How Many Levels of Performance Should I Include in my Rubric?

There is no specific number of levels a rubric should or should not possess. It will
vary depending on the task and your needs. A rubric can have as few as two
levels of performance (e.g., a checklist) or as many as ... well, as many as you decide
is appropriate. (Some do not consider a checklist a rubric because it only has two
levels -- a criterion was met or it wasn't. But because a checklist does contain criteria
and at least two levels of performance, I include it under the category of rubrics.)
Also, it is not true that there must be an even number or an odd number of levels.
Again, that will depend on the situation.

To further consider how many levels of performance should be included in a rubric, I
will separately address analytic and holistic rubrics.

Analytic rubrics

Generally, it is better to start with a smaller number of levels of performance for a
criterion and then expand if necessary. Making distinctions in student
performance across two or three broad categories is difficult enough. As the number
of levels increases, and those judgments become finer and finer, the likelihood of
error increases.

Thus, start small. For example, in an oral presentation rubric, amount of eye
contact might be an important criterion. Performance on that criterion could be
judged along three levels of performance: never, sometimes, always.

makes eye contact with audience never sometimes always

Although these three levels may not capture all the variation in student
performance on the criterion, it may be sufficient discrimination for your purposes.
Or, at the least, it is a place to start. Upon applying the three levels of performance,
you might discover that you can effectively group your students' performance in
these three categories. Furthermore, you might discover that the labels of never,
sometimes and always sufficiently communicate to your students the degree to
which they can improve on making eye contact.

On the other hand, after applying the rubric you might discover that you
cannot effectively discriminate among student performance with just three levels of
performance. Perhaps, in your view, many students fall in between never and
sometimes, or between sometimes and always, and neither label accurately captures
their performance. So, at this point, you may decide to expand the number of levels
of performance to include never, rarely, sometimes, usually and always.

makes eye contact never rarely sometimes usually always

There is no "right" answer as to how many levels of performance there should be for
a criterion in an analytic rubric; that will depend on the nature of the task assigned,
the criteria being evaluated, the students involved and your purposes and
preferences. For example, another teacher might decide to leave off the "always"
level in the above rubric because "usually" is as much as normally can be expected or
even wanted in some instances. Thus, the "makes eye contact" portion of the rubric
for that teacher might be

makes eye contact never rarely sometimes usually

So, I recommend that you begin with a small number of levels of performance
for each criterion, apply the rubric one or more times, and then re-examine the
number of levels that best serve your needs. I believe starting small and expanding if
necessary is preferable to starting with a larger number of levels and shrinking the
number because rubrics with fewer levels of performance are normally

 easier and quicker to administer


 easier to explain to students (and others)
 easier to expand than larger rubrics are to shrink

The fact that rubrics can be modified and can reasonably vary from teacher to
teacher again illustrates that rubrics are flexible tools to be shaped to your purposes.

Holistic rubrics

Much of the advice offered above for analytic rubrics applies to holistic rubrics
as well. Start with a small number of categories, particularly since holistic rubrics
often are used for quick judgments on smaller tasks such as homework assignments.
For example, you might limit your broad judgments to

 satisfactory
 unsatisfactory
 not attempted

or

 check-plus
 check
 no check

or even just

 satisfactory (check)
 unsatisfactory (no check)

Of course, to aid students in understanding what you mean by "satisfactory" or
"unsatisfactory" you would want to include descriptors explaining what
satisfactory performance on the task looks like.

Even with more elaborate holistic rubrics for more complex tasks I
recommend that you begin with a small number of levels of performance. Once you
have applied the rubric you can better judge if you need to expand the levels to more
effectively capture and communicate variation in student performance.

In Step 1 of creating an authentic assessment, you identified what you wanted your
students to know and be able to do -- your standards.

In Step 2, you asked how students could demonstrate that they had met your
standards. As a result, you developed authentic tasks they could perform.

In Step 3, you identified the characteristics of good performance on the authentic
task -- the criteria.

Now, in Step 4, you will finish creating the authentic assessment by constructing a
rubric to measure student performance on the task. To build the
rubric, you will begin with the set of criteria you identified in Step 3. As mentioned
before, keep the number of criteria manageable. You do not have to look for
everything on every assessment.

Once you have identified the criteria you want to look for as indicators of
good performance, you next decide whether to consider the criteria analytically or
holistically. 

Creating an Analytic Rubric

In an analytic rubric, performance is judged separately for each criterion.


Teachers assess how well students meet a criterion on a task, distinguishing between
work that effectively meets the criterion and work that does not meet it. The next
step in creating a rubric, then, is deciding how fine such a distinction should be made
for each criterion. For example, if you are judging the amount of eye contact a
presenter made with his/her audience that judgment could be as simple as did or did
not make eye contact (two levels of performance), never, sometimes or always made
eye contact (three levels), or never, rarely, sometimes, usually, or always made eye
contact (five levels).

Generally, it is better to start small with fewer levels because it is usually harder
to make finer distinctions. For eye contact, I might begin with three
levels such as never, sometimes and usually. Then if, in applying the rubric, I found
that some students seemed to fall in between never and sometimes, and never or
sometimes did not adequately describe the students' performance, I could add a
fourth (e.g., rarely) and, possibly, a fifth level to the rubric.

In other words, there is some trial and error that must go on to arrive at the
most appropriate number of levels for a criterion. (See the Rubric Workshop below
to see more detailed decision-making involved in selecting levels of performance for
a sample rubric.)

Do I need to have the same number of levels of performance for each criterion within
a rubric?

No. You could have five levels of performance for three criteria in a rubric,
three levels for two other criteria, and four levels for another criterion, all within the
same rubric. Rubrics are very flexible tools. There is no need to force an
unnatural judgment of performance just to maintain standardization within the
rubric. If one criterion is a simple either/or judgment and another criterion requires
finer distinctions, then the rubric can reflect that variation.

Rubrics 4 and 5 below show examples of rubrics with varying numbers of levels of performance.
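
In the meantime, a small, hypothetical sketch can make the idea concrete. The
criteria and labels below are placeholders, not a prescribed format; the point is
only that one rubric can define a different set of levels for each criterion.

# Hypothetical sketch: criteria within one rubric can define different numbers of levels.
levels_by_criterion = {
    "makes eye contact": ["never", "rarely", "sometimes", "usually", "always"],  # 5 levels
    "volume is appropriate": ["never", "usually"],                               # 2 levels
    "summary is accurate": ["never", "sometimes", "always"],                     # 3 levels
}

def is_valid_judgment(criterion, level):
    # A judgment only makes sense if that criterion actually defines the level.
    return level in levels_by_criterion[criterion]

print(is_valid_judgment("volume is appropriate", "usually"))    # True
print(is_valid_judgment("volume is appropriate", "sometimes"))  # False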

Do I need to add descriptors to each level of performance?

No. Descriptors are recommended but not required in a rubric. As described above,
descriptors are the characteristics of behavior associated with specific levels of
performance for specific criteria. For example, in the following portion of an
elementary science rubric, the criteria are 1) observations are thorough, 2)
predictions are reasonable, and 3) conclusions are based on observations. Labels
(limited, acceptable, proficient) for the different levels of performance are also
included. Under each label, for each criterion, a descriptor is included to further
explain what performance at that level looks like.

made good observations
    Limited: observations are absent or vague
    Acceptable: most observations are clear and detailed
    Proficient: all observations are clear and detailed

made good predictions
    Limited: predictions are absent or irrelevant
    Acceptable: most predictions are reasonable
    Proficient: all predictions are reasonable

appropriate conclusion
    Limited: conclusion is absent or inconsistent with observations
    Acceptable: conclusion is consistent with most observations
    Proficient: conclusion is consistent with observations

  As you can imagine, students will be more certain what is expected to reach
each level of performance on the rubric if descriptors are provided. Furthermore, the
more detail a teacher provides about what good performance looks like on a task the
better a student can approach the task. Teachers benefit as well when descriptors
are included. A teacher is likely to be more objective and consistent when applying a
descriptor such as "most observations are clear and detailed" than when applying a
simple label such as "acceptable." Similarly, if more than one teacher is using the
same rubric, the specificity of the descriptors increases the chances that multiple
teachers will apply the rubric in a similar manner. When a rubric is applied more
consistently and objectively it will lead to greater reliability and validity in the results.

Assigning point values to performance on each criterion

As mentioned above, rubrics are very flexible tools. Just as the number of
levels of performance can vary from criterion to criterion in an analytic rubric, points
or value can be assigned to the rubric in a myriad of ways. For example, a teacher
who creates a rubric might decide that certain criteria are more important to the
overall performance on the task than other criteria. So, one or more criteria can be
weighted more heavily when scoring the performance. For example, in a rubric for
solo auditions, a teacher might consider five criteria: (how well students
demonstrate) vocal tone, vocal technique, rhythm, diction and musicality. For this
teacher, musicality might be the most important quality that she has stressed and is
looking for in the audition. She might consider vocal technique to be less important
than musicality but more important than the other criteria. So, she might give
musicality and vocal technique more weight in her rubric. She can assign weights in
different ways. Here is one common format:

Rubric 1: Solo Audition

  0 1 2 3 4 5 weight
vocal tone              
vocal technique             x2
rhythm              
diction              
musicality             x3

In this case, placement in the 4-point level for vocal tone would earn the
student four points for that criterion. But placement in the 4-point box for vocal
technique would earn the student 8 points, and placement in the 4-point box for
musicality would earn the student 12 points. The same weighting could also be
displayed as follows:

Rubric 2: Solo Audition

Criteria          NA   Poor   Fair   Good   Very Good   Excellent
vocal tone         0      1      2      3           4           5
vocal technique    0      2      4      6           8          10
rhythm             0      1      2      3           4           5
diction            0      1      2      3           4           5
musicality         0      3      6      9          12          15

In both examples, musicality is worth three times as many points as vocal tone,
rhythm and diction, and vocal technique is worth twice as much as each of
those criteria. Pick a format that works for you and/or your students. There is no
"correct" format in the layout of rubrics. So, choose one or design one that meets
your needs.
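
As a quick check that the two layouts really are equivalent, here is a minimal
sketch (in Python, for illustration only; the ratings are invented) showing that
applying the multipliers of Rubric 1 and reading the pre-multiplied points of
Rubric 2 produce the same total.

# Sketch comparing the two weighting layouts above for the solo audition rubric.
CRITERIA = ["vocal tone", "vocal technique", "rhythm", "diction", "musicality"]
WEIGHTS = {"vocal technique": 2, "musicality": 3}   # all other criteria weigh 1

def score_rubric_1(ratings):
    # Rubric 1 style: rate each criterion 0-5, then apply the weight as a multiplier.
    return sum(ratings[c] * WEIGHTS.get(c, 1) for c in CRITERIA)

def score_rubric_2(ratings):
    # Rubric 2 style: build the pre-multiplied point table and read the points off it.
    points = {c: [level * WEIGHTS.get(c, 1) for level in range(6)] for c in CRITERIA}
    return sum(points[c][ratings[c]] for c in CRITERIA)

ratings = {"vocal tone": 4, "vocal technique": 4, "rhythm": 3, "diction": 5, "musicality": 4}
print(score_rubric_1(ratings), score_rubric_2(ratings))   # prints: 32 32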

Yes, but do I need equal intervals between the point values in a rubric?

No. Say it with me one more time -- rubrics are flexible tools. Shape them to fit
your needs, not the other way around. In other words, points should be distributed
across the levels of a rubric to best capture the value you assign to each level of
performance. For example, points might be awarded on an oral presentation as
follows:

Rubric 3: Oral Presentation

Criteria                 never   sometimes   always
makes eye contact            0           3        4
volume is appropriate        0           2        4
enthusiasm is evident        0           2        4
summary is accurate          0           4        8

In other words, you might decide that at this point in the year you would be
pleased if a presenter makes eye contact "sometimes," so you award that level of
performance most of the points available. However, "sometimes" would not be as
acceptable for level of volume or enthusiasm.

Here are some more examples of rubrics illustrating the flexibility of number
of levels and value you assign each level.

Rubric 4: Oral Presentation

Criteria                 never   sometimes   usually
makes eye contact            0           2         4
volume is appropriate        0           -         4
enthusiasm is evident        0           -         4
summary is accurate          0           4         8

In the above rubric, you have decided to measure volume and enthusiasm at
two levels -- never or usually -- whereas, you are considering eye contact and accuracy
of summary across three levels. That is acceptable if that fits the type of judgments
you want to make. Even though there are only two levels for volume and three levels
for eye contact, you are awarding the same number of points for a judgment of
"usually" for both criteria. However, you could vary that as well:

Rubric 5: Oral Presentation

Criteria                 never   sometimes   usually
makes eye contact            0           2         4
volume is appropriate        0           -         2
enthusiasm is evident        0           -         2
summary is accurate          0           4         8

In this case, you have decided to give less weight to volume and enthusiasm
as well as to judge those criteria across fewer levels.

So, do not feel bound by any format constraints when constructing a rubric.
The rubric should best capture what you value in performance on the authentic task.
The more accurately your rubric captures what you want your students to know and
be able to do the more valid the scores will be.

 Creating a Holistic Rubric

In a holistic rubric, a judgment of how well someone has performed on a task
considers all the criteria together, or holistically, instead of separately as in an
analytic rubric. Thus, each level of performance in a holistic rubric reflects behavior
across all the criteria. For example, here is a holistic version of the oral presentation
rubric above.

Rubric 6: Oral Presentation (Holistic)

Oral Presentation Rubric


Mastery

 usually makes eye contact


 volume is always appropriate
 enthusiasm present throughout presentation
 summary is completely accurate

Proficiency

 usually makes eye contact


 volume is usually appropriate
 enthusiasm is present in most of presentation
 only one or two errors in summary

Developing

 sometimes makes eye contact


 volume is sometimes appropriate
 occasional enthusiasm in presentation
 some errors in summary

Inadequate

 never or rarely makes eye contact


 volume is inappropriate
 rarely shows enthusiasm in presentation
 many errors in summary

An obvious potential problem with applying the above rubric is that performance often
does not fall neatly into categories such as mastery or proficiency. A student might
always make eye contact, use appropriate volume regularly, occasionally show
enthusiasm and include many errors in the summary. Where would you put that student
in the holistic rubric? Thus, it is recommended that the use of holistic
rubrics be limited to situations when the teacher wants to:

 make a quick, holistic judgment that carries little weight in evaluation, or


 evaluate performance in which the criteria cannot be easily separated.

Quick, holistic judgments are often made for homework problems or journal
assignments. To allow the judgment to be quick and to reduce the problem
illustrated in the above rubric of fitting the best category to the performance, the
number of criteria should be limited. For example, here is a possible holistic rubric for
grading homework problems.

Rubric 7: Homework Problems

Homework Problem Rubric


++ (3 pts.)

 most or all answers correct, AND


 most or all work shown

+ (1 pt.)

 at least some answers correct, AND


 at least some work shown

- (0 pts.)

 few answers correct, OR


 little or no work shown

Although this homework problem rubric only has two criteria and three levels
of performance, it is not easy to write such a holistic rubric to accurately capture
what an evaluator values and to cover all the possible combinations of student
performance. For example, what if a student got all the answers correct on a
problem assignment but did not show any work? The rubric covers that: the student
would receive a (-) because "little or no work was shown." What if a student showed
all the work but only got some of the answers correct? That student would receive a
(+) according to the rubric. All such combinations are covered. But does giving a (+)
for such work reflect what the teacher values? The above rubric is designed to give
equal weight to correct answers and work shown. If that is not the teacher's intent
then the rubric needs to be changed to fit the goals of the teacher.

All of this complexity with just two criteria -- imagine if a third criterion were
added to the rubric. So, with holistic rubrics, limit the number of criteria considered,
or consider using an analytic rubric.
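
For illustration, here is a minimal sketch (in Python, not part of the module) of
how the homework-problem rubric above might be applied, checking the levels from
best to worst. The numeric cutoffs standing in for "most," "some," and "few" are
assumptions a teacher would have to set.

# Sketch of applying the holistic homework-problem rubric, best level first.
def homework_mark(fraction_correct, fraction_work_shown):
    if fraction_correct >= 0.8 and fraction_work_shown >= 0.8:
        return "++", 3   # most or all answers correct AND most or all work shown
    if fraction_correct >= 0.4 and fraction_work_shown >= 0.4:
        return "+", 1    # at least some answers correct AND at least some work shown
    return "-", 0        # few answers correct OR little or no work shown

print(homework_mark(1.0, 0.0))   # ('-', 0): all answers correct but no work shown
print(homework_mark(0.5, 1.0))   # ('+', 1): some answers correct, all work shown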

 Final Step: Checking Your Rubric

As a final check on your rubric, you can do any or all of the following before
applying it.

 Let a colleague review it.


 Let your students review it -- is it clear to them?

 Check if it aligns or matches up with your standards.


 Check if it is manageable.
 Consider imaginary student performance on the rubric.

By the last suggestion I mean to imagine that a student had met specific levels
of performance on each criterion (for an analytic rubric). Then ask yourself if that
performance translates into the score that you think is appropriate. For example, on
Rubric 3 above, imagine a student scores

 "sometimes" for eye contact (3 pts.)


 "always" for volume (4 pts.)
 "always" for enthusiasm (4 pts.)
 "sometimes" for summary is accurate (4 pts.)

That student would receive a score of 15 points out of a possible 20 points.


Does 75% (15 out of 20) capture that performance for you? Perhaps you think a
student should not receive that high of a score with only "sometimes" for the
summary. You can adjust for that by increasing the weight you assign that criterion.
Or, imagine a student apparently put a lot of work into the homework problems but
got few of them correct. Do you think that student should receive some credit? Then
you would need to adjust the holistic homework problem rubric above. In other
words, it can be very helpful to play out a variety of performance combinations
before you actually administer the rubric. It helps you see the forest through the
trees.
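
One simple way to play out such combinations is to tabulate the rubric and total a
few imagined performances. The sketch below (Python, for illustration only) uses the
point values from Rubric 3; the imagined performance is the one described above.

# Sketch for "playing out" imagined performances on Rubric 3 before using it.
POINTS = {
    "makes eye contact":     {"never": 0, "sometimes": 3, "always": 4},
    "volume is appropriate": {"never": 0, "sometimes": 2, "always": 4},
    "enthusiasm is evident": {"never": 0, "sometimes": 2, "always": 4},
    "summary is accurate":   {"never": 0, "sometimes": 4, "always": 8},
}
MAX_SCORE = sum(max(levels.values()) for levels in POINTS.values())   # 20 points possible

def total(performance):
    return sum(POINTS[criterion][level] for criterion, level in performance.items())

imagined = {
    "makes eye contact": "sometimes",
    "volume is appropriate": "always",
    "enthusiasm is evident": "always",
    "summary is accurate": "sometimes",
}
score = total(imagined)
print(score, "out of", MAX_SCORE, "=", round(100 * score / MAX_SCORE), "percent")   # 15 out of 20 = 75 percent

If a combination like this earns a higher or lower percentage than feels right, adjust
the point values or weights before using the rubric with students.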

Of course, you will never know if you really have a good rubric until you apply
it. So, do not work to perfect the rubric before you administer it. Get it in good shape
and then try it. Find out what needs to be modified and make the appropriate
changes.

Okay, does that make sense? Are you ready to create a rubric of your own?
Well, then come into my workshop and we will build one together.

(For those who might be "tabularly challenged" (i.e., you have trouble making
tables in your word processor) or would just like someone else to make the rubric
into a tabular format for you, there are websites where you enter the criteria and
levels of performance and the site will produce the rubric for you.)

Test Yourself
 

1. What is a scoring rubric? When would you use it?
2. Differentiate analytic from holistic rubrics.
3. Design some learning activities/outputs and create a scoring rubric for each.
