Module 4. Teacher-Made Tests
TEACHER-MADE TESTS
Objectives:
Introduction
In the field of teaching, written and oral tests are common ways of measuring
student achievement.
Teacher-made tests allow more frequent evaluation, and they are more closely
related to the objectives of the school and the teacher and to the content of the course.
Teacher-made tests are constructed by teachers themselves; they are often prepared
hurriedly and are not subjected to any statistical procedures. Standardized tests, in
contrast, are generally prepared by specialists, given to a large population, and
subjected to validity and reliability tests.
ESSAY EXAMINATION
1. Selective recall. The basis is given. For instance, name the heads of state in
the world who have been included in the "WORLD WHO'S WHO OF
WOMEN."
2. Evaluating recall. The basis is also given. For example: name five departments
in the Philippines which have had the greatest influence on the economic
development of the country.
3. Comparison of two things (specific). There is a single designated basis.
Examples: (3.1) Compare the bamboo raft and stake methods of culturing
Eucheuma. (3.2) Compare traditional and modern methods of teaching
mathematics.
5. Decision (for or against). Examples: (5.1) In which, in your own opinion, can you
do better: an oral or a written examination? Why? (5.2) In your opinion, which is
better, an essay or an objective examination? Why?
9. Analysis. (The word itself is seldom used in the question.) Example: What
are the characteristics of Cory Aquino which make you understand why the
Filipino people sympathize with her?
12. Classification. (Usually the reverse of No. 11.) Examples: (12.1) To what class
does the human being belong in the animal kingdom? (12.2) To what class does
Gracilaria verrucosa belong in the plant kingdom?
15. Statements of aim. Example: State the rules in constructing a matching-type
test.
17. Outline. Example: Outline the rules for constructing a multiple-choice test.
19. Formulation of new questions (problems and questions raised). Example: What
else must be known in order to understand the matter under consideration?
20. New method or procedure. Example: Suggest a plan for proving the truth or
falsity of the contention that the abolition of the NCEE is a good policy in education.
7. Develops good study habits. An essay examination develops good study habits
on the part of the students, in the sense that they study their lessons with
comprehension rather than by rote memory.
1. Low validity. An essay examination has low validity because of its limited sampling.
2. Low reliability. Low reliability may occur in an essay examination because of the
subjectivity of its scoring. The tendency of some teachers is to react unfavorably to
Test Yourself
OBJECTIVE EXAMINATION
1. Easy to correct or score. An objective test is easier for classroom teachers to
correct because of the short responses involved in each item. A response may
be a single word, letter, number, or phrase.
2. Eliminates subjectivity. An objective test eliminates subjectivity in scoring
because the responses are short and exact.
3. Adequate sampling. More items can be included in an objective test, so the
validity and reliability of the test can be adequately observed.
4. Objectivity in scoring. Objective tests can be scored objectively because each
item calls for a short, single correct response.
5. Eliminates bluffing. Bluffing is eliminated in an objective type of test because
the students only choose the answers from the options provided.
6. Norms can be established. Because of adequate sampling, norms can be
established.
7. Saves time and energy in answering questions. An objective test saves the
students' time and energy in answering because the options are provided, the
answers are selected rather than composed, and the responses are short.
There are two main types of objective tests, namely, the recall type and the recognition type.
The recall type is further categorized into: 1) simple recall, and 2) completion.
Recall Type
SIMPLE RECALL – this test is one of the easiest of the objective types to construct.
Each item appears as a direct question, the response requires the subject to recall
previously learned material, and the answers are usually short, consisting of either a
word or a phrase.
1. The test item should be worded so that the response is as brief as possible,
preferably a single word, number, symbol, or a very brief phrase.
2. The direct-question form is usually preferable to the statement form.
3. The blanks for the responses should be in a column, preferably at the right side.
This facilitates scoring and is more convenient for the students.
4. The question should be worded so that there is only one correct response.
Whenever this is impossible, all acceptable answers should be included in the scoring
key.
5. Make minimal use of textbook language in wording the questions.
COMPLETION TYPE. This test consists of a series of items which require the subject
to fill in a word or phrase in the blanks. An item may contain one or more blanks.
1. Word each item so that the blank or answer space is toward the end of the
sentence.
2. Avoid indefinite statements.
Miriam Defensor Santiago was born in ________.
(This item is indefinite: the expected answer could be either a year or a place.)
Recognition type
MULTIPLE-CHOICE TEST
The multiple-choice test is regarded as one of the best forms of testing. It is
most valuable and widely used in standardized tests because of its flexibility and
objectivity in scoring.
The introductory part of an item is called the stem, and its functions are to
ask a question, set the task to be performed, or state the problem to be solved. As a
general rule, after the examinee has read the stem, he or she should understand the
task at hand and know what is required.
Stem: "The degree to which the test measures a theoretical trait..."
Function: the stem sets the problem or poses the question.
The stem
For example:
Stated this way, the entire item is more likely to have a clearly-stated stem and
a good set of alternatives. Then, break the sentence in the following way to construct
the alternatives, responses and distractors:
a. Southern Canada
b. Northwestern New Hampshire
c. Northern Vermont
d. Northeastern Connecticut
It does not matter very much where the stem is split so long as it makes good
sense and contains most of the information. Items at this level should provide clues
for accurate recall in order for the students to be accurate in their selection of an
answer.
a. Southern Canada
b. Western New Hampshire
c. Northeast Connecticut
The Alternatives
The alternatives (sometimes called options) are the “multiple choices” from
which students select.
The incorrect alternatives, which should be as plausible as the correct
response, are called "distractors." They are designed to force students to think by
making their choices more difficult.
Millions of dollars’ worth of corn, oats, wheat, and rye are destroyed annually in the
United States by:
a. Mildew b. mold c. rust d. smut*
"Surely the forces of education should be fully utilized to acquaint the youth with the
real nature of the danger to democracy, for no other place offers as good or better (1)
opportunities than the school for a rational consideration (2) (3) of the problems involved."
Items to be answered:
An apple that has a sharp, pungent, but not disagreeably sour or bitter, taste is
said to be (4)
a. p b. q c. t* d. v e. w
(The numeral in parentheses indicates the number of letters in the correct answer,
which in this case is "tart.")
a. A sharp distinction must be drawn between table manners and sporting manners.
A. a,b,c,d,e
B. a,c,e,d,b*
C. a,e,c,d,b
D. b,e,d,c,a
1. They require students to choose from among a fixed list of options, rather than to
create or express their own ideas or solutions.
2. Poorly written multiple-choice items can be superficial, trivial, and limited to factual
knowledge.
3. Bright pupils may detect flaws in multiple-choice items, such as ambiguities of
wording or divergent viewpoints when only one option is keyed as correct, and they
may be penalized for it.
4. Multiple-choice items tend to be based on "standardized, vulgarized" or
"approved" knowledge and give students the impression that there is a single,
correct answer.
2. The main stem of the test items may be constructed in question form, completion
form or direction form.
Question Form
Completion Form
Direction form
Add 22 + 43
a. b. c.
3. The articles "an" and "a" must be avoided as the last words of an incomplete
sentence. These words give clues as to whether the best option starts with a
consonant or a vowel.
a. 7 b.8 c.9
Improved stem:
There are 16 children and 9 chairs in the classroom. How many more chairs are
needed?
a.7 b.8 c.9
5. In items testing definitions, place the word or term in the stem and use definitions
or descriptions as the alternatives.
B. Constructing/Improving Alternatives
Knowledge Level
Understanding Level
Which term most accurately describes the soil deposits at the base of a
canyon?
d. conglomerate
Application Level
To help retain valuable farm lands along a river, man often builds:
Children must apply their knowledge and understanding of rivers and flooding
to know that dikes will prevent a rampaging flood from carrying the soil away.
Analysis Level
A river that flows between steep mountains for a hundred miles and then
suddenly flows out into a broad plain will require the people who live in the plain to
build dams:
In analyzing the flow of such a river, students should understand how water
from the mountain streams will swell the water level in the river and cause it to flow
faster and in dangerous amounts. They should conclude, if they can perform at this
cognitive level, that a series of dams will likely afford the best protection.
Synthesis Level
Students now will have to analyze the information they have gained about the
flow of water in order to synthesize a new way to make use of the reservoir.
Evaluation Level
Which of the following strategies would be the most equitable solution to the
perennial drought problems of a large population living in a plain below a
well-watered upland area?
Each response is plausible and each poses economic and emotional problems.
Making a thoughtful judgment in terms of available information is called for.
Test Yourself
2. Based on the selected topic, construct 10 test items for the following:
a) simple recall
b) completion type
3. Group Activity (by pair): Construct at least 2 multiple-choice test items for
every variety of multiple choice.
4. Among the test items constructed, identify the question with levels of
cognitive domain such as:
a) knowledge
b) comprehension
c) application
d) analysis
e) synthesis
f) evaluation.
Matching Type
This type consists of two columns in which the proper pairing relationship of
two things is strictly observed. For instance, Column A is to be matched with column
B.
In the balanced form of matching tests, the number of items is equal to the number
of options. For instance, if there are 15 items in Column A, there are also 15 options in
Column B. In other words, all options have pairs.
Column A
1. The father of educational measurement
2. The originator of the questionnaire method and the theory of eugenics
3. The first to adopt IQ
4. The founder of the quantitative study of memory
5. The first to use the term "mental test"

Column B
a. Cattell, M.
b. Cattell, R.
c. Ebbinghaus
d. Esquirol
e. Fisher
f. Pearson
g. Stone
h. Terman
i. Thorndike

Answers: 1._________ 2._________ 3._________ 4._________ 5._________
Test Yourself
1. Construct a 15-item matching type test following the guidelines in test
construction
True-false Test
The true-false test is one of the most widely used objective tests because it
gives students greater opportunities to show what they have learned.
True-false tests can be constructed to assess higher cognitive functioning, and
they have the advantage of sampling large amounts of subject matter.
1. Avoid the use of absolute modifiers such as all, none, no, always, never,
nothing, only, alone, more and not, since they are more likely to be false unless
they are a part of a fact or truth.
Moreover, determiners such as many, some, seldom, sometimes,
usually, often, frequently, and generally must also be avoided because
they give indirect suggestions to possible answers.
2. The test items must be arranged in groups of five to facilitate scoring. The
groups must be separated by a double space and the items within a group
by a single space.
4. The use of similar statements from a book must be avoided to minimize rote
memorization.
5. The items are carefully constructed so that the language is within the level of
the students, hence, flowery statements are avoided.
6. Statements which are partly right and partly wrong must be avoided. The
truth or falseness of a statement should depend upon the main idea and not
upon any minor element, word, phrase, or clause.
9. Correct responses should not follow a pattern, otherwise the students may
be able to give the right symbols although they do not know the real answers.
10. Do not use statements that cannot be answered by true or false or by yes or
no. A statement cannot be answered by true or false if it is too abstract or too
general.
Example:
Filipinos are industrious.
Application
Question
A fuse in a television set will prevent lightning from damaging the TV.
(T or F)?
Analysis
Question
When a fuse blows, we know that its resistance to the flow of electricity is
less than that of the wire to which it is connected.
The students should be able to analyze the knowledge they have gained and
identify the important components of a circuit that make the system work.
Synthesis
Question
The fifth level of the taxonomy – synthesis – calls for behavior that includes
the ability to rearrange and recombine learned information about electricity to arrive
at a different application of the knowledge than the ones studied.
Evaluation
True-false items will have to be constructed very carefully to call for evidence
of this ability. In this case, a hypothetical circuit might be presented to the class with
an item asking them to evaluate its usefulness, effectiveness, or safety.
Ex.
You are building a house and you wish to have five wall outlets in each of the
three bedrooms and four overhead light fixtures operated independently of each other.
In addition, each room will be equipped with a 4,000-BTU window air conditioner.
Questions
Test Yourself
2. Among the test questions formulated, identify which one measures the
levels of cognitive domain:
a) knowledge
b) comprehension
c) application
d) analysis
e) synthesis
f) evaluation
Analogy Type
This type is made of items consisting of a pair of words which are related to
each other. It is designed to measure the ability of students to observe the pair
relationship of the first group to the second group.
1. The relationship of the first pair of words must be equal to the relationship of the
second.
2. Distractors must be plausible enough, alongside the correct option, to attract
students, so that the correct answer is obtained by logical elimination.
4. Four or more options must be included in each item to minimize the chances of
guessing. If fewer than four options are used, a correction formula must be applied
(a common formula is sketched below).
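The module does not specify which correction formula to use; a commonly applied correction for guessing, given here as an assumption, is:

```latex
S = R - \frac{W}{k - 1}
```

where S is the corrected score, R is the number of right answers, W is the number of wrong answers, and k is the number of options per item. For a three-option item, for example, each wrong answer deducts one half point.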
1. Purpose Relationship
2. Cause-and-Effect Relationship
Ocean
4. Part-Part Relationship
5. Action-to-Object Relationship
6. Object-to-Action Relationship
7. Synonym Relationship
8. Antonym Relationship
9. Place Relationship
11. Characteristic Relationship
2 : 8 :: 1/3 : ____
1. 2/3 2. 4/3 3. 12 4. 4
Test Yourself
Objectives:
Grading, therefore, is the next step after testing. Different schools have different
grading systems. In the American system, for example, grades are expressed in terms
of letters: A, B, B+, B-, C, C-, D, or what is referred to as a seven-point system. In
Philippine colleges and universities, the letters are replaced with numerical values: 1,
1.25, 1.50, 1.75, 2.0, 2.5, 3.0 and 4.0, or an eight-point system. In basic education,
grades are expressed as percentages (of accomplishment) such as 80% or 75%.
Whatever the system may be.
Norm-Referenced Grading
The most commonly used grading system falls under the category of
norm-referenced grading. Norm-referenced grading refers to a grading system wherein a
student's grade is placed in relation to the performance of a group. Thus, in this
system, a grade of 80 means that the student performed better than or the same as 80%
of the class (or group), as illustrated in the sketch below.
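A minimal sketch of this percentile-rank interpretation (the function name, the sample scores, and the handling of ties are illustrative assumptions, not from the module):

```python
# Sketch: percentile-rank reading of a norm-referenced grade (assumed encoding).
def percentile_rank(score, class_scores):
    """Percent of the class whose score is less than or equal to the given score."""
    at_or_below = sum(1 for s in class_scores if s <= score)
    return 100 * at_or_below / len(class_scores)

# Example: a class of 10 raw scores.
scores = [55, 60, 62, 65, 70, 72, 75, 78, 82, 90]
print(percentile_rank(78, scores))  # 80.0 -> as well as or better than 80% of the class
```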
Most schools use the criterion-referenced grading system with a fixed standard
formula.
In the Philippines, two types of grading systems are used: the averaging and the
cumulative grading systems. In the averaging system, the grade of a student for a
particular grading period equals the average of the grades obtained in the prior
grading periods and the current grading period. In the cumulative grading system,
the grade of a student for a grading period equals his grade for the current grading
period, which is assumed to carry the cumulative effect of the previous grading
periods. The difference is sketched below.
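A minimal sketch of the two systems as described above, assuming equal weight for every grading period (the weights, function names, and sample grades are assumptions):

```python
# Sketch: averaging vs. cumulative grading systems (equal weights assumed).
def averaging_grade(prior_grades, current_grade):
    """Averaging system: the grade for the current period is the mean of the
    prior period grades and the current period grade."""
    grades = list(prior_grades) + [current_grade]
    return sum(grades) / len(grades)

def cumulative_grade(current_grade):
    """Cumulative system: the current period grade is used as is, since it is
    assumed to already reflect the cumulative effect of earlier periods."""
    return current_grade

# Example: prior grading periods 85 and 88, current grading period 90.
print(round(averaging_grade([85, 88], 90), 2))  # 87.67
print(cumulative_grade(90))                      # 90
```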
Test Yourself
In the traditional assessment method, tests are standardized, uniform, rather
impersonal, and absolute. Precisely because of these characteristics, they cannot
be fair. A test is fair when it is appropriate and flexible. Generally, authentic
assessment is designed for evaluating strengths and weaknesses which cannot be
measured by traditional tests.
There are ways of assessing students’ learning other than the traditional methods.
1. Mind-mapping – involves writing down a central idea and thinking up new and
related ideas which radiate from the center. Mind maps can be used as visual
aids, speaker’s guides, note-taking technique, evaluation tools, brainstorming
and awareness-raising tools.
Example: a mind map with "MIND-MAPPING" at the center and branches for Benefits,
Uses (understanding, note-taking, recall), Tips (symbols/colors, emphasize with caps),
and How.
Advantages:
a. actively involves participants
b. keeps interest levels high because of participant activity and
relevance to real world situations
c. blends well with other methods
(Ex. Make a case study on how boys respond to testing compared to girls)
Discuss in class.
>> Make a collage of a particular theme
4. Role Play . Role playing is an activity where students act out situations. A facilitator
leads the discussion of the ideas and feelings that emerge. Students are given a
problem situation and a short description through which they depict real-life
responses and behavior.
Purpose:
a. to practice interaction with people
b. behavior rehearsal and behavior modelling
c. assessment and evaluation
Good sources of narratives are stories shared with one another in both
formal and informal settings. A narrative is a powerful tool for instant recall.
Students are encouraged to construct their own narratives because they test
student understanding and improve one’s ability to consolidate segments of
information into a coherent whole.
(Activity: Make a narrative of the history of your school from the time it was
founded up to the present.)
Portfolio Assessment
a. matches assessment to teaching
b. has clear goals
c. gives a profile of learner abilities
d. a tool of assessing a variety of skills
e. develops awareness of own learning
Assessing portfolio:
1. Quality not quantity counts.
2. Get students to provide route maps
3. Be clear about what you are assessing
4. Structure your feedback
5. Provide opportunities for self-assessment
6. Set up an exhibition
7. Assess in a team
8. Encourage creativity
9. Think about where and when you will mark portfolios
Types of Portfolio
Test Yourself
3. Propose a portfolio assessment for an identified activity. Specify the format and
the desired contents. Propose an assessment method for the portfolio.
Objectives:
For example, the following rubric (scoring scale) covers the research portion of a
project:
Research Rubric

Criteria (weight): Level 1 | Level 2 | Level 3
Number of Sources (x1): 1-4 | 5-9 | 10-12
Historical Accuracy (x3): Lots of historical inaccuracies | Few inaccuracies | No apparent inaccuracies
Organization (x1): Cannot tell from which source information was drawn | Can tell with difficulty where information came from | Can easily tell which sources information came from
Bibliography (x1): Bibliography contains very little relevant information | Bibliography contains most relevant information | All relevant information is included
For each criterion, the evaluator applying the rubric can determine to what
degree the student has met the criterion, i.e., the level of performance. In the above
rubric, there are three levels of performance for each criterion. For example, the
project can contain lots of historical inaccuracies, few inaccuracies or no inaccuracies.
Finally, the rubric above contains a mechanism for assigning a score to each
project. (Assessments and their accompanying rubrics can be used for purposes
other than evaluation and, thus, do not have to have points or grades attached to
them.) A weight is assigned to each criterion, shown beside the criterion name.
Students can receive 1, 2 or 3 points for "number of sources." But historical accuracy,
more important in this teacher's mind, is weighted three times (x3) as heavily. So,
students can receive 3, 6 or 9 points (i.e., 1, 2 or 3 times 3) for the level of accuracy in
their projects, as the sketch below illustrates.
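A minimal sketch of how a total weighted score could be computed from the Research Rubric (the function name and the level encoding are assumptions, not from the source):

```python
# Sketch: weighted analytic rubric scoring; each criterion earns level 1, 2 or 3.
WEIGHTS = {
    "number_of_sources": 1,
    "historical_accuracy": 3,  # weighted x3, as in the Research Rubric
    "organization": 1,
    "bibliography": 1,
}

def rubric_score(levels):
    """levels maps each criterion to the level earned (1, 2 or 3);
    the total is the sum of level times weight."""
    return sum(level * WEIGHTS[criterion] for criterion, level in levels.items())

# Example: level 3 for sources, level 2 for the other three criteria.
print(rubric_score({
    "number_of_sources": 3,
    "historical_accuracy": 2,  # 2 x 3 = 6 points
    "organization": 2,
    "bibliography": 2,
}))  # 3 + 6 + 2 + 2 = 13
```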
Descriptors
The above rubric includes another common, but not a necessary, component
of rubrics -- descriptors. Descriptors spell out what is expected of students at each
level of performance for each criterion. In the above example, "lots of historical
inaccuracies," "can tell with difficulty where information came from" and "all relevant
information is included" are descriptors. A descriptor tells students more precisely
what performance looks like at each level and how their work may be distinguished
from the work of others for each criterion. Similarly, the descriptors help the teacher
more precisely and consistently distinguish between student work.
Many rubrics do not contain descriptors, just the criteria and labels for the
different levels of performance. For example, imagine we strip the rubric above of its
descriptors and put in labels for each level instead. Here is how it would look:
Criteria (weight): Poor | Good | Excellent
Number of Sources (x1)
Historical Accuracy (x3)
Organization (x1)
Bibliography (x1)
It is not easy to write good descriptors for each level and each criterion. So,
when you first construct and use a rubric you might not include descriptors. That is
okay. You might just include the criteria and some type of labels for the levels of
performance as in the table above. Once you have used the rubric and identified
student work that fits into each level it will become easier to articulate what you
mean by "good" or "excellent." Thus, you might add or expand upon descriptors the
next time you use the rubric.
Clearer expectations
As mentioned in Step 3, it is very useful for the students and the teacher if the
criteria are identified and communicated prior to completion of the task. Students
know what is expected of them and teachers know what to look for in student
performance. Similarly, students better understand what good (or bad) performance
on a task looks like if levels of performance are identified, particularly if descriptors
for each level are included.
Better feedback
For a particular task you assign students, do you want to be able to assess how well
the students perform on each criterion, or do you want to get a more global picture
of the students' performance on the entire task? The answer to that question is likely
to determine the type of rubric you choose to create or use: Analytic or holistic.
Analytic rubric
Most rubrics, like the Research rubric above, are analytic rubrics. An analytic rubric
articulates levels of performance for each criterion so the teacher can assess student
performance on each criterion. Using the Research rubric, a teacher could assess
whether a student has done a poor, good or excellent job of "organization" and
distinguish that from how well the student did on "historical accuracy."
Holistic rubric
In contrast, a holistic rubric does not list separate levels of performance for each
criterion. Instead, a holistic rubric assigns a level of performance by assessing
performance across multiple criteria as a whole.
For example, the analytic research rubric above can be turned into a holistic rubric:
3 - Excellent Researcher
2 - Good Researcher
1 - Poor Researcher
In the analytic version of this rubric, 1, 2 or 3 points is awarded for the number
of sources the student included. In contrast, number of sources is considered along
with historical accuracy and the other criteria in the use of a holistic rubric to arrive at
a more global (or holistic) impression of the student work.
Analytic rubrics are more common because teachers typically want to assess
each criterion separately, particularly for assignments that involve a larger number of
criteria. It becomes more and more difficult to assign a level of performance in a
holistic rubric as the number of criteria increases. For example, what level would you
assign a student on the holistic research rubric above if the student included 12
sources, had lots of inaccuracies, did not make it clear from which source information
came, and whose bibliography contained most relevant information? As student
performance increasingly varies across criteria it becomes more difficult to assign an
appropriate holistic category to the performance. Additionally, an analytic rubric
better handles weighting of criteria. How would you treat "historical accuracy" as
more important a criterion in the holistic rubric? It is not easy. But the analytic rubric
handles it well by using a simple multiplier for each criterion.
So, when might you use a holistic rubric? Holistic rubrics tend to be used when
a quick or gross judgment needs to be made. If the assessment is a minor one, such
as a brief homework assignment, it may be sufficient to apply a holistic judgment
(e.g., check, check-plus, or no-check) to quickly review student work. But holistic
rubrics can also be employed for more substantial assignments. On some tasks it is
not easy to evaluate performance on one criterion independently of performance on
a different criterion. For example, many writing rubrics are holistic because it is not
always easy to disentangle clarity from organization or content from presentation.
So, some educators believe a holistic or global assessment of student performance
better captures student ability on certain tasks. (Alternatively, if two criteria are
nearly inseparable, the combination of the two can be treated as a single criterion in
an analytic rubric.)
Analytic rubrics
Thus, start small. For example, in an oral presentation rubric, amount of eye
contact might be an important criterion. Performance on that criterion could be
judged along three levels of performance: never, sometimes, always.
Although these three levels may not capture all the variation in student
performance on the criterion, it may be sufficient discrimination for your purposes.
Or, at the least, it is a place to start. Upon applying the three levels of performance,
you might discover that you can effectively group your students' performance in
these three categories. Furthermore, you might discover that the labels of never,
sometimes and always sufficiently communicate to your students the degree to
which they can improve on making eye contact.
On the other hand, after applying the rubric you might discover that you
cannot effectively discriminate among student performance with just three levels of
performance. Perhaps, in your view, many students fall in between never and
sometimes, or between sometimes and always, and neither label accurately captures
their performance. So, at this point, you may decide to expand the number of levels
of performance to include never, rarely, sometimes, usually and always.
There is no "right" answer as to how many levels of performance there should be for
a criterion in an analytic rubric; that will depend on the nature of the task assigned,
the criteria being evaluated, the students involved and your purposes and
preferences. For example, another teacher might decide to leave off the "always"
level in the above rubric because "usually" is as much as normally can be expected or
even wanted in some instances. Thus, the "makes eye contact" portion of the rubric
for that teacher might be
So, I recommend that you begin with a small number of levels of performance
for each criterion, apply the rubric one or more times, and then re-examine the
number of levels that best serve your needs. I believe starting small and expanding if
necessary is preferable to starting with a larger number of levels and shrinking the
number because rubrics with fewer levels of performance are normally
The fact that rubrics can be modified and can reasonably vary from teacher to
teacher again illustrates that rubrics are flexible tools to be shaped to your purposes.
Holistic rubrics
Much of the advice offered above for analytic rubrics applies to holistic rubrics
as well. Start with a small number of categories, particularly since holistic rubrics
often are used for quick judgments on smaller tasks such as homework assignments.
For example, you might limit your broad judgments to
satisfactory
unsatisfactory
not attempted
or
check-plus
check
no check
or even just
satisfactory (check)
unsatisfactory (no check)
Even with more elaborate holistic rubrics for more complex tasks I
recommend that you begin with a small number of levels of performance. Once you
have applied the rubric you can better judge if you need to expand the levels to more
effectively capture and communicate variation in student performance.
In Step 2, you asked how students could demonstrate that they had met your
standards. As a result, you developed authentic tasks they could perform.
Once you have identified the criteria you want to look for as indicators of
good performance, you next decide whether to consider the criteria analytically or
holistically.
In other words, there is some trial and error that must go on to arrive at the
most appropriate number of levels for a criterion. (See the Rubric Workshop below
to see more detailed decision-making involved in selecting levels of performance for
a sample rubric.)
No. You could have five levels of performance for three criteria in a rubric,
three levels for two other criteria, and four levels for another criterion, all within the
same rubric. Rubrics are very flexible tools. There is no need to force an
unnatural judgment of performance just to maintain standardization within the
rubric. If one criterion is a simple either/or judgment and another criterion requires
finer distinctions, then the rubric can reflect that variation.
As you can imagine, students will be more certain what is expected to reach
each level of performance on the rubric if descriptors are provided. Furthermore, the
more detail a teacher provides about what good performance looks like on a task the
better a student can approach the task. Teachers benefit as well when descriptors
are included. A teacher is likely to be more objective and consistent when applying a
descriptor such as "most observations are clear and detailed" than when applying a
simple label such as "acceptable." Similarly, if more than one teacher is using the
same rubric, the specificity of the descriptors increases the chances that multiple
teachers will apply the rubric in a similar manner. When a rubric is applied more
consistently and objectively it will lead to greater reliability and validity in the results.
As mentioned above, rubrics are very flexible tools. Just as the number of
levels of performance can vary from criterion to criterion in an analytic rubric, points
or value can be assigned to the rubric in a myriad of ways. For example, a teacher
who creates a rubric might decide that certain criteria are more important to the
overall performance on the task than other criteria. So, one or more criteria can be
weighted more heavily when scoring the performance. For example, in a rubric for
solo auditions, a teacher might consider five criteria: (how well students
demonstrate) vocal tone, vocal technique, rhythm, diction and musicality. For this
teacher, musicality might be the most important quality that she has stressed and is
looking for in the audition. She might consider vocal technique to be less important
than musicality but more important than the other criteria. So, she might give
musicality and vocal technique more weight in her rubric. She can assign weights in
different ways. Here is one common format:
0 1 2 3 4 5 weight
vocal tone
vocal technique x2
rhythm
diction
musicality x3
In this case, placement in the 4-point level for vocal tone would earn the
student four points for that criterion. But placement in the 4-point box for vocal
technique would earn the student 8 points, and placement in the 4-point box for
musicality would earn the student 12 points. The same weighting could also be
displayed as follows:
vocal tone      0  1  2  3  4  5
vocal technique 0  2  4  6  8  10
rhythm          0  1  2  3  4  5
diction         0  1  2  3  4  5
musicality      0  3  6  9  12 15
Yes, but do I need equal intervals between the point values in a rubric?
No. Say it with me one more time -- rubrics are flexible tools. Shape them to fit
your needs, not the other way around. In other words, points should be distributed
across the levels of a rubric to best capture the value you assign to each level of
performance. For example, points might be awarded on an oral presentation as
follows:
In other words, you might decide that at this point in the year you would be
pleased if a presenter makes eye contact "sometimes," so you award that level of
performance most of the points available. However, "sometimes" would not be as
acceptable for level of volume or enthusiasm.
Here are some more examples of rubrics illustrating the flexibility of number
of levels and value you assign each level.
In the above rubric, you have decided to measure volume and enthusiasm at
two levels -- never or usually -- whereas, you are considering eye contact and accuracy
of summary across three levels. That is acceptable if that fits the type of judgments
you want to make. Even though there are only two levels for volume and three levels
for eye contact, you are awarding the same number of points for a judgment of
"usually" for both criteria. However, you could vary that as well:
In this case, you have decided to give less weight to volume and enthusiasm
as well as to judge those criteria across fewer levels.
So, do not feel bound by any format constraints when constructing a rubric.
The rubric should best capture what you value in performance on the authentic task.
The more accurately your rubric captures what you want your students to know and
be able to do the more valid the scores will be.
Proficiency
Developing
Inadequate
Quick, holistic judgments are often made for homework problems or journal
assignments. To allow the judgment to be quick and to reduce the problem
illustrated in the above rubric of fitting the best category to the performance, the
number of criteria should be limited. For example, here is a possible holistic rubric for
grading homework problems.
+ (1 pt.)
- (0 pts.)
Although this homework problem rubric only has two criteria and three levels
of performance, it is not easy to write such a holistic rubric to accurately capture
what an evaluator values and to cover all the possible combinations of student
performance. For example, what if a student got all the answers correct on a
problem assignment but did not show any work? The rubric covers that: the student
would receive a (-) because "little or no work was shown." What if a student showed
all the work but only got some of the answers correct? That student would receive a
(+) according to the rubric. All such combinations are covered. But does giving a (+)
for such work reflect what the teacher values? The above rubric is designed to give
equal weight to correct answers and work shown. If that is not the teacher's intent
then the rubric needs to be changed to fit the goals of the teacher.
All of this complexity with just two criteria -- imagine if a third criterion were
added to the rubric. So, with holistic rubrics, limit the number of criteria considered,
or consider using an analytic rubric.
As a final check on your rubric, you can do any or all of the following before
applying it.
By the last suggestion I mean to imagine that a student had met specific levels
of performance on each criterion (for an analytic rubric). Then ask yourself if that
performance translates into the score that you think is appropriate. For example, on
Rubric 3 above, imagine a student scores
Of course, you will never know if you really have a good rubric until you apply
it. So, do not work to perfect the rubric before you administer it. Get it in good shape
and then try it. Find out what needs to be modified and make the appropriate
changes.
Okay, does that make sense? Are you ready to create a rubric of your own?
Well, then come into my workshop and we will build one together.
(For those who might be "tabularly challenged" (i.e., you have trouble making
tables in your word processor) or would just like someone else to make the rubric
into a tabular format for you, there are websites where you enter the criteria and
levels of performance and the site will produce the rubric for you.)
Test Yourself