CHAPTER 1
Introduction
This chapter is a review of the principles of high-quality assessment which were presented and
discussed in Chapter 2 of the book Assessment of Learning 1. We face the risk of sounding
repetitive for the sake of concept mastery. Let us check how much we learned from our first course in
assessment of learning.
Assessment can be made precise, accurate and dependable only if what is to be achieved
is clearly stated and feasible. To this end, we consider learning targets involving knowledge,
reasoning, skills, products and effects. Learning targets need to be stated in behavioral terms, that is,
terms that denote something which can be observed through the behavior of the students. Thus, the
objective "to understand the concept of buoyancy" is not stated in behavioral terms. It is not clear
how one measures "understanding". On the other hand, if we restate the target as "to determine the
volume of water displaced by a given submerged object", then we can easily measure the extent to
which a student understands "buoyancy".
Level 1. KNOWLEDGE which refers to the acquisition of facts, concepts and theories. Knowledge
of historical facts like the date of EDSA revolution, discovery of the Philippines or of scientific concepts
like the scientific name of milkfish, the chemical symbol of argon etc. all fall under knowledge.
Knowledge forms the foundation of all other cognitive objectives for without knowledge, it is
not possible to move up to the next higher level of thinking skills in the hierarchy of educational
objectives.
Level 2. COMPREHENSION refers to the same concept as "understanding". It is a step higher than
mere acquisition of facts and involves a cognition or awareness of the interrelationships of facts and
concepts.
Level 3. APPLICATION refers to the transfer of knowledge from one field to another or from
one concept to another concept in the same discipline.
EXAMPLE. The classic experiment of Pavlov on dogs showed that animals can be conditioned
to respond to certain stimuli. The same principle can be applied in the context of teaching and
learning in behavior modification for school children.
Level 4. ANALYSIS refers to the breaking down of a concept or idea into its components and
explaining the concept as a composition of these components.
EXAMPLE. Poverty in the Philippines, particularly at the barangay level, can be traced back
to the low income levels of families in such barangays and the propensity for large households with an
average of about 5 children per family. (Note: Poverty is analyzed in the context of income and
number of children).
Level 5. SYNTHESIS refers to the opposite of analysis and entails putting together the
components in order to summarize the concept.
EXAMPLE. The field of geometry is replete with examples of synthesis lessons. From the
relationship of the parts of a triangle for instance, one can deduce that the sum of the angles of a
triangle is 180 degrees. (Padua, Roberto and Rosita G. Santos (1997). "Educational Evaluation and
Measurement". Quezon City: Katha Publishing, pp. 21-22.)
Level 6. EVALUATION AND REASONING refers to valuing and judgment, or assessing the "worth"
of a concept or principle.
Skills refer to specific activities or tasks that a student can proficiently do, e.g. skills in coloring,
language skills. Skills can be clustered together to form specific competencies, e.g. birthday card
making. Related competencies characterize a student's ability (DACUM, 2000). It is important to
recognize a student's ability so that the program of study can be designed to optimize
his/her innate abilities.
Abilities can be roughly categorized into cognitive, psychomotor and affective abilities. For
instance, the ability to work well with others and to be trusted by every classmate (affective ability) is
an indication that the student will most likely succeed in work that requires leadership abilities. On
the other hand, other students are better at doing things alone, like programming and Web designing
(cognitive ability), and therefore they would be good at highly technical individualized work.
Products, outputs and projects are tangible and concrete evidence of a student's ability. A clear
target for products and projects needs to specify the level of workmanship of such projects, e.g.
expert level, skilled level or novice level outputs. For instance, an expert output may be characterized
by the indicator "at most two (2) imperfections noted" while a skilled level output may be characterized
by the indicator "at most four (4) imperfections noted", etc.
Before we move on to methods of assessment, let's have a mental exercise.
EXERCISE 1.1
5. To use the concept of ratio and proportion in finding the height of a building.
B. FOR EACH OF THE LESSONS BELOW, WRITE AT LEAST FIVE (5) LEARNING TARGETS FOLLOWING
BLOOM'S TAXONOMY.
C. WRITE AT LEAST FIVE (5) SKILLS AND THREE (3) COMPETENCIES INVOLVED IN BAKING A CAKE.
Written-response instruments include objective tests (multiple choice, true-false, matching or short
answer), essays, examinations and checklists. Objective tests are appropriate for assessing the
various levels of the hierarchy of educational objectives. Multiple choice tests in particular can be
constructed in such a way as to test higher-order thinking skills. Essays, when properly planned, can
test the student's grasp of the higher-level cognitive skills, particularly in the areas of application,
analysis, synthesis, and judgment. However, when the essay question is not sufficiently precise and
when the parameters are not properly defined, there is a tendency for the students to write
irrelevant and unnecessary things just to fill in blank spaces. When this happens, both the teacher
and the students will experience difficulty and frustration.
(BETTER) Write an essay about the first EDSA revolution, giving focus on the main
characters of the revolution and their respective roles.
In the second essay question, the assessment foci are narrowed down to: (a) the main characters of
the event, and (b) the roles of each character in the revolution leading to the ouster of the incumbent
President at that time. It becomes clear what the teacher wishes to see and what the students are
supposed to write.
A teacher is often tasked to rate products. Examples of products that are frequently rated in
Education are book reports, maps, charts, diagrams, notebooks, essays and creative endeavors of all
sorts. An example of a product rating scale is the classic 'handwriting' scale used in the California
Achievement Test, Form W (1957). There are prototype handwriting specimens of pupils and students
(of various grades and ages). The sample handwriting of a student is then moved along the scale until
it best matches one of the prototype specimens. To rate the various products in education in this way,
the teacher must possess prototype products accumulated over his/her years of experience.
One of the most frequently used measurement instruments is the checklist. A performance
checklist consists of a list of behaviors that make up a certain type of performance (e.g. using a
microscope, typing a letter, solving a mathematics problem and so on). It is used to determine
whether or not an individual behaves in a certain (usually desired) way when asked to complete a
particular task. If a particular behavior is present when an individual is observed, the teacher places a
check opposite it in the list.
EXAMPLE: (Performance Checklist in Solving a mathematics problem)
Behavior:
6. Obtains an answer
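A performance checklist of this kind can be represented as a simple mapping from behaviors to whether each was observed. The sketch below is illustrative; only "obtains an answer" comes from the checklist above, and the other behavior names are hypothetical fillers:

```python
# Hypothetical performance checklist for solving a mathematics problem.
# Only "obtains an answer" is taken from the text; the rest are
# illustrative behaviors a teacher might list.
checklist = {
    "identifies what is asked": True,
    "identifies the given facts": True,
    "selects an appropriate strategy": False,
    "carries out the solution correctly": True,
    "checks the answer": False,
    "obtains an answer": True,
}

# A check (True) is placed opposite each behavior that is present
# when the student is observed.
observed = [behavior for behavior, present in checklist.items() if present]
print(f"{len(observed)} of {len(checklist)} behaviors observed")
```

Summing the checks in this way gives the teacher a quick per-student count, which can later be compared across students or across repeated observations.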
The ancient Greeks used oral questioning extensively as an assessment method. Socrates
himself, considered the epitome of a teacher, was said to have handled his classes solely through
questioning and oral interactions.
Oral questioning is an appropriate assessment method when the objectives are: (a) to assess
the student's stock knowledge and/or (b) to determine the student's ability to communicate ideas in
coherent verbal sentences. While oral questioning is indeed an option for assessment, several factors
need to be considered when using this option. Of particular significance are the student's state of
mind and feelings; anxiety and nervousness in making oral presentations could mask the
student's true ability.
A tally sheet is a device often used by teachers to record the frequency of student behaviors,
activities or remarks. How many high school students follow instructions during a fire drill, for
example? How many instances of aggression or helpfulness are observed when elementary students
are observed in the playground? In the class of Mr. Sual in elementary statistics, how often do the
students ask questions about inference? Observational tally sheets are most useful in answering these kinds of
questions.
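A tally sheet of this kind amounts to counting occurrences of each behavior over an observation period. A minimal sketch, with hypothetical behavior labels and counts, might look like this:

```python
from collections import Counter

# Illustrative observation log for one class session: each entry is
# one observed instance of a behavior. The labels are hypothetical.
observations = [
    "asks a question", "follows instructions", "asks a question",
    "helps a classmate", "asks a question", "follows instructions",
]

# The tally sheet is simply the frequency of each behavior.
tally = Counter(observations)
for behavior, count in tally.most_common():
    print(f"{behavior}: {count}")
```

The same structure works for fire-drill compliance or playground aggression: record one entry per observed instance, then read the frequencies off the counter.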
EXERCISE 1.2
1. objective tests
2. essay tests
3. performance tests
4. oral questioning
5. self reports
6. observational reports
7. product reports
B. IF YOU WERE TO UTILIZE ALL THESE PROCEDURES, HOW WOULD YOU PUT WEIGHTS ON EACH
OF THE PROCEDURES? EXPLAIN YOUR ANSWER.
The quality of the assessment instrument and method used in education is very important
since the evaluation and judgment that the teacher gives on a student are based on the information
he obtains using these instruments. Accordingly, teachers follow a number of procedures to ensure
that the entire assessment process is valid and reliable.
Validity has traditionally been defined as the instrument's ability to measure what it
purports to measure. We shall learn in this section that the concept has recently been modified to
accommodate a number of concerns regarding the scope of this traditional definition. Reliability, on
the other hand, is defined as the instrument's consistency.
1.3.1 Validity
Validity, in recent years, has been defined as referring to the appropriateness, correctness,
meaningfulness and usefulness of the specific conclusions that a teacher reaches regarding the
teaching-learning situation. Content validity refers to the content and format of the instrument. How
appropriate is the content? How comprehensive? Does the instrument logically get at the intended
variable or factor? How adequately does the sample of items or questions represent the content to be
assessed? Is the format appropriate? The content and format must be consistent with the definition
of the variable or factor to be measured. Some criteria for judging content validity are given as
follows:
1. Do students have adequate experience with the type of task posed by the item?
2. Did the teachers cover sufficient material for most students to be able to answer the item
correctly?
3. Does the item reflect the degree of emphasis received during instruction?
With these as guide, a content validity table may be constructed in two (2) forms as provided
below:
A: PER ITEM

                    ITEM NO.
Criteria      1    2    3    4    5    6

B: ENTIRE TEST

1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation
Based on Form B, adjustments in the number of items that relate to a topic can be made accordingly.
While content validity is important, there are other types of validity that one needs to verify. Face
validity refers to the outward appearance of the test. It is the lowest form of test validity. A more
important type of validity is called criterion-related validity. In criterion-related validity, the test item is
judged against a specific criterion, e.g. relevance to a topic such as conservation.
The degree to which the item measures the criterion is said to constitute its criterion validity.
Criterion validity can also be measured by correlating the test with a known valid test (as criterion).
Finally, a test needs to possess construct validity. A "construct" is another term for a factor, and we
already know that a group of variables that correlate highly with each other form a factor. It follows
that an item possesses construct validity if it loads highly on a given construct or factor. A technique
called factor analysis is required to determine the construct validity of an item. Such a technique is
beyond the scope of this book.
1.3.2 Reliability
The reliability of an assessment method refers to its consistency. It is also a term that is
synonymous with dependability or stability.
Stability or internal consistency as reliability measures can be estimated in several ways. The
Split-half method involves scoring two halves (usually, odd items versus even items) of a test
separately for each person and then calculating a correlation coefficient for the two sets of scores. The
coefficient indicates the degree to which the two halves of the test provide the same results and
hence, describes the internal consistency of the test. The reliability of the whole test is then calculated
using what is known as the Spearman-Brown prophecy formula:

    Reliability of test = 2r / (1 + r)

where r is the correlation between the scores on the two halves of the test. An alternative
internal-consistency estimate that does not require correlating two halves is the Kuder-Richardson
formula 21 (KR21):

    KR21 = [K / (K - 1)] x [1 - M(K - M) / (K x Variance)]

where K = number of items on the test, M = mean of the test scores, and Variance = variance of the test
scores.
The mean of a set of scores is simply the sum of the scores divided by the number of scores; its
variance is given by:

    Variance = Σ(X - M)² / n

where X is an individual score, M is the mean and n is the number of scores.
Reliability of a test may also mean the consistency of test results when the same test is
administered at two different time periods. This is the test-retest method of estimating reliability.
The estimate of test reliability is then given by the correlation of the two test results.
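These reliability computations can be sketched in Python. The function names are my own, and the score lists are illustrative data, not examples from the text; the formulas are the standard mean, variance, Pearson correlation, Spearman-Brown and KR21 definitions:

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def variance(scores):
    """Average squared deviation from the mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def pearson_r(xs, ys):
    """Correlation coefficient between two sets of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r_halves):
    """Whole-test reliability from the correlation of the two half-tests."""
    return 2 * r_halves / (1 + r_halves)

def kr21(k, m, var):
    """Kuder-Richardson formula 21 for k items with mean m and variance var."""
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))

# Split-half method: score odd and even items separately for each
# person, correlate the two halves, then step up with Spearman-Brown.
odd_scores = [12, 15, 9, 14, 11]   # illustrative half-test scores
even_scores = [13, 14, 10, 15, 10]
r = pearson_r(odd_scores, even_scores)
print(round(spearman_brown(r), 2))
```

Note that the Spearman-Brown step always raises the half-test correlation, since a longer test samples the domain more thoroughly; KR21 needs only the number of items, the mean and the variance of the total scores.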
1.3.3 Fairness
An assessment procedure needs to be fair. This means many things. First, students need to know
exactly what the learning targets are and what method of assessment will be used. If students do not
know what they are supposed to be achieving, then they could get lost in the maze of concepts being
discussed in class. Likewise, students have to be informed how their progress will be assessed in order
to allow them to strategize and optimize their performance.
Third, fairness also implies freedom from teacher stereotyping. Some examples of stereotyping
include: boys are better than girls in Mathematics or girls are better than boys in language. Such
stereotyped images and thinking could lead to unnecessary and unwanted biases in the way that
teachers assess their students.
The term "ethics" refers to questions of right and wrong. When teachers think about ethics,
they need to ask themselves if it is right to assess a specific knowledge or investigate a certain
question. Are there some aspects of the teaching-learning situation that should not be assessed? Here
are some situations in which assessment may not be called for:
* Asking elementary pupils to answer sensitive questions without consent of their parents;
* Testing the mental abilities of pupils using an instrument whose validity and reliability are
unknown;
When a teacher thinks about ethics, the basic question to ask in this regard is: "Will any
physical or psychological harm come to any one as a result of the assessment or testing?"
Test results and assessment results are confidential. They should be known only by
the student concerned and the teacher. Results should be communicated to the students in such a
way that other students would not be in possession of information pertaining to any specific member
of the class.
The third ethical issue in assessment is deception. Should students be deceived? There are
instances in which it is necessary to conceal the objective of the assessment from the students in
order to ensure fair and impartial results. When this is the case, the teacher has a special
responsibility to (a) determine whether the use of such techniques is justified by the educational
value of the assessment, (b) determine whether alternative procedures are available that do not
make use of concealment and (c) ensure that students are provided with sufficient explanation as
soon as possible.
Finally, the temptation to assist certain individuals in class during assessment or testing is
ever present. In this case, it is best if the teacher does not administer the test himself if he believes
that such assistance may, at a later time, be considered unethical.
CHAPTER EXERCISES
1. Knowledge
2. comprehension
3. application
4. analysis
5. synthesis
6. evaluation
B. Suppose that you wish to teach the concept of "Addition of Similar Fractions" in elementary
mathematics. Write one objective for each of the following:
1. Knowledge
2. comprehension
3. application
4. analysis
5. synthesis
C. Construct a performance checklist for assessing the performance of a student in each of the
following:
3.basket weaving
5.using a microscope
2.criterion-related validity
3.construct validity
4.reliability
5.stability
1. A test may be reliable but not necessarily valid. Is it possible for a test to be valid but not
reliable? Discuss.
2. A 50-item test was administered to a group of 20 students. The mean score was 35 while the
standard deviation was 5.5. Compute the KR21 index of reliability.
3. Compute the Spearman-Brown reliability index if the correlation between the odd and even
scores is 0.84.
4. How many items are needed to construct a KR21 index of 0.60 if the mean is 75 and the
standard deviation is 10.5 for a group of 30 students?
6. Cite another example of a behavior considered not ethical in testing and assessment. Explain
why you think such a behavior is not ethical.
7. Enumerate the three (3) main concerns of ethics in testing and assessment. Discuss each
major ethical concern.
9. Which of the following: content validity, criterion validity, construct validity, is the most
difficult to obtain? Explain why.
10. Is it possible to obtain a correlation coefficient of 1.5 for two sets of test scores? Discuss.
F. In the following situation, identify the ethical issues that may be raised in terms of (a) possible harm
to the participants, (b) confidentiality of the assessment data, and (c) presence of concealment or
deception:
1. A teacher plans to rate the performance of students in a gymnastics class unobtrusively. He does
not let the students know that he is actually rating their gymnastics abilities. Instead, he tells the
students to use the gymnasium facilities for practice and then, he watches the students practice on
occasions that are unannounced.
2. A teacher is taking a graduate course in research and intends to use his students in English 1 as
subjects of his study. His research deals with the effect of classical music on the learning of grammar.
One class is taught English grammar with subtle background music while the other class is taught the
same lesson without any background music.
3. As part of the students' portfolio assessment, the pupils are required to write every event that
happens in their homes at night which may have some bearing in their ability to complete their
homework. The teacher instructs the pupils to write one paragraph of such events once every hour
from 5:00 P. M. daily.
4. An arts and crafts teacher requires the students to submit their basket weaving projects to be
graded. He selects the best student outputs and brings these projects home.
5. In grading his students in Mathematics 4, a high school teacher subjectively adds five or more
points to the grades of students who have performed poorly but who, he believes, deserve better
grades had they spent more time studying. In some instances, however, he does not add any point to
a poor performer because he also believes that such cases do not represent a case of "just needing
more time".
6. In order to proceed with a final examination in a swimming class, the teacher brings his students to
a nearby beach and individually rates his students' swimming skills in the open sea. To ensure that he
is protected in the event of an untoward incident, the teacher also requires the students to submit a
parental consent form.
CHAPTER 2
PROCESS-ORIENTED PERFORMANCE-BASED ASSESSMENT
Too often, we tend to assess students' learning through their outputs or products or through some
kind of traditional testing. However, it is important to assess not only these competencies but also the
processes which the students underwent in order to arrive at these products or outputs. It is possible
to explain why the students' outputs are as they are through an assessment of the processes which
they followed in order to arrive at the final product. This chapter is concerned with process-oriented
performance-based assessment. Assessment is not an end in itself but a vehicle for educational
improvement. Its effective practice, then, begins with and enacts a vision of the kinds of learning we
most value for students and strive to help them achieve.
Information about outcomes is of high importance; where students “end up” matters greatly. But
to improve outcomes, we need to know about student experience along the way — about the
curricula, teaching, and kind of student effort that lead to particular outcomes.
Assessment can help us understand which students learn best under what conditions; with such
knowledge comes the capacity to improve the whole of their learning. Process-oriented
performance-based assessment is concerned with the actual task performance rather than the output
or product of the activity.
The learning objectives in process-oriented performance-based assessment are stated in directly
observable behaviors of the students. Competencies are defined as groups or clusters of skills and
abilities needed for a particular task. The objectives generally focus on those behaviors which
exemplify a "best practice" for the particular task. Such behaviors range from a "beginner" or novice
level up to the level of an expert.
Objectives: The activity aims to enable the students to recite a poem entitled “The Raven” by Edgar
Allan Poe. Specifically:
3. Maintain eye contact with the audience while reciting the poem;
4. Create the ambiance of the poem through appropriate rising and falling intonation;
Notice that the objective starts with a general statement of what is expected of the student from the
task (recite a poem by Edgar Allan Poe) and then breaks down the general objective into easily
observable behaviors when reciting a poem. The specific objectives identified constitute the learning
competencies for this particular task. As in the statement of objectives using Bloom's taxonomy, the
specific objectives also range from simple observable processes to more complex observable
processes e.g. creating an ambiance of the poem through appropriate rising and falling intonation. A
competency is said to be more complex when it consists of two or more skills.
* Recite a poem with feeling using appropriate voice quality, facial expressions and hand gestures;
2.2 Task Designing
Learning tasks need to be carefully planned. In particular, the teacher must ensure that the
particular learning process to be observed contributes to the overall understanding of the subject or
course. Some generally accepted standards for designing a task include:
* Identifying an activity that would highlight the competencies to be evaluated e.g. reciting a poem,
writing an essay, manipulating the microscope.
* Identifying an activity that would entail more or less the same sets of competencies. If an activity
would result in too many possible competencies then the teacher would have difficulty assessing the
student's competency on the task.
* Finding a task that would be interesting and enjoyable for the students. Tasks such as writing an
essay are often boring and cumbersome for the students.
Ask them to find as many living organisms as they can near the pond or creek. Also, bring them to
the school playground to find as many living organisms as they can. Observe how the students will
develop a system for finding such organisms, classifying the organisms and concluding about the
differences in biological diversity of the two sites.
Science laboratory classes are particularly suitable for a process-oriented performance-based
assessment technique.
A rubric is a scoring scale used to assess student performance along a task-specific set of criteria.
Authentic assessments typically are criterion-referenced measures; that is, the student's aptitude on a
task is determined by matching the student's performance against a set of criteria to determine the
degree to which the performance meets the criteria. To measure student performance against a
pre-determined set of criteria, a rubric, or scoring scale, which contains the essential criteria for the
task and appropriate levels of performance for each criterion, is typically created. For example, the
following rubric (scoring scale) covers the recitation portion of a task in English.
Recitation Rubric

Criteria (weight)                              1                      2                                            3
Number of appropriate hand gestures (x1)       1-4                    5-9                                          10-12
Voice inflection (x2)                          Monotone voice used    Can vary voice inflection with difficulty    Can easily vary voice inflection
As in the above example, a rubric comprises two components: criteria and levels of
performance. Each rubric has at least two levels of performance. The criteria, characteristics of good
performance on a task, are listed in the left-hand column of the rubric above (number of hand
gestures, appropriate facial features, voice inflection and ambiance). Actually, as is common in
rubrics, a shorthand is used for each criterion to make it fit easily into the table. The full criteria are
statements of performance such as "includes a sufficient number of hand gestures" and "recitation
captures the ambiance through appropriate feelings and tone in the voice".
For each criterion, the evaluator applying the rubric can determine to what degree the student has
met the criterion, i.e., the level of performance. In the above rubric, there are three levels of
performance for each criterion. For example, the recitation can contain lots of inappropriate, few
inappropriate or no inappropriate hand gestures.
Finally, the rubric above contains a mechanism for assigning a score to each performance.
(Assessments and their accompanying rubrics can be used for purposes other than evaluation and,
thus, do not have to have points or grades attached to them.) In the second column from the left, a
weight is assigned to each criterion. Students can receive 1, 2 or 3 points for "number of appropriate
hand gestures". But appropriate ambiance, more important in this teacher's mind, is weighted three
times as heavily (x3). So, students can receive 3, 6 or 9 points (i.e., 1, 2 or 3 times 3) for the level of
appropriateness of the ambiance in this task.
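This weighted scoring mechanism can be sketched in a few lines of Python. The criterion names are the shorthand from the rubric, the weights follow the discussion (hand gestures x1, voice inflection x2, ambiance x3), and the particular levels assigned here are an invented example of one student's performance:

```python
# Weights per criterion, as discussed above for the recitation rubric.
weights = {"hand gestures": 1, "voice inflection": 2, "ambiance": 3}

# Hypothetical levels of performance (1-3) earned by one student.
levels = {"hand gestures": 2, "voice inflection": 3, "ambiance": 2}

# Each criterion contributes its level times its weight.
total = sum(levels[c] * weights[c] for c in weights)
maximum = sum(3 * w for w in weights.values())
print(f"score: {total} of {maximum}")  # 2*1 + 3*2 + 2*3 = 14 of 18
```

Separating weights from levels makes the teacher's value judgments explicit: changing a weight re-scores every student consistently without re-rating any performance.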
Descriptors
The above rubric includes another common, but not a necessary, component of rubrics:
descriptors. Descriptors spell out what is expected of students at each level of performance for each
criterion. In the above example, "lots of inappropriate hand gestures" and "monotone voice
used" are descriptors. A descriptor tells students more precisely what performance looks like at each
level and how their work may be distinguished from the work of others for each criterion. Similarly,
the descriptors help the teacher more precisely and consistently distinguish between student work.
Clearer expectations
It is very useful for the students and the teacher if the criteria are identified and communicated
prior to completion of the task. Students know what is expected of them and teachers know what to
look for in student performance. Similarly, students better understand what good (or bad)
performance on a task looks like if levels of performance are identified, particularly if descriptors for
each level are included.
In addition to better communicating teacher expectations, levels of performance permit the teacher
to more consistently and objectively distinguish between good and bad performance, or between
superior, mediocre and poor performance, when evaluating student work.
Better feedback
Furthermore, identifying specific levels of student performance allows the teacher to provide more
detailed feedback to students. The teacher and the students can more clearly recognize areas that
need improvement.
For a particular task you assign the students, do you want to be able to assess how well the students
perform on each criterion, or do you want to get a more global picture of the students' performance
on the entire task? The answer to that question is likely to determine the type of rubric you choose to
create or use: analytic or holistic.
Analytic rubric
Most rubrics, like the Recitation rubric above, are analytic rubrics. An analytic rubric articulates
levels of performance for each criterion so the teacher can assess student performance on each
criterion.
Using the Recitation rubric, a teacher could assess whether a student has done a poor, good or
excellent job of “creating ambiance” and distinguish that from how well the student did on “voice
inflection.”
Holistic rubric
In contrast, a holistic rubric does not list separate levels of performance for each criterion.
Instead, a holistic rubric assigns a level of performance by assessing performance across multiple
criteria as a whole. For example, the analytic recitation rubric above can be turned into a holistic rubric:
3 - Excellent Speaker
2 - Good Speaker
1 - Poor Speaker
Analytic rubrics are more common because teachers typically want to assess each criterion
separately, particularly for assignments that involve a larger number of criteria. It becomes more and
more difficult to assign a level of performance in a holistic rubric as the number of criteria increases.
As student performance increasingly varies across criteria it becomes more difficult to assign an
appropriate holistic category to the performance. In addition, an analytic rubric better handles
weighting of criteria.
So, when might you use a holistic rubric? Holistic rubrics tend to be used when a quick or gross
judgement needs to be made. If the assessment is a minor one, such as a brief homework assignment,
it may be sufficient to apply a holistic judgement (e.g., check, check-plus, or no-check) to quickly
review student work. But holistic rubrics can also be employed for more substantial assignments. On
some tasks it is not easy to evaluate performance on one criterion independently of performance on a
different criterion. For example, many writing rubrics are holistic because it is not always easy to
disentangle clarity from organization or content from presentation. So, some educators believe a
holistic or global assessment of student performance better captures student ability on certain tasks.
(Alternatively, if two criteria are nearly inseparable, the combination of the two can be treated as a
single criterion in an analytic rubric.)
There is no specific number of levels a rubric should or should not possess. It will vary
depending on the task and your needs. A rubric can have as few as two levels of performance (e.g., a
check-list) or as many as ... Well, as many as you decide is appropriate. Also, it is not true that there
must be an even number or an odd number of levels. Again, that will depend on the situation.
Generally, it is better to start with a smaller number of levels of performance for a criterion and
then expand, if necessary. Making distinctions in student performance across two or three broad
categories is difficult enough. As the number of levels increases, and those judgments become finer
and finer, the likelihood of error increases.
Thus, start small. For example, in an oral presentation rubric, amount of eye contact might be
an important criterion. Performance on that criterion could be judged along three levels of
performance: never, sometimes, always.
Although these three levels may not capture all the variation in student performance on the
criterion, it may be sufficient discrimination for your purposes. Or, at the least, it is a place to start.
Upon applying the three levels of performance, you might discover that you can effectively group your
studentsʼ performance in these three categories. Furthermore, you might discover that the labels of
never, sometimes and always sufficiently communicate to your students the degree to which they can
improve on making eye contact.
On the other hand, after applying the rubric you might discover that you cannot effectively
discriminate among student performance with just three levels of performance. Perhaps, in your
view, many students fall in between never and sometimes, or between sometimes and always, or
neither label accurately captures their performance. So, at this point, you may decide to expand the
number of levels of performance to include never, rarely, sometimes, usually and always.
There is no “right” answer as to how many levels of performance there should be for a criterion
in an analytic rubric; that will depend on the nature of the task assigned, the criteria being evaluated,
the students involved and your purposes and preferences. For example, another teacher might
decide to leave off the “always” level in the above rubric because “usually” is as much as normally can
be expected or even wanted in some instances. Thus, the “makes eye contact” portion of the rubric
for that teacher might be:
We recommend that fewer levels of performance be included initially, for the reasons discussed above.
CHAPTER EXERCISES
B. Choose any five activities below and then construct your own scoring rubrics.
2. Devise a game.
3. Participate in a debate.
5. Draw a picture that illustrates whatʼs described in a story or article. Explain what you have drawn,
using details from the story or article.
11. Develop a classification scheme for something and explain and justify the categories.
12. Justify one point of view on an issue and then justify the opposing view.
15. Combine information from several sources to draw a conclusion about something.
16. Determine alternative courses of action, giving advantages and disadvantages of each.
17. Analyze how a particular system works and the way the components work together to affect each
other.
19. Answer questions beginning, “What will happen if...” or “What would you do if...” or “How would
things be different if....”
20. Write a summary of an article.
21. Critique your own or someone elseʼs work, giving examples and details.
CHAPTER 3
The role of assessment in teaching happens to be a hot issue in education today. This has led to an
increasing interest in “performance-based education”. Performance-based education poses a
challenge for teachers to design instruction that is task-oriented. The trend is based on the premise
that learning needs to be connected to the lives of students through relevant tasks that focus on
students' ability to use their knowledge and skills in meaningful ways. In this case, performance-based
tasks require performance-based assessments in which actual student performance is assessed
through a product, such as a completed project or work that demonstrates levels of task achievement.
At times, performance-based assessment has been used interchangeably with "authentic assessment"
and "alternative assessment". In all cases, performance-based assessment has led to the use of a
variety of alternative ways of evaluating student progress (journals, checklists, portfolios, projects,
rubrics, etc.) as compared to more traditional methods of measurement (paper-and-pencil testing).
Student performances can be defined as targeted tasks that lead to a product or overall learning
outcome. Products can include a wide range of student works that target specific skills. Some
examples include communication skills such as those demonstrated in reading, writing, speaking, and
listening, or psychomotor skills requiring physical abilities to perform a given task. Target tasks can
also include behavior expectations targeting complex tasks that students are expected to achieve.
Using rubrics is one way that teachers can evaluate or assess student performance or proficiency in
any given task as it relates to a final product or learning outcome. Thus, rubrics can provide valuable
information about the degree to which a student has achieved a defined learning outcome based on
specific criteria that define the framework for evaluation.
The learning competencies associated with products or outputs are linked with an assessment of
the level of “expertise” manifested by the product. Thus, product-oriented learning competencies
target at least three (3) levels: novice or beginnerʼs level, skilled level, and expert level. Such levels
correspond to Bloomʼs taxonomy in the cognitive domain in that they represent progressively higher
levels of complexity in the thinking processes.
There are other ways to state product-oriented learning competencies. For instance, we can
define learning competencies for products or outputs in the following way:
* Level 1: Does the finished product or project illustrate the minimum expected parts or
functions? (Beginner)
* Level 2: Does the finished product or project contain additional parts and functions on top of the
minimum requirements which tend to enhance the final output? (Skilled level)
* Level 3: Does the finished product contain the basic minimum parts and functions, have
additional features on top of the minimum, and is aesthetically pleasing? (Expert level)
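These three guide questions can be read as a nested check: each level presupposes the one below it. A minimal sketch of that logic (the function name and boolean flags are hypothetical, invented for illustration, not taken from the text):

```python
def product_level(meets_minimum, has_enhancements, is_aesthetic):
    """Classify a finished product using the three guide questions above.

    Hypothetical illustration: each flag records the teacher's yes/no
    answer to one level's question. Higher levels presuppose the lower
    ones, so the checks run from most to least demanding."""
    if meets_minimum and has_enhancements and is_aesthetic:
        return "Expert"
    if meets_minimum and has_enhancements:
        return "Skilled"
    if meets_minimum:
        return "Beginner"
    return "Below beginner"
```

Note the design choice: because the levels are cumulative, a product cannot be rated "Expert" on aesthetics alone if it lacks the minimum parts.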
Example: The desired product is a representation of a cubic prism made out of cardboard in an
elementary geometry class.
2. be sturdy, made of durable cardboard and properly fastened together —(skilled specifications)
3. be pleasing to the observer, preferably properly colored for aesthetic purposes – (expert level)
Example: The desired product is a scrapbook illustrating the historical event called EDSA I People
Power.
1. contain pictures, newspaper clippings and other illustrations for the main characters of EDSA I
People Power, namely: Corazon Aquino, Fidel V. Ramos, Juan Ponce Enrile, Ferdinand E. Marcos,
Cardinal Sin – (minimum specifications)
2. contain remarks and captions for the illustrations, made by the student himself, for the roles
played by the characters of EDSA I People Power – (skilled level)
3. be presentable, complete, informative and pleasing to the reader of the scrapbook – (expert level).
Performance-based assessment for products and projects can also be used for assessing outputs of
short-term tasks, such as the one illustrated below for outputs in a typing class:
2. possess no more than 5 errors in spelling while observing proper format based on the document to
be typewritten – (skilled level)
3. possess no more than 5 errors in spelling, has the proper format, and is readable and presentable –
(expert level).
Notice that in all of the above examples, product-oriented performance-based learning competencies
are evidence-based. The teacher needs concrete evidence that the student has achieved a certain
level of competency based on submitted products and projects.
How should a teacher design a task for product-oriented performance-based assessment? The
design of the task in this context depends on what the teacher desires to observe as outputs of the
students. The concepts that may be associated with task designing include:
a. Complexity. The level of complexity of the project needs to be within the range of ability of
the students. Projects that are too simple tend to be uninteresting for the students while projects that
are too complicated will most likely frustrate them.
b. Appeal. The project or activity must be appealing to the students. It should be interesting
enough so that students are encouraged to pursue the task to completion. It should lead to self-
discovery of information by the students.
c. Creativity. The project needs to encourage students to exercise creativity and divergent
thinking. Given the same set of materials and project inputs, how does one best present the project?
It should lead the students into exploring the various possible ways of presenting the final output.
d. Goal-Based. Finally, the teacher must bear in mind that the project is produced in order to
attain a learning objective. Thus, projects are assigned to students not just for the sake of producing
something but for the purpose of reinforcing learning.
Example: Paper folding is a traditional Japanese art. However, it can be used as an activity to teach
the concept of plane and solid figures in geometry. Provide the students with a given number of
colored papers and ask them to construct as many plane and solid figures from these papers without
cutting them (by paper folding only).
Exercise 3.1
7. Identify similarities and differences of at least two major dialects in the Philippines.
For instance, scoring rubrics can be most useful in grading essays or in evaluating projects such as
scrapbooks. Judgement concerning the quality of a given writing sample may vary depending upon
the criteria established by the individual evaluator. One evaluator may heavily weigh the evaluation
process upon the linguistic structure, while another evaluator may be more interested in the
persuasiveness of the argument. A high quality essay is likely to have a combination of these and
other factors. By developing a pre-defined scheme for the evaluation process, the subjectivity
involved in evaluating essays is greatly reduced and the process becomes more objective.
The criteria for the scoring rubrics are statements which identify "what really counts" in the final
output. The following are the most often used major criteria for product assessment.
• Quality
• Creativity
• Comprehensiveness
• Accuracy
• Aesthetics
From the major criteria, the next task is to identify substatements that would make the major
criteria more focused and objective. For instance, if we were scoring an essay on "Three Hundred
Years of Spanish Rule in the Philippines", the major criterion "Quality" may possess the following
substatements:
• Identifies the key players in each period of the Spanish rule and the roles that they played.
• Succeeds in relating the history of Philippine Spanish rule (rated as professional, not quite
professional, and novice).
The example below displays a scoring rubric that was developed to aid in the evaluation of essays
written by college students in the classroom (based loosely on Leydens & Thompson, 1997). The
scoring rubric in this particular example exemplifies what is called a "holistic scoring rubric". It will be
noted that each score category describes the characteristics of a response that would receive the
respective score. Describing the characteristics of responses within each score category increases the
likelihood that two independent evaluators would assign the same score to a given response. In
effect, this increases the objectivity of the assessment procedure using rubrics. In the language of test
and measurement, we are actually increasing the "inter-rater reliability".
Substatements:
* The document can be easily followed. A combination of the following are apparent in the
document:
3. The graphics are descriptive and clearly support the document's purpose.
* The document is clear and concise and appropriate grammar is used throughout.
Adequate
* The document can be easily followed. A combination of the following are apparent in the
document :
3. Some supporting graphics are provided, but are not clearly explained.
* The document contains minimal distractions that appear in a combination of the following forms:
1. Flow in thought
2. Graphical presentations
3. Grammar/mechanics
Needs Improvement
2. Rambling format
4. Ambiguous graphics
* The document contains numerous distractions that appear in a combination of the following
forms:
1. Flow in thought
2. Graphical presentations
3. Grammar/mechanics
Inadequate
Grading essays is just one example of performances that may be evaluated using scoring rubrics.
There are many other instances in which scoring rubrics may be used successfully: to evaluate group
activities, extended projects and oral presentations (e.g., Chicago Public Schools, 1999;
Danielson, 1997a; 1997b; Schrock, 2000; Moskal, 2000). Also, rubric scoring cuts across disciplines and
subject matter, for it is equally appropriate to the English, Mathematics and Science classrooms
(e.g., Chicago Public Schools, 1999; State of Colorado, 1999; Danielson, 1997a; 1997b; Danielson &
Marquez, 1998; Schrock, 2000).
Where and when a scoring rubric is used does not depend on the grade level or subject, but rather on
the purpose of the assessment.
Other Methods
Authentic assessment schemes apart from scoring rubrics exist in the arsenal of a teacher. For
example, checklists may be used rather than scoring rubrics in the evaluation of essays. Checklists
enumerate a set of desirable characteristics for a certain product, and the teacher marks those
characteristics which are actually observed. As such, checklists are an appropriate choice for
evaluation when the information that is sought is limited to the determination of whether specific
criteria have been met. Scoring rubrics, on the other hand, are based on descriptive scales and support
the evaluation of the extent to which criteria have been met.
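The contrast between the two instruments can be made concrete in a short sketch (the criteria and the scale labels below are invented placeholders, not drawn from the text): a checklist records only presence or absence of each characteristic, while a rubric records a position on a descriptive scale.

```python
criteria = ["has title", "states thesis", "cites sources"]  # hypothetical criteria

# Checklist: mark only whether each desirable characteristic is observed.
def checklist_evaluate(observed):
    return {c: (c in observed) for c in criteria}

# Rubric: judge the extent to which each criterion is met on a descriptive scale.
scale = ["needs improvement", "adequate", "proficient"]  # hypothetical labels

def rubric_evaluate(ratings):
    # ratings maps each criterion to one of the scale's descriptive labels;
    # the index gives the level of performance (0 = lowest).
    return {c: scale.index(label) for c, label in ratings.items()}
```

The checklist answers "was the criterion met?"; the rubric answers "to what extent was it met?", which is exactly the distinction drawn above.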
The ultimate consideration in using a scoring rubric for assessment is really the "purpose of the
assessment”. Scoring rubrics provide at least two benefits in the evaluation process. First, they
support the examination of the extent to which the specified criteria have been reached. Second, they
provide feedback to students concerning how to improve their performances. If these benefits are
consistent with the purpose of the assessment, then a scoring rubric is likely to be an appropriate
evaluation technique.
In the development of scoring rubrics, it is well to bear in mind that they can be used to assess or
evaluate either specific tasks or a general, broad category of tasks. For instance, suppose that we are
interested in assessing a student's oral communication skills. Then a general scoring rubric may be
developed and used to evaluate each of the oral presentations given by that student. After each
oral presentation, the general scoring rubric is shown to the students, which then
allows them to improve on their previous performances. Scoring rubrics thus have the advantage of
providing a mechanism for immediate feedback.
In contrast, suppose now that the main purpose of the oral presentation is to determine the student's
knowledge of the facts surrounding the EDSA I revolution; then perhaps a specific scoring rubric
would be necessary. A general scoring rubric for evaluating a sequence of presentations may not be
adequate since, in general, events such as EDSA I (and EDSA II) differ in the surrounding factors (what
caused the revolutions) and the ultimate outcomes of these events. Thus, to evaluate the student's
knowledge of these events, it will be necessary to develop a specific scoring rubric for each
presentation.
The development of scoring rubrics goes through a process. The first step in the process entails the
identification of the qualities and attributes that the teacher wishes to observe in the students'
outputs that would demonstrate their level of proficiency (Brookhart, 1999). These qualities and
attributes form the top level of the scoring criteria for the rubric. Once done, a decision has to be
made whether a holistic or an analytic rubric would be more appropriate. In an analytic scoring
rubric, each criterion is considered one by one and the descriptions of the scoring levels are made
separately. This will then result in separate descriptive scoring schemes for each criterion or
scoring factor. On the other hand, for holistic scoring rubrics, the collection of criteria is considered
throughout the construction of each level of the scoring rubric and the result is a single descriptive
scoring scheme.
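The structural difference between the two types can be sketched as data (the criteria and level descriptions below are invented placeholders): an analytic rubric keeps one descriptive scale per criterion, while a holistic rubric keeps a single descriptive scale for the whole performance.

```python
# Analytic: a separate descriptive scoring scheme for each criterion.
analytic_rubric = {
    "clarity":      {3: "ideas are easy to follow", 2: "mostly clear", 1: "hard to follow"},
    "organization": {3: "logical structure throughout", 2: "some lapses", 1: "rambling"},
}

# Holistic: all criteria considered together; one scheme for the whole work.
holistic_rubric = {
    3: "clear and well organized throughout",
    2: "generally clear with some organizational lapses",
    1: "hard to follow and rambling",
}

def analytic_total(scores):
    """Sum the per-criterion levels assigned by the evaluator."""
    return sum(scores[c] for c in analytic_rubric)
```

An analytic rubric therefore yields a profile of scores (and possibly a total), while a holistic rubric yields one overall score; this is why holistic rubrics suit tasks whose criteria, as noted earlier, are hard to disentangle.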
The next step after defining the criteria for the top level of performance is the identification and
definition of the criteria for the lowest level of performance. In other words, the teacher is asked to
determine the type of performance that would constitute the worst performance, or a performance
which would indicate lack of understanding of the concepts being measured. The underlying reason
for this step is for the teacher to capture, by comparing the two extremes, the criteria that would suit
a middle level of performance for the concept being measured. The approach suggested would
therefore result in at least three levels of performance.
It is of course possible to make greater and greater distinctions between performances. For instance,
we can compare the middle level performance expectations with the best performance criteria and
come up with above average performance criteria; compare the middle level performance
expectations with the worst level of performance to come up with slightly below average
performance criteria, and so on. This comparison process can be repeated until the desired number of
score levels is reached or until no further distinction can be made. If meaningful distinctions between
the score categories cannot be made, then additional score categories should not be created
(Brookhart, 1999). It is better to have a few meaningful score categories than to have many score
categories that are difficult or impossible to distinguish.
A note of caution: it is suggested that each score category should be defined using descriptors of the
work rather than value judgments about the work (Brookhart, 1999). For example, “Student’s
sentences contain no errors in subject-verb agreement,” is preferable over, “Student’s sentences are
good.” The phrase “are good” requires the evaluator to make a judgment whereas the phrase “no
errors” is quantifiable. Finally, we can test whether our scoring rubric is “reliable” by asking two or
more teachers to score the same set of projects or outputs and correlating their individual assessments.
A high correlation between the raters implies high inter-rater reliability. If the scores assigned by the
teachers differ greatly, then this would suggest a need to refine the scoring rubric we have developed. It
may be necessary to clarify the scoring rubric so that it means the same thing to different scorers.
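The reliability check described above is an ordinary correlation between the two teachers' scores. A minimal sketch using the Pearson correlation coefficient (the rater data below are made up for illustration):

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Two teachers independently score the same five projects on a 1-5 rubric.
teacher_a = [5, 3, 4, 2, 5]
teacher_b = [5, 3, 4, 2, 4]
r = pearson_r(teacher_a, teacher_b)  # close to 1: high inter-rater reliability
```

A correlation near 1 suggests the rubric means the same thing to both scorers; a low correlation signals that the level descriptions need to be clarified before the rubric is reused.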
Resources
Currently, there is a broad range of resources available to teachers who wish to use scoring
rubrics in their classrooms. These resources differ both in the subject that they cover and the level
that they are designed to assess. The examples provided below are only a small sample of the
information that is available.
For K-12 teachers, the State of Colorado (1998) has developed an on-line set of general, holistic
scoring rubrics that are designed for the evaluation of various writing assessments. The Chicago Public
Schools (1999) maintain an extensive electronic list of analytic and holistic scoring rubrics that span
the broad array of subjects represented throughout K-12 education. For mathematics teachers,
Danielson has developed a collection of reference books that contain scoring rubrics that are
appropriate to the elementary, middle school and high school mathematics classrooms (1997a, 1997b;
Danielson & Marquez, 1998).
Resources are also available to assist college instructors who are interested in developing and using
scoring rubrics in their classrooms. Kathy Schrock's Guide for Educators (2000) contains electronic
materials for both the pre-college and the college classroom. In The Art and Science of Classroom
Assessment: The Missing Part of Pedagogy, Brookhart (1999) provides a brief, but comprehensive
review of the literature on assessment in the college classroom. This includes a description of scoring
rubrics and why their use is increasing in the college classroom. Moskal (1999) has developed a Web
site that contains links to a variety of college assessment resources, including scoring rubrics.
The resources described above represent only a fraction of those that are available. The ERIC
Clearinghouse on Assessment and Evaluation [ERIC/AE] provides several additional useful Web sites.
One of these, Scoring Rubrics - Definition & Constructions (2000b), specifically addresses questions
that are frequently asked with regard to scoring rubrics. This site also provides electronic links to Web
resources and bibliographic references to books and articles that discuss scoring rubrics. For more
recent developments within assessment and evaluation, a search can be completed on the abstracts
of papers that will soon be available through ERIC/AE (2000a). This site also contains a direct link to
ERIC/AE abstracts that are specific to scoring rubrics.
Search engines that are available on the Web may be used to locate additional electronic resources.
When using this approach, the search criteria should be as specific as possible. Generic searches that
use the terms “rubrics” or “scoring rubrics” will yield a large volume of references. When seeking
information on scoring rubrics from the Web, it is advisable to use an advanced search and specify the
grade level, subject area and topic of interest. If more resources are desired than result from this
conservative approach, the search criteria can be expanded.
CHAPTER EXERCISES
C. What factors determine the use of a scoring rubric over other authentic assessment procedures?
Discuss.
D. Identify and describe the process of developing scoring rubrics for product-oriented performance-
based assessment.
12. Laboratory output in “Determining the gravitational constant using a free fall experiment”
Chapter References
Brookhart, S. M. (1999). The Art and Science of Classroom Assessment: The Missing Part of Pedagogy.
ASHE-ERIC Higher Education Report (Vol. 27, No. 1). Washington, DC: The George Washington
University, Graduate School of Education and Human Development.
Danielson, C. (1997a). A Collection of Performance Tasks and Rubrics: Middle School Mathematics.
Larchmont, NY: Eye on Education Inc.
Danielson, C. (1997b). A Collection of Performance Tasks and Rubrics: Upper Elementary School
Mathematics. Larchmont, NY: Eye on Education Inc.
Danielson, C. & Marquez, E. (1998). A Collection of Performance Tasks and Rubrics: High School
Mathematics. Larchmont, NY: Eye on Education Inc.
ERIC/AE (2000a). Search ERIC/AE draft abstracts. [Available online at: http://ericae.net/sinprog.htm].
ERIC/AE (2000b). Scoring Rubrics - Definitions & Construction [Available online at:
http://ericae.net/faqs/rubrics/scoring_rubrics.htm].
Knecht, R., Moskal, B. & Pavelich, M. (2000). "The Design Report Rubric: Measuring and Tracking
Growth through Success”, Paper presented at the annual meeting of the American Society for
Engineering Education.
Leydens, J. & Thompson, D. (August, 1997), Writing Rubrics Design (EPICS) I, Internal Communication,
Design (EPICS) Program, Colorado School of Mines.
Schrock, K. (2000). Kathy Schrock's Guide for Educators. [Available online at:
http://school.discovery.com/schrockguide/assess.html].