1993 - Brookhart - Teachers Grading Practices Meaning and Values
An earlier version of this article was presented at the Annual Meeting of the
American Educational Research Association, April 1992, San Francisco.
Brookhart
project was only 25% of the grade. The C is bringing his grade down
[RU--this teacher, who chose to assign a C instead of an F (straight
average) or an A (ignores missing student project), is looking for
corroborating evidence for the level of work reflected in the grade
assigned].
3. Value Implications (VI)--What does the grade mean when it is assigned
to a student, and of what value is it? Here is an example of a response to a
scenario about a student getting a B on the first test and improving to an A
on the second test.
Assuming the average came out to a B [CV--this teacher thinks a grade
is an average of points], I would give him the B. I don't think he should
receive a higher grade simply because he scored higher on the second
test [RU--the teacher is considering the relevance of the evidence].
What if the situation were reversed, and someone got an A first, and a B
second, and wouldn't get the A because they didn't improve. That
wouldn't be fair [VI--the teacher is considering the fairness of her
treatment].
4. Social Consequences (SC)--What does the grade mean when it is assigned
to a student, and of what value is it, and what will happen because
of it? This is an example of a response to a scenario about a low-ability
student who worked hard but missed a passing grade by 2 points.
For some children learning science doesn't come very easily [CV--this
teacher thinks a grade reflects learning]. If I felt Barbara was really
trying her best [RU--the teacher would look for corroborating evidence
for this], then I think she deserves a passing grade [VI--the teacher is
thinking about values, in this case about what merits or deserves special
treatment]. This would help motivate her to keep working [SC--the
teacher thinks a potential consequence of her grading decision would be
continued student work].
Method
Sample
The population of interest was practicing classroom teachers. Students
enrolled in MSEd classes at Duquesne University were selected for the sample if
they were (a) certified teachers and (b) currently employed in classroom
positions. The sample included two groups: (a) students from educational
measurement classes, surveyed at the end of the course so that they had
received measurement instruction, and (b) a comparison group of teachers,
surveyed in other MSEd classes, who thus had received no measurement
instruction. The measurement course is required for master's programs in
special education and in reading, but it is not required for master's programs in
school administration and supervision. Thus the major difference between the
measurement and nonmeasurement groups, which were similar in most
respects, was a difference in program. The measurement course included
instruction both in classroom assessment and in the interpretation of
standardized test results as appropriate for classroom teachers.
Teachers' Grading Practices
Data were collected in spring, summer, and fall terms in 1991. Final sample
size included 84 teachers (80% were female), 40 with measurement instruction
and 44 without. Years of teaching experience ranged from 1 to 25 with a
median of 5. The sample included teachers from all grade levels: K-4 (32%),
5-8 (30%), 9-12 (23%), other (16%).
Instrumentation
The instrument presented scenarios about grading and multiple choices for
responses about what the teacher would do in each situation. An open-ended
question, "Why did you make this choice," asked teachers to explain their
reasons in their own words. The scenarios have been used in other research
about grading practices (Manke & Loyd, 1990) to investigate what achievement
and nonachievement factors teachers used in grading. The open-ended ques-
tion was added for this study to probe the reasons behind the choices. Because
one of the purposes of this study was to investigate the degree to which value
judgments are part of teachers' grading practices as well as whether knowledge
about the many uses of grades is part of this process (Brookhart, 1991), a
coding and scoring scheme based on Messick's model of validity was used.
Messick's consideration of interpretation and use, evidence, and consequences
formed the theoretical basis for the research questions and, therefore, for the
analysis.
A pilot test in spring 1991 investigated the feasibility of this scoring scheme.
The pilot test also compared the responses to two versions of the instrument,
each of which included three scenarios about effort commensurate with ability,
two scenarios about missing work, and two scenarios about improvement. The
version that prompted the greater variation among responses was selected for
this study. The seven scenarios from the main study version are included in
Tables 1 through 3.
Three reasons support the validity of using these three scenario types
(effort/ability, missing work, and improvement). First, these scenarios repre-
sent common grading contexts for classroom teachers. These contexts each
require a decision that has at least two options, thereby allowing teachers'
reasoning to show. Second, these scenarios were used in prior research that
investigated teachers' use of nonachievement factors in grading (Manke &
Loyd, 1990). Another of the purposes of the present study was to probe the
reasons behind these practices. Third, empirical evidence for construct validity
was examined. The scenarios behaved as expected in eliciting responses. For
example, effort/ability scenarios drew more responses about grades as payment
for work done, and missing work scenarios drew more responses about treating
students justly.
Analysis
Open-ended responses were coded, according to Messick's (1989b) catego-
ries, as to whether the reason given considered construct validity (CV),
relevance/utility (RU), value implications (VI), and/or social consequences
(SC). Examples of this process may be found in the section above, Messick's
theory of validity applied to grading. Two raters coded 588 written responses
with an agreement of 97%. The 3% of disagreements were resolved by
discussion. A score was assigned to each written answer; scale value depended
on the degree to which comments progressed from a concern with only the
grade itself to a concern with values and consequences (1 = CV, 2 = RU,
3 = VI, 4 = SC). A response was scored according to the highest level reflected
in the comments. Internal consistency reliability (alpha) for responses scored in
this manner was .65, and the scale was additive. Generalizability (based on
seven items) was .65 for relative decisions and .64 for absolute decisions; the
research questions for this study called for both kinds of interpretations.
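The scoring rule and the two reliability figures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function names (`score_response`, `percent_agreement`, `cronbach_alpha`) are hypothetical, population variances are used throughout the alpha calculation, and any data passed in would be the reader's own.

```python
from statistics import pvariance

# Ordinal scale described in the text: 1 = CV, 2 = RU, 3 = VI, 4 = SC.
LEVELS = {"CV": 1, "RU": 2, "VI": 3, "SC": 4}


def score_response(codes):
    """Return the highest scale value among the categories coded for one response."""
    if not codes:
        raise ValueError("a response needs at least one coded category")
    return max(LEVELS[code] for code in codes)


def percent_agreement(rater_a, rater_b):
    """Proportion of responses to which two raters assigned the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes k >= 2 and that the total scores are not all identical.
    """
    k = len(rows[0])
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

A response coded CV, RU, and VI would receive a score of 3 under this rule, because only the highest level reached counts.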
In this article, choices will be used to mean the multiple-choice responses to
the scenarios, indicating what the teacher would do in that situation. Scores will
be used to mean the scale values of the open-ended responses to the question,
"Why did you make this choice," after each scenario, indicating progressive
reasoning according to Messick's categories. Descriptive statistics were calcu-
lated for the choices and scores for each item. Crosstabs and chi-square tests
examined whether item score varied by answer choice. Crosstabs and chi-
square tests also examined variation in choices and scores by measurement
instruction and grade level. Mann-Whitney tests investigated whether the degree of concern
for values and consequences, indicated by scores for each item on Messick's
scale, differed by measurement instruction.
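For readers unfamiliar with the two tests named above, the following Python sketch shows how each statistic is computed from raw counts or scores. It is a minimal illustration, not the analysis code behind this article: the function names are hypothetical, the chi-square function assumes every row and column total is nonzero, and neither function computes a p value.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    table: list of rows, each a list of observed counts (choice by score, say).
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs in which a's score exceeds b's, ties counting one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
```

The Mann-Whitney statistic suits the 1-to-4 scores because it uses only their order, not the distances between scale points.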
After each answer was scored to indicate the degree of consideration of
values and consequences, qualitative analysis was done within each of the four
Messick categories, using the constant comparative method. In this method,
data are examined comparatively, and then related indicators are labeled as to
the class they represent (Strauss, 1987). For example, all of the comments
scored 1 (CV) were compared and sorted as indicators of 11 concepts (the
subcategories in Table 4); these concepts were then organized at a higher level
of abstraction into the four categories in Table 4.
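The bookkeeping behind the "(n teachers, n comments)" counts reported for each category can be sketched as follows. The sorting of comments into concepts is a human judgment that code cannot reproduce; this hypothetical example assumes that judgment has already been made and only tallies the result. The mapping `CATEGORY_OF`, its labels, and the function name `tally` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical two-level scheme: subcategory (concept) -> higher-level category.
CATEGORY_OF = {
    "performance": "academic achievement",
    "work done": "payment",
}


def tally(coded_comments, category_of):
    """Count distinct teachers and total comments per higher-level category.

    coded_comments: iterable of (teacher_id, subcategory) pairs.
    Returns a dict mapping category -> (n_teachers, n_comments).
    """
    teachers = defaultdict(set)
    comments = defaultdict(int)
    for teacher_id, subcategory in coded_comments:
        category = category_of[subcategory]
        teachers[category].add(teacher_id)  # a teacher counts once per category
        comments[category] += 1             # every comment counts
    return {c: (len(teachers[c]), comments[c]) for c in comments}
```

Counting teachers and comments separately matters because one teacher could make the same kind of comment on several of the seven scenarios.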
CV and RU statements were examined to inform research Question 1, what
do teachers have in mind as the meaning of the construct grade. VI and SC
statements were examined to inform research Question 2, what value judg-
ments do teachers make regarding their grading. Using both quantitative and
qualitative methods on the same data allowed analysis of both the degree of
attention teachers paid to the uses and consequences of grades and what they
thought about these issues. Neither response choice, nor scores indicating level
of thinking, nor qualitative response categories indicating substance of thinking
differed significantly by the grade level at which the respondent taught;
therefore, grade levels were aggregated for analysis.
Results
Scenario Choices
Tables 1 through 3 present the choices respondents made to each of the
seven scenarios. All three of the scenarios in Table 1, about working to ability,
Table 1
Results for Scenarios about Working to Ability

  n   %
 68  81   Grade Chris on the quality of her work in comparison to the class,
          without being concerned about the amount of work that she could have
          done.

                        Score
Choice            1    2    3    4   Total
[illegible]      33    7   19    9      68
Lower grade       3    1    3    6      13
Raise grade       0    1    0    2       3
Total            36    9   22   17      84
Table 1, continued
Results for Scenarios about Working to Ability

  n   %
 79  94   Raise Barbara's grade and assign her a D for the effort she has shown.

  n   %
  4   5   Assign Sandy a lower grade because she could have put in more effort in
          your class and could have done better.
Table 2
Results for Scenarios about Missing Work

  n   %
 42  51   Assign Terry a 0 for the project and an F on his report card because his
          average would be 68%.
Response choices for two scenarios differed by the degree of reasoning about
values and consequences displayed in the written comments. In both cases,
respondents were more likely to invoke arguments about social consequences
when deviating from grading on a straight average. This principle held whether
the deviation was against the popular choice (Table 1, Chris's scenario) or with
the popular choice (Table 3, David's scenario).
Substantive Comments
Table 4 presents the results of a qualitative analysis of all the written
responses that were scored 1 (CV)--that is, where the thinking reflected in the
comments indicated simply what the teacher thought grades meant. For most
of the teachers, a grade was a form of payment to students. Students earned
grades or points for the work they did. According to the teachers, grades
functioned as the coin of the realm. Fewer comments said grades were to
indicate academic achievement, either absolute or self-referenced.
Table 5 presents a qualitative analysis of all the written comments that were
scored 2 (RU)--that is, where the thinking reflected in the comments indicated
both what the grade meant and where the teacher would look for confirming
evidence to use with that grade to send home with a student. For most of the
relevant evidence, teachers would look to other aspects of student perfor-
mance: whether a student had shown effort, whether other academic measures
agreed with the grade, and whether there were mitigating circumstances for
classroom performance. In a smaller category of responses, some teachers
indicated they would look for evidence about whether they had made an effort
to remind, encourage, and notify the student about progress.
The written responses analyzed in Tables 4 and 5 were limited to statements
that defined what grades were supposed to mean and where evidence to use a
particular grade might come from. These CV and RU responses illustrate the
meaning grades have for teachers. The remainder of the written comments,
coded VI or SC, included comments about the meaning of grades and evidence
for that meaning but went further to also include comments about the
consequences of that meaning, either by referring to the value implications
inherent in grade interpretation or to the social consequences expected for
grade use. The results from the qualitative analyses of these responses illus-
trate the value judgments and reasoning about consequences teachers apply in
their grading practices.

Table 3
Results for Scenarios about Improvement
You are a high school algebra teacher. In your class of general and academic track students,
you give two tests in each grading period. David's score on the first test was an F. On the
second test, he obtained a low D. In this situation, you would
n %
Score
24 22 10 22 78
(continued)
Note. From Manke and Loyd, 1990, p. 40. Adapted by permission.

Table 3, continued
Results for Scenarios about Improvement
You are a biology teacher of a high school class which consists of students with varying
ability levels. For this class you give two exams in each term. As you compute Bernie's
grade for this term, you see that on the first exam, he obtained a score equivalent to a B and
on the second exam, a low A. In this situation, you would
n %
            Instruction
Assign B   24   34   58
Assign A   15    6   21
Total      39   40   79
For each of the improvement scenarios, 5 out of the 84 subjects (6%) chose not to pick an
answer, even though some of these respondents made written comments.
Note. From Loyd, 1991. Adapted by permission.
Table 6 presents the qualitative analysis of all the written comments that
were scored 3 (VI)--that is, where the thinking reflected in the comments
included value statements about grade interpretation. The single word used
most often in these comments was fair. These and a cluster of related comments
were all appeals to justice. Treating all students equally, enforcing the
requirements, and doing it consistently were the themes of these fairness and
justice comments. A smaller number of comments were value statements that had
to do with mercy, although most of these were also couched in legalistic terms;
rewarding effort makes a statement to students that effort is important.

Table 4
Construct Validity Categories
(Comments indicate what the teacher takes as evidence about the grade's meaning)
(n teachers, n comments)*
  Subcategory (n teachers, n comments)       Sample comment
Grade means academic achievement ([illegible])
  Performance (19, 27)                       Grades are a reflection of [illegible].
* Total n of teachers = 84
  Total possible comments (84 teachers x 7 scenarios) = 588
Table 7 presents the qualitative analysis of all the written comments that
were scored 4 (SC)--that is, where the thinking reflected in the comments
indicated how the teacher considered the consequences of giving a particular
grade. Most of these comments referred to school consequences, although not
all of these were academic. The two largest categories of school consequences
considered were changes in student effort and attitude. A good number of
comments referred to consequences beyond the school, and most of these were
considerations of how the grades would affect student self-esteem and
confidence.
Table 5
Table 6
Value Implication Categories
(Comments indicate how the teacher considers the [illegible] of interpreting a grade)
* Total n of teachers = 84
  Total possible comments (84 teachers x 7 scenarios) = 588
  One teacher's comment (If you can get a high grade without working, more power to you!) was
  scored as a value statement but not included in the category coding.
Table 7
Social Consequences Categories
(Comments indicate how the teacher considers the consequences of using a grade)
School consequences (32, 54)
  [illegible] harder (20, 29)     [illegible] encourage David to try harder and give him
                                  incentive to succeed.
  Future grades (9, 9)            Maybe if she sees [illegible] she'll realize if she did some
                                  homework it would come up.
Consequences outside school (30, 44)
  Self-esteem (22, 31)            [illegible] would do [illegible] for Bernie's self-esteem and
                                  confidence level.
* Total n of teachers = 84
  Total possible comments (84 teachers x 7 scenarios) = 588
without to talk about a self-referenced meaning. Within this category, all of the
ability comments and two thirds of the improvement comments came from
teachers without measurement instruction. Describing where they would look
for confirming evidence to use with the grade, those with measurement
instruction were much less likely than those without to talk about evidence
other than student performance. These two differences are consistent with the
content of an introductory course in educational measurement and are also
consistent with this study's hypothesis that measurement instruction makes a
difference in how teachers think about the meaning of grades but not in the
amount or kind of thinking they do about the value implications and social
consequences of grades. Note also that only two categories of thought about
grade meaning differed with measurement instruction. The image of grades as
classroom currency was clearly envisioned by both groups of teachers.

Table 8
The Relationship of Measurement Instruction to Comments about the Meaning
and Values of Grade Interpretation and Use
(n teachers, n comments)
Discussion
The following discussion considers the issues raised by these results. A clear
limit on the findings of this study is the nature of the sample. All of the teachers
were master's students at Duquesne University. There is also a confounding
with degree program. The findings do not generalize beyond this institution.
The study's findings are potentially important enough to invite replication with
a broader population.
This study contributes to the theory of educational measurement by helping
to further define and describe the constructs teachers use in grading. The study
helps ground the content of educational measurement courses in the classroom
context. The results of this study support the conclusion of Stiggins and his
Meaning of Grades
The meaning of the construct grade, for teachers, is closely related to the idea
of student work. Teachers used phrases like "the work (s)he did" to describe
grades or grading. Performance or perform were words often used, in the sense
of something done or accomplished. To teachers, grades are something stu-
dents earn; they are compensation for a certain amount of work done at a
certain level. Thus, achievement is part of the construct but not the whole of it.
Among teachers, a more common image than achievement is that of grades as
currency; this image is evident in teachers' frequent use of the words earn, work,
and perform. The teachers' emphasis is on the activities students perform, not
what grades indicate about theoretical achievement constructs.
When deciding what grade a student has earned, the teacher functions as a
judge of student performance. While measurement instruction seemed to help
teachers see that self-referenced grades were not very interpretable, it did not
change their minds about the image of grades as pay. This may be the key to the
relationship between grading and classroom management, which Stiggins
(1988) points out needs to be investigated. Grades seem to be used in a kind of
academic token economy, and they function in classroom management as the
reward for work done.
It is worth wondering whether the concept of grades as pay is related to the
thinking teachers do about value issues in grading, especially the issue of
fairness. On the face of it, the legalistic version of fairness represented in the
teachers' comments does not seem very lofty. Is this because teachers must
maintain credibility with students who are at a legalistic level of moral
development? Or is it, at least in part, related to some of the economic
concepts of fairness: equal pay for equal work, even perhaps a piecework or
hourly wage notion? An economic interpretation fits with the use of grades as
classroom currency and might help explain the teachers' generally legalistic
level of ethical thinking. A legalistic orientation, in turn, is consistent with
many current classroom management practices. This economic/legalistic mech-
anism is a potential theoretical framework for investigating the role of grades in
classroom management.
References
Barnes, S. (1985). A study of classroom pupil evaluation: The missing link in teacher
education. Journal of Teacher Education, 36(4), 46-49.
Bishop, J. H. (1992). Why U.S. students need incentives to learn. Educational Leader-
ship, 49(6), 15-18.
Briscoe, C. (1991, April). Making the grade: Perspectives on a teacher's assessment
practices. Paper presented at the Annual Meeting of the American Educational
Research Association, Chicago.
Brookhart, S. M. (1991). Letter: Grading practices and validity. Educational Measure-
ment: Issues and Practice, 10(1), 35-36.
Brookhart, S. M., & Freeman, D. J. (1992). Characteristics of entering teacher candi-
dates. Review of Educational Research, 62, 37-60.
Friedman, S. J., & Manley, M. (1991, April). Grading practices in the secondary school:
Perceptions of the stakeholders. Paper presented at the Annual Meeting of the
National Council on Measurement in Education, Chicago.
Loyd, B. H. (1991). Survey on grading practices. Unpublished manuscript, University of
Virginia, Charlottesville.
Manke, M. P., & Loyd, B. H. (1990, April). An investigation of nonachievement related
factors influencing teachers' grading practices. Paper presented at the Annual Meeting
of the National Council on Measurement in Education, Boston.
Manke, M. P., & Loyd, B. H. (1991, April). A study of teachers' understanding of their
grading practices. Paper presented at the Annual Meeting of the American Educa-
tional Research Association, Chicago.
Messick, S. (1989a). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp.
13-103). New York: Macmillan.
Messick, S. (1989b). Meaning and values in test validation: The science and ethics of
assessment. Educational Researcher, 18(2), 5-11.
Stiggins, R. J. (1988). Revitalizing classroom assessment: The highest instructional
priority. Phi Delta Kappan, 68, 363-368.
Author
SUSAN M. BROOKHART is Assistant Professor, School of Education, Duquesne
University, 600 Forbes Ave., Pittsburgh, PA 15282. Degree: PhD, Ohio State Univer-
sity. Specializations: classroom assessment and teacher education.