
Journal of Educational Measurement

Summer 1993, Vol. 30, No. 2, pp. 123-142

Teachers' Grading Practices: Meaning and Values


Susan M. Brookhart
Duquesne University

Classroom teachers do not always follow recommended grading practices. Why
not? It is possible to conceptualize this question as a validity issue and ask whether
teachers' concerns over the many uses of grades outweigh concerns about the
interpretation of grades. The purpose of this study was to investigate the meaning
classroom teachers associate with grades, the value judgments they make when
considering grades, and whether the meaning or values associated with grades
differed by whether teachers had measurement instruction. A sample of 84
teachers, 40 with and 44 without measurement instruction, responded to classroom
grading scenarios in two ways--with multiple-choice responses indicating what
they would do and with written responses to the question, "Why did you make this
choice?" A coding scheme based on Messick's (1989a, 1989b) progressive matrix
of facets of validity was used for quantitative and qualitative analyses of written
responses. The meaning of grades is closely related to the idea of student work;
grades are pay students earn for activities they perform. The relationship of this
notion to classroom management should be investigated. Teachers do make value
judgments when assigning grades and are especially concerned about being fair.
Teachers also are concerned about the consequences of grade use, especially for
developing student self-esteem and good attitudes toward future school work.
Measurement instruction made very little difference, although it did reduce the
amount of self-referenced grading reported.

Classroom teachers do not follow many of the recommended practices for
grading (Barnes, 1985; Manke & Loyd, 1990; Stiggins & Conklin, 1992).
Teachers are often uncomfortable with grading (Barnes, 1985). Stiggins, Fris-
bie, and Griswold (1989) postulated three general reasons for the common
discrepancies between recommended and actual practice: Best practices may
be a matter of opinion; recommended practices do not take some of the
practical aspects of teaching into account; or teachers lack training or expertise
in sound practices. They suggested that these reasons should be investigated,
because they each have different implications for action. Additionally, Stiggins
and his colleagues noted that research questions about grading practices are
associated with value questions.
The meaning of a score and the values implied in interpreting and using that
score are intertwined in the concept of validity (Messick, 1989a, 1989b) and are
crucial to sorting out teachers' grading practices (Brookhart, 1991; Stiggins et
al., 1989). Brookhart (1991) suggests that the discrepancy between recom-
mended and actual grading practices and teachers' discomfort over it are

An earlier version of this article was presented at the Annual Meeting of the
American Educational Research Association, April 1992, San Francisco.

symptoms of a validity problem: Teachers' grading practices reflect teachers'
consideration of the consequences of grades, sometimes at the expense of
considering the interpretability of grades.
The research questions for this study were designed to address both the
meaning (Question 1) and values (Question 2) issues as well as the degree to
which these questions are related to instruction in measurement (Question 3).
1. What meaning do teachers wish to convey when they assign grades to their
students? That is, for teachers, what is the nature of the construct grade?
2. To what degree are value judgments part of the grading process? What
kinds of value judgments do teachers make when assigning their grades?
3. Do the construct meaning and value judgments underlying grades vary
with whether or not the teacher has had instruction in educational
measurement?
It was hypothesized that measurement instruction would be associated with a
difference in thinking about grade meaning but not in values or social conse-
quences. Measurement courses instruct teachers directly about the meaning of
grades but are largely silent about value issues and the impact grades have on
individuals (Stiggins, in press).

Messick's Theory of Validity Applied to Grading


Messick's theory of validity undergirds the theory about teachers' grading
practices that this study was designed to investigate (Brookhart, 1991). Messick
(1989a, 1989b) frames the validity question in terms of both meaning and
values by identifying two facets of validity: the intended function for the score
(interpretation or use) and the source of justification (empirical evidence or
social consequences). He forms a progressive matrix by crossing these two facets
to form a table. The four resulting cells correspond to four aspects of the basic
validity question, which he phrases, "To what degree--if at all--on the basis of
evidence and rationales, should the test scores be interpreted and used in the
manner proposed?" (Messick, 1989b, p. 5).
Construct validity (CV) is invoked when score interpretation is supported by
empirical evidence. Construct validity forms the basis for all further consider-
ations of validity evidence, which build on this foundation. When scores are to
be used for some purpose, the relevance and utility (RU) of the scores to the
purpose must also be considered, but these depend on the meaning of the
scores (construct validity). When consequences of score interpretation are
considered, value implications (VI) come into play, but these too depend on
the meaning of the scores. When scores are used for some purpose, validity
evidence must come from both empirical evidence and social consequences
(SC), and the last cell in Messick's progressive matrix becomes CV + RU +
VI + SC (Messick, 1989b, p. 10).
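
The cumulative notation described above can be illustrated with a minimal sketch. It assumes the collapsed, ordered progression CV, RU, VI, SC that this study adopts; the function name `cumulative_notation` is hypothetical and is not part of Messick's or the author's work.

```python
# Ordered emphases in the progression, as named in the text.
EMPHASES = ["CV", "RU", "VI", "SC"]


def cumulative_notation(level: int) -> str:
    """Return the cumulative notation for a level from 1 to 4.

    Each level keeps every earlier emphasis and adds one more,
    which is what makes the progression "progressive".
    """
    if not 1 <= level <= len(EMPHASES):
        raise ValueError("level must be between 1 and 4")
    return " + ".join(EMPHASES[:level])


if __name__ == "__main__":
    for level in range(1, 5):
        print(level, cumulative_notation(level))
```

Level 2 prints as CV + RU and level 4 as CV + RU + VI + SC, matching the two notations quoted from Messick (1989b) in the text.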
In this theory, validity is a unified concept based on the soundness of
construct validity. The matrix categories allow for adding the emphases that
become important as scores move from signifying an idea in someone's head
out into the world to be used for different purposes. Applying Messick's
progressive categories to grades, one can consider the degree to which
appraisal of values and consequences of grade use is part of teachers' reflections
about their grading practices. Placing teachers' reflections into this continuum
of progressively broader considerations of validity evidence is supported by
Messick's own reasoning about validity as a unitary concept based on construct
validity and the progression
as one moves from appraisal of evidence for the construct interpretation per
se, to appraisal of evidence supportive of test use, to appraisal of the value
consequences of score interpretation, and, finally, to appraisal of the social
consequences--or, more generally, of the functional worth--of test use.
(Messick, 1989b, p. 10)
Additional support for considering teachers' thinking about evidence for the
validity of grades on a continuum from narrow to broad comes from the fact
that, in grading, the distinction between interpretation and use is more blurred
than for most other educational measures. Printing the grade on a report card
and sending it home with a student certainly constitute a use of the measure,
and grades are only calculated at all because they are due at the end of the
report period to be used for this purpose. In grading, the use function drives the
interpretation function. Teachers' decisions about what to put into grades work
backward from this use: Given report cards, teachers must decide what to put
into the grades that go on them. Thus, a matrix with two dimensions (Messick's
twin facets, function and source of justification) can be collapsed into unidimen-
sionality, when applied to teachers' reasoning about grades, as the distinction
between the functions of interpretation and use becomes almost moot and the
distinction between empirical and consequential sources of justification be-
comes salient.
The following questions describe teachers' reflections about the validity of
grades at the four levels. Empirical evidence for the hierarchical or progressive
nature of these categories was found in teachers' reflective comments. If
thinking in one category was evident, then usually the comment also included
thinking in all categories up to that point. In the present study, notation will be
shortened to the last term in the progression. For example, Messick's notation
for Category 2 is CV + RU; in this study, the notation RU will be used for
Category 2. Following, then, are Messick's categories applied to grading, their
notation in this study, and one teacher comment from this study that typifies
the category, with analytical assumptions in brackets.
1. Construct Validity (CV)--What does the grade mean per se?
I feel strongly that effort is a very significant part of a grade [CV--this
teacher thinks effort is part of the construct grades are designed to
measure].
2. Relevance/Utility (RU)--What does the grade mean when it is assigned
to a student? The following is an example of a response to a scenario
about an A student not turning in a project worth 25% of his grade.
I would give him the benefit since he did get an A average on his quizzes
and test grades [CV--this teacher thinks quiz and test scores are what
grades are designed to measure], I wouldn't give him an F because the
project was only 25% of the grade. The C is bringing his grade down
[RU--this teacher, who chose to assign a C instead of an F (straight
average) or an A (ignores missing student project), is looking for
corroborating evidence for the level of work reflected in the grade
assigned].
3. Value Implications (VI)--What does the grade mean when it is assigned
to a student, and of what value is it? Here is an example of a response to a
scenario about a student getting a B on the first test and improving to an A
on the second test.
Assuming the average came out to a B [CV--this teacher thinks a grade
is an average of points], I would give him the B. I don't think he should
receive a higher grade simply because he scored higher on the second
test [RU--the teacher is considering the relevance of the evidence].
What if the situation were reversed, and someone got an A first, and a B
second, and wouldn't get the A because they didn't improve. That
wouldn't be fair [VI--the teacher is considering the fairness of her
treatment].
4. Social Consequences (SC)--What does the grade mean when it is assigned
to a student, and of what value is it, and what will happen because
of it? This is an example of a response to a scenario about a low-ability
student who worked hard but missed a passing grade by 2 points.
For some children learning science doesn't come very easily [CV--this
teacher thinks a grade reflects learning]. If I felt Barbara was really
trying her best [RU--the teacher would look for corroborating evidence
for this], then I think she deserves a passing grade [VI--the teacher is
thinking about values, in this case about what merits or deserves special
treatment]. This would help motivate her to keep working [SC--the
teacher thinks a potential consequence of her grading decision would be
continued student work].

Method
Sample
The population of interest was practicing classroom teachers. Students
enrolled in MSEd classes at Duquesne University were selected for the sample if
they met two criteria: They were (a) certified teachers who were (b) currently
employed in classroom positions. The sample included two groups: (a) students
from educational measurement classes, surveyed at the end of the course so
that they had received measurement instruction, and (b) a comparison group of
teachers, surveyed in other MSEd classes, who thus had received no measurement
instruction. The measurement course is required for master's programs in
special education and in reading, but it is not required for master's programs in
school administration and supervision. Thus the major difference between the
measurement and nonmeasurement groups, which were similar in most respects,
was a difference in program. The measurement course included instruction
both in classroom assessment and in the interpretation of standardized test
results as appropriate for classroom teachers.

Data were collected in spring, summer, and fall terms in 1991. Final sample
size included 84 teachers (80% were female), 40 with measurement instruction
and 44 without. Years of teaching experience ranged from 1 to 25 with a
median of 5. The sample included teachers from all grade levels: K-4 (32%),
5-8 (30%), 9-12 (23%), other (16%).

Instrumentation
The instrument presented scenarios about grading and multiple choices for
responses about what the teacher would do in each situation. An open-ended
question, "Why did you make this choice," asked teachers to explain their
reasons in their own words. The scenarios have been used in other research
about grading practices (Manke & Loyd, 1990) to investigate what achievement
and nonachievement factors teachers used in grading. The open-ended ques-
tion was added for this study to probe the reasons behind the choices. Because
one of the purposes of this study was to investigate the degree to which value
judgments are part of teachers' grading practices as well as whether knowledge
about the many uses of grades is part of this process (Brookhart, 1991), a
coding and scoring scheme based on Messick's model of validity was used.
Messick's consideration of interpretation and use, evidence, and consequences
formed the theoretical basis for the research questions and, therefore, for the
analysis.
A pilot test in spring 1991 investigated the feasibility of this scoring scheme.
The pilot test also compared the responses to two versions of the instrument,
each of which included three scenarios about effort commensurate with ability,
two scenarios about missing work, and two scenarios about improvement. The
version that prompted the greater variation among responses was selected for
this study. The seven scenarios from the main study version are included in
Tables 1 through 3.
Three reasons support the validity of using these three scenario types
(effort/ability, missing work, and improvement). First, these scenarios repre-
sent common grading contexts for classroom teachers. These contexts each
require a decision that has at least two options, thereby allowing teachers'
reasoning to show. Second, these scenarios were used in prior research that
investigated teachers' use of nonachievement factors in grading (Manke &
Loyd, 1990). Another of the purposes of the present study was to probe the
reasons behind these practices. Third, empirical evidence for construct validity
was examined. The scenarios behaved as expected in eliciting responses. For
example, effort/ability scenarios drew more responses about grades as payment
for work done, and missing work scenarios drew more responses about treating
students justly.

Analysis
Open-ended responses were coded, according to Messick's (1989b) catego-
ries, as to whether the reason given considered construct validity (CV),
relevance/utility (RU), value implications (VI), and/or social consequences
(SC). Examples of this process may be found in the section above, Messick's
Theory of Validity Applied to Grading. Two raters coded 588 written responses
with an agreement of 97%. The 3% of disagreements were resolved by
discussion. A score was assigned to each written answer; scale value depended
on the degree to which comments progressed from a concern with only the
grade itself to a concern with values and consequences (1 = CV, 2 = RU,
3 = VI, 4 = SC). A response was scored according to the highest level reflected
in the comments. Internal consistency reliability (alpha) for responses scored in
this manner was .65, and the scale was additive. Generalizability (based on
seven items) was .65 for relative decisions and .64 for absolute decisions; the
research questions for this study called for both kinds of interpretations.
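
The scoring rule and the rater-agreement figure described above can be sketched as follows. This is only an illustration: the function names and the short rater lists are hypothetical, and simple percent agreement is assumed because the article reports a percentage without naming the agreement statistic.

```python
# Scale values for the four categories, as given in the text.
LEVELS = {"CV": 1, "RU": 2, "VI": 3, "SC": 4}


def score_response(codes):
    """Score a written response by the highest category coded in it.

    `codes` is the set of category labels a rater found in one response.
    Returns None for a response with no codes (no written comment).
    """
    if not codes:
        return None
    return max(LEVELS[code] for code in codes)


def percent_agreement(rater_a, rater_b):
    """Percentage of responses to which two raters assigned the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rater lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


if __name__ == "__main__":
    # A comment coded CV, RU, and VI (like the improvement example) scores 3.
    print(score_response({"CV", "RU", "VI"}))
    # Hypothetical scores from two raters on four responses.
    print(percent_agreement([1, 2, 3, 4], [1, 2, 3, 3]))
```

Scoring by the maximum level relies on the progressive nature of the categories: a response that reaches VI is expected to contain CV and RU reasoning as well.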
In this article, choices will be used to mean the multiple-choice responses to
the scenarios, indicating what the teacher would do in that situation. Scores will
be used to mean the scale values of the open-ended responses to the question,
"Why did you make this choice," after each scenario, indicating progressive
reasoning according to Messick's categories. Descriptive statistics were calcu-
lated for the choices and scores for each item. Crosstabs and chi-square tests
examined whether item score varied by answer choice. Crosstabs and chi-
square tests also examined variation in choices and scores by measurement
instruction and grade level. Mann-Whitney tests investigated whether the degree of concern
for values and consequences, indicated by scores for each item on Messick's
scale, differed by measurement instruction.
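
As a rough illustration of the Mann-Whitney comparison described above, the sketch below computes the U statistics for two groups of ordinal scores, using midranks for ties. The two score lists are hypothetical and are not data from this study; the function names are likewise illustrative, and no p-value is computed.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples of ordinal scores."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


if __name__ == "__main__":
    # Hypothetical 1-4 scale scores for one scenario, by instruction group.
    with_instruction = [1, 2, 2, 3]
    without_instruction = [2, 3, 4, 4]
    print(mann_whitney_u(with_instruction, without_instruction))
```

A rank-based test suits these data because the 1 to 4 scale is ordinal and heavily tied, so a comparison of means would assume more than the scale supports.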
After each answer was scored to indicate the degree of consideration of
values and consequences, qualitative analysis was done within each of the four
Messick categories, using the constant comparative method. In this method,
data are examined comparatively, and then related indicators are labeled as to
the class they represent (Strauss, 1987). For example, all of the comments
scored 1 (CV) were compared and sorted as indicators of 11 concepts (the
subcategories in Table 4); these concepts were then organized at a higher level
of abstraction into the four categories in Table 4.
CV and RU statements were examined to inform research Question 1, what
do teachers have in mind as the meaning of the construct grade. VI and SC
statements were examined to inform research Question 2, what value judg-
ments do teachers make regarding their grading. Using both quantitative and
qualitative methods on the same data allowed analysis of both the degree of
attention teachers paid to the uses and consequences of grades and what they
thought about these issues. Neither response choice, nor scores indicating level
of thinking, nor qualitative response categories indicating substance of thinking
differed significantly by the grade level at which the respondent taught;
therefore, grade levels were aggregated for analysis.

Results
Scenario Choices
Tables 1 through 3 present the choices respondents made to each of the
seven scenarios. All three of the scenarios in Table 1, about working to ability,
had clear modal responses. In two of the scenarios, where working below ability
did not result in a failing grade, the modal choice was to assign the grade based
on the quality of work. In Barbara's scenario, where a student tried hard but
missed a passing score, the modal choice was to assign a D instead of the F that
reflected the quality of her work.
Responses differed for the scenarios about missing work (Table 2). Response
was mixed to the scenario about an A student's missing project. About half the
respondents would assign a 0, resulting in an F average, and about half would
assign a C, counting off some for not turning in the project. The scenario about

Table 1
Results for Scenarios about Working to Ability

You are a sixth grade teacher of a class that is grouped heterogeneously. Chris, one of the
students in your class, has high academic ability, as shown by her previous work, test
results, reports of other teachers, and your own observation. As you look over her work for
the grading period, you realize two things: the quality of her work is above average for the
class, but the work does not represent the best that she could do. The effort she has shown
has been minimal, but, because of her high ability, her work has been reasonably good. In
this situation, you would

  n    %
 68   81   Grade Chris on the quality of her work in comparison to the class,
           without being concerned about the amount of work that she could have
           done.
 13   16   Lower Chris's grade because she did not make a serious effort in your
           class; she could have done better work.
  3    4   Assign Chris a higher grade to encourage her to work harder.

Why did you make this choice? median score = 2

Respondents with lower scores were more likely to pick choice #1, while respondents
with higher scores were more likely to pick choice #2
(χ² = 14.61, df = 4, p = .02)

                      Score
Choice           1    2    3    4   Total
O.K.            33    7   19    9     68
Lower grade      3    1    3    6     13
Raise grade      0    1    0    2      3
Total           36    9   22   17     84

You are a [illegible] grade science teacher of a class that is grouped [illegible]. Barbara
is one of your students who has low ability based on previous performance and observation of
former science teachers. Throughout the grading period, you notice that she has worked
very hard. She has turned in her assignments on time and has often come to you for extra
help before tests. Her average for this grading period is 2 points below what is needed for
a D on the grading scale you use. In this situation, you would

  n    %
 79   94   Raise Barbara's grade and assign her a D for the effort she has shown.
  5    6   Grade Barbara according to the work she has done and assign her an F.

Why did you make this choice? median score = 3

You are a fifth grade teacher of a class of mixed ability levels. Sandy, a student in your
class, appears to have average ability to do the required work. In evaluating her work for
this grading period, you observe that she did not do the work she is capable of; she could
have done better. She, however, managed to meet what is required for her to get a C. In
this situation, you would

  n    %
  4    5   Assign Sandy a lower grade because she could have put in more effort in
           your class and could have done better.
 80   95   Assign the grade based on the quality of her work, without taking
           into account the amount of work that she could have possibly put in.
  0    0   Assign Sandy a higher grade to encourage her to work harder.

Why did you make this choice? median score = 2

Note. From Manke and Loyd, 1990, p. 39. Adapted by permission.

a D student who did not turn in homework, however, received a clear modal
response: Use a straight average, and assign the F.
Respondents found the scenarios about improvement (Table 3) difficult to
answer, and they said so in their written comments. The modal choice differed
by scenario. In the case where improvement was from an F to a low D,
respondents chose to assign a D rather than the straight average (F). Their
comments indicated this was to support the student. In the case where the
improvement was from a B to a low A, the modal choice was to go with the
straight average (B).
This B-to-A-improvement scenario was the only one with a difference in
response choice by group. Those with measurement instruction were more
likely than those without to assign the grade of B, while those without had a
higher probability of raising the grade to an A. Only 27% of both groups,
however, would raise the grade (Table 3).
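
The group difference reported for the B-to-A scenario can be checked with a short sketch of a Yates-corrected chi-square test on the 2 x 2 counts given in Table 3 (assign B: 24 without and 34 with instruction; assign A: 15 without and 6 with). The function name is hypothetical, and this is an illustrative recomputation under the standard textbook formula, not the author's analysis code.

```python
import math


def yates_chi_square_2x2(table):
    """Chi-square with Yates' continuity correction for a 2 x 2 table.

    `table` is [[a, b], [c, d]]. Returns (chi_square, p_value), where the
    p-value uses the exact one-degree-of-freedom relation
    P(X > x) = erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates' correction shrinks |ad - bc| by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi_square = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    # Rows: assign B, assign A. Columns: no instruction, instruction.
    bernie = [[24, 34], [15, 6]]
    chi_square, p_value = yates_chi_square_2x2(bernie)
    print(round(chi_square, 2), round(p_value, 3))
    # Share of all 79 respondents who would raise the grade to an A.
    print(round(100 * 21 / 79))
```

With these counts the corrected statistic rounds to the 4.43 reported in Table 3, and 21 of 79 respondents is the 27% cited in the text.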


Table 2
Results for Scenarios about Missing Work

In your seventh grade social studies class, report card grades were based on quizzes, tests,
and an out-of-class project which counted as 25% of the grade. Terry obtained an A average
on his quizzes and tests but has not turned in the project despite frequent reminders. In
this situation, you would

  n    %
  1    1   Exclude the missing project and assign Terry an A.
 42   51   Assign Terry a 0 for the project and an F on his report card because his
           average would be 68%.
 40   48   Assign Terry a C, counting off some for not turning in the project.

Why did you make this choice? median score = 3

You are the [illegible] teacher of a class of ninth graders with varying ability levels. During
this grading period, the students' grades are based on quizzes, tests, and homework
assignments which involve working out exercises. Kelly has not turned in any homework
assignments despite your frequent reminders. His grades on the quizzes have ranged from 65%
to 75%, and he received a D on each of the tests. In this situation, you would

  n    %
 71   86   Assign Kelly a grade of 0 for the homework assignments and include this
           in the grade, thus giving him an average of F for the grading period.
 11   13   Ignore the missing homework assignments and assign Kelly a D.
  1    1   Ignore the missing homework and assign Kelly a C.

Why did you make this choice? median score = 2

Note. From Loyd, 1991. Adapted by permission.

Response choices for two scenarios differed by the degree of reasoning about
values and consequences displayed in the written comments. In both cases,
respondents were more likely to invoke arguments about social consequences
when deviating from grading on a straight average. This principle held whether
the deviation was against the popular choice (Table 1, Chris's scenario) or with
the popular choice (Table 3, David's scenario).
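
The two choice-by-score associations mentioned above can be recomputed from the crosstabs in Tables 1 and 3 with a minimal Pearson chi-square sketch. The function name is hypothetical and the code is an illustrative check, not the author's analysis. Degrees of freedom are derived from the table shape as (rows - 1)(columns - 1), so the 3 x 4 table yields 6; any mismatch with a value printed alongside the tables is something to check against the original article.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of choice and score.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, dof


# Rows are choices; columns are scores 1 (CV) through 4 (SC).
CHRIS = [
    [33, 7, 19, 9],  # grade on quality of work
    [3, 1, 3, 6],    # lower grade
    [0, 1, 0, 2],    # raise grade
]
DAVID = [
    [14, 4, 3, 1],    # assign F
    [10, 18, 7, 21],  # assign D
]

if __name__ == "__main__":
    for name, table in (("Chris", CHRIS), ("David", DAVID)):
        chi_square, dof = pearson_chi_square(table)
        print(name, round(chi_square, 2), dof)
```

Both statistics round to the values reported in the tables (14.61 and 17.95). Several expected counts in the Chris table fall below 5, so the chi-square approximation there is best read with some caution.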
Substantive Comments
Table 4 presents the results of a qualitative analysis of all the written
responses that were scored 1 (CV)--that is, where the thinking reflected in the
comments indicated simply what the teacher thought grades meant. For most
of the teachers, a grade was a form of payment to students. Students earned
grades or points for the work they did. According to the teachers, grades
functioned as the coin of the realm. Fewer comments said grades were to
indicate academic achievement, either absolute or self-referenced.
Table 5 presents a qualitative analysis of all the written comments that were
scored 2 (RU)--that is, where the thinking reflected in the comments indicated
both what the grade meant and where the teacher would look for confirming
evidence to use with that grade to send home with a student. For most of the
relevant evidence, teachers would look to other aspects of student perfor-
mance: whether a student had shown effort, whether other academic measures
agreed with the grade, and whether there were mitigating circumstances for
classroom performance. In a smaller category of responses, some teachers
indicated they would look for evidence about whether they had made an effort
to remind, encourage, and notify the student about progress.
The written responses analyzed in Tables 4 and 5 were limited to statements
that defined what grades were supposed to mean and where evidence to use a
particular grade might come from. These CV and RU responses illustrate the
meaning grades have for teachers. The remainder of the written comments,

Table 3
Results for Scenarios about Improvement

You are a high school algebra teacher. In your class of general and academic track students,
you give two tests in each grading period. David's score on the first test was an F. On the
second test, he obtained a low D. In this situation, you would

  n    %
 23   29   Assign David an overall grade of F based on the average of his
           performance in the two exams.
 56   71   Assign David an overall grade of D because he showed improvement in his
           performance.

Why did you make this choice? median score = 2

Respondents with lower scores were more likely to pick choice #1, while respondents
with higher scores were more likely to pick choice #2
(χ² = 17.95, df = 3, p = .0005)

                      Score
Choice           1    2    3    4   Total
Assign F        14    4    3    1     22   (one respondent made no written comments)
Assign D        10   18    7   21     56
Total           24   22   10   22     78

Note. From Manke and Loyd, 1990, p. 40. Adapted by permission.

You are a biology teacher of a high school class which consists of students with varying
ability levels. For this class you give two exams in each term. As you compute Bernie's
grade for this term, you see that on the first exam, he obtained a score equivalent to a B and
on the second exam, a low A. In this situation, you would

  n    %
 58   73   Assign Bernie an overall grade of B which is the average of his scores
           on the two exams.
 21   27   Assign Bernie an overall grade of A, noting that there was improvement
           in his performance.

Why did you make this choice? median score = 2

Respondents with measurement instruction were more likely to pick choice #1, while
respondents without measurement instruction were more likely to pick choice #2
(χ² = 4.43, df = 1, p = .04, Yates' correction applied)

               Measurement instruction
Choice           No   Yes   Total
Assign B         24    34      58
Assign A         15     6      21
Total            39    40      79

For each of the improvement scenarios, 5 out of the 84 subjects (6%) chose not to pick an
answer, even though some of these respondents made written comments.
Note. From Loyd, 1991. Adapted by permission.

coded VI or SC, included comments about the meaning of grades and evidence
for that meaning but went further to also include comments about the
consequences of that meaning, either by referring to the value implications
inherent in grade interpretation or to the social consequences expected for
grade use. The results from the qualitative analyses of these responses illus-
trate the value judgments and reasoning about consequences teachers apply in
their grading practices.
Table 6 presents the qualitative analysis of all the written comments that
were scored 3 (VI)--that is, where the thinking reflected in the comments
included value statements about grade interpretation. The single word used
most often in these comments was fair. These and a cluster of related comments
were all appeals to justice. Treating all students equally, enforcing the
requirements, and doing it consistently were the themes of these fairness and justice
comments. A smaller number of comments were value statements that had to
do with mercy, although most of these were also couched in legalistic terms;
rewarding effort makes a statement to students that effort is important.

Table 4
Construct Validity Categories
(Comments indicate what the teacher takes as evidence about the grade's meaning)

Category (n teachers, n comments)*
  Subcategory (n teachers, n comments)   Sample comment

Grade is payment for work done (61, ?)
  Requirements (29, ?5)                  If homework was part of the grade, it has to count.
  Work done (?, ?3)                      Grades are a reflection of a student's level of
                                         work completed.
  Effort (23, ?)                         I include, in every grade, an effort grade.
  Earned (?, ?5)                         He must be given the grade he earned.
  Points (?, ?5)                         I grade based on points. If her points [illegible] a certain
                                         grade, she receives that grade.

Grade is a calculated score (2?, 29)
  [illegible] (?, ?)                     The overall average of his grade is equivalent to a
                                         B.
  Grades (6, 6)                          If she got a C then that's her grade.

Grade means academic achievement (2?, ?)
  Performance (19, 27)                   Grades are a reflection of [illegible]
                                         to grasp and understand the material.
  [illegible] (12, ?)                    He scored high. I would feel that he is [illegible].
  Ability (3, 4)                         Maybe he is working to his own ability.

* Total n of teachers = 84
  Total possible comments (84 teachers x 7 scenarios) = 588
Table 7 presents the qualitative analysis of all the written comments that
were scored 4 (SC)--that is, where the thinking reflected in the comments
indicated how the teacher considered the consequences of giving a particular
grade. Most of these comments referred to school consequences, although not
Table 5
(Comments indicate what the teacher takes as corroborating evidence
about the use of the grade to [illegible])

Category (n teachers, n comments)*
  Subcategory (n teachers, n comments)   Sample comment

Evidence from student performance (5?, 105)
  Evidence of effort (33, 44)            If she has come for extra help, she certainly
                                         tried.
  Other studies (14, 17)                 I believe you have to take into account all areas
                                         of a child's [illegible].
  Responsibility (?, 13)                 He was given an opportunity to complete the
                                         project.
  Capability (11, 11)                    Since Chris has proven in the past that she is capable of
                                         doing more quality work, I would expect that of her.
  Previous scores (10, 11)               [illegible] scores [illegible] tests.
  Social evidence (8, 9)                 I'd try to establish why she was not working to her
                                         potential--trouble at home, problems with friends,
                                         etc.

Evidence external to the student (18, 21)
  Reminders/Encouragement (11, 13)       Getting [illegible] to show more effort is my [illegible],
                                         as her teacher.
  [illegible] (8, 8)                     [illegible]

* Total n of teachers = 84
  Total possible comments (84 teachers x 7 scenarios) = 588

all of these were academic. The two largest categories of school consequences
considered were changes in student effort and attitude. A good number of
comments referred to consequences beyond the school, and most of these were
considerations of how the grades would affect student self-esteem and confi-
dence.

Effects of Measurement Instruction


The effects of measurement instruction on the amount of thinking at each of
Messick's four levels were investigated by both Mann-Whitney and chi-square
tests on the level of reasoning scores for written responses to each of the
seven scenarios. None of these tests indicated a significant difference by
instructional group.

Table 6
Value Implication Categories
(comments indicate how the teacher considers the [illegible] of interpreting a grade)

Category (n teachers, n comments)*
  Subcategory (n teachers, n comments): Sample comment

Appeal to what is just (60,108)
  Fair (30,42): When questioned about a grade, I can show I was fair to all
    the students.
  Chance given (21,28): Terry knew the criteria ahead of time.
  Deserved (16,19): [illegible] didn't do the project, so he didn't deserve
    an [illegible].
  Following guidelines (7,13): If grading criteria are clearly known by
    students, they should be followed.
  Right/wrong (2,3): It's not right to give a higher or lower grade on the
    basis of what she should or could have done.
  Consistency (2,3): If I counted everyone's project and [illegible], I would
    have to count Terry's in the same way.

Appeal to what is merciful (31,41)
  Reinforcement for effort (22,25): To [illegible] reinforcement for her effort.
  Teacher's beliefs (10,13): I do not believe in [illegible] grades.
  [illegible] (2,2): I [illegible] this is really hypocritical [after a
    [illegible] achievement], but I would not assign an F to a student with
    [illegible]
  Student [illegible] (1,1): Unless there is a valid reason, I would give a 0.

* Total n of teachers = 84
Total possible comments (84 teachers × 7 scenarios) = 588
One teacher's comment (If you can get a high grade without working, more power
to you!) was scored as a value statement but not included in the category coding.

These quantitative analyses indicated that teachers with and without measure-
ment instruction did not differ in the level of thinking about grade interpreta-
tion and use.
A breakdown of the qualitative categories by measurement training category
indicated that the groups did differ in the contents of their thoughts but only at
the CV and RU levels, that is, only in what they thought about the meaning of
grades. Two differences emerged (see Table 8). Describing the construct grade,
teachers with measurement instruction were much less likely than those


Table 7
Social Consequences Categories
(comments indicate how the teacher considers the consequences of using a grade)

Category (n teachers, n comments)*
  Subcategory (n teachers, n comments): Sample comment

School consequences (32,54)
  [Try] harder (20,29): By assigning the [illegible], this may encourage David
    to try harder and give him incentive to succeed.
  Attitude (10,10): If I fail him, he may not study for future tests and
    quizzes [illegible] take the attitude 'nothing counts, why do it.'
  Future grades (9,9): Maybe if she sees a [illegible] she'll realize if she
    did some homework it would come up.
  Continue to improve (4,4): This might be a motivating factor to continue to
    [illegible].
  Work in other classes (2,2): [illegible] might think if he could get away
    with not doing it in [illegible] class that he [illegible] get away with
    not doing it in other classes.

Consequences outside school (30,44)
  Self-esteem (22,31): [illegible] would do [illegible] for [illegible]'s
    self-esteem and confidence level.
  Learn accountability (8,9): Chris should know that she has to work hard to
    the best of her ability...this should be enforced to help her in adult life.
  Parent complaint (4,4): Because I would not want to face an irate parent.

* Total n of teachers = 84
Total possible comments (84 teachers × 7 scenarios) = 588

without to talk about a self-referenced meaning. Within this category, all of the
ability comments and two thirds of the improvement comments came from
teachers without measurement instruction. Describing where they would look
for confirming evidence to use with the grade, those with measurement
instruction were much less likely than those without to talk about evidence
other than student performance. These two differences are consistent with the
content of an introductory course in educational measurement and are also
consistent with this study's hypothesis that measurement instruction makes a
difference in how teachers think about the meaning of grades but not in the
amount or kind of thinking they do about the value implications and social


Table 8
The Relationship of Measurement Instruction to Comments about the
Meaning and Values of Grade Interpretation and Use
(n teachers, n comments)

                                      Measurement   No measurement    Total

Grade is payment for work done          (28,52)        (33,72)       (61,124)
Grade means a calculated score          (12,16)        (12,13)       (24,29)
Grade means academic achievement        (11,18)        (11,13)       (22,31)
Grade has self-referenced meaning       (4,5)          (11,12)       (15,17)
Evidence from student performance       (29,48)        (29,57)       (58,105)
*Evidence external to the student       (5,5)          (13,16)       (18,21)

COMMENTS ABOUT GRADE VALUES AND SOCIAL CONSEQUENCES
Appeal to what is just                  (30,53)        (30,55)       (60,108)
Appeal to what is merciful              (14,20)        (17,21)       (31,41)
School consequences                     (18,33)        (14,21)       (32,54)
Consequences outside school             (16,23)        (14,21)       (30,44)

                                      (40 teachers)  (44 teachers)  (84 teachers)

*Difference noted between measurement and non-measurement

consequences of grades. Note also that only two categories of thought about
grade meaning differed with measurement instruction. The image of grades as
classroom currency was clearly envisioned by both groups of teachers.
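
As a rough illustration of the kind of chi-square comparison named in this section, the sketch below computes a Pearson chi-square statistic for a 2 × 2 table built from the teacher counts in Table 8 (teachers who did or did not make a given kind of comment, by instructional group). The article does not report a test on these particular counts, so this is a sketch under stated assumptions and not a reproduction of the author's analysis. The function names `chi_square_2x2` and `statistic_for`, and the choice to omit a continuity correction, are hypothetical choices made for the example.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
        [[a, b],
         [c, d]]
    using the shortcut n * (ad - bc)^2 / (row1 * row2 * col1 * col2).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Group sizes from Table 8: 40 teachers with measurement instruction, 44 without.
N_WITH, N_WITHOUT = 40, 44

# Number of teachers making each kind of comment (with, without), from Table 8.
teacher_counts = {
    "self-referenced meaning": (4, 11),
    "evidence external to the student": (5, 13),
}


def statistic_for(category):
    """Chi-square for 'made this kind of comment or not' by instructional group."""
    with_instr, without_instr = teacher_counts[category]
    return chi_square_2x2(
        with_instr, N_WITH - with_instr,
        without_instr, N_WITHOUT - without_instr,
    )


if __name__ == "__main__":
    for category in teacher_counts:
        print(f"{category}: chi-square = {statistic_for(category):.2f}")
```

Each statistic would be referred to a chi-square distribution with 1 degree of freedom (critical value about 3.84 at the .05 level). Whether a continuity correction or an exact test would be more appropriate for counts this small is a separate choice that the article does not specify.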

Discussion
The following discussion considers the issues raised by these results. A clear
limit on the findings of this study is the nature of the sample. All of the teachers
were master's students at Duquesne University. There is also a confounding
with degree program. The findings do not generalize beyond this institution.
The study's findings are potentially important enough to invite replication with
a broader population.
This study contributes to the theory of educational measurement by helping
to further define and describe the constructs teachers use in grading. The study
helps ground the content of educational measurement courses in the classroom
context. The results of this study support the conclusion of Stiggins and his
colleagues (1989) that lack of measurement training is not enough to explain
the discrepancies between recommended and actual grading practices. This
study also supports the theory of Brookhart (1991) that teachers think about
the uses and consequences of grades in their grading practices. Teachers do not
follow recommended grading practices partly because there is a conflict
between the recommended practices, which concentrate on grade interpreta-
tion and meaning, and concerns with the uses of grades. For some of the uses,
most notably for the development of student self-esteem, both achievement
and effort are relevant constructs. Recommended practices that would limit
grades to measures of achievement would make more sense if teachers could
guarantee that grades would only be used as measures of achievement, and
they cannot do this. Perhaps recommendations should be amended to take into
account issues of value and social consequences.

Meaning of Grades
The meaning of the construct grade, for teachers, is closely related to the idea
of student work. Teachers used phrases like "the work (s)he did" to describe
grades or grading. Performance or perform were words often used, in the sense
of something done or accomplished. To teachers, grades are something stu-
dents earn; they are compensation for a certain amount of work done at a
certain level. Thus, achievement is part of the construct but not the whole of it.
Among teachers, a more common image than achievement is that of grades as
currency; this image is evident in teachers' frequent use of the words earn, work,
and perform. The teachers' emphasis is on the activities students perform, not
what grades indicate about theoretical achievement constructs.
When deciding what grade a student has earned, the teacher functions as a
judge of student performance. While measurement instruction seemed to help
teachers see that self-referenced grades were not very interpretable, it did not
change their minds about the image of grades as pay. This may be the key to the
relationship between grading and classroom management, which Stiggins
(1988) points out needs to be investigated. Grades seem to be used in a kind of
academic token economy, and they function in classroom management as the
reward for work done.
It is worth wondering whether the concept of grades as pay is related to the
thinking teachers do about value issues in grading, especially the issue of
fairness. On the face of it, the legalistic version of fairness represented in the
teachers' comments does not seem very lofty. Is this because teachers must
maintain credibility with students who are at a legalistic level of moral
development? Or is it, at least in part, related to some of the economic
concepts of fairness: equal pay for equal work, even perhaps a piecework or
hourly wage notion? An economic interpretation fits with the use of grades as
classroom currency and might help explain the teachers' generally legalistic
level of ethical thinking. A legalistic orientation, in turn, is consistent with
many current classroom management practices. This economic/legalistic mech-
anism is a potential theoretical framework for investigating the role of grades in
classroom management.

Values Invoked in Grading Practices


The degree to which teachers engage in value judgments was measured with
the quantitative scoring scale. All of the scenarios evoked responses at all levels
of Messick's model for considering the interpretation and use of scores.
Comments about the value implications and social consequences of grades
were as frequent as comments about the meaning of grades. Manke and Loyd
(1991) also noted teachers' concerns with fairness in the grading process.
Teachers do think seriously about these issues.
When considering what consequences a grade will have for a student, the
teacher functions as an advocate for the student. Concern about this function
does not differ for those who have measurement instruction. This is consistent
with the study's hypothesis. Measurement instruction can be expected to clarify
teachers' concepts of the meaning of grades, but there is no reason to expect
that measurement instruction will change thinking about values and social
consequences. These thoughts are more closely related to the altruism that
motivates people to enter teaching in the first place (Brookhart & Freeman,
1992). Child-centered orientations are powerful and pervasive among teachers.
They may help explain teachers' discomfort with the grading process (Barnes,
1985). If a teacher's first priority is to be an advocate for the student, concern
about consequences to students may be expected to have more influence on
grading practices than concern about interpretability. This priority may help
explain why, in this study and others (Friedman & Manley, 1991; Stiggins &
Conklin, 1992; Stiggins et al., 1989), teachers report considering effort as well
as achievement when they assign grades.
Bishop (1992) suggests that the dual roles of advocate and judge are not
compatible. He recommends that teachers yield the judging role to external
assessment in order to function more fully as advocates, coaches, and mentors
to their students. This recommendation would require a change in current
classroom structure if grades are indeed part of classroom management. In any
case, teachers would still need assessment information, to identify levels of
success and areas for improvement, in order to function as coaches.
There is a double standard of just deserts: An average or above student gets
"what (s)he earns," while a below-average student gets "a break" if there is any
way to justify it. The difference is in how the teacher perceives the student and
reflects the teacher's advocacy function. One such perception is whether the
student is academically able. For Chris's and Sandy's scenarios, in which high
and average students did less than their best, teachers would assign grades
based on a straight average. But for Barbara's scenario, in which a low-ability
student had made every effort, teachers chose a D rather than an F. Another
perception is whether or not the student is trying. For Kelly's scenario, in which
a D student was missing homework, F was the grade of choice. But for David's
scenario, in which a student's performance improved from F to D, D was the
grade teachers would assign.
Terwilliger (1989) also sees the pass/fail decision as distinct from the
assignment of other grades but not in the way the teachers describe. Terwilliger
recommends that pass/fail decisions be based strictly on performance on tests
and quizzes that measure minimal objectives. This recommendation is based on
equating grade interpretation with achievement of basic expectations; in
Messick's terms, Terwilliger is concerned with construct validity and interpret-
ability. While the teachers in this study were concerned with construct validity,
they also considered value implications and social consequences and reasoned
in this manner to arrive at exactly the opposite conclusion. It is in pass/fail
decisions that the consequences are most grave for the student, and so it is
precisely at this point that teachers were most likely to let mitigating factors
intervene in their decisions. "I could not fail a student who was trying" because
a student who "works hard" does not "deserve" to fail. Briscoe (1991) suggests
this issue is larger than the psychology of individual teachers. One of the myths
or beliefs in school cultures, she asserts, is "Good teachers don't fail students."
The grading process, as currently practiced, leaves teachers to work out the
compromises they must make in their dual role as both judge and advocate for
their students. Recommended grading practices, suggesting no compromises,
are of limited help to teachers on this issue. This study's results suggest that
teachers mix the roles of judge and advocate differently for students of different
ability, and this in itself is a value-laden act.

References
Barnes, S. (1985). A study of classroom pupil evaluation: The missing link in teacher
education. Journal of Teacher Education, 36(4), 46-49.
Bishop, J. H. (1992). Why U.S. students need incentives to learn. Educational Leader-
ship, 49(6), 15-18.
Briscoe, C. (1991, April). Making the grade: Perspectives on a teacher's assessment
practices. Paper presented at the Annual Meeting of the American Educational
Research Association, Chicago.
Brookhart, S. M. (1991). Letter: Grading practices and validity. Educational Measure-
ment: Issues and Practice, 10(1), 35-36.
Brookhart, S. M., & Freeman, D. J. (1992). Characteristics of entering teacher candi-
dates. Review of Educational Research, 62, 37-60.
Friedman, S. J., & Manley, M. (1991, April). Grading practices in the secondary school:
Perceptions of the stakeholders. Paper presented at the Annual Meeting of the
National Council on Measurement in Education, Chicago.
Loyd, B. H. (1991). Survey on grading practices. Unpublished manuscript, University of
Virginia, Charlottesville.
Manke, M. P., & Loyd, B. H. (1990, April). An investigation of nonachievement related
factors influencing teachers' grading practices. Paper presented at the Annual Meeting
of the National Council on Measurement in Education, Boston.
Manke, M. P., & Loyd, B. H. (1991, April). A study of teachers' understanding of their
grading practices. Paper presented at the Annual Meeting of the American Educa-
tional Research Association, Chicago.
Messick, S. (1989a). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp.
13-103). New York: Macmillan.
Messick, S. (1989b). Meaning and values in test validation: The science and ethics of
assessment. Educational Researcher, 18(2), 5-11.
Stiggins, R. J. (1988). Revitalizing classroom assessment: The highest instructional
priority. Phi Delta Kappan, 68, 363-368.

Stiggins, R. J. (in press). Teacher training in assessment: Overcoming the neglect. In S.
Wise (Ed.), Teacher training in assessment. Hillsdale, NJ: Erlbaum.
Stiggins, R. J., & Conklin, N. F. (1992). In teachers' hands: Investigating the practices of
classroom assessment. Albany: SUNY Press.
Stiggins, R. J., Frisbie, D. A., & Griswold, P. A. (1989). Inside high school grading
practices: Building a research agenda. Educational Measurement: Issues and Practice,
8(2), 5-14.
Strauss, A. L. (1987). Qualitative analysis for social scientists. New York: Cambridge
University Press.
Terwilliger, J. S. (1989). Classroom standard setting and grading practices. Educational
Measurement: Issues and Practice, 8(2), 15-19.

Author
SUSAN M. BROOKHART is Assistant Professor, School of Education, Duquesne
University, 600 Forbes Ave., Pittsburgh, PA 15282. Degree: PhD, Ohio State Univer-
sity. Specializations: classroom assessment and teacher education.
