

Helen Fetscher
Professor Wolcott
ENC 1102
14 March 2014

An Annotated Bibliography on Standardizing Writing Through Rubrics


Introduction
Although how to standardize writing has been debated worldwide for centuries, the conversation has grown markedly stronger within the past few decades. The rubric debate persists: many argue over whether subjective work such as writing and literature can be properly judged by the standard rubric used in most grading systems. The argument in focus sheds light on the methods used to grade writing, specifically at the primary and secondary educational levels. The shared objective of these references is to consider the many aspects conjoined in the writing process, to ask how these factors can bridge or widen the gap in communication between writer and grader through use of the rubric, and to weigh some possible solutions. Not only educators but everyone can find importance in this conversation, because it offers a better understanding of how each individual's writing can be influential; under a habit-formed method of analyzing writing, readers may miss a great idea in great content when surrounded by what they are trained to focus on and remember.
To set the scene of how rubrics are utilized, a working definition is needed, since the rubric itself can be rather subjective. Linda Mabry, a professor of educational psychology, offers a rather simple definition: rubrics are "rules by which the quality of answers is determined," yet they also "have power to undermine assessment" (Mabry 1999). In its simplest form, Mabry's definition addresses why rubrics cause such great conflict over how they should be used, and it suggests that the rubric's subjectivity with regard to the grader must be considered, lest the rubric take over the power of assessment and reduce feedback to a diluted form.
However, Mabry does not suggest removing rubrics altogether, but rather acknowledging what they do and using them only to the extent that the power of the grader's own thoughts and feedback is also recognized. Many who join the argument share a mindset that agrees with Mabry's definition but propose alternatives, such as portfolios, or techniques to incorporate alongside the rubric. This bibliography includes different methods of writing assessment: standard rubrics, portfolios, teacher case studies, and other possible solutions. The annotations also examine psychological angles suggesting how students' writing is influenced by environment, testing conditions, and similar factors, and they report results on how writing is affected as a whole. These annotations therefore cover much of writing assessment through exams, standardized tests, and the many demographic factors that come into play to influence writing. The authors cited gain credibility through extensive education in writing or a background in understanding the assessment process. Sources by authors with little or no teaching experience or credibility, or whose positions were too radical and showed extreme bias with little support, were excluded.

Baker, Libby, Naomi Cooperman, and Barbara Storandt. "Reading, Writing, and Rubrics." Journal of Staff Development 34.4 (2013): 46-49. Print.
The writers Libby Baker, Naomi Cooperman, and Barbara Storandt explain the fundamentals of a rubric and how its assessment works across a rather broad spectrum. They follow the concept that writing assessment rests on "norming," "scoring," and "calibrating." From this mindset, the overall purpose of a rubric is to hold all writing to a single, set goal. A scoring issue emerged, however, when Baker et al. researched the Writing Matters program established through Teaching Matters. Each individual classroom may have been able to set a norm based on the skills apparent in its students, but Baker et al. show that this may have set the bar too low, or that the rubric may sit on a relative scale without being calibrated universally. In other words, a teacher may set a classroom norm that seems fitting, yet the classroom as a whole may be substantially lower or higher than the set bar. This compares to Turley's work in suggesting that the rubric is best used as a tool rather than viewed as inherently beneficial or harmful. The two do not coincide entirely, however, as Baker et al. explain that the essence of a rubric is that it be calibrated at a large or universal level. By their definition, then, a rubric cannot be used as a tool for the classroom alone, or for clear communication between teacher and student, but rather for the sake of standardization.

Broad, Bob. "Pulling Your Hair Out: Crises of Standardization in Communal Writing Assessment." Research in the Teaching of English 35.2 (2000): 213-60. Print.
This work may be less useful in that it talks around the central question under study. Bob Broad discusses portfolios as an alternative to testing by itself, a position already reinforced by previous research. His studies show some validity through cases such as the First-Year English (FYE) Program, but because he focuses on portfolios alone, his stance yields no clear results regarding the rubric itself. This work is worth including in research, and possibly referring to, when considering the radical position of abandoning rubrics altogether and devising a separate genre of assessment. However, because Broad largely excludes rubrics from writing assessment in favor of portfolios, the article serves a viable purpose but not the topic of interest here, understanding the standardization of rubrics.

When considering alternative genres in English education, Broad's case studies and research will serve more than adequately for anyone looking further into portfolios; and if someone were to decide that rubrics do not fulfill their purpose well enough for proper feedback, this article provides enough information to justify that claim.

Brown, Gavin T. L., Kath Glasswell, and Don Harland. "Accuracy in the Scoring of Writing: Studies of Reliability and Validity Using a New Zealand Writing Assessment System." Assessing Writing 9 (2004): 105-21. ScienceDirect. Web. 27 Feb. 2014.
Brown et al. introduce a type of software that analyzes different aspects of writing and may be a solution to standardizing it. The authors first demonstrate the issues that appear when standardizing English and give several examples of standardized tests currently in use around the world. The three authors then interlink these past examples based on their consistency scores in grading papers. Overall, they explain in detail how the program asTTle (assessment tools for teaching and learning) is used to help standardize work and how it could be a solution for all rubrics.

This research coincides with Eric Turley's discussion of the Hillegas scale, in effect perfecting that scale's approach to standardization. It provides teachers and students with a tool to assess a student's work fairly, in hopes of a proper standardization scale. Within the great rubric debate, however, it does not engage any sense of subjectivity or show precisely how students would receive proper feedback through asTTle. Brown et al. are nonetheless quite credible, as they properly source their findings through other scholarly works, professors, and proven test results. Their work should therefore be analyzed carefully as a plausible computation that creates comparable results.

Cho, Yeonsuk. "Assessing Writing: Are We Bound by Only One Method?" Assessing Writing 8 (2003): 165-91. ScienceDirect. Web. 1 Mar. 2014.
Yeonsuk Cho first addresses the writing process from a cognitive point of view and suggests that writing assessment cannot be understood without first evaluating that process. She explains that writers construct good work in stages: planning, executing, and reviewing. Cho also explains in depth how good writers tend to plan and revise more, whereas poor writers cognitively try to get it right on the first attempt.

If poor writers are identified by their tendency to write their very best in one attempt, this research sheds light on writing assessments that are graded on a first attempt, let alone within a set time frame. Cho also raises the question of how well students will write for a test given its subject matter, and whether this is an adequate way to review a student's aptitude in writing: students may, for example, have to write about a subject reviewed in class as opposed to one assigned by a standardized test. Time constraints may further affect a student's ability to perform, so assessing such writing may be skewed toward students capable of writing in pressured environments or of deriving information on topics in which they have no prior knowledge or interest. She notes that university-level students may turn in portfolios as an alternative to writing a well-organized, quickly processed paper in the traditional classroom setting.

In the procedure, graduate students took an EPT (English placement test) at the start of the semester, and two raters evaluated each student's work on the day of the exam. This method introduced error, however, as students were required to hear a lecture, read an article, and write a coherent essay within a forty-minute time frame. The results showed a 22% misplacement of students relative to their abilities, a rate Cho finds undeniably reasonable. Such assessments test more than writing ability and composition alone; according to this study, they test a number of factors in a short time span that does not accurately depict students' capabilities. Cho provides an appendix revealing how the graders used a rubric to give students the option of a second draft for this test. Through a full seven-step methodology, the study showed the ICC (intra-class correlation coefficient) to be a way to remove bias from evaluation and to place students accordingly by their scores, but it does not directly address the rubric in Appendix A that the graders used or what particular feedback the students received.

Dappen, Leon, Jody Isernhagen, and Sue Anderson. "A Statewide Writing Assessment Model: Student Proficiency and Future Implications." Assessing Writing 13 (2008): 45-60. ScienceDirect. Web.
This article compares the national No Child Left Behind (NCLB) assessment law with the Nebraska statewide curriculum for standardized tests. The tests implemented, called STARS, showed a more direct connection among teacher, student, and learning, which prompted debate over whether state-implemented tests are more proficient than tests graded at the national level. Overall, the research concluded that the statewide STARS test yielded clearer responses for the teachers assessing the writing assigned to students. This research has some validity in that it suggests writing assessment should be done on smaller scales, not necessarily on a single rubric or curriculum at the national level. In other words, the study reveals better student outcomes when students are tested on a rubric base they are more familiar with and can relate to, unlike the original implementation of the No Child Left Behind laws with only one set of criteria. As other journalists and professors suggest, students may be uninterested, or unable to perform their very best, under standards they are not familiar with while placed in a testing environment. In lieu of the ranking tests suggested as reliable forms of assessment, such as the NRT, the Nebraska study of the STARS test suggests an assessment model that sorts students in a way that acknowledges they learn at different levels and therefore gives them more than one assessment. This removes the focus from holding all students to one criterion bar.

Fetscher 8
DelleBovi, Betsy M. "Literacy Instruction: From Assignment to Assessment." Assessing Writing 17 (2012): 271-92. Print.
Unlike many of the previous journalists and professors, who wrote about studies done by others, DelleBovi is a professor of education who performed her own experiment on her graduate education students, who intended to become secondary education teachers. In her study, she attempted a holistic form of assessment by reference to Hamp-Lyons's definition of holistic scoring: "Holistic scoring is often characterized as an impressionistic scoring method, one that uses multiple raters in order to compensate for interrater unreliability" (Hamp-Lyons 1990, qtd. in DelleBovi), which in turn suggests that the analytic form of assessment is not as reliable. DelleBovi split up the scoring method based on the type of student and what each desired to teach in the future (e.g., social sciences, German, English). This method let DelleBovi critique different writings against the expectations of different subjects, and therefore against the different strong points or ideas each subject emphasizes. Her research concludes that the holistic approach may be more suitable for obtaining the best results from students and for incorporating the teacher's or grader's ability to determine what is valid and considered good writing within their own rubric format. However, on a large scale where students must be tested on similar criteria, it may be difficult to perform the same procedures and obtain the same results. DelleBovi's work is therefore not a waste in the search to understand the rubric debate, but it cannot be held to the same standards as testing students actually learning in secondary education programs, as opposed to the graduate students who were tested.

Hamp-Lyons, Liz. "The Scope of Writing Assessment." Assessing Writing 8 (2002): 5-16. Print.
Hamp-Lyons explains the overall history and origins of evaluating assessment through writing as opposed to assessment of writing. She explains how, in Chinese history during the Chou period (1111-771 BC), writing was already being evaluated, with attempts made to equalize pieces of work against one another and to criticize them. Hamp-Lyons then explains how the methodology of assessment changed, especially toward modern times in the New World, as written assessment for university entrance became popular in place of the traditional oral exams. She goes on to discuss how the study of rhetoric and of writing's substance as derived spoken word became separated, with the focus shifting to particular standards and requirements in writing. She analyzes writing assessment from its origins, connects it to much of modern-day culture, and follows with modern technology such as online writing assessment. Much of her research is reinforced by professors and authors who have studied writing assessment. However, her opinion holds less validity where she supports writing portfolios and feedback by teachers. Hamp-Lyons demonstrates understanding and provides information that allows readers to make their own connections about why writing assessment has importance, but as for a plausible solution or evidence toward a proper way to evaluate writing, there is none. The information suggests that time, culture, and individual values for writing change, and that assessment will change correspondingly, but the article does not provide any evidence of how rubrics or writing composition should be conducted and evaluated.

Hernandez, Rosalinda, Velma Menchaca, and Jeffery Huerta. "A Study of the Relationship between Student Anxiety and Test Performance on State-Mandated Assessments." US-China Education Review B4 (2011): 579-85. ERIC. Web.
Drawing on a study performed by the Texas Education Agency, Hernandez et al. discovered how Hispanic students at the third-grade level and above respond to the written portion of the state-mandated test, the Texas Assessment of Knowledge and Skills (TAKS). Unfortunately, the study did not question the rubric or the actual writing assessment much; rather, it focused on testing students' anxiety levels as they prepared to take the test. Incorporating Hernandez et al. into the same category of argument would therefore not be entirely accurate. However, they do help to reinforce the unacknowledged factors that play into the use of rubrics, such as students' demographics, including their backgrounds and ethnicity. This research similarly shows the anxiety levels seen in case studies such as Lucy Spence's research on the third grader Dulce, where language barriers cause further anxiety and prevent a proper reading of how well a student can perform, particularly in a testing environment. The article neglects, however, to provide further information on the argument at hand, how writing assessment is analyzed, or any solution within that topic. The ideas and tests regarding the TAKS are viable only to the extent that they connect to how students are assessed, not to the anxiety levels in themselves. The article likewise provides data that permits implications about how the writing is assessed, but no professional conjecture on the related topic.

Knoch, Ute. "Rating Scales for Diagnostic Assessment of Writing: What Should They Look Like and Where Should the Criteria Come From?" Assessing Writing 16 (2011): 81-96. Print.
This article by Ute Knoch focuses primarily on studying the actual substance of several different kinds of rubrics, determining their strengths and weaknesses and how a teacher or grader can decide which would be the most proficient to use when grading students. Covering holistic, analytic, multiple-trait, and single-trait rubrics, Knoch's work would be most beneficial to teachers who wish to create new rubrics and to understand what each is used for in a given circumstance. Although Knoch gives no insight into whether rubrics are viable or whether teachers should use them, the article implies a view of rubrics similar to Eric Turley's 2008 work, which suggests rubrics be used as tools that are not neutral but simply utilized according to the efforts and overall intentions of the teacher. By presenting the benefits of the different rubrics in the article's conclusions, Knoch therefore offers teachers a solution for knowing how they can assess a student's writing. It becomes clear that Knoch is reserving judgment on the use of rubrics: the article does not debate whether only one style should be used, but argues that the choice depends on the situation in which the teacher is grading. With this in mind, if solutions can be found for using more than one kind of rubric, each where it has its benefits rather than its deficiencies, the debate is diluted further into reassessing how writing should be assessed through rubrics, assuming rubrics are desired by the teacher's curriculum.

Mabry, Linda. "Writing to the Rubric: Lingering Effects of Traditional Standardized Testing on Direct Writing Assessment." Phi Delta Kappan (1999). Print.

Linda Mabry, a professor of educational psychology at Indiana University, explains the purpose of the rubric, the origin of its widespread use particularly in the United States, and where its errors lie. According to Mabry, a very simple definition of a rubric is "rules by which the quality of answers is determined," but she then establishes that this definition alone is enough to say that rubrics "have power to undermine assessment." The article connects closely to how rubrics are used, even explaining how, during the 1990s, the National Council of Teachers of Mathematics (NCTM) successfully implemented a standardized rubric, which then spread to other subjects and became a required form of assessment. But unlike in mathematics, Mabry confirms, rubrics create confusion even though they are intended as a genre for clear communication between students and teachers. She suggests that, like rubrics in other subjects, a writing rubric lays out what a student should know and, like a matrix, is read along its different axes. In doing so, Mabry shows, the rubric creates an illusion for graders seeking standardization: that they have found an elaborate and concrete way to form a holistic score. Her work then shows how, because of its format and gridding, the rubric can be quite debatable, or rather confusing. Her psychological background in teaching methods gives her strong credentials for the claim that a single score for a piece of work only appears to be holistic. The format of many writing rubrics serves moreover as a tool to bring many teachers or graders to the same score with closer consistency; the error, she explains, is that this limits the variability among writing. Mabry's idea suggests that instead of a writing rubric prompting graders to open up to vast types of writing, it instead closes off the likelihood of finding different forms of writing, as the writer seeks to appease a particular audience.

Rezaei, Ali Reza, and Michael Lovorn. "Reliability and Validity of Rubrics for Assessment through Writing." Assessing Writing 15 (2010): 18-39. ScienceDirect. Print.
In a study performed in a secondary education setting, a test was conducted on whether rubrics are even necessary. Two groups of students wrote papers on subjects they had previous knowledge of, in order to remove ignorance of content as a factor. The first student's paper was noted to have sufficient content that was somewhat unclear and unorganized, but very few local errors in spelling, grammar, sentence structure, and the like. The other student's paper, by contrast, had very good content and an interesting method and approach, but was clouded with bad spelling, grammar, and other local errors. Both papers were graded by teachers, once without a rubric and again with one, to determine whether the grades would vary heavily. In fact, the results according to Rezaei and Lovorn suggest that a paper with many grammatical or local errors will be seen as poorer overall, regardless of content. Granted, if only one test trial was truly conducted, the results may be somewhat invalid. Even so, these results display the possibility that teachers need to be more conscious of their rubric if they choose to rely on it alone. If not, then according to Rezaei and Lovorn, proper formatting needs to be considered in order to analyze different students' papers fairly.

Spence, Lucy K. "Discerning Writing Assessment: Insights into an Analytical Rubric." Locating Standards in Language Arts Education. Spec. issue of Language Arts 87.5 (2010): 337-52. Print.
The article by Lucy K. Spence provides a detailed case description of using a rubric to analyze the work of a student who faces a language barrier and may not perform up to the normal standards by which other students are measured and graded. In this study, the dialogue and work of two graders, Karla and Berta, is narrated throughout the article as they discuss how to evaluate the work of Dulce, a third-grade student who does not speak English very fluently; the debate lies in how to grade her based on how they know her as a person. Strictly following the six-trait rubric, someone who does not know Dulce might give her a poor score despite her efforts. Spence shows, however, that the six-trait rubric accounts for neither a language barrier nor a section evaluating the effort placed in the project and work, raising the question of where the line falls between considering subjective factors in the scoring process and sticking to the rubric.

The six-trait rubric itself takes the format of a 6x6 matrix with descriptors explaining how well a student did, often with three positive and three negative responses based on the score. Spence defines this analytic scoring more clearly than other articles do. Elsewhere, however, rubrics appear to be viewed more in the holistic sense and attacked for their lack of feedback, unlike the analytic scoring explained thoroughly in Spence's work.

Turley, Eric D., and Chris W. Gallagher. "On the Uses of Rubrics: Reframing the Great Rubric Debate." The English Journal 97.4 (2008): 87-92. Print.
Turley and Gallagher take a more evaluative approach to the concept of rubrics. Instead of asking what makes a rubric good or bad, they bring up the Hillegas scale to frame the rubric as a tool to be used, though not a neutral tool beyond judgments of good and bad. What matters instead is the use: whether teachers use the tool to standardize within a classroom, or whether schools use it to test, to a particular degree, how well their teachers are teaching.

This piece can definitely be useful for research because it refers to many original works and citations, suggesting that Turley and Gallagher did quite a bit of research before making such claims. They also come closer to an answer about how rubrics can be used, in a manner that does not try to categorize rubrics or place them on one side of the argument. This more recent, progressive article reveals what rubrics can be used for in an agreeable manner.

Welch, Roland A. "The Hillegas Scale as a Measure of Composition Work (Nassau County Supplement)." The English Journal 15.8 (1926): 618-20. Print.
Welch performs an experiment to test the Hillegas scale, which was later used in more modern rubrics as a means of properly standardizing composition and writing, to determine whether the scale is a fair way to grade composition. His method was to supervise a peer review in a classroom whose writing was below average, with all names replaced by numbers so that no bias from the other students would be present. Testing the Hillegas scale in this manner proved, from his results, that relative to one another a large set of judges was fairly accurate in determining how other students would critique one another's work. Old as this method of scoring students' abilities is, Eric Turley mentions the Hillegas scale as a tool that indeed quantifies the work but should not be taken as a neutral tool for determining whether something is good or bad. Based on Welch's experiment, that quantification is in fact all he proved, so the scale should perhaps be incorporated with other means of standardization in any search for what defines good writing.

Because this test was performed in 1926 by a teacher who was simply testing a grading approach without interfering as a teacher, his results may have been credible for the time period, but they do not address the current problem on a wider scale. Rather, he proves that the quantitative qualities of compositions can be measured in relation to one another through this long-used scale; accepting its quantifying abilities, it does just that and no more. A further error in his methodology may be his reliance on the students' judgment, as he explains that he is not familiar with the grading criteria himself.

Wilson, Maja. "Why I Won't Be Using Rubrics to Respond to Students' Writing." The English Journal 96.4 (2007): 62-66. Print.
In her journal article, Wilson explains just what the title dictates. She reveals examples of how writing and language cannot be standardized to such extreme degrees, because language itself is interpreted: words and writing are something described, as opposed to defined. Wilson also explains that, as a teacher, she does not use rubrics, for the sake of having work graded on how the student wrote and what the student intended. Rubrics, she explains, solidify a concrete response to the work, as opposed to a subjective response meant to better work that may be anything from bad to great.

Maja Wilson's findings are useful because her conception of how writing should be graded is far more subjective and addresses writing as qualitative, not quantitative. By expressing and focusing on the communicative aspect of grading writing, her view closely reflects mine and the direction I want my research to take. She creates a very good viewpoint on how to assess an individual's paper and give proper feedback that improves even good writers. However, this journal piece will require me to find other resources that offer a better solution for standardized writing situations, if it can be done.
