Evaluating Assessment Evidence
by Simon Carruthers
Submitted for Master of Teaching, Victoria University, New Zealand.
Grade Achieved: A+
Introduction
This paper analyses the primary school achievement data of three students
who are at risk of underachieving in 2015. The purpose of this exercise is to
inform teaching and expectations for the students' learning progress for the
year. Firstly, a summary of student information is given. Then the types of
achievement data and information that were available are outlined. From the
available data options, this paper selects only the STAR assessment tool for
a more in-depth evaluation against assessment theory. This paper then
collates, analyses and summarizes STAR Reading Test achievement data for the
three students and suggests learning goals for each. This paper concludes
that achievement data carries only some of the information relevant for
informing teaching and expectations for students' learning. Further, this
paper finds that other information, particularly information and
interpretations held mentally and communicated verbally by the teacher,
provides the background to the students that is most likely to impact their
future learning and that, as such, learning goals need to be tailored to this
reality.
Student Information Summary
For the purposes of this study, the three students are given the pseudonyms
Student A, Student B and Student C. The three students come from a composite
class of Year 5s and Year 6s at a contributing primary school. All three
students selected happened to be in Year 6. For all three, this meant that
they would have the same teacher for Year 6 as they had had for Year 5.
Therefore, these students' start to the year has less to do with getting to
know a new teacher than with getting to know the half of the class who have
entered as Year 5s. This could generally be expected to assist the students'
learning, as the teacher is already well informed of the learning needs of
each student and the students are familiar with the teaching style and
directions of the teacher. Such an arrangement was also of particular
interest because the teacher was able to assign buddy pairs in which an
experienced Year 6 demonstrates certain tasks to a Year 5 partner.
Additionally, since this is the students' final year at their current school,
they are likely to develop increased anticipation of the change of school to
an intermediate as the end of the year approaches. How this anticipation of
change may affect their learning is unclear at this stage and is beyond the
scope of this paper.
Achievement Data & Information
Evidence of both summative assessments of learning and formative
assessments for learning of the students was available in a large physical
binder of paper records, kept in a lockable cupboard inside the students'
classroom. The binder contained:
the material that is contained within the domain of study for each year level.
They should therefore cover fairly reliably what has already been taught in
the classroom, provide a useful measure of where students are at, and
effectively assist the teacher in planning for future learning.
Criterion-Related Evidence
STAR tests encounter criterion-related validity issues when a student sits
the test mid-year rather than at the reference points at the start or end of
the year. A student who completes the test at the mid-point of the year would
likely be assessed as comparatively more advanced than their peers. This
would create inaccuracy in predicting performance in the end-of-year STAR
test and would affect norm-referenced comparisons in particular. This in turn
would likely affect resource allocation if decisions were based on data
alone. To attempt to compensate, the STAR teacher guidelines simply state
that teachers should add a note where a student completed the test later in
the year.
Construct-Related Evidence
Do high stanines in the STAR test actually prove reading ability, or do the
assessments measure something alternative or additional? The STAR tests in
New Zealand are more than a test of reading ability alone. They are, more
specifically, a test of English reading ability. ESL students may have a
higher reading capability in their native language than in English.
Therefore, a low stanine result on a STAR test doesn't necessarily prove that
the student who achieved that result actually has a low reading ability. A
student with strong reading skills in their native language might reasonably
be expected to start the year with an English reading ability lower than that
of their New Zealand peers but, during the course of the year, to display an
accelerated rate of progression through the LLPs as they transfer reading
skills from their native language to English.
However, to complicate matters further, a cursory examination of a STAR test
for Years 4-6 from 2014 revealed that the test is also a minor test of the
Maori language, as reflected in this question:
Choose the word that means the same or nearly the same as the word in
bold:
Today, we learned a new waiata
song/poem/game/trick
Years 4-6, NZCER
It is notable that the meaning of the Maori word "waiata", which means "song"
in English, cannot be inferred from the sentence itself. A sentence that
would allow the English meaning to be inferred would replace the simple past
form of the verb "learn" with the irregular simple past form of the verb
"sing". Thus, "Today, we sang a new waiata" would make it possible to infer
that "waiata" means "song" in English, even with no prior exposure to the
Maori language.
The final section, where this test of Maori language occurs, has only ten
questions. Therefore, this one question means that ten percent of this section
displays very low construct-related validity because it assesses something
other than what it set out to assess. In such a relatively small test
overall, this one erring question likely distorts the entire assessment
significantly and thus renders the resulting stanines unreliable as
indicators of where students are at with their learning.
Consequences of Using
The consequences of using STAR tests are likely to be generally positive for
their validity, given that they follow the LLPs for their year level and are
unlikely to be set at a level that is too difficult or too easy. LLPs are
based on well-researched progress that students make at different ages and
school years. Students may therefore generally be expected to have had some
practice with the reading skills that are being assessed. Since STAR tests
are not too easy, a meaningful distribution of scores may be obtained. This
distribution then reveals comparatively stronger and weaker students and
enables the setting of individualized and targeted learning goals that can
enhance motivation and improve learning overall.
The limited number of stanines in STAR tests, at only nine bands, means that
only a relatively coarse estimate of student achievement is obtained. One
student could be right on one edge of a band while another is much closer to
the other edge of the same band. These two students may therefore differ in
their reading ability to a fairly significant degree even though they have
been assessed as equal. This could detrimentally affect the allocation of
teaching resources, where a student more in need does not receive enough
extra support, pushing them further behind, while a student less in need
receives more, pushing them even further ahead.
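The coarseness of the nine bands can be illustrated with a minimal sketch. It assumes the standard definition of stanines, in which the bands hold 4, 7, 12, 17, 20, 17, 12, 7 and 4 percent of the norm group; it does not reproduce the raw-score conversion tables in the STAR teacher's manual. The function name and the two percentile ranks are hypothetical and are not taken from the students in this study.

```python
from bisect import bisect_left

# Upper percentile bounds of stanines 1 to 8 under the standard definition
# (bands of 4, 7, 12, 17, 20, 17, 12, 7 and 4 percent of the norm group).
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_to_stanine(percentile_rank: float) -> int:
    """Map a percentile rank (0 to 100) to a stanine (1 to 9)."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    # Count how many band boundaries lie below the rank, then shift to 1-9.
    return bisect_left(STANINE_UPPER_BOUNDS, percentile_rank) + 1


# Two hypothetical students near opposite edges of the middle band.
lower_edge_student = percentile_to_stanine(41)
upper_edge_student = percentile_to_stanine(59)
print(lower_edge_student, upper_edge_student)  # both are reported as stanine 5
```

Under these assumptions, two students eighteen percentile points apart receive the same stanine, which is the kind of hidden difference described above.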
Although STAR tests are considered only supplementary to other assessments
carried out by the teacher, and although their purpose is to enable more
accurate setting of teaching and learning goals, students themselves may not
always see them this way. The physical nature of the standardized pre-printed
paper test booklets presents, for the student, an impression of formality and
seriousness that may not be as present in other forms of assessment that the
teacher carries out. This may affect some students more negatively than
others. As a result, student stress levels may rise and result in test
performance that isn't the best indicator of students' true reading ability.
Subsequently assigning learning goals that are too low and that do not extend
student learning may result in boredom and loss of motivation for the
student, perhaps then turning into behavioural issues that lead to the student
actually falling behind in learning.
Reliability
STAR reading tests, as with all assessments, are susceptible to systematic
and random errors of measurement and therefore will never present a perfectly
accurate depiction of student performance.
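One common way to quantify random error is the standard error of measurement, calculated as the score standard deviation multiplied by the square root of one minus the reliability coefficient. The sketch below applies this to the stanine scale, which by definition has a standard deviation of 2. The reliability value of 0.91 is a hypothetical figure chosen for illustration and is not a published STAR statistic; the function names are likewise illustrative.

```python
import math

STANINE_SD = 2.0  # stanines are scaled to a mean of 5 and an SD of 2

# Hypothetical reliability coefficient, assumed for illustration only.
ASSUMED_RELIABILITY = 0.91


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def score_band(observed: float, sem: float) -> tuple[float, float]:
    """Return the band of one SEM either side of an observed score."""
    return (observed - sem, observed + sem)


sem = standard_error_of_measurement(STANINE_SD, ASSUMED_RELIABILITY)
low, high = score_band(4, sem)
print(f"SEM is about {sem:.1f} stanines; band for stanine 4: {low:.1f} to {high:.1f}")
```

Under these assumed values the error is a little over half a stanine, so a movement of a single stanine from one year to the next may reflect measurement error rather than a real change in reading ability.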
Student A: STAR results by year (surviving column headings: Word Recognition, Total, Stanine)
2012 (Year 3): 22
2013 (Year 4): 22
2014 (Year 5): 23
This data shows that Student A (SA) improved a stanine level in 2013
compared to 2012 but then regressed back to the original stanine in 2014.
The vocabulary score for 2013 was unusually low. Such data may represent
movement within the bounds of a reasonable error of measurement and thus
mean that SA is actually learning at a reasonably steady progression.
However, while SA appears able to decode text to a strong degree, a skill he
seems to have greatly enhanced in Year 3, his comprehension scores have
declined. This appears to indicate the "barking at print" ability described by
Student B: STAR results by year (surviving column headings: Word Recognition, Total, Stanine)
2012 (Year 3): 19
2013 (Year 4): 15
2014 (Year 5): 15
Student B (SB) has declined in stanine level over three consecutive years,
from four to two. SB performed particularly poorly in the paragraph
comprehension section for 2014, scoring only one mark in that section. Word
recognition began particularly high in Year 3 and remains SB's strongest
area. This again confirms the teacher's assertion that students can "bark at
print" comparatively well, meaning that they are not truly understanding
what they are reading.
SB requires a serious intervention in order to stem what appears to be a
learning deficit that is compounding on itself, pushing SB ever further behind.
Anecdotally, the student's teacher suspects an undiagnosed learning
disorder.
Student C: STAR results by year (surviving column headings: Word Recognition, Total, Stanine)
2012 (Year 3): 24
2013 (Year 4): 10, 33
2014 (Year 5): 24
Student C (SC) had an unusually strong STAR test score in Year 4, but by Year
5 SC had declined by two full stanines. This appears to indicate that, for
some reason, 2013 may have been a difficult year for SC, perhaps at school or
perhaps at home. It is particularly notable that SC had very high scores in
paragraph comprehension in 2012 and 2013 but that these fell steeply in 2014.
Information from the student's teacher indicates that SC's siblings have
learning difficulties, and there was some speculation that SC may also have a
learning difficulty that is as yet undiagnosed. This wouldn't appear to
explain the previously high results, though, unless the onset of a disorder
occurred sometime in 2013. Restoring the high comprehension scores of
previous years would be a broad learning goal worth working towards while
awaiting the results of any doctor's diagnosis.
Conclusion
When developing learning goals for students at risk of underachieving, or for
any student for that matter, it is not sufficient to rely on achievement data
alone. All three students in this study clearly have different circumstances
in their lives that may be impacting on their learning progress. It is
necessary to consider the nature of these circumstances on an individual
basis. For example, Student A appeared to be falling behind because his
family had moved home and he had changed schools, whereas Student B appeared
to be falling behind because of what may be an undiagnosed learning disorder.
Clearly the approach to each student will be different. Student A may
reasonably be expected, perhaps with encouragement only, to be able to work
and focus harder in order to catch up, whereas Student B may require more
specialist assistance that supports his learning progress in other ways. By
analogy, achievement data is like the engine heat gauge on a vehicle. It
tells you when the car may be overheating, but it doesn't tell you whether
the radiator has sprung a leak or whether a load you're towing is too heavy
for the engine size rating. The same problem therefore sometimes demands
different remedies depending on context, and it is necessary to know the
context of a situation in order to decide which remedy is needed.
The development of student learning goals in conjunction with achievement
data can best be augmented with the contextualizing information that is
usually only held mentally and shared verbally by the students' teacher. This
oral information, much of it given in confidence, often covers sensitive
areas such as students' family status, relationships, disabilities,
tendencies and other background. Professional discretion is usually a
watchword. This may include, for example (usually to the best present
knowledge of the teacher): parental status such as divorce or death;
parental acrimony; shared-custody or other living arrangements; abuse or
suspected abuse; sibling presence or former presence at the same school;
cordial or otherwise relationships among certain students; ethnic and
linguistic background interpretation; religion; religious community status;
teacher perceptions of student perceptions of the teacher's teaching style.
All of
these factors have the potential for strong impact on student learning and
References
Crooks, T. (2004). Tensions between Assessment for Learning and Assessment
for Qualifications. Paper presented at the Third Conference of the
Commonwealth Association for Examinations and Accreditations Bodies,
Nandi, Fiji.
Elley, W. (2001). STAR supplementary tests of achievement in reading
teacher's manual. Wellington, New Zealand: New Zealand Council for
Educational Research.
Gronlund, N.E. (1998). Validity and Reliability. In Assessment of Student
Achievement (pp. 199-221). Boston: Allyn and Bacon.
Ward, J. and Thomas, G. (2011). National Standards: School Sample Monitoring
and Evaluation Project, Chapters 5 and 6, pp. 31-59.