Evaluating Assessment Evidence

by Simon Carruthers
Submitted for Master of Teaching, Victoria University, New Zealand.
Grade Achieved: A+
Introduction
This paper will analyse the primary school achievement data of three students who are at risk of underachieving in 2015. The purpose of this exercise is to inform teaching and expectations for the students' learning progress during the year. Firstly, a summary of student information will be given. Then the types of achievement data and information that were available will be outlined. From the available data options, this paper selects only one assessment tool, STAR, for a more in-depth and thorough evaluation against assessment theory. This paper then collates, analyses and summarizes STAR Reading Test achievement data for the three students in particular and suggests learning goals for each. This paper concludes that achievement data carries only some of the information relevant for informing teaching and expectations for students' learning. Further, this paper finds that other information, particularly information and interpretations held mentally and communicated verbally by the teacher, provides the background to the students that is most likely to affect their future learning; learning goals therefore need to be tailored to this reality.
Student Information Summary
For the purposes of this study, the three students are given pseudonyms: Student A, Student B and Student C. The three students come from a composite class of Year 5s and Year 6s at a Contributing Primary. All three students selected happened to be in Year 6, which meant that they had the same teacher for Year 6 as they had had for Year 5. Therefore, these students' start to the year has less to do with getting to know a new teacher than with getting to know the half of the class who have entered as Year 5s. This could generally be expected to assist the students' learning, as the teacher is already well informed of each student's learning needs and the students are familiar with the teacher's teaching style and directions. This arrangement was also of particular interest because the teacher was able to assign buddy pairs, each made up of an experienced Year 6 student who demonstrated certain tasks to a Year 5 partner. Additionally, since this is the students' final year at their current school, they are likely to develop increasing anticipation of the move to an intermediate school as the end of the year approaches. How this anticipation of change may affect their learning is unclear at this stage and beyond the scope of this paper.
Achievement Data & Information
Evidence from both summative assessments of learning and formative assessments for learning was available in a large binder of paper records, kept in a lockable cupboard in the students' classroom. The binder contained:
Literacy data including: handwriting samples, running records, reading wedge graphs, STAR tests, asTTle Reading, asTTle Writing
Numeracy data including: GLOSS tests, Basic Facts tests, samples of work
OTJ summaries and copies of school reports
In addition, the school used a password-protected online database at www.etap.co.nz to store a digital record of the above achievement data, particularly test score information.
Evaluation of Assessment Tools
The purpose of assessment is to identify where students are at with their learning; this is the main concern of summative assessment. Knowledge of where students are at then informs learning goal setting and the further progression of learning; this is the main concern of formative assessment (Crooks, 2004). Each assessment is designed to measure a particular aspect of learning and as little else as possible. The most important considerations when evaluating an assessment tool are validity and reliability. Assessment validity concerns how relevant and useful the interpretation of assessment results is for the previously stated purpose of the assessment. Assessment reliability concerns measurement error and the consistency of results; good reliability may indicate, for example, that a group of learners in another place or at another time would achieve similar results when undertaking the assessment under similar conditions (Gronlund, 1998).
The following is an evaluation of STAR, the assessment tool selected from those used to collect the students' achievement data.
STAR Tests
STAR assesses a range of reading skills that closely follow The Literacy Learning Progressions (LLP) from the Ministry of Education. The tests are used nationally and are usually administered once a year. They are considered supplementary to the other reading assessments that the teacher carries out. Each student's STAR result is reported as a stanine, which supports national norm-referenced comparisons. The STAR tests have undergone a number of revisions and improvements since they were first developed between 1999 and 2003.
Validity
Content-Related Evidence
STAR tests include questions that align with the content of the domain, that is, the LLPs, which themselves follow the year levels. As the tests closely follow the LLPs, they can be expected to provide a reasonably representative sample of the material contained within the domain of study for each year level. They should therefore largely cover what has already been taught in the classroom, providing a useful measure of where students are at and helping the teacher to plan future learning effectively.
Criterion-Related Evidence
STAR tests run into criterion-related validity issues when a student sits the test mid-year rather than at the start or end of the year. A student who completes the test at the mid-point of the year has had more months of instruction than the students against whom their result is compared, so they would likely be assessed as comparatively more advanced than their peers. This would reduce the accuracy of predictions of performance in the end-of-year STAR test and would particularly affect norm-referenced comparisons. In turn, this would likely affect resource allocation if decisions were based on the data alone. To attempt to compensate, the STAR teacher guidelines simply state that teachers should record a note if students completed the test later in the year.
Construct-Related Evidence
Do high stanines in the STAR test actually demonstrate reading ability, or do the assessments measure something alternative or additional? The STAR tests in New Zealand are more than a test of reading ability alone; more specifically, they are a test of English reading ability. ESL students may have a higher reading capability in their native language than in English. Therefore, a low stanine result on a STAR test does not necessarily prove that the student who achieved it has low reading ability. A student with strong reading skills in their native language might reasonably be expected to start from a lower base in English reading than their New Zealand peers, but then to progress through the LLPs at an accelerated rate during the year as they transfer their reading skills from their native language to English.
However, to complicate matters further, a cursory examination of a 2014 STAR test for Years 4-6 revealed that the test is also, in a minor way, a test of the Maori language, as reflected in this question:
Choose the word that means the same or nearly the same as the word in
bold:
Today, we learned a new waiata
song/poem/game/trick
- p8, STAR Test Booklet, Form A, Years 4-6, NZCER
It is notable that the meaning of the Maori word waiata, which means song in English, cannot be inferred from the sentence itself. A sentence that allowed the English meaning to be inferred would replace learned, the simple past form of the verb learn, with sang, the irregular simple past form of the verb sing. Thus, Today, we sang a new waiata would make it possible to infer that waiata means song in English, even with no prior exposure to the Maori language.
The final section, in which this test of Maori language occurs, has only ten questions. This one question therefore makes up ten percent of the section and displays very low construct-related validity, because it assesses something other than what the section set out to assess. In a relatively small test overall, this one flawed question is likely to distort the overall result and thus makes the resulting stanines less trustworthy as indicators of where students are at with their learning.
Consequences of Using
The consequences of using STAR tests, as evidence bearing on their validity, are likely to be generally positive, given that the tests follow the LLPs for their year level and are unlikely to be set at a level that is too difficult or too easy. The LLPs are based on well-researched progress that students make at different ages and school years. Students can therefore generally be expected to have had some practice with the reading skills being assessed. Since STAR tests are neither too easy nor too difficult, a meaningful distribution of scores may be obtained. This distribution then reveals comparatively stronger and weaker students and enables the setting of individualized, targeted learning goals that can enhance motivation and improve learning overall.
Because STAR tests report results in only nine stanine bands, they give only a relatively coarse estimate of student achievement. One student could be right at the bottom edge of a band while another is close to its top edge. These two students may therefore differ in reading ability to a fairly significant degree even though they have been assessed as equal. This could detrimentally affect the allocation of teaching resources: a student more in need may not receive enough extra support and so fall further behind, while a student less in need receives more support and moves even further ahead.
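To make the coarseness of stanine bands concrete, the following Python sketch maps a national percentile rank to a stanine. It assumes the conventional stanine cut points (the lowest 4% receive stanine 1, the next 7% stanine 2, and so on up to the top 4% at stanine 9); the function name and the three student percentiles are hypothetical, and the sketch is not NZCER's own scoring procedure for STAR.

```python
from bisect import bisect_right

# Conventional stanine cut points as cumulative percentages (an assumption
# for illustration): stanine 1 = lowest 4%, 2 = next 7%, 3 = next 12%,
# 4 = next 17%, 5 = middle 20%, 6 = next 17%, 7 = next 12%, 8 = next 7%,
# 9 = top 4%.
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile: float) -> int:
    """Map a percentile rank (0-100) to a stanine (1-9). Hypothetical helper."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    # bisect_right counts the cut points at or below the percentile,
    # so each cut point reached moves the student up one stanine.
    return bisect_right(STANINE_UPPER_BOUNDS, percentile) + 1


# Three hypothetical students' percentile ranks.
for percentile in (39, 41, 59):
    print(f"percentile {percentile} -> stanine {stanine_from_percentile(percentile)}")
```

Under these assumptions, students at the 41st and 59th percentiles share stanine 5 despite being 18 percentile points apart, while a student at the 39th percentile, only two points below the first, drops to stanine 4.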
Although STAR tests are considered only supplementary to other assessments carried out by the teacher, and their purpose is to enable more accurate setting of teaching and learning goals, students themselves may not always see them this way. The standardized, pre-printed paper test booklets give the student an impression of formality and seriousness that may be less present in other forms of assessment that the teacher carries out. This may affect some students more negatively than others. As a result, student stress levels may rise, producing test performance that is not the best indicator of students' true reading ability. If learning goals are then set from this depressed result, they may be too low to extend student learning, which may result in boredom and loss of motivation for the student, perhaps then turning into behavioural issues that lead to the student actually falling behind in learning.
Reliability
As with all assessments, STAR reading tests are susceptible to systematic and random measurement error and therefore will never present a perfectly accurate depiction of student performance.
Systematic error in STAR tests is likely to be minimized by the centralized creation and distribution of the tests by the New Zealand Council for Educational Research (NZCER). That is, because STAR is used in most schools in the country, its scale should financially allow more expert analysis of the tests, so that systematic errors are more likely to be identified and eliminated before the tests are released.
The reliability coefficient (RC) is a statistical indicator of how consistent scores are, and therefore of how much they are affected by random error arising from factors such as temporal variations in student performance, carelessness in marking and luck. The RC for the STAR reading test using the test-retest method was 0.87 for the Year 6 level in 2014 (Elley, 2001), which indicates that the STAR reading test has good reliability in terms of its ability to measure students consistently under similar testing conditions but at a different time.
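The test-retest method can be illustrated with a short Python sketch: the same students sit the test twice, and the reliability coefficient is estimated as the Pearson correlation between the two sets of scores. The scores below are invented for illustration, the function name is hypothetical, and the sketch makes no claim about how NZCER computed the STAR coefficient.

```python
from math import sqrt


def pearson_r(first: list[float], second: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of cross-products of deviations from each sitting's mean.
    covariance = sum(
        (a - mean_first) * (b - mean_second) for a, b in zip(first, second)
    )
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary within each sitting")
    return covariance / sqrt(spread_first * spread_second)


# Hypothetical raw scores for the same five students on two sittings.
first_sitting = [18, 22, 25, 30, 35]
second_sitting = [20, 21, 27, 29, 36]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability coefficient: {r:.2f}")
```

For these invented scores the coefficient is about 0.97, because the students largely keep their relative positions across the two sittings; a coefficient of 0.87 indicates that relative positions are largely, though not perfectly, preserved from one sitting to the next.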
Student Achievement Data & Information

The last STAR reading test for the three students under consideration was undertaken in February 2014, so this data is relatively dated. It does not represent the effects of any (presumed) teaching interventions undertaken during 2014. A more accurate picture of where these students are at will not be available until the STAR Reading Test is repeated in February 2015.
The lack of current STAR achievement data means that less useful information is available for informing teaching and learning expectations for this group of students in 2015. However, some logical conclusions about previous learning trajectories and anticipated future outcomes can still be drawn from three years of historical STAR testing data.
Student A Achievement Data
[Table: STAR Reading Test Achievement Data, Student A. Rows: Year 3 (2012), Year 4 (2013), Year 5 (2014). Columns: Word Recognition, Sentence Comprehension, Paragraph Comprehension, Vocabulary, Total, Stanine.]
This data shows that Student A (SA) improved by one stanine in 2013 compared with 2012 but then returned to the original stanine in 2014. The vocabulary score for 2013 was unusually low. Such fluctuation may lie within a reasonable margin of measurement error, which would mean that SA is actually progressing at a reasonably steady rate. However, while SA appears able to decode text to a strong degree, a skill he seems to have greatly enhanced in Year 3, his comprehension scores have declined. This appears to reflect the "barking at print" described by the student's teacher, where decoding is occurring but comprehension is lagging. According to SA's teacher, SA joined the current school after moving home in 2013. Some of the decline in the Year 5 scores can therefore probably be attributed to an adjustment period that SA underwent.
SA's reading should focus on overall comprehension. SA must work hard to regain the learning ground lost after moving house.
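Whether a one-stanine rise and fall such as Student A's could plausibly be measurement error can be roughly checked with the classical test theory formulas for the standard error of measurement, SEM = SD * sqrt(1 - r), and the standard error of the difference between two scores, SED = sqrt(2) * SEM. The Python sketch below assumes that the reported coefficient of 0.87 applies on the stanine scale (stanines have a standard deviation of 2) and to the Year 3 to 5 tests; both are simplifying assumptions, since the coefficient was reported for Year 6 and stanines are rounded bands. The function names are hypothetical.

```python
from math import sqrt

STANINE_SD = 2.0    # stanines are scaled to a mean of 5 and an SD of 2
RELIABILITY = 0.87  # test-retest RC reported for Year 6 (Elley, 2001);
                    # assumed here to carry over to the stanine scale


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)


def standard_error_of_difference(sd: float, reliability: float) -> float:
    """SED for the difference between two scores, assuming both sittings
    share the same SEM and their errors are independent."""
    return sqrt(2) * standard_error_of_measurement(sd, reliability)


sem = standard_error_of_measurement(STANINE_SD, RELIABILITY)
sed = standard_error_of_difference(STANINE_SD, RELIABILITY)
change = 1  # a one-stanine rise or fall, as in Student A's data

print(f"SEM: {sem:.2f} stanines; SED: {sed:.2f} stanines")
print(f"a {change}-stanine change is {change / sed:.2f} SEDs")
```

On these assumptions the SEM is about 0.72 stanines and the SED about 1.02 stanines, so a one-stanine change is roughly one SED, well inside the approximately two SEDs that error alone could produce about 95% of the time. This is consistent with, though it does not prove, the interpretation that SA's progress has been steady.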
Student B Achievement Data
[Table: STAR Reading Test Achievement Data, Student B. Rows: Year 3 (2012), Year 4 (2013), Year 5 (2014). Columns: Word Recognition, Sentence Comprehension, Paragraph Comprehension, Vocabulary, Total, Stanine.]
Student B (SB) has declined in stanine level across three consecutive years, from four to two. SB performed particularly poorly in the paragraph comprehension section in 2014, scoring only one mark in that section. Word recognition began particularly high in Year 3 and remains SB's strongest area. This again supports the teacher's assertion that these students can "bark at print" comparatively well: they decode words but do not truly understand what they are reading.
SB requires serious intervention in order to stem what appears to be a learning deficit that is compounding on itself, pushing SB ever further behind. The student's teacher has anecdotally suggested that SB may have an undiagnosed learning disorder.
Student C Achievement Data
[Table: STAR Reading Test Achievement Data, Student C. Rows: Year 3 (2012), Year 4 (2013), Year 5 (2014). Columns: Word Recognition, Sentence Comprehension, Paragraph Comprehension, Vocabulary, Total, Stanine.]
Student C (SC) had an unusually strong STAR test score in Year 4, but by Year 5 SC had declined by two full stanines. This suggests that, for some reason, 2013 may have been a difficult year for SC, perhaps at school or perhaps at home. It is particularly notable that SC had very high paragraph comprehension scores in 2012 and 2013, but these fell steeply in 2014. Information from the student's teacher indicates that SC's siblings have learning difficulties, and there was some speculation that SC may also have an as-yet-undiagnosed difficulty. This would not appear to explain SC's previously high results, however, unless the onset of a disorder occurred sometime in 2013.
Regaining the previous years' high comprehension scores would be a broad learning goal worth working towards while awaiting the results of any medical diagnosis.
Conclusion
When developing learning goals for students at risk of underachieving (or, for that matter, for any student), it is not sufficient to rely on achievement data alone. All three students in this study clearly have different circumstances in their lives that may be affecting their learning progress, and these circumstances need to be considered on an individual basis. For example, Student A appeared to be falling behind because his family had moved home and he had changed schools, whereas Student B appeared to be falling behind because of what may be an undiagnosed learning disorder. Clearly the approach to each student will be different. Student A may reasonably be expected, perhaps with encouragement alone, to work and focus harder in order to catch up, whereas Student B may require specialist assistance that supports his learning progress in other ways. To use an analogy, achievement data is like the engine temperature gauge on a vehicle. It tells you when the engine may be overheating, but it does not tell you whether the radiator has sprung a leak or the load you are towing is too heavy for the engine's rating. The same symptom may therefore demand different remedies depending on context, so it is necessary to know the context of a situation in order to decide which remedy is needed.
The development of student learning goals from achievement data is best augmented by contextual information that is usually held only mentally and shared verbally by the students' teacher. This oral information, much of it given in confidence, often covers sensitive areas such as students' family status, relationships, disabilities, tendencies and other background, so professional discretion is a watchword. This may include, for example (usually to the best of the teacher's present knowledge): parental status, such as divorce or death; parental acrimony; shared custody or other living arrangements; abuse or suspected abuse; the current or former presence of siblings at the same school; relationships, cordial or otherwise, among certain students; interpretation of ethnic and linguistic background; religion; religious community status; and the teacher's perceptions of students' perceptions of the teacher's teaching style. All of these factors have the potential to strongly affect student learning and should be considered carefully alongside achievement data when informing teaching and expectations of student learning.
References
Crooks, T. (2004). Tensions between Assessment for Learning and Assessment for Qualifications. Paper presented at the Third Conference of the Commonwealth Association for Examinations and Accreditation Bodies, Nandi, Fiji.
Elley, W. (2001). STAR supplementary tests of achievement in reading: Teachers' manual. Wellington, New Zealand: New Zealand Council for Educational Research.
Gronlund, N. E. (1998). Validity and reliability. In Assessment of Student Achievement (pp. 199-221). Boston: Allyn and Bacon.
Ward, J. and Thomas, G. (2011). National Standards: School Sample Monitoring and Evaluation Project (Chapters 5 and 6, pp. 31-59).
