
Teaching Education

ISSN: 1047-6210 (Print) 1470-1286 (Online) Journal homepage: https://www.tandfonline.com/loi/cted20

Improving teachers’ assessment literacy through professional development

Kim H. Koh

To cite this article: Kim H. Koh (2011) Improving teachers’ assessment literacy
through professional development, Teaching Education, 22:3, 255-276, DOI:
10.1080/10476210.2011.593164

To link to this article: https://doi.org/10.1080/10476210.2011.593164

Published online: 15 Aug 2011.

Teaching Education
Vol. 22, No. 3, September 2011, 255–276

Improving teachers’ assessment literacy through professional development

Kim H. Koh*

Curriculum, Teaching and Learning Academic Group, National Institute of Education, Nanyang Technological University, Singapore
(Received 25 June 2010; final version received 3 February 2011)

This study examined the effects of professional development on teachers’ assessment literacy, comparing two groups of teachers: (1) teachers who were involved in ongoing, sustained professional development in designing authentic classroom assessments and rubrics; and (2) teachers who were given only short-term, one-shot professional development workshops in authentic assessment. The participating teachers taught Year 4 and 5 English, science, and mathematics. The findings showed that the assessment literacy of teachers who were involved in ongoing, sustained professional development increased significantly during the second year of the study. These teachers also gained a better understanding of authentic assessment.
Keywords: teachers’ assessment literacy; professional development; authentic
assessment

Introduction
In many countries’ educational reform movements, assessment has become a key
policy lever for improving education. Basil Bernstein (1990), a prominent British
sociologist of education, has long held that assessment will ultimately pull curricu-
lum and pedagogy along. This is because of teachers’ tendency to reorient their cur-
riculum and pedagogy to assessment. As noted by the Australian researchers behind
the Queensland School Reform Longitudinal Study (Lingard et al., 2001), develop-
ing productive assessments acts as one of the best levers for engaging teachers with
pedagogical change for higher intellectual demand in their daily classrooms. Like-
wise, many educators and policy-makers in the United States believe that ‘what gets
assessed is what gets taught’ and that the assessment format influences the format
of instruction (O’Day & Smith, 1993).
Given the tendency of teachers to mirror classroom instruction to assessment, an
obvious educational reform strategy is to change the content and format of assess-
ments to enhance the coverage of higher intellectual learning outcomes (e.g., com-
plex thinking, reasoning, problem-solving, communication, and conceptual
understanding of subject matter) and to move curriculum and instruction toward the
development of these skills (Smith & O’Day, 1990). In response to these ideas,
many assessment programs around the world have been revised over the past two

*Email: kimhong.koh@nie.edu.sg


decades to reflect more challenging learning goals and to include more authentic,
open-ended assessment tasks.
Proponents of alternative, authentic assessment have long advocated holistic
assessment of student outcomes and learning progress on authentic tasks that are
closely aligned with higher order instructional goals. In contrast to conventional
paper-and-pencil tests that focus on knowledge reproduction and low-level cognitive
processing skills in artificial, contrived contexts, authentic assessment tasks empha-
size knowledge construction, complex thinking, elaborated communication, collabo-
ration and problem-solving in authentic contexts. These are the essential skills for
students to succeed in the twenty-first century knowledge-based economy. One notable school reform program, by Newmann, Marks, and Gamoran (1996) in the United States, demonstrated that students who were exposed to authentic assessment tasks or assignments focusing on higher intellectual demands produced more intellectually complex work and achieved better academic performance across different subjects and grades. However, the Newmann study was not based on an experimen-
tal design and there is a possibility that the teachers using more authentic tasks
were just better teachers (Wiliam, Lee, Harrison, & Black, 2004).
Although authentic assessment has been widely accepted as a tool of educational reform over the last two decades, a problem commonly encountered by many education systems around the world is the relative lack of assessment literacy among teachers and school leaders. Many teachers are not competent in developing and implementing authentic performance assessments owing to inadequate training and support during their pre-service teacher education programs (Bol, Nunnery, Stephenson, & Mogge, 2000; Hargreaves, Earl, & Schmidt, 2002; Stiggins, 1995).
The problem of teachers’ low level of assessment literacy is exacerbated by the
external pressures for accountability in student learning and achievement as well as
other practical constraints at school (e.g., time constraints, content coverage, class-
room management, and support from school leaders). As a result, most teachers
resort to modifying the content and format of instruction to fit the content and for-
mat of high-stakes assessments. Moreover, classroom assessments or teacher-made
tests tend to mimic high-stakes, standardized achievement tests, which often only
assess discrete bits of knowledge and skills or low-level knowledge reproduction
(Fleming & Chambers, 1983).
In the classroom assessment and teacher education community, there is a con-
sensus that high-quality professional development will provide in-service teachers
with training and support to improve their assessment literacy, specifically in
designing and implementing authentic assessments at the classroom level
(Aschbacher, 1991; Bol et al., 2000; Newmann et al., 1996; Stiggins, 1991a, 2002).
The extant literature on teacher professional development has also shown that
improving the quality of education relies heavily on teachers’ continuing develop-
ment and learning of new knowledge and skills to change or improve classroom
practice, which in turn leads to increased student learning and achievement (Fullan
& Miles, 1992; Desimone, Porter, Garet, Yoon, & Birman, 2002; Desimone, 2009).
As assessment is noted as a key lever for driving teachers’ instructional practice,
changing or improving classroom practice will require teachers’ improved knowl-
edge and skills in designing and implementing new forms of assessment. In fact,
teacher professional development has long been touted by both teacher educators
and policy makers as a cornerstone of systemic reform efforts to increase teachers’
capacity to teach to high standards. Therefore, as we enter the second decade of the
twenty-first century, teacher professional development in assessment literacy has
become increasingly important because teachers are expected to master the knowl-
edge and skills relevant to the teaching and assessment of twenty-first century com-
petencies.

Assessment literacy and professional development


The term assessment literacy was first coined by Stiggins (1991b) to refer to an understanding of the principles of sound assessment. According to Stiggins (1991b), teachers who are assessment literate for the twenty-first century classroom should know how to meet the following five standards of high-quality classroom assessment: (1) starting with clear purposes of assessment; (2) understanding the importance of assessing different kinds of interrelated achievement targets (i.e., mastering content knowledge, developing reasoning proficiencies, attaining performance skills, and developing high-quality products); (3) selecting proper assessment methods for the different kinds of achievement targets; (4) sampling and collecting evidence of student achievement based on representative performance tasks; and (5) avoiding assessment bias and distortion arising from technical and practical problems. Although the term authentic assessment was not used directly in his article, all five standards spelt out by Stiggins correspond to the ideas and principles of authentic assessment. In short, teachers’ assessment literacy involves being prepared to define, teach, and assess the different kinds of competencies that match the higher order instructional goals for the twenty-first century. In order to be assessment literate, teachers must not only be competent in developing and using high-quality authentic assessments and scoring rubrics, but also able to master the evaluative skills needed to make sound judgments about student performance (Sadler, 1998).
Previous studies on classroom assessment have consistently shown that many
teachers are inadequately trained and ill-prepared to develop, administer, and inter-
pret the results of various types of assessments (Bol, Stephenson, O’Connell, &
Nunnery, 1998; Stiggins & Conklin, 1992; Wiggins, 1989). In general, teachers
who were less prepared and skilled in developing authentic assessments perceived
the assessments as being more difficult to develop than traditional paper-and-pencil
tests. Moreover, teachers’ assessment practices were not well aligned with their
instructional goals and tended to demand a low level of cognitive processing in
classroom assessment tasks. Many teachers were also not good judges of the quality
of their own assessment tasks (Black & Wiliam, 1998; Bol & Strage, 1996).
Although there is a considerable body of research on the links between teacher
professional development and classroom assessment, most of the empirical studies
have focused on professional development and teachers’ formative assessment prac-
tices since the seminal work by Black and Wiliam (1998). The meta-analysis of
250 empirical studies conducted by Black and Wiliam (1998) showed that the inte-
gration of formative assessment practices into teachers’ everyday teaching practice
resulted in substantial gains in student achievement scores on standardized tests.
The effect sizes of the formative assessment experiments were found to be between
0.4 and 0.7, larger than most of those found for educational interventions. This find-
ing prompted Black and Wiliam (1998) to suggest that sustained programs for pro-
fessional development and support must be in place in order for teachers to
improve their formative assessment practice.
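For context, the standardized effect size reported in such reviews is conventionally computed as Cohen’s d; the formula below is the standard definition, supplied here for illustration rather than quoted from Black and Wiliam’s review:

$$
d = \frac{\bar{X}_{\mathrm{intervention}} - \bar{X}_{\mathrm{control}}}{s_{\mathrm{pooled}}},
\qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
$$

On this scale, an effect size of 0.4 means that the average student in an intervention class scored about 0.4 pooled standard deviations higher than the average student in a control class.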

The only study to date that focused on teacher professional development and
authentic assessment was conducted by Borko and her colleagues in Colorado,
USA (Borko, Mayfield, Marion, Flexer, & Cumbo, 1997). Borko et al. (1997)
examined the process of change experienced by teachers who had undergone the
training for developing and implementing mathematics performance assessment. In
the study, 14 third-grade teachers participated in a year-long series of weekly work-
shops that focused on topics such as selecting, extending, and designing materials
and activities for assessment and instruction; observing students and keeping
records of observations; analyzing student work; and developing and using scoring
rubrics. Teachers’ change process was examined by analyzing conversations about
scoring tasks between teachers and researchers during workshops conducted
throughout the school year and interviews conducted at the beginning, middle, and
end of the year. The study showed that teachers benefited from professional devel-
opment experiences that provided them with the opportunities to explore new
assessment strategies and ideas in the context of their own classroom practice.
Additionally, the collective participation of teachers as members of a learning community enabled professional conversations about assessment to take place; such conversations are an effective vehicle for the social construction of new ideas and practices.
In an intervention study, Wiliam et al. (2004) worked with a group of 24 sec-
ondary mathematics and science teachers to develop their formative assessment
practice for two years. Using a quasi-experimental design in their study, Wiliam
et al. found a modest effect size of formative assessment on student achievement.
Wiliam et al.’s study also indicates that sustained professional development is
needed for developing teachers’ formative assessment practice.
In their work on building teacher capacity in classroom assessment, McMunn,
McColskey, and Butler (2004) reiterated that professional development that is situated or embedded in the daily work lives of teachers is critical for classroom improvements that can lead to increased student achievement. Furthermore, high-quality or
effective professional development must be aligned with a more constructivist
model for teacher learning, wherein teachers are involved in active learning through
professional conversations. It is important for teachers to work together on assess-
ment issues in a collaborative setting. Some of the worthwhile assessment issues
are clarifying the instructional goals and purposes of assessment, integrating more
authentic assessments into classroom assessment methods, examining the quality of
assessment tasks, and looking together at the quality of student work. McMunn
et al. further emphasized that ongoing, sustained professional development is more
effective than one-time, ad-hoc workshops to support teachers’ efforts at improving
their assessment practices.
Sato, Wei, and Darling-Hammond (2008) conducted a longitudinal study to
track the changes of mathematics and science teachers’ classroom formative prac-
tices as a result of their participation in the National Board Certification process.
The National Board Certification provided teachers with professional development
in using rigorous assessment and teaching standards. Based on their analyses of vid-
eotaped lessons, student work samples, and interviews with the teachers, Sato et al.
(2008) found pronounced changes in teachers’ use of a variety of assessments and
the way in which assessment information was used to support student learning in
the everyday classroom instruction. Their results also indicate that effective professional development strategies that focus on teachers’ actual classroom practice, such as classroom interactions and the analysis of student work, are essential for improving teachers’ assessment practices. The content focus of such strategies is more consistent with teachers’ knowledge and goals because it is directly related to the work of teaching.
teaching. Moreover, the analysis of student work reflects teachers’ active learning in
a professional community (Borko, 2004; Garet, Porter, Desimone, Birman, & Yoon,
2001). It is recognized as a powerful strategy for teachers to examine the evidence
of student learning and to reflect on the teaching and assessment associated with
student learning. As teachers are engaged in reviewing student work in the topic
areas being covered, it helps them develop a deep understanding of how such eval-
uations of learning can inform their instructional choices and improve their class-
room practices (Shepard, Hammerness, Darling-Hammond, et al., 2005).
In Graham’s study (2005), teacher candidates in the United States reported that
they were strongly influenced by professional dialogue about planning and assess-
ment in both their teacher training program and mentored field experiences. This implies
the importance of active learning. Most teacher candidates accepted alternative
assessment as a valuable source of evidence that indicated student learning. How-
ever, they were concerned about their skills in identifying goals, designing rubrics,
and determining the technical accuracy of assessments. The Graham (2005) findings
were supported by Volante and Fazio’s (2007) study of primary/junior teacher can-
didates in Canada. They found that the majority of the teacher candidates reported a
low level of assessment literacy and expressed the need for improving their assess-
ment knowledge through specific courses in classroom assessment and evaluation,
including good mentorship in the field. Although both studies involved pre-service
teachers, the findings did suggest that ongoing support and professional develop-
ment opportunities should be given to in-service teachers who would then mentor
teacher candidates on how to apply effective classroom assessment practices.
In short, the studies on teacher professional development and classroom assess-
ment reiterate the importance of the following five core features of effective profes-
sional development: content focus, coherence, active learning, collective participation,
and duration. On building teachers’ capacity in assessment for twenty-first-century teaching and learning, Wiliam and Thompson (2008) aptly summed up:

. . . teacher professional development is more effective when it is related to the local circumstances in which the teachers operate, takes place over a sustained period rather than being in the form of sporadic one-day workshops, and involves the teachers in active, collective participation. (Wiliam & Thompson, 2008, p. 55)

Purpose of study
Since 1997, the Singapore Ministry of Education has launched many policy initia-
tives to reform the nation’s education system. The government’s key initiatives for
developing a productive, resilient, and lifelong learning nation to face the chal-
lenges of the twenty-first century knowledge-based economy are as follows: ‘Think-
ing Schools, Learning Nation’ (TSLN), ‘Innovation and Enterprise’ (I&E), ‘Teach
Less, Learn More’ (TLLM), and Curriculum 2015 (C2015). These initiatives have
advocated teaching for deep understanding and higher-order thinking skills rather
than rote memorization of factual and procedural knowledge. The initiatives also
imply that changes in teachers’ assessment practices are imperative if the ultimate
goal is to enhance students’ mastery of twenty-first century competencies. In their
efforts to promote students’ higher-order thinking skills, real-world problem-solving
skills, positive habits of mind, and communication skills, teachers in Singapore are
encouraged to move toward more constructivist teaching approaches and to adopt
new forms of assessment, such as authentic assessment.
Because of the need for improvement in teachers’ assessment practices, the Min-
istry of Education has provided teachers with resources, support, and professional
development over the past four years. Although millions of dollars have been
invested in teachers’ professional development at the school level, most of the pro-
fessional development programs are designed and delivered as ad hoc, one- to two-day
workshops, and teachers may not fully benefit from such workshop experiences.
In general, Singapore teachers are enthusiastic about the learning of new forms of
assessment as they are receptive to the Ministry of Education’s policy initiatives. But they are often
caught between the learning of new forms of assessment and the intransigence of
long-established practices of conventional assessment. This tension is exacerbated
by the demands of a high-stakes accountability examination system. According to
Guskey (2002), change in teachers’ attitude toward new classroom practices takes
place primarily after some change in student learning has been evidenced. Given
the tension faced by Singapore teachers, there is a pressing need for more evidence-
based professional development research to support teachers’ efforts to change their
classroom practice in the context of local schools.
Prominent educators and authors of teacher learning and professional develop-
ment (Borko, 2004; Desimone, 2009; Garet et al., 2001; Luke & McArdle, 2009)
have called for systematic longitudinal research to be conducted on the effects of
professional development on improvements in teacher knowledge, classroom
practice, and student learning. They also highlighted the importance of using more
rigorous research design and data collection methods, which include quasi-experi-
mental comparison of professional development intervention versus control schools;
longitudinal tracking of teachers and students; analysis of longitudinal changes in
student performance, work and outcomes; and focus groups of teachers. Such
collection of valuable formative and summative data will inform ongoing develop-
ments of programs and design decisions about future professional development.
The purpose of this study was to examine the effects of professional development
(as a form of intervention) on teachers’ assessment literacy and student learning.
Teachers’ assessment literacy was measured by two indicators: the quality of class-
room assessment tasks and teachers’ conceptions about authentic assessment.
Because changes in teachers’ assessment literacy are expected to bring about changes
in student learning, it was also important for this study to examine the changes in the
quality of student work in response to changes in the quality of teachers’ assessment
tasks. Research on the effects of classroom assessment on student learning tends to
focus on using standardized test scores as a proxy for student learning. Standardized
tests might lack curricular validity (McClung, cited in Wiliam et al., 2004) because they do
not accurately reflect what the teachers were teaching in their classrooms. To estab-
lish curricular validity, samples of student work for each classroom assessment task
were collected and used as a proxy for student learning in this study.
The rationale for improving teachers’ assessment literacy in the form of authentic assessment task design and rubric development rests on the need to equip teachers with contemporary knowledge and skills in developing assessment tasks that would elicit students’ higher-order thinking skills (Cizek, cited in Pellegrino, Chudowsky, & Glaser, 2001). Teachers’ literacy in setting high-quality assessment tasks would, in turn, match the higher-order goals in teaching and learning (Wiliam
et al., 2004). According to Navarro (2008):

. . . the general framework that guides the design and development of both formative
and summative assessments often does not address testing the full range of learning,
from the procedural and memorization level, all the way through transfer to the high-
est conceptual understanding that is demonstrated through transfer to new situations or
to solving new problems. (Navarro, 2008, p. 254)

This notion reiterates the importance of improving teachers’ assessment literacy in


designing authentic classroom assessments that could be used to assess the full
range of learning.

Method
Research design
Using a longitudinal, quasi-experimental group design, this study examined teach-
ers’ assessment literacy over the course of two school years. The study’s partici-
pants were teachers who taught Year 4 and 5 English, science, and mathematics
from eight neighborhood schools. The schools were matched based on their socio-
demographic characteristics (i.e., type of school and ranking of students’ academic
achievement) and were randomly assigned to one of two groups: four intervention
schools or four comparison schools.
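A minimal Python sketch of such matched-pair random assignment may make the design concrete; the school records and the single matching field below are hypothetical illustrations, not the study’s data:

import random

# Hypothetical records for eight schools; "band" stands in for the study's
# matching characteristics (type of school and ranking of students'
# academic achievement).
schools = [
    {"name": "School A", "band": 1}, {"name": "School B", "band": 1},
    {"name": "School C", "band": 2}, {"name": "School D", "band": 2},
    {"name": "School E", "band": 3}, {"name": "School F", "band": 3},
    {"name": "School G", "band": 4}, {"name": "School H", "band": 4},
]

# Pair schools that share the same socio-demographic profile, then randomly
# assign one member of each pair to the intervention group.
schools.sort(key=lambda s: s["band"])
pairs = [(schools[i], schools[i + 1]) for i in range(0, len(schools), 2)]

intervention, comparison = [], []
for first, second in pairs:
    a, b = random.sample([first, second], 2)
    intervention.append(a["name"])
    comparison.append(b["name"])

print("Intervention schools:", intervention)
print("Comparison schools:", comparison)

Pairing before randomization keeps the two groups balanced on the matching characteristics while preserving random assignment within each pair.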

Intervention schools
Teachers from the intervention schools received ongoing, sustained professional
development over the course of two school years. These teachers were engaged in a
series of professional development workshops that focused on authentic assessment
task design and rubric development. Additionally, the teachers participated in two
moderation meetings at the end of each school year to look together at the quality
of their classroom assessment tasks and student work while using a set of authentic
intellectual quality criteria. During the monthly school meetings, the project’s
researcher and trained research assistants also met with the teachers to discuss
issues regarding the implementation of authentic assessment tasks and rubrics. The
professional development program was designed to include almost all the core features of effective professional development, as described below.

Content focus
The teachers engaged in the learning of concepts and principles of authentic assess-
ments and rubrics in their respective subject areas. Such knowledge is related to
their teaching in the daily classroom.

Active learning
The teachers were actively involved in the analysis and moderation of assessment
tasks and related student work samples.

Coherence
The design and use of new forms of assessment were consistent with the curriculum
reforms and policy initiatives in Singapore.

Duration
Ongoing, sustained professional development workshops with activities that spread
over two school years were provided to the teachers. They were involved in work-
shops during school holidays and monthly school meetings.

Collective participation
Teachers from the same school, grade, and department participated in the profes-
sional development.

Comparison schools
During each school year, teachers from the comparison schools were given a one-
day professional development workshop. The two ad-hoc workshops provided an
overview of authentic assessment and two hands-on sessions focused on task design
and rubric development. Over the course of two teacher moderation sessions, the
teachers were also taught how to analyze the quality of assessment tasks and stu-
dent work using the given authentic intellectual quality criteria. However, no
monthly follow-up meetings were held with the teachers from the comparison
schools.

Data sources
The assessment tasks and associated student work samples were collected from both
the intervention and comparison schools at three points in time – before the inter-
vention (baseline), at the end of the first year (Phase I), and at the end of the sec-
ond year (Phase II). Toward the end of the study, a focus group interview was
conducted with the intervention schoolteachers about their conceptions of authentic
assessment. The data served to corroborate the quantitative findings of teachers’
assessment literacy.

Authentic intellectual quality rubrics


Five authentic intellectual quality criteria were used to train the participating teach-
ers in authentic assessment task design and rubric development. Following New-
mann and associates’ (1996) framework of authentic intellectual work, teachers’
assessment tasks are expected to give students opportunities to demonstrate higher-
order thinking, real-world problem-solving, and communication skills. The five
authentic intellectual-quality criteria are: depth of knowledge, knowledge criticism,
knowledge manipulation, sustained writing, and making connections to the real
world beyond the classroom. Depth of knowledge includes three types of knowl-
edge: factual knowledge, procedural knowledge, and advanced concepts based on
the revised Bloom’s knowledge taxonomy (Anderson & Krathwohl, 2001). Higher-
order thinking is defined by two criteria: knowledge criticism and knowledge manip-
ulation. Knowledge criticism is exemplified by tasks that ask students to compare
and contrast different sources of information and critique knowledge, while knowl-
edge manipulation is exemplified by tasks that demand the following from students:
organize, analyze, interpret, synthesize, and evaluate information; apply knowledge
and skills; and construct new meaning or knowledge. In addition, sustained writing
and making connections to the real world beyond the classroom are important for
students to engage in the three knowledge domains.
A four-point rating scale was used for the scoring rubric of authentic intellectual
quality for both teachers’ assessment tasks and student work. Through rigorous
training of the participating teachers, the assessment tasks and student work samples
were scored on each of the authentic intellectual quality criteria in four moderation
sessions. The percentages of exact agreement ranged from 67% to 90% in English,
65% to 99% in science, and 69% to 97% in mathematics, indicating moderate to
high interrater reliability.
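As an illustration, the percentage of exact agreement between two raters on the four-point scale can be computed as follows; this is a minimal Python sketch, and the scores shown are invented rather than taken from the study:

def exact_agreement(rater1, rater2):
    """Percentage of items on which two raters gave identical scores."""
    if len(rater1) != len(rater2):
        raise ValueError("Both raters must score the same set of items")
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return 100 * matches / len(rater1)

# Invented scores on the four-point authentic intellectual quality scale
# for ten assessment tasks.
rater1 = [3, 2, 4, 1, 3, 3, 2, 4, 2, 1]
rater2 = [3, 2, 4, 2, 3, 3, 2, 4, 1, 1]
print(f"Exact agreement: {exact_agreement(rater1, rater2):.0f}%")  # 80%

Exact agreement is the simplest interrater index; chance-corrected statistics such as Cohen’s kappa are stricter alternatives.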

Results
The quantitative results of the analyses, using both descriptive statistics and t-tests, are presented for two aspects: (1) differences in the quality of teachers’ assessment tasks and student work between the intervention and comparison schools; and (2) changes in the quality of teachers’ assessment tasks and student work over time for both intervention and comparison schools.
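A minimal Python sketch of this kind of two-group comparison, assuming SciPy is available; the scores are invented for illustration, since the study’s raw data are not published:

from statistics import mean
from scipy import stats

# Invented Phase II scores on one authentic intellectual quality criterion
# for assessment tasks from the two groups of schools.
intervention = [3.2, 3.5, 2.9, 3.8, 3.4, 3.1, 3.6]
comparison = [2.4, 2.1, 2.8, 2.2, 2.6, 2.0, 2.5]

t, p = stats.ttest_ind(intervention, comparison)
print(f"Mean difference: {mean(intervention) - mean(comparison):.2f}")
print(f"t = {t:.2f}, p = {p:.4f}")  # significant at the .05 level if p < .05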

Comparisons of the quality of English teachers’ assessment tasks


As shown in Table 1, the mean score differences for the quality of English assessment tasks between the intervention and comparison schools were compared on each of the authentic intellectual criteria at each time point. At baseline, the mean score differences were not statistically significant at the α = 0.05 level. At Phase I, the mean differences on the authentic intellectual quality criteria were less obvious between the intervention and comparison schools.
Table 1. Changes in the quality of English assessment tasks.

                                                      Mean score difference
                                             Intervention schools        Comparison schools
                                             Baseline vs.  Baseline vs.  Baseline vs.  Baseline vs.
Criteria                                     Phase I       Phase II      Phase I       Phase II
Factual knowledge                            0.29          0.43          0.01          0.67*
Procedural knowledge                         0.43          0.57          0.33          0.18
Advanced concepts                            0.57          0.57          0.16          0.23
Presentation of knowledge as given           0.43          1.00*         0.50          0.79*
Comparing and contrasting knowledge          0.43          0.86*         0.35          0.54
Critique of knowledge                        0.14          0.72*         0.67          0.51
Knowledge reproduction                       0.00          1.00*         0.78          0.33
Organization, interpretation, analysis,      0.86*         1.15*         0.32          0.07
  synthesis, and/or evaluation
Application                                  0.15          0.58          0.34          0.51
Generation of new knowledge                  0.43          0.86*         0.35          0.25
Sustained writing                            0.86*         1.57*         0.60          0.24
Connections to the real world beyond         0.71*         1.43*         1.57*         1.37*
  the classroom

Note: *Mean score differences between the intervention and comparison schools are significant (p < .05).

At Phase II, the means on both factual and procedural knowl-
edge were lower in intervention schools than in comparison schools. The same pattern
was noted in presentation of knowledge as given and knowledge reproduction. During
Phase II, the mean scores on higher-order thinking skills, such as understanding
advanced concepts, comparing and contrasting knowledge, critique of knowledge,
application, generation of new knowledge, sustained writing, and making connections
to the real world beyond the classroom, were significantly higher in intervention
schools than comparison schools.
The results displayed in Table 1 also showed that most of the changes in mean
scores from baseline to Phase II were statistically significant in the intervention
schools. There were significantly fewer assessment task demands on presenting knowl-
edge as given and knowledge reproduction. Most of the English assessment tasks
tended to focus on assessing students’ higher-order thinking skills, sustained writing,
and real-world application. In contrast, the assessment tasks collected from the com-
parison schools from baseline to Phase II showed an increased demand for factual
knowledge, presentation of knowledge as given, and knowledge reproduction.

Comparisons of the quality of student work in English


At baseline, the mean score differences on all of the authentic intellectual criteria
were not significant between the intervention and comparison schools. During Phase
II, student work from the intervention schools had lower mean scores than the stu-
dent work from the comparison schools on factual knowledge, presentation of
knowledge as given, and knowledge reproduction. As for the criteria of higher-order
thinking, the mean scores of the intervention schools were significantly higher than
those of the comparison schools.
Similar to the quality of English assessment tasks, the changes in scores from the
student work in English on the authentic intellectual criteria were statistically signifi-
cant for intervention schools from baseline to Phase II (see Table 2). Student work
demonstrated a significant decrease in presentation of knowledge as given and knowl-
edge reproduction, whereas there was a significant increase in higher-order thinking
skills, sustained writing, and real-world application. An opposite pattern was
observed in the change scores of student work in the comparison schools from base-
line to Phase II.

Comparisons of the quality of science teachers’ assessment tasks


There were no significant baseline differences between the intervention and compari-
son schools on the authentic intellectual quality criteria. During Phase II, the mean
scores on factual knowledge, presentations of knowledge as given, and knowledge
reproduction, were lower in the intervention schools as compared to the comparison
schools. In contrast, the mean scores of the intervention schools were higher than
those of the comparison schools on the higher-order thinking domain, which included
understanding advanced concepts, comparing and contrasting knowledge, critique of
knowledge, organization, interpretation, analysis, synthesis and evaluation, problem-
solving, generation of new knowledge, sustained writing, and making connections to
the real world.
As displayed in Table 3, the mean scores on factual knowledge, presentation of
knowledge as given, and knowledge reproduction had decreased from baseline to

Table 2. Changes in the quality of student work in English.

                                                      Mean score difference
                                             Intervention schools        Comparison schools
                                             Baseline vs.  Baseline vs.  Baseline vs.  Baseline vs.
Criteria                                     Phase I       Phase II      Phase I       Phase II
Factual knowledge                            0.08          0.40          0.01          0.61
Procedural knowledge                         0.23          0.62*         0.06          0.27
Advanced concepts                            0.01          0.71*         0.15          0.47
Presentation of knowledge as given           0.65*         1.37*         0.56          1.23*
Comparing and contrasting knowledge          0.25          1.08*         0.10          0.52
Critique of knowledge                        0.09          0.89*         0.26          0.71*
Knowledge reproduction                       0.38          1.07*         0.44          1.25*
Organization, interpretation, analysis,      0.10          0.75*         0.25          0.03
  synthesis, and/or evaluation
Application                                  0.03          0.78*         0.24          0.18
Generation of new knowledge                  0.08          0.92*         0.39          0.44
Sustained writing                            0.61*         1.54*         0.51          0.23
Connections to the real world beyond         0.59*         1.79*         0.61          1.26*
  the classroom

Note: *Mean score differences between the intervention and comparison schools are significant (p < .05).

Phase I in the intervention schools. These mean scores decreased further from base-
line to Phase II with significant change scores. In contrast, a significant increase in
the mean scores from baseline to Phase II was observed in the intervention schools
on the following criteria: advanced concepts, critique of knowledge, organization,

Table 3. Changes in the quality of science assessment tasks.

                                                      Mean score difference
                                             Intervention schools        Comparison schools
                                             Baseline vs.  Baseline vs.  Baseline vs.  Baseline vs.
Criteria                                     Phase I       Phase II      Phase I       Phase II
Factual knowledge                            0.80*         2.60*         0.08          0.64*
Procedural knowledge                         1.00*         1.76*         0.92*         0.71*
Advanced concepts                            1.10*         1.82*         0.33          0.32
Presentation of knowledge as given           0.50          1.90*         0.83*         0.36
Comparing and contrasting knowledge          0.30          1.06*         0.33          0.54*
Critique of knowledge                        0.10          1.02*         0.25          0.00
Knowledge reproduction                       0.20          1.60*         0.83*         0.39
Organization, interpretation, analysis,      0.80*         1.84*         0.50          0.11
  synthesis, and/or evaluation
Application/problem-solving                  1.80*         2.20*         0.58*         0.00
Generation of new knowledge                  0.90*         2.14*         0.08          0.57*
Sustained writing                            0.90*         1.70*         0.00          0.07
Connections to the real world beyond         0.40          2.00*         0.08          0.89*
  the classroom

Note: *Mean score differences between the intervention and comparison schools are significant (p < .05).
interpretation, analysis, synthesis and evaluation, problem-solving, generation of
new knowledge, sustained writing, and making connections to the real world.

Comparisons of the quality of student work in science


The patterns of the mean score differences of science student work were similar to
those of the science assessment tasks. Table 4 shows that, at Phase II, the mean
scores from the intervention schools were significantly lower than those from the
comparison schools on the following criteria: factual knowledge, presentation of
knowledge as given, and knowledge reproduction. A significant increase in mean
scores was noted in the intervention schools on advanced concepts, comparing and
contrasting knowledge, critique of knowledge, organization, interpretation, analysis,
synthesis and evaluation, problem-solving, generation of new knowledge, sustained
writing, and making connections to the real world.
The changes in mean scores of student work on the authentic intellectual criteria followed the same patterns as the assessment tasks (see Table 4). In the interven-
tion schools, the changes in scores on the authentic intellectual criteria were
significantly larger from baseline to Phase II than those changes from baseline to
Phase I. An opposite pattern was noted for the comparison schools. From baseline
to Phase I, student work in the comparison schools demonstrated a significant
increase on presentation of knowledge as given and knowledge reproduction. In
contrast, there was a significant decrease in the quality of student work in terms
of advanced concepts, comparing and contrasting knowledge, organization, inter-
pretation, analysis, synthesis, and evaluation, and problem-solving. The change
scores from baseline to Phase II were somewhat smaller than those from baseline
to Phase I in the comparison schools.

Table 4. Changes in the quality of student work in science.

                                                      Mean score difference
                                             Intervention schools        Comparison schools
                                             Baseline vs.  Baseline vs.  Baseline vs.  Baseline vs.
Criteria                                     Phase I       Phase II      Phase I       Phase II
Factual knowledge                            0.48          1.22*         0.52*         0.27
Procedural knowledge                         1.10*         1.76*         0.84*         0.82*
Advanced concepts                            0.52*         1.42*         0.42*         0.45*
Presentation of knowledge as given           0.50*         2.00*         1.28*         0.38
Comparing and contrasting knowledge          0.05          1.04*         0.44*         0.07
Critique of knowledge                        0.31          1.33*         0.13          0.01
Knowledge reproduction                       0.24          1.89*         1.27*         0.34
Organization, interpretation, analysis,      0.43*         1.03*         0.94*         0.13
  synthesis, and/or evaluation
Application/problem-solving                  1.23*         1.68*         0.40*         0.26
Generation of new knowledge                  0.76*         1.22*         0.08          0.25
Sustained writing                            0.96*         1.75*         0.03          0.10
Connections to the real world beyond         0.96*         2.01*         0.17          0.80*
  the classroom

Note: *Mean score differences between the intervention and comparison schools are significant (p < .05).

Comparisons of the quality of mathematics teachers’ assessment tasks


On factual and procedural knowledge, presentation of knowledge as given, and
knowledge reproduction, the mean scores from the intervention schools were lower
than those from the comparison schools during Phase II. For the intervention
schools, the mean scores were higher than those for the comparison schools on the
higher-order thinking domain, which included understanding advanced concepts,
comparing and contrasting knowledge, critique of knowledge, organization, inter-
pretation, analysis, synthesis, and evaluation, problem-solving, generation of new
knowledge, sustained writing, and making connections to the real world. In the
comparison schools, the mean scores at Phase II were noted to be lower than those
at baseline on the higher-order thinking criteria, such as advanced concepts, com-
paring and contrasting knowledge, and organization, interpretation, analysis, syn-
thesis, and evaluation, problem-solving, generation of new knowledge, sustained
writing, and making connections to the real world.
As presented in Table 5, the mean scores from baseline to Phase II decreased sig-
nificantly in the intervention schools on procedural knowledge, presentation of knowl-
edge as given, and knowledge reproduction. There was a moderate increase in
advanced concepts, critique of knowledge, and generation of new knowledge. For the
mathematics assessment tasks in the comparison schools, a significant decrease was
found in advanced concepts, comparing and contrasting knowledge, application or
problem-solving, sustained writing, and making connections to the real world.

Comparisons of the quality of student work in mathematics


The mean scores on all the authentic intellectual criteria at Phase II were higher in
intervention schools than in comparison schools. The changes in mean scores in

Table 5. Changes in the quality of mathematics assessment tasks.

                                                      Mean score difference
                                             Intervention schools        Comparison schools
                                             Baseline vs.  Baseline vs.  Baseline vs.  Baseline vs.
Criteria                                     Phase I       Phase II      Phase I       Phase II
Factual knowledge                            0.07          0.44          0.30          0.22
Procedural knowledge                         0.03          0.47*         0.25          0.05
Advanced concepts                            0.43          0.50*         1.15*         0.45*
Presentation of knowledge as given           0.69          0.78*         1.00*         0.00
Comparing and contrasting knowledge          0.32          0.27          0.02          0.68*
Critique of knowledge                        0.52          0.95*         0.71*         0.09
Knowledge reproduction                       0.56          0.58*         1.22*         0.18
Organization, interpretation, analysis,      0.09          0.19          1.09*         0.16
  synthesis, and/or evaluation
Application/problem-solving                  0.14          0.16          1.07*         0.73*
Generation of new knowledge                  0.13          0.97*         0.82*         0.18
Sustained writing                            0.09          0.27          1.02*         0.49*
Connections to the real world beyond         0.54          0.01          0.75*         0.45*
  the classroom

Note: *Mean score differences between the intervention and comparison schools are significant (p < .05).

intervention schools showed that mathematics student work demonstrated less pre-
sentation of knowledge as given and knowledge reproduction at Phase I as com-
pared to baseline (see Table 6). However, the mean scores on these two criteria had increased by Phase II, while there was also a significant increase in comparing and contrasting knowledge; organization, interpretation, analysis, synthesis, and evaluation; and problem-solving. For the comparison schools, the mean scores on all the
authentic intellectual criteria decreased from baseline to Phase II. Further, there was
a significant decrease on problem-solving and sustained writing.

Teachers’ conceptions about authentic assessment


The focus group interview data were transcribed and coded. Three themes related to teachers’ conceptions of authentic assessment emerged from the data: authentic assessment, changes in task design, and rubrics.

Authentic assessment
The focus group interview excerpts revealed that teachers from the intervention
schools had a better conception or understanding of what authentic assessment
was after participating in the professional development for two years. They were
able to associate the features of authentic assessment with the criteria of authen-
tic intellectual quality as used in the professional development workshops. The
following paragraphs present the comments made by the participating teachers
from each subject area regarding the authentic intellectual quality of their assess-
ment tasks.

Table 6. Changes in the quality of student work in mathematics.

                                                      Mean score difference
                                             Intervention schools        Comparison schools
                                             Baseline vs.  Baseline vs.  Baseline vs.  Baseline vs.
Criteria                                     Phase I       Phase II      Phase I       Phase II
Factual knowledge                            0.09          0.55*         0.21          0.21
Procedural knowledge                         0.01          0.14          0.13          0.16
Advanced concepts                            0.15          0.07          0.30          0.43
Presentation of knowledge as given           0.60*         0.11          0.19          0.66
Comparing and contrasting knowledge          0.31          0.59*         0.21          0.62
Critique of knowledge                        0.23          0.16          0.25          0.11
Knowledge reproduction                       0.43*         0.59*         0.46*         0.28
Organization, interpretation, analysis,      0.05          0.79*         0.19          0.02
  synthesis, and/or evaluation
Application/problem-solving                  0.30          0.70*         0.38          0.50*
Generation of new knowledge                  0.11          0.33          0.38          0.02
Sustained writing                            0.20          0.04          0.56*         0.39*
Connections to the real world beyond         0.55*         0.06          0.55*         0.29*
  the classroom

Note: *Mean score differences between the intervention and comparison schools are significant (p < .05).

English

Teacher J: Hmm, it can relate to the real world. Pupils can see that it is worth learning
by doing authentic assessment tasks rather than taking tests and exam. I
think authentic tasks can help us to develop children in a holistic way, like
being artistic, being creative, being able to do things. For the normal tasks
that we are doing, we just develop children to take an exam. How has it
changed my teaching? I guess, I used to be more result-oriented. I looked
at their final results, but now, with the rubrics, I tend to think of the pro-
gress they make.

Teacher C: I think it made lessons more interesting to the pupils because now they
were no longer just doing pen and paper work, worksheets; now, they are
more involved in authentic tasks. They will get to see themselves improv-
ing, and, at the same time, their improvements will motivate them.

Science

Teacher H: It allows them to think more because it is less guided. We leave them to
think of what kind of materials they want to use. They have to think of
the steps taken for the experiments. The tasks are more investigative, more
hands on, with a more student-centered kind of learning. This method is
more open ended. It requires them to do research. Again, it is something
different from the activity book. They will have a more in-depth knowl-
edge of this topic.

Teacher Y: Umm, they do this on their own, and I have already given them the notes
and the information. So they are supposed to make use of what they
already have and reorganize the whole thing to make linkages. So it is
really applying the skills. Activities in the activity book definitely have
less focus on thinking skills. For the authentic task, it is about learning
through hands-on experience. It is also about linking whatever concepts
they have learned and putting them together.

Teacher I: I think it also allows them to think what they are writing. Thinking of what
they have learnt. Because sometimes if you just do quiz, you just try to catch
what is the main thing. I mean just the answer only but here they have to plan
the idea first and think over it. So it assesses their higher order thinking.

Mathematics

Teacher A: OK, the task is authentic; I think it is fun. It is a real project-based task;
this is what the people are doing outside as in their workplace. We inte-
grated different topics. We also focused on factual knowledge, for exam-
ple, the area of a triangle and the facts about money, something that they
have already learned and, procedural knowledge, whether they can calcu-
late the cost of the flooring.

Teacher T: I set out to identify the task. The key criterion is whether it is related to a real-
world situation. We wanted something that is real and not just hypothetical,
so it would be more interesting for the girls. From the accuracy of their calcu-
lations, we can tell whether they know their concepts; from their reasoning,
we know whether they can analyze and think clearly, and we understand their
logical process to problem-solving. Factual knowledge-wise, we want to
know if they know the area of a rectangle and a triangle. Procedural knowl-
edge, they need to know the operations, the steps to complete the task.

These teachers’ comments all indicated that their conceptions about authentic
assessment had improved as a result of active learning in the professional develop-
ment workshops. In designing authentic assessment tasks, teachers in all three sub-
ject areas had taken into account the criteria of authentic intellectual quality. They
were competent in articulating the key features of authentic assessment tasks and
making meaningful associations to the higher order learning outcomes. These are
the two key criteria of assessment literacy as defined by Stiggins (1995).
Clearly, the mathematics teachers considered the importance of assessing stu-
dents’ factual and procedural knowledge in addition to higher-order thinking skills
and real-world problem-solving. Compared to the mathematics teachers, the English
and science teachers were more able to make significant changes in their assessment
tasks because they could see how the changes in classroom assessment had shaped
their teaching and student learning. For example, Teachers J, C, H, I, and Y stated
that they could see the differences between authentic assessment and conventional
assessment in measuring students’ outcomes.

Changes in task design


The theme of changes in task design recurred most frequently in the transcripts.
The teachers were able to change their thinking and planning of assessment tasks
after their participation in the professional development. Prior to crafting the assessment tasks, they began by identifying the learning objectives or goals. The collective participation of teachers who taught the same subject and level at the same school made the task design process more exciting and effective for the teachers.

English

Teacher M: Before the project, I guess you went ahead to do it without much thought
about the end product as in the assessment modes and all that. For this
task, you have to think ahead. How are you going to design it in a way
that it is aligned with the rubrics that you set up? And how are the kids
going to perform? You’ve got to think about how they will be able to
level up.

Science

Teacher H: Before this, I make use of lead.com. Other than textbook right, I also
make use of lead.com. They provide a tutorial and also quizzes for stu-
dents to complete and some fun activities as well. After participating in this
project, teachers who teach at the same level are supposed to sit down
together to plan a lesson and of course it must be something different. More
student-centered kind of learning. More investigative tasks, more hands-on
tasks, more student-centered learning. Our objectives must be clear of
course. We must sort of like provide them (students) with an example first.
When we are designing the tasks, we are actually very excited.

Mathematics

Teacher A: OK. The task objectives were actually with reference to our scheme of
work. This task actually emphasizes the mathematical connections to
everyday life.

Rubrics
Teachers’ knowledge and skill in crafting rubrics is one of the criteria of assessment literacy. As evidenced by the focus group data, the participating teachers not only had a better conception of rubrics, but also believed that they were competent in designing rubrics. They also commented that a rubric is an essential tool in alternative assessment and can be used to give formative feedback to students. This indicates that the teachers were able to appreciate the use of assessment information for formative assessment, or assessment for learning. For example, Teacher I said, ‘If there is a rubric you can
look into it, and then children understand it because we have a rubric to guide us. If
you mark it wrong then you need to explain why it is wrong’.

English

Teacher C: What I’ve done is I’ve applied. Maybe make changes to the way they
think of rubrics. That’s the only impact we’ve managed to make in Star-
light, to relook at our rubrics and it’s like a collaborative thing. Before
joining the project, people had different ideas of rubrics. Now at least the
rubrics in our school are slightly better.

Science

Teacher I: You can consider it as an alternative kind of assessment. Very different


from the usual one we always do, marking! OK, if there is a rubric you
can look into it, and then children understand it because we have a rubric
to guide us. If you mark it wrong then you need to explain why it is
wrong.

Teacher Y: I think first I need to list down what I’m assessing for that task. What are
the specific skills that I am looking for in an assessment? Whether I am
looking at the product or I am looking more on the process. How much
weightage would I want to give to the product and process?

Mathematics

Teacher A: Designing rubrics. We are all very familiar with rubrics being categorized
into four levels. Level 1 shows no understanding. Then the second level
would be showing a little understanding depending on the concepts and
the third level shows some understanding but there are still some errors in
them. Level 4 would be those show complete understanding.

Discussion
The findings in the English and science subject areas indicated that the authentic intel-
lectual quality of teachers’ assessment tasks had significantly improved after the inter-
vention. Such improvement was also observed in the quality of student work.
Teachers’ increased use of authentic assessments in English and science was well aligned with their improved assessment literacy and conceptions about authentic assessment. In addition, the teachers found that, through the use of authentic assessments, students were engaged in authentic learning of knowledge and skills. Furthermore, the use of rubrics allowed students to assess their own progress and motivated them to make progress toward meeting the standards, which is one of the important strategies of assessment for learning in the day-to-day learning process. These benefits could not
have been achieved by using conventional paper-and-pen assessments alone.

The quality of mathematics assessment tasks and student work had improved
slightly. This slight improvement could be attributed to the nature of the subject,
which tends to emphasize the reproduction of factual and procedural knowledge.
This finding is not surprising because many mathematics teachers still believe that
students’ mastery of factual and procedural knowledge is important for their con-
ceptual understanding. This notion also echoes Hiebert and Carpenter’s (1992) argu-
ment for the importance of emphasizing both procedural and conceptual knowledge
in mathematics teaching.
This study provides some insight into the format of effective professional devel-
opment for equipping teachers with contemporary knowledge and skills in develop-
ing authentic assessments and rubrics. The findings concur with the teacher learning
and professional development literature arguing that ongoing, sustained professional
development is more powerful than short-term, one-shot professional development
workshops (McMunn et al., 2004; Wiliam & Thompson, 2008). Additionally, the
core features of effective professional development, such as content focus, active
learning, coherence, duration, and collective participation, need to be taken
into consideration in the planning and design of professional development pro-
grams. The findings further suggest that, when teachers are better prepared and lit-
erate in developing authentic assessments, they tend to adopt the use of authentic
assessments in their day-to-day classroom practice, resulting in better quality work
from students (Bol et al., 1998; Stiggins, 1991b).
One of the most recent recommendations made by the Singapore Primary Edu-
cational Review and Implementation Committee (PERI, 2009) has called for
schools to explore the use of bite-sized forms of assessments (i.e., topical tests) to
provide regular feedback to students and their parents regarding students’ learning.
According to Klenowski (2009), there is a possibility that this form of assessment
could encourage performance-orientated learning to the detriment of sustained and
real learning. Klenowski’s concern is valid for two reasons: (1) many teachers mis-
construed formative assessment as frequent, mini-summative assessments, and (2)
many teachers contended that they had adopted formative assessment practices by
merely using summative tests for formative purposes, such as answer checking.
According to Harlen (2006), one of the limitations of using summative tests as evi-
dence to improve student learning is that the information derived from the summa-
tive tests is not sufficiently detailed to be diagnostic. Thus, the most important
component in teacher professional development in assessment is to equip teachers
with the knowledge and skills to develop and implement authentic assessment and
rubrics. Such assessment tools have a closer alignment with specific learning goals
and give more detailed diagnostic information. These authentic assessment tools are
deemed more appropriate for formative assessment purposes. As evidenced by the
focus group interview data, teachers in this study were able to appreciate the value
of using authentic assessment tasks and rubrics to shape their own teaching and stu-
dent learning. The professional development experience provided them with ample
opportunities to engage in active learning of task design and rubric development.
The collective participation of teachers in the learning communities brought about
positive changes in their conceptions of authentic assessment, task design, and rub-
rics. This finding explains the improved quality of teachers’ assessment tasks and
student work. It also suggests that changing teachers’ assessment practices entails a shift in conceptions and beliefs about new forms of assessment. When teachers are able to make connections between assessment methods and higher-order instructional goals or learning outcomes, they will be more willing to make changes in their classroom practices.

Implications for teacher education and professional development
The findings of this study call for more localized, ongoing and sustained profes-
sional development for teachers. The findings also concur with McMunn et al.’s
(2004) contention that actualizing assessment reform at the school or classroom
level is a long-term endeavor that will not happen as the result of a single or spo-
radic workshop. Cohen and Hill (1998) have discussed the limited value of one-
time professional development workshops with respect to sustaining changes in tea-
cher practice. As demonstrated by the negligible changes in the quality of teachers’
assessment tasks and student work from the comparison schools, where teachers
received only two assessment workshops over the course of the study, professional
development can no longer be viewed as an ad hoc event that occurs on only a few
days of the school year. Therefore, it is of paramount importance to take sustainability factors into account in the planning and implementation of professional development programs for in-service teachers.
Professional development must also be part of the daily professional practice of
teachers, meaning that teachers should be encouraged to actively collaborate
through school-based professional learning communities that are sustained over
time. This type of networking would provide teachers with ample opportunity for
self-reflection and dialogue with colleagues and allow for changes in teachers’
assessment practices to occur developmentally (Wiliam & Thompson, 2008).
The analysis of student work in moderation meetings is an effective strategy that
has been used increasingly in teacher education and professional development pro-
grams to help teachers examine student learning and reflect on the teaching associ-
ated with the learning (Shepard et al., 2005). Such a strategy would also help new teachers develop an understanding of how classroom assessment can inform their instructional choices. The literature on teacher professional development has suggested that teachers can improve their classroom practices when they collectively review student work to analyze what students have learnt, uncover students’ misconceptions, and reflect on the curricular or instructional adaptations necessary to promote student understanding. Such an
active learning approach has been recognized as one of the core features of effec-
tive professional development in the teacher learning and professional development
literature. In addition, the use of a rubric helps teachers to reflect more deeply on the quality of classroom assessment tasks and on their impact on the quality of student work.
According to Aschbacher (1999), such a tool is useful in both pre-service and in-
service professional development because teachers are able to reflect on the teaching
and assessment associated with student learning.
Shepard et al. (2005) also noted that engagement in assessment design is a promising pedagogical approach for helping new teachers develop an understanding of student learning. The participating teachers from this study’s interven-
tion schools lamented their lack of assessment preparation during their pre-service
teacher training programs. However, their assessment literacy had increased after
participating in the professional development workshops on developing authentic
assessment tasks and rubrics, which was a key component of the intervention. The
teachers had also gained a better understanding of student learning because they
were required to think about the learning goals and success criteria in the process
of task design and rubric development. These promising findings suggest that the professional development workshops can be scaled up to include more schools and benefit more teachers. Furthermore, the contents of the professional development work-
shops used with the intervention school teachers can be adopted for pre-service
teacher training. Graham’s (2005) and Volante and Fazio’s (2007) findings in the
United States and Canada, respectively, have shown that many teacher candidates
expressed the need for improving their assessment literacy. New teachers’ lack of preparation in classroom assessment is a problem commonly encountered by educational systems around the world. There is thus a need for assessment training to begin as early as possible in pre-service teacher education.
Although teacher professional development has become increasingly important
to prepare teachers for the curriculum and assessment of twenty-first century com-
petencies, and many programs have been put in place at both the school and system levels, policy-makers and teacher educators should not neglect to collect valuable
formative and summative data that might be useful to inform ongoing developments
and the planning of future professional development programs (Luke & McArdle,
2009). The quantitative findings of this study derive their strength from a longitudinal, quasi-experimental design, which ruled out confounding variables (e.g., teacher factors or novelty effects) that might otherwise have produced spurious effects of the professional development intervention. The findings also contribute to the tea-
cher learning and professional development literature by establishing the links
between the effects of professional development, teachers’ assessment literacy, and
student learning. Given that the data were based on artifacts (i.e., assessment tasks and student work samples) embedded within teachers’ day-to-day classroom instruction, they were immediate measures of teachers’ classroom practice and student learning. Such measures took instructional sensitivity into account, and hence the curricular validity of the classroom data was established in the study (Ruiz-Primo, Shavelson, Hamilton, & Klein, 2002). As a result, we can be confident that the effects of professional development on the quality of teachers’ assessment tasks and student work were not due to measurement error.
One limitation of this study is that it focused on improving teachers’ assessment
literacy within the realm of authentic assessment task design and rubric develop-
ment. Although it is essential to equip teachers with contemporary knowledge and
skills in developing assessment tools that tap into conceptual understanding and
higher-order thinking skills, their competency in using assessment information to
assist student learning through timely, formative feedback is equally important.
Because of the importance of using formative assessment to support learning (Black
& Wiliam, 1998; Hattie & Timperley, 2007), future studies should focus on build-
ing teachers’ capacity in formative assessment through professional development.

Acknowledgements
The author would like to thank the Singapore Ministry of Education for funding this
research. The author is also grateful to the participating teachers in the study and the
research assistants who helped with the data collection.

References
Anderson, L.W., & Krathwohl, D.R. (2001). A taxonomy for learning, teaching, and assess-
ing: A revision of Bloom’s taxonomy of educational objectives. New York: Longman.
Aschbacher, P.R. (1991). Performance assessment: State activity, interest, and concerns.
Applied Measurement in Education, 4(4), 275–288.
Aschbacher, P.R. (1999). Developing indicators of classroom practice to monitor and support
school reform. CSE Technical Report 513. Los Angeles, CA: University of California,
Los Angeles.
Bernstein, B. (1990). Class, codes and control: Vol. 4. The structuring of pedagogic discourse. London: Routledge.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Educa-
tion: Principles, Policy, and Practice, 5, 7–74.
Bol, L., Stephenson, P.L., O’Connell, A.A., & Nunnery, J.A. (1998). Influence of experience,
grade level, and subject area on teachers’ assessment practices. The Journal of Educa-
tional Research, 91, 323–330.
Bol, L., & Strage, A. (1996). The contradiction between teachers’ instructional goals and
their assessment practices in high school biology courses. Science Education, 80, 145–
163.
Borko, H. (2004). Professional development and teacher learning: Mapping the terrain. Edu-
cational Researcher, 33(8), 3–15.
Borko, H., Mayfield, V., Marion, S., Flexer, R., & Cumbo, K. (1997). Teachers’ devel-
oping ideas and practices about mathematics performance assessment: Successes,
stumbling blocks, and implications for professional development. Teaching & Tea-
cher Education, 13(3), 259–278.
Cohen, D.K., & Hill, H.C. (1998). State policy and classroom performance: Mathematics reform in California. CPRE Policy Brief No. RB-23. Philadelphia: Consortium for Policy
Research in Education, University of Pennsylvania.
Desimone, L.M. (2009). Improving impact studies of teachers’ professional development:
Toward better conceptualizations and measures. Educational Researcher, 38(3), 181–
199.
Desimone, L.M., Porter, A., Garet, M.S., Yoon, K.S., & Birman, B.F. (2002). Effects of pro-
fessional development on teachers’ instruction: Results from a three-year longitudinal
study. Educational Evaluation and Policy Analysis, 24(2), 81–112.
Fleming, M., & Chambers, B. (1983). Teacher-made tests: Windows on the classroom. In W.
E. Hathaway (Ed.), Testing in the schools (pp. 29–38). San Francisco: Jossey-Bass.
Fullan, M.G., & Miles, M.B. (1992). Getting reform right: What works and what doesn’t.
Phi Delta Kappan, 73(10), 744–752.
Garet, M.S., Porter, A.C., Desimone, L.M., Birman, B.F., & Yoon, K.S. (2001). What makes
professional development effective? Results from a national sample of teachers. Ameri-
can Educational Research Journal, 38(4), 915–945.
Graham, P. (2005). Classroom-based assessment: Changing knowledge and practice through
preservice teacher education. Teaching and Teacher Education, 21, 607–621.
Guskey, T.R. (2002). Professional development and teacher change. Teachers and Teaching:
Theory and Practice, 8(3/4), 381–390.
Hargreaves, A., Earl, L., & Schmidt, M. (2002). Perspectives on alternative assessment
reform. American Educational Research Journal, 39(1), 69–95.
Harlen, W. (2006). On the relationship between assessment for formative and
summative purposes. In J. Gardner (Ed.), Assessment and learning (pp. 103–117).
London: Sage.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research,
77, 81–112.
Hiebert, J., & Carpenter, T.P. (1992). Learning and teaching with understanding. In D.A.
Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 65–97).
New York: Macmillan.
Klenowski, V. (2009). Assessment for learning revisited: An Asia-Pacific perspective.
Assessment in Education: Principles, Policy & Practice, 16(3), 263–268.
Lingard, B., Ladwig, J., Mills, M., Bahr, M., Chant, D., Warry, M., Ailwood, J., Capeness,
R., Christie, P., Gore, J., Hayes, D., & Luke, A. (2001). The Queensland School Reform
Longitudinal Study. Brisbane: Education Queensland.
Luke, A., & McArdle, F. (2009). A model for research-based state professional development
policy. Asia-Pacific Journal of Teacher Education, 37(3), 231–251.
McMunn, N., McColskey, W., & Butler, S. (2004). Building teacher capacity in classroom
assessment to improve student learning. International Journal of Educational Policy,
Research, & Practice, 4(4), 25–48.
Navarro, M.S. (2008). Assessment and school reform: Lessons from 15 years in the field. In
C.A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 245–
262). New York: Lawrence Erlbaum Associates.
Newmann, F.M., & Associates. (1996). Authentic achievement: Restructuring schools for
intellectual quality. San Francisco: Jossey-Bass.
Newmann, F.M., Marks, H.M., & Gamoran, A. (1996). Authentic pedagogy and student per-
formance. American Journal of Education, 104, 280–312.
O’Day, J.A., & Smith, M.S. (1993). Systemic reform and educational opportunities. In S.H.
Fuhrman (Ed.), Designing coherent educational policy: Improving the system (pp. 1–34).
San Francisco: Jossey-Bass.
Pellegrino, J.W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The
science and design of educational assessment. Washington, DC: National Academy
Press.
Ruiz-Primo, M.A., Shavelson, R.J., Hamilton, L., & Klein, S. (2002). On the evaluation of
systemic science reform: Searching for instructional sensitivity. Journal of Research in
Science Teaching, 39(5), 369–393.
Sadler, D.R. (1998). Formative assessment: Revisiting the territory. Assessment in Education,
5, 77–84.
Sato, M., Wei, R.C., & Darling-Hammond, L. (2008). Improving teachers’ assessment prac-
tices through professional development: The case of National Board Certification. Ameri-
can Educational Research Journal, 45(3), 669–700.
Shepard, L., Hammerness, K., Darling-Hammond, L., et al. (2005). Assessment. In L.
Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world:
What teachers should learn and be able to do (pp. 275–326). San Francisco: John Wiley
& Sons.
Smith, M.S., & O’Day, J. (1990). Systemic school reform. London: Taylor and Francis.
Stiggins, R.J. (1991a). Relevant classroom assessment training for teachers. Educational
Measurement: Issues and Practice, 10(1), 7–12.
Stiggins, R.J. (1991b). Assessment literacy. Phi Delta Kappan, 72, 534–539.
Stiggins, R.J. (1995). Assessment literacy for the 21st century. Phi Delta Kappan, 77(3),
238–245.
Stiggins, R.J. (1999, November). Assessment, student confidence, and school success. Phi
Delta Kappan, 81, 191–198.
Stiggins, R.J. (2002). Assessment crisis: The absence of assessment for learning. Phi Delta
Kappan, 83(10), 758–765.
Stiggins, R.J., & Conklin, N.F. (1992). In teachers’ hands: Investigating the practices of
classroom assessment. Albany: State University of New York Press.
Volante, L., & Fazio, X. (2007). Exploring teacher candidates’ assessment literacy: Implica-
tions for teacher education reform and professional development. Canadian Journal of
Education, 30(3), 749–770.
Wiggins, G. (1989, May). A true test: Toward more authentic and equitable assessment. Phi
Delta Kappan, 70, 703–713.
Wiliam, D., Lee, C., Harrison, C., & Black, P. (2004). Teachers developing assessment for
learning: Impact on student achievement. Assessment in Education, 11(1), 49–65.
Wiliam, D., & Thompson, M. (2008). Integrating assessment with learning: What will it take
to make it work? In C.A. Dwyer (Ed.), The future of assessment: Shaping teaching and
learning (pp. 53–82). New York: Lawrence Erlbaum Associates.
