
J Math Teacher Educ

DOI 10.1007/s10857-017-9369-z

Improving the judgment of task difficulties: prospective teachers’ diagnostic competence in the area of functions and graphs

Andreas Ostermann1 · Timo Leuders1 · Matthias Nückles2

© Springer Science+Business Media Dordrecht 2017

Abstract To teach adaptively, teachers should be able to take the students’ level of
knowledge into account. Therefore, a key component of pedagogical content knowledge
(PCK) is the ability to assume the students’ perspectives. However, due to the so-called
expert blind spot, teachers tend to misestimate their students’ knowledge, such as when
estimating the difficulty of a given task. This empirically well-documented estimation bias
is predicted by Nickerson’s anchoring and adjustment model, which generally explains
how people take on other people’s perspectives. In this article, we present an intervention
study that aims to improve the accuracy of prospective teachers’ judgments of task diffi-
culty in the area of functional thinking. Two types of treatments are derived from Nick-
erson’s model. In the first condition (PCK group), participants acquire knowledge about
task characteristics and students’ misconceptions. The second condition (sensitizing group)
serves to rule out that potential improvements in the PCK group are merely
based on a general sensitization to the expert’s estimation bias. Accordingly, these partici-
pants are only informed about the experts’ general tendency to overestimate students’ performance. The
results showed that the PCK group improved both in terms of the accuracy of the estimated
solution rates and the accuracy of rank order, whereas the sensitizing group only improved
in regard to the former. Thus, the study shows that prospective teachers’ diagnostic
judgments can be improved by teaching them relevant PCK in a short amount of time.

Keywords Diagnostic competence · Task difficulty · Expert blind spot · Pedagogical content knowledge · Functions and graphs

✉ Andreas Ostermann
Andreas.Ostermann@ph-freiburg.de

1 Institut für Mathematische Bildung, Pädagogische Hochschule Freiburg, Kunzenweg 21, 79117 Freiburg, Germany

2 Institut für Erziehungswissenschaft, Universität Freiburg, Rempartstrasse 11, 79098 Freiburg, Germany


Introduction

Among the many tasks teachers have to accomplish, diagnostic activities are considered
pivotal for preparing appropriate instructional settings and creating learning opportunities
(Berliner 1994; Demaray and Elliot 1998; Hoge and Coladarci 1989). The goal of diag-
nostic activities is to obtain valid information on students’ achievements, and conse-
quently, diagnostic competence is often defined as teachers’ ability to accurately assess
students’ expected or actual performance (Schrader 2009; Südkamp et al. 2012). Many
different approaches try to conceptualize, measure, and empirically investigate diagnostic
competence (e.g., Spinath 2005; Klug et al. 2013) and its impact on learning (Anders et al.
2010; Helmke and Schrader 1987). However, there is no coherent model encompassing the
different aspects of diagnostic competence yet.
This article focuses on a single yet crucial component of diagnostic competence: The
accuracy with which teachers are able to estimate the difficulty of tasks (Helmke and
Schrader 1987; Cronbach 1955; Südkamp et al. 2012). This can be regarded as an indicator
of a teacher’s ability to take on students’ perspectives—e.g., by anticipating misconcep-
tions or difficult steps in a solution process. Appraising the potential difficulties of a task is
considered an important prerequisite for selecting or constructing adequate tasks that
match the students’ abilities. In current models of mathematical knowledge for teaching
(MKT) (Ball et al. 2008; Baumert et al. 2010), the availability of both content knowledge
(CK) and pedagogical content knowledge (PCK) is regarded as relevant when making
diagnostic judgments: A teacher has to understand the mathematical concepts and solution
steps addressed in a task, which is considered as subject-matter knowledge for teaching.
The teacher must also anticipate any errors that students might make due to typical mis-
conceptions, which requires knowledge that extends beyond pure mathematical knowl-
edge. The cognitive skills that draw on these types of knowledge can enable a teacher to
take the students’ perspectives into account when they solve a task, and thus to conceive of
solution processes or to predict obstacles.
In order to measure the difficulty of a task, many studies use the ‘‘solution rate’’ which
is determined by the percentage of correct solutions within a certain student group (e.g.,
Krawitz et al. 2016). As an example, see the task in Fig. 1.

Fig. 1 What percentage of students in an average eighth-grade high school class would presumably solve this task correctly? The task reads: ‘‘The picture below shows a part of the graph of a function. Complete the graph, assuming that the function is decreasing for x ranging between 0 and 2.’’ (The figure shows a coordinate system with the x-axis marked at 0 and 2.)

In the example in Fig. 1, a mathematics expert may solve the task at a glance and
assume that the task can be solved by students as well, as long as the terms ‘‘function,’’
‘‘graph,’’ and ‘‘decreasing’’ are familiar. However, students’ mathematical thinking does
not merely rely on the availability of steps or elements from a regular solution, but it is
also influenced by misconceptions. A previous study on eighth-grade students from a
sample (N = 230) of German ‘‘gymnasiums’’ (a higher stream of secondary education)
revealed that only 9.1% of the students were able to solve this task correctly (Leuders
et al. 2017). Of course, solution rates may differ regionally depending on the students’
curricular state and prior knowledge. However, students’ misconceptions on functions
and graphs seem to be internationally prevalent (Hadjidemetriou and Williams 2002a;
Leinhardt et al. 1990). In a preliminary study, we found that prospective teachers with
different levels of teaching expertise expected a solution rate between 30 and 100% for
the task in Fig. 1. In this example, inexperienced teachers seem to be unable to take into
account that many students tend to see graphs as always being linear or smooth (Had-
jidemetriou and Williams 2002a; Leinhardt et al. 1990), which leads them to incorrect
solutions. A similar overestimation of solution rates was previously demonstrated in the
studies by Nathan and Koedinger (2000), and it was termed the expert blind spot (Nathan
and Petrosino 2003): Owing to their widespread and extensively integrated professional
knowledge, it can be difficult for experts to adequately anticipate a novice’s under-
standing and perspective on a topic.
The tendency to impute one’s own knowledge to another person is predicted
by the model proposed by Nickerson (1999). In this model, the ‘‘knowledge of what
categories of people [e.g., students] know’’ is considered one factor (among others) that
influences the adequacy of judgments. Within this interpretation, the teachers’
awareness of misconceptions belongs to the knowledge about the ‘‘category’’ of eighth-
grade high school students. This example suggests that Nickerson’s model, which we
will introduce and discuss in detail, appears to be suitable for describing the process of
diagnostic judgments, and it also provides hints on how to improve diagnostic skills. Even
though Nickerson’s model has rarely been used in pedagogical contexts, it inspires the
design of the study presented in this paper and acts as a lens through which to interpret
the results.
Accordingly, we are pursuing two purposes in this article:
1. The first purpose is theoretical: We introduce Nickerson’s model into mathematics
education research as a framework for diagnostic judgments by relating its categories
to those dimensions of content knowledge and PCK which are relevant for diagnostic
competence. We intend to inspire a continued discourse on the suitability of the
Nickerson model to describe diagnostic processes and diagnostic competence beyond
this article.
2. The second purpose is empirical: To date, there has been a lack of empirical studies in
the domain of mathematics that show, within a controlled design, that diagnostic
competences can be improved by training. We present an intervention study that
contributes to closing this research gap. The study investigates whether a short
training program based on knowledge of students’ misconceptions can improve
prospective teachers’ diagnostic judgment accuracy in the area of functions and
graphs.


Theoretical background

Three perspectives on teachers’ diagnostic competences—two of which are familiar and
frequently used, and one that constitutes a complementary perspective—are proposed in
this article. Firstly, we provide a brief overview of the research on teachers’ judgment
accuracy. Secondly, we address various components of mathematical knowledge for
teaching (Ball et al. 2008) which are relevant for diagnostic competences; we take a
closer look at the knowledge about students’ misconceptions in the area of functions and
graphs, as this facet is particularly important to our study. Finally, we introduce Nick-
erson’s (1999) model, which describes experts’ ways of adopting laypersons’ perspec-
tives. We apply this model to diagnostic judgments in the pedagogical context by
explaining its relation to familiar models of MKT. This analysis will yield a framework
that is useful for (1) testing hypotheses on the factors that influence diagnostic com-
petence, and (2) designing training approaches for fostering diagnostic competence in
teacher education.

Accuracy of teachers’ judgments on task difficulties

In the literature from the last two decades, diverse approaches for conceptualizing and
investigating constructs associated with diagnostic competence can be found. Although the
breadth of the definitions and terminology differ considerably, common fields of interest
are the structure and genesis of teachers’ diagnostic judgments, the factors that influence
these judgments, and their impact on learning. Diagnostic competence is often seen as
being related to student achievement; for that purpose, it is defined as a teacher’s ability to
accurately assess students’ expected performance (Südkamp et al. 2012). Most studies
report on the judgments regarding a specific group of students, which means that teachers
estimate the performance of their class (Baumert et al. 2010; Südkamp et al. 2012). Two
frequently used ways of measuring the accuracy of diagnostic judgments are the following
(Helmke and Schrader 1987; Spinath 2005):
• The estimation of expected solution rates of tasks for specific student groups is
regarded as an indicator of the accuracy of judgments with reference to a whole class.
The accuracy can be measured by the difference of the estimated and actual percentage
of correct responses to a given task. A smaller difference is an indicator for a better
estimation. There is a large body of research showing that teachers tend to overestimate
the percentages of correct responses (Nathan and Koedinger 2000; Nathan and
Petrosino 2003).
• The estimation of expected rank order of specific students’ performance indicates to
what extent teachers are able to rank students in terms of their achievement, or to rank
tasks in terms of their difficulty. This component can be quantified by analyzing the
correlation between expected rank order and actual rank order. A larger correlation is
an indicator for a better estimation. Current research usually draws upon the latter
criterion, as measured by the correlation between a teacher’s judgment of students’
performance and students’ actual performance. The research summarized in Hoge and
Coladarci (1989; 16 studies) and Südkamp et al. (2012; 75 studies) showed average
correlations of about 0.66 and 0.63, respectively, indicating that teachers do quite well in
judging their students’ achievement; however, there is considerable variation among
the teachers (typically 0.3–0.9), so there is still need for improvement.


Although the judgments regarding specific student groups may have a high ecological
validity, such an operationalization has the disadvantage that special properties of specific
classes may bias teachers’ diagnostic judgments: For example, Dünnebier et al. (2009)
pointed out that marks given by other teachers can influence the estimation of students’
performance. Furthermore, teachers’ estimations of specific students are also subject to
reference group effects (high-performing vs. low-performing classes; Südkamp and Möller
2009). Information about students’ intelligence as well as performance in
other school subjects constitutes additional anchors that bias teachers’ specific judgments
(Kaiser et al. 2012). Such biasing anchors must be seen as confounding
variables that distort the statistical comparison of the diagnostic judgments of different
teacher groups.
In order to empirically identify factors that influence diagnostic judgments by the use of
statistical methods (as intended in the present intervention study), one has to compare the
judgment accuracy of different teacher groups, which can be biased by confounders as
mentioned above. One can avoid such confounders originating from specific student groups
by restricting the investigation to teachers’ judgments of ‘‘average’’ (i.e., non-specific) or
‘‘representative’’ student groups. However, this approach requires data on the difficulty of
tasks derived from a sufficiently representative student group, which provides the empir-
ical solution rate and—resulting from this—the empirical rank order of task difficulty, i.e.,
the rank order of tasks according to their empirical difficulty. Consequently, the mea-
surement and the interpretation of the accuracy of diagnostic judgments should be per-
formed as follows:
• The estimation of expected solution rates of tasks for average student groups indicates
the accuracy of judgments with reference to a representative student group. The
accuracy can be measured by the difference of the estimated and the empirical solution
rate to a given task. A smaller difference is an indicator for a better estimation.
• The estimation of expected rank order of tasks indicates to what extent teachers are
able to rank tasks in terms of their empirical difficulty. This component can be
quantified by analyzing the correlation between expected rank order and empirical rank
order. A larger correlation is an indicator for a better estimation.
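To make the two criteria concrete, the following sketch (in Python, with invented solution rates rather than data from this study) computes both measures for a judge who uniformly overestimates every solution rate. It illustrates why the two criteria can dissociate: a constant bias leaves the rank order perfectly intact while producing a large error in the estimated rates.

```python
# Two accuracy criteria for judgments of task difficulty
# (illustrative sketch with invented numbers, not data from the study).

def ranks(values):
    """Rank values from 1 (smallest) upward; no tie handling (no ties here)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mean_abs_error(est, emp):
    """Average absolute difference between estimated and empirical rates."""
    return sum(abs(a - b) for a, b in zip(est, emp)) / len(est)

# Empirical solution rates (%) of five tasks, and a judge who uniformly
# overestimates each by 20 percentage points (expert blind spot):
empirical = [9, 25, 40, 60, 75]
estimated = [29, 45, 60, 80, 95]  # rate-level bias, but order preserved

print(mean_abs_error(estimated, empirical))  # 20.0: large rate-level error
print(spearman(estimated, empirical))        # ≈ 1.0: perfect rank order
```

A judge can therefore improve on the solution-rate criterion (by shifting all estimates downward) without improving on the rank-order criterion, which requires knowledge of which task features make one task harder than another.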
In our study, we investigate only prospective teachers’ judgment accuracy for non-
specific student groups. The ability to accurately judge non-specific groups can be a useful
foundation for the estimation of specific groups, as we will see in ‘‘Modeling of cognitive
processes in taking the perspectives of other people’’ section by means of Nickerson’s
model. When estimating non-specific student groups, the focus is less on one’s familiarity
with the performance of specific students, but rather on the requirements of a given task.
This may include anticipating plausible solution processes or identifying typical barriers to
understanding. Such knowledge may be drawn from empirical research findings or from
one’s former experience with students. Judgments on task difficulties (independent of a
specific group) are important for the selection of learning materials (e.g., a new class
textbook), for the preparation of a teaching unit for a new class, or for deciding on a new
topic that specific students do not have experience with.
The analysis of judgments about the difficulty of a specific task independently of a
specific group certainly implies an ecological restriction of the construct. However, the
pivotal advantage for research using this type of operationalization is that the diagnostic
judgments made by different groups of teachers can be compared statistically with respect
to the same criteria, i.e., the empirical solution rate and the empirical rank order.


Components of teachers’ knowledge relevant to diagnostic judgments

In order to perform an adequate estimation of the difficulty of a given task, a teacher can
draw upon knowledge from different sources. He or she can reflect on the complexity of an
ideal student solution. Such an analysis requires what Morris et al. (2009) describe as
decompression of the task: a fine-grained, step-by-step
solution that makes explicit all the mathematical concepts necessary for solving a given task. In
the area of functions and graphs, these concepts may include different aspects of the concept
of function (object, covariation, and assignment; Leinhardt et al. 1990) and different
representations (term, table, graph, and situation), as well as the changes between these
representations (Swan 1986; Duval 2002). Within the conceptualization of MKT of Ball
et al. (2008), this expanded knowledge is part of specialized content knowledge (SCK),
which is unique for mathematics teachers.
Decompression only yields information about potential difficulties that originate from
the incorrect application of mathematical concepts or from the omission of steps in an ideal
solution of a task. However, there are
barriers (beyond the mere definition of mathematical concepts) that originate in students’
misconceptions and cannot be identified by decompression, such as the ‘‘graph-as-picture
error’’: students mistake the graph of a function for a geometric representation of a situ-
ation (Bell and Janvier 1981). A teacher who identifies such misconceptions as error
sources draws upon knowledge of content and students (KCS), as in the terminology of
Ball et al. (2008). With a view to our study, we now present a list of students’ misconceptions in
the area of functions and graphs. Hadjidemetriou and Williams (2002b) analyzed diag-
nostic judgments of task difficulty in this domain. The authors show that teachers were not
able to spontaneously identify typical misconceptions that many students hold when
solving tasks in this area. Table 1 offers an overview of the most frequent misconceptions,
as collected by Hadjidemetriou and Williams (2002a). In the present study, we use these
findings by analyzing whether knowledge of these misconceptions improves the accuracy
of teachers’ diagnostic judgments.

Table 1 Students’ misconceptions in the area of functions and graphs, taken from Hadjidemetriou and Williams (2002a)

1. Slope–height confusion: Pupils cannot distinguish between the highest value and the greatest slope; thus, the height serves as a powerful distractor when interpreting the slope
2. Prototypes: Pupils tend to sketch linear graphs and expect some form of reasonableness, such as ‘‘smooth,’’ ‘‘symmetrical,’’ and ‘‘continuous’’ graphs (Leinhardt et al. 1990). This error also entails what we call the ‘‘y = x’’ prototype, where pupils tend to construct the y = x graph (a symmetric and linear graph) in inappropriate situations, and the ‘‘origin’’ prototype, where the origin is treated as an indispensable part of the graph, leading pupils to draw all their graphs through it (even if they have to draw a graph to show how the height of a person varies from birth to early thirties)
3. Graph as ‘‘picture’’: Many pupils, unable to treat the graph as an abstract representation of relationships, appear to interpret it as a literal picture of the underlying situation
4. Reversing coordinates: Pupils tend to reverse the x and the y coordinates and are unable to adjust their knowledge in unfamiliar situations
5. Misreading the scale: Pupils prototypically read a scale to a unit of one or ten


Modeling of cognitive processes in taking the perspectives of other people

In their everyday professional practice, teachers continuously have to assess the state of
knowledge of specific students or classes. To estimate the difficulty of a task in a specific
or non-specific group of students, teachers must be aware of the students’ level of
knowledge—e.g., of the students’ special or typical abilities, notions, ideas, or miscon-
ceptions. Generally, diagnostic judgments can be considered as experts’ judgments on the
state of knowledge of other people. Although there is some research on judgment accuracy
(Hoge and Coladarci 1989; Südkamp et al. 2012), we lack efforts that explicitly try to
elucidate the cognitive processes activated during diagnostic judgments within cognitive
frameworks (Schrader 2009, p. 688; Philipp and Leuders 2014). However, outside the
pedagogical domain, there is a long tradition of describing and explaining quite similar
processes of judgment. Nickerson (1999) presents a general model based on a large body of
research on peoples’ understanding of the knowledge of others, as well as on the processes
of imputing one’s own knowledge to other people.
When estimating the state of other people’s knowledge, people consistently tend to use
their own knowledge as a default model. Drawing on information about unusual aspects of their
own knowledge, about categories of people, and on long-term knowledge of specific others,
individuals then adjust this default continuously to obtain a working
model of specific others. Figure 2 presents the model of Nickerson in its original form.
This model predicts an expert blind spot as follows: People take their own knowledge
(e.g., expert knowledge from higher mathematics) as an anchor by which they estimate
other people’s (mathematical) knowledge, and then adjust this estimation only insufficiently
by considering further factors. Therefore, in this model people tend to estimate their
own knowledge (and also their own ignorance) as being more common than it really is.
This phenomenon, the dominance of the anchor of one’s own knowledge, is supported by
findings in a broad number of areas of knowledge (Nickerson 1999). This heuristic of
anchoring and adjustment (Tversky and Kahneman 1974) explains the ubiquitous phe-
nomenon of overestimation, which is also known as the ‘‘curse of expertise’’ (Hinds 1999).
Krauss and Fussell (1991) explain the dominance of this anchor with the well-known
availability heuristic (Tversky and Kahneman 1974). For experts, their extensive and cross-
linked knowledge is so easily accessible that it inhibits them from taking on others’
(laypersons’) perspectives. This anchoring principle also manifests itself in the hindsight bias
(Fischhoff 1975) or the illusion of simplicity (Kelley 1999): Experts retrospectively assume
that their subject-matter knowledge was always there and that it is quite simple to under-
stand. In a preliminary study (Ostermann et al. 2015), we showed that teachers’ and
mathematics masters students’ estimations of task difficulty among eighth-grade students
were significantly correlated with their own effort to solve the tasks (independent of student-related task characteristics).

Fig. 2 A model for people’s assumptions about what other people know (taken from Nickerson 1999)

Taking teachers’ frequent overestimations of students’ performance into account, the
suitability of Nickerson’s model for describing diagnostic processes appears plausible: A
teacher might project his or her own mathematical expert
knowledge or skills onto students and thus expect a task to be easier than it actually is.
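The anchoring-and-adjustment account can be caricatured in a few lines of Python (a toy model with invented numbers, not parameters estimated in this or any cited study): the judge starts from his or her own near-certain solution probability and adjusts toward the students’ true rate, but only partially, so the judgment remains far above the empirical value.

```python
# Toy anchoring-and-adjustment model (illustrative numbers only).
def judged_rate(own_rate, true_rate, adjustment=0.4):
    """Start at the anchor (own_rate) and move a fraction `adjustment`
    of the way toward true_rate; adjustment < 1 means the correction
    is insufficient and the judgment stays biased toward the anchor."""
    return own_rate + adjustment * (true_rate - own_rate)

# An expert who solves the task with near certainty (95%) judging a
# task whose empirical solution rate among students is 9% (cf. Fig. 1):
print(round(judged_rate(95, 9), 1))  # 60.6: far above the empirical 9%
```

With full adjustment (adjustment = 1.0) the judgment would equal the true rate; the toy model simply encodes Tversky and Kahneman’s point that adjustments away from a salient anchor are typically too small.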
Applying this model to the diagnostic situation in which teachers estimate the students’
performance, one can describe the process of overestimation in more detail in the fol-
lowing way: Teachers have become experts for mathematics while studying mathematics
at university. Thus, teachers’ strategies for solving a task may strongly differ from stu-
dents’ strategies. A teacher who estimates the mathematical knowledge of an eighth-grade
student starts with his or her own mathematical knowledge, which constitutes the anchor.
He or she has to subtract unusual aspects of his/her own knowledge (such as knowledge of
higher mathematics, which may help to solve a task directly). Furthermore, he or she can
activate his/her knowledge of content and students with respect to the solution processes
expected from the category of eighth-grade students and his/her long-term knowledge of
specific students’ solutions or errors. The teacher can refine this model using information
obtained on an ongoing basis while observing the specific student. It is possible that the teacher
is not aware of all these different factors (or he/she weighs these factors insufficiently) and
thus imputes his/her own knowledge as a mathematical expert to the student, which might
lead to a typical misjudgment. This interpretation of the diagnostic process is supported by
the findings of Philipp and Leuders (2014), who conducted think-aloud interviews (Eric-
sson and Simon 1993) with teachers and prospective teachers who had to assess mathematical
tasks. The results indicated that teachers actually drew upon different sources, such as their
own solution of the task, an ideal student solution of the task, and knowledge about
students’ typical cognitions and misconceptions.
Building on these findings, we propose some specific relations between Nickerson’s
model and mathematical knowledge for teaching (MKT) with respect to diagnostic judg-
ments, as follows:
1. Own knowledge as an initial point: When estimating other people’s knowledge, experts
start with their own subject-matter knowledge (Nickerson 1999). As high school teachers
acquire comprehensive expertise on higher mathematics at university, they can easily
refer to this source of knowledge in their entire mathematical thinking. In diagnostic
situations, this type of knowledge leads to the illusion of simplicity and, consequently, to
the overestimation of students’ performance—e.g., when teachers project their own way
of solving a task onto students (Nathan and Petrosino 2003). Thus, teachers’ subject-
matter knowledge of higher mathematics falls into this category and constitutes the
anchor, which must be adjusted by the influence of the following categories.
2. Unusual aspects of own knowledge: Firstly, this category builds on teachers’
awareness of the fact that their subject-matter knowledge in higher mathematics is
generally unusual in the sense that laypersons (e.g., students) do not possess it. This
awareness may sensitize teachers when judging the state of knowledge of students, and
prevent teachers—to a certain extent—from imputing their own knowledge. Being
familiar with this tendency of overestimating students’ performance may have
corrective effects when selecting tasks or setting the aspiration level for tests.
Furthermore, teachers can use another more specific source of knowledge, which may
compensate for the pitfalls of referring to higher mathematics. According to Ball et al.
(2008), SCK is a type of mathematical subject-matter knowledge that is only needed
by mathematics teachers and not by people of other mathematics-related professions.
In diagnostic situations, SCK may provide information about the mathematical
components or steps of a solution that contribute to the complexity, either by
decompressing a solution or by considering various representations or mental models
for a mathematical concept. In this sense, SCK contains information about which
aspects of teachers’ knowledge of higher mathematics are unusual for non-teachers.
3. Knowledge about categories of people: This category contains knowledge about
students’ cognitions which cannot be deduced from mathematical subject-matter
knowledge, such as knowledge about curricular connections, solution steps that are
only used in classroom contexts and, most importantly, frequent misconceptions. With
reference to the classification of teacher knowledge by Ball et al. (2008), this category
corresponds to the knowledge of content and students (KCS).
4. Long-term knowledge of specific others and information obtained on an ongoing basis:
These categories refer to the experiences with specific students that are acquired
during the years of teaching practice; this enriches the more categorical knowledge of
the factor described previously.
Of course, the demarcations between the categories are fluid, just as Ball and colleagues
observed for the MKT subcategories. However, these delineations can still be useful as a
working model to describe teachers’ diagnostic judgments. Figure 3 summarizes these
considerations. We rely on the original structure of Nickerson’s model and adapt the
content to the teachers’ estimations of students’ performance:
If teachers estimate non-specific student groups (see ‘‘Accuracy of teachers’ judgments
on task difficulties’’ section), only the knowledge in the gray area of Fig. 3 can be used. This
type of knowledge can be supported by empirical research findings and, according to
Nickerson’s model, can serve as a helpful foundation for the estimation of specific
students. In order to improve teachers’ diagnostic judgments, one can refer to this model
adapted to diagnostic situations (Fig. 3) and devise various debiasing methods. For
example, one could refer to factor (2) (unusual aspects) and sensitize teachers to the
exclusivity of their knowledge of higher mathematics in order to prevent them from
imputing their own knowledge to others. Alternatively, one could request that teachers
decompress tasks before estimating their difficulty (as in Morris et al. 2009). An inter-
vention that concentrates on the factors mentioned in (4) would guide teachers in analyzing
solution processes of their own students (as in Carpenter et al. 1989).

Fig. 3 A model for teachers’ assumptions on what students know (based on Nickerson 1999). The figure adapts the structure of Fig. 2: the model of one’s own knowledge of higher mathematics serves as the default model of a random student’s knowledge; it is adjusted by unusual aspects of mathematical subject-matter knowledge (knowledge of the expert blind spot and sensitization for misjudgments; specialized content knowledge (SCK); decompression and ideal solutions) and by knowledge of categories of students, e.g., eighth-grade students (knowledge of students’ misconceptions, cognitions, typical solutions, and curricular content), both supported by knowledge gathered from empirical research. The resulting initial model of specific students’ knowledge is refined into a working model through long-term knowledge of specific students and information obtained on an ongoing basis.

When devising a debiasing intervention that focuses on factor (3), which aims at knowledge about non-
specific groups, namely the category of eighth-grade students, one can present information
on students’ misconceptions and their barriers to understanding. In the study presented in
the next section, we created and examined two of these debiasing approaches. Nickerson
(2001) already stated an expected outcome of such an effort: ‘‘Simple awareness of a
tendency to overimpute one’s own knowledge to others may be helpful, but probably not
fully corrective. How best to teach people to make more accurate estimates of what other
people know, and to counteract the tendency to overimpute their own knowledge to others,
remains a challenge to research.’’ (p. 172)

Methodology

Main aim, research design, and hypotheses

The aim of our present study was to investigate whether prospective teachers’ diagnostic
judgments can be improved by suitable training. We focus on the estimation of the difficulty
of given tasks in the area of functions and graphs, because functions are an important and
ever-present domain in the mathematics curriculum and in teacher education. We decided to
investigate prospective teachers’ judgments on non-specific student groups in order to avoid
confounding variables as explained in 2.1. and thus enhance the statistical comparability of
participants’ performance. Although such estimations require the integration of several
sources of knowledge, such as the complexity of different mathematical concepts (SCK), we
concentrate the training only on the transfer of knowledge about students’ misconceptions
(as a part of KCS) in order to outline the significance of this domain in diagnostic activities.
This specialization is based on the following reasons: Since we regard diagnostic activity as
a process of taking students’ perspectives with respect to Nickerson’s model, we first
assume that KCS, as a part of knowledge about what categories of people know, is an
important component of students’ perspectives. Secondly, even practicing teachers seem to
be unaware of students’ misconceptions in the area of functions and graphs (Hadjidemetriou
and Williams 2002b), which might be one reason for the overestimation of students’ per-
formance. In our intervention study, we investigated three groups of participants with
different forms of input between pre- and posttest.
We provided one group of participants with empirically verified knowledge about
students’ misconceptions (PCK group). This type of knowledge should supply diagnosti-
cally relevant information about the tasks and help the participants to compare the tasks
with respect to their difficulty by taking specific task features into account. The acquisition
of this knowledge is expected to consequently lead to improvements in the accuracy of the
predicted rank order.
Furthermore, familiarization with students' misconceptions might sensitize participants
to their partial ignorance (blind spot) of relevant task characteristics. This insight could
reveal that tasks were more difficult for students than they previously imagined. This
change in perspective should lead to improvements in the estimation of solution rates,
because the awareness of difficult task characteristics should sensitize them to overestimation.
To ensure that these possible effects are not based on pure sensitization to overestimation
alone, but on specific substantial knowledge, we provided another group of
participants only with a non-specific short text on the expert blind spot in the domain of
teacher judgments (sensitizing group). This information is expected to make participants
aware of the exclusivity of their mathematical expert knowledge, which constitutes unusual aspects of their knowledge in terms of Nickerson's model. This form of sensitization, however, does not
deliver any specific information on task features. Therefore, it may, at best, only reduce the
overestimation of solution rates, while having no influence on the accuracy of the predicted
rank order of the tasks. Although the sensitizing group can be regarded as a comparison
group, we also included a control group that did not receive any input at all between pre- and
posttest, in order to exclude repetition effects.
Summarizing these considerations, we propose the following three hypotheses:
1. The PCK group improves in the accuracy of predicted solution rates and in the
accuracy of predicted rank order.
2. The sensitizing group only improves in terms of the accuracy of predicted solution
rates, but not in the accuracy of the predicted rank order.
3. The control group remains constant in both the accuracy of predicted solution
rates and the accuracy of predicted rank order.

Participants

To ensure that all participants were on a comparable level with respect to their mathematical
competence and their teaching experience, we investigated 107 prospective teachers (78
females, 29 males) from the University of Education of Freiburg (Germany), who were all
aiming to earn a teaching degree in mathematics. The participants’ mean age was 22.52
(SD = 2.35). They had attended courses on learning and teaching algebra and functions,
participated in an internship at a school, and some had individual experiences in private
tutoring. Through these courses, they were familiar with the curriculum of eighth-grade
students. They all participated in a course about ‘‘functions,’’ which is typically attended in
the third or fourth semester. Our study took place within one session of this course.
The participants were randomized into one of three groups: Group 1 (N = 34), which
received PCK training; group 2 (N = 37), which was merely sensitized with respect to
overestimation; and group 3 (N = 36), a control group. The participants were rewarded
with book vouchers (€10 per participant).

Materials and design

The study followed the design of an intervention study with three groups: two treatment groups
and one control group. All parts of the study—pretest, intervention, and posttest—were carried
out by an online survey (generated with www.soscisurvey.com) within one 90-min session.

Pretest and posttest

Tasks We used items from a previous study (Leuders et al. 2017), which provided a large
set of 80 tasks to assess the competence of eighth-grade students on functions and graphs.
The ecological validity and content validity of these tasks were ensured by drawing upon
tasks from regular mathematics textbooks and through the use of think-aloud interviews
(Leuders et al. 2017). For all tasks, the empirical solution rates of students from different
German high schools (Gymnasium) were surveyed. The authors provided the items of the
study and data of 230 students, who constituted a heterogeneous sample with a typical
range of performance.
For the study presented here, we selected ten of these tasks, with a mean empirical
solution rate of 49.86% (SD = 22.06%). These items were chosen to span a broad distribution of empirical solution rates and to use different kinds of problems (for examples, see Appendix Figs. 8, 9, 10, 11). These empirical solution rates constitute the reference values for our study, which were used for comparison with the participants' estimations (see ''Accuracy of teachers' judgments on task difficulties'' section).
Estimation of solution rates After the presentation of each task, the participants were
asked to answer the following two questions:
1. ‘‘Predict how difficult an average student from an eighth-grade high school class
(gymnasium) would presumably find this task.’’ (Six-point scale: very easy, easy,
rather easy, rather difficult, difficult, and very difficult.) The entries of the Likert scale
were not analyzed, but served as an anchor or a first orientation for the participants to
perform the next more precise measure.
2. ‘‘Imagine an average eighth-grade high school class with 30 students with different
abilities. How many students would presumably solve this task properly?’’ (Numerical
entry between 0 and 30.)
Estimation of rank order All ten tasks were simultaneously presented on the screen in
the form of cards featuring task pictures. The estimation of the rank order was performed
by sorting the cards in terms of their difficulty via drag-and-drop.
Procedure First of all, the participants received a printout with all ten tasks that were to
be estimated in pre- and posttest. Pretest and posttest were carried out in the same manner:
To provide the participants with an overview of the range of empirical solution rates and
the content areas of the tasks, they were first shown two specifically selected tasks, which
were not to be estimated: one very easy task, and one task that was very difficult for the
students to solve (see Appendix Figs. 6, 7). These two examples should serve as anchors
for participants’ estimations. Afterward, the participants were asked to estimate both the
empirical solution rate and the rank order of task difficulty, as described earlier. For the
estimation of the rank order, the participants were instructed to refer to the items on the
printout in case they could not easily read all details of the cards on the screen.
Preparation of pre- and posttest data After the data collection was performed for each
participant, four values were calculated: the average overestimation of the empirical solution
rate for both pre- and posttest, and the correlation between estimated and empirical rank
order for both pre- and posttest (as described in ''Accuracy of teachers' judgments on task
difficulties’’ section). For these calculations, the empirical solution rates derived from the
sample of Leuders et al. (2017) served as reference values.
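For a single participant, these two accuracy measures can be illustrated with a short computation. The following sketch uses hypothetical estimates for five tasks (not the study's actual data or analysis code; the function names are our own, and the simple ranking ignores ties):

```python
import math

def rank(values):
    """Rank of each value (1 = smallest); ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def judgment_accuracy(estimated, empirical):
    """Mean overestimation (in percentage points) and the rank
    correlation between estimated and empirical solution rates."""
    n = len(empirical)
    overestimation = sum(e - s for e, s in zip(estimated, empirical)) / n
    rank_corr = pearson(rank(estimated), rank(empirical))
    return overestimation, rank_corr

# Hypothetical estimated vs. empirical solution rates (%) for five tasks
estimated = [70, 55, 60, 40, 35]
empirical = [50, 45, 65, 20, 30]
over, rho = judgment_accuracy(estimated, empirical)
print(over)           # 10.0 (mean overestimation in percentage points)
print(round(rho, 2))  # 0.8 (rank correlation)
```

Correlating the two rank vectors with Pearson's formula yields Spearman's rank correlation, which is why the rank-order accuracy can be treated as a correlation in the subsequent analyses.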

Intervention

Between the pre- and posttest, we provided different interventions: Group 1 (PCK group)
was provided with PCK about functions and graphs, with a focus on students’ miscon-
ceptions; group 2 (sensitizing group) was sensitized by reading a text about the expert blind
spot; and group 3 (control group) did not receive an intervention:
• Group 1 was provided with PCK about students’ misconceptions on ‘‘Functions and
Graphs,’’ which were explained by short texts (80–120 words in length). The selection
and explanation of these misconceptions were based on the classification of
Hadjidemetriou and Williams (2002a), which refers to the linearity prototype, origin
prototype, smooth prototype, continuous prototype, graph-as-picture error, slope-height
confusion, and misreading the scale (for further explanations see Appendix
Fig. 12). Then, the participants performed the following assignments:


1. They rated the relevance of each misconception in everyday lessons on the topic of
functions and graphs with respect to an average eighth-grade high school class
(nine-point scale, with responses ranging from ‘‘extremely relevant’’ to ‘‘not at all
relevant’’) (one screen page, see Appendix Fig. 12).
2. They rated the relevance of each misconception for five given tasks, which were
different from the tasks in the test with respect to an average eighth-grade high school
class (nine-point scale, with responses ranging from ‘‘extremely relevant’’ to ‘‘not at all
relevant’’) (five screen pages, one for each task, see Appendix Fig. 13 for an example).
3. They wrote down the potential difficulties for each of these five new tasks with
respect to an average eighth-grade high school class (description field) (five screen
pages, one for each task, see Appendix Fig. 14 for an example).
4. They explained given solution rates for three further graphical tasks, after having
estimated their difficulty with respect to an average eighth-grade high school class
(three screen pages, one for each task).
5. They rated the relevance of each misconception in their memories of their own
school days (nine-point scale, with responses ranging from ''extremely relevant'' to
''not at all relevant'') (one screen page).

These assignments were given to engage the participants intensively with the required
content. Thus, the survey entries obtained during this intervention period were evaluated
with respect to processing time and answer length (the number of words in the open-answer
forms) to check the participants' engagement.
• Group 2, the sensitizing group, did not receive any PCK, but these participants were
sensitized generally to the expert blind spot, as they had to read the following text:

Empirical research has largely shown that, when estimating the difficulty of tasks, some teachers
tend to produce substantial misestimations. This can be attributed to their extensive and
well-linked content knowledge. Teachers are susceptible to the so-called illusion of simplicity,
and they tend to estimate that tasks are easier than they actually are for students. We
ask you now to estimate the tasks from the beginning of the questionnaire once more.

• Group 3 constituted the control group, and the participants in this group did not receive
any kind of intervention between the pretest and posttest periods.
Finally, in order to treat all groups equally, groups 2 and 3 were also provided with the
PCK instruction of group 1 after the posttest.

Results

Overestimation of the empirical solution rate

A Kolmogorov–Smirnov test showed that the overestimation values were sufficiently normally
distributed, which allows for the comparison of means using parametric tests. To measure
the improvement in the estimation between pre- and posttest in the three groups, a repe-
ated-measures analysis of variance was conducted. It could be shown that the estimation of
task difficulty improved in both the PCK group and the sensitizing group. In the control
group, the estimation remained constant.

Fig. 4 Mean overestimation of the solution rates (%) at pretest and posttest for the PCK instruction, sensitization, and control groups

Table 2 Overview of the precise statistical values for overestimation

Group             Mean pretest   Mean posttest   SD pretest   SD posttest   df   t        Sig. (2-tailed)   Cohen's d
PCK instruction   15.786         3.729           11.056       10.406        33   9.581    <0.001            1.123
Sensitization     9.941          4.812           10.130       10.460        36   4.698    <0.001            0.498
Control           11.223         11.519          11.989       11.292        35   −0.331   0.742             0.025

Accordingly, there was a significant interaction between the treatment and time of
measurement, F(2, 104) = 23.400, p < 0.001, partial η² = 0.31. Evidently, mean
overestimation changed differently from pre- to posttest, depending on the type of treatment
(Fig. 4).
To obtain an overview of the precise values, and to point out the effect size of the
improvement (i.e., the reduction) of the overestimations, we report in Table 2 (for each
group) the results of the one-sample t-tests for repeated measures using the Bonferroni-adjusted
alpha level, p < 0.01.
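The per-group values in Table 2 follow from t-tests on the pre-posttest difference scores; the reported effect sizes appear consistent with Cohen's d computed as the mean pre-post difference divided by the average of the pre and post standard deviations. A minimal sketch with made-up scores (the function name and data are illustrative only, not the study's data):

```python
import math

def sd(xs):
    """Sample standard deviation (denominator n - 1)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def repeated_measures_test(pre, post):
    """t-test on the pre-post difference scores (df = n - 1) and Cohen's d
    as mean difference / average of the pre and post standard deviations."""
    n = len(pre)
    diffs = [a - b for a, b in zip(pre, post)]
    mean_diff = sum(diffs) / n
    t = mean_diff / (sd(diffs) / math.sqrt(n))
    cohens_d = mean_diff / ((sd(pre) + sd(post)) / 2)
    return t, cohens_d

# Made-up overestimation scores (%) of five participants, pre and post
pre = [18, 22, 15, 25, 20]
post = [10, 12, 9, 15, 14]
t, d = repeated_measures_test(pre, post)
```

Testing the mean of the difference scores against zero with a one-sample t-test is equivalent to a paired t-test on the pre and post scores.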

Rank correlation

As Pearson's correlation coefficients are not interval-scaled (meaning that their variance
cannot be considered independently of their magnitude), Fisher's z-transformation was carried out.
Fisher's z-transformed correlations are approximately interval-scaled (Cohen
1988), and a Kolmogorov–Smirnov test showed that these values were sufficiently normally
distributed, which allows us to compare means using parametric tests.
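Fisher's transformation is z = arctanh(r) = ½·ln((1 + r)/(1 − r)); correlations are then averaged and compared on the z-scale and can be back-transformed with tanh. A brief sketch (the correlation values are illustrative, not the study's data):

```python
import math

def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient r."""
    return math.atanh(r)  # = 0.5 * log((1 + r) / (1 - r))

# Equal steps in r correspond to unequal steps in z: the scale is
# stretched near |r| = 1, which makes z approximately interval-scaled.
print(round(fisher_z(0.3) - fisher_z(0.1), 3))  # 0.209
print(round(fisher_z(0.9) - fisher_z(0.7), 3))  # 0.605

# Correlations are averaged in z-space, then back-transformed:
rs = [0.58, 0.43, 0.35]  # illustrative rank correlations
mean_z = sum(fisher_z(r) for r in rs) / len(rs)
mean_r = math.tanh(mean_z)
```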
In order to test whether Fisher’s values for the rank correlation improved, a repeated-
measures analysis of variance was conducted as well. Only the values for the PCK group
improved significantly, whereas the values for the sensitizing group and the control group
remained constant.
We can report that there was also a significant interaction between the treatment and
time of measurement, F(2, 104) = 7.43, p = 0.001, partial η² = 0.126. This effect tells us
that the Fisher's z-values changed in accordance with the treatment used (Fig. 5).

Fig. 5 Mean Fisher's z-values at pretest and posttest, representing the average rank correlations between the estimated and empirical solution rates, for the PCK instruction, sensitization, and control groups

Table 3 Overview of the precise statistical rank correlation values (Fisher's z-values)

Group             Mean pretest   Mean posttest   SD pretest   SD posttest   df   t        Sig. (two-tailed)   Cohen's d
PCK instruction   0.583          0.782           0.318        0.309         33   −3.654   0.001               0.634
Sensitization     0.431          0.392           0.465        0.436         36   1.397    0.171               0.087
Control           0.350          0.361           0.485        0.553         35   −0.216   0.830               0.021

To obtain an overview of the precise values, and to also point out the effect size of the
improvement in the rank correlation, we report in Table 3 (for each group) the results of the
one-sample t-tests for repeated measures using the Bonferroni-adjusted alpha level, p < 0.01.
The highly significant mean difference in Fisher's z-values in the PCK instruction group
between the pretest and posttest seems to be remarkable, as we observed a rather large
effect of Cohen's d = 0.634 (see Table 3).
No gender differences were found in the entire dataset. As expected, the PCK instruction
group's processing time during the intervention (M = 30.45 min; SD = 5.98 min) and the
length of the open answers (M = 199.7 words; SD = 70.9 words) indicated that the
participants achieved the desired level of engagement with the survey.

Discussion

Summary and interpretation of results

To date, the model for the formation of people’s knowledge about other people’s
knowledge proposed by Nickerson (1999) has not been used in mathematics education.
However, it seems to be applicable as a means through which to interpret processes of
diagnostic judgments. We provided relations between the categories of Nickerson’s model
and diagnostically relevant aspects of mathematical knowledge for teaching (MKT, Ball
et al. 2008). This model also served as a conceptual framework for generating assumptions
used to interpret the results of the present study.


In the pretest, all groups overestimated the solution rates of the tasks presented, and they
performed rather poorly in the estimation of the rank order. In accordance with Nickerson’s
model, this phenomenon could be interpreted as a symptom of the expert blind spot, which
results from overimputing one’s own knowledge to others.
The main interest of this study was to investigate the impact of substantial PCK on the
accuracy of judgments (PCK group). The intervention contained instructions about stu-
dents’ misconceptions in the domain of functions and graphs, drawing upon previous
research findings (Hadjidemetriou and Williams 2002a; Leinhardt et al. 1990). The
treatment produced significant improvements with high effect sizes in both components of
judgment accuracy, in the estimation of expected solution rates, and in the expected rank
order of task difficulty. This type of treatment refers to knowledge of content and students
(Ball et al. 2008), which can be regarded as a part of knowledge about categories of people
in Nickerson’s model. The newly acquired knowledge appears to have enabled the par-
ticipants to take a wider range of task characteristics into account and consequently to
perform better at ranking tasks according to their difficulty. Furthermore, the participants
might have been sensitized to the exclusivity of their mathematical expert knowledge by
being confronted with task difficulties that they were not previously aware of. This sen-
sitization to the exclusivity of their expert knowledge (as unusual aspects of one’s own
knowledge), together with the knowledge about students’ misconceptions (as a part of
knowledge about categories of people), apparently reduced the dominance of the anchor of
their previous mathematical expert perceptions of task difficulty.
The difficulty of a complex task is not only dependent on a single possible miscon-
ception, nor can the influence of different sources of difficulty be weighed in a simple and
straightforward manner. To perform an appropriate estimation, participants first have to
recognize any potential misconceptions and, second, they need to integrate knowledge
from additional and different sources (e.g., SCK, decompression), which must be weighted
sensibly. For that reason, the effectiveness of such an intervention can by no means be
taken for granted. However, the participants in the PCK group were obviously able to
transfer the newly acquired knowledge to the tasks to be estimated; this allowed them to
integrate this new knowledge into their preexisting knowledge about the mathematical task
complexity. As the control group’s performance remained constant in both criteria, one can
be sure that the improvements really resulted from the treatment and not from repetition
effects.
To ensure that the improvements in the PCK group can be interpreted as a result of
substantial PCK and not only of a general sensitization by the treatment, a second
group was investigated (sensitizing group). This second condition involved sensitizing
teachers to the expert-blind-spot phenomenon by a short text, which only reduced the teachers'
tendency to overestimate solution rates, but had no impact on rank-order accuracy. This
information possibly made participants aware of the exclusivity of their mathematical expert
knowledge, which refers to the unusual aspects of own knowledge in Nickerson’s model.
However, sensitization—as conducted here—must be seen as a superficial intervention,
which gives reason to assume that participants in the sensitizing group had only systemati-
cally reduced the value of the expected solution rate in the posttest without taking substantial
qualitative reflections about task characteristics into account.
These findings also contribute to clarifying the interpretation of the two types of
measuring diagnostic accuracy, which have already been identified as empirically distinct
by Spinath (2005). The results suggest that the rank-order component seems to be an
indicator of explicit knowledge about task content. Thus, one could be inclined to assume
that the accuracy of expected solution rates is a rather weak criterion of diagnostic
competence, one that can be easily manipulated. Nonetheless, an improvement in one's
ability to correctly estimate solution rates is highly desirable for setting adequate
aspiration levels in lessons or tests that are appropriate for the whole class.
As a conclusion, one may say that knowledge about students' misconceptions plays a
crucial role in fostering diagnostic competence in teacher education. In particular, it could
be shown that the accuracy of diagnostic judgments can be improved by training in quite a
short period of time. Consequently, diagnostic accuracy is not only acquired by practical
experience, but it can also be obtained by systematic instructions pertaining to empirically
confirmed PCK. These findings suggest that prospective teachers are not necessarily caught
in the expert blind spot, but that they are able to adjust their points of view to students’
perspectives.

Limitations and directions for future research

We would like to draw attention to some important restrictions associated with the
interpretability of the results. Recent research on diagnostic judgments has primarily
focused on teachers’ estimations of specific student groups (Schrader 2009; Spinath 2005;
Südkamp et al. 2012). We operationalized diagnostic competence by the judgment accu-
racy related to non-specific groups of students (see ‘‘Accuracy of teachers’ judgments on
task difficulties’’ section). Apart from the many advantages of this operationalization (such
as placing a greater focus on tasks than on familiarity with special students, or on the
comparability of groups), there is also the limitation that this measure has not yet been
compared to measures that refer to specific groups known to the teacher. Further research
could illuminate the extent to which these measures correlate to each other and can be
influenced by different types of interventions.
Furthermore, the intervention and the test had to be accomplished within 90 min, as the
time was restricted by the duration of the session. Therefore, it was not possible to collect
further data. To gain more insight into the actual application of different types of
knowledge, it would be illuminating to shed light on the various cognitive processes
involved by collecting qualitative data with think-aloud interviews (Ericsson and Simon
1993), which would be conducted during or after the process of estimation. This way it
might be possible to gain knowledge on how the intervention content influences the pro-
cess of judgment.
In this study, we only focused on students’ misconceptions. However, the difficulty of a
given task in the area of functions and graphs results from the interaction of a number of
different aspects. Specifically, these include different conceptions of function
(Leinhardt et al. 1990), different representations as well as the changes between these
representations (Swan 1986; Duval 2002), and certain aspects of the definition and
epistemology of the concept of function (Harel and Dubinsky 1992; Gagatsis and Monoyiou
2011). Instructions about the entire range of task difficulty should include these aspects as
well. Moreover, it would be interesting to obtain confirming information about possible
long-term effects of such debiasing methods.
With regard to Nickerson’s model (1999), the present study only focused on the compo-
nents ‘‘model of unusual aspects of own knowledge’’ and ‘‘knowledge of what categories of
people know.’’ Although these aspects seem to play an important role in the process of
diagnostic judgments, it would be desirable to go further and investigate the role of the
remaining categories (‘‘long-term knowledge of specific others’’ and ‘‘information obtained
on an ongoing basis'') to cover the model as a whole. Other parts of Nickerson's model
may also be supported by further empirical investigations; thus, future research might show
that this model can serve as a useful means through which to comprehensively
describe the cognitive processes involved in making diagnostic judgments.
It must also be noted here that the two treatments of the PCK group and the sensitizing
group were quite different in their duration and substance, which makes achieving a
balanced comparison of the two difficult. Of course, these two different instructions should
not be regarded as equivalent treatments. As mentioned earlier, the sensitizing group was
included as a means through which to ensure that improvements in the PCK group not only
resulted from pure sensitization, but from substantial knowledge. It cannot be ruled out that
more extensive and more sophisticated types of sensitization will have greater effects on
different measures of judgment accuracy.
In this study, we did not carry out a test on participants’ prior knowledge on functions
and graphs. However, there is research on practicing and prospective teachers’ under-
standing of functions and graphs which shows that their knowledge is misaligned with
expert knowledge (Even 1993; Hadjidemetriou and Williams 2002b). This might be a
problem, since a teacher’s misestimation has a different explanation, when he or she
holds the same misconceptions as attributed to students. However, all participants in our
study had completed a thorough mathematical education, including all previous exams,
and had in particular attended a university course on functions. Thus,
we assume that the participants had no difficulties with solving the tasks correctly on
their own and therefore attribute the participants’ pretest overestimations to the expert
blind spot.
In summary, the present study must be seen as a first attempt to acquire valid infor-
mation about the possibilities of improving diagnostic judgment accuracy. For further
research, it would be desirable to collect more explanatory variables that contain additional
qualitative aspects, so as to investigate the impact of sophisticated SCK and to conduct
follow-up tests to illuminate the full potential of such debiasing approaches.

Implications for practice

The present study suggests that knowledge about students’ misconceptions has a positive
impact on the diagnostic judgments of prospective teachers. This means that this important
facet of diagnostic competence can be achieved not only through practical teaching experience,
but also by way of systematic instruction. University lecturers and teacher trainers could
provide systematic knowledge about students’ misconceptions and their barriers to
understanding, which are not only based on teachers’ own teaching experience, but also on
the results of empirical research. Courses that present and discuss such empirical findings
could be included in the curricula of universities of education and in teacher professional
development. Of course, such knowledge should not only be presented theoretically, but
also in the context of diagnostic situations. Since such situations can already be simulated
at the university level with little direct contact with students, fostering diagnostic competence
can be part of the first phase of teacher education. To provide similar learning opportunities for
practicing teachers, one could either build on real students’ work (e.g., Busch et al. 2015)
or deliver knowledge on students’ thinking and errors in connection with test instruments
(e.g., Stacey et al. in press).
Since there is strong evidence that educating teachers in analyzing their students'
thinking, when supported by systematic research knowledge, can have an impact on the
teachers' behavior in the classroom and on their students' achievement (Carpenter et al.
1999), it would be worthwhile not only to further investigate the structure of diagnostic
competence but also to develop and evaluate ways to foster it during teacher education.

Appendix

See Figs. 6, 7, 8, 9, 10, 11, 12, 13 and 14.

Fig. 6 Anchoring example for a very easy task presented before the pretest (empirical solution rate 89.7%)

The following figure shows the graph of a function.

Which is the largest visible y-value of the function?

At x = ____ the largest y-value is y = ____ .

Draw in the following coordinate system the graph of a function that complies with the following
statements:

1) At x = -4 the function value is y = 3.

2) At x = 0 the function value is y = -2.5

3) The graph decreases until x = 0; after that, it increases

Fig. 7 Anchoring example for a very difficult task presented before the pretest (empirical solution rate
8.1%)


Fig. 8 ''Hiking tour'' is one of the tasks to be estimated in the pre- and posttest (empirical solution rate 44%)

The graph shows the course of a hiking tour.
[Graph: miles per hour vs. hours]
How fast were the hikers after their break?

Answer: ________ miles per hour

In a crash test, a car crashes head-on into a wall.

[Four time–velocity charts labeled A, B, C, and D]

Choose the correct time–velocity chart that describes the situation.

Fig. 9 ''Crash-Test'' is one of the tasks to be estimated in the pre- and posttest. It is possible that the graph-as-picture error or the linearity or smooth prototype might occur (empirical solution rate: 54.2%)


Each of the two charts shows a section of a linear function:

For which value of x do the two functions have the same y-value?

Please solve the problem with a drawing:

Fig. 10 ‘‘Intersection’’ is one of the tasks to be estimated in the pre- and posttest. Common errors are
misreading the scale and assuming that the functions are restricted to the charts (empirical solution rate
20.9%)


The following shows the graph of a function:

Please mark on the x-axis the interval in which the function has its largest slope.

Mark this way:

Fig. 11 ‘‘Slope’’ is one of the tasks to be estimated in the pre- and posttest. A common error is the slope-
height confusion (empirical solution rate 79.8%)


How relevant are the following misconceptions in lessons on the topic of functions and graphs in
your opinion?

Please read the description of each misconception and relate your reflections to an average 8th
grade high-school class. Then rate the relevance of each misconception by clicking on the option that
corresponds to your estimation.

not at all relevant                                                extremely relevant

Linearity Prototype
Some pupils sketch linear graphs in O O O O O O O O O
situations where this is not adequate.

Origin Prototype
Some pupils treat the origin as an
indispensable part of the graph, which O O O O O O O O O
leads them to draw all their graphs
through it.

Smooth Prototype
Some pupils sketch graphs in O O O O O O O O O
situations where this is not adequate.

Continuous Prototype
Some pupils sketch continuous graphs O O O O O O O O O
in situations where this is not adequate.

Graph as Picture Error


Some pupils, unable to treat the graph
as an abstract representation of O O O O O O O O O
relationships, appear to interpret it as a
literal picture of the underlying situation.

Slope Height Confusion


Some pupils cannot distinguish between
the highest value and the greatest slope, O O O O O O O O O
thus the height is serving as a powerful
distractor when interpreting the slope.

Misreading the Scale


Some pupils read a scale by counting O O O O O O O O O
wrong units

Fig. 12 Intervention item: Rating the item relevance of students’ misconceptions in lessons on the topic of
‘‘functions and graphs’’


Please read the following task:

Please draw a sketch for the change of the filling height while filling the container shown below.

[Figure: container shape; answer chart with axis labeled ‘‘filling height’’]

How relevant, in your opinion, are the following misconceptions with respect to the difficulty of the task shown above?

Relate your reflections to an average 8th grade high-school class. Then rate the relevance of each misconception by clicking on the option that corresponds to your estimation.

not at all relevant                                             extremely relevant

Linearity Prototype
Some pupils sketch linear graphs in situations where this is not adequate.
O O O O O O O O O

Origin Prototype
Some pupils treat the origin as an indispensable part of the graph, which leads them to draw all their graphs through it.
O O O O O O O O O

Smooth Prototype
Some pupils sketch smooth graphs in situations where this is not adequate.
O O O O O O O O O

Continuous Prototype
Some pupils sketch continuous graphs in situations where this is not adequate.
O O O O O O O O O

Graph as Picture Error
Some pupils, unable to treat the graph as an abstract representation of relationships, appear to interpret it as a literal picture of the underlying situation.
O O O O O O O O O

Slope Height Confusion
Some pupils cannot distinguish between the highest value and the greatest slope; thus the height serves as a powerful distractor when interpreting the slope.
O O O O O O O O O

Misreading the Scale
Some pupils read a scale by counting wrong units.
O O O O O O O O O

Fig. 13 Intervention item: Rating the item relevance of students’ misconceptions with respect to a given
task


Please read the following task carefully:

A craftsman charges 20€ for travel costs plus 35€ for each working hour begun.

Draw a suitable graph for the total cost of the craftsman’s work up to a working time of 5 hours.

Which potential difficulties might occur while solving this task?

Relate your reflections to an average 8th grade high-school student:

Fig. 14 Intervention item: Analyzing potential difficulties of a given task

References
Anders, Y., Kunter, M., Brunner, M., Krauss, S., & Baumert, J. (2010). Diagnostische Fähigkeiten von Mathematiklehrkräften und ihre Auswirkungen auf die Leistungen ihrer Schülerinnen und Schüler. [Mathematics teachers’ diagnostic skills and their impact on students’ achievements]. Psychologie in Erziehung und Unterricht, 3, 175–193.
Ball, D. L., Thames, M. H., & Phelps, G. (2008). Content knowledge for teaching what makes it special?
Journal of Teacher Education, 59(5), 389–407.


Baumert, J., Kunter, M., Blum, W., Brunner, M., Voss, T., Jordan, A., et al. (2010). Teachers’ mathematical
knowledge, cognitive activation in the classroom, and student progress. American Educational
Research Journal, 47(1), 133–180.
Bell, A., & Janvier, C. (1981). The interpretation of graphs representing situations. For the Learning of
Mathematics, 2(1), 34–42.
Berliner, D. C. (1994). Expertise: The wonders of exemplary performance. In C. C. Block & J. N. Mangieri
(Eds.), Creating powerful thinking in teachers and students (pp. 141–186). New York: Holt, Rinehart
& Winston.
Busch, J., Barzel, B., & Leuders, T. (2015). Promoting secondary teachers’ diagnostic competence with
respect to functions: Development of a scalable unit in Continuous Professional Development. ZDM
Mathematics Education, 47(1), 1–12.
Carpenter, T. P., Fennema, E., Franke, M. L., Levi, L., & Empson, S. B. (1999). Children’s mathematics: Cognitively guided instruction. Portsmouth, NH: Heinemann.
Carpenter, T. P., Fennema, E., Peterson, P. L., Chiang, C., & Loef, M. (1989). Using knowledge of
children’s mathematics thinking in classroom teaching: An experimental study. American Educational
Research Journal, 26(4), 499–531.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum.
Cronbach, L. J. (1955). Processes affecting scores on ‘‘understanding of others’’ and ‘‘assumed similarity’’.
Psychological Bulletin, 52, 177–183.
Demaray, M. K., & Elliot, S. N. (1998). Teachers’ judgments of students’ academic functioning: A com-
parison of actual and predicted performances. School Psychology Quarterly, 13(1), 8–24.
Dünnebier, K., Gräsel, C., & Krolak-Schwerdt, S. (2009). Urteilsverzerrungen in der schulischen Leis-
tungsbeurteilung: Eine experimentelle Studie zu Ankereffekten. [Biases in Teachers’ Assessments of
Student Performance: An Experimental Study of Anchoring Effects.]. Zeitschrift für Pädagogische
Psychologie, 23(3–4), 187–195.
Duval, R. (2002). Representation, vision and visualization: Cognitive functions in mathematical thinking. Basic issues for learning. In F. Hitt (Ed.), Representations and mathematics visualization. Mexico City: Cinvestav-IPN, Departamento de Matemática Educativa.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (rev. ed.). Cambridge, MA: MIT Press.
Even, R. (1993). Subject-matter knowledge and pedagogical content knowledge: Prospective secondary
teachers and the function concept. Journal for Research in Mathematics Education, 24(2), 94–116.
Fischhoff, B. (1975). Hindsight is not equal to foresight: The effect of outcome knowledge on judgment
under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1(3),
288.
Gagatsis, A., & Monoyiou, A. (2011). The structure of primary and secondary school students’ geometrical figure apprehension. In B. Ubuz (Ed.), Proceedings of the 35th conference of the international group for the psychology of mathematics education (pp. 2–369).
Hadjidemetriou, C., & Williams, J. (2002a). Children’s graphical conceptions. Research in Mathematics
Education, 4(1), 69–87.
Hadjidemetriou, C., & Williams, J. (2002b). Teachers’ pedagogical content knowledge: Graphs, from a
cognitivist to a situated perspective. In A. D. Cockburn, & E. Nardi (Eds.), Proceedings of the 26th
conference of the international group for the psychology of mathematics education (Vol. 3, pp. 57–64).
Norwich: PME.
Harel, G., & Dubinsky, E. (1992). The process conception of function. In G. Harel & E. Dubinsky (Eds.),
The concept of function: Aspects of epistemology and pedagogy (pp. 85–106). New York: Mathe-
matical Association of America.
Helmke, A., & Schrader, F.-W. (1987). Interactional effects of instructional quality and teacher judgement
accuracy on achievement. Teaching and Teacher Education, 3(2), 91–98.
Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on prediction of
novice performance. Journal of Experimental Psychology: Applied, 5(2), 205–221.
Hoge, R. D., & Coladarci, T. (1989). Teacher-based judgments of academic achievement: A review of
literature. Review of Educational Research, 59(3), 297–313.
Kaiser, J., Helm, F., Retelsdorf, J., Südkamp, A., & Möller, J. (2012). Zum Zusammenhang von Intelligenz
und Urteilsgenauigkeit bei der Beurteilung von Schülerleistungen im Simulierten Klassenraum. [On
the Relation of Intelligence and Judgment Accuracy in the Process of Assessing Student Achievement
in the Simulated Classroom.] Zeitschrift für Pädagogische Psychologie, 251–261.
Kelley, C. M. (1999). Subjective experience as basis of ‘‘objective’’ judgments: Effects of past experience
on judgments of difficulty. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII: Cognitive regulation of performance: Interaction of theory and application (pp. 515–536). Cambridge, MA: MIT Press.
Klug, J., Bruder, S., Kelava, A., Spiel, C., & Schmitz, B. (2013). Diagnostic competence of teachers: A
process model that accounts for diagnosing learning behavior tested by means of a case scenario.
Teaching and Teacher Education, 30, 38–46.
Krauss, R. M., & Fussell, S. R. (1991). Perspective-taking in communication: Representations of others’
knowledge in reference. Social Cognition, 9(1), 2–24.
Krawitz, J., Achmetli, K., Blum, W., Vogel, S. & Besser, M. (2016). Report on the relative strengths and
weaknesses of the United States in PISA 2012 mathematics. OECD Education working papers, 151.
Paris: OECD Publishing.
Leinhardt, G., Zaslavsky, O., & Stein, M. K. (1990). Functions, graphs, and graphing: Tasks, learning, and
teaching. Review of Educational Research, 60(1), 1–64.
Leuders, T., Bruder, R., Kroehne, U., Naccarella, D., Nitsch, R., Henning-Kahmann, J., et al. (2017). Development, validation, and application of a competence model for mathematical problem solving by using and translating representations of functions. In D. Leutner, J. Fleischer, J. Grünkorn, & E. Klieme (Eds.), Competence assessment in education: Research, models and instruments. Springer.
Morris, A. K., Hiebert, J., & Spitzer, S. M. (2009). Mathematical knowledge for teaching in planning and evaluating instruction: What can preservice teachers learn? Journal for Research in Mathematics Education, 40(5), 491–529.
Nathan, M. J., & Koedinger, K. R. (2000). An investigation of teachers’ beliefs of students’ algebra
development. Cognition and Instruction, 18(2), 209–237.
Nathan, M. J., & Petrosino, A. (2003). Expert blind spot among preservice teachers. American Educational
Research Journal, 40(4), 905–928.
Nickerson, R. S. (1999). How we know—And sometimes misjudge—What others know: Imputing one’s
own knowledge to others. Psychological Bulletin, 125(6), 737–795.
Nickerson, R. S. (2001). The projective way of knowing: A useful heuristic that sometimes misleads.
Current Directions in Psychological Science, 10(5), 168–172.
Ostermann, A., Leuders, T., Nückles, M. (2015). Wissen, was Schülerinnen und Schülern schwer fällt.
Welche Faktoren beeinflussen die Schwierigkeitseinschätzung von Mathematikaufgaben? [Knowing
What Students Know. Which Factors Influence Teachers’ Estimation of Task Difficulty?]. Journal für
Mathematik-Didaktik, 36(1), 45–76.
Philipp, K., & Leuders, T. (2014). Diagnostic competences of mathematics teachers—Processes and
resources. In P. Liljedahl, C. Nicol, S. Oesterle, & D. Allan (Eds.), Proceedings of the joint meeting of
PME 38 and PME-NA 36 (Vol. 1, pp. 425–432). Vancouver: PME.
Schrader, F.-W. (2009). Anmerkungen zum Themenschwerpunkt Diagnostische Kompetenz von Lehrk-
räften. [The Diagnostic Competency of Teachers]. Zeitschrift für Pädagogische Psychologie, 23(3–4), 237–245.
Spinath, B. (2005). Akkuratheit der Einschätzung von Schülermerkmalen durch Lehrer und das Konstrukt
der diagnostischen Kompetenz. [Accuracy of Teacher Judgments on Student Characteristics and the
Construct of Diagnostic Competence]. Zeitschrift für Pädagogische Psychologie, 19(1), 85–95.
Stacey, K., Steinle, V., Price, B., & Gvozdenko, E. (in press). Specific mathematics assessments that reveal
thinking: An online tool to build teachers’ diagnostic competence and support teaching. In T. Leuders,
J. Leuders, K. Philipp, & T. Dörfler (Eds.), Diagnostic competence of mathematics teachers – Unpacking a complex construct in teacher education and teacher practice. New York: Springer.
Südkamp, A., Kaiser, J., & Möller, J. (2012). Accuracy of teachers’ judgments of students’ academic
achievement: A meta-analysis. Journal of Educational Psychology, 104(3), 743–762.
Südkamp, A., & Möller, J. (2009). Referenzgruppeneffekte im Simulierten Klassenraum: direkte und
indirekte Einschätzungen von Schülerleistungen. [Reference-Group-Effects in a Simulated Classroom:
Direct and Indirect Judgments.]. Zeitschrift für Pädagogische Psychologie, 23(3–4), 161–174.
Swan, M. (1986). The language of functions and graphs. Manchester: Joint Matriculation Board, reprinted
2000, Nottingham: Shell Centre Publications.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science,
185(4157), 1124–1131.
