
Aspects of Mathematical Arguments that Influence Eighth Grade Students’ Judgment of

Their Validity

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Yating Liu, M.A.

Graduate Program in Education

The Ohio State University

2013

Dissertation Committee:

Azita Manouchehri, Advisor

Patricia Brosnan

Herb Clemens
Copyright by

Yating Liu

2013
Abstract

The study examined how middle school students evaluate arguments in a wide

range of mathematical contexts. The analysis included investigations on the types of

mathematical arguments that students found convincing, explanatory and appealing,

common aspects and features of arguments that impacted students’ evaluation of the

arguments, and problem contexts’ impact on their judgment.

The study involved two phases, a survey and a follow-up interview. Over five

hundred 8th grade students from five Ohio public schools participated in the survey study,

where they were provided a variety of arguments in four different mathematical contexts

and were asked to determine which of these arguments were convincing, explanatory and

appealing to them. Eight subjects, whose survey responses were distinct from each other,

were selected to participate in the follow-up interviews, where they were asked to explain

the rationale behind their evaluation of each argument.

Both quantitative and qualitative methods were utilized in data analysis.

Statistical data from the survey were used to identify types of mathematical arguments that

students found convincing, explanatory and appealing. Interview data were coded using a

proof classification framework to identify the aspects and features of arguments that

impacted students’ evaluation of the arguments.

Findings from the survey and interviews suggested that the participants’ evaluations

of the same argument varied widely across individuals. Their judgments of the same

type of argument also differed across problem contexts. The subjects’ explanations in

interviews revealed that the source of evidence had the largest impact on their judgment

of an argument, followed by representation. The reasoning mode, i.e., the link between

evidence and conclusion, was the aspect of least concern. Further investigation indicated

that examples, i.e., results from immediate tests, were the most frequently referenced type of

evidence to support a convincing argument. Students’ preferred representation and

reasoning modes varied. Lastly, it was found that the subjects possessed personal

standards for determining whether an argument was convincing. Most subjects did not consider the

ability to show the general validity of a conjecture as a requirement for convincing

arguments.

Dedication

Dedicated to my family and friends

Acknowledgement

Studying mathematics education in the U.S. has been a transformative experience.

From the first time I was introduced to the theories of teaching and learning, to the

process of designing, producing and refining the dissertation work, none of these could

be achieved without the help of those individuals whom I can only begin to thank with

the following words.

I am incredibly grateful to my advisor, Professor Azita Manouchehri. Dr.

Manouchehri first introduced me to the Young Scholars Program, which sparked my

interest in students’ mathematical reasoning. Since then, she has invested much of her

time into my growth as a mathematics educator, including the development of my

teaching and research skills. Through our many conversations, she has provided

thoughtful feedback, helping me clearly and coherently express ideas and shed light on

patterns in my analysis. Always urging me to take one step further, Dr. Manouchehri has

truly helped me grow as a professional in the field.

I would also like to thank Professor Herb Clemens and Professor Patti Brosnan

for their contribution as members of my committee. I thank Dr. Clemens for his insight

on mathematics education from the perspective of a university mathematics professor and

for guiding me in obtaining my master’s degree in mathematics. I thank Dr. Brosnan,

who, along with Professor Diana Erchick, introduced me to the Mathematics Coaching

Program, providing the environment through which I was able to develop my

understanding of the education system and to connect with the schools whose students

form the basis of my dissertation study. I would also like to thank the mathematics

coaches of those schools, who helped facilitate the collection of the data, as well as the

students who participated in the study.

I also very much appreciate my parents and my wife, who with their love, support

and understanding, helped me preserve my physical and psychological well-being

throughout my study. I would finally like to thank my graduate student colleagues, with

whom I could, when needed, commiserate — but more often and better still, collaborate

and celebrate.

Vita

June 2012 M.A., Education, The Ohio State University

March 2011 M.S., Mathematics, The Ohio State University

July 2008 B.S., Mathematics, Peking University, P. R. China

Publications

Liu, Y., Zhang, P., Brosnan, P., & Erchick, D. (2012). Examining the geometry items of
state standardized exams using the van Hiele model: Test content and student
achievement. Research in Education, Assessment, and Learning, 3(1), 22-28.

Liu, Y., & Manouchehri, A. (2012). What kinds of arguments do eighth graders prefer:
Preliminary results from an exploratory study. In Proceedings of the 34th Annual
Conference of the North American Chapter of the International Group for the
Psychology of Mathematics Education. Kalamazoo, MI: Western Michigan
University.

Liu, Y., & Manouchehri, A. (2012). Nurturing high school students’ understanding of
proof as a convincing way of reasoning: Results from an exploratory study. In
Proceedings of the 12th International Congress on Mathematics Education (pp.
2848-2857). Seoul, Korea.

Manouchehri, A., Zhang, P., & Liu, Y. (2012). Forces hindering development of
mathematical problem solving among school children. In Proceedings of the 12th
International Congress on Mathematics Education (pp. 2974-2983). Seoul,
Korea.

Liu, Y., Harrison, R., & Zollinger, S. (2011). Enhancing K-8 mathematics coaches’
knowledge for teaching probability. In T. Lamberg & L. Weist (Eds.),
Proceedings of the 33rd annual meeting of the North American Chapter of the
International Group for the Psychology of Mathematics Education. Reno, NV:
University of Nevada, Reno.

Liu, Y., Zhang, P., Brosnan, P., & Erchick, D. (2010). Examining the geometry content of
state standardized exams using the van Hiele model. In P. Brosnan, D. Erchick,
& L. Flevares (Eds.), Proceedings of the 32nd annual meeting of the North
American Chapter of the International Group for the Psychology of Mathematics
Education (Vol. 6, pp. 616-624). Columbus, OH: The Ohio State University.

Zhang, P., Brosnan, P., Erchick, D., & Liu, Y. (2010). Analysis and inference to students’
approaches about development of problem-solving ability. In P. Brosnan, D.
Erchick, & L. Flevares (Eds.), Proceedings of the 32nd annual meeting of the
North American Chapter of the International Group for the Psychology of
Mathematics Education (Vol. 6, p. 823). Columbus, OH: The Ohio State
University.

Field of Study

Major Field: Education

(Mathematics Education)

Table of Contents

Abstract .......................................................................................................................... ii

Dedication ......................................................................................................................iv

Acknowledgement ........................................................................................................... v

Vita ...............................................................................................................................vii

List of Tables .................................................................................................................xii

List of Figures ............................................................................................................... xv

Chapter 1. Introduction .................................................................................................... 1

The Role of Proof in Mathematics ........................................................................... 2

The Debates and Status of Proof Learning............................................................... 3

Educational Research about Proof Learning ............................................................ 7

Pilot Study Findings ................................................................................................ 9

Purpose of the Study ............................................................................................. 12

Overview of Research Methodology ..................................................................... 12

Significance of the Study ...................................................................................... 14

Chapter 2. Literature Review ......................................................................................... 15

The Nature of Mathematical Proof: A Philosophical Account ................................ 15

The Functions of Proof in the Study of Mathematics ............................................. 26

Existing Theories of Proof Learning...................................................................... 36

Theoretical Framework ......................................................................................... 47

Chapter 3. Methodology ................................................................................................ 57

Mixed Method Designs ......................................................................................... 57

Procedure of the Study .......................................................................................... 59

Sample .................................................................................................................. 60

Survey Participants ............................................................................................... 61

Survey Instrument: Survey of Mathematical Reasoning ........................................ 62

Interview Participants ........................................................................................... 84

Interview Procedure .............................................................................................. 85

Data Analysis ........................................................................................................ 91

Chapter 4. Results........................................................................................................ 115

Findings from SMR ............................................................................................ 115

Findings from the Interviews .............................................................................. 150

Chapter 5. Conclusion ................................................................................................. 238

Overview of the Study ........................................................................................ 238

Summary of the Findings .................................................................................... 239

Contribution to the Literature .............................................................................. 246

Limitation of the Study ....................................................................................... 252

Reflection on Existing Theories .......................................................................... 253

Implication for Proof Teaching ............................................................................ 258

References ................................................................................................................... 261

Appendix A. Survey results: Pairwise comparisons of arguments in each problem ....... 271

Appendix B. Survey results: Comparison between subgroups of students .................... 280


Appendix C. Interview results: Pairwise comparison of the rankings of arguments in each

problem ................................................................................................................ 287

List of Tables

Table 1. The alignment between functions of proof and learners’ purpose in conducting

proof.................................................................................................................34

Table 2. Outline of the procedure of the study ................................................................60

Table 3. Type of the arguments used in SMR .................................................................83

Table 4. Background information of the subjects............................................................85

Table 5. Overview of the interview process ....................................................................87

Table 6. Outline of data analysis process ........................................................................91

Table 7. Rankings of arguments provided by Allen ........................................................96

Table 8. Summary of comments made by Allen ........................................................... 103

Table 9. Table of codes ................................................................................................ 107

Table 10. Categories of comments made by Allen ........................................................ 109

Table 11. Summary of the most understandable, convincing, explanatory and appealing

arguments as evaluated by the participants in each problem ............................ 134

Table 12. Summary of the least understandable, convincing, explanatory and appealing

arguments as evaluated by the participants in each problem ............................ 135

Table 13. Summary of high and low rated arguments by type ....................................... 140

Table 14. Rankings of arguments provided by Abby .................................................... 152

Table 15. Summary of comments made by Abby ......................................................... 153

Table 16. Categories of comments made by Abby ........................................................ 157

Table 17. Rankings of arguments provided by Alice..................................................... 160

Table 18. Summary of comments made by Alice .......................................................... 161

Table 19. Categories of comments made by Alice ........................................................ 164

Table 20. Rankings of arguments provided by Amy ..................................................... 168

Table 21. Summary of comments made by Amy .......................................................... 169

Table 22. Categories of comments made by Amy ......................................................... 173

Table 23. Rankings of arguments provided by Beth ..................................................... 177

Table 24. Summary of comments made by Beth........................................................... 178

Table 25. Categories of comments made by Beth ......................................................... 182

Table 26. Rankings of arguments provided by Betty .................................................... 186

Table 27. Summary of comments made by Betty ......................................................... 187

Table 28. Categories of comments made by Betty ........................................................ 189

Table 29. Rankings of arguments provided by Blake .................................................... 194

Table 30. Summary of comments made by Blake ......................................................... 195

Table 31. Categories of comments made by Blake ....................................................... 199

Table 32. Rankings of arguments provided by Brenda.................................................. 204

Table 33. Summary of comments made by Brenda ....................................................... 205

Table 34. Categories of comments made by Brenda ..................................................... 208

Table 35. Summary of the subjects’ argument rankings ................................................ 212

Table 36. Categories of comments made by all subjects ............................................... 215

Table 37. Summary of the subjects’ rationale in argument evaluation ........................... 220

Table 38. Similarities and differences in the subjects’ rationale of argument evaluation 228
Table 39. Pairwise comparisons: Participants’ ratings on whether the arguments in each

problem were understandable ......................................................................... 272

Table 40. Pairwise comparisons of survey results: Participants’ ratings on whether the

arguments in each problem were convincing ................................................... 274

Table 41. Pairwise comparisons of survey results: Participants’ ratings on whether the

arguments in each problem were explanatory.................................................. 276

Table 42. Pairwise comparisons of survey results: Participants’ ratings on whether the

arguments in each problem were appealing ..................................................... 278

Table 43. Survey results: Between school comparison ................................................. 281

Table 44. Survey results: Between gender comparison ................................................. 283

Table 45. The gender * school effect ............................................................................ 285

Table 46. Pairwise comparison of the rankings of arguments in each problem .............. 288

List of Figures

Figure 1. Balacheff’s (1988) classification of students’ proving schemes .......................38

Figure 2. Proof schemes and sub schemes (Sowder & Harel, 1998) ...............................39

Figure 3. The van Hiele Model (van Hiele, 1986) ..........................................................41

Figure 4. The broad maturation of proof structure (Tall et al, 2012) ...............................46

Figure 5. Evaluation of argument is based on understanding ..........................................48

Figure 6. Reading Comprehension of Geometry Proof (RCGP) Model (Yang & Lin,

2008) ................................................................................................................50

Figure 7. Framework to classify students’ comprehension of a mathematical argument .. 55

Figure 8. Survey of Mathematical Reasoning .................................................................64

Figure 9. The structure of SMR......................................................................................81

Figure 10. The additional problem used in interview ......................................................90

Figure 11. Illustration of Allen’s rationale for evaluating mathematical arguments ....... 112

Figure 12. The percentage of participants who considered each argument understandable

....................................................................................................................... 116

Figure 13. Distribution of the number of arguments indicated understandable by each

participant ...................................................................................................... 117

Figure 14. Distribution of the number of arguments indicated not understandable by each

participant ...................................................................................................... 118

Figure 15. Illustration of how understandable the arguments were to the participants ... 120

Figure 16. Illustration of how convincing the arguments were to the participants ......... 122

Figure 17. Illustration of how explanatory the arguments were to the participants ........ 126

Figure 18. The percentage of participants who considered each argument the appealing

....................................................................................................................... 129

Figure 19. An example of the data transformation for within group ANOVA test ......... 130

Figure 20. Illustration of how appealing the arguments were to the participants ........... 131

Figure 21. Plots for variables on which the gender * school effect was significant ....... 147

Figure 22. Illustration of Abby’s rationale for evaluating mathematical arguments ....... 159

Figure 23. Illustration of Alice’s rationale for evaluating mathematical arguments ....... 167

Figure 24. Illustration of Amy’s rationale for evaluating mathematical arguments ........ 176

Figure 25. Illustration of Beth’s rationale for evaluating mathematical arguments ........ 185

Figure 26. Illustration of Betty’s rationale for evaluating mathematical arguments ....... 193

Figure 27. Illustration of Blake’s rationale for evaluating mathematical arguments ...... 202

Figure 28. Illustration of Brenda’s rationale for evaluating mathematical arguments .... 211

Figure 29. Factors that impacted the subjects’ conviction ............................................. 219

Figure 30. Factors that caused inconsistent evaluation of the same type of arguments .. 234

CHAPTER 1. INTRODUCTION

Proof, in everyday language, usually refers to evidence, explanations and

arguments that are used to verify the truth of a statement. For example, in judicial processes,

testimony from witnesses is usually admitted as proof. In an election, one’s past career

achievements are often cited as proof of leadership capability. In sports, winning

games is often considered proof of competence. In the natural sciences, proofs come from

empirical evidence observed in nature or in experiments. There is no absolute standard of

sufficiency at which evidence and arguments become proof that could serve as a common

foundation for all types of discussion (Pruss, 2006). The conventions and regulations about

what can be used as a reliable source and what can be accepted as a valid argument are highly

area-dependent, even when the discussion is restricted to the study of mathematics (Baker,

2009; Tall, 1991; Thurston, 1995; Usiskin, 1980). Despite the absence of a fixed and

precise standard, mathematical proof is no exception: it, too, certifies the truth of a claim

in a concrete mathematical context. However, proof plays a more significant role in

mathematics than in other disciplines.

There is no other scientific or analytical discipline that uses proof as readily and

routinely as does mathematics. This is the device that makes theoretical

mathematics special: the tightly knit chain of reasoning, following strict logical

rules, that leads inexorably to a particular conclusion. It is proof that is our

device for establishing the absolute and irrevocable truth of statements in our

subject. This is the reason that we can depend on mathematics that was done by

Euclid 2300 years ago as readily as we believe in the mathematics that is done

today. No other discipline can make such an assertion (Krantz, 2007, p. 1).

The Role of Proof in Mathematics

Although the general idea of mathematical proof, i.e. deriving a new result from a

known result, has remained unchanged for more than 2000 years, details about how such a

process can be formulated have been debated and modified by mathematicians

throughout the development of mathematics (Jaffe & Quinn, 1993). Primitive forms of

mathematics (before Euclid’s Elements) did not reflect an awareness of the need for proofs

when verifying statements. Conclusions were drawn from empirical examinations of

shapes and numerical relationships. Mathematical proof, in the deductive sense, first

acquired an explicit meaning in Euclid’s geometry, which has been widely

regarded as the prototype of how a mathematical system should look (Krantz, 2007).

Ever since the Elements, rules have demanded that a mathematical proof must be

rooted in definitions and axioms and proceed through accepted forms of deduction.

Despite the historical and ongoing debates about what can be used as definitions, axioms,

and deductions within the community, consensus exists among mathematicians that a

mathematical proof must be timeless, impersonal, rigid and dependable (Grabiner, 2009;

Davis, 1976; Krantz, 2007; Tall et al., 2012). It is such a pursuit that makes mathematics

a reliable tool that is widely applied in physics, engineering, economics, and many other

disciplines.
Traditional discussion of mathematical proof focused on its reliability in

determining the truth of a statement (Brown, 2008). Such a perspective places emphasis

on precise descriptions of the definitions and premises (axioms) and a rigorous layout of

steps of deductions to make sure proofs were presented as a delicate and complete

product. Carl F. Gauss supported this idea by comparing a mathematician to an architect,

who “didn’t leave up the scaffolding so that people could see how he constructed a

building” (cited in Krantz, 2007). David Hilbert had hoped for a rigorization of

mathematics into a comprehensive and self-contained axiomatic system before this

goal was proved unachievable (Gödel, 1931). The influential modern mathematics

book series published in the middle of the last century, written by the Nicolas Bourbaki

group, strictly adheres to the doctrines of formal mathematics, offering austere axiomatic

structures and excluding pictorial or other forms of intuitive assistance from proofs.

The New Math curriculum extended such a style into the education of growing

individuals, expecting that early exposure to a rigorous format would help integrate such

practices into students’ mathematical thinking (Hanna, 1983). The pursuit of formal proof

has influenced generations of mathematicians and has greatly advanced the community’s

understanding of mathematics. However, its limitations with respect to educational

impact were also criticized by scholars (Freudenthal, 1973; Lakatos, 1976; Schoenfeld,

1991; Tall, 1999).

The Debates and Status of Proof Learning

Lakatos (1976) wrote “… (In formal proof) all propositions are true and all

inferences valid. Mathematics is presented as an ever-increasing set of eternal,


immutable truth. Counterexamples, refutations, criticism cannot possibly enter. An

authoritarian air is secured for the subject … Deductivist style hides the struggle, hides

the adventure …” (p. 142). Hanna (2000b) also claimed that “a proof, valid as it might be

in terms of formal derivation, actually becomes both convincing and legitimate to a

mathematician only when it leads to real mathematical understanding” (p. 7). Krantz

(2007) expressed the same opinion, advocating “In mathematics, we are not simply after

the result. Our ultimate goal is understanding” (p. 32). Tall (1999) added to the

discussion by suggesting “formal proof is appropriate only for some, that some forms of

proof may be appropriate for more” (p. 1). De Villiers (1990, 2003) offered a framework

to describe the six functions of proof in mathematics, including verification, explanation,

systemization, discovery, communication and intellectual challenge. All these efforts tend

to reconceptualize proof as a human activity rather than a passive mechanical procedure.

Aside from theoretical challenges posed by researchers, the instruction of formal

proof has also faced difficulties in school practice, especially at the introductory levels.

Historically (and currently), in the US, a course on Euclidean geometry has served as the

main venue for the development of students’ skills in deductive reasoning with the

expectation that such skills would automatically transfer to other mathematical and non-

mathematical areas (González & Herbst, 2006; Herbst & Brach, 2006). This goal,

however, remains unfulfilled. “Research results on students’ conception of proof are

amazingly uniform; they show that most high school and college students don’t know

what a proof is nor what it is supposed to achieve” (Dreyfus, 1999, p. 94). It is

recognized that this failure might be due to the school treatment of topics in curriculum

and instruction. There is evidence that in many mathematics classrooms proof and the
proving process are taught as procedural topics instead of as a conceptual tool for reasoning

(Herbst & Brach, 2006; Reid, 2011). As a consequence, students tend to view proof as a

special “form” of producing written work (e.g. two-column proof) instead of a viable

vehicle for production of reliable explanations, or even means for understanding (Chazan,

1993; González & Herbst, 2006; Healy & Hoyles, 2000; Schoenfeld, 1988). Additionally,

there is evidence that an understanding of the role of mathematical proofs in establishing

validity of arguments remains underdeveloped at all grade levels (Chazan, 1993; Chazan

& Lueke, 2009; Harel & Sowder, 1998; Heinze & Reiss, 2009; Kuchemann & Hoyles,

2009; Mason, 2009; Waring, 2000; Weber, 2001; Schoenfeld, 1988). Furthermore, even if

a learner showed an awareness of and the ability to produce complete proofs in a certain

mathematical domain, such knowledge might not transfer to other topic areas, nor would

it automatically grow into an overarching understanding of the deductive system (Fawcett,

1938/1995; Freudenthal, 1971; Liu & Manouchehri, 2012; Reid, 2011). Therefore, calls

for shifting the focus of instruction from assimilating students into the tradition of

producing a rigorous mathematical format to helping them reason logically and

coherently about concrete contexts have been made. Following such a trend, recent

reform efforts in mathematics curricula place less emphasis on the layout of proof

while paying more attention to nurturing students’ proof skills, built on an understanding of

specific topics throughout the grades (de Villiers, 1990, 2003; Hanna, 2000a, 2000b; Reid,

2011).

The Principles and Standards for School Mathematics (NCTM, 2000) published

by the National Council of Teachers of Mathematics recommended that students’

ability to reason and produce proofs be fostered at all levels of the mathematics
curriculum (Hanna, 2000a). According to the standards, K-12 mathematics education

should enable high school graduates to “recognize reasoning and proof as fundamental

aspects of mathematics, make and investigate mathematical conjectures, develop and

evaluate mathematical arguments and proofs, and select and use various types of

reasoning and methods of proof” (p. 56). Furthermore, there is an explicit statement that

suggests nurturing the proof capacity in a broader content area, addressing “reasoning

and proof cannot simply be taught in a single unit on logic, for example, or by ‘doing

proofs’ in geometry” (p. 56).

The Common Core State Standards (CCSSO, 2010) also place tremendous

emphasis on the need to assist students in developing their proving skills. Among the eight

Standards for Mathematical Practice in the CCSS, four (i.e., reason abstractly and

quantitatively, construct viable arguments and critique the reasoning of others, look for

and make use of structure, and look for and express regularity in repeated reasoning) are

directly related to exploring, perceiving, and systemizing logical relationships.

However, realizing the goals proposed by various standards documents requires

significant modification to and enrichment of teaching materials as well as a shift in

traditional classroom culture. The key idea of the transformation is that elements and

properties of mathematics, as developed by mathematicians, should not be the sole

determinant of the curriculum and instruction. The nature of students’ thinking and

behavior when engaged in mathematical activities (including proof) must be fully

respected in the design and practice of teaching (Ball & Bass, 2000, 2003; Boero, 2007;

Dreyfus, 2006; Schoenfeld, 1988; Shulman, 1986). Therefore, in order to create

instructional and curricular models that nurture and promote students’ comprehension of
proof and their ability to produce mathematically complete arguments, an understanding

of the nature of students’ thinking in proof-related activities must first be developed.

Educational Research about Proof Learning

Stylianides and Stylianides (2008a) identified three cohorts of scholarly

investigations focused on studying proofs in mathematics education research. The first

cohort seeks evidence that students possess the ability to use deductive reasoning in

constructing arguments and proofs, even at the early elementary grades. The second

cohort describes students’ common difficulties and mistakes in producing proofs across

the grade levels and content areas. The third cohort offers an account of pedagogical

factors that could facilitate students’ learning about proofs. Although these three cohorts

of studies, including both empirical reports and theoretical investigations, provide

insights into students’ analytical capabilities as well as their challenges in learning proofs, suggesting

implications for practice, they do not posit a framework to capture the features of

students’ thinking when performing proof-related tasks. Studies of students’ proof

schemes tend to close this gap by creating a framework that classifies different types of

proofs that students offer. Following previous scholars’ work, such as Bell (1976) and

Balacheff (1988, 1991), Harel and Sowder (1998) organized the types of proof students

may use in various content areas of mathematics and proposed a taxonomy of proof

schemes consisting of three main categories, i.e. “external,” “empirical,” and “analytical,”

each of which encompasses several subcategories.

Another body of work concerns the cognitive development of learners as they

achieve a more mature comprehension of mathematical proof. The van Hiele levels (van
Hiele, 1986) is one of the most well-known frameworks to outline the stages in the

development of geometric thinking. Shaughnessy’s (1992) four-stage micro model of the

development of stochastic reasoning was constructed in a similar manner but within a

different content area. Frameworks that explicitly address proof learning include the

proof levels (Waring, 2000), reading comprehension of geometry proof (Yang & Lin,

2008), and the broad maturation of proof structure (Tall et al., 2012). A detailed account of

these frameworks will be offered in the next chapter.

Harel and Sowder (1998, 2007) observed that students could simultaneously hold

different proof schemes when working on different problems. Their model detects such a

difference but does not explain why such inconsistency might exist. The cognitive

development models can capture students’ progress in producing logical reasoning in a

certain mathematical field, but fail to describe why and how such a development may

emerge across content area differences. The categories, levels, and stages offered by

existing models are not precise enough to draw connections to students’ evaluation of the

arguments. Hence, little can be said about what kind of mathematical arguments students

find appealing, convincing, or explanatory since even arguments that are classified as the

same type can be judged quite differently among people and across the content areas.

Therefore, a more precise proof classification framework needs to be conceptualized so

as to allow an inquiry into learners’ understanding of and preference for different arguments.

In order for the instruction to enable students to understand and appreciate proof as a

reliable way of reasoning (de Villiers, 2003; Fawcett, 1938/1995; Reid, 2011), learning

about ways to help students realize proof as a reasoning methodology is just as

important as teaching the skills of producing specific proofs. As Usiskin (1980) pointed
out, there are various ideas, methods and layouts of proofs in different branches of

mathematics. Therefore, investigations into the impact of content on students’ use and

judgment of different mathematical arguments deserve a critical position in the study of

student learning of proofs.

Pilot Study Findings

In order to investigate students’ production and evaluation of mathematical

arguments, a pilot study was conducted involving 41 secondary school students. The

participants were drawn from 19 different middle schools across the state of Ohio,

suggesting variety in both the content and heuristics they may have experienced at the

time of data collection. A Survey of Reasoning (SR) was designed and used to examine

the participants’ proving processes, simultaneously, in four different content areas as a

means to closely inspect the potential relationship between a problem’s content and the proof

scheme that may have been elicited by it.

The SR consisted of four mathematics problems from four different branches of

mathematics (i.e. number theory, geometry, probability, and algebra). Each problem

consisted of several parts. First, the participants were presented with a conjecture and

were asked to determine whether they agreed with and were certain of the accuracy and

completeness of the statement. They were also asked to offer an explanation for their

choice and factors they considered when evaluating the statement. In the second part,

four arguments, each embodying a different proof scheme supporting or refuting the same

statement, were offered. The participants were asked to compare their own argument to

those given, and to decide whether they preferred any of the optional statements over
their own method. Lastly, they reported whether or not they considered each of the

optional arguments convincing as well as mathematically complete. We deliberately

chose the terms convincing and mathematically complete to evaluate students’ “two

conceptions of proof” (Healy & Hoyles, 2000), assuming that when judging the

convincingness of an argument the students might tend to rely on subjective perceptions

whereas when judging the mathematical completeness they might refer to an

understanding of existing mathematical conventions. All participants took the survey at

the same time and completed it within two hours.

Quantitative analysis of participants’ responses led to several findings, which are

summarized below (for more details, refer to Liu & Manouchehri, 2012):

 The majority of students relied heavily on empirical proof schemes when

producing arguments to support validity of propositions.

 The proof schemes of each individual’s favorite arguments varied across the

four problems.

 The proof schemes of the most convincing argument indicated by each

individual also varied across the four problems.

 A considerable number of cases were observed in which a student found an

argument convincing in one problem but labeled the argument with the same

proof scheme in a different context as not convincing.

 Neither how convincing an argument appeared nor whether it looked

mathematically complete solely determined students’ preference.

 The students did not necessarily persist with their own proof scheme when they

were asked to identify an argument as their favorite. If a student found a


given argument understandable and more persuasive, s/he noted preference

for it over his/her own argument, even when the two arguments represented

different proof schemes.

Results from the pilot study suggested that the students adopted and determined

their preferred reasoning schemes based on the concrete context of the problem instead of

following a broader uniform scheme. This implied that the transfer of proving skills from

one area (typically geometry) to other mathematical fields, as expected by current

curriculum design, did not automatically occur. When confronted with alternative

argument types students exhibited a tendency to favor those arguments involving

analytical reasoning, indicating a potentially productive pathway towards building

children’s proving capacity upon a wide range of mathematical contexts (Stylianides,

2007; Tall et al., 2012).

Due to the absence of qualitative data on individuals’ understanding of the

arguments, the pilot study could not explain why the participants had made certain

decisions or managed to maintain, simultaneously, preference for different proof schemes.

For instance, the pilot study categorized arguments based upon researchers’ interpretation

of the brief written explanations that the students had produced. This information was

insufficient to capture accurately what the students’ comprehension of the arguments

might have been. Without a clear understanding of students’ comprehension of the

arguments, it is impossible to identify the factors that shape students’ views of those

arguments. Therefore, the current research was conceptualized to extend the previous

work and to shed light on the processes and resources students draw from when judging

mathematical proofs.
Purpose of the Study

The purpose of the study is to investigate how students evaluate arguments in a

wide range of mathematical contexts. Data collection and analysis were guided by

three research questions:

 Are there certain types of mathematical arguments that students find

convincing, explanatory and appealing?

 Are there common aspects and features of arguments that significantly impact

students’ judgment of the arguments? If yes, what are they?

 How does problem context impact students’ judgment of arguments?

It is believed that such an investigation can contribute to the literature on

individual decision making when evaluating mathematical arguments, which in turn can inform

curriculum and instruction. Drawing from Harel & Sowder’s (1998) proof scheme

taxonomy, Yang & Lin’s (2008) Reading Comprehension of Geometry Proof (RCGP)

model, Bruner’s (1966) synthesis on representation types, and Stylianides and

Stylianides’s (2008a) identification of three aspects of proof, a theoretical framework, the

Classification Cube of Internalized Arguments (CCIA), was built to describe different

types of mathematical arguments (see Figure 7 in Chapter II for more details).

Overview of Research Methodology

Adopting a mixed methods design (Greene, Caracelli, and Graham, 1989), the

study consisted of the development, administration and analysis of a survey and follow-up

interviews. The survey and interview protocol were designed and refined in 2012. The

revised survey (Survey of Mathematical Reasoning, SMR, see Figure 8) was administered

in January–February 2013, and the follow-up interviews were conducted in April 2013.

The population of interest in this study was 8th grade students. Two reasons

contributed to this choice. First, according to Piaget’s (1985) Intellectual Development

Stages, middle school students are at a critical cognitive phase where they can engage in

abstract and logical thinking. Therefore, how they learn to value different arguments at

this stage could potentially impact their reasoning skills and thinking habits in later

years. Second, the grade band serves as a bridge between middle and high school

mathematics, linking informal reasoning with more formal and abstract mathematical

reasoning. According to the curriculum standards (CCSSO, 2010), most 8th grade

students should have obtained basic understanding of numbers, shapes, chance, and

algebraic expressions, know some simple propositions and properties, and be able to see

the connection between concepts and ideas. However, they may not have yet adopted

abstract thinking or deductive ways of mathematical reasoning using conventional

proving techniques and forms. Therefore, the features of arguments they consider as

convincing, explanatory, and appealing can offer valuable references for the development

of resources and instructional explanations that can facilitate students’ internalization and

adoption of more mathematically rigorous argumentation.

Data collection followed two phases. During the first phase, over 500 8th grade

students from 5 different public schools in Ohio took the SMR. The students’ responses

were then analyzed quantitatively to investigate their evaluation of the arguments used in

the SMR. In particular, the goal was to identify the type of arguments that they found

understandable, convincing, explanatory or appealing. During the second phase, eight


subjects, who had exhibited different patterns in their survey choices, were selected and

interviewed. Common factors that impacted each subject’s evaluation were summarized

and the individual differences were investigated through between subject contrasts.

Details about the participants of the study, development of survey instrument, procedures

of the interview, as well as the data analysis process are described in Chapter III.

Significance of the Study

This study has the potential to advance understanding of proof learning on

three levels. First, empirical studies about the middle school students’ evaluation of

different proof types have been rare. Second, investigations that seek to identify

consistent features across content areas that individuals might consider when evaluating

mathematical arguments have been notably absent from the literature. Lastly, the

type of studies that have worked towards developing a framework useful for identifying

the type of argumentation most likely to be assimilated by students as their own

reasoning methodology has been underdeveloped. The current study aims to make novel

contributions to each of these three areas.

CHAPTER 2. LITERATURE REVIEW

This review offers a summary of literature on the nature and functions of proof in

mathematics. Findings of published studies on students’ proving and reasoning, and

existing theoretical frameworks concerning proof learning are discussed. An overview of

theoretical framework guiding the current study is offered.

The Nature of Mathematical Proof: A Philosophical Account

Descartes claimed that a mathematical proposition must be “deduced from true

and known principles by the continuous and uninterrupted action of a mind that has a

clear vision of each step in the process” (cited in Baker, 2009, p. 1). Krantz (2007)

described mathematics as “(i) coming up with new ideas and (ii) validating those ideas by

way of proof” (p. 33). Despite such emphasis, the precise nature and role of mathematical

proof has long been debated by mathematicians and mathematical educators (de Villiers,

1990, 1998; Hanna, 2000b; Lakatos, 1976; Krantz, 2007; Tall, 2002). Because of the

centrality of proofs and proving in mathematics, discussions surrounding their nature have

resided at the heart of the philosophy of mathematics. In this section I will offer a review

of three prominent perspectives (i.e. Platonism, Formalism and Constructivism), whose

philosophical standpoints offer different, if not contradictory, accounts of the role and

function of proof in mathematics and hence provide distinct educational implications for

proof instruction.

Platonism

Platonism is a school of philosophy whose principles and perspective have not

been limited to mathematics alone. Philosophers and mathematicians maintain different

interpretations when applying Platonism in the domain of mathematics; nevertheless, they

generally agree that Platonism in mathematics (in its pure sense) views mathematical

objects as abstract yet eternal and unchanging (Armstrong, 1970; Balaguer, 2008). For

example, when considering the “fact” that 1 + 1 = 2, those espousing Platonism believe

in the existence of exact concepts of 1, 2, + and = in an abstract world called mathematics,

and consider the relationship demonstrated by the equation to be an eternal truth independent of the

involvement of human beings. Therefore, the mission of mathematicians from this

paradigm is to explore and discover the unknown underlying truth (Weir, 2011).

According to Platonism, axioms must describe absolute and eternal truths of the world,

and proof is a method to find other absolute and eternal truths determined by the axioms.

However, Platonists were “rapidly losing support” (Weir, 2011) with the

development of more recent perspectives in mathematical philosophy. Among them,

formalism and constructivism have gained considerable attention and raised extensive

debates among mathematicians.

Formalism

Formalism was developed upon logicism, which views mathematics as a

systematic structure built upon axioms following certain rules. Carnap (1937) suggested

two logistic principles of the mathematical system:

 The concepts of mathematics can be derived from logical concepts through

explicit definitions.

 The theorems of mathematics can be derived from logical axioms through

purely logical deduction.

These suggest that a mathematical axiomatic system must start with finitely many

statements (namely axioms/postulates) that are assumed to be true, and that the judgment

of validity of other statements in that axiomatic system must be based on deduction from

them. With a deeper and more abstract conception of the deductive procedure, mathematicians

attempted to convert mathematics into a symbolic system using Set Theory (Johnson, 1972)

by restricting what procedure could be considered as logically valid (i.e. what deduction

is) and what concept is usable in deduction (i.e. what a set is). David Hilbert, arguably the

most prominent mathematician of the formalist genre, set the groundwork for

operationalizing this view.

… he (Hilbert) adopted an instrumentalist stance with respect to higher

mathematics. He thought that higher mathematics is no more than a formal game.

The statements of higher-order mathematics are uninterpreted strings of symbols.

Proving such statements is no more than a game in which symbols are

manipulated according to fixed rules (Weir, 2011).

Formalism agrees with Platonism by envisioning an ideal and static system within

which truth and falseness are indisputable. As such, formalism has a strong tendency to

eliminate the impact of human perception since the criteria allowed in mathematical

deduction are impersonal. However, formalism only concerns the validity of statements

within the established system without articulating how truth within the system relates to
truth enclosed in nature (Weir, 2011). For instance, a + b = b + a holds in a commutative

group, but a formalist is not at all interested in how “a”, “b” and “+” in the equation relate

to quantities and operations in real life. In other words, “truth” as viewed by a formalist is

a relative “truth,” which is “attached” to the validity of the axioms (assumptions). Hence

it is different from the absolute “truth” pursued by Platonists. Moreover, “truth” in

formalism lies in a very restrictive and artificial setup of a mathematical system that is

designed by mathematicians, which also seems to be opposite to Platonists’ belief that

mathematics should be discovered (Ernest, 1996).

Gödel’s incompleteness theorems (Gödel, 1931) directly rejected the possibility of

establishing a “perfect” system, i.e. a single, recursively axiomatizable formal system

which is consistent, purely deductive and complete (Baker, 2009), as envisioned by

formalism. In particular, the first incompleteness theorem showed that if the finitely many

axioms in the system are not contradictory to each other, then there must be a statement

within the system whose validity cannot be determined by the axioms and deductive

results built upon them. The second incompleteness theorem strengthened the first by

showing that no such consistent system can prove its own consistency; a system that does

prove its own consistency must in fact be inconsistent.
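For reference, the two theorems can be stated in a standard modern form (this is a conventional textbook formulation, not the dissertation’s own notation):

```latex
% First incompleteness theorem: for any consistent, recursively
% axiomatizable system F that interprets elementary arithmetic,
% there is a sentence G_F that F can neither prove nor refute:
F \nvdash G_F \qquad\text{and}\qquad F \nvdash \neg G_F .

% Second incompleteness theorem: for such an F, if F is consistent,
% then F cannot prove the arithmetized statement of its own consistency:
F \nvdash \mathrm{Con}(F).
```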

The impact of Gödel’s incompleteness theorems on formalism can be viewed in

two manners. First, the perfection of the axiomatic system envisioned by formalists is

totally denied. No matter how deliberately the axioms may be set up, the deductive system

will not be able to solve all the problems within the system. In this sense, the methodology

designed by formalists by itself is “incomplete.” Second, the incompleteness theorems


do not deny the value of proof where proof is possible. In other words, the incompleteness

theorems do not demolish the value of the deductive system, in the sense that the system is

never perfect but functional and powerful in a very broad context. In fact, formalism

largely advanced the comprehension of many mathematical subjects. For instance,

because of the establishment of measure theory, mathematicians gained insights into

distance, area and volume. To date, the discovery and proof of new theorems is widely

regarded as the highest level of mathematical research (Tall et al., 2012).

Constructivism

Constructivism, radical or social, has largely been adopted in the social sciences,

and the study of mathematics education is not an exception (von Glasersfeld, 1994;

Vygotsky, 1978). However, the term mathematical constructivism usually refers to a

perspective that may not be familiar to educational researchers.

Mathematical constructivism refers to a group of studies about the structure of

mathematical systems, such as intuitionism introduced by Brouwer (1905) and finitism

developed by Hilbert and Bernays (1934/1939). Generally speaking, a starting point of

mathematical constructivism is that a mathematical object must be explicitly found

(constructed) to prove its existence. Despite recognizing that the foundations of

mathematics lie in human intuition (Troelstra, 1977), the theory of mathematical

constructivism does not take any particular individual’s subjective opinion into

consideration. Instead, the theory makes assumptions about what types of deductions

people intuitively and commonly accept and redefines a new type of logic such that it is

no longer based on the classical Set Theory. For example, intuitionistic logic is

a revision of the classical logic underlying Set Theory that removes the law of the excluded

middle (i.e. the principle that either a statement is true or its negation is).
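As an illustration of what the excluded middle licenses (a standard textbook example, not one used in this study), classical logic permits the following existence proof of irrationals $a, b$ with $a^b$ rational, even though it never determines which case actually holds:

```latex
\text{Either } \sqrt{2}^{\sqrt{2}} \in \mathbb{Q}
  \text{ (take } a = b = \sqrt{2} \text{), or }
\sqrt{2}^{\sqrt{2}} \notin \mathbb{Q}
  \text{ (take } a = \sqrt{2}^{\sqrt{2}},\ b = \sqrt{2} \text{), since }
\bigl(\sqrt{2}^{\sqrt{2}}\bigr)^{\sqrt{2}} = \sqrt{2}^{\,2} = 2 .
% The case split invokes P \lor \neg P; intuitionistic logic rejects
% this argument because no definite witness is constructed.
```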

Despite the importance of mathematical constructivism in the development of

mathematical theories, another interpretation of constructivism, in which mathematical

concepts and axiomatic systems are built upon human intuition, representation

and communication, is of greater concern in education studies.1

In such a perspective, mathematics is a product created by human intelligence

instead of a pre-existing, ideal and perfect form that people attempt to discover.

Constructivism suggests that mathematical concepts do not exist beyond human

understanding (opposing Platonists’ view). Rather, “concepts and structures are the result

of a cognitive/historical knowledge process. These originate from our action in space

(and time) and are further extended, by language and logic” (Longo, 2009, p. 22).

Therefore, mathematical concepts are not static. They may change as the result of the

discovery of new cases or when new needs emerge in the community. Different from

formalism, constructivism doesn’t admit or pursue the “best” form of mathematics. It is

believed that mathematical theories and methodologies can always be improved to

function in a broader context or in an innovative way (Ernest, 1996).

1
This interpretation is consistent with the concept of constructivism that is widely used in social science research. Since

ultimately this study concerns how an individual can develop deductive reasoning skills, human experience and

activities in conceptualizing and establishing mathematical ideas and structures offer valuable references. Hence for

convenience, constructivism in later text of this dissertation will all refer to the second interpretation unless specially

explained.

Lakatos (1976) offered remarkable examples to illustrate such a perspective. In

Proofs and Refutations, Lakatos recreated an imaginary classroom scenario where the

students were exploring the validity of Euler’s polyhedron formula (F + V – E = 2).

During the process, students found that their conception about what a face, an edge and a

vertex might be was substantially less rigorous than they had realized. Consequently the

students engaged in a discussion about the precise definition of those concepts (e.g.

vertex, edge, face, simple polyhedron, etc.). Students challenged each other’s definitions

using specific counterexamples to demonstrate the incompleteness of the defined terms

and propositions attached to them. As a consequence, their understanding of the object

(polyhedron) was refined and deepened gradually in such a “proof and refutation”

process. Lakatos suggested that concepts, as the foundation of mathematical systems, are

constructed instead of discovered by human intelligence. It was not predetermined what

should be called an edge of a polyhedron. Instead, the concept was constructed by

humans and refined to fit into a more useful theory. Neither was it predetermined that

there must be Euler’s polyhedron formula in the theory. Indeed this property rests upon

existing definitions. However, there was no guarantee that it must serve as an important

theorem or proposition. The treatment of “local” and “global” counterexamples (Lakatos,

1976) may lead the development of a theory in another direction (as with the

treatment of the Parallel Postulate in different geometry systems). Furthermore, it was

not predetermined what types of deduction should be allowed in mathematical reasoning.

For instance, visual aids are used to prove the Pythagorean Theorem in Euclidean Geometry;

however, such an approach is not considered reliable in Real Analysis. Intuitionism even

rules out the method of proof by contradiction. Therefore, constructivism suggests that
concepts, axioms, propositions and proving methodologies in mathematics are all

inventions of mathematicians. The scenarios Lakatos presented in his imaginary

classroom depicted the journey which mathematicians took to establish the current

enterprise of mathematics, advocating the need for absolute respect for the natural

development of knowledge and implementation of a heuristic instructional methodology.
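The formula at the center of Lakatos’s dialogue can be checked computationally; the sketch below (illustrative only, not part of the dissertation’s methods) verifies F + V − E = 2 for three simple polyhedra and shows how a “picture frame” solid, one of Lakatos’s global counterexamples, violates it.

```python
def euler_characteristic(vertices, edges, faces):
    """Return F + V - E, which equals 2 for a simple (sphere-like) polyhedron."""
    return faces + vertices - edges

# Simple polyhedra satisfy Euler's formula.
solids = {
    "tetrahedron": (4, 6, 4),   # (V, E, F)
    "cube":        (8, 12, 6),
    "octahedron":  (6, 12, 8),
}
for name, (v, e, f) in solids.items():
    print(name, euler_characteristic(v, e, f))  # each prints 2

# The "picture frame" (a torus-like solid discussed by Lakatos) has
# V = 16, E = 32, F = 16, so F + V - E = 0, refuting the naive conjecture.
print("picture frame", euler_characteristic(16, 32, 16))  # prints 0
```

In Lakatos’s account the counterexample does not overthrow the formula; it forces a sharper definition of “polyhedron,” which is precisely the concept-refinement process described above.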

Notice that constructivism does not deny the reliability of an already constructed

system, which is consistent with a formalist’s view. In fact, constructivism may suggest

that many of the axiomatic systems built by formalists are the best models that

mathematicians have produced to date. However, constructivism admits arguments for

understanding mathematical objects and propositions that would not be

accepted by a formalist in many cases. Taking the proof of Jensen’s inequality (the simple

version) as an example, from the standpoint of constructivism, a proof using visual

implication is entirely acceptable; but to a formalist, a rigorous algebraic approach is

definitely more complete and accurate. Constructivism approves the reliability of visual

implication since it is valid and efficient in many situations (e.g. elementary Euclidean

geometry, graphs of low degree polynomials, etc.), therefore judgment with visual aid is

reliable when dealing with cases within a certain scope, even though this method may not

apply to a broader context. However, formalism may downgrade the reliability of adopting

visual implication in this specific proof since such an approach cannot handle the Dirichlet function. In

other words, formalism regards one method as superior to another when it applies in a

broader context, a judgment that is not influenced by any particular problem solver’s own

experience and judgment of the scope of discussion.
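For reference, the simple (two-point) version of Jensen’s inequality reads as follows; the visual argument amounts to observing where the chord lies relative to the graph.

```latex
% For a convex function f and \lambda \in [0, 1]:
f\bigl(\lambda x + (1 - \lambda) y\bigr)
  \;\le\; \lambda f(x) + (1 - \lambda) f(y).
% Visual reading: the chord joining (x, f(x)) and (y, f(y)) lies on or
% above the graph of f between x and y.
```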

Summary

There is consensus within the mathematics community that proof must start with

facts that are known or assumed to be true, use if-then logic (regardless of whether the

logic must be defined upon Set Theory), and establish the truth or falsity (in either a relative or

absolute sense) of the targeted statements (Harel & Sowder, 1998;

Krantz, 2007; Tall et al., 2012). Platonism suggests that proof discovers and verifies truth

based on known truth, while formalism and constructivism suggest that proof starts with

assumptions and follows a sequence of deductive steps to achieve a judgment of the

validity of a proposed statement. However, a closer look reveals four major differences

between the latter two perspectives.

 Ideal vs. Instrumental. Formalism pursues an ultimate form of proof that

satisfies certain criteria which guarantee the validity of proof regardless of the

individual’s perception. Constructivism stands with instrumentalism, in the

sense that concepts, axioms and proofs are all invented to solve problems and

upgraded to solve more complex problems in a wider range.

 Static vs. Dynamic. Formalism holds a narrower and more static view of

what can be viewed as assumptions and what kind of deductive steps should

be allowed in a proof. It has a more restrictive and fixed standard of what a

proof in mathematics should be like. Constructivism suggests that the

concepts, assumptions and proofs are constructed instead of discovered by

mathematicians. Hence they evolve with the rise of new questions and

discovery of new areas.

 Restrictive vs. Open. Formalism tends to deny or degrade the validity of

certain proof schemes once a “better” approach is found. Constructivism also

agrees that there could be proofs that are reliable in a broader context,

however the “best” proof mathematicians could come up with is not the only

acceptable form of proof and cannot replace the role and value of less rigorous

arguments.

 Global vs. Local. Formalism suggests that when proving the validity of a

statement, people should be adequately familiar with related theories so they

know the scope in which the discussion lies and use methods within that

scope. Constructivism suggests that when proving a certain statement, people

draw from their own experience and community to determine the scope of

discussion and reliable methods.

Because of these different standpoints, constructivism and formalism have distinct

implications for mathematics education, especially at the introductory levels (Hersh, 2009;

Lakatos, 1976). Formalism, with an emphasis on presenting mathematics in its most

complete and rigorous form, tends to introduce a theory from a careful layout of its basic

components, i.e. definitions and axioms, followed by deductive processes to build

knowledge upon the foundation. Constructivism suggests guiding the learners through the

journey that previous mathematicians took to establish theories, i.e. first offering

premature and informal perception of the subject and then refining and formalizing the

understanding through problem solving and critical reflections. These two instructional

methods are referred to by Lakatos (1976) as Deductivist and Heuristic approaches,

respectively. Currently, textbooks written in a manner of formalism dominate the


24
advanced field (college and beyond), while constructivism has gained much support at

elementary, secondary and early college level, particularly following the decline of New

Math era (Hanna, 2000a; Tall, 1991).

Note that in reality an individual’s philosophical perspective may lie somewhere in

between and shift depending on specific situations. In addition, there are different

terminologies used by scholars (e.g. Absolutist vs. Fallibilist (Ernest, 1996)) to describe

philosophical views of mathematics, and these different classifications maintain their own

criteria in organizing their respective perspectives. After all, Platonism, formalism,

constructivism and various other terms are perceptual concepts instead of defined

concepts (Bruner, 1987). Nevertheless, the essential purpose of the comparison is not to

determine how the three philosophical perspectives differ, but to help identify and

describe the criteria and features that a mathematical proof may possess. From the

standpoint of a mathematics educator, the subject of study is learners with evolving views

of mathematical proofs, and the most relevant philosophical perspective is a school of

thought that respects an individual’s development of knowledge. Therefore,

constructivism serves as the basis for theoretical models in the learning of proof as well

as other fields in mathematics education research (Balacheff, 1991; de Villiers, 2012; Tall,

2009; van Hiele, 1986). This study is no exception.

Constructivism recognizes proof as a human-involved activity rather than a

mechanical procedure. It shifts the attention from the content to learners’ thinking and

behavior. This implies that the content itself no longer solely determines how it should be

taught; rather, learners' "natural" behavior in the learning process must also be considered

as a fundamental component to guide instruction. As such, the purpose of teaching proof


is not to acclimate learners to a certain type of argument structure whose format must be

strictly followed; rather, the instruction should guide students to develop a personal

meaning of proof, in particular why proof is needed and what features it should possess to

meet the need (Hersh, 2009). Such a focus calls for investigations into two critical

questions:

• What do mathematical proofs mean to a learner?

• What is the nature of a learner's thinking when developing an understanding

of and skills in constructing mathematical proofs?

The following two sections are devoted to reviewing and summarizing the

research studies responding to these questions.

The Functions of Proof in the Study of Mathematics

Verifying the correctness of mathematical statements has been a primary

function of proof ever since proof started to be used in mathematics (Krantz, 2007;

Tall et al., 2012). Without understanding the concept of proof, it is impossible to perceive

what mathematical theory and practice might mean (Hanna, 2000b). With the prevailing

adoption of formalism in mathematics, proof in the deductive form came to be viewed as

the only acceptable way to establish arguments in mathematics, granting it a supreme

importance to the subject. More recently, with the rising attention to the perspective of

constructivism, additional functions of proof in mathematics have been studied and

synthesized (Tall, 1999).

Bell (1976) described the functions of proof as verification, illumination and

systematization. Balacheff (1991) suggested that “... a mathematical proof is a tool for
mathematicians for both establishing the validity of some statement, as well as a tool for

communication with other mathematicians” (p. 178). Schoenfeld (1994) claimed that “it

(proof) is an essential component of doing, communicating, and recording mathematics”

(p.76). Reflecting on existing literature and his own experiences with teaching and

learning of mathematics, de Villiers (1990, 2003) outlined six major functions of proof in

the discipline (p. 18):

• Verification (concerned with the truth of a statement),

• Explanation (providing insight into why a statement is true),

• Systematization² (the organization of various results into an organized system

of concepts, axioms, theorems and propositions),

• Discovery (the discovery or invention of new results),

• Communication (the transmission of mathematical knowledge), and

• Intellectual challenge (the self-realization/fulfillment derived from

constructing a proof).

Each of these functions is described in greater detail below.

Verification

Verification is notably the most recognized function of proof. If a statement is

proved to be true without errors, then its correctness is settled and there is

no room for counterexamples. In the mathematics community, a statement's validity

remains unclear until it is proved. Until then, the statement can only be regarded as a

² "Systemization" is the original spelling used by de Villiers.

hypothesis even if it seems true to mathematical authorities and no counterexample is

found. Although de Villiers also pointed out that “proof is not necessarily a prerequisite

for conviction – to the contrary, conviction is probably far more frequently a prerequisite

for the finding of a proof" (p. 18), the change in the level of conviction that mathematicians

experience before and after obtaining a proof can never be denied. There is a clear difference

between “pretty sure” and “absolutely sure.” In fact, there were many occasions in the

history of mathematics where widely-conjectured-to-be-true statements were later

proved to be incorrect. One famous example is the Kakeya needle problem, which

was proposed by Kakeya (1917), asking for the minimal area of a region in the 2-dimensional

Euclidean plane within which a unit line segment can be rotated continuously through

180 degrees. Many mathematicians (including Kakeya himself) seemed to believe that

the deltoid would be the solution, since the deltoid is composed of such elegant curves that

seem to satisfy conditions crucial to obtaining the "minimum." Much effort was

devoted to proving that the deltoid is the correct solution, until the Besicovitch set, a much

more complex and "artificial" construction, showed that the area can in fact be made arbitrarily small

(Besicovitch 1919; Pal, 1920). Another famous example is Leonhard Euler’s conjecture

that there are no positive integers x, y, z, and w such that x⁴ + y⁴ + z⁴ = w⁴, a counterexample to

which was found after almost 200 years (cited by IAS/PCMI, 2007). Since there is no

guarantee that a statement that seems true to people's or even experts' intuition

will hold true in mathematics, it is proofs that distinguish true results from seemingly

plausible, but not generally true, statements (Grabiner, 2012).
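The quartic case illustrates how decisively a single counterexample settles a long-standing conjecture. The smallest known counterexample, found by Roger Frye following Noam Elkies' 1988 construction, can be checked directly with exact integer arithmetic (an illustrative verification sketch, not part of the dissertation's study):

```python
# Verify Frye's counterexample to Euler's conjecture that
# x^4 + y^4 + z^4 = w^4 has no solution in positive integers.
# Python integers are arbitrary-precision, so the check is exact.
x, y, z, w = 95800, 217519, 414560, 422481
print(x**4 + y**4 + z**4 == w**4)  # True
```

A claim that stood for almost two centuries is overturned by a single line of exact arithmetic, which is precisely the asymmetry between verification by examples and refutation by counterexample.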

Explanation

Imagine the following scenario.

A teacher asked a student to write a 4-digit whole number on the blackboard, then

the teacher immediately said “it is divisible by 3.” The student checked with a

calculator and found out the teacher was correct. Then the student challenged the

teacher again with even larger whole numbers and the teacher could make a

correct and prompt judgment every time.

If you were the student, what would you want to know? Most likely, assuming

some curiosity, you would want to know "why."

As Polya (1954) stated, “ … having verified the theorem in several particular

cases, we gathered strong inductive evidence for it … When you have satisfied yourself

that the theorem is true, you start proving it” (p. 83-84). This leads to the second function

of proof in mathematics, explanation.
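The teacher's "trick" rests on the classic digit-sum rule: because 10 ≡ 1 (mod 3), every power of 10 leaves remainder 1 when divided by 3, so a number is divisible by 3 exactly when its digit sum is. A brief empirical check (an illustrative sketch, not part of the dissertation's instruments) confirms the rule:

```python
# Divisibility-by-3 test via digit sums: since 10 ≡ 1 (mod 3),
# n = sum(d_i * 10^i) ≡ sum(d_i) (mod 3).
def digit_sum(n):
    return sum(int(d) for d in str(n))

# Check the rule for every whole number below 100000.
assert all((n % 3 == 0) == (digit_sum(n) % 3 == 0) for n in range(100000))
print("digit-sum rule for divisibility by 3 holds for all n < 100000")
```

Note that the empirical check, like the student's calculator, only verifies the rule case by case; the congruence argument in the lead-in is what explains why it can never fail.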

As illustrated in the imaginary scenario, knowing the teacher was correct didn't

satisfy the student's curiosity (in fact, curiosity was more likely to be aroused), and didn't

advance the student's understanding of the subject (Hanna, 2000b). De Villiers suggested

(1990) that mere verification "gives no psychologically satisfactory sense of illumination

– no insight or understanding into how the conjecture is the consequence of other

familiar results” (p. 19).

"Proof helps us understand and explain mathematics" (Reid, 2011, p. 16). Proof

connects phenomena with more basic rules (theorems and axioms) that seem obvious and

unchallengeable. It unpacks the relationship between an unfamiliar proposition and

familiar results. Therefore, each true statement is no longer an isolated piece of


knowledge, but is supported by more fundamental understanding of the subject. Realizing

and intentionally pursuing the structure of knowledge built by proof leads to a higher

order function of proof in the study of mathematics.

Systemization

Each proof uncovers a segment of an axiomatic system, allowing people to see a

branch of the structure, ultimately leading to an understanding of the whole. Proof is an

indispensable tool for systematizing known results into a deductive axiomatic system (de

Villiers, 1990).

Systemization requires higher levels of thinking than merely producing a proof.

Even those who can generate a proof for some complex propositions (e.g. the Nine-Point

Circle) in Euclidean Geometry may not be conscious of the five (or ten, if counting

the assumptions about algebra) basic assumptions or be aware of how the axioms and

theorems work together as a system. This is because: 1) proving a particular statement

may only involve understanding of a small part of the system; and 2) accepting if-then

logic, as needed in generating a proof, does not necessarily require a global

understanding of if-then logic’s role in a deductive system. Therefore, there is a

difference between knowing a statement is true and knowing a proof is performed

correctly within a system. Hence, a major function of proof is to lead to the

understanding of mathematics as an organized and logically consistent network.

Nevertheless, it is impossible to understand the system as a whole without

perceiving proof as a local illustration of how the system works. It is commonly believed

that an overarching understanding comes after adequate local experiences (e.g. the van
Hiele model, 1986). Additionally, proof and investigation of a single case may also

inspire or directly cause a more insightful or even revolutionary view about the

knowledge structure (Lakatos, 1976). For example, the proof of the Chinese Remainder

Theorem inspires the understanding of ideals in ring theory; Russell’s paradox (1903)

caused a reconsideration and reconstruction of the logic system. This leads to the

discussion of the next function of proof.

Discovery

Many discoveries in mathematics might be initially obtained or inspired by

empirical investigation (e.g. the law of large numbers) and trial and error (e.g. the four

color map conjecture/theorem). There are also new results that "were discovered or

invented in a purely deductive manner” (de Villiers, 1990). Reid (2011) illustrates this

point by suggesting that a proof of the statement that “the sum of two consecutive odd

numbers is even” leads to the discovery of a new fact that the sum must be a multiple of 4.

Another example could be Euler's polyhedron formula, which sharply narrows down the

possible cases of regular polyhedra and directly implies the discovery of all the

possible cases.
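Both examples can be made concrete. Writing the consecutive odd numbers as 2k+1 and 2k+3, the proof of Reid's statement gives (2k+1) + (2k+3) = 4k + 4 = 4(k+1), revealing the stronger fact that the sum is a multiple of 4. For the polyhedron case, combining Euler's formula V − E + F = 2 with pF = 2E = qV (for p-gon faces, q meeting at each vertex) forces 1/p + 1/q > 1/2, and enumerating the possibilities recovers exactly five regular polyhedra. A short sketch of that enumeration (an illustration, not part of the original study):

```python
# Enumerate pairs (p, q) with 1/p + 1/q > 1/2 and p, q >= 3 -- the constraint
# that Euler's formula V - E + F = 2 imposes on a regular polyhedron whose
# faces are p-gons with q faces meeting at each vertex. Exact rational
# arithmetic avoids floating-point comparisons.
from fractions import Fraction

solids = [(p, q)
          for p in range(3, 7)      # 1/p + 1/q <= 1/2 whenever p, q >= 6
          for q in range(3, 7)
          if Fraction(1, p) + Fraction(1, q) > Fraction(1, 2)]
print(solids)  # [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)] -- the five Platonic solids
```

In both cases the deduction itself delivers the "discovery": the inequality was not checked against a list of solids but derived, and the five solutions fall out of it.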

Perhaps the best demonstrations of proof's discovery function lie in the natural

science studies, where many phenomena were “found” in the theory before being

discovered in reality. This is among the most important reasons for mathematics to be

such a popular tool in those disciplines. A well-known example is the discovery of the

gravitational lens (bending of light by mass), which was deduced from Einstein’s general

theory of relativity before it was confirmed by observation. Another famous example is


the C60 molecule, whose geometric structure was conjectured before its physical

appearance was found. Discovery in natural science by proof has a strong implication for

Platonism, in the sense that there are perceivable and predictable orders pre-existing in

the world before being “discovered” by human intelligence.

Discovery by proof also seems to be a natural outcome of systematization. When

talking about the verification and explanation function of proof, we consider the process

of tracing the statement “down” to the axioms and theorems. However, neither an

individual nor a community can examine or discover all true statements within a system,

especially when the system is newly established. Hence, when building “up” the theory

upon the axioms, theorems and other known results, it is quite possible to encounter

statements that have never been studied before.

Communication

While the discovery function of proof is somewhat compatible with the

perspective of Platonism, the communication function is more likely to share the basis of

constructivism. In the view of constructivism, mathematics is regarded as a product of

social construction (Balacheff, 1991). It is described as “careful reasoning leading to

definite, reliable conclusions” (Hersh, 2009). The communication delivered by

mathematical proof has two major features: clear definition and a rigorous layout of causal

relationships. In particular, mathematical concepts, while initially extracted from reality,

need to be understandable and communicable while carrying minimal intuitive

confusion (e.g. quantity is a mathematical concept but "prettiness" probably is not). In

addition, communication by proof also serves to minimize intuitive

confusion in the reasoning process. However, it is impossible to radically remove

intuition from proof. Description of the concepts involves external or even

non-mathematical language (Krantz, 2007). When performing deduction to perhaps the most

rigorous standard, as formalized in set theory, intuition still plays a part when

visualizing the inclusion and exclusion relationships. Nevertheless, there are intuitions

that seem to be accepted by all human beings, and hence they are used to form a common

ground for critical debates (Davis, 1976). Mathematical activities ultimately pursue

commonly accepted facts and perform commonly accepted reasoning. Proof, loyal to

both aspects, serves as the most explicit and reliable tool in communicating the substance

of mathematical thinking.

Intellectual challenge

According to de Villiers (1990), proof also serves the function of self-realization

and fulfillment. The motivation for doing proofs may come from the desire to construct a

more elegant proof or the satisfaction of conquering difficult challenges. Although

mathematical proof is set to pursue a common ground and a generally accepted way to

present causal relationships, those who can actually understand, appreciate and utilize the

idea and structure of mathematical proof comprise only a small portion of the population.

The pursuit of certainty and accuracy in mathematical proof as a discipline is appealing to

scholars and learners of mathematics, science, philosophy and other logic-intensive

disciplines.

Implications of de Villiers' model for the learning of proof

De Villiers’ description (1990, 2003) of the functions of proofs in the study of

mathematics and its educational implications are of great importance to the mathematics

education community. De Villiers suggested that proof shouldn’t be taught only as a

method to verify mathematical statements, but also as a way to explain, organize,

investigate and communicate about mathematical facts. Students’ motivation in learning

proof can be stimulated by the curiosity of knowing “if something is true,” as well as the

willingness to know “why something is true,” “how things relate to each other,” “what

else may be true,” and “how to let other people know my ideas.” The functions proposed

by de Villiers are well aligned with learners’ interests brought into the context (see Table

1).

De Villiers' classification       Learners' purpose in conducting proof aligned with
of the functions of proof         the function of proof

Verification                      To know if a statement is true
Explanation                       To know why a statement is true/false
Systemization                     To know how concepts and properties are related to
                                  each other
Discovery                         To know what else is true/false
Communication                     To know how to communicate mathematical ideas with
                                  other people
Intellectual Challenge            To know how good s/he is in mathematical reasoning

Table 1. The alignment between functions of proof and learners' purpose in conducting
proof

Healy & Hoyles (2000) categorized students' views of proof and its purposes in a

large-scale empirical study of children aged 14-15. They found that 28% of the students

didn’t show any understanding of the purpose of proof. In addition, only 1% of the

students acknowledged that proof might help discover new theories or systemize ideas.

The most recognized functions of proof were verification and explanation³. Furthermore,

Healy & Hoyles posited that students’ understanding of the purposes of proof had a

significant influence on their ability to identify and construct a proof. From the

perspective of constructivism, only when students consider a proof convincing and

explanatory do they become more likely to assimilate it into their own reasoning method.

Without an understanding of students’ intention in producing and judging mathematical

arguments, it is impossible to understand why certain decisions are made. Therefore, the

power of validating and explaining is adopted in this study to depict participants’

evaluation of mathematical arguments in order to understand what features of the

arguments they value.

Certainly, identifying the possible intentions only serves as a starting point toward

understanding learners’ behaviors when engaged in proof related activities. Even if a

student clearly demonstrates an intention to verify a mathematical statement, researchers

might still not know why a certain strategy (algebraic vs. geometric, empirical checking

vs. deductive reasoning, etc.) was valued by the student. They might not know why the

student failed or succeeded in achieving his/her goal either. Hence, in order to obtain a

deeper understanding of how students’ knowledge of proof is constructed, investigations

³ In Healy & Hoyles' (2000) classification of students' view of the purposes of proof, the category named "explanation"
included both explanation and communication as identified in de Villiers' (1990, 2003) model.
are needed to explore not only how students perform on tasks that demand proof, but also

their thinking, which can be probed through carefully designed questions. Studies that

addressed these issues are discussed in the following section.

Existing Theories of Proof Learning

Stylianides and Stylianides (2008a) distinguished, within the genre of educational

research on proof, three major categories. The first category includes studies that

investigate students’ ability to perform proof related activity (Ball & Bass, 2003; Lampert,

1992; Marrades, & Gutiérrez, 2000; Reid, 2002; Sekiguchi, 1991; Zack, 1997). This body

of work suggests that students naturally possess the ability (Piaget, 1928, 1987) to reason

even at early elementary grades (Zack, 1997). They call for the design of interventions

that encourage students to reason coherently instead of assuming they are not ready and

providing them cognitively soft tasks to do (Bloom, 1984; Usiskin, 1987). The second

category of studies describes students' common difficulties and mistakes when producing

proofs across different grades and content areas (Balacheff, 1988; Chazan, 1993;

Schoenfeld, 1988; Senk, 1985). The third category elaborates on the pedagogical factors

that could facilitate students’ learning about proofs (Hoyles, 1997). These three

categories of studies, including both reports of empirical investigations as well as

theoretical essays, offer insights into students’ ability along with challenges they

experience when learning proofs (Pirie, 1988). However, collectively, they fail to provide

a systematic and panoramic framework to capture features of students’ thinking when

performing proof related tasks. This gap inspired a body of studies on learners’ proof

schemes.
Proof schemes

The study of learners' proof schemes has a long history and is currently a

mainstream topic in the didactics of mathematics. For instance, Bell (1976) identified "Empirical" and

“Deductive” as two major modes of justifications that students used when working on

problems that demanded proving. Empirical justification, according to his description,

relies on the use of examples whereas deductive justification relies on deduction to

connect data with conclusions.

Balacheff (1988) coined “pragmatic” and “conceptual” as two prominent modes

of justification used by students. Pragmatic justifications are based on the use of

examples (or on actions), and conceptual justifications are based on abstract formulations

of properties and of relationships among properties. He further identified three types of

pragmatic justifications to include: “naive empiricism,” in which a statement to be proved

is checked in a few (somewhat randomly chosen) examples; “crucial experiment,” in

which a statement is checked in a carefully selected example; “generic example,” in

which the justification is based on operations or transformations on an example which is

selected as a characteristic representative of a class. “Thought experiment” is identified as

conceptual justification, in which actions are internalized and dissociated from the

specific examples and the justification is based on the use of and the transformation of

formalized symbolic expressions (see Figure 1). Balacheff (1988) concluded that while

students experience difficulty producing proofs, they do however show awareness of the

necessity to prove and to use logical reasoning.

[Tree diagram: Pragmatic Justification branches into naive empiricism, crucial
experiment, and generic example; Conceptual Justification branches into thought
experiment.]

Figure 1. Balacheff's (1988) classification of students' proving schemes

Extending the research of Bell (1976) and Balacheff (1991) and drawing from a

considerable collection of empirical data, Harel & Sowder (1998) proposed a taxonomy

of proof schemes consisting of three main categories, i.e. “external,” “empirical,” and

“analytical,” each of which encompasses several subcategories (see Figure 2). In

particular, external conviction proof schemes include instances where students determine

the validity of an argument by referring to external sources, such as the appearance of the

argument instead of its content (e.g. they tend to judge upon the kind of symbols used in

the argument instead of the embedded concepts and connection of those symbols), or

words in a textbook or statements made by a teacher. Empirical proof schemes, inductive or

perceptual, include instances when a student relies on examples or mental images to

verify the validity of an argument; the former draws heavily on examination of cases for

convincing oneself, while the latter is grounded in more intuitively coordinated mental

procedures without realizing the impact of specific transformations. Lastly, analytical

proof schemes rely on either transformational structures (operations on objects) or

axiomatic modes of reasoning which include resting upon defined and undefined terms,

postulates or previously proven conjectures.

Figure 2. Proof schemes and sub-schemes (Sowder & Harel, 1998)

Although the existing frameworks of proof schemes provide a powerful vehicle for

classifying the types of proofs produced, they do not trace the cognitive stages that

learners might go through as they develop a more complete understanding of

mathematical proof. Attempts have been made to address this gap by studies focused on the

cognitive development of proof and reasoning.

Frameworks to depict the stages in proof learning

A great deal of research has been undertaken that explores and describes the

developmental stages that a learner goes through in their comprehension of mathematical

proof, from the early stages in which s/he only possesses a primitive understanding of

mathematical objects and actions to more advanced levels where s/he is capable of

axiomatic reasoning (Tall et al, 2012). Since the ability to generate logical arguments is

among the most essential goals of any area of mathematics, progress in understanding the

subject is inseparable from the development of proof skills. Therefore, theories

concerning the learning progression often include a description of the maturation of

mathematical reasoning. The well-known van Hiele model (1986) for geometric thinking

is one such theory.

The van Hiele Model

The van Hiele model was originally proposed by two Dutch teachers, Pierre van

Hiele and Dina van Hiele-Geldof. They designed a framework which could depict the

development of geometric reasoning and hence explain how people grow in their

geometry knowledge. Five different levels of understanding through which an individual

passes when learning geometry were identified, including “visual,”

“descriptive/analytical,” “informal deductive,” “formal deductive,” and “rigor” (van

Hiele, 1986, see Figure 3). A brief description of each level is presented below.

Level 5: Rigor

Level 4: Formal Deductive

Level 3: Informal Deductive

Level 2: Descriptive/Analytic

Level 1: Visual

Figure 3. The van Hiele Model (van Hiele, 1986)

At the visual level (Level 1), learners could identify, name, and compare

geometric figures, such as triangles, rectangles, angles, parallel lines, etc., according to

how they look. For example, at this level, students may see the difference between

triangles and quadrilaterals by counting the number of their sides, but they may not be

able to tell that a square is a rectangle since they "look different."

At the descriptive/analytical level (Level 2), learners can recognize components

and properties of a figure; however, they cannot reason upon those properties. They are

able to describe figures in terms of their parts and relationships among these parts, to

summarize the properties of a class of figures, and use properties to solve basic

identification problems, but they cannot yet conduct deduction. For example, learners

know a right triangle is a triangle that has a right angle, but they cannot explain whether it

is possible for a triangle to have two right angles.

At the informal deductive level (Level 3), learners are able to connect figures with

their properties. They can justify figures by their properties as well as articulate the

properties of a given figure. The learners can understand and use precise definitions.

They are capable of using “if-then” thinking, but they cannot consciously use

mathematically correct language, nor can they realize the deductive property of their

reasoning. Their reasoning is based on intuition (Fischbein, 1982) instead of a

mathematical foundation. For example, the learners are able to claim that it is impossible

for a triangle to have two right angles because if so there would be two sides that

cannot “meet.”

At the formal deductive level (Level 4), learners can reason about geometric objects

using their defined properties in a deductive manner. They could consciously construct

the types of proofs that one would find in a typical high school geometry course. They are

aware of what counts as a legitimate proof in mathematics.

At the highest level, rigor (Level 5), learners can compare different axiomatic

systems. Learners fully understand the structure of a system as well as its applications

and limitations. They can analyze and compare these systems.

The van Hiele model has been modified and extended by scholars to meet

particular research interests. For example, Clements and Battista (1992) added a level 0,

“pre-recognition,” where children were not able to visually identify the difference
between shapes, to this model to depict their cognition in geometry at the very beginning

stage. Pegg and Davey (1998) integrated the van Hiele model with another learning

theory, the SOLO taxonomy (Biggs & Collis, 1982), to describe how learning develops

within and through the levels.

The van Hiele model doesn’t trace the “in between levels of reasoning” (Burger &

Shaughnessy, 1986), nor does it offer enough details to depict how proof is perceived by

learners. This point became quite obvious when applying the model to study students’

development of proof skills. After all, the van Hiele model was not specially designed to

capture the development of proof ability in geometry, but to document levels of

geometric reasoning. The first two levels concern sense making and concept building,

while the ability to produce justification mostly develops at the higher three levels. It is

not suggested that the development of reasoning ability could be separated from sense

making and concept building; however, an elaboration on children's development of

proving capacity is needed.

Waring’s proof levels

Waring (2000) proposed the proof levels for elementary and secondary students to

“describe the development of proof concepts beginning with an appreciation of the need

for proof, then an understanding of the nature of proof, and finally pupils’ competence in

constructing proofs” (p. 10). Six levels are identified in the framework.

At Level 0, students are ignorant of the need to provide a mathematical

justification to confirm a statement. In particular, they may either think it is unnecessary

to offer a reason, or just refer to external (Harel & Sowder, 1998) sources, such as

teachers’ words and statements in the textbooks, to support their opinions.

At Level 1, students start to become aware of the need to provide a justification;

however, they are not cognizant that a claim should be verified in all possible cases.

Instead, they just check a few cases and suggest the results are sufficient to support the

claim.

Moving up to Level 2, students still rely on empirical checking. While they are

more careful in choosing examples to verify, and may notice certain patterns in the

process, they still cannot produce a proof that accounts for all cases. This may be due to

their inability to realize the entire scope of discussion, absence of knowledge about the

need to clarify every case, or lack of language tools to describe the patterns they detect.

At Level 3, students become aware of the need to offer justification for general

cases; however, they lack proof skills or basic understanding of the subject. Therefore,

they cannot produce a valid proof.

At Level 4, students are both aware and capable of producing generalized proofs.

However, they can only do so in limited and familiar contexts.

At Level 5, students understand the rationale of proof as a reliable reasoning

method and can intentionally apply it to justify claims in unfamiliar contexts.

Compared to the van Hiele model, Waring’s proof levels offer an account of how

the shift from informal to formal understanding of proofs may occur. However, both

frameworks provide a linear account of development (i.e. changes happen one after

another) with no space in the structure to describe processes that might occur randomly or

in parallel to each other. For instance, lower levels in the van Hiele model emphasize sense
making and concept building while reasoning is only emphasized in higher levels; while

in Waring’s model, awareness simply develops before understanding. Scholars have

proposed that the development of mathematical cognition follows a more complex and

non-linear format (Lakatos, 1976; Kieren & Pirie, 1991; Martin, 2008; Pirie & Kieren,

1992).

The broad maturation of proof structures

Tall et al. (2012) proposed a two-dimensional model to depict the development of

factors that are involved in the maturation of one’s proof ability (see Figure 4). This

framework captures six key components (i.e. perceptual recognition, verbal description

and pictorial or symbolic representation, definition and deduction, equivalence,

crystalline concepts, and deductive knowledge structure) and their relationships in the

broad maturation of proof structure. Unlike the van Hiele model, Tall et al. (2012) suggest that perceptual understanding does not develop only at earlier stages. Instead it

continues to be refined when the understanding of the concept and deductive process is

advanced. This idea is consistent with the perspective of constructivism, in the sense that

the mathematical system possesses a dynamic structure so that a shift in understanding of

a factor may impact other components (Lakatos, 1976; Tall, 2005). Nevertheless, Tall et

al. (2012) do not suggest that all the components in the structure develop simultaneously.

Instead, certain types of understanding serve as a prerequisite for others to occur. This

feature is denoted by the initial “height” of each component.

Figure 4. The broad maturation of proof structure (Tall et al., 2012)

Crystalline concept introduced in this framework has a crucial role in the

development of proof structure. It was described as “a concept that has an internal

structure of constrained relationships that cause it to have necessary properties as part

of its context” (p. 19). In other words, it is a concept with a body of associated knowledge attached to it. In order to construct deductive reasoning, the concepts involved must not be perceived as isolated objects. Only when the roads are built can a path be drawn.

Theoretical Framework

The pilot study (Liu & Manouchehri, 2012) adopted Harel & Sowder’s (1998)

model to differentiate and classify different types of arguments, attempting to understand

if the proof scheme is a common indicator of whether an argument was found convincing

or appealing by an individual. The results of the pilot study suggested that students prefer

different proof schemes and may have distinct judgment of the same scheme in different

contexts. This result was consistent with Harel & Sowder’s (1998) finding that an

individual could simultaneously hold different proof schemes. Since the proof scheme

alone cannot determine whether an argument is preferred or accepted as reliable by a

student, more factors need to be considered to explain the phenomenon. De Villiers’

(1990, 2003) model offers one dimension (i.e. the intention of the learner in creating the argument) to explain why a certain approach may be preferred by learners. Waring’s (2000) model provides another dimension to explain a learner’s conception of proof by referring to the stages the learners may have achieved at the time of assessment. The broad

maturation model proposed by Tall et al. (2012) adds to the conversation by considering

the impact of the learner’s understanding of related mathematical topics on their

perception of proofs. Healy & Hoyles (2000) suggested gender can also be a factor. In

addition, a substantial number of studies have tended to explain students’ understanding

of argumentation methodology in terms of what they experienced in school interventions (Dreyfus, 1999; Hoyles, 1997; Herbst & Branch, 2006; Schoenfeld, 1988; etc.). Indeed, intention and school experience are factors that impact learners’ preference for and use

of proof modes. However, what ultimately impacts students’ judgment is their

understanding of the argument (see Figure 5). In order to distinguish the types of
arguments students found convincing and appealing, we must understand what features or factors of an argument impact their evaluation.

The statement of the argument → Personal understanding of the argument → Personal evaluation of the argument

Figure 5. Evaluation of argument is based on understanding

According to the survey conducted by Mejia-Ramos and Inglis (2009), a majority

of reported studies on mathematical proof are concerned with students’ estimation,

exploration and justification of a mathematical conjecture, but few studies pay attention

to how students comprehend and evaluate a given proof. In addition, instruments that

assess students’ comprehension of proof are also underdeveloped (Mejia-Ramos et al.,

2012). Research on students’ comprehension of given arguments is rare and greatly needed, for several reasons. First, in school practice, reading and understanding the proofs offered by the teacher or course materials serve as a main venue for students to

develop their conception and skills of proofs (Weber, 2004). Second, the evidence and

logic students use to construct a proof are usually familiar to them; however, when evaluating a proof, they may encounter unknown resources and unfamiliar reasoning methodologies, and their judgment of a reasoning method in such an unfamiliar setting

would reveal some basic features of their conviction system. Lastly, the ability to judge a

mathematical argument is an indispensable skill for one to consciously construct proofs.

Without an internal understanding of what kind of reasoning process is reliable, it is

impossible for one to monitor and inform his/her own construction of mathematical

proofs. Understanding how students evaluate and assimilate (or exclude) different ideas is

critical in understanding their learning process, and the learning of proof is not an

exception. Therefore, more studies that focus on students’ thinking in reading and judging proofs are needed (Healy & Hoyles, 2000; Mejia-Ramos & Inglis, 2009; Selden &

Selden, 2003; Yang & Lin, 2008).

Yang & Lin (2008) took the initiative to address the gap by proposing the Reading

Comprehension of Geometry Proof (RCGP) model to describe the stages that learners go

through in understanding a given geometric proof (see Figure 6). They suggested that

students first identify isolated conceptual and procedural knowledge in the statement at the surface level. They then start to recognize that some knowledge and statements are premises, some are conclusions, and some are descriptions of properties. Moving up to the

chaining elements level, students are able to identify and understand the connection

between premises, conditions, properties and conclusions. At the highest level,

encapsulation, students gain a systematic and organized view of the elements in the proof;

they are well aware of what the premises and conclusions are and fully understand the

causal relationships among them.

Figure 6. Reading Comprehension of Geometry Proof (RCGP) Model (Yang & Lin, 2008)

Following the identification of the four levels of understanding, Yang and Lin

(2008) further examined conditions under which a learner’s understanding could move

toward higher levels. In particular, they suggested that Basic Knowledge (i.e.

understanding of the terms and sentences), Logical Status (i.e. realization of the logical

relationship), Summarization (i.e. capturing the core of a logical relationship), Generality (i.e. understanding to what extent the argument is valid), and Application (i.e. knowing

how to apply the proposition) make up the critical understanding a learner needs to

develop in order to achieve higher levels of proof reading comprehension, as illustrated in

Figure 6.

The RCGP model aids in understanding the development of students’ comprehension of a particular proof. The model suggests that only when certain understanding is in place can a learner comprehend a proof as a logically coherent

argument. Therefore in order to evaluate the reasoning process of a mathematical

argument in an informed manner, students must at least reach the understanding at the

chaining elements level since only at this level can students start to see the relations

within the statement. In other words, judgment can only be made upon certain levels of

understanding, and judgment about the reasoning method can only be made when learners see

the connection.

According to the RCGP model, there are three kinds of understanding that

students need to develop in order to reach the chaining elements level. Generally

speaking, students need to be able to understand the concepts used in the argument,

students need to be able to identify the evidence on which the argument is based,

and students need to be able to see the connection between the premises and results.

When shifting the attention from students to the arguments, it is noticeable that

these three types of understanding in fact point out three key aspects about the argument

that could, to a large degree, influence students’ comprehension and evaluation of a

mathematical argument, i.e. the representation (which describes the concepts and other

terms), the source of conviction (which states what is taken for granted), and the link

between the evidence and conclusion (which represents the reasoning process). Three

similar aspects were addressed by Stylianides and Stylianides (2008a) as the modes of

arguments, the set of accepted statements, and the modes of argumentation. It is assumed

that only when students understand the presentation, agree with the source, and recognize

the link would they consider an argument as reliable. With the three identified aspects,

the investigation becomes one of examining what kind of presentation, what kind of
source, and what kind of link contributes to students’ conviction of an argument. So the

next step of model building is to identify the different genres in each aspect.

Bruner (1966) synthesized three kinds of representations when communicating

ideas, i.e. enactive (which involves the use of gesture and physical actions), iconic (which

involves the use of pictures, graphs and visual tools) and symbolic (which includes the

use of natural language, numbers, and logic). However, in the current study, where the

communication is carried out through written mathematical arguments, the enactive mode is not utilized. In mathematics, algebraic representation constitutes a different mode of communication from casual language. Therefore we see the need to distinguish the two

forms of arguments. In the theoretical framework of the study, four different

representations of a mathematical argument are conceptualized. They are: narrative,

numerical, symbolic, and visual. Narrative arguments refer to those using casual

language. A typical example could be “Because the car is slower, it takes a longer time to

get to the destination.” Numerical arguments refer to those using numbers and elementary

mathematics symbols (such as “+,” “-,” and “<”). For example, “since 12 = 3 * 4, then 12

is a multiple of 3” falls in this category. Symbolic arguments refer to those using letter

symbols to represent mathematical concepts and communicate ideas. At the secondary

levels, it is not expected that students will use formal language as do those in an

advanced algebra class. Therefore, the symbolic arguments may contain a large amount

of casual language as well. However, what distinguishes a symbolic argument from a

narrative one is that it is impossible to understand a symbolic argument without knowing

the role played by the letter symbols. For instance, the argument “Since x^2 - 2x + 1 = (x-1)^2,

then it must be non-negative” falls in this category. The last type is visual arguments,
where visual aids are provided to present concepts and to communicate ideas. This

category is the same as the iconic way of communication conceptualized by Bruner

(1966). A typical geometry proof that uses figures falls in this category.
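To illustrate the symbolic category more fully, the symbolic example above can be completed as a short derivation (a sketch added here for illustration; it is not part of the survey instruments):

```latex
\[
x^2 - 2x + 1 = (x-1)^2 \geq 0 \quad \text{for every real number } x,
\]
% since the square of any real number is non-negative.
```

Understanding this argument requires knowing the role played by the letter x as an arbitrary real number, which is exactly what places it in the symbolic rather than the narrative category.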

The classification of sources of conviction as well as the link between source and

conclusion is informed by Harel & Sowder’s (1998) model. However, unlike Harel &

Sowder’s model which categorizes arguments as a whole, this framework classifies the

source of conviction and the link between source and conclusion separately. This

alternative approach was impacted by a reflection on the application of Harel & Sowder’s

model on several concrete cases. For example, a step-by-step correct and complete proof

initiated from the Pythagoras Theorem would most likely be classified as a deductive

proof using Harel & Sowder’s model. However, students who write down the same proof

may actually have different comprehension of it. For instance, some of them may view

the Pythagoras Theorem as an external authority without understanding why it holds,

some of them may view it as an assumption, and some of them may view it as part of

their concept of a right triangle. It was suggested that students’ inconsistency in

preference and evaluation of proofs in different contexts as observed in the pilot study

(Liu & Manouchehri, 2012) could partly be due to the lack of preciseness when using

Harel & Sowder’s model to depict students’ way of reasoning. Therefore by looking at

source and link separately, we expect to get a more detailed picture of students’

comprehension of proof. In particular, students could view the source of an argument as

authority (i.e. what’s stated by a respectful knowledge carrier, e.g. teachers,

mathematicians, books, agreement of a community, etc.), example (i.e. result from an

immediate test), imaginary (i.e. mental image created upon or recalled from previous
experience), fact (i.e. well known existing mathematical results), an assumption (i.e. an

assumed truth for the argument to be based on), and opinion (conviction without an

explicit reason). The types of link between source and conclusion include direct

indication, perceptual connection, induction, transformation (Simon, 1996), ritual

operation (Healy & Hoyles, 2000), and deduction. In direct indication, the conclusion is

the required condition of the source without any additional understanding (e.g. “Since the

squares of a positive number, a negative number and 0 are all non-negative, then the

square of a real number is non-negative”). Perceptual connection refers to linking source

and conclusion based on visualization or intuition. The argument “Since f(x) is a much

longer term than g(x), then f(x) must be larger” is an illustration of the use of perceptual

connection in an argument. The use of metaphor falls into this category as well. Induction

and transformation both refer to a conclusion informed by several pieces of empirical evidence; however, the latter involves a further investigation and noticing of properties that

connect the empirical cases. The use of generic examples (Balacheff, 1988) falls in the

category of transformation. Ritual operation and deduction both refer to a valid reasoning

procedure; however, one using the former does not know why the procedure works (e.g. using an algorithm without knowing why it works) while one using the latter is well aware that each step in the process connects a piece of evidence to its required condition. The

structure of the framework is illustrated in Figure 7.

[Figure 7 depicts a three-dimensional cube with the following axes:
Presentation: symbolic, numerical, narrative, visual;
Link: deduction, ritual operation, transformation, induction, perceptual connection, direct indication;
Source: authority, example, imaginary, fact, assumption, opinion.]

Figure 7. Framework to classify students’ comprehension of a mathematical argument

To exemplify how the framework is used in the current study, let’s consider the

following argument: “Since 2+2=4, 2+4=6, 2+6=8, 4+6=10, then the sum of two even

numbers must also be even.”

The representation of the argument is narrative. The source of conviction is

example, since it is based on the results of several trials. The link would most likely be

classified as induction. There shouldn’t be much ambiguity about the representation and

the source of conviction in this case. However, when judging the link between source and

conclusion, we cannot be certain about whether the nature of reasoning was purely

inductive. For instance, it is possible that when one reads the proof s/he may have noticed

some patterns from the trial results but hasn’t explicitly expressed the discoveries. In

order to clarify such concerns, additional information needs to be elicited so as to confirm

conjectures or assumptions regarding choices made. Questions such as “why do you think

checking on a few cases is sufficient for a conclusion about every case?” can potentially

provide a venue to the individuals’ thinking.
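For contrast, a deductive version of the same even-sum claim (a standard textbook argument, offered here only for comparison) would occupy a different cell of the framework:

```latex
\[
2m + 2n = 2(m+n) \quad \text{for any integers } m \text{ and } n,
\]
% so the sum of any two even numbers is itself even.
```

Here the source would most likely be classified as fact (the definition of an even number as twice an integer) and the link as deduction rather than induction.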

In order to clarify ambiguities associated with sources contributing to individuals’

choices, it is important to acknowledge that none of the representation, the source of conviction, or the link between source and conclusion can be identified merely by

looking at the argument itself. Instead, they reside in one’s comprehension of the

argument, even though the expression of the argument can certainly influence one’s

understanding. For convenience, one’s interpretation of an argument is

called an “internalized argument” for the rest of the work. Accordingly, this framework,

which classifies different internalized arguments, is called the Classification Cube of

Internalized Arguments (CCIA) hereafter.
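To make the three dimensions of the framework concrete, the CCIA can be sketched as a simple data structure (a hypothetical illustration only; in the study the framework was used as a hand-coding scheme, not as software, and all names below are illustrative):

```python
from dataclasses import dataclass
from enum import Enum

class Presentation(Enum):
    NARRATIVE = "narrative"
    NUMERICAL = "numerical"
    SYMBOLIC = "symbolic"
    VISUAL = "visual"

class Source(Enum):
    AUTHORITY = "authority"
    EXAMPLE = "example"
    IMAGINARY = "imaginary"
    FACT = "fact"
    ASSUMPTION = "assumption"
    OPINION = "opinion"

class Link(Enum):
    DIRECT_INDICATION = "direct indication"
    PERCEPTUAL_CONNECTION = "perceptual connection"
    INDUCTION = "induction"
    TRANSFORMATION = "transformation"
    RITUAL_OPERATION = "ritual operation"
    DEDUCTION = "deduction"

@dataclass(frozen=True)
class InternalizedArgument:
    """One reader's comprehension of a single argument, located at a
    point in the Classification Cube of Internalized Arguments."""
    presentation: Presentation
    source: Source
    link: Link

# e.g. the even-sum argument discussed above, read purely inductively:
even_sum = InternalizedArgument(
    presentation=Presentation.NARRATIVE,
    source=Source.EXAMPLE,
    link=Link.INDUCTION,
)
```

Each coded interview response then corresponds to one such triple, which is why the framework is a "cube": the three dimensions vary independently across readers and contexts.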

Using CCIA to categorize different types of comprehensions of mathematical

arguments, the current study investigated the types of arguments students considered

convincing, explanatory and appealing. In addition, the study examined whether there

were common types of representation, source and link that contributed to students’

choices. Furthermore, similarities and differences among individuals and among the

contexts were studied in order to identify personal factors that influenced their judgment.

Detailed research methods and procedures are provided in the next chapter.

CHAPTER 3. METHODOLOGY

This study sought to investigate what kind of mathematical arguments students

considered as convincing, explanatory and appealing, and what factors influenced their

evaluations. In order to do so, both quantitative and qualitative data were collected and a

mixed method design was utilized.

Mixed Method Designs

Conducting mixed-methods research involves the collection, analysis, and

interpretation of both quantitative and qualitative data in a single study or in a series of

studies that investigate the same phenomenon (Onwuegbuzie & Leech, 2006; Creswell &

Plano Clark, 2011). Quantitative research emphasizes deductive logic, and utilizes

numerical data; whereas qualitative research emphasizes inductive logic, and often

utilizes textual and pictorial data (Teddlie & Tashakkori, 2009). Quantitative research

tends to eliminate researchers' biases, so that they can remain emotionally detached and

uninvolved with the objects of the study and test or empirically justify their stated

hypotheses; whereas qualitative research “contend(s) that multiple-constructed realities

abound, that time-and context-free generalizations are neither desirable nor possible, that

research is value-bound, that it is impossible to differentiate fully causes and effects, that

logic flows from specific to general and that knower and known cannot be separated

because the subjective knower is the only source of reality” (Johnson & Onwuegbuzie,

2004, p. 14). A combination of quantitative and qualitative research designs serves to

fulfill five purposes, i.e. triangulation, complementarity, development, initiation, and

expansion (Greene, Caracelli, & Graham, 1989). More specifically, triangulation seeks

common results from different methods to reduce the inherent method bias of any

particular method, including the inquirer bias, theory bias, and context bias.

Complementarity increases the meaningfulness of inquiry results by elaborating,

enhancing, illustrating and clarifying the results from one method with the results from

other methods. Development informs the design of one method using the results from other methods. Initiation deepens and broadens the inquiry by seeking new perspectives or frameworks, or by discovering paradox and contradiction between results from different

methods. Lastly, expansion extends the scope of inquiry by using methods most

appropriate for certain inquiries.

Johnson & Onwuegbuzie (2004) suggested that the logic of inquiry in mixed

methods research includes “the use of induction (or discovery of patterns), deduction

(testing of theories and hypotheses), and abduction (uncovering and relying on the best of

a set of explanations for understanding one’s results)” (p. 17). The quantitative and

qualitative methods can be mixed by ordering them sequentially, merging them, or

embedding one strand within the other.

This study aimed to explore the kind of mathematical arguments students

considered as convincing, explanatory and appealing, and sought explanations to such

evaluations. The quantitative method could help with collecting data from a large sample

and hence facilitate the discovery of patterns and testing of hypotheses. It served to

enhance the scope of inquiry and the generality of the findings (Cohen, 1988). Findings
based upon statistical analysis could highlight connections between students’ thinking

and characteristics of the content. However, since participants were only asked to

complete multiple-choice items in the survey that were predefined by the researchers, the

quantitative study fell short of providing opportunities to explore learners’ own

explanations. Therefore, the qualitative methods were critical in offering further

interpretations of emergent patterns, and enhancing the analysis of the study by providing

insights into specific cases (McConaughy & Achenbach, 2001; Yin, 2009). Both methods

were needed in order to provide a comprehensive and meaningful explanation for the

objective of this study, i.e. students’ evaluation of mathematical arguments.

Procedure of the Study

Adopting a mixed methods design, the study consisted of the development,

administration and analysis of a survey and follow up interviews (see Table 2). The

survey and interview protocol were designed and refined in 2012. The survey was

administered in January - February 2013, and the follow up interviews were conducted in

April 2013. The survey was administered in the participants’ schools and took 30-60

minutes to complete. Individual interviews lasted approximately an hour each.

Participants of the study, development of survey instrument, procedures of the interview,

as well as the data analysis process are described in the following sections of this chapter.

Timeline: 2012
Task: Instrument Development
Summary: The instrument for the survey and the interview protocol was designed based upon existing literature and findings from pilot studies.

Timeline: January - February, 2013
Task: Survey Administration
Summary: The survey was administered online using the instrument called Survey of Mathematical Reasoning. The survey took 30-60 minutes to complete.

Timeline: February - April, 2013
Task: Survey Analysis
Summary: Students' evaluations of mathematical arguments were quantitatively analyzed. Survey results were used to determine the participants of the interview.

Timeline: April, 2013
Task: Interviews Conducted
Summary: Follow-up one-on-one interviews were conducted with individuals selected from those who had taken the SMR to further investigate why they made certain choices in the survey. Each interview lasted about an hour.

Timeline: April - June, 2013
Task: Interview Analysis
Summary: Students' responses in the interview were qualitatively analyzed. Factors that influenced students' decisions were conceptualized and synthesized.

Table 2. Outline of the procedure of the study

Sample

The population of interest in this study was 8th grade students. Two reasons

contributed to this choice. First, according to Piaget’s (1985) Intellectual Development

Stages, middle school students are at a critical cognitive phase where they can engage in

abstract and logical thinking. Therefore, how they learn to value different arguments at

this stage could potentially impact their reasoning skills and thinking habits in the later

years. Second, the grade band serves as a bridge between middle and high school

mathematics and the link between informal and more formal and abstract mathematical

reasoning (Knuth, Choppin, & Bieda, 2009). According to the curriculum standards

(CCSSO, 2010), most 8th grade students should have obtained a basic understanding of

numbers, shapes, chance, and algebraic expressions, know some simple propositions and

properties, and should be able to see the connection between concepts and ideas.

However, they may not have yet adopted abstract thinking or deductive ways of

mathematical reasoning using conventional proving techniques and forms. Therefore, the

features of arguments they consider as convincing, explanatory, and appealing can offer

valuable references for the development of resources and instructional explanations that

can facilitate students’ internalization and adoption of more mathematically sound

argumentation.

Survey Participants

Over 500 8th grade students from 5 different public schools in Ohio took the

survey in January and February of 2013. According to the 2012 spring Ohio state

standardized 7th grade mathematics test results, two of the schools had performed below

state average (at least 10% below as measured by percentage of proficiency), one

school’s performance was at the state average, while the other two schools’ performance

was above the state average (about 10% above as measured by percentage of proficiency).

The survey was given to the students in their respective school setting during a regular

class period.
Data trimming was conducted to exclude unreliable information. We excluded

data from those who had not completed the survey and those who had chosen the same option for almost all questions. In particular, the survey contained 48 questions that required students to select one of three options: “agree,” “disagree,” and “not sure.” If a participant chose the same option for all but at most 5 questions (about 10% of the total), we considered his/her responses not to be based on careful analysis. Hence, the actual data used in

analysis in this study consisted of responses from 476 respondents. 48.1% of the

participants were “male,” and 49.8% were “female.” The remaining 2.1% chose not to

disclose their gender. In responses to the question about ethnicity, 78.6% selected “White,

not of Hispanic origin”, 7.1% selected “Black, not of Hispanic origin”, 1.7% selected

“Hispanic”, 2.1% selected “American Indian or Alaskan Native”, 0.4% selected “Asian

or Pacific Islander.” 10.1% of the respondents chose not to disclose their ethnicity. In

response to the question about the math courses completed, 88.5% of the students

indicated that they had taken or were taking Algebra I or an equivalent Integrated 8th

Grade Mathematics course, 10.3% indicated that they had taken or were taking Geometry,

and 2.5% indicated that they had taken or were taking Algebra II. Based on the

demographics of the sample, we believe our data to be fairly representative of the 8th

grade student population.
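The trimming rule described above can be expressed as a short script (a hypothetical sketch for illustration only; the study does not state that the screening was automated, and the function and variable names are invented here):

```python
def is_reliable(responses, total_questions=48, tolerance=5):
    """Return False for survey records excluded by the trimming rule:
    incomplete surveys, or respondents who chose the same option for
    all but at most `tolerance` of the questions (about 10% of 48)."""
    if len(responses) < total_questions:
        return False  # incomplete survey
    # How many times was the most frequently chosen option selected?
    most_common = max(responses.count(option) for option in set(responses))
    # Reliable only if more than `tolerance` responses differ from that option
    return total_questions - most_common > tolerance

# A respondent who picked "agree" on 45 of 48 items would be excluded,
# while one whose answers are spread across the options would be kept:
flat = ["agree"] * 45 + ["disagree"] * 3
varied = ["agree", "disagree", "not sure"] * 16
```

Under this sketch, `is_reliable(flat)` is False and `is_reliable(varied)` is True, matching the exclusion criterion described in the text.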

Survey Instrument: Survey of Mathematical Reasoning

The Survey of Mathematical Reasoning (SMR) (see Figure 8) was designed to

capture students’ judgment of different arguments and their personal comprehension of

them. SMR was published online and participants took the survey on the website during

one of their class periods. All the items on the SMR were multiple-choice.

BACKGROUND INFORMATION

Thank you for agreeing to participate in this study. Your responses to the survey are
confidential and will not be shared with your teachers in school.
The survey contains 4 mathematics problems. For each problem you will need to evaluate
4 mathematical arguments and answer related questions. There is no right answer to those
questions. We just want to know your opinion.
Please read the questions carefully and pick the options that best match your opinion.
Please plan on using 30 to 45 minutes to complete the survey.
Now let's start!

1. The name of your school: _________________________

2. The grade you are in: 6 7 8 9 10 11 12

3. Your student ID as assigned by your school (or your math coach): ________________

4. Your gender:
Male Female I choose not to answer this question

5. Please describe your race/ethnicity.


American Indian or Alaskan Native
Asian or Pacific Islander
Black, not of Hispanic origin
White, not of Hispanic origin
Hispanic
Other (or I choose not to answer this question)

6. Mathematics courses you have taken (including the course you are taking):
Pre-algebra Algebra I Algebra II Geometry
Integrated 7th grade math Integrated 8th grade math
Other (please specify) _____________________

On the next page, we will start to work on some math problems. Ready to go?

Continued

Figure 8. Survey of Mathematical Reasoning [4]

[4] The SMR used in the study was a web-based survey. Therefore, although sharing the same content, the survey on the internet had a different layout from what is shown here.
Figure 8 continued

PROBLEM A

Shaina claimed that:

“A multiple of 6 must also be a multiple of 3.”

Arguments A1 - A4 are offered by different people to justify Shaina’s claim. Please read
each of the arguments carefully and pick the options that best describe your thinking in
Questions 7 - 11.

************************************************************************

Argument A1: I’ve tried plenty of multiples of 6 (like 12, 60, 606, etc.) and found they
are multiples of 3 as well. So I am sure that Shaina’s statement must be true.

7. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument A2: Any multiple of 6 can be written as 6n. We know that 6n = 3•2n, which is
a multiple of 3. Therefore a multiple of 6 must also be a multiple of 3.

8. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

continued

Figure 8 continued

Argument A3: If the total number of cookies is a multiple of 6, then we can put them
into several boxes where each box contains 6 cookies. We can further divide each
box into 2 packages, where each package contains 3 cookies. Now all the cookies
are put into packages of 3. Therefore, the total amount of cookies must also be a
multiple of 3.

9. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument A4: The total number of square cards below is a multiple of 6:

[figure not reproduced]

We can rearrange the squares in this way:

[figure not reproduced]

Now we can see that a multiple of 6 must also be a multiple of 3.

10. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

continued

Figure 8 continued

11. After evaluating each argument, which of them is closest to what you will use in
arguing about Shaina's claim? [5]
Argument A1 Argument A2 Argument A3 Argument A4
None of the arguments is close to what I will use. This is how I will argue:

continued

[5] A1 - A4 were relisted below this question in the online version of SMR to allow students to see all the arguments that needed to be compared. The same layout was adopted for the other three problems used in the survey.
Figure 8 continued
PROBLEM B

Ryan claimed that:

“The diagonal of a rectangle must be longer than each of its sides.”

Arguments B1 - B4 are offered by different people to justify Ryan’s claim. Please read
each of the arguments carefully and pick the options that best describe your thinking in
Questions 12 - 16.

************************************************************************

Argument B1: I’ve drawn several rectangles and measured the length of their sides and
diagonals. I found that the diagonal of any of those rectangles is longer than any
side of the same rectangle. So Ryan’s statement must be true for all rectangles.

12. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument B2: Imagine that you are standing on the corner of a football field. Then the
diagonal of the field is definitely longer than any of its sides. So Ryan’s claim
must be right.

13. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


Argument B3: As shown in the figure below, ABCD is a


rectangle. Since ∠A = 90°, then by the Pythagorean
Theorem,
BD^2 = AB^2 + AD^2.
So BD^2 > AB^2 and BD^2 > AD^2
(The notation X^2 means the square of X. For example, BD^2 means the square
of BD). Therefore, BD is longer than AB and longer than AD.

14. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument B4: Suppose ABCD is a rectangle.


Draw a circle using B as the center and BD
as the radius. From the figure shown, we
can see that BD = BQ = BP. Since BC <
BP and BA < BQ, then both BA and BC
are shorter than BD. Therefore, the
diagonal of a rectangle must be longer than
any of its sides.

15. What do you think of the argument above?


Please pick the option that best matches your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


16. After evaluating each argument, which of them is closest to what you will use in
arguing about Ryan's claim?
Argument B1 Argument B2 Argument B3 Argument B4
None of the arguments is close to what I will use. This is how I will argue:


PROBLEM C

There are two triangles. The lengths of the three sides of Triangle I are A, B, and C and
the lengths of the three sides of Triangle II are a, b, and c. Jennifer claims that:

“If A > a, B > b and C > c, then the area of Triangle I must also be larger than
Triangle II.”

Arguments C1 - C4 are offered by different people to justify Jennifer’s claim. Please read
each of the arguments carefully and pick the options that best describe your thinking in
Questions 17 - 21.

************************************************************************

Argument C1: If A = B = C = 2, a = b = c =1, then Triangle I is obviously larger than


Triangle II. I also tried many other cases (as shown in the figures below) and
found Triangle I always has an area larger than that of Triangle II. So I am sure
Jennifer's claim must be correct.

17. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


Argument C2: We all know that the area of a triangle equals 1/2 of the product of its
base and height. As shown in the figures below, the area of Triangle I = BH/2, and
the area of Triangle II = bh/2. We know that B > b. In addition, since A > a and
C > c, then it must be true that H > h. So BH/2 must be larger than bh/2. Therefore
the area of Triangle I must be larger than the area of Triangle II.

18. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument C3: As shown in the figures below, since each side of Triangle II is shorter
than the corresponding side of Triangle I, we can cut each side of Triangle I
shorter and then compose Triangle II using the shortened sides. Therefore, the
area of Triangle II must be smaller than the area of Triangle I.

19. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument C4: Since each side of Triangle I is longer than the corresponding side of
Triangle II, then the perimeter of Triangle I must also be longer than the perimeter
of Triangle II. If we make the two triangles using wires, then it needs a longer
wire to make Triangle I than Triangle II. Using a longer wire we can make a larger
triangle. Therefore the area of Triangle I is definitely larger than the area of
Triangle II.

20. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


21. After evaluating each argument, which of them is closest to what you will use in
arguing about Jennifer's claim?
Argument C1 Argument C2 Argument C3 Argument C4
None of the arguments is close to what I will use. This is how I will argue:


PROBLEM D

The sales tax rate of the state where Ravi lives is 5%. Ravi is buying a new bike in a local
bike store and has a $20 coupon. Ravi claims that:

“I can always save $1 if the $20 coupon is applied before tax rather than after tax,
regardless of the actual price of the bike.”

Arguments D1 - D4 are offered by different people to justify Ravi’s claim. Please read
each of the arguments carefully and pick the options that best describe your thinking in
Questions 22 - 26.

************************************************************************

Argument D1: Suppose the original price of the bike is $100.


If the coupon is applied before tax, then Ravi needs to pay
(100 – 20) × (1 + 5%) = 84 dollars.
If the coupon is applied after tax, then Ravi needs to pay
100 × (1 + 5%) – 20 = 85 dollars, which is $1 more than what he needs to pay if
the coupon is applied before tax.
I tried some other possible prices of the bike, such as $200, $500, etc., and found
he always pays $1 less if the coupon is applied before tax. Therefore, I am sure
Ravi’s claim is always right.

22. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


Argument D2: Suppose the original price of the bike is x dollars.


If the coupon is applied before tax, then Ravi needs to pay
(x – 20) × (1 + 5%) = 1.05x – 21 dollars.
If the coupon is applied after tax, then Ravi needs to pay
x × (1 + 5%) – 20 = 1.05x – 20 dollars.
Notice that (1.05x – 20) – (1.05x – 21) = 1. Therefore, Ravi always saves one
more dollar if the coupon is applied before tax rather than after tax.

23. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

Argument D3: If the coupon is applied before tax, then Ravi doesn’t need to pay the tax
for the $20 discount. If the coupon is applied after tax, then he needs to pay the
tax of the original price of the bike. Notice that $20 × 5% = $1. Therefore Ravi
always saves one more dollar if the coupon is applied before tax rather than after
tax.

24. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.


Argument D4: Let x be the original price of the bike and y be how much Ravi actually
needs to pay (after applying the coupon and tax). Based on calculation, the graph
below is generated by a graphing calculator to illustrate the two situations: the
solid line represents how much Ravi needs to pay if the coupon is applied after
tax; the dashed line represents how much he needs to pay if the coupon is applied
before tax. From the graph, we can
see that the solid line is parallel to the
dashed line and is always 1 unit
above it. Therefore, Ravi can always
save one more dollar if the coupon is
applied before tax rather than after
tax.

25. What do you think of the argument above? Please pick the option that best matches
your opinion.

Agree Disagree Not Sure


You understand the concepts and
notations used in the argument.
The argument shows that the
statement is always true.
The argument helps you better
understand why the statement is true.

26. After evaluating each argument, which of them is closest to what you will use in
arguing about Ravi's claim?
Argument D1 Argument D2 Argument D3 Argument D4
None of the arguments is close to what I will use. This is how I will argue:

The design of SMR was informed by Healy & Hoyles’s (2000) student-proof

questionnaire. The student-proof questionnaire was created to capture the respondents’

views about proofs. The questionnaire consisted of three sections. First, students were

asked to offer a written description of their general understanding of the purpose of

proving. Then several mathematical conjectures and different arguments to justify the

conjectures were provided, and students were asked to pick arguments they would adopt

for themselves and those they considered would receive the best mark from their teachers.

Lastly, students were asked to offer an evaluation of the arguments based on how

convincing and explanatory they found each one. Based on the specific focus of this

study, several modifications were made to Healy & Hoyles’ questionnaire to

accommodate the research goals, as described below.

First, the current study didn’t concern students’ perception of “mathematical

proof,” rather, the kind of arguments they find convincing, explanatory and appealing.

Therefore, participants were not asked to offer a written description of their

understanding of “proof” on the survey, nor was there a need to identify arguments that

they believed would receive “the best mark” (Healy & Hoyles, 2000). However,

participants were still asked to identify an argument in each of the problem contexts that

they were likely to adopt for themselves. Moreover, they judged whether each argument

was convincing and explanatory to them.

Second, the design of problem contexts (mathematical items) as well as the choice

of arguments included in each case was informed by the following considerations:

• The concepts involved in the conjectures and arguments must be understandable by the participants.

• The reasoning process utilized in each argument should not include a complex combination of reasoning modes.

• A variety of techniques to justify the conjecture needed to be present.

• The differences between the contents of the conjectures should be apparent.

As shown by various cognitive development models (Tall et al., 2012; van Hiele, 1986; Yang & Lin, 2008), understanding of the relevant concepts is a prerequisite for recognizing connections among them. Since this study focused on reasoning rather than

representation and concept building, it was necessary to make sure that the participants

understood the concepts so that the differences in their judgment could be attributed to

their evaluation of the reasoning methods. The reason to choose relatively “simple”

arguments was for the feasibility of analysis. If an argument involved multiple modes of

reasoning, it would be difficult to identify its features according to the CCIA model,

which would make the coding less usable. Various arguments were desirable to verify

each conjecture since we needed a considerable number of different arguments to inform

the comparison and to examine the framework. Lastly, different contexts were expected

to magnify the impact of subject strand on participants’ judgment.

To meet these goals, three conjectures (in Problems A, B, and D, see Figure 8)

were chosen, each representing a topic from a different branch of school mathematics: number theory, geometry, and algebra (Problem A was also used by Stylianides & Stylianides, 2008a). However, it was not assumed that students' reasoning would be

identical within a branch. Instead, the purpose of choosing conjectures from three

different areas was to provide distinct contexts to detect differentiated judgment and

preference of argument types. The three conjectures were all true statements 6, which

provided structural consistency across the problems. Nevertheless, we also included a

false conjecture (in Problem C) in the survey with the intent to seek contrasting data. The

cases that would falsify the conjecture in Problem C were not familiar to the students and

were not easy to detect. By including the contrasting problem, we aimed to detect

any patterns in students’ judgment that would continue to persist when evaluating false

arguments.
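The kind of falsifying case alluded to above can be made concrete with a short computation: a triangle whose sides are all longer than those of another triangle can still have a smaller area if it is sufficiently "thin." The sketch below uses Heron's formula; the side lengths are our own illustrative choices and did not appear on the survey.

```python
import math

def heron_area(a, b, c):
    # Area of a triangle with side lengths a, b, c, via Heron's formula
    s = (a + b + c) / 2  # semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

# Triangle I: every side longer than the corresponding side of Triangle II,
# but the triangle is extremely "thin" (10 + 10 barely exceeds 19.9).
area_1 = heron_area(10, 10, 19.9)
# Triangle II: shorter sides, but a compact shape.
area_2 = heron_area(5, 5, 6)

print(round(area_1, 2), round(area_2, 2))  # → 9.94 12.0
```

Since area_1 < area_2, this pair of triangles falsifies the conjecture in Problem C even though each side of Triangle I exceeds the corresponding side of Triangle II.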

Figure 9. The structure of SMR

6 In Problem D, it was assumed that a bike costs more than $20. This condition was not articulated in the statement of this problem in order to see if any student would point out this issue.
Figure 9 demonstrates the structure of SMR. Four arguments (e.g. Arguments A1 -

A4 in Problem A) were provided in validating each conjecture. All four arguments

supported the validity of the conjecture (even the validity of the false conjecture in the

contrasting problem). We did so because proving and disproving a conjecture engage different reasoning processes. Finding a counterexample is adequate to disprove a conjecture; however, finding some (but not all) examples that satisfy a conjecture is not

adequate to prove its validity. Since fostering realization of the latter point is one of the

major goals of proof instruction (Stylianides & Stylianides, 2008b; Waring, 2000), this

study leaned towards exploring student thinking when evaluating “proof” instead of

“refutation.” The four arguments developed in proving each conjecture were classified as

inductive, algebraic, visual, and perceptual. The inductive argument showed proving

attempts by offering a few examples that supported the validity of the proposed

conjecture. The algebraic argument engaged symbolic representation of the context and

then reinterpreted symbolic results to support the conjecture. The visual argument relied

on graphs and figures to provide proof evidence. The perceptual argument related the

problem to a more familiar context and supported the conjecture via such a connection.

Among all the arguments, four (A1, B1, C1 and D1) were inductive; four (A2, B3, C2

and D2) were algebraic; four (A3, B2, C4, D3) were perceptual; and four (A4, B4, C3,

and D4) were visual (see Table 3).

Inductive A1, B1, C1, D1
Algebraic A2, B3, C2, D2
Perceptual A3, B2, C4, D3
Visual A4, B4, C3, D4

Table 3. Type of the arguments used in SMR

Participants needed to respond to several questions that were related to each

argument. They were asked to determine whether they understood the concepts used in

each of the arguments (for the purpose of confirmation), whether they believed the

argument showed the conjecture was always true, and if the argument helped them

understand why the conjecture was true. We designed these questions since verification and explanation were regarded as two major functions of proofs that are

recognized by students (de Villiers, 1990, 2003; Hanna, 2000b; Healy & Hoyles, 2000).

After reading all 4 arguments for each conjecture, participants were asked to determine

which of them was the closest to what they would use in the same context. We were

particularly interested in studying students’ preferred type of arguments since

understanding students’ preference, common or diverse, would be helpful in explaining

why students might experience difficulty when learning about proofs. This knowledge

can also inform the design of tasks that encourage conceptual understanding of proof as a

reliable way of reasoning.

Interview Participants

Participants for the follow up interviews were selected from those whose SMR

responses were used in the survey analysis. Based on the survey results, the participants

were divided into two groups, the consistent group and the inconsistent group. The

consistent group was composed of those who had preferred the same type of arguments in

at least 3 of the 4 problems (i.e. they chose the same type of argument at least 3 times in

Questions 11, 16, 21 and 26. See Figure 8). 141 of the 476 participants belonged to this

group. The inconsistent group was composed of the remaining participants (a total of

335). No member of this group had preferred any particular type of arguments in more

than 2 of the 4 problems.

To select the representatives to participate in the interviews, we divided the consistent group into 4 subgroups, each of which consisted of members who demonstrated a tendency to prefer a particular argument type in their responses to SMR (inductive, algebraic, visual or perceptual). To select a representative from each subgroup, we obtained a random number, say n, using a random number generator, and then chose the nth student

from the top of the list as an interview participant. By doing so, we randomly picked 4

representatives, Allen, Abby, Alice, and Amy, from the consistent group. The names are

pseudonyms and are gender appropriate. Using a similar strategy, we randomly picked

4 representatives (by running the random number generator 4 times), Beth, Betty, Blake,

and Brenda, from the inconsistent group. These names are pseudonyms and are gender

appropriate as well. These 8 students participated in the interviews.
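The selection procedure described above can be sketched in code; the subgroup rosters and the seed below are hypothetical stand-ins for the actual class lists, which are not reproduced here.

```python
import random

def pick_representative(roster, rng):
    # Choose the nth student from the top of the list, where n is random
    n = rng.randrange(len(roster))
    return roster[n]

rng = random.Random(2013)  # seeded here only to make the sketch reproducible
subgroups = {
    "inductive":  ["student_01", "student_02", "student_03"],
    "algebraic":  ["student_04", "student_05", "student_06"],
    "visual":     ["student_07", "student_08"],
    "perceptual": ["student_09", "student_10", "student_11"],
}
representatives = {name: pick_representative(roster, rng)
                   for name, roster in subgroups.items()}
print(representatives)
```

One representative is drawn per subgroup, so every preference type identified in the consistent group is covered by exactly one interviewee.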

All the subjects were taking Algebra I or an equivalent Integrated 8th Grade

Mathematics class at the time when they were interviewed. Two of the subjects were
taking Honors Algebra I, among which one was from the consistent group while the other

was from the inconsistent group. Since the interviews were recorded at the end of the

spring semester, the subjects were close to finishing their coursework for the school year.

Each group included 1 male and 3 females. Seven of the subjects were Caucasian

and only one interviewee (Betty) was African-American. All subjects were enrolled in

rural or suburban school districts. All subjects were native English speakers. The subjects’

background information is summarized in Table 4.

                                                Number of Preferred Arguments by Scheme
Subject   Gender   Math Course Taking           Inductive   Visual   Perceptual   Algebraic
Allen M Algebra I (Honors) 1 3 0 0
Abby F Algebra I 3 0 0 1
Alice F Integrated Math 0 1 3 0
Amy F Algebra I 1 0 0 3
Beth F Algebra I 0 2 1 1
Betty F Algebra I (Honors) 1 1 1 1
Blake M Integrated Math 1 0 2 1
Brenda F Algebra I 2 2 0 0

Table 4. Background information of the subjects

Interview Procedure

The survey results suggested that students' preferences for arguments were highly

diverse across the problems and between individuals. Therefore, we were not able to

make conclusive assertions regarding the types of arguments that students found more

appealing, nor were we able to distinguish the features of the arguments, as pre-identified
by the researcher, that could have significantly impacted the students’ evaluation of the

arguments by comparing those that had received high and low ratings. Since students' judgment was based upon their personal standards for each argument, we believed there

were hidden factors that had influenced their judgment. In order to further investigate

those factors, we relied on follow up interviews to understand the rationale behind

students’ judgment as indicated in the survey results. In particular, we examined the

sources that students drew from to make evaluations, which were triggered by particular

contexts and arguments.

Each subject was interviewed separately and each interview lasted approximately

an hour. Each interview consisted of three parts (see Table 5). During the first part, the

subjects were provided with the same problems that were used on the survey, but in a different format. The subjects had the conjecture as well as each argument of the

problem on a separate piece of paper. Different problems were printed on paper with

different colors. The subjects were asked to read the conjecture again and then to rank the arguments according to how convincing they found them. The subjects were

allowed to change their ranking of arguments at any time during the interview. We did so

to make sure the subjects’ list was not offered randomly but after a careful consideration.

First part
Subject: Reexamine each problem and rank the arguments based on how convincing they were to them. Explain the rationale of the arrangement by explicit comparison between arguments in the same context.
Interviewer: Ask the subject why he/she believed one argument was more convincing than another.

Second part
Subject: Compare the rankings across the problems. Confirm or revise the arrangement. Explain the differences between arguments in different contexts in justifying the rankings.
Interviewer: Identify any inconsistency in the subject's rankings across the contexts and explicitly point it out. Ask the subject to explain how he/she viewed the same type of argument differently in different contexts.

Third part
Subject: Rank the arguments for the new problem (see Figure 10) according to how convincing they found the arguments. Explain the rationale of the arrangement again.
Interviewer: Ask the subject why he/she believed one argument was more convincing than another. Compare the subject's responses for the new problem to his/her previous answers and probe an explanation from the subject.

Table 5. Overview of the interview process

It has been suggested that people usually find it difficult to reflect on their own thoughts (Tarricone, 2011). However, by asking the subjects to justify their selections,

their explanation could reveal factors that had impacted their preference. In addition to

what the subjects offered, we selected the following items as backup questions in case they remained quiet or didn't provide explanations that were understandable to us.

• Do you think that one of the arguments is wrong?

• Do you think that one of the arguments can only prove the conjecture is true for some cases instead of for all cases?

• Do you think that one argument helps you understand the problem better?

• Do you think that one argument offers better evidence?

• Do you think that one argument's evidence cannot support its conclusion?

Throughout the interviews, the subjects were encouraged to explain their thoughts

as they felt inclined to do so. Furthermore, if their answer to a question was yes without

comment, we asked them to elaborate on their responses. Students’ responses to these

questions allowed us to identify their conception of the argument according to the CCIA

framework.

During the first part of the interview the subjects were asked to compare

arguments in each context. During the second part, we asked students to compare the

arguments across the contexts. We were interested in whether the subjects would modify

the order after such a comparison. We were also interested in learning whether the

subjects from the consistent group would act differently from representatives from the

inconsistent group. Most importantly, we wanted to know how the subjects justified their

preference when diversity existed in the types of arguments they preferred (e.g. a subject preferred an inductive argument in one problem while ranking it as the least convincing in

another problem). Their explanations again revealed factors and features they considered important when making judgments of mathematical arguments and how such factors

and features might vary across the contexts.

During the last part of the interview, a new problem similar to those on the survey was given to each subject (see Figure 10). The new problem required a basic

understanding of elementary probability and proportional reasoning. The four arguments

used (E1 – E4) were inductive, visual, perceptual and algebraic, respectively. The

subjects were again asked to rank the arguments according to how convincing they were

in justifying the conjecture. After comparing their ranking for Problem E to their responses to the previous four problems, the subjects were asked one last time to offer

rationales for their decisions.

PROBLEM E

There are some white and orange ping-pong balls in a box. You cannot see what’s inside
the box but you will get a reward if you pick out an orange ping-pong ball from the box.
Jenna claims that:

“If the number of white ping-pong balls and the number of orange ping-pong balls
are both doubled, the chance for you to get a reward still stays the same.”

Arguments E1 - E4 are offered by different people to justify Jenna’s claim.

************************************************************************

Argument E1: Suppose there are 2 orange ping-pong balls and 3 white ping-pong balls
in the box, then the chance for you to get a reward is 2 out of 2+3, which is 40%.
If the numbers of ping-pong balls of each color are both doubled, then there will
be 4 orange ping-pong balls and 6 white ping-pong balls. Hence the chance for
you to get a reward is 4 out of 4 + 6, which is also 40%. Therefore, the chance of
winning the reward won’t change.

Argument E2: As shown in the figure below, if the numbers of orange and white ping-
pong balls are both doubled, the ratio between the ping-pong balls of the two
colors will still be the same. Therefore, the chance of winning won’t change.

… …

… …

Argument E3: When the number of orange ping-pong balls is doubled, the cases for
winning the reward are also doubled. However, when the number of white ping-
pong balls is doubled, the cases for not winning the reward are also doubled. As a
result, the ratio of the cases of winning to the cases of not winning stays the same.
Therefore, the chance of winning won’t change.

Argument E4: Suppose there are n orange ping-pong balls and m white ping-pong balls
in the box, then the chance for you to get a reward is n / (n + m). If the numbers of
ping-pong balls of each color are both doubled, then the chance for you to get a
reward becomes 2n / (2n + 2m), which is equal to n / (n + m). Therefore, the
chance of winning the reward won’t change.

Figure 10. The additional problem used in the interview
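Jenna's claim, whose general form is Argument E4, can be checked with exact rational arithmetic; the ball counts below are arbitrary illustrative values.

```python
from fractions import Fraction

def win_chance(orange, white):
    # Probability of drawing an orange ball from the box
    return Fraction(orange, orange + white)

# Doubling both counts leaves the chance unchanged, e.g. 2/5 == 4/10:
assert win_chance(2, 3) == win_chance(4, 6) == Fraction(2, 5)

# The identity n / (n + m) == 2n / (2n + 2m) holds for any positive counts:
for n in range(1, 20):
    for m in range(1, 20):
        assert win_chance(n, m) == win_chance(2 * n, 2 * m)
```

Using Fraction rather than floating-point division makes the equality checks exact, mirroring the algebraic cancellation in Argument E4.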

Data Analysis

Data analysis followed two phases, consisting of quantitative analysis of the

survey results and qualitative analysis of the interviews. An outline of the data analysis process is included in Table 6.

Survey Data Analysis:
• Cumulative data were used to identify the types of arguments that were understandable, convincing, explanatory or appealing to the entire group of participants.
• Between-subgroup comparisons were conducted to investigate between-subgroup differences and possible causes.

Interview Data Analysis:
• Each subject's responses in the interview were coded and factors that impacted the individual's decision were identified.
• Common factors that impacted each individual's evaluation were summarized and individual differences were investigated through between-subject contrasts.
• The subjects' responses were revisited and summarized by problem. The context's impact on students' decisions was explored.
• Survey results were revisited. Explanations of unexpected findings and proposed hypotheses about the survey data were provided based on the interview analysis.

Table 6. Outline of data analysis process

Survey Data Analysis

Quantitative analysis of the survey results focused on answering the following

questions:

• Which argument in each problem was indicated as understandable by the most participants?

• Which argument in each problem was indicated by the most participants as being sufficient to show the general validity of the conjecture?

• Which argument in each problem was indicated by the most participants as being clear in explaining the validity of the conjecture?

• Which argument in each problem was indicated by the most participants as being closest to what they preferred to use when encountering the same conjecture?

• Were the answers to the four questions above consistent for each problem?

• Were the participants' ratings consistent across the problems when judging the same type of argument?

The participants’ responses to each of the SMR’s items were summarized to

answer these questions. For example, in order to determine what argument in Problem A

was indicated as understandable by the most participants, we calculated and compared the

percentages of those who answered “agree” and “disagree” to the first question under

Arguments A1 – A4 (i.e. “You understand the concepts and notations used in the

argument”). The more participants answered “agree” and the less answered “disagree” to

this question under A1, the more understandable we considered A1 was. Since the

questions listed above were directly related to the SMR items, a cumulative summary of

the participants’ responses was enough to provide an answer to each of them.

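The cumulative tally just described can be sketched in a few lines of code. In this sketch the argument labels and the “agree”/“disagree” options come from the SMR, while the response data and the helper function `tally` are invented for illustration:

```python
# Sketch of the cumulative tally described above: for each argument,
# compute the percentage of participants answering "agree" and
# "disagree" to the understandability item. (Responses are invented.)

def tally(responses):
    """responses: list of 'agree' / 'neutral' / 'disagree' strings."""
    n = len(responses)
    return {
        "agree": 100.0 * responses.count("agree") / n,
        "disagree": 100.0 * responses.count("disagree") / n,
    }

# Hypothetical answers to "You understand the concepts and notations
# used in the argument" for two of the four arguments in Problem A.
survey = {
    "A1": ["agree", "agree", "disagree", "agree", "neutral"],
    "A2": ["agree", "disagree", "disagree", "neutral", "neutral"],
}

summary = {arg: tally(r) for arg, r in survey.items()}

# The argument with the highest "agree" share and lowest "disagree"
# share is taken to be the most understandable.
most_understandable = max(
    summary, key=lambda a: summary[a]["agree"] - summary[a]["disagree"]
)
```

With these invented responses, A1 (60% agree, 20% disagree) would be judged the more understandable of the two arguments.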
We recognized that relying solely on the cumulative data was not sufficient to determine whether the difference between the evaluations of two arguments was significant. For example, more participants might have found A1 understandable than answered “agree” to the same question under A2; however, without establishing whether that margin was statistically significant, it would be premature to claim that A1 was considered more understandable than A2. Therefore, tests of significance of the differences between the cumulative percentages were used to qualify any claims regarding the survey data. In particular, we adopted within-group ANOVA tests to examine the significance of the differences in students’ responses to different arguments.

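The role of the significance test can be illustrated with a simplified sketch. The study applied within-group (repeated-measures) ANOVA; for brevity, the code below computes only the ordinary one-way ANOVA F statistic, on invented ratings (e.g. agree = 1, neutral = 0, disagree = −1), to show how differences in mean ratings across arguments are quantified against within-argument variability:

```python
# Simplified illustration only: the study applied within-group
# (repeated-measures) ANOVA; this sketch computes the plain one-way
# ANOVA F statistic for ratings of three arguments. Larger F values
# indicate that mean ratings differ more, relative to the spread of
# ratings within each argument. (Ratings are invented.)

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)        # between-group sum of squares
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)         # within-group sum of squares
    return (ss_between / (k - 1)) / (ss_within / (n - k))

ratings = [
    [1, 1, 0, 1],     # hypothetical ratings of argument A1
    [0, -1, 0, 1],    # hypothetical ratings of argument A2
    [-1, -1, 0, -1],  # hypothetical ratings of argument A3
]
f_stat = one_way_f(ratings)
```

The resulting F statistic would then be compared with the critical value of the F distribution (here with 2 and 9 degrees of freedom) to decide whether the differences are significant.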
In addition to the analysis of the results from the entire group, we also examined the data according to specific subgroups of the participants. We suspected that comparing responses across particular subgroups of students might enable us to associate response patterns with the inherent differences between the subgroups and to identify factors that had influenced the participants’ evaluation of mathematical arguments. We examined whether students who achieved higher scores on state standardized mathematics tests demonstrated greater maturity in mathematical reasoning as measured by the SMR. In addition, data were also analyzed according to gender. The rationale was that since the male and female students were enrolled in the same schools and in the same classrooms, taught by the same teachers using the same teaching materials and techniques, differences in their responses might be attributed to non-mathematical experiences. The techniques used in the between-subgroup comparisons were similar to those used when analyzing the entire group’s responses. We examined and compared each subgroup’s responses to each question in the SMR and adopted between-group ANOVA to evaluate the significance of differences. During the analysis of survey results from the entire group and from various subgroups, conjectures were also made, based on the researchers’ understanding of the context, curriculum, and sample population, to explain why certain arguments received higher ratings from the participants. Evidence to support or refute those conjectures was sought during the interview analysis.

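The subgroup splitting described above can be sketched as follows. The participant records (gender attribute and ratings) are invented; in the study, a between-group ANOVA would then test whether the subgroup means differ significantly:

```python
# Sketch of the subgroup comparison described above: responses are
# grouped by a participant attribute (here, gender) and the mean
# rating of an argument is computed per subgroup. A between-group
# ANOVA would then test the difference for significance.
# (All records are invented.)

participants = [
    {"gender": "F", "rating": 1},
    {"gender": "F", "rating": 0},
    {"gender": "F", "rating": 1},
    {"gender": "M", "rating": -1},
    {"gender": "M", "rating": 1},
    {"gender": "M", "rating": 0},
]

groups = {}
for p in participants:
    groups.setdefault(p["gender"], []).append(p["rating"])

# Mean rating of the argument within each subgroup.
means = {g: sum(r) / len(r) for g, r in groups.items()}
```

The same split could be made on any recorded attribute, such as a band of state standardized test scores, before applying the significance test.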
Interview Data Analysis

The summary of the survey responses revealed the participants’ judgment of and preference for mathematical arguments at the macro level; however, it was not adequate to explain why an individual had made certain decisions when completing the survey. The latter was the focus of the interview analysis. In examining the interview responses, we first identified both positive and negative comments made by each individual subject in explaining why an argument was convincing to him/her. These comments were summarized in a table as raw data. The comments were then coded using the CCIA framework (see Figure 7). Specifically, we identified whether the comments referred to the representation, evidence, or the link between evidence and conclusion. We then calculated the frequency of occurrence of comments on representation, evidence and link to determine which of these had the largest impact on each subject’s decisions. Furthermore, we traced the types of representation, evidence or link that contributed to students’ evaluation in a positive or negative way by counting how many times they were referenced by the subject in the explanation. The frequency of references to these elements served to identify factors that substantially impacted the subject’s judgment. In addition to the elements conceptualized in the CCIA framework, it was expected that other factors that had impacted the subjects’ decisions would be detected during the coding process. Those factors were considered personal standards and were studied separately. Below we offer an example to illustrate the process of analysis for the interview of a single subject.

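The frequency count described in this section can be sketched as follows. The R/E/L prefixes (representation, evidence, link) come from the CCIA framework, while the coded comments themselves are invented for illustration:

```python
# Sketch of the frequency count described above: each interview
# comment carries one or more CCIA codes (R* = representation,
# E* = evidence, L* = link); the dimension referenced most often is
# taken to have had the largest impact on the subject's judgment.
# (The coded comments are invented.)

coded_comments = [["E4", "R4"], ["E2"], ["E2", "L4", "R1"], ["E6-"]]

counts = {"R": 0, "E": 0, "L": 0}
for comment in coded_comments:
    for code in comment:
        counts[code[0]] += 1  # first letter gives the dimension

dominant = max(counts, key=counts.get)
```

With these invented codes, evidence would be identified as the dimension with the largest impact on the subject's evaluation.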
The case of Allen

Allen’s case serves as an illustrative example of the analysis technique used when examining the interviews. The same analytical model was utilized in the other seven cases. The following discussion details how Allen’s responses during the interview were transformed into an analyzable form, how this form was coded, and how the coding was interpreted to understand the rationale of his evaluation of mathematical arguments.

Allen was an 8th grade student enrolled in an Honors Algebra I class at the time of

data collection. In his responses to SMR, the visual arguments were indicated to be

closest to how he would argue in all but Problem C, where he selected C1, the inductive

argument. Based on these results, we believed that Allen had exhibited a preference for visual arguments. Therefore, he was considered a representative of the consistent group. The discussion of Allen’s performance in the interview includes both

data report and data analysis. In data report, we describe the interview process in detail,

including how he ranked the arguments according to how convincing they were to him

and how he justified his rankings. In data analysis, we identify factors that seemingly

influenced Allen’s evaluation of the arguments based on his comments on and rankings of

the arguments.

Ranking arguments

In the first part of the interview, Allen was asked to work on Problems A through

D one by one. He (like the other subjects) decided which problem to work on first and which next; however, the decision was based not on the content of the problems but on the color of the paper on which each was printed. Table 7 illustrates the rankings provided by Allen for

each problem. Column One of the table represents the order of problems that he tackled.

Most convincing -------------------------> Least convincing


Problem C C2 (algebraic) C4 (perceptual) C3 (visual) C1 (inductive)
Problem B B3 (algebraic) B4 (visual) B2 (perceptual) B1 (inductive)
Problem D D4 (visual) D1 (inductive) D2 (algebraic) D3 (perceptual)
Problem A A4 (visual) A2 (algebraic) A3 (perceptual) A1 (inductive)
Problem E E2 (visual) E4 (algebraic) E1 (inductive) E3 (perceptual)

Table 7. Rankings of arguments provided by Allen

Allen chose to start with Problem C. From the most to the least convincing, Allen

ranked the arguments as C2 (algebraic) – C4 (perceptual) – C3 (visual) – C1 (inductive). In

explaining why he put C2 at the top of the list, Allen suggested that “it uses formulas

which I know are fact, and I like seeing fact.” Later, he repeated a similar comment “this

one has a formula, which I love.” Additionally, he called C2 the “simplest, quickest,

most effective way” and “very straightforward.” In explaining why he considered C4 less

convincing than C2 but more convincing than the other two arguments, Allen suggested

that C4 was convincing because “it still uses sides and areas.” However, what made him consider it less convincing was that “it’s less straightforward.” In addition, he

suggested that he had never done something similar to what was described in C4 but he

could imagine the scene. In particular, he suggested that the argument “clearly states, if

you’re using a wired outline which is the figure, I can picture that in my mind, I know

exactly what they’re talking about.” In explaining why he put C1 at the bottom of the list,

Allen suggested that he viewed the “wording" of the problem problematic. For example,

he suggested that “it’s trying to relate too many things: a = b = c,” and “it trip me up for

the first few seconds.” Although the argument contained “a formula,” which he did like, it

was “not straightforward enough.” So this was the only argument in the problem that he

actually didn’t like. In explaining why he put C3 low on the list, Allen claimed that he

liked C3 since “it uses the length of the sides, which is what I would use any day of the

week.” He also liked the fact that “it has diagrams… which explains what they’re talking

about there.” However, when comparing C3 to C2 and C4, Allen only repeated the reasons for which he had ranked C2 and C4 high on the list but didn’t specify why he considered C3 less convincing.

After justifying his order in Problem C, Allen continued to work on Problem B.

From the most to the least convincing, he ranked the arguments as B3 (algebraic) – B4

(visual) – B2 (perceptual) – B1 (inductive). He chose to first explain why he didn’t think

B1 was convincing. In particular, he suggested that he was a visual learner but he wasn’t

“seeing any visual representation.” He considered the argument as a mere “opinion” and

suggested “there’s no other facts.” So “there’s not enough support for me, compared to

the other ones.” In evaluating B2, Allen suggested that “it does give a clear example”

which he did believe. So it was better than argument B1. However “it is still not my
favorite.” In evaluating B4, Allen suggested that “it shows a circle there, I do like these, I

can clearly relate to these.” He further explained his understanding of the details of the

argument, “I can see with, I guess you can call them formulas, bc here is the length of the

side of the rectangle, and it is smaller than bp, just by a little bit there, so I can believe it.”

He indicated that he liked B4 and B3 “very closely.” Lastly, Allen articulated that B3 was

his favorite since “I like the fact of using the Pythagorean theorem. It’s more

straightforward than using all the angle and the side relations.” He also suggested that by

using the Pythagorean theorem, he could “figure out the problem in a minute or so.”

The next problem Allen worked on was Problem D, where he ranked the

arguments, from the most to the least convincing, as D4 (visual) – D1 (inductive) – D2

(algebraic) – D3 (perceptual). He suggested that D4 was his favorite since “I like the

visual aid again.” When asked to explain his understanding of the graph, Allen pointed at

the graph and stated, “seeing that after and before are always 1 separated there, 1

separated there, and it never changes, since they’re parallel, so using those, I do think that

that is the best saying that you can always save an extra dollar.” When commenting on

D1, Allen expressed that it was “a little wordy, and that’s why I put it the second.”

However, he expressed that he liked “the fact that they used other, that they can also plug

in a price here, as well as using a formula and plugging a price in.” He claimed that he

liked the argument since he was “a formula kinda guy.” Interestingly, D1 didn’t contain any formulas; instead, there were equations used to calculate the results. However,

Allen was able to conclude that replacing a particular value in the equation wouldn’t

change the outcome. Therefore he considered the equations as formulas. While

evaluating D2, which had the actual formula, Allen said that he didn’t think “there was
too much difference.” However, he found D2 to have “more formulas” and “less

explanations.” He found D1 to be more “straightforward” and “it offered an example.”

Therefore, he considered D1 to be more convincing than D2. In evaluating D3, Allen

suggested that “that is more business, it’s not applying directly to math.” He further

explained that “just stating that and not giving that much evidence, it’s not very

convincing to me.” Therefore, he didn’t consider D3 convincing.

When assessing Problem A, Allen provided the rank as A4 (visual) - A2 (algebraic)

- A3 (perceptual) - A1 (inductive). He started justifying his ranking with A1, stating that

he didn’t see “very many supporting arguments.” He suggested that “there’s almost no

mathematical evidence here, except the opinions and personal work of other people doing

math, and not showing what they did.” Therefore, he considered A1 the least convincing.

Allen considered A3 to be more convincing than A1 since “it says what you can do to

figure it out.” However, since there was “no formula, or visual representation,” he didn’t

consider it as convincing as A2 and A4. To clarify, Allen claimed that in A3, “there is

proof; it’s just not solid, like always a formula.” In evaluating A2, Allen first substituted

a number, 3, to verify if the formula was correct. He then suggested that “it doesn’t

matter” which number he tried; however, he needed to check “3 or 4 different ones” to be

sure about the result. In the evaluation of A4, Allen argued that it had “very simple visual

representation,” so it justified the conjecture “clearly, and in my mind effectively.”

In the second part of the interview, Allen was asked to revisit Problems A through

D and compare the ranking he had provided. He was first asked why he considered A1,

B1 and C1 to be the least convincing arguments in each respective problem. He

explained that those arguments did provide some examples but were more like “opinions
and people doing things that I have not personally seen.” He didn’t think such examples

were as convincing as those backed up by theorems and graphs. Allen was asked to

justify why he considered D1 convincing since it also showed just a few cases. He

responded that “there’s always the showing, they’re working it out,” which was better

than “plainly stating what they had tried.” Furthermore, Allen was asked if A1 would be

more convincing to him if it had provided more details of the checking procedure. He

replied yes to this question and offered that “giving concrete numbers and facts and

stating their observations of what they did the experiment on” would make it a better

argument.

In addition to the evaluation of inductive arguments, Allen was asked to explain

why he considered the algebraic and visual arguments convincing in all problems. He

stated that formulas and diagrams made arguments clearer and “if there’s a

combination of visual diagrams and formulas, that would be fabulous, that would be

perfect.” Furthermore, in justifying why he considered the perceptual arguments less

convincing, Allen reasoned that “simply saying to imagine it, then stating that it’s

definitely longer, you’re not giving any example.”

During the third part of the interview, Allen was asked to examine Problem E and

rank E1 – E4 according to how convincing they were. His rank was: E2 (visual) - E4

(algebraic) - E1 (inductive) - E3 (perceptual). This rank was consistent with the rank he

provided in Problems A – D, where visual and algebraic arguments were generally

considered more convincing than inductive and perceptual arguments. In evaluating E2,

Allen proposed that “immediately when I noticed the graph I know it will be high

ranking.” However the interviewer soon realized that Allen didn’t actually understand the
graph. Allen was allowed to reexamine the argument but he still couldn’t explain how the

graph supported the conjecture. Therefore, the interviewer explained what the graph

meant, in particular how it represented the “doubling” procedure stated in the problem.

This episode suggested that Allen’s preference for visual illustration might not be based

on a careful analysis of the argument. Instead, he might have been attracted to it due to its

appearance. It was unclear if this was an isolated case. In fact, his understanding of the

graphs in Problems A through D was also checked by the interviewer and no

misunderstanding was revealed at those times.

Another interesting finding was that Allen actually found it difficult to explain his

understanding of the graph in E2. So he chose to refer to E4 (algebraic) and used the

symbols to describe his ideas. This case demonstrated that Allen was comfortable

using letters to represent variables in mathematical contexts. Allen further suggested that

E2 and E4 “are in principle the same,” but that he still preferred E2 to E4 since E2 “is

still stating that clearly, while giving me the visual.” Allen added that he in fact liked all

the arguments. However, he considered E1 and E3 less convincing since E1 “provides an

example, not an opportunity to provide your own examples,” while the description

offered in E3, although “in principle the same” as what was offered in E4, was less

appealing to him than visual and algebraic representations. This explanation revealed that

the representation of an argument had a clear influence on Allen’s evaluation of it.

Note that in both Problems E and D, Allen claimed that the inductive and

algebraic arguments were similar; however, he considered the inductive argument E1 to

be less convincing in Problem E, while he found the inductive argument D1 to be more

convincing in Problem D. In explaining this inconsistent selection, Allen expressed his


preference for formulas, “I love formulas, which are always in my mind second to visual

representations.” He claimed there were formulas in D1, D2, E1 and E4 even though the

text in D1 and E1 didn’t contain any formula (there were numerical equations instead).

This again demonstrated that Allen seemed to be able to conceptualize the formula by

looking at the equations. It was interesting that when evaluating D1, Allen expressed that

he “can see the formula,” while for E1, he claimed that “this is not straightforward

because it only gives one example” and “there would be a formula here, but it’s not

stated.” When evaluating D2, Allen suggested that “this is not straightforward, because it

is a longer and more complicated and not straightforward enough formula.” When

evaluating E4, he characterized it as “straightforward giving you the formula there,

instead of providing two examples.” That is, he used double standards when evaluating

arguments containing algebraic formulas. In his view, the arguments needed to be

“straightforward” to be convincing, and whether he preferred arguments with numerical

equations or algebraic formulas depended on whether the formulas he saw in those

numerical equations were more “straightforward” than the given algebraic formulas. It

was unclear what the standard of being “straightforward” meant; however, we suspected that the

complexity of the formula and his familiarity with it might have been two important

factors.

Allen’s comments on the arguments used in the interview were summarized in

Table 8. Each comment was then characterized using a coding strategy in line with the

theoretical constructs of CCIA framework (see Figure 7).

Problem C

Positive: It uses formulas which I know are fact, and I like seeing fact. (E4, R4)
Negative: It’s not straightforward enough. (P)

Positive: [Formula is the] Simplest, quickest, most effective way. (P)
Negative: It’s trying to relate too many things: a = b = c. (P)

Positive: Very straightforward. (P)
Negative: If I’m trying to figure this out for the first time, I wouldn’t think that a, it doesn’t equal that, and that it doesn’t equal that, and that would like, trip me up for the first few seconds. (E2)

Positive: I would like to see a diagram… I’m a visual learner. (E2, R1)
Negative: It is not clearly outlined. (P)

Positive: It is a formula, which I do like. (E4, R4)
Negative: It’s trying to relate too many things. (P)

Positive: It uses the length of the sides. (E4)

Positive: Any area or angle problem, I would always use corresponding sides and angles. (E4)

Positive: This one has formulas, which I love. (E4, R4)

Positive: It has diagrams… which explains what they’re talking about there. (E2, R1)

Positive: It clearly states, if you’re using a wired outline which is the figure, I can picture that in my mind, I know exactly what they’re talking about. (E3, L2)

Problem B

Positive: It shows an example. (E2)
Negative: I’m not seeing any visual representation. (R1)

Positive: It does give a clear example. (E2)
Negative: It’s really opinion; there’s no other facts. (E6-)

Positive: It shows a circle there, I do like these, I can clearly relate to these. (E2)
Negative: There’s not enough support for me, compared to the other ones. (E3-, L2-)

Positive: I can see with, I guess you can call them formulas, bc here is the length of the side of the rectangle, and it is smaller than bp, just by a little bit there, so I can believe it. (E4, L4, R1)
Continued

Table 8. Summary of comments made by Allen

Table 8 continued
Positive: I like the fact of using the Pythagorean Theorem. It’s more straightforward than using all the angle and the side relations. (E4, P)

Positive: It also has a formula that I can work out by myself and see the process of doing it. (E4, L4)

Problem D

Positive: I like the visual aid again. (R1)
Negative: It is a little wordy. (R2-)

Positive: Seeing that after and before are always 1 separated there, 1 separated there, and it never changes, since they’re parallel, so using those, I do think that that is the best saying that you can always save an extra dollar. (R1)
Negative: I like straightforward ones. (P)

Positive: I like the fact that they used other, that they can also plug in a price here, as well as using a formula and plugging a price in. (E2, L5)
Negative: [There’s] less explaining. (P)

Positive: It’s a formula, I’m a formula kinda guy. (E4, R4)
Negative: Just stating that and not giving that much evidence, it’s not very convincing to me. (E6-)

Positive: I like the fact that it’s always constant, you can plug any value in. (E2, L4)

Positive: More explaining, as well as they use examples here. (E2, R2)

Problem A

Positive: It says what you can do to figure it out. (E3, L2)
Negative: I’m not seeing very many supporting arguments. (E4)

Positive: Very simple visual representation. (R1)
Negative: There’s almost no mathematical evidence here, except the opinions and personal work of other people doing math, and not showing what they did. (E4, E6-)

Negative: No formula… or visual representation… (R1, R4)

Negative: There is proof, it’s just not solid, like always a formula. (R4)
continued

Table 8 continued
Comparing Problems A-D

Positive: When someone is trying to convince me of something, I would like facts. (E4)
Negative: Opinions and people doing things that I have not personally seen. (E6-)

Positive: Formulas and diagrams. (E4, R1, R4)
Negative: There’s no solid, stated proof right there is not as convincing as a line graph, or the Pythagorean theorem, or those. (E4, R1, R4)

Positive: There’s always the showing, they’re working it out. (P)
Negative: Just plainly stating. (E6-)

Positive: Giving concrete numbers and facts and stating their observations of what they did the experiment on. (E2, R3)
Negative: Simply saying to imagine it, then stating that it’s definitely longer, you’re not giving any example. (E2, E6-)

Positive: If there’s a combination of visual diagrams and formulas, that would be fabulous, that would be perfect. (E2, E4, R1, R4)
Negative: This argument is based on shapes and common sense, but not explanatory sense. (E6-)

Problem E

Positive: This is still stating that clearly, while giving me the visual. (R1)
Negative: Just given that information and no outside knowledge that that would work for all cases. (E6-)

Positive: It does explain it clearly. (P)
Negative: It provides an example, not an opportunity to provide your own examples. (E2)

Positive: An opportunity to provide your own examples. (E2)

Positive: Work it out on my own, and find out more. (E2)

Additional Comments

Positive: I love formulas, which are always in my mind second to visual representations. (E4, R1, R4)
Negative: It’s not as clear. (P)

Positive: This is more straightforward. (P)
Negative: It only gives one example. (E2)

Positive: OK, because it gives examples that worked. (E2)
Negative: This is not straightforward… because it is a longer and more complicated and not straightforward enough formula. (P)

Positive: I would still have the opportunity, because it’s a formula, to provide my own examples. (E2, R4)
Negative: There would be a formula here, but it’s not stated. (R4)

Positive: Straightforward giving you the formula there. (E4, R4)

Positive: I can see the formula. (E4, R4)

Data Analysis

As shown in Table 7, Allen demonstrated a clear preference towards arguments

backed by visual illustration and formulas during the interview. He chose visual

illustrations as the most convincing arguments in Problems A & D, and selected algebraic

arguments as the most convincing arguments in Problems B & C. In addition, he

considered the inductive arguments as the least convincing arguments in Problems A, B

& C, believing that those arguments didn’t offer enough support to prove the conjecture.

Allen’s general preference toward visual arguments was consistent with his responses in

the SMR.

In order to identify the factors and features of the arguments that had impacted

Allen’s evaluation, his comments during the interview were coded according to the CCIA

framework (see Table 9). We did so to identify whether each of his comments referred to

the representation, evidence, or the link between evidence and conclusion⁷ (which was denoted by a capital letter, R, E, or L, respectively) and the kind of representation,

evidence and link (which was denoted by a single digit following the letter) that had

positively/negatively impacted the subjects’ evaluation of the argument. The coding was

included in Table 8 at the end of each comment. For example, the comment that “it uses

formulas which I know are fact, and I like seeing fact” was coded “E4” and “R4” (see

Table 8, third line from the top), since it was based on a mathematical fact as evidence,

which was expressed in a symbolic form. According to the CCIA framework, this type of

⁷ “Link between evidence and conclusion” of an argument is referred to as the “link” of the argument for convenience in the discussion.

evidence is considered “fact,” which is coded “E4” as listed in Table 9. The

representation is considered “symbolic,” which is coded “R4.”

Representation Evidence Link


Visual: R1 Authority: E1 Direct: L1
Narrative: R2 Example: E2 Perceptual: L2
Numerical: R3 Imaginary: E3 Inductive: L3
Symbolic: R4 Fact: E4 Transformational: L4
Assumption: E5 Ritual: L5
Opinion: E6 Deductive: L6
Note:
i). “P” denotes comments that didn’t refer to the representation, evidence, or link of
arguments. These comments were analyzed separately from others.
ii). “NA” denotes cases where the subject claimed that he/she didn’t understand the
argument and didn’t offer any explanation.
iii). A notation “-” was added after the code to indicate that this factor made the
argument less convincing to the subject.

Table 9. Table of codes

The following points are important in understanding the coding procedure.

1) Not all comments could be coded according to the CCIA framework. In cases

where factors that contributed to the judgment were not identifiable or were not about the

type of representation, evidence, or link of an argument, we coded them as “P,” denoting

that there were personal standards that needed to be further examined. We called these personal standards since those reasons might not be associated with any particular type of

argument. For example, the comment “it’s not straightforward enough” was coded “P”

since it could apply to many different types of arguments. There were also cases when the

subject indicated that he/she was not able to understand an argument. We use “NA” to

denote such comments, suggesting that the subject was unable to provide an evaluation

for the argument.

2) A certain factor could make an argument more/less convincing to the subject.

To distinguish the two effects, a “-” was added to the end of a code if the identified factor made the argument less convincing. If there was no such mark after a code, then

the corresponding factor made the argument more convincing to the subject.

3) A comment could refer to different features or factors of an argument. In this

case, it was classified using multiple codes. For example, the comment that “it’s a

formula, I’m a formula kinda guy” referred to the formula as the mathematical evidence

as well as a symbolic representation. Hence it was coded both “E4” and “R4.”

4) There were cases where it was difficult to judge what exactly a comment meant based on its text alone. In such cases, we reexamined the dialogue

in the recorded video to determine the contextually embedded meaning of a comment.

For example, from the comment “I’m not seeing very many supporting arguments” alone, we were not able to understand what exactly the “supporting arguments” meant. However, by placing this comment in the conversation, we identified that Allen was referring to formulas and mathematical facts as what he called “supporting arguments.”

Therefore, we assumed that this comment referred to a mathematical fact as evidence,

which was coded as “E4.”

5) Similar comments might have been mentioned by Allen in multiple places during the interview. Such comments were counted multiple times. The assumption behind doing

so was that if a point was addressed multiple times, it should be viewed as being more

significant than other opinions to the subject.

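The five conventions above can be applied mechanically, as in the following sketch. The codes follow Table 9 (with “-” marking a negative reference and “P”/“NA” set aside), while the sample list of codes is invented:

```python
# Sketch of tallying coded comments into a Table-10-style summary.
# A trailing "-" marks a factor that made the argument LESS convincing
# (point 2); "P" and "NA" codes are set aside for separate analysis
# (point 1). (The sample codes are invented.)
from collections import Counter

codes = ["E4", "R4", "E2", "E6-", "P", "E2", "R1", "L4", "NA", "E6-"]

positive, negative, personal = Counter(), Counter(), Counter()
for code in codes:
    if code in ("P", "NA"):
        personal[code] += 1        # personal standards, examined separately
    elif code.endswith("-"):
        negative[code[:-1]] += 1   # factor made the argument less convincing
    else:
        positive[code] += 1        # factor made the argument more convincing
```

Counting every repetition of a code, as the loop does, reflects convention 5: a point raised several times is weighted accordingly.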
The codes for Allen’s comments were then summarized to examine his evaluation

criteria. Table 10 illustrates the result.

Total number of references to representation: 27


Visual Narrative Numerical Symbolic
Positive 12 1 1 12
Negative 0 1 0 0

Total number of references to evidence: 47


Authority Example Imaginary Fact Assumption Opinion
Positive 0 18 2 17 0 0
Negative 0 1 1 0 0 8

Total number of references to link: 7


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 2 0 3 1 0
Negative 0 1 0 0 0 0

Table 10. Categories of comments made by Allen

As shown in Table 10, the numbers of comments that focused on the representation, evidence and link of the arguments were 27, 47, and 7, respectively,

indicating that the evidence of arguments had the largest impact on Allen’s judgment.

Among all types of evidence, Allen regarded facts (i.e. known mathematical results) and examples (i.e. results from an immediate test) as reliable sources for establishing an argument; they were referred to 17 and 18 times, respectively. His explanation was heavily rooted in the

discussion of specific mathematical concepts (e.g. specific numbers’ properties, specific

geometric properties, meaning of graphs, etc.) instead of personal assumptions or

opinions. This was highlighted by his claims that “when someone is trying to convince

me of something, I would like facts” and “giving concrete numbers and facts and stating

their observations of what they did the experiment on” would make an argument

convincing. In addition, he clearly emphasized that “opinions and people doing things

that I have not personally seen” didn’t make an argument convincing to him. Similar

statements were mentioned 8 times during the interview. Overall, Allen’s comments

demonstrated his need to see specific and concrete evidence in order to be convinced.

The representation of arguments also influenced Allen’s judgment. In particular,

he indicated that visual and symbolic representations contributed to his conviction, each of which was noted 12 times during the interview. Allen claimed that he loved “formulas, which

are always in my mind second to visual representations.” He also suggested that if

“there’s a combination of visual diagrams and formulas, that would be fabulous, that

would be perfect.” This tendency was backed up by his capability to represent variables

with symbols and manipulate the symbols fluently, as well as his capability to connect

graphs to the content of the problem.

However, Allen’s algebraic skills didn’t enable him to evaluate the logic used to

connect evidence and conclusions. Among all the comments he made, 7 referred to a
certain type of link between evidence and the conclusion of an argument. In 2, 3, and 1

cases, respectively, Allen found a perceptual, transformational, and ritual link convincing.

We found Allen was not able to recognize that showing a few examples couldn’t prove that a

conjecture is always true. He considered an argument convincing “because it gives

examples that worked.” Nevertheless, this didn’t suggest that Allen’s mathematical

reasoning ability was underdeveloped. In fact, we argue that the ability to examine a

single case carefully was a required step for a further conceptualization of generic

examples (Balacheff, 1988). We had noticed that Allen was capable of extracting

properties he saw in one example and applying them to other cases. This was demonstrated

when he claimed “I like the fact that it’s always constant, you can plug any value in” after

examining a few cases.

In addition to the features/factors identified by CCIA, Allen had personal

standards for deciding whether an argument was convincing or not. There were 14

comments that were coded as “P” (see Table 8). In particular, 9 of these comments

concerned the simplicity of an argument, using terms such as “straightforward,” “simple,”

and “quick” to explain why he was or was not convinced, while the other 6 comments

referred to the clarity of the arguments (e.g. “There’s always the showing, they’re

working it out.”). In fact, we found that the pursuit of simplicity and clarity overrode his

preference for the type of representation and evidence. This was demonstrated by his

claim that “this is not straightforward… because it is a longer and more complicated and

not straightforward enough formula.” That is, in order for formulas, one of his

favorite types of evidence and representations, to be convincing in an argument, they

needed to be “straightforward” and not too “complicated.”


Combining Allen’s personal scheme and those characterized by the CCIA

framework, we obtained a clearer picture of Allen’s rationale for evaluating mathematical

arguments (see Figure 11). Allen viewed arguments that utilized precise description and

involved simple reasoning procedure as convincing. To him, known mathematical facts

and concrete examples were the most straightforward source of evidence, while the visual

and symbolic representations were the clearest ways to describe and relate those

examples. However, since Allen was not yet able to reflect on the rigor of the logic

embedded in an argument, the type of link between the evidence and conclusion was not

among his major focuses. Arguments that used transformational, perceptual and ritual links

were all perceived as convincing by him. An argument was convincing to him as long as

the reasoning looked “straightforward” to him, regardless of its logical rigor.

[Figure 11 diagram: transformational, perceptual, and ritual links, examples and facts, simple procedure, and precise description (visual, symbolic) feeding into convincing arguments.]

Figure 11. Illustration of Allen’s rationale for evaluating mathematical arguments

With this platform, Allen’s rankings of the arguments (see Table 7) became more

sensible. In Problem C, the clarity of the evidence provided in each argument determined

their ranking. The evidence provided in C2, C4, C3, and C1 was the triangle area formula,

the imagined triangle made of wire, the drawn triangle within a transformation process,

and a collection of triangles, respectively. Among all these, the formula was the most

simple and clear; the imagined triangle made of wire was less clear but also very simple;

the triangle within a transformation process looked more complex; while the collection of

triangles offered a mixed pool of information and would “trip me (Allen) up for the first few

seconds.” This explained Allen’s rankings of them.

Arguments in Problem B were also ranked based on the simplicity and clarity of

the evidence provided by them. Compared to his ranking for Problem C, the only

difference was that the rankings of the visual and perceptual arguments were switched.

Allen’s explanation was that the image of the triangle made of wire was clearer than the

image of a football field. Therefore, the argument based on the football field scene was

less convincing to him.

In the other three problems, Allen found the visual arguments to be the most

convincing option while the algebraic arguments were ranked lower. A possible

explanation was that in Problems C and B, both algebraic arguments contained well-

known mathematical facts (triangle area formula and the Pythagoras Theorem); however,

in Problems D, A and E, the algebraic expressions were not known results but were used

to represent the variables’ relationship in the problem. Therefore, the absence of clear and

simple evidence made them less convincing to him.

The different rankings of the inductive arguments across the problems could also

be explained. Notice that in Problems A, B and C, the inductive arguments were

considered the least convincing. This was because there was no concrete example given

in A1 and B1, while in C1, the examples might seem confusing to him. However, since
D1 and E1 discussed more details about the examples, they were considered more

convincing.

Overall, we found that the pursuit of simple and clear statements, the need to see

mathematical facts and concrete examples, and preference towards visual and symbolic

representation could help explain Allen’s evaluation of mathematical arguments.

Cross comparison

Seven other subjects’ interview data were analyzed using the same process as

illustrated above. Details of these analyses are included in the next chapter. Following

each individual analysis, a cross comparison of data for all subjects was performed in

order to document the similarities and differences among their responses. In seeking the

similarities, we considered whether there were factors that consistently impacted all (or

the majority of) subjects’ decisions. We calculated the frequency of occurrence of factors

for all subjects and identified those most prominently referenced. We also contrasted the

coded summary of each subject’s comments to identify between-individual differences in

terms of the elements identified in the CCIA framework and any additional personal

schemes detected during the coding process.
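The frequency tally described in this cross comparison can be sketched as follows; the factor codes and counts below are hypothetical illustrations, not the study’s actual coding data:

```python
# Tally how often each coded factor appears across subjects' comments
# and flag the most prominently referenced ones. The subject names and
# factor codes are hypothetical stand-ins for the CCIA-based categories.

from collections import Counter

def prominent_factors(coded_comments, top_n=2):
    """coded_comments: {subject: list of factor codes}.
    Returns the top_n (factor, count) pairs across all subjects."""
    counts = Counter(code for codes in coded_comments.values()
                     for code in codes)
    return counts.most_common(top_n)

# Hypothetical coded comments for three subjects.
coded_comments = {
    "Allen": ["fact", "example", "visual", "fact", "symbolic"],
    "Beth": ["example", "visual", "example"],
    "Carl": ["fact", "example", "perceptual"],
}
top = prominent_factors(coded_comments)
```

The same tally, restricted to one subject’s comments, would reproduce per-subject counts such as those reported for Allen above.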

The final stage of analysis focused on exploring context specific factors that

influenced the subjects’ decisions. In particular, we examined causes for the inconsistent

rankings that were provided by the subjects to the same type of arguments. Those factors

were identified to explain how context influences the evaluation of mathematical

arguments. Details of the survey and interview results as well as findings from

quantitative and qualitative analysis are shared in the following chapter.

CHAPTER 4. RESULTS

This chapter is composed of two sections. The first section is dedicated to the

analysis of the survey data. The second section offers a discussion of the results of the

interviews.

Findings from SMR

We analyzed SMR responses from 476 eighth grade students. The survey results

suggested that students’ judgments of the same type of arguments were highly diverse

across the problems and between individuals.

Arguments understandable to students

The first step in the analysis process considered arguments that the participants

claimed to understand. In order to do so, we calculated the percentage of students who

responded “agree” to the question “You understand the concepts and notations used in the

argument” under each argument (see Figure 12). If a participant answered “agree” to the

question when judging an argument, we considered the argument understandable to the

participant.

Figure 12. The percentage of participants who considered each argument understandable

Whether the questions were understandable to students was critical in verifying if

the items were appropriate for this age group. As shown in Figure 12, most arguments

were reported to be understood by more than 58% of the participants.

We calculated the number of arguments that each respondent claimed to

understand. Figure 13 illustrates a summary of the participants’ responses. As shown, the

majority of the participants (80.5%) claimed that they understood more than half of the

arguments. We also generated a similar graph to illustrate the distribution of the number

of arguments that were identified as not understandable (not including “not sure”
responses, see Figure 14). As shown, those who claimed not to have understood more

than 4 arguments accounted for less than 15% of the participants. This data suggested that

the arguments / conjectures included in SMR were considered understandable by most of

the respondents. This also suggested that the SMR problems didn’t exceed the

participants’ self-assessed mathematical ability, so that their evaluations of the arguments

were less likely to have been made randomly.


Figure 13. Distribution of the number of arguments indicated understandable by each


participant


Figure 14. Distribution of the number of arguments indicated not understandable by each
participant

The highest percentages of participants indicated A1, B1, C3 and D1 as understandable

in each of the respective problems (see Figure 12). Among them, A1, B1 and D1 analyzed

the conjecture by examining a few examples; while C3 was a visual illustration of why

the conjecture was true. Among the second most understood arguments in each problem

(i.e. A4, B4, C4, and D3), A4 and B4 were visual arguments, while C4 and D3 argued in

a perceptual way. Among the least understood arguments in each problem (i.e. A2, B3,

C1&C2 (tie), and D2), all utilized symbolic representations and argued algebraically

except Argument C1, which was an inductive argument but involved plenty of symbols

and labeled figures. Argument C1 required a more careful realization of the connection

between figures, symbols and narratives, hence it may have been harder to understand

than inductive and visual arguments in other problems.

In order to clarify whether the differences in ratings were statistically significant,

we applied the within group ANOVA tests (i.e. repeated measures ANOVA) in the

analysis. To classify students’ responses to the corresponding questions, we assigned

numerical value “1” for “agree”, “-1” for “disagree”, and “0” for “not sure”. We call each of

these numbers a participant’s rating on whether an argument was understandable. We

then adopted the within group ANOVA to test whether the students’ ratings (which were

considered within-subjects variables) on the arguments in a problem were significantly

different from each other. The results are included in Appendix A, Table 39.
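The encoding and within-group test described above can be sketched in Python. The responses below are hypothetical, and a hand-rolled one-way repeated-measures F statistic stands in for whatever statistics package produced Table 39:

```python
# Encode each response and compute a one-way repeated-measures ANOVA
# F statistic for one problem's four arguments. The responses below are
# hypothetical; the real analysis used the 476 survey responses.

ENCODING = {"agree": 1, "not sure": 0, "disagree": -1}

def rm_anova_f(ratings):
    """ratings: list of per-subject lists, one rating per argument.
    Returns the F statistic for the within-subject (argument) effect."""
    n = len(ratings)        # subjects
    k = len(ratings[0])     # arguments (within-subject levels)
    grand = sum(sum(row) for row in ratings) / (n * k)
    arg_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    subj_means = [sum(row) / k for row in ratings]
    ss_args = n * sum((m - grand) ** 2 for m in arg_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_args - ss_subj
    df_args, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_args / df_args) / (ss_error / df_error)

# Hypothetical responses from four subjects to arguments A1-A4.
responses = [
    ["agree", "disagree", "not sure", "agree"],
    ["agree", "disagree", "disagree", "agree"],
    ["agree", "not sure", "disagree", "agree"],
    ["not sure", "disagree", "disagree", "agree"],
]
ratings = [[ENCODING[r] for r in row] for row in responses]
f_stat = rm_anova_f(ratings)
```

With the actual data set the same routine applies unchanged; a statistics package would additionally report the p-values used for the pairwise comparisons.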

Figure 15 summarizes the results of the analysis presented in Table 39. In

particular, the arguments in each problem were listed sequentially from the most

understandable (top) to the least understandable (bottom). In addition, two arguments

were connected using a curve if the ratings they received were not significantly different

from each other (p > .05). In an intuitive sense, if two arguments were connected, then

they were “close” to (not significantly different from) each other; if not connected, then

the two arguments were separated from (significantly different from) each other. For

example, in Problem D, from top to bottom, D1 received the highest rating; D3 received

the second highest; D4 came after D3; and D2 received the lowest rating. The differences

in ratings between D1 & D3, and D4 & D2 were insignificant so each pair was connected

using a curve. The differences in ratings between D1 & D4, D1 & D2, D3 & D4, and D3

& D2 were significant so each pair was not connected using a curve. As shown in Figure

15, D1 and D3 were considered significantly more understandable than D4 and D2.
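The “connected by a curve” convention amounts to grouping arguments by their pairwise p-values. A minimal Python sketch, with hypothetical means and p-values (not the study’s actual Table 39 values):

```python
# Sort arguments by mean rating and list the pairs whose pairwise
# p-value exceeds .05 -- the pairs that would be "connected by a curve".
# The means and p-values below are hypothetical illustrations.

ALPHA = 0.05

def connected_pairs(means, p_values):
    """means: {argument: mean rating}; p_values: {(a, b): p} with a < b.
    Returns (ranking, pairs): arguments sorted from highest to lowest
    mean, and the pairs not significantly different from each other."""
    ranking = sorted(means, key=means.get, reverse=True)
    pairs = sorted(pair for pair, p in p_values.items() if p > ALPHA)
    return ranking, pairs

means = {"D1": 0.62, "D3": 0.58, "D4": 0.31, "D2": 0.27}
p_values = {
    ("D1", "D2"): 0.001, ("D1", "D3"): 0.41, ("D1", "D4"): 0.002,
    ("D2", "D3"): 0.003, ("D2", "D4"): 0.38, ("D3", "D4"): 0.004,
}
ranking, pairs = connected_pairs(means, p_values)
```

Under these illustrative values, the ranking lists D1 above D3 above D4 above D2, with the D1–D3 and D2–D4 pairs connected, matching the pattern described for Problem D.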

Figure 15. Illustration of how understandable the arguments were to the participants

In Problem A, A1 (inductive) was significantly more understandable than A3

(perceptual) and A2 (algebraic); in Problem B, B1 (inductive) was significantly more

understandable than B3 (algebraic) and B4 (visual); and in Problem D, D1 (inductive) was

significantly more understandable than D2 (algebraic) and D4 (visual). These results again demonstrated

that students were more likely to understand an argument when it showed examples. In

addition, it was also detected that the differences between A1 (inductive) and A4 (visual),

and D1 (inductive) and D3 (perceptual) were not significant. This signaled that students

may have been able to understand other types of arguments as well as inductive ones. The

participants’ view of the inductive argument in Problem C was different from the other

three problems: there, the inductive argument was considered insignificantly less

understandable than two of the other types (perceptual and visual). Therefore, the results tended

to suggest that although in general inductive arguments seemed to be easier to understand

by the participants, other types of arguments were well perceived by them when

satisfying certain conditions (e.g. the arguments may connect the problem with familiar

experience or offer visually appealing illustration). Exploring those conditions was

among the goals of the follow up interviews.

Arguments convincing to students

In addition to the ratings on how understandable an argument was, the

participants’ evaluation of whether an argument showed that the conjecture was always

true was also analyzed. Students’ judgments of the second claim under each argument (i.e.

“the argument shows that the statement is always true”, see Figure 8) were assessed using

numerical values: “1” for “agree”, “-1” for “disagree”, and “0” for “not sure”. We call each

of these numbers a participant’s rating on whether an argument was convincing. Note

that if a participant didn’t indicate that he/she understood an argument, then we

considered him/her to be not sure if the argument was convincing. We then adopted the

within group ANOVA to test whether the students’ ratings (which were considered

within-subjects variables) on the arguments in a problem were significantly different

from each other. The results of the statistical analysis of this data set are included in Appendix

A, Table 40.

Figure 16 summarizes the results presented in Table 40. Adopting a procedure

similar to that described in the previous section, the arguments in Figure 16 were listed

sequentially from the most convincing (top) to the least convincing (bottom) by problem.
In addition, two arguments were connected using a curve if the ratings they received were

not significantly different from each other (p > .05). In an intuitive sense, if two

arguments were connected, then they were “close” to (not significantly different from)

each other; if not connected, then the two arguments were separated from (significantly

different from) each other. For example, in Problem D, all the arguments were connected

using curves, indicating the difference between any pair of arguments was insignificant.

Figure 16. Illustration of how convincing the arguments were to the participants

Figure 16 shows that in Problem A, A4 (visual) was considered the most

convincing argument, while A1 (inductive) received the lowest rating. Note that the

differences between A1 and any of the other 3 arguments were significant; and the

differences between A4 and any of the other 3 arguments were also significant. This
suggested that the participants found the visual demonstration more convincing than any

other type of argument in the number theory problem, while they suggested that checking

a few numbers couldn’t convince them that the conjecture was always true. We suspected

that the figures used in A4, where manipulatives commonly used in mathematics

instruction were represented, might have contributed to the high rating on A4.

In Problem B, Argument B3, which adopted the Pythagoras Theorem, received a

rating that was significantly higher than any other argument; while B1, again

the inductive argument, received a significantly lower rating than B3 (algebraic) and B4

(visual). As was detected in Problem A, the inductive argument received the

lowest rating and the perceptual argument’s rating was not significantly higher. This

finding suggested that neither an imaginary example (i.e. the football field in B2) nor a few concrete

examples as cited in B1 made these arguments as well perceived as the visual and algebraic

arguments in this problem. In particular the participants found the algebraic argument

more convincing than all other arguments in this geometry problem. We suspected two

factors might have contributed to the high rating of B3. First we realized that the Pythagoras

Theorem is one of the most well-known results in school geometry and therefore strongly

recognizable by the students. Second we perceived that 8th grade students may have just

learnt the theorem and the topic was still fresh in their mind.

In Problem C, C2 (algebraic) received the highest rating, and it was rated

significantly higher than C3 (visual) and C1 (inductive) but was not rated significantly

higher than C4 (perceptual). Compared to Problems A and B, the differences between

ratings were smaller in the sense that the between argument differences were not

significant for many pairs. In particular, the ANOVA test suggested insignificant
differences among the ratings on C1, C3, and C4. C2 stood out as being significantly more

convincing than two of the other three arguments. We suspected the well-known triangle

area formula might have contributed to its high rating. Similar to what was detected in

Problems A and B, this finding could be viewed as the mathematics curriculum’s impact

on the participants.

In Problem D, although D4 (visual) received the highest rating, it was not rated

significantly higher than any of the other arguments used in this problem. Therefore, all four

arguments seemed to have been equally as convincing to the participants. This was a

good illustration that in some cases there might not be a unique most convincing

argument. Each argument might be convincing to a certain group of students so that

approaching a problem using multiple strategies might be the only plausible way to help

all students to understand why a conjecture was true.

Data from the survey suggested that the participants were not completely satisfied

with empirical checking and verifying. Among all the lowest rated arguments in each

problem, two were inductive. However it was premature to claim that the participants

were able to realize that checking a few examples was inadequate to prove the general

validity of a conjecture. We made this claim since among all the 476 respondents, only

10 could identify, in all four contexts, that the inductive argument was not sufficient to

establish the validity of the conjecture. At the same time, there were some indicators

implying that information other than pure empirical checking could have contributed to

students’ conviction in the process. However, it was not clear what type of information

was most helpful. As shown in the Figure 16, visual illustration (A4 & D4), theorem (B3),

formula (C2), mental image of real life experience (C4), closer examination of examples
(D1) could all contribute to a higher rating. Further investigation of how these various

types of arguments contributed to the participants’ conviction of the validity of the

conjectures was carried out during the interview phase of the study.

Arguments explanatory to students

The following discussion concerns whether each argument helped the

participants to better understand why a statement was true. In doing so students’

judgments to the third claim under each argument (i.e. “the argument helps you better

understand why the statement is true”, see Figure 8) were assessed by assigning the

numerical values: “1” for “agree”, “-1” for “disagree”, and “0” for “not sure”. Each of these

numbers was called a participant’s rating on whether an argument was explanatory. Note

that if a participant didn’t indicate that he/she understood an argument, we considered

him/her to be not sure if the argument was explanatory. We then adopted the within group

ANOVA to test whether the students’ ratings (which were considered within-subjects

variables) on the arguments in a problem were significantly different from each other.

The statistical results of the analysis are included in Appendix A, Table 41.

Figure 17 summarizes the results presented in Table 41. In particular, the

arguments in each problem were listed sequentially from the most explanatory (top) to

the least explanatory (bottom). In addition, two arguments were connected using a curve

if the ratings they received were not significantly different from each other (p > .05). In

an intuitive sense, if two arguments were connected, then they were “close” to (not

significantly different from) each other; if not connected, then the two arguments were

separated from (significantly different from) each other. For example, in Problem A, A4
was not connected with any other argument while the other three arguments were

connected to each other. This suggests that the rating on A4 (visual) was significantly

higher than the other three arguments, whose ratings were not significantly different from

each other. The results show that the participants found the visual demonstration more

explanatory than any other type of argument in the number theory problem.

Figure 17. Illustration of how explanatory the arguments were to the participants

The situation in Problem B was similar to that of Problem A, where B3 (algebraic)

received a rating that was significantly higher than the other three arguments, whose

ratings were not significantly different from each other. Therefore, B3 was not only the

most convincing argument in this problem, but was also the most explanatory one to the

participants. Again, it was suspected that the use of Pythagoras Theorem as evidence

contributed to its high ratings.

The ratings on all arguments in Problem C were not significantly different from

each other. This implies that the participants could extract information from each of the

arguments, which helped them understand better why the conjecture was valid. If true,

this could demonstrate the benefit and need of explaining mathematical results from

multiple aspects.

In Problem D, D4 (visual) received the highest rating, however the differences

between D4 and D1 (inductive) and D3 (perceptual) were not significant. D2 (algebraic)

received a rating that was significantly lower than any of the other three arguments.

Different from the cases in the other three problems, there was a single argument in

Problem D that received a significantly lower rating. We suspected that the way in which

the variable was used in D2 might be unfamiliar to the students. In classrooms, students

are usually asked to solve for the variable when it is given in equation form. However

in D2, the presence of the variable didn’t require solving for a value. Rather, the variable was

used to represent general cases and was eventually cancelled out in the calculation to

show the conjecture was always true regardless of its value.

Visual arguments appeared twice on top of the lists in the contexts of number

theory and algebra. This demonstrated the power of visual illustration in helping

students understand the problem better. The algebraic argument was considered most

explanatory in the geometry problem but was considered the least explanatory in the

algebraic problem. This demonstrated that students possessed the ability to understand

arguments in abstract form. However, they might still have had difficulties if the form was

unfamiliar or too complex for them. Overall, no single type of argument

was considered more explanatory than the others in all the problem contexts. In addition, as

demonstrated in Figure 17, the “explanatory” ratings were not significantly different in 14

of the total 24 pairwise comparisons. This suggested the need for and benefit of

promoting student understanding about the validity of a mathematical statement from

multiple aspects. A closer look at the data revealed that even the least explanatory

argument (D2) was considered explanatory by close to 60% of the participants who

claimed to understand the argument, which further supported the suggestion.

Arguments appealing to students

After evaluating all the arguments in a problem, the participants were asked to

choose one which they believed was closest to how they themselves would have argued

(e.g. see Question 11, Figure 8). A participant’s choice in answering this question was

considered as “appealing” to the participant. Unlike the previous three ratings where each

argument was evaluated separately and there could be multiple arguments in a problem to

be considered understandable, convincing, and/or explanatory, in choosing the appealing

argument, the participants needed to compare all arguments in a problem and then select

only one as the appealing option. Figure 18 illustrates the percentage of the participants

who found each argument appealing to them.8

8 Since the participants were allowed to choose none of the arguments, the percentages of the participants
choosing each argument in one problem did not add up to 100%.
Figure 18. The percentage of participants who considered each argument appealing

The within group ANOVA was applied to determine whether the participants’

argument preferences were significantly different. Specifically, we decomposed the

participants’ choices of the appealing option into 4 columns, assigning the numerical

value “1” to an argument if it was indicated by the student as the appealing option and “0”

to the other three arguments (see Figure 19 for an illustration). Treating the 4 columns as

the 4 levels of within-subject variables, the within group ANOVA was applied to test the

between argument differences in each problem. The statistical results of the ANOVA tests

are included in Appendix A, Table 42.

Students   Appealing Option   A1   A2   A3   A4
S1         A2                 0    1    0    0
S2         A1                 1    0    0    0
S3         A3                 0    0    1    0
S4         A4                 0    0    0    1
S5         A3                 0    0    1    0
S6         A2                 0    1    0    0
S7         None               0    0    0    0

Figure 19. An example of the data transformation for within group ANOVA test
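The transformation illustrated in Figure 19 amounts to one-hot encoding each participant’s choice; a minimal Python sketch (the function and variable names are ours, and the choices mirror the figure):

```python
# One-hot encode each participant's appealing-option choice into four
# indicator columns (A1-A4), mirroring the transformation in Figure 19.
# A participant who chose none of the arguments gets all zeros.

ARGUMENTS = ["A1", "A2", "A3", "A4"]

def encode_choices(choices):
    """choices: list of argument labels (or "None") per participant.
    Returns a list of 0/1 indicator rows, one per participant."""
    return [[1 if choice == arg else 0 for arg in ARGUMENTS]
            for choice in choices]

# Choices for participants S1-S7, as in Figure 19.
choices = ["A2", "A1", "A3", "A4", "A3", "A2", "None"]
indicator_rows = encode_choices(choices)
```

Each indicator column can then be treated as one level of the within-subject variable in the ANOVA described above.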

Figure 20 illustrates the results presented in Table 42. In particular, the arguments

in each problem were listed sequentially from the most appealing (top) to the least

appealing (bottom). In addition, two arguments were connected using a curve if the

ratings they received were not significantly different (p > .05). In an intuitive sense, if

two arguments were connected, then they were “close” to (not significantly different from)

each other; if not connected, then the two arguments were separated from (significantly

different from) each other. For example, in Problem A, A4 was not connected to

any other arguments while the other three arguments were connected to each other. This

indicated that the difference between A4 and any other argument was significant, while

the differences between A1 & A2, A1 & A3, and A2 & A3 were not significant.

Figure 20. Illustration of how appealing the arguments were to the participants

As illustrated in Figures 18 and 20, 38.9% of the participants selected A4 (visual)

as the appealing argument in Problem A, which was significantly more than those who

picked any other option. The percentages of participants who chose the other three

arguments (21.2%, 21.0% and 16.6%, respectively) were not significantly different from

each other. The preference towards A4 was consistent with findings in previous sections

which explored the most understandable, convincing, and explanatory arguments. It

suggested that the participants preferred to adopt manipulatives as visual aid to facilitate

their reasoning in the number theory problem.

In Problem B, the largest percentage (28.6%) of participants chose B2 (perceptual)

as the appealing argument. This number was significantly larger than those who had

chosen B1 (inductive, 20.0%) and B4 (visual, 21.4%), however it was not significantly
larger than those who selected B3 (algebraic, 27.3%). B2 used “football field” as a

context to demonstrate why the conjecture was true, where the explanation relied on an

illustration from real life experience; while B3 was based on the Pythagoras Theorem,

which was a well-known result referenced in the school curriculum. This rating was

interesting since the sources of evidence used in the two arguments were different, yet

they were perceived as appealing by statistically similar numbers of students. This again

demonstrated the diversity of students’ preferred ways of reasoning. The visual argument

might have been less appealing to the participants due to the complexity of its structure.

Compared to the simple image of a football field, the geometric figure used in B4

involved many more components (such as rectangles, circles, and lines) and the relationships

among those components. It required more analytical thinking to be fully understood.

In Problem C, the arguments received close ratings. Among all the pairwise

comparisons only the difference between the most appealing option (C1, inductive,

preferred by 26.7% of the participants) and the least appealing option (C4, perceptual,

preferred by 19.7% of the participants) was statistically significant. Compared to the

participants’ responses in Problem B, where the perceptual argument was chosen as the

most appealing option, it was surprising to see C4 to have received the lowest rating in

Problem C. Two reasons might help explain this phenomenon. First, the scene created by

B2, i.e. the football field, might be more familiar to the participants than the scene

created by C4, i.e. using wires to make triangles. Second, the other options provided in

Problem C might be more appealing to the participants for various reasons. For example,

the visual illustration in Problem C, i.e. C3, required less analytical thinking to

understand than B4, the visual argument in Problem B.


In Problem D, 30.7% of the participants selected D1 (inductive). This number was

significantly higher than those who chose D4 (visual, 22.5%) and the least appealing

option, D2 (algebraic, 18.3%). As in Problem C, the inductive argument was

again appealing to the largest percentage of participants. Compared to the inductive

arguments in Problems A and B, C1 (inductive) offered visual images of the samples and

D1 (inductive) offered a detailed calculation procedure for one case. In A1 (inductive) and

B1 (inductive), such detail was not present. Therefore, we suspected that the extra details

provided by C1 and D1 contributed to their higher ratings.

The data revealed that no particular type of arguments was appealing in all 4

contexts. In fact, only 19 of the 476 participants considered the same types of arguments

to be appealing to them across the 4 problems. 122 participants chose one type of

arguments 3 times in the 4 problems. The rest of the participants (335 in total) didn’t

pick any type of arguments more than 2 times. This result suggested that for a majority of

the participants, the appealing reasoning methods were highly context based and didn’t

uniformly lean on any particular type. However, it also suggested that some participants

might have developed more uniform preference towards certain types of arguments.

Investigating the rationale for choice by the participants whose judgment seemed to have

followed a uniform base was a focus of questioning during the interview.
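The consistency tally described above can be sketched as follows. The participant choice lists below are hypothetical placeholders, not the study’s actual survey data; the sketch only illustrates how each participant’s most-repeated argument type could be counted.

```python
from collections import Counter

# Hypothetical choices of the "most appealing" argument type in Problems A-D
# for three illustrative participants (the real survey data is not shown here).
participants = [
    ["inductive", "inductive", "inductive", "inductive"],  # same type all 4 times
    ["visual", "visual", "visual", "algebraic"],           # one type 3 times
    ["inductive", "visual", "perceptual", "algebraic"],    # no type more than twice
]

def max_repeat(choices):
    """Number of times the participant's most frequently chosen type appears."""
    return max(Counter(choices).values())

tally = Counter(max_repeat(p) for p in participants)
# tally[4] -> participants consistent across all 4 problems (19 in the study)
# tally[3] -> participants choosing one type 3 times (122 in the study)
print(tally[4], tally[3])  # prints "1 1" for the toy data above
```

Applied to the full data set, this tally is what yields the 19 / 122 / 335 split reported above.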

Comparison across the ratings

In the previous sections the students’ responses were analyzed to determine what

argument was considered by most participants as understandable, convincing to show the

general validity of the conjecture, helpful to explain why the conjecture was true, and
closest to how they would argue in the same context. These arguments were referred to as

the most understandable, convincing, explanatory and appealing arguments as evaluated

by students, respectively. In this section, we offer an examination of whether students’

evaluation, using the four different ratings (i.e. understandable, convincing, explanatory

and appealing), was consistent in each problem, and explain what might have

contributed to such consistency or inconsistency in choice.

Table 11 summarizes the participants’ choices of the most understandable,

convincing, explanatory and appealing arguments based on Figures 13, 14, 15 and 18. In

particular, the highest rated argument as well as those whose ratings were not

significantly lower than it were included in the appropriate cell of the table. In each cell,

arguments to the left received higher ratings.

Problem A Problem B Problem C Problem D


Understandable A1, A4 B2 C3, C4, C1 D1, D3
Convincing A4 B3 C2, C4 D4, D1, D3, D2
Explanatory A4 B3 C2, C1, C4, C3 D4, D1, D3
Appealing A4 B2, B3 C1, C2, C3 D1, D3

Table 11. Summary of the most understandable, convincing, explanatory and appealing
arguments as evaluated by the participants in each problem

In contrast, Table 12 summarizes the least understandable, convincing,

explanatory and appealing arguments. In particular, the lowest rated argument as well as

those whose ratings were not significantly higher than it were included in the appropriate

cell of Table 12. In each cell, arguments to the left received lower ratings.

Problem A Problem B Problem C Problem D


Understandable A2, A3 B4, B3 C2, C1 D2, D4
Convincing A1 B1, B2 C3, C1, C4 D2, D3, D4, D1
Explanatory A1, A3, A2 B1, B2 C3, C4, C1, C2 D2
Appealing A3, A2, A1 B1, B4 C4, C3 D2, D4

Table 12. Summary of the least understandable, convincing, explanatory and appealing
arguments as evaluated by the participants in each problem

Note that in Problem A, the participants’ choices in all 4 rating standards were

quite consistent. A4 (visual) was considered as the most convincing, explanatory and

appealing option. It was considered the second most understandable option, which was

not significantly lower (p > .05) than the most understandable option A1 (inductive). We

suspect that the visual image provided by A4 was close to the graphic illustration

provided in their early mathematics classrooms when multiplication and division were introduced.

Students’ familiarity with such a representation might have contributed to the higher

ratings. This suggested that visual representation could be reliable and helpful for

students when making judgment. It also suggested that the classroom experience had an

impact on students’ conviction. Aside from A4, the participants’ ratings on other

arguments were close (see Table 12) except that A1 was considered significantly less

convincing than all other arguments. This suggested that although A1 was the most

understandable option, the participants didn’t consider it more convincing or

explanatory than other arguments when showing that the conjecture was always true.

In Problem B, B2 (perceptual) was considered the most understandable and the

most appealing option, while B3 (algebraic) was considered the most convincing and

explanatory option. B3’s appeal rating, moreover, was not significantly lower than B2’s. To

explain these inconsistent ratings, we conducted a closer inspection of the data. In

particular, we analyzed data from those who claimed to understand both B2 and B3. It

was found that in this subgroup, 33.9% found B3 more appealing and 29.7% found B2

more appealing. Therefore, B3 was considered the most convincing, explanatory and

appealing option among those who claimed to understand both B2 & B3. In fact, B2 was

considered significantly less convincing and explanatory than B3 and B4 (see Table 12).

Significantly more participants had found B2 understandable than B3, and it

was those participants who raised the overall appealing rating of B2. The

result was sensible since we

assumed almost every student knew what a football field looked like. Additionally, B2

used easier language while building a perceptual connection between the scenario and the

conjecture, which required less analysis. On the other hand, B3 was convincing,

explanatory and appealing to those who understood it because the Pythagorean Theorem is

one of the most well known and reputable results in school geometry. Therefore, if a

student understood B3, they would most likely give it a high rating. In addition, B1

(inductive) was considered the least convincing, explanatory and appealing option, which

again demonstrated that inductive arguments without further explanation were not

preferred by the participants.


Problem C represented a context that was the least familiar to the students. As the

contrasting item, the conjecture was not true and all the arguments presented in that

problem were false. In this context, only 2 participants indicated that none of the 4

arguments could show the conjecture was always true or could help them see why the

conjecture was true. After a closer examination of these 2 participants’ responses, we

found one of them had indicated that he didn’t understand any of the 4 arguments; while

the other suggested she didn’t think any of the arguments was close to how she would

reason. Furthermore, she claimed that “the tringle9 1 is clearly bigger than tringle 2 so if

you put bigger number in tringle 1 then it will be bigger than 2.” With the exception of

these 2 cases, all other participants selected “agree” for at least one statement suggesting

that one argument in Problem C showed them or helped them see the conjecture was true.

Although the participants might not have been sure that the conjecture was always true

even when they chose the “agree” option, the data suggested that no student clearly

pointed out the conjecture was false, hence there was no clear evidence to show that

when working on Problem C, any of the participants had assumed the conjecture false.

Tables 11 and 12 indicated the participants’ ratings on the arguments in Problem C were

close. Not a single argument received a rating that was significantly higher than the

others in any criteria. Still, significantly more students found C3 (visual) and C4

(perceptual) understandable than found C2 understandable. This was not surprising since C3 and C4 used

easier language, while C2’s explanation involved intensive use of abstract symbols.

C2 and C4 were considered by most students as convincing. It was surprising to see that

9 The participant misspelled “triangle” as “tringle” in her response to the survey.

C2 was considered convincing by more students than C3. A possible explanation is that

the triangle area formula included in Argument C2 was repeatedly addressed in

mathematics classrooms, and its appearance might have added credibility to the argument. All the

arguments were considered as almost equally explanatory by the participants, suggesting

that when encountering an unfamiliar context, explanations from various aspects could

help students better understand the problem. When choosing the appealing arguments,

Arguments C1, C2 and C3 received approximately an equal number of votes while

Argument C4 fell behind. This was also surprising since Argument C4 was rated as the

most understandable and second most convincing argument. A tentative explanation

could be that the scenario used in Argument C4 was not closely associated with the

mathematical content presented in the conjecture, so the students might not have seen a

natural connection there. Therefore fewer students thought they would adopt such a

strategy when encountering the problem.

In Problem D, D1 (inductive) was considered as the most understandable and

appealing option and was rated a close second to D4 in the other two criteria. This result

was particularly interesting compared to how inductive arguments were viewed in

Problems A & B. As mentioned in the previous analysis, we suspected that the extra

details offered in D1, i.e. the layout of the calculation procedure, contributed to its higher

ratings. Another interesting finding in Problem D was that D2 (algebraic), which shared

the same procedure as D1, was considered the least understandable,

convincing, explanatory and appealing option. Our conjecture is that the symbolic

representation appeared to be more complex than the numerical equations. Therefore,

students tended to choose the easier one when they saw both options without considering

the difference in logic that they might have offered.

Collectively we found that among the 32 cells in Tables 11 and 12, only 8

contained a single argument, while 10 contained 3 or 4 arguments. This suggested that

students rarely gave a significantly higher or lower rating to any particular argument, and

in many cases, the between-argument differences were small. Furthermore, as shown in

Figure 20, even the least appealing argument (i.e. A3), was chosen by about 1/6 of all

participants. While the differences between A3’s and other arguments’ ratings were

statistically significant, it didn’t mean that A3 was of no value to the students. In fact, it

was considered understandable by 76.7%, convincing by 55.5%, and explanatory by 67.7%

of the participants. Therefore, the use of A3 definitely could provide extra opportunities

for students to approach the conjecture in Problem A. The case of A3 illustrated that

although some arguments received significantly lower ratings than others (no matter in

what criteria), they were still chosen by some students and their preference shouldn’t be

ignored.

In order to determine whether one type of argument received higher

ratings than all others, we counted the number of times each type of arguments appeared

in Table 11 and Table 12 (see Table 13). As shown in Table 13, there was an almost

equal number of each type in both columns. The result indicated that no

particular type of argument received consistently higher or lower ratings than the others. This

finding again illustrated that, in general, “type” couldn’t determine students’ evaluation

and there might be other factors influencing their assessment.

Argument Type # of Appearances in Table 11 # of Appearances in Table 12
(high ratings) (low ratings)
Inductive 8 10
Perceptual 9 9
Visual 9 8
Algebraic 7 10

Table 13. Summary of high and low rated arguments by type

Lastly, it was detected that the ratings of the same type of arguments were highly

inconsistent across the problems. For example, the inductive argument was rated high in

Problem D while received low ratings in Problem B; the algebraic argument was rated

high in Problem B but received low ratings in Problem D; the visual argument was rated

high in Problem A but received low ratings in Problem B, etc. These results again

demonstrated the complexity of students’ evaluation of mathematical arguments. In

previous analysis, we identified a few factors that we suspected had influenced students’

choices, such as the amount of details provided, the fluidity of language, and the

familiarity of the scenario. The exploration and synthesis of these factors was the major goal

of the follow-up interviews.

Comparison between subgroups of students

As demonstrated in the above discussion, the type of argument that was rated as

most understandable, convincing, explanatory and appealing was distinct across the

contexts. While we were not able to make a grand conclusion about what type of

arguments were understandable, convincing, explanatory and appealing to students, the

high ratings that some arguments received made sense in their respective contexts.

Therefore, we suspected that there were factors other than the presentation and content of

the problems that had impacted the participants’ choices. While we were not able to

obtain an explanation for the choices merely based on the survey results, we probed for

factors by analyzing responses according to different subgroups of students. We assumed

some of these factors were rooted in both school-based mathematical experiences of

children as well as non-mathematical experiences gained from life outside the school

environment.

To investigate the influence of school mathematical experiences on the

participants’ responses, we compared data from participants who were enrolled in higher

performing schools to those enrolled in lower performing schools. The percentages of

mathematical proficiency of the two higher performing schools, as measured by the 2012

state standardized 7th grade mathematics tests, were at least 10% above state average; while the

percentages of the 8th grade mathematical proficiency of the two lower performing

schools were at least 10% below state average. Therefore, the difference in the students’

levels of mathematical proficiency between the higher and lower performing schools, as

measured by the standardized tests, was rather large. While this comparison couldn’t rule

out the influence exerted from non-mathematical experiences on students’ choice, it

would be valuable to see if students who achieved higher scores on state standardized

mathematics tests would demonstrate more maturity in mathematical reasoning ability as

measured by SMR.

The second comparison considered the potential impact of the participants’ gender

on their choices. The male and female students were enrolled in the same schools and

same classrooms, taught by the same teachers using the same teaching materials and

techniques. Although sitting in the same classroom didn’t mean the same classroom

experience for each individual learner, if the cumulative data suggested a large

difference between female and male students’ responses, it was unlikely that this

difference was caused by instruction. Therefore, it was assumed that different responses

from male and female participants could provide additional insight into learners’ choices

and reasoning. Details of the two comparisons were shared in the following discussion.

Between school comparison

117 of the participants were enrolled in higher performing schools and 311 of the

participants attended lower performing schools. For convenience, participants from the

higher and lower performing schools were referred to as Group H and Group L, respectively.

We adopted a between-group ANOVA to test the between-group difference in the

participants’ responses to each question in SMR. Using the same data quantifying strategy,

“1,” “0,” and “-1” were assigned to “agree,” “not sure,” and “disagree,” respectively.

Questions were also labeled in a format like “A1.2”, where A1 indicates the argument,

and 2 indicates the 2nd question under this argument, which assesses whether A1 is

convincing. In addition, four variables (e.g. A5.1 – A5.4) were used to quantify students’

choices of the most appealing argument in each problem. The quantifying strategy was

illustrated in Figure 19. Table 43 (in Appendix B) presents the results of the between

group comparisons of students’ evaluation of each argument by different ratings (i.e.

understandable, convincing, explanatory, and appealing).
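The quantification and between-group test can be sketched in a minimal form. The group responses below are hypothetical toy data (the study’s own analysis ran on the full data set with statistical software); for two groups, the one-way ANOVA F statistic reduces to the ratio of between-group to within-group mean squares.

```python
# Likert coding used in the analysis: agree / not sure / disagree -> 1 / 0 / -1.
CODES = {"agree": 1, "not sure": 0, "disagree": -1}

def quantify(responses):
    """Map Likert labels to the -1/0/1 coding."""
    return [CODES[r] for r in responses]

def anova_f(group_a, group_b):
    """One-way (between-group) ANOVA F statistic for two groups."""
    all_vals = group_a + group_b
    grand_mean = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in (group_a, group_b)
    )
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in (group_a, group_b) for x in g
    )
    df_between = 1                 # k - 1, with k = 2 groups
    df_within = len(all_vals) - 2  # N - k
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical responses to one survey question (e.g., A1.2), by school group.
group_h = quantify(["agree", "agree", "not sure", "agree", "disagree"])
group_l = quantify(["not sure", "disagree", "agree", "disagree", "not sure"])
print(round(anova_f(group_h, group_l), 3))  # -> 1.2 for this toy data
```

The resulting F statistic would then be compared against the F distribution with (1, N − 2) degrees of freedom to obtain the p-value for each of the 64 variables.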

As reflected in Table 43, the between-school differences were not significant for

any of the 64 variables (questions) except C5.3, D1.2, D4.2, and D5.4. Specifically,

significantly more participants from Group H considered C3 (visual) and D4 (visual)

appealing. In addition, significantly more participants from Group H considered D4

(visual) convincing, while significantly fewer participants from Group H considered D1

(inductive) convincing.

Compared to the between-group differences in standardized test performance

(Group H at least 10% above state average and Group L at least 10% below state average,

as measured by percentage of proficiency), the differences detected within their

performance in SMR were much smaller. In particular, the two groups’ evaluations were

not significantly different on 60 of the 64 variables. This result suggested that a higher

performance on the standardized test didn’t represent higher maturity in mathematical

reasoning.

A closer examination of the 4 cases where the group differences were significant

revealed that participants from Group H tended to prefer the visual arguments in both

Problems C and D (i.e. C3 and D4). 32% in Group H selected C3 as the appealing option

while only 22% in Group L did so. 30% in Group H selected D4 as the appealing option

while only 21% in Group L did so. It made sense that Group H exhibited a higher

preference towards D4 since the understanding of D4 requires knowledge of coordinate

plane and such knowledge could contribute to higher standardized test scores as well.

Additionally, compared to Group L, Group H also considered D4 more convincing. This


result might indicate Group H’s better comprehension of D4, which could have

contributed to their higher preference for this argument. However Group H’s higher

preference towards C3 was less sensible. A possible explanation might be that the

participants in Group H had developed the skill to utilize transformational thinking in

certain geometry contexts and they were more capable of visualizing the change of

geometric shapes by reading the description and static images that depict stages of the

transformation. However, this explanation didn’t align with the fact that C3 was not

considered more understandable, convincing, or explanatory by Group H. In addition, it

was detected that Group H considered D1 (inductive) less convincing than Group L did.

Our hypothesis for this result was that there could be more students from Group H that

had realized that D1, although it showed more details than the inductive arguments in other

problems, still couldn’t show the conjecture was always true.

As discussed above, there existed cases to illustrate some differences between

Group H and Group L; however, in general the two groups’ responses to the SMR were

comparable. Therefore, the survey data suggested that classroom experience that

enhanced higher proficiency in state standardized tests didn’t promote students’

mathematical reasoning capacity. In addition to the mathematics classroom experience,

we suspected there were other factors that might have impacted the participants’

responses. Hence we conducted a between-gender comparison to probe for more

explanations of the survey results.

Between gender comparison

Among the 476 participants of SMR, 229 were male and 237 were female. The

remaining 20 participants chose not to disclose their gender and were not included in this

comparison. The male and female students were enrolled in the same classrooms in the

same schools. They also had lived in the same communities. Therefore, we didn’t assume

large differences in the mathematical experience they obtained in school or at home.

Consequently, we suspected that the between-gender comparison might reveal some non-

mathematical factors that could have influenced their evaluation of mathematical

arguments. The same method (i.e. the between group ANOVA) was adopted to assess the

gender differences. Table 44 (in Appendix B) illustrates the statistical results of the

comparison.

As reflected in Table 44, the gender was an insignificant variable in all 64 cases

except for A2.1 and A5.2. That is, the female students considered Argument A2 (algebraic)

significantly less understandable and appealing than the male students did. This result

was surprising since A2 was stated using pure mathematical language and didn’t refer to

any life experience. Therefore it was difficult to perceive how the gender might have had

an impact on the evaluation of this argument. A possible explanation is that more male

students were comfortable using algebraic methods to work on the number theory problem;

however, the basis for this hypothesis was quite weak.

Analysis revealed that the gender differences were small. Therefore, the gender

difference test didn’t offer us insights into factors that impact students’ evaluation. Such

an inquiry was left to be accomplished during the interview analysis of the study.
Gender * School effect

Lastly, we studied the gender * school effect on the ratings provided by the

students to investigate whether the impact of gender on the ratings differed significantly

between the higher and lower performing schools. The results are included in

Appendix B, Table 45.

As shown in Table 45, the gender * school effect was significant (p < .05) for

A2.1, D2.1, D4.1, and D4.2. Note that A2.1, D2.1 and D4.1 measured whether A2

(algebraic), D2 (algebraic) and D4 (visual) were understandable to the participants. D4.2

measured if D4 (visual) could prove the conjecture in Problem D was always true. To

further investigate the gender * school effect on these four variables, we generated plots

using school as separate lines and gender as horizontal axis (see Figure 21).

Figure 21. Plots for variables on which the gender * school effect was significant

Figure 21 demonstrated that the gender differences of all variables were small in

the lower performing schools. However in the higher performing school, the differences

were large. In particular, the male students provided significantly higher ratings than

female students for all four variables. That is, the male students in the higher performing

schools found A2 (algebraic), D2 (algebraic), and D4 (visual) significantly more

understandable than the female students in the same schools did. In addition, the male

students in those schools also considered D4 (visual) significantly more convincing than

their female classmates did. This result suggested that male students from the higher

performing schools seemed to be more likely to understand algebraic arguments in non-

geometric contexts than their female counterparts. In addition, they were also more

likely to understand and be convinced by graphs in a coordinate plane. Since the gender

differences on the same questions were small in the lower performing schools, we

suspected that the enlarged gap in the higher performing schools was related to knowledge

gained from classroom instruction. However, it was unclear why the male students

seemed to grasp the algebraic representations better.

Nevertheless, the significant gender * school effect was only found in 4 of the 64

tested variables. Therefore, the cross effect was not significant for the participants’

responses to most of the questions used in SMR.

Summary of findings from SMR

The survey data demonstrated great diversity among the participants’ evaluation

of the arguments used in SMR. Among all 16 arguments used in the 4 problems, even the

least understandable argument rated by the participants (D2, algebraic) was indicated as

understandable by nearly 60% of them; the least convincing argument (B1) was indicated

as being able to show the corresponding conjecture was true by about half of those who

understood the argument; the least explanatory argument (D2, algebraic) was considered

as being helpful to show why the conjecture was true by close to 60% of the participants

who understood the argument; and the least appealing argument (A3, perceptual) was

selected as the closest way to how they would argue by about 1/6 of the participants.

Although many arguments used in SMR were incorrect or incomplete by a

higher standard of mathematical rigor, they may be most compatible with the ways in which

many students themselves argue. Data from the participants’ responses to SMR offered

much insight to these “natural” ways. To sum up, the study results indicated that:

 The participants’ evaluation of the same argument was highly diverse among

individuals.

 The participants were more likely to understand an argument when it showed

more details about concrete examples or provided visual support.

 The participants were unlikely to be completely convinced by checking and

verifying a few cases. Further support from multiple sources, such as visual

illustrations, past experience, theorems and formulas, contributed to their

conviction.

 All arguments were considered explanatory by the participants, suggesting the

need for multiple approaches to promote student understanding.

 The appealing reasoning mode varied across mathematical problems, and

the group’s favorite argument types also varied across the contexts.

 No particular type of argument was consistently and significantly more

appealing to students than others across the contexts.

 Between-school and between-gender comparisons revealed insignificant differences

between higher and lower performing schools and between male and female

students in their responses to most questions.


The survey results only enabled us to make conjectures about the factors that

might have contributed to students’ evaluation based on our understanding of the content

involved in the arguments. We suspected that concrete examples and visual illustrations

contributed to students’ conviction if they were understandable. We speculated that

perhaps examining one case in detail may have helped students see why the conjecture

was true. We conjectured that arguments that used easier language and offered shorter

description were more likely to be preferred by students. However, these conjectures

couldn’t be verified merely based on the survey data. The follow-up interviews aimed to

unpack students’ perception of the arguments and their rationale for decisions they had

made. The interview also allowed us to explore the mathematical and non-mathematical

factors that had impacted the students’ judgment. Results from the interviews are

presented in the section below.

Findings from the Interviews

The survey results suggested that the students’ preferences for arguments were

highly diverse across the problems and between individuals. The results however didn’t

allow us to infer what types of arguments were more appealing to students. Furthermore,

the data didn’t capture specific features of the arguments that had significantly impacted

students’ evaluation of the arguments. Since students’ judgments were made based upon

their understanding of each argument, we believed there were hidden factors that could

have impacted their choice. In order to further investigate those factors, we relied on

follow-up interviews in an attempt to understand the rationale behind students’ judgment

as indicated in the survey results.


Eight students participated in the interviews. The subjects’ background

information as well as the selection process were included in Chapter III. Each interview

lasted about an hour. Details regarding the interview procedure were also described in

Chapter III. This section offers analysis of the interview data. In particular, we first

provide a description of what happened during each interview as we investigated each

subject’s personal scheme when evaluating mathematical arguments. We then offer an

analysis of the participants’ responses to each problem so to examine the potential impact

of the context on students’ evaluation.

Report of interview data by individual

Survey results were insufficient to explain why the respondents had made certain

decisions. It was not assumed that an individual relied on the same factors and used the

same logic in every context; however, by comparing and analyzing his/her responses in

multiple problems, we assumed that we were more likely to detect factors that

consistently impacted his/her evaluation of mathematical arguments. In doing so, we

examined the subjects’ responses, including how they ranked the arguments from the

most convincing to the least convincing, along with their justification for the ranking.

The analysis of Allen’s interview responses has been elaborated in the methodology

chapter and served as an illustration of the analysis process. Below we included findings

from the other seven subjects, which were obtained using the same analyzing techniques

demonstrated in Allen’s case.

The case of Abby

Abby was an 8th grade student enrolled in an Algebra I class at the time of data

collection. In her responses to SMR, the inductive arguments were indicated to be the

closest to how she would argue in all but Problem C, where she selected C2, the algebraic

argument. Based on this result, we believed that Abby had exhibited preference towards

inductive arguments. Therefore, she was considered to be a representative from the

consistent group.

Abby’s interview responses are summarized in Table 14 and Table 15. Table 14

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 15 summarizes Abby’s comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem D D1 (inductive) D3 (perceptual) D2 (algebraic) D4 (visual)
Problem C C1 (inductive) C2 (algebraic) C4 (perceptual) C3 (visual)
Problem B B2 (perceptual) B1 (inductive) B4 (visual) B3 (algebraic)
Problem A A4 (visual) A2 (algebraic) A1 (inductive) A3 (perceptual)
Problem E E1 (inductive) E4 (algebraic) E3 (perceptual) E2 (visual)

Table 14. Rankings of arguments provided by Abby

Positive Comments Negative Comments
Problem D
That’s how I would normally do it, it It doesn’t show, like, how it is after the
shows like how to get there. (E2) tax. (E2)
I think it’d be easier to do this way than, There’s just so much work… if you can
like, have a graph. (E2, R3, R1-) make it simple, like this one [points to
D1], why would you confuse yourself?
(E2, P)
That shows 20 times 5 equals 1, so then I would need to try it. (E2)
that’d just prove that he’s right. (E2)
I think work would be easier than trying to
make a graph. (R1-, R3)
It also says they tried it with 200 and 500,
which gives more information. (E2)
If it works for 200 and 500, why wouldn’t
it work for 300? (L3)
I just think ’cuz you’re multiplying it by
the same thing, and if it works for those
two, if you tried 300, I think it’d work.
(L3)
Problem C
It says the formula is base times height, I’ve never heard of using wire to make a
divided by 2, and I think since they’re all triangle. (E3)
greater, then that does prove that this
height would be bigger than this one, so
that’d prove it’s bigger. (E4, L2)
It shows you how he got, how they’re I’ve never heard of this either to do, to
bigger areas. (E2, R1) figure that out. (E3)
It puts it in a form how you can see 1 is These I’ve never actually done. (E3)
bigger than 2, and they did, like,
equilaterals, scalene, isosceles triangles, so
they did all the different triangles, and then
they showed. (E2, R1, L3)
That one would be easier. (P)
That’s just how I’ve been taught since I
was little. (E1, E3)

Table 15. Summary of comments made by Abby

Problem B
Everyone knows what a football field This one makes no sense at all. (NA)
looks like, so you can just, like, imagine in
your head that the diagonal’s longer than
all sides. (E3, L2)
Anyone can draw rectangles and measure I’ve just never done it like that. (E3)
the sides, and they could obviously see the
diagonal’s longer. (R1, E2)
I’ve seen a football field before, and I
know how big they are. (E3)
It’d be simpler to do this than figure out,
make sure you did the circle right. (P)
Problem A
I know how to make, like, algebraic I haven’t tried, like, large numbers, say
expressions, so if I would put it this way, like, a thousand something, that a multiple
I’d understand it more, and it also proves of 6, I didn’t know if that’d be a multiple
that 6 equals 3 times 2. (E4, R4) of 3 too. (E2)
It shows that it could be for any number It proves that it could do that, but they
that’s a multiple of 6. (R4, L6) didn’t show how they did it. (E2)
This one is easy to see, visualize it. (E2, It’s confusing because they show many
R1, P) words in it. I just don’t like word
problems. (R2-)
I’ve been doing that this entire year You could see it, how they put it, but if
because of algebra. (E1, E3) you were just told to figure that out, and
you didn’t have these in front of you, it’d
be hard to tell. (E3, P)
I was taught to put those in algebraic If they said, like, use the square cards, and
expressions. (E1, E3) you didn’t have them in front of you,
you’d have to think and put them together
and draw them. (E3, P)
They, like, show you how to do it, and
they show you it’s true. (P)
Comparing Problems A-D
In this problem, they like, show you They don’t show you how you get them.
pictures, they like show you the triangles, (E2)
and what their size is. (R1, E2)
The size, and I can put them together, in a We were taught to do a tree, and branch
way… like it makes sense to how it’s off the multiples, so… I would need the
smaller. (E2, R1) tree in front of me to see. (E2, E3)
continued

154
Table 15 continued
Positive Comments Negative Comments
If I was by myself, and I didn’t have I wouldn’t know how to make the graph
someone to explain that, that would be a just off the top of my head for that certain
better pictures. (R1) problem. (E3-, P)
I know how to do this, but if I didn’t, that This one confuses me. (NA)
[points at C1] would be easier to find. (E2,
P)
They show you the problem within… the It doesn’t show enough, like it doesn’t
words… so they gave you an idea within give you enough numbers. (E2)
those. (R2)
Just how they worded it. (R2-)
I can imagine a cookie box, it’s just the
words… because it’s just so much. (R2-)
Problem E
It shows you how they got there, like, it This isn’t as easy to visualize. (R1, P)
shows you that it won’t change. (E2)
Just showing the picture, it can help you You’d have to think more, and work it out
visualize in your mind without having to more. (P)
do a lot of work, you just know it won’t
change. (E2, R1, L2)
It shows you, like, the percentage, and it There’s not really any work to show how
won’t change. (E2, R3) they got there, so if you didn’t know, like,
the problem, you wouldn’t be able to
figure this out. (E2, P)
They show the percentage, and when you You wouldn’t know how they got there,
double it, it still stays the same, so why because they didn’t show any work. (E6-)
would it be any different if you did 5 and
3? (E2, R3, L4)
It’s kind of hard to understand. (P)
It had to be explained to me. (P)
It’s a confusing picture. (P)
Additional Comments
It put the work in it. (E2)
It put the work within the problem. (E2)
When they did the 2 and the 3, they
showed the percentage. (E2, R3)

As shown in Table 14, Abby considered the inductive arguments most convincing

in 3 of the 5 problems (i.e. Problems C, D, and E), while the visual arguments were rated

least convincing in the same three problems. The algebraic and perceptual arguments

were placed between the visual and inductive arguments. This general preference toward

inductive arguments was consistent with her responses in the SMR. However, Abby’s

rankings for the arguments in the other two problems were different. In Problem A, she

considered the visual argument as the most convincing while the perceptual argument the

least convincing. In Problem B, the perceptual argument was ranked the most convincing

while the algebraic argument was considered the least convincing.

In order to better understand how Abby evaluated the proposed arguments and her

rationale when providing these rankings, the coding for her explanations in Table 15 was

summarized in Table 16 so as to identify factors and features of the arguments that had

influenced her judgment.

As shown in Table 16, the total numbers of comments that referred to the

representation, evidence, and link of the arguments were 22, 46, and 8, respectively,

indicating that the evidence had the largest impact on Abby’s judgment. Among all types

of evidence, Abby found examples (i.e. results from an immediate test) to be the most

reliable source to establish an argument. It was referenced 28 times throughout the

interview. Abby’s reliance on specific examples could be highlighted by her claim that “if

it works for 200 and 500, why wouldn’t it work for 300?” Furthermore, imaginary (i.e.

scenarios recalled from or created upon previous experience) was also considered reliable

evidence to her (referenced 12 times). For example, she suggested that “I’ve seen a

football field before, and I know how big they are” and hence considered B2 convincing.
In addition, she found some arguments not convincing since she had “never done it like

that,” emphasizing the importance of personal experience to her conviction.

Total number of references to representation: 22


Visual Narrative Numerical Symbolic
Positive 9 1 5 2
Negative 2 3 0 0

Total number of references to evidence: 46


Authority Example Imaginary Fact Assumption Opinion
Positive 3 28 12 1 0 0
Negative 0 0 1 0 0 1

Total number of references to link: 8


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 3 3 1 0 1
Negative 0 0 0 0 0 0

Table 16. Categories of comments made by Abby

The representation of arguments also influenced Abby’s judgment. However,

there didn’t seem to be a certain type of representation that particularly contributed to her

conviction. In fact, the same type of representation could affect her judgment negatively

and positively, depending on the context. For example, in Problem D, she found other

methods “easier than trying to make a graph,” while in Problem A she found A4

convincing since it was “easy to see, visualize it.” In addition, she found A3 “confusing

because they show many words in it” and she didn’t “like word problems.” However

when commenting on D1 she claimed it was convincing since “they show you the

problem within the words so they gave an idea within those.”

Abby’s comments revealed that she could be convinced by perceptual connection,

induction, transformation, and deduction, which were detected to have been

referenced 3, 3, 1, and 1 times, respectively. She did realize that an argument should

be valid for all cases when working on Problem A. However, this realization wasn’t

evident in her comments when she worked on other problems. Therefore, she didn’t seem

to be consistently concerned with the logic of arguments.

Fourteen of Abby’s comments were coded as “P.” A closer examination of those

arguments revealed that Abby tended to consider easier arguments more convincing. This

point was repeated in 11 of the 14 comments. Therefore, the simplicity of an argument

indeed had impacted her conviction. This was demonstrated by her claim that “… if you

can make it simple, like this one, why would you confuse yourself?” In addition, it was

found that the need to do extra work made an argument less convincing to her. For

example, when evaluating A4, she claimed that the need to “think and put them

[manipulatives] together and draw them” complicated the process and made the argument

less convincing to her. Another example was her comment about D4. Although she

claimed that she understood the graph, she considered it less convincing because she was

not able to “make the graph just off the top of my head.” The pursuit of simplicity

explained her preference toward the use of easy examples and imaginaries created upon

previous experience as evidence, since examining a few examples and referring to

previous experience might be the easiest way for her to access the problem.
Based on Abby’s comments when explaining her rankings, Figure 22 was

generated to highlight her rationale for evaluating mathematical arguments.

[Figure: a diagram linking “convincing arguments” to perceptual and inductive reasoning, with examples and imaginaries as evidence, easy-to-understand (visual, numerical) representations, and familiar procedures]

Figure 22. Illustration of Abby’s rationale for evaluating mathematical arguments

Adopting Figure 22 as a guide to understand the rankings Abby provided in Table

14, we believe that Arguments A4 (visual), B2 (perceptual), C1 (inductive), D1

(inductive), and E1 (inductive) provided the most accessible examples and scenarios and

hence were considered the most convincing. In particular, the manipulative model used in

A4 and the football field scenario in B2 were both familiar contexts to her. The examples

provided in C1, D1 and E1 were easier to understand than those used by other arguments.

On the other hand, Arguments A3 (perceptual), B3 (algebraic), C3 (visual), D4 (visual),

and E2 (visual) might be more difficult to access. A3 was too “wordy;” the graph in D4

was difficult to create; the diagram in E2 was “hard to understand;” and she had never

“actually done” anything like what was described in B3 and C3. Therefore, the difficulty

of accessing these arguments made them less convincing to her.

The case of Alice

Alice was enrolled in an Integrated 8th Grade Mathematics class at the time of

data collection. In her responses to SMR, the perceptual arguments were indicated to be

closest to how she would argue in all but Problem A, where she chose A4, the visual

argument. Based on this result, we believed that Alice had exhibited preference towards

perceptual arguments. Therefore, she was considered to be a representative from the

consistent group.

Alice’s interview responses are summarized in Table 17 and Table 18. Table 17

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 18 summarizes Alice’s comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem D D3 (perceptual) D4 (visual) D1 (inductive) D2 (algebraic)
Problem B B4 (visual) B1 (inductive) B3 (algebraic) B2 (perceptual)
Problem A A2 (algebraic) A1 (inductive) A4 (visual) A3 (perceptual)
Problem C C3 (visual) C2 (algebraic) C1 (inductive) C4 (perceptual)
Problem E E1 (inductive) E3 (perceptual) E2 (visual) E4 (algebraic)

Table 17. Rankings of arguments provided by Alice

Problem D
Positive Comments:
- It shows the picture, which helps the reader understand more. (E2, R1)
- It shows how much it is before tax and after tax, which helps you notice, or realize, how stuff is. (E2, R1)
- The height of the line before tax is, er, after tax is higher up than the line before tax. (R1, E4)
Negative Comments:
- I didn’t understand it; like, I tried and tried, but I just couldn’t figure out how to do it. (NA)
- It doesn’t seem like it made as much sense as the first two did. (NA)

Problem B
Positive Comments:
- It shows that, like, when you put it in a circle, BD would always follow along the, well it won’t always follow along the edge of the circle, but if you just imagine that it would, then it’d be longer than BA or BC. (E2, R1, L4)
- Usually, like, when you draw rectangles, and you measure the length of their sides, and then you draw a diagonal, the diagonal is always gonna be longer than the edges, because in order to get, like, from the edge and then down, like in order, like… when you draw the rectangle, like, the size of the diagonal will always be longer than this because, like, if you had a circle, and you were to bring it up, it would come up to like, right here, because the diagonal is always longer than the straight line, depending on how long the straight line is, and when it's inside a rectangle, then the diagonal will always be longer. (E2, R1, L4)
- Now that I think about it, the longer it [the side] is, the longer the diagonal will be, so that it can go from corner to corner. (E3, L4)
Negative Comments:
- When I stand on the edge of the football field, I look at the diagonal and then I look straight, it looks the same, because like, you can’t see it from like, up in the air, you’re on the ground looking at it, so you can’t really tell the distance, and it looks the same. (E3, L2)
- When you do AB squared plus AD squared, um, like, it won’t always come out to be BD squared, because when you combine AB squared and AD squared, it’ll actually turn out to be farther than BD squared. (NA)
- It [Pythagorean theorem] doesn’t really apply to this problem. (NA)

Table 18. Summary of comments made by Alice

Table 18 continued

Problem B (continued)
Positive Comments:
- I honestly understand better, stuff better when it has like, a picture, ’cuz I think better when I can see it and not read it. (E2, R1)

Problem A
Positive Comments:
- I used the multiples of 6, like she used, and for every one I tried, they’re multiples of 3 as well. (E2, L3)
- When I work it out, like… when I choose a random number for n, I put it in and like, for instance, here I chose 6 for n, and 6 times 2 is 12, and I multiplied that by 3, it equals 36, and on the other side, I plugged 6 in, and 6 times 6 is 36, so it’s right. (E2, L3)
- When you plug in a number for n, whatever you do in the equation will be the same on the other side, like, the answers are. (E2, R4, L4)
Negative Comments:
- I didn’t really understand this one, because when they split ’em up, it showed that they were multiples of 3, but… I don’t know… it’s confusing. (NA)
- I didn’t understand the wording that they put it in, and it made it really confusing. (NA)
- There might be some number of 6 that isn’t always [contained in the discussion of A1]. (E2)

Problem C
Positive Comments:
- It’s more appealing to me because, like I understand what it’s saying, but if you take the sides of Triangle 1 and you cut them down, and then you make it into like, sometimes a similar triangle, then Triangle 1, then, like it can be the same shape but it won't be the same size, because you cut the sides down. (E2, R1, L4)

Comparing Problems A-D
Positive Comments:
- You can plug in any number, and you’ll always, like, get the same number on both sides. (E2, R4, L6)
- I found that one more understanding because it gave you, like, the numbers that they tried to where you could try multiple numbers. (E2)
- They gave you more choices to… choose from to where it’s like, not so complicated, like you have more numbers to work with. (E2, P)
- It gives you something that you can like, draw with to where like, you can even see for yourself. (E2)
- You can get many, like, possible ways. (E2)
Negative Comments:
- You have to have a specific number in order to get the answer that they’re looking for. (E2, R3)
- It has more than one step, which makes it kind of harder to do. (P)
- It’s confusing, like, the way they split it up… (NA)
- It was harder for me to comprehend. (P)
- They only gave you two, but like, what if the price is higher than 500. (E2, L3-)
- They only gave you two numbers to work with. (E2, L3-)

Problem E
Negative Comments:
- I think of the problem in a different way. (P)
- I’m not comprehending what they’re all saying… because I have a different way of finding the answer than all of the arguments. (NA)

The rankings provided by Alice were surprising to us since 3 perceptual

arguments were rated most appealing to her in the SMR; however, most of them were

placed at the bottom of the list during the interview (see Table 17). In addition, her

evaluation of the same type of arguments was inconsistent across the problems. For

example, the algebraic argument was considered the most convincing in Problem A, second

most convincing in Problem C, third most convincing in Problem B, and least convincing in

Problems D and E. Therefore, the ranking provided by Alice hardly revealed any pattern

in her judgment. In order to better understand how Alice evaluated the proposed

arguments and her rationale when providing these rankings, the coding for her

explanations in Table 18 was summarized in Table 19 so as to identify factors and features

of the arguments that had influenced her judgment.

Total number of references to representation: 10
Visual Narrative Numerical Symbolic
Positive 7 0 1 2
Negative 0 0 0 0

Total number of references to evidence: 21


Authority Example Imaginary Fact Assumption Opinion
Positive 0 18 2 1 0 0
Negative 0 0 0 0 0 0

Total number of references to link: 11


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 1 2 5 0 1
Negative 0 0 2 0 0 0

Table 19. Categories of comments made by Alice

As shown in Table 19, the total numbers of Alice’s comments that focused on the

representation, evidence and link of the arguments were 10, 21, and 11, respectively. It

was found that most of Alice’s comments were about the evidence of the arguments.

Among all types of evidence, Alice found examples (i.e. results from an immediate

test) to be the most reliable source to establish an argument. They were referenced 18 times

throughout the interview. Additionally, in 3 cases she also considered the arguments that

used imaginary scenarios and mathematical facts convincing.

She found induction reliable in some cases (e.g. she suggested that “I plugged 6 in,

and 6 times 6 is 36, so it’s right”); however, it was detected that in some other situations

she demonstrated a need for more than just checking a few cases. This was

exemplified by her comment on D1 that “they only gave you two, but like, what if the

price is higher than 500?” A closer examination revealed that whether an argument

involved a “transformational” link from the evidence to a broader scope had an

impact on Alice’s judgment. This was detected 5 times, including the comments that “BD

would always follow along the, well it won’t always follow along the edge of the circle,

but if you just imagine that it would, then it’d be longer than BA or BC” and “Now that I

think about it, the longer it [the side] is, the longer the diagonal will be, so that it can go

from corner to corner.” These transformations were made in visual contexts.

The ability to visualize transformation helped us understand the rankings provided

by Alice in Table 17. Notice that she considered the visual arguments (B4 and C3) as the

most convincing in Problems B and C, both of which utilized transformational reasoning.

D4 was also considered convincing in Problem D, where she claimed to be able to see the

constant distance between the parallel lines, which convinced her that the difference

remained the same value. In Problem A, A4 (visual) was considered not

convincing. Her explanation, however, revealed that she could clearly see how one

diagram transformed to another but was not able to see how it connected to the problem

context. Hence what made the argument less convincing was not due to the use of

transformation.

Transformational reasoning that convinced Alice was detected in visual contexts;

however, we suspected that it had also potentially impacted her judgment in other

contexts. For example, it was found that she realized the advantage of algebraic

representation in Problem A, where she claimed that “you can plug in any number, and
you’ll always, like, get the same number on both sides.” She believed A2 (algebraic) was

more convincing than A1 (inductive) since A1 didn’t prove the conjecture was true for all

cases. However, at the same time, she needed to plug in some numbers to verify if the

formula used in A2 was true. A possible explanation was that through plugging in the

numbers in the formula she might have detected some patterns that would transfer to

other situations as well, and consequently the formula became valid to her in general

cases. If this conjecture is true, we may claim that Alice found transformational reasoning

convincing in both visual and numerical contexts.

Note that Alice didn’t explicitly indicate a preference for relying on

transformational reasoning. Such a scheme was detected by extracting the similarity of

the comments she made. This revealed that Alice was not yet able to explicitly reflect on

the reasoning of arguments (i.e. the link between evidence and conclusion).

Four of Alice’s comments were coded “P.” Similar to what was detected in the cases

of Allen and Abby, Alice also suggested the need for simplicity of an argument to be

convincing to her. For example, she claimed that D2 was less convincing since “it has

more than one step, which makes it kind of harder to do.” In addition, it was detected that

her own way of approaching a problem had an impact on her judgment of the arguments

given. This impact was most obvious when she was working on Problem E, where she

claimed that she didn’t consider any of the arguments convincing since she had “a

different way of finding the answer than all of the arguments.” In explaining her own

method, Alice also started with specific numbers. However, she decided to give up that

approach after a few trials. What she had done didn’t seem to be different from what was

suggested in E1 (inductive). So it seemed that she didn’t understand what was offered in

E1.

Based on Alice’s comments when explaining her rankings, Figure 23 was

generated to illustrate her rationale for evaluating mathematical arguments. It is

suggested that Alice was likely to be convinced by arguments that were simple enough

and used approaches that were familiar to her. In particular, arguments that utilized visual

examples and engaged transformational reasoning seemed to be the most convincing type

to her.

[Figure: a diagram linking “convincing arguments” to transformational reasoning, with examples as evidence, easy-to-understand (visual) representations, and familiar procedures]

Figure 23. Illustration of Alice’s rationale for evaluating mathematical arguments

The case of Amy

Amy was an 8th grade student enrolled in an Algebra I class at the time of data

collection. In her responses to SMR, the algebraic arguments were indicated as the closest

to how she would argue in all but Problem C, where she selected C1, the inductive

argument. Based on this result, we believed that Amy had exhibited preference towards

algebraic arguments. Therefore, she was considered to be a representative from the

consistent group.

Amy’s interview responses are summarized in Table 20 and Table 21. Table 20

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 21 summarizes Amy’s comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem C C3 (visual) C1 (inductive) C4 (perceptual) C2 (algebraic)
Problem B B1 (inductive) B3 (algebraic) B4 (visual) B2 (perceptual)
Problem A A2 (algebraic) A4 (visual) A3 (perceptual) A1 (inductive)
Problem D D2 (algebraic) D4 (visual) D3 (perceptual) D1 (inductive)
Problem E E4 (algebraic) E3 (perceptual) E2 (visual) E1 (inductive)

Table 20. Rankings of arguments provided by Amy

Problem C
Positive Comments:
- That’s kind of how I thought of it in my head. (P)
- It has the diagram of the pictures, which just makes sense in my head, and it has different cases, so that it just, there’s different things to show instead of just one example, there’s more. (R1, E2, L3)
- Showing more examples makes it more convincing. (E2, L3)
- It made more sense and it seemed more valid to me. (P)
Negative Comments:
- When I think of stuff, I don’t put it in like, diagram and shape form, I kind of just think of it as just, like, stuff in my head, I don’t think of any shapes or any examples to it, and so I sometimes, they confuse me, this at first confused me when I looked at it. (E2-, E3, R1-)
- It’s got too much numbers in it, and so it gets it confused in my head, so I have to reread it in my head a couple times. (E2-, R3-)
- It just slows me down mostly. (P)
- It doesn’t really, it starts with the b, and it doesn’t explain the a and c in the first part, and I think it probably should explain the a and c, it just explains the b. (E2)
- The picture confused me a little bit. (E2-, R1-)
- It’s not very detailed. (P)
- This one’s too detailed, and not really, completely true, and that one’s not really detailed enough; the picture shows it, but like I said, sometimes pictures get me lost. (R1-, P)

Problem B
Positive Comments:
- This person used more examples, they did more of, I guess, trials… and so, it’s more likely to be true for this one than for other ones. (E2, L3)
- It has a whole formula. (E4, R4)
- It looks pretty true. (R1)
Negative Comments:
- It’s just one example, and so it’s not necessarily true because it just, it may not be a hundred percent true. (L3-)
- It has more information to it, but I, it confused me a little bit. (P)
- I don’t think it shows it, because it says many and several. (L3-)

Table 21. Summary of comments made by Amy

Table 21 continued

Problem B (continued)
Positive Comments:
- It used an actual formula. (E4, R4)
Negative Comments:
- This is only, it just has one example, and it doesn’t have anything other than one example to back it up. (E2, L3-)
- I don’t exactly think this is even correct, because it says BQ is equal to BD, and I might have been misunderstanding it wrong, but it just doesn’t look equal to it, it doesn’t seem very equal… after that, it kind of just lost me, because I was just like, well this isn't equal, so the rest of it doesn't seem very true either. (E2, R1)

Problem A
Positive Comments:
- You can put any number in there, and it wouldn’t make a difference. (E4, R4)
- It has the formula. (E4, R4)
- I think with a formula, it makes it true for any event. (E4, R4, L6)
Negative Comments:
- This one is just shown by pictures. (R1-)
- With pictures, like, sometimes it can be incorrect, or not true for some things. (R1-)
- The person just tried a couple different things; I mean, they might have tried a lot, but they didn’t try all of them, which is important. (E2-, L3-)
- Most people like thinking math with food and whatnot, once you get into food, it just completely loses me. (E3-, L2-)
- It’s talking about cookies… I just can’t picture that in my head. (E3-, L2-)

Problem D
Positive Comments:
- It’s got the formula, and it uses x instead of an actual price, so it can be any number, and the formula is correct. (E4, R4, L6)
- It’s kind of like using this formula, just putting it onto a graph instead. (E4, R1, R4)
- It shows the formula, and so I’m, it’s really clear on what they’re doing. (E4, R4)
Negative Comments:
- It doesn’t necessarily say the formula, and so I don’t one hundred percent know exactly what formula was used, in my head. (E4, R4)
- This one has actual prices instead of x, and so even though they use different prices, it’s not always true, because they can’t use every number… and so it’s just, you can’t tell from that one if it’s 100 percent true or not. (E2-, R4, L3-)
- They just don’t have a lot of stuff to back it up. (E6-)
- It doesn’t have much information to back it up that it’s true, so it’s not as clear. (E6-)
- I would probably add to it that, something about the actual, something about the before price and the after price, instead of just the 20 dollars and the 5 percent. (E2)
- It would be better if it had, like, an x for an actual price… instead of just showing the tax difference. (R4)

Comparing Problems A-D
Positive Comments:
- I am pretty sure that all rectangles are similar. [making rectangles using her fingers] (E2, L4)
- They kind of have a formula here, and then they back it up with different examples. (E2, E4, R4)
- It [my brain] likes more figures and numbers. (E2, R1, R3)
- Numbers are a lot simpler than trying to think of something in my head. (E2, R3)
- Those just don’t convince me as much as numbers and something that I can actually see on a piece of paper. (E2, E3-, R3)
Negative Comments:
- I don’t think the diagram… the picture, I don’t think it goes with this… yeah, with the description. (E2-, R1-)
- You can’t necessarily go with the picture, because the picture doesn’t show all cases. (E2-, R1-)
- In your brain, your brain can just skew everything if you just have one missed piece of data or anything. (E3-, L2-)
- My brain doesn’t like to connect to imaginative stuff. (E3-, L2-)
- It just shows a couple cases, not the whole range of cases, ’cuz there could be basically any number, there could be tons of different things it could be. (E2-, L3-)

Problem E
Positive Comments:
- It has more than one case, and it has variables, so you can put anything into it, and so it will be true for anything, instead of just one thing. (E4, R4, L6)
Negative Comments:
- It has one case, instead of all the, however many, amount of cases. (E2-, L3-)
- It’s got a picture instead of a number, and the pictures can be misinterpreted, or mismade. (E2, R1-, R3)
- It doesn’t have any pictures or numbers, it just has words to back it up. (E2, R1, R2-, R3)
- [It was not backed up by] numbers and objects. (E2, R3)

Additional Comments
- The numbers, they seem to be right, but they don’t really show anything else. (E2-, L3-)
- They say it in words instead of in numbers. (E2, R2-, R3)

As shown in Table 20, Amy considered algebraic arguments as most convincing

in 3 of the 5 problems (i.e. Problems A, D, and E), while the inductive arguments were

rated least convincing in the same three problems. The visual and perceptual arguments

were ranked between the algebraic and inductive arguments. This general preference for

algebraic arguments was consistent with her responses in the SMR. However, Amy’s

rankings for the arguments in the other two problems were different. In Problem C, she

considered the visual argument as the most convincing while the algebraic argument

received the least convincing ranking. In Problem B, the inductive argument was ranked

the most convincing while the perceptual argument was considered the least convincing.

In order to better understand how Amy evaluated the proposed arguments and her

rationale when providing these rankings, the coding for her explanations in Table 21 was

summarized in Table 22 so as to identify factors and features of the arguments that had

influenced her judgment.

Total number of references to representation: 37
Visual Narrative Numerical Symbolic
Positive 6 0 7 13
Negative 8 2 1 0

Total number of references to evidence: 46


Authority Example Imaginary Fact Assumption Opinion
Positive 0 16 1 12 0 0
Negative 0 10 5 0 0 2

Total number of references to link: 19


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 0 3 0 0 3
Negative 0 4 8 1 0 0

Table 22. Categories of comments made by Amy

As shown in Table 22, the total numbers of comments that focused on the

representation, evidence, and link of the arguments were 37, 46, and 19, respectively,

indicating that all three factors had impacted her evaluation. These also indicated that

much of Amy’s explanation was based on the features of the arguments instead of her

personal opinions.

Three key findings made Amy’s case distinctive. First, she was the only

subject who had clearly and repeatedly emphasized her preference toward algebraic

arguments and made explicit claims about the logical rigor of these arguments.

Mathematical facts and symbolic representation were referenced 12 and 13 times, respectively, when she was talking about factors that convinced her. These statements were made when justifying the rankings she provided for Problems A, D, and

E. In particular, she claimed that A2 (algebraic) had “a formula; it makes it true for any

event;” D2 (algebraic) “got the formula, and it uses x instead of an actual price, so it can

be any number, and the formula is correct;” and E2 had “variables, so you can put

anything into it, and so it will be true for anything, instead of just one thing.” Her

explanation demonstrated that she was not only attracted by the symbolic format but also understood its logical rigor. This was explicitly addressed 3 times. As a natural consequence of this realization, she also repeatedly addressed the deficiencies of inductive and perceptual reasoning (8 and 4 times, respectively). This was exemplified by

her claims that “they might have tried a lot, but they didn’t try all of them, which is

important,” “this one has actual prices instead of x, and so even though they use different

prices, it’s not always true, because they can’t use every number” and “your brain can

just skew everything if you just have one missed piece of data or anything.”

Second, she was the only subject who clearly described the disadvantage of visual illustrations, a disadvantage not of any specific image or graph but of visual illustration as a way to reason. Such claims included “you can’t necessarily go with the picture, because the picture doesn’t show all cases,” “it’s got a picture instead of a number,

and the pictures can be misinterpreted, or mismade,” and “with pictures, like, sometimes

it can be incorrect, or not true for some things.” This point was addressed 8 times during

the interview.

The third finding was that although the previous two results appeared consistently in her explanations for the number theory, algebra, and probability problems, they were not present when she was working on the two geometry problems. This is a good example of how context may impact students’ reasoning methods. If a reasoning test were based only on Problems A, D, and E, Amy would be considered one who demonstrated the highest level of maturity in mathematical reasoning, especially among 8th graders. So the question is why she evaluated arguments in geometry contexts differently.

One reason could be that Amy tended to avoid working with visual representations in Problems A, D, and E since she believed they might misrepresent the content. However, in the geometry problems she had to work with images and figures. Amy’s explanation for considering B1 (inductive) the most convincing option in Problem B further revealed her thinking in geometric contexts. When asked to compare B1 to the inductive arguments in other contexts, Amy suggested that B1 was different because the cases used in B1 were not numbers but rectangles. She further claimed that “all rectangles are similar” shapes that shared common properties such as “equal opposite sides,” and hence if the claim that the “diagonal is longer than the sides” was true for some of them, it should apply to others as well (while stating this, she used her fingers to make a rectangle and moved them to represent adjusting the side lengths). This explanation revealed that Amy utilized transformation to convince herself that B1 did account for all cases. A similar strategy applied to her judgment in Problem C, where C3

(visual) utilized transformation. This argument was rated most convincing since she

believed it demonstrated that the conjecture was true for all cases. Further examination of

Amy’s judgment of the algebraic argument in the two geometry problems revealed that

she didn’t understand the algebraic argument in Problem C and hence considered it the

least convincing. She was convinced by B3 (algebraic) but rated it low because it
confused her slightly at the beginning. Supported by this evidence, we believe the

following three points capture Amy’s major rationale when judging whether

mathematical arguments were convincing.

First, in order for an argument to be convincing to Amy, it had to show that the conjecture was true in all cases. This perception served as the primary guiding principle for her judgment of arguments in all five contexts.

Second, she found testing a few numbers helpful for understanding a problem better;

however, she believed that algebra was the reliable tool to guarantee the general validity

of an argument. She didn’t consider other reasoning methods, in particular induction,

perceptual connection, and visual illustration, as reliable, and suggested they each had

their own deficiencies.

[Figure 24, a diagram, depicts convincing arguments for Amy as those shown true for all cases, supported by examples and facts in symbolic or numerical form, and linked by transformational or deductive reasoning.]

Figure 24. Illustration of Amy’s rationale for evaluating mathematical arguments

Third, she considered different numbers as separate cases, but she viewed a group of geometric shapes that shared certain common properties as related cases. Therefore, examples in some geometry contexts were viewed as generic examples, and transformational reasoning could be utilized to extend the validity of a detected property to other cases. However, examples in numerical contexts were viewed as isolated instances, and hence their properties might not hold in other situations. Amy’s rationale is illustrated in Figure 24.

The case of Beth

Beth was an 8th grade student enrolled in an Algebra I class at the time of data collection. When working on the SMR, she tended to prefer A4 (visual), B2 (perceptual), C2 (algebraic), and D4 (visual) in the respective problems, and hence she was considered to be a representative from the inconsistent group.

Beth’s interview responses are summarized in Table 23 and Table 24. Table 23

illustrates her rankings for each problem. Column One of the table represents the order of

problems that she tackled. Table 24 summarizes Beth’s comments when articulating why

she found certain arguments convincing or not convincing (The coding of each comment

is explained in Table 9). These two tables served as the major resource for the interview

analysis.

Most convincing -------------------------> Least convincing


Problem B B2 (perceptual) B4 (visual) B3 (algebraic) B1 (inductive)
Problem D D4 (visual) D1 (inductive) D2 (algebraic) D3 (perceptual)
Problem A A3 (perceptual) A2 (algebraic) A4 (visual) A1 (inductive)
Problem C C4 (perceptual) C2 (algebraic) C1 (inductive) C3 (visual)
Problem E E1 (inductive) E4 (algebraic) E2 (visual) E3 (perceptual)

Table 23. Rankings of arguments provided by Beth

Problem B

Positive: I’ve been on a football field, so I know what the shape is and everything, so if I imagine to myself I’m standing at the corner of a football field, like that says, and I’ve had to run football fields, and they’re called the suicide thing, so I had to run that way, and then those two ways, and that one was longer than those two when I was running. (E3, L2)
Negative: I got really confused. (NA)

Positive: It’s also because of a relatable thing, I, like, I understand what it means when it says, I can picture a rectangle being drawn, plus I’ve measured rectangles, so that’s longer than the two sides. (E3)
Negative: I guess it would be more convincing if I knew what the actual numbers were, if they actually use the actual numbers in them, and not just like, saying the square of BD, if they actually put the actual numbers. (E2, R3, R4-)

Positive: I just know more about B2, I’ve run the football field before, so I guess that’s why. (E3, L2)
Negative: We’re probably not going to have rulers during the test, so it’s going to be harder. (P)

Positive: You can kind of look at the side lengths and see what they mean by it, instead of having to measure it and everything. (E2, R1)

Problem D

Positive: D1 gave a little bit more of an explanation at the end, and also just like on that one [points to previous question], they used actual numbers, so even though it wouldn’t really probably be that hard for me to insert a number in there during the test, that one's already done for me, so it's probably a lot easier to do. (E2, R3, P)
Negative: It makes sense, it’s just really short, and they don’t really give a lot of examples. (E2)

Positive: I could insert the 200 dollars and the 500 dollars that he’s suggesting is the same thing, and I could see if it was actually right. (E2, L3)
Negative: They don’t give you examples of numbers that fit into it really, they just… I guess yeah, they just don’t give you numbers to support themselves, their claims. (E2, R3)

continued

Table 24. Summary of comments made by Beth

Table 24 continued

Positive: It just has an illustration, and I’m sometimes, most of the time, I’m a visual learner, so it helps a lot to see it and read what it says, and it is, it makes sense. (R1, P)

Positive: It only gives one example, but it also offers 200, 500 if you wanted to insert them, so yeah, I think it does support that. (E2, L3)

Problem A

Positive: I can kind of imagine someone having six cookies in… having a multiple of six I imagine 36 because that’s the square I guess, square root or whatever, and um, so I imagine 36 and I imagine six boxes of 36 cookies and dividing each into two and then there's three cookies in each, so… and then you can just, you can put the three cookies with the two boxes of three cookies, you can put it back into one box of six, and it's still a multiple of 36 either way. (E3)
Negative: It confused me the first time I read it, and I had to re-read it, because I wasn’t really sure what it meant by the uh, when it was, the way it was divided and everything. (NA)

Positive: You can insert a number in there… and it would make sense. (E2)
Negative: I’m guessing that they’re doing what I think they’re doing. (NA)

Positive: It’s visual, so it’s a lot easier for me to understand when it’s visual. (R1, P)
Negative: It says that she’s tried plenty of multiples of six, and three as well, and that they’re the same, but just ’cuz she’s tried a lot of them, she hasn’t tried all of them, so you could never really know, based on that statement, if she was right or not. (E2-, L3-)

Negative: Even if you just try a wide range of numbers, you still, you never know. (E2-, L3-)

Problem C

Positive: I can visualize that, and ’cuz I can think about it in my head. (E3, L2)
Negative: You’ve tried many cases, but you can never be sure, ’cuz you haven’t tried all the possibilities, which really, you could never do anyways. (E2-, L3-)

continued

Table 24 continued

Positive: I like they way that C4 is explained better; I like being able to imagine it, or being able to think… ’cuz actually, I thought the area of this table surrounded by wire. (E3, L2)
Negative: It says that she shortened the sides, but it doesn’t say by how much, so she could have shortened the sides at any, she could have shortened a more than she shortened b or more than she shortened c, so she doesn’t really say how much to shorten it by. (E2)

Positive: It’s easier… to imagine. (E3, L2)
Negative: I think that to make the claim more believable, you would have to cut all the sides by the same length, we would have to cut the sides at the same length from each side. (E6)

Comparing Problems A-D

Positive: I still like that just because of the graph, and I can look at it and kind of understand what they’re saying and everything. (E2, R1)
Negative: You yourself would only be inserting a certain amount of numbers, you wouldn’t be sitting there inserting every single number in the world. (L3-)

Positive: It’s graphed with the two lines, and it shows that they’re all, that it’s one unit apart, and if you wanted to, you could kind of check that with all of them, they’re all one unit apart and make sure that it was one unit apart the whole time like they said it was. (E2, R1)
Negative: You get to trust your answers, you don’t have to trust their answers, but you also are limited to a certain number of numbers, so you can’t… it’s kind of like, half and half, good and bad. (NA)

Positive: It’s relatable for me. (P)

Positive: I’m having the football field switch in my mind, and every rectangle that I can think of is, it works. (E3, L4)

Positive: I can imagine them in my mind, I can picture them. (E3, R1)

Positive: It’s more visual. (R1)

Positive: If you use them [variables] you can insert numbers, any number that you possibly want, and even if you wanted to insert numbers just to see if they were wrong… (R4)

Positive: You can insert whatever numbers you want, you don’t have to go by what they’re saying as much. (R4)

continued

Table 24 continued

Problem E

Positive: If you take 2 out of 5, and you have 4 out of 10… if you take 4 out of 10, it would reduce to 2 out of 5, which is the same percent, so that’s why it makes sense. (E2, L3)
Negative: It [the narrative description] doesn’t really support what they’re saying, it kind of just doesn’t support this; it “unsupports” it not making sense; it doesn’t really support it. (E6-)

Positive: It shows that they’re the same ratio, they’re still proportionate. (E2, L3)
Negative: You can never really try all the numbers. (L3-)

Additional Comments

Positive: It gives more information about the illustration. (R2)

As shown in Table 23, Beth identified the perceptual arguments as the most

convincing in Problems A, B, and C but least convincing in the other two problems.

Algebraic arguments were never considered the most or least convincing. Beth’s evaluation of visual and inductive arguments was highly inconsistent across the problems; these arguments appeared at different places on the lists. In order to better understand how Beth evaluated the proposed arguments and her rationale when providing these rankings, the coding for her explanations in Table 24 was summarized in Table 25 so as to identify factors and features of the arguments that had influenced her judgment.

Total number of references to representation: 14
Visual Narrative Numerical Symbolic
Positive 7 1 3 2
Negative 0 0 0 1

Total number of references to evidence: 27


Authority Example Imaginary Fact Assumption Opinion
Positive 0 13 9 0 0 1
Negative 0 3 0 0 0 1

Total number of references to link: 15


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 5 4 1 0 0
Negative 0 0 5 0 0 0

Table 25. Categories of comments made by Beth

As shown in Table 25, the total number of Beth’s comments that focused on the representation, evidence, and link of the arguments was 14, 27, and 15, respectively. Beth made more comments about the evidence of the arguments than about their representation or link. Among the types of evidence, examples (i.e. results from an immediate test) were the most frequently referenced: 13 positive comments cited results of immediate tests, mostly obtained by plugging in numbers (e.g. “you can insert a number in there”).

There were 9 cases where imaginaries from past experience (e.g. “I just know more about B2, I’ve run the football field before”) were recalled to decide whether arguments were convincing. Formulas and theorems were not treated by Beth as reliable sources of evidence; in order for them to be convincing, she needed to plug in numbers to verify them.

Beth also commented on the impact of representations on her evaluation. Most

prominently, she claimed that “most of the time, I’m a visual learner,” and an argument

was “a lot easier for me to understand when it’s visual.” Note that by “visual” she didn’t

only mean visualizing something that was drawn on paper, but also visualizing something

in her mind, i.e. imagining some model. She didn’t distinguish between these two types

of visualization in her explanations. Overall, there were 7 times when Beth mentioned

that arguments with visual illustration contributed to her conviction. In addition, Beth recognized the value of numerical expressions in offering her concrete examples to support a claim. She acknowledged the value of symbolic expressions in allowing her to test numbers that she wanted to check. However, she thought that neither type of expression was powerful enough to show that a conjecture was true in all cases. This was further explained in her view of the link between evidence and conclusion.

Beth didn’t believe that an algebraic argument could prove a conjecture was

always true. Compared to numerical expressions, the symbolic formulas only offered the

advantage that “you can insert whatever numbers you want, you don’t have to go by what

they’re saying as much.” Despite this, she sometimes preferred numerical expressions

since “they used actual numbers, so even though it wouldn’t really probably be that hard

for me to insert a number in there during the test, that one's already done for me, so it's

probably a lot easier to do.” Beth considered an argument to be more convincing if she

“knew what the actual numbers were, if they actually use the actual numbers in them, and

not just like, saying the square of BD.” Therefore, an algebraic expression was not

meaningful to her unless the variables were substituted by actual numbers.

Beth’s evaluation of inductive arguments was not consistent across the problems. On the one hand, she explicitly pointed out that trying a few cases was not sufficient to show that a conjecture was always true. For example, in commenting on A1, she claimed that

“she’s tried a lot of them, she hasn’t tried all of them, so you could never really know,

based on that statement, if she was right or not.” Similar statements were articulated 5

times during the interview. However, when she was evaluating B2, even though she realized that a football field only represented a certain type of rectangle, she still considered it the most convincing option since she could “relate” to it. A similar situation occurred in her ranking of arguments in Problem D, where she considered D1 (inductive) the second most convincing while admitting that it couldn’t prove the conjecture was always true. This suggested that being able to show the general validity of a conjecture was not a required condition for Beth to consider an argument convincing; other personal factors played a more important role.

An examination of the personal standards Beth discussed during the interview revealed her need to see simple, “relatable,” and easy-to-access arguments in order to be convinced. Similar opinions were expressed 4 times. While “general validity” contributed to the reliability of an argument (e.g. her comments on A1), it was not the single decisive factor.

This explained Beth’s preference for perceptual arguments (A3, B2, and C4) in

Problems A, B, and C (see Table 23), since the contexts provided in those arguments evoked familiar experiences and hence were most “relatable” to her. In contrast, the two perceptual arguments (D3 and E3) in Problems D and E didn’t provide any “relatable”

scenarios and hence were considered less convincing.

Based on Beth’s comments when explaining her rankings, Figure 25 was

generated to illustrate her rationale for evaluating mathematical arguments. It was

suggested that Beth was likely to be convinced by arguments that were “relatable” to her existing experience. In particular, arguments that created a scenario she could visualize seemed to be the most convincing type. Additionally, the illustration of various examples could help her access a problem and hence contributed to her conviction.

[Figure 25, a diagram, depicts convincing arguments for Beth as those that were easy to understand and offered a relatable scenario, supported by examples and imaginaries in visual or narrative form and linked by perceptual reasoning.]

Figure 25. Illustration of Beth’s rationale for evaluating mathematical arguments

The case of Betty

Betty was an 8th grade student enrolled in an Honors Algebra I class at the time of data collection. In her responses to the SMR, she considered the visual argument (A4) in Problem A, the perceptual argument (B2) in Problem B, the algebraic argument (C2) in Problem C, and the inductive argument (D1) in Problem D as the most appealing option in each context. Since she exhibited preferences toward different types of argument across the contexts, she was considered to be a representative from the inconsistent group.

Betty’s interview responses are summarized in Table 26 and Table 27. Table 26

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 27 summarizes Betty’s comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem D D1 (inductive) D2 (algebraic) D4 (visual) D3 (perceptual)
Problem C C2 (algebraic) C3 (visual) C1 (inductive) C4 (perceptual)
Problem A A2 (algebraic) A3 (perceptual) A4 (visual) A1 (inductive)
Problem B B3 (algebraic) B4 (visual) B1 (inductive) B2 (perceptual)
Problem E E1 (inductive) E4 (algebraic) E2 (visual) E3 (perceptual)

Table 26. Rankings of arguments provided by Betty

Problem D

Positive: When they explain it and show the work [examples] that that’s right. (E2)
Negative: It wasn’t enough work to show how they got a dollar off. (P)

Negative: This one is, like, no work at all. (P)

Negative: It just, like, gives you a graph and doesn’t explain how they formed the graph and like, how they got from the five percent to a dollar. (R1-, R2)

Problem C

Positive: The statement they made is true; they said the area of a triangle equals half of the product of its base and height, and that’s true. (E4)
Negative: It [the perceptual argument] just states that they’re larger. They have no idea what they’re talking about. It’s just, like, blank. (E6-)

Positive: That [the formulas] describes how they found out the answer. (E4, L5)
Negative: They basically just stated that it’s larger. (E6-)

Positive: They diagramed the triangle part ... If you cut it, you make it smaller. (E2, R1, L4)
Negative: They wouldn’t even give any, like, work [in addition to the examples] ... They didn’t explain why. (E2-, L3-)

Positive: That helps to see the actual work [formula and related procedure] being done of how to get the answer. (E4, R4, L5)

Problem A

Positive: It gives you an equation to solve for n, and it comes out correct. (E4, R4)
Negative: It doesn’t really explain, it just breaks up the pattern, like, the blocks. (R1-, R2)

Positive: It used cookies as an example. (E3, L2)
Negative: It just says that, like, this can be an opinion. (E6-)

Positive: It [A2] explained more of how to find the way… to get the answer. (E4, R4)
Negative: You didn’t go further in the numbers. (R2-, L3-)

Negative: You didn’t look for… like, multiples of three and six, to see if, to compare them, to see if they’re the same. (E2)

Negative: That’s not enough, I think they just, like, picked random numbers. (E2-, L3-)

Problem B

Positive: I looked at the length of explanations. (P)
Negative: It’s just, like, an opinion. (E5-)

Positive: They divided the rectangle ... then the Pythagoras Theorem ... (E4)
Negative: It’s just, too plain… they didn’t even dig deep and explain what they did. (E6-)

continued

Table 27. Summary of comments made by Betty

Table 27 continued

Positive: They are all radius [so they are equal]. (E4)
Negative: There are small football field, and big one, say NFL ... so that’s not true for all football fields ... the size varies (E3, L3-)

Positive: They showed the length [pointed on the figure]. (E2, R1)
Negative: I think they (the diagonal and the side) are the same size. (E6)

Positive: They are true in their cases. (E2)

Comparing Problems A-D

Positive: It explains more, they give you a problem [example] for you to find the solution to get the answer, to see if it’s right. (E2, P)
Negative: They [inductive arguments] just give you the statement, it’s not really explanations of how they found it. (E6-)

Positive: You gotta work through the problem to get the answer. (P)
Negative: It’s okay to draw a picture, but you have to explain the picture too, and they didn’t really explain it… as well as algebra would. (R1-, R2, R4)

Positive: Algebra, it explains it more than just saying, just making a statement, and they give you equations and inequalities, and problems to find the solutions to get your answer, rather than just making a statement. (E4, R4, P)

Problem E

Positive: It explains how they get through to use percentages and ratios. (E2, R3, P)
Negative: They just drew a picture, and they didn’t really explain it, they just basically said that the ratio of two ping pong balls would be the same, therefore they won’t change. (E6-, R1-, R2)

Positive: They use algebra, and it like, and they use variables to explain how they got the answer. (R4)
Negative: It just makes a statement. (E6-)

Positive: They give you a percentage. (E2, R3)
Negative: It just gives you algebra for you to solve it… it basically just, it isn’t as good as [points at inductive argument]. (R4-, R3)

Positive: There’s more math involved here [inductive argument]. (E2, R3)

Additional Comments

Positive: If you explain how you found it. (R2, P)

Positive: How you found the answer to the problem being asked. (P)

Positive: You have to find, go… dig it further, take further steps. (P)

As shown in Table 26, Betty considered the algebraic arguments the most convincing in Problems A, B, and C and the second most convincing in Problems D and E, demonstrating a consistent preference toward algebraic arguments. In addition, she considered the perceptual arguments the least convincing options in 4 of the 5 problems, also providing consistent evaluations of this type of argument. Therefore, although Betty was selected as a representative of the inconsistent group, she exhibited more

consistent judgment of certain types of argument during the interview phase. To better understand Betty’s rationale when providing these rankings, the coding for her explanations in Table 27 was summarized in Table 28 so as to identify factors and features of the arguments that had influenced her judgment.

Total number of references to representation: 23


Visual Narrative Numerical Symbolic
Positive 3 5 4 6
Negative 3 1 0 1

Total number of references to evidence: 30


Authority Example Imaginary Fact Assumption Opinion
Positive 0 9 2 8 0 1
Negative 0 2 0 0 1 7

Total number of references to link: 8


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 1 0 1 2 0
Negative 0 0 4 0 0 0

Table 28. Categories of comments made by Betty


As shown in Table 28, the total number of Betty’s comments that focused on the representation, evidence, and link of the arguments was 23, 30, and 8, respectively, indicating that all three factors had impacted her judgment.

Betty’s explanations revealed that she could be convinced by arguments that utilized symbolic and numerical representations, which were mentioned 6 and 4 times, respectively, during the interview. For example, she stated that “it [the argument]

explains how they get through to use percentages and ratios” and “algebra, it explains it

more than just saying, just making a statement, and they give you equations and

inequalities, and problems to find the solutions to get your answer, rather than just

making a statement.” These statements helped to explain her rankings, where the highest

ranked arguments were written in either symbolic or numerical format. However, Betty

didn’t consider visual illustrations convincing except in the two geometry problems.

Although she did rely on visual evidence in the two geometry problems (e.g. she needed

to visually compare the length of two line segments), she didn’t consider reliance on

visual illustrations a convincing way to validate the conjecture in the other three

problems. She suggested that “it’s okay to draw a picture, but you have to explain the

picture too.” Similar opinions were repeated 3 times. Therefore, she didn’t believe simply

showing the graphs and figures without robustly unpacking their meanings made an

argument convincing. This explained why she didn’t consider visual arguments

convincing in problems that didn’t involve geometry content. A need for narrative

description (to explain examples or pictures) was mentioned 5 times. However, it seemed

that arguments with only narrative representations were also not convincing to her. This

can be illustrated by Betty’s comment on C4 (perceptual), “it [the perceptual argument]


just states that they’re larger. They have no idea what they’re talking about. It’s just, like,

blank.”

Betty also found that the evidence provided in an argument contributed to its

validity. In particular, she considered facts (i.e. known mathematical results) and

examples (i.e. results from an immediate test) as reliable sources to establish validity of

an argument; these were referenced 8 and 9 times, respectively. In the two geometry problems she recognized the validity of the triangle area formula and the Pythagorean Theorem, both of which made the corresponding arguments convincing to her. She also examined the particular shapes drawn on the paper. In the other problems, the numerical examples served as the primary source of evidence, and she even added her own calculations to verify a few statements. She also perceived arguments built on imaginaries (the football field and cookie bags) as convincing.

Despite this, Betty didn’t think that merely checking a few examples made an

argument convincing. She mentioned this point 6 times during the interview. For instance,

she commented on A1 (inductive) that “that’s not enough, I think they just, like, picked

random numbers.” When evaluating B2 (perceptual), she claimed that “there are small

football field, and big one, say NFL ... so that’s not true for all football fields ... the size

varies.” These comments revealed that she was able to see the differences among various

examples and had realized some properties might not be generally applicable. However,

she considered the inductive arguments in Problems D and E the most convincing options.

She suggested that these arguments “explain it and show the work that that’s right” and

“explains how they get through to use percentages and ratios.” In these cases, whether an

argument was valid in general cases was ignored. To investigate why these seemingly contradictory behaviors occurred, we sought an explanation in Betty’s personal standards for her judgment.

It was found that Betty repeatedly emphasized the need for “explanations.” The

terms “explain” and “explanation” appeared a total of 16 times in her comments. In

addition, the need to see more “work” was addressed 6 times. This was highlighted by

her comments that “you have to find, go… dig it further, take further steps.” While it was

difficult to determine exactly what she meant from isolated segments of the interview, her intent became clearer when the entire interview was taken into account. In fact,

Betty could be convinced by a variety of explanations. It could be a description about a

picture, an examination of a few concrete cases, or an illustration of the calculation

process. When working on Problem B, she suggested that she “looked at the length of

explanations” instead of the content to see which argument was more convincing. We

didn’t believe the length of explanation was the sole factor determining her conviction (in fact it was not, since A1 was short but considered convincing and D4 was long but considered not convincing); however, Betty didn’t seem to prefer any particular type of explanation. Further analysis of the data revealed that whether the idea of an

argument was explained clearly could be more important to Betty than whether the idea

itself proved the proposed conjecture. This was detected in Problem B, where she

provided a ranking for the arguments but suggested that the conjecture was false and none of the arguments could show the conjecture was always true (though B3 (algebraic) and B4 (visual) were still more convincing to her since they were “true in their cases”). A similar

situation occurred in her work on Problem A, where even after she had provided a

ranking for the four arguments, she was still unsure if the conjecture was true or false.
Therefore, we believe Betty had a personal standard of what a “convincing” argument

meant. To her, the “convincingness” of an argument was first determined by how much

detail the argument offered in order for her to understand the information, and the

purpose of the argument (i.e. to justify the general validity of a conjecture) seemed to be

less important.

Combining Betty’s personal scheme and her comments on the representation,

evidence and link of the arguments, Figure 26 was created to highlight her rationale when

evaluating mathematical arguments. Betty’s interview responses suggested that she was

more likely to be convinced by explanations that were rooted in concrete examples and/or

known mathematical facts and developed through multiple steps of transformation or

perceptual connections.

[Figure: “Convincing arguments” linked to evidence (Examples, Facts) through perceptual, transformational, and ritual links, with symbolic and numerical representations and the personal standard “Detailed procedure”]

Figure 26. Illustration of Betty’s rationale for evaluating mathematical arguments

The case of Blake

Blake was enrolled in an Integrated 8th grade Mathematics class at the time of

data collection. When working on the SMR, he had selected A2 (algebraic), B2

(perceptual), C1 (inductive) and D3 (perceptual) in respective problems as the most

appealing option and hence he was considered to be a representative from the

inconsistent group.

Blake’s interview responses are summarized in Table 29 and Table 30. Table 29

illustrates the rankings provided by him for each problem. Column One of the table

represents the order of problems that he tackled. Table 30 summarizes Blake’s comments

when articulating why he found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing


Problem B B2 (perceptual) B1 (inductive) B3 (algebraic) B4 (visual)
Problem C C4 (perceptual) C1 (inductive) C3 (visual) C2 (algebraic)
Problem A A1 (inductive) A2 (algebraic) A3 (perceptual) A4 (visual)
Problem D D3 (perceptual) D1 (inductive) D2 (algebraic) D4 (visual)
Problem E E2 (visual) E1 (inductive) E3 (perceptual) E4 (algebraic)

Table 29. Rankings of arguments provided by Blake

Positive Comments Negative Comments
Problem B
It’s a little bit more simple. (P) A little bit too math-like, they’re not even
thinking about the problem, they’re not
even talking about it, they’re just trying to
make you all confusing with different,
they’re trying to make it different formulas
so you can get all confused about this. (E4-
, R4-, P)
This gives you more of a visual type thing A little bit too much work here for you to
so you can actually understand it, so you understand. (P)
can imagine how that would actually work.
(R1, E3, L2)
You just gotta try to figure it out on your They’re just saying it in like, math, and
own. (P) why not just plainly say, like, it’s longer.
(P)
It’s trying to tell you how to think… and
that wouldn’t really work. (P)
I’ve known some people where they
actually want to think on how to actually
figure it out, not just like, OK, here’s ya
how to do it, think this way. (P)
Problem C
It simply says it, it’s making it not too A little bit too math-like, it wouldn’t really
complicated. (P) work. (R4-, R1-, P)
It’s actually giving you not that much One of the second complicated ones,
visualization, but if you combine these two where they’re trying to make you all
[inductive and perceptual] together, then confused. (P)
you would actually get the answer, right
here. (E2, E3, R1-)
The visual aids right here, they actually They just want to try to make you think
help you, but then there’s an explanation that the one triangle equals two, so that it
that goes with that. (E2, R1, R2) would just make you go all over the place,
and see. (P)
It [perceptual argument] gives you the It’s just going all over the place, it just
explanation, but if you added this with it wants you to think something else besides
[points to inductive argument] it would that, so it wouldn’t work. (P)
really help. (E2, R2)
This is trying to mess you up completely.
(P)
continued

Table 30. Summary of comments made by Blake

Table 30 continued
Positive Comments Negative Comments
it’s like you’re trying to just blatantly out
say it, how you’re just trying to confuse
us. (P)
It just didn’t make any sense. I thought
that it wasn’t talking about the question at
all. (NA)
They’re trying to trick ya. (P)
Just trying to confuse ya. (P)
It’s talking about it in a college-like term.
(R4-)
Problem A
It has a visual aid like it’s supposed to. They’re all confused about this, because
(E2, R1) it’s like blurry and everything, and it’s
like, oh God, it’s too much. (R1-, P)
This is where it talks about the geometry There’s such thing as too much on these
and everything, this is where I could easily things. (P)
understand it. (P)
It gives you a kind of convincing It wouldn’t explain as much as I wanted it
visualization; it kind of gives it easy. (E2, to. (P)
R1)
If you actually did the math right here with It’s not explaining as much to where you
a calculator, you could actually understand can actually understand it; however it’s
it better. (E2, R3, L5) easier for you to understand; however, it’s
not exactly what you want for an answer;
it’s not explaining as much as you want it
to. (NA)
You’re thinking for yourself. (P) I could try and try, and it just gives me too
much work. (P)
There’s such a thing as too much work;
however, thinking for yourself and
actually trying to find out. (P)
Problem D
It’s doing it a lot more simple, to where They’re missing something in this
you can actually figure it out. (P) question… they don’t tell you what the
price of the bike is, so that where you
could actually find it out easier. (E2, R3)
It actually gives you a visualization. (E2, It makes it a little bit too complicated to
R1) where if you do the wrong thing, it’s
kaput. (P)
continued

Table 30 continued
Positive Comments Negative Comments
It just simplifies it for you… it’s so you A lot of kids in my class where it wants
can actually figure it out for yourself. (P) variables, with dividers, with decimals,
and it just completely makes them…
blank. (R4-, R3-)
This one [perceptual] totally simplifies it For a sophomore, this’d be working, but
for you, so you wouldn’t have to do that for an 8th grader… [makes sound of
much work; however, you’re still actually disagreement]. (R4-)
learning math. (P)
It’s more like a word problem. (R2-)
They’re just trying to make it all
complicated. (P)
This is a tricky one they’re trying to pull
on ya. (P)
It’s just telling you on what you should do,
and a lot of people don’t like that. (P)
You need this, this… where’s my opinion
in it? (P)
Comparing Problems A-D
It actually gives you visuals, to where you They’re making it too complicated. (P)
can actually understand it, and where you
can actually try it for yourself. (R1, E2)
If you remember it, you got it. (P) They’re throwing off what is being asked,
so that you can get confused; it’s just like
with assessments, that’s what they want.
(P)
I did understand a little bit of each, They’re trying to trick ya, just like with
because they’re the skills I learned in the assessments, they want you to get the right
past. (E3) answer; however, they’re just trying to
trick you to see if you know it. (P)
I knew immediately what it was saying, They’re trying to make it more
because I’m in 8th grade, I know my complicated so that you can try figuring
geometry. (E3) out… like, oh, wait a minute, that’s wrong,
gotta go over here. (P)
Right here, I understood, because we were Even though they look like visuals, that’s
talking about the Pythagorean theorem a the trap. (P)
ton during that time, so I understand it
more. (E3, E4)
They’re trying to make it look like it’s
easy, but when you try to do it, bam! It’s
wrong… (P)
continued
Table 30 continued
Positive Comments Negative Comments
They’re throwing it off the question, they
have little pictures to try and trap ya, but
however, they’re doing it in what I like to
call high school and college terms, to
where they’re trying to make it all
complicated to you, and when you
understand that it's really complicated,
then that's where you need to dodge out.
(R1-, R4-, P)
More complicated terms than what I don’t
understand. (R4-)
This one, we didn’t talk about that much.
(E2)
Because we didn’t talk about it that much,
it was a little bit more complicated to
figure out. (E2, P)
Problem E
I can actually understand ratios a lot. (E2, It doesn’t give you any numbers, doesn’t
R3) give you any visuals. (R1, R3, E2)
This is with a visual, to where I could You’re talking about too many doubles,
actually imagine it. (E3, R1) and you’re just trying to confuse me. (P)
You can actually do the math, and you can There’s nothing concrete about it to where
actually figure it out. (P) you can actually figure it out. (E2, P)
If it’s with two different numbers, just like It doesn’t give you any numbers… 2n? I
on how you did, then it’s easier to figure mean, that doesn’t really work for me… It
out. (E2, R3) ain’t gonna really work if it’s just with
variables. (R4-, R3, E2)
Additional Comments
It’s talking about ratios, to where you can
figure it out. (E2)
I’m more good with trying to figure it out
with a calculator and with numbers. (E2,
R1, L5)
When it’s with ratios… I’m real good with
that. (E2, R3)

As shown in Table 29, Blake considered the perceptual arguments most

convincing in Problems B, C and D, but not convincing in the other two problems.

Algebraic arguments were generally not convincing to him, but they were ranked higher in

Problem A. The inductive arguments were considered the most convincing in Problem A,

and second most convincing in the other problems. The visual arguments were considered

the most convincing in Problem E, but not convincing in any other problem. In order to

better understand how Blake evaluated the proposed arguments and his rationale when

providing these rankings, the coding for his explanations in Table 30 was summarized in

Table 31 so as to identify factors and features of the arguments that had influenced his

judgment.

Total number of references to representation: 32


Visual Narrative Numerical Symbolic
Positive 8 3 7 0
Negative 4 1 1 8

Total number of references to evidence: 27


Authority Example Imaginary Fact Assumption Opinion
Positive 0 19 5 1 0 0
Negative 0 0 1 1 0 0

Total number of references to link: 3


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 1 0 0 2 0
Negative 0 0 0 0 0 0

Table 31. Categories of comments made by Blake


As shown in Table 31, the total number of comments that focused on the

representation, evidence and link of the arguments were 32, 27, and 3, respectively,

indicating that the representation and evidence had a larger impact on Blake’s judgment.

The 3 occasions on which Blake mentioned the link between evidence and conclusion concerned the perceptual connection in B2 and the use of calculators, which was classified as a ritual operation.

Blake found arguments that were based on examples (i.e. results from an

immediate test) convincing. This was mentioned 19 times during the interview. In

addition, arguments referencing imagined scenarios (coded as Imaginary) were also considered convincing. He also considered the argument built on the Pythagorean Theorem convincing, which was classified as a known mathematical fact.

Blake mentioned 8 times that visual illustrations contributed to his conviction, but

there were also 4 times when he indicated the graphs made the arguments less convincing.

When he found visual aid was helpful, he stated that “it gives you a kind of convincing

visualization; it kind of gives it easy.” In other cases where he found the images less

helpful, he suggested they were unclear: “it’s like blurry and everything, and it’s like, oh

God, it’s too much.” This was similar to his comments on numerical representations,

where in 7 cases they made a positive impact (e.g. he commented that “if it’s with two different numbers, just like on how you did, then it’s easier to figure out”), and in 1 case they made a negative impact (where he expressed a dislike towards the use

of decimals). Nevertheless, numerical and visual representations, if not complicated, were

preferred types to Blake. He expressed this opinion explicitly when he was criticizing E3

(perceptual), suggesting that “it doesn’t give you any numbers, doesn’t give you any
visuals” and hence it was not convincing to him. His attitude toward narrative arguments was similar: if a description used easier language and was not long, he considered it a helpful explanation. On the other hand, he suggested that more complex descriptions looked like word problems, which he disliked. Last and perhaps most evident was

Blake’s negative attitude toward symbolic representations, which was detected across the

problems (8 times in total). He described symbolic expressions as too “math-like” and

appropriate only for “high school or college” students, labeling them as confusing. He

claimed that “it ain’t gonna really work if it’s just with variables.” Blake didn’t exhibit

understanding of the meaning of symbolic arguments. The appearance of these arguments

seemed to keep him from even trying to understand their content.

Blake demonstrated a strong personal standard of what a convincing argument

meant to him. Among the 43 comments classified as “P,” personal standards, 29 were

about how the simplicity (or complexity) made an argument convincing (or not

convincing) to him (e.g. “it’s doing it a lot more simple” and “they’re making it too

complicated”), and 12 were about his need to figure out the problem by himself instead of

being told what to do (e.g. “it’s trying to tell you how to think… and that wouldn’t really

work” and “I’ve known some people where they actually want to think on how to

actually figure it out, not just like, OK, here’s ya how to do it, think this way”). When

making his evaluations, Blake often imagined the scenarios in which he was being taught

the arguments in a mathematics class and expressed his feelings in such situations. He

claimed that some arguments were trying to “trick ya,” “confuse ya,” and “trap ya.” This

manifests the type of frustration some students experience when they face mathematical

problems that may be difficult for them to do. At the same time, it also reveals their needs
in these situations. To Blake, whether an argument was convincing didn’t depend on how complete the argument was; instead, he wanted the argument to help him access the problem so that he could think for himself. Therefore, the argument didn’t need to be logically correct or even mathematically complete; instead, it should explain the problem,

illustrate a few simple examples, or create a context for him to better understand the task

first and then to proceed with solving it. Based on the findings above, Figure 27 was

created to illustrate Blake’s rationale for evaluating mathematical arguments.

[Figure: “Convincing arguments” linked to evidence (Examples, Imaginaries) through perceptual and ritual links, with visual, numerical, and narrative representations and the personal standards “Easy to understand, Non-procedural”]

Figure 27. Illustration of Blake’s rationale for evaluating mathematical arguments

Figure 27 helps explain the rankings provided by Blake in Table 29. B2

(perceptual) and C4 (perceptual) created scenarios that he could relate to the problem so

that he could think for himself. D3 (perceptual) used easy language to offer an

explanation of the phenomenon described by the conjecture. A1 (inductive) provided a

few examples of the numbers of interest. E2 (visual) provided a picture to show the objects

studied in the problem. All of these arguments offered him a starting point for working on

the problem even though they didn’t provide details about what exact steps he should

take. Therefore, they were considered the most convincing options. On the contrary, D4

(visual) utilized the coordinate plane; E4 (algebraic) involved ritual operation of

symbolic equations; A4 (visual) provided a “blurry” picture; while B4 (visual) and C2

(algebraic) adopted symbolic language to explain integrated geometric structures. All of

these arguments required substantial background knowledge to understand, which

confused Blake. Therefore, they were considered the least convincing options.

The case of Brenda

Brenda was an 8th grade student enrolled in an Algebra I class at the time of data

collection. When working on the SMR, she had selected A4 (visual), B1 (inductive), C3

(visual) and D1 (inductive) in respective problems as the most appealing option and

hence she was considered to be a representative from the inconsistent group.

Brenda’s interview responses are summarized in Table 32 and Table 33. Table 32

illustrates the rankings provided by her for each problem. Column One of the table

represents the order of problems that she tackled. Table 33 summarizes her comments

when articulating why she found certain arguments convincing or not convincing (The

coding of each comment is explained in Table 9). These two tables served as the major

resource for the interview analysis.

Most convincing -------------------------> Least convincing
Problem B B2 (perceptual) B1 (inductive) B4 (visual) B3 (algebraic)
Problem D D1 (inductive) D2 (algebraic) D3 (perceptual) D4 (visual)
Problem C C1 (inductive) C3 (visual) C2 (algebraic) C4 (perceptual)
Problem A A4 (visual) A1 (inductive) A3 (perceptual) A2 (algebraic)
Problem E E2 (visual) E3 (perceptual) E1 (inductive) E4 (algebraic)

Table 32. Rankings of arguments provided by Brenda

Positive Comments Negative Comments
Problem B
I can imagine a football field, so like, it’s If I did it that way, it would take longer to
easier just to think that way. (E3, L2) figure out how to do it, other than just like,
thinking about how to do it, so you would
actually have to measure it to realize how
farther it is. (E2, P)
I know that it’s longer. (E6) I don’t understand them as well, that’s
why I don’t even like them. (NA)
I don’t like the Pythagorean theorem. (E4-,
R4-, P)
When they add circles to the thing, it kinda
confuses me, so I just don’t like ’em. (E2-,
R1-)
Problem D
It’s easier for me if it has a number in it, to It’s a little bit harder to figure out with the
be able to know, like, it’s easier just to x in it, so you have to figure out on both
figure out how to do it that way. (E2, R3) sides of the equals sign. (R4-)
It’s just the one side and you get the There’s two sides of the equals sign in this
answer. (L5) one, so you would have to figure out both
sides, and then you could get the answer.
(L5-, P)
It’s already on the one side, so you just This one I don’t think has enough
figure out the one side and get the answer, information for me to understand it really
and it’s a lot faster and easier. (L5, P) as well. (NA)
Because he also said, like, it’s “such as I don’t really like graphs so I don’t
200,” you didn’t say that you didn’t try understand… I just don’t get ’em that well.
300, so he could’ve tried it but just didn’t (R1-)
say that he did and it could’ve still worked.
(E2, L3)
Problem C
If you take any kind of triangle and you try I don’t understand it as much. (NA)
and fit it into, like, the first one, it’s always
gonna be smaller than the… the first
triangle’s always gonna be bigger than the
second triangle because of how the sides
are. (E2, R1, L4)
continued

Table 33. Summary of comments made by Brenda

Table 33 continued
Positive Comments Negative Comments
I understand what they’re saying by It says A is greater than a ... I just got
cutting the lines and making it into a really confused about that part. (R4-)
shorter one, and I can tell by that that there
is, that it is smaller than that. (E2, R1, L4)
With the first one [argument], it shows a The last one [C4] I don’t think gave
lot more, ’cuz there’s different triangles enough information for me to understand
there that are, that have different sizes. what it meant by it. (L2-)
(E2, L3)
I can tell that it’s right because I know that
it’s bh divided by two, because we already
know that that’s how you find the area and
stuff, and then with, it would be lowercase
with the Triangle 2 and it would be
uppercase with the Triangle 1, so it’d have
to be bigger. (E4, R1)
Problem A
That one has a visual effect with it, so it That one kinda confused me on it, because
makes more sense that it you split up the 6, I didn’t know what was going on in the
it comes into threes, and I can understand problem. (NA)
that way. (E2, R1)
That would probably be another way I I don’t like to think of it that way, I just
would do it, so I would understand it that don’t get it that way, so I don’t think of it
way, ’cuz you can divide any of those by that way. (P)
3, and get a multiple of 3 that way. (E2,
R3)
I like visual things better than just thinking I don’t get how you do the 6n equals 3
in my head about it, so that one makes times 2n. (R4-)
more sense to me. (E2, R1)
Comparing Problems A-D
I understand sales tax more than most The way that they did it, like they added
things, I get that better than the other the circle to it, and it didn’t make as much
things. (E2, R3) sense. (E2, R1-)
It, like, is straightforward, and it tells me
what it is and stuff, it’s a lot easier to
understand. (P)
Problem E
It gives me a visual effect of how it It just tells you how it wouldn’t change.
wouldn’t change, because it would still be (E6-)
double the amount of it, which wouldn’t
do anything to it. (E2, R1, P)
continued

Table 33 continued
Positive Comments Negative Comments
It made sense ’cuz they explained what It doesn’t give a picture, it’s just an
happened with the orange and with the explanation. (E2, R1, R2-)
white, and that it would stay the same no
matter what, ’cuz the ratio would never
change of how much would be in there.
(E4, R2)
It gives an actual effect of how it wouldn’t I wouldn’t go that way with that, it’s just
change. (E2, R1) not how I would do that. (P)
It has a picture of how it wouldn’t change I understand it now but I don’t really like
and it gives an explanation with it. (E2, it. (P)
R1, R2)
Additional Comments
That one [E3] did because it said that it That one [C4] I don’t think gave as much
was exactly the same by just explaining information as what I needed to figure out
how it is. (R2) that it was. (E3-, L2-)
With the sales tax, I got that better because I don’t understand probability as well, so
I knew that, with that, it’s easier with when they threw in the numbers, I was
numbers. (E2, R3) kind of confused with it. (R3-)
It’s easier to figure out that whatever 6 is,
you can just divide by 3 and it’s an, it’s a
normal number that is 3. (E2, R3)
It [E2] does show you how and it explains
how to. (E2, R1, R2)
They explained that the ratio between the
ping pong balls would still be the same no
matter if it was doubled or whatever
number. (E2, R2)

As shown in Table 32, Brenda didn’t generally consider algebraic arguments

convincing. She rated them the least convincing in Problems A, B, and E, second least

convincing in Problem C, and second most convincing in Problem D. Inductive arguments

were considered either the most or second most convincing to her, with the exception of

Problem E, where it was considered the second least convincing. Her evaluation of visual

and perceptual arguments was quite inconsistent across the problems. They appeared at
every position (from most to least convincing) in her rankings. In order to better understand how

Brenda evaluated the proposed arguments and her rationale when providing these

rankings, the coding for her explanations in Table 33 was summarized in Table 34 so as to

identify factors and features of the arguments that had influenced her judgment.

Total number of references to representation: 29


Visual Narrative Numerical Symbolic
Positive 8 7 5 0
Negative 3 1 1 4

Total number of references to evidence: 27


Authority Example Imaginary Fact Assumption Opinion
Positive 0 19 1 2 0 1
Negative 0 1 1 1 0 1

Total number of references to link: 10


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 1 2 2 2 0
Negative 0 2 0 0 1 0

Table 34. Categories of comments made by Brenda

As shown in Table 34, the total number of Brenda’s comments that focused on the

representation, evidence and link of the arguments were 29, 27, and 10, respectively.

Visual representation seemed to contribute to her conviction. This view was

conveyed 8 times. For example, she considered A4 (visual) convincing, suggesting it had

“a visual effect with it, so it makes more sense.” However, some visual illustration didn’t

make the argument convincing to her. She claimed to be confused by the geometric shape

in B4 (visual) and the graph in D4 (visual). Narrative description and numerical

representation could contribute to her conviction. She suggested that an explanation of

the meaning of pictures or graphs could make an argument more convincing to her. She

also believed that an argument was easier to understand “with numbers.” However, she

found numerical and narrative representations confusing to her in some cases. She

claimed that narrative arguments, with the absence of numbers or pictures, might not

offer enough information. She also found dealing with numbers in certain contexts (e.g.

probability) confusing. This explained her inconsistent judgment of the perceptual, visual,

and inductive arguments across the problems. Overall, it was found that visual, numerical

and narrative representations of arguments could all contribute to her conviction if they were

understandable to her.

The symbolic representation had a more consistent negative influence on Brenda’s

conviction. In all 4 instances where she mentioned symbolic format, she characterized it

as confusing and not understandable. For example, she claimed not to understand why

“6n equals 3 times 2n.” Considering that this equation involves only the simplest

symbolic expressions, we believe she hadn’t yet developed adequate facility with algebra

to use it in problem solving. Therefore, it was not surprising that all the algebraic

arguments were considered as not convincing.

Brenda also paid attention to the evidence presented in the arguments and this was

evidenced 27 times during the interview. She often found arguments that relied on

checking a few cases (e.g. numbers or shapes) convincing. This was detected 19 times.
She suggested that she didn’t like the argument using the Pythagorean Theorem although she had learned it in class; however, she found the triangle area formula convincing. This

suggested that even two seemingly similar types of evidence sources could be assessed

quite differently by her.

It was also found that Brenda could be convinced by a variety of links between

evidence and conclusion. She found trying a few cases in Problem D adequate in showing

the conjecture was always true. She found the transformation model used in C3

convincing. She also found the perceptual connection between the football field scenario

and the property of rectangles in B2 convincing. However, she wasn’t able to make the

perceptual connection between the triangle made by wires and its geometric properties.

She didn’t explicitly comment on the link of any argument and it was not detected in the

interview that she was more likely to be convinced by any other type of link. Therefore, it

was believed that she wasn’t yet able to reflect on the logic of mathematical arguments.

Lastly, Brenda’s personal standards were studied. First, she found simple

arguments more convincing. The preference for “easier” arguments was mentioned 7

times during the interview. Second, it was found that her appreciation of certain

concepts/topics had also impacted her evaluations. For example, she claimed to not “like

the Pythagorean theorem” and hence considered B3 (algebraic) not convincing. She

didn’t consider E3 convincing since she “wouldn’t go that way with that, it’s just not how

I would do that.” Such personal preference was highly context based and could explain her inconsistent view of the same type of argument across the problems. Based on the findings above, Figure 28 was created to illustrate Brenda’s rationale for evaluating mathematical arguments.


[Figure: “Convincing arguments” linked to evidence (Examples) through perceptual, inductive, transformational, and ritual links, with visual, numerical, and narrative representations and the personal standards “Easy to understand, Simple procedure”]

Figure 28. Illustration of Brenda’s rationale for evaluating mathematical arguments

Discussion

The investigation of each individual subject’s interview responses offered insights

into their own rationale for evaluating the mathematical arguments. The following

discussion focuses on the thinking pattern exhibited by the subjects as a whole group

during the interviews. In particular, we examined if any argument was considered

significantly more convincing than others in each problem context. We then studied if

any factor had a larger impact on the subjects’ judgment. In addition, we studied the

similarities and differences among the individual subjects. Lastly, we studied the context’s

potential impact on the subjects’ decision.

Most convincing arguments to the subjects

We first examined if students had found any argument significantly more

convincing than others in each problem context. In order to do so, we first assigned values

to each argument based on the rankings provided by the subjects. In particular, if an

argument was ranked as the most, second most, second least and least convincing
argument, it received a score of 1, 2, 3 and 4, respectively. Therefore, the rating

represented the position of the argument in the ranking. The lower the rating, the more

convincing an argument was perceived. We then calculated the average rating provided

by all subjects for each argument. Table 35 illustrates the result.

Allen Abby Alice Amy Beth Betty Blake Brenda Average


A1 4 3 2 4 4 4 1 2 3
A2 2 2 1 1 2 1 2 4 1.875
A3 3 4 4 3 1 2 3 3 2.875
A4 1 1 3 2 3 3 4 1 2.25
B1 4 2 2 1 4 3 2 2 2.5
B2 3 1 4 4 1 4 1 1 2.375
B3 1 4 3 2 3 1 3 4 2.625
B4 2 3 1 3 2 2 4 3 2.5
C1 4 1 3 2 3 3 2 1 2.375
C2 1 2 2 4 2 1 4 3 2.375
C3 3 4 1 1 4 2 3 2 2.5
C4 2 3 4 3 1 4 1 4 2.75
D1 2 1 3 4 2 1 2 1 2
D2 3 3 4 1 3 2 3 2 2.625
D3 4 2 1 3 4 4 1 3 2.75
D4 1 4 2 2 1 3 4 4 2.625
E1 3 1 1 4 1 1 2 3 2
E2 1 4 3 3 3 3 1 1 2.375
E3 4 3 2 2 4 4 3 2 3
E4 2 2 4 1 2 2 4 4 2.625

Table 35. Summary of the subjects’ argument rankings
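For readers who wish to reproduce the averaging step, the computation behind the Average column can be sketched in a few lines of Python. The ratings below are copied from the A1 and A2 rows of Table 35; the script is an illustrative reconstruction, not part of the original analysis.

```python
# Sketch of the rating/averaging step behind Table 35: each subject's
# ranking position (1 = most convincing ... 4 = least convincing) is
# treated as a score, and scores are averaged across the eight subjects.
# A lower average means an argument was perceived as more convincing.

ratings = {
    "A1": [4, 3, 2, 4, 4, 4, 1, 2],  # Allen ... Brenda (Table 35 order)
    "A2": [2, 2, 1, 1, 2, 1, 2, 4],
}

averages = {arg: sum(r) / len(r) for arg, r in ratings.items()}
print(averages)  # {'A1': 3.0, 'A2': 1.875}, matching the Average column
```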

According to the average rating, A2 (algebraic) and A1 (inductive) were considered

as the most and least convincing arguments in Problem A, respectively. B2 (perceptual)

and B3 (algebraic) were rated as the most and least convincing arguments in Problem B,

respectively. C1 and C2 (inductive and algebraic, tie) were the most convincing

arguments in Problem C where C4 (perceptual) was the least convincing. In Problem D,

D3 (perceptual) was the least convincing argument (average rating 2.75), while D1 (inductive) was considered the most convincing option. Lastly, in Problem E, E1

(inductive) and E3 (perceptual) were considered the most and least convincing arguments,

respectively. These results suggested that the subjects’ evaluation of the same type of

arguments was highly inconsistent across the problems. The same type of argument

could be considered as the most convincing option in one problem but the least

convincing one in another (e.g. A2 (algebraic) was rated most convincing in Problem A

but B3 (algebraic) was rated the least convincing in Problem B). Therefore, it was

difficult to tell whether there was any particular type of argument that the subjects found more convincing than others. This finding was consistent with what was detected in the

survey analysis.

We further tested the differences in ratings among the arguments in each problem. Using a within-subject ANOVA (with the arguments in each problem as the levels), we found that in no problem was any argument rated significantly (p < .05) more convincing than the other options (see Appendix C, Table 46). Therefore, based on the subjects’ rankings, no single argument stood out in any of the problems as the most convincing option. This result again demonstrated the diversity in the subjects’ assessment of the arguments.
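As an illustration of the test, a one-way repeated-measures (within-subject) ANOVA can be computed by hand for Problem A. The sketch below is not the original analysis code, only a reconstruction under the assumption that each subject’s four ranks form the repeated measures:

```python
# Hand-computed one-way repeated-measures ANOVA for the Problem A ranks
# (arguments A1-A4 as levels, the eight subjects as the repeated factor).
rows = [  # one row per argument, one column per subject
    [4, 3, 2, 4, 4, 4, 1, 2],  # A1
    [2, 2, 1, 1, 2, 1, 2, 4],  # A2
    [3, 4, 4, 3, 1, 2, 3, 3],  # A3
    [1, 1, 3, 2, 3, 3, 4, 1],  # A4
]
k, n = len(rows), len(rows[0])          # 4 treatments, 8 subjects
grand = sum(sum(r) for r in rows) / (k * n)

ss_total = sum((x - grand) ** 2 for r in rows for x in r)
ss_treat = n * sum((sum(r) / n - grand) ** 2 for r in rows)
subj_means = [sum(rows[i][j] for i in range(k)) / k for j in range(n)]
ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
ss_error = ss_total - ss_treat - ss_subj  # residual after removing subjects

df_treat, df_error = k - 1, (k - 1) * (n - 1)
f_stat = (ss_treat / df_treat) / (ss_error / df_error)
# f_stat is about 1.42, well below the 5% critical value F(3, 21) of
# roughly 3.07, so no argument in Problem A differs significantly --
# consistent with the result reported in the text.
```

Because each subject’s four ranks are a permutation of 1–4, every subject mean equals the grand mean and the subject sum of squares is zero here; a Friedman test would be the more conventional choice for rank data, but the sketch follows the ANOVA the text describes.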

Factors that impacted the subjects’ decision

Analysis of the subjects’ rankings did not yield a conclusion about which arguments they considered more convincing. By merely looking at their choices, it was therefore difficult to identify what factors might have impacted the subjects’ decisions, and an analysis of the subjects’ explanations when justifying their rankings became crucial to this investigation. To summarize the characteristics of these explanations, we calculated the total number of comments about each type of representation, evidence, and link of arguments, as well as the percentage each number constituted of its own category. For example, there were 60 comments indicating that visual representation positively contributed to the subjects’ conviction about an argument; these comments were 31% of all the comments that referenced the representation of arguments (a total of 194). Table 36 illustrates the results.

Total number of references to representation: 194
Visual Narrative Numerical Symbolic
Positive 60 (31%) 18 (9%) 33 (17%) 37 (19%)
Negative 20 (10%) 9 (5%) 3 (2%) 14 (7%)

Total number of references to evidence: 272


Authority Example Imaginary Fact Assumption Opinion
Positive 3 (1%) 140 (51%) 34 (13%) 42 (15%) 0 3 (1%)
Negative 0 17 (6%) 9 (3%) 2 (1%) 1 (0.4%) 20 (7%)

Total number of references to link: 81


Direct Perceptual Inductive Transformational Ritual Deductive
Positive 0 14 (17%) 14 (17%) 13 (16%) 7 (9%) 5 (6%)
Negative 0 7 (9%) 19 (23%) 1 (1%) 1 (1%) 0

Table 36. Categories of comments made by all subjects

As shown in Table 36, the total numbers of comments that focused on the representation, evidence, and link of the arguments were 194, 272, and 81, respectively.
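The percentages in Table 36 follow from dividing each count by its category total. A small sketch (ours, not the authors’) for the representation category:

```python
# Share of each comment type within the "representation" category of
# Table 36 (counts taken from the table; 194 references in total).
counts = {
    ("Visual", "positive"): 60, ("Narrative", "positive"): 18,
    ("Numerical", "positive"): 33, ("Symbolic", "positive"): 37,
    ("Visual", "negative"): 20, ("Narrative", "negative"): 9,
    ("Numerical", "negative"): 3, ("Symbolic", "negative"): 14,
}
total = sum(counts.values())                       # 194
shares = {k: round(100 * v / total) for k, v in counts.items()}
# e.g. the 60 positive comments on visual representation amount to
# round(100 * 60 / 194) = 31% of all representation references.
```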

The data suggested that opinion (i.e., personal conviction without an explicit reason) was not considered a reliable source of evidence by the subjects. Although there were 3 instances in which a subject’s decision rested on a personal conviction, opinions were far more often flagged as unreliable (20 times in total). This suggested that most subjects were aware of the need to provide evidence other than personal opinion to support a mathematical argument. When examining the impact of the types of evidence on the subjects’ decisions, we found that examples (i.e., results from an immediate test) were used most often to support an argument (140 times in total, more than half of all evidence referenced). At the same time, examples were also the second most criticized source of evidence (second only to opinion). Criticism of the use of examples alone focused mostly on their logical limitation, i.e., their inability to show that the conjecture was always true, which some subjects acknowledged. Facts (i.e., known mathematical results) were the second most referenced type of evidence (a total of 44: 42 positive and 2 negative), although they were mentioned much less frequently than examples. Note that it was rare (only 2 instances) for a mathematical fact to be considered an unreliable source of evidence, suggesting that once the subjects recognized a known mathematical result, they were likely to consider it reliable. Furthermore, imaginaries created from past experience were referenced as a reliable source of evidence 34 times, a number close to that of facts. However, there were also 9 instances in which imaginaries were indicated to contribute negatively to the subjects’ conviction about an argument, suggesting that some subjects (e.g., Amy and Betty) did not consider them a reliable source of evidence. Lastly, it was uncommon for a subject to reference authority (3 instances, all positive) or assumption (1 instance, negative) as evidence for arguments.

The most influential type of representation was visual, which was considered to have positively contributed to the subjects’ conviction on 60 occasions. However, it was also criticized 20 times. Therefore, although visual illustration often contributed positively to the subjects’ conviction, it could also be considered misleading (e.g., by Amy), confusing (e.g., by Blake), or not explanatory (e.g., by Betty). In addition, numerical and symbolic representations were considered to have positively contributed to the subjects’ conviction in 33 and 37 instances, respectively. However, symbolic representations were criticized more frequently than numerical representations (14 times vs. 3 times), suggesting that ideas expressed symbolically were not found convincing by some subjects (e.g., Blake and Brenda). Narrative representations were referenced 27 times (18 positive, 9 negative). Narratives contributed to the subjects’ conviction especially when they were used to explain a picture or an equation (e.g., for Betty). However, some subjects found narrative expressions unclear or unconvincing when they were not supplemented by visually, numerically, or symbolically expressed evidence. In sum, the visual, symbolic, and narrative representations contributed either positively or negatively to the subjects’ conviction, while ideas represented in numerical format seemed to have had a consistently positive impact.

When commenting on the link between evidence and conclusion of an argument, the subjects referenced induction most frequently (33 times in total). Although in 14 instances the subjects expressed that the illustration of examples contributed positively to their conviction, in 19 cases they suggested that showing a few examples could not establish that the conjecture was always true. This result indicated that the use of induction was still popular; however, some students had developed an awareness of its logical limitation. Perceptual connection and transformation contributed to the subjects’ conviction in about as many cases as induction did (14 and 13 times, respectively). However, there was only one case in which transformation was considered unconvincing, while perceptual connection was criticized 7 times. Perceptual connection is usually used to connect personal experience to a mathematical problem, while transformation involves more analysis of particular examples and pattern seeking; the latter was uniformly recognized as a convincing link between evidence and conclusion in an argument. Furthermore, ritual operation and deductive reasoning (the latter mostly referenced by Amy) were considered reliable links whenever they were mentioned.

Figure 29 was generated based on the numbers in Table 36; a larger font denotes that the item was more frequently referenced by the subjects. As illustrated, when evaluating the arguments, the subjects paid the most attention to the evidence. Among all types of evidence, examples were referenced most frequently, followed by imaginaries and mathematical facts. The representation of arguments also impacted the subjects’ judgment. Among all types of representations, visual illustration received the most attention; however, it was criticized by some students as well. A similar situation applied to symbolic representation, where between-subject differences were observed. The link between evidence and conclusion was the least attended aspect of the three. Induction was referenced most frequently but could contribute either positively or negatively to the subjects’ conviction, depending on the context and the individual. Transformation, although not referenced as many times, seemed to be uniformly recognized as a reliable mode of reasoning.

[Figure 29 depicted the three aspects feeding into the subjects’ conviction: Representation (Visual, Numerical, Symbolic, Narrative), Evidence (Examples, Imaginaries, Facts), and Link (Perceptual, Inductive, Ritual, Transformational, Deductive).]

Figure 29. Factors that impacted the subjects’ conviction

Similarities and differences among the subjects

In the previous discussion we revealed some general patterns in the subjects’ rationale for evaluating the arguments. However, it was unclear whether such patterns applied to every individual or only to some of the subjects. More importantly, the individual differences had so far been described only in terms of the subjects’ choices in the survey or the rankings provided in the interview; it was unclear what factors might have caused the differences in their ratings. The analysis of each individual subject’s interview responses provided the basis for investigating the similarities and differences in their rationale. The following discussion offers a cross comparison among the subjects.

Table 37 provides an illustrative blueprint of the subjects’ rationale when

evaluating arguments. This table was generated based on the study of each individual’s

interview responses. As shown in the table, there were similarities as well as differences

among the subjects.

Personal Need Evidence Representation Link


Allen Simple procedure, Examples, Visual, Symbolic Transformational,
Precise description Facts, Perceptual, Ritual
Abby Easy to understand, Examples, Visual, Numerical Perceptual,
Familiar procedure Imaginaries Inductive
Alice Easy to understand, Examples Visual Transformational
Familiar procedure
Amy True for all cases Examples, Symbolic, Deductive,
Facts Numerical Transformational
Beth Easy to understand, Examples, Visual, Narrative Perceptual
Relatable scenario Imaginaries
Betty Detailed procedure Examples, Symbolic, Ritual, Perceptual,
Facts Numerical Transformational
Blake Easy to understand, Examples, Visual, Numerical, Perceptual, Ritual
Non-procedural Imaginaries Narrative
Brenda Easy to understand, Examples Visual, Narrative, Inductive, Ritual,
Simple procedure Numerical Transformational

Table 37. Summary of the subjects’ rationale in argument evaluation

View of evidence

The most prominent similarity among the subjects was that they all considered examples a reliable source of evidence. Testing a few cases and seeing whether the conjecture held under specific conditions contributed to the subjects’ conviction. This was observed in comments from all subjects on at least a few (if not all) arguments.

In contrast, the subjects’ views of the use of mathematical facts were less consistent. Allen, Amy and Betty indicated that they were likely to be convinced if an argument was based on a known mathematical fact. On the contrary, Blake seemed unwilling to use any established result and preferred exploring the problem by himself. The other four subjects might acknowledge that some known results (e.g., the triangle area formula) helped convince them an argument was true; however, they might not consider such results established mathematical facts but rather something they had heard about.

In addition, the subjects’ views of imaginaries also differed. To Abby, Beth and Blake, imaginaries were a major source of evidence, while in Amy’s view, people’s brains can “skew everything,” so imaginaries were definitely unreliable. To Allen, it depended on whether the imaginary was adequately clear to him. Overall, the use of examples seemed to contribute uniformly to the subjects’ conviction, while each individual’s view of other sources, such as known mathematical facts and their own imaginaries, differed.

View of representation

Visual representation was referenced most frequently in the interviews; however, not all subjects considered it to have positively contributed to their conviction about an argument. Amy and Betty clearly expressed that they were unlikely to be convinced by visual arguments. Amy claimed that pictures and figures could misrepresent the problem. Betty did not tend to perceive connections through examining visual demonstrations; when a visual illustration was provided, she needed to see a narrative explanation of what the pictures meant. Nevertheless, visual illustration still seemed to be the most preferred type of representation: six subjects explicitly stated that visual aids could contribute to their conviction, especially when the image was simple and understandable to them.

Numerical representation seemed to be favored by the subjects. Although its function was not mentioned as commonly as that of visual illustration, we still identified at least five subjects who considered numerically based illustrations to contribute positively to their conviction. For the remaining three subjects, i.e., Allen, Alice and Beth, the use of numerical expressions did not seem to make an argument less convincing; its function was simply rarely articulated in their explanations. Therefore, the subjects seemed to share some similarities in their views toward numerical expressions.

Narrative representation was the least frequently referenced type of representation (positively or negatively), although non-mathematical language was used by every subject when explaining their understanding of each argument. However, some subjects demonstrated a greater need for narrative explanation than others. For example, Betty suggested that a visual illustration was not convincing unless it was accompanied by an explanation. In contrast, Allen preferred to read equations and examine graphs and did not consider an argument convincing if it was too “wordy” and not “straightforward.” The major advantage of narrative representation was its plain language, which, when used properly, helped the subjects understand an argument. However, it might be difficult to describe some concepts or examples in narrative as precisely as in numeric, visual, or symbolic representations. Consequently, the subjects’ evaluation of narrative descriptions depended highly on whether they understood the concepts embedded in the narratives without seeing specific numbers, images, or symbols, or whether they understood the numbers, images, or symbols without a narrative description.

Compared to the other three types of representation, the subjects showed the greatest differences in their views of symbolically expressed arguments. To Amy, symbolic representation could show the conjecture was true in every possible case. To Allen, symbolic representation demonstrated ideas clearly and concisely. To Betty, symbolic representation helped her see the details of the argument’s procedure. These three subjects therefore found that symbolic representation positively contributed to their conviction. On the contrary, Blake considered symbolically represented terms confusing and not appropriate for his age group, and Brenda found symbolically represented theorems unappealing; most arguments in symbolic representation were therefore considered unconvincing to them. Symbolic representation did not seem to have contributed either positively or negatively to the other three subjects’ evaluations: they did not seem to recognize its advantages, nor did they find it confusing. This finding was not surprising, since symbolic expressions were usually more abstract than ideas represented in the other three forms. Students who understood the ideas behind symbolic expressions might appreciate how clear and concise they were, and more mathematically mature learners might even see the general validity represented by symbolic arguments. However, to those who had not yet adapted to symbolic representations, they only looked unnatural and difficult and hence were not found convincing.

View of link

The link between evidence and conclusion seemed to be the least influential aspect in the subjects’ decisions. This was natural, since the subjects might not start examining the link if they did not find the representation reliable or consider the evidence convincing; the link thus appeared to be the last of the three aspects to be considered.

Only Amy insisted that the evidence used in a convincing argument must show the conjecture was always true, and she found symbolic deduction the most reliable way to guarantee this. This condition was not a requirement for a convincing argument in the other subjects’ view.

Several subjects (Alice, Amy, Beth, and Betty) articulated that showing a few examples might help them understand an argument but was not sufficient to convince them that a conjecture was true. This suggested that some students were aware of the limitation of induction. Although they were not yet able to appreciate deductive reasoning, they had developed the ability to understand generic examples. For example, Alice could visualize that some geometric property would remain the same when the shape changed in a certain way, and Allen could see a formula in a numeric equation since a value in the equation could be substituted by others without changing the result. Overall, transformational reasoning was widely considered to contribute positively to the subjects’ conviction; it was observed in five subjects’ explanations (all except Abby, Beth and Blake).


Perceptual connection was also appreciated by many subjects (including Allen, Abby, Beth, Betty and Blake). Perceptual connection relates a given mathematical problem to imaginaries created from previous experience, and in many cases such a connection was not precisely described but simply perceived by the subjects. Only Amy pointed out that such a connection might not be a reliable way to build an argument. Other students might not have been able to perceive some connections between a mathematical problem and a real-life scenario; however, they might not have realized that arguing by making such a connection was an unreliable method.

Lastly, all the subjects seemed to believe that ritual operations, numerical or symbolic, were convincing links between the evidence and conclusion of an argument, although they were not mentioned very frequently.

Personal standards

Personal standards played an important role in the subjects’ decision making (Recio & Godino, 2001). Differences in what a convincing argument meant to each subject caused their distinct evaluations of the arguments.

Amy seemed to be the only one who believed a convincing argument should prove that the conjecture was always true. For the other subjects, this was not a principle that guided their decisions. Instead, for many of them (Abby, Alice, Beth, Blake, and Brenda), whether an argument was easy to understand largely determined its credibility. These subjects’ standards for an easy argument were not mutually exclusive; most prominently, none of the five considered algebraic arguments easy to understand. However, they used different standards to determine whether an argument was easy to understand. Blake found an argument easy to understand if it used easy language, easy examples, and easy visual illustrations. Brenda was able to appreciate more complex examples and visual illustrations; however, she preferred an argument that did not involve a complex procedure (e.g., multiple steps). Beth considered an argument easy to understand if it was built upon a life scenario to which she could relate. Abby and Alice found an argument easy to understand if the concepts used in the argument and its reasoning procedure were familiar to them. While Beth preferred a context rooted in her life experience, Abby and Alice also considered arguments familiar from what they had learned in mathematics class convincing.

Allen and Betty were the only two subjects who did not claim that a convincing argument needed to be easy to understand. Note that Allen did prefer arguments that involved simple procedures, which was close to Brenda’s opinion. However, Allen’s preference for simple procedures was not because they were easier to understand: he claimed that he did not have much difficulty understanding any of the arguments used in the interview. Despite this, he still preferred “straightforward” arguments since they were concise and delivered their point more clearly.

Similar to Allen, Betty also demonstrated an understanding of a wide range of arguments. Different from Allen, however, Betty paid more attention to the details of arguments. She found arguments that left too much for the reader to decipher unconvincing; for example, she did not consider visual illustrations alone convincing and believed they must be accompanied by descriptions that explained the information embedded in the images and graphs. Consequently, Betty considered arguments that clearly described the reasoning procedure convincing.

Summary

As shown in the discussion above, there were similarities as well as differences among the subjects’ rationale in argument evaluation (see Table 38). In general, the subjects found examples convincing in most cases; however, their views toward the use of existing mathematical results and their own imaginaries differed. In addition, the subjects found numerical and narrative arguments easier to understand than symbolic ones. Visual arguments could be helpful or confusing depending on the actual images or diagrams provided. Most students did not realize that symbolic representation had the potential to prove the general validity of a conjecture; some found symbolic expressions concise and clear while others viewed them as confusing. With the exception of one participant, the subjects were not aware that the link between evidence and conclusion must show the conjecture was always true. Transformational and perceptual reasoning were widely adopted. However, the subjects’ views toward induction differed: half of the subjects seemed to have realized its limitation. Lastly, the subjects demonstrated various personal standards of what a convincing argument meant to them. Five subjects found easier-to-understand arguments more convincing, though they used different standards to judge “easiness”; for example, some found arguments embedded in a familiar context easier to understand and hence perceived them as more convincing. The other three did not take “easiness” into consideration but paid attention to different aspects (logic, expressions, and reasoning procedure) of the arguments. These differences among the individuals’ rationale caused the distinct evaluations of the arguments shown in their rankings.

Similarities Differences
Evidence  Examples were convincing  Imaginaries and known
source of evidence. mathematical results might or
 Authority, assumption and might not be viewed as reliable
personal opinion were rarely source of evidence.
considered convincing.
Representation  Numerical and narrative  Visual illustration could be
arguments were usually easier sufficient or not sufficient to
to understand. demonstrate the validity of a
 Seeing a few numbers in an conjecture.
argument was helpful in most  Narrative descriptions could be
cases. necessary or unnecessary.
 Visual illustration was helpful  Symbolic expression could be
if the provided image was concise and clear or confusing
understandable. and meaningless.
 Most subjects were not aware
of the logical advantage of
symbolic representation.
Link  Deduction was rarely used or  Induction could be viewed as
considered necessary. convincing, convincing in some
 Transformation and perceptual
connection were widely
adopted.
 Ritual operation was rarely
considered unconvincing.
Continued

Table 38. Similarities and differences in the subjects’ rationale of argument evaluation

Table 38 continued
Personal  Most subjects didn’t focus on  Whether an argument was easy
standards whether an argument could to understand was taken into
prove the conjecture was consideration by some but not
always true. all the subjects.
 Some subjects found arguments
embedded in a familiar context
or using familiar reasoning
techniques more convincing.
 The subjects had different
demand for the clarity of
arguments.
 Other various personal
opinions.

The impact of context on the subjects’ judgment

Although each subject exhibited some general standards in assessing the arguments, he or she still provided different evaluations for the same type of argument in different contexts. None of the participants chose the same type of argument as the most convincing option in more than three problems. This section discusses possible causes of this phenomenon.

Differences in complexity

The subjects’ responses revealed that whether an argument was easy to understand had a substantial impact on their conviction, which might explain differences in their judgments of the same types of arguments. Among the visual arguments, E2 was probably the easiest to understand since it was a direct representation of the problem content. A4 and C3 might be considered more difficult since they required an understanding of the transformation of shapes. B4 was more complex still, since it involved multiple geometric components and its reasoning depended on an analysis of their spatial relationships. D4 was also complex since fully perceiving it required a conceptual understanding of the coordinate plane.

Among the algebraic arguments, A2 involved the simplest equation, with only one variable and a one-step operation (i.e., multiplication). D2 also contained only one variable but involved multiple steps of operations, while B3, C2 and E4 all contained two or more variables and involved multiple steps of reasoning. For inductive and perceptual arguments, the provided examples or evoked imaginaries could likewise be easy or hard for the subjects to perceive.

Five subjects clearly pointed out that whether an argument was easy for them to understand impacted their evaluation of it, and that they might be confused by complex images, equations and other components of an argument. Therefore, the complexity of arguments of the same type was one cause of their different ratings.

Differences in familiarity

Arguments of the same type might also be evaluated differently because of students’ familiarity with their content. Such differences were evident in the subjects’ judgment of the evidence of arguments. For example, Brenda considered B3 (algebraic) not convincing since she was not familiar with the Pythagoras Theorem, yet she found C2 (algebraic) more convincing since she was familiar with the triangle area formula. Similarly, Allen claimed that the imaginary of a triangle made of wire was clearer to him than that of a football field; hence he gave C4 (perceptual) a higher ranking than B2 (perceptual). Abby’s view was just the opposite: she considered B2 convincing since she knew what a football field looked like, but found C4 not convincing since she “never heard of using wire to make a triangle.” Since any argument, regardless of its type, could provide a context that was familiar or unfamiliar to students, students’ evaluations of it could differ considerably depending on their previous classroom and life experiences.

Differences in clarity

Arguments of the same type could also differ in the perceived clarity of their concepts and reasoning procedures. A typical example concerned the inductive arguments (A1, B1, C1, D1 and E1). B1 was the least clear, since it merely stated that a few examples had been tested without giving any information about what examples were used or what results were observed. A1 and C1 were clearer since they provided the examples. D1 and E1 were the clearest since they not only provided examples but also showed how an example was tested and demonstrated the operations. These differences impacted Betty’s and Allen’s judgments, as they considered A1, B1 and C1 the least convincing but ranked the other two higher. Clarity of arguments also impacted Blake’s decisions; he tended to prefer arguments that did not show the specific steps and left some space for his own thinking.

Differences in function

Even though A2, B3, C2, D2 and E4 were all classified as algebraic, the function of the symbolic expressions differed in each argument. All symbols indeed represented variables; however, in B3 and C2 the symbolic form was the carrier of known mathematical results (i.e., the Pythagoras Theorem and the triangle area formula), which were not present in A2, D2 and E4. Additionally, in D2 and E4 the symbolic representation provided the environment for algebraic operations, whereas ritual operation was not emphasized in A2, B3 and C2. Furthermore, inequalities were considered in B3 and C2, which required conceptualizing the variables as an ordered collection of values, while inequality was not a focus in A2, D2 and E4.

Different functions of visual illustration were also observed among the visual arguments. In A4, D4 and E2, the figures represented the problem content in a visual format so that the subjects could perceive the relationships within the diagrams and then transfer that understanding to the actual context of each problem. In the geometry problems, however, the figures themselves were the subjects of study, and the participants did not need to relate them to anything else. Additionally, the subjects needed to attend to the quantities of the objects in the diagrams in A4 and E2, whereas spatial relationships, distances and sizes were the focus of the figures in B4, C3, and D4. Therefore, visual illustration was a category with highly diverse internal properties.

The situation was similar for the perceptual arguments. Whereas A3, B2, and C4 sought to support their respective conjectures by connecting the content to imaginary contexts that were familiar to students, no additional contexts were provided in D3 and E3; instead, these two arguments used a narrative description to help students perceive the connection of concepts within their own contexts.

The different functions of seemingly similar components or features of arguments impacted students’ decisions. For example, students might prefer B3 (algebraic) and C2 (algebraic) because they saw the familiar mathematical results stated in the two arguments as reliable evidence, yet find the other algebraic arguments less convincing since those familiar results were not present (e.g., Allen). Students might recognize perceptual arguments that referenced a familiar scenario as convincing but consider those that made connections within the context less convincing (e.g., Beth). Students might acknowledge illustrations of images as convincing in the geometry problems but consider visual aids in other problems less reliable (e.g., Amy and Betty).

Lastly, students viewed the functions of examples differently in the inductive

arguments. For instance, Amy seemed to view examples provided in the geometry

context (i.e. different shapes) as cases that were connected to each other while she

considered different numbers as separated and unrelated cases. Therefore, when she

evaluated inductive arguments that used different numbers as examples, she considered

them unconvincing. She was however, convinced by an inductive argument in the

geometry context. Whether the illustrated examples were viewed as generic examples or

isolated cases also impacted Alice’s and Beth’s conviction. Alice considered D1

(inductive) not convincing since “they only gave two numbers for you to work with” and

“what if the price is higher than 500.” However, she considered A1 (inductive)

convincing since every multiple of 6 that she tried was a multiple of 3 as well. Beth’s

view was just the opposite. She suggested that A1 (inductive) was not convincing since

“just ’cuz she’s tried a lot of them, she hasn’t tried all of them.” However in evaluating

D1 (inductive), she claimed that “I could insert the 200 dollars and the 500 dollars that

he’s suggesting is the same thing, and I could see if it was actually right,” suggesting she

had seen properties in the given example which could transfer to other cases.
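Alice's acceptance of A1 rested on checking cases. For contrast, the general claim that appears to underlie A1 (every multiple of 6 is also a multiple of 3, as reconstructed from the excerpts above; this reconstruction is an assumption) admits a one-line deductive argument of the kind the participants did not invoke. A sketch:

```latex
% Illustrative sketch only; assumes (from the excerpts above) that the
% conjecture behind A1 is "every multiple of 6 is also a multiple of 3."
Let $n$ be an arbitrary multiple of $6$, so $n = 6k$ for some integer $k$.
Then
\[
  n = 6k = 3(2k),
\]
and since $2k$ is an integer, $n$ is a multiple of $3$, independent of
which cases one happens to test. $\blacksquare$
```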

Summary

In sum, the subjects' interview responses revealed that the complexity of the arguments, students' familiarity with the context used in the arguments, and the clarity of the explanation presented seemed to have impacted the subjects' evaluation and judgment. An argument could be difficult or easy, could use familiar or unfamiliar illustrations, and could seem clear or unclear, regardless of its type. In addition, students' perceptions of the function of the same component of an argument also varied. They viewed the examples used in one problem as isolated cases while considering the examples in another problem as related instances. Therefore, these factors, which didn't depend on the argument type

but aligned with the subjects’ personal standards for judgment, could have caused

students’ inconsistent evaluations across the problem contexts for arguments that were

categorized as the same type (see Figure 30).

[Figure 30 is a diagram: complexity, familiarity, clarity, and functional differences of like components or features all point to inconsistent evaluations.]

Figure 30. Factors that caused inconsistent evaluation of the same type of arguments

Reflecting on survey results based on findings from the interviews

The argument rankings provided by participants in the interview and the explanations of their rationale in making such evaluations helped us understand the causes of the similarities and differences in individual learners' survey responses.

The distinct rankings provided by the interview participants (see Table 35) demonstrated that the highly diverse preferences among students toward each argument, as observed in the survey results, were unlikely to be a result of random selection. A

comparison between the interview participants’ ranking and their survey responses (in

judging whether an argument was convincing) revealed consistency for 6 of the 8

participants (except for Alice, who demonstrated a preference toward perceptual arguments in the survey but considered them not convincing in the interview, and Beth, who showed a preference toward algebraic arguments in the interview, a tendency that was not observed in her survey responses). As illustrated by the explanations provided in the participants' interview responses, each learner had his/her own rationales for his/her decisions, and the choices in the survey were unlikely to have been made arbitrarily.

Findings from the interviews offered plausible explanations for our previous

conjectures about why certain options received higher ratings in the survey. It was

hypothesized that the participants were more likely to understand an argument when it

showed more details about concrete examples or provided visual support. This conjecture

was supported by the interview results, where the use of examples was the most frequently referenced evidence for building a reliable argument. Visual illustration was the representation preferred by 6 of the 8 interviewees. We had also suspected that the

participants were less likely to be completely convinced by checking and verifying a few cases. This conjecture was also supported by some subjects' interview responses, where

four subjects articulated that showing a few examples was not sufficient to convince them

that a conjecture was always true. In addition, we had speculated that arguments that had

used easier language and offered shorter descriptions were more likely to be preferred by

students. This conjecture was supported by the interview responses of 5 participants, who considered an easier-to-understand argument convincing. However, the other 3 interviewees preferred the use of more abstract expressions and detailed explanations. Therefore, students' preferences for the expression of arguments differed among individuals.

The survey results indicated that students were not consistent in their evaluation

of the same type of argument across the contexts. This phenomenon was also observed in

the interview phase. An examination of the participants' explanations revealed that the

complexity of the expression and concepts used in the arguments, students’ familiarity

with the context, the clarity of the explanation, and personal perception of specific

elements used in the arguments seemed to have impacted the subjects’ evaluation, which

caused inconsistent ratings across the contexts.

It was also observed that survey responses from students enrolled in higher

performing schools (as assessed by state standardized tests) were not significantly different from those of students enrolled in lower performing schools. This result suggested that the knowledge and skills that could help students achieve higher scores on standardized tests may not be directly associated with greater maturity in mathematical reasoning. Related

results were also observed in the interview phase. Two participants of the interviews (Allen and Betty) were enrolled in Honors Algebra I classes, and they did demonstrate a familiarity with formulas and fluency in symbolic operations. However, neither of them was aware that checking a few examples was not sufficient to prove a conjecture was always true. Nor did they realize that algebraic expressions could be used to prove the general validity of a conjecture. Their personal preferences, which didn't focus on the rigor of the logic of arguments, had also impacted their judgment of whether an argument was convincing. Therefore, they apparently didn't fully understand the purpose of the use of algebra in mathematics despite their greater familiarity with symbolic skills. It was premature to assume Allen and Betty were representative of higher achievers in standardized tests. However, their explanations offered insight into the analysis concerning the between-school comparison.

CHAPTER 5. CONCLUSION

This chapter is dedicated to a discussion of the key findings of the study. First, an overview of the study is provided. In addition, findings in response to the proposed research questions are summarized. Furthermore, the study's contribution to the literature is synthesized. Lastly, implications for practice and further studies are discussed.

Overview of the Study

The study examined how 8th grade students evaluate arguments in a wide range of

mathematical contexts. The analysis included investigations on the types of mathematical

arguments that students found convincing, explanatory and appealing, common aspects

and features of arguments that impacted students’ evaluation of the arguments, and

problem contexts’ impact on their judgment.

The study involved two phases, a survey and a follow-up interview. Over five

hundred 8th grade students from five Ohio public schools participated in the survey study,

where they were provided a variety of arguments in four different mathematical contexts

and were asked to determine which of these arguments were convincing, explanatory and

appealing to them. Eight subjects, whose survey responses were distinct from each other,

were selected to participate in the follow-up interviews, where they were asked to explain

their rationale for determining their evaluation of an argument.

Both quantitative and qualitative methods were utilized in data analysis.

Statistical data from the survey was used to identify types of mathematical arguments that

students found convincing, explanatory and appealing. Interview data were coded using a

proof classification framework, i.e. CCIA (see Figure 7), to identify the aspects and

features of arguments that impacted students’ evaluation of the arguments.

Summary of the Findings

The findings from both the survey and interview are summarized to address each

of the three research questions.

Q.1. Are there certain types of mathematical arguments that students find convincing, explanatory and appealing?

This question was explored using 3 different analytical methods. First, we

examined if any type (i.e. algebraic, inductive, perceptual, and visual) of argument was

considered the most convincing, explanatory or appealing option by analyzing the

cumulative data of all responses obtained for all problems. The survey results suggested that no single type of argument received significantly higher ratings than the others (see Table 13). A certain type of argument might have received a higher rating in one problem but been rated low in another, and collectively, when combining the results for all problems, no categorical type stood out as the most convincing, explanatory or appealing.

This result was compatible with findings from the interviews, where no argument

received significantly higher ranking than others in any problem (see Table 46).

Second, we examined whether any argument type received significantly better ratings than the others in each problem context. As mentioned, the interview results didn't

reveal significant differences in any of the problems. However, the survey results did

indicate that there were some arguments that were considered more convincing,

explanatory, or appealing in certain problem contexts (see Table 11). In particular, A4

(visual) and B3 (algebraic) were considered as the most convincing, explanatory, and

appealing option in their respective problem contexts (the appealing ratings for B3 and

B2 were not significantly different). However, in Problem C and D, no single argument

received significantly higher ratings than all others. This suggested that the participants' preference of argument type was more uniform in some contexts, such as Problems A and B, while their views were more diverse in other situations. Nevertheless, even in Problems A and B, the lower rated arguments shouldn't be ignored. The most appealing arguments, A4 (visual) and B2 (perceptual), were chosen by no more than 40% of the participants as the closest to how they themselves would argue (39% for A4 and 28% for B2), while the least appealing options, A3 (perceptual) and B1 (inductive), were chosen by 17% and 20% of the participants. Although the difference between the most appealing and least appealing options was statistically significant (p < .05), this does not mean that the preference of more than 1/6 of the participants could be ignored. Therefore, although some arguments received higher ratings in their respective problem contexts, we found the lower rated arguments were still convincing, explanatory and appealing to a considerable proportion of the participants.
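As a side note, the scale of this gap can be illustrated numerically. The sketch below is not the dissertation's actual procedure, which is not specified in this passage; it is a naive two-proportion z-test that treats the reported 39% and 17% shares as if they came from independent samples of the 476 survey participants, an assumption the single-choice design does not strictly satisfy:

```python
# Illustrative only: a naive two-proportion z-test on the reported
# figures (39% vs. 17% of n = 476). The actual test used in the
# dissertation is not stated in this passage, and this approximation
# ignores that both shares come from the same respondents.
from math import erf, sqrt

n = 476
p1, p2 = 0.39, 0.17                      # reported shares for A4 and A3
pooled = (p1 + p2) / 2                   # pooled proportion (equal n)
se = sqrt(pooled * (1 - pooled) * 2 / n) # pooled standard error
z = (p1 - p2) / se                       # z statistic
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p
print(f"z = {z:.2f}, significant at .05: {p_value < 0.05}")
```

Even under this rough approximation the z statistic lands well above the 1.96 threshold, consistent with the reported p < .05.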

Lastly, we examined whether a certain type of argument was preferred by each individual. The interview results suggested that none of the participants offered the highest ranking for the same type of argument in more than 3 of the 5 problems. The

survey results indicated that only 19 participants (4% of the sample) had chosen the same

type of arguments as the appealing options for all 4 problems, and an additional 122

participants (26%) had chosen the same type of arguments as the appealing options for 3

of the 4 problems. Therefore, most participants (70%) didn’t select the same type of

arguments as the appealing option in more than 2 of the 4 problems. This result suggested

that for most individuals, there wasn’t a single argument type that was preferred across

the problem contexts.

Overall, we found that students’ ratings for the same type of arguments were

highly inconsistent across the contexts and among individuals. Every argument was

considered convincing, explanatory and appealing by a substantial proportion of the

survey participants, while in the interview, no argument was ranked significantly lower (p

< .05) than others in any problem.

Q.2. Are there common aspects and features of arguments that significantly impact

students’ judgment of the arguments? If yes, what are they?

Analysis of the interview data focused on responding to this question. Results

were first reported on the thinking patterns of all subjects as a group. In addition, findings

from each individual interview were compared to identify similarities and differences

among the participants’ responses.

The interview responses revealed that among the three aspects of arguments identified in CCIA (i.e. evidence, representation, and link between evidence and conclusion), evidence was the most frequently referenced by the subjects, followed by representation; the link was the aspect of least concern when they justified their rankings of the arguments (see Figure 29).

Among all types of evidence, examples (i.e. results from immediate tests) were

referenced most frequently, followed by imaginaries (i.e. mental images created from or recalled from previous experience) and facts (i.e. well-known mathematical results).

Among all types of representations, visual illustration received the most attention.

It was at times criticized as being confusing or unreliable. A similar situation arose with the algebraic representation: some subjects found it concise and convincing, while others found it confusing and meaningless.

Among all types of links between the evidence and conclusion, induction was

referenced most frequently. However, it also contributed either positively or negatively to

the subjects’ conviction depending on the context and one’s personal preference.

Transformation, although not referenced often, seemed to be uniformly recognized as a

reliable reasoning mode.

As mentioned above, the subjects’ rationale for their judgment of mathematical

arguments shared some common features but differences among the individuals also

existed. The between-subject similarities and differences were systematically studied (see

Table 38). Cross-subject comparisons suggested the following.

First, when choosing the reliable source of evidence, the participants found

examples (i.e. results from immediate tests) convincing in most contexts. However, their

view on the use of existing mathematical results and their own imaginaries differed.

Some participants were more likely to be convinced by well known mathematical results

(e.g. theorem or formula). Others tended to rely on their own imaginaries and previous

experience when judging a problem.

Second, when examining representation’s impact on the participants’ judgment,

we found the participants were more likely to understand numerical and narrative

arguments than symbolic ones. The participants often found numerical results convincing

except for very rare occasions (e.g. Brenda had difficulties working with numbers in the

probability problem). In addition, most participants had articulated that visual

illustrations positively contributed to their conviction in some contexts. However, they

claimed that images/diagrams that were more difficult to understand (e.g. B4 and D4)

also confused them and hence were not helpful to their conviction. Furthermore, Amy

claimed that visual illustration could be misleading sometimes. Betty claimed that a

visual illustration by itself was not sufficient to convince her and suggested that it should

be accompanied by explanations using narratives, numbers or symbols. In explaining

their view of symbolic representations, 3 of the 8 participants (Allen, Amy and Betty)

found symbolic expressions concise and clear, while the others viewed them as confusing and not

helpful for their conviction.

Third, when evaluating the link between evidence and conclusion, except for Amy,

no participant had realized that symbolic representation had the potential to prove the

general validity of a conjecture. Even Allen and Betty, who demonstrated a sound perception of the meaning of the variables in each symbolic argument, were not aware of algebra's

logical advantage. In fact, with the exception of Amy, the participants were not aware that

the link between evidence and conclusion must show the argument was always true.

Some participants claimed an argument was convincing but were still not sure if the corresponding conjecture was always true (e.g. Alice and Betty). Therefore, deductive

reasoning was not utilized by most of them (except for Amy in her work on non-

geometric contexts). Instead, transformational and perceptual reasoning was widely

adopted based upon trials, experience and imaginaries. The participants' views toward induction differed. Half of them articulated that checking a few cases was not enough to

prove a conjecture was always true. However, for these participants, the realization of the

limitation of induction wasn’t present in their responses to every problem. For example,

Beth claimed that trying a few numbers in Problem A was not sufficient to show the

conjecture was always true; however she found checking a few values in Problem E was

adequate to demonstrate the conjecture was always true. Note that familiarity with algebraic techniques didn't necessarily help the participants realize the limitation of induction. For example, Allen was very confident working with algebra; however, he claimed that he needed to plug in a few values to make sure a formula was correct.

Lastly, the subjects demonstrated various personal standards of what a convincing

argument meant to them (see Table 37). Only Amy insisted that a convincing argument

must show the conjecture was always true. This criterion wasn't a requirement for a convincing argument in the other participants' view. Instead, five of them found easier-to-understand arguments more convincing. Two participants were more likely to be convinced by familiar scenarios evoked by the arguments. Two participants claimed that a convincing argument should adopt a simple and straightforward reasoning process.

Overall, our data suggested that when evaluating mathematical arguments, the

evidence had the largest impact on the subjects’ judgment, followed by the representation,

and the logical link between evidence and conclusion seemed to have the least impact.
However whether a certain type of evidence, representation, and link caused positive or

negative impact on one’s conviction depended on the individual’s preference. In addition,

the subjects also had personal standards to determine if an argument was convincing. The

personal standards were found to be associated with various features of arguments,

including the ease of expression, clarity of language, and reasoning structure.

Q.3. How does problem context impact students’ judgment of arguments?

Analysis of interview data revealed that the complexity of the expression and

concepts, students’ familiarity with the context used in the arguments, and the clarity of

the explanation presented seemed to have impacted the subjects’ evaluation and judgment

(see Figure 30). An argument, regardless of the type in which it was categorized, could be

perceived as difficult or easy, could use familiar or unfamiliar illustrations, and could

seem to offer adequate or insufficient explanation to students (e.g. relationships demonstrated by an image could be easy or difficult to understand). This suggested that

the features that students noticed didn’t align with the factors we used to categorize the

arguments. Because of this misalignment, arguments in the same category were evaluated

differently by the participants.

In addition, although certain components of arguments were identified to have the

same function mathematically, students’ perception of the function of these components

varied across the contexts. For example, some participants viewed the examples used in one problem as isolated cases but those in another problem as related instances (e.g. Alice and Beth), which led to their different judgments of inductive reasoning among the

contexts. In another instance, some participants found illustrations of images convincing in the geometry problems since those images were themselves the subject of study. However, they considered visual aids in other problems less reliable as a way to interpret the content (e.g. Amy). These findings suggested that although some

features/factors of arguments were assumed to be similar by mathematical standards, they might seem different to students. Since those features/factors were used to categorize the

arguments, students’ inconsistent evaluations for arguments that were categorized as the

same type seemed to be a natural outcome.

Contribution to the Literature

This study advanced the understanding of proof learning in four aspects: an empirical report on results from a large sample, an investigation of student thinking patterns, the development of a proof classification framework, and task design.

First, the study analyzed survey results from 476 eighth grade students who were

enrolled in five Ohio public schools that had demonstrated different levels of

achievement as measured by state standardized tests. Compared to Healy & Hoyles’s

(2000) study, which focused on the performance of high-attaining 14-15 year old

students, this study chose a sample that was more likely to represent the general eighth

grade student population. Healy and Hoyles found that students excluded algebraic

arguments when they were asked to select an argument that they found convincing and

explanatory. Our results contrast with findings reported in Healy and Hoyles's study in that

our subjects didn’t show bias against algebraic arguments when making their choices. In

fact, the algebraic argument in each problem context was considered convincing and

explanatory by at least 3/5 of the participants. Algebraic arguments were not the least appealing option in any problem except Problem D. In addition, the follow-up interviews revealed

that 3 of the 8 participants exhibited a preference toward the use of symbolic expressions.

Certainly, some participants expressed a negative view of algebraic arguments. However,

we didn’t observe that algebraic arguments were less convincing or preferred when

compared to other types of options. The most evident finding was that students’ preferred

argument type was highly inconsistent across content areas and different among

individuals. Hence, it was difficult to conclude whether a certain type of argument was

more likely to be considered convincing, explanatory and appealing by the students. This

result is compatible with findings of the previous literature that the understanding of

proof develops locally (Freudenthal, 1971; Reid, 2011), and hence an overarching

preference of proof type is unlikely to be achieved at early cognitive stages. The findings of this study indicated that no single approach seemed to solely facilitate the local development of proof understanding. As illustrated by our data, at

least half of the students found two or more argument types convincing and explanatory

in the same context, and even the least appealing option was preferred by at least 1/6 of

the sample in each problem. Overall, this current study provided an analysis of empirical

data obtained from a considerable number of students. It demonstrated that students’

preference of argument types was highly diverse among individuals in each of the studied

contexts.

Second, the study documented detailed explanations from 8 students of their judgments of arguments in 5 different mathematical contexts. Such an investigation

provided insights into the aspects of arguments that impacted eighth grade students’

evaluations. Similarities among the subjects’ comments highlighted patterns in their


thinking when assessing and judging arguments. Most prominently, examples used in an

argument received the greatest attention from the subjects and had a major impact on

their judgment. In addition, our analysis revealed that at least half of the subjects had

realized that a conjecture couldn’t be proved to be always true if only a few examples

were tested. However, most of them were not yet aware of the advantage of symbolic

expressions which could represent general cases. Balacheff’s (1988) description of

pragmatic justification (see Figure 1) and Waring's (2000) proof levels explained this phenomenon well. According to Balacheff's theory, the subjects in this study no longer

relied on naive empiricism. Instead, their conviction depended on crucial experiment and

generic examples. According to Waring’s model, these subjects had reached Level 2,

where they still relied on empirical checking but were more careful in choosing examples

to verify with the potential to notice certain patterns in the process. Such an

argumentation mode between induction and deduction was also extensively documented

in Simon’s (1996) study on transformational reasoning.

The interview responses also revealed that the link between evidence and

conclusion was the issue students seemed least concerned with when they were

evaluating the arguments. This finding was compatible with Yang and Lin’s (2008) RCGP

model (see Figure 6). As suggested by RCGP, students would not start examining the link

if they found the evidence unconvincing or the argument was represented in an unreliable

format. Therefore, it was natural that the link was not as frequently referenced. This

phenomenon could also be well explained by the broad maturation of proof structure

model (Tall et al.; see Figure 4). According to this model, lower cognitive stages involved

perceptual recognition, verbal description, pictorial or symbolic representation, and


definitions. Only at the higher levels, i.e. equivalence, crystalline concepts, and deductive

knowledge structure, would students become able to reflect on the link used in the

argumentation.

Additionally, factors that caused the subjects’ different rankings of the same type

of arguments in different contexts were discussed. Factors that were not context specific, such as the complexity of the language used in the arguments, students' familiarity with the context used in the arguments, and the clarity of the explanation presented, were identified.

However some context specific factors were also detected. For example, some subjects

were more likely to see the common properties among shapes than between numbers,

which led to a different interpretation of the inductive arguments in the number theory

and geometry contexts. In particular, they viewed examples used in the number theory

problem as isolated cases which couldn’t show the general validity of the conjecture,

while they considered examples used in the geometry problems as generic examples that

demonstrated why the corresponding conjecture was always true. In addition, visual

arguments could serve different purposes in different contexts. Diagrams and figures adopted in a visual argument could be used to demonstrate quantities of objects or spatial

relationships. Depending on whether students were able to visualize the quantity or

spatial relationships, their views of the visual arguments could also be different. This was

an important finding since few past studies had specified an explanation for students’

different evaluations of seemingly similar types of arguments used in different contexts.

It was also observed that for 5 subjects a convincing argument needed to be easy

to understand. This finding is compatible with Hanna & Jahnke’s (1993) suggestion that

whether an argument was understandable had a greater impact on students’ conviction


than its rigor. Results of the current study further elaborate this point by showing that arguments that used plain expression, simple examples, and familiar concepts/procedures were more likely to be understood by students. In this study, the participants also applied other personal standards to determine whether an argument was convincing. For example, one of the participants, Allen, believed that a convincing

argument’s reasoning procedure needed to be straightforward. Betty liked an argument to

contain more details. Blake advocated an opinion that countered Betty’s, claiming that a

convincing argument should not provide the complete and detailed procedure but leave

some space for readers to think. Rigor was not as important as other factors when

determining whether an argument was convincing to these students.

Third, the study relied on a novel theoretical framework, i.e. CCIA (see Figure 7),

to classify aspects of proofs and different genres within each aspect for documenting

students’ foci when they evaluated arguments. In order to clarify ambiguities associated

with sources contributing to individuals’ choices, it is important to acknowledge that

neither the representation, source of conviction, nor the link between source and the

conclusion can be identified merely by looking at the text and content of the argument.

Instead, they reside in one’s comprehension of the argument. To classify one’s

understanding of an argument instead of its expression was the most distinct feature of

CCIA as a proof classification model. An argument could appear to be inductive (e.g. D1 and E1); however, when a learner had perceived more general properties through the examination of one or a few cases, a seemingly inductive argument was actually treated as a transformational one even though a description of the transformation was not included in the text (e.g. the cases of Allen, Amy and Betty). Therefore, to perceive students' evaluation of different argument types, it was important to first understand their

comprehension of the arguments. CCIA took personal interpretation into consideration

and hence was a more accurate model for investigations of students' thinking.

Lastly, few studies have investigated middle school students’ comprehension and

evaluation of given proofs. This has been, in part, due to the absence of instruments that

support such investigations. In this study, five problems were designed (four were

included in SMR and another was used in the interview) as to enrich the task reservoir

that were appropriate for students who have been introduced to symbolic expression and

proofs in school mathematics. These tasks were embedded in different branches of school

mathematics and provided a variety of problem contexts. The argument types provided in

each problems were aligned with those used in other contexts to assess whether students’

view of mathematical proof were consistently developed across the fields in school

mathematics. The tasks can also be used for older students to examine whether school

interventions have fostered their mathematical reasoning skills.

When comparing the tasks used in this study to existing materials, it was found

that problems used in Harel & Sowder’s (1998) study involved more advanced

mathematical topics since they were designed for college students. In addition,

Harel & Sowder studied the arguments generated by students instead of their judgment of

proposed items. Yang & Lin (2008), Healy & Hoyles (2000), and Stylianides &

Stylianides (2008a) studied students’ evaluation of provided arguments. However, tasks

used in Yang & Lin's study were restricted to geometry contexts. Healy & Hoyles's questionnaire contained both geometry and number theory tasks; however, only the latter were published in that work. Tasks used by Stylianides & Stylianides covered various mathematical contexts, and some didn't involve complex mathematical concepts and

hence could be used with younger students. However, the subjects of interest in their study

were primarily college students. Therefore, the tasks used in this study enriched the task

reservoir in three major aspects: 1) Tasks were specially designed for students who were

first introduced to algebraic expressions and geometry proofs; 2) Tasks were embedded in

multiple branches of school mathematics; 3) The types of arguments used in each problem

aligned with those used in other problems, which enabled a between-context comparison.

Limitations of the Study

The first limitation of the study was the unconfirmed effect of multiple

approaches on a learner’s conviction. Indeed the survey and interview data had

demonstrated that individuals’ judgment of arguments was highly diverse. Consequently,

the use of multiple strategies was suggested since any single approach might only

contribute to some students’ conviction. However, it was unclear, merely based on our

data, whether the use of multiple argument types enhanced a particular individual's

conviction about the validity of an argument more than adopting just the one approach that he/she

found most appealing.10 Therefore, studies to verify the actual effect of adopting

argumentation from different aspects are needed.

Second, the participants’ evaluation of one argument could have been altered by

their exposure to multiple arguments presented in each context. When providing an

evaluation, the participants read all four arguments from each problem at the same time.

10
Nonetheless, Allen did indicate that arguments that involved a combination of formula and visual illustration would
be “perfect.”
Since their perception of these arguments involved not only information extracted from

them but could also include construction of new mental images, their judgment

may have been subconsciously influenced by this new knowledge. Consequently, their

evaluation of an argument might not be based on information provided by this argument

alone. The possible interference among subjects' evaluations of different arguments needs

to be addressed in future studies.

Lastly, although this study had highlighted some aspects and features of

arguments that largely impacted students’ evaluation, the emergent patterns were not

precise enough to allow for a prediction of individuals’ choices when a new problem was

proposed. This wasn’t surprising since we had focused on the impact of the aspects of

arguments, while other personal and contextual factors, such as learners’ background

knowledge, existing classroom experience, and specific features of certain mathematical

topics, were not considered. Therefore, investigations that seek to identify personal and

contextual factors and their impact on one’s judgment of arguments are essential to

unpack students’ rationales more precisely. Realization of this viewpoint led to a

reflection on existing theories describing children's proof learning.

Reflection on Existing Theories

In studying the phenomena of children’s proof learning, it is a common practice to

identify phases where they are able to identify, understand, appreciate, or produce certain

types of proofs or certain components of arguments and to describe how they develop

through these phases (Tall et al., 2012; Waring, 2000). In order to do so, a classification of

the types of proofs or components of arguments children are capable of identifying,

understanding, appreciating, or producing was needed. The stages described in Yang & Lin's

(2008) model and Tall et al.'s (2012) framework concerned students' understanding of

different components of arguments (e.g. the evidence, concepts, and links), whereas

Waring (2000), Harel & Sowder (1998), and Simon (1996) built their theories by

classifying different types of arguments (e.g. inductive, deductive, and transformational).

The theoretical framework of the current study, i.e. CCIA, considered students’

understanding of different components of mathematical arguments and used features of

these components to provide a more precise classification of the proof types so that each

argument could be classified based on its representation, source of evidence, and the link

between evidence and conclusion. These models, although different in many ways,

shared some common deficiencies. Most prominently, the types and components of

proofs identified in these theories are not content specific. For example, inductive

arguments, as categorized by Waring, Harel & Sowder, and CCIA, include verifications

by empirical tests in geometry, number theory, probability, and other mathematical areas.

Transformational reasoning, as described by Simon, includes visualizing movements in

geometric contexts as well as perceiving the fixed/changing properties when certain

values shift in number theory contexts. The ability to understand and apply definitions as

identified in Tall et al.'s model describes a stage in learning geometric proofs as well as in

working on proofs in abstract algebra. Even Yang & Lin’s model, which was restricted to

the context of geometry, didn't consider the differences between two-dimensional and

three-dimensional geometry, or between triangles and circles. Since the designers of these

theories had aimed to develop an overarching understanding of the discipline of

mathematics, they were able to see the connection between two arguments in two
different mathematical fields or topics. Consequently, certain arguments were grouped

together as one category since they shared some logical structure or possessed some other

“macro” properties. However, proof learners, who haven’t yet developed the ability to

compare mathematical arguments across the content areas, might not be able to see the

connection between two arguments that were classified as the same type in

mathematicians’ view but were embedded in different contexts. Instead, their

understanding of an argument was rooted in their understanding of its specific

mathematical topic.

The argument types used in this study were generated based on the researcher’s

understanding of proofs. The results from both the survey and the interviews suggested

that students demonstrated inconsistent views toward the same type of argument in

different contexts. However, this phenomenon might be explained differently. That is, the

inconsistency may have in fact existed in the different levels of understanding

represented by arguments that were classified as the same type (e.g. inductive argument

used in the number theory problem might not have offered as much explanation to

students as the inductive argument in the algebra problem did). As pointed out earlier, the

argument type was determined by the standard set by researchers who had a mature

understanding of mathematics and its logical structures. This level of understanding is

certainly different from that of school learners. Therefore, arguments classified in the

same category by researchers might not seem similar to students. As pointed out by

Lakatos (1976), even mathematicians' standards of reliable proof change when different

contexts are taken into consideration. For instance, visual illustrations were widely used

in geometric proofs; however, they were no longer viewed as reliable when calculus was
taken into consideration. Therefore, it was natural for the learners to first develop their

justification skills in local contexts (Reid, 2011). Only when their reasoning skills

reached certain levels in two contexts were they able to identify and compare the

reasoning methods adopted.

With the emphasis on local development of mathematical reasoning ability, the

absence of content-specific proof/argument classification models becomes more critical.

Drawing conclusions regarding the type of understanding students obtained from an

inductive argument might be premature without considering the specific problem context.

This was certainly evident in the results of this study, where those interviewed clearly

demonstrated different perceptions of the inductive arguments among Problems A, B and

D. The status of visual arguments and their impact on students’ conviction wasn’t

conclusive either, since graphs and diagrams could serve distinct functions in different

contexts (e.g. in B4, D4, and E2). Currently, tools that measure students' reasoning

maturity and characteristics in specific content areas are absent. Therefore, all features

and categories we constructed and used were based upon the synthesis of what was

known about mathematical reasoning as a generalized method. The theories were not

built upon the features of local content and learners’ understanding of such content. Note

that we didn’t attempted to deny the existence of more general patterns in students’

development of reasoning ability across the content areas. However, we would like to

stress that merely identifying these general patterns might not be sufficient to

understand students' development of disciplinary reasoning skills, and such patterns are limited in

the quality of guidance they provide to support curricular and instructional designs.

Therefore, we call for the need to develop content-specific proof/argument classification

and development models.

In addition, we assert the need to take personal factors into consideration in the

development of useful and explanatory theoretical models. The survey and interview

results had both identified great differences among individuals. These differences appeared

not only in the participants' judgments of arguments, but also in their focus

and rationale when making judgments. The individual differences were impacted by their

existing mathematical and non-mathematical experiences. Personal differences could

certainly influence learners’ mental images and how they perceive and interpret

arguments. The participants had found certain arguments convincing since they were

familiar with the scenarios provided in the arguments (e.g. “football field” in B2). They

had articulated that topics and results learned in class (e.g. "triangle area formula" in C2)

contributed to their conviction. Individuals' paths to reaching conviction about a conjecture

might also vary. However, this element has rarely been taken into account in existing

proof understanding and reasoning development models. The focus has been on the

arguments that students were able to produce and their judgment of certain types of

mathematical statements rather than the sources that may have contributed to their

thinking when making decisions. Therefore, in designing future (content specific) proof

classification and proof skill development models, personal factors may be considered as

key variables. These factors shouldn’t be treated as obstacles that impede the

development of proving skills, but as valuable sources for sense making and construction

of sound arguments.

Implications for Proof Teaching

As discussed in previous research, teaching students “the right way” of doing

proofs might help students pass examinations but it might also create a gap between the

work they produced to show their teachers and the arguments they would use to convince themselves

(Hanna & Jahnke, 1993; Healy & Hoyles, 2000). Therefore, pressing students to use a

rigorous reasoning format may not actually help them understand the logic embedded in a

mathematical problem. The process of nurturing mathematical reasoning should start with

an understanding of more “natural” ways in which the students argue. Those “natural”

ways are usually mathematically incomplete or at times incorrect; however, they may help

learners understand and access the problem and can ultimately influence their judgment.

For example, our study revealed that students’ conviction was strongly impacted by

examination of examples. This idea coincided with the constructivist’s view of using

examples and counterexamples to help students understand the construction of

mathematical structures in a heuristic way (Lakatos, 1976). Although using examples to

verify a statement is not a rigorous way to prove the statement is true, it does provide a

concrete context for students to work on and hence to understand the problem better.

The findings of this study highlighted the need to foster students’ proof capacity

in multiple branches of school mathematics. As suggested by existing literature and

supported by the findings of this study, learners’ understanding of proof develops locally

and doesn’t automatically transfer to other fields. Students may appreciate deductive

reasoning in one area, but still find visual illustration and use of examples convincing in

other contexts (e.g. Amy). Since proof ability essentially concerns the relationships

among concepts and properties, it is crucial for students to develop a conceptual


understanding of mathematical topics. When reasoning is addressed in different content

areas, there is greater potential for development of a coherent perception of mathematical

structure among learners.

Our findings underscore the importance of nurturing students' conviction via

multiple approaches, including the use of various types of evidence, representations, and reasoning

modes. Since arguments that convinced students were highly diverse among individual

learners, any single approach might be appropriate and effective for only a small

proportion of students. Consequently, the use of multiple types of evidence, representations, and

reasoning modes could be essential to help all students access a problem and perceive the

embedded connections or lack thereof. It is important to clarify that fluent use of

symbolic representation and known theorems didn’t guarantee that students understood

the algebraic expression’s general validity. This was evident in Betty’s case, where she

claimed the algebraic argument that adopted the Pythagorean Theorem was convincing and

she clearly explained the meaning of the variables used in the theorem; however she still

believed there existed counterexamples to the conjecture. Therefore, training in the

use of symbolic expressions didn't necessarily advance her reasoning ability.

Findings of this study suggest that a need exists for fostering students'

understanding of the standards used to distinguish a convincing argument in

mathematics. Note that we are not suggesting that students at the introductory level need

to be taught to examine the rigor of each reasoning step in an argument. In fact, we posit

that students should be allowed to use any type of evidence, representation and reasoning

mode to investigate a problem and to convince themselves that a conjecture is always true.

However, since it was found that most of the interviewees didn’t realize that an argument
couldn’t be convincing if it didn’t show the conjecture was always true, we argue that

instruction should enable students to acquire an appreciation for such disciplinary

practice. Such an awareness is the foundation for future development of rigor in logic (e.g.

examination of counterexamples and the creation of cognitive conflict (Stylianides &

Stylianides, 2008b)).

Lastly, findings of this study emphasized the need to provide examples for

students to verify the validity of conjectures. Although empirical checking is not

considered a valid mathematical proving process, it does provide students access to the

problem, and provide them opportunities to make and test conjectures, to seek patterns

and to explore approaches that verify a conjecture. The value of examples has been

addressed by other scholars (e.g. Balacheff, 1988; de Villiers, 2003; Simon, 1996;

Stylianides & Stylianides, 2008a). In this study, examples, as a type of evidence, were

also the most referenced components of arguments that impacted the subjects’ judgment.

Students’ preferred type of representations and reasoning modes might differ; however,

even those who were aware of the limitation of examples considered them helpful for

their understanding of the problem. Therefore, the use of examples (in various

representations) was highly recommended by findings of this study as introductory tools

for the instruction of proofs. This implication is compatible with the mainstream

approach in this field of research.

REFERENCES

Armstrong, A. H. (Ed.) (1970). The Cambridge history of later Greek and early Medieval
philosophy. Cambridge, UK: Cambridge University Press.

Baker, A. (2009). Non-Deductive Methods in Mathematics. In Edward N. Zalta (ed.), The


Stanford Encyclopedia of Philosophy (Fall 2009 Edition). URL =
<http://plato.stanford.edu/archives/fall2009/entries/mathematics-nondeductive/>.

Balacheff, N. (1988). Aspects of proof in pupils’ practice of school mathematics. In D.


Pimm (Ed.), Mathematics, teachers and children (pp. 216-235). London, UK:
Hodder & Stoughton.

Balacheff, N. (1991). The benefit and limits of social interaction: The case of
mathematical proof. In A. Bishop, Mellin-Olsen, E. & van Dormolen, J. (Eds.),
Mathematical knowledge: Its growth through teaching (pp. 175-192). Dordrecht,
Netherlands: Kluwer.

Balaguer, M. (2008). Mathematical Platonism. In B. Gold and Simons, R. A. (Eds.),


Proof and Other Dilemmas: Mathematics and Philosophy (pp. 179-204). Washington, DC: Mathematical Association of America.

Ball, D. L., & Bass, H. (2003). Making mathematics reasonable in school. In J. Kilpatrick,
Martin, W. G., & Schifter, D. (Eds.), A research companion to principles and
standards for school mathematics (pp. 27-44). Reston, VA: National Council of
Teachers of Mathematics.

Ball, D. L., & Bass, H. (2000). Interweaving content and pedagogy in teaching and
learning to teach: Knowing and using mathematics. In J. Boaler (Ed.), Multiple
perspectives on the teaching and learning of mathematics (pp. 83-104). Westport,
CT: Ablex.

Bell, A.W. (1976). A study of pupils’ proof-explanations in Mathematical situations.


Educational Studies in Mathematics, 7(1-2), 23-40.

Besicovitch, A. (1919). Sur deux questions d’integrabilite des fonctions. Journal of


Society of Physics and Mathematics, 2, 105–123.

Biggs, J., & Collis, K. (1982). Evaluating the quality of learning: the SOLO taxonomy.
New York: Academic Press.

Bloom, B. S. (1984). Taxonomy of educational objectives: Book I cognitive domain (2nd
edition). Boston, MA: Addison Wesley Publishing Company.

Boero, P. (Ed.) (2007). Theorems in school: From history, epistemology and cognition to
classroom practice. Rotterdam, Netherlands: Sense Publishers.

Grabiner, J. V. (2009). Why Proof? Some Lessons from the History of Mathematics. In
Fou-Lai Lin, Hsieh, F., Hanna, G. & de Villiers, M. (Eds.), Proceedings of the
ICMI Study 19 conference: Proof and proving in mathematics education (Vol. 1,
pp. 12). Taipei, Taiwan: National Taiwan Normal University.

Brouwer, L. E. J. (1905/1996). Life, art and mysticism. Notre Dame Journal of Formal
Logic, 37(3), 389-429.

Brown, J. R. (2008). Philosophy of mathematics: A contemporary introduction to the


world of proofs and pictures (2nd ed.). Routledge: New York.

Brunner, K. (1987). The perception of man and the conception of society: Two approaches
to understanding society. Economic Inquiry, 25(3), 367-388.

Bruner, J. S. (1966). Towards a Theory of Instruction. Cambridge, MA: Harvard


University Press.

Burger, W. F. & Shaughnessy, J. M. (1986). Characterizing the van Hiele levels of


development in geometry. Journal for Research in Mathematics Education, 17(1),
31-48.

Carnap, R. (1937). The logical syntax of language. London, UK: K. Paul Trench.

Chazan, D. (1993). High school geometry students’ justification for their views of
empirical evidence and mathematical proof. Educational Studies in Mathematics,
24(4), 359-387.

Chazan, D., & Lueke, H. M. (2009). Relationships between disciplinary knowledge and
school mathematics: Implications for understanding the place of reasoning and
proof in school mathematics. In D. A. Stylianou, Blanton, M. L., & Knuth E. J.
(Eds.), Teaching and Learning Proof Across the Grades: A K-16 Perspective (pp.
21-39). New York: Routledge.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New
Jersey: Lawrence Erlbaum.

Council of Chief State School Officers (2010). Common Core State Standards
(Mathematics). National Governors Association Center for Best Practices.
Washington, D. C.

Clements, D. H. & Battista, M. T. (1992). Geometry and spatial reasoning. In D. Grouws,
(Ed.), Handbook of Research on Mathematics Teaching and Learning (pp. 420-
464). New York: NCTM/Macmillan.

Creswell, J. W. & Plano Clark, V. L. (2011). Designing and conducting mixed methods
research. Thousand Oaks, CA: Sage Publications, Inc.

Davis, P. J. (1976). The nature of proof. In M. Carss (Ed.), Proceedings of the fifth
international congress on mathematical education. Boston, MA: Birkhauser.

de Villiers, M. (2012). An illustration of the explanatory and discovery function of proof.


In Proceedings of the 12th International Congress on Mathematics Education (pp.
1122-1137). Seoul, Korea.

de Villiers, M. (2003). Rethinking proof with the Geometer’s Sketchpad. Emeryville, CA:
Key Curriculum Press.

de Villiers, M. (1998). An alternative approach to proof in dynamic geometry. In R.


Lehrer & D. Chazan (Eds.), New directions in teaching and learning geometry (pp.
369-415). Lawrence Erlbaum.

de Villiers, M. (1990). The role and function of proof in mathematics. Pythagoras, 24,
17–24.

Dreyfus, T. (2006). Linking theories in mathematics education. In A. Simpson (Ed.),


Retirement as process and concept: A festschrift for Eddie Gray and David Tall
(pp. 77-82). Prague, Czech Republic: Karlova Univerzita v Praze.

Dreyfus, T. (1999). Why Johnny can’t prove. Educational Studies in Mathematics,


38(1/3), 85-109.

Ernest, P. (1996). New angles on old rules. Times Higher Educational Supplement. Times
Supplements Ltd.

Fawcett, H. P. (1995). The nature of proof. Thirteenth Yearbook of the NCTM. New York:
Teachers College, Columbia University. (Original work published 1938).

Fischbein, E. (1982). Intuition and proof. For the Learning of Mathematics 3(2), 9–24.

Freudenthal, H. (1971). Geometry between the devil and the deep sea. Educational
Studies in Mathematics, 3(3-4), 413-435.

Freudenthal, H. (1973). Mathematics as an educational task. Dordrecht: Reidel.

Grabiner, J. V. (2012). Why proof? A historian’s perspective. In G. Hanna & M. de


Villiers (Eds), Proof and Proving in Mathematics Education (The 19th ICMI
Study, New ICMI Study Series, Vol. 15), (pp. 147-167). Dordrecht: Springer.
Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework
for mixed-method evaluation designs. Educational Evaluation and Policy
Analysis, 11(3), 255-274.

Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und
verwandter Systeme. Monatshefte für Mathematik und Physik, 38, 173–98.

González, G., & Herbst, P. (2006). Competing arguments for the geometry course: Why
were American high school students supposed to study geometry in the twentieth century?
International Journal for the History of Mathematics Education, 1(1), 7-33.

Hanna, G. (1983). Rigorous proof in mathematics education. Toronto, CA: OISE Press.

Hanna, G. (2000a). A critical examination of three factors in the decline of proof.


Interchange, 31(1), 21-33.

Hanna, G. (2000b). Proof, explanation and exploration: An overview. Educational


Studies in Mathematics, 44, 5-23.

Hanna, G., & Jahnke, H. N. (Eds.). (1993). Aspects of proof [Special issue]. Educational
Studies in Mathematics, 24(4).

Harel, G., & Sowder, L. (1998). Students’ proof schemes: Results from exploratory
studies. In A. H. Schoenfeld, Kaput, J., & Dubinsky, E. (Eds.), Research in
Collegiate Mathematics Education III (pp. 234-283). American Mathematical
Society.

Harel, G., & Sowder, L. (2007). Toward comprehensive perspectives on the learning and
teaching of proof. In F. Lester (Ed.), Second handbook of research in mathematics
teaching and learning (pp. 805-842). Charlotte, NC: Information Age Publishing.

Healy, L., & Hoyles, C. (2000). A study of proof conceptions in algebra. Journal for
Research in Mathematics Education, 31(4), 396–428.

Heinze, A., & Reiss, K. (2009). Developing argumentation and proof competencies in the
mathematics classroom. In D. A. Stylianou, Blanton, M. L., & Knuth E. J. (Eds.),
Teaching and Learning Proof Across the Grades: A K-16 Perspective (pp. 191-
203). New York: Routledge.

Herbst, P., & Brach, C. (2006). Proving and doing proofs in high school geometry classes:
What is it that is going on for students? Cognition and Instruction, 24(1), 73–122.

Hersh, R. (2009). What I would like my students to already know about proof. In D. A.
Stylianou, Blanton, M. L., & Knuth E. J. (Eds.), Teaching and Learning Proof
Across the Grades: A K-16 Perspective (pp. 17-20). New York: Routledge.

Hilbert, D., & Bernays, P. (1934/1939). Grundlagen der Mathematik I and II, first
editions. Berlin, Germany: Verlag Julius Springer.

Hoyles, C. (1997). The curricular shaping of students’ approaches to proof. For the
Learning of Mathematics, 17(1), 7-16.

IAS/PCMI (2007). International Seminar: Bridging policy and practice in the context of
reasoning and proof. Institute for Advanced Study / Park City Mathematics
Institute. Princeton, NJ.
< http://mathforum.org/pcmi/int2007.html>

Inglis, M., & Alcock, L. (2012). Expert and novice approaches to reading mathematical
proofs. Journal of Research in Mathematics Education, 43(4), 358-390.

Jaffe, A. & Quinn, F. (1993). Theoretical mathematics: Toward a cultural synthesis of


mathematics and theoretical physics. Bulletin of the American Mathematics
Society, 29, 1-13.

Johnson, P. E. (1972). A History of Set Theory. Prindle, Weber & Schmidt.

Johnson, R. B. & Onwuegbuzie, A. J. (2004). Mixed-methods research: a research


paradigm whose time has come. Educational Researcher, 33(7), 14-26.

Lakatos, I. (1976). Proofs and refutations: The logic of mathematical discovery.


Cambridge, UK: Cambridge University Press.

Lampert, M. (1992). Practices and problems in teaching authentic mathematics. In F. K.


Oser, A. Dick, & J. Patry (Eds.), Effective and responsible teaching: The new
synthesis (pp. 295–314). San Francisco, CA: Jossey-Bass Publishers.

Liu, Y. & Manouchehri, A. (2012). Nurturing high school students’ understanding of


proof as a convincing way of reasoning: Results from an exploratory study. In
Proceedings of the 12th International Congress on Mathematics Education (pp.
2848-2857). Seoul, Korea.

Longo, G. (2009). Theorems as constructive visions. In Fou-Lai Lin, Hsieh, F., Hanna, G.
& de Villiers, M. (Eds.), Proceedings of the ICMI Study 19 conference: Proof and
proving in mathematics education (Vol. 1, pp. 13-25). Taipei, Taiwan: National
Taiwan Normal University.

Kakeya, S. (1917). Some problems on maximum and minimum regarding ovals. Tohoku
Science Reports, 6, 71–88.

Kieren, T., & Pirie, S. (1991). Recursion and the mathematical experience. In L. Steffe
(Ed.), The Epistemology of Mathematical Experience (pp. 78-101). New York:
Springer Verlag Psychology Series.

Knuth, E. J., Choppin, J. M., & Bieda, K. N. (2009). Middle school students’ production
of mathematical justifications. In D. A. Stylianou, Blanton, M. L., & Knuth, E. J.
(Eds.), Teaching and Learning Proof Across the Grades: A K-16 Perspective (pp.
153-170). New York: Routledge.

Krantz, S. G. (2007). The history and concept of mathematical proof.


http://www.math.wustl.edu/~sk/eolss.pdf

Kuchemann, D., & Hoyles, C. (2009). From empirical to structural reasoning in


mathematics: Tracking changes over time. In D. A. Stylianou, Blanton, M. L., &
Knuth E. J. (Eds.), Teaching and Learning Proof Across the Grades: A K-16
Perspective (pp. 171-190). New York: Routledge.

Martin, L. C. (2008). Folding back and the dynamical growth of mathematical


understanding: Elaborating the Pirie-Kieren Theory. Journal of Mathematical
Behavior, 27(1), 64-85.

Mason, J. (2009). Mathematics education: Theory, practice and memories over 50 years.
In S. Lerman and B. Davis (Eds.), Mathematical action & structures of noticing:
Studies on John Mason’s contribution to mathematics education (pp. 1-14).
Rotterdam, Netherlands: Sense Publishers.

Marrades, R., & Gutiérrez, A. (2000). Proofs produced by secondary school students
learning geometry in a dynamic computer environment. Educational Studies in
Mathematics, 44 (1&2), 87-125.

Mejia-Ramos, J. P., & Inglis, M. (2009). Argumentative and proving activities in


mathematics education research. In F.-L. Lin, F.-J. Hsieh, G. Hanna, and M. de
Villiers (Eds.), Proceedings of the ICMI Study 19 conference: Proof and Proving
in Mathematics Education (Vol. 2, pp. 88-93), Taipei, Taiwan.

Mejia-Ramos, J. P., Fuller, E., Weber, K., Rhoads, K., & Samkoff, A. (2012). An
assessment model for proof comprehension in undergraduate mathematics.
Educational Studies in Mathematics, 79(1), 3-18.

McConaughy, S. H., & Achenbach, T. M. (2001). Manual for the Semistructured Clinical
Interview for Children and Adolescents (2nd ed.). Burlington, VT: University of
Vermont, Research Center for Children, Youth, and Families.

National Council of Teachers of Mathematics (2000). Principles and standards for school
mathematics. Reston, VA: NCTM.

Onwuegbuzie, A. J., & Leech, N. J. (2006). Linking research question to mixed methods
data analysis procedures. The Qualitative Report, 11(3), 2006, 474-498.

Pal, J. (1920). Ueber ein elementares variationsproblem. Kongelige Danske
Videnskabernes Selskab Math.-Fys, Medd. 2, 1–35.

Pegg, J., & Davey, G. (1998). Interpreting student understanding in geometry: A synthesis
of two models. In R. Lehrer & D. Chazan (Eds.), Designing learning
environments for developing understanding of geometry and space (pp. 109–133).
Mahwah, NJ: Lawrence Erlbaum Associates.

Piaget, J. (1987). The role of necessity in cognitive development. Minneapolis, MN:


University of Minnesota Press.

Piaget, J. (1985). The Equilibration of Cognitive Structures: The Central Problem of


Intellectual Development (T. Brown & K. J. Thampy, Trans.). Chicago, IL:
University of Chicago Press.

Piaget, J. (1928). Judgment and reasoning in the child. New York, NY: Harcourt, Brace,
and Co.

Pirie, S. (1988). Understanding – instrumental, relational, formal, intuitive … How can


we know? For the Learning of Mathematics, 8(3), 2-6.

Pirie, S., & Kieren, T. (1992). Creating constructivist environments and constructing
creative mathematics. Educational Studies in Mathematics, 23(5), 505-528.

Polya, G. (1954). Mathematics and plausible reasoning: Induction and analogy in


mathematics, Vol 1. Princeton, NJ: Princeton University Press.

Pruss, A. R. (2006). The Principle of Sufficient Reason: A Reassessment. Cambridge, UK:


Cambridge University Press.

Recio, A. M., & Godino, J. D. (2001). Institutional and personal meanings of


mathematical proof. Educational Studies in Mathematics, 48(1), 83-99.

Reid, D. A. (2011). Understanding proof and transforming teaching. In L. R. Wiest, &


Lamberg, T. (Eds.), Proceedings of the 33rd Annual Meeting of the North
American Chapter of the International Group for the Psychology of Mathematics
Education (pp. 15-30). Reno, NV: University of Nevada, Reno.

Reid, D. A. (2002). Conjectures and refutations in grade 5 mathematics. Journal for


Research in Mathematics Education, 33(1), 5–29.

Russell, B. (1903). Principles of mathematics. Cambridge, UK: Cambridge University


Press.

Schoenfeld, A. H. (Ed.) (1994). Mathematical thinking and problem solving. Hillsdale,


NJ: Erlbaum.

Schoenfeld, A. H. (1991). On mathematics as sense-making: An informal attack on the
unfortunate divorce of formal and informal mathematics. In J. Voss, D. N. Perkins,
& J. Segal (Eds.), Informal reasoning and education (pp. 311-343). Hillsdale, NJ:
Erlbaum.

Schoenfeld, A. H. (1988). When good teaching leads to bad results: The disasters of
“well-taught” mathematics courses. Educational Psychologist, 23(2), 145–166.

Sekiguchi, Y. (1991). An investigation on proofs and refutations in the mathematics
classroom. Unpublished doctoral dissertation, University of Georgia, Athens.

Selden, A., & Selden, J. (2003). Validations of proofs considered as texts: Can
undergraduates tell whether an argument proves a theorem? Journal for Research
in Mathematics Education, 34(1), 4–36.

Senk, S. L. (1985). How well do students write geometry proofs? Mathematics Teacher,
78(6), 448-456.

Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and
directions. In D. A. Grouws (Ed.), Handbook of Research on Mathematics
Teaching and Learning (pp. 465-494). Reston, VA: National Council of Teachers
of Mathematics.

Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching.
Educational Researcher, 15(2), 4-14.

Simon, M. A. (1996). Beyond inductive and deductive reasoning: The search for a sense
of knowing. Educational Studies in Mathematics, 30(2), 197-210.

Stylianides, A. J. (2007). Proof and proving in school mathematics. Journal for Research
in Mathematics Education, 38(3), 289-321.

Stylianides, G. J., & Stylianides, A. J. (2008a). Proof in school mathematics: Insights
from psychological research into students’ ability for deductive reasoning.
Mathematical Thinking and Learning, 10(2), 103-133.

Stylianides, G. J., & Stylianides, A. J. (2008b). Enhancing undergraduate students’
understanding of proof. In Electronic Proceedings of the 11th Conference on
Research in Undergraduate Mathematics Education
(http://sigmaa.maa.org/rume/crume2008/Proceedings/Stylianides&Stylianides_LONG(21).pdf),
San Diego, CA.

Tall, D. (2009). The development of mathematical thinking: problem-solving and proof.
In Celebration of the academic life and inspiration of John Mason.

Tall, D. (2005). The transition from embodied thought experiment and symbolic
manipulation to formal proof. In M. Bulmer, H. MacGillivray & C. Varsavsky
(Eds.), Proceedings of Kingfisher Delta’05, Fifth Southern Hemisphere
Symposium on Undergraduate Mathematics and Statistics Teaching and Learning
(pp. 23-35). Fraser Island, Australia.

Tall, D. (2002). Differing modes of proof and belief in mathematics. In F.-L. Lin (Ed.),
International Conference on Mathematics: Understanding Proving and Proving to
Understand (pp. 91–107). National Taiwan Normal University, Taipei, Taiwan.

Tall, D. (1999). The cognitive development of proof: Is mathematical proof for all or for
some? In Z. Usiskin (Ed.), Developments in School Mathematics Education
Around the World (Vol. 4, pp. 117–136). Reston, VA: NCTM.

Tall, D. (1991). To prove or not to prove. Mathematics Review, 1(3), 29-32.

Tall, D., et al. (2012). Cognitive development of proof. In G. Hanna & M. de Villiers
(Eds.), Proof and proving in mathematics education (pp. 13-50). New York, NY:
Springer.

Tarricone, P. (2011). The taxonomy of metacognition. New York, NY: Psychology Press.

Teddlie, C., & Tashakkori, A. (2009). Foundations of mixed methods research: Integrating
quantitative and qualitative approaches in the social and behavioral sciences. Los
Angeles, CA: Sage Publications, Inc.

Thurston, W. P. (1995). On proof and progress in mathematics. For the Learning of
Mathematics, 15(1), 29-37.

Troelstra, A. S. (1977). Choice sequences: A chapter of intuitionistic mathematics
(Oxford Logic Guides). Oxford, UK: Clarendon Press.

Usiskin, Z. (1987). Resolving the continuing dilemmas in school geometry. In M. M.
Lindquist and A. P. Shulte (Eds.), Learning and Teaching Geometry, K-12, 1987
Yearbook (pp. 17-31). Reston, VA: National Council of Teachers of Mathematics.

Usiskin, Z. (1980). What should not be in the algebra and geometry curricula of average
college-bound students? Mathematics Teacher, 73(6), 413-424.

van Hiele, P. M. (1986). Structure and insight: A theory of mathematics education. New
York, NY: Academic Press.

von Glasersfeld, E. (1994). A radical constructivist view of basic mathematical concepts.
In P. Ernest (Ed.), Constructing mathematical knowledge: Epistemology and
mathematics education (Studies in mathematics education, Vol. 4, pp. 5-7).
Abingdon, Oxon: Routledge.

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological
processes. Cambridge, MA: Harvard University Press.

Waring, S. (2000). Can you prove it? Developing concepts of proof in primary and
secondary schools. Leicester, UK: The Mathematical Association.

Weber, K. (2004). Traditional instruction in advanced mathematics courses: A case study
of one professor’s lectures and proofs in an introductory real analysis course.
Journal of Mathematical Behavior, 23(2), 115–133.

Weber, K. (2001). Student difficulty in constructing proofs: The need for strategic
knowledge. Educational Studies in Mathematics, 48(1), 101-119.

Weir, A. (2011). Formalism in the philosophy of mathematics. In E. N. Zalta (Ed.), The
Stanford Encyclopedia of Philosophy. URL =
<http://plato.stanford.edu/archives/fall2011/entries/formalism-mathematics/>

Yang, K., & Lin, F. (2008). A model of reading comprehension of geometry proof.
Educational Studies in Mathematics, 67(1), 59-76.

Yin, R. K. (2009). Case study research: Design and methods (4th ed.). Thousand
Oaks, CA: SAGE Publications.

Zack, V. (1997). “You have to prove us wrong”: Proof at the elementary school level. In
E. Pehkonen (Ed.), Proceedings of the 21st Conference of the International Group
for the Psychology of Mathematics Education (Vol. 4, pp. 291-298). Lahti:
University of Helsinki.

APPENDIX A. SURVEY RESULTS: PAIRWISE COMPARISONS OF ARGUMENTS IN EACH PROBLEM

(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 .174* .037 .000 .102 .246
A3 .111* .036 .002 .041 .182
A4 .057 .036 .115 -.014 .127
A2 A1 -.174* .037 .000 -.246 -.102
A3 -.063 .041 .121 -.143 .017
A4 -.118* .042 .005 -.200 -.036
A3 A1 -.111* .036 .002 -.182 -.041
A2 .063 .041 .121 -.017 .143
A4 -.055 .041 .187 -.136 .027
A4 A1 -.057 .036 .115 -.127 .014
A2 .118* .042 .005 .036 .200
A3 .055 .041 .187 -.027 .136
B1 B2 -.118* .042 .005 -.200 -.035
B3 .137* .047 .004 .044 .229
B4 .183* .048 .000 .089 .276
B2 B1 .118* .042 .005 .035 .200
B3 .254* .045 .000 .166 .343
B4 .300* .046 .000 .210 .391
B3 B1 -.137* .047 .004 -.229 -.044
B2 -.254* .045 .000 -.343 -.166
B4 .046 .046 .311 -.043 .136
B4 B1 -.183* .048 .000 -.276 -.089
B2 -.300* .046 .000 -.391 -.210
B3 -.046 .046 .311 -.136 .043
Continued

Table 39. Pairwise comparisons: Participants’ ratings on whether the arguments in each
problem were understandable

Table 39 continued
(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 .011 .044 .811 -.076 .097
C3 -.080 .044 .073 -.167 .007
C4 -.086 .044 .053 -.173 .001
C2 C1 -.011 .044 .811 -.097 .076
C3 -.090* .046 .048 -.180 -.001
C4 -.097* .044 .028 -.183 -.010
C3 C1 .080 .044 .073 -.007 .167
C2 .090* .046 .048 .001 .180
C4 -.006 .044 .887 -.093 .081
C4 C1 .086 .044 .053 -.001 .173
C2 .097* .044 .028 .010 .183
C3 .006 .044 .887 -.081 .093
D1 D2 .237* .041 .000 .156 .319
D3 .021 .043 .624 -.063 .105
D4 .208* .044 .000 .121 .295
D2 D1 -.237* .041 .000 -.319 -.156
D3 -.216* .045 .000 -.305 -.128
D4 -.029 .044 .506 -.116 .057
D3 D1 -.021 .043 .624 -.105 .063
D2 .216* .045 .000 .128 .305
D4 .187* .044 .000 .100 .273
D4 D1 -.208* .044 .000 -.295 -.121
D2 .029 .044 .506 -.057 .116
D3 -.187* .044 .000 -.273 -.100
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 -.318* .073 .000 -.462 -.173
A3 -.270* .075 .000 -.418 -.123
A4 -.474* .075 .000 -.622 -.326
A2 A1 .318* .073 .000 .173 .462
A3 .047 .068 .487 -.087 .182
A4 -.156* .063 .014 -.281 -.032
A3 A1 .270* .075 .000 .123 .418
A2 -.047 .068 .487 -.182 .087
A4 -.204* .064 .002 -.330 -.078
A4 A1 .474* .075 .000 .326 .622
A2 .156* .063 .014 .032 .281
A3 .204* .064 .002 .078 .330
B1 B2 -.154 .094 .103 -.339 .032
B3 -.685* .090 .000 -.862 -.508
B4 -.510* .086 .000 -.681 -.340
B2 B1 .154 .094 .103 -.032 .339
B3 -.531* .089 .000 -.708 -.355
B4 -.357* .094 .000 -.543 -.170
B3 B1 .685* .090 .000 .508 .862
B2 .531* .089 .000 .355 .708
B4 .175* .074 .020 .028 .322
B4 B1 .510* .086 .000 .340 .681
B2 .357* .094 .000 .170 .543
B3 -.175* .074 .020 -.322 -.028
Continued

Table 40. Pairwise comparisons of survey results: Participants’ ratings on whether the
arguments in each problem were convincing

Table 40 continued
(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 -.172* .071 .017 -.313 -.031
C3 .019 .082 .815 -.142 .180
C4 -.096 .072 .184 -.237 .046
C2 C1 .172* .071 .017 .031 .313
C3 .191* .077 .014 .039 .343
C4 .076 .067 .258 -.057 .209
C3 C1 -.019 .082 .815 -.180 .142
C2 -.191* .077 .014 -.343 -.039
C4 -.115 .077 .137 -.266 .037
C4 C1 .096 .072 .184 -.046 .237
C2 -.076 .067 .258 -.209 .057
C3 .115 .077 .137 -.037 .266
D1 D2 .074 .084 .376 -.091 .240
D3 .034 .079 .671 -.123 .191
D4 -.047 .087 .588 -.219 .125
D2 D1 -.074 .084 .376 -.240 .091
D3 -.041 .084 .630 -.207 .126
D4 -.122 .081 .137 -.282 .039
D3 D1 -.034 .079 .671 -.191 .123
D2 .041 .084 .630 -.126 .207
D4 -.081 .085 .341 -.249 .087
D4 D1 .047 .087 .588 -.125 .219
D2 .122 .081 .137 -.039 .282
D3 .081 .085 .341 -.087 .249
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 -.076 .069 .273 -.212 .060
A3 -.076 .066 .252 -.206 .054
A4 -.242* .059 .000 -.359 -.124
A2 A1 .076 .069 .273 -.060 .212
A3 .000 .068 1.000 -.133 .133
A4 -.166* .057 .004 -.279 -.053
A3 A1 .076 .066 .252 -.054 .206
A2 .000 .068 1.000 -.133 .133
A4 -.166* .057 .004 -.279 -.053
A4 A1 .242* .059 .000 .124 .359
A2 .166* .057 .004 .053 .279
A3 .166* .057 .004 .053 .279
B1 B2 -.056 .096 .559 -.245 .133
B3 -.329* .091 .000 -.508 -.149
B4 -.168 .091 .069 -.349 .013
B2 B1 .056 .096 .559 -.133 .245
B3 -.273* .089 .003 -.448 -.097
B4 -.112 .087 .198 -.283 .059
B3 B1 .329* .091 .000 .149 .508
B2 .273* .089 .003 .097 .448
B4 .161* .075 .035 .012 .310
B4 B1 .168 .091 .069 -.013 .349
B2 .112 .087 .198 -.059 .283
B3 -.161* .075 .035 -.310 -.012

Table 41. Pairwise comparisons of survey results: Participants’ ratings on whether the
arguments in each problem were explanatory

Table 41 continued
(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 -.076 .078 .329 -.231 .078
C3 .045 .070 .523 -.093 .182
C4 .025 .069 .712 -.110 .161
C2 C1 .076 .078 .329 -.078 .231
C3 .121 .078 .122 -.033 .275
C4 .102 .074 .171 -.044 .248
C3 C1 -.045 .070 .523 -.182 .093
C2 -.121 .078 .122 -.275 .033
C4 -.019 .065 .769 -.147 .109
C4 C1 -.025 .069 .712 -.161 .110
C2 -.102 .074 .171 -.248 .044
C3 .019 .065 .769 -.109 .147
D1 D2 .176* .070 .014 .037 .315
D3 .020 .071 .777 -.121 .161
D4 -.014 .068 .844 -.149 .122
D2 D1 -.176* .070 .014 -.315 -.037
D3 -.155* .078 .047 -.309 -.002
D4 -.189* .075 .013 -.338 -.041
D3 D1 -.020 .071 .777 -.161 .121
D2 .155* .078 .047 .002 .309
D4 -.034 .065 .606 -.163 .095
D4 D1 .014 .068 .844 -.122 .149
D2 .189* .075 .013 .041 .338
D3 .034 .065 .606 -.095 .163
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 .002 .030 .944 -.056 .061
A3 .046 .028 .101 -.009 .102
A4 -.176* .035 .000 -.245 -.108
A2 A1 -.002 .030 .944 -.061 .056
A3 .044 .028 .117 -.011 .099
A4 -.179* .035 .000 -.246 -.111
A3 A1 -.046 .028 .101 -.102 .009
A2 -.044 .028 .117 -.099 .011
A4 -.223* .033 .000 -.287 -.159
A4 A1 .176* .035 .000 .108 .245
A2 .179* .035 .000 .111 .246
A3 .223* .033 .000 .159 .287
B1 B2 -.086* .032 .007 -.148 -.024
B3 -.074* .031 .019 -.135 -.012
B4 -.015 .030 .618 -.073 .043
B2 B1 .086* .032 .007 .024 .148
B3 .013 .034 .713 -.055 .080
B4 .071* .032 .027 .008 .135
B3 B1 .074* .031 .019 .012 .135
B2 -.013 .034 .713 -.080 .055
B4 .059 .032 .066 -.004 .122
B4 B1 .015 .030 .618 -.043 .073
B2 -.071* .032 .027 -.135 -.008
B3 -.059 .032 .066 -.122 .004
Continued

Table 42. Pairwise comparisons of survey results: Participants’ ratings on whether the
arguments in each problem were appealing

Table 42 continued
(I)        (J)        Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 .013 .033 .704 -.052 .078
C3 .021 .033 .523 -.044 .086
C4 .069* .031 .026 .008 .130
C2 C1 -.013 .033 .704 -.078 .052
C3 .008 .032 .796 -.055 .072
C4 .057 .031 .066 -.004 .117
C3 C1 -.021 .033 .523 -.086 .044
C2 -.008 .032 .796 -.072 .055
C4 .048 .030 .113 -.012 .108
C4 C1 -.069* .031 .026 -.130 -.008
C2 -.057 .031 .066 -.117 .004
C3 -.048 .030 .113 -.108 .012
D1 D2 .124* .032 .000 .062 .186
D3 .038 .035 .277 -.031 .106
D4 .082* .033 .014 .017 .147
D2 D1 -.124* .032 .000 -.186 -.062
D3 -.086* .031 .005 -.146 -.026
D4 -.042 .029 .151 -.099 .015
D3 D1 -.038 .035 .277 -.106 .031
D2 .086* .031 .005 .026 .146
D4 .044 .032 .171 -.019 .107
D4 D1 -.082* .033 .014 -.147 -.017
D2 .042 .029 .151 -.015 .099
D3 -.044 .032 .171 -.107 .019
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no
adjustments).

APPENDIX B. SURVEY RESULTS: COMPARISON BETWEEN SUBGROUPS OF STUDENTS

Dependent   (I) School   (J) School   Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Variable                              (I-J)                                  Lower Bound    Upper Bound
A1.1 Group H Group L 0.01 0.056 0.853 -0.099 0.119
A1.2 Group H Group L 0.029 0.091 0.755 -0.151 0.208
A1.3 Group H Group L -0.068 0.078 0.383 -0.22 0.085
A2.1 Group H Group L -0.125 0.075 0.097 -0.273 0.023
A2.2 Group H Group L -0.096 0.082 0.244 -0.258 0.066
A2.3 Group H Group L -0.075 0.07 0.281 -0.213 0.062
A3.1 Group H Group L -0.016 0.072 0.825 -0.158 0.126
A3.2 Group H Group L 0.06 0.08 0.457 -0.098 0.218
A3.3 Group H Group L 0.038 0.07 0.59 -0.1 0.176
A4.1 Group H Group L -0.048 0.067 0.469 -0.179 0.083
A4.2 Group H Group L 0.079 0.077 0.308 -0.073 0.23
A4.3 Group H Group L 0.045 0.065 0.488 -0.082 0.172
A5.1 Group H Group L -0.004 0.044 0.93 -0.091 0.083
A5.2 Group H Group L -0.055 0.045 0.22 -0.144 0.033
A5.3 Group H Group L 0.01 0.04 0.8 -0.069 0.089
A5.4 Group H Group L 0.037 0.053 0.481 -0.067 0.141
B1.1 Group H Group L 0.119 0.081 0.143 -0.04 0.278
B1.2 Group H Group L -0.064 0.086 0.455 -0.234 0.105
B1.3 Group H Group L 0.029 0.082 0.722 -0.132 0.19
B2.1 Group H Group L 0.107 0.07 0.129 -0.031 0.245
B2.2 Group H Group L 0.053 0.086 0.54 -0.117 0.222
B2.3 Group H Group L 0.067 0.08 0.401 -0.09 0.224
B3.1 Group H Group L -0.052 0.086 0.548 -0.22 0.117
B3.2 Group H Group L -0.001 0.07 0.987 -0.139 0.137
B3.3 Group H Group L -0.042 0.066 0.526 -0.172 0.088
B4.1 Group H Group L 0.007 0.088 0.934 -0.165 0.18
B4.2 Group H Group L 0.047 0.071 0.504 -0.092 0.186
B4.3 Group H Group L 0.094 0.07 0.179 -0.043 0.232
B5.1 Group H Group L -0.058 0.043 0.176 -0.143 0.026
B5.2 Group H Group L 0.057 0.049 0.25 -0.04 0.154
B5.3 Group H Group L -0.069 0.048 0.148 -0.163 0.025
B5.4 Group H Group L 0.077 0.045 0.084 -0.01 0.165
Continued

Table 43. Survey results: Between school comparison

Table 43 continued
Dependent   (I) School   (J) School   Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Variable                              (I-J)                                  Lower Bound    Upper Bound
C1.1 Group H Group L 0.012 0.081 0.88 -0.147 0.171
C1.2 Group H Group L 0.054 0.079 0.495 -0.101 0.208
C1.3 Group H Group L 0.079 0.07 0.262 -0.059 0.217
C2.1 Group H Group L 0.037 0.082 0.654 -0.125 0.198
C2.2 Group H Group L 0.012 0.073 0.873 -0.132 0.155
C2.3 Group H Group L 0.124 0.069 0.074 -0.012 0.26
C3.1 Group H Group L 0.125 0.079 0.114 -0.03 0.281
C3.2 Group H Group L 0.062 0.081 0.445 -0.098 0.222
C3.3 Group H Group L 0.076 0.073 0.3 -0.068 0.219
C4.1 Group H Group L 0.036 0.077 0.646 -0.117 0.188
C4.2 Group H Group L 0.047 0.076 0.538 -0.103 0.197
C4.3 Group H Group L 0.049 0.071 0.494 -0.091 0.189
C5.1 Group H Group L -0.044 0.048 0.366 -0.139 0.051
C5.2 Group H Group L -0.009 0.047 0.843 -0.102 0.084
C5.3 Group H Group L .103* 0.047 0.028 0.011 0.195
C5.4 Group H Group L -0.022 0.042 0.604 -0.105 0.061
D1.1 Group H Group L -0.053 0.075 0.479 -0.201 0.095
D1.2 Group H Group L -.172* 0.08 0.031 -0.329 -0.016
D1.3 Group H Group L 0.032 0.073 0.664 -0.111 0.175
D2.1 Group H Group L 0.076 0.088 0.39 -0.097 0.249
D2.2 Group H Group L 0.037 0.072 0.612 -0.105 0.179
D2.3 Group H Group L -0.001 0.069 0.99 -0.137 0.135
D3.1 Group H Group L 0.13 0.077 0.094 -0.022 0.281
D3.2 Group H Group L -0.004 0.078 0.956 -0.158 0.149
D3.3 Group H Group L 0.044 0.074 0.557 -0.102 0.189
D4.1 Group H Group L 0.013 0.085 0.882 -0.154 0.179
D4.2 Group H Group L .150* 0.075 0.045 0.003 0.297
D4.3 Group H Group L -0.042 0.068 0.542 -0.176 0.093
D5.1 Group H Group L -0.015 0.05 0.766 -0.113 0.083
D5.2 Group H Group L -0.036 0.042 0.39 -0.118 0.046
D5.3 Group H Group L -0.052 0.048 0.279 -0.147 0.042
D5.4 Group H Group L .093* 0.046 0.041 0.004 0.183
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Dependent   (I) Gender   (J) Gender   Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Variable                              (I-J)                                  Lower Bound    Upper Bound
A1.1 Female Male .064 .048 .190 -.032 .159
A1.2 Female Male -.082 .077 .290 -.233 .070
A1.3 Female Male .061 .067 .357 -.069 .192
A2.1 Female Male -.192* .064 .003 -.318 -.067
A2.2 Female Male -.121 .069 .080 -.257 .014
A2.3 Female Male -.106 .060 .076 -.224 .011
A3.1 Female Male -.087 .061 .152 -.206 .032
A3.2 Female Male -.054 .069 .435 -.189 .082
A3.3 Female Male -.009 .060 .882 -.127 .110
A4.1 Female Male -.076 .059 .197 -.192 .040
A4.2 Female Male -.009 .065 .895 -.137 .120
A4.3 Female Male .000 .056 .998 -.109 .109
A5.1 Female Male .040 .038 .293 -.035 .114
A5.2 Female Male -.076* .037 .043 -.149 -.002
A5.3 Female Male -.027 .034 .431 -.095 .041
A5.4 Female Male .085 .045 .060 -.004 .174
B1.1 Female Male -.054 .068 .425 -.188 .079
B1.2 Female Male -.076 .073 .299 -.220 .068
B1.3 Female Male .007 .069 .921 -.129 .143
B2.1 Female Male -.071 .060 .233 -.188 .046
B2.2 Female Male -.041 .073 .576 -.185 .103
B2.3 Female Male .046 .068 .500 -.088 .180
B3.1 Female Male -.088 .073 .226 -.231 .055
B3.2 Female Male -.055 .060 .356 -.172 .062
B3.3 Female Male -.032 .057 .573 -.144 .080
B4.1 Female Male -.057 .074 .443 -.202 .089
B4.2 Female Male -.004 .060 .943 -.123 .115
B4.3 Female Male -.045 .060 .453 -.164 .073
B5.1 Female Male .015 .037 .694 -.058 .088
B5.2 Female Male -.040 .042 .342 -.122 .042
B5.3 Female Male .064 .041 .124 -.017 .145
B5.4 Female Male -.033 .038 .385 -.108 .042
continued

Table 44. Survey results: Between gender comparison

Table 44 continued
Dependent   (I) Gender   (J) Gender   Mean Difference   Std. Error   Sig.b   95% Confidence Interval for Differenceb
Variable                              (I-J)                                  Lower Bound    Upper Bound
C1.1 Female Male -.082 .069 .236 -.218 .054
C1.2 Female Male -.100 .066 .128 -.229 .029
C1.3 Female Male .020 .060 .740 -.097 .137
C2.1 Female Male -.090 .071 .203 -.229 .049
C2.2 Female Male .001 .061 .993 -.119 .120
C2.3 Female Male .033 .059 .577 -.082 .148
C3.1 Female Male -.050 .068 .465 -.184 .084
C3.2 Female Male -.053 .069 .442 -.188 .082
C3.3 Female Male .050 .063 .427 -.073 .173
C4.1 Female Male -.050 .066 .449 -.181 .080
C4.2 Female Male .017 .064 .796 -.110 .143
C4.3 Female Male .031 .060 .602 -.087 .150
C5.1 Female Male .064 .041 .124 -.017 .145
C5.2 Female Male -.022 .040 .593 -.101 .058
C5.3 Female Male -.039 .040 .336 -.117 .040
C5.4 Female Male -.002 .037 .948 -.075 .070
D1.1 Female Male .047 .064 .462 -.079 .174
D1.2 Female Male .022 .067 .739 -.110 .154
D1.3 Female Male .095 .061 .117 -.024 .215
D2.1 Female Male -.133 .074 .071 -.279 .012
D2.2 Female Male -.037 .061 .543 -.158 .083
D2.3 Female Male .025 .059 .672 -.091 .140
D3.1 Female Male .014 .066 .836 -.116 .143
D3.2 Female Male -.024 .067 .716 -.156 .107
D3.3 Female Male -.023 .063 .712 -.148 .101
D4.1 Female Male -.113 .072 .116 -.254 .028
D4.2 Female Male -.086 .063 .173 -.209 .038
D4.3 Female Male -.033 .057 .564 -.146 .080
D5.1 Female Male .058 .043 .176 -.026 .142
D5.2 Female Male .041 .036 .253 -.029 .111
D5.3 Female Male -.014 .041 .743 -.094 .067
D5.4 Female Male -.072 .039 .063 -.148 .004
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Source   Dependent   Type III Sum   df   Mean     F   Sig.   Partial Eta   Observed
         Variable    of Squares          Square              Squared       Power
Gender * School
A1.1 .192 1 .192 .727 .394 .002 .136
A1.2 .364 1 .364 .512 .475 .001 .110
A1.3 .621 1 .621 1.197 .275 .003 .194
A2.1 8.728 1 8.728 19.357 .000* .045 .992
A2.2 .107 1 .107 .186 .666 .000 .072
A2.3 .669 1 .669 1.631 .202 .004 .247
A3.1 .226 1 .226 .506 .477 .001 .109
A3.2 .087 1 .087 .158 .692 .000 .068
A3.3 .041 1 .041 .097 .755 .000 .061
A4.1 .131 1 .131 .342 .559 .001 .090
A4.2 .115 1 .115 .227 .634 .001 .076
A4.3 .540 1 .540 1.497 .222 .004 .231
A5.1 .012 1 .012 .073 .788 .000 .058
A5.2 .079 1 .079 .474 .491 .001 .106
A5.3 .487 1 .487 3.540 .061 .008 .467
A5.4 .078 1 .078 .330 .566 .001 .088
B1.1 .304 1 .304 .545 .461 .001 .114
B1.2 .537 1 .537 .853 .356 .002 .152
B1.3 .002 1 .002 .004 .950 .000 .050
B2.1 .232 1 .232 .547 .460 .001 .114
B2.2 .845 1 .845 1.333 .249 .003 .210
B2.3 .667 1 .667 1.235 .267 .003 .198
B3.1 1.331 1 1.331 2.130 .145 .005 .308
B3.2 .065 1 .065 .155 .694 .000 .068
B3.3 .245 1 .245 .650 .421 .002 .127
B4.1 .446 1 .446 .681 .410 .002 .131
B4.2 .063 1 .063 .146 .703 .000 .067
B4.3 .103 1 .103 .242 .623 .001 .078
B5.1 .099 1 .099 .627 .429 .002 .124
B5.2 .246 1 .246 1.190 .276 .003 .193
B5.3 .064 1 .064 .330 .566 .001 .088
B5.4 .031 1 .031 .181 .671 .000 .071
Continued

Table 45. The gender * school effect

Table 45 continued
Source   Dependent   Type III Sum   df   Mean     F   Sig.   Partial Eta   Observed
         Variable    of Squares          Square              Squared       Power
Gender * School
C1.1 .167 1 .167 .302 .583 .001 .085
C1.2 .109 1 .109 .207 .649 .000 .074
C1.3 .617 1 .617 1.456 .228 .003 .226
C2.1 .134 1 .134 .232 .630 .001 .077
C2.2 .276 1 .276 .605 .437 .001 .121
C2.3 .051 1 .051 .123 .726 .000 .064
C3.1 .122 1 .122 .225 .635 .001 .076
C3.2 .236 1 .236 .420 .517 .001 .099
C3.3 .049 1 .049 .106 .745 .000 .062
C4.1 1.691 1 1.691 3.327 .069 .008 .444
C4.2 .440 1 .440 .888 .347 .002 .156
C4.3 .008 1 .008 .017 .895 .000 .052
C5.1 .169 1 .169 .845 .359 .002 .150
C5.2 .007 1 .007 .036 .850 .000 .054
C5.3 .030 1 .030 .160 .689 .000 .068
C5.4 .050 1 .050 .330 .566 .001 .088
D1.1 .380 1 .380 .785 .376 .002 .143
D1.2 1.210 1 1.210 2.270 .133 .005 .324
D1.3 .027 1 .027 .060 .807 .000 .057
D2.1 2.543 1 2.543 3.891 .049* .009 .503
D2.2 .149 1 .149 .332 .565 .001 .089
D2.3 .012 1 .012 .028 .867 .000 .053
D3.1 1.215 1 1.215 2.377 .124 .006 .337
D3.2 .543 1 .543 1.041 .308 .003 .175
D3.3 .495 1 .495 1.050 .306 .003 .176
D4.1 3.509 1 3.509 5.801 .016* .014 .671
D4.2 2.508 1 2.508 5.417 .020* .013 .641
D4.3 .438 1 .438 1.094 .296 .003 .181
D5.1 .181 1 .181 .849 .357 .002 .151
D5.2 .024 1 .024 .164 .686 .000 .069
D5.3 .000 1 .000 .001 .969 .000 .050
D5.4 .088 1 .088 .497 .481 .001 .108
*. The mean difference is significant at the .05 level.
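The partial eta squared values in Table 45 express an effect's share of variance once the other effects are partialled out: SS_effect / (SS_effect + SS_error). The error sum of squares is not shown in the table, so the sketch below pairs one reported Type III SS with an assumed SS_error purely for illustration:

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared for one effect in a factorial ANOVA."""
    return ss_effect / (ss_effect + ss_error)

# D4.1's Type III SS is 3.509 (from Table 45); SS_error = 247.0 is an assumption
print(round(partial_eta_squared(3.509, 247.0), 3))  # → 0.014
```

Because the denominator excludes variance attributed to the other effects, partial eta squared values across effects need not sum to the total variance explained.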

APPENDIX C. INTERVIEW RESULTS: PAIRWISE COMPARISON OF THE RANKINGS OF ARGUMENTS IN EACH PROBLEM

(I)        (J)        Mean Difference   Std. Error   Sig.a   95% Confidence Interval for Differencea
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
A1 A2 1.125 .639 .122 -.386 2.636
A3 .125 .666 .857 -1.451 1.701
A4 .750 .675 .303 -.846 2.346
A2 A1 -1.125 .639 .122 -2.636 .386
A3 -1.000 .500 .086 -2.182 .182
A4 -.375 .653 .584 -1.919 1.169
A3 A1 -.125 .666 .857 -1.701 1.451
A2 1.000 .500 .086 -.182 2.182
A4 .625 .625 .351 -.853 2.103
A4 A1 -.750 .675 .303 -2.346 .846
A2 .375 .653 .584 -1.169 1.919
A3 -.625 .625 .351 -2.103 .853
B1 B2 .125 .693 .862 -1.513 1.763
B3 -.125 .666 .857 -1.701 1.451
B4 .000 .598 1.000 -1.413 1.413
B2 B1 -.125 .693 .862 -1.763 1.513
B3 -.250 .881 .785 -2.334 1.834
B4 -.125 .766 .875 -1.937 1.687
B3 B1 .125 .666 .857 -1.451 1.701
B2 .250 .881 .785 -1.834 2.334
B4 .125 .441 .785 -.917 1.167
B4 B1 .000 .598 1.000 -1.413 1.413
B2 .125 .766 .875 -1.687 1.937
B3 -.125 .441 .785 -1.167 .917
continued

Table 46. Pairwise comparison of the rankings of arguments in each problem

Table 46 continued
(I)        (J)        Mean Difference   Std. Error   Sig.a   95% Confidence Interval for Differencea
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
C1 C2 .000 .707 1.000 -1.672 1.672
C3 -.125 .581 .836 -1.498 1.248
C4 -.375 .653 .584 -1.919 1.169
C2 C1 .000 .707 1.000 -1.672 1.672
C3 -.125 .666 .857 -1.701 1.451
C4 -.375 .680 .598 -1.982 1.232
C3 C1 .125 .581 .836 -1.248 1.498
C2 .125 .666 .857 -1.451 1.701
C4 -.250 .796 .763 -2.133 1.633
C4 C1 .375 .653 .584 -1.169 1.919
C2 .375 .680 .598 -1.232 1.982
C3 .250 .796 .763 -1.633 2.133
D1 D2 -.625 .532 .279 -1.884 .634
D3 -.750 .648 .285 -2.282 .782
D4 -.625 .730 .420 -2.352 1.102
D2 D1 .625 .532 .279 -.634 1.884
D3 -.125 .666 .857 -1.701 1.451
D4 .000 .598 1.000 -1.413 1.413
D3 D1 .750 .648 .285 -.782 2.282
D2 .125 .666 .857 -1.451 1.701
D4 .125 .789 .879 -1.741 1.991
D4 D1 .625 .730 .420 -1.102 2.352
D2 .000 .598 1.000 -1.413 1.413
D3 -.125 .789 .879 -1.991 1.741
continued

Table 46 continued
(I)        (J)        Mean Difference   Std. Error   Sig.a   95% Confidence Interval for Differencea
Argument   Argument   (I-J)                                  Lower Bound    Upper Bound
E1 E2 -.375 .730 .623 -2.102 1.352
E3 -1.000 .627 .155 -2.482 .482
E4 -.625 .653 .370 -2.169 .919
E2 E1 .375 .730 .623 -1.352 2.102
E3 -.625 .532 .279 -1.884 .634
E4 -.250 .726 .741 -1.966 1.466
E3 E1 1.000 .627 .155 -.482 2.482
E2 .625 .532 .279 -.634 1.884
E4 .375 .625 .567 -1.103 1.853
E4 E1 .625 .653 .370 -.919 2.169
E2 .250 .726 .741 -1.466 1.966
E3 -.375 .625 .567 -1.853 1.103
Based on estimated marginal means
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no
adjustments).

